I just encountered a weird issue with problem 333D.
Check these two codes:
Accepted: 4201145
TLE: 4201157
The only difference between these two submissions is that in the first one, it says if(A[i][k]>B&&A[j][k]>B)
and in the other one it says if(min(A[i][k],A[j][k])>B)
. How can two statements that are practically identical be the difference between getting accepted with 1122 ms and getting TLE over 3000 ms ?!?!
Another coder suggested that maybe the std::min function is slow, so I changed it to a statement of the form a<b?a:b
, and it still gets TLE, so the issue is not the std::min function.
Another TLE: 4201326
I'm puzzled...
It is true that the min/max functions from <algorithm> are slower than a straight check. It is probably due to the template nature of the min/max functions: they have to work with whatever types are given as parameters. I do not know much about this. Hopefully the more advanced coders can explain this phenomenon to you.
I checked that.
I replaced the std::min function with a statement of the form
a<b?a:b
, and it still gets TLE, so it's not the min function that's making the program get TLE. For some ghostly reason,
if(a<x && b<x)
is at least 3x faster than
if((a<b?a:b) < x)
...
professor bill says right, the code of the min function in C++98 is:
template <class T> const T& min (const T& a, const T& b) { return !(b<a)?a:b; } // or: return !comp(b,a)?a:b; for version (2)
First, this is a template function, which is slower than a normal one. Second, you call the min function 10^6 times, and each time you call this non-inline function there is overhead. However, as anybody knows, your code must be accepted first try ;) that's the reason some people use:
#define Min(a, b) ((a) < (b) ? (a) : (b))
Just like I said to Professor Bill, I replaced the min function with a statement of the form
a<b?a:b
and it still gets TLE. Check the submission -> 4201326
For some ghostly reason,
if(a<x && b<x)
is at least 3x faster than
if((a<b?a:b) < x)
...
wow, it's very strange, man :)) like professor Bill said, "Hope the more advanced coders can explain this phenomenon to you" :D
I don't know the details of the problem but have you asked yourself whether
if(A[i][k]>B&&A[j][k]>B)
and
if(min(A[i][k],A[j][k])>B)
~~are semantically the same thing? In the first one, you are requiring both values to be higher than B. In the second one, just the smaller of the two.~~
UPD: I'm sorry, I didn't make any sense here. I meant something like what mukel said below. The if is short-circuited, so every time the left-side statement is false you save one comparison. In the min (or ternary) version, you always do two comparisons.
I believe CF uses something like gcc -O2 to compile (level 2 optimizations). I generated the assembly (with the extra flags -Wa,-aln=asmsource.asm) for the following two simpler codes. It turns out the ternary one does more jumping and more operations. So maybe that's why you're getting a much bigger constant factor by using the ternary instead of the if.
If the smaller value is bigger than B, then both the smaller and the bigger values are bigger than B.
Well, if we have X>Y and Y>B, then we have X>Y>B, so yes, if we check that the minimum of the two values is higher than B, we are checking that both values are higher than B.
I've heard about something called "compiler optimizations". There are many types of these optimizations and I'm not experienced with them, but I guess one of them is the reason for this.
look at this blog
I have some possible reasons. First: the && operator does not evaluate the whole expression if the first comparison is false, avoiding a redundant comparison in some cases. Second: cache-unfriendly code, meaning the way you access array A can make a difference, since A[i][j] is not the same as A[j][i]; but in this case I don't see any problem, since k is in the inner-most loop. Third: the compiler sometimes makes slight fast optimizations; maybe you were just lucky.
It seems the && operator's short-circuiting prevented most of the cache-unfriendly code (the A[j][k] access) from running.
min(A[i][k], A[j][k]) needs to actually access both A[i][k] and A[j][k] to work; this is VERY cache-unfriendly and may be significantly slow.
But with A[i][k] > B && A[j][k] > B, in most cases A[i][k] > B does not hold, so we don't access A[j][k] and avoid a large number of cache misses.
Exchanging the checks still produces a fast solution 4204040 (1310 ms compared to 1028 ms).
But I still believe your explanation is correct. Just sticking to one array row, no matter whether it is i or j, allows staying in cache.
I don't think the reason for the TLE is accessing both A[i][k] and A[j][k],
because I tried what BekzhanKassenov said and used fastMin, and got AC in 1600 ms: 4203936
However, I tried the following code in a custom test, first using the normal min and then using fastMin; this time the normal min was faster:
the normal min runs in about 5550 ms
fastMin runs in about 6146 ms
So how is fastMin faster than the normal min in diego_v1's code, but slower in the custom test? Very confusing.
I recommend reading this: http://en.wikipedia.org/wiki/Branch_predication
Generally, fastMin got "fast" by avoiding branches.
In your test case, i%544 >= h almost always holds; in this case the processor can predict the if-branch perfectly and has nearly no mispredictions at all. fastMin does more calculation, so it's slower.
Although I didn't understand "Branch_predication", I changed this line
h=fastMin(h,i%544);
into
h=min((7*i)%3468,(6*i)%3468);
and
h=fastMin((7*i)%3468,(6*i)%3468);
Now I think there's nothing to predict, but the normal min is still faster.
There are still big consecutive chunks of i where the right operand is smaller (same for the left), so prediction had a good hit rate. Try using random values.
What about this test:
h=min((1+i)%2,(i)%2);
The normal min is still faster, and with a bigger difference this time.
Alternating values are also GOOD for prediction.
Please take some time to briefly understand how a modern processor works.
Well, alternating values do avoid the consecutive chunks where the same side is smaller.
How does this prediction algorithm work?
Have the processors turned into humans with brains for thinking?
UPD: I used random numbers this time, and this time fastMin was faster, but the difference is very small. Yet in diego_v1's code fastMin was much faster, with a big difference. How?
Anyway, when solving problems in contests the test cases may not be randomly generated; that's the problem.
http://en.wikipedia.org/wiki/Branch_predictor
Good test! You ruled out the cache theory. After reading more about branch optimizations, I found this: a simple and elegant way to write fastMin; any of us could have invented it.
It works! 1778 ms, 4205726
Your function is faster than the normal min in diego_v1's code, but in my test it's slower than both the normal min and fastMin. How?
Try using these functions:
WAT???
So many negative votes :) Here's the explanation.
The result of right-shifting a negative value is implementation-defined. Furthermore, if
y - x
is less than INT_MIN, then a signed overflow occurs and causes undefined behavior. I would not rely on such a function.
Oh, I understand. And what about this function:
Actually, current processors try to predict which way a branch will go. This condition is false in the vast majority of cases (otherwise, with two such "hits", we would stop searching), so in most cases we do not execute the if body, and the processor correctly skips it in its prediction. When min is used, the min condition can go both ways, and the processor "misses" about half the time. Hence the difference.
Any chance to make it understandable?
UPD: Thanks, after the edit I'm beginning to get it. "if body" also caused me some confusion, but now I see it's the body of the "if" statement.
I had a similar problem in USACO: I got 0.44 s using the min function from <algorithm> and 0.22 s using #define MIN(a,b) ((a)<(b)?(a):(b)). In my opinion, it wastes time on the extra addressing and on the min function call.