I just encountered a weird issue with problem 333D.
Check these two codes:
Accepted: 4201145
TLE: 4201157
The only difference between these two submissions is that in the first one, it says if(A[i][k]>B&&A[j][k]>B)
and in the other one it says if(min(A[i][k],A[j][k])>B)
. How can two statements that are practically identical be the difference between getting accepted with 1122 ms and getting TLE over 3000 ms ?!?!
Another coder suggested that maybe the std::min function is slow, so I changed it to a statement of the form a<b?a:b
, and it still gets TLE, so the issue is not the std::min function.
Another TLE: 4201326
I'm puzzled...
It is true that the min/max functions from <algorithm> are slower than a straight check. It is probably due to the template nature of the min/max functions: they have to work with whatever types are given as parameters. I do not know much about this. Hopefully the more advanced coders can explain this phenomenon to you.
I checked that.
I replaced the std::min function with a statement of the form
a<b?a:b
, and it still gets TLE, so it's not the min function that's making the program get TLE. For some ghostly reason,
if(a<x && b<x)
is at least 3x faster than
if((a<b?a:b) < x)
...
professor bill says right, the code of the min function in C++98 is:
template <class T> const T& min (const T& a, const T& b) { return !(b<a)?a:b; } // or: return !comp(b,a)?a:b; for version (2)
First, this is a template function, which is slower than a normal one. Second, you call the min function 10^6 times, and each time you call this non-inline function there is overhead. However, as anybody knows, your code must be accepted first try ;) that's the reason some people use:
#define Min(a, b) ((a) < (b) ? (a) : (b))
Just like I said to Professor Bill, I replaced the min function with a statement of the form
a<b?a:b
and it still gets TLE. Check the submission -> 4201326
For some ghostly reason,
if(a<x && b<x)
is at least 3x faster than
if((a<b?a:b) < x)
...
wow, it's very strange, man :)) like professor Bill said, "Hope the more advanced coders can explain this phenomenon to you" :D
I don't know the details of the problem but have you asked yourself whether
if(A[i][k]>B&&A[j][k]>B)
and
if(min(A[i][k],A[j][k])>B)
~~are semantically the same thing? In the first one, you are requiring both values to be higher than B. In the second one, just the smaller of the two.~~
UPD: I'm sorry, I didn't make any sense here. I meant something like what mukel said below. The if is short-circuited, so every time the left-side statement is false you save one comparison. In the min (or ternary) version, you always do two comparisons.
I believe CF uses something like gcc -O2 to compile (level 2 optimizations). I generated the assembly (with the extra flags -Wa,-aln=asmsource.asm) for the following two simpler codes. It turns out the ternary one does more jumping and more operations. So maybe that's why you're getting a much bigger constant factor by using the ternary instead of the if.
If the smaller value is bigger than B, then both the smaller and the bigger values are bigger than B.
Well, if we have X>Y and Y>B, then we have X>Y>B, so yes, if we check that the minimum of the two values is higher than B, we are checking that both values are higher than B.
I've heard about something called "compiler optimizations". There are many types of these optimizations and I'm not experienced with them, but I guess one of them is the reason for this.
look at this blog
I have some possible reasons. First: the && operator does not evaluate the whole expression if the first comparison is false, avoiding a redundant comparison in some cases. Second: cache-unfriendly code, meaning the way you access array A can make a difference, since A[i][j] is not the same as A[j][i]; but in this case I don't see any problem, since k is in the inner-most loop. Third: the compiler sometimes makes slight fast optimizations; maybe you were just lucky.
It seems the && operator's short-circuiting prevented most of the cache-unfriendly code (the A[j][k] access) from running.
min(A[i][k], A[j][k]) needs to actually access both A[i][k] and A[j][k] to work; this is VERY cache-unfriendly and may be significantly slow.
But with A[i][k] > B && A[j][k] > B, in most cases A[i][k] > B does not hold, so we don't access A[j][k] and avoid a large number of cache misses.
Exchanging the checks still produces a fast solution 4204040 (1310 ms compared to 1028 ms).
But I still believe your explanation is correct. Just sticking to one array row, no matter whether it is i or j, allows staying in cache.
I don't think the reason for the TLE is accessing both A[i][k] and A[j][k],
because I tried what BekzhanKassenov said and used fastMin, and got AC in 1600 ms: 4203936
However, I tried the following code in a custom test, first using the normal min and then using fastMin; this time the normal min was faster:
the normal min runs in about 5550 ms
fastMin runs in about 6146 ms
So how is fastMin faster than the normal min in diego_v1's code, but slower in the custom test? Very confusing.
I recommend reading this: http://en.wikipedia.org/wiki/Branch_predication
Generally, fastMin got "fast" by avoiding branches.
In your test case, i%544 >= h almost always holds; in this case the processor can predict the if-branch perfectly and has nearly no mispredictions at all. fastMin does more calculation, so it's slower.
Although I didn't understand "Branch_predication", I changed this line
h=fastMin(h,i%544);
into
h=min((7*i)%3468,(6*i)%3468);
and
h=fastMin((7*i)%3468,(6*i)%3468);
Now I think there's nothing to predict, but the normal min is still faster.
There are still big consecutive chunks of i where the right operand is smaller (same for the left), so prediction had a good hit rate. Try using random values.
What about this test:
h=min((1+i)%2,(i)%2);
The normal min is still faster, and with a bigger difference this time.
Alternating values are also GOOD for prediction.
Please take some time to briefly understand how a modern processor works.
Well, alternating values do avoid the consecutive chunks where the same side is smaller.
How does this prediction algorithm work?
Have the processors turned into humans with brains for thinking?
UPD: I used random numbers this time, and this time fastMin was faster, but the difference is very small. Yet in diego_v1's code fastMin was much faster, with a big difference. How?
Anyway, when solving problems in contests the test cases may not be randomly generated; that's the problem.
http://en.wikipedia.org/wiki/Branch_predictor
Good test! You ruled out the cache theory. After reading more about branch optimizations, I found this: a simple and elegant way to write fastMin; any of us could have invented it.
It works! 1778 ms, 4205726
Your function is faster than the normal min in diego_v1's code, but in my test it's slower than both the normal min and fastMin. How?
Try using these functions:
WAT???
So many negative votes :) Here's the explanation.
The result of right-shifting a negative value is implementation-defined. Furthermore, if
y - x
is less than INT_MIN, then a signed overflow occurs and causes undefined behavior. I would not rely on such a function.
Oh, I understand. And what about this function:
Actually, current processors try to predict which way a branch will go. This condition is false in the vast majority of cases (otherwise, with two such "hits", we would stop searching), so in most cases we do not execute the if body, and the processor correctly skips it in its prediction. When min is used, the min condition can go both ways, and the processor "misses" about half the time. Hence the difference.
Any chance to make it understandable?
UPD: Thanks, after the edit I'm beginning to get it. "if body" also caused me some confusion, but now I see it's the body of the "if" statement.
I had a similar problem in USACO: I got 0.44 s using the min function from <algorithm> and 0.22 s using #define MIN(a,b) ((a)<(b)?(a):(b)). In my opinion, it wastes time on the extra addressing and on the min function call.