Hi. I want to preface this by saying i am a noobie and i totally understand if i am missing something obvious here , but i have seen that the vector object in c++ is way too efficient. What do i mean by way too efficient ? I mean that — even solutions which should be O(N^2) complexity tend to run within 1 second.
An example submission where this is happening —
Example: https://codeforces.net/contest/1883/submission/241050682
Consider the code snippet —
for(int i=0;i<n;i++)
{
auto it = lower_bound( a.begin(), a.end(), b[i] );
it--;
if(it >= a.begin())
{
int val = *it;
if( val < b[i] )cnt++;
a.erase(it);
}
}
This code snipped from above submission is finding a lower bound in vector A, and then erasing the element. In the worse case, erase should have complexity O(N), which means that the loop overall should have complexity O(N^2)
see this gfg article which https://www.geeksforgeeks.org/vector-erase-and-clear-in-cpp/ where it says clearly that erase is O(N) worst case
Can someone help me understand how the heck its so dang efficient ?
Indeed it is $$$O(N^2)$$$ worst case, but even in that worst case the complexity is not everything. The compiler might be able to optimize a lot (maybe by using SIMD / somehow paralelizing some of the operations), cache is very well behaved for each removal so that might cut down the time as well. There could be more factors that influence this. I suggest you to try to submit again using a different compiler (or even the same) and see if the time differs. Remember: Time complexity $$$\ne$$$ Time
Firstly, note that decrementing the begin iterator is undefined behaviour, and it may or may not lead to intended behaviour, so you might want to fix that in your code.
When you look at the assembly for
vector.erase
, you can find amemmove
call (the rest of the work is $$$O(1)$$$). Andmemmove
is heavily optimized for moving large amounts of data (well, for any amount of data, but cache does help here too). This is a case where the time complexity is $$$O(n)$$$ but the constant is very small, so things just pass. This benchmark shows that in the "average" case, when you empty a vector from the middle (and moving around half the elements each iteration), you take about $$$2 \times 10^9$$$ times a no-op — if a no-op in the benchmark above was 1 cycle (this is a big if, though, but reasonable), then on a 2GHz processor, this would pass in 1s.Coming to your implementation: you sort $$$b$$$ in non-increasing order and $$$a$$$ in non-decreasing order. On average, it is likely that you are removing elements to the right before elements to the left (in the original array). This is intuitively even better than the average case we talked about above. This means that
memmove
has to do a smaller fraction of the work in the average case (in the best case, it just needs to do $$$O(n)$$$ work). You can probably show that it will not take more than $$$n^2/4 + O(n)$$$ move operations, which is around $$$2.5 \times 10^9$$$ for $$$n = 10^5$$$, and the number of operations (mathematically counted) also coincidentally matches the average case we showed the benchmark for.