CLOSED
Can someone please explain me why this piece of code is taking around 25 secs to run, though the complexity is O(N^2log(N)) where N<=5000.
int n = 5000;
for(int i=0; i<n; i++) {
set<int> s;
for(int j=n; j>=1; j--) s.insert(j);
}
# | User | Rating |
---|---|---|
1 | jiangly | 4039 |
2 | tourist | 3841 |
3 | jqdai0815 | 3682 |
4 | ksun48 | 3590 |
5 | ecnerwala | 3542 |
6 | Benq | 3535 |
7 | orzdevinwang | 3526 |
8 | gamegame | 3477 |
9 | heuristica | 3357 |
10 | Radewoosh | 3355 |
# | User | Contrib. |
---|---|---|
1 | cry | 168 |
2 | -is-this-fft- | 165 |
3 | atcoder_official | 160 |
3 | Um_nik | 160 |
5 | djm03178 | 158 |
6 | Dominater069 | 156 |
7 | adamant | 153 |
8 | luogu_official | 152 |
9 | awoo | 151 |
10 | TheScrasse | 147 |
CLOSED
Can someone please explain me why this piece of code is taking around 25 secs to run, though the complexity is O(N^2log(N)) where N<=5000.
int n = 5000;
for(int i=0; i<n; i++) {
set<int> s;
for(int j=n; j>=1; j--) s.insert(j);
}
Name |
---|
N^2 log(N) is big. More than 10^8 operations.
It's only 3e8, in some cases I'd submit that. Seems like the constant is too bad here though.
It's slow because:
set
andunordered_set
are inherently slow, probably because they need to pass pointers/references around and have bad cache localitydefnotmee's reply below this has a better benchmark than mine. Read that and upvote it!
Just claiming things isn't good though, so I benchmarked it:
Output:
I can't really measure allocation time (I wish there was a
set::reserve
method but that wouldn't really make sense in a BST), but the deallocation time is about 4 seconds, which is very surprising.Moving
set<int>
to outside the outer loop made it take 10.85 seconds, and by timing the program from the terminal and by putting it in a separate function so that the destructor could be called, I found no extra time wasted on deallocating that 100MB (?).Also, I tested
std::priority_queue<int>
to see the difference, and it was pretty shocking: 7.5s with it in the loop, and 5.6s with it outside.unordered_set
was also faster, taking 13s including deallocations, 9.76s without, and 2.67 (!) seconds when declaring it outside the loop.Useful things from my original benchmark:
So an Expert tip: Use sorting (which actually tends to be faster than a priority_queue) or a priority_queue instead of a
[unordered] [multi] set
if you don't need it, because that could be a significant slowdown to your program.I was looking for something exactly like this. Thanks for the help.
There are some problems with your benchmark, and specially with "Moving X to outside the outer loop" if you want to test for destructions. With your original code (and not using -O2 because it sometimes cuts part of the code when compiling), that was the output:
Now this happens when i take set out of the loop:
This makes no sense, and the reason why that is happening is because the code is basically inserting once and checking whether the element is in the set in the other iterations, which it always is since they are all duplicates, and therefore nothing will be inserted. In fact, here is the benchmark of doing exactly that:
Close enough.
Here is i think a better test for not having to destruct any sets and actually doing what OP wanted (and also not making a bigger set because we arent testing for that):
Even if there is a bit of overhead of using the array of sets, i think its pretty safe to say that destructions were a bigger deal than it.
Here are benchmarks comparing those other data structures with the original test, being taken out of the loop and making an array of them:
Here the order you're inserting makes a big difference. Im not particularly familiar on how priority_queue works but it probably has to do with how deep on the heap you need to go to insert the element
===============================
===============================
===============================
Funnilly enough this actually slows down priority queue, since this data structure allows for duplicate elements
===============================
Priority_queue is really quirky huh. I guess the height of the heap doesnt actually go that far if we repeat elements
===============================
===============================
===============================
===============================
All of those were run with this command:
g++ help.cpp -std=c++17 -Wall -Wextra -Wfloat-equal -o help; .\help
Conclusion:
Thank you for this. I guess I have a lot to learn with benchmarking! Would you mind if I edited my comment to say that your benchmark is better?
Of course i wouldnt mind. Btw have a nice day :)
Ok my bad, I calculated the operations to be around 3e7 but it comes out to be 3e8 and since i and j are changing and we are already in 1e8 range so it pushed the time to 25 seconds. And I noticed this and thought it was something weird but if it would have happened in 1e7 range then it wouldn't even be noticed. Understood.
Complexity analysis can help you predict scaling with problem size, but won't predict absolute times.
You have declared your
set<int> s
inside the for loop doing which you are instructing to deallocate the previously stored set inside that varibales
(which takes O(N) ). To prove this try declaring oneunordered_set
(which takes O(1) time) once outside the loop and once inside the loop with the same N or you can try by declaring theset
outside the loop directly.So, the set is needed to be implemented inside the loop as before starting the inner loop I need a set with all elements from 1 to n. I have omitted other details of the solution so it might have been confusing. I know my solution pertaining to that problem is not efficient but I want to know why exactly this happens. Another way could be to declare set outside but after the completion of inner loop I'll need to clear the set for my implementation, which won't improve the results as I'm still clearing the set.
This is compilation time not run timeAuto comment: topic has been updated by Pieck (previous revision, new revision, compare).