Getting TLE inserting elements in set

→ Pay attention

Before contest
Ethflow Round 1 (Codeforces Round, Div. 1 + Div. 2)
4 days

→ Top rated

#	User	Rating
1	jiangly	4039
2	tourist	3841
3	jqdai0815	3682
4	ksun48	3590
5	ecnerwala	3542
6	Benq	3535
7	orzdevinwang	3526
8	gamegame	3477
9	heuristica	3357
10	Radewoosh	3355

Countries | Cities | Organizations

View all →

→ Top contributors

#	User	Contrib.
1	cry	168
2	-is-this-fft-	165
3	atcoder_official	160
3	Um_nik	160
5	djm03178	158
6	Dominater069	156
7	adamant	153
8	luogu_official	152
9	awoo	151
10	TheScrasse	147

View all →

→ Find user

→ Recent actions

Detailed →

Pieck's blog

Getting TLE inserting elements in set

By Pieck, history, 3 years ago, In English

CLOSED

Can someone please explain me why this piece of code is taking around 25 secs to run, though the complexity is O(N^2log(N)) where N<=5000.

int n = 5000;
for(int i=0; i<n; i++) {
    set<int> s;
    for(int j=n; j>=1; j--) s.insert(j);
}

set, time complexity

Pieck
3 years ago
14

Comments (13)

Show archived | Write comment?

waltz47

3 years ago, # |

← Rev. 3 →

+10

N^2 log(N) is big. More than 10^8 operations.

→ Reply

-is-this-fft-

3 years ago, # ^ |

It's only 3e8, in some cases I'd submit that. Seems like the constant is too bad here though.

→ Reply

Ritwin

3 years ago, # ^ |

← Rev. 3 →

It's slow because:

set and unordered_set are inherently slow, probably because they need to pass pointers/references around and have bad cache locality
It's allocating 100MB dynamically. That's a lot of memory, so obviously it takes a while to allocate and deallocate.

defnotmee's reply below this has a better benchmark than mine. Read that and upvote it!

Original post

Just claiming things isn't good though, so I benchmarked it:

Benchmarking Code

#include <iostream>
#include <chrono>
#include <set>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    int n = 5000;
    for (int i=0; i<n; i++) {
        auto st = now();
        std::set<int> s;
        for (int j=n; j>=1; j--) s.insert(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Output:

Total time:       21.7573 s
No deallocations: 17.8228 s

I can't really measure allocation time (I wish there was a set::reserve method but that wouldn't really make sense in a BST), but the deallocation time is about 4 seconds, which is very surprising.

Moving set<int> to outside the outer loop made it take 10.85 seconds, and by timing the program from the terminal and by putting it in a separate function so that the destructor could be called, I found no extra time wasted on deallocating that 100MB (?).

Also, I tested std::priority_queue<int> to see the difference, and it was pretty shocking: 7.5s with it in the loop, and 5.6s with it outside. unordered_set was also faster, taking 13s including deallocations, 9.76s without, and 2.67 (!) seconds when declaring it outside the loop.

Useful things from my original benchmark:

std Library Container	Including deallocations	Pure runtime
std::set	21.76s	17.8s
std::unordered_set	13s	9.76s
std::priority_queue	7.5s	7.5s

So an Expert tip: Use sorting (which actually tends to be faster than a priority_queue) or a priority_queue instead of a [unordered] [multi] set if you don't need it, because that could be a significant slowdown to your program.

→ Reply

Pieck

3 years ago, # ^ |

I was looking for something exactly like this. Thanks for the help.

→ Reply

defnotmee

3 years ago, # ^ |

There are some problems with your benchmark, and specially with "Moving X to outside the outer loop" if you want to test for destructions. With your original code (and not using -O2 because it sometimes cuts part of the code when compiling), that was the output:

Total time:       13.5812 s
No deallocations: 12.0908 s

Now this happens when i take set out of the loop:

Total time:       7.74518 s
No deallocations: 7.74462 s

This makes no sense, and the reason why that is happening is because the code is basically inserting once and checking whether the element is in the set in the other iterations, which it always is since they are all duplicates, and therefore nothing will be inserted. In fact, here is the benchmark of doing exactly that:

Code

#include <iostream>
#include <chrono>
#include <set>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    int n = 5000;
    std::set<int> s;
    for (int j=n; j>=1; j--) s.insert(j);
    for (int i=0; i<n; i++) {
        auto st = now();
        for (int j=n; j>=1; j--) s.count(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       6.42716 s
No deallocations: 6.4223 s

Close enough.

Here is i think a better test for not having to destruct any sets and actually doing what OP wanted (and also not making a bigger set because we arent testing for that):

Code

#include <iostream>
#include <chrono>
#include <set>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    int n = 5000;
    std::set<int> s[n];
    for (int i=0; i<n; i++) {
        auto st = now();
        for (int j=n; j>=1; j--) s[i].insert(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       13.1702 s
No deallocations: 13.1694 s

Even if there is a bit of overhead of using the array of sets, i think its pretty safe to say that destructions were a bigger deal than it.

Here are benchmarks comparing those other data structures with the original test, being taken out of the loop and making an array of them:

priority_queue

Here the order you're inserting makes a big difference. Im not particularly familiar on how priority_queue works but it probably has to do with how deep on the heap you need to go to insert the element

===============================

Original test, inserting biggest to smallest

#include <iostream>
#include <chrono>
#include <set>
#include<unordered_set>
#include<queue>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    int n = 5000;
    
    for (int i=0; i<n; i++) {
        auto st = now();
        std::priority_queue<int> s;
        for (int j=n; j>=1; j--) s.push(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       1.76891 s
No deallocations: 1.76038 s

===============================

Original test, inserting smallest to biggest

#include <iostream>
#include <chrono>
#include <set>
#include<unordered_set>
#include<queue>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    int n = 5000;
    
    for (int i=0; i<n; i++) {
        auto st = now();
        std::priority_queue<int> s;
        for (int j=1; j<= n; j++) s.push(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       8.59461 s
No deallocations: 8.58484 s

===============================

Just out of the loop, biggest to smallest

#include <iostream>
#include <chrono>
#include <set>
#include<unordered_set>
#include<queue>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    int n = 5000;
    
    std::priority_queue<int> s;

    for (int i=0; i<n; i++) {
        auto st = now();
        
        for (int j=n; j>=1; j--) s.push(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       2.43477 s
No deallocations: 2.43448 s

Funnilly enough this actually slows down priority queue, since this data structure allows for duplicate elements

===============================

Just out of the loop, smallest to biggest

#include <iostream>
#include <chrono>
#include <set>
#include<unordered_set>
#include<queue>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    int n = 5000;
    
    std::priority_queue<int> s;

    for (int i=0; i<n; i++) {
        auto st = now();
        
        for (int j=1; j<= n; j++) s.push(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       3.5903 s
No deallocations: 3.58994 s

Priority_queue is really quirky huh. I guess the height of the heap doesnt actually go that far if we repeat elements

===============================

Making an array, biggest to smallest

#include <iostream>
#include <chrono>
#include <set>
#include<unordered_set>
#include<queue>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    int n = 5000;
    
    std::priority_queue<int> s[n];

    for (int i=0; i<n; i++) {
        auto st = now();
        
        for (int j=n; j >= 1; j--) s[i].push(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       1.87247 s
No deallocations: 1.87196 s

===============================

Making an array, smallest to biggest

#include <iostream>
#include <chrono>
#include <set>
#include<unordered_set>
#include<queue>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    int n = 5000;
    
    std::priority_queue<int> s[n];

    for (int i=0; i<n; i++) {
        auto st = now();
        
        for (int j=1; j<= n; j++) s[i].push(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       8.59998 s
No deallocations: 8.59925 s

unordered_set

Original test

#include <iostream>
#include <chrono>
#include <set>
#include<unordered_set>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    int n = 5000;
    
    for (int i=0; i<n; i++) {
        auto st = now();
        std::unordered_set<int> s;
        for (int j=n; j>=1; j--) s.insert(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       7.63321 s
No deallocations: 6.32708 s

===============================

Just out of the loop

#include <iostream>
#include <chrono>
#include <set>
#include<unordered_set>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    int n = 5000;
    std::unordered_set<int> s;
    for (int i=0; i<n; i++) {
        auto st = now();
        for (int j=n; j>=1; j--) s.insert(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       2.30835 s
No deallocations: 2.30802 s

===============================

Making an array

#include <iostream>
#include <chrono>
#include <set>
#include<unordered_set>

inline decltype(auto) now() { 
    return std::chrono::high_resolution_clock::now();
}

int main() {
    int64_t t1 = 0;
    auto beg = now();

    int n = 5000;
    std::unordered_set<int> s[n];
    for (int i=0; i<n; i++) {
        auto st = now();
        for (int j=n; j>=1; j--) s[i].insert(j);
        auto en = now();
        t1 += (en - st).count();
    }

    auto end = now();

    std::cout << "Total time:       " << (end - beg).count() / 1e9 << " s\n";
    std::cout << "No deallocations: " << t1 / 1e9 << " s\n";
}

Total time:       6.8326 s
No deallocations: 6.83171 s

All of those were run with this command: g++ help.cpp -std=c++17 -Wall -Wextra -Wfloat-equal -o help; .\help

Conclusion:

Destructing actually consumes some time, but not that much
Destructing is particularly negligible with priority queue (i.e vector)
Benchmarks are fun to make
Go priority queue!

→ Reply

Ritwin

3 years ago, # ^ |

Thank you for this. I guess I have a lot to learn with benchmarking! Would you mind if I edited my comment to say that your benchmark is better?

→ Reply

defnotmee

3 years ago, # ^ |

Of course i wouldnt mind. Btw have a nice day :)

→ Reply

Pieck

3 years ago, # ^ |

Ok my bad, I calculated the operations to be around 3e7 but it comes out to be 3e8 and since i and j are changing and we are already in 1e8 range so it pushed the time to 25 seconds. And I noticed this and thought it was something weird but if it would have happened in 1e7 range then it wouldn't even be noticed. Understood.

→ Reply

AIireza

3 years ago, # |

Complexity analysis can help you predict scaling with problem size, but won't predict absolute times.

→ Reply

Bitweisser

3 years ago, # |

← Rev. 7 →

You have declared your set<int> s inside the for loop doing which you are instructing to deallocate the previously stored set inside that varibale s (which takes O(N) ). To prove this try declaring one unordered_set (which takes O(1) time) once outside the loop and once inside the loop with the same N or you can try by declaring the set outside the loop directly.

→ Reply

Pieck

3 years ago, # ^ |

So, the set is needed to be implemented inside the loop as before starting the inner loop I need a set with all elements from 1 to n. I have omitted other details of the solution so it might have been confusing. I know my solution pertaining to that problem is not efficient but I want to know why exactly this happens. Another way could be to declare set outside but after the completion of inner loop I'll need to clear the set for my implementation, which won't improve the results as I'm still clearing the set.

→ Reply

imtiyazrasool92

3 years ago, # ^ |

← Rev. 2 →

~~This is compilation time not run time~~

→ Reply

Pieck

3 years ago, # |

Auto comment: topic has been updated by Pieck (previous revision, new revision, compare).

→ Reply