Блог пользователя Pieck

Автор Pieck, история, 3 года назад, По-английски

CLOSED

Can someone please explain me why this piece of code is taking around 25 secs to run, though the complexity is O(N^2log(N)) where N<=5000.

int n = 5000;
for(int i=0; i<n; i++) {
    set<int> s;
    for(int j=n; j>=1; j--) s.insert(j);
}
  • Проголосовать: нравится
  • +1
  • Проголосовать: не нравится

»
3 года назад, # |
Rev. 3   Проголосовать: нравится +10 Проголосовать: не нравится

N^2 log(N) is big. More than 10^8 operations.

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится +4 Проголосовать: не нравится

    It's only 3e8, in some cases I'd submit that. Seems like the constant is too bad here though.

    • »
      »
      »
      3 года назад, # ^ |
      Rev. 3   Проголосовать: нравится +2 Проголосовать: не нравится

      It's slow because:

      1. set and unordered_set are inherently slow, probably because they need to pass pointers/references around and have bad cache locality
      2. It's allocating 100MB dynamically. That's a lot of memory, so obviously it takes a while to allocate and deallocate.

      defnotmee's reply below this has a better benchmark than mine. Read that and upvote it!

      Original post

      Useful things from my original benchmark:

      std Library Container Including deallocations Pure runtime
      std::set 21.76s 17.8s
      std::unordered_set 13s 9.76s
      std::priority_queue 7.5s 7.5s

      So an Expert tip: Use sorting (which actually tends to be faster than a priority_queue) or a priority_queue instead of a [unordered] [multi] set if you don't need it, because that could be a significant slowdown to your program.

      • »
        »
        »
        »
        3 года назад, # ^ |
          Проголосовать: нравится 0 Проголосовать: не нравится

        I was looking for something exactly like this. Thanks for the help.

      • »
        »
        »
        »
        3 года назад, # ^ |
          Проголосовать: нравится +7 Проголосовать: не нравится

        There are some problems with your benchmark, and specially with "Moving X to outside the outer loop" if you want to test for destructions. With your original code (and not using -O2 because it sometimes cuts part of the code when compiling), that was the output:

        Total time:       13.5812 s
        No deallocations: 12.0908 s
        

        Now this happens when i take set out of the loop:

        Total time:       7.74518 s
        No deallocations: 7.74462 s
        

        This makes no sense, and the reason why that is happening is because the code is basically inserting once and checking whether the element is in the set in the other iterations, which it always is since they are all duplicates, and therefore nothing will be inserted. In fact, here is the benchmark of doing exactly that:

        Code
        Total time:       6.42716 s
        No deallocations: 6.4223 s
        

        Close enough.

        Here is i think a better test for not having to destruct any sets and actually doing what OP wanted (and also not making a bigger set because we arent testing for that):

        Code
        Total time:       13.1702 s
        No deallocations: 13.1694 s
        

        Even if there is a bit of overhead of using the array of sets, i think its pretty safe to say that destructions were a bigger deal than it.

        Here are benchmarks comparing those other data structures with the original test, being taken out of the loop and making an array of them:

        priority_queue
        unordered_set

        All of those were run with this command: g++ help.cpp -std=c++17 -Wall -Wextra -Wfloat-equal -o help; .\help

        Conclusion:

        • Destructing actually consumes some time, but not that much
        • Destructing is particularly negligible with priority queue (i.e vector)
        • Benchmarks are fun to make
        • Go priority queue!
        • »
          »
          »
          »
          »
          3 года назад, # ^ |
            Проголосовать: нравится 0 Проголосовать: не нравится

          Thank you for this. I guess I have a lot to learn with benchmarking! Would you mind if I edited my comment to say that your benchmark is better?

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Ok my bad, I calculated the operations to be around 3e7 but it comes out to be 3e8 and since i and j are changing and we are already in 1e8 range so it pushed the time to 25 seconds. And I noticed this and thought it was something weird but if it would have happened in 1e7 range then it wouldn't even be noticed. Understood.

»
3 года назад, # |
  Проголосовать: нравится +1 Проголосовать: не нравится

Complexity analysis can help you predict scaling with problem size, but won't predict absolute times.

»
3 года назад, # |
Rev. 7   Проголосовать: нравится +1 Проголосовать: не нравится

You have declared your set<int> s inside the for loop doing which you are instructing to deallocate the previously stored set inside that varibale s (which takes O(N) ). To prove this try declaring one unordered_set (which takes O(1) time) once outside the loop and once inside the loop with the same N or you can try by declaring the set outside the loop directly.

  • »
    »
    3 года назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    So, the set is needed to be implemented inside the loop as before starting the inner loop I need a set with all elements from 1 to n. I have omitted other details of the solution so it might have been confusing. I know my solution pertaining to that problem is not efficient but I want to know why exactly this happens. Another way could be to declare set outside but after the completion of inner loop I'll need to clear the set for my implementation, which won't improve the results as I'm still clearing the set.

  • »
    »
    3 года назад, # ^ |
    Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

    This is compilation time not run time

»
3 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Auto comment: topic has been updated by Pieck (previous revision, new revision, compare).