Блог пользователя TheOpChicken123

Автор TheOpChicken123, история, 2 года назад, По-английски

Hey all, I know that this is a very common topic and you are probably quite turned off by it but i still don't quite understand it. Please take the time to read through this post. Thanks a lot!

During the contest yesterday, I was solving 3SUM — closure: https://codeforces.net/contest/1698/problem/C I figured out the logic to the question and started implementing it. I used unordered_map — passed the pretests — but then TLE'd on the main tests. This was my submission: https://codeforces.net/contest/1698/submission/162250574

But then, i made some changes to this code and got AC. This was my next submission: https://codeforces.net/contest/1698/submission/162252261

Notice the difference between these two pieces of code? Yes! I used a map instead of unordered_map. Now, I have read many threads and articles advising to use maps instead of unordered_maps in my life, for all sorts of reasons, like slow hash functions, frequent collisions which cause the use of buckets which slow the unordered_map down, etc. But in this case, the only thing I'm doing is inserting and using the unordered_map::count function. are these not examples of where the unordered_map's O(1) time complexity should beat the map's O(logN)? Unlike, for example, iterating through the map, in which case, using a map instead of unordered_map would be faster? I was specifically asking myself this question during the contest and because of the fact that this was the only thing I was doing, I chose to use unordered_map — which turned out to stab me in the back and lose me quite a bit of rating. Basically, my question is, during competitive programming, is there EVER a time in which I would want to use unordered_map instead of map?

  • Проголосовать: нравится
  • +12
  • Проголосовать: не нравится

»
2 года назад, # |
  Проголосовать: нравится +16 Проголосовать: не нравится
»
2 года назад, # |
  Проголосовать: нравится +14 Проголосовать: не нравится

My personnel suggestion : Never use unordered_map on codeforces.

Use map instead.

  • »
    »
    2 года назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    How will you solve problems involving O(1) access of large values, where an extra O(logN) factor would be too slow then?

    • »
      »
      »
      2 года назад, # ^ |
        Проголосовать: нравится +18 Проголосовать: не нравится

      gp_hash_table

    • »
      »
      »
      2 года назад, # ^ |
      Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

      True but that is a special and rare occasion. (at least in my short CP career I never meet things like that)

      If map cannot fit in the time limit,I will use unordered_map and make some additional hashes to avoid hacks,which means more code.

    • »
      »
      »
      2 года назад, # ^ |
        Проголосовать: нравится +41 Проголосовать: не нравится

      You leave contest and shit on problemsetter in comments in that case

»
2 года назад, # |
  Проголосовать: нравится +10 Проголосовать: не нравится

It's sometimes useful for attempting to squeeze a suboptimal solution through TL (hopefully with a custom written hash function), especially for some classic string/hashing problems on old judges. But it's quite rare to find a problem that requires or expects an unordered_map solution.

»
2 года назад, # |
  Проголосовать: нравится +10 Проголосовать: не нравится

In your case, it looks like authors decided to create a test, which kills solutions with unordered maps (every operation works in O(n)). Easiest way to prevent this is using randomized hash functions like splitmix64. You can read about these in another blogs. In other cases, unordered maps are much faster than maps, so you should always use them :)

»
2 года назад, # |
Rev. 2   Проголосовать: нравится +36 Проголосовать: не нравится

I disagree with jiangbowen, basically the choice of std::map preferred over std::unordered_map is unjustified if you don't exploit the ordered nature of std::map (e.g. by choosing the minimum element std::map::begin(), or finding the nearest element to a value via std::map::lower_bound()), because std::unordered_map is simply quicker. The problem is that you may fall victim to an antihash test. I would like to believe that usually antihash tests are made in the hacks, and I view it as bad manners to include antihash tests in the problem being a problemsetter. However, this sometimes happens, and if it happens, you should be ready. (This is an actual disadvantage of std::unordered_map because std::map has the guaranteed logarithmic time per query, and std::unordered_map has constant time per query on average. But choosing std::map simply because of this is somewhat conformist's way of thinking.)

Xephon remembered just the right blog. Along with the deep analysis of the issue and its cause, a hotfix is given there. You can simply include it in your C++ template and always, without thinking, attach a custom hashing function every time you need an unordered container: https://codeforces.net/contest/1698/submission/162271598

  • »
    »
    2 года назад, # ^ |
      Проголосовать: нравится +3 Проголосовать: не нравится

    std::unordered_map is simply quicker

    Let's see if this is true.

    You can simply include it in your C++ template and always, without thinking, attach a custom hashing function every time you need an unordered container: https://codeforces.net/contest/1698/submission/162271598

    And your submission gets 78ms time, which is far from impressive. Everything boils down to having a choice between:

    1. a vulnerable O(1) std::unordered_map solution with a low constant factor, which can degrade to O(N) in the worst case if hacked: https://codeforces.net/contest/1698/submission/162388677
    2. a hardened O(1) std::unordered_map solution with a high constant factor because of custom hash: https://codeforces.net/contest/1698/submission/162271598
    3. a guaranteed O(logN) worst case time complexity std::map solution with a low constant factor: https://codeforces.net/contest/1698/submission/162387139

    So I agree with jiangbowen and I also think that std::map is a reasonable default choice. But std::unordered_map still can be faster on larger data sets. And if you are worried about a possible TLE, then it may be a good idea to benchmark a solution via custom test before submitting it.

    • »
      »
      »
      2 года назад, # ^ |
        Проголосовать: нравится +9 Проголосовать: не нравится

      Are you sure that the solution No. 2 is with a high constant factor? Because of difference between 78 ms in 162271598 and 46 ms in 162387139? It is a way too little difference! The work time may fluctuate at these scales. Please compare std::map and std::unordered_map in a more computation-heavy problem.

      • »
        »
        »
        »
        2 года назад, # ^ |
          Проголосовать: нравится +11 Проголосовать: не нравится

        Interestingly, both your solution — with std::unordered_map with custom hash — and my solution — with just std::map — both took 78 ms. Kind of weird if you ask me. Why do they take exactly the same amount of time?

        • »
          »
          »
          »
          »
          2 года назад, # ^ |
          Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

          All submission execution durations on Codeforces are divisible by $$$\sim15.5$$$ ms. So even if our execution times varied within 10 ms, after rounding they might become exact same numbers.

      • »
        »
        »
        »
        2 года назад, # ^ |
          Проголосовать: нравится +10 Проголосовать: не нравится

        Are you sure that the solution No. 2 is with a high constant factor?

        Yes, the splitmix64 function is pretty expensive because it arranges many arithmetic operations (including somewhat slow 64-bit multiplications) in a dependency chain.

        Because of difference between 78 ms in 162271598 and 46 ms in 162387139? It is a way too little difference!

        Sorry, but you can't dismiss the difference between 46 ms and 78 ms as too little. That's a bit too much of handwaving.

        Please compare std::map and std::unordered_map in a more computation-heavy problem.

        I already mentioned that std::unordered_map will be faster on larger data sets because of better time complexity. But we are unlikely to see really large input data fed through stdin in competitive programming problems, because console input is a performance bottleneck on its own even with the ios_base::sync_with_stdio(0); cin.tie(0); magic. And on smaller data sets the std::unordered_map's performance advantage is not always guaranteed. I'm just saying that your suggestion to use unordered containers "without thinking" isn't always the best choice.

»
2 года назад, # |
Rev. 2   Проголосовать: нравится +10 Проголосовать: не нравится

Hey guys, thank you so much for your help. Just a side question tho — do you think that there could be a time where a std::map could still outperform the std::unordered_map with custom hash functions (for example the one orz added to my solution) given that you are only looking-up and inserting into the map?

Edit: I believe that antihashing tests were the reason that my code TLEd as the unordered_map solution took over 1 second in one of the tests whereas the maximum time the map took in all the tests was only 78ms

»
2 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

I send your code with c++20 compiler and got AC. It seems they improve the default hash function, can anyone provide some documentation to corroborate this?

Submition

  • »
    »
    2 года назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Hello, so do you think I can use unordered_map with c++20 compiler without any worries? Is it still possible for me to get hacked?

    • »
      »
      »
      2 года назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      I believe that you cannot use std::unordered_map with C++20 compiler without any worries.

      Moreover, let's distinguish C++ standard version from C++ compiler version. 20 is the number of standard, and standard only imposes some certain restrictions on compiler that claims to comply with this standard. There are no differences in behavior of std::unordered_map stated in the 17th standard and in the 20th standart, so the only difference is in compiler implementation of std::unordered_map. С++20 version could equally well perform worse than C++17 version.

      So ideally we should look into the implementation code of hashing in GNU C++20, and probably there is no serious difference between the old version and the new one. The vulnerability is pretty much the same, I guess, and the countertest can be generated with the same generator, but with somewhat tweaked constants in it.

      But one cannot say for sure until they in fact look into the code.

»
2 года назад, # |
  Проголосовать: нравится +7 Проголосовать: не нравится

Sometimes, unordered maps are much faster than maps so you should choose wisely what to use

»
2 года назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Use All on Problems you Solve with Map or Unordered Map(With Custom Hash) or gp_hash_table(With Custom Hash) And Analyze The performance.

Personally I use Map (most of the Times)

Situation 1 : Once I used Unordered Map(With Custom Hash) It gave me TLE I tried multiple Types of hashes Still It Didn't Worked. With Map It got Accepted.

TLE With Unordered Map Only (Execution Time >=2000 ms):162435874

TLE With Unordered Map + Custom hash (Execution Time >= 2000 ms):162435379

Accepted With Map (Execution Time == 702 ms):162435610

Accepted With gp hash table (Execution Time == 717 ms):162435952

Accepted With gp hash table + custom hash (Execution Time == 686 ms):162435482

Situation 2 : Once I Used Map It Gave me TLE but When I used unordered map It got Accepted.

With Experience You Will Know What To use When.

  • »
    »
    2 года назад, # ^ |
      Проголосовать: нравится +10 Проголосовать: не нравится

    TLE With Unordered Map + Custom hash (Execution Time >= 2000 ms):162435379

    Your custom hash is lightweight and fast, but vulnerable:

    struct chash {
        int operator()(int x) const { return x ^ RANDOM; }
    };
    

    Doing XOR with a random key doesn't help to safeguard against Hash DoS hacks, because this only changes the bucket number. All inserted values still go to the same hash table bucket and time complexity still degenerates to O(N). A more heavyweight custom hash (such as the one suggested by orz) is necessary for real Hash DoS resistance, but it will impact performance to some extent and this will show up in benchmarks too.

    Situation 2 : Once I Used Map It Gave me TLE but When I used unordered map It got Accepted.

    No examples for this situation 2?

    • »
      »
      »
      2 года назад, # ^ |
        Проголосовать: нравится 0 Проголосовать: не нравится

      Check Out This : 158081683 I Took This Custom Hash From neal's blog

      Actually Situation 2 Was Long Time ago so I Kind of forgot which problem It was But I still remember What I've learned From that Problem :)

      • »
        »
        »
        »
        2 года назад, # ^ |
          Проголосовать: нравится 0 Проголосовать: не нравится

        Check Out This : 158081683 I Took This Custom Hash From neal's blog

        The reason for TLE in this submission is the mpp.clear(); operation. The std::unordered_map container allocates many buckets, but never reduces their number later (see the unordered_map::rehash documentation). So every clear operation has to go over all these allocated buckets and this is very slow. A workaround is to assign an empty container instead of clearing the old one: 162486169 (218 ms).

        Removing custom hash makes it somewhat faster 162486256 (187 ms), but leaves vulnerable to hacks.

        All of this needs to be compared to std::map 162480822 (202 ms), which is slightly faster than the use of std::unordered_map with custom hash at least for this problem.

  • »
    »
    2 года назад, # ^ |
      Проголосовать: нравится +1 Проголосовать: не нравится

    Accepted With Map (Execution Time == 702 ms): 162435610

    BTW, changing endl to "\n" reduces time to 202 ms: 162480822