Блог пользователя AnotherRound

Автор AnotherRound, история, 9 лет назад, По-английски

After getting TLE on test 132 for problem C on CF #350, I tried to optimise. I was surprised when I got AC, because the only thing I needed to change was to replace std::unordered_map with std::map. Does anyone have ideas what causes the different performance? I expected that unordered_map should be faster that map, but it is not the case. And also, is there a way to speed up unordered_map? Here are both submissions:

AC — map: http://codeforces.net/contest/670/submission/17757996

TLE — unordered_map: http://codeforces.net/contest/670/submission/17730894

  • Проголосовать: нравится
  • +32
  • Проголосовать: не нравится

»
9 лет назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Auto comment: topic has been updated by AnotherRound (previous revision, new revision, compare).

»
9 лет назад, # |
  Проголосовать: нравится +10 Проголосовать: не нравится

It's anti-hash-map test. Every hash-map implementation fails on certain anti-hash-map tests. The only way to prevent it is to use your own randomized implementation.

»
9 лет назад, # |
Rev. 3   Проголосовать: нравится +33 Проголосовать: не нравится

Apparently simple reserve() helps to avoid TLE (I also got TLE and felt very sad):

http://codeforces.net/contest/670/submission/17728217 — TLE 132 (unordered_map)

http://codeforces.net/contest/670/submission/17758293 — AC (still unordered_map)

According to reserve() documentation, it rehashes the hash table for better bucket distribution.

  • »
    »
    9 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Another way I found is to xor values with some (random) constant when you access the hashmap.

»
9 лет назад, # |
Rev. 2   Проголосовать: нравится +26 Проголосовать: не нравится

During one educational round, I was pretty sure my solution for D will pass the system tests since it was not that hard — it was a combination of coordinate compression and fenwick tree. I wanted to submit it fast so I just copied my previous codes for the compression and the tree and it passed the pretests. Then, during the hacking phase, I saw halyavin had hacked my solution with TLE. After looking at my code for a while I saw that my coordinate_compression struct uses unordered_map and I asked him if he somehow managed to make it slow and here is what he said:

Yes, you are right. unordered_set and unordered_map are O(n^2) is the worst case. I just needed to put a lot of eggs elements in the single bucket. It is not the first contest I (and others) use an anti-hash test. There were quite a lot of discussion on codeforces about them too. Well, now you know about them the hard way.

So maybe you don't want to use unordered_map/unordered_set anymore :D

Here are my code and his test generator — http://codeforces.net/contest/652/submission/16923953, http://codeforces.net/contest/652/hacks/222128/test :)

  • »
    »
    9 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    I tried dreamzor's suggestion and it works great. But at least in Bulgarian competitions, I'll make sure to upload versions with both map and unordered_map :D

  • »
    »
    8 лет назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    Hi, Sorry for bringing this up a bit delayed, but would you mind explaining how we know the buckets of the unordered_map already and in conjunction how the test then puts all the elements into that bucket? Thanks!

    • »
      »
      »
      8 лет назад, # ^ |
      Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

      I don't really know, maybe someone who knows can explain, I am also interested in this :)

      • »
        »
        »
        »
        8 лет назад, # ^ |
          Проголосовать: нравится +5 Проголосовать: не нравится

        Haha okay, maybe someone will see it now and help us, otherwise I might post a blog asking for help

        • »
          »
          »
          »
          »
          8 лет назад, # ^ |
            Проголосовать: нравится +18 Проголосовать: не нравится

          There was an IPSC problem (H) in 2014 that asked to create a test case that breaks both GCC and OpenJDK implementations of hashset. You may read the solution here

»
9 лет назад, # |
  Проголосовать: нравится +15 Проголосовать: не нравится

I also failed on the test but figured out a nice way to circumvent the anti-unordered_map test. You can use this technique if you really need the unordered_map, because it seems to be a tad faster than map in some cases. The trick is to "encrypt" the unordered_map input with xor, you choose some constant e, and every time you need to use the unordered_map you do unordered_map[input^e]. If you choose the constant randomly it will pass systests, but will be hackable if the hacker is not lazy. code here: http://codeforces.net/contest/670/submission/17757763

  • »
    »
    8 лет назад, # ^ |
      Проголосовать: нравится +8 Проголосовать: не нравится

    Thanks for the advice! It is the shortest and most elegant solution I've seen for that issue ever.

    If the constant is chosen in runtime (for example, with rand() with previous seed modification) is it possible to design a killer test for any constant value?

    • »
      »
      »
      8 лет назад, # ^ |
        Проголосовать: нравится +5 Проголосовать: не нравится

      It is possible to design a killer test for any constant value, but you can't design a killer test for all constant values :). So choosing the constant at runtime with rand() and proper seeding should be unhackable and pass the systests.