harryzhengipsum's blog

By harryzhengipsum, history, 2 years ago, In English

On 704A - Thor, the policy-based hash table version easily passes under the time limit, but the unordered_set version TLEs on test case 56.

With gp_hash_table: 181117819

With unordered_set: 181133673

Both submissions use a custom hash function, so it can't be due to collisions. On test case 56 the unordered_set version suddenly gets TLE, even though it's only slightly slower than the gp_hash_table version on all the other test cases. Why is this?
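
For reference, "custom hash function" here presumably means the usual Codeforces-style splitmix64 hash with a random offset; the exact hash in the linked submissions may differ, so treat this as an assumed stand-in:

```cpp
#include <chrono>
#include <cstdint>
#include <unordered_set>
using namespace std;

// Typical anti-collision custom hash (splitmix64 + random offset),
// assumed here as a stand-in for the one used in the submissions.
struct custom_hash {
    static uint64_t splitmix64(uint64_t x) {
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4595bULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }
    size_t operator()(uint64_t x) const {
        static const uint64_t FIXED_RANDOM =
            chrono::steady_clock::now().time_since_epoch().count();
        return splitmix64(x + FIXED_RANDOM);
    }
};

unordered_set<long long, custom_hash> seen;  // the same hash also works with gp_hash_table
```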


»
2 years ago, # |

It is getting TLE because erasing from unordered_set is slow; a normal set should get AC.

  • »
    »
    2 years ago, # ^ |

    Is it because unordered_set returns the iterator to the next element when calling .erase()? I didn't know that could cause an almost 10x slowdown. Are there any ways to prevent it, or should we simply resort to using gp_hash_table/set?

    • »
      »
      »
      2 years ago, # ^ |
      Rev. 4

      It's because when an unordered_set is cleared, the entries are removed, but the underlying allocation stays the same size. Then when you iterate over it, it has to scan the entire allocation to find the elements. In other words, iteration is $$$O(N)$$$, where $$$N$$$ is the hash table's current capacity, not the number of elements actually in it.

      So if you add a lot of elements to a set, then clear it or iterate over it many times, it'll be slow.

      One way to avoid this is to construct a new set instead of clearing the current set.

      Vectors / dynamic arrays do not have this problem: even if the allocation is much larger than the number of actual elements, a vector still knows how many elements it has and where they are (at the beginning), so it only has to iterate over or clear those. But a hashmap/hashset can have elements scattered anywhere in its allocation, and it would typically need extra metadata and implementation work to be able to iterate and clear in time proportional to the number of elements rather than to its capacity.
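
      For example, the rebuild-instead-of-clear workaround could look like this (a minimal sketch, not the actual submission):

      ```cpp
      #include <unordered_set>
      using namespace std;

      int main() {
          unordered_set<int> s;
          for (int i = 0; i < 1000000; ++i) s.insert(i);  // bucket count grows large

          s.clear();  // elements are gone, but the big bucket array is kept,
                      // so later clears / iteration still touch all of it

          unordered_set<int>().swap(s);  // or: s = unordered_set<int>();
                                         // s now owns a brand-new small table
          return 0;
      }
      ```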

      • »
        »
        »
        »
        2 years ago, # ^ |
        Rev. 2

        But .erase() and .find() should both be $$$O(1)$$$ regardless of the hashset's size. Plus, it TLEs even when the hashsets are cleared with = {}.

        Without using .clear(): 181154447

      • »
        »
        »
        »
        2 years ago, # ^ |
        Rev. 2

        Actually, I got it to work by clearing via a swap with a temporary empty set. I'm still not seeing where the actual $$$O(N)$$$ bottleneck is, though (is it because .clear() is $$$O(N)$$$ each time?), since .erase() and .find() are both $$$O(1)$$$ on a hashset with few collisions.

        With swapping: 181154447

        • »
          »
          »
          »
          »
          2 years ago, # ^ |

          Yes, clearing is also proportional to the size of the allocation / the capacity, not to the number of elements (because the elements could be anywhere in the allocation).
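
          A rough way to see this (a micro-benchmark sketch under typical libstdc++ behavior, not from the thread): inflate a set's capacity once, then compare repeatedly clearing it with repeatedly rebuilding it.

          ```cpp
          #include <chrono>
          #include <cstdio>
          #include <unordered_set>
          using namespace std;

          int main() {
              unordered_set<int> s;
              for (int i = 0; i < 1000000; ++i) s.insert(i);  // grow the bucket array once
              s.clear();                                      // buckets stay allocated

              auto t0 = chrono::steady_clock::now();
              for (int rep = 0; rep < 1000; ++rep) {
                  s.insert(rep);
                  s.clear();                 // pays for the full (huge) bucket array each time
              }
              auto t1 = chrono::steady_clock::now();

              unordered_set<int> f;
              for (int rep = 0; rep < 1000; ++rep) {
                  f.insert(rep);
                  f = unordered_set<int>();  // fresh small table each time
              }
              auto t2 = chrono::steady_clock::now();

              auto ms = [](auto d) { return chrono::duration_cast<chrono::milliseconds>(d).count(); };
              printf("repeated clear():  %lld ms\n", (long long)ms(t1 - t0));
              printf("repeated rebuild:  %lld ms\n", (long long)ms(t2 - t1));
              return 0;
          }
          ```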

»
2 years ago, # |

Do this instead of unreadItems[b].clear()

Spoiler
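
A guess at what the spoiler suggests, in the spirit of the swap/rebuild trick from the comments above (unreadItems is assumed to be a vector<unordered_set<int>> with one set per app, and the helper name is hypothetical):

```cpp
#include <unordered_set>
#include <vector>
using namespace std;

vector<unordered_set<int>> unreadItems;  // assumed: one set of item ids per app

// Hypothetical helper: reset one app's set without keeping its old capacity.
void resetApp(int b) {
    unordered_set<int>().swap(unreadItems[b]);  // instead of unreadItems[b].clear()
    // or: unreadItems[b] = unordered_set<int>();
}
```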
»
2 years ago, # |

Everyone hates unordered_set