Chilli's blog

By Chilli, history, 7 years ago, In English

TL;DR

The Policy Hash Table has 3-6x faster insertion/deletion and 4-10x increase for writes/reads. As far as I can tell, there are no downsides. The policy hash table (specifically the open-addressing version), beats out unordered_map in all my benchmarks.

PS: Make sure you read the section a better hash function and use it — I'd recommend this one: https://github.com/kth-competitive-programming/kactl/blob/master/content/data-structures/HashMap.h

Background

I've often been irritated by how slow unordered_map is in C++. Too often, I have something that runs fast enough in terms of complexity, but the constant factor from unordered_map slows down the solution too much. Yesterday though, after using the useful order statistics tree from https://codeforces.net/blog/entry/11080, I was curious if there were any other useful data structures hiding in the Policy STL. And lo and behold, I found a hash table.

Benchmarks

Well, enough backstory, let's look at some numbers. All benchmarks below are compiled with C++14 -O2.

unordered_maplinear insertion:                                  0.689846
cc_hash_tablelinear insertion:                                  0.408233
gp_hash_tablelinear insertion:                                  0.256131

unordered_maplinear read/write:                                 1.69783
cc_hash_tablelinear read/write:                                 0.202474
gp_hash_tablelinear read/write:                                 0.26842

unordered_maprandom insertion:                                  2.90184
cc_hash_tablerandom insertion:                                  3.15129
gp_hash_tablerandom insertion:                                  0.56553

unordered_maprandom read/write:                                 2.02336
cc_hash_tablerandom read/write:                                 0.333415
gp_hash_tablerandom read/write:                                 0.403486

While for linear insertions, the policy hash table gives modest improvements, the policy hash table blows the unordered_map out of the water when it comes to reads/writes. These are order of magnitude improvements that make hash tables usable when they previously weren't. "Official" benchmarks done by the DS authors can also be found here

Example

Benchmarks of course, don't always reflect the real world. So here's an example of it allowing a solution to be accepted that TLE's with unordered_map.

Example problem (5000 ms time limit)
Solution with unordered_map: (TLE on test case 8)
Solution with policy hash table directly substituted in: (TLE on test case 26)
Solution with unordered_map, rewritten to not use clears: (TLE on test case 26)
Solution with policy hash table and rewritten to not use clears: (AC with max time of 3180 ms)

Usage

To use this data structure:

#include <ext/pb_ds/assoc_container.hpp>
using namespace __gnu_pbds;
gp_hash_table<int, int> table;

From there, the API seems almost exactly the same.

Defeating Anti-Hash tests

One weakness of hash tables is that mean people can find hash collisions offline and blow up the complexity of your hashmap. In my opinion, the easiest way of solving this is below. There's no need to define your own custom hash function.

const int RANDOM = chrono::high_resolution_clock::now().time_since_epoch().count();
struct chash {
    int operator()(int x) const { return x ^ RANDOM; }
};
gp_hash_table<key, int, chash> table;

Why is it so much faster?

See my comment here

A better hash function

There is one issue with using policy hash tables as is. The default hash function for numerics in C++ is just the identity. This is especially problematic for using hash tables for something like a fenwick tree, especially since the default bucket structure for policy_hash_tables is based of powers of 2 and not primes. If you're using policy hash tables for fenwick trees, you have 2 options. 1. Choose to use prime table sizes, as done here. 2. Choose a better hash function for integers, as done here.

Thanks to adamant for his post that revealed to me the existence of policy data structures, and thanks to ed1d1a8d for the discussion.

PS: In other posts for unordered_map, I've seen people claim that reserve and max_load_factor could increase performance drastically. They didn't seem to do much for me. However, if you want to do something similar for these hash tables, check out the example here

Code for the benchmarks can be found here

EDIT: I realized that gp_hash_table is the way to go, not cc_hash_table. gp_hash_table sacrifices ~10% reading/writing speed to gain 3-6x in insertion/deletion/clearing. I updated the post to reflect the new numbers.

  • Vote: I like it
  • +328
  • Vote: I do not like it

| Write comment?
»
7 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by Chilli (previous revision, new revision, compare).

»
7 years ago, # |
  Vote: I like it +3 Vote: I do not like it

Thank you for sharing this!

How can I utilize a pair as a key?

  • »
    »
    7 years ago, # ^ |
    Rev. 5   Vote: I like it 0 Vote: I do not like it

    Unluckily, C++ doesn't provide a hashing operator for pairs by default. Thus, you need to define your own. I've typically done it like this.

    struct chash {
        int operator()(pii x) const { return x.first* 31 + x.second; }
    };
    gp_hash_table<pii, int, chash> table;
    

    For unordered_map, simply defining the operator in the std namespace seems to work. However, for policy hash table, since it's in a different namespace, it seems like we need to pass it in manually.

    gp_hash_table continues to perform great with pairs.

    unordered_map linear insertion:                                 1.44404
    cc_hash_table linear insertion:                                 1.49816
    gp_hash_table linear insertion:                                 0.749987
    
    unordered_map linear read/write:                                1.94124
    cc_hash_table linear read/write:                                1.91796
    gp_hash_table linear read/write:                                1.49307
    
    unordered_map random insertion:                                 3.99146
    cc_hash_table random insertion:                                 4.10996
    gp_hash_table random insertion:                                 0.804868
    
    unordered_map random read/write:                                5.76351
    cc_hash_table random read/write:                                0.915423
    gp_hash_table random read/write:                                1.06964
    

    EDIT: Updated with gp_hash_table results. EDIT2: Updated with proper way of defining custom hashes.

    • »
      »
      »
      7 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      Thank you for the response.

      How did you do to create such hash function? Is there some useful tutorial about hash functions? I would appreciate learning how to create a hash function when I utilize a general struct as key.

      • »
        »
        »
        »
        7 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Somebody else suggested that one at some point, and I found it worked pretty well, so I've stuck with it.

        All hash needs to do is return something that's fairly random. Usually I just multiply each element by a prime, and that seems to work decently.

    • »
      »
      »
      7 years ago, # ^ |
        Vote: I like it +11 Vote: I do not like it

      For unordered_map, simply defining the operator in the std namespace seems to work

      Don't do this. It seems to work, but technically it doesn't. It is undefined behavior to add a specialization of the std::hash template class to the namespace std for std::pair<int, int>:

      A program may add a template specialization for any standard library template to namespace std only if the declaration depends on a user-defined type and the specialization meets the standard library requirements for the original template and is not explicitly prohibited.

      You can achieve the same result without invoking undefined behavior if you define a custom class with an appropriate operator() and pass it to as a template argument to either an std::unordered_map or another hash table.

      • »
        »
        »
        »
        7 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        I see, thanks! I'll update my post to reflect this.

  • »
    »
    7 years ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    As an alternative solution you can use nested hash tables, for example:

    gp_hash_table<first, gp_hash_table<second, value>> table;

    The first key is the first element in a pair. The value is a hash table that uses the key as the second element in a pair with the final value of the value of pair.

    It is much easier and faster to implement it, and you don't need to think about the pair hashing function.

    But it actually will be interesting how this approach will affect time and memory efficiency, comparing to custom hashing function.

    • »
      »
      »
      7 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      I don't think defining a custom hash function is too hard. You just need to make a struct with a custom operator.

      As for benchmarks, that solution is going to be (as you might imagine, a ton slower).

      double_hash linear insertion:                                 5.48207
      gp_hash_table linear insertion:                                 0.861065
      
      double_hash linear read/write:                                1.44702
      gp_hash_table linear read/write:                                2.04147
      
      double_hash random insertion:                                 10.3088
      gp_hash_table random insertion:                                 0.880112
      
      double_hash random read/write:                                9.34554
      gp_hash_table random read/write:                                1.1673
      

      I think defining a custom operator should be something that C++ competitive programmers know how to do. It's useful for a lot of things.

»
7 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by Chilli (previous revision, new revision, compare).

»
7 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by Chilli (previous revision, new revision, compare).

»
7 years ago, # |
  Vote: I like it +3 Vote: I do not like it

What's the difference between cc_hash_table and gp_hash_table? Noticed that they were both in the gnu pbds.

  • »
    »
    7 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Cc_hash_table is a chained hash table, while gp_hash_table is an open address hash table. I have some benchmarks in the post.

    It seems like in practice, gp_hash_table has far superior performance for insertions/deletions, while cc has marginally better performance for reads/writes.

»
7 years ago, # |
  Vote: I like it +6 Vote: I do not like it

Thank you for an interesting post. I have thought that open addressing is an obsolete technique that should not be used.

But why is the standard unordered_map implementation so slow? Could there still be some downsides in the policy based structures that can't be seen in your experiments?

  • »
    »
    7 years ago, # ^ |
    Rev. 4   Vote: I like it +16 Vote: I do not like it

    TL;DR: The C++ standard requires certain unnecessary things that make unordered_map implementations very slow.

    So I was pretty curious about this too. Reading around, it seems like it's because the C++ standard requires some bizarre things from unordered_map, namely, that you can iterate over buckets, as well as being able to iterate through elements in a bucket.

    I found this comment on HN: "A bigger issue IMO is that the existence of some methods in various standard classes/templates really murder performance, no matter how good the implementation is. One particularly bad example is the bucket() method, local_iterator types, etc. in c++11's unordered_map (probably the rest of the unordered family too). These force the table to be implemented in such a way that every insertion requires a heap allocation, and iteration requires much more pointer chasing than would otherwise be necessary (e.g. with a probed implementation), which is... unfortunate for the cache."

    Also, there's this talk which cites the same point. I linked to the relevant portion here

    Due to the C++ standard, unordered_map's elements must be stored in what is basically a linked list. I don't think there's any real use in accessing the elements in a bucket, nor is there an use in iterating through the buckets, so the policy hash map seems strictly superior to me.

»
7 years ago, # |
Rev. 4   Vote: I like it +6 Vote: I do not like it

How easy to hack solution with policy hash table in contest? Hacker need to maximize number of collision, like dacin21 wrote in post about rolling hashes. Is there a way to randomize the standard hash function?

UPD. Let n — number of buckets in hash table. I think that worst test is 0, n, 2·n, 3·n, ..., k·n, ... — all integers will be pasted in one bucket. We need to modify hash function like:

Gen rdseed
  • »
    »
    7 years ago, # ^ |
    Rev. 3   Vote: I like it 0 Vote: I do not like it

    In my opinion, this is the easiest way to avoid hacks. Why is it necessary to xor with a random string?

    const int RANDOM = chrono::high_resolution_clock::now().time_since_epoch().count();
    struct chash {
        int operator()(int x) const { return x ^ RANDOM; }
    };
    gp_hash_table<int, int, chash> table;
    
    • »
      »
      »
      7 years ago, # ^ |
      Rev. 3   Vote: I like it +3 Vote: I do not like it

      Why is it necessary to xor with a random string

      Xor with random address in memory from heap manager, on every testing system this value will be different for each program from users. Not accidentally the memory is called Random-access memory (RAM). high_resolution_clock() may tick in microseconds (on codeforces, MinGW), but if we get XOR with pointer from heap manager, this number will be almost perfectly random

      • »
        »
        »
        »
        7 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        I see. I remember seeing somebody hacking a solution by generating anti-hash cases for random seeds in an entire time range, and then checking custom invocation for the exact time to submit it.

»
7 years ago, # |
  Vote: I like it +14 Vote: I do not like it

Is there any way to use max_load_factor and reserve with gp_hash_table? I believe doing so would make it much faster in many cases. I looked online but couldn't find anything besides unreadable GCC jargon. In addition, I wrote a custom hash table out of curiosity. Interestingly enough, the benchmarks (all gcc with -O2) vary a lot based on how the program is compiled and where it's run. The benchmark consists of adding and then erasing 4 million random integers.

My machine (Intel i5-6300HQ) with Windows and MinGW 4.9.2:

Custom hash set: 269ms
gp_hash_table: 522ms
cc_hash_table: 1709ms
unordered_set: 2009ms

My machine with Linux Subsystem for Windows and g++ 7.3.0:

Custom hash set: 405ms
gp_hash_table: 588ms
cc_hash_table: 2461ms
unordered_set: 1303ms

My machine with Ubuntu on Virtualbox with g++ 7.3.0:

Custom hash set: 592ms
gp_hash_table: 627ms
cc_hash_table: 3268ms
unordered_set: 1498ms

Ideone with C++14:

Custom hash set: 234ms
gp_hash_table: 247ms
cc_hash_table: 1713ms
unordered_set: 858ms

Benchmarking code

  • »
    »
    7 years ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    I wrote the definition to do so in the PS (the important thing is the last 2 trues). It didn't seem to do much for me. After using that typedef, you can call table.resize() and table.set_load_factor(). set_load_factor takes a pair (min_load_factor and max_load_factor).

    The benchmarks don't seem to vary that much to me (except for the performance of cc_hash_table on your machine). Gp_hash_table outperforms unordered_map by 2-5x somewhat consistently, and is fairly close to your custom implementation for insertion/deletion.

    • »
      »
      »
      7 years ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      I tried calling resize and set_load_factor after using the typedef, but I get a long compiler error :(

      • »
        »
        »
        »
        7 years ago, # ^ |
        Rev. 2   Vote: I like it 0 Vote: I do not like it

        So, a couple more test results. Your custom hash table seems to outperform gp_hash_table by 1-2x for insertion/erasing, but seems to be outperformed by gp_hash_table by 1-2x for reads.

        Custom hash set: 490ms
        gp_hash_table: 261ms
        cc_hash_table: 378ms
        unordered_set: 1467ms
        

        As for the template, it seems that gp_hash_table requires an extra parameter. C++ template errors are awful to debug, aren't they?

        This is the correct one:

        typedef gp_hash_table<
            int, null_type, hash<int>, equal_to<int>, direct_mask_range_hashing<int>, linear_probe_fn<>,
            hash_standard_resize_policy<hash_exponential_size_policy<>, hash_load_check_resize_trigger<true>, true>>
            gp;
        
        typedef cc_hash_table<
            int, null_type, hash<int>, equal_to<int>, direct_mask_range_hashing<int>,
            hash_standard_resize_policy<hash_exponential_size_policy<>, hash_load_check_resize_trigger<true>, true>>
            cc;
        

        It didn't seem to do much for me. And here's the benchmark, where there's only 1e5 total elements, but they're reinserted 100 times. It also shows an example for the typedefs needed as well as how to use them.

        https://ideone.com/N0Wha6

  • »
    »
    7 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    One more thing. If you're interested in playing around with the settings to see if there's a faster version, check out their benchmarks here: https://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/assoc_performance_tests.html

»
7 years ago, # |
  Vote: I like it 0 Vote: I do not like it

How does this work internally? i.e. how is it implememted? Any explanations as to why it is faster?

»
7 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Thank you so much~

»
6 years ago, # |
  Vote: I like it -10 Vote: I do not like it

gp_hash_table<int, int> m; gp_hash_table<int, int>::iterator it; it=m.lower_bound(23);

it have compilation error. how to do lower_bound in gp_hash_table?

»
6 years ago, # |
  Vote: I like it 0 Vote: I do not like it

How to rehash (set the number of buckets) in gp_hash_table?

»
6 years ago, # |
  Vote: I like it +1 Vote: I do not like it

As far as I can tell, there are no downsides.

Please tell that to my Clang compiler.

»
6 years ago, # |
  Vote: I like it +8 Vote: I do not like it

I solved this problem using gp_hash_table and got accepted. Using unordered_map or map causes TLE.

»
6 years ago, # |
  Vote: I like it 0 Vote: I do not like it

I got on this problem:

AC (483 ms) using regular map: 41736058

AC (265 ms) using unordered_map: 41736051

TLE (over 3000 ms) using gp_hash_table: 41735940

I also submitted a code using the custom hash function just to make sure, but I still got TLE: 41736522

Am I doing something wrong?

  • »
    »
    6 years ago, # ^ |
    Rev. 5   Vote: I like it 0 Vote: I do not like it

    55 test case contains only powers of two, so, hash function from Chilli does not help. I changed hash function in your code, try to use this. Source code. Maybe someone can suggest even better hash function.

    UPD: Try to use this variant of hash function too.

    • »
      »
      »
      6 years ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      Thanks. Got AC in 436 ms with your first hash function, and 171 ms with your second one.

      But I'm curious: why exactly the default hash function for unordered_map works fine for this case but the one from gp_hash_table doesn't?

      At least IMO, having to carefully choose and write a custom hash function every time I use policy hash table is kind of a downside.

      • »
        »
        »
        »
        6 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        The hash function I typically use results in 156 ms. https://codeforces.net/contest/1006/submission/41804666

        Basically, this is a combination of a couple of things. Open addressing and power of 2 buckets are both not very robust to bad hash functions.

        Unordered_map uses collision chaining + prime buckets. gp_hash_table uses open addressing + power of 2 buckets. That makes it run into issues with the default "hash function" for integers, which is just the identity.

»
6 years ago, # |
  Vote: I like it +18 Vote: I do not like it

Thanks for the informative post Chilli!

One thing worth mentioning: return x ^ RANDOM is not enough for a safe hash function. gp_hash_table uses a modulo power of two policy, so giving it many values that are the same modulo a large power of two such as 216 is enough to make it extremely slow. x ^ RANDOM doesn't do anything to mitigate this, since two values that are equivalent modulo 216 are still equivalent after xor-ing with the same number.

Better to use the high-resolution clock idea in combination with a more complex hash function. I'm planning to write a blog post soon with more details!

  • »
    »
    6 years ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    Thanks for the heads up!

    I did realize that after writing the section, which is the main purpose of the "A Better Hash Function" section (I also chose (x<<16) ^ RANDOM as my "good" hash function...) Looking back at it, I should probably have put the hash function stuff together... I'll also update the code with a better hash.

    I use a fairly jank hash function I found online at some point, but I'd definitely like to know more!

    Looking forward to your blog post!

»
6 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Is there any way to check the existence of an entry without creating it?

»
5 years ago, # |
  Vote: I like it 0 Vote: I do not like it

While unordered_set contains the count() function, gp_hash_table does not. Instead, you should use hashmap.find(x)!=hashmap.end()

  • »
    »
    20 months ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    God bless you man, I was searching it for so long

»
4 years ago, # |
  Vote: I like it 0 Vote: I do not like it

Hey Chilli, I have recently seen this post. I am facing a problem with this hash table. According to your analysis gp hash table is faster than unordered map.

Here, I have used unordered map and time was 1278 ms but using gp hash table gets time 1402 ms. What is the problem here or I did something wrong?

»
3 years ago, # |
Rev. 4   Vote: I like it 0 Vote: I do not like it

gp_hash_table swaps slower than unordered_map, i've got TL in 1455G - Forbidden Value because of using gp_hash_table instead of unordered_map. 131955549(unordered_map) and 131955617(gp_hash_table)

  • »
    »
    3 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    From what I can see you are using gp_hash_table<int, ll> m; instead of gp_hash_table<int, ll, custom_hash> m;. There is probably some anti hash test case on G, so you need to use the custom_hash to not get TLE.

    • »
      »
      »
      3 years ago, # ^ |
        Vote: I like it 0 Vote: I do not like it

      i've tried with gp_hash_table<int, ll, custom_hash> m; and got TL 131976574

      • »
        »
        »
        »
        3 years ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Did you find out the reason? I'm confused.

        • »
          »
          »
          »
          »
          3 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          Probably because they have some buffers, but IDK. But it's good to know

»
5 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Hi. What is time complexity for clear()?