priyanshuog's blog

By priyanshuog, history, 7 months ago, In English

In this question 822C - Hacker, pack your bags! , when I used set it gave TLE on test case 10 but when I used unordered_set, it got Accepted. I read some blogs where they told not to use unordered_set/map on CF as it can be hacked, so what should I do, should I use unordered set/map or not?

Submission 1 (TLE) : 256948385 Submission 2 (Accepted) : 256948457

  • Vote: I like it
  • +9
  • Vote: I do not like it

»
7 months ago, # |
Rev. 2   Vote: I like it +39 Vote: I do not like it

It's a very common question, that has already appeared multiple times in Codeforces blogs.

unordered_map gives you O(1) complexity on average, while map gives O(logn). Seems that unordered_map is better in such case, but at what cost does it provide such complexity?

Try to think about it by yourself

How to prevent this? Everything described here.

And the same thing works for unordered_set and similar data structures as well. As a rule of thumb, if you don't know how to prevent your solution from getting hacked using unstable data structures — use their stable analogues, but not them.

  • »
    »
    7 months ago, # ^ |
      Vote: I like it +13 Vote: I do not like it

    This is a really nice comment

  • »
    »
    7 months ago, # ^ |
      Vote: I like it +30 Vote: I do not like it

    But what the OP said here is the other way around, Unordered got accepted and ordered got TLE

    • »
      »
      »
      7 months ago, # ^ |
        Vote: I like it +36 Vote: I do not like it

      ordered_set and just set are different data structures from STL, don't confuse them with each other.

      Solution with set has got TLE because, once again, on random data unstable structures as unordered_set give better performance, but it applies only and only on tasks with closed solutions. If solutions are opened — wait for successful hack.

      Very often when your solution can't pass because of the additional logn it's either because your idea is inefficient or because your implementation is inefficient. It's very rare when task requires to use unstable data structures with no alternatives to them

      • »
        »
        »
        »
        7 months ago, # ^ |
          Vote: I like it +10 Vote: I do not like it

        I meant the regular set.

        Got your point thanks, but what do you say about using custom hash functions ?

        • »
          »
          »
          »
          »
          7 months ago, # ^ |
          Rev. 3   Vote: I like it +10 Vote: I do not like it

          Using custom hash functions allows you to modify unordered_set hashing so that it's less likely to be hacked.

          Here's my template:

          mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());
          struct custom_hash {
            	static uint64_t splitmix64(uint64_t x) {
             	 	x += 0x9e3779b97f4a7c15;
              	x = (x^(x>>30))*0xbf58476d1ce4e5b9;
              	x = (x^(x>>27))*0x94d049bb133111eb;
              	return x^(x>>31);
            	}	
          
            	size_t operator()(uint64_t x) const {
              	static const uint64_t FIXED_RANDOM = rng();
              	return splitmix64(x+FIXED_RANDOM);
            	}
          };
          
          template<typename T> using safe_set = unordered_set<T, custom_hash>;
          

          Please note that not everything is able to be given a hash, so only stick to basic data types such as int, string etc.

          • »
            »
            »
            »
            »
            »
            7 months ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            ah yes string, my favorite basic data type

        • »
          »
          »
          »
          »
          7 months ago, # ^ |
            Vote: I like it +5 Vote: I do not like it

          If you're using unordered_map or unordered_set you MUST have them anyway. I've already gave a link to the blog that gives comprehensive information about which exactly function you should use, so I won't duplicate it.

          About what orz said: I completely agree with him in terms of competitive programming. When you have the prewritten protection code then it's much better to use unordered_map or unordered_set if you only don't need the sorted elements in these structures. Also, most of the time, you really don't need anti-anti-hack protection, because it's very rare now that someone will try to fetch the random and write the test based on it, because it's enormously time consuming and it doesn't cost it.

          If you look in my solutions, then you will see that it's very rare when I'm using such structures, and I have an explanation for this as well. Actually, I've a couple of reasons:

          1. You can't use random hashing in real programming, and I'm trying to write my code as clear as possible and as similar to what you would write in reality (without including the principles of OOP, this is unnecessary in conditions of contests), so these structures are bad for me and I use them only when I really can't think of something else.
          2. If I can't solve the problem without them then either I can't solve problem efficiently at all or problem was designed to be solved in only one way, which is enough to consider (for me) most of the time the problem annoying and boring, and I really don't like such problems.
          • »
            »
            »
            »
            »
            »
            7 months ago, # ^ |
              Vote: I like it +4 Vote: I do not like it

            What do you mean by “You can't use random hashing in real programming”?

            • »
              »
              »
              »
              »
              »
              »
              7 months ago, # ^ |
              Rev. 2   Vote: I like it +9 Vote: I do not like it

              Every time you restart your program your hashing function will have different power and/or module, and therefore with every new launch of the program the hash for the same data will be different, and therefore if you are saving it somewhere it will produce new and new entries without finding the old ones.

              If these structures are used just for temporary data, then yes, you can use it even there.

  • »
    »
    7 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    thanks a lot!

»
7 months ago, # |
Rev. 2   Vote: I like it +14 Vote: I do not like it

I personally believe it is vice versa — use unordered_map if you do not really need the elements to be stored in the sorted order.

Yes, there is a danger of hacks, but, firstly, the cure is not very complicated (one of the commenters already gave a link to a comprehensive material), secondly, I suppose the problem is maybe exaggerated. I believe I don't even have an anti-anti-hash test protection in my template, but I still have never been hacked this way.

For me personally a logical line of behavior looks like this: just use unordered_map like hacks don't exist. And only if someone hacks your solution or it fails system tests, then probably it is worth researching into this problem further.

If you want to be secure and prefer to put on an extra layer of protection, better read https://codeforces.net/blog/entry/62393, add some needed lines to your code and never remember again about unordered_map being hackable.

»
7 months ago, # |
  Vote: I like it -10 Vote: I do not like it

I use ordered map sorted by key value pairs