xuyiluo's blog

By xuyiluo, history, 2 days ago, In English

as the title said,I had a problem solving some questions about hashing:

If you write a hash with high accuracy, you will exceed the time limit, but if you write a hash with low accuracy, you do not know whether it will be hacked by the data.

What should I do then?

  • Vote: I like it
  • +9
  • Vote: I do not like it

»
2 days ago, # |
  Vote: I like it 0 Vote: I do not like it

As same as you.I want to know why.

»
2 days ago, # |
  Vote: I like it +9 Vote: I do not like it

I've never seen a problem of which the intended solution is hash but you can't write a 'safe' hash function just because it's just a little bit slower than a straightforward version. Perhaps you should think again if the intended solution is not hash at all?

  • »
    »
    2 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    our test's solution is hash,but if I use unsigned overflow hash,I will get hacked,I only got 1/10 points.But if I use double hash,I will get TLE.But common hash will get hacked,too.

  • »
    »
    2 days ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    So what do I do in this situation?

    • »
      »
      »
      37 hours ago, # ^ |
      Rev. 3   Vote: I like it +6 Vote: I do not like it

      I think this question is becoming an XY problem. You should probably specify what you're actually trying to solve. We need to know:

      • The type of elements that you want to be hashed and how you want to use them: The strategy can be very different when it's a string, integer, or a combination of variables. For example, even if it's a string, you don't always need to make hashes based on traditional rolling hash method on the full string, if you only need to identify some fixed strings and you don't need to check arbitrary substrings.
      • The actual constraints and the time complexity of the solution: In hash problems, generally other parts than the hash itself is the bottleneck, such as I/O, or the actual logic where you're using these hash values, because hashing itself is supposed to be one of the "faster" operations.
      • What you actually tried: The solution can be as simple as defining the coefficients as constants rather than variables, if that's what you missed. We don't know without your actual code.
      • »
        »
        »
        »
        32 hours ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        You're right.I'm working on a string hash problem. This problem requires checking arbitrary substrings, and I can only think of ordinary hashes. But the constant problem is with this string hash, and if I use a double hash, I run out of time. Thank you again for your answer.

»
2 days ago, # |
  Vote: I like it 0 Vote: I do not like it

While solving from the problemset, I end up binary searching on the accuracy until i get AC ;-; , generally 10**9 + 7 works quite well tho , and to make it more randomized , i store the values XORed with some random integer. So as to make it harder to hack.

»
2 days ago, # |
  Vote: I like it 0 Vote: I do not like it

Stupid question, but which kind of hash are you asking about?

»
2 days ago, # |
  Vote: I like it 0 Vote: I do not like it

If time limit is not a problem (if you're hashing and it's not fitting in the time limit, probably the intended solution is not hashing), you can try double or even triple hashing. This just means you combine several modulos and bases (for example, mod $$$1e12+39$$$, base $$$69$$$ and unsigned overflow hashing) and instead of a single integer, your hashed value would be a tuple of integers.

Note that apart from changing modulos, changing bases is also effective. And if you're still getting hacked, try randomizing mods and bases. Hope this helps!

  • »
    »
    2 days ago, # ^ |
    Rev. 2   Vote: I like it 0 Vote: I do not like it

    This is very helpful to me, thank you!

    But what if it's a matter of time? What if there are some evil quiz makers who know the hash constant (like our quiz maker)?

»
2 days ago, # |
  Vote: I like it 0 Vote: I do not like it

Using XOR-hashing will be a lot quicker than hashing strings into numbers since it doesn't use the expensive modulo operator. If you're worried about the accuracy then just use two or three of them in conjunction. I never needed to use more than two in my experience.

»
30 hours ago, # |
  Vote: I like it 0 Vote: I do not like it

use double or triple hash and random base