Блог пользователя xuyiluo

Автор xuyiluo, история, 11 дней назад, По-английски

as the title said,I had a problem solving some questions about hashing:

If you write a hash with high accuracy, you will exceed the time limit, but if you write a hash with low accuracy, you do not know whether it will be hacked by the data.

What should I do then?

  • Проголосовать: нравится
  • +9
  • Проголосовать: не нравится

»
11 дней назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

As same as you.I want to know why.

»
11 дней назад, # |
  Проголосовать: нравится +9 Проголосовать: не нравится

I've never seen a problem of which the intended solution is hash but you can't write a 'safe' hash function just because it's just a little bit slower than a straightforward version. Perhaps you should think again if the intended solution is not hash at all?

  • »
    »
    11 дней назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    our test's solution is hash,but if I use unsigned overflow hash,I will get hacked,I only got 1/10 points.But if I use double hash,I will get TLE.But common hash will get hacked,too.

  • »
    »
    11 дней назад, # ^ |
      Проголосовать: нравится 0 Проголосовать: не нравится

    So what do I do in this situation?

    • »
      »
      »
      10 дней назад, # ^ |
      Rev. 3   Проголосовать: нравится +6 Проголосовать: не нравится

      I think this question is becoming an XY problem. You should probably specify what you're actually trying to solve. We need to know:

      • The type of elements that you want to be hashed and how you want to use them: The strategy can be very different when it's a string, integer, or a combination of variables. For example, even if it's a string, you don't always need to make hashes based on traditional rolling hash method on the full string, if you only need to identify some fixed strings and you don't need to check arbitrary substrings.
      • The actual constraints and the time complexity of the solution: In hash problems, generally other parts than the hash itself is the bottleneck, such as I/O, or the actual logic where you're using these hash values, because hashing itself is supposed to be one of the "faster" operations.
      • What you actually tried: The solution can be as simple as defining the coefficients as constants rather than variables, if that's what you missed. We don't know without your actual code.
      • »
        »
        »
        »
        10 дней назад, # ^ |
          Проголосовать: нравится 0 Проголосовать: не нравится

        You're right.I'm working on a string hash problem. This problem requires checking arbitrary substrings, and I can only think of ordinary hashes. But the constant problem is with this string hash, and if I use a double hash, I run out of time. Thank you again for your answer.

»
11 дней назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

While solving from the problemset, I end up binary searching on the accuracy until i get AC ;-; , generally 10**9 + 7 works quite well tho , and to make it more randomized , i store the values XORed with some random integer. So as to make it harder to hack.

»
11 дней назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Stupid question, but which kind of hash are you asking about?

»
11 дней назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

If time limit is not a problem (if you're hashing and it's not fitting in the time limit, probably the intended solution is not hashing), you can try double or even triple hashing. This just means you combine several modulos and bases (for example, mod $$$1e12+39$$$, base $$$69$$$ and unsigned overflow hashing) and instead of a single integer, your hashed value would be a tuple of integers.

Note that apart from changing modulos, changing bases is also effective. And if you're still getting hacked, try randomizing mods and bases. Hope this helps!

  • »
    »
    11 дней назад, # ^ |
    Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

    This is very helpful to me, thank you!

    But what if it's a matter of time? What if there are some evil quiz makers who know the hash constant (like our quiz maker)?

»
11 дней назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Using XOR-hashing will be a lot quicker than hashing strings into numbers since it doesn't use the expensive modulo operator. If you're worried about the accuracy then just use two or three of them in conjunction. I never needed to use more than two in my experience.

»
10 дней назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

use double or triple hash and random base