Kinderr's blog

By Kinderr, history, 8 years ago, In English

The classical hashing way of hashing a number is keeping it as the reminder of the division with two distinct prime numbers i.e. hash(X) = (X mod prime1, X mod prime2), same thing with strings (considering them as polynomials). But modulo is a very heavy operation, and using a power of 2 (e.g. 2^64, using unsigned long long overflow) instead of a prime number fails when you hash a Thue-Morse sequence, so... Do you know any way of replacing the modulo with a smart bitwise operation or something else? I'd find it extremely useful :)

  • Vote: I like it
  • +30
  • Vote: I do not like it

»
8 years ago, # |
  Vote: I like it +6 Vote: I do not like it

You can use Mersenne primes https://en.wikipedia.org/wiki/Mersenne_prime for such purpose.

  • »
    »
    8 years ago, # ^ |
    Rev. 2   Vote: I like it +3 Vote: I do not like it

    Could you please explain how to find module fast for mersenne primes? OR operation on 2^31-1 is not right I guess because it gives mod on 2^31 not 2^31-1.

    • »
      »
      »
      8 years ago, # ^ |
        Vote: I like it +49 Vote: I do not like it

    • »
      »
      »
      8 years ago, # ^ |
        Vote: I like it +56 Vote: I do not like it

      so write a number x as x = a2n + b and we have . For details, see "Matters Computational" section 39.2.

»
8 years ago, # |
  Vote: I like it +2 Vote: I do not like it

The problem comes from 'I know your polynomial hash arguments' not from 'your mod is not prime' so you need randomization. I haven't made a research for hashing like mod (2^64 — 1) with random a.

  • »
    »
    8 years ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    No, you can shuffle the alphabet and they won't know enough. The mod shouldn't necessarily be prime, but if it's a power of two, when given a Thue-Sequence it will converge to something specific, no matter the character codes :<

»
8 years ago, # |
  Vote: I like it +3 Vote: I do not like it

In my opinion, when the intended solution is to use hash, then the time limit will consider the "heaviness" of the mod operation so no need to worry.

  • »
    »
    8 years ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    Indeed. But mod operations are generally problematic sometimes when the intended solution does not rely on hashing. Nonetheless, I too, for instance would like to know more about this if there is a way around.