whatthemomooofun1729's blog

By whatthemomooofun1729, history, 14 months ago, In English

Hi, I am trying to check if an array (consisting of the numbers from 1 to N, where N is the length of the array) is a permutation of N elements using hashing. My idea is to assign a randomized value as a hash for each value between 1 and N. The hash value of an array would be them sum of these individual hashes, and I compare this sum to the sum of the hashes for a permutation of length N.

I am wondering, what is the probability of collisions using this type of hashing? I couldn't find much online other than the probability for rolling hashes. Thank you!

| Write comment?
»
14 months ago, # |
  Vote: I like it +11 Vote: I do not like it

Why not use xor-sum instead of normal sum? The probability is easier to bound this way. Assume that the hashes that you generate are h[1..n]. Suppose that your hashes retain $$$B$$$ bits of information and that they are randomly chosen from an uniform distribution: 0 <= h[i] < 2**B, P(h[i] = X | 0 <= X < 2**B) = 1 / 2**B.

Notice that this means that P(the jth bit of h[i] = 0) = P(the jth bit of h[i] = 1) = 1/2 for any 1 <= i <= n, 0 <= j < B.

Suppose that the array we are checking is v[1..n] and we know that 1 <= v[i] <= n.

The correct answer for a permutation would be h_ok = h[1] ^ h[2] ^ .. ^ h[n]. Suppose that v[1..n] is not a permutation and we want to calculate the probability that h[v[1]] ^ h[v[2]] ^ .. ^ h[v[n]] = h_ok.

By moving all terms on one side, we would be left with an equality like {0, 1} * h[1] ^ {0, 1} * h[2] ^ .. ^ {0, 1} * h[n] = 0. (as in h[1] either once or none ...)

For example, if v[1..7] = [3 4 5 3 3 2 7], we would have the equality h[3] ^ h[3] ^ h[1] ^ h[6] = 0 <=> h[1] ^ h[6] = 0.

We will prove that the number of terms that we have on the left hand side doesn't matter and the probability of equality is $$$1/2^B$$$.

Let's prove first that the probability of the LSB to be equal to $$$0$$$ is $$$1/2$$$ no matter how many terms we have on the LHS.

If we have $$$1 \leq k \leq n$$$ terms on the left hand side, then the result is $$$0$$$ if either $$$0, 2, 4, ... $$$ out of the $$$k$$$ bits are $$$1$$$:

$$$P(LHS = 0) = \frac{{k \choose 0} + {k \choose 2} + {k \choose 4} + ...}{2^k} = \frac{2^{k-1}}{2^k} = \frac{1}{2}$$$.

Now, since the hashes have been uniformly chosen in $$$[0 .. 2^B)$$$, the bits' values are independent, so the probability for the LHS to be $$$0$$$ taking in consideration all of the bits is $$$(1/2) ^ B = \frac{1}{2 ^ B}$$$.

So if you use $$$B = 64$$$, the collision probability is $$$< 10^{-19}$$$.

Monte-Carlo simulation to test the expected probability: