Probability of hash collisions

→ Pay attention

Contest is running
Codeforces Round 995 (Div. 3)
02:14:41
Register now »

→ Streams

Codeforces Round 995 Solution Discussion

By aryanc403

Before stream 02:19:41

View all →

→ Top rated

#	User	Rating
1	tourist	3985
2	jiangly	3814
3	jqdai0815	3682
4	Benq	3529
5	orzdevinwang	3526
6	ksun48	3517
7	Radewoosh	3410
8	hos.lyric	3399
9	ecnerwala	3392
9	Um_nik	3392

Countries | Cities | Organizations

View all →

→ Top contributors

#	User	Contrib.
1	cry	169
2	maomao90	162
2	Um_nik	162
4	atcoder_official	161
5	djm03178	158
6	-is-this-fft-	157
7	adamant	155
8	awoo	154
8	Dominater069	154
10	luogu_official	150

View all →

→ Find user

→ Recent actions

Detailed →

whatthemomooofun1729's blog

Probability of hash collisions

By whatthemomooofun1729, history, 14 months ago, In English

Hi, I am trying to check if an array (consisting of the numbers from 1 to N, where N is the length of the array) is a permutation of N elements using hashing. My idea is to assign a randomized value as a hash for each value between 1 and N. The hash value of an array would be them sum of these individual hashes, and I compare this sum to the sum of the hashes for a permutation of length N.

I am wondering, what is the probability of collisions using this type of hashing? I couldn't find much online other than the probability for rolling hashes. Thank you!

hashing

whatthemomooofun1729
14 months ago
2

Comments (1)

Show archived | Write comment?

catalystgma

14 months ago, # |

+11

Why not use xor-sum instead of normal sum? The probability is easier to bound this way. Assume that the hashes that you generate are h[1..n]. Suppose that your hashes retain $$$B$$$ bits of information and that they are randomly chosen from an uniform distribution: 0 <= h[i] < 2**B, P(h[i] = X | 0 <= X < 2**B) = 1 / 2**B.

Notice that this means that P(the jth bit of h[i] = 0) = P(the jth bit of h[i] = 1) = 1/2 for any 1 <= i <= n, 0 <= j < B.

Suppose that the array we are checking is v[1..n] and we know that 1 <= v[i] <= n.

The correct answer for a permutation would be h_ok = h[1] ^ h[2] ^ .. ^ h[n]. Suppose that v[1..n] is not a permutation and we want to calculate the probability that h[v[1]] ^ h[v[2]] ^ .. ^ h[v[n]] = h_ok.

By moving all terms on one side, we would be left with an equality like {0, 1} * h[1] ^ {0, 1} * h[2] ^ .. ^ {0, 1} * h[n] = 0. (as in h[1] either once or none ...)

For example, if v[1..7] = [3 4 5 3 3 2 7], we would have the equality h[3] ^ h[3] ^ h[1] ^ h[6] = 0 <=> h[1] ^ h[6] = 0.

We will prove that the number of terms that we have on the left hand side doesn't matter and the probability of equality is $$$1/2^B$$$.

Let's prove first that the probability of the LSB to be equal to $$$0$$$ is $$$1/2$$$ no matter how many terms we have on the LHS.

If we have $$$1 \leq k \leq n$$$ terms on the left hand side, then the result is $$$0$$$ if either $$$0, 2, 4, ... $$$ out of the $$$k$$$ bits are $$$1$$$:

$$$P(LHS = 0) = \frac{{k \choose 0} + {k \choose 2} + {k \choose 4} + ...}{2^k} = \frac{2^{k-1}}{2^k} = \frac{1}{2}$$$.

Now, since the hashes have been uniformly chosen in $$$[0 .. 2^B)$$$, the bits' values are independent, so the probability for the LHS to be $$$0$$$ taking in consideration all of the bits is $$$(1/2) ^ B = \frac{1}{2 ^ B}$$$.

So if you use $$$B = 64$$$, the collision probability is $$$< 10^{-19}$$$.

Monte-Carlo simulation to test the expected probability:

import functools
import random

n = 1000
B = 10
h = [0] + [random.randint(0, (1<<B) - 1) for _ in range(1, n+1)]
correct_hash = functools.reduce(lambda a, b: a ^ b, h[1:])

nt = 10**5
cntCollisions = 0
for t in range(nt):
    #P(we generate a v that actually is a permutation) = n!/n^n < 1e-432.
    if functools.reduce(lambda a, b: a ^ b, [h[random.randint(1, n)] for i in range(1, n+1)]) == correct_hash:
        cntCollisions += 1

print(f"({cntCollisions}, {nt}) P_collision = {cntCollisions / nt}, expected = {1 / (2 ** B)}.")

#B = 5: P_collision = 0.03056, expected = 0.03125.
#B = 10: P_collision = 0.00087, expected = 0.0009765625.

→ Reply