Problem requiring solution that is vulnerable to hash collisions

By kfqg, 7 years ago, In English

I was reading the tutorial of CF 763D, a problem about tree isomorphism. The solution computes polynomial hashes of all possible subtrees of a tree (at most 10^5 vertices) and then compares the hashes instead of checking tree equality directly. It seems to me that the solution is vulnerable to hash collisions.
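
For readers unfamiliar with subtree hashing, here is a minimal Python sketch of one common way to hash rooted subtrees so that isomorphic subtrees get equal values. It is not the tutorial's code. The scheme (one random value `x[k]` per subtree height, h(leaf) = 1, and h(v) = product of `(x[height(v)] + h(c))` over the children c, modulo a fixed prime) and all names are assumptions chosen for illustration. Equal trees always get equal hashes; unequal trees can still collide, which is exactly the concern raised here.

```python
import random

MOD = (1 << 61) - 1  # a Mersenne prime; Python integers never overflow


def subtree_hashes(children, root=0, rng=random):
    """Hash every rooted subtree; children[v] lists the children of v.

    h(leaf) = 1 and h(v) = prod over children c of (x[height(v)] + h(c)) mod MOD,
    where x[k] is a random value shared by all vertices of height k.
    The product ignores child order, so isomorphic subtrees hash equally.
    """
    n = len(children)
    order, stack = [], [root]
    while stack:  # iterative preorder, avoids Python's recursion limit
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    x = [rng.randrange(MOD) for _ in range(n)]  # heights never exceed n - 1
    height = [0] * n
    h = [1] * n
    for v in reversed(order):  # every child is handled before its parent
        for c in children[v]:
            height[v] = max(height[v], height[c] + 1)
        prod = 1
        for c in children[v]:
            prod = prod * (x[height[v]] + h[c]) % MOD
        h[v] = prod
    return h


# Vertices 2 and 3 each have exactly two leaf children, at different depths.
children = [[1, 2], [3], [6, 7], [4, 5], [], [], [], []]
h = subtree_hashes(children, rng=random.Random(2024))
print(h[2] == h[3])  # True
```

The random values are indexed by height rather than depth, so the same shape hashes identically wherever it sits in the tree.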

I'm relatively new to competitive programming. Is it normal for there to be problems that require solutions that are vulnerable to hash collisions? Shouldn't a good problem be one that is provably solvable for 100% of all possible inputs?

The problem is here: http://codeforces.net/problemset/problem/763/D

The tutorial is here: http://codeforces.net/blog/entry/50205

rng_58 also wrote a blog post on this: http://rng-58.blogspot.com/2017/02/hashing-and-probability-of-collision.html. The solution given there also has a nonzero hash collision probability.

#hashing, #graph, #trees

Comments (9)

arif.ozturk

7 years ago

Think about it like this: usually, only specially crafted tests make hash collisions a problem, since the probability of a collision on a randomly generated test is small. Still, one trick I was taught is to use two hashes, so that the probability of both colliding at once is much smaller (it takes a bit more time, and you can go up to three).
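
A minimal sketch of the double-hashing trick, assuming ordinary polynomial string hashing. The two moduli are well-known primes; the bases are arbitrary illustrative constants (hypothetical; in practice draw them at random). Two strings are treated as equal only if both components match.

```python
# Hypothetical fixed parameters for illustration; in practice pick bases at random.
MODS = (1_000_000_007, 998_244_353)
BASES = (911_382_323, 972_663_749)


def double_hash(s):
    """Return a pair of polynomial hashes of s, one per (base, modulus) pair."""
    h1 = h2 = 0
    for ch in s:
        h1 = (h1 * BASES[0] + ord(ch)) % MODS[0]
        h2 = (h2 * BASES[1] + ord(ch)) % MODS[1]
    return (h1, h2)


# Two strings are considered equal only if both components match.
print(double_hash("abc") == double_hash("abc"))  # True
print(double_hash("abc") == double_hash("acb"))  # False
```

Triple hashing is the same idea with a third (base, modulus) pair.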

Enchom

7 years ago, +19

I've seen some people object to problems with a nonzero collision chance, but I personally think that doesn't make much sense.

As mentioned, you can use double or triple hashing, and the chance of radiation from the sun flipping some bits in your PC is most likely larger than the chance of a random triple-hash collision.
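
A back-of-the-envelope estimate shows why the number of hashes matters, assuming the hash values of n distinct objects behave like independent uniform values modulo M (a heuristic, not a proof). The expected number of colliding pairs, which also bounds the probability of any collision, is below; M ≈ 10^9 corresponds to one modulus, 10^18 to two and 10^27 to three.

```latex
\mathbb{E}[\text{colliding pairs}] = \binom{n}{2}\frac{1}{M} \approx \frac{n^2}{2M},
\qquad n = 10^5:\quad
\frac{10^{10}}{2\cdot 10^{9}} = 5,\quad
\frac{10^{10}}{2\cdot 10^{18}} = 5\cdot 10^{-9},\quad
\frac{10^{10}}{2\cdot 10^{27}} = 5\cdot 10^{-18}
```

So with 10^5 distinct objects a single modulus near 10^9 is expected to produce a few collisions, while two or three moduli make one very unlikely.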

The only sensible argument could be that if somebody with malicious intentions knows the hashing you are using, they could create an input on which your program behaves badly. However, this is a problem only in specific situations, and you could use more sophisticated hash functions than polynomial hashing if that's an issue.
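
As a concrete example of such a malicious input, here is a sketch of the well-known Thue-Morse attack on polynomial hashing modulo 2^64 (letting unsigned 64-bit arithmetic overflow) with an odd base; the function names are illustrative. The 2048-character Thue-Morse string and its complement differ at every position, yet their hash difference is ±(B - 1)(B^2 - 1)(B^4 - 1)...(B^1024 - 1), which is divisible by 2^64 for every odd base B.

```python
MASK = (1 << 64) - 1  # keeps the low 64 bits, like unsigned 64-bit overflow


def hash_mod_2_64(s, base):
    """Polynomial hash of s modulo 2^64."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) & MASK
    return h


def thue_morse_pair(k):
    """Return the Thue-Morse string of length 2**k over {'a', 'b'} and its complement."""
    t, c = "a", "b"
    for _ in range(k):
        t, c = t + c, c + t
    return t, c


t, c = thue_morse_pair(11)  # two different strings of length 2048
for base in (3, 31, 131, 911_382_323):
    print(hash_mod_2_64(t, base) == hash_mod_2_64(c, base))  # True for every odd base
```

Choosing the base, and ideally a prime modulus, at random when the program starts removes the fixed target an attacker needs.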

Hashing is quite practical in my opinion and it should be completely fine to set problems in which the intended solution has a nonzero collision chance, as long as that chance is small enough.

farmersrice

7 years ago (reply), +16

A simple way to counter the anti-hash "malicious intentions" would just be to randomly choose mods from a large array at the start. That way no one will be able to hack your solution.
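
A minimal sketch of this idea, assuming a short hard-coded list of known primes (the list and the name `pick_hash_params` are illustrative). `random.SystemRandom` draws from the operating system's entropy source, so the chosen modulus and base cannot be predicted by someone reading the source code; a fixed seed would defeat the purpose.

```python
import random

# Short hand-picked list of primes; paste in as many as you like.
PRIMES = [
    998_244_353, 1_000_000_007, 1_000_000_009,
    1_000_000_021, 1_000_000_033, 2_147_483_647,
]

_rng = random.SystemRandom()  # OS entropy: not reproducible from the source code


def pick_hash_params():
    """Pick a modulus from PRIMES and a base in [256, mod - 1) at program start."""
    mod = _rng.choice(PRIMES)
    base = _rng.randrange(256, mod - 1)  # larger than any byte-sized character code
    return mod, base


MOD, BASE = pick_hash_params()
```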

sgtlaugh

7 years ago (reply)

Right, that should work. But why do we need a large array for this? Why not just pick a random prime every time?

farmersrice

7 years ago (reply)

Wouldn't you need to spend computation time to find a prime then?

sgtlaugh

7 years ago (reply)

Right, but even trial division up to sqrt(n) shouldn't take much time, assuming the moduli fit in the integer range. Otherwise we can always use a Fermat or Miller-Rabin primality test :)
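
A sketch of this alternative: draw random numbers in a range and test each one by trial division up to sqrt(n) until a prime turns up. The range [10^9, 2·10^9) and the names are assumptions. By the prime number theorem, roughly one number in ln(10^9) ≈ 21 near 10^9 is prime, so only a couple of dozen candidates are needed on average. For moduli far beyond this range, a Miller-Rabin test would replace `is_prime`.

```python
import random


def is_prime(n):
    """Trial division by 2 and odd d with d * d <= n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def random_prime(lo=10**9, hi=2 * 10**9, rng=random):
    """Draw random integers in [lo, hi) until one is prime."""
    while True:
        p = rng.randrange(lo, hi)
        if is_prime(p):
            return p


MOD = random_prime()  # a fresh prime modulus on every run
```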

farmersrice

7 years ago (reply)

I still think pasting a few hundred primes into an array is faster than writing a prime generator, both in writing time and in execution time.

dalex

7 years ago

http://acm.timus.ru/problem.aspx?space=1&num=1989

ko_osaga

7 years ago

Shouldn't a good problem be one that is provably solvable for 100% of all possible inputs?

Yeah, obviously. It's a condition that a "non-wrong" problem should satisfy. But that statement is missing some details, so I will fill them in.

A problem is "not wrong" if it is 100% solvable for all specified inputs, assuming the random generators are truly random. Let's look at some examples:

1. Output a + b: not wrong.
2. A problem with a randomized solution whose probability of being wrong or exceeding the time limit is provably low: there is room for debate, but the general consensus of the CP community (including me) is to consider such solutions "correct", because we believe modern PRNGs are good enough that, if the output is not random, it's the compiler's fault.
3. A problem that explicitly says the input is "random, to the best of the setter's knowledge": same as point 2.
4. A problem with a randomized solution that is unproven but that the setter claims is "sensible": I've seen a lot of rookie setters do this, and then a counterexample turns up that breaks the solution regardless of the PRNG. If that's the case, teach them not to make such mistakes again.
5. A hashing problem: okay, so we pick a random (sometimes even constant) prime and do hashing. Obviously I think it's sensible, but history shows that people are not sensible (we don't even need to talk about WW2; anti-hash tests are evil enough to break the "sense"), so we need a proof here. I don't know why hashing has provably low collision rates, but according to rng_58's article (namely the Schwartz-Zippel lemma), there is enough theory to explain why it works. So this is the same as case 2, and it's OK.

The solution is indeed vulnerable to hash collisions, but if we assume random hashing parameters, we can prove that a collision is unlikely, as a consequence of the Schwartz-Zippel lemma. Actually, those proofs don't come from the setters but from rng_58's article, so I don't think it was a good problem; rather, it was a problem that was "saved" by rng_58.
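
For reference, here is the statement of the Schwartz-Zippel lemma and its standard application to polynomial string hashing with a random base b modulo a prime p. P is a nonzero polynomial of total degree d over a field, S is a finite subset of that field, and each r_i is drawn independently and uniformly from S. For strings s ≠ t of length n with character codes below p, the hash difference is a nonzero polynomial in b of degree at most n - 1.

```latex
\Pr_{r_1,\dots,r_k \in S}\bigl[P(r_1,\dots,r_k) = 0\bigr] \le \frac{d}{|S|}

h_b(s) - h_b(t) \equiv \sum_{i=0}^{n-1} (s_i - t_i)\, b^{\,n-1-i} \pmod{p},
\qquad
\Pr_{b \in \mathbb{Z}_p}\bigl[h_b(s) = h_b(t)\bigr] \le \frac{n-1}{p}
```

For n = 10^5 and p ≈ 10^9 this bounds a single comparison by about 10^-4. Over many comparisons the bounds add up (union bound), which is why larger moduli or several independent hashes are used in practice.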
