Any help on this problem? (String Hashing)

By PedroCastillo, 6 years ago, in English

Hi, I'm attempting this problem with string hashing.

It works well for small inputs but gives wrong answer on very large inputs.

Any help on what could be wrong?

Problem : https://codeforces.net/contest/271/problem/D

Submission: https://codeforces.net/contest/271/submission/46239564

Comments (8)

Volpe

6 years ago

I think you just need to use double hashing to avoid collisions.

PedroCastillo

6 years ago (reply)

How exactly?

Also, how can I tell I need to use double hashing? I mean, when is it necessary?

Furthermore, I've seen solutions to this problem that use just a single ordinary hash :(

Volpe

6 years ago (reply)

By double hashing I mean computing two hash values for each string, using two different base and MOD values.
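A minimal C++ sketch of what that looks like for substring hashing, assuming prefix-hash arrays over the raw character codes. The struct name `DoubleHash` and the constants (bases 131 and 137, moduli 10^9 + 7 and 998244353) are illustrative choices, not taken from either submission:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical double polynomial hash: two independent (base, MOD) pairs.
struct DoubleHash {
    static constexpr uint64_t MOD1 = 1000000007ULL, BASE1 = 131;
    static constexpr uint64_t MOD2 = 998244353ULL, BASE2 = 137;
    std::vector<uint64_t> pre1, pre2, pow1, pow2;

    explicit DoubleHash(const std::string& s)
        : pre1(s.size() + 1, 0), pre2(s.size() + 1, 0),
          pow1(s.size() + 1, 1), pow2(s.size() + 1, 1) {
        for (size_t i = 0; i < s.size(); ++i) {
            uint64_t c = static_cast<unsigned char>(s[i]);
            // Prefix hashes: pre[i + 1] = pre[i] * BASE + c (mod MOD).
            pre1[i + 1] = (pre1[i] * BASE1 + c) % MOD1;
            pre2[i + 1] = (pre2[i] * BASE2 + c) % MOD2;
            pow1[i + 1] = pow1[i] * BASE1 % MOD1;
            pow2[i + 1] = pow2[i] * BASE2 % MOD2;
        }
    }

    // Hash pair of the half-open substring s[l, r); all products fit in 64 bits.
    std::pair<uint64_t, uint64_t> get(size_t l, size_t r) const {
        uint64_t h1 = (pre1[r] + MOD1 - pre1[l] * pow1[r - l] % MOD1) % MOD1;
        uint64_t h2 = (pre2[r] + MOD2 - pre2[l] * pow2[r - l] % MOD2) % MOD2;
        return {h1, h2};
    }
};
```

For problem 271D, one could insert `get(l, r)` for every qualifying substring into a `std::set<std::pair<uint64_t, uint64_t>>` and report its size; a false match then requires both components to collide at the same time.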

In general, you can't tell in advance whether a single-hash solution will pass the tests, because collisions happen with some probability and you can't know whether your solution will hit one. What you can do is make that probability as small as possible.

You can estimate this probability by assuming that the hash values are uniformly distributed over all possible values. The larger the MOD, the lower the probability of a collision and the better your chance of getting AC; for rolling-hash solutions like yours, double hashing is the other way to lower it.
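A rough version of this estimate, assuming the hash values are uniform and independent (an idealization) and writing q for a hypothetical number of string comparisons the solution performs, follows from the union bound:

$$
\Pr[\text{two given distinct strings collide}] \approx \frac{1}{MOD}, \qquad
\Pr[\text{some collision among } q \text{ comparisons}] \le \frac{q}{MOD}
$$

$$
\text{with two independent hashes:} \quad \Pr[\text{some collision}] \le \frac{q}{MOD_1 \cdot MOD_2}
$$

For example, with q = 10^6 and MOD ≈ 10^9 the bound is 10^{-3}; with two moduli of about 10^9 each it drops to 10^{-12}.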

PedroCastillo

6 years ago (reply)

Thanks, it worked. However, how could I have told that I needed double hashing before submitting?

Noam527

6 years ago (reply)

You don't need to detect when you should use 2 or more hashes. One could say you should go by your intuition, but I suggest always using multiple hashes, with the number depending on how much time and memory it costs to build them. 2 or 3 is the usual number I use.

CodingKnight

6 years ago

The following is an accepted solution based on collision-free substring hashing. The main idea is to map the lowercase letters a to z to the integers 0 to M - 1, where M = 26. Then up to P consecutive letters of the string are packed into a single integer as the digits of a base-M number, using repeated multiplication and addition without overflow; P = 13 for a 64-bit signed integer. The sequence of integers produced by packing a substring is a collision-free hash key among all substrings of the same length. A two-dimensional array of hash-key sets stores the distinct keys of all substrings of the input string, where the first index is the number of bad letters in the substring and the second index is its length. Two substrings are guaranteed to be different if they contain different numbers of bad letters or have different lengths. In other words, all substrings stored in one cell of the two-dimensional array have the same number of bad letters and the same length.

Submission: 46247924
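A quick check of why P = 13 is the largest chunk size that fits in a signed 64-bit integer: a chunk of P base-26 digits is at most 26^P - 1, and

$$
26^{13} - 1 \approx 2.48 \times 10^{18} \;<\; 2^{63} - 1 \approx 9.22 \times 10^{18} \;<\; 26^{14} - 1 \approx 6.45 \times 10^{19}.
$$

For a substring with letter values d_0, …, d_{L-1}, the key is the chunk sequence below, where ℓ_j = P for every chunk except possibly the last. Substrings of equal length L share the same chunk layout, so equal keys imply equal substrings:

$$
\text{key} = \left(c_1, \dots, c_{\lceil L/P \rceil}\right), \qquad
c_j = \sum_{i=0}^{\ell_j - 1} d_{(j-1)P + i} \cdot M^{\ell_j - 1 - i}
$$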

UPDATE:

The following update to the previous solution uses a one-dimensional array to store the collision-free hash keys, indexed only by the second index of the previous solution, i.e. the substring length. This improved both the execution time and the memory usage.

Submission: 46257737
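A minimal C++ sketch of the length-indexed variant described above, assuming the 271D input convention of a 26-character string in which '1' marks a good letter (check the statement). `packKey` and `countGoodSubstrings` are hypothetical names, and this is not CodingKnight's submission. Indexing by length alone is enough because a substring's bad-letter count is determined by its letters:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: packs s[l, l + len) into base-26 chunks of at most
// P = 13 letters, so each chunk fits in a signed 64-bit integer.
// Assumes s contains only the lowercase letters 'a'..'z'.
std::vector<int64_t> packKey(const std::string& s, size_t l, size_t len) {
    const size_t P = 13;
    std::vector<int64_t> key;
    for (size_t start = l; start < l + len; start += P) {
        size_t stop = std::min(start + P, l + len);
        int64_t chunk = 0;
        for (size_t i = start; i < stop; ++i)
            chunk = chunk * 26 + (s[i] - 'a');  // append one base-26 digit
        key.push_back(chunk);
    }
    return key;
}

// Hypothetical function: counts distinct substrings with at most k bad letters.
// Assumed convention: good[c] == '1' means letter 'a' + c is good.
long long countGoodSubstrings(const std::string& s, const std::string& good, int k) {
    const size_t n = s.size();
    // One set per substring length; keys of equal length never collide.
    std::vector<std::set<std::vector<int64_t>>> byLength(n + 1);
    for (size_t l = 0; l < n; ++l) {
        int bad = 0;
        for (size_t r = l; r < n; ++r) {
            if (good[s[r] - 'a'] == '0') ++bad;
            if (bad > k) break;  // longer substrings from l only add bad letters
            byLength[r - l + 1].insert(packKey(s, l, r - l + 1));
        }
    }
    long long total = 0;
    for (const auto& keys : byLength) total += static_cast<long long>(keys.size());
    return total;
}
```

It stores every key explicitly, so memory grows with the total length of the distinct good substrings; treat it as an illustration of the keying scheme rather than a tuned solution.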

Stecher

6 years ago

Always use double hashing if possible. With single hashing, the probability of a collision is about $N/MOD$; with double hashing it becomes $N^2/(MOD \cdot MOD1)$. In the worst case, $N/MOD$ might be as large as $10^{-4}$, which will get you into trouble, whereas with double hashing the worst-case probability of a collision stays around $10^{-8}$ at most.
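One standard way to see where the N/MOD bound comes from, assuming a prime MOD and a base chosen uniformly at random: two distinct strings of length N collide only when the base is a root of their nonzero difference polynomial, which has degree at most N - 1 and therefore at most N - 1 roots. With independent bases and moduli the two events multiply. Plugging in illustrative values N = 10^5 and MOD ≈ MOD1 ≈ 10^9 (assumptions, not stated in the comment) reproduces the quoted numbers:

$$
\frac{N}{MOD} \approx \frac{10^{5}}{10^{9}} = 10^{-4}, \qquad
\frac{N^2}{MOD \cdot MOD1} \approx \frac{10^{10}}{10^{18}} = 10^{-8}
$$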

BledDest

6 years ago

I think that the birthday paradox is a convenient way to measure this: if we generate something like $\sqrt{MOD}$ random integers from 0 to MOD - 1, the probability of a collision will be somewhere near 0.5. So if you want to make a lot of string comparisons using 32-bit hashing, the probability of a collision is high (and even higher given that there are multiple tests and you have to pass all of them).

Taking two (or three) 32-bit hashes, or one (or two) 64-bit hashes, should be enough in almost every problem.
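A quick way to put numbers on the birthday argument, assuming hash values behave like independent uniform random integers in [0, M) (real polynomial hashes only approximate this). The function name is hypothetical; it uses the standard approximation 1 - exp(-n(n - 1)/(2M)):

```cpp
#include <cassert>
#include <cmath>

// Birthday-paradox approximation: the probability that n values drawn
// independently and uniformly from [0, M) contain at least one repeat
// is about 1 - exp(-n(n - 1) / (2M)). expm1 keeps precision when it is tiny.
double collisionProbability(double n, double M) {
    return -std::expm1(-n * (n - 1.0) / (2.0 * M));
}
```

With n = 2^16 values and a 32-bit modulus this gives about 0.39 (the 0.5 point is near 1.18·sqrt(M)); with 10^6 values it is essentially 1 for a 32-bit modulus but below 10^-7 for a 64-bit one. In this model two independent 32-bit hashes behave roughly like one 64-bit hash, since M becomes the product of the two moduli.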
