For rolling hash, how do you decide which MOD to pick/ whether to use a tuple of 2 or more hashes? It is hard to know a priori whether the mod is high enough to prevent collisions / too high to cause memory limit / and multiple hashes could cause time limit. In This, 10**9 + 7 gave an incorrect answer, while 10**18 + 3 was accepted. Using a pair of hashes with 10**9 + 7 and 998244353 gave memory limit exceeded.
Auto comment: topic has been updated by SamuelTull (previous revision, new revision, compare).
2^61 — 1 is best
I also want to know the same, how to select the prime number and MOD value pair. As sometimes for me (31, 1e9+9) (31, 2**61-1) pair haven't worked, But (33, 2**61-1) has worked. and 999999999999989 this value for MOD has worked in some cases as well.
Hi. Each comparision has a probability of failing, I'm not that good at calculating it, but you can read about it somewhere. Usually you're gonna have to use one big mod ($$$2^61 -1$$$) for example, or two mods like ($$$10^9+9$$$ or $$$10^9+7$$$. Now for the numbers to multiply by, what is usually recommended is an integer close to the alphabet size ($$$31$$$). But if there are hacks in the contest, the code can be hacked. One way to avoid that is using a random prime up to $$$n$$$, like I got hacked on the last contest because of using $$$31$$$ and $$$29$$$ but when I submitted with random primes it got accepted. You can read more on cp algos or usaco guide
Got headache after reading multiple things on this and I don't know why but it seems like keeping a greater base and MOD might be better than keeping it close to the limits. Like instead of 31 we can keep a greater base and instead of 1e9+9, 2^61-1 should be better. Don't know if this is a guaranteed pair but at least this might lower the chances of hacks.
Maybe a greater mod is better, but still the base shouldn't be constant. Just use a random one. I think any prime works though. Even if you use a big base, if it's constant then it can be hacked.
I haven't analyzed my hashing procedure formally, but I've never had any issues using $$$10^9 + 7$$$ and $$$10^9 + 9$$$ as moduli with a random base (not necessarily prime). To improve memory usage in your last submission, you can represent hashes with two moduli as a single integer $$$H_1 \cdot MOD_2 + H_2$$$ so that your map can work with integers rather than pairs. I'm not sure what the correct way to optimize time efficiency is in Python, but in C++ taking numbers modulo a constant prime is fairly efficient, so even with two hashes TLE won't be an issue so long as you declare
MOD
andMOD2
asconst
. See here for my (somewhat janky) template.