SamuelTull's blog

By SamuelTull, history, 12 months ago, In English

For rolling hash, how do you decide which MOD to pick/ whether to use a tuple of 2 or more hashes? It is hard to know a priori whether the mod is high enough to prevent collisions / too high to cause memory limit / and multiple hashes could cause time limit. In This, 10**9 + 7 gave an incorrect answer, while 10**18 + 3 was accepted. Using a pair of hashes with 10**9 + 7 and 998244353 gave memory limit exceeded.

  • Vote: I like it
  • +2
  • Vote: I do not like it

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by SamuelTull (previous revision, new revision, compare).

»
12 months ago, # |
  Vote: I like it 0 Vote: I do not like it

2^61 — 1 is best

»
12 months ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

I also want to know the same, how to select the prime number and MOD value pair. As sometimes for me (31, 1e9+9) (31, 2**61-1) pair haven't worked, But (33, 2**61-1) has worked. and 999999999999989 this value for MOD has worked in some cases as well.

  • »
    »
    12 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Hi. Each comparision has a probability of failing, I'm not that good at calculating it, but you can read about it somewhere. Usually you're gonna have to use one big mod ($$$2^61 -1$$$) for example, or two mods like ($$$10^9+9$$$ or $$$10^9+7$$$. Now for the numbers to multiply by, what is usually recommended is an integer close to the alphabet size ($$$31$$$). But if there are hacks in the contest, the code can be hacked. One way to avoid that is using a random prime up to $$$n$$$, like I got hacked on the last contest because of using $$$31$$$ and $$$29$$$ but when I submitted with random primes it got accepted. You can read more on cp algos or usaco guide

    • »
      »
      »
      12 months ago, # ^ |
      Rev. 2   Vote: I like it 0 Vote: I do not like it

      Got headache after reading multiple things on this and I don't know why but it seems like keeping a greater base and MOD might be better than keeping it close to the limits. Like instead of 31 we can keep a greater base and instead of 1e9+9, 2^61-1 should be better. Don't know if this is a guaranteed pair but at least this might lower the chances of hacks.

      • »
        »
        »
        »
        12 months ago, # ^ |
          Vote: I like it 0 Vote: I do not like it

        Maybe a greater mod is better, but still the base shouldn't be constant. Just use a random one. I think any prime works though. Even if you use a big base, if it's constant then it can be hacked.

»
12 months ago, # |
  Vote: I like it +1 Vote: I do not like it

I haven't analyzed my hashing procedure formally, but I've never had any issues using $$$10^9 + 7$$$ and $$$10^9 + 9$$$ as moduli with a random base (not necessarily prime). To improve memory usage in your last submission, you can represent hashes with two moduli as a single integer $$$H_1 \cdot MOD_2 + H_2$$$ so that your map can work with integers rather than pairs. I'm not sure what the correct way to optimize time efficiency is in Python, but in C++ taking numbers modulo a constant prime is fairly efficient, so even with two hashes TLE won't be an issue so long as you declare MOD and MOD2 as const. See here for my (somewhat janky) template.