codeDominus's blog

By codeDominus, history, 7 years ago, In English

I am trying to solve this problem using rabin-karp maching algorithm.

My hash function is

ll hash = 0;
ll P = 3; // only 3 characters are considered here 'a' ,'b', 'c'; 
ll MOD = 1e9+7;
string s; // hash of this string is calculated 
for(int i=0; i<s.size();i++)
{
    hash = (hash + (pow(P,i)*(s[i]-'a'+1))%MOD)%MOD;
}

Is there any better way to calculate the hash function to avoid conflict?

I tried to change (s[i] — 'a' + 1) by s[i]. Still getting conflict.

Thanks in Advance.

  • Vote: I like it
  • +5
  • Vote: I do not like it

»
7 years ago, # |
  Vote: I like it +4 Vote: I do not like it

You most likely encountered the Birthday Paradox.

To fix this, either maintain two hashes with different P and MOD, or calculate one hash with a large enough MOD (watch out for overflow!).

  • »
    »
    7 years ago, # ^ |
      Vote: I like it +5 Vote: I do not like it

    Thanks for quick reply.

    I changed the MOD value from 1e9+7 to 1e10+7 and using P = 29. My solution is accepted. 36981980

    But, Maintaining two hashes will be a better.

»
7 years ago, # |
  Vote: I like it 0 Vote: I do not like it

You can use values 54059 and 257 as two seeds for two hashes and 1e9+7 as MOD in both hash functions. This gets you through most of the cases.

»
5 months ago, # |
  Vote: I like it 0 Vote: I do not like it
#include <bits/stdc++.h>
#define ll long long int


using namespace std;

ll MOD1 = 1e9+7;
ll MOD2 = 1e9+7;
ll prime1 = 54059;
ll prime2 = 257;


void solve() {
    vector<ll> p1(1e6, 1), p2(1e6, 1);

    for(int i=1; i<=7e5; i++) {
        p1[i] = (p1[i-1]*prime1) % MOD1;
        p2[i] = (p2[i-1]*prime2) % MOD2;
    }

    int n, m;
    cin >> n >> m;

    map<ll, int> mpp1;
    map<ll, int> mpp2;


    while(n--) {
        string s;
        cin >> s;

        int l = s.length();

        ll hash1 = 0;
        ll hash2 = 0;
        for(int i=0; i<l; i++) {
            int idx = s[i]-'a'+1;

            hash1 = (hash1 + (idx * p1[i])%MOD1) % MOD1;
            hash2 = (hash2 + (idx * p2[i])%MOD2) % MOD2;
        }

        mpp1[hash1] = l;
        mpp2[hash2] = l;
    }

    while(m--) {
        string s;
        cin >> s;

        int l = s.length();

        ll hash1 = 0;
        ll hash2 = 0;
        for(int i=0; i<l; i++) {
            int idx = s[i]-'a'+1;

            hash1 = (hash1 + idx * p1[i]) % MOD1;
            hash2 = (hash2 + idx * p2[i]) % MOD2;
        }


        bool f = false;
        for(int i=0; i<l; i++) {
            int idx = s[i]-'a'+1;
            for(int j=1; j<=3; j++) {
                ll _hash1 = (hash1 - idx*p1[i] + j*p1[i] + MOD1) % MOD1;
                ll _hash2 = (hash2 - idx*p2[i] + j*p2[i] + MOD2) % MOD2;

                if(mpp1[_hash1] == l and mpp2[_hash2] == l) {
                    f |= true;
                }
            }
        }

        if(f) {
            cout << "YES" << endl;
        } else {
            cout << "NO" <<endl;
        }
    }
}

int main() {
    solve();
    return 0;
}

i tried it with 2 hashes. i think i am still getting collision. can someone help me why?