Rajan_sust's blog

By Rajan_sust, history, 7 years ago, In English

The problem was set in acm icpc preliminary contest 2017 in Dhaka site. Problem Link : E.Anti Hash

Problem is : you will given a string S of length N consisting of lowercase letters (a-z) only.Also given a base B and mod-value M for doing polynomial hashing. Note : B and M are both prime.

Your task is to find another string T, satisfying all of the following constraints: Length of T is exactly N. T consists of only lowercase letters (a-z). T and S have the same hash value that means, collision happens. For hashing in both case you have to use B and M.

Any idea? Thanks in advance.

  • Vote: I like it
  • +26
  • Vote: I do not like it

| Write comment?
»
7 years ago, # |
  Vote: I like it +5 Vote: I do not like it

For the subtask with M ≤ 106, you can have atmost 106 different hashes. So, if you generate more than 106 random strings, there is high probability that you'll hit one with same hash as the given string. However, you can't generate string naively. You can start with a string and try changing letters randomly.

For M < 231 I don't know what to do.

»
7 years ago, # |
Rev. 2   Vote: I like it +11 Vote: I do not like it

Generate 2 sets of random strings of size ceil(N/2) and floor(N/2). Lets denote them as set A and set B. Let the size of each set is X. You can generate random strings by changing just a single random character of previous string as this will change the hash value randomly. Now generate their hashes. Now on, we only will deal with the hash values, forget the strings.

For each hash value of set B, we can say we need a desired hash from set A such that we can make the final hash value. Say hash value of an element of set B is H2. So we need another hash value from set A, H1 such that this equation satisfies ( H1 + (H2 << ceil(N/2)) ) % M = final_hash. Here shift operator denotes the polynomial shifting, hope it is understandable.

Now, for the first H2 of set B, we will not be able to find the desired H1 with a probability (1-X/M). Now, as our desired hash value is not found in set A for the first element, for the second element we can assume it's desired hash value will not collide with the first one's. So, now the sample space is reduced by one. So, this will also not match with a probability (1-X/(M-1)). For the third element the probability is (1-X/(M-2)) and so on. This is the birthday paradox. With a good enough size of set A and set B the probability of not finding desired pair becomes 0. Like when X = 1e4, chance of not finding is more than 95%, when X = 1e5, chance of not finding is less than 1%. But when you take X = 1e6, chance of not finding is way way close to 0.

I might miss some calculation here, but this is the idea.