Convolution and String Matching (small alphabet)

Revision en2, by snacache, 2018-05-10 21:17:51

Hi Codeforces community! I wanted to write this blog entry to help coders who would like to see a nice example of convolutions used in a problem which, at first sight, has little to no relation with polynomial multiplication (of course it has, it's a convolution after all)

The problem is as follows: You are given two strings S and P, with lengths n and m respectively, where m ≤ n

For each substring T of S with length m, we have to find the one that maximizes the number of positions with the same character, i. e T[i] = P[i]

For the sake of simplicity, let us assume that our strings consists only of letters a and b. For example, if S = baabab and P = bba, let's calculate the "similarity" between P and each substring of S of length 3.

s(baa, bba) = 2

s(aab, bba) = a

s(aba, bba) = 2

s(bab, bba) = b

We can see that there are two substrings which maximize the similarity (any answer will be fine)

A naive approach has a O(n·m) time complexity and uses O(n + m) memory.

This should be OK if 1 ≤ n, m ≤ 1000. What about 1 ≤ n, m ≤ 105? We have to find a faster way to apply for every substring T of S

First, let's try to change the formula above with a more mathematical approach. If we map each character {a,b}={0,1}, we can apply the next formula:

. This will give us the number of 1's (originally, b's) that match.

Let Sc be the complement of S (0 becomes 1, and viceversa). To compute the number of 0's that match, we apply

Now what? This doesn't improve the complexity at all.

Let's remember how the convolution f of two discrete functions g and h is defined:

Now we're talking! This looks pretty similar to the formulas previously explained.

Next we will try to change our formulas to a simple convolution. First we will say that our array S and P are two functions g and h whose domain is [0, n - 1] and range is {0,1}.

We have one issue to address, the length of P (m). We can easily fix this issue with adding some padding zerores to the right of P until m = n

But the formula is incorrect, this convolution, for each x applies the similarity formula, but reversing P! This is our second issue, which we can easily solve (again) by reversing P ourselves

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en6 English snacache 2018-05-11 05:08:43 4
en5 English snacache 2018-05-11 00:23:31 17 Tiny change: 'f_{a_i}(j) + f^{c}_{a_i}(j)}' -> 'f_{a_i}(j)}'
en4 English snacache 2018-05-10 23:54:16 62
en3 English snacache 2018-05-10 23:49:52 2787 Tiny change: '$ memory.<\b>\n\nThis' -> '$ memory.</b>\n\nThis' (published)
en2 English snacache 2018-05-10 21:17:51 2109 Tiny change: 'y, where $m\leqn$' -> 'y, where $n\beqm$'
en1 English snacache 2018-05-10 19:07:47 444 Initial revision (saved to drafts)