Convolution and String Matching (small alphabet)

#	User	Rating
1	tourist	4009
2	jiangly	3823
3	Benq	3738
4	Radewoosh	3633
5	jqdai0815	3620
6	orzdevinwang	3529
7	ecnerwala	3446
8	Um_nik	3396
9	ksun48	3390
10	gamegame	3386

#	User	Contrib.
1	cry	166
2	maomao90	163
2	Um_nik	163
4	atcoder_official	161
5	adamant	159
6	-is-this-fft-	158
7	awoo	157
8	TheScrasse	154
9	nor	153
9	Dominater069	153

Hi Codeforces community! I wanted to write this blog entry to help coders who would like to see a nice example of convolutions used in a problem which, at first sight, has little to no relation with polynomial multiplication (of course it has, it's a convolution after all)

The problem is as follows: You are given two strings S and P, with lengths n and m respectively, where m ≤ n

For each substring T of S with length m, we have to find the one that maximizes the number of positions with the same character, i. e T[i] = P[i]

For the sake of simplicity, let us assume that our strings consists only of letters a and b. For example, if S = baabab and P = bba, let's calculate the "similarity" between P and each substring of S of length 3.

s(baa, bba) = 2

s(aab, bba) = a

s(aba, bba) = 2

s(bab, bba) = b

We can see that there are two substrings which maximize the similarity (any answer will be fine)

A naive approach has a O(n·m) time complexity and uses O(n + m) memory.

This should be OK if 1 ≤ n, m ≤ 1000. What about 1 ≤ n, m ≤ 10⁵? We have to find a faster way to apply $\text{[math]}$ for every substring T of S

First, let's try to change the formula above with a more mathematical approach. If we map each character {a,b}={0,1}, we can apply the next formula:

$\text{[math]}$ . This will give us the number of 1's (originally, b's) that match.

Let S^c be the complement of S (0 becomes 1, and viceversa). To compute the number of 0's that match, we apply

$\text{[math]}$

Now what? This doesn't improve the complexity at all.

Let's remember how the convolution f of two discrete functions g and h is defined:

$\text{[math]}$

Now we're talking! This looks pretty similar to the formulas previously explained.

Next we will try to change our formulas to a simple convolution. First we will say that our array S and P are two functions g and h whose domain is [0, n - 1] and range is {0,1}.

$\text{[math]}$

We have one issue to address, the length of P (m). We can easily fix this issue with adding some padding zerores to the right of P until m = n

But the formula is incorrect, this convolution, for each x applies the similarity formula, but reversing P! This is our second issue, which we can easily solve (again) by reversing P ourselves

Rev.	By	When	Δ	Comment
en6	snacache	2018-05-11 05:08:43	4
en5	snacache	2018-05-11 00:23:31	17	Tiny change: 'f_{a_i}(j) + f^{c}_{a_i}(j)}' -> 'f_{a_i}(j)}'
en4	snacache	2018-05-10 23:54:16	62
en3	snacache	2018-05-10 23:49:52	2787	Tiny change: '$ memory.<\b>\n\nThis' -> '$ memory.</b>\n\nThis' (published)
en2	snacache	2018-05-10 21:17:51	2109	Tiny change: 'y, where $m\leqn$' -> 'y, where $n\beqm$'
en1	snacache	2018-05-10 19:07:47	444	Initial revision (saved to drafts)

Rev.

Lang.

When

Comment

en6