Hunt-Szymanski Algorithm Explained

Revision en1, by lethan3, 2021-06-09 00:42:29

Hello Codeforces,

Just wanted to leave this explanation here in case anyone needs one for the Hunt-Szymanski algorithm; I wasn't able to find a good one online and mostly figured it out using this.

The Hunt-Szymanski Algorithm is an algorithm that finds the LCS in O((N + K) log N) time complexity, where N is the maximum length of a string and K is the number of matches between the two strings. This is especially useful when the number of occurrences of each letter is bounded, like if each letter is guaranteed to appear at most twice or thrice in each string.

For example, if the two strings are "abbcd" and "cdea", the matches would be (0, 3), (3, 0), and (4, 1) where the indices are zero-based. Then K would be 3 since there are 3 pairs where the characters of each string are the same. Clearly, K = N^2 at worst.

The usual LCS algorithm figures out the LCS of every pair of prefixes of the two strings: you should know this algorithm before learning this one. This has O(N^2) runtime.

However, we can do better by noting that the only pairs of indices that really matter are the ones where the two strings match. For example if our strings are "acbdd" and "cabad", we can mark the matching indices as follows:

Now, our problem is reduced to finding a sequence of blue squares such that their x-coordinates are strictly increasing, while their y-coordinates are strictly decreasing. In the above diagram there are 4 such possible LCS's with length 3.

How do we find this maximum length?

First, we can "reverse" our second string such that every character maps to a vector of indices where that character appears in the second string. This allows us to quickly find matches by searching through the first string.

Then,

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en7 English lethan3 2021-06-09 06:31:20 207
en6 English lethan3 2021-06-09 03:19:31 100
en5 English lethan3 2021-06-09 03:10:12 8
en4 English lethan3 2021-06-09 03:09:16 4
en3 English lethan3 2021-06-09 03:08:34 91 (published)
en2 English lethan3 2021-06-09 01:24:02 2958 Tiny change: 'nd a good one online an' -> 'nd a good explanation online an'
en1 English lethan3 2021-06-09 00:42:29 1931 Initial revision (saved to drafts)