Recently, i was learning kmp(from cp-algorithms) and in it's application last part was Building an automaton according to the prefix function
link to article.
I am having hard time understanding what aut[i][c]
stores here and how can we use it to find longest suffix which is also a prefix of text by adding character c
at position i
. I tried printing the values stored in aut[i][c]
and the corresponding substring but couldn't understand what do they store?
Any help or example to demonstrate this would be much appreciated.
Thanks in advance.
The automaton it's building has
N+1
states0, 1, ..., N
, with the initial state0
and the accepting stateN
. The stateu
represents "the length of largest prefix ofS
(the string we're building automaton with) which is also a suffix of the string being fed into the automaton, isu
". For this specific automaton, the set of alphabet is{a, b, ..., z}
. By the definition of automaton, each stateu
needs an outgoing edge for each alphabet. The endpoint of the edge for alphabetc
from stateu
isaut[u][c]
foru < n
, andaut[n][c] = aut[n - 1][c]
.Let's say you feed a string
T = t_0 t_1 ... t_(m-1)
into the automaton built with stringS
. Initially, you're at the stateu_0 = 0
. At thei
-th step (I'm counting step in0
-index), you simply follow the edge marked by charactert_i
fromu_i
. So you move from the stateu_i
to stateu_(i+1) := aut[u_i][c]
. Ati
-th step,u_i
represents "the length of largest prefix ofS
which is also a suffix of the prefix of T of lengthi
, isu_i
".I still have some doubt, let's say our string is
"abcab"
and we are constructing automaton for this string. It's prefix function looks like{0, 0, 0, 1, 2}
. Let's consider 0-base indexing and start constructing automaton. let's say we have constructed automaton from index0
to3
and now we want to add characterb
to index 4. In this case the string from[0,3]
isabca
and adding characterb
makes the stringabcab
and it's longest prefix that is also suffix has length 2, but theaut[4]['b'(equivalent to 1)]
stores 5. I don't understand why is it so?it's longest prefix that is also suffix has length 2
You see, the whole string is the longest prefix that is also a suffix in this case. So its length is 5. Note that prefix function's definition is the longest PROPER prefix that is also a suffix, so they're a bit different.
I got it now. Thanks a ton man.