OverKiller's blog

By OverKiller, history, 4 years ago, In English

Recently, i was learning kmp(from cp-algorithms) and in it's application last part was Building an automaton according to the prefix function link to article.
I am having hard time understanding what aut[i][c] stores here and how can we use it to find longest suffix which is also a prefix of text by adding character c at position i. I tried printing the values stored in aut[i][c] and the corresponding substring but couldn't understand what do they store?
Any help or example to demonstrate this would be much appreciated.
Thanks in advance.

  • Vote: I like it
  • +1
  • Vote: I do not like it

»
4 years ago, # |
  Vote: I like it +3 Vote: I do not like it

The automaton it's building has N+1 states 0, 1, ..., N, with the initial state 0 and the accepting state N. The state u represents "the length of largest prefix of S (the string we're building automaton with) which is also a suffix of the string being fed into the automaton, is u". For this specific automaton, the set of alphabet is {a, b, ..., z}. By the definition of automaton, each state u needs an outgoing edge for each alphabet. The endpoint of the edge for alphabet c from state u is aut[u][c] for u < n, and aut[n][c] = aut[n - 1][c].

Let's say you feed a string T = t_0 t_1 ... t_(m-1) into the automaton built with string S. Initially, you're at the state u_0 = 0. At the i-th step (I'm counting step in 0-index), you simply follow the edge marked by character t_i from u_i. So you move from the state u_i to state u_(i+1) := aut[u_i][c]. At i-th step, u_i represents "the length of largest prefix of S which is also a suffix of the prefix of T of length i, is u_i".

  • »
    »
    4 years ago, # ^ |
    Rev. 2   Vote: I like it +1 Vote: I do not like it

    I still have some doubt, let's say our string is "abcab" and we are constructing automaton for this string. It's prefix function looks like {0, 0, 0, 1, 2}. Let's consider 0-base indexing and start constructing automaton. let's say we have constructed automaton from index 0 to 3 and now we want to add character b to index 4. In this case the string from [0,3] is abca and adding character b makes the string abcab and it's longest prefix that is also suffix has length 2, but the aut[4]['b'(equivalent to 1)] stores 5. I don't understand why is it so?

    code
    • »
      »
      »
      4 years ago, # ^ |
      Rev. 2   Vote: I like it +2 Vote: I do not like it

      it's longest prefix that is also suffix has length 2

      You see, the whole string is the longest prefix that is also a suffix in this case. So its length is 5. Note that prefix function's definition is the longest PROPER prefix that is also a suffix, so they're a bit different.