Safrout's blog

By Safrout, history, 9 years ago, In English

I am having a lot of trouble with understanding suffix automaton with all it's details.

I have solved about 5 problems that contained very basic applications for it and I am stuck at some points.

I can't really understand how suffix links works, and what the congruence classes are. I know that the best resource is e-maxx site but actually I don't understand Russian and the translators (Google, Yandex , ... etc) sucks. Proofs are translated grammatically wrong and I really can't understand it in depth. The other resources like Crochemore's books are nice but they are actually long and they go beyond what an ACMer needs to understand a data structure.

So, it would be very nice if some one can explain suffix links and congruence classes to me.

Also, there are lots of people do calculate something called the right array in their codes and their comments are either Russian or Chinese and I can't really understand what and why is those 3 loops there.

I hope that someone can really explain all that.

  • Vote: I like it
  • +25
  • Vote: I do not like it

»
9 years ago, # |
  Vote: I like it +1 Vote: I do not like it

Am I asking for something stupid so that people are ignoring to answer or am I asking for an extremely difficult stuff that no one can answer that?!

I really want to know if I do have problems.

  • »
    »
    9 years ago, # ^ |
    Rev. 2   Vote: I like it +14 Vote: I do not like it

    Well, I suppose that suffix structures are extremely not interesting for most of people here :)

    Well, ok. I can help you, but you need to come up with some certain questions for it.

    By the way, what is "right array"? Can you give me some russian comment where it is mentioned?

    • »
      »
      »
      9 years ago, # ^ |
      Rev. 9   Vote: I like it 0 Vote: I do not like it

      Well, what I need is the following :

      1- A good explanation for suffix links (some examples that clarifies what an end position class is). I know that the suffix link points to the node with the largest suffix of my node that is not included in my end position class. but why is that and what is that used for ?

      2- I saw these loops in lots of people solutions who use Suffix Automaton and I really couldn't understand what does they do (They look useful).

      for (int i = 0,p = S; i < n; ++i)
      	++r[p = ch[p][s[i] &mdash; 'a']];
      static int b[N],t[N];
      for (int i = 1; i <= cnt; ++i) ++b[l[i]];
      for (int i = 1; i <= n; ++i) b[i] += b[i - 1];
      for (int i = 1; i <= cnt; ++i) t[b[l[i]]--] = i;
      for (int i = cnt; i; --i) r[fa[t[i]]] += r[t[i]];
      

      3- An explanation for generating the longest common substring of a set of strings (if getting the solution is easy after understanding the last 2 points then don't answer this point).

      Thank you very much

      • »
        »
        »
        »
        9 years ago, # ^ |
          Vote: I like it +14 Vote: I do not like it

        Omg, I don't know for sure but second code looks like counting sort from suffix array. Can you refer to some comments or submissions with it, please?

      • »
        »
        »
        »
        9 years ago, # ^ |
          Vote: I like it +14 Vote: I do not like it

        Ok, I think, I understood what this code states for. This is very weird way to do this thing but anyway.

        ==1==

        for (int i = 0,p = S; i < n; ++i)
        	++r[p = ch[p][s[i] - 'a']];
        

        You want to count for every state of suffix automata how many times does it appear in string. It will be array r. You initialize it with 1 in every state which corresponds to some prefix of string.

        ==2==

        static int b[N],t[N];
        for (int i = 1; i <= cnt; ++i) ++b[l[i]];
        for (int i = 1; i <= n; ++i) b[i] += b[i - 1];
        for (int i = 1; i <= cnt; ++i) t[b[l[i]]--] = i;
        

        Actually in this part you obtain lexicographical order of states. It is equivalent to counting sort by lengths of states. At the end array t contains topological sort of states.

        ==3==

        for (int i = cnt; i; --i) r[fa[t[i]]] += r[t[i]];
        

        Finally you finish calculating your array r by pushing initial values through the suffix links of automaton in reversed topological sort order. This procedure is described on e-maxx, I suppose.

        ==4==

        Here is equivalent simplificated code for such thing.

        int occur[max_size]; // occur[state] = 1 if state is prefix of string
        vector<int> state_with_len[max_len];
        for(int state = 0; state < state_count; state++)
        	state_with_len[len[state]].push_back(state);
        // If you traverse states from bigger length to smallest you will have
        // Reversed topological order. The same which you use in dfs actually. 
        for(int length = max_len - 1; length > 0; length--) 
        	for(auto state: state_with_len[length])
        		occur[link[state]] += occur[state];
        

        P.S. I suppose this is Chinese code? Such crazy one :)

        • »
          »
          »
          »
          »
          9 years ago, # ^ |
            Vote: I like it 0 Vote: I do not like it

          Yes it is a Chinese code indeed. Thank you very much for your time.

          I am really happy to talk to you again. BTW you are first guy to point to me that Suffix Automaton actually exists. So, thank you for that too.

          • »
            »
            »
            »
            »
            »
            9 years ago, # ^ |
              Vote: I like it +8 Vote: I do not like it

            Yeah, I remember. You're welcome :)

      • »
        »
        »
        »
        9 years ago, # ^ |
          Vote: I like it +13 Vote: I do not like it

        I wrote about endpos concept here. Please write if more clarification is needed.

        • »
          »
          »
          »
          »
          9 years ago, # ^ |
          Rev. 2   Vote: I like it 0 Vote: I do not like it

          Now I understand the definition clearly. Thank you for that. I will spend some time understanding why is that useful. If you have any nice references on understanding why is it useful that would nice. Thank you again.

          • »
            »
            »
            »
            »
            »
            9 years ago, # ^ |
              Vote: I like it 0 Vote: I do not like it

            Well, I know how to use them, but it is hard to me to clearly explain things it is based on. It is kind of intuitive for me. And I still have references except e-maxx. Maybe I'll write entry about suffix automaton but very next time...

            The main thing why it is very very useful is the fact that suffix links of automaton of string s is simply suffix tree of reversed s. This makes suffix automaton incredibly powerful structure.

            • »
              »
              »
              »
              »
              »
              »
              9 years ago, # ^ |
              Rev. 2   Vote: I like it 0 Vote: I do not like it

              It somehow became more clear after pointing to the suffix tree thing.

              I will be waiting for your blog entry.

              Thank you very much.

            • »
              »
              »
              »
              »
              »
              »
              9 years ago, # ^ |
                Vote: I like it 0 Vote: I do not like it

              I will bother you with one more question. From where did you get that experience and intuition. What have you gone through to get it ?!

              Also please include the other references if you can.

              • »
                »
                »
                »
                »
                »
                »
                »
                9 years ago, # ^ |
                  Vote: I like it 0 Vote: I do not like it

                For the first time suffix automaton was explained to me by I_love_natalia. After that I solved a lot of problem, sometimes using some information from e-maxx and/or my friends... That's all, I suppose.

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  9 years ago, # ^ |
                    Vote: I like it 0 Vote: I do not like it

                  Okay, so can you give me the problems that made a difference to your understanding (if you remember them for sure) ?

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  9 years ago, # ^ |
                    Vote: I like it 0 Vote: I do not like it

                  This and this were useful. BUt actually I solved very big amount of problems...

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  9 years ago, # ^ |
                  Rev. 2   Vote: I like it +5 Vote: I do not like it

                  [user:adamant]Hi,I understood the suffix automata algorithm given on e-maxx however I am not able to understand how to find longest common substring of multiple string. (The translation of that part isn't very clear :( ). Could you please give me a brief idea on how suffix automata is used to solve this problem? (SPOJ-LCS2) Thanks in advance.