Блог пользователя Safrout

Автор Safrout, история, 9 лет назад, По-английски

I am having a lot of trouble with understanding suffix automaton with all it's details.

I have solved about 5 problems that contained very basic applications for it and I am stuck at some points.

I can't really understand how suffix links works, and what the congruence classes are. I know that the best resource is e-maxx site but actually I don't understand Russian and the translators (Google, Yandex , ... etc) sucks. Proofs are translated grammatically wrong and I really can't understand it in depth. The other resources like Crochemore's books are nice but they are actually long and they go beyond what an ACMer needs to understand a data structure.

So, it would be very nice if some one can explain suffix links and congruence classes to me.

Also, there are lots of people do calculate something called the right array in their codes and their comments are either Russian or Chinese and I can't really understand what and why is those 3 loops there.

I hope that someone can really explain all that.

  • Проголосовать: нравится
  • +25
  • Проголосовать: не нравится

»
9 лет назад, # |
  Проголосовать: нравится +1 Проголосовать: не нравится

Am I asking for something stupid so that people are ignoring to answer or am I asking for an extremely difficult stuff that no one can answer that?!

I really want to know if I do have problems.

  • »
    »
    9 лет назад, # ^ |
    Rev. 2   Проголосовать: нравится +14 Проголосовать: не нравится

    Well, I suppose that suffix structures are extremely not interesting for most of people here :)

    Well, ok. I can help you, but you need to come up with some certain questions for it.

    By the way, what is "right array"? Can you give me some russian comment where it is mentioned?

    • »
      »
      »
      9 лет назад, # ^ |
      Rev. 9   Проголосовать: нравится 0 Проголосовать: не нравится

      Well, what I need is the following :

      1- A good explanation for suffix links (some examples that clarifies what an end position class is). I know that the suffix link points to the node with the largest suffix of my node that is not included in my end position class. but why is that and what is that used for ?

      2- I saw these loops in lots of people solutions who use Suffix Automaton and I really couldn't understand what does they do (They look useful).

      for (int i = 0,p = S; i < n; ++i)
      	++r[p = ch[p][s[i] &mdash; 'a']];
      static int b[N],t[N];
      for (int i = 1; i <= cnt; ++i) ++b[l[i]];
      for (int i = 1; i <= n; ++i) b[i] += b[i - 1];
      for (int i = 1; i <= cnt; ++i) t[b[l[i]]--] = i;
      for (int i = cnt; i; --i) r[fa[t[i]]] += r[t[i]];
      

      3- An explanation for generating the longest common substring of a set of strings (if getting the solution is easy after understanding the last 2 points then don't answer this point).

      Thank you very much

      • »
        »
        »
        »
        9 лет назад, # ^ |
          Проголосовать: нравится +14 Проголосовать: не нравится

        Omg, I don't know for sure but second code looks like counting sort from suffix array. Can you refer to some comments or submissions with it, please?

      • »
        »
        »
        »
        9 лет назад, # ^ |
          Проголосовать: нравится +14 Проголосовать: не нравится

        Ok, I think, I understood what this code states for. This is very weird way to do this thing but anyway.

        ==1==

        for (int i = 0,p = S; i < n; ++i)
        	++r[p = ch[p][s[i] - 'a']];
        

        You want to count for every state of suffix automata how many times does it appear in string. It will be array r. You initialize it with 1 in every state which corresponds to some prefix of string.

        ==2==

        static int b[N],t[N];
        for (int i = 1; i <= cnt; ++i) ++b[l[i]];
        for (int i = 1; i <= n; ++i) b[i] += b[i - 1];
        for (int i = 1; i <= cnt; ++i) t[b[l[i]]--] = i;
        

        Actually in this part you obtain lexicographical order of states. It is equivalent to counting sort by lengths of states. At the end array t contains topological sort of states.

        ==3==

        for (int i = cnt; i; --i) r[fa[t[i]]] += r[t[i]];
        

        Finally you finish calculating your array r by pushing initial values through the suffix links of automaton in reversed topological sort order. This procedure is described on e-maxx, I suppose.

        ==4==

        Here is equivalent simplificated code for such thing.

        int occur[max_size]; // occur[state] = 1 if state is prefix of string
        vector<int> state_with_len[max_len];
        for(int state = 0; state < state_count; state++)
        	state_with_len[len[state]].push_back(state);
        // If you traverse states from bigger length to smallest you will have
        // Reversed topological order. The same which you use in dfs actually. 
        for(int length = max_len - 1; length > 0; length--) 
        	for(auto state: state_with_len[length])
        		occur[link[state]] += occur[state];
        

        P.S. I suppose this is Chinese code? Such crazy one :)

        • »
          »
          »
          »
          »
          9 лет назад, # ^ |
            Проголосовать: нравится 0 Проголосовать: не нравится

          Yes it is a Chinese code indeed. Thank you very much for your time.

          I am really happy to talk to you again. BTW you are first guy to point to me that Suffix Automaton actually exists. So, thank you for that too.

      • »
        »
        »
        »
        9 лет назад, # ^ |
          Проголосовать: нравится +13 Проголосовать: не нравится

        I wrote about endpos concept here. Please write if more clarification is needed.

        • »
          »
          »
          »
          »
          9 лет назад, # ^ |
          Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

          Now I understand the definition clearly. Thank you for that. I will spend some time understanding why is that useful. If you have any nice references on understanding why is it useful that would nice. Thank you again.

          • »
            »
            »
            »
            »
            »
            9 лет назад, # ^ |
              Проголосовать: нравится 0 Проголосовать: не нравится

            Well, I know how to use them, but it is hard to me to clearly explain things it is based on. It is kind of intuitive for me. And I still have references except e-maxx. Maybe I'll write entry about suffix automaton but very next time...

            The main thing why it is very very useful is the fact that suffix links of automaton of string s is simply suffix tree of reversed s. This makes suffix automaton incredibly powerful structure.

            • »
              »
              »
              »
              »
              »
              »
              9 лет назад, # ^ |
              Rev. 2   Проголосовать: нравится 0 Проголосовать: не нравится

              It somehow became more clear after pointing to the suffix tree thing.

              I will be waiting for your blog entry.

              Thank you very much.

            • »
              »
              »
              »
              »
              »
              »
              9 лет назад, # ^ |
                Проголосовать: нравится 0 Проголосовать: не нравится

              I will bother you with one more question. From where did you get that experience and intuition. What have you gone through to get it ?!

              Also please include the other references if you can.

              • »
                »
                »
                »
                »
                »
                »
                »
                9 лет назад, # ^ |
                  Проголосовать: нравится 0 Проголосовать: не нравится

                For the first time suffix automaton was explained to me by I_love_natalia. After that I solved a lot of problem, sometimes using some information from e-maxx and/or my friends... That's all, I suppose.

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  9 лет назад, # ^ |
                    Проголосовать: нравится 0 Проголосовать: не нравится

                  Okay, so can you give me the problems that made a difference to your understanding (if you remember them for sure) ?

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  9 лет назад, # ^ |
                    Проголосовать: нравится 0 Проголосовать: не нравится

                  This and this were useful. BUt actually I solved very big amount of problems...

                • »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  »
                  9 лет назад, # ^ |
                  Rev. 2   Проголосовать: нравится +5 Проголосовать: не нравится

                  [user:adamant]Hi,I understood the suffix automata algorithm given on e-maxx however I am not able to understand how to find longest common substring of multiple string. (The translation of that part isn't very clear :( ). Could you please give me a brief idea on how suffix automata is used to solve this problem? (SPOJ-LCS2) Thanks in advance.