zuizui_123's blog

By zuizui_123, 12 years ago, In English

A few days ago, I found an article about algorithm of finding Longest Common Subsequence (LCS). Here is its link: https://docs.google.com/viewer?a=v&q=cache:3xhhf3n6TEMJ:www.sms.edu.pk/journals/jprm/jprmvol4/jprm9_4.pdf+&hl=vi&gl=vn&pid=bl&srcid=ADGEESiYAIGwMFziedBggqJPQN8ipIweV-KZUqCOnGA2ZnweAV3wNM11uQNC7HF4tYyTFvUhebP2BszIKI5m-ZYnF4O7t6MBtR0QV8ZJlzOI3T1Ex_mmnd2fiyhPaf0-lxsF0W-1wUu8&sig=AHIEtbR2Uaubbu_0sd9HzfW0NsQNFFYmhg

The article mentions an algorithm of O(n) for finding LCS of two strings X and Y (m is length of X, n is length of Y) with preprocessing of O(m). This algorithm is a greedy approach to solve LCS. I think it is very interesting and decide to learn it fully. But I stuck in implementation of this algorithm. I did try to write code for it but I can't do it correctly.

I hope somebody can help me for this algorithm's implementation. Thank you very much!

Thanks for reading!

  • Vote: I like it
  • +4
  • Vote: I do not like it

| Write comment?
»
12 years ago, # |
  Vote: I like it +5 Vote: I do not like it

This greedy approach doesn't seem always producing the best answer. On pair of strings ('bcaaaa', 'aaaabc') it will find 'bc' as longest common subsequence, not 'aaaa'. Are you still sure you want this algorithm? Maybe you should use classic dynamic programming approach?

  • »
    »
    12 years ago, # ^ |
    Rev. 4   Vote: I like it +3 Vote: I do not like it

    Maybe you didn't think about that algorithm carefully. I still believe the algorithm is true. Let me explain your case:

    Let X="bcaaaa" and Y="aaaabc".

    First, if you preprocess Y, then scan X to find the answer, you will get the result "bc". Second, if you preprocess X, then scan Y to find the answer, the result is "aaaa". We compare 2 results and the best result is "aaaa", as we expected.

    • »
      »
      »
      12 years ago, # ^ |
        Vote: I like it +4 Vote: I do not like it

      And what if we take X="bcaaaade" and Y="deaaaabc"?

      Correct LCS is "aaaa", not "bc" or "de".

      • »
        »
        »
        »
        12 years ago, # ^ |
        Rev. 2   Vote: I like it +11 Vote: I do not like it

        Oh, I see. That greedy approach is just a heristic and it's not true for all cases. Thanks for your comment. BTW, do you know better way to solve LCS (complexity is smaller than O(nm))?

        • »
          »
          »
          »
          »
          12 years ago, # ^ |
            Vote: I like it -21 Vote: I do not like it

          You can find LCS with Suffix Tree in O(N+M) !!!

          • »
            »
            »
            »
            »
            »
            12 years ago, # ^ |
              Vote: I like it +5 Vote: I do not like it

            Largest common substring, not subsequence.

            • »
              »
              »
              »
              »
              »
              »
              12 years ago, # ^ |
                Vote: I like it +3 Vote: I do not like it

              Substring means continuous subsequence, right? But here, I want to solve Longest Common Subsequence (not neccessary continuous).

              • »
                »
                »
                »
                »
                »
                »
                »
                12 years ago, # ^ |
                  Vote: I like it +5 Vote: I do not like it

                I understand.Reza_H said about suffix trees, they work only with continuous substrings.I was answering him.

        • »
          »
          »
          »
          »
          12 years ago, # ^ |
            Vote: I like it +11 Vote: I do not like it

          There is no better than O(nm) in general case, but you can optimize constant factor using for example bitset. This will give you something like O(nm/c) complexity. You can try that on LCS0.