Блог пользователя zuizui_123

Автор zuizui_123, 12 лет назад, По-английски

A few days ago, I found an article about algorithm of finding Longest Common Subsequence (LCS). Here is its link: https://docs.google.com/viewer?a=v&q=cache:3xhhf3n6TEMJ:www.sms.edu.pk/journals/jprm/jprmvol4/jprm9_4.pdf+&hl=vi&gl=vn&pid=bl&srcid=ADGEESiYAIGwMFziedBggqJPQN8ipIweV-KZUqCOnGA2ZnweAV3wNM11uQNC7HF4tYyTFvUhebP2BszIKI5m-ZYnF4O7t6MBtR0QV8ZJlzOI3T1Ex_mmnd2fiyhPaf0-lxsF0W-1wUu8&sig=AHIEtbR2Uaubbu_0sd9HzfW0NsQNFFYmhg

The article mentions an algorithm of O(n) for finding LCS of two strings X and Y (m is length of X, n is length of Y) with preprocessing of O(m). This algorithm is a greedy approach to solve LCS. I think it is very interesting and decide to learn it fully. But I stuck in implementation of this algorithm. I did try to write code for it but I can't do it correctly.

I hope somebody can help me for this algorithm's implementation. Thank you very much!

Thanks for reading!

  • Проголосовать: нравится
  • +4
  • Проголосовать: не нравится

»
12 лет назад, # |
  Проголосовать: нравится +5 Проголосовать: не нравится

This greedy approach doesn't seem always producing the best answer. On pair of strings ('bcaaaa', 'aaaabc') it will find 'bc' as longest common subsequence, not 'aaaa'. Are you still sure you want this algorithm? Maybe you should use classic dynamic programming approach?

  • »
    »
    12 лет назад, # ^ |
    Rev. 4   Проголосовать: нравится +3 Проголосовать: не нравится

    Maybe you didn't think about that algorithm carefully. I still believe the algorithm is true. Let me explain your case:

    Let X="bcaaaa" and Y="aaaabc".

    First, if you preprocess Y, then scan X to find the answer, you will get the result "bc". Second, if you preprocess X, then scan Y to find the answer, the result is "aaaa". We compare 2 results and the best result is "aaaa", as we expected.

    • »
      »
      »
      12 лет назад, # ^ |
        Проголосовать: нравится +4 Проголосовать: не нравится

      And what if we take X="bcaaaade" and Y="deaaaabc"?

      Correct LCS is "aaaa", not "bc" or "de".

      • »
        »
        »
        »
        12 лет назад, # ^ |
        Rev. 2   Проголосовать: нравится +11 Проголосовать: не нравится

        Oh, I see. That greedy approach is just a heristic and it's not true for all cases. Thanks for your comment. BTW, do you know better way to solve LCS (complexity is smaller than O(nm))?

        • »
          »
          »
          »
          »
          12 лет назад, # ^ |
            Проголосовать: нравится -21 Проголосовать: не нравится

          You can find LCS with Suffix Tree in O(N+M) !!!

          • »
            »
            »
            »
            »
            »
            12 лет назад, # ^ |
              Проголосовать: нравится +5 Проголосовать: не нравится

            Largest common substring, not subsequence.

            • »
              »
              »
              »
              »
              »
              »
              12 лет назад, # ^ |
                Проголосовать: нравится +3 Проголосовать: не нравится

              Substring means continuous subsequence, right? But here, I want to solve Longest Common Subsequence (not neccessary continuous).

              • »
                »
                »
                »
                »
                »
                »
                »
                12 лет назад, # ^ |
                  Проголосовать: нравится +5 Проголосовать: не нравится

                I understand.Reza_H said about suffix trees, they work only with continuous substrings.I was answering him.

        • »
          »
          »
          »
          »
          12 лет назад, # ^ |
            Проголосовать: нравится +11 Проголосовать: не нравится

          There is no better than O(nm) in general case, but you can optimize constant factor using for example bitset. This will give you something like O(nm/c) complexity. You can try that on LCS0.