A few days ago, I found an article about algorithm of finding Longest Common Subsequence (LCS). Here is its link: https://docs.google.com/viewer?a=v&q=cache:3xhhf3n6TEMJ:www.sms.edu.pk/journals/jprm/jprmvol4/jprm9_4.pdf+&hl=vi&gl=vn&pid=bl&srcid=ADGEESiYAIGwMFziedBggqJPQN8ipIweV-KZUqCOnGA2ZnweAV3wNM11uQNC7HF4tYyTFvUhebP2BszIKI5m-ZYnF4O7t6MBtR0QV8ZJlzOI3T1Ex_mmnd2fiyhPaf0-lxsF0W-1wUu8&sig=AHIEtbR2Uaubbu_0sd9HzfW0NsQNFFYmhg
The article mentions an algorithm of O(n) for finding LCS of two strings X and Y (m is length of X, n is length of Y) with preprocessing of O(m). This algorithm is a greedy approach to solve LCS. I think it is very interesting and decide to learn it fully. But I stuck in implementation of this algorithm. I did try to write code for it but I can't do it correctly.
I hope somebody can help me for this algorithm's implementation. Thank you very much!
Thanks for reading!
This greedy approach doesn't seem always producing the best answer. On pair of strings ('bcaaaa', 'aaaabc') it will find 'bc' as longest common subsequence, not 'aaaa'. Are you still sure you want this algorithm? Maybe you should use classic dynamic programming approach?
Maybe you didn't think about that algorithm carefully. I still believe the algorithm is true. Let me explain your case:
Let X="bcaaaa" and Y="aaaabc".
First, if you preprocess Y, then scan X to find the answer, you will get the result "bc". Second, if you preprocess X, then scan Y to find the answer, the result is "aaaa". We compare 2 results and the best result is "aaaa", as we expected.
And what if we take X="bcaaaade" and Y="deaaaabc"?
Correct LCS is "aaaa", not "bc" or "de".
Oh, I see. That greedy approach is just a heristic and it's not true for all cases. Thanks for your comment. BTW, do you know better way to solve LCS (complexity is smaller than O(nm))?
You can find LCS with Suffix Tree in O(N+M) !!!
Largest common substring, not subsequence.
Substring means continuous subsequence, right? But here, I want to solve Longest Common Subsequence (not neccessary continuous).
I understand.Reza_H said about suffix trees, they work only with continuous substrings.I was answering him.
There is no better than O(nm) in general case, but you can optimize constant factor using for example bitset. This will give you something like O(nm/c) complexity. You can try that on LCS0.