I encountered this problem at a past local competition. Here is the brief statement:
Given two string $$$s$$$ of length $$$n$$$ and t of length $$$m$$$ ($$$n$$$, $$$m$$$ $$$\le$$$ $$$10^{5}$$$) find the longest, lexicographically smallest substring of $$$s$$$ that appears in string $$$t$$$ at least $$$k$$$ times. The input is given in a way that there is at least one substring satisfies the constraints.
I have tried to find answers on Google and the best I could find is this problem from HackerEarth . The suggested solutions were either hashing with binary search, or suffix tree. And I am not knowledgeable enough to transform the problem into the original problem. So far the best I could have done is to iterate every substring of $$$s$$$ in $$$O(n^2)$$$, then use string search algorithm (KMP/Z-function) to count number of occurrences, yielding total complexity of $$$O(n^2*(n+m))$$$. Please give hints to an approach for this problem.
I forgot. The first line of input contains 3 positive integers $$$n$$$, $$$m$$$, and $$$k$$$. The second line contains the string $$$s$$$ and the third line contains the string $$$t$$$. Here is an example test case:
Input:
Output:
Explanation:
TAC
andACA
are the two longest substring of $$$s$$$ that satisfy, andACA
is lexicographically smaller thanTAC
.this is my code about this problem:
I have found O((M+N)*LOG_M*LOG_N) solution using binary search on the length of substring + suffix array for sorting substrings of length X. We also need to use hashing for counting occurences.
I don't know if there is any better solution than this one.
That is good enough, please explain more.
We can use binary search on the answer technique. We fix the length of substring. Let's call the mid value X. We can use binary search because suppose that the answer is A(length of the substring). Then, an answer with length (A-1) must exist because you can pop the last character of the valid substring.
We use hashing to calculate the number of occurences (on string t with substring length X). This part is O(MlogM) (due to std::map). Finally , to find the lexicographically minimum substring we need to use suffix array to sort substring of length X. You can learn suffix array and it's applications from CP-ALGO. Also sorry for my poor english.
No problem. Thank you for your idea, I'll try to implement it.
Your welcome :)
Translation: find a string of maximal length that has at least one occurrence in $$$s$$$ and at least $$$k$$$ occurrences in $$$t$$$. One way to do that is:
A similar solution can be done with suffix arrays and two pointers technique:
Thank you so much. That is so clever.