Hello,
Recently my friend encountered the following problem during a coding interview:
You are given a string $$$S$$$ and a set $$$T$$$ of $$$n$$$ strings. You need to choose a multiset from $$$T$$$ (you can choose the same string any number of times) and then concatenate the strings from the multiset in any order to obtain the string $$$S$$$. What is the maximum possible size of such a multiset?
There is a simple $$$O(|S|^2)$$$ solution using DP + trie. First add all strings from $$$T$$$ to a trie. Let $$$DP[i]$$$ be the answer for $$$S_{i...|S|-1}$$$. To calculate $$$DP[i]$$$, we traverse the trie according to the substring $$$S_{i...|S|-1}$$$, and try to update $$$DP[i]$$$ for each string we encounter in the trie.
Assuming the sum of the lengths of strings from the set $$$T$$$ is $$$X$$$, there is also a solution with complexity $$$O(|S| * sqrt(X))$$$:
Is there a faster solution to this problem? I've tried for a long time without any success.
Any help would be appreciated, thanks!
You can use Suffix Automaton to somewhat speedup the squared DP solution:
Build a suffix automaton for string $$$S$$$ in $$$O(|S|)$$$ complexity, and for each string $$$T_j$$$ iterate over it and follow the transitions of each letter starting from the root (while you can, or else the current string is not a substring of $$$S$$$) and store $$$|T_j|$$$ in its corresponding state, then we can follow suffix links from each prefix of $$$S$$$ and iterate over all lengths of $$$T_j$$$ on the way (which would be suffixes of the current prefix of $$$S$$$) and update $$$dp_i$$$, this way we'll iterate over all occurrences of strings $$$T$$$ in $$$S$$$. However, we need to calculate the nearest state (reachable through suffix links) which has at least one string from $$$T$$$ for each state, because we don't want to iterate over states which don't represent any string from $$$T$$$, we can do that running a simple $$$DFS$$$ from the root using suffix links.
Now assuming $$$|S| <= 10^5$$$, $$$\sum_{i=0}^{|T|-1} |T_i| <= 10^5$$$ sum of all occurrences of $$$T_i$$$ would be around $$$4 * 10^7$$$ which is slightly worse than $$$O(|S| * sqrt(\sum_{i=0}^{|T|-1} |T_i|))$$$, I just thought this solution was nice :)
Thanks, that's a really nice solution! Suffix Automaton is amazing :D
I don't think there is a faster solution. But there is a simpler one just use aho-corasick to find the occurrences and at each index get the array of lengths $$$L$$$ where $$$S[i-L[i] + 1 \dots i]$$$ exists in the set and just find the minimum $$$DP[i]$$$.
Thanks :)
Somebody should really post this somewhere. Somewhere where you can submit code for it.
Technically, RedNextCentury is a body, and Codeforces is somewhere :D
Not what I meant. Just edited.
very similar problem