question example:
given a string "aaaabbbasd" and dictionary, for example {a, aa, aaa, b, bb, s, d}
1) find one possible way to split the string into words from dictionary
2) find all possible ways to split the string into words from dictionary
Searching for the best approach on web I found different ones: dp, backtracking + memorization, dfs + memorization.
I lost in all of this huge variety.
Does somebody know what approach is the best here?
You can run a DP + trie solution. Let the state be dp[i][u], representing the number of ways to split the string starting from position i and being at node u in the trie. Let s[i] be the current letter. The transition is either of (or both, if conditions hold):
This has a complexity of around O(|s|*|dict|) where |dict| is the size of the dictionary trie (about sum of lengths of all dictionary words).
Another related idea is preprocessing the "match positions" with aho corasick, and keeping a single state dp[i], number of ways to split the string starting from position i. Here the transition would be:
dp[i] = sum (over all j, words that match s[i,j)) of dp[j].
This runs in O(|dict| + |s|) for the preprocessing, plus O(|num_matches| + |s|) for the dp, where num_matches is the total amount of matches from words of the dictionary into s. While this can be as bad as the first dp in pathological cases like aaaaaaaa and such, it should be faster on average.
thank you
could you also tell something regarding the case when dictionary is limited to be hash table? can we achieve better than
O(n^3)
complexity in this case?I don't know what n is. You can do the preprocessing in the second solution I proposed by using a rolling hash, but you'll need several hashes as you'll run into problems because of the birthday paradox. It doesn't change the complexity.
Complexity >> O(length of string * maximum length of word in dictionary)