[Help] A String Problem Solved By DP + Aho?

You are given a string $$$S$$$ ($$$|S| \le 100$$$) and $$$n$$$ ($$$n \le 1000$$$) strings $$$T_1, T_2, \dots, T_n$$$ ($$$|T_i| \le 30$$$). Each string $$$T_i$$$ is worth $$$p_i$$$ points ($$$1 \le p_i \le 10000$$$).

You may perform the following operation any number of times (until no substring can be chosen): choose a substring of $$$S$$$ that is equal to one of the $$$T_i$$$ and delete it; you get $$$p_i$$$ points, and the remaining characters of $$$S$$$ are concatenated.

What is the maximum number of points you can get?
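For experimenting with the problem, a tiny exhaustive reference can be handy for checking faster ideas. The sketch below is not from the thread, and `bruteForce` / `bruteSolve` are hypothetical names: it tries every deletion from every reachable string, memoized by the current string, so it is only usable for very small inputs, and it assumes every $$$T_i$$$ is non-empty.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Exhaustive reference (hypothetical helper): best score reachable from s,
// memoized by the current string. Exponential in general; tiny inputs only.
long long bruteSolve(const std::string& s, const std::vector<std::string>& T,
                     const std::vector<int>& p,
                     std::unordered_map<std::string, long long>& memo) {
    auto found = memo.find(s);
    if (found != memo.end()) return found->second;
    long long best = 0;  // stopping now adds nothing
    for (size_t i = 0; i < T.size(); ++i) {
        // try every (possibly overlapping) occurrence of T[i] in s
        for (size_t pos = s.find(T[i]); pos != std::string::npos; pos = s.find(T[i], pos + 1)) {
            std::string rest = s.substr(0, pos) + s.substr(pos + T[i].size());
            best = std::max(best, p[i] + bruteSolve(rest, T, p, memo));
        }
    }
    memo[s] = best;
    return best;
}

long long bruteForce(const std::string& S, const std::vector<std::string>& T,
                     const std::vector<int>& p) {
    assert(T.size() == p.size());
    for (const std::string& t : T) assert(!t.empty());  // an empty T_i would recurse forever
    std::unordered_map<std::string, long long> memo;
    return bruteSolve(S, T, p, memo);
}
```

For example, `bruteForce("abab", {"ab", "ba"}, {1, 10})` deletes the middle `ba` first and then the `ab` that forms, for 11 points; this is exactly the kind of nested deletion a single left-to-right automaton pass cannot follow.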

I was thinking about a solution using DP and Aho-Corasick.

Specifically, let $$$dp(i, j)$$$ be the maximum number of points you can get using the first $$$i$$$ characters of $$$S$$$ such that, after doing some operations, you are standing at node $$$j$$$ of the trie.

But I ran into a problem. When we move from $$$dp(i, j)$$$ to $$$dp(i + 1, k)$$$ and jump from node $$$j$$$ to node $$$k$$$ via a suffix link, we lose information about the part of the prefix that hasn't been used in any operation yet.

Could you guys suggest any alternatives?

Thanks in advance.

Pseudocode for Errichto's interval DP, explained in his comment below (`a -> b` means `b = max(b, a)`):

```
for L = len(S)-1 .. 0:
    for R = L .. len(S)-1:  // you can get rid of this loop because dp[][] just depends on L
        B[L][R] = -INF
        for k = 0 .. n-1:   // consider T[k] to be the last scored string in substring S[L..R]
            dp[len(S)+1][len(Tk)+1] = -INF
            dp[L][0] = 0
            for is = L .. R:
                for itk = 0 .. len(Tk)-1:
                    if S[is] == Tk[itk]:
                        dp[is][itk] -> dp[is+1][itk+1]            // eat the next character of Tk
                    for r = is .. R:
                        dp[is][itk] + B[is][r] -> dp[r+1][itk]    // jump over S[is..r], erased earlier
            B[L][R] = max(B[L][R], score[k] + dp[R+1][len(Tk)])
```

Errichto

3 years ago

I think that we need an interval dp solution. Here's mine in $$$O(S^3 \cdot N \cdot T)$$$, which is too slow but should be a good start.

Let $$$B[L][R]$$$ be the best score we can get by erasing the whole substring $$$S[L..R]$$$. To compute this value, let's iterate over the last move, i.e. which string $$$T_k$$$ we use last. The substring $$$S[L..R]$$$ must contain $$$T_k$$$ as a subsequence, split by some shorter intervals $$$[l, r]$$$ which were erased earlier. We need $$$dp[is][itk]$$$, where $$$is$$$ is a position in $$$S$$$ and $$$itk$$$ is a position in $$$T_k$$$. Transitions either eat the next character of $$$T_k$$$ (if it matches) or jump over a shorter interval $$$[l, r]$$$ using $$$B[l][r]$$$. The pseudocode is given above, right after the question.
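A C++ sketch of the interval DP described above follows. It is an illustrative translation, not Errichto's code: `maxPoints` is a hypothetical name, and the final prefix DP over `ans` is an addition, because $$$B$$$ only scores intervals that are erased completely while the answer may leave some characters in place. The `R` loop is kept for clarity, so this runs in roughly $$$O(|S|^4 \cdot \sum |T_k|)$$$.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <string>
#include <vector>

// Sketch of the interval DP (hypothetical translation, not Errichto's code).
// B[L][R]: best score for erasing ALL of S[L..R], or NEG if that is impossible.
long long maxPoints(const std::string& S, const std::vector<std::string>& T,
                    const std::vector<int>& p) {
    assert(T.size() == p.size());
    const long long NEG = LLONG_MIN / 4;
    const int m = static_cast<int>(S.size());
    std::vector<std::vector<long long>> B(m + 1, std::vector<long long>(m + 1, NEG));
    for (int L = m - 1; L >= 0; --L) {
        for (int R = L; R < m; ++R) {
            for (size_t k = 0; k < T.size(); ++k) {  // T[k] is the last string erased
                const std::string& t = T[k];
                const int len = static_cast<int>(t.size());
                // dp[i][j]: S[L..i-1] consumed with t[0..j-1] matched and the rest erased
                std::vector<std::vector<long long>> dp(m + 1, std::vector<long long>(len + 1, NEG));
                dp[L][0] = 0;
                for (int i = L; i <= R; ++i) {
                    for (int j = 0; j < len; ++j) {
                        if (dp[i][j] == NEG) continue;
                        if (S[i] == t[j])  // eat the next character of t
                            dp[i + 1][j + 1] = std::max(dp[i + 1][j + 1], dp[i][j]);
                        for (int r = i; r <= R; ++r)  // jump over S[i..r], erased earlier
                            if (B[i][r] != NEG)
                                dp[r + 1][j] = std::max(dp[r + 1][j], dp[i][j] + B[i][r]);
                    }
                }
                if (dp[R + 1][len] != NEG)
                    B[L][R] = std::max(B[L][R], p[k] + dp[R + 1][len]);
            }
        }
    }
    // ans[i]: best score on the prefix S[0..i-1]; each character is kept or lies in an erased block.
    std::vector<long long> ans(m + 1, 0);
    for (int i = 1; i <= m; ++i) {
        ans[i] = ans[i - 1];
        for (int j = 0; j < i; ++j)
            if (B[j][i - 1] != NEG) ans[i] = std::max(ans[i], ans[j] + B[j][i - 1]);
    }
    return ans[m];
}
```

For example, with $$$S$$$ = `aabb` and a single $$$T_1$$$ = `ab` worth 5 points, $$$B[1][2] = 5$$$ and $$$B[0][3] = 10$$$ (eat `a`, jump over the inner `ab`, eat `b`), so the answer is 10.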

kpw29

3 years ago

I recall seeing the exact same problem in some ICPC contest, and this solution was fast enough in practice. One could speed it up with tries, I think.

Edit: https://codeforces.net/gym/101635 problem D.

My code doesn't involve tries, but for the higher constraints you may need to use them.
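One way to read the trie suggestion, sketched below as an assumption rather than kpw29's actual code: put all $$$T_k$$$ into a trie and run the DP over (position in $$$S$$$, trie node), so strings sharing a prefix share work. To avoid reading $$$B[L][\cdot]$$$ before it is computed, this version first computes $$$C[L][R]$$$, the best score for erasing $$$S[L..R]$$$ when the last move's string starts exactly at $$$S[L]$$$, and then tiles: $$$B[L][R] = \max\left(C[L][R],\ \max_{L \le x < R} \left(C[L][x] + B[x+1][R]\right)\right)$$$. The cost is about $$$O(|S|^3 \cdot \text{nodes})$$$ with $$$\text{nodes} \le \sum |T_k| + 1$$$. `maxPointsTrie` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <map>
#include <string>
#include <vector>

// Hypothetical trie-based variant of the interval DP (not code from the thread).
long long maxPointsTrie(const std::string& S, const std::vector<std::string>& T,
                        const std::vector<int>& p) {
    assert(T.size() == p.size());
    const long long NEG = LLONG_MIN / 4;
    // Trie over all T_k; best[v] = max p_k over the strings ending exactly at node v.
    std::vector<std::map<char, int>> child(1);
    std::vector<long long> best(1, NEG);
    for (size_t k = 0; k < T.size(); ++k) {
        int v = 0;
        for (char ch : T[k]) {
            auto it = child[v].find(ch);
            if (it != child[v].end()) { v = it->second; continue; }
            int id = static_cast<int>(child.size());
            child[v][ch] = id;
            child.emplace_back();
            best.push_back(NEG);
            v = id;
        }
        best[v] = std::max(best[v], static_cast<long long>(p[k]));
    }
    const int m = static_cast<int>(S.size());
    const int nodes = static_cast<int>(child.size());
    // C[L][R]: best score for erasing all of S[L..R] when the last move deletes a
    // string whose first character is S[L]. B[L][R]: best for erasing all of S[L..R].
    std::vector<std::vector<long long>> C(m, std::vector<long long>(m, NEG));
    std::vector<std::vector<long long>> B = C;
    for (int L = m - 1; L >= 0; --L) {
        // g[i][v]: S[L..i-1] consumed, path root->v matched, S[L] used as its first character
        std::vector<std::vector<long long>> g(m + 1, std::vector<long long>(nodes, NEG));
        g[L][0] = 0;
        for (int i = L; i < m; ++i) {
            for (int v = 0; v < nodes; ++v) {
                if (g[i][v] == NEG) continue;
                auto it = child[v].find(S[i]);
                if (it != child[v].end())  // S[i] is the next character of the string
                    g[i + 1][it->second] = std::max(g[i + 1][it->second], g[i][v]);
                if (v == 0) continue;  // no skipping before the first character
                for (int r = i; r < m; ++r)  // S[i..r] was fully erased earlier (i > L here)
                    if (B[i][r] != NEG)
                        g[r + 1][v] = std::max(g[r + 1][v], g[i][v] + B[i][r]);
            }
        }
        for (int R = L; R < m; ++R)
            for (int v = 1; v < nodes; ++v)
                if (best[v] != NEG && g[R + 1][v] != NEG)
                    C[L][R] = std::max(C[L][R], g[R + 1][v] + best[v]);
        // Tile S[L..R]: first block S[L..x] (last move starts at L), then the rest.
        for (int R = L; R < m; ++R) {
            B[L][R] = C[L][R];
            for (int x = L; x < R; ++x)
                if (C[L][x] != NEG && B[x + 1][R] != NEG)
                    B[L][R] = std::max(B[L][R], C[L][x] + B[x + 1][R]);
        }
    }
    // ans[i]: best score on the prefix S[0..i-1]; each character is kept or erased in a block.
    std::vector<long long> ans(m + 1, 0);
    for (int i = 1; i <= m; ++i) {
        ans[i] = ans[i - 1];
        for (int j = 0; j < i; ++j)
            if (B[j][i - 1] != NEG) ans[i] = std::max(ans[i], ans[j] + B[j][i - 1]);
    }
    return ans[m];
}
```

Using `std::map` for trie children avoids assuming a particular alphabet; with a known small alphabet, fixed-size child arrays would be faster.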

Libraion

Thanks <3

Igor_Parfenov

It seems I've found a better algorithm, but I'm not very good with Aho-Corasick.

Let's preprocess, for every position $$$pos$$$ in $$$S$$$, the set of lengths of substrings that end at $$$pos$$$ and are equal to some $$$T_i$$$. To do this, take the tree formed by the trie's suffix links (with the links reversed), restricted to the terminal vertices; for every vertex we want to be able to check whether it has an ancestor with label $$$x$$$, where the label of a vertex is its depth in the original trie. Notice that the labels decrease as we move toward the root, so we can apply binary lifting. This takes $$$O(n \cdot T \cdot \log(n \cdot T))$$$. Then, using the binary lifting, we can find the sets for $$$S$$$, which takes $$$O(S \cdot \log(n \cdot T))$$$.

Now we can check in $$$O(set)$$$ whether a substring of $$$S$$$ is one of the $$$T_i$$$, and do a simple $$$DP$$$ on substrings in $$$O(S^4 \cdot set)$$$.

If I didn't make a mistake in the algorithm, the total complexity is $$$O(n \cdot T \cdot \log(n \cdot T) + S^4 \cdot set)$$$.

Could you elaborate a little more, perhaps with a concrete example? Thanks.

Oh, it seems I was a little tired yesterday: all I actually did was check, for every substring of $$$S$$$, whether it equals some $$$T_i$$$, and that is easy to do with hashes.

Anyway, the algorithm is probably correct; I hope my explanation is understandable.

Recall what the Aho-Corasick trie is: a trie plus a suffix link from every vertex to a vertex of smaller depth. Obviously, the suffix links form a tree; we assume here that the root doesn't have a suffix link. What do we do after building it and before scanning our text? For every vertex we want to know whether it contains some $$$T_i$$$ as a suffix. These are not only the vertices at the end of a root path spelling some $$$T_i$$$, but also every vertex from which following suffix links reaches such an end vertex. So we propagate from every end vertex along the reversed suffix links, and call all of these vertices terminal.

Back to our algorithm. Now look at the tree formed by the suffix links of the trie, and at the subset of terminal vertices: every descendant of a terminal vertex is terminal. Also, write on every vertex its depth in the trie. As we move toward the root, the depth strictly decreases (though not necessarily by $$$1$$$), and we first pass only terminal vertices, then only non-terminal ones. The ancestors of a vertex are exactly those of its suffixes that are trie vertices; in particular, every $$$T_i$$$ that is a suffix of it is an ancestor. So all we need is to check, for a vertex $$$v$$$ and a number $$$d$$$, whether $$$v$$$ has an ancestor (including $$$v$$$ itself) with label $$$d$$$ and, if so, whether that ancestor is the end of some $$$T_i$$$. Being terminal is not enough here, since a terminal vertex only has some $$$T_i$$$ as a suffix.
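The ancestor query above can be answered with a standard binary-lifting climb; below is a hedged C++ sketch. `LabelLifting` and `ancestorWithLabel` are hypothetical names; `parent` would be the suffix links (with `-1` for the root) and `label` the trie depths. Because labels strictly decrease toward the root, the ancestors with label $$$\ge d$$$ form a contiguous part of the path starting at $$$v$$$, so we jump as far as possible while staying $$$\ge d$$$ and then compare with $$$d$$$. Building the table takes $$$O(V \log V)$$$ and each query $$$O(\log V)$$$ for $$$V$$$ vertices; the caller still has to check that the returned vertex ends some $$$T_i$$$.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: in a rooted tree whose labels strictly decrease toward the
// root, find the ancestor of v (v itself included) with label exactly d, or -1.
struct LabelLifting {
    int LOG = 1;
    std::vector<std::vector<int>> up;  // up[j][v]: 2^j-th ancestor of v, or -1
    std::vector<int> label;

    LabelLifting(const std::vector<int>& parent, const std::vector<int>& lab) : label(lab) {
        assert(parent.size() == lab.size());
        const int n = static_cast<int>(parent.size());
        while ((1 << LOG) < n) ++LOG;
        up.assign(LOG, std::vector<int>(n, -1));
        up[0] = parent;  // parent[root] == -1
        for (int j = 1; j < LOG; ++j)
            for (int v = 0; v < n; ++v)
                up[j][v] = up[j - 1][v] == -1 ? -1 : up[j - 1][up[j - 1][v]];
    }

    int ancestorWithLabel(int v, int d) const {
        if (label[v] < d) return -1;
        // Ancestors with label >= d form a prefix of the path to the root,
        // so jump as far as possible while staying >= d.
        for (int j = LOG - 1; j >= 0; --j) {
            int u = up[j][v];
            if (u != -1 && label[u] >= d) v = u;
        }
        return label[v] == d ? v : -1;
    }
};
```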

Now, for every vertex, we can check whether some $$$T_i$$$ of length $$$d$$$ is a suffix of it. Next we implement the DP. First, precompute for every position in $$$S$$$ the corresponding trie vertex. To check whether a substring of $$$S$$$ is some $$$T_i$$$, take the vertex corresponding to the substring's last index and ask whether it has an ancestor with label equal to the substring's length that is the end of some $$$T_i$$$.
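As an alternative to binary lifting (a swapped-in technique, not the approach described above), Aho-Corasick with dictionary links can list directly, for each end position in $$$S$$$, every $$$T_i$$$ that ends there. A dictionary link points from a vertex to the nearest vertex on its suffix-link chain that ends some $$$T_i$$$, so walking them visits only real matches, at most $$$\max |T_i|$$$ of them per position. The sketch below (`matchTable` is a hypothetical name) returns `table[e][len]`, the best $$$p_i$$$ with $$$T_i$$$ equal to the substring of length `len` ending at index `e`, or 0 if there is none; it relies on $$$p_i \ge 1$$$ to mark pattern ends. On its own it only recognizes contiguous substrings of the original $$$S$$$, so it does not account for concatenation after deletions.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical sketch: Aho-Corasick with dictionary links. table[e][len] is the best
// p_i with T_i == S[e-len+1..e], or 0 if no T_i matches there (relies on p_i >= 1).
std::vector<std::vector<int>> matchTable(const std::string& S,
                                         const std::vector<std::string>& T,
                                         const std::vector<int>& p) {
    assert(T.size() == p.size());
    std::vector<std::map<char, int>> go(1);
    std::vector<int> link(1, 0), dict(1, -1), depth(1, 0), best(1, 0);
    int maxLen = 0;
    for (size_t k = 0; k < T.size(); ++k) {
        maxLen = std::max(maxLen, static_cast<int>(T[k].size()));
        int v = 0;
        for (char ch : T[k]) {
            if (!go[v].count(ch)) {
                go[v][ch] = static_cast<int>(go.size());
                go.emplace_back();
                link.push_back(0);
                dict.push_back(-1);
                depth.push_back(depth[v] + 1);
                best.push_back(0);
            }
            v = go[v][ch];
        }
        best[v] = std::max(best[v], p[k]);  // best[v] > 0 iff v ends some T_i
    }
    // BFS in order of depth, so link[v] and dict[link[v]] are ready when v is popped.
    std::queue<int> q;
    for (const auto& [ch, u] : go[0]) q.push(u);  // depth-1 vertices link to the root
    while (!q.empty()) {
        int v = q.front();
        q.pop();
        int l = link[v];
        dict[v] = best[l] > 0 ? l : dict[l];  // nearest proper suffix that ends some T_i
        for (const auto& [ch, u] : go[v]) {
            int w = link[v];
            while (w != 0 && !go[w].count(ch)) w = link[w];
            auto f = go[w].find(ch);
            link[u] = (f != go[w].end()) ? f->second : 0;
            q.push(u);
        }
    }
    std::vector<std::vector<int>> table(S.size(), std::vector<int>(maxLen + 1, 0));
    int v = 0;
    for (size_t e = 0; e < S.size(); ++e) {
        while (v != 0 && !go[v].count(S[e])) v = link[v];
        auto f = go[v].find(S[e]);
        v = (f == go[v].end()) ? 0 : f->second;
        // v itself if it ends some T_i, then each dictionary link: only real matches.
        for (int u = best[v] > 0 ? v : dict[v]; u != -1; u = dict[u])
            table[e][depth[u]] = best[u];
    }
    return table;
}
```

With $$$S$$$ = `cab` and patterns `b`, `abz`, `cab`, the vertex for `ab` is terminal but ends no pattern, and the dictionary links skip it: at the last index only lengths 3 and 1 are reported.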

Moreover, I hadn't read the task correctly: how am I going to find the answer for the string after deleting a substring from the middle? Sorry for wasting your time; I should have thought first.

Libraion's blog