Many of my friends asked this interesting OA problem to me:
Given an array of strings $$$A$$$ of size $$$N$$$ and two integers $$$B$$$ and $$$C$$$.
Let $$$D$$$ be the number of strings of length $$$C$$$ that contains exactly $$$B$$$ of the strings in $$$A$$$ as substring.
Return $$$D$$$ mod $$$10^9+9$$$.
Constraints
$$$1 \le N \le 6$$$
$$$1 \le |A[i]| \le 50$$$
All the $$$N$$$ strings are distinct
$$$0 \le B \le N$$$
$$$1 \le C \le 50.$$$
Note that if a substring belonging to set $$$A$$$ occurs multiple times in my resulting string, it is counted only once.
My approach:
Let $$$Z$$$ be the size of the alphabet $$$(26)$$$.
Let $$$dp[i][j][k][m]$$$ denote the number of strings satisfying the constraints:
1) It is of length $$$i$$$.
2) The longest suffix present in it which is a proper prefix of some string belonging to set $$$A$$$ is the substring $$$[k...i]$$$ and the string whose proper prefix is the $$$j^{th}$$$ string in set $$$A$$$, in case of multiple such strings in $$$A$$$ choose the one with longest length. In case no such suffix exists, we can put a fake "empty string" at index $$$0$$$ in set $$$A$$$ (rest of the strings are numbered from $$$1$$$ to $$$N$$$) and assume that substring is $$$[i+1 , i]$$$.
3) The substrings (belonging to set $$$A$$$) which has already been occurred is denoted by mask $$$m$$$, more formally, if the $$$w^{th}$$$ string in $$$A$$$ has already occurred, then $$$w^{th}$$$ bit of $$$m$$$ is set, otherwise its not.
I'll write transitions via forward style dp, if I'm adding the $$$(i+1)^{th}$$$ character, then, it might "complete" some substrings, by this I mean, some suffix which was a proper prefix of some string in $$$A$$$ before adding character will now be a complete string belonging to set $$$A$$$. Note that all such strings will be the suffix of that longest suffix.
So, some new bits in mask $$$m$$$ will be set, All this can be calculated, since we already know the longest suffix, in fact lets precalculate, $$$info[i][j][k]$$$ which gives a tuple $$$(bitmask, L, idx)$$$. If we consider the prefix of length $$$j$$$ of $$$i^{th}$$$ string in set $$$A$$$ and add character $$$k$$$ at the end, $$$w^{th}$$$ bit in bitmask is set iff, entire $$$w^{th}$$$ string in $$$A$$$ occurs as a substring in that prefix after adding character $$$k$$$, $$$L$$$ denotes the length of the longest suffix in resulting string (after adding character $$$k$$$) that is a proper prefix of $$$idx^{th}$$$ string in set $$$A$$$, this precomputation can be done naively in $$$O(N*C*Z*N)$$$.
So, after adding $$$(i+1)^{th}$$$ character (denote it by $$$c$$$), new mask is $$$ (m | info[j][i-k+1][c][0])$$$, new length of longest suffix is $$$info[j][i-k+1][c][1]$$$ so, add its contribution towards state $$$dp[i+1][info[j][i-k+1][c][2]][i+2-info[j][i-k+1][c][1]][(m | info[j][i-k+1][c][0])]$$$.
I think the complexity would be $$$O(C*N*C*2^{N}*Z)$$$, which might pass. However, I think I'm overkilling it and there has to be simpler solution. I'm not even sure whether my solution is correct or not.
Auto comment: topic has been updated by ShivanshJ (previous revision, new revision, compare).
Auto comment: topic has been updated by ShivanshJ (previous revision, new revision, compare).
Edit: incorrect
How do you ensure that we count all strings that occur? For example, suppose that you expect the strings to appear in the order 1, 2, 3, 4, 5, 6 and are counting occurrences where four of the strings appear. How are you ensuring that string 5 never appears before any of the first four strings?
I do not, thank you for correcting me.
A simpler solution (in the sense of requiring less original thought after applying a standard algorithm; the underlying idea is probably basically the same) that has the same time complexity as yours is to build an Aho-Corasick automaton on the $$$N$$$ input strings. For each node in the automaton, create a bitmask indicating which of the $$$N$$$ input strings could end in this node.
Then, run a DP where your state is the number of characters used, a mask indicating which of the $$$N$$$ strings have appeared in the string you're building, and the node of the Aho-Corasick automaton corresponding to the string you've built so far. To transition, iterate over the next letter in the string, transition to the corresponding Aho-Corasick node, and add any input strings appearing in that node to your bitmask.
There are $$$O(C2^N N |A[i]|)$$$ states and $$$26$$$ transitions per state, giving the same complexity as your solution.
Thanks! I don't know Aho Corasick, time for me to learn this!
It's literally the same solution though, is it not?
I am little confused in understanding the task.
(Edit: I got it now , so we need to find A_i as substring of assumed Good string S not in other way. I thought we need to find number of substring in A_i or in total A)
yup N <= 6, DP bitmask seems the good intuitive way here.