ShivanshJ's blog

By ShivanshJ, history, 16 months ago, In English

Many of my friends asked this interesting OA problem to me:

Given an array of strings $$$A$$$ of size $$$N$$$ and two integers $$$B$$$ and $$$C$$$.

Let $$$D$$$ be the number of strings of length $$$C$$$ that contains exactly $$$B$$$ of the strings in $$$A$$$ as substring.

Return $$$D$$$ mod $$$10^9+9$$$.

Constraints

$$$1 \le N \le 6$$$

$$$1 \le |A[i]| \le 50$$$

All the $$$N$$$ strings are distinct

$$$0 \le B \le N$$$

$$$1 \le C \le 50.$$$

Note that if a substring belonging to set $$$A$$$ occurs multiple times in my resulting string, it is counted only once.

My approach:

Let $$$Z$$$ be the size of the alphabet $$$(26)$$$.

Let $$$dp[i][j][k][m]$$$ denote the number of strings satisfying the constraints:

1) It is of length $$$i$$$.

2) The longest suffix present in it which is a proper prefix of some string belonging to set $$$A$$$ is the substring $$$[k...i]$$$ and the string whose proper prefix is the $$$j^{th}$$$ string in set $$$A$$$, in case of multiple such strings in $$$A$$$ choose the one with longest length. In case no such suffix exists, we can put a fake "empty string" at index $$$0$$$ in set $$$A$$$ (rest of the strings are numbered from $$$1$$$ to $$$N$$$) and assume that substring is $$$[i+1 , i]$$$.

3) The substrings (belonging to set $$$A$$$) which has already been occurred is denoted by mask $$$m$$$, more formally, if the $$$w^{th}$$$ string in $$$A$$$ has already occurred, then $$$w^{th}$$$ bit of $$$m$$$ is set, otherwise its not.

I'll write transitions via forward style dp, if I'm adding the $$$(i+1)^{th}$$$ character, then, it might "complete" some substrings, by this I mean, some suffix which was a proper prefix of some string in $$$A$$$ before adding character will now be a complete string belonging to set $$$A$$$. Note that all such strings will be the suffix of that longest suffix.

So, some new bits in mask $$$m$$$ will be set, All this can be calculated, since we already know the longest suffix, in fact lets precalculate, $$$info[i][j][k]$$$ which gives a tuple $$$(bitmask, L, idx)$$$. If we consider the prefix of length $$$j$$$ of $$$i^{th}$$$ string in set $$$A$$$ and add character $$$k$$$ at the end, $$$w^{th}$$$ bit in bitmask is set iff, entire $$$w^{th}$$$ string in $$$A$$$ occurs as a substring in that prefix after adding character $$$k$$$, $$$L$$$ denotes the length of the longest suffix in resulting string (after adding character $$$k$$$) that is a proper prefix of $$$idx^{th}$$$ string in set $$$A$$$, this precomputation can be done naively in $$$O(N*C*Z*N)$$$.

So, after adding $$$(i+1)^{th}$$$ character (denote it by $$$c$$$), new mask is $$$ (m | info[j][i-k+1][c][0])$$$, new length of longest suffix is $$$info[j][i-k+1][c][1]$$$ so, add its contribution towards state $$$dp[i+1][info[j][i-k+1][c][2]][i+2-info[j][i-k+1][c][1]][(m | info[j][i-k+1][c][0])]$$$.

I think the complexity would be $$$O(C*N*C*2^{N}*Z)$$$, which might pass. However, I think I'm overkilling it and there has to be simpler solution. I'm not even sure whether my solution is correct or not.

  • Vote: I like it
  • +14
  • Vote: I do not like it

| Write comment?
»
16 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by ShivanshJ (previous revision, new revision, compare).

»
16 months ago, # |
  Vote: I like it 0 Vote: I do not like it

Auto comment: topic has been updated by ShivanshJ (previous revision, new revision, compare).

»
16 months ago, # |
Rev. 2   Vote: I like it +3 Vote: I do not like it

Edit: incorrect

  • »
    »
    16 months ago, # ^ |
      Vote: I like it +3 Vote: I do not like it

    How do you ensure that we count all strings that occur? For example, suppose that you expect the strings to appear in the order 1, 2, 3, 4, 5, 6 and are counting occurrences where four of the strings appear. How are you ensuring that string 5 never appears before any of the first four strings?

»
16 months ago, # |
  Vote: I like it +6 Vote: I do not like it

A simpler solution (in the sense of requiring less original thought after applying a standard algorithm; the underlying idea is probably basically the same) that has the same time complexity as yours is to build an Aho-Corasick automaton on the $$$N$$$ input strings. For each node in the automaton, create a bitmask indicating which of the $$$N$$$ input strings could end in this node.

Then, run a DP where your state is the number of characters used, a mask indicating which of the $$$N$$$ strings have appeared in the string you're building, and the node of the Aho-Corasick automaton corresponding to the string you've built so far. To transition, iterate over the next letter in the string, transition to the corresponding Aho-Corasick node, and add any input strings appearing in that node to your bitmask.

There are $$$O(C2^N N |A[i]|)$$$ states and $$$26$$$ transitions per state, giving the same complexity as your solution.

  • »
    »
    16 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    Thanks! I don't know Aho Corasick, time for me to learn this!

  • »
    »
    16 months ago, # ^ |
      Vote: I like it 0 Vote: I do not like it

    It's literally the same solution though, is it not?

»
16 months ago, # |
Rev. 3   Vote: I like it 0 Vote: I do not like it

I am little confused in understanding the task.

(Edit: I got it now , so we need to find A_i as substring of assumed Good string S not in other way. I thought we need to find number of substring in A_i or in total A)

yup N <= 6, DP bitmask seems the good intuitive way here.