Блог пользователя ShivanshJ

Автор ShivanshJ, история, 14 месяцев назад, По-английски

Many of my friends asked this interesting OA problem to me:

Given an array of strings $$$A$$$ of size $$$N$$$ and two integers $$$B$$$ and $$$C$$$.

Let $$$D$$$ be the number of strings of length $$$C$$$ that contains exactly $$$B$$$ of the strings in $$$A$$$ as substring.

Return $$$D$$$ mod $$$10^9+9$$$.

Constraints

$$$1 \le N \le 6$$$

$$$1 \le |A[i]| \le 50$$$

All the $$$N$$$ strings are distinct

$$$0 \le B \le N$$$

$$$1 \le C \le 50.$$$

Note that if a substring belonging to set $$$A$$$ occurs multiple times in my resulting string, it is counted only once.

My approach:

Let $$$Z$$$ be the size of the alphabet $$$(26)$$$.

Let $$$dp[i][j][k][m]$$$ denote the number of strings satisfying the constraints:

1) It is of length $$$i$$$.

2) The longest suffix present in it which is a proper prefix of some string belonging to set $$$A$$$ is the substring $$$[k...i]$$$ and the string whose proper prefix is the $$$j^{th}$$$ string in set $$$A$$$, in case of multiple such strings in $$$A$$$ choose the one with longest length. In case no such suffix exists, we can put a fake "empty string" at index $$$0$$$ in set $$$A$$$ (rest of the strings are numbered from $$$1$$$ to $$$N$$$) and assume that substring is $$$[i+1 , i]$$$.

3) The substrings (belonging to set $$$A$$$) which has already been occurred is denoted by mask $$$m$$$, more formally, if the $$$w^{th}$$$ string in $$$A$$$ has already occurred, then $$$w^{th}$$$ bit of $$$m$$$ is set, otherwise its not.

I'll write transitions via forward style dp, if I'm adding the $$$(i+1)^{th}$$$ character, then, it might "complete" some substrings, by this I mean, some suffix which was a proper prefix of some string in $$$A$$$ before adding character will now be a complete string belonging to set $$$A$$$. Note that all such strings will be the suffix of that longest suffix.

So, some new bits in mask $$$m$$$ will be set, All this can be calculated, since we already know the longest suffix, in fact lets precalculate, $$$info[i][j][k]$$$ which gives a tuple $$$(bitmask, L, idx)$$$. If we consider the prefix of length $$$j$$$ of $$$i^{th}$$$ string in set $$$A$$$ and add character $$$k$$$ at the end, $$$w^{th}$$$ bit in bitmask is set iff, entire $$$w^{th}$$$ string in $$$A$$$ occurs as a substring in that prefix after adding character $$$k$$$, $$$L$$$ denotes the length of the longest suffix in resulting string (after adding character $$$k$$$) that is a proper prefix of $$$idx^{th}$$$ string in set $$$A$$$, this precomputation can be done naively in $$$O(N*C*Z*N)$$$.

So, after adding $$$(i+1)^{th}$$$ character (denote it by $$$c$$$), new mask is $$$ (m | info[j][i-k+1][c][0])$$$, new length of longest suffix is $$$info[j][i-k+1][c][1]$$$ so, add its contribution towards state $$$dp[i+1][info[j][i-k+1][c][2]][i+2-info[j][i-k+1][c][1]][(m | info[j][i-k+1][c][0])]$$$.

I think the complexity would be $$$O(C*N*C*2^{N}*Z)$$$, which might pass. However, I think I'm overkilling it and there has to be simpler solution. I'm not even sure whether my solution is correct or not.

  • Проголосовать: нравится
  • +14
  • Проголосовать: не нравится

»
14 месяцев назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Auto comment: topic has been updated by ShivanshJ (previous revision, new revision, compare).

»
14 месяцев назад, # |
  Проголосовать: нравится 0 Проголосовать: не нравится

Auto comment: topic has been updated by ShivanshJ (previous revision, new revision, compare).

»
14 месяцев назад, # |
Rev. 2   Проголосовать: нравится +3 Проголосовать: не нравится

Edit: incorrect

  • »
    »
    14 месяцев назад, # ^ |
      Проголосовать: нравится +3 Проголосовать: не нравится

    How do you ensure that we count all strings that occur? For example, suppose that you expect the strings to appear in the order 1, 2, 3, 4, 5, 6 and are counting occurrences where four of the strings appear. How are you ensuring that string 5 never appears before any of the first four strings?

»
14 месяцев назад, # |
  Проголосовать: нравится +6 Проголосовать: не нравится

A simpler solution (in the sense of requiring less original thought after applying a standard algorithm; the underlying idea is probably basically the same) that has the same time complexity as yours is to build an Aho-Corasick automaton on the $$$N$$$ input strings. For each node in the automaton, create a bitmask indicating which of the $$$N$$$ input strings could end in this node.

Then, run a DP where your state is the number of characters used, a mask indicating which of the $$$N$$$ strings have appeared in the string you're building, and the node of the Aho-Corasick automaton corresponding to the string you've built so far. To transition, iterate over the next letter in the string, transition to the corresponding Aho-Corasick node, and add any input strings appearing in that node to your bitmask.

There are $$$O(C2^N N |A[i]|)$$$ states and $$$26$$$ transitions per state, giving the same complexity as your solution.

»
14 месяцев назад, # |
Rev. 3   Проголосовать: нравится 0 Проголосовать: не нравится

I am little confused in understanding the task.

(Edit: I got it now , so we need to find A_i as substring of assumed Good string S not in other way. I thought we need to find number of substring in A_i or in total A)

yup N <= 6, DP bitmask seems the good intuitive way here.