Solution for "Problem G. Automaton" from GP of Dolgoprudny

Revision en1, by ftiasch, 2019-05-24 03:48:02

I finally solved problem G of GP of Dolgoprudny (also problem 83 in New Year Contest 2018). Though there are discussions on Codeforces, they all look very abstract to me. Hence, I decided to write down a full editorial.

First, we need some definitions. For given string $$$s$$$, we write $$$\mathrm{occ}_s(p) = {i : s[i - |p| + 1: i] = p}$$$. We write $$$x \equiv y$$$ if $$$\mathrm{occ}_s(x) = \mathrm{occ}_s(y)$$$. The "$$$\equiv$$$" relation groups all non-empty substrings of $$$s$$$ into equivalence classes. We write the equivalence class where $$$x$$$ is in as $$$[x]^R$$$. We also write $$$\overleftarrow{x}$$$ for the unique longest string in $$$[x]^R$$$.

The number of states in the automaton of $$$s$$$ equals to the number of equivalence classes plus $$$1$$$. We will then compute the number of equivalence classes. By the linearity of expectation, we can compute for each string $$$p$$$, the number of strings $$$s$$$ which contains an equivalence class $$$[x]^R$$$ where $$$\overleftarrow{x} = p$$$.

For $$$\overleftarrow{x} = p$$$ to hold, two conditions should be satisfied:

  1. $$$p$$$ occurs as a substring in $$$s$$$;
  2. $$$[ap]^R \neq [p]^R$$$ for any character $$$a$$$.

We first compute the number of strings $$$s$$$ satisfying condition 1 solely. Then we subtract the number of strings $$$s$$$ where $$$[ap]^R = [p]^R$$$ for all $$$a$$$.

Condition 1

$$$p$$$ may end in string $$$s$$$ in positions $$$P = {|p|, |p|+1, \dots, n}$$$. Using the principle of Inclusion and Exclusion (i.e., PIE), we can alternatively compute the sum

$$$\sum_{A \neq \emptyset, A \subseteq P} (-1)^{|A|} \mathrm{count}(p, A)$$$

where $$$\mathrm{count}(p, A)$$$ the number of strings $$$s$$$ where $$$p$$$ ends in positions $$$A$$$.

If we know the string $$$p$$$ in advance, this can be done by a $$$O(n^2)$$$ DP. But how if the number of possible strings $$$p$$$ can be large?

Let's introduce another notation $$$\mathrm{per}(s) = {k : s[1 : |s| - k + 1] = s[k + 1 : |s|]}$$$ for a string $$$s$$$. We notice that the above sum is determined only by the set $$$\mathrm{per}(p)$$$, not the string $$$p$$$ itself. The number of different $$$\mathrm{per}(p)$$$ is around 1 thousand when $$$|p|=40$$$. Thus, we can instead enumerate different $$$\mathrm{per}(p)$$$ and compute the desired sum.

Condition 2

For given string $$$s$$$ and pattern $$$ap$$$, we subtract from the result $$$\mathrm{occ}_s(ap) \neq \emptyset$$$ and $$$\mathrm{occ}_s(ap) = \mathrm{occ}_s(p)$$$. By PIE, the second condition boils down to the sum

$$$\sum_{s \in \Sigma^n} \sum_{X \subseteq \mathrm{occ}_s(ap), X \neq \emptyset} \sum_{Y \subseteq \mathrm{occ}_s(p), X \cap Y = \emptyset} (-1)^{|Y|}.$$$

Noticing that $$$\mathrm{occ}(ap) \subseteq \mathrm{occ}(p)$$$, we can sum over $$$U = X \cup Y$$$. The sum equals to

\begin{align} & \sum_{s \in \Sigma^n} \sum_{U \subseteq \mathrm{occ}_s(p)} \sum_{X \subseteq U \cap \mathrm{occ}_s(ap), X \neq \emptyset} (-1)^{|U| — |X|} \ = & -\sum_{s \in \Sigma^n} \sum_{U \subseteq \mathrm{occ}_s(p)} (-1)^{|U|} [U \cap \mathrm{occ}_s(ap) \neq \emptyset] \ = & -\sum_{s \in \Sigma^n} \sum_{U \subseteq \mathrm{occ}_s(p)} (-1)^{|U|} (1 — [U \cap \mathrm{occ}_s(ap) = \emptyset]) \ = & -\sum_{s \in \Sigma^n} \sum_{U \subseteq \mathrm{occ}_s(p)} (-1)^{|U|} + \sum_{s \in \Sigma^n} \sum_{U \subseteq \mathrm{occ}_s(p)} \sum_{V \subseteq U \cap \mathrm{occ}_s(ap)}(-1)^{|U| + |V|} \ = & -\sum_{U \subseteq [n]} (-1)^{|U|} \sum_{s \in \Sigma^n} [U \subseteq \mathrm{occ}_s(p)] + \sum_{U \subseteq [n]} \sum_{V \subseteq U}(-1)^{|U| + |V|} \sum_{s \in \Sigma^n} [U \subseteq \mathrm{occ}_s(p) \wedge V \subseteq \mathrm{occ}_s(ap)]\ \end{align}

Both parts can be computed using similar DP mentioned in the previous section. The only thing to notice is that we need not the set $$$\mathrm{per}(p)$$$, but the pair $$$(\mathrm{per}(p), \mathrm{per}(ap))$$$.

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en3 English ftiasch 2019-05-24 03:50:18 0 (published)
en2 English ftiasch 2019-05-24 03:48:44 26
en1 English ftiasch 2019-05-24 03:48:02 3795 Initial revision (saved to drafts)