A note on CF1679E Typical Party in Dorm (CF791DIV2E)

#	User	Rating
1	jiangly	3898
2	tourist	3840
3	orzdevinwang	3706
4	ksun48	3691
5	jqdai0815	3682
6	ecnerwala	3525
7	gamegame	3477
8	Benq	3468
9	Ormlis	3381
10	maroonrk	3379

#	User	Contrib.
1	cry	168
2	-is-this-fft-	165
3	Dominater069	161
4	atcoder_official	160
5	Um_nik	159
6	djm03178	157
7	adamant	153
8	luogu_official	151
9	awoo	149
10	TheScrasse	146

Problem link, link: https://codeforces.net/contest/1679/problem/E

Part 1: Notations

$$$s$$$: The original string that might contain '?'.

$$$s[i,j]$$$: The substring including $$$s[i]$$$ and $$$s[j]$$$: $$$(s[i], s[i+1],..., s[j])$$$. If $$$i > j$$$, the $$$s[i, j]$$$ is an empty string.

$$$t$$$: The query string.

We also define some notations that will be clarified in the Part 2:

$$$ok[i,j]$$$: A $$$2D$$$ bool array indicating whether $$$s[i,j]$$$ is valid or not. We will define the word "valid" in Part 2. If $$$s[i,j]$$$ is valid, then $$$ok[i,j]$$$ is true, i.e., $$$1$$$. If $$$s[i,j]$$$ is invalid, then $$$ok[i,j]$$$ is false, i.e., $$$0$$$.

$$$set[i, j]$$$: The required character set to make $$$s[i,j]$$$ a palindrome.

$$$free[i, j]$$$: The number of free places. We will define the terminology "free places" in part 2.

$$$num[i, j]$$$: The number of '?' in the substring $$$s[i, j]$$$. For example, if s[i, j] == "???a", then $$$num[i, j] == 3$$$.

Part 2: ok, set and free

(1)We call a substring $$$s[i,j]$$$ invalid if and only if:

there exists an index $$$x (i \leq x \leq j)$$$ such that s[x] != '?' && s[i+j-x] != '?' && s[x] != s[i+j-x].

For example,

"ab?a" is valid.

"ax?xc" is invalid because the first character 'a' mismatches with the last character 'c'.

"tbct" is invalid because 'b' != 'c'

The validity of a substring is irrelevant to the query subset. For example, if the query set $$$t$$$ does not contain the character 'b', we still call the string "ab?a" valid, although it certainly cannot form a palindrome. We will deal this case using the $$$set$$$ array.

(2) $$$set[i, j]$$$ denotes all the characters needed to make $$$s[i, j]$$$ a palindrome. For example, if s[i, j] == "ab?a", then set[i, j] == {b}, as we need a character 'b' to fill in the '?'. Note that there are at most $$$17$$$ query characters in this problem, i.e., $$$|t| \leq 17$$$, so we can compress $$$set[i, j]$$$ to a $$$32$$$-bit integer to accelerate computation. Don't use std::set!

(3) $$$free[i, j]$$$ means the number of free characters in the substring $$$s[i, j]$$$. For example, in "ax?xc", the middle '?' is a free character as we could replace it with any character in the query string $$$t$$$. In "ab???a", the first and the second '?' count one free character (Warning: not two!!!). The first '?' can be replaced with an arbitrary character from $$$t$$$, and the second '?' has to be the same with the first '?', so only one of them is free. The third '?' is not free because it has to be 'b'.

Part 3: The naive formula

What is the naive formula for each query string $$$t$$$? It is:

$$$ans(t) = \sum\limits_{1 \leq i,j \leq len(s)} ok[i, j] \cdot [set[i, j] \subseteq t] \cdot |t|^{free[i, j] + num[1, i-1] + num[j+1, len(s)]}$$$ (1)

The $$$[set[i, j] \subseteq t]$$$ is like a Kronecker symbol. If $$$set[i, j] \subseteq t$$$ is true, then $$$[set[i, j] \subseteq t]==1$$$, otherwise $$$[set[i, j] \subseteq t]==0$$$. We need to analyze each $$$i, j$$$. If $$$ok[i, j]$$$ is $$$0$$$, it would never be a palindrome no matter how you fill in the '?'. If $$$ok[i, j]$$$ is $$$1$$$, it depends on the query string $$$t$$$. In part 2, I say that "ab?a" is valid even if $$$t$$$ does not contain 'b'. In this case, $$$ok[i, j]==1$$$ but $$$[set[i, j] \subset t] == 0$$$, the contribution of $$$i,j$$$ is still $$$0$$$. Be careful with the third item, how many strings $$$s$$$ can make $$$s[i, j]$$$ a palindrome? If you only count $$$|t|^{free[i, j]}$$$, you are wrong! Because we can use arbitrary characters to fill in the '?' in $$$s[1, i-1]$$$ and $$$s[j+1, len(s)]$$$, that doesn't affect whether $$$s[i, j]$$$ is a palindrome or not! So, we should not forget $$$num[1, i-1]$$$ and $$$num[j+1, len(s)]$$$.

Part 4: Computation

What is the bottleneck of the naive formula Eq.(1)? Obviously, it takes $$$O(n^2)$$$ for every query, which is too slow. We first figure out:

(1) Computing $$$ok,\,set,\,free$$$ requires $$$O(n^2)$$$ time. Possibly there are $$$O(n)$$$ algorithms, but I don't know. Anyway, $$$O(n^2)$$$ is sufficient to pass. The idea is simple. For odd-length palindromes, we extend from one central character. For even-length palindromes, we extend from two adjacent characters. Initialize the $$$ok$$$ array to $$$0$$$, when you extend, set $$$ok[i, j]$$$ to $$$1$$$, and stop extension when you encounter an invalid pair $$$(i, j)$$$, i.e., s[i] != '?' && s[j] != '?' && s[i] != s[j]. We can compute $$$num$$$ in $$$O(n)$$$ using a prefix array.

(2) After computing $$$ok,\,set\,free$$$ for each $$$(i, j)$$$, try to use these three arrays to precompute $$$ans$$$. We could brute force the length easily in this manner:

for(i, j){ //O(n^2)
   for(int length = 0; length <= 17; ++length){ //O(18)
       ...
   }
}

that is not very difficult. The key problem is when you get a $$$set(i, j)$$$, we should update all its supersets, which might cost $$$O(2^{|t|})$$$ for each $$$(i, j)$$$. We use the "separation of variables trick". We define two $$$dp$$$ arrays, $$$dp1$$$ to solve the original problem and $$$dp2$$$ to solve $$$dp1$$$. The final output is $$$dp2$$$:

$$$dp1[submask][k] := \sum\limits_{1 \leq i,j \leq len(s); set[i, j] == submask; ok[i, j] == 1} k^{free[i, j] + num[1, i-1] + num[j+1, len(s)]}$$$. (2)

$$$dp2[mask][k] := \sum\limits_{submask \subseteq mask} dp1[submask][k]$$$. (3)

$$$ans(t) = dp2[t][|t|]$$$. (4)

This trick is important. You can see that $$$dp2$$$ is irrelevant to the original formula. $$$dp1$$$ could be calculated in $$$O(n^2|t|)$$$ time (we need to brute force the length). You might think the computation of $$$dp2$$$ is hard, but we have a powerful tool for it: The SOSDP! Here is a super good tutorial for SOSDP: https://codeforces.net/blog/entry/45223. If you don't grasp SOSDP, this problem might be hard for you, because SOSDP costs $$$O(2^{|t|}|t|^2)$$$ times to solve it while the naive method costs $$$O(3^{|t|}|t|)$$$! A clarification: The tutorial says SOSDP is $$$O(2^{|t|}|t|)$$$ and the naive method is $$$O(3^{|t|})$$$. This is definitely right. But in this problem, we need to brute force the length, so there is another $$$|t|$$$ in both the SOSDP and the naive methods.

Briefly speaking, SOSDP solves this problem in $$$O(|t|2^{|t|})$$$ time, and this is almost exactly the Eq. (3) in the definition of $$$dp2$$$:

Given $$$G$$$ that maps $$$2^{|t|}$$$ bitmasks to real numbers, calculate $$$F$$$:

$$$F(mask) := \sum\limits_{submask \subseteq mask} G(submask)$$$.

The code is brief:

memcpy(F, G, sizeof(G)); 
for(int i = 0; i < t; ++i) for(int mask = 0; mask < (1<<t); ++mask){ //O(t2^|t|)
	if(mask & (1<<i)) F[mask] += F[mask^(1<<i)];
}

We also note that SOSDP could be used in place, if we don't need the original G after computation. That means the memcpy step could be omitted. The below algorithm is also correct, because when we calculate G[mask], all the submasks of mask are correctly calculated.

//Here: Original G
for(int i = 0; i < t; ++i) for(int mask = 0; mask < (1<<t); ++mask){ //O(t2^|t|)
	if(mask & (1<<i)) G[mask] += G[mask^(1<<i)];
}
//now G is the F and the original G gets lost!

(3)Overall complexity:

$$$O(|t|(n^2 + 2^{|t|}|t| + |q|))$$$. Eq. (2) is $$$O(|t|n^2)$$$, Eq. (3) is $$$O(|t|^22^{|t|})$$$. For each query, we need $$$O(|t|)$$$ times to convert the query string into a $$$32$$$-bit integer. The three parts make the whole complexity.

(4)Submission: here.

Aveiro_quanyue's blog