juruo_lzy's blog

By juruo_lzy, history, 11 months ago, In English
  • Given two strings $$$s,t$$$, write $$$r_i$$$ for the string obtained by inserting $$$t$$$ into $$$s_i$$$ and $$$s_{i+1}$$$. If $$$i=0$$$, insert at the beginning, if $$$i=|s|$$$, insert at the end.

  • Formally, $$$r_i=\overline{s_1\dots s_it_1\dots t_{|t|}s_{i+1}\dots s_{|s|}}$$$.

  • There are $$$q$$$ queries, each of which gives $$$l,r,k,x,y$$$. You need to find the $$$i$$$ with the smallest $$$r_i$$$ dictionary order among all $$$i$$$ satisfying $$$i\in[l,r]$$$ and $$$i\bmod k\in[x,y]$$$. If there is more than one such $$$i$$$, the answer is the smallest $$$i$$$. Or report no solution.

  • $$$|s|,|t|,q\le 10^5$$$.

  • $$$\text{2 s / 256 MB}$$$.

The approach to this problem is roughly put together in two parts.

$$$\bf{Part\space1}$$$ Sorting.

Consider sorting all the binary groups $$$(r_i,i)$$$ as required by the question. First splice $$$s$$$ and $$$t$$$ into a large string $$$S$$$. It is easy to see that any $$$r_i$$$ is the $$$\mathcal{O}(1)$$$ substring of $$$S$$$ stitched together.

Consider the method of comparing string sizes by finding $$$\text{LCP}$$$ and comparing the next bit. Then we use a vector to store all the segments of a string, and match the string with a hash, one at a time, up to the segment endpoint. If they all match up to the end of the segment, then we move on to the next segment. Otherwise, compare the size of the mismatched positions.

Note that to multimode, I used the natural overflow and [] birthday.

If two strings are the same compare the size of $$$i$$$. The above procedure is then overloaded to less than and sorted using stable_sort.

The time complexity is $$$\mathcal{O}(|s|\log |s|\log (|s|+|t|))$$$ and the space complexity is $$$\mathcal{O}(|s|+|t|)$$$.

**$$$\bf{Part\space 2}$$$ Answer the query **

After doing $$$\text{Part 1}$$$, the answer is the one with the smallest rank after sorting among the $$$i$$$ satisfying the condition. So only the smallest value of the ranking is required.

Seeing the restriction with $$$\bmod$$$, consider threshold partitioning. Let the threshold be $$$B$$$.

  • $$$k>B$$$.

We make $$$i=uk+v$$$. At this point $$$0\le u\le \dfrac{|s|}{B}$$$. Enumerating $$$u$$$ gives us $$$i$$$ that satisfies the condition within $$$\mathcal{O}\left(\dfrac{|s|}{k}\right)$$$ intervals. Enumerate these intervals for querying, using the ST table to maintain interval minima.

  • $$$k\le B$$$

We consider solving for each $$$k$$$ individually. At this point $$$0\le i\bmod k<k\le B$$$. That is, there will only be $$$\mathcal{O}(B)$$$ remainders. Then for each remainder $$$v$$$, the rankings of the positions where $$$i\bmod k=v$$$ are placed in order in an array, and the ST table is maintained for these arrays. It is easy to see that for a particular remainder, the position $$$i$$$ within $$$[l,r]$$$ is also a continuous interval in the new array. And the total length of these arrays is $$$\mathcal{O}(|s|)$$$. Since there will only be $$$\mathcal{O}\left(\dfrac{|s|}{k}\right)$$$ positions for each kind of remainder, multiplying by $$$k$$$ is $$$\mathcal{O}(s)$$$. Then the time to build the ST table for these arrays is $$$\mathcal{O}(|s|\log |s|)$$$.

Then iterate through each query, enumerating each remainder in $$$[x,y]$$$, at which point the enumeration is $$$\mathcal{O}(B)$$$. Then get the interval of $$$i$$$ in $$$[l,r]$$$ in the array corresponding to the current remainder, and query the interval minimum.

The time complexity is $$$\mathcal{O}\left(\dfrac{q|s|}{B}+|s|B\log|s|+qB\log|s|\right)$$$ and the space complexity is $$$\mathcal{O}(|s|\log |s|)$$$.

The time complexity is $$$\mathcal{O}\left((q+|s|)\sqrt{\dfrac{|s|}{\log |s|}}\right)$$$ when $$$B=\mathcal{O}\left(\sqrt{\dfrac{|s|}{\log |s|}}\right)$$$ is taken.

If the ST table is replaced with the method of four Russians, the time complexity $$$\mathcal{O}\left((q+|s|)\sqrt{|s|\log |s|}\right)$$$ can be achieved, but it is not necessary.

The bottleneck under easy-to-know limit data is $$$\text{Part 2}$$$, which is an acceptable time complexity.

AC code

Full text and comments »

  • Vote: I like it
  • +11
  • Vote: I do not like it

By juruo_lzy, history, 12 months ago, In English
  • Given $$$n$$$ strings $$$s_1\sim s_n$$$, find how many pairs $$$(i,j)$$$, satisfy:
  • $$$1\le i,j\le n$$$.
  • $$$s_j$$$ is a true substring of $$$s_i$$$.
  • There exists no $$$k$$$ ($$$i,j,k$$$ two different) such that $$$s_j$$$ is a true substring of $$$s_k$$$ and $$$s_k$$$ is a true substring of $$$s_i$$$.
  • $$$n,\sum\limits_{i=1}^n|s_i|\le 10^6$$$. If $$$i\ne j$$$, then $$$s_i\ne s_j$$$.
  • $$$\text{2 s / 512 MB}$$$.

Suffix-order all strings by first collapsing them into one large string $$$S$$$. Consider enumerating $$$i$$$ to find out which $$$j$$$ will contribute.

Consider for each suffix $$$s_i$$$ of $$$s_i[x,|s_i|]$$$, find a longest string $$$s_y$$$ in $$$s_1\sim s_n$$$ that satisfies that it is a true prefix of $$$s_i[x,|s_i|]$$$, denoted as $$$l_{i,x}=|s_y|$$$. If no such $$$s_y$$$ can be found, $$$l_{i,x}$$$ is said to be meaningless. If some $$$l_{i,x}$$$ can appear in an equation or inequality for an operation, $$$l_{i,x}$$$ is meaningful if and only if $$$l_{i,x}$$$ is meaningful.

Construct a binary irreducible set $$$T_i$$$ where a binary $$$(u,v)$$$ appears in $$$T_i$$$ if and only if the following four conditions are satisfied:

  • $$$1\le u,v\le |s_i|$$$.
  • $$$l_{i,u}$$$ makes sense.
  • $$$v=u+l_{i,u}-1$$$.
  • There exists no $$$w\in[1,u)$$$ such that $$$u+l_{i,u}-1\le w+l_{i,w}-1$$$.

Call $$$s_j$$$ occurs in $$$T_i$$$ if and only if there exists $$$(u,v)\in T_i$$$ such that $$$s_i[u,v]=s_j$$$.

Construct another binary irreducible set $$$R_{i,j}$$$ where a binary $$$(u,v)$$$ occurs in $$$R_{i,j}$$$ if and only if the following two conditions are satisfied:

  • $$$1\le u\le v\le |s_i|$$$.
  • $$$s_i[u,v]=s_j$$$.

Then the binary $$$(i,j)$$$ in the original title satisfies the condition, if and only if $$$\boldsymbol{R_{i,j}\ne \varnothing}$$$ and $$$\boldsymbol{R_{i,j}\subseteq T_i}$$$.

$$$R_{i,j}\ne \varnothing$$$ Obviously, there would be no proof.

Proof:

  • Sufficiency:

Consider the converse, assuming that $$$k$$$ exists when $$$R_{i,j} \subseteq T_i$$$ such that $$$s_j$$$ is a substring of $$$s_k$$$ and $$$s_k$$$ is a true substring of $$$s_i$$$.

Let $$$s_i[a,b]=s_k$$$ and $$$s_k[c,d]=s_j$$$. Then $$$s_i[a+c-1,a+d-1]=s_j$$$. From the known conditions we get $$$(a+c-1,a+d-1)\in R_{i,j},T_i$$$.

If $$$l_{i,a+c-1}\ne d-c+1$$$, it does not match the third condition of $$$(a+c-1,a+d-1)\in T_i$$$.

Otherwise, at this point $$$c>1$$$. By the definition of $$$l_{i,a}$$$ it follows that $$$l_{i,a}\ge b-a+1$$$, i.e. $$$a+l_{i,a}-1\ge b$$$. But $$$a+c-1+l_{i,a+c-1}-1=a+c-1+d-c+1-1=a+d-1$$$. From $$$d\le b-a+1$$$ we can get $$$a+d-1\le b$$$. Inconsistent with the fourth condition of $$$(a+c-1,a+d-1)\in T_i$$$.

Therefore the assumption is not valid. When $$$R_{i,j} \subseteq T_i$$$ there must be no $$$k$$$ such that $$$s_j$$$ is a substring of $$$s_k$$$ and $$$s_k$$$ is a true substring of $$$s_i$$$.

  • Necessity:

Consider $$$(u,v)\in R_{i,j}$$$ but $$$(u,v)\notin T_i$$$. At this point $$$(u,v)$$$ must satisfy the first two conditions for some binary to be in $$$T_i$$$.

If $$$l_{i,u}\ne v-u+1$$$, then there is a longer string $$$s_y$$$ that is a prefix of $$$s_i[u,|s_i|]$$$. At this point $$$s_j$$$ is a true substring of $$$s_y$$$.

Otherwise, there must exist $$$w\in[1,u)$$$ such that $$$u+l_{i,u}-1\le w+l_{i,w}-1$$$. Show that there exists a string $$$s_y=s_i[w,w+l_{i,w}-1]$$$. At this point $$$s_y$$$ contains $$$s_j$$$ in the middle, i.e., $$$s_j$$$ is a true substring of $$$s_y$$$ (which is guaranteed to be true because $$$w\in[1,u)$$$).

So not satisfying $$$R_{i,j}\subseteq T_i$$$ must not satisfy the original condition, which is a necessary condition.

This conclusion is not enough, it is always impossible to derive the set and then enumerate the judgments.

Further reasoning yields that it is actually equivalent to $$$\boldsymbol{\sum\limits_{(u,v)\in T_i}P(s_i[u,v]=s_j)=|R_{i,j}|}$$$.

To distinguish between middle and Iverson brackets, $$$P(A)$$$ is used to indicate whether the $$$A$$$ proposition is true. $$$P(A)=1$$$ if and only if $$$A$$$ is true; $$$P(A)=0$$$ if and only if $$$A$$$ is false.

Why? It is not hard to find $$$\sum\limits_{(u,v)\in T_i}P(s_i[u,v]=s_j)=|T_i\bigcap R_{i,j}|\le |R_{i,j}|$$$. And $$$|T_i\bigcap R_{i,j}|=|R_{i,j}|$$$ is equivalent to $$$R_{i,j}\subseteq T_i$$$.

Then we only need to find $$$\sum\limits_{(u,v)\in T_i}P(s_i[u,v]=s_j)$$$ and $$$|R_{i,j}|$$$ for a $$$j$$$.

  • The former is solved:

First get $$$T_i$$$. Consider for each string $$$s_a$$$, which ranked suffix's it will contribute to. This suffix has to contain $$$s_a$$$, which is equivalent to both $$$|\text{LCP}|\ge |s_a|$$$. One can maintain the ST table of the $$$\text{height}$$$ array and then bisect it to get the ranking interval, letting the $$$l$$$ values in this interval take $$$\max$$$ for $$$|s_a|$$$. Line segment tree maintenance is sufficient.

It can then be swept from left to right to maintain the $$$u+l_{i,u}-1\le w+l_{i,w}-1$$$ maximum value of the prefix $$$pre$$$. Query at a single point in the line tree for the current suffix ranking that position worth to $$$l_{i,x}$$$. If $$$x+l_{i,x}-1>pre$$$, add $$$(x,x+l_{i,x}-1)$$$ to $$$T_i$$$.

Then use buckets to maintain the longest prefix of how many suffixes each of the strings in $$$s_1\sim s_n$$$ in $$$T_i$$$ is used as.

  • The latter is solved for:

Consider $$$s_j$$$ appearing as the prefix of some suffix. the same can be done to find the ranked interval of the suffixes that contain it. It then becomes a question of how many rankings in the interval are such that the suffix of that ranking comes from the string numbered $$$i$$$.

For each $$$i$$$, a vector is used to store the suffix occurrences from smallest to largest, bisecting the left and right endpoints $$$l,r$$$, and the answer is $$$r-l+1$$$.

This still requires enumerating $$$j$$$. But notice:

Since $$$R_{i,j}\ne \varnothing$$$, those $$$s_j$$$ that don't appear in $$$T_i$$$ must not contribute. Because at this point $$$|T_i\bigcap R_{i,j}|=0< |R_{i,j}|$$$. Only those $$$s_j$$$ that occur in $$$T_i$$$ need to be considered. And for each $$$u$$$, there will be only one $$$v$$$ such that $$$(u,v)\in T_i$$$, hence $$$|T_i|=\mathcal{O}(|s_i|)$$$. This results in a total of only $$$\mathcal{O}(|S|)$$$ times to be processed.

In order not to count the recalculation omissions, consider that for each string occurring in $$$T_i$$$, it is counted at the position of its first occurrence.

Finally, do this for each string in $$$s_1\sim s_n$$$ and you'll get the correct answer.

The space-time complexity is all $$$\mathcal{O}(|S|\log |S|)$$$.

AC code

Full text and comments »

  • Vote: I like it
  • +14
  • Vote: I do not like it