Given an empty graph with n vertices, process q queries.
Each query is of the form (x, y, l) and means: add the edges (x + i, y + i) for 0 <= i <= l - 1.
Report the number of connected components after all q queries.
If I'm not mistaken, I remember there's a trick involving creating log2(n) DSUs, but I can't work out the details.
Can someone help? Thanks.
N <= 1e5
Q <= 1e5
I know the trick from problems like https://codeforces.net/gym/104030/problem/K
Thank you for asking this; now I have a better understanding of the solution. Here's an explanation framed the way you asked for:
First, rephrase the query as "make the substring [x, x+l) equal to the substring [y, y+l)" so it fits the context of the problem I linked. Then we split each query into subqueries whose lengths are powers of 2. For example, for l = 10 we split it into (x, y, 8) and (x+8, y+8, 2).
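The splitting step can be sketched like this (a minimal sketch; the function name is mine):

```python
def split_query(x, y, l):
    """Split (x, y, l) into subqueries whose lengths are powers of two.
    Lengths are returned as exponents p, i.e. a subquery (a, b, p)
    means "make [a, a+2^p) equal to [b, b+2^p)"."""
    out = []
    off = 0
    while l > 0:
        p = l.bit_length() - 1      # largest power of two that fits in l
        out.append((x + off, y + off, p))
        off += 1 << p
        l -= 1 << p
    return out

# l = 10 splits into lengths 8 and 2, matching the example above
print(split_query(0, 100, 10))      # [(0, 100, 3), (8, 108, 1)]
```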
We keep one DSU per power of 2. In the DSU for 2^p, if x is in the same component as y, then the substring [x, x+2^p) equals the substring [y, y+2^p). Note: this is a one-way implication, not an "if and only if".
For a query (x, y, 2^p) we try to connect x and y in the DSU for 2^p. If they're already connected, nothing changes. If they're not, then after connecting them every substring in the merged component must be equal. Since both components were already consistent before the merge, it suffices to make [x, x+2^p) and [y, y+2^p) equal. To do that, we propagate the query one level down, splitting (x, y, 2^p) into (x, y, 2^(p-1)) and (x+2^(p-1), y+2^(p-1), 2^(p-1)), and solve those recursively in the same way. This propagates everything the query demands all the way down to the DSU for 2^0, and that DSU is what you use to answer the queries.
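Putting the two previous paragraphs together, here's a minimal sketch in Python (the class and function names are mine, not from any reference implementation):

```python
class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True

def build(n, queries):
    """n vertices, queries are (x, y, l) with 0-indexed x, y and valid ranges.
    Returns the number of connected components after all queries."""
    LOG = max(1, n.bit_length())
    dsus = [DSU(n) for _ in range(LOG)]   # dsus[p] handles lengths 2^p

    def unite(x, y, p):
        # merge [x, x+2^p) and [y, y+2^p) positionwise
        if not dsus[p].union(x, y):
            return                        # already known equal: stop here
        if p == 0:
            return
        half = 1 << (p - 1)
        unite(x, y, p - 1)                # propagate to the two halves
        unite(x + half, y + half, p - 1)

    for x, y, l in queries:
        off = 0
        while l > 0:                      # split l into powers of two
            p = l.bit_length() - 1
            unite(x + off, y + off, p)
            off += 1 << p
            l -= 1 << p

    # the level-0 DSU holds the actual vertex components
    return len({dsus[0].find(v) for v in range(n)})
```

For example, build(5, [(0, 2, 3)]) adds the edges (0,2), (1,3), (2,4) and returns 2 components: {0,2,4} and {1,3}.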
About the complexity: we make a recursive call only when we connect two previously unconnected components. Since that happens O(N) times per DSU, there are O(N log N) recursive calls in total. So the overall cost is O(N log N) DSU operations plus O(Q log N) for splitting queries into powers of 2.
My intuition says there's some way to turn the "if" into an "if and only if" by also propagating information upward, removing the need for the segment tree of hashes in the problem I linked. If someone figures out how, please explain it here :)
Regarding the linked problem, you can propagate things up lazily during queries. The update procedure doesn't change. Suppose we want to check if substrings $$$[x,x+2^p)$$$ and $$$[y,y+2^p)$$$ are known to be equal. We do this as follows:

1. If x and y are connected in the DSU for $$$2^p$$$, answer "equal".
2. If $$$p = 0$$$, answer "not equal". Otherwise, recursively check both halves: $$$[x,x+2^{p-1})$$$ against $$$[y,y+2^{p-1})$$$, and $$$[x+2^{p-1},x+2^p)$$$ against $$$[y+2^{p-1},y+2^p)$$$.
3. If either half answers "not equal", answer "not equal".
4. If both halves answer "equal", connect x and y in the DSU for $$$2^p$$$ and answer "equal".
The important thing is connecting x and y in step (4): this way we pay for branching on the children. You can convince yourself that each query amortizes to $$$\mathcal{O}(\log n)$$$ recursive calls. Shameless promotion: this is one of the ideas used in my master's thesis.
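A sketch of that lazy check (the DSU class and all names are mine; it assumes dsus[p] is the per-level DSU maintained by the update procedure described above):

```python
class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def make_equal_checker(dsus):
    """dsus[p] is the DSU for ranges of length 2^p."""
    def equal(x, y, p):
        # fast path: already known equal at this level
        if dsus[p].find(x) == dsus[p].find(y):
            return True
        if p == 0:
            return False
        half = 1 << (p - 1)
        # recurse into both halves
        if equal(x, y, p - 1) and equal(x + half, y + half, p - 1):
            # connect x and y so future checks stop at this level;
            # this is what pays for branching into the children
            dsus[p].union(x, y)
            return True
        return False
    return equal
```

Usage: if the level-0 DSU knows 0~4 and 1~5, then equal(0, 4, 1) returns True and, as a side effect, records 0~4 at level 1 so the next such check is a single find.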
Also one simplification: we can split query ranges into two overlapping ranges, like you do in sparse tables for RMQ. For example, for $$$[x,x+14)$$$ we split into $$$[x,x+8)$$$ and $$$[x+6,x+14)$$$.
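That simplification in code (again a sketch with my naming; lengths returned as exponents):

```python
def split_overlapping(x, y, l):
    """Split (x, y, l) into at most two possibly-overlapping
    power-of-two subqueries, sparse-table style."""
    assert l > 0
    p = l.bit_length() - 1            # largest power of two <= l
    if (1 << p) == l:
        return [(x, y, p)]
    shift = l - (1 << p)              # start offset of the second range
    return [(x, y, p), (x + shift, y + shift, p)]

# l = 14: covers [x, x+8) and [x+6, x+14), as in the example
print(split_overlapping(0, 20, 14))   # [(0, 20, 3), (6, 26, 3)]
```

This is safe because "make these ranges equal" is idempotent, so the overlap is harmless, and it caps each query at two subqueries instead of O(log n) of them.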
I realized shortly after editing that I can maintain the structure with O(log^3) updates by doing small-to-large merging and keeping some sets of (component of i, component of i+2^p). I also realized that lazily updating during queries most likely works in O(log N), but I hadn't written down the proof. Good to see that I'm not insane lol. Still, I wonder if there's some way to cut those logs for the non-lazy way.
Also a little fun fact for you: some people were discussing a problem to set at the Brazilian ICPC subregional and they ended up at "Modular Subset Sum, Dynamic Strings, and Zero-Sum Sets", which is referenced by your thesis. They might've ended up at your thesis as well; not sure, since I wasn't in the discussion.