#	User	Rating
1	tourist	3856
2	jiangly	3747
3	orzdevinwang	3706
4	jqdai0815	3682
5	ksun48	3591
6	gamegame	3477
7	Benq	3468
8	Radewoosh	3462
9	ecnerwala	3451
10	heuristica	3431

#	User	Contrib.
1	cry	167
2	-is-this-fft-	162
3	Dominater069	160
4	Um_nik	158
5	atcoder_official	157
6	Qingyu	156
7	adamant	151
7	djm03178	151
7	luogu_official	151
10	awoo	146

This blog is about maintaining mst with online edge insertions (or msf — minimum spanning forest to be exact) by using a data structure I came up with in $$$O(\log n)$$$ for edge insertion. although this problem can solved quite easily for anyone who knows LCT (link-cut tree), coding LCT is not really practical and has really bad constants. The data structure I present has very good constants because as you'll see all it does is manipulation on arrays. For obvious reasons that will be shown later I have called this data structure "weighted DSU".

Warmup problem

The problem is stated as follows: you are given an undirected graph where each edge has a time $$$t_i$$$. you are asked to answer queries of the form "what is the smallest time t such that you can get between u and v with all edges in time $$$<= t$$$". This problem is analogous to calculating the msf of the graph where the weights are the times and being able to answer what is the maximum on the simple path between u and v. This problem can be solved a lot of different ways, I will present the dsu solution:
first we want to sort the edges by time/weight, than we will add the edges in a dsu like way but without path compression. Additionally we we will store on each edge in the dsu the weight of the edges it represents. To be specific we will store 3 arrays: parent, weight and size. when connecting two representatives u to v we update $$$parent[u] = v, weight[u] = w, size[v] += size[u]$$$. now we can find the representative of v at time w. One way to answer the queries will be to binary search on the answer, which will be $$$O(\log^2 n)$$$. A faster way is to essentially find the LCA of u and v in the dsu tree. this can be done by moving up with the vertex with current smallest out-edge weight. This will be $$$O(\log n)$$$.

The data structure

We will use the same idea from the warmup problem and try to turn it to a dynamic one. When adding an edge we want to think what would have happened if we had added the edges in increasing order, this will result in the following almost complete algorithm:
when adding an edge with weight w between u and v we can get the root of u at time w, get the root of v at time w and connect the one with smaller subtree size to the bigger. assume we connect u to v, this means we discarded the out-edge of u. To correct this we will recursively add the edge between u and parent[u] with weight weight[u] (adding the edge we deleted).

This algorithm has 3 problems:
1. The underlying increasing nature of the weight of the edges going up was what our original DS was relying on, now we may destroy it.
2. the array size is dynamic so we cant rely on it like we did in the warmup problem.
3. what if the vertices are already connected?

The first problem has a straightforward solution — when trying to access the parent of a vertex, jump over all the edges that have weight less than the edge to the parent and connect the vertex to the new parent. The get root function will now look like this:

int getRoot(int v, int w = INF - 1) {
	while (weight[v] <= w) {
		while (weight[parent[v]] <= weight[v])
			parent[v] = parent[parent[v]];
		v = parent[v];
	}
	return v;
}

The second problem can be solved in an elegant and simple way — instead of connecting by size connect by a random index! To be specific we create an array index that is initialized with random values. We connect the vertex with smaller index to the one with the bigger index. (this union technique is known to be with same expected complexity as union by size which is optimal complexity)
To solve the third problem we can use mst properties. when adding an edge {w, u, v} we find the "main" edge along the path between u and v — the edge with maximum weight (or an edge if there are multiple). if it is weight is bigger than w, than we delete it and add the edge with the algorithm described above. Putting all of this together we get the following DS:

code

struct weightedDSU {
	vector<int> parent, weight, index;
 
	weightedDSU(int n) : parent(n), weight(n, INF), index(n) {
		for (int i = 0; i < n; i++) {
			parent[i] = i;
			index[i] = rand();
		}
	}
 
	int getRoot(int v, int w = INF - 1) {
		while (weight[v] <= w) {
			while (weight[parent[v]] <= weight[v])
				parent[v] = parent[parent[v]];
			v = parent[v];
		}
		return v;
	}
 
	void addEdgeHelper(int u, int v, int w) {
		while (u != v) {
			u = getRoot(u, w);
			v = getRoot(v, w);
			if (index[u] < index[v])
				swap(u, v);
			int temp_weight = weight[v], temp_parent = parent[v];
			parent[v] = u;
			weight[v] = w;
			u = temp_parent;
			w = temp_weight;
		}
	}

	int mainEdge(int u, int v) {
		if (getRoot(u) != getRoot(v))
			return -1;
		while (parent[u] != v && parent[v] != u) {
			if (weight[u] < weight[v])
				u = parent[u];
			else
				v = parent[v];
		}
		if (parent[u] == v)
			return u;
		else
			return v;
	}
 
	void addEdge(int u, int v, int w) {
		if (u == v)
			return;
		int p = mainEdge(u, v);
		if (p == -1)
			addEdgeHelper(u, v, w);
		else if (weight[p] > w) {
			parent[p] = p;
			weight[p] = INF;
			addEdgeHelper(u, v, w);
		}
	}
};

Time complexity

While I do not have a formal proof for the time complexity, I can provide a rough sketch/intuition for it:
what our DS does is it tries to replicate the DS from the warmup problem. It does that pretty good but has minor defects as described in problem 2. My claim is that this defects constitute at most the depth of the tree in the warmup problem, which will still result in max depth of $$$O(\log n)$$$. Testing that I had done highly support this claim. If anyone has a formal proof please share it!
another advantage of this DS are the constants — while in theory LCT has $$$O(\log n)$$$ complexity, it has really bad constants and this data structure easily defeats it often time being 3-6 times faster.

Additional operations

Deleting maximal edge:
Assume that all weights in the graph are distinct. We will notice that the maximal edge in each connected component is necessary — it is a bottleneck edge. Therefore we can delete the maximal edge and it will break the component into two components. Deleting the edge wont create a problem because no other edge addition used it to "jump" up. The easiest way to deal with the problem if the weights are not distinct is to just make them distinct.

Saving connected component size:
For some problems we may want to maintain the size of each connected component in the msf, even after edge deletions. We can do this buy maintaining the subtree size of each node in the tree that is saved in our DS. that is, we maintain an array size such that for each node v size[v] equals the number of nodes are there such that v is a parent of them. The easiest and most elegant way I found doing it was to change the size array as if the the edges in the parent path from u and v were disconnected and when going through an edge add the size back.

code

struct weightedDSU {
	vector<int> parent, index, size, weight;
 
	weightedDSU(int n) : parent(n), weight(n, INF), index(n), size(n, 1) {
		for (int i = 0; i < n; i++) {
			parent[i] = i;
			index[i] = rand();
		}
	}
 
	int getRoot(int v, int w = INF - 1) {
		while (weight[v] <= w) {
			while (weight[parent[v]] <= weight[v]){
				size[parent[v]] -= size[v];
				parent[v] = parent[parent[v]];
			}
			v = parent[v];
		}
		return v;
	}
    
	void disconnect(int v){
		if(parent[v] == v)
			return;
		disconnect(parent[v]);
		size[parent[v]] -= size[v];
	}
    
	int connect(int v, int w = INF - 1){
		while(weight[v] <= w){
			size[parent[v]] += size[v];
			v = parent[v];
		}
		return v;
	}
    
	void addEdgeHelper(int u, int v, int w) {
		disconnect(u);
		disconnect(v);
		while (u != v) {
			u = connect(u, w);
			v = connect(v, w);
			if (index[u] < index[v])
				swap(u, v);
			int temp_weight = weight[v], temp_parent = parent[v];
			parent[v] = u;
			weight[v] = w;
			u = temp_parent;
			w = temp_weight;
		}
		connect(v);
	}
    
	int mainEdge(int u, int v) {
		if (getRoot(u) != getRoot(v))
			return -1;
		while (parent[u] != v && parent[v] != u) {
			if (weight[u] < weight[v])
				u = parent[u];
			else
				v = parent[v];
		}
		if (parent[u] == v)
			return u;
		else
			return v;
	}
    
	void addEdge(int u, int v, int w) {
		if (u == v)
			return;
		int p = mainEdge(u, v);
		if (p == -1)
			addEdgeHelper(u, v, w);
		else if (weight[p] > w) {
			int p1 = p;
			while(parent[p1] != p1){
				p1 = parent[p1];
				size[p1] -= size[p];
			}
			parent[p] = p;
			weight[p] = INF;
			addEdgeHelper(u, v, w);
		}
	}
};

Problems

Offline Dynamic Connectivity

solution

codeforces 1423 — problem H

solution

BOI 2020 Joker

solution

codeforces 76 — problem A

solution

codeforces 603 — problem E

solution

Comment if you can think of more uses of this DS

Comments (3)

Write comment?

Berryisbetter

9 months ago, # |

Cool!

→ Reply

OtterZ

7 weeks ago, # |

Not only the solution,there are already other $$$O(n\log n)$$$ solution:

use vector to store queries including nodes in the set and use heristic merging to solve it offline.
while merging,make a new node and make its children be the node of the two set,known as kruskal reconstructed tree.which can solve the text online.

not_amir

7 weeks ago, # ^ |

I am not sure what algorithm you are exactly talking about but it is possible to solve the mentioned problem using offline techniques (I cannot think of an easy offline algorithm that has O(n*log(n)) complexity), but the main usage of the DS is when using it online. Most of the problems I mentioned need the algorithm to be online and since writing the post I encountered a lot of problems which are solvable using the DS and all need the DS to be online(One big application of the DS is doing undo priority queue in O(log(n)) on dsu).
You are correct to note that what I described has simmilarities with krusakl reconstruction tree, espically the warmup problem (which krusakl reconstruction tree can solve), but the rest of what I dsecribed cannot be done with krusakl reconstrucion tree since it doesnt support edge insertions. I highly advise you to read the post again in order to understand what my DS supports.

not_amir's blog

Warmup problem

The data structure

Time complexity

Additional operations

Problems