[Educational] DSU Study Notes (1)

Revision en1, by Black_Fate, 2023-01-18 06:43:03

Hello Codeforces!

Data Structure is a legendary subject that it's really hard for me write it down completely in a single blog, so let's discuss simple DSU first.

DSU is really easy for beginners to understand. Also, this blog talks about more than the simplest DSU.

Part 0 — How does DSU works?

Firstly, DSU is short for "Disjoint Set Union". Let's go to a simple task then.

There are $$$n$$$ elements, the $$$i$$$-th element is in the $$$i$$$-th set initially, and now you can do $$$m$$$ queries below: — M a b, to merge the set that $$$a$$$ belongs to and the set that $$$b$$$ belongs to into a new set. If $$$a$$$ and $$$b$$$'ve been in a set already, then do nothing. — Q a b, to tell if $$$a$$$ and $$$b$$$ are in the same set.

Let's go straight to the problem solution.

Solution:

Let's consider contructing a graph. Initially we have self-cycles for all vertecies, which means there are edges $$$i\to i$$$, and $$$\text{root} (i)=i$$$.

![](https://cdn.luogu.com.cn/upload/image_hosting/efo0j48x.png)

The graph above is the initial state of the graph.

Then if you do M a b operation, then we can match $$$\text{root} (a)$$$ with $$$\text{root} (b)$$$. Which means we can making a new edge $$$\text{root}(a)\to \text{root}(b)$$$.

![](https://cdn.luogu.com.cn/upload/image_hosting/l9dlz6p9.png)

The graph above is the state of the graph at some point, if we do M 1 3, then since the root of $$$1$$$ is $$$2$$$, the root of $$$3$$$ is $$$5$$$, so:

![](https://cdn.luogu.com.cn/upload/image_hosting/09yl8qiv.png)

The graph above is the state after M 1 3.

Note that the direction of the edges is not important.

Now we can write out a code to implement the algorithm above.

const int N = 1000005;
int fa[N];
void init(int n) {
    for(int i = 1; i <= n; i++) fa[i] = i;
} // the initial situation
int find(int x) {
    if(fa[x] == x) return x;
    return find(fa[x]);
} // get root(x)
bool check(int x, int y) {
    int fx = find(x), fy = find(y);
    return fx == fy;
} // query
void merge(int x, int y) {
    if(check(x, y)) return ;
    fa[find(x)] = find(y);
} // merge and make edge

Note that sometimes you may find that the algorithm is too slow to solve the problems with too large $$$n$$$. For example in an extreme case, that we do M 1 2, and then M 2 3, and then M 3 4, and so on. In this case, finally when you are doing the queries, the graph (or the forest) is degenerated into a list. In this case, if we are querying the bottom of the list, we will get the time complexity of $$$\mathcal O(n^2)$$$.

How to solve this problem in a better way? Let's consider making the traveling while we are find the root of the tree shorter, just point $$$i$$$ direct to $$$\text{root}(i)$$$, then the next time we find the root of the tree, we can find it in $$$\mathcal O(1)$$$ time.

The better find function:

int find(int x) {
    if(fa[x] == x) return x;
    return fa[x] = find(fa[x]);
} // get root(x)

Pay attention to the sentence above: fa[x] = find(fa[x]) returns the value of find(fa[x]) and it makes fa[x] become find(fa[x]). The first usage of it is to travel, the second is to note the root of the vertex, and finally to make the next queries of the same vertex $$$\mathcal O(1)$$$. After that, the graph becomes:

![](https://cdn.luogu.com.cn/upload/image_hosting/0sx94wep.png)

Which looks like a flower:

![](https://img1.baidu.com/it/u=239210027,1724697693&fm=253&fmt=auto&app=138&f=JPEG?w=280&h=186)

Picture taken from the Chinese searching website baidu.com.

So we can call it "a flower graph". Querying the root of the vertex in a flower graph takes $$$\mathcal O(1)$$$ time.

Part 1: Problems solved using dsu

ABC284C

For each edge x y, just do the operation merge(x, y). Then find all the roots of the vertecies, the count of the roots is the answer.

Part of my code:

int fa[100005];
int find(int x) { return fa[x] == x ? x : fa[x] = find(fa[x]); }
void merge(int x, int y) { fa[find(x)] = find(y); }
//----------------------------------------------------------------
namespace solution{int solve() {
    int n = read<int>(), m=  read<int>();
    for(int i = 1; i <= n; i++) fa[i] = i;
    for(int i = 1; i <= m;i ++) {
        int u = read<int>(), v = read<int>();
        merge(u, v);
    }
    set<int> s; //This is not really necessary.
    for(int i = 1; i <= n; i++) s.insert(find(i));
    write<int>(s.size(), '\n');
    return 0;
}}

Since $$$\text{root}(i)\in [1,n]$$$, then there is no need to use std::set to count the number of the roots, just use an array to count the number of thems is enough.

Extensions for ABC284C

If we are querying the answer of ABC284C when constructing the graph, call the answer cnt, then you can maintain the variable cnt. When merging the vertices, if the vertices've been in the same set already, then cnt remains unchanged. Otherwise, cnt = cnt - 1, since $$$2$$$ sets became $$$1$$$ set.

void merge(int x, int y) {
    if(check(x, y)) return ;
    fa[find(x)] = find(y), cnt--;
} // merge and make edge

Note that the initial value of cnt is the number of vertices, a.k.a. $$$n$$$.

USACO 2010 OCT Silver — Lake Counting

You are given a map of the farm. . stands for land, and W stands for water. Farmer John would like to figure out how many ponds have formed in his field. A pond is a connected set of squares with water in them, where a square is considered adjacent to all eight of its neighbors. Given a diagram of Farmer John's field, determine how many ponds he has.

For example, the map below:

10 12
W........WW.
.WWW.....WWW
....WW...WW.
.........WW.
.........W..
..W......W..
.W.W.....WW.
W.W.W.....W.
.W.W......W.
..W.......W.

Turns out that there are $$$3$$$ ponds.

Solution:

This task seems to have nothing to do with dsu. Then consider how to construct the graph.

If two W squares are adajacent to each other, then you can merge them. After these merging operations are done, we can find out that the answer of ABC284C is the answer we need. Note that you do not need to count the number of elements with ..

You can labelize the squares.

Code

Spoiler

Practice:

Try to turn the 2D problem into a 3D problem.

Code:

Spoiler

Part 2 — Time complexity of dsu algorithm

We usually call the time complexity of the algorithm $$$\mathcal O(\alpha (n))$$$ when there are $$$n$$$ elements to deal with.

What is $$$\alpha (n)$$$? We know that there is a famous function called ackman function, which is:

$$$A_{k}(j)=\left\{\begin{array}{ll} j+1 & k=0 \\ A_{k-1}^{(j+1)}(j) & k \geq 1 \end{array}\right.$$$

Part 3 — Another example on dsu

Let's think about a question: dsu can handle many different merging methods. What about splitting and deleting vertices? It's shown that dsu cannot handle splitting sets of vertices in a efficient time complexity. Let's see a simple example below.

Provincial selection contest in China, 2008 — Vertex deletion

Given a graph of vertices $$$1,2,3,\dots, n$$$. Then there are $$$q$$$ following operations:

  • Remove a vertex $$$x$$$ from the graph.

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en4 English Black_Fate 2023-01-19 07:59:44 1601
en3 English Black_Fate 2023-01-18 16:23:27 22
en2 English Black_Fate 2023-01-18 16:22:04 5783 (published)
en1 English Black_Fate 2023-01-18 06:43:03 11464 Initial revision (saved to drafts)