Introduction to Disjoint Set Data Structure | DSU

Revision en1, by kartik8800, 2023-09-13 22:28:36

Introduction to Disjoint Set Data Strucutres

Hello Codeforces!

I recently read about Disjoint Sets from the book Introduction to Algorithms and wanted to share the learnings in a simplified manner.

So below article(and corresponding videos) is an attempt to create a good starting point for people who:

  1. want to learn about DSU
  2. practice some good DSU problems

Contents

  1. Introduction through an illustrative problem (video version: https://youtu.be/JDycPHW4kIs?si=WEp5Ft2jBWp9KnO2)
  2. Sample SPOJ Problem and Implementation (video version: https://youtu.be/O4w-aX5mSks?si=vr0XSbyUswcXx-Yv)
  3. Upcoming Practice Problems
  4. Some Applications and References

We will learn about disjoint set data structures and their operations union and find through a sample problem.

Illustrative Problem

Q: Given a undirected Graph(V, E), answer Q queries of the form (u, v) where answer to the query is True if there is a path from u to v, false otherwise. image

So for above graph:

  1. query(D,A) = true
  2. query(D,C) = true
  3. query(D,F) = false

Some solutions to above problem

Solution1:

  1. For each query (u, v), perform a BFS/DFS.
  2. Time Complexity will be O(Q*(V + E))
  3. In worst case graph can have order of V^2 Edges.
  4. So worst case complexity can be O(Q*V^2)

Solution2:

  1. perform 1 BFS/DFS per node of the Graph
  2. When BFS is done using node X, store all the nodes that can be visited from X.
  3. Per Query time is O(1) so overall O(Q)
  4. Preprocessing time is O(V * (V + E)) = O(V^3)
  5. Hence overall O(Q + V^3)

What is a Disjoint Set Data Structure?

  1. It is a collection of disjoint dynamic sets. D = {S1, S2, S3, …………, Sk}
  2. Each set has a Representative R and consists of some elements.
  3. Assume total elements is N: Size(S1) + Size(S2) + … + Size(Sk) = N

A disjoint set structure supports:

  1. MAKE-SET(X): Creates a new Set with only element X and representative X.
  2. FIND(X): Returns the representative of the set to which X belongs.
  3. UNION(X, Y): Unites the sets containing the elements X and Y into a single new Set. The representative of this set is usually either the representative of Sx or representative of Sy

Diagramatic Example of a disjoint set with total 8 elements and 3 sets:

image

From above:

  1. Find(H) = G
  2. Find(F) = F
  3. Find(B) =Find(E) = A

[Assume that A, G and F are representative elements their sets]

Using Disjoint Set DS for solving the problem

  1. Run MAKE-SET(u) for each of the V nodes in the graph.
  2. Run UNION(u,v) for each edge (u,v) in the graph.
  3. For each Query (u,v): a) If FIND(u) == FIND(v) then answer = true b) Else answer = false

Running 1. and 2. on sample graph constructs the Disjoint set data structure shown in diagram.

Time complexity for DSU solution

Overall Complexity is sum of:

  1. O(V * MAKE-SET)
  2. O(E * UNION) = O(V^2 * UNION)
  3. O(Q * FIND)

Disjoint Set — Linked List Implementation

image

  1. Each set is represented as a link list.
  2. The set has HEAD pointer to representative element and also a TAIL pointer.
  3. Each element of the set has a back-pointer to the set.

Complexity Analysis for link list implementation

  1. Make Set is O(1) -> only need to create a new set with 1 element
  2. Find Set is O(1) -> thanks to back pointers
  3. Union is length of the longer set -> no-thanks to back pointers(all of 2nd set element back-pointers need to be updated to 1st set)

Note: For a total of N elements in the collection there will be at most N-1 union operations as post that all elements will be in the same set.

Worst Case cost of Union is when:

  1. All sets have size 1.
  2. 1st union we unite two sets of size 1 and get a set of size 2 -> cost is 1 back pointer change.
  3. 2nd time we unite a set of size 1 with a set of size 2 -> cost is 2 back pointer change.
  4. ith time we unite a set of size 1 with a set of size i -> cost is i back pointer change.
  5. Overall cost over n-1 union operations is 1 + 2 + 3 + .. + n-1 = O(N^2)

Hence union is still O(N) in the worst case.

Weighted Union Heuristic for link list Implementation

While performing union(x,y):

  1. Always take smaller set and attach it the larger set.
  2. Need to maintain size of set for each set(which should be easy)

Complexity analysis: Union is now O(logN), but why?

  1. The cost of a union operation is the cost of changing back pointers of the elements in the smaller set.
  2. Say we change the back pointer of an Element X belonging to Sx, the resulting set will have at least 2 * Sx elements.(since X belong to smaller set and hence it's backpointer was updated)
  3. If back pointer of X is changed K times there need to be >= (2^K) * Sx elements
  4. K can be at most log(N) as we only have N elements.
  5. hence for a given element we can change the back-pointer at most logN times and overall cost <= NlogN

Revisiting the sample problem

Worst Case complexity of Graph problem has now Improved :)

  1. O(V * MAKE-SET) = O(V)
  2. O(E * UNION) = O(V^2 * UNION) = O(V^2 * logV) = O(V^2 * logV)
  3. O(Q * FIND) = O(Q)

So O(V^2 * logV) instead of O(V^3)

Disjoint Set — Forest Implementation

image

  1. Each set is represented as a tree.
  2. Each element is a node of the tree and maintains a pointer to it's parent in the tree.
  3. The representative element is the parent of itself.

Find(X) = X if parent[X] = X else Find(X) = Find(parent[X])

Forest Implementation — Time Complexities

We may still end up getting a chain :(

Worst case complexities:

  1. UNION is O(1) * O(FIND) in worst case(only need to change parent pointer of one representative to another, problem is finding the representative using FIND)
  2. MAKE SET is O(1) in worst case(only need to create a set with 1 element which is it's own parent)
  3. FIND however is O(N) in the worst case(we may end up getting a link list)

Time Complexities with Heuristics

Heuristic: Union by Rank

While performing union always take the Set(tree) with less height and attach it to the set with greater height.

  1. Overall height after N-1 union will be order of LogN
  2. Hence ensuring Find is no worse than LogN

Heuristic: Path Compression When performing find operation, change the parent pointer of each node to the actual representative of the node. image

The time complexity when applying both heuristics together is:

  1. Make Set is O(1)
  2. Find Set is O(alpha(n))
  3. Union is amortised O(alpha(n))

What is alpha(n)?

  1. Where alpha is the inverse of Ackerman function Ak(1)
  2. Alpha(n) <= 4 for all N <= 16^512
  3. 16^512 >> 10^80
  4. 10^80 is the number of atoms in observable universe

Hence for all practical purposes alpha(n) = 4 = constant.

Proof is harder and omitted from scope of this article, refer Introduction To Algorithms by Thomas H. Cormen

Revisiting the sample problem

  1. Make Set is O(1)
  2. Find Set is O(1)
  3. Union is amortised O(alpha(n))

Worst Case complexity of Graph problem has now Improved :)

  1. O(V * MAKE-SET) = O(V)
  2. O(E * UNION) = O(V^2 * UNION) = O(V^2 * logV) = O(V^2 * alpha(V))
  3. O(Q * FIND) = O(Q)

Hence time complexity is now O(V^2 + Q) for all practical purposes.

SPOJ Problem — FRNDCIRC + Generic Implementation

Editorial

Upcoming Practice Problems

Currently I have planned the problem https://codeforces.net/problemset/problem/150/B and will be soon adding both a written and video editorial for the same.

Few other practice problems include: https://codeforces.net/blog/entry/55219?#comment-390897 (DSU tag). I will be using some of these to create more editorials.

If you have more suggestions please add in comments.

Applications and References

Some direct applications:

  1. Finding cycles in a graph
  2. Kruskals minimum spanning tree algorithm

Some references:

  1. https://www.youtube.com/@AlgosWithKartik
  2. Introduction to Algorithms Book
  3. CP algorithms: https://cp-algorithms.com/data_structures/disjoint_set_union.html

History

 
 
 
 
Revisions
 
 
  Rev. Lang. By When Δ Comment
en2 English kartik8800 2023-09-18 23:31:55 175 Tiny change: '1)$\n2. $\Alpha(n) <=' -> '1)$\n2. $\alpha(n) <='
en1 English kartik8800 2023-09-13 22:28:36 9218 Initial revision (published)