de_sousa's blog

By de_sousa, 6 months ago, In English

Motivation

I remember struggling to understand the Tarjan algorithm for Strongly Connected Components. The algorithm feels intuitive to me now and I no longer feel this struggle, but what I do still remember is feeling the lack of some good online resource to guide me through it. This is my attempt to write the article I wish I had read.

Directed DFS tree

Before we get to the problem, we have to first understand the most important concept: directed DFS trees. Understanding them will allow for the Tarjan algorithm to feel like a natural extension (and this article is actually about them, not about Tarjan).

A directed DFS tree is an explicit representation of a DFS traversal of a directed graph.

When a node is visited for the first time, it is added to the tree, either as a root if it was visited in the beginning of a DFS call, or as the child of the node it got visited from.

The edges that were used to visit a node for the first time will be considered as part of the tree and be called tree-edges.

Every other edge of the graph will be described in relation to this tree:

  • a forward-edge is an edge that connects a node to a descendant.
  • a back-edge is an edge that connects a node to an ancestor.
  • a cross-edge is any other edge.

Here is an example of the construction of a tree, with every kind of edge:

(Technically we are building a forest, since we can have multiple roots, but eh :P)

When we start a DFS call on a node, every node added to the tree before the call returns will be a descendant of that node.

A DFS call only finishes after the recursive call on all the children finishes. So every node on the subtree will be visited. Since we only call the DFS on unvisited nodes, we won't call again on any node on the subtree, so no new descendants will be added later.

Strongly connected components

A strongly connected component (SCC) of a graph is a maximal subset of nodes where every node reaches each other.

Every directed graph can be partitioned into SCCs.

Some properties related to DFS tree:

1. If a node and a descendant belong to the same SCC, all the nodes on the path between them also belong to the same SCC.

Explanation

2. Every SCC induces a connected subgraph of the DFS tree.

Explanation

This introduces the idea of a representative of a SCC. A representative of a SCC is the node of the SCC with lowest depth on the DFS tree; this node will be a common ancestor of all the nodes of the SCC.

Algorithm sketch

We have enough to sketch an algorithm to find the SCCs of a graph.

The idea is to do a DFS traversal of the graph keeping in mind the tree structure, and adding SCCs as we find their representatives.

We know a node isn't a representative if some node on its subtree reaches a node above it. So based on the edges we find, we send some signal indicating "the representative of this node is above you". If a node doesn't receive this signal, it is a representative.

When we find a represetantive, we will identify the nodes of its SCC and claim them as part of it. We know that every other SCC representative on its subtree was already found and claimed its own nodes. So we will claim every unclaimed node on its subtree.

Stack simplification

We could use a second function to claim these nodes, like so:

Code

To avoid referring explicitly to the tree, we can use a stack to keep track of the unclaimed nodes still on the subtree:

Code

Algorithm, first version

For the first version of the algorithm, we will explicitly build the tree.

The signal we will use will be the depth of the representative.

For every edge we find, we can derive some information on the depth of the representative:

  • forward-edges can be ignored, since every node reachable by a forward-edge can be reached by a sequence of tree-edges.
  • tree-edges will just be used to propagate the signal. If we know a child of a node is not a representantive, we know they will be on the same SCC, and the minimum depth of a node reachable by the child can be used to update the node's signal.
  • back-edges: If we have a back-edge from node $$$x$$$ to $$$y$$$, then we know that $$$x$$$ and $$$y$$$ belong to the same SCC, and that the depth of the representative of their SCC has to be at most the depth of $$$y$$$.
  • cross-edges: If we have a cross-edge connecting node $$$x$$$ to node $$$y$$$, two things can happen:
    • Node $$$y$$$ was already claimed. If this is the case, we know that the representative of node $$$y$$$ was already found, which means node $$$x$$$ and node $$$y$$$ don't belong to the same SCC.
    • Node $$$y$$$ is unclaimed. This means that no representative for $$$y$$$ was found, and the recursive call on every ancestor not in common with $$$x$$$ finished, continued on their lowest common ancestor, and after a sequence of recursive calls, reached node $$$x$$$. What is the conclusion? Since the representative of $$$y$$$ is at least as low as the lowest common ancestor, node $$$y$$$ is in the same SCC as the lowest common ancestor; the lowest common ancestor reaches $$$x$$$; $$$x$$$ reaches $$$y$$$. They all belong to the same SCC. We have to signal that the depth of the representative of $$$x$$$ is at most the depth of the lowest common ancestor.
Code

DFS order

We must now introduce the concept of DFS order.

We will call the DFS order of a node the order in which it got visited by the DFS.

We can make some observations relating to the corresponding DFS tree:

1. Ancestors have a lower DFS order than descendants.

Explanation

(Note: this directly implies that back-edges connect nodes to a node with a lower DFS order and that forward-edges connect nodes to a node with greater DFS order.)

2. The DFS order of all the nodes of a subtree form a contiguous range.

Explanation

3. Cross-edges connect nodes to nodes with a lower DFS order.

Explanation

Tarjan, at last

We can now use the concept of DFS order to build the final version of the algorithm.

The signal we will use will be the DFS order of the representative.

The reason why we previously had to explicitly build the tree was to calculate the depth of lowest common ancestors.

But we can do better if our signal is the DFS order: on cross-edges connecting nodes $$$x$$$ to $$$y$$$, all the ancestors of $$$x$$$ that are not ancestors of $$$y$$$ were visited after $$$y$$$, except all their common ancestors, which are the only possibilities for their representatives.

So, in short:

  • tree-edges and forward-edges are used similarly.
  • back-edges connecting node $$$x$$$ to node $$$y$$$, we know that the representative wasn't inserted after $$$y$$$.
  • cross-edges connecting node $$$x$$$ to node $$$y$$$ where $$$y$$$ is unclaimed, we know that the representative was inserted before $$$y$$$.

At last, we have the code:

Code
  • Vote: I like it
  • +199
  • Vote: I do not like it

»
5 months ago, # |
  Vote: I like it +18 Vote: I do not like it

Auto comment: topic has been updated by de_sousa (previous revision, new revision, compare).

»
5 months ago, # |
  Vote: I like it +16 Vote: I do not like it

Wonderful blog!

Just curious, what app did you use to make the drawings? they're really beautiful!

»
5 months ago, # |
Rev. 2   Vote: I like it 0 Vote: I do not like it

fun fact: if you start code golfing tarjans's scc algo you will achive code almost the same as code for finding two-edge-connected components

»
5 months ago, # |
  Vote: I like it +32 Vote: I do not like it

Good blog. Finally, someone writes up the blog I've been procrastinating on for the last 4 years...

  • »
    »
    5 months ago, # ^ |
      Vote: I like it +16 Vote: I do not like it

    Same here — I was going to write a tutorial on low-link with the dominator tree as one of the examples, but this blog kinda makes that redundant.

»
5 months ago, # |
  Vote: I like it +16 Vote: I do not like it

Nice blog! There's one more thing I want to know is how can we get the topo order in Tarjan?

  • »
    »
    5 months ago, # ^ |
      Vote: I like it +8 Vote: I do not like it

    You can just push the representatives into a list/vector when you find them and that gives you a reversed topological order.

    For the explanation, what are the properties we would want?

    We know that each node reaches all other nodes on their respective subtree. This means that all representatives reach all the other representatives on their subtree, meaning the respective SCC has to appear before the others' on the topological order. The representatives on the subtree were found first and therefore were already pushed into the said list/vector.

    The only thing missing is the reachability through cross-edges, and all the representatives of nodes reachable in this way were already found, and therefore were already pushed into the list/vector.

    If you want to submit code, you can do it here: https://judge.yosupo.jp/problem/scc

    If you want an implementation, here is mine in C++: https://judge.yosupo.jp/submission/224416