02/05/2021

[Java] Kosaraju's algorithm for finding strongly connected components in a graph

In a directed graph (not necessarily fully connected), a strongly connected component is a set of vertices whose edges allow each vertex in the set to reach all others.

A single node is always a SCC.

For example in the following graph:


       2             6 ->  7  
       ^ \           ^     /|  
      /   \          |    / |  
      /    \         |   /  |  
     /      v        |  v   v  
     1 <-  3 -> 4 -> 5 ->  8  

We have 4 SCCs:

  • 1,2,3
  • 4,
  • 5,6,7
  • 8

Kosaraju's algorithm gives us a O(V + E) solution provided we are using adjacency lists and our graph is represented by the list of vertices AND the list of edges.

The idea behind it relies on the following properties:

(1) inverting the direction of all edges in a graph yields the same graph in terms of connections:

 A -> B -> C  

is the same as:

 A <- B <- C  

And more interestingly:

 A <=> B  

remains the same, meaning that if two components can reach each other via some path, inverting the direction of all edges will not break this property.

(2) SCCs can be squashed and the overall information is preserved:

 A -> B  F -> G  
 ^    |   ^    |  
 |    v   |    v  
 D <- C -> E <- H  

If we squash the two subgraphs saying SCC1 = A,B,C,D and SCC2 = E,F,G,H we obtain:

 SCC1 -> SCC2  

which conveys the same high level information

We can then observe that only if a component is strongly connected we can visit all its nodes by traversing the graph in both directions, otherwise we would find a connection that no longer allows us to reach an already visited node.

Now the algorithm takes shape:

  1. perform a first DFS visit or BFS post-order visit of the graph, adding nodes to a stack. In the DFS case, if we add nodes to the stack just before returning from the call on that node, we obtain the desired ordering. If the graph is not fully connected, you can simply loop over all nodes in the graph and perform a DFS from there, tracking the visited ones. Whenever you would start a new DFS, verify first if that node wasn't already visited. Cost is O(V+E) since we skip already visited nodes
  2. reverse all edges in the graph. This is easy if you are tracking the edges in a separate structure, a bit more annoying if you are using adjacency lists. Combine both and you get the best result. Cost is O(V+E) since we need to reverse the edges in O(E) from the list of edges, then update the nodes with the new adjacency lists in O(V+E)
  3. for each node in the stack, perform a DFS, marking each visited node AND tracking them in two separate structures. Whenever the DFS terminates, all tracked nodes are part of one SCC. Both during the stack iteration and the DFS visit, if a visited node is encountered, it must be skipped. Cost is again O(V+E) since we skip already visited nodes

At the end you will have a collection of SCCs which represent all SCCs in the graph. Total additional space used is O(V) for the stack, set of visited nodes, and list of SCCs.

You can find my implementation of findStronglyConnectedComponentsKosaraju on my Gist along with some tests in KosarajuJTests. Additional structures used are InOutEdge, OutEdge, and Vertex. Since they are reused from previous exercises, you will see they contain information that is not necessary for this specific case, however I preferred not to duplicate everything all over again.

You can also check this tutorial video for a step by step explanation.

No comments:

Post a Comment

With great power comes great responsibility