Strongly Connected Component

strongly-connected-component: The Ultimate Cheatsheet

1. Detailed Explanation

What is strongly-connected-component?

A strongly connected component (SCC) in a directed graph is a subgraph in which every vertex is reachable from every other vertex within the subgraph. In other words, for any two vertices ‘u’ and ‘v’ in an SCC, there exists a path from ‘u’ to ‘v’ and a path from ‘v’ to ‘u’.

Why is it important and what kind of problems does it solve?

SCCs are important because they help analyze the structure of directed graphs. Identifying SCCs can simplify complex graphs into smaller, more manageable components. This is useful in various applications, including:

Network analysis: Identifying clusters of highly interconnected nodes.
Compiler design: Analyzing dependencies between modules.
Web crawling: Understanding the link structure of websites.
Social network analysis: Finding groups of users with strong mutual connections.
Finding cycle dependencies: In dependency graphs, SCCs indicate circular dependencies.

Core concepts, underlying principles, and key terminology.

Directed Graph: A graph where edges have a direction, meaning the edge (u, v) is distinct from the edge (v, u).
Reachability: A vertex ‘v’ is reachable from vertex ‘u’ if there exists a path from ‘u’ to ‘v’.
Strongly Connected: Two vertices ‘u’ and ‘v’ are strongly connected if ‘v’ is reachable from ‘u’ and ‘u’ is reachable from ‘v’.
Strongly Connected Component (SCC): A maximal set of vertices in a directed graph such that for every pair of vertices ‘u’ and ‘v’ in the set, ‘u’ and ‘v’ are strongly connected. “Maximal” means that no other vertex can be added to the set without violating the strongly connected property.
Condensed Graph: A graph formed by contracting each SCC into a single vertex. The resulting graph is always a directed acyclic graph (DAG).
Tarjan’s Algorithm: A common and efficient algorithm for finding SCCs based on Depth-First Search (DFS).
Kosaraju’s Algorithm: Another algorithm for finding SCCs that uses two DFS traversals.
Index (or Discovery Time): In Tarjan’s algorithm, the order in which a node is visited during the DFS traversal.
Lowlink Value: In Tarjan’s algorithm, the lowest index of any node reachable from a given node through the DFS tree, including the node itself.

2. When to Use strongly-connected-component (and When Not To)

Identify problem patterns that suggest strongly-connected-component is a good fit.

Directed graphs with cycles: If the problem involves analyzing relationships or dependencies in a directed graph that may contain cycles, SCCs are likely relevant.
Finding groups of interconnected nodes: When the goal is to identify clusters or groups of nodes that are strongly related to each other.
Simplifying dependency graphs: When analyzing complex dependency relationships, SCCs can help reduce the graph’s complexity.
Problems involving reachability in directed graphs: If the problem requires determining if nodes are reachable from each other in a cyclical manner.

Discuss scenarios where a different data structure or algorithm would be more appropriate.

Undirected graphs: If the graph is undirected, connected components can be found using simpler algorithms like DFS or BFS. SCCs are specifically for directed graphs.
Acyclic graphs: If the graph is known to be acyclic (a DAG), topological sorting and other DAG-specific algorithms are often more efficient. SCC algorithms will still work, but are overkill.
Simple reachability queries in non-cyclical graphs: If you only need to check if one node is reachable from another in a non-cyclical directed graph (or undirected graph), a simple DFS or BFS starting from the source node would be sufficient. No need for SCC.
Problems focused on shortest paths: If the primary goal is to find the shortest path between two nodes, algorithms like Dijkstra’s or Bellman-Ford are more suitable (although SCC may still be useful as a preprocessing step to simplify the graph).

3. Core Algorithm / Data Structure Template

Here’s a pseudo-code template for Tarjan’s algorithm, a common method for finding SCCs:

function tarjanSCC(graph):
  index = 0 // Global counter for assigning indices
  stack = [] // Stack to keep track of nodes currently in the recursion stack
  indices = {} // Map node -> index
  lowlinks = {} // Map node -> lowlink value
  onStack = {} // Map node -> boolean (true if node is on stack)
  sccs = [] // List to store the identified SCCs

  function strongConnect(node):
    indices[node] = index
    lowlinks[node] = index
    index = index + 1
    stack.append(node)
    onStack[node] = true

    for neighbor in graph.neighbors(node):
      if node is not in indices: // Neighbor not yet visited
        strongConnect(neighbor)
        lowlinks[node] = min(lowlinks[node], lowlinks[neighbor])
      else if onStack[neighbor]: // Neighbor is on stack (back edge)
        lowlinks[node] = min(lowlinks[node], indices[neighbor])

    if lowlinks[node] == indices[node]:
      // Root of an SCC found
      scc = []
      while True:
        w = stack.pop()
        onStack[w] = false
        scc.append(w)
        if w == node:
          break
      sccs.append(scc)

  for node in graph.nodes:
    if node not in indices:
      strongConnect(node)

  return sccs

4. Code Implementations (Python, Java, C++)

Python

class Graph:
    def __init__(self, vertices):
        self.V = vertices
        self.graph = [[] for _ in range(vertices)]

    def add_edge(self, u, v):
        self.graph[u].append(v)

    def tarjan_scc(self):
        index = 0
        stack = []
        indices = [-1] * self.V  # -1 indicates not visited
        lowlinks = [-1] * self.V
        on_stack = [False] * self.V
        sccs = []

        def strong_connect(node):
            nonlocal index
            indices[node] = index
            lowlinks[node] = index
            index += 1
            stack.append(node)
            on_stack[node] = True

            for neighbor in self.graph[node]:
                if indices[neighbor] == -1:  # Not visited
                    strong_connect(neighbor)
                    lowlinks[node] = min(lowlinks[node], lowlinks[neighbor])
                elif on_stack[neighbor]:  # On stack (back edge)
                    lowlinks[node] = min(lowlinks[node], indices[neighbor])

            if lowlinks[node] == indices[node]:
                # Root of an SCC found
                scc = []
                while True:
                    w = stack.pop()
                    on_stack[w] = False
                    scc.append(w)
                    if w == node:
                        break
                sccs.append(scc)

        for node in range(self.V):
            if indices[node] == -1:
                strong_connect(node)

        return sccs

# Example usage:
if __name__ == "__main__":
    g = Graph(5)
    g.add_edge(0, 1)
    g.add_edge(1, 2)
    g.add_edge(2, 0)
    g.add_edge(1, 3)
    g.add_edge(3, 4)

    sccs = g.tarjan_scc()
    print("Strongly Connected Components:")
    for scc in sccs:
        print(scc)

Java

import java.util.*;

class Graph {
    private int V;
    private List<List<Integer>> graph;

    public Graph(int vertices) {
        V = vertices;
        graph = new ArrayList<>(V);
        for (int i = 0; i < V; i++) {
            graph.add(new ArrayList<>());
        }
    }

    public void addEdge(int u, int v) {
        graph.get(u).add(v);
    }

    public List<List<Integer>> tarjanSCC() {
        int index = 0;
        Stack<Integer> stack = new Stack<>();
        int[] indices = new int[V];
        int[] lowlinks = new int[V];
        boolean[] onStack = new boolean[V];
        List<List<Integer>> sccs = new ArrayList<>();

        Arrays.fill(indices, -1);
        Arrays.fill(lowlinks, -1);
        Arrays.fill(onStack, false);

        for (int node = 0; node < V; node++) {
            if (indices[node] == -1) {
                index = strongConnect(node, index, stack, indices, lowlinks, onStack, sccs);
            }
        }

        return sccs;
    }

    private int strongConnect(int node, int index, Stack<Integer> stack, int[] indices, int[] lowlinks, boolean[] onStack, List<List<Integer>> sccs) {
        indices[node] = index;
        lowlinks[node] = index;
        index++;
        stack.push(node);
        onStack[node] = true;

        for (int neighbor : graph.get(node)) {
            if (indices[neighbor] == -1) {
                index = strongConnect(neighbor, index, stack, indices, lowlinks, onStack, sccs);
                lowlinks[node] = Math.min(lowlinks[node], lowlinks[neighbor]);
            } else if (onStack[neighbor]) {
                lowlinks[node] = Math.min(lowlinks[node], indices[neighbor]);
            }
        }

        if (lowlinks[node] == indices[node]) {
            List<Integer> scc = new ArrayList<>();
            int w;
            do {
                w = stack.pop();
                onStack[w] = false;
                scc.add(w);
            } while (w != node);
            sccs.add(scc);
        }
        return index;
    }

    public static void main(String[] args) {
        Graph g = new Graph(5);
        g.addEdge(0, 1);
        g.addEdge(1, 2);
        g.addEdge(2, 0);
        g.addEdge(1, 3);
        g.addEdge(3, 4);

        List<List<Integer>> sccs = g.tarjanSCC();
        System.out.println("Strongly Connected Components:");
        for (List<Integer> scc : sccs) {
            System.out.println(scc);
        }
    }
}

C++

#include <iostream>
#include <vector>
#include <stack>

using namespace std;

class Graph {
private:
    int V;
    vector<vector<int>> graph;

public:
    Graph(int vertices) : V(vertices), graph(vertices) {}

    void addEdge(int u, int v) {
        graph[u].push_back(v);
    }

    vector<vector<int>> tarjanSCC() {
        int index = 0;
        stack<int> stack;
        vector<int> indices(V, -1); // -1 indicates not visited
        vector<int> lowlinks(V, -1);
        vector<bool> onStack(V, false);
        vector<vector<int>> sccs;

        function<void(int)> strongConnect = [&](int node) {
            indices[node] = index;
            lowlinks[node] = index;
            index++;
            stack.push(node);
            onStack[node] = true;

            for (int neighbor : graph[node]) {
                if (indices[neighbor] == -1) { // Not visited
                    strongConnect(neighbor);
                    lowlinks[node] = min(lowlinks[node], lowlinks[neighbor]);
                } else if (onStack[neighbor]) { // On stack (back edge)
                    lowlinks[node] = min(lowlinks[node], indices[neighbor]);
                }
            }

            if (lowlinks[node] == indices[node]) {
                // Root of an SCC found
                vector<int> scc;
                int w;
                do {
                    w = stack.top();
                    stack.pop();
                    onStack[w] = false;
                    scc.push_back(w);
                } while (w != node);
                sccs.push_back(scc);
            }
        };

        for (int node = 0; node < V; node++) {
            if (indices[node] == -1) {
                strongConnect(node);
            }
        }

        return sccs;
    }
};

int main() {
    Graph g(5);
    g.addEdge(0, 1);
    g.addEdge(1, 2);
    g.addEdge(2, 0);
    g.addEdge(1, 3);
    g.addEdge(3, 4);

    vector<vector<int>> sccs = g.tarjanSCC();
    cout << "Strongly Connected Components:" << endl;
    for (const auto& scc : sccs) {
        cout << "[";
        for (size_t i = 0; i < scc.size(); ++i) {
            cout << scc[i];
            if (i < scc.size() - 1) {
                cout << ", ";
            }
        }
        cout << "]" << endl;
    }

    return 0;
}

5. Complexity Analysis

Operation	Time Complexity	Space Complexity	Best Case	Average Case	Worst Case
Tarjan’s Algorithm	O(V + E)	O(V)	Graph is already a single SCC (or DAG)	Depends on graph structure; generally O(V+E)	Graph is a single large SCC
Kosaraju’s Algorithm	O(V + E)	O(V)	Graph is already a single SCC (or DAG)	Depends on graph structure; generally O(V+E)	Graph is a single large SCC

V: Number of vertices
E: Number of edges

Explanation:

Time Complexity: Both Tarjan’s and Kosaraju’s algorithms perform two Depth-First Searches (DFS) on the graph. DFS has a time complexity of O(V + E), where V is the number of vertices and E is the number of edges.
Space Complexity: The space complexity is dominated by the recursion stack used in DFS and the storage for visited/discovered nodes. In the worst case, the recursion stack can grow to the size of the number of vertices (V). The indices, lowlinks, and onStack arrays/maps each require O(V) space. Therefore, the overall space complexity is O(V).

6. Pro Tips, Tricks, and Common Pitfalls

Understanding Lowlink: The core of Tarjan’s algorithm lies in the lowlink value. It represents the earliest visited node that can be reached from the current node through the DFS subtree or a back edge. A good grasp of how lowlink is updated is crucial.
Stack Management: The stack is essential for keeping track of nodes that are potentially part of the current SCC being explored. Ensure that nodes are pushed onto the stack when visited and popped off only when an SCC is identified.
Back Edges vs. Cross Edges: Distinguish between back edges (edges to ancestors in the DFS tree) and cross edges (edges to nodes already visited but not ancestors). Only back edges update the lowlink value.
Initialization: Properly initialize indices, lowlinks, and onStack arrays/maps. Using -1 for unvisited nodes is a common practice.
Edge Cases: Consider edge cases like empty graphs, graphs with only one vertex, and graphs that are already strongly connected.
Choosing the Right Algorithm: Tarjan’s and Kosaraju’s algorithms are both viable. Tarjan’s is often slightly faster in practice, but Kosaraju’s can be easier to understand conceptually.
Pitfall: Incorrectly Updating Lowlink: A common mistake is to update the lowlink using the lowlink of the neighbor when you should be using the index of the neighbor (specifically for back edges).

7. Classic Problem Examples

Example: Find Critical and Pseudo-Critical Edges in Minimum Spanning Tree

Description: Given a weighted undirected connected graph with n vertices numbered from 0 to n - 1, and an array edges where edges[i] = [ai, bi, weighti] represents a bidirectional and weighted edge between nodes ai and bi. A minimum spanning tree (MST) is a subset of the graph’s edges that connects all vertices without cycles and with the minimum possible total edge weight. Find all the critical and pseudo-critical edges in the given graph’s minimum spanning tree (MST). An MST edge whose deletion from the graph would cause the MST weight to increase is called a critical edge. On the other hand, a pseudo-critical edge is that which can appear in some MSTs but not all. Note that you can return the indices of the edges in any order.

High-Level Approach:

While this problem doesn’t directly scream “SCC,” the bridge concept (critical edges) is closely related. Bridges in a graph are edges whose removal increases the number of connected components. Finding bridges in a directed graph is similar to finding SCCs, because you are finding edges that if removed will disconnect the graph.

Compute the MST: Use Kruskal’s or Prim’s algorithm to find the Minimum Spanning Tree (MST) of the graph.
Iterate through each edge: For each edge in the original graph:
- Check for Critical Edges: Remove the edge from the graph. Compute the MST weight of the resulting graph. If the MST weight is greater than the original MST weight (or the graph becomes disconnected), the edge is critical.
- Check for Pseudo-Critical Edges: Include the edge in the MST calculation from the beginning (force its inclusion). Compute the MST weight. If the MST weight is the same as the original MST weight, the edge is pseudo-critical.

The key insight is that after removing a critical edge, the nodes it connected will now be in different connected components. SCC algorithms are most useful when dealing with directed graphs, so to use the concepts of SCCs here, one could artificially make the undirected graph directed. However, for finding bridges (which is what’s related to critical edges), simpler algorithms for undirected graphs are more suitable.