Skip to content

Recursive CTEs and Hierarchical Data

Difficulty: Advanced
Generated on: 2025-07-10 02:29:51
Category: SQL Cheatsheet for Database Development


SQL Cheatsheet: Recursive CTEs and Hierarchical Data (Advanced)

Section titled “SQL Cheatsheet: Recursive CTEs and Hierarchical Data (Advanced)”

What is it? A Recursive Common Table Expression (CTE) is a CTE that references itself. It’s used to query hierarchical data, where rows have parent-child relationships. Instead of knowing the depth of the hierarchy beforehand, a recursive CTE can traverse the entire structure.

When to use it?

  • Organizational charts (employees reporting to managers)
  • Bill of Materials (components making up products)
  • Category hierarchies (website navigation)
  • Social network connections (friends of friends)
  • Pathfinding algorithms (finding routes between locations)
WITH RECURSIVE cte_name AS (
-- Anchor Member (Base Case): Selects the starting rows
SELECT
column1,
column2,
...
parent_id, -- Crucial for linking
other_columns
FROM
your_table
WHERE
parent_id IS NULL -- Or some other condition for the root nodes
UNION ALL -- Or UNION DISTINCT depending on requirements
-- Recursive Member: References the CTE itself
SELECT
t.column1,
t.column2,
...
t.parent_id,
other_columns
FROM
your_table t
JOIN
cte_name c ON t.parent_id = c.id -- The recursive join condition
)
SELECT * FROM cte_name;
  • WITH RECURSIVE: Specifies that the CTE is recursive. Required in most databases.
  • cte_name: The name of your CTE.
  • Anchor Member: The initial SELECT statement. It identifies the root(s) of the hierarchy. Crucially, it must select columns used in the recursive join and any other columns you want to return in the final result.
  • UNION ALL / UNION DISTINCT: Combines the results of the anchor and recursive members. UNION ALL is usually faster but allows duplicates. UNION DISTINCT removes duplicates (slower).
  • Recursive Member: The SELECT statement that references the CTE itself. It uses a JOIN to connect parent rows to child rows. This is the core of the recursion. The JOIN condition is essential.
  • t.parent_id = c.id: This is the typical join condition, linking the parent_id column in the table t to the id (or whatever the primary key is) column of the CTE c.
  • SELECT * FROM cte_name;: The final SELECT statement retrieves the results from the CTE.
  • Organizational Hierarchy: Displaying employees and their managers.
  • Category Tree: Listing categories and subcategories on an e-commerce site.
  • Bill of Materials (BOM): Showing the components needed to build a product.
  • Network Topology: Mapping connections between devices in a network.
  • Performance:
    • Indexing: Ensure proper indexing on the parent_id and id (or primary key) columns used in the recursive join. This dramatically improves performance.
    • Anchor Member Optimization: Make the anchor member as efficient as possible. It’s the starting point, so a slow anchor member will slow down the entire CTE.
    • UNION ALL vs. UNION DISTINCT: Use UNION ALL unless you absolutely need to remove duplicates. UNION ALL is significantly faster.
    • Materialization: Be aware that some databases may materialize the CTE (store its results in a temporary table). This can impact performance.
    • MAXRECURSION (SQL Server): Consider using the OPTION (MAXRECURSION n) hint in SQL Server to limit the recursion depth and prevent infinite loops. Other databases have similar mechanisms (see Database Variations).
  • Clarity:
    • Meaningful Column Names: Use descriptive column names in the CTE.
    • Comments: Add comments to explain the purpose of each part of the CTE.
    • Formatting: Properly format the SQL code for readability.
  • Security:
    • Input Validation: If the parent_id is based on user input, validate it to prevent SQL injection vulnerabilities.
    • Access Control: Ensure that users only have access to the data they are authorized to see.

Sample Data (employees):

CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(255),
manager_id INT,
title VARCHAR(255)
);
INSERT INTO employees (id, name, manager_id, title) VALUES
(1, 'Alice Smith', NULL, 'CEO'),
(2, 'Bob Johnson', 1, 'VP Engineering'),
(3, 'Charlie Brown', 2, 'Software Engineer'),
(4, 'David Lee', 2, 'Software Engineer'),
(5, 'Eve Davis', 1, 'VP Sales'),
(6, 'Frank Miller', 5, 'Sales Representative'),
(7, 'Grace Wilson', 5, 'Sales Representative');

SQL Query:

WITH RECURSIVE employee_hierarchy AS (
-- Anchor Member: Select the CEO (no manager)
SELECT
id,
name,
manager_id,
title,
1 AS level -- Add a level column to track depth
FROM
employees
WHERE
manager_id IS NULL
UNION ALL
-- Recursive Member: Join employees to their managers in the CTE
SELECT
e.id,
e.name,
e.manager_id,
e.title,
eh.level + 1 AS level
FROM
employees e
JOIN
employee_hierarchy eh ON e.manager_id = eh.id
)
SELECT * FROM employee_hierarchy ORDER BY level, name;

Sample Output:

idnamemanager_idtitlelevel
1Alice SmithNULLCEO1
2Bob Johnson1VP Engineering2
5Eve Davis1VP Sales2
3Charlie Brown2Software Engineer3
4David Lee2Software Engineer3
6Frank Miller5Sales Representative3
7Grace Wilson5Sales Representative3

Sample Data (categories):

CREATE TABLE categories (
id INT PRIMARY KEY,
name VARCHAR(255),
parent_id INT
);
INSERT INTO categories (id, name, parent_id) VALUES
(1, 'Electronics', NULL),
(2, 'Computers', 1),
(3, 'Laptops', 2),
(4, 'Desktops', 2),
(5, 'Peripherals', 2),
(6, 'Mobile Phones', 1),
(7, 'Smartphones', 6),
(8, 'Accessories', 6);

SQL Query:

WITH RECURSIVE category_tree AS (
-- Anchor Member: Select top-level categories
SELECT
id,
name,
parent_id,
'/' || name AS path -- Build a path for each category
FROM
categories
WHERE
parent_id IS NULL
UNION ALL
-- Recursive Member: Join categories to their parents
SELECT
c.id,
c.name,
c.parent_id,
ct.path || '/' || c.name AS path
FROM
categories c
JOIN
category_tree ct ON c.parent_id = ct.id
)
SELECT * FROM category_tree ORDER BY path;

Sample Output:

idnameparent_idpath
1ElectronicsNULL/Electronics
2Computers1/Electronics/Computers
4Desktops2/Electronics/Computers/Desktops
3Laptops2/Electronics/Computers/Laptops
5Peripherals2/Electronics/Computers/Peripherals
6Mobile Phones1/Electronics/Mobile Phones
8Accessories6/Electronics/Mobile Phones/Accessories
7Smartphones6/Electronics/Mobile Phones/Smartphones

Sample Data (parts):

CREATE TABLE parts (
part_id INT PRIMARY KEY,
part_name VARCHAR(255),
parent_part_id INT,
quantity INT
);
INSERT INTO parts (part_id, part_name, parent_part_id, quantity) VALUES
(1, 'Car', NULL, 1),
(2, 'Engine', 1, 1),
(3, 'Chassis', 1, 1),
(4, 'Wheel', 1, 4),
(5, 'Piston', 2, 4),
(6, 'Crankshaft', 2, 1),
(7, 'Tire', 4, 1),
(8, 'Rim', 4, 1);

SQL Query:

WITH RECURSIVE bom AS (
-- Anchor Member: Select the top-level part (Car)
SELECT
part_id,
part_name,
parent_part_id,
quantity,
1 AS level
FROM
parts
WHERE
parent_part_id IS NULL
UNION ALL
-- Recursive Member: Join parts to their parent parts
SELECT
p.part_id,
p.part_name,
p.parent_part_id,
p.quantity,
b.level + 1 AS level
FROM
parts p
JOIN
bom b ON p.parent_part_id = b.part_id
)
SELECT * FROM bom ORDER BY level, part_name;

Sample Output:

part_idpart_nameparent_part_idquantitylevel
1CarNULL11
2Engine112
3Chassis112
4Wheel142
6Crankshaft213
5Piston243
8Rim413
7Tire413
  • Infinite Recursion: The most common problem. If the recursive member doesn’t eventually lead to a termination condition (e.g., reaching the root of the hierarchy), the CTE will loop indefinitely. This will usually cause the query to time out or throw an error. Double-check the join condition and ensure there’s a path to a root node. Use MAXRECURSION (SQL Server) or similar mechanisms to limit the recursion depth.
  • Incorrect Join Condition: The JOIN condition in the recursive member is critical. If it’s wrong, the CTE will produce incorrect results or loop indefinitely. Make sure you’re joining the parent_id to the correct id column in the CTE.
  • Missing Anchor Member: Forgetting the anchor member will result in an empty result set or an error. The anchor member is the starting point for the recursion.
  • Performance Issues: Recursive CTEs can be slow, especially on large datasets. Use indexing, optimize the anchor member, and consider alternative approaches (e.g., iterative queries) if performance is critical.
  • Duplicates with UNION ALL: If your data contains cycles or other issues, UNION ALL can produce duplicate rows. Use UNION DISTINCT to remove duplicates, but be aware that it’s slower. Investigate the data to understand why duplicates are occurring.
  • Stack Overflow: Very deep hierarchies can sometimes cause a stack overflow error. This is less common but can occur in some database systems. Consider limiting the recursion depth or using an iterative approach.
  • SQL Server:

    • Uses OPTION (MAXRECURSION n) to limit recursion depth. MAXRECURSION 0 means no limit.
    • UNION ALL and UNION are supported.
    • Commonly used for querying Active Directory hierarchies.
    WITH RECURSIVE employee_hierarchy AS (
    -- ... CTE definition ...
    )
    SELECT * FROM employee_hierarchy
    OPTION (MAXRECURSION 10); -- Limit recursion to 10 levels
  • PostgreSQL:

    • Supports WITH RECURSIVE.
    • Uses search breadth first set order_col or search depth first set order_col to control the order of traversal (breadth-first or depth-first). This is useful for pathfinding and other algorithms.
    • UNION ALL and UNION are supported.
    • Can be used with MATERIALIZED to store the CTE’s result in a temporary table for faster access.
    WITH RECURSIVE category_tree AS (
    -- ... CTE definition ...
    )
    SELECT * FROM category_tree
    SEARCH BREADTH FIRST BY id SET ordering; -- Example of breadth-first search
  • MySQL:

    • Prior to MySQL 8.0: Recursive CTEs were not supported. You had to use alternative approaches (e.g., stored procedures, iterative queries, or application-level logic). This was a major limitation.
    • MySQL 8.0 and later: Recursive CTEs are supported using the standard WITH RECURSIVE syntax.
    • UNION ALL and UNION DISTINCT are supported.
    • SET max_sp_recursion_depth = n; can be used to limit recursion depth, but it’s a server-level setting and affects all stored procedures and functions.
    -- MySQL 8.0+ example
    WITH RECURSIVE employee_hierarchy AS (
    -- ... CTE definition ...
    )
    SELECT * FROM employee_hierarchy;
    -- To limit recursion depth (use with caution):
    -- SET max_sp_recursion_depth = 10;
  • Oracle:

    • Oracle doesn’t directly support WITH RECURSIVE syntax. Instead, it uses the CONNECT BY clause for hierarchical queries. This is not a CTE.
    • CONNECT BY PRIOR is used to specify the parent-child relationship.
    • START WITH specifies the root node(s).
    • LEVEL is a pseudocolumn that indicates the depth of the hierarchy.
    • While CONNECT BY provides similar functionality, it’s not as flexible or powerful as recursive CTEs in other databases.
    SELECT
    id,
    name,
    manager_id,
    title,
    LEVEL
    FROM
    employees
    START WITH manager_id IS NULL
    CONNECT BY PRIOR id = manager_id
    ORDER SIBLINGS BY name;

This cheatsheet provides a solid foundation for understanding and using recursive CTEs in SQL. Remember to tailor the examples and best practices to your specific database system and data model. Good luck!