Skip to content

Window Functions (ROW_NUMBER, RANK, LAG, LEAD)

Difficulty: Advanced
Generated on: 2025-07-10 02:29:28
Category: SQL Cheatsheet for Database Development


SQL Window Functions Cheatsheet (Advanced) - ROW_NUMBER, RANK, LAG, LEAD

Section titled “SQL Window Functions Cheatsheet (Advanced) - ROW_NUMBER, RANK, LAG, LEAD”

Window functions perform calculations across a set of table rows that are related to the current row. Unlike aggregate functions that collapse multiple rows into a single output row, window functions retain the individual rows. They allow you to calculate running totals, ranks, moving averages, and more within a result set.

When to Use:

  • Ranking rows within groups
  • Calculating differences between rows
  • Generating sequential numbers
  • Analyzing trends over time
  • Calculating cumulative sums or averages
SELECT
column1,
column2,
WINDOW_FUNCTION(argument1, argument2, ...) OVER (
[PARTITION BY column_list]
ORDER BY column_list [ASC | DESC]
[ROWS | RANGE BETWEEN frame_start AND frame_end]
) AS alias
FROM
table_name;
  • WINDOW_FUNCTION(): The window function to use (e.g., ROW_NUMBER(), RANK(), LAG(), LEAD()).
  • PARTITION BY: Divides the rows into partitions (groups) based on the specified column(s). The window function is applied separately to each partition. If omitted, the entire table is treated as a single partition.
  • ORDER BY: Defines the order of rows within each partition. Crucial for functions like ROW_NUMBER, RANK, LAG, and LEAD.
  • ROWS | RANGE BETWEEN frame_start AND frame_end (Optional): Defines the window frame (subset of rows within the partition) used for the calculation. If omitted, the default frame is all rows from the beginning of the partition to the current row.
    • frame_start and frame_end can be:
      • UNBOUNDED PRECEDING
      • n PRECEDING (where n is an integer)
      • CURRENT ROW
      • n FOLLOWING (where n is an integer)
      • UNBOUNDED FOLLOWING

Example (Basic):

SELECT
employee_id,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS salary_rank
FROM
employees;
Window FunctionUse Case
ROW_NUMBER()Assigning a unique sequential integer to each row within a partition.
RANK()Assigning a rank to each row within a partition based on a specific value. Handles ties.
LAG()Accessing data from a previous row within a partition.
LEAD()Accessing data from a subsequent row within a partition.

Detailed Use Cases:

  • ROW_NUMBER():

    • Pagination: Implementing numbered pages in search results.
    • Deduplication: Selecting only the most recent record for each unique key.
    • Ranking: Simple ranking without handling ties.
  • RANK():

    • Leaderboards: Ranking players based on their score.
    • Performance Analysis: Ranking employees based on sales figures.
    • Identifying Top Performers: Selecting the top N performers in each category.
  • LAG():

    • Calculating Differences: Finding the difference between the current value and the previous value (e.g., change in stock price).
    • Comparing Values Over Time: Analyzing sales growth from one month to the next.
    • Anomaly Detection: Identifying unusual spikes or dips in data.
  • LEAD():

    • Forecasting: Using future values to predict trends.
    • Scheduling: Determining the next scheduled event based on current data.
    • Calculating Time to Next Event: Finding the time difference between the current event and the next event.
  • Use PARTITION BY judiciously: Partitions can significantly affect performance. Only partition when necessary.
  • Optimize ORDER BY: The ORDER BY clause is critical for window functions. Ensure the columns used in ORDER BY are indexed.
  • Window Frame Specification: Carefully define the window frame (using ROWS or RANGE) to avoid unnecessary calculations. The default frame (rows from beginning of partition to current row) might not always be what you want.
  • Avoid Nested Window Functions (if possible): Nested window functions can negatively impact performance. Consider alternative approaches if possible. While some databases support them, they can be hard to read and debug.
  • Understand Data Distribution: Uneven data distribution within partitions can affect performance. Consider pre-aggregating data if necessary.
  • Test Thoroughly: Window functions can be complex. Test your queries with various data sets to ensure they produce the expected results.
  • Use indexes: If your ORDER BY clause includes a column that is frequently used in queries, create an index on that column. This can significantly speed up the execution of window functions.

Sample Data ( sales table`):

CREATE TABLE sales (
sale_id INT PRIMARY KEY,
product_id INT,
sale_date DATE,
amount DECIMAL(10, 2),
region VARCHAR(50)
);
INSERT INTO sales (sale_id, product_id, sale_date, amount, region) VALUES
(1, 101, '2023-01-15', 150.00, 'North'),
(2, 102, '2023-01-20', 200.00, 'North'),
(3, 101, '2023-02-10', 180.00, 'North'),
(4, 103, '2023-02-25', 250.00, 'North'),
(5, 101, '2023-01-05', 120.00, 'South'),
(6, 102, '2023-01-18', 180.00, 'South'),
(7, 103, '2023-02-01', 220.00, 'South'),
(8, 101, '2023-02-15', 200.00, 'South');

Example 1: ROW_NUMBER() - Assigning a rank to sales within each region based on amount.

SELECT
sale_id,
product_id,
sale_date,
amount,
region,
ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS region_sale_rank
FROM
sales;

Output:

sale_idproduct_idsale_dateamountregionregion_sale_rank
41032023-02-25250.00North1
21022023-01-20200.00North2
31012023-02-10180.00North3
11012023-01-15150.00North4
71032023-02-01220.00South1
81012023-02-15200.00South2
61022023-01-18180.00South3
51012023-01-05120.00South4

Example 2: RANK() - Ranking sales within each region based on amount, handling ties.

SELECT
sale_id,
product_id,
sale_date,
amount,
region,
RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS region_sale_rank
FROM
sales;

(If two sales in the same region had the same amount, they would receive the same rank.)

Example 3: LAG() - Calculating the difference in sales amount from the previous sale within each region.

SELECT
sale_id,
product_id,
sale_date,
amount,
region,
LAG(amount, 1, 0) OVER (PARTITION BY region ORDER BY sale_date) AS previous_sale_amount,
amount - LAG(amount, 1, 0) OVER (PARTITION BY region ORDER BY sale_date) AS amount_difference
FROM
sales;

Output:

sale_idproduct_idsale_dateamountregionprevious_sale_amountamount_difference
11012023-01-15150.00North0.00150.00
21022023-01-20200.00North150.0050.00
31012023-02-10180.00North200.00-20.00
41032023-02-25250.00North180.0070.00
51012023-01-05120.00South0.00120.00
61022023-01-18180.00South120.0060.00
71032023-02-01220.00South180.0040.00
81012023-02-15200.00South220.00-20.00

(Note: The LAG function’s third argument (0 in this case) is the default value if there’s no previous row.)

Example 4: LEAD() - Finding the next sale date for each product.

SELECT
sale_id,
product_id,
sale_date,
amount,
region,
LEAD(sale_date, 1, NULL) OVER (PARTITION BY product_id ORDER BY sale_date) AS next_sale_date
FROM
sales;

Output:

sale_idproduct_idsale_dateamountregionnext_sale_date
51012023-01-05120.00South2023-01-15
11012023-01-15150.00North2023-02-10
31012023-02-10180.00North2023-02-15
81012023-02-15200.00SouthNULL
61022023-01-18180.00South2023-01-20
21022023-01-20200.00NorthNULL
71032023-02-01220.00South2023-02-25
41032023-02-25250.00NorthNULL

(Note: The LEAD function’s third argument (NULL in this case) is the default value if there’s no subsequent row.)

  • Missing ORDER BY: For ROW_NUMBER(), RANK(), LAG(), and LEAD(), the ORDER BY clause is crucial. Omitting it (or using an incorrect order) will lead to unpredictable results.
  • Incorrect PARTITION BY: Incorrect partitioning will apply the window function across the wrong set of rows, leading to incorrect calculations.
  • Confusing ROWS and RANGE: ROWS specifies the frame in terms of the number of rows, while RANGE specifies the frame based on the value of the ORDER BY column. Choose the appropriate frame specification based on your requirements. RANGE must have exactly one ORDER BY expression.
  • Performance Issues: Window functions can be resource-intensive. Monitor query execution plans and optimize as needed. Improper use can lead to full table scans.
  • Incorrect Default Values for LAG() and LEAD(): If you don’t provide a default value, LAG() and LEAD() will return NULL for the first/last row in a partition. Consider whether NULL is the desired behavior.

Troubleshooting Tips:

  • Examine Execution Plans: Use your database’s query execution plan tool to identify performance bottlenecks.
  • Simplify the Query: Remove unnecessary columns and clauses to isolate the problem.
  • Test with Small Data Sets: Use a small subset of your data to test your window function logic.
  • Break Down the Query: Calculate intermediate results using subqueries to understand the behavior of each part of the query.

While the core syntax is generally consistent, there are some database-specific variations:

  • MySQL: MySQL 8.0+ fully supports window functions. Older versions have limited support.
  • PostgreSQL: PostgreSQL has robust window function support, including advanced features like custom window functions.
  • SQL Server: SQL Server supports window functions from SQL Server 2005 onwards.
  • Oracle: Oracle has comprehensive window function support, including analytic functions that are similar to window functions.

Specific Differences:

  • Syntax: Minor syntax variations may exist, such as the placement of the OVER() clause.
  • Function Availability: Some databases may not support all window functions.
  • Performance: Performance characteristics can vary significantly across databases.
  • Frame Specification: The syntax for specifying the window frame (using ROWS or RANGE) may differ slightly.

Always consult the documentation for your specific database version for the most accurate and up-to-date information. Pay attention to database-specific performance tuning recommendations for window functions.