Skip to content

GROUP BY and Aggregate Functions

Difficulty: Basic
Generated on: 2025-07-10 02:23:20
Category: SQL Cheatsheet for Database Development


SQL GROUP BY and Aggregate Functions Cheat Sheet (Basic Level)

Section titled “SQL GROUP BY and Aggregate Functions Cheat Sheet (Basic Level)”

What are GROUP BY and Aggregate Functions?

  • GROUP BY: A SQL clause used to group rows that have the same values in one or more columns into a summary row. It’s like categorizing your data.

  • Aggregate Functions: Functions that perform calculations on a set of values (a group) and return a single value. Examples include COUNT, SUM, AVG, MIN, and MAX.

When to Use Them:

Use GROUP BY and aggregate functions when you need to:

  • Calculate summary statistics (e.g., average salary, total sales) for different categories.
  • Determine the number of items in each category.
  • Find the minimum or maximum value within each category.
  • Generate reports that summarize data by group.
SELECT
column1, -- Column to group by
aggregate_function(column2) AS alias_for_aggregate_result -- Aggregate function applied to another column
FROM
table_name
WHERE
condition -- Optional filter condition
GROUP BY
column1 -- The column(s) you want to group by
ORDER BY
aggregate_function(column2) DESC; -- Optional: Order the results (e.g., by highest aggregate value first)

Explanation:

  • SELECT column1, aggregate_function(column2): Specifies the columns to retrieve. column1 is the grouping column, and aggregate_function(column2) calculates the aggregated value. AS alias_for_aggregate_result gives the aggregated value a more descriptive name.
  • FROM table_name: Specifies the table to retrieve data from.
  • WHERE condition: Filters rows before grouping occurs. This is optional.
  • GROUP BY column1: Groups rows with the same value in column1. You can group by multiple columns by separating them with commas (e.g., GROUP BY column1, column3).
  • ORDER BY aggregate_function(column2) DESC: Orders the results based on the aggregated value in descending order. ASC can be used for ascending order.
  • Calculating total sales per product category:
    • Input: A table of sales transactions with product category and sales amount.
    • Output: A table showing the total sales for each product category.
  • Finding the number of customers in each city:
    • Input: A table of customer data with city information.
    • Output: A table showing the number of customers in each city.
  • Determining the average order value per customer:
    • Input: A table of order data with customer ID and order amount.
    • Output: A table showing the average order value for each customer.
  • Identifying the maximum salary within each department:
    • Input: A table of employee data with department and salary information.
    • Output: A table showing the maximum salary for each department.
  • Counting the number of products in each supplier’s catalog:
    • Input: A table of products with supplier information.
    • Output: A table showing the number of products per supplier.
  • Use WHERE clause to filter data before grouping: Filtering before grouping reduces the number of rows that need to be processed, improving performance.
  • Only include grouping columns and aggregate functions in the SELECT statement: Including other columns that are not in the GROUP BY clause and are not aggregate functions will result in errors in most database systems, or unpredictable results in others (like MySQL with ONLY_FULL_GROUP_BY disabled, which is strongly discouraged).
  • Use aliases for aggregate function results: Aliases make the query more readable and the output more understandable.
  • Index the grouping columns: Indexing the columns used in the GROUP BY clause can significantly improve query performance, especially for large tables.
  • Consider using window functions for more complex aggregation: If you need to access both the aggregated results and the original row data, window functions might be more appropriate. (Beyond the scope of this basic cheat sheet, but good to be aware of)
  • Avoid NULL values in grouping columns: NULL values can lead to unexpected results. Consider using COALESCE or ISNULL to handle NULL values.

Sample Data (Employees Table):

employee_iddepartmentsalarycity
1Sales60000New York
2Marketing55000London
3Sales70000New York
4IT80000San Francisco
5Marketing65000London
6IT90000San Francisco
7Sales65000Chicago

Example 1: Calculate the average salary for each department.

SELECT
department,
AVG(salary) AS average_salary
FROM
Employees
GROUP BY
department;

Output:

departmentaverage_salary
Sales65000.00
Marketing60000.00
IT85000.00

Example 2: Count the number of employees in each city.

SELECT
city,
COUNT(*) AS employee_count
FROM
Employees
GROUP BY
city;

Output:

cityemployee_count
New York2
London2
San Francisco2
Chicago1

Example 3: Find the maximum salary in each department, only for departments with more than one employee.

SELECT
department,
MAX(salary) AS max_salary
FROM
Employees
GROUP BY
department
HAVING
COUNT(*) > 1; -- Filter groups based on a condition applied *after* grouping.

Output:

departmentmax_salary
Sales70000
Marketing65000
IT90000

Example 4: Combining WHERE and GROUP BY (finding average salary for IT and Sales in New York):

SELECT
department,
AVG(salary) AS average_salary
FROM
Employees
WHERE
city = 'New York' AND department IN ('Sales', 'IT')
GROUP BY
department;

Output:

departmentaverage_salary
Sales65000.00

Example 5: Grouping by multiple columns (count employees by department and city):

SELECT
department,
city,
COUNT(*) AS employee_count
FROM
Employees
GROUP BY
department, city
ORDER BY department, city;

Output:

departmentcityemployee_count
ITSan Francisco2
MarketingLondon2
SalesChicago1
SalesNew York2
  • “Column is invalid in the SELECT list because it is not contained in either an aggregate function or the GROUP BY clause”: This is the most common error. You can only select columns that are either in the GROUP BY clause or are used within an aggregate function. MySQL (without ONLY_FULL_GROUP_BY enabled) may allow this, but the results are undefined and unreliable.
  • Using WHERE instead of HAVING: WHERE filters rows before grouping, while HAVING filters groups after grouping. Use HAVING to filter based on aggregate function results.
  • Incorrectly interpreting NULL values: Aggregate functions generally ignore NULL values, except for COUNT(*), which counts all rows regardless of NULL values. COUNT(column) will only count non-null values in that column.
  • Forgetting to include all necessary columns in the GROUP BY clause: If you want to group by multiple criteria, make sure to include all relevant columns in the GROUP BY clause. Otherwise, the results might be inaccurate.
  • Performance issues with large datasets: GROUP BY operations can be resource-intensive. Ensure you have appropriate indexes and optimize your queries for performance.
  • Security: SQL Injection: Always sanitize user inputs when building SQL queries dynamically to prevent SQL injection attacks. Use parameterized queries or prepared statements.

While the basic syntax is generally consistent across different database systems, there are some minor variations:

  • MySQL: Historically, MySQL has been more lenient with the GROUP BY clause, allowing non-aggregated columns to be selected without being included in the GROUP BY. However, this behavior is strongly discouraged and can be controlled using the ONLY_FULL_GROUP_BY SQL mode, which enforces standard SQL behavior. Enable ONLY_FULL_GROUP_BY in your MySQL configuration for predictable and correct results.
  • PostgreSQL, SQL Server, Oracle: These databases strictly enforce the standard SQL behavior, requiring all non-aggregated columns in the SELECT statement to be included in the GROUP BY clause.
  • Handling NULL values: The functions to handle NULL values may differ slightly. For example, COALESCE is standard SQL, while ISNULL is commonly used in SQL Server. Oracle uses NVL().
  • String concatenation: The syntax for string concatenation may vary. MySQL uses CONCAT(), SQL Server uses +, and PostgreSQL and Oracle use ||.

Example of Handling NULL values (using COALESCE):

If the city column can contain NULL values, you can use COALESCE to replace NULL with a default value:

SELECT
COALESCE(city, 'Unknown City') AS city_group,
COUNT(*) AS employee_count
FROM
Employees
GROUP BY
city_group;

This ensures that NULL values are grouped together under the ‘Unknown City’ category, rather than being treated as separate groups.

This cheat sheet provides a solid foundation for working with GROUP BY and aggregate functions in SQL. Remember to practice with these concepts and explore more advanced techniques as you become more proficient.