Skip to content

DISTINCT and Unique Values

Difficulty: Basic
Generated on: 2025-07-10 02:23:45
Category: SQL Cheatsheet for Database Development


SQL DISTINCT and Unique Values Cheatsheet (Basic Level)

Section titled “SQL DISTINCT and Unique Values Cheatsheet (Basic Level)”

This cheatsheet focuses on using DISTINCT to retrieve unique values from a table.

The DISTINCT keyword in SQL is used to retrieve only unique (distinct) values from one or more columns in a table. It eliminates duplicate rows from the result set, returning only unique combinations of values.

When to use:

  • To get a list of unique categories, countries, or other attributes.
  • To count the number of distinct items in a set.
  • To avoid redundant data in your query results.
-- Basic syntax to retrieve unique values from a single column
SELECT DISTINCT column_name
FROM table_name;
-- Retrieving unique combinations from multiple columns
SELECT DISTINCT column1, column2, ...
FROM table_name;
  • Finding unique product categories: Identify all the different product categories offered.
  • Listing unique customer countries: Determine the countries where customers are located.
  • Identifying unique order dates: Find all the distinct dates on which orders were placed.
  • Counting unique values: Determining the total number of unique values of a column.
  • Use DISTINCT sparingly: DISTINCT can be resource-intensive, especially on large tables. Consider if it’s truly necessary.
  • Index relevant columns: If you frequently use DISTINCT on a specific column, create an index on that column to improve performance.
  • Consider alternative approaches: In some cases, using GROUP BY might be more efficient than DISTINCT (see examples below).
  • Performance testing: Always test your queries with DISTINCT to ensure they perform adequately, especially on production-sized data.
  • Avoid DISTINCT *: Using DISTINCT * (all columns) is generally inefficient and should be avoided. Instead, specify the columns you need to consider for uniqueness.

Sample Data (Customers Table):

customer_idnamecitycountry
1AliceNew YorkUSA
2BobLondonUK
3CharlieNew YorkUSA
4DavidParisFrance
5EveLondonUK
6FrankNew YorkUSA
7GraceBerlinGermany
8HenryBerlinGermany

Example 1: Get a list of unique cities.

SELECT DISTINCT city
FROM Customers;

Output:

city
New York
London
Paris
Berlin

Example 2: Get a list of unique countries.

SELECT DISTINCT country
FROM Customers;

Output:

country
USA
UK
France
Germany

Example 3: Get unique combinations of city and country.

SELECT DISTINCT city, country
FROM Customers;

Output:

citycountry
New YorkUSA
LondonUK
ParisFrance
BerlinGermany

Example 4: Counting unique countries using COUNT(DISTINCT)

SELECT COUNT(DISTINCT country) AS unique_country_count
FROM Customers;

Output:

unique_country_count
4

Example 5: Alternative using GROUP BY (often more efficient for counting):

SELECT country, COUNT(*)
FROM Customers
GROUP BY country;

This will return the count of customers per country, effectively showing the unique countries and their respective counts.

Output:

countryCOUNT(*)
USA3
UK2
France1
Germany2

To just get the unique count, you could use a subquery:

SELECT COUNT(*) FROM (
SELECT country
FROM Customers
GROUP BY country
) AS UniqueCountries;

Example 6: Combining DISTINCT with other clauses

SELECT DISTINCT city
FROM Customers
WHERE country IN ('USA', 'UK');

Output:

city
New York
London
  • Incorrect column selection: Forgetting to specify the correct column(s) for uniqueness. This results in unexpected output.
  • Performance issues on large datasets: Using DISTINCT on very large tables without proper indexing can lead to slow queries.
  • Misunderstanding multi-column DISTINCT: DISTINCT across multiple columns considers the combination of those columns. It doesn’t return rows where each individual column is unique.
  • Case sensitivity: DISTINCT is often case-sensitive. To handle case-insensitive uniqueness, use functions like LOWER() or UPPER() to normalize the data before applying DISTINCT. For example: SELECT DISTINCT LOWER(city) FROM Customers;
  • Null Values: DISTINCT treats all NULL values as equal. If you have multiple NULL values in a column, DISTINCT will return only one NULL.
  • Applying DISTINCT to calculated columns: Applying DISTINCT to calculated columns or expressions can sometimes lead to unexpected results if the calculations have rounding errors or data type issues. Carefully review the calculations to ensure accuracy.

While the basic syntax of DISTINCT is generally consistent across different SQL databases, there might be subtle differences in behavior and performance:

  • MySQL: MySQL’s optimizer might handle DISTINCT and GROUP BY differently depending on the query structure. Experiment to find the most efficient approach.
  • PostgreSQL: PostgreSQL generally has good support for DISTINCT and offers various indexing options to optimize performance.
  • SQL Server: SQL Server also handles DISTINCT efficiently and provides indexing features to enhance performance.
  • Oracle: Oracle’s optimizer is sophisticated and often chooses the best execution plan for DISTINCT queries.

Key Takeaways:

  • DISTINCT is a powerful tool for retrieving unique values.
  • Use it judiciously and consider performance implications.
  • Index relevant columns for better performance.
  • Understand the behavior of DISTINCT across multiple columns.
  • Test your queries thoroughly, especially on large datasets.