DISTINCT and Unique Values

Difficulty: Basic
Generated on: 2025-07-10 02:23:45
Category: SQL Cheatsheet for Database Development

SQL DISTINCT and Unique Values Cheatsheet (Basic Level)

This cheatsheet focuses on using DISTINCT to retrieve unique values from a table.

1. Quick Overview

The DISTINCT keyword in SQL is used to retrieve only unique (distinct) values from one or more columns in a table. It eliminates duplicate rows from the result set, returning only unique combinations of values.

When to use:

To get a list of unique categories, countries, or other attributes.
To count the number of distinct items in a set.
To avoid redundant data in your query results.

2. Syntax

-- Basic syntax to retrieve unique values from a single column
SELECT DISTINCT column_name
FROM table_name;

-- Retrieving unique combinations from multiple columns
SELECT DISTINCT column1, column2, ...
FROM table_name;

3. Common Use Cases

Finding unique product categories: Identify all the different product categories offered.
Listing unique customer countries: Determine the countries where customers are located.
Identifying unique order dates: Find all the distinct dates on which orders were placed.
Counting unique values: Determining the total number of unique values of a column.

4. Best Practices

Use DISTINCT sparingly: DISTINCT can be resource-intensive, especially on large tables. Consider if it’s truly necessary.
Index relevant columns: If you frequently use DISTINCT on a specific column, create an index on that column to improve performance.
Consider alternative approaches: In some cases, using GROUP BY might be more efficient than DISTINCT (see examples below).
Performance testing: Always test your queries with DISTINCT to ensure they perform adequately, especially on production-sized data.
Avoid DISTINCT *: Using DISTINCT * (all columns) is generally inefficient and should be avoided. Instead, specify the columns you need to consider for uniqueness.

5. Examples

Sample Data (Customers Table):

customer_id	name	city	country
1	Alice	New York	USA
2	Bob	London	UK
3	Charlie	New York	USA
4	David	Paris	France
5	Eve	London	UK
6	Frank	New York	USA
7	Grace	Berlin	Germany
8	Henry	Berlin	Germany

Example 1: Get a list of unique cities.

SELECT DISTINCT city
FROM Customers;

Output:

city
New York
London
Paris
Berlin

Example 2: Get a list of unique countries.

SELECT DISTINCT country
FROM Customers;

Output:

country
USA
UK
France
Germany

Example 3: Get unique combinations of city and country.

SELECT DISTINCT city, country
FROM Customers;

Output:

city	country
New York	USA
London	UK
Paris	France
Berlin	Germany

Example 4: Counting unique countries using COUNT(DISTINCT)

SELECT COUNT(DISTINCT country) AS unique_country_count
FROM Customers;

Output:

unique_country_count
4

Example 5: Alternative using GROUP BY (often more efficient for counting):

SELECT country, COUNT(*)
FROM Customers
GROUP BY country;

This will return the count of customers per country, effectively showing the unique countries and their respective counts.

Output:

country	COUNT(*)
USA	3
UK	2
France	1
Germany	2

To just get the unique count, you could use a subquery:

SELECT COUNT(*) FROM (
  SELECT country
  FROM Customers
  GROUP BY country
) AS UniqueCountries;

Example 6: Combining DISTINCT with other clauses

SELECT DISTINCT city
FROM Customers
WHERE country IN ('USA', 'UK');

Output:

city
New York
London

6. Common Pitfalls

Incorrect column selection: Forgetting to specify the correct column(s) for uniqueness. This results in unexpected output.
Performance issues on large datasets: Using DISTINCT on very large tables without proper indexing can lead to slow queries.
Misunderstanding multi-column DISTINCT: DISTINCT across multiple columns considers the combination of those columns. It doesn’t return rows where each individual column is unique.
Case sensitivity: DISTINCT is often case-sensitive. To handle case-insensitive uniqueness, use functions like LOWER() or UPPER() to normalize the data before applying DISTINCT. For example: SELECT DISTINCT LOWER(city) FROM Customers;
Null Values: DISTINCT treats all NULL values as equal. If you have multiple NULL values in a column, DISTINCT will return only one NULL.
Applying DISTINCT to calculated columns: Applying DISTINCT to calculated columns or expressions can sometimes lead to unexpected results if the calculations have rounding errors or data type issues. Carefully review the calculations to ensure accuracy.

7. Database Variations

While the basic syntax of DISTINCT is generally consistent across different SQL databases, there might be subtle differences in behavior and performance:

MySQL: MySQL’s optimizer might handle DISTINCT and GROUP BY differently depending on the query structure. Experiment to find the most efficient approach.
PostgreSQL: PostgreSQL generally has good support for DISTINCT and offers various indexing options to optimize performance.
SQL Server: SQL Server also handles DISTINCT efficiently and provides indexing features to enhance performance.
Oracle: Oracle’s optimizer is sophisticated and often chooses the best execution plan for DISTINCT queries.

Key Takeaways:

DISTINCT is a powerful tool for retrieving unique values.
Use it judiciously and consider performance implications.
Index relevant columns for better performance.
Understand the behavior of DISTINCT across multiple columns.
Test your queries thoroughly, especially on large datasets.