DISTINCT and Unique Values
Difficulty: Basic
Generated on: 2025-07-10 02:23:45
Category: SQL Cheatsheet for Database Development
SQL DISTINCT and Unique Values Cheatsheet (Basic Level)
Section titled “SQL DISTINCT and Unique Values Cheatsheet (Basic Level)”This cheatsheet focuses on using DISTINCT to retrieve unique values from a table.
1. Quick Overview
Section titled “1. Quick Overview”The DISTINCT keyword in SQL is used to retrieve only unique (distinct) values from one or more columns in a table. It eliminates duplicate rows from the result set, returning only unique combinations of values.
When to use:
- To get a list of unique categories, countries, or other attributes.
- To count the number of distinct items in a set.
- To avoid redundant data in your query results.
2. Syntax
Section titled “2. Syntax”-- Basic syntax to retrieve unique values from a single columnSELECT DISTINCT column_nameFROM table_name;
-- Retrieving unique combinations from multiple columnsSELECT DISTINCT column1, column2, ...FROM table_name;3. Common Use Cases
Section titled “3. Common Use Cases”- Finding unique product categories: Identify all the different product categories offered.
- Listing unique customer countries: Determine the countries where customers are located.
- Identifying unique order dates: Find all the distinct dates on which orders were placed.
- Counting unique values: Determining the total number of unique values of a column.
4. Best Practices
Section titled “4. Best Practices”- Use
DISTINCTsparingly:DISTINCTcan be resource-intensive, especially on large tables. Consider if it’s truly necessary. - Index relevant columns: If you frequently use
DISTINCTon a specific column, create an index on that column to improve performance. - Consider alternative approaches: In some cases, using
GROUP BYmight be more efficient thanDISTINCT(see examples below). - Performance testing: Always test your queries with
DISTINCTto ensure they perform adequately, especially on production-sized data. - Avoid
DISTINCT *: UsingDISTINCT *(all columns) is generally inefficient and should be avoided. Instead, specify the columns you need to consider for uniqueness.
5. Examples
Section titled “5. Examples”Sample Data (Customers Table):
| customer_id | name | city | country |
|---|---|---|---|
| 1 | Alice | New York | USA |
| 2 | Bob | London | UK |
| 3 | Charlie | New York | USA |
| 4 | David | Paris | France |
| 5 | Eve | London | UK |
| 6 | Frank | New York | USA |
| 7 | Grace | Berlin | Germany |
| 8 | Henry | Berlin | Germany |
Example 1: Get a list of unique cities.
SELECT DISTINCT cityFROM Customers;Output:
| city |
|---|
| New York |
| London |
| Paris |
| Berlin |
Example 2: Get a list of unique countries.
SELECT DISTINCT countryFROM Customers;Output:
| country |
|---|
| USA |
| UK |
| France |
| Germany |
Example 3: Get unique combinations of city and country.
SELECT DISTINCT city, countryFROM Customers;Output:
| city | country |
|---|---|
| New York | USA |
| London | UK |
| Paris | France |
| Berlin | Germany |
Example 4: Counting unique countries using COUNT(DISTINCT)
SELECT COUNT(DISTINCT country) AS unique_country_countFROM Customers;Output:
| unique_country_count |
|---|
| 4 |
Example 5: Alternative using GROUP BY (often more efficient for counting):
SELECT country, COUNT(*)FROM CustomersGROUP BY country;This will return the count of customers per country, effectively showing the unique countries and their respective counts.
Output:
| country | COUNT(*) |
|---|---|
| USA | 3 |
| UK | 2 |
| France | 1 |
| Germany | 2 |
To just get the unique count, you could use a subquery:
SELECT COUNT(*) FROM ( SELECT country FROM Customers GROUP BY country) AS UniqueCountries;Example 6: Combining DISTINCT with other clauses
SELECT DISTINCT cityFROM CustomersWHERE country IN ('USA', 'UK');Output:
| city |
|---|
| New York |
| London |
6. Common Pitfalls
Section titled “6. Common Pitfalls”- Incorrect column selection: Forgetting to specify the correct column(s) for uniqueness. This results in unexpected output.
- Performance issues on large datasets: Using
DISTINCTon very large tables without proper indexing can lead to slow queries. - Misunderstanding multi-column
DISTINCT:DISTINCTacross multiple columns considers the combination of those columns. It doesn’t return rows where each individual column is unique. - Case sensitivity:
DISTINCTis often case-sensitive. To handle case-insensitive uniqueness, use functions likeLOWER()orUPPER()to normalize the data before applyingDISTINCT. For example:SELECT DISTINCT LOWER(city) FROM Customers; - Null Values:
DISTINCTtreats allNULLvalues as equal. If you have multipleNULLvalues in a column,DISTINCTwill return only oneNULL. - Applying DISTINCT to calculated columns: Applying DISTINCT to calculated columns or expressions can sometimes lead to unexpected results if the calculations have rounding errors or data type issues. Carefully review the calculations to ensure accuracy.
7. Database Variations
Section titled “7. Database Variations”While the basic syntax of DISTINCT is generally consistent across different SQL databases, there might be subtle differences in behavior and performance:
- MySQL: MySQL’s optimizer might handle
DISTINCTandGROUP BYdifferently depending on the query structure. Experiment to find the most efficient approach. - PostgreSQL: PostgreSQL generally has good support for
DISTINCTand offers various indexing options to optimize performance. - SQL Server: SQL Server also handles
DISTINCTefficiently and provides indexing features to enhance performance. - Oracle: Oracle’s optimizer is sophisticated and often chooses the best execution plan for
DISTINCTqueries.
Key Takeaways:
DISTINCTis a powerful tool for retrieving unique values.- Use it judiciously and consider performance implications.
- Index relevant columns for better performance.
- Understand the behavior of
DISTINCTacross multiple columns. - Test your queries thoroughly, especially on large datasets.