Beginner

SELECT DISTINCT in SQL: Removing Duplicate Rows

DISTINCT collapses duplicate rows into one. Learn how it works across multiple columns, how it differs from GROUP BY, the PostgreSQL-only DISTINCT ON, and when it hides a real problem.

DISTINCT removes duplicate rows from a result set, returning only unique combinations. It is one of the first keywords beginners reach for — and one of the most over-used, because a duplicate is often a sign of a JOIN problem rather than something to paper over. Keep the SQL cheat sheet open as you read.

What DISTINCT Does

Place DISTINCT right after SELECT. The database compares the selected columns row by row and keeps only the first of each identical group.

-- Every city we ship to, once each
SELECT DISTINCT city
FROM users;

DISTINCT Across Columns

With more than one column, DISTINCT returns unique combinations. Two rows are duplicates only if every selected column matches.

-- Unique (city, country) pairs — not unique cities
SELECT DISTINCT city, country
FROM users;

COUNT(DISTINCT)

COUNT(DISTINCT column) counts how many unique values a column has, ignoring NULLs. This is the standard way to answer "how many different customers ordered?" without listing them.

-- How many distinct customers placed an order?
SELECT COUNT(DISTINCT user_id) AS customers
FROM orders;

DISTINCT vs GROUP BY

SELECT DISTINCT col and SELECT col ... GROUP BY col return the same unique values. The difference is intent: use DISTINCT when you only want unique rows, and GROUP BY when you also want to aggregate within each group.

-- These return the same list of statuses
SELECT DISTINCT status FROM orders;
SELECT status FROM orders GROUP BY status;

-- But GROUP BY can do more — count per status
SELECT status, COUNT(*) FROM orders GROUP BY status;

DISTINCT ON (PostgreSQL)

PostgreSQL’s DISTINCT ON keeps the first row for each value of the listed column(s). Pair it with ORDER BY to control which row "first" means — this is a clean way to get the latest order per customer.

-- The most recent order for each customer
SELECT DISTINCT ON (user_id)
  user_id, id, created_at
FROM orders
ORDER BY user_id, created_at DESC;

Common Mistakes & Tips

  • Expecting DISTINCT to apply to a single column — it always applies to the whole selected row
  • Using DISTINCT to hide duplicate rows from a JOIN instead of fixing the join
  • Forgetting COUNT(DISTINCT col) ignores NULLs — they are never counted
  • Reaching for DISTINCT when GROUP BY with an aggregate is what you actually need
  • Assuming DISTINCT is free — it requires a sort or hash and can be slow on large sets

DISTINCT is simple but easy to misuse. Practice on real data to learn when it is the right tool and when a JOIN or GROUP BY is the real fix.