PostgreSQL DISTINCT ON
Summary: in this tutorial, you will learn how to use the PostgreSQL DISTINCT ON
clause to retrieve distinct rows based on a specific column.
Introduction to the PostgreSQL DISTINCT ON clause
The DISTINCT ON
clause allows you to retrieve unique rows based on specified columns. Here’s the basic syntax of the DISTINCT ON
clause:
The DISTINCT ON
clause retrieves the first unique entry from each column or combination of columns in a result set.
The key factor for determining which unique entry is selected lies in the columns that appear in the ORDER BY
clause.
Technically, you can use the DISTINCT ON
without the ORDER BY
clause. However, without the ORDER BY
clause, the “first” unique entry becomes unpredictable because the table stores the rows in an unspecified order.
Notice that you need to align the expression specified in the DISTINCT ON
clause with the leftmost expression in the ORDER BY
clause.
PostgreSQL DISTINCT ON example
Let’s take an example of using the DISTINCT ON
clause to understand it better.
First, create a table called student scores
to store the student’s scores:
Second, insert rows into the student_scores
table:
In the table, each student has both scores in Math
and Physics
.
Third, retrieve the highest score for each student in either Math
or Physics
using the DISTINCT ON
clause:
Output:
The output returns the highest score of each student in whatever subject, Math
or Physics
.
The result set includes a unique combination of names along with the corresponding subject and score. The ORDER BY
clause is important because it helps determine which row to retain in case of duplicate.
In this example, the DISTINCT ON
clause keeps the row with the highest scores because the ORDER BY
clause sorts the names and scores in descending order.
Summary
- Use the
DISTINCT ON
clause to keep the first unique entry from each column or combination of columns in a result set. - Always use the
ORDER BY
clause to determine which entry to retain in the result set.