Postgres lag() window function
Use lag() to access values from previous rows in a result set
The lag()
function in Postgres is a window function that allows you to access values from previous rows in a result set without the need for a self-join. It's useful for comparing values between the current row and a previous row, for example, when calculating running differences, plotting trends, or doing time series analysis.
Function signature
The lag()
function has the following forms:
value
: The value to return from the previous row. This can be a column, expression, or subquery.offset
(optional): The number of rows back from the current row to retrieve the value from. If omitted, it defaults to 1. Must be a non-negative integer.default
(optional): The value to return when the offset goes beyond the scope of the window. If omitted, it defaults to null.over (...)
: TheOVER
clause defines the window frame for the function. It can be an emptyOVER ()
, or it can include aPARTITION BY
and/orORDER BY
clause.
Example usage
Consider a table sales
that contains daily sales data for a company. We can use lag()
to compare each day's sales to the previous day's sales.
This query calculates the previous day's sales amount (prev_amount
) and the difference between the current day's sales and the previous day's sales (diff
). The OVER
clause specifies that the window frame should be ordered by sale_date
.
You can also use lag()
to access values from rows further back by specifying an offset. For example, to compare each day's sales to the sales from the same day of the previous week:
This query generates random sales data for each day in January 2023 and compares each day's sales to the sales from the same day of the previous week. The lag()
function with an offset of 7 retrieves the sales amount from 7 days ago.
Advanced examples
lag()
with a default value
Using When the offset in lag()
goes beyond the start of the window frame, it returns null by default. You can specify a default value to use instead, so the resulting column does not contain nulls.
This query calculates the change in inventory quantity compared to the previous day. For the first row, where there is no previous quantity, it uses the current quantity as the default value, resulting in a change of 0.
lag()
with partitioning
Using You can use lag()
with partitioning to perform calculations within groups of rows.
This query calculates the number of days since each customer's previous order. The OVER
clause partitions the data by customer_id
and orders it by order_date
within each partition.
Additional considerations
Correctness
The lag()
function relates each row in the result set to a previous row in the same window frame. If the window frame is not explicitly defined, the default frame is the entire result set. Make sure to specify the correct ORDER BY
and PARTITION BY
clauses to ensure the desired behavior.
Performance implications
Window functions like lag()
perform calculations across a set of rows defined by the OVER
clause. This can be computationally expensive for large datasets or complex window definitions.
To optimize performance, make sure to:
- Include an
ORDER BY
clause in theOVER
clause to avoid sorting the entire dataset. - Use partitioning (
PARTITION BY
) to divide the data into smaller chunks when possible. - Create appropriate indexes on the columns used in the
OVER
clause.
Alternative functions
- lead - Access values from subsequent rows in a result set. Similar to
lag()
but looks ahead in the partition instead of behind. first_value()
- Get the first value within a window frame.last_value()
- Get the last value within a window frame.