Mastering the SQL LAG Function: Comparing Data Like a Pro
The LAG() function in SQL is a window function that provides access to a row at a specified physical offset before the current row within…
The LAG() function in SQL is a window function that provides access to a row at a specified physical offset before the current row within the result set of a query. It’s commonly used to compare values in the previous row with the current row. The syntax is straightforward, but it becomes very powerful when used in data analysis, particularly for comparing change of of a value over time.
Syntax:
LAG(column_name, offset, default_value)
OVER (PARTITION BY partition_column ORDER BY order_column)column_name: The column from which you want to retrieve the previous value.offset: The number of rows before the current_row from which to retrieve the value (default is1).default_value: The value to return if the offset is outside the range of the result set (optional).PARTITION BY: Divides the result set into partitions to which theLAG()function is applied independently.ORDER BY: Specifies the order of rows to be used for retrieving the previous row.
Example 1: Basic LAG usage
Create a table named sales that records daily sales by day:
CREATE TABLE sales (
day DATE,
sales INT
);
INSERT INTO sales (day, sales)
VALUES
('2023-09-01', 100),
('2023-09-02', 120),
('2023-09-03', 150),
('2023-09-04', 130),
('2023-09-05', 180);We want to see the sales from the previous day on each row. We can use LAG() to achieve this
SELECT
day,
sales,
LAG(sales, 1, 0) OVER (ORDER BY day) AS previous_day_sales
FROM sales;Result:
day sales previous_day_sales
---------- ----- ------------------
2023-09-01 100 0
2023-09-02 120 100
2023-09-03 150 120
2023-09-04 130 150
2023-09-05 180 130Here, the LAG() function retrieves the value of sales column from the previous row (offset 1). For the first row, where there is no previous row it returns the default value 0.
Example 2: LAG with Multiple Partitions
Let’s create a new table named sales_per_store to represent sales from multiple stores:
CREATE TABLE sales_per_store (
store VARCHAR(50),
day DATE,
sales INT
);
INSERT INTO sales_per_store (store, day, sales)
VALUES
('Store A', '2023-09-01', 100),
('Store A', '2023-09-02', 120),
('Store A', '2023-09-03', 150),
('Store A', '2023-09-04', 130),
('Store A', '2023-09-05', 180),
('Store B', '2023-09-01', 90),
('Store B', '2023-09-02', 130),
('Store B', '2023-09-03', 110),
('Store B', '2023-09-04', 160),
('Store B', '2023-09-05', 140);Now, we want to calculate the previous day’s sales for each store individually. We can partition the data by store:
SELECT
store,
sales,
day,
sales,
LAG(sales, 1, 0) OVER (PARTITION BY store ORDER BY day) AS previous_day_sales
FROM sales_per_store;Result:
store sales day sales previous_day_sales
------- ----- ---------- ----- ------------------
Store A 100 2023-09-01 100 0
Store A 120 2023-09-02 120 100
Store A 150 2023-09-03 150 120
Store A 130 2023-09-04 130 150
Store A 180 2023-09-05 180 130
Store B 90 2023-09-01 90 0
Store B 130 2023-09-02 130 90
Store B 110 2023-09-03 110 130
Store B 160 2023-09-04 160 110
Store B 140 2023-09-05 140 160Here, the PARTITION BY clause ensures that the LAG() function calculates the previous day’s sales independently for each store.
Example 3: Using LAG to calculate Sales Differences
You can use LAG() function to compute differences between the current value and the previous value. Let’s calculate the difference in sales from the previous day:
SELECT
store,
sales,
day,
sales,
LAG(sales, 1, 0) OVER (PARTITION BY store ORDER BY day) AS previous_day_sales,
sales - LAG(sales, 1, 0) OVER (PARTITION BY store ORDER BY day) as diff
FROM sales_per_store;Result:
store sales day sales previous_day_sales diff
------- ----- ---------- ----- ------------------ ----
Store A 100 2023-09-01 100 0 100
Store A 120 2023-09-02 120 100 20
Store A 150 2023-09-03 150 120 30
Store A 130 2023-09-04 130 150 -20
Store A 180 2023-09-05 180 130 50
Store B 90 2023-09-01 90 0 90
Store B 130 2023-09-02 130 90 40
Store B 110 2023-09-03 110 130 -20
Store B 160 2023-09-04 160 110 50
Store B 140 2023-09-05 140 160 -20In this example, we use sales — LAG(sales,1,0) OVER (PARTITION BY store ORDER by DAY) to calculate the difference between consecutive days.
Example 4: Using LAG with Different Offsets
One powerful feature of LAG() is that you can also use different offsets to retrieve values from two more rows before. For example, to get the sales from 2 days ago, we modify the offset:
SELECT
store,
sales,
day,
sales,
LAG(sales, 2, 0) OVER (PARTITION BY store ORDER BY day) AS previous_day_sales,
sales - LAG(sales, 2, 0) OVER (PARTITION BY store ORDER BY day) as diff_2_days_before
FROM sales_per_store;Result:
store sales day sales previous_day_sales diff_2_days_before
------- ----- ---------- ----- ------------------ ------------------
Store A 100 2023-09-01 100 0 100
Store A 120 2023-09-02 120 0 120
Store A 150 2023-09-03 150 100 50
Store A 130 2023-09-04 130 120 10
Store A 180 2023-09-05 180 150 30
Store B 90 2023-09-01 90 0 90
Store B 130 2023-09-02 130 0 130
Store B 110 2023-09-03 110 90 20
Store B 160 2023-09-04 160 130 30
Store B 140 2023-09-05 140 110 30Conclusion
The LAG()function is a powerful tool for comparing rows in SQL. It’s commonly used to calculate changes over time, detect trends or generate values relative to previous rows in a dataset. By adjusting the PARTITION BY, ORDER BY, and OFFSET, you can customize the function to fit a variety of use cases.