Mastering the SQL LAG Function: Comparing Data Like a Pro

The LAG() function in SQL is a window function that provides access to a row at a specified physical offset before the current row within…

Mastering the SQL LAG Function: Comparing Data Like a Pro
Photo by Karsten Winegeart on Unsplash

The LAG() function in SQL is a window function that provides access to a row at a specified physical offset before the current row within the result set of a query. It’s commonly used to compare values in the previous row with the current row. The syntax is straightforward, but it becomes very powerful when used in data analysis, particularly for comparing change of of a value over time.

Syntax:

LAG(column_name, offset, default_value) 
OVER (PARTITION BY partition_column ORDER BY order_column)
  • column_name: The column from which you want to retrieve the previous value.
  • offset: The number of rows before the current_row from which to retrieve the value (default is 1).
  • default_value: The value to return if the offset is outside the range of the result set (optional).
  • PARTITION BY: Divides the result set into partitions to which the LAG() function is applied independently.
  • ORDER BY: Specifies the order of rows to be used for retrieving the previous row.

Example 1: Basic LAG usage

Create a table named sales that records daily sales by day:

CREATE TABLE sales ( 
    day DATE, 
    sales INT 
); 
 
INSERT INTO sales (day, sales) 
VALUES 
    ('2023-09-01', 100), 
    ('2023-09-02', 120), 
    ('2023-09-03', 150), 
    ('2023-09-04', 130), 
    ('2023-09-05', 180);

We want to see the sales from the previous day on each row. We can use LAG() to achieve this

SELECT  
  day, 
  sales, 
  LAG(sales, 1, 0) OVER (ORDER BY day) AS previous_day_sales 
FROM sales;

Result:

day         sales  previous_day_sales 
----------  -----  ------------------ 
2023-09-01  100    0 
2023-09-02  120    100 
2023-09-03  150    120 
2023-09-04  130    150 
2023-09-05  180    130

Here, the LAG() function retrieves the value of sales column from the previous row (offset 1). For the first row, where there is no previous row it returns the default value 0.

Example 2: LAG with Multiple Partitions

Let’s create a new table named sales_per_store to represent sales from multiple stores:

CREATE TABLE sales_per_store ( 
    store VARCHAR(50), 
    day DATE, 
    sales INT 
); 
 
INSERT INTO sales_per_store (store, day, sales) 
VALUES 
    ('Store A', '2023-09-01', 100), 
    ('Store A', '2023-09-02', 120), 
    ('Store A', '2023-09-03', 150), 
    ('Store A', '2023-09-04', 130), 
    ('Store A', '2023-09-05', 180), 
    ('Store B', '2023-09-01', 90), 
    ('Store B', '2023-09-02', 130), 
    ('Store B', '2023-09-03', 110), 
    ('Store B', '2023-09-04', 160), 
    ('Store B', '2023-09-05', 140);

Now, we want to calculate the previous day’s sales for each store individually. We can partition the data by store:

SELECT 
  store, 
  sales, 
  day, 
  sales, 
  LAG(sales, 1, 0) OVER (PARTITION BY store ORDER BY day) AS previous_day_sales 
FROM sales_per_store;

Result:

store    sales  day         sales  previous_day_sales 
-------  -----  ----------  -----  ------------------ 
Store A  100    2023-09-01  100    0 
Store A  120    2023-09-02  120    100 
Store A  150    2023-09-03  150    120 
Store A  130    2023-09-04  130    150 
Store A  180    2023-09-05  180    130 
Store B  90     2023-09-01  90     0 
Store B  130    2023-09-02  130    90 
Store B  110    2023-09-03  110    130 
Store B  160    2023-09-04  160    110 
Store B  140    2023-09-05  140    160

Here, the PARTITION BY clause ensures that the LAG() function calculates the previous day’s sales independently for each store.

Example 3: Using LAG to calculate Sales Differences

You can use LAG() function to compute differences between the current value and the previous value. Let’s calculate the difference in sales from the previous day:

SELECT 
  store, 
  sales, 
  day, 
  sales, 
  LAG(sales, 1, 0) OVER (PARTITION BY store ORDER BY day) AS previous_day_sales, 
  sales - LAG(sales, 1, 0) OVER (PARTITION BY store ORDER BY day) as diff 
FROM sales_per_store;

Result:

store    sales  day         sales  previous_day_sales  diff 
-------  -----  ----------  -----  ------------------  ---- 
Store A  100    2023-09-01  100    0                   100 
Store A  120    2023-09-02  120    100                 20 
Store A  150    2023-09-03  150    120                 30 
Store A  130    2023-09-04  130    150                 -20 
Store A  180    2023-09-05  180    130                 50 
Store B  90     2023-09-01  90     0                   90 
Store B  130    2023-09-02  130    90                  40 
Store B  110    2023-09-03  110    130                 -20 
Store B  160    2023-09-04  160    110                 50 
Store B  140    2023-09-05  140    160                 -20

In this example, we use sales — LAG(sales,1,0) OVER (PARTITION BY store ORDER by DAY) to calculate the difference between consecutive days.

Example 4: Using LAG with Different Offsets

One powerful feature of LAG() is that you can also use different offsets to retrieve values from two more rows before. For example, to get the sales from 2 days ago, we modify the offset:

SELECT  
  store, 
  sales, 
  day, 
  sales, 
  LAG(sales, 2, 0) OVER (PARTITION BY store ORDER BY day) AS previous_day_sales, 
  sales - LAG(sales, 2, 0) OVER (PARTITION BY store ORDER BY day) as diff_2_days_before 
FROM sales_per_store;

Result:

store    sales  day         sales  previous_day_sales  diff_2_days_before 
-------  -----  ----------  -----  ------------------  ------------------ 
Store A  100    2023-09-01  100    0                   100 
Store A  120    2023-09-02  120    0                   120 
Store A  150    2023-09-03  150    100                 50 
Store A  130    2023-09-04  130    120                 10 
Store A  180    2023-09-05  180    150                 30 
Store B  90     2023-09-01  90     0                   90 
Store B  130    2023-09-02  130    0                   130 
Store B  110    2023-09-03  110    90                  20 
Store B  160    2023-09-04  160    130                 30 
Store B  140    2023-09-05  140    110                 30

Conclusion

The LAG()function is a powerful tool for comparing rows in SQL. It’s commonly used to calculate changes over time, detect trends or generate values relative to previous rows in a dataset. By adjusting the PARTITION BY, ORDER BY, and OFFSET, you can customize the function to fit a variety of use cases.