SQL: powerful tricks with self-joins

If you are using SQL you know about the many types of joins, like inner join, left join, right, etc.. one particular powerful type of join…

SQL: powerful tricks with self-joins
Photo by Caspar Camille Rubin on Unsplash

If you are using SQL you know about the many types of joins, like inner join, left join, right, etc.. one particular powerful type of join is self join, a self join as its name mentions is a type of join between records of the same table, let's see some examples.

Finding Duplicates

Assume we have the following table named people, and we want to identify duplicate records

id,name,age,city 
1,Alice,30,New York 
2,Bob,25,Los Angeles 
3,Charlie,35,Chicago 
4,David,40,New York 
5,Eve,30,Los Angeles 
6,Alice,30,New York 
7,Frank,45,Chicago 
8,Grace,28,Houston 
9,Hank,50,Dallas 
10,Ivy,22,San Francisco

We can do this with the following query, since the table people has two aliases p1 and p2 we can now treat the table as two different tables, and we ask the database to return all rows where p1 and p2 have the same name, age, and city but different id (this is very important or else it would compare the record with its self which is wrong self)

select p1.name,p1.age,p1.city from people as p1 
inner join people as p2 
on p1.name = p2.name and p1.age = p2.age and p1.city = p2.city 
where p1.id <> p2.id;

Correctly finds the following two records

Alice|30|New York 
Alice|30|New York

Finding consecutive rows

self-join can help you find consecutive rows based on a sequential column like date, id, etc.

Assume that we have a table named events with the following content

1|2024-01-01|New Year Celebration 
2|2024-01-02|Conference Day 1 
3|2024-01-03|Conference Day 2 
4|2024-01-05|Company Meeting 
5|2024-01-06|Project Deadline

Executing the following query returns the consecutive rows based on the date column, this query matches rows from a and b where the event_date column is equal to event_date -1 day .

select a.event_date as current_date, 
       a.event_name as current_date_event, 
       b.event_date as next_date, 
       b.event_name as next_date_event 
from events as a 
join events as b  
on date(a.event_date) = date(b.event_date,'-1 day');

The query produces the following results, each row contains the current date, current date event, next date, and the next date event

current_date|current_date_event  |next_date |next_date_event | 
------------+--------------------+----------+----------------+ 
2024-01-01  |New Year Celebration|2024-01-02|Conference Day 1| 
2024-01-02  |Conference Day 1    |2024-01-03|Conference Day 2| 
2024-01-05  |Company Meeting     |2024-01-06|Project Deadline|

Finding value differences between consecutive rows

A variation of the previous example is to find consecutive rows and calculate the difference between values. Assume the table stocks with the following data

date      |price| 
----------+-----+ 
2024-01-01|100  | 
2024-01-02|105  | 
2024-01-03|102  | 
2024-01-04|108  | 
2024-01-05|107  |

To calculate the difference in price between each consecutive date use the following query

SELECT a.date AS current_date, 
       a.price AS current_price, 
       b.price AS next_price, 
       (b.price - a.price) AS price_change 
FROM stocks a 
JOIN stocks b ON date(a.date) = date(b.date, '-1 day');

This produces the following output

current_date|current_price|next_price|price_change| 
------------+-------------+----------+------------+ 
2024-01-01  |100          |105       |           5| 
2024-01-02  |105          |102       |          -3| 
2024-01-03  |102          |108       |           6| 
2024-01-04  |108          |107       |          -1|

Conclusion

In this article, we saw methods to find duplicate, consecutive and difference in values between consecutive rows with the help of self-join, self-join is a powerful tool that can help you to create cool and efficient reports / solutions using SQL, i hope you enjoyeed this article!