SQL: powerful tricks with self-joins
If you are using SQL you know about the many types of joins, like inner join, left join, right, etc.. one particular powerful type of join…
If you are using SQL you know about the many types of joins, like inner join, left join, right, etc.. one particular powerful type of join is self join, a self join as its name mentions is a type of join between records of the same table, let's see some examples.
Finding Duplicates
Assume we have the following table named people, and we want to identify duplicate records
id,name,age,city
1,Alice,30,New York
2,Bob,25,Los Angeles
3,Charlie,35,Chicago
4,David,40,New York
5,Eve,30,Los Angeles
6,Alice,30,New York
7,Frank,45,Chicago
8,Grace,28,Houston
9,Hank,50,Dallas
10,Ivy,22,San FranciscoWe can do this with the following query, since the table people has two aliases p1 and p2 we can now treat the table as two different tables, and we ask the database to return all rows where p1 and p2 have the same name, age, and city but different id (this is very important or else it would compare the record with its self which is wrong self)
select p1.name,p1.age,p1.city from people as p1
inner join people as p2
on p1.name = p2.name and p1.age = p2.age and p1.city = p2.city
where p1.id <> p2.id;Correctly finds the following two records
Alice|30|New York
Alice|30|New YorkFinding consecutive rows
self-join can help you find consecutive rows based on a sequential column like date, id, etc.
Assume that we have a table named events with the following content
1|2024-01-01|New Year Celebration
2|2024-01-02|Conference Day 1
3|2024-01-03|Conference Day 2
4|2024-01-05|Company Meeting
5|2024-01-06|Project DeadlineExecuting the following query returns the consecutive rows based on the date column, this query matches rows from a and b where the event_date column is equal to event_date -1 day .
select a.event_date as current_date,
a.event_name as current_date_event,
b.event_date as next_date,
b.event_name as next_date_event
from events as a
join events as b
on date(a.event_date) = date(b.event_date,'-1 day');The query produces the following results, each row contains the current date, current date event, next date, and the next date event
current_date|current_date_event |next_date |next_date_event |
------------+--------------------+----------+----------------+
2024-01-01 |New Year Celebration|2024-01-02|Conference Day 1|
2024-01-02 |Conference Day 1 |2024-01-03|Conference Day 2|
2024-01-05 |Company Meeting |2024-01-06|Project Deadline|Finding value differences between consecutive rows
A variation of the previous example is to find consecutive rows and calculate the difference between values. Assume the table stocks with the following data
date |price|
----------+-----+
2024-01-01|100 |
2024-01-02|105 |
2024-01-03|102 |
2024-01-04|108 |
2024-01-05|107 |To calculate the difference in price between each consecutive date use the following query
SELECT a.date AS current_date,
a.price AS current_price,
b.price AS next_price,
(b.price - a.price) AS price_change
FROM stocks a
JOIN stocks b ON date(a.date) = date(b.date, '-1 day');This produces the following output
current_date|current_price|next_price|price_change|
------------+-------------+----------+------------+
2024-01-01 |100 |105 | 5|
2024-01-02 |105 |102 | -3|
2024-01-03 |102 |108 | 6|
2024-01-04 |108 |107 | -1|Conclusion
In this article, we saw methods to find duplicate, consecutive and difference in values between consecutive rows with the help of self-join, self-join is a powerful tool that can help you to create cool and efficient reports / solutions using SQL, i hope you enjoyeed this article!