SQL: How to find the diff between two record sets

In this article, I will show you how to subtract one result set from another. The database I will use is SQLite which supports the EXCEPT…

SQL: How to find the diff between two record sets
Photo by Possessed Photography on Unsplash

In this article, I will show you how to subtract one result set from another. The database I will use is SQLite which supports the EXCEPT operator for this task, in other databases like Oracle the operation to achieve the same task is the MINUS operator which does not differ in usage from the EXCEPT, Let's start!

Assume that we have the table employees with the following data

sqlite> select * from employees; 
Name     Age  Year 
-------  ---  ---- 
Alice    28   2024 
Bob      34   2024 
Charlie  22   2024 
David    45   2024 
Eve      29   2024 
Alice    28   2023 
Bob      34   2023 
Charlie  22   2023 
David    45   2023 
Jones    37   2023

Our task is to create a set of employees for the year 2024 and one for the year 2023 and find the different records of each set.

Queries

Running the following query generates one result, which is correct the only difference of the set with the year 2024 with columns Name,Age vs the one of the year 2023 is The record with the Name “Eve”

sqlite> select Name,Age from employees where Year=2024 
   ...> except 
   ...> select Name,Age from employees where Year=2023; 
Name  Age 
----  --- 
Eve   29

To find the opposite, the difference of the set with the year 2023 vs the set with the year 2024 you swap the order of select statements

sqlite> select Name,Age from employees where Year=2023 
   ...> except 
   ...> select Name,Age from employees where Year=2024; 
Name   Age 
-----  --- 
Jones  37

Wasn't that difficult right? :) Now let's see a trick! assume that they asked for a combined report that will contain the diff between Year 2024 — Year 2023 and Year 2023 — Year 2024, sound complicated? well, it is not! there is a cool command called UNION that can help us achieve the task

sqlite> SELECT * FROM (SELECT Name, Age FROM employees WHERE Year=2023 
(x1...> EXCEPT 
(x1...> SELECT Name, Age FROM employees WHERE Year=2024) 
   ...> UNION 
   ...> SELECT * FROM (SELECT Name, Age FROM employees WHERE Year=2024 
(x1...> EXCEPT 
(x1...> SELECT Name, Age FROM employees WHERE Year=2023); 
Name   Age 
-----  --- 
Eve    29 
Jones  37

You might wonder, why we need to explicitly SELECT each set. The reason is that if we don't do this the UNION command gets confused in the order of execution.

Conclusion

The EXCEPT operator is a powerful operation that can be used in many cases, one of my favorites is validating two reports that supposedly should be the same, one thing to note is that the columns between the two sets must have the same number of columns and their corresponding data types be compatible. I hope you found this article useful! :)