SQL: return a random percentage of rows from a query
Assume you have the following task, you need to get a random percentage of records from a query, how do you do this? it might sound a bit…
Assume you have the following task, you need to get a random percentage of records from a query, how do you do this? it might sound a bit complicated but it is not, let's see how!
Exploring the dataset
We have a table named retail_sales (so typical) , let's run a count() to see how many records it has, we can identify 22620 records
Calculating Percentage
Now we need to find a clever way to get the random sample, let's start by calculating the percentage as the number of rows for this specific query. To do this we multiply the count of found records by 0.1 (10%) which returns 2262

Now since we can get the percentage of rows as the number we can easily use the following query, and indeed we get the 10% percent of the query, but if we take a better look we can see that the records returned are not random! how do we get random records? using order by!

Getting random records
You might have used to order by asc or desc which are the commonest to use, but apart from desc and asc there is also RANDOM which sorts the returned records randomly

A caveat to notice
Now we can randomly get a percentage of a query, but there is a caveat if you notice the number of random records is equal to this query select count(*) * 0.1 from retail_sales it calculates the row number from the full data set of the table and this is desired since in the first select statement we use this select * from retail_sales because the number of records is equal in both queries, if both queries do not return the same number of queries this means we do not get the number of rows defined as a percentage.
Conclusion
I hope you found this article easy to read and understand and I helped you learn something useful, SQL is used everywhere, and knowing such tricks is essential for data analysts or any kind of people working with SQL data.