Efficiency, readability, and scalability
Even though it’s called a question language, SQL is able to not only querying databases but additionally performing efficient data evaluation and manipulation. It shouldn’t be a surprise that SQL is embraced by the information science community.
In this text, we’ll study a really handy SQL feature, which allows for writing cleaner and more efficient queries. This I-wish-I-knew-this-earlier feature is the GROUPING SETS
, which might be regarded as an extension of the GROUP BY
function.
We’ll learn the difference between them in addition to the advantage of using GROUPING SETS
over the GROUP BY
function but first, we’d like a dataset to work on.
I created a SQL table from the Melbourne housing dataset available on Kaggle with a public domain license. The primary 5 rows of the table looks as follows:
The GROUP BY function
We will use the function to calculate aggregate values per group or distinct values in a column or multiple columns. For example, the next query returns the common price for every listing type.
SELECT
type,
AVG(price) AS avg_price
FROM melb
GROUP BY type
The output of this question is:
Multiple groupings
Let’s say you should see the common price for every region within the northern area, which might be achieved through the use of the GROUP BY
function as follows:
SELECT
regionname,
AVG(price) AS avg_price
FROM melb
WHERE regionname LIKE 'Northern%'
GROUP BY regionname
The output:
Consider a case where you should see the common price of various house types in these two regions in the identical table. You’ll be able to achieve this by writing two groupings and mixing the outcomes with UNION ALL.
SELECT
regionname,
'all' AS type,
AVG(price) AS average_price
FROM melb
WHERE regionname LIKE 'Eastern%'
GROUP BY regionname
UNION ALL
SELECT
regionname,
type,
AVG(price) AS average_price
FROM melb
WHERE regionname LIKE 'Eastern%'
GROUP BY regionname, type
ORDER BY regionname, type
What the query does is to calculate the common price for every region first. Then, in a separate query, it groups the rows by each region name and kind and calculates the common price for every group. The union combines the output of those two queries.
Because the first query doesn’t have the kind column, we create it manually with a worth of “all”. Finally, the combined results are ordered by the region name and the kind.
The output of this question:
The primary row for every region shows the region average and the next rows show the common price for various house types.
We had to put in writing two separate queries because we cannot have different queries in a GROUP BY
statement unless we use GROUPING SETS.
GROUPING SETS
Let’s rewrite the previous query using GROUPING SETS.
SELECT
regionname,
type,
AVG(price) as average_price
FROM melb
WHERE regionname LIKE 'Eastern%'
GROUP BY
GROUPING SETS (
(regionname),
(regionname, type)
)
ORDER BY regionname, type
The output:
The output is identical apart from the null values in the kind column which may easily get replaced with “all”.
Using the GROUPING SETS
has two primary benefits:
- It’s shorter and more intuitive which makes the code easier to debug and manage
- It’s more efficient and performant than writing separate queries and mixing the outcomes because SQL scans the tables for every query.
Final thoughts
We regularly disregard query readability and efficiency. We’re joyful if the query returns the specified data.
Efficiency is something we at all times must remember. The impact of writing bad queries could also be tolerated when querying a small database. Nonetheless, when the information size becomes large, bad queries may result in serious performance issues. With the intention to make ETL processes scalable and easy-to-manage, we’d like to adapt best practices. The GROUPING SETS
is considered one of these best practices.
You’ll be able to turn into a Medium member to unlock full access to my writing, plus the remaining of Medium. When you already are, don’t forget to subscribe in case you’d wish to get an email each time I publish a brand new article.
Thanks for reading. Please let me know if you’ve got any feedback.