Home Artificial Intelligence Why Do You Must Use SQL Grouping Sets for Aggregating Data?

Why Do You Must Use SQL Grouping Sets for Aggregating Data?

0
Why Do You Must Use SQL Grouping Sets for Aggregating Data?

Photo by Helena Lopes on Unsplash

Even though it’s called a question language, SQL is able to not only querying databases but additionally performing efficient data evaluation and manipulation. It shouldn’t be a surprise that SQL is embraced by the information science community.

In this text, we’ll study a really handy SQL feature, which allows for writing cleaner and more efficient queries. This I-wish-I-knew-this-earlier feature is the GROUPING SETS, which might be regarded as an extension of the GROUP BY function.

We’ll learn the difference between them in addition to the advantage of using GROUPING SETS over the GROUP BY function but first, we’d like a dataset to work on.

I created a SQL table from the Melbourne housing dataset available on Kaggle with a public domain license. The primary 5 rows of the table looks as follows:

(image by creator)

The GROUP BY function

We will use the function to calculate aggregate values per group or distinct values in a column or multiple columns. For example, the next query returns the common price for every listing type.

SELECT 
type,
AVG(price) AS avg_price
FROM melb
GROUP BY type

The output of this question is:

(image by creator)

Multiple groupings

Let’s say you should see the common price for every region within the northern area, which might be achieved through the use of the GROUP BY function as follows:

SELECT 
regionname,
AVG(price) AS avg_price
FROM melb
WHERE regionname LIKE 'Northern%'
GROUP BY regionname

The output:

(image by creator)

Consider a case where you should see the common price of various house types in these two regions in the identical table. You’ll be able to achieve this by writing two groupings and mixing the outcomes with UNION ALL.

SELECT 
regionname,
'all' AS type,
AVG(price) AS average_price
FROM melb
WHERE regionname LIKE 'Eastern%'
GROUP BY regionname
UNION ALL
SELECT
regionname,
type,
AVG(price) AS average_price
FROM melb
WHERE regionname LIKE 'Eastern%'
GROUP BY regionname, type
ORDER BY regionname, type

What the query does is to calculate the common price for every region first. Then, in a separate query, it groups the rows by each region name and kind and calculates the common price for every group. The union combines the output of those two queries.

Because the first query doesn’t have the kind column, we create it manually with a worth of “all”. Finally, the combined results are ordered by the region name and the kind.

The output of this question:

(image by creator)

The primary row for every region shows the region average and the next rows show the common price for various house types.

We had to put in writing two separate queries because we cannot have different queries in a GROUP BY statement unless we use GROUPING SETS.

GROUPING SETS

Let’s rewrite the previous query using GROUPING SETS.

SELECT 
regionname,
type,
AVG(price) as average_price
FROM melb
WHERE regionname LIKE 'Eastern%'
GROUP BY
GROUPING SETS (
(regionname),
(regionname, type)
)
ORDER BY regionname, type

The output:

(image by creator)

The output is identical apart from the null values in the kind column which may easily get replaced with “all”.

Using the GROUPING SETS has two primary benefits:

  • It’s shorter and more intuitive which makes the code easier to debug and manage
  • It’s more efficient and performant than writing separate queries and mixing the outcomes because SQL scans the tables for every query.

Final thoughts

We regularly disregard query readability and efficiency. We’re joyful if the query returns the specified data.

Efficiency is something we at all times must remember. The impact of writing bad queries could also be tolerated when querying a small database. Nonetheless, when the information size becomes large, bad queries may result in serious performance issues. With the intention to make ETL processes scalable and easy-to-manage, we’d like to adapt best practices. The GROUPING SETS is considered one of these best practices.

You’ll be able to turn into a Medium member to unlock full access to my writing, plus the remaining of Medium. When you already are, don’t forget to subscribe in case you’d wish to get an email each time I publish a brand new article.

Thanks for reading. Please let me know if you’ve got any feedback.

LEAVE A REPLY

Please enter your comment!
Please enter your name here