
Scenario
Say, We now have a table named “ARTICLES” that captures every day view counts for various articles from 1st June 2023 to tenth June 2023. Each entry within the table represents a date, article title, and the variety of views recorded on that day. You could find the source data and code file in my GitHub Repository,
Here is the sample data,
--sample data from table ARTICLES
SELECT
*
FROM
ARTICLES
LIMIT 10;
Essentially the most convenient and simple approach to compute each moving average and running total in SQL is by utilising Window Functions. To revisit the concepts of Window Functions and Aggregate Functions, you’ll be able to read the detailed explanation available here:
Calculating Running Total using Window Function
Continuing to our demo, imagine it’s good to discover a cumulative sum of the entire views on article “Aggregate Functions in SQL” by the top of every day,
--total number for views for "Aggregate Functions in SQL" by end of every day
SELECT
`DATE`,
ARTICLE_TITLE,
NO_OF_VIEWS,
SUM(NO_OF_VIEWS) OVER (ORDER BY `DATE`
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RUNNING_TOTAL
FROM
ARTICLES
WHERE
ARTICLE_TITLE = "Aggregate Functions in SQL";
Here, we’ve got used OVER() clause within the above query, it is important because it identifies a function as a Window Function and its purpose is to define a particular group of rows (a window) on which the Window Function will perform its calculations. But wait, that’s not all.
Within the code above we’ve got also used a FRAME clause as,
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
What’s it? Essentially, the Window Function relies on either ROW or RANGE to find out which values needs to be considered for the calculation inside the partition, by specifying the starting and ending points of the chosen subset.
So here, the FRAME clause specifies the scale of the frame – the worth of the present row and the values of all of the rows above the present row – on which the SUM(NO_OF_VIEWS) must be performed. It keeps adding the worth of “NO_OF_VIEWS” because it goes along, giving us a running total for every row ordered by DATE.
In the instance above, if we omit the RANGE clause, the result will remain unchanged. Are you able to guess why? At any time when we use ORDER BY clause in a Window Function, the default frame is, ‘RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW’. Nevertheless, personally, I find it helpful to incorporate the RANGE or ROW clause because it enhances clarity and understanding. Inevitably, there can be some extent in the long run when another person inherits your code. It’s all the time advisable to put in writing your code in a fashion that makes it easier for others to know and work with it.
You may read more about FRAME clause here,
Now let’s do the identical evaluation for all of the articles – discover a cumulative sum of the entire views of every article by the top of every day,
--running total
SELECT
`DATE`,
ARTICLE_TITLE,
NO_OF_VIEWS,
SUM(NO_OF_VIEWS) OVER (PARTITION BY ARTICLE_TITLE
ORDER BY `DATE`
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RUNNING_TOTAL
FROM
ARTICLES;
Here, we divided the info into partitions based on the ARTICLE_TITLE. Then, we performed a SUM(NO_OF_VIEWS) calculation for every partition as defined by the FRAME clause.
For a greater understanding, please check with the image provided below. It illustrates the calculations performed for a single partition. The identical logic and calculations are applicable to all other partitions as well.