**1.1 The building blocks of the model**

To understand what sARIMA models are, let’s first introduce the building blocks of these models.

sARIMA is a composition of several sub-models (i.e. polynomials that we use to represent our time series data), which form the acronym: seasonal (s) autoregressive (AR) integrated (I) moving average (MA):

**AR**: the autoregressive component, governed by the hyperparameter “p”, assumes that the current value at time “t” can be expressed as a linear combination of the previous “p” values:
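
$$
X_t = c + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \varepsilon_t
$$

where the $\phi_i$ are the model coefficients, $c$ is a constant, and $\varepsilon_t$ is a white-noise error term (this is the standard textbook form of an AR(p) process).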

**I**: the integrated component is represented by the hyperparameter “d”, which is the degree of the differencing transformation applied to the data. *Differencing* is a technique used to remove the trend from the data (i.e. to make the data stationary with respect to the mean, as we’ll see later), which helps the model fit the data because it isolates the trend component (we use d=1 for a linear trend, d=2 for a quadratic trend, …). Differencing the data with d=1 means working with the difference between consecutive data points:
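
$$
X'_t = X_t - X_{t-1}
$$

This is the standard first difference; d=2 simply applies the same operation again, to the already-differenced series.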

**MA**: the moving average component, governed by the hyperparameter “q”, assumes that the current value at time “t” can be expressed as a constant term (usually the mean) plus a linear combination of the errors of the previous “q” points:
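
$$
X_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}
$$

where $\mu$ is the series mean, the $\theta_i$ are the model coefficients, and the $\varepsilon_{t-i}$ are past white-noise errors (again the standard textbook form).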

If we consider the components so far, we get “ARIMA”, the name of a model family for working with time series data with no seasonality. sARIMA models are a generalization for working with seasonal data, with the addition of an **S**-component: the seasonal component, which consists of a new set of AR, I, MA components with a seasonal lag. In other words, once we have identified a seasonality and defined its lag (represented by the hyperparameter “m”; e.g. m=12 implies that, on a monthly dataset, we see the same behavior every year), we create a new set of AR (P), I (D), MA (Q) components, with respect to the seasonal lag m (e.g. if D=1 and m=12, we apply a first-order differencing to the series, with a lag of 12).

To sum up, the sARIMA model is defined by 7 hyperparameters: 3 for the non-seasonal part of the model and 4 for the seasonal part. They are indicated as:

sARIMA (p,d,q) (P,D,Q)m
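
In Python, these seven hyperparameters map directly onto the `order` and `seasonal_order` arguments of the `SARIMAX` class in `statsmodels`. A minimal sketch (`series` is a placeholder for a pandas Series holding the time series, and the hyperparameter values are illustrative):

```python
# Minimal sketch: expressing sARIMA(p,d,q)(P,D,Q)m in statsmodels.
from statsmodels.tsa.statespace.sarimax import SARIMAX

# `series` is assumed to be a pandas Series indexed by date.
model = SARIMAX(
    series,
    order=(1, 1, 1),              # (p, d, q): non-seasonal components
    seasonal_order=(1, 1, 1, 12)  # (P, D, Q, m): seasonal components
)
results = model.fit(disp=False)   # disp=False silences optimizer output
print(results.summary())
```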

Thanks to the model’s flexibility, we can “switch off” the components that are not present in our data (i.e. if the data has no trend or no seasonality, the respective parameters can be set to 0) and still use the same model framework to fit the data.

On the other hand, among sARIMA’s limitations is that these models can capture only one seasonality. If a daily dataset has both a yearly and a weekly seasonality, we will have to choose the strongest one.

## 1.2 How to choose the model hyperparameters: ACF and PACF

To identify the model hyperparameters, we usually look at the **autocorrelation** and **partial autocorrelation** of the time series; since all of the above components use past data to model present and future points, we should investigate how past and present data are correlated, and define how many past data points we need to model the present.

For this reason, autocorrelation and partial autocorrelation are two widely used functions:

**ACF** (autocorrelation): describes the correlation of the time series with its lags. All data points are compared with their previous values at lag 1, lag 2, lag 3, … The resulting correlations are plotted as a bar chart. This chart (also called a “correlogram”) is used to visualize how much information is retained throughout the time series. The ACF helps us in choosing the sARIMA model because:

The ACF helps to identify the MA(q) hyperparameter.

**PACF** (partial autocorrelation): describes the partial correlation of the time series with its lags. Differently from the ACF, the PACF shows the correlation between a point X_t and a lag that is not explained by correlations with lower-order lags. In other words, the PACF isolates the direct correlation between two terms. The PACF helps us in choosing the sARIMA model because:

The PACF helps to identify the AR(p) hyperparameter.
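
Both plots are one call away in `statsmodels`; a minimal sketch, assuming `series` is a pandas Series holding the time series:

```python
# Sketch: the two correlograms used to choose q (ACF) and p (PACF).
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(series, lags=36, ax=ax1)   # correlation with each lag
plot_pacf(series, lags=36, ax=ax2)  # direct correlation with each lag
plt.tight_layout()
plt.show()
```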

Before using these tools, however, we need to mention that the ACF and PACF can only be used on a “**stationary**” time series.

**1.3 Stationarity**

A (weakly) stationary time series is a time series where:

- The **mean is constant** over time (i.e. the series fluctuates around a horizontal line without positive or negative trends)
- The **variance is constant** over time (i.e. there is no seasonality or change in the deviation from the mean)

Of course, not all time series are natively stationary; however, we can transform them to make them stationary. The **most common transformations** used to make a time series stationary are:

- The **natural log**: by applying the log to each data point, we usually manage to make the time series stationary with respect to the *variance*.
- **Differencing**: by differencing a time series, we usually manage to remove the trend and make the time series stationary with respect to the *mean*.

After transforming the time series, we can use two tools to verify that it is stationary:

- The **Box-Cox** plot: this is a plot of the rolling mean (on the x-axis) vs the rolling standard deviation (on the y-axis), or of the mean vs variance of grouped points. Our data is stationary if we don’t observe any particular trends in the chart and we see little variation on both axes.
- The **Augmented Dickey–Fuller** test (ADF): a statistical test in which we try to reject the null hypothesis stating that the time series is non-stationary (see the sketch below).
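
A minimal sketch of the ADF test with `statsmodels` (again assuming `series` is a pandas Series):

```python
# Sketch: Augmented Dickey-Fuller test for stationarity.
from statsmodels.tsa.stattools import adfuller

adf_stat, p_value = adfuller(series.dropna())[:2]
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.4f}")
# A p-value below 0.05 lets us reject the null hypothesis of
# non-stationarity at the 5% significance level.
```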

Once a time series is stationary, we can analyze the ACF and PACF patterns and find the sARIMA model hyperparameters.

Identifying the sARIMA model that fits our data consists of a series of steps, which we will perform on the AirPassengers dataset (available here).

Each step roughly corresponds to a “page” of the Dash web app.

**2.1 Plot your data**

Create a line chart of your raw data: some of the features described above can be seen with the naked eye, especially stationarity and seasonality.
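
A couple of lines are enough, assuming the dataset is stored in a local `AirPassengers.csv` file with a date column followed by a passenger-count column (file name and layout are illustrative):

```python
# Sketch: loading the dataset and plotting the raw series.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("AirPassengers.csv", index_col=0, parse_dates=True)
series = df.iloc[:, 0]  # the first (and only) value column
series.plot(title="Monthly airline passengers")
plt.show()
```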

In the above chart, we see a positive linear trend and a recurrent seasonality pattern; considering that we have monthly data, we can assume the seasonality to be yearly (lag 12). The data is not stationary.

**2.2 Transform the data to make it stationary**

In order to find the model hyperparameters, we need to work with a stationary time series. So, if the data is not stationary, we will have to transform it:

- Start with the *log transformation*, to make the data stationary with respect to the variance (the log is only defined over positive values, so if the data contains negative or zero values, add a constant to each data point).
- Apply *differencing* to make the data stationary with respect to the mean. Usually, start with a differencing of order 1 and lag 1; then, if the data is still not stationary, try differencing with respect to the seasonal lag (e.g. 12 if we have monthly data). Using the reverse order won’t make a difference. Both steps are sketched below.
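
A sketch of this pipeline in pandas/NumPy, assuming `series` holds the positive-valued raw data:

```python
# Sketch: transformations to make the series stationary.
import numpy as np

log_series = np.log(series)          # stabilizes the variance
diff1 = log_series.diff(1).dropna()  # removes the trend (d=1)
diff1_12 = diff1.diff(12).dropna()   # removes the yearly seasonality (D=1, m=12)
```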

With our dataset, we need to perform the following steps to make it fully stationary: the log transformation, then a differencing of order 1 (lag 1), and finally a seasonal differencing with lag 12.

After each step, looking at the ADF test p-value and the Box-Cox plot, we see that:

- The Box-Cox plot progressively loses any trend and all points get closer and closer together.
- The p-value progressively drops, until we can finally reject the null hypothesis of the test.

## 2.3 Identify suitable model hyperparameters with the ACF and PACF

While transforming the data to make it stationary, we have already identified 3 hyperparameters:

- Since we applied differencing, the model will include differencing components. We applied a differencing of 1 and 12, so we can set d=1 and D=1, with m=12 (a seasonality of 12).

For the remaining hyperparameters, we can look at the ACF and PACF after the transformations.

In general, we can apply the following *rules*:

- We have an **AR(p) process if**: the PACF has a significant spike at a certain lag “p” (and no significant spikes after) and the ACF decays or shows a sinusoidal behavior (alternating positive and negative spikes).
- We have an **MA(q) process if**: the ACF has a significant spike at a certain lag “q” (and no significant spikes after) and the PACF decays or shows a sinusoidal behavior (alternating positive and negative spikes).
- In the case of **seasonal AR(P) or MA(Q) processes**, we will see that the significant spikes repeat at the seasonal lags.

In our example, we see the following:

- The closest rule to the above behavior suggests some MA(q) process with “q” between 1 and 3; the fact that we still have a significant spike at lag 12 may also suggest an MA(Q) with Q=1 (since m=12).

We use the ACF and PACF to get a range of hyperparameter values that can form model candidates. We can compare these different model candidates against our data and pick the best-performing one.

In the example, our model candidates are:

- SARIMA (p,d,q) (P,D,Q)m = (0, 1, 1) (0, 1, 1) 12
- SARIMA (p,d,q) (P,D,Q)m = (0, 1, 3) (0, 1, 1) 12

## 2.4 Perform a model grid search to identify optimal hyperparameters

A grid search can be used to test several model candidates against one another: we fit each model to the data and pick the best-performing one.

To set up a grid search, we need to:

- create a list with all possible combinations of model hyperparameters, given a range of values for each hyperparameter.
- fit each model and measure its performance using a KPI of choice.
- select the hyperparameters of the best-performing model.

In our case, we will compare model performances using the **AIC (Akaike information criterion) score**. This KPI trades off the fitting error (accuracy) against model complexity. In general, when the complexity is too low, the error is high, because we over-simplify the model fitting task; on the contrary, when the complexity is too high, the error on unseen data is still high due to overfitting. A trade-off between these two allows us to identify the “best-performing” model.

**Practical note**: when fitting a sARIMA model, we will need to use the original dataset with the log transformation (if we applied it), *but we do not want to use the data with differencing transformations*: the model applies the differencing itself, through the d and D hyperparameters.

We can choose to reserve part of the time series (usually the most recent 20% of observations) as a test set.
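
A minimal grid-search sketch with `statsmodels`, ranking candidates by AIC (the ranges are illustrative, and `log_series` is the log-transformed data from the earlier step):

```python
# Sketch: grid search over sARIMA hyperparameters, ranked by AIC.
import itertools
from statsmodels.tsa.statespace.sarimax import SARIMAX

m = 12
candidates = []
for p, d, q in itertools.product(range(2), range(2), range(4)):
    for P, D, Q in itertools.product(range(2), range(2), range(2)):
        try:
            fit = SARIMAX(log_series,
                          order=(p, d, q),
                          seasonal_order=(P, D, Q, m)).fit(disp=False)
            candidates.append(((p, d, q), (P, D, Q, m), fit.aic))
        except Exception:
            continue  # skip combinations that fail to converge

order, seasonal_order, aic = min(candidates, key=lambda c: c[2])
print(f"Best model: SARIMA{order}x{seasonal_order}, AIC={aic:.1f}")
```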

In our example, based on the tested hyperparameter ranges, the best model is:

SARIMA (p,d,q) (P,D,Q)m = (0, 1, 1) (0, 1, 1) 12

## 2.5 Final model: fit and predictions

We can finally predict data for train, test, and any future out-of-sample observations, and draw the final plot.
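
A sketch of the final fit and forecast, reusing `log_series` from above (the 24-month horizon is illustrative):

```python
# Sketch: fit the selected model and produce in- and out-of-sample predictions.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

best_fit = SARIMAX(log_series,
                   order=(0, 1, 1),
                   seasonal_order=(0, 1, 1, 12)).fit(disp=False)

in_sample = np.exp(best_fit.get_prediction(start=0).predicted_mean)  # train/test fit
forecast = np.exp(best_fit.get_forecast(steps=24).predicted_mean)    # future months
# np.exp inverts the log transformation, bringing predictions
# back to the original scale.
```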

To verify that we captured all the correlations, we can plot the ACF and PACF of the model residuals:
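
For example, reusing `best_fit` from the previous sketch:

```python
# Sketch: correlograms of the model residuals.
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(best_fit.resid, lags=36, ax=ax1)
plot_pacf(best_fit.resid, lags=36, ax=ax2)
plt.tight_layout()
plt.show()
```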

In this case, some signal from the strong seasonality component is still present, but most of the remaining lags show zero correlation.

The steps described above should work on any dataset that can be modeled through sARIMA. To recap:

1 - Plot & explore your data

2 - Apply transformations to make the data stationary (focus on the left-hand charts and the ADF test)

3 - Identify suitable hyperparameters with the ACF and PACF (the right-hand charts)

4 - Perform a grid search to select optimal hyperparameters

5 - Fit and predict using the best model

Download the app locally, upload your own datasets (by replacing the .csv file in the data folder), and try to fit the best model.

Thanks for reading!