Local vs Global Forecasting: What You Need to Know
Example: Australian tourism

A comparison of Local and Global approaches to time series forecasting, with a Python demonstration using LightGBM and the Australian Tourism dataset.


To jump to the Python example, click here!

What’s Local forecasting?

Local forecasting is the standard approach, where we train one predictive model for each time series independently. Classical statistical models (like exponential smoothing, ARIMA, TBATS, etc.) typically use this approach, but it can also be used by standard machine learning models via a feature engineering step.

Local forecasting has some advantages:

  • It's intuitive to understand and implement.
  • Each model can be tuned individually.

However, it also has some limitations:

  • It suffers from the "cold-start" problem: it requires a relatively large amount of historical data for each time series to estimate the model parameters reliably. It also makes it impossible to predict new targets, like the demand for a new product.
  • It can't capture the commonalities and dependencies among related time series, like cross-sectional or hierarchical relationships.
  • It's hard to scale to large datasets with many time series, since it requires fitting and maintaining a separate model for each target.

What’s Global forecasting?


Global forecasting is a more modern approach, where multiple time series are used to train a single "global" predictive model. By doing so, it has a larger training set and it can leverage shared structures across the targets to learn complex relations, ultimately leading to better predictions.

Building a global forecasting model typically involves a feature engineering step to build features like:

  • Lagged values of the target
  • Statistics of the target over time windows (e.g. "mean in the past week", "minimum in the past month", etc.; see the sketch after this list)
  • Categorical features to distinguish groups of time series
  • Exogenous features to model external/interaction/seasonal factors
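
For example, lagged values and rolling-window statistics can be built with a few lines of pandas. Below is a minimal sketch on a toy monthly series; the series, window lengths, and feature names are illustrative only and are not part of the worked example later in this article.

import pandas as pd

# Toy monthly series, purely illustrative.
y = pd.Series(range(24), index=pd.date_range("2020-01-01", periods=24, freq="MS"))

feat = pd.DataFrame(index=y.index)
feat["lag_1"] = y.shift(1)                                 # value one month ago
feat["lag_12"] = y.shift(12)                               # value one year ago
feat["mean_last_3_months"] = y.shift(1).rolling(3).mean()  # mean over the previous 3 months
feat["min_last_6_months"] = y.shift(1).rolling(6).min()    # minimum over the previous 6 months
feat["month"] = y.index.month                              # simple calendar/seasonal feature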

Global forecasting has considerable advantages:

  • It leverages the information from other time series to improve accuracy and robustness.
  • It can make predictions for time series with little to no data.
  • It scales to datasets with many time series, since it requires fitting and maintaining only one single model.
  • Through feature engineering, it can handle problems such as multiple data frequencies and missing data, which are harder to solve with classical statistical models.

But global forecasting also has some limitations:

  • It requires extra effort to use more complex models and perform feature engineering.
  • It may need full re-training when new time series appear.
  • If performance for one specific time series starts to degrade, it's hard to update it without impacting the predictions on the other targets.
  • It may require more computational resources and more sophisticated methods to estimate and optimize the model parameters.

How to choose between Local and Global forecasting?

There is no definitive answer as to whether local or global forecasting is better for a given problem.

In general, local forecasting may be more suitable for problems with:

  • Few time series with long histories
  • High variability and specificity among the time series
  • Limited forecasting and programming expertise

On the other hand, global forecasting may be more suitable for problems with:

  • Many time series with short histories
  • Low variability and high similarity among the targets
  • Noisy data
Python example

In this section we showcase the differences between the two approaches with a practical example in Python, using LightGBM and the Australian Tourism dataset, which is available on Darts under the Apache 2.0 License.

Let's start by importing the necessary libraries.

import pandas as pd
import plotly.graph_objects as go
from lightgbm import LGBMRegressor
from sklearn.preprocessing import MinMaxScaler

Data Preparation

The Australian Tourism dataset is made of quarterly time series starting in 1998. In this notebook we consider the tourism numbers at a region level.

# Load data.
data = pd.read_csv('https://raw.githubusercontent.com/unit8co/darts/master/datasets/australian_tourism.csv')
# Add time information: quarterly data starting in 1998.
data.index = pd.date_range("1998-01-01", periods = len(data), freq = "3MS")
data.index.name = "time"
# Consider only region-level data.
data = data[['NSW','VIC', 'QLD', 'SA', 'WA', 'TAS', 'NT']]
# Let's give it nicer names.
data = data.rename(columns = {
    'NSW': "New South Wales",
    'VIC': "Victoria",
    'QLD': "Queensland",
    'SA': "South Australia",
    'WA': "Western Australia",
    'TAS': "Tasmania",
    'NT': "Northern Territory",
})

Let's have a quick look at the data:

# Let's visualize the data.
def show_data(data, title=""):
    trace = [go.Scatter(x=data.index, y=data[c], name=c) for c in data.columns]
    go.Figure(trace, layout=dict(title=title)).show()

show_data(data, "Australian Tourism data by Region")

Which produces the following plot:

Image by author

We can see that:

  • The data exhibits a strong yearly seasonality.
  • The scale of the time series is quite different across regions.
  • The length of the time series is always the same.
  • There's no missing data (see the quick checks after this list).
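
These observations can be verified with a couple of quick checks on the `data` frame defined above; a minimal sketch:

# Quick sanity checks on the observations above.
print(data.describe().loc[["mean", "min", "max"]].round(0))  # scales differ a lot across regions
print("Any missing values:", data.isnull().any().any())      # expected: False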

Data engineering

Let's predict the value of the next quarter based on:

  • The lagged values of the previous 2 years
  • The current quarter (as a categorical feature)
def build_targets_features(data, lags=range(8), horizon=1):
    features = {}
    targets = {}
    for c in data.columns:

        # Build lagged features.
        feat = pd.concat([data[[c]].shift(lag).rename(columns = {c: f"lag_{lag}"}) for lag in lags], axis=1)
        # Build the quarter feature.
        feat["quarter"] = [f"Q{int((m-1) / 3 + 1)}" for m in data.index.month]
        feat["quarter"] = feat["quarter"].astype("category")
        # Build the target at the given horizon.
        targ = data[c].shift(-horizon).rename(f"horizon_{horizon}")

        # Drop missing values generated by lags/horizon.
        idx = ~(feat.isnull().any(axis=1) | targ.isnull())
        features[c] = feat.loc[idx]
        targets[c] = targ.loc[idx]

    return targets, features

# Build targets and features.
targets,features = build_targets_features(data)

Train/Test split

For simplicity, in this example we backtest our model with a single train/test split (you can check this article for more details about backtesting). Let's consider the last 2 years as the test set, and the period before as the training set.

def train_test_split(targets, features, test_size=8):
    targ_train = {k: v.iloc[:-test_size] for k, v in targets.items()}
    feat_train = {k: v.iloc[:-test_size] for k, v in features.items()}
    targ_test = {k: v.iloc[-test_size:] for k, v in targets.items()}
    feat_test = {k: v.iloc[-test_size:] for k, v in features.items()}
    return targ_train, feat_train, targ_test, feat_test

targ_train,feat_train,targ_test,feat_test = train_test_split(targets,features)

Model training

Now we estimate the forecasting models using the two different approaches. In both cases we use a LightGBM model with default parameters.

Local approach

As said before, with the local approach we estimate multiple models: one for each target.

# Instantiate one LightGBM model with default parameters for each target.
local_models = {k: LGBMRegressor() for k in data.columns}
# Fit the models on the training set.
for k in data.columns:
    local_models[k].fit(feat_train[k], targ_train[k])

Global Approach

On the other hand, with the global approach we estimate one model for all the targets. To do this we need to perform two extra steps:

  1. First, since the targets have different scales, we perform a normalization step.
  2. Then, to allow the model to distinguish across different targets, we add a categorical feature for each target.

These steps are described in the next two sections.

Step 1: Normalization
We scale all the data (targets and features) between 0 and 1 per target. This is important because it makes the data comparable, which in turn makes the model training easier. The scaling parameters are estimated on the training set.

def fit_scalers(feat_train, targ_train):
    feat_scalers = {k: MinMaxScaler().set_output(transform="pandas") for k in feat_train}
    targ_scalers = {k: MinMaxScaler().set_output(transform="pandas") for k in feat_train}
    for k in feat_train:
        feat_scalers[k].fit(feat_train[k].drop(columns="quarter"))
        targ_scalers[k].fit(targ_train[k].to_frame())
    return feat_scalers, targ_scalers

def scale_features(feat, feat_scalers):
    scaled_feat = {}
    for k in feat:
        df = feat[k].copy()
        cols = [c for c in df.columns if c not in {"quarter"}]
        df[cols] = feat_scalers[k].transform(df[cols])
        scaled_feat[k] = df
    return scaled_feat

def scale_targets(targ, targ_scalers):
    return {k: targ_scalers[k].transform(v.to_frame()) for k, v in targ.items()}

# Fit scalers on numerical features and targets over the training period.
feat_scalers,targ_scalers = fit_scalers(feat_train,targ_train)
# Scale train data.
scaled_feat_train = scale_features(feat_train,feat_scalers)
scaled_targ_train = scale_targets(targ_train,targ_scalers)
# Scale test data.
scaled_feat_test = scale_features(feat_test,feat_scalers)
scaled_targ_test = scale_targets(targ_test,targ_scalers)

Step 2: Add "target name" as a categorical feature
To allow the model to distinguish across different targets, we add the target name as a categorical feature. This is not a mandatory step and in some cases it could lead to overfitting, especially when the number of time series is high. An alternative could be to encode other features which are target-specific but more generic, like "region_area_in_squared_km", "is_the_region_on_the_coast", etc. (a small sketch of this alternative follows the next code block).

# Add a `target_name` feature.
def add_target_name_feature(feat):
    for k, df in feat.items():
        df["target_name"] = k

add_target_name_feature(scaled_feat_train)
add_target_name_feature(scaled_feat_test)
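
As mentioned above, an alternative (not used in the rest of this example) would be to replace the target name with generic, target-specific static features. A minimal sketch, where the feature values and the two listed regions are purely illustrative placeholders:

# Hypothetical static features per region (illustrative values only).
static_features = pd.DataFrame(
    {
        "region_area_in_squared_km": [227_444, 68_401],
        "is_the_region_on_the_coast": [True, True],
    },
    index=["Victoria", "Tasmania"],
)

# Each region's feature frame would then get its static attributes
# (in practice the table would cover all seven regions).
def add_static_features(feat, static_features):
    return {k: df.assign(**static_features.loc[k].to_dict()) for k, df in feat.items()}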

For simplicity we make target_name categorical after concatenating the data together. The reason why we specify the "category" type is that it's automatically detected by LightGBM.

# Concatenate the data.
global_feat_train = pd.concat(scaled_feat_train.values())
global_targ_train = pd.concat(scaled_targ_train.values())
global_feat_test = pd.concat(scaled_feat_test.values())
global_targ_test = pd.concat(scaled_targ_test.values())
# Make `target_name` categorical after concatenation.
global_feat_train.target_name = global_feat_train.target_name.astype("category")
global_feat_test.target_name = global_feat_test.target_name.astype("category")
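
We can now fit the single global model on the concatenated training data. As stated above, a LightGBM regressor with default parameters is assumed here, mirroring the local setup:

# Instantiate one LightGBM model with default parameters for all the targets.
global_model = LGBMRegressor()
# Fit the model on the concatenated (scaled) training set.
global_model.fit(global_feat_train, global_targ_train.values.ravel())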

Predictions on the test set

To analyze the performance of the two approaches, we make predictions on the test set.

First with the local approach:

# Make predictions with the local models.
pred_local = {
    k: model.predict(feat_test[k]) for k, model in local_models.items()
}

Then with the global approach (note that we apply the inverse normalization):

def predict_global_model(global_model, global_feat_test, targ_scalers):
    # Predict.
    pred_global_scaled = global_model.predict(global_feat_test)
    # Re-arrange the predictions.
    pred_df_global = global_feat_test[["target_name"]].copy()
    pred_df_global["predictions"] = pred_global_scaled
    pred_df_global = pred_df_global.pivot(
        columns="target_name", values="predictions"
    )
    # Un-scale the predictions.
    return {
        k: targ_scalers[k]
        .inverse_transform(
            pred_df_global[[k]].rename(
                columns={k: global_targ_train.columns[0]}
            )
        )
        .reshape(-1)
        for k in pred_df_global.columns
    }

# Make predictions with the global model.
pred_global = predict_global_model(global_model, global_feat_test, targ_scalers)

Error analysis

To evaluate the performance of the two approaches, we perform an error analysis.

First, let’s compute the Mean Absolute Error (MAE) overall and by region:

# Save predictions from both approaches in a convenient format.
output = {}
for k in targ_test:
    df = targ_test[k].rename("target").to_frame()
    df["prediction_local"] = pred_local[k]
    df["prediction_global"] = pred_global[k]
    output[k] = df

def print_stats(output):
    output_all = pd.concat(output.values())
    mae_local = (output_all.target - output_all.prediction_local).abs().mean()
    mae_global = (output_all.target - output_all.prediction_global).abs().mean()
    print(" " * 27 + "LOCAL   GLOBAL")
    print(f"{'MAE overall':25}: {mae_local:.1f}   {mae_global:.1f}\n")
    for k, df in output.items():
        mae_local = (df.target - df.prediction_local).abs().mean()
        mae_global = (df.target - df.prediction_global).abs().mean()
        print(f"MAE - {k:19}: {mae_local:.1f}   {mae_global:.1f}")

# Let's show some statistics.
print_stats(output)

which gives:

Mean Absolute Error on the Test Set — Image by author

We can see that the global approach leads to a lower error overall, as well as for every region except Western Australia.

Let's have a look at some predictions:

# Display the predictions.
for k, df in output.items():
    show_data(df, k)

Here are some of the outputs:

Image by author
Image by author
Image by author

We can see that the local models predict a constant, while the global model captures the seasonal behaviour of the targets.

Conclusion

In this example we showcased the local and global approaches to time series forecasting, using:

  • Quarterly Australian tourism data
  • Simple feature engineering
  • LightGBM models with default hyper-parameters

We saw that the global approach produced better predictions, leading to a 43% lower mean absolute error than the local one. In particular, the global approach had a lower MAE on all the targets except Western Australia.

The superiority of the global approach in this setting was somewhat expected, since:

  • We're predicting multiple correlated time series.
  • The depth of the historical data is very shallow.
  • We're using a somewhat complex model for shallow univariate time series. A classical statistical model might be more appropriate in this setting.

The code used in this article is available here.
