
Why (and how) you must create a baseline model before you train your final model

So you’ve collected your data. You’ve outlined the business case, selected a candidate model (e.g. Random Forest), set up your development environment, and your hands are on the keyboard. You’re ready to build and train your time series model.
Wait, don’t start just yet. Before you train and test your Random Forest model, you must first train a baseline model.
A baseline model is a simple model used to create a benchmark, or point of reference, against which you will measure your final, more complex machine learning model.
Data scientists create baseline models because:
- Baseline models can give you a good idea of how a more complex model will perform.
- If a baseline model does badly, it may be a sign of a data quality problem that needs addressing.
- If a baseline model performs better than the final model, it could indicate issues with the algorithm, features, hyperparameters, or data preprocessing.
- If the baseline and complex models perform roughly the same, this might indicate that the complex model needs more fine-tuning (in features, architecture, or hyperparameters). It could also show that a more complex model isn’t necessary, and a simpler model will suffice.
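The comparisons in the list above can be made concrete with a small helper. This is a minimal sketch rather than a standard recipe: the function names, the choice of mean absolute error as the metric, and the 5% tolerance for "roughly the same" are all assumptions made for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def compare_to_baseline(actual, baseline_pred, model_pred, tolerance=0.05):
    """Compare a candidate model against a baseline on the same test data.

    `tolerance` is a hypothetical threshold: relative differences in error
    no larger than this are treated as "roughly the same".
    """
    baseline_mae = mean_absolute_error(actual, baseline_pred)
    model_mae = mean_absolute_error(actual, model_pred)

    # Guard against a perfect baseline (MAE of 0) before dividing.
    if baseline_mae == 0:
        verdict = "similar" if model_mae == 0 else "baseline_better"
    else:
        relative = (model_mae - baseline_mae) / baseline_mae
        if abs(relative) <= tolerance:
            verdict = "similar"
        elif relative < 0:
            verdict = "model_better"
        else:
            verdict = "baseline_better"
    return baseline_mae, model_mae, verdict
```

A "baseline_better" or "similar" verdict is a prompt to investigate, not a conclusion in itself: it points back to the possible causes listed above.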
Typically, a baseline model is a statistical model, such as a moving average model. Alternatively, it’s a simpler version of the target model. For instance, if you will be training a Random Forest model, you might first train a Decision Tree model as a baseline.
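To illustrate the statistical option, here is a minimal sketch of a moving average baseline in plain Python. The function name, the window size of 3, and the sample series are hypothetical; each forecast is simply the mean of the previous `window` observed values.

```python
def moving_average_forecast(series, window=3):
    """One-step-ahead forecasts: each prediction is the mean of the
    previous `window` observations. The result lines up with
    series[window:], since the first `window` points lack a full history.
    """
    if window < 1 or len(series) <= window:
        raise ValueError("series must be longer than the window")
    return [
        sum(series[i - window:i]) / window
        for i in range(window, len(series))
    ]


# Hypothetical series, listed in time order.
series = [10, 12, 14, 16, 18, 20]
predictions = moving_average_forecast(series, window=3)
actuals = series[3:]

# Mean absolute error of the baseline: the benchmark a final model should beat.
mae = sum(abs(a - p) for a, p in zip(actuals, predictions)) / len(actuals)
print(predictions)
print(mae)
```

On a steadily rising series like this one, the moving average lags behind the actual values, which is exactly the kind of weakness a more complex model would be expected to improve on.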
For time series data, there are a few popular options for baseline models that I’d like to share with you. Each of these works well because it respects the temporal order of the data and bases its forecasts on the data’s patterns.
Naive forecast
The naive forecast is the simplest: it assumes that the next value will be the same as the…