
Gain a deeper understanding of Gaussian processes by implementing them with only NumPy.

Gaussian Processes (GPs) are a remarkable class of models. Few machine learning algorithms give you an accurate measure of uncertainty essentially for free while remaining so flexible. The problem is that GPs are conceptually quite opaque. Most explanations lean on heavy algebra and probability theory, which rarely helps build an intuition for how these models work.
There are also many great guides that skip the maths and give you the intuition for how these models work, but if you want to use GPs yourself, in the right context, my personal belief is that surface knowledge won't cut it. For this reason I wanted to walk through a bare-bones implementation, from scratch, so that you get a clearer picture of what's happening under the hood of all the libraries that implement these models for you.
I also link my GitHub repo, where you'll find the implementation of GPs using only NumPy. I've tried to abstract away the maths as much as possible, but inevitably some of it is still required.
The first step is always to look at the data. We're going to use the monthly atmospheric CO2 concentration over time, measured at the Mauna Loa observatory, a classic dataset for GPs [1]. This is intentionally the same dataset that sklearn uses in their GP tutorial, which teaches how to use their API rather than what is happening under the hood of the model.
It is a very simple dataset, which will make the maths that follows easier to explain. The notable features are the linear upwards trend as well as the seasonal component, with a period of one year.
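The real series has to be downloaded (sklearn's tutorial fetches it from OpenML, for instance), but for a self-contained sketch you can mimic its two notable features with synthetic data. All numbers below are illustrative stand-ins, not the real measurements:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the CO2 series (made-up values, not the
# real Mauna Loa data): time in years, a linear upwards trend, and a
# seasonal cycle with a period of one year, plus a little noise.
t = np.linspace(1960, 2000, 480)           # roughly monthly samples
trend = 315.0 + 1.5 * (t - t[0])           # linear growth
seasonal = 3.0 * np.sin(2 * np.pi * t)     # yearly oscillation
y = trend + seasonal + rng.normal(0.0, 0.3, t.shape)
```

Plotting `y` against `t` would show the same qualitative shape as the real dataset: a rising line with a superimposed yearly wiggle.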
What we will do is separate the seasonal and linear components of the data. To do that, we first fit a linear model to the data.
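A minimal sketch of this step, using `np.polyfit` on a synthetic series with the same structure (the values are assumptions, not the real dataset):

```python
import numpy as np

# Least-squares fit of a straight line (degree-1 polynomial), then
# subtraction: the residual is the seasonal component plus noise.
# The series here is synthetic, standing in for the real data.
t = np.linspace(1960, 2000, 480)
y = 315.0 + 1.5 * (t - t[0]) + 3.0 * np.sin(2 * np.pi * t)

slope, intercept = np.polyfit(t, y, deg=1)
linear_part = slope * t + intercept
seasonal_residual = y - linear_part
```

Because the yearly oscillation averages out over many periods, the fitted slope recovers the underlying trend, and `seasonal_residual` is left oscillating around zero, ready to be modelled by the GP's periodic component.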