When ReLU’s extrapolation capabilities are not enough
Neural networks are known to be great approximators for any function, at least as long as we don’t stray too far from our dataset. Let us see what that means. Here is some data:
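The article does not include the data-generating code at this point, so here is only a minimal sketch of how such a dataset could be recreated; the range, frequency, and noise level are my assumptions, not the original values.

```python
import numpy as np

# Hypothetical recreation of the dataset: a noisy sine wave observed
# on a limited input range (all constants here are assumptions).
rng = np.random.default_rng(seed=42)
X = np.sort(rng.uniform(-10, 10, size=(1000, 1)), axis=0)   # inputs on a limited range
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(1000)     # sine wave plus Gaussian noise
```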
It does not just look like a sine wave, it actually is one, with some noise added. We can now train a plain feed-forward neural network with one hidden layer of 1000 neurons and ReLU activation. We get the following fit:
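As a rough sketch of such a model, assuming scikit-learn’s MLPRegressor (the original may well be built with a different framework), the training could look like this:

```python
from sklearn.neural_network import MLPRegressor

# One hidden layer with 1000 ReLU neurons, as described above.
model = MLPRegressor(
    hidden_layer_sizes=(1000,),
    activation="relu",
    max_iter=2000,
    random_state=0,
)
model.fit(X, y)

# Predict both inside and far outside the training range
# to see how the network extrapolates.
X_grid = np.linspace(-30, 30, 600).reshape(-1, 1)
y_pred = model.predict(X_grid)
```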
It looks quite decent, apart from the edges. We could fix this by adding more neurons to the hidden layer, in line with Cybenko’s universal approximation theorem. But I want to point out something else:
We could now argue that this extrapolation behavior is bad if we expect the wave pattern to continue outside of the observed range. But without domain knowledge or more data to resort to, that is exactly what it remains: an assumption.
Nevertheless, in the rest of this article, we will assume that any periodic pattern we can pick up within the data continues outside of it as well. This is a common assumption when doing time series modeling, where we naturally want to extrapolate into the future. We assume that any seasonality observed in the training data will simply continue, because what else can we say without any additional information? In this article, I want to show you how sine-based activation functions help bake this assumption into the model.
But before we go there, let us briefly dive deeper into how ReLU-based neural networks extrapolate in general, and why we should not use them for time series forecasting as is.