Cointegration vs Spurious Correlation: Understand the Difference for Accurate Evaluation

Why correlation doesn’t equal causation for time series

Towards Data Science
Photo by Wance Paleri on Unsplash

In time series analysis, it’s valuable to know whether one series influences another. For example, commodity traders benefit from knowing whether a rise in commodity A leads to a rise in commodity B. Originally, this relationship was measured using linear regression. However, in the 1970s Clive Granger and Paul Newbold showed that this approach yields misleading results, particularly for non-stationary time series. Granger went on to develop the concept of cointegration, work for which he shared the 2003 Nobel prize in economics. In this post, I want to discuss why cointegration is needed, how it is applied, and why it’s an important concept Data Scientists should understand.

Overview

Before we discuss cointegration, let’s discuss why it is needed. Historically, statisticians and economists used linear regression to determine the relationship between different time series. However, Granger and Newbold showed that, for non-stationary series, this approach can be badly misleading and leads to something called spurious correlation.

A spurious correlation is when two time series appear correlated but in fact lack any causal relationship. It’s the classic ‘correlation doesn’t imply causation’ statement. It’s dangerous because even standard statistical tests can report a significant relationship where none exists.
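The Granger and Newbold effect can be reproduced with a small simulation. The sketch below is a minimal illustration, not the original authors’ experiment: it generates pairs of independent random walks (so by construction neither influences the other), regresses one on the other with ordinary least squares, and counts how often the slope’s t-statistic exceeds 1.96 in absolute value, the usual 5% two-sided cutoff. For comparison it repeats the exercise with independent, stationary white noise. The function names, sample size, trial count and seed are illustrative assumptions.

```python
import numpy as np


def ols_slope_tstat(x: np.ndarray, y: np.ndarray) -> float:
    """Return the classical t-statistic of b in the OLS fit y = a + b*x + e."""
    X = np.column_stack([np.ones_like(x), x])  # design matrix with an intercept
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)  # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)  # textbook OLS covariance matrix
    return float(coef[1] / np.sqrt(cov[1, 1]))


def rejection_rate(integrated: bool, n_trials: int = 500,
                   n_obs: int = 200, seed: int = 0) -> float:
    """Fraction of trials in which two independent series look 'significantly' related."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_trials):
        x = rng.standard_normal(n_obs)
        y = rng.standard_normal(n_obs)
        if integrated:
            # Cumulative sums turn white noise into (non-stationary) random walks
            x, y = np.cumsum(x), np.cumsum(y)
        if abs(ols_slope_tstat(x, y)) > 1.96:  # nominal 5% two-sided test
            hits += 1
    return hits / n_trials


wn_rate = rejection_rate(integrated=False)
rw_rate = rejection_rate(integrated=True)
print(f"white noise rejection rate:  {wn_rate:.2f}")
print(f"random walk rejection rate: {rw_rate:.2f}")
```

With white noise the rejection rate should sit close to the nominal 5%, whereas with random walks it is typically far higher, even though the two series are completely unrelated. That inflated rate is the false "significant relationship" that makes naive regression on non-stationary series risky.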

Example

An example of a spurious relationship is shown in the plots below:

Plot generated by author in Python.

Here we have two time series, A(t) and B(t), plotted as a function of time (left) and plotted against each other (right). Notice from the plot on the right that there is some correlation between the series, as shown by the regression line. However, looking at the left plot, we see this correlation is spurious because B(t) consistently increases while A(t) fluctuates erratically. Moreover, the average distance between the two time series is also increasing…
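To make the description concrete, the sketch below builds two synthetic series with the same qualitative shape: A(t) as a driftless random walk that wanders erratically and B(t) as a steady upward trend plus noise. These are hypothetical stand-ins, not the data behind the author’s plot; the helper names, the drift of 1.0 per step, the noise scale and the seed are arbitrary assumptions. The code reports the Pearson correlation alongside the mean distance |B(t) - A(t)| in the first and second halves of the sample.

```python
import numpy as np


def make_example_series(n_obs: int = 200, drift: float = 1.0, seed: int = 1):
    """Hypothetical stand-ins for A(t) and B(t) (assumed shapes, not the original data).

    A is a driftless random walk (fluctuates erratically);
    B is a linear trend plus noise (consistently increases).
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_obs)
    a = np.cumsum(rng.standard_normal(n_obs))
    b = drift * t + rng.normal(scale=2.0, size=n_obs)
    return t, a, b


def summarize(a: np.ndarray, b: np.ndarray):
    """Return the Pearson correlation and the mean |B - A| in each half of the sample."""
    corr = float(np.corrcoef(a, b)[0, 1])
    distance = np.abs(b - a)
    half = len(a) // 2
    return corr, float(distance[:half].mean()), float(distance[half:].mean())


t, a, b = make_example_series()
corr, early_gap, late_gap = summarize(a, b)
print(f"Pearson correlation:       {corr:.2f}")
print(f"Mean |B - A|, first half:  {early_gap:.1f}")
print(f"Mean |B - A|, second half: {late_gap:.1f}")
```

Whatever value the correlation takes for a given seed, the distance between the series keeps widening, so nothing ties them together in the long run. A correlation coefficient on its own cannot reveal this, and that is the gap cointegration is meant to fill.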
