I first examined the personal time series for 4 variables: Sleep, Studying, Socializing and Mood. I used Microsoft Excel to quickly draw some plots. They represent the daily number of hours spent (blue) and the 5-day moving average¹ MA(5) (red), which I considered a good measure for my situation. The mood variable was rated from 10 (the best!) to 0 (awful!).
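For reference, a 5-day moving average like the red line in the plots can be computed with pandas' rolling mean. A minimal sketch with made-up hours (not my actual data):

```python
import pandas as pd

# hypothetical daily hours for one variable
hours = pd.Series([8, 7.5, 6, 9, 8, 5, 7.5, 8, 6.5, 7])

# MA(5): mean of the current day and the 4 previous ones
ma5 = hours.rolling(window=5).mean()

print(ma5.tolist())  # the first 4 entries are NaN until the window fills
```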

Regarding the information contained in the footnote of each plot: the *total* is the sum of the values of the series, the *mean* is the arithmetic mean of the series, the *STD* is the standard deviation, and the *relative deviation* is the STD divided by the mean.
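These footnote statistics are straightforward to reproduce with pandas. A quick sketch with made-up values:

```python
import pandas as pd

series = pd.Series([8, 7.5, 6, 9, 8, 5, 7.5, 8])

total = series.sum()             # total: sum of the values
mean = series.mean()             # mean: arithmetic mean
std = series.std()               # STD: (sample) standard deviation
relative_deviation = std / mean  # relative deviation: STD / mean

print(total, mean, std, relative_deviation)
```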

All things considered, I did well enough with sleep. I had rough days, like everyone else, but I believe the trend is pretty stable. In fact, it's one of the least-varying variables in my study.

These are the hours I dedicated to my academic career. It fluctuates a lot — finding balance between work and studying often means having to cram projects on the weekends — but still, I consider myself satisfied with it.

Regarding this table, all I can say is that I'm surprised. The grand total is bigger than I expected, given that I'm an introvert. In fairness, hours spent with my colleagues at school also count. When it comes to variability, the STD is indeed high, which makes sense given the difficulty of establishing a stable routine around socializing.

This is the least variable series — its relative deviation is the lowest among my studied variables. *A priori*, I'm satisfied with the observed trend. I believe it's positive to maintain a reasonably stable mood — and even better if it's a good one.

After looking at the trends for the main variables, I decided to dive deeper and study the potential correlations² between them. Since my goal was to be able to mathematically model and predict (or at least *explain*) "Mood", correlations were a key metric to consider. From them, I could extract relationships like the following: "the days I study the most are the ones I sleep the least", "I often study languages and music together", etc.

Before we do anything else, let's open up a Python file and import some key libraries for time series analysis. I normally use aliases for them, as it is standard practice and makes the actual code less verbose.

```python
import pandas as pd               # 1.4.4
import numpy as np                # 1.22.4
import seaborn as sns             # 0.12.0
import matplotlib.pyplot as plt   # 3.5.2
from pmdarima import arima        # 2.0.4
```

We are going to make two different studies regarding correlation. We will look into the Pearson Correlation Coefficient³ (for linear relationships between variables) and the Spearman Correlation Coefficient⁴ (which studies monotonic relationships between variables). We will be using their implementation⁵ in pandas.

## Pearson Correlation matrix

The Pearson Correlation Coefficient between two variables *X* and *Y* is computed as follows:

ρ(X, Y) = cov(X, Y) / (σX · σY)
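To make the definition concrete, here is the coefficient computed by hand with NumPy and checked against `np.corrcoef` (toy data, not my actual series):

```python
import numpy as np

x = np.array([7.0, 6.5, 8.0, 5.0, 9.0, 7.5])
y = np.array([6.0, 5.5, 7.5, 4.0, 8.5, 7.0])

# Pearson: covariance of X and Y divided by the product of their STDs
cov = np.mean((x - x.mean()) * (y - y.mean()))
rho = cov / (x.std() * y.std())

print(rho, np.corrcoef(x, y)[0, 1])  # the two values should match
```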

We will quickly calculate a correlation matrix, where every possible pairwise correlation is computed.

```python
# read and select the data
raw = pd.read_csv("final_stats.csv", sep=";")
numerics = raw.select_dtypes('number')

# compute the correlation matrix
corr = numerics.corr(method='pearson')

# generate the heatmap
sns.heatmap(corr, annot=True)

# draw the plot
plt.show()
```

This is the raw Pearson Correlation matrix obtained from my data.

And these are the significant values⁶ — the ones that are, with 95% confidence, different from zero. We perform a t-test⁷ with the following rule. For each correlation value **ρ**,

**we discard it if:**

|ρ| < 2 / √n

where **n** is the sample size. (For small ρ and large n, the t-statistic of a correlation value is approximately ρ·√n, so requiring |t| above the critical value 1.96 ≈ 2 gives this threshold.) We can recycle the code from before and add in this filter.

```python
# constants
N = 332  # number of samples
STEST = 2 / np.sqrt(N)

def significance_pearson(val):
    # mask (True = hide) values not significantly different from zero
    if np.abs(val) < STEST:
        return True
    return False

# read data
raw = pd.read_csv("final_stats.csv", sep=";")
numerics = raw.select_dtypes('number')

# calculate correlation
corr = numerics.corr(method='pearson')

# prepare masks
mask = corr.copy().applymap(significance_pearson)
mask2 = np.triu(np.ones_like(corr, dtype=bool))  # remove upper triangle
mask_comb = np.logical_or(mask, mask2)

# plot the results
c = sns.heatmap(corr, annot=True, mask=mask_comb)
c.set_xticklabels(c.get_xticklabels(), rotation=-45)
plt.show()
```

The discarded values could just be noise, wrongfully suggesting trends or relationships. In any case, it's better to assume a real relationship is not significant than to consider significant one that isn't (that is, we favor type II error over type I error). This is especially true in a study with relatively subjective measurements.

## Spearman’s rank correlation coefficient

The Spearman correlation coefficient can be calculated as follows:

ρ = 1 − (6 Σ dᵢ²) / (n(n² − 1))

where dᵢ is the difference between the ranks of the i-th pair of observations and n is the sample size.
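To illustrate the rank-based definition, here is the closed-form expression (which assumes no ties) checked against pandas' implementation on toy data:

```python
import numpy as np
import pandas as pd

x = pd.Series([7.0, 6.5, 8.0, 5.0, 9.0, 7.5])
y = pd.Series([2.0, 1.0, 30.0, 0.5, 100.0, 10.0])  # monotonic in x, not linear

# rank both series, then apply the closed-form Spearman formula
d = x.rank() - y.rank()
n = len(x)
rho = 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1))

print(rho, x.corr(y, method='spearman'))  # both should agree
```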

As we did before, we can quickly compute the correlation matrix:

```python
# read and select the data
raw = pd.read_csv("final_stats.csv", sep=";")
numerics = raw.select_dtypes('number')

# compute the correlation matrix
corr = numerics.corr(method='spearman')  # pay attention to this change!

# generate the heatmap
sns.heatmap(corr, annot=True)

# draw the plot
plt.show()
```

This is the raw Spearman's Rank Correlation matrix obtained from my data:

Let's see which values are actually significant. The statistic to check for significance is the following:

t = ρ · √((n − 2) / (1 − ρ²))

Here, we'll filter out all values whose t-statistic is lower (in absolute value) than the critical value 1.96. Again, the reason they are discarded is that we cannot be sure whether they are noise — random chance — or an actual trend. Let's code it up:

```python
# constants
N = 332  # number of samples
TTEST = 1.96

def significance_spearman(val):
    # the diagonal (val == 1) would make the t-statistic blow up: mask it
    if val == 1:
        return True
    t = val * np.sqrt((N - 2) / (1 - val * val))
    if np.abs(t) < TTEST:
        return True
    return False

# read data
raw = pd.read_csv("final_stats.csv", sep=";")
numerics = raw.select_dtypes('number')

# calculate correlation
corr = numerics.corr(method='spearman')

# prepare masks
mask = corr.copy().applymap(significance_spearman)
mask2 = np.triu(np.ones_like(corr, dtype=bool))  # remove upper triangle
mask_comb = np.logical_or(mask, mask2)

# plot the results
c = sns.heatmap(corr, annot=True, mask=mask_comb)
c.set_xticklabels(c.get_xticklabels(), rotation=-45)
plt.show()
```

These are the significant values.

I believe this chart better explains the apparent relationships between variables, as its criterion is more "natural" (it considers monotonic⁹, and not only linear, functions and relationships). It's also not as affected by outliers as the other one (a couple of very bad days related to a certain variable won't sway the overall correlation coefficient).

Still, I'll leave both charts for the reader to judge and draw their own conclusions.