Discover the origins, theory and uses behind the famous t-distribution
The t-distribution is a continuous probability distribution that is very similar to the normal distribution, but has the following key differences:
- Heavier tails: More of its probability mass sits in the tails (higher kurtosis), which means it is more likely to produce values far from its mean.
- One parameter: The t-distribution has just one parameter, the degrees of freedom, because it is used when we do not know the population’s variance.
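The heavier tails can be checked with a small simulation. The sketch below is illustrative only: it assumes the standard construction of a t-distributed value as a standard normal draw divided by the square root of a scaled chi-square draw, and the function names, the choice of 3 degrees of freedom, and the threshold of 3 are all hypothetical choices made for this example.

```python
import math
import random


def sample_t(df, rng):
    """Draw one t-distributed value with `df` degrees of freedom.

    Assumes the standard construction T = Z / sqrt(V / df), where Z is
    standard normal and V is chi-square with `df` degrees of freedom.
    """
    z = rng.gauss(0.0, 1.0)
    # A chi-square(df) variable is a gamma variable with shape df/2 and scale 2.
    v = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(v / df)


def tail_fraction(values, threshold):
    """Fraction of values whose absolute value exceeds `threshold`."""
    return sum(abs(x) > threshold for x in values) / len(values)


def compare_tails(df=3, n=20_000, threshold=3.0, seed=0):
    """Return (t tail fraction, normal tail fraction) from n simulated draws."""
    rng = random.Random(seed)
    t_draws = [sample_t(df, rng) for _ in range(n)]
    normal_draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return tail_fraction(t_draws, threshold), tail_fraction(normal_draws, threshold)


if __name__ == "__main__":
    t_frac, normal_frac = compare_tails()
    print(f"Fraction of t draws (3 degrees of freedom) beyond 3: {t_frac:.4f}")
    print(f"Fraction of standard normal draws beyond 3: {normal_frac:.4f}")
```

With few degrees of freedom, the t draws should land beyond the threshold noticeably more often than the normal draws, which is what "heavier tails" means in practice.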
An interesting fact about the t-distribution is that it is also known as the “Student’s t-distribution.” This is because its inventor, William Sealy Gosset, an English statistician, published it under the pseudonym “Student” to keep his identity anonymous.
Let’s go over some of the theory behind the distribution to build some mathematical intuition.
Origin
The t-distribution originates from the idea of modelling normally distributed data without knowing the population variance of that data.
For example, say we sample n data points from a normal distribution. The mean and variance of this sample are, respectively:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
Where:
- x̄ is the sample mean.
- s is the sample standard deviation.
Combining the two equations above, we can construct the following random variable:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

Here μ is the population mean and t is the t-statistic, which follows a t-distribution with n − 1 degrees of freedom.
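As a rough illustration, the t-statistic can be computed from a sample in a few lines. This is a minimal sketch using only the Python standard library; the function name `t_statistic`, the sample values, and the hypothesised population mean of 5.0 are made up for the example.

```python
import math
import statistics


def t_statistic(sample, mu):
    """Compute t = (x_bar - mu) / (s / sqrt(n)) for a sample.

    `mu` is the assumed population mean. The sample standard deviation s
    uses the n - 1 denominator, which is what statistics.stdev does.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two data points to estimate s")
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)
    return (x_bar - mu) / (s / math.sqrt(n))


if __name__ == "__main__":
    data = [5.1, 4.9, 5.6, 5.8, 6.0]  # hypothetical measurements
    print(f"t = {t_statistic(data, mu=5.0):.3f}")  # about 2.30 for this data
```

The resulting value would then be compared against a t-distribution with n − 1 = 4 degrees of freedom.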
See here for a more thorough derivation.
Probability Density Function
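As stated above, the t-distribution is parameterised by a single value, the degrees of freedom ν, and its probability density function is:

$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$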
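The density can also be evaluated numerically. The sketch below assumes the standard form of the t-distribution density (written out in the docstring) and uses only the Python standard library; the names `t_pdf` and `normal_pdf` are hypothetical helpers for this example. Working with log-gamma values is a design choice to avoid overflow when ν is large.

```python
import math


def t_pdf(t, nu):
    """Density of the t-distribution with `nu` degrees of freedom at `t`.

    Assumes the standard form:
        f(t) = Gamma((nu + 1) / 2) / (sqrt(nu * pi) * Gamma(nu / 2))
               * (1 + t^2 / nu) ** (-(nu + 1) / 2)
    """
    if nu <= 0:
        raise ValueError("degrees of freedom must be positive")
    # Log of the normalising constant, computed with lgamma for stability.
    log_norm = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
    )
    # Log of the kernel (1 + t^2 / nu) ** (-(nu + 1) / 2).
    log_kernel = -(nu + 1) / 2 * math.log1p(t * t / nu)
    return math.exp(log_norm + log_kernel)


def normal_pdf(x):
    """Density of the standard normal distribution, for comparison."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


if __name__ == "__main__":
    for nu in (1, 5, 30):
        print(f"nu={nu}: f(0)={t_pdf(0.0, nu):.4f}, f(3)={t_pdf(3.0, nu):.4f}")
    print(f"normal: f(0)={normal_pdf(0.0):.4f}, f(3)={normal_pdf(3.0):.4f}")
```

Printing the density at 0 and at 3 for a few values of ν shows the pattern described earlier: for small ν the density at 3 is well above the normal density there, and as ν grows the two curves move closer together.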