Robust Statistics for Data Scientists Part 1: Resilient Measures of Central Tendency and Dispersions
Classical vs. Robust Statistics: A Crucial Shift

Constructing a foundation: understanding and applying robust measures in data analysis

Image generated with DALL-E

The role of statistics in Data Science is central, bridging raw data to actionable insights. Nonetheless, not all statistical methods are created equal, especially when faced with the messy realities of real-world data. This brings us to robust statistics, a subfield designed to withstand the anomalies in data that often throw classical statistical methods off target.

While classical statistics have served us well, their susceptibility to outliers and extreme values can lead to misleading conclusions. Enter robust statistics, which aims to provide more reliable results under a wider variety of conditions. This approach is not about discarding outliers without consideration, but about developing methods that are less sensitive to them.

Robust statistics is grounded in the principle of resilience. It’s about constructing statistical methods that remain unaffected, or minimally affected, by small deviations from the assumptions that classical methods hold dear. This resilience is crucial in real-world data analysis, where perfectly distributed datasets are the exception, not the norm.
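To make this resilience concrete, here is a minimal sketch (assuming only NumPy, with made-up exam-score data) that compares the mean and standard deviation with their robust counterparts, the median and the median absolute deviation (MAD), before and after a single wild value is added:

```python
import numpy as np

rng = np.random.default_rng(42)
scores = rng.normal(loc=70, scale=5, size=50)   # well-behaved exam scores
contaminated = np.append(scores, 300)           # one wild data-entry error

def summarize(x, label):
    mad = np.median(np.abs(x - np.median(x)))   # median absolute deviation
    print(f"{label:>12}: mean={np.mean(x):6.1f}  median={np.median(x):6.1f}  "
          f"std={np.std(x):6.1f}  MAD={mad:5.1f}")

summarize(scores, "clean")
summarize(contaminated, "contaminated")
```

A single contaminated observation is enough to pull the mean and standard deviation away from the bulk of the data, while the median and MAD barely move.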

Key concepts in robust statistics are outliers, leverage points, and breakdown points.

Outliers and Leverage Points

Outliers are data points that deviate significantly from the other observations in the dataset. Leverage points, particularly in the context of regression analysis, are outliers in the independent-variable space that can exert excessive influence on the fit of the model. In both cases, their presence can distort the results of classical statistical analyses.

As an illustration, consider a dataset measuring the effect of study hours on exam scores. An outlier might be a student who studied very little but scored exceptionally high, while a leverage point might be a student who studied an unusually high number of hours compared with their peers.
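To see how these two kinds of points pull a classical fit around, the following sketch (with synthetic data invented for this example) adds one outlier and one leverage point to a study-hours dataset and compares an ordinary least-squares slope with the robust Theil-Sen slope from scipy.stats.theilslopes:

```python
import numpy as np
from scipy.stats import theilslopes

rng = np.random.default_rng(0)
hours = rng.uniform(1, 10, size=30)                  # typical study hours
scores = 50 + 4 * hours + rng.normal(0, 3, size=30)  # roughly linear relationship

# Outlier: barely studied, yet scored exceptionally high (unusual y-value)
hours = np.append(hours, 0.5)
scores = np.append(scores, 95)

# Leverage point: studied far more than anyone else (unusual x-value)
hours = np.append(hours, 40)
scores = np.append(scores, 60)

ols_slope, ols_intercept = np.polyfit(hours, scores, deg=1)
ts_slope, ts_intercept, _, _ = theilslopes(scores, hours)

print(f"OLS slope:       {ols_slope:.2f}")   # dragged toward the leverage point
print(f"Theil-Sen slope: {ts_slope:.2f}")    # stays close to the underlying slope of 4
```

The leverage point drags the least-squares slope well away from the underlying trend, while the Theil-Sen estimate stays close to it.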
