This tutorial explores how covariates influence the precision of A/B testing in a randomized experiment. A properly randomized A/B test calculates the lift by comparing the average outcome in the treatment and control groups. However, the influence of features other than the treatment on the outcome determines the statistical properties of the A/B test. For example, omitting influential features from the lift calculation can lead to a highly imprecise estimate of the lift, even though the estimate converges to the true value as the sample size increases.
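To make the precision claim concrete, here is a minimal Monte Carlo sketch. Everything in it is an illustrative assumption rather than part of the tutorial's own setup: the linear data generating process, the constants `TRUE_LIFT` and `COVARIATE_COEF`, and the helper `simulate_once`. It compares a plain difference in means with a regression that also includes the covariate.

```python
import numpy as np

rng = np.random.default_rng(42)

TRUE_LIFT = 1.0        # assumed true treatment effect
COVARIATE_COEF = 5.0   # assumed (strong) effect of the covariate on the outcome
N_USERS = 500          # users per simulated experiment
N_REPS = 500           # number of simulated experiments


def simulate_once():
    """Simulate one randomized experiment and return two lift estimates."""
    x = rng.normal(size=N_USERS)           # influential covariate
    t = rng.integers(0, 2, size=N_USERS)   # random 0/1 treatment assignment
    y = TRUE_LIFT * t + COVARIATE_COEF * x + rng.normal(size=N_USERS)

    # Estimator 1: difference in average outcomes, ignoring the covariate.
    diff_in_means = y[t == 1].mean() - y[t == 0].mean()

    # Estimator 2: least squares of y on an intercept, treatment, and covariate.
    design = np.column_stack([np.ones(N_USERS), t, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return diff_in_means, coef[1]


estimates = np.array([simulate_once() for _ in range(N_REPS)])
mean_lift = estimates.mean(axis=0)        # average estimate per estimator
sd_lift = estimates.std(axis=0, ddof=1)   # spread of estimates per estimator

print("difference in means: mean=%.3f sd=%.3f" % (mean_lift[0], sd_lift[0]))
print("covariate adjusted:  mean=%.3f sd=%.3f" % (mean_lift[1], sd_lift[1]))
```

Under these assumptions both estimators center on the true lift, but the one that omits the covariate varies far more from one simulated experiment to the next.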

You’ll learn what RMSE, bias, and the size of a test are, and you’ll study the performance of an A/B test by generating simulated data and running Monte Carlo experiments. This type of work is useful for understanding how the properties of the Data Generating Process (DGP) influence A/B test performance, and it can help you apply that understanding when running A/B tests on real-world data. First, we discuss some basic statistical properties of an estimator.

## Root Mean Square Error (RMSE)

Root Mean Square Error (RMSE) is a frequently used measure of the differences between the values predicted by a model or an estimator and the observed values. It is the square root of the average of the squared differences between predictions and actual observations. The formula for RMSE is:

RMSE = sqrt[(1/n) * Σ(actual – prediction)²]

RMSE gives a relatively high weight to large errors because they’re squared before they’re averaged, which means RMSE is most useful when large errors are particularly undesirable.
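The formula and the weighting of large errors can be checked with a short sketch. The helper names `rmse` and `mae` and the toy arrays are illustrative assumptions; mean absolute error is included only as a point of comparison, since it does not square the errors.

```python
import numpy as np


def rmse(actual, prediction):
    """Square root of the mean squared difference between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    return float(np.sqrt(np.mean((actual - prediction) ** 2)))


def mae(actual, prediction):
    """Mean absolute difference, shown here only for comparison with RMSE."""
    actual = np.asarray(actual, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    return float(np.mean(np.abs(actual - prediction)))


actual = np.zeros(4)
even_errors = np.array([2.0, 2.0, 2.0, 2.0])      # four errors of size 2
one_large_error = np.array([0.0, 0.0, 0.0, 8.0])  # same total error, concentrated in one prediction

print(rmse(actual, even_errors), mae(actual, even_errors))          # 2.0 2.0
print(rmse(actual, one_large_error), mae(actual, one_large_error))  # 4.0 2.0
```

Both sets of predictions have the same total absolute error, yet the RMSE doubles when that error is concentrated in a single large miss.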

## Bias

In statistics, the bias of an estimator is the difference between the estimator’s expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased; otherwise, the estimator is said to be biased. In other words, bias occurs when an algorithm consistently learns the same incorrect thing by failing to capture the true underlying relationship.

For example, if you are trying to predict house prices based on features of the house, and your predictions are consistently $100,000 below the actual price, your model is biased.
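The house price example can be mimicked with simulated numbers. The sketch below is hypothetical throughout: the price range, the noise level, and the helper `estimated_bias` are assumptions made for illustration. It approximates bias by the average signed error of the predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 1,000 houses with made-up prices and prediction noise.
actual_price = rng.uniform(300_000, 900_000, size=1_000)
noise = rng.normal(0, 20_000, size=1_000)

biased_prediction = actual_price - 100_000 + noise  # systematically $100,000 too low
unbiased_prediction = actual_price + noise          # errors average out to roughly zero


def estimated_bias(prediction, actual):
    """Average signed error, an empirical stand-in for E[estimate] - true value."""
    return float(np.mean(np.asarray(prediction) - np.asarray(actual)))


print(round(estimated_bias(biased_prediction, actual_price)))    # close to -100000
print(round(estimated_bias(unbiased_prediction, actual_price)))  # close to 0
```

Both sets of predictions are equally noisy. What makes the first one biased is that its errors do not average out to zero.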