Not very much, but it can be improved.

The F-test statistic for the joint significance of the slope coefficients of a regression is routinely reported in regression outputs, together with other key statistics such as R² and t-ratios.

The question is whether it is useful or informative as a key statistic. Does it add any value to your regression results? While it is routinely reported, one may observe that the F-statistic almost always rejects H0 in practical applications. What does it tell us about the goodness-of-fit of a regression? You will often find the value of R² very low, and yet the F-test says the model has statistically significant explanatory power. Is this not a conflicting result? How can we reconcile it?

In this post, I explain the problems associated with the F-test and how it can be modified so that it can serve as a useful tool. I would like to thank Venkat Raman for his LinkedIn post that motivated this article. The R code, data, and a supporting document are available from here.

The contents are as follows:

- What is the F-test in linear regression?
- Critical values as functions of the sample size (T) and the number of explanatory variables (K)
- F-statistics as functions of T and K
- Example
- Why is this phenomenon happening?
- How can the F-test be modified?

## 1. What is the F-test in linear regression?

Consider a linear regression model

$$Y_t = \beta_0 + \beta_1 X_{1t} + \dots + \beta_K X_{Kt} + u_t, \qquad t = 1, \dots, T, \tag{1}$$

where *Y* is the dependent variable, the *X*’s are the independent variables, and *u* is the error term, which follows a normal distribution with zero mean and a fixed variance. The null hypothesis of the test is

$$H_0: \beta_1 = \beta_2 = \dots = \beta_K = 0$$

against H1 that at least one of these β’s ≠ 0. Let P² denote the population value of the coefficient of determination, while R² is its sample estimator.

- Under H0, the *X* variables have no explanatory power for *Y* and P² = 0.

- Under H1, at least one of the *X*’s has explanatory power for *Y* and P² > 0.

It is well known that R² is a non-decreasing function of K. That is, it increases (in practice, strictly) as more explanatory variables are added to the model.
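This monotonicity is easy to verify numerically. The sketch below (Python, assuming only numpy; the data are simulated noise, not the post's sunspot data) fits OLS with an expanding set of regressors and shows that R² never decreases:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
y = rng.standard_normal(T)
X = rng.standard_normal((T, 10))  # 10 candidate regressors, unrelated to y

def r_squared(y, X):
    # OLS with intercept; R^2 = 1 - SSR/SST
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - (resid @ resid) / sst

# R^2 as the first k of the 10 regressors are added, one at a time
r2 = [r_squared(y, X[:, :k]) for k in range(1, 11)]
print(np.round(r2, 4))  # a monotonically non-decreasing sequence
```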

The F-test statistic is written as

$$F = \frac{(SSR_0 - SSR_1)/K}{SSR_1/(T-K-1)} = \frac{R^2/K}{(1-R^2)/(T-K-1)}, \tag{2}$$

where SSR0 is the residual sum of squares under H0 and SSR1 is that under H1, while T is the sample size. As the second expression shows, the F-test statistic can also be written in terms of R².

The statistic follows the (central) F-distribution with (K, T−K−1) degrees of freedom, denoted as F(K, T−K−1). The null hypothesis is rejected at the α-level of significance if F > Fc(α), where Fc(α) is the α-level critical value from F(K, T−K−1).
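As a sketch of this decision rule (Python with scipy; the numeric inputs are illustrative, not from the post's data), the statistic in Equation (2) can be computed directly from R², T, and K and compared with the critical value:

```python
from scipy import stats

def f_test(r2, T, K, alpha=0.05):
    # F-statistic written in terms of R^2, as in Equation (2)
    F = (r2 / K) / ((1 - r2) / (T - K - 1))
    # alpha-level critical value from the central F(K, T-K-1) distribution
    Fc = stats.f.ppf(1 - alpha, K, T - K - 1)
    return F, Fc, F > Fc

# illustrative values: even a tiny R^2 rejects H0 when T is large
F, Fc, reject = f_test(r2=0.01, T=5000, K=5)
print(round(F, 2), round(Fc, 2), reject)
```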

## 2. Critical values as functions of K and T

Let us first see how the critical value Fc(α) changes with the sample size and the number of explanatory variables.

Figure 1 above shows that the 5% critical value declines as the value of K or the value of T increases. This means that, with a larger sample size or a larger number of explanatory variables, the bar to reject H0 gets lower. Note that this property is also evident for other α-level critical values.
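The pattern in Figure 1 can be reproduced with scipy (a quick sketch; the grid of T and K values is my own choice, not the figure's exact grid):

```python
from scipy import stats

# 5% critical values Fc(0.05) decline as either T or K grows
for K in (2, 5, 10, 25):
    row = [round(stats.f.ppf(0.95, K, T - K - 1), 2) for T in (50, 200, 1000, 5000)]
    print(f"K={K:2d}:", row)
```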

## 3. F-test statistic as a function of T and K

It is clear from Equation (2) above that the value of the F-statistic is determined by T, K, and R². More specifically,

- the F-statistic is an increasing function of T, given a fixed value of K, as long as the value of R² does not decrease with T;
- when the R² value decreases with T, the F-statistic still increases with T if the effect of increasing T outpaces that of decreasing R²/(1−R²);
- the F-statistic is an increasing function of K, given a fixed value of T, because the value of R² always increases with the value of K, as stated above.

The above observations indicate that, in practice, the F-statistic is very likely an increasing function of T and K. However, the F-critical value declines with increasing values of T and K, as reported in Figure 1. Hence, in modern applications where the values of T and K are large, it is frequently the case that F > Fc(α), so the null hypothesis is routinely rejected.
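The first bullet point can be sketched numerically (Python; the fixed R² of 0.01 is an arbitrary illustrative choice): holding R² and K fixed, Equation (2) makes F grow roughly linearly in T.

```python
r2, K = 0.01, 5
for T in (100, 1000, 10000):
    # Equation (2): F is proportional to (T - K - 1) for fixed R^2 and K
    F = (r2 / K) / ((1 - r2) / (T - K - 1))
    print(T, round(F, 2))
# prints: 100 0.19 / 1000 2.01 / 10000 20.19
```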

## 4. An example

I consider a data set with sunspot numbers (*Y*) and stock returns of various stock markets (*X*1, …, *XK*), daily from January 1988 to February 2016 (7345 observations). This is intended to be a nonsense regression, for a relationship with little economic justification. If the F-test is useful and effective, it should almost always fail to reject H0, while the value of R² is expected to be close to 0.

The stock returns are from 24 stock markets (K = 24), including Amsterdam, Athens, Bangkok, Brussels, Buenos Aires, Copenhagen, Dublin, Helsinki, Istanbul, Kuala Lumpur, London, Madrid, Manila, New York, Oslo, Paris, Rio de Janeiro, Santiago, Singapore, Stockholm, Sydney, Taipei, Vienna, and Zurich.

I run the regression of Y on (X1, …, XK), progressively increasing the sample size and the number of stock markets, i.e., increasing the values of T and K. That is, the first regression starts with (T = 50, K = 1), then (T = 50, K = 2), …, (T = 50, K = 24), followed by (T = 198, K = 1), …, (T = 198, K = 24), and so on, and the process continues until the last set of regressions with (T = 7345, K = 1), …, (T = 7345, K = 24).

As we can see from Figure 2 above, the value of the F-test statistic generally increases with the sample size, for most values of K. The values are larger than the 5% critical values Fc (which are well below 2 in most cases), rejecting H0 in most cases. In contrast, the values of R² approach 0 as the sample size increases, for all K values.

This means that R² is effectively telling us that the regression model is meaningless, but the F-test is telling us otherwise by rejecting H0 in most cases. Two key statistics show two conflicting outcomes.
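The conflict can be reproduced in a small simulation (Python with numpy/scipy; simulated data with a deliberately tiny, but nonzero, effect, not the actual sunspot data): as T grows, R² settles near zero while the F-test switches to rejecting H0.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
K = 10
for T in (100, 1000, 10000):
    X = rng.standard_normal((T, K))
    # a tiny but nonzero effect, so P^2 is close to, but not exactly, zero
    y = 0.1 * X[:, 0] + rng.standard_normal(T)
    Z = np.column_stack([np.ones(T), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    F = (r2 / K) / ((1 - r2) / (T - K - 1))
    Fc = stats.f.ppf(0.95, K, T - K - 1)
    print(f"T={T:5d}  R2={r2:.4f}  F={F:.2f}  Fc={Fc:.2f}  reject={F > Fc}")
```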

## 5. Why is this phenomenon happening?

This does not mean that the theory of the F-test developed by Ronald Fisher is incorrect. The theory is correct, but it works only *when H0 is true exactly and literally*; that is, when P² = 0 or all slope coefficients are 0, *exactly*, without any deviations. However, such a situation will not occur in the real world where researchers use observational data: the value of R² can get close to 0, but it cannot be exactly zero. Hence, the theory works only in statistical textbooks, or computationally under a controlled Monte Carlo experiment.

We should also remember that the F-test was developed in the 1920s, when the values of T and K were as small as 20 and 3, respectively. The values of T and K we encounter today were unimaginable then.

## 6. How can the F-test be modified?

The main problems of the F-test are identified above:

the critical value of the test decreases while the test statistic increases, as the values of T and K increase.

As mentioned above, this happens because the F-test is for H0: P² = 0, but its sample estimate R² will never be exactly and literally 0. As a result, the F-test statistic generally increases with the sample size, even when R² decreases to a practically negligible value.

How do we fix this? In fact, the solution is quite simple. Instead of testing H0: P² = 0 as in the conventional F-test, we should conduct a one-tailed test of the following form:

H0: P² ≤ P0; H1: P² > P0

This is based on the argument that, for a model to be substantively important, its R² value should be at least P0. Suppose P0 is set at 0.05. Under H0, any R² value lower than 0.05 is practically negligible and the model is regarded as substantively unimportant. The researcher can choose other values of P0, depending on the context of the research.

Under H0: P² ≤ P0, the F-statistic follows a non-central F-distribution F(K, T−K−1; λ), where λ is the non-centrality parameter given by

$$\lambda = \frac{T P_0}{1 - P_0}.$$

Obviously, when P0 = 0, as in the conventional F-test, the value of λ is 0 and the F-statistic follows the central F-distribution F(K, T−K−1). It is clear from the above expression that λ is an increasing function of the sample size T for P0 > 0. As a result, the critical value Fc(α) is also an increasing function of the sample size.

Figure 3 above illustrates the non-central distributions F(K, T−K−1; λ) when K = 5 and P0 = 0.05, for a range of increasing values of T from 100 to 2000. The increasing value of λ pushes the distributions, along with their 5% critical values, away from 0.

Figure 4 above demonstrates this property as a function of T and K when P0 = 0.05. For example, when T = 1000 and K = 25, Fc(α) = 4.27; and when T = 2000 and K = 25, Fc(α) = 6.74, where α = 0.05.
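Assuming the non-centrality parameter takes the form λ = T·P0/(1 − P0) given above, scipy's non-central F distribution reproduces critical values of this magnitude (a sketch; the exact details are in the working paper):

```python
from scipy import stats

def noncentral_fc(T, K, P0=0.05, alpha=0.05):
    # assumed non-centrality parameter: lambda = T * P0 / (1 - P0)
    lam = T * P0 / (1 - P0)
    return stats.ncf.ppf(1 - alpha, K, T - K - 1, lam)

# unlike the central critical value, this one *rises* with the sample size
for T in (500, 1000, 2000):
    print(T, round(noncentral_fc(T, K=25), 2))
```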

Further details of this test can be found in the working paper (currently under review for publication), whose pdf copy is available from here.

Returning to our example of the sunspot regression, a test of H0: P² ≤ 0.05 against H1: P² > 0.05 can be conducted. The results for selected cases are summarized below, where α = 0.05:

Except when T = 50, the F-statistics are greater than the critical values from the central F-distribution, which means that H0: P² = 0 is rejected at the 5% level of significance, despite negligible R² values. However, the F-statistics are lower than the critical values from the non-central F-distribution, which means that H0: P² ≤ 0.05 cannot be rejected at the 5% level of significance, consistent with the negligible R² values.
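Putting the pieces together, a helper along these lines contrasts the two decisions at the full sample size of the example (Python with scipy; the F value of 3.0 is hypothetical, not one of the paper's reported statistics, and λ = T·P0/(1 − P0) is the assumed non-centrality form):

```python
from scipy import stats

def modified_f_test(F, T, K, P0=0.05, alpha=0.05):
    """One-tailed test of H0: P^2 <= P0 against H1: P^2 > P0."""
    lam = T * P0 / (1 - P0)                                # assumed non-centrality
    Fc_central = stats.f.ppf(1 - alpha, K, T - K - 1)      # conventional test
    Fc_noncentral = stats.ncf.ppf(1 - alpha, K, T - K - 1, lam)
    return Fc_central, Fc_noncentral, F > Fc_noncentral

# hypothetical F-statistic at the example's full sample (T = 7345, K = 24)
Fc0, Fc1, reject = modified_f_test(F=3.0, T=7345, K=24)
print(round(Fc0, 2), round(Fc1, 2), reject)
# the conventional test rejects (3.0 > Fc0), while the modified test does not
```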