A to Z of Excel Functions: the FDIST Function
17 May 2019
Welcome back to our regular A to Z of Excel Functions blog. Today we look at the FDIST function.
The FDIST function
Named after Sir Ronald Aylmer Fisher, a British statistician, geneticist and mega-genius, the F-distribution is defined as a continuous distribution obtained from the ratio of two chi-square distributions and used in particular to test the equality of the variances of two normally distributed variances a continuous probability distribution, which means that it is defined for an infinite number of different values. This analysis of variance is often referred to as “ANOVA”, which I thought was a cheap car.
I’m glad we have cleared that up then, eh?
To be “slightly” clearer, the F-distribution can be used for several types of applications, including testing hypotheses about the equality of two population variances and testing the validity of what is known as a multiple regression equation, i.e. a formulaic attempt to model the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data.
The F-distribution shares one important property with the Student’s t-distribution: probabilities are determined by a concept known as degrees of freedom. Unlike the Student’s t-distribution, the F-distribution is characterized by two different types of degrees of freedom: numerator and denominator degrees of freedom.
The formula for the probability density function is given by
The cumulative distribution function is given by
Where I is the regularized incomplete beta function:
In other words, use the F.DIST function in Excel when you need to and try not to ask too many questions.
The F-distribution has two important properties:
- It’s defined only for positive values.
- It’s not symmetrical about its mean; instead, it’s positively skewed.
A distribution is positively skewed if the mean is greater than the median (the mean is the average value of a distribution, and the median is the midpoint; half of the values in the distribution are below the median and half are above). This function returns the F probability distribution. You can use this function to determine whether two data sets have different degrees of diversity.
A good example of a positively skewed distribution is household incomes. Suppose that half of the households in a country have incomes below $50,000 and half have incomes above $50,000; this indicates that the median household income is $50,000. Among households with incomes below $50,000, the smallest possible value is $0. Among households with incomes above $50,000, there may be incomes of several million dollars per year. This imbalance between incomes below the median and above the median causes the mean to be substantially higher than the median. Suppose for example that the mean income in this case is $120,000. This shows that the distribution of household incomes is positively skewed.
This image (above) shows a graph of the F-distribution for different combinations of numerator and denominator degrees of freedom. In each case, numerator degrees of freedom are listed first, and denominator degrees of freedom are listed second. The level of significance in each case is 0.05.
A level of significance is used to test a hypothesis. An hypothesis test begins with a null hypothesis; this is a statement that’s assumed to be true unless there is strong contrary evidence. There is also an alternative hypothesis; this is a statement that is accepted in place of the null hypothesis if there’s sufficient evidence to reject the null hypothesis.
The level of significance, designated alpha, refers to the probability of incorrectly rejecting the null hypothesis when it is actually true. This is known as a Type I error. By contrast, a Type II error occurs when you fail to reject the null hypothesis when it’s actually false. Therefore, with a level of significance of 0.05, there is a 5 percent chance of committing a Type I error.
The figure shows that the distribution isn’t defined for negative values (as you can see, no negative values appear along the horizontal axis). Additionally, as the number of degrees of freedom increases, the shape of the distribution shifts to the right. The distribution has a long right tail (more formally, it’s skewed to the right, or positively skewed).
FDIST returns the (right-tailed) F probability distribution (degree of diversity) for two data sets. You can use this function to determine whether two data sets have different degrees of diversity.
Important: This function has been replaced by newer ones which may provide improved accuracy and whose names better reflect their usage. Although this function is still available for backward compatibility, you should consider using this newer function from now on, because the older variant may not be available in future versions of Excel, according to Microsoft. However, this is NOT the equivalent of F.DIST. Interestingly, it is F.DIST.RT which supersedes FDIST.
The FDIST function employs the following syntax to operate:
The reason for his is due Microsoft’s apparent insistence on ensuring all functions ending in .DIST (e.g. NORM.DIST, NORM.S.DIST, T.DIST) return the left portion of a distribution, not the right. However, you would have to work very hard to invent a situation in which you would expect the F-ratio’s numerator to be smaller than its denominator. It’s likely if this did occur, it would probably be attributable to misreferencing the data. The F-test is a right-tailed test, unlike the z-test or t-test, where the directionality of the difference is important.
The FDIST function has the following arguments:
- x: this is required and represents the value at which to evaluate the function
- deg_freedom1: this is also required and denotes the numerator degrees of freedom
- deg_freedom2: this is again required and represents the denominator degrees of freedom.
It should be further noted that:
- if any argument is non-numeric, FDIST returns the #VALUE! error value
- if x < 0, FDIST returns the #NUM! error value
- if deg_freedom1 or deg_freedom2 is not an integer, it is truncated
- If either deg_freedom1 or deg_freedom2 is strictly less than 1 or either deg_freedom1 or deg_freedom2 is greater than or equal to 10^10, FDIST returns the #NUM! error value
- FDIST is calculated as FDIST=P( F>x ), where F is a random variable that has an F distribution with deg_freedom1 and deg_freedom2 degrees of freedom.
Please see my example below:
We’ll continue our A to Z of Excel Functions soon. Keep checking back – there’s a new blog post every business day.
A full page of the function articles can be found here.