Please note javascript is required for full website functionality.

Blog

A to Z of Excel Functions: The CHISQ.TEST Function

26 June 2017

Welcome back to our regular A to Z of Excel Functions blog.  Today we look at the CHISQ.TEST function.


The CHISQ.TEST function

In probability theory and statistics, the chi-squared distribution (also chi-square or ?2-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables.  It is one of the most widely used probability distributions in inferential statistics, e.g. in hypothesis testing or in construction of confidence intervals.

The chi-squared distribution is used in the common chi-squared tests for goodness of fit of an observed distribution to a proposed theoretical one, the independence of two criteria of classification of qualitative data, and in confidence interval estimation for a population standard deviation of a normal distribution from a sample standard deviation.

If Z1, ..., Zk are independent, standard normal random variables, then the sum of their squares

is distributed according to the chi-squared distribution with k degrees of freedom.  This is usually denoted as

Thus, the chi-squared distribution has one parameter: k — a positive integer that specifies the number of degrees of freedom.

As aforementioned, the chi-squared distribution is used primarily in hypothesis testing.  Unlike more widely known distributions such as the normal distribution and the exponential distribution, the chi-squared distribution is rarely used to model natural phenomena.  It arises in the following hypothesis tests, among others.

The primary reason that the chi-squared distribution is used extensively in hypothesis testing is its relationship to the normal distribution.  Many hypothesis tests use a test statistic, such as the t statistic in a t-test.  For these hypothesis tests, as the sample size, n, increases, the sampling distribution of the test statistic approaches the normal distribution (Central Limit Theorem).  Since the test statistic (such as t) is asymptotically normally distributed, provided the sample size is sufficiently large, the distribution used for hypothesis testing may be approximated by a normal distribution.  Testing hypotheses using a normal distribution is well understood and relatively easy.  The simplest chi-squared distribution is the square of a standard normal distribution.  So wherever a normal distribution could be used for a hypothesis test, a chi-squared distribution could be used.

A chi-squared distribution constructed by squaring a single standard normal distribution is said to have 1 degree of freedom, etc.

This function returns the test for independence.  CHISQ.TEST returns the value from the chi-squared (?2) distribution for the statistic and the appropriate degrees of freedom.  You can use ?2 tests to determine whether hypothesised results are verified by an experiment.

The CHISQ.TEST function employs the following syntax to operate:

CHISQ.TEST(actual_range, expected_range)

The CHISQ.TEST function has the following arguments:

  • actual_range: this is required and represents the range of data that contains observations to test against expected values
  • expected_range: this is also required.  This denotes the range of data that contains the ratio of the product of row totals and column totals to the grand total.

It should be further noted that:

  • if actual_range and expected_range have a different number of data points, CHISQ.TEST returns the #N/A error value
  • the ?2 test first calculates a ?2 statistic using the formula:

Where:

o Aij = actual frequency in the i-th row, j-th column

o Eij = expected frequency in the i-th row, j-th column

o r = number or rows

o c = number of columns

  • a low value of ?2 is an indicator of independence.  As can be seen from the formula above, ?2 is always positive or 0, and is 0 only if Aij = Eij for every i and j
  • CHISQ.TEST returns the probability that a value of the ?2 statistic at least as high as the value calculated by the above formula could have happened by chance under the assumption of independence.  In computing this probability, CHISQ.TEST uses the ?2 distribution with an appropriate number of degrees of freedom, df.  
  • If r > 1 and c > 1, then df = (r - 1)(c - 1).  If r = 1 and c > 1, then df = c - 1 or if r > 1 and c = 1, then df = r - 1.  r = c = 1 is not allowed and #N/A is returned
  • the use of CHISQ.TEST is most appropriate when Eij’s are not too small.  Some statisticians suggest that each Eij should be greater than or equal to 5
  • CHISQ.TEST appears to be the Excel 2010 replacement for the CHITEST function.

Please see my example below: 

We’ll continue our A to Z of Excel Functions soon.  Keep checking back – there’s a new blog post every other business day.

A full page of the function articles can be found here.

Newsletter