
Chi-squared test for categories of data

Background: The Student's t-test and Analysis of Variance are used to analyse measurement data
which, in theory, are continuously variable. Between a measurement of, say, 1 mm and 2 mm
there is a continuous range from 1.0001 to 1.9999 mm.

But in some types of experiment we wish to record how many individuals fall into a particular
category, such as blue eyes or brown eyes, motile or non-motile cells, etc. These counts, or
enumeration data, are discontinuous (1, 2, 3 etc.) and must be treated differently from continuous
data. Often the appropriate test is chi-squared (χ²), which we use to test whether the numbers of
individuals in different categories fit a null hypothesis (an expectation of some sort).

Chi squared analysis is simple, and valuable for all sorts of things - not just Mendelian crosses! On
this page we build from the simplest examples to more complex ones. When you have gone
through the examples you should consult the checklist of procedures and potential pitfalls.

A simple example

Suppose that the ratio of male to female students in the Science Faculty is exactly 1:1, but in the
Pharmacology Honours class over the past ten years there have been 80 females and 40 males. Is
this a significant departure from expectation? We proceed as follows (but note that we are going
to overlook a very important point that we shall deal with later).

Set out a table as shown below, with the "observed" numbers and the "expected" numbers (i.e. our
null hypothesis).

Then subtract each "expected" value from the corresponding "observed" value (O-E)

Square the "O-E" values, and divide each by the relevant "expected" value to give (O-E)²/E

Add all the (O-E)²/E values and call the total "X²"

                       Female   Male    Total
Observed numbers (O)   80       40      120
Expected numbers (E)   60 *3    60 *3   120 *1
O - E                  20       -20     0 *2
(O-E)²                 400      400
(O-E)²/E               6.67     6.67    13.34 = X²


Notes:
*1 This total must always be the same as the observed total
*2 This total must always be zero
*3 The null hypothesis was obvious here: we are told that there are equal numbers of males and
females in the Science Faculty, so we might expect that there will be equal numbers of males and
females in Pharmacology. So we divide our total number of Pharmacology students (120) in a 1:1
ratio to get our expected values.

Now we must compare our X² value with a χ² (chi squared) value in a table of χ² with n-1 degrees
of freedom (where n is the number of categories, i.e. 2 in our case - males and females). We have
only one degree of freedom (n-1). From the χ² table, we find a critical value of 3.84 for p = 0.05.

If our calculated value of X² exceeds the critical value of χ² then we have a significant difference
from the expectation. In fact, our calculated X² (13.34) exceeds even the tabulated χ² value
(10.83) for p = 0.001. This shows an extreme departure from expectation. It is still possible that
we could have got this result by chance - a probability of less than 1 in 1000. But we could be
99.9% confident that some factor leads to a "bias" towards females entering Pharmacology
Honours. [Of course, the data don't tell us why this is so - it could be self-selection or any other
reason]
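
If you want to check this arithmetic yourself, here is a minimal sketch in Python (NumPy and SciPy
are assumed to be available; the code is ours, not part of the original page):

```python
# Sketch of the first example: 80 females and 40 males tested against a 1:1 expectation.
import numpy as np
from scipy.stats import chi2

observed = np.array([80, 40])    # observed numbers (O): females, males
expected = np.array([60, 60])    # expected numbers (E): the 120 students split 1:1

x2 = np.sum((observed - expected) ** 2 / expected)   # sum of (O-E)^2/E
critical = chi2.ppf(0.95, df=1)                      # critical value for p = 0.05, 1 df

print(f"X² = {x2:.2f}, critical value = {critical:.2f}")
# X² is about 13.3 (13.34 in the table, where each term was rounded to 6.67),
# well above the critical value 3.84, so the departure from 1:1 is significant.
```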

Now repeat this analysis, but knowing that 33.5% of all students in the Science Faculty are male.

                       Female   Male    Total
Observed numbers (O)   80       40      120
Expected numbers (E)   79.8 *3  40.2    120 *1
O - E                  0.2      -0.2    0 *2
(O-E)²                 0.04     0.04
(O-E)²/E               0.0005   0.001   0.0015 = X²


Note *1: We know that the expected total must be 120 (the same as the observed total), so we can
calculate the expected numbers as 66.5% and 33.5% of this total.

Note *2: This total must always be zero.

Note *3: Although the observed values must be whole numbers, the expected values can be (and
often need to be) decimals.

Now, from a χ² table we see that our data do not depart from expectation (the null hypothesis).
They agree remarkably well with it and might lead us to suspect that there was some design
behind this! In most cases, though, we might get intermediate X² values, which neither agree
strongly nor disagree with expectation. Then we conclude that there is no reason to reject the null
hypothesis.
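
Again as an unofficial check, the same comparison can be run with SciPy's chisquare function
(which applies no Yates correction):

```python
# Sketch of the second example: expected values from the known 66.5% / 33.5% split.
import numpy as np
from scipy.stats import chisquare

observed = np.array([80, 40])
expected = 120 * np.array([0.665, 0.335])   # 79.8 and 40.2 - decimals are fine for E

result = chisquare(observed, f_exp=expected)
print(f"X² = {result.statistic:.4f}, p = {result.pvalue:.3f}")
# X² is about 0.0015, far below 3.84, so there is no reason to reject the null hypothesis.
```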

Some important points about chi-squared

Chi squared is a mathematical distribution with properties that enable us to equate our calculated
X² values to χ² values. The details need not concern us, but we must take account of some
limitations so that χ² can be used validly for statistical tests.

(i) Yates correction for two categories of data (one degree of freedom)

When there are only two categories (e.g. male/female) or, more correctly, when there is only one
degree of freedom, the χ² test should not, strictly, be used. There have been various attempts to
correct this deficiency, but the simplest is to apply Yates correction to our data. To do this, we
simply subtract 0.5 from each calculated value of "O-E", ignoring the sign (plus or minus). In
other words, an "O-E" value of +5 becomes +4.5, and an "O-E" value of -5 becomes -4.5. To
signify that we are reducing the absolute value, ignoring the sign, we use vertical lines: |O-E|-0.5.
Then we continue as usual but with these new (corrected) O-E values: we calculate (with the
corrected values) (O-E)², (O-E)²/E and then sum the (O-E)²/E values to get X². Yates correction
only applies when we have two categories (one degree of freedom).

We ignored this point in our first analysis of student numbers (above). So here is the table again,
using Yates correction:
                       Female   Male    Total
Observed numbers (O)   80       40      120
Expected numbers (E)   60 *3    60 *3   120 *1
O - E                  20       -20     0 *2
|O-E|-0.5              19.5     -19.5   0
(|O-E|-0.5)²           380.25   380.25
(|O-E|-0.5)²/E         6.338    6.338   12.676 = X²


In this case, the observed numbers were so different from the expected 1:1 ratio that Yates
correction made little difference - it only reduced the X² value from 13.34 to 12.67. But there
would be other cases where Yates correction would make the difference between acceptance or
rejection of the null hypothesis.
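
A small helper function makes the corrected calculation explicit. This is a sketch of our own (the
function name chi_squared_yates is not from the page), assuming NumPy and SciPy are available:

```python
# Chi-squared with Yates correction: use |O-E| - 0.5 in place of O-E before squaring.
import numpy as np
from scipy.stats import chi2

def chi_squared_yates(observed, expected):
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    corrected = np.abs(observed - expected) - 0.5      # |O-E| - 0.5
    x2 = np.sum(corrected ** 2 / expected)
    p = chi2.sf(x2, df=len(observed) - 1)              # two categories -> one df
    return x2, p

x2, p = chi_squared_yates([80, 40], [60, 60])
print(f"X² = {x2:.3f}, p = {p:.5f}")
# X² is about 12.675, still well beyond the tabulated value for p = 0.001.
```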

(ii) Limitations on numbers in "expected" categories

Again, to satisfy the mathematical assumptions underlying χ², the expected values should be
relatively large. The following simple rules are applied:

no expected category should be less than 1 (it does not matter what the observed values are)
AND no more than one-fifth of expected categories should be less than 5.

What can we do if our data do not meet these criteria? We can either collect larger samples so that
we satisfy the criteria, or we can combine the data for the smaller "expected" categories until their
combined expected value is 5 or more, then do a χ² test on the combined data. We will see an
example below.
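
The two rules are easy to encode. A small sketch (our own code, not from the page):

```python
# Check the rules for expected values: none below 1, and at most one-fifth below 5.
import numpy as np

def expected_counts_ok(expected):
    expected = np.asarray(expected, dtype=float)
    none_below_one = np.all(expected >= 1)
    few_below_five = np.mean(expected < 5) <= 0.2
    return none_below_one and few_below_five

print(expected_counts_ok([45, 15, 15, 5]))          # True  (the 9:3:3:1 example below)
print(expected_counts_ok([39.4, 13.1, 13.1, 4.4]))  # False (one of four categories is below 5)
```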

Chi squared with three or more categories

Suppose that we want to test the results of a Mendelian genetic cross. We start with 2 parents of
genotype AABB and aabb (where A and a represent the dominant and recessive alleles of one
gene, and B and b represent the dominant and recessive alleles of another gene).

We know that all the F1 generation (first generation progeny of these parents) will have genotype
AaBb and that their phenotype will display both dominant alleles (e.g. in fruit flies all the F1
generation will have red eyes rather than white eyes, and normal wings rather than stubby wings).

This F1 generation will produce 4 types of gamete (AB, Ab, aB and ab), and when we self-cross the
F1 generation we will end up with a variety of F2 genotypes (see the table below).

Gametes   AB     Ab     aB     ab
AB        AABB   AABb   AaBB   AaBb
Ab        AABb   AAbb   AaBb   Aabb
aB        AaBB   AaBb   aaBB   aaBb
ab        AaBb   Aabb   aaBb   aabb
All these genotypes fall into 4 phenotypes: double dominant, single dominant A, single dominant
B and double recessive. We know that in classical Mendelian genetics the expected ratio of these
phenotypes is 9:3:3:1.

Suppose we got observed counts as follows


                       Phenotype
                       AB     Ab     aB     ab     Total
Observed numbers (O)   40     20     16     4      80
Expected numbers (E)   45     15     15     5      80 *1
O - E                  -5     5      1      -1     0
(O-E)²                 25     25     1      1
(O-E)²/E               0.56   1.67   0.07   0.20   2.50 = X²


[Note: *1. From our expected total of 80 we can calculate our expected values for the categories
in the ratio 9:3:3:1.]

From a χ² table with 3 df (we have four categories, so 3 df) at p = 0.05, we find that a χ² value of
7.82 or more is needed to reject the null hypothesis (the expectation of a 9:3:3:1 ratio). Our
calculated X² (2.50) is well below this, so our data are consistent with the expected ratio.
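
As an informal check of this example (our own sketch, assuming NumPy and SciPy):

```python
# Test of the observed F2 phenotype counts against the expected 9:3:3:1 ratio.
import numpy as np
from scipy.stats import chisquare

observed = np.array([40, 20, 16, 4])       # AB, Ab, aB, ab
ratio = np.array([9, 3, 3, 1])
expected = 80 * ratio / ratio.sum()        # 45, 15, 15, 5

result = chisquare(observed, f_exp=expected)    # 4 categories -> 3 degrees of freedom
print(f"X² = {result.statistic:.2f}, p = {result.pvalue:.2f}")
# X² is about 2.5, well below the critical value 7.82, so the data fit 9:3:3:1.
```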

Combining categories

Look at the table above. We only just collected enough data to be able to test a 9:3:3:1 expected
ratio. If we had only counted 70 (or 79) fruit flies then our lowest expected category would have
been less than 5, and we could not have done the test as shown. We would break one of the "rules"
for χ² - that no more than one-fifth of expected categories should be less than 5. We could still do
the analysis, but only after combining the smaller categories and testing against a different
expectation.

Here is an illustration of this, assuming that we had used 70 fruit flies and obtained the following
observed numbers of phenotypes.


                       Phenotype
                       AB       Ab       aB       ab      Combined aB + ab   Total
Observed numbers (O)   34       18       15       3       18                 70
Expected numbers (E)   39.375   13.125   13.125   4.375   17.5               70
O - E                  -5.375   4.875                     0.5                0
(O-E)²                 28.891   23.766                    0.25
(O-E)²/E               0.734    1.811                     0.014              2.559 = X²


One of our expected categories (ab, with an expected value of 4.375) is less than 5. So we have
combined this category with one of the others and then must analyse the results against an
expected ratio of 9:3:4. The numbers in the expected categories were entered by dividing the total
(70) in this ratio.

Now, with 3 categories we have only 2 degrees of freedom. The rest of the analysis is done as
usual, and we still have no reason to reject the null hypothesis. But it is a different null
hypothesis: the expected ratio is 9:3:4 (double dominant: single dominant Ab: single dominant aB
plus double recessive ab).
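
The combined analysis can be checked in the same way (again, our own sketch):

```python
# Combined categories: test AB, Ab and (aB + ab) against the expected ratio 9:3:4.
import numpy as np
from scipy.stats import chisquare

observed = np.array([34, 18, 18])          # AB, Ab, combined aB + ab
ratio = np.array([9, 3, 4])
expected = 70 * ratio / ratio.sum()        # 39.375, 13.125, 17.5

result = chisquare(observed, f_exp=expected)    # 3 categories -> 2 degrees of freedom
print(f"X² = {result.statistic:.3f}, p = {result.pvalue:.2f}")
# X² is about 2.56, so again there is no reason to reject the (new) null hypothesis.
```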

Chi-squared: double classifications

Suppose that we have a population of fungal spores which clearly fall into two size categories,
large and small. We incubate these spores on agar and count the number of spores that germinate
by producing a single outgrowth or multiple outgrowths.

Spores counted:

120 large spores, of which 80 form multiple outgrowths and 40 produce single outgrowths
60 small spores, of which 18 form multiple outgrowths and 42 produce single outgrowths

Is there a significant difference in the way that large and small spores germinate?

Procedure:

1. Set out a table as follows
                     Large spores   Small spores   Total
Multiple outgrowth   80             18             98
Single outgrowth     40             42             82
Total                120            60             180

2. Decide on the null hypothesis.

In this case there is no "theory" that gives us an obvious null hypothesis. For example, we have no
reason to suppose that 55% or 75% or any other percentage of large spores will produce multiple
outgrowths. So the most sensible null hypothesis is that the large and the small spores behave
similarly - that the same proportion of each type of spore produces multiple outgrowths (in other
words, that the type of germination is independent of spore size). Then, if our data do not agree
with this expectation we will have evidence that spore size affects the type of germination.





3. Calculate the expected frequencies, based on the null hypothesis.

This step is complicated by the fact that we have different numbers of large and small spores, and
different numbers of multiple versus single outgrowths. But we can find the expected frequencies
(a, b, c and d) by using the grand total (180) and the column and row totals (see table below).

                                    Large spores   Small spores   Row totals
Multiple outgrowth   Observed (O)   80             18             98
                     Expected (E)   a              b              (expected 98)
Single outgrowth     Observed (O)   40             42             82
                     Expected (E)   c              d              (expected 82)
Column totals                       120            60             180

To find the expected value "a" we know that a total of 98 spores had multiple outgrowths and that
120 of the total 180 spores were large. So a is 98 × (120/180) = 65.33.

Similarly, to find b we know that 98 spores had multiple outgrowths and that 60 of the total 180
spores were small. So b is 98 × (60/180) = 32.67. [Actually, we could have done this simply by
subtracting a from the expected 98 row total - the expected total must always be the same as the
observed total.]

To find c we know that 82 spores had single outgrowths and that 120 of the total 180 spores
were large. So c is 82 × (120/180) = 54.67.

To find d we know that 82 spores had single outgrowths and that 60 of the total 180 spores were
small. So d is 82 × (60/180) = 27.33. [This value also could have been obtained by subtraction.]
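
All four expected values come from the same rule: row total × column total / grand total. A short
sketch of our own that reproduces a, b, c and d:

```python
# Expected frequencies for a double classification, from the row and column totals.
import numpy as np

observed = np.array([[80, 18],     # multiple outgrowth: large, small
                     [40, 42]])    # single outgrowth:   large, small

row_totals = observed.sum(axis=1, keepdims=True)   # 98 and 82
col_totals = observed.sum(axis=0, keepdims=True)   # 120 and 60
grand_total = observed.sum()                       # 180

expected = row_totals * col_totals / grand_total   # a, b, c, d as in the text
print(np.round(expected, 2))
# [[65.33 32.67]
#  [54.67 27.33]]
```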

4. Decide the number of degrees of freedom

You might think that there are 3 degrees of freedom (because there are 4 categories). But there is
actually one degree of freedom! The reason is that we lose one degree of freedom because we have
4 categories, and we lose a further 2 degrees of freedom because we used two pieces of
information to construct our null hypothesis - we used a column total and a row total. Once we
had used these we would have needed only one data entry in order to fill in the rest of the values
(therefore we have one degree of freedom).

Of course, with one degree of freedom we must use Yates correction (subtract 0.5 from each O-E
value).

5. Run the analysis as usual, calculating O-E, (O-E)² and (O-E)²/E for each category; then sum
the (O-E)²/E values to obtain X² and test this against χ².

The following table shows some of the working. The sum of the (corrected) (O-E)²/E values gives
an X² of 20.23.

                                                 Large spores   Small spores   Row totals
Multiple outgrowth   Observed (O)                80             18             98
                     Expected (E)                65.33          32.67          98
                     O - E                       +14.67         -14.67
                     Yates correction |O-E|-0.5  +14.17         -14.17         0
                     (O-E corrected)²/E          3.07           6.14
Single outgrowth     Observed (O)                40             42             82
                     Expected (E)                54.67          27.33          82
                     O - E                       -14.67         +14.67
                     Yates correction |O-E|-0.5  -14.17         +14.17         0
                     (O-E corrected)²/E          3.67           7.35           X² = 20.23
Column totals                                    120            60             180

We compare the X² value with a tabulated χ² with one degree of freedom. Our calculated X²
exceeds the tabulated χ² value (10.83) for p = 0.001. We conclude that there is a highly significant
departure from the null hypothesis - we have very strong evidence that large spores and small
spores show different germination behaviour.
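
SciPy's contingency-table function performs steps 3-5 in one go, including Yates correction for a
2 x 2 table, so it can be used as an independent check (this is our suggestion, not part of the
original procedure):

```python
# Whole double-classification test in one call (Yates correction applied for a 2x2 table).
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[80, 18],
                  [40, 42]])

chi2_value, p, dof, expected = chi2_contingency(table, correction=True)
print(f"X² = {chi2_value:.2f}, df = {dof}, p = {p:.6f}")
# X² is about 20.2 with one degree of freedom; p is far below 0.001.
```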

Checklist: procedures and potential pitfalls

Chi squared is a very simple test to use. The only potentially difficult things about it are:

- calculating the expected frequencies when we have double classifications - use the marginal
subtotals and totals to work out these frequencies;
- determining the number of degrees of freedom, especially when we have to use some of the data
to construct the null hypothesis.

If you follow the examples given on this page you should not have too many difficulties.

Some points to watch:

- Always work with "real numbers" in the observed categories, not with proportions. To illustrate
this, consider a simple chi squared test on tossing of coins. Suppose that in 100 throws you get 70
"heads" and 30 "tails". Using Yates correction (for one degree of freedom) you would find an X²
value of 15.21, equating to a χ² probability of less than 0.001. But if you got 7 "heads" and 3 "tails"
in a test of 10 throws it would be entirely consistent with random chance. The ratio is the same
(7:3), but the actual numbers determine the level of significance in a chi squared test (see the
sketch after this list).
- Observed categories must have whole numbers, but expected categories can have decimals.
- Follow the rules about the minimum numbers in expected categories. These rules do not apply to
the observed categories.
- Remember Yates correction for one degree of freedom.
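
Here is the coin-tossing illustration as a short sketch (our own code, using the same
Yates-corrected calculation as earlier):

```python
# Same 7:3 ratio of heads to tails, but very different conclusions depending on sample size.
import numpy as np
from scipy.stats import chi2

def x2_yates(observed, expected):
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    corrected = np.abs(observed - expected) - 0.5
    return np.sum(corrected ** 2 / expected)

for heads, tails in [(70, 30), (7, 3)]:
    total = heads + tails
    x2 = x2_yates([heads, tails], [total / 2, total / 2])
    p = chi2.sf(x2, df=1)
    print(f"{heads} heads, {tails} tails: X² = {x2:.2f}, p = {p:.4f}")
# 70/30 gives X² = 15.21 (p < 0.001); 7/3 gives X² = 0.90, entirely consistent with chance.
```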
