An Introduction to Analysis of Variance

Analysis of Variance (ANOVA) can be used to test
for the equality of three or more population means
using data obtained from observational or
experimental studies.
We want to use the sample results to test the
following hypotheses.
H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
Ha: Not all population means are equal

If H0 is rejected, we cannot conclude that all
population means are different.
Rejecting H0 means that at least two population
means have different values.

Slide 1

Assumptions for Analysis of Variance

For each population, the response variable is
normally distributed.
The variance of the response variable, denoted $\sigma^2$, is
the same for all of the populations.
The observations must be independent.

Slide 2

Analysis of Variance:
Testing for the Equality of k Population Means

Between-Samples Estimate of Population Variance
Within-Samples Estimate of Population Variance
Comparing the Variance Estimates: The F Test
The ANOVA Table

Slide 3

Between-Samples Estimate
of Population Variance

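A between-samples estimate of $\sigma^2$ is called the mean
square between (MSB).

$$\mathrm{MSB} = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1}$$

where $n_j$ is the size of sample j, $\bar{x}_j$ is the mean of
sample j, $\bar{\bar{x}}$ is the overall sample mean, and k is the
number of samples.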

The numerator of MSB is called the sum of squares
between (SSB).
The denominator of MSB represents the degrees of
freedom associated with SSB.
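
As a minimal sketch of the MSB definition, the function below computes SSB and MSB from summary statistics. The function name and the input numbers are hypothetical, chosen only for illustration; the overall mean is taken as the size-weighted average of the sample means, which is what the formula needs when the sample sizes differ.

```python
def mean_square_between(sizes, means):
    """Return (SSB, MSB) from per-sample sizes n_j and sample means."""
    k = len(sizes)
    n_total = sum(sizes)
    # Overall mean: sample means weighted by their sample sizes.
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total
    # Numerator of MSB: sum of squares between (SSB).
    ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    # Denominator of MSB: k - 1 degrees of freedom.
    return ssb, ssb / (k - 1)


# Hypothetical summary statistics for k = 3 samples (not from the slides).
ssb, msb = mean_square_between(sizes=[2, 3, 5], means=[10.0, 20.0, 30.0])
print(ssb, msb)
```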

Slide 4

Within-Samples Estimate
of Population Variance

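The estimate of $\sigma^2$ based on the variation of the
sample observations within each sample is called the
mean square within (MSW).

$$\mathrm{MSW} = \frac{\sum_{j=1}^{k} (n_j - 1) s_j^2}{n_T - k}$$

where $s_j^2$ is the variance of sample j and
$n_T = n_1 + n_2 + \dots + n_k$ is the total sample size.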

The numerator of MSW is called the sum of squares
within (SSW).
The denominator of MSW represents the degrees of
freedom associated with SSW.
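
The MSW definition can be sketched in the same way. The function name and the sample sizes and variances below are hypothetical and serve only to show how the numerator (SSW) and the denominator (nT - k) are formed.

```python
def mean_square_within(sizes, variances):
    """Return (SSW, MSW) from per-sample sizes n_j and sample variances."""
    k = len(sizes)
    n_total = sum(sizes)
    # Numerator of MSW: sum of squares within (SSW).
    ssw = sum((n - 1) * s2 for n, s2 in zip(sizes, variances))
    # Denominator of MSW: n_T - k degrees of freedom.
    return ssw, ssw / (n_total - k)


# Hypothetical summary statistics for k = 3 samples (not from the slides).
ssw, msw = mean_square_within(sizes=[2, 3, 5], variances=[4.0, 6.0, 2.0])
print(ssw, msw)
```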

Slide 5

Comparing the Variance Estimates: The F Test

If the null hypothesis is true and the ANOVA
assumptions are valid, the sampling distribution of
MSB/MSW is an F distribution with MSB d.f. equal
to $k - 1$ and MSW d.f. equal to $n_T - k$.
If the means of the k populations are not equal, the
value of MSB/MSW will be inflated because MSB
overestimates $\sigma^2$.
Hence, we will reject H0 if the resulting value of
MSB/MSW appears to be too large to have been
selected at random from the appropriate F
distribution.
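
A small simulation can illustrate why unequal population means inflate MSB/MSW. The sketch below is not part of the slides: the population means, the standard deviation of 5, the sample size of 5 and the seed are all assumptions picked for illustration. It draws normal samples with a common variance and averages the ratio over many repetitions, once with equal means and once with unequal means.

```python
import random
import statistics


def f_ratio(samples):
    """Compute MSB/MSW for a list of samples (each a list of numbers)."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    ssb = sum(len(s) * (statistics.mean(s) - grand_mean) ** 2 for s in samples)
    ssw = sum((len(s) - 1) * statistics.variance(s) for s in samples)
    return (ssb / (k - 1)) / (ssw / (n_total - k))


def average_f(pop_means, sigma=5.0, n=5, reps=2000, seed=0):
    """Average MSB/MSW over repeated normal samples with common sigma."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    ratios = []
    for _ in range(reps):
        samples = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in pop_means]
        ratios.append(f_ratio(samples))
    return statistics.mean(ratios)


avg_f_null = average_f([60.0, 60.0, 60.0])  # H0 true: equal means
avg_f_alt = average_f([55.0, 68.0, 57.0])   # H0 false: unequal means
print(round(avg_f_null, 2), round(avg_f_alt, 2))
```

Under equal means the average ratio should stay close to 1, while under unequal means it should be much larger, which is the behaviour the rejection rule relies on.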

Slide 6

Test for the Equality of k Population Means

Hypotheses

H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
Ha: Not all population means are equal

Test Statistic

F = MSB/MSW

Rejection Rule

Reject H0 if $F > F_\alpha$

where the value of $F_\alpha$ is based on an F distribution
with $k - 1$ numerator degrees of freedom and $n_T - k$
denominator degrees of freedom.

Slide 7

Example: Reed Manufacturing


Analysis of Variance
J. R. Reed would like to know if the mean number of
hours worked per week is the same for the department
managers at her three manufacturing plants (Buffalo,
Pittsburgh, and Detroit).
A simple random sample of 5 managers from each of
the three plants was taken and the number of hours
worked by each manager for the previous week is
shown on the next slide.

Slide 8

Example: Reed Manufacturing

Analysis of Variance

Observation        Plant 1     Plant 2       Plant 3
                   Buffalo     Pittsburgh    Detroit

1                  48          73            51
2                  54          63            63
3                  57          66            61
4                  54          64            54
5                  62          74            56

Sample Mean        55          68            57
Sample Variance    26.0        26.5          24.5

Slide 9

Example: Reed Manufacturing

Analysis of Variance
Hypotheses
H0: $\mu_1 = \mu_2 = \mu_3$
Ha: Not all the means are equal

where:
$\mu_1$ = mean number of hours worked per
week by the managers at Plant 1
$\mu_2$ = mean number of hours worked per
week by the managers at Plant 2
$\mu_3$ = mean number of hours worked per
week by the managers at Plant 3

Slide 10

Example: Reed Manufacturing

Analysis of Variance
Mean Square Between
Since the sample sizes are all equal:
$\bar{\bar{x}} = (55 + 68 + 57)/3 = 60$
$\mathrm{SSB} = 5(55 - 60)^2 + 5(68 - 60)^2 + 5(57 - 60)^2 = 490$
$\mathrm{MSB} = 490/(3 - 1) = 245$
Mean Square Within
$\mathrm{SSW} = 4(26.0) + 4(26.5) + 4(24.5) = 308$
$\mathrm{MSW} = 308/(15 - 3) = 25.667$
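
The hand calculation can be checked from the raw observations in the Reed Manufacturing data table. The sketch below uses only Python's standard library; the variable names are illustrative.

```python
import statistics

# Hours worked per week, taken from the Reed Manufacturing data table.
hours = {
    "Buffalo": [48, 54, 57, 54, 62],
    "Pittsburgh": [73, 63, 66, 64, 74],
    "Detroit": [51, 63, 61, 54, 56],
}

k = len(hours)
n_total = sum(len(v) for v in hours.values())
means = {plant: statistics.mean(v) for plant, v in hours.items()}
variances = {plant: statistics.variance(v) for plant, v in hours.items()}
grand_mean = sum(sum(v) for v in hours.values()) / n_total

# Sum of squares between and within, then the two mean squares.
ssb = sum(len(v) * (means[p] - grand_mean) ** 2 for p, v in hours.items())
ssw = sum((len(v) - 1) * variances[p] for p, v in hours.items())
msb = ssb / (k - 1)
msw = ssw / (n_total - k)
print(ssb, msb, ssw, round(msw, 3))
```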

Slide 11

Example: Reed Manufacturing

Analysis of Variance
F - Test
If H0 is true, the ratio MSB/MSW should be near 1
since both MSB and MSW are estimating $\sigma^2$. If Ha
is true, the ratio should be significantly larger than
1 since MSB tends to overestimate $\sigma^2$.
Rejection Rule
Assuming $\alpha$ = .05, $F_{.05}$ = 3.89 (2 d.f. numerator,
12 d.f. denominator). Reject H0 if F > 3.89

Slide 12

Example: Reed Manufacturing

Analysis of Variance
Test Statistic
F = MSB/MSW = 245/25.667 = 9.55
Conclusion
F = 9.55 > $F_{.05}$ = 3.89, so we reject H0. The mean
number of hours worked per week by department
managers is not the same at each plant.
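
The test can also be reproduced numerically without statistical tables. The sketch below relies on a property that is not stated in the slides: when the numerator has exactly 2 degrees of freedom, the upper-tail probability of the F distribution has the closed form (1 + 2f/d)^(-d/2), where d is the denominator degrees of freedom. The helper names are hypothetical, and the shortcut does not apply to other numerator degrees of freedom.

```python
def f_upper_tail_2df(f_value, den_df):
    """P(F > f_value) for an F distribution with 2 numerator d.f."""
    return (1.0 + 2.0 * f_value / den_df) ** (-den_df / 2.0)


def f_critical_2df(alpha, den_df):
    """Critical value F_alpha for 2 numerator d.f. (inverse of the tail)."""
    return (den_df / 2.0) * (alpha ** (-2.0 / den_df) - 1.0)


# Mean squares from the Reed Manufacturing example.
msb = 490.0 / (3 - 1)
msw = 308.0 / (15 - 3)

f_stat = msb / msw
f_crit = f_critical_2df(alpha=0.05, den_df=12)
p_value = f_upper_tail_2df(f_stat, den_df=12)
reject_h0 = f_stat > f_crit
print(round(f_stat, 2), round(f_crit, 2), reject_h0)
```

The computed critical value agrees with the tabulated 3.89 to two decimals, and the p-value falls well below .05, consistent with rejecting H0.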

Slide 13

Example

EFS: Total Alcoholic Beverages, Tobacco
by Economic position of Household Reference Person

Group                          Mean     St. Dev.
Self-employed                  18.56    19.0
Fulltime employee              14.64    18.5
Pt employee                    12.39    15.0
Unempl.                        19.48    19.7
Ret unoc over min ni age        7.34    14.6
Unoc under min ni age          11.99    19.1
TOTAL                          12.67    17.8

Are there significant differences across the
means of these groups?
Or do the differences depend on the different
levels of variability across the groups?
Statistics for Marketing & Consumer Research
Copyright 2008 - Mario Mazzocchi

14

Null and alternative hypotheses for ANOVA

Null hypothesis (H0): all the means are equal
Alternative hypothesis (H1): at least two
means are different

15

Arranging data for ANOVA

Economic position of Household Reference Person

Group (g)                    1              2                  3            4        5                         6
                             Self-employed  Fulltime employee  Pt employee  Unempl.  Ret unoc over min ni age  Unoc under min ni age

Observations                 x11            x12                x13          x14      x15                       x16
                             x21            x22                x23          x24      x25                       x26
                             x31            x32                x33          x34      x35                       x36

Number of observations (n)   n1             n2                 n3           n4       n5                        n6
Means                        x̄1             x̄2                 x̄3           x̄4       x̄5                        x̄6

Overall mean                 x̄

16

The statistical distribution to carry out ANOVA
1. Decompose the total variation (sum of
squares corrected for the mean)
2. Compute the F-test statistic
3. Choose the critical value
4. Interpret the result

17

ANOVA: data
Suppose that we have n observations within each group and g groups

              Group (factor level)
Obs.          1      2      ...    j      ...    g
1             x11    x12    ...    x1j    ...    x1g
2             x21    x22    ...    x2j    ...    x2g
...
i             xi1    xi2    ...    xij    ...    xig
...
n             xn1    xn2    ...    xnj    ...    xng
Group mean    x̄1     x̄2     ...    x̄j     ...    x̄g

TOTAL MEAN
$$\bar{x} = \frac{1}{g} \sum_{j=1}^{g} \bar{x}_j$$
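
As a small illustration of this layout, the sketch below stores each group as a list, computes the group means, and then takes the total mean as the average of the group means. The data are hypothetical. Because every group has the same number of observations n, the average of the group means coincides with the mean of all observations pooled together; with unequal group sizes the two would generally differ.

```python
import statistics

# Hypothetical balanced data: g = 3 groups with n = 3 observations each.
groups = {
    "group 1": [2.0, 4.0, 6.0],
    "group 2": [1.0, 3.0, 5.0],
    "group 3": [7.0, 8.0, 9.0],
}

# One mean per group (one per column of the data layout).
group_means = {name: statistics.mean(obs) for name, obs in groups.items()}

# Total mean as the average of the g group means.
total_mean = statistics.mean(list(group_means.values()))

# Mean of all observations pooled together, for comparison.
pooled_mean = statistics.mean([x for obs in groups.values() for x in obs])
print(group_means, total_mean, pooled_mean)
```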
18

Measuring and decomposing the total variation

SUM OF SQUARES (corrected for the mean)

VARIATION BETWEEN THE GROUPS
+ VARIATION WITHIN EACH GROUP
________________________________
= TOTAL VARIATION
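
The decomposition can be verified numerically. The sketch below uses hypothetical data with unequal group sizes and a hypothetical function name; it computes the three sums of squares separately, so the identity between + within = total is checked rather than assumed.

```python
def decompose(samples):
    """Return (between, within, total) sums of squares for the samples."""
    all_obs = [x for s in samples for x in s]
    grand_mean = sum(all_obs) / len(all_obs)
    # Total variation: squared deviations from the overall mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
    ss_between = 0.0
    ss_within = 0.0
    for s in samples:
        group_mean = sum(s) / len(s)
        # Between: group mean versus overall mean, weighted by group size.
        ss_between += len(s) * (group_mean - grand_mean) ** 2
        # Within: each observation versus its own group mean.
        ss_within += sum((x - group_mean) ** 2 for x in s)
    return ss_between, ss_within, ss_total


# Hypothetical data, three groups of different sizes.
samples = [[1.0, 2.0, 3.0], [4.0, 6.0], [7.0, 8.0, 9.0, 10.0]]
ss_between, ss_within, ss_total = decompose(samples)
print(ss_between, ss_within, ss_total)
```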

19

The test statistic

The test statistic is computed as:

$$F = \frac{s_B^2}{s_W^2} = \frac{\text{Variance between groups}}{\text{Variance within groups}}$$

This test statistic compares the weight of
the variance explained by the factors to the
weight of the variance not explained by the
factors.

20

ANOVA in SPSS

[Screenshot: SPSS ANOVA dialog, with the target variable and the factor labelled]

21

SPSS output

ANOVA
EFS: Total Alcoholic Beverages, Tobacco

                  Sum of Squares    df     Mean Square    F        Sig.
Between Groups    6171.784          5      1234.357       4.024    .001
Within Groups     151535.3          494    306.752
Total             157707.1          499

Sum of Squares: variation decomposition
df: degrees of freedom
Mean Square, Between Groups: variance between
Mean Square, Within Groups: variance within
Sig.: p-value < 0.05, so the null is rejected
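
The mean squares and the F statistic in this output follow directly from the sums of squares and the degrees of freedom. The sketch below recomputes them from the reported figures; the variable names are illustrative, and small differences in the last digit can arise because the output shows rounded values. The significance level is not recomputed here, since that requires the F distribution itself.

```python
# Figures as reported in the SPSS ANOVA table.
ss_between, df_between = 6171.784, 5
ss_within, df_within = 151535.3, 494

# Mean square = sum of squares / degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F = variance between / variance within.
f_stat = ms_between / ms_within

# The total row is the sum of the two components.
ss_total = ss_between + ss_within
df_total = df_between + df_within
print(round(ms_between, 3), round(ms_within, 3), round(f_stat, 3))
```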

22
