
Analysis of variance

Chapter 7

Statistics for Marketing & Consumer Research 1


Copyright 2008 - Mario Mazzocchi
Tests on multiple hypotheses
Consider the situation where the means of more than two groups are compared, e.g. mean alcohol expenditure for: (a) students; (b) unemployed; (c) employees
One could run a set of two-mean comparison tests (students vs. unemployed, students vs. employed, employed vs. unemployed)
However, as seen in lecture 6, each of these tests is subject to a Type I error (at the significance level α), i.e. the probability of rejecting the null hypothesis when it is actually true
Thus, the overall Type I error for the three joint tests is larger than α, because the probability of making at least one Type I error increases with the number of tests
This is the so-called problem of inflated family-wise (or experiment-wise) Type I error
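To see how fast the problem grows, a minimal sketch, assuming the k tests are independent (an assumption: pairwise tests on the same groups are in fact correlated, so this is only an approximation). Under independence the probability of at least one Type I error is 1 - (1 - α)^k; the helper name `familywise_error` is hypothetical.

```python
def familywise_error(alpha: float, k: int) -> float:
    """Probability of at least one Type I error across k tests, each run at
    significance level alpha, ASSUMING the tests are independent."""
    return 1.0 - (1.0 - alpha) ** k


# Three pairwise comparisons (students, unemployed, employees) at alpha = 0.05
print(round(familywise_error(0.05, 3), 4))  # 0.1426
```

With three tests the family-wise error is already close to three times the nominal 5% level.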
Analysis of Variance

It is an alternative approach to mean comparison for multiple groups
It is a set of techniques for a variety of situations
It is applicable to a sample of individuals that differ in one or more given factors
It allows one to test whether the variability of a variable is attributable to one (or more) factors

Example

Are there significant differences across the means of these groups?
Or do the differences depend on the different levels of variability across the groups?

Analysis of Variance

Here the target variable is alcohol expenditure and the factor is the economic position of the household reference person (HRP)
Basically, we investigate whether a variation in the metric target variable can be attributed to variations in one or more categorical explanatory variables (the factors)
In n-way analysis of variance, a treatment is a combination of levels of the different factors
One-way ANOVA

Only one categorical variable (a single factor)
Several levels (categories) for that factor
The typical hypothesis tested through ANOVA is that the factor is irrelevant in explaining differences in the target variable (i.e. the means are equal, as in bivariate mean comparisons/t-tests)
Apart from the tested factor(s), the groups should be safely considered homogeneous with respect to each other

Null and alternative hypotheses for ANOVA
Null hypothesis (H0): all the means are equal, $\mu_1 = \mu_2 = \dots = \mu_g$
Alternative hypothesis (H1): at least two means are different, $\mu_i \neq \mu_j$ for at least one pair $i \neq j$

Arranging data for ANOVA

The steps to carry out ANOVA
1. Decompose the total variation (sum of squares corrected for the mean)
2. Compute the F-test statistic
3. Choose the critical value
4. Interpret the result

One-way ANOVA: data
Suppose that we have n observations within each group and g groups

              Group (factor level)
Obs.          1      2      ...   j      ...   g
1             x11    x12    ...   x1j    ...   x1g
2             x21    x22    ...   x2j    ...   x2g
...
i             xi1    xi2    ...   xij    ...   xig
...
n             xn1    xn2    ...   xnj    ...   xng
Group mean    x̄1     x̄2     ...   x̄j     ...   x̄g

TOTAL MEAN: $\bar{x} = \frac{1}{g}\sum_{j=1}^{g}\bar{x}_j$
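A minimal sketch of this layout, assuming equal group sizes n as on the slide: the data are held as one list per group (one column of the table each), and the group labels and values are hypothetical. With equal n, the total mean computed as the average of the g group means coincides with the mean of all g·n observations pooled together.

```python
from statistics import mean

# Hypothetical data: g = 3 groups (columns), n = 4 observations per group (rows)
groups = {
    "group 1": [4.0, 6.0, 5.0, 5.0],
    "group 2": [7.0, 8.0, 6.0, 7.0],
    "group 3": [9.0, 10.0, 11.0, 10.0],
}

# Group means: bottom row of the table
group_means = {name: mean(obs) for name, obs in groups.items()}
# Total mean as the average of the g group means (valid for equal group sizes)
total_mean = mean(group_means.values())
# Same quantity computed on all observations pooled together
pooled_mean = mean(x for obs in groups.values() for x in obs)

print(group_means)              # {'group 1': 5.0, 'group 2': 7.0, 'group 3': 10.0}
print(total_mean, pooled_mean)  # 7.333333333333333 7.333333333333333
```

With unequal group sizes the two quantities differ, and the pooled mean (each group mean weighted by its size) is the one used in the variance decomposition.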
Measuring and decomposing the total variation
SUM OF SQUARES (corrected for the mean):
VARIATION BETWEEN THE GROUPS
+ VARIATION WITHIN EACH GROUP
= TOTAL VARIATION

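Variance decomposition
$$s_T^2 = \frac{\sum_{c=1}^{g}\sum_{r=1}^{n_c}\left(x_{rc}-\bar{x}\right)^2}{n-1} \qquad \text{(TOTAL VARIANCE)}$$
$$s_B^2 = \frac{\sum_{c=1}^{g} n_c\left(\bar{x}_c-\bar{x}\right)^2}{g-1} \qquad \text{(VARIANCE BETWEEN GROUPS)}$$
$$s_W^2 = \frac{\sum_{c=1}^{g}\sum_{r=1}^{n_c}\left(x_{rc}-\bar{x}_c\right)^2}{\sum_{c=1}^{g}\left(n_c-1\right)} = \frac{\sum_{c=1}^{g}\sum_{r=1}^{n_c}\left(x_{rc}-\bar{x}_c\right)^2}{n-g} \qquad \text{(VARIANCE WITHIN GROUPS)}$$
$$\sum_{c=1}^{g}\sum_{r=1}^{n_c}\left(x_{rc}-\bar{x}\right)^2 = \sum_{c=1}^{g}\sum_{r=1}^{n_c}\left(x_{rc}-\bar{x}_c\right)^2 + \sum_{c=1}^{g} n_c\left(\bar{x}_c-\bar{x}\right)^2$$
DEGREES OF FREEDOM: $n-1 = (g-1) + \sum_{c=1}^{g}\left(n_c-1\right) = (g-1) + (n-g)$
where $x_{rc}$ is observation r in group c, $n_c$ is the size of group c, $n=\sum_{c=1}^{g} n_c$ is the total sample size, $\bar{x}_c$ is the mean of group c and $\bar{x}$ is the overall mean.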
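The decomposition can be checked numerically. A small sketch, assuming the groups are given as plain Python lists (the function name `anova_decomposition` and the data are hypothetical): it computes the three sums of squares, their degrees of freedom and the variances $s_T^2$, $s_B^2$, $s_W^2$, and shows that total = between + within.

```python
from statistics import mean


def anova_decomposition(groups):
    """Sums of squares, degrees of freedom and variances s_T^2, s_B^2, s_W^2
    for a list of groups (each group is a list of observations x_rc)."""
    n = sum(len(grp) for grp in groups)  # total sample size
    g = len(groups)                      # number of groups
    grand_mean = mean(x for grp in groups for x in grp)
    group_means = [mean(grp) for grp in groups]

    # Total sum of squares, corrected for the overall mean
    ss_total = sum((x - grand_mean) ** 2 for grp in groups for x in grp)
    # Between groups: n_c * (group mean - overall mean)^2, summed over groups
    ss_between = sum(len(grp) * (m - grand_mean) ** 2 for grp, m in zip(groups, group_means))
    # Within groups: squared deviations from each group's own mean
    ss_within = sum((x - m) ** 2 for grp, m in zip(groups, group_means) for x in grp)

    return {
        "ss": (ss_total, ss_between, ss_within),
        "df": (n - 1, g - 1, n - g),
        "variances": (ss_total / (n - 1), ss_between / (g - 1), ss_within / (n - g)),
    }


# Hypothetical data: three groups of four observations
result = anova_decomposition([[4.0, 6.0, 5.0, 5.0], [7.0, 8.0, 6.0, 7.0], [9.0, 10.0, 11.0, 10.0]])
ss_t, ss_b, ss_w = result["ss"]
print(round(ss_t, 4), round(ss_b + ss_w, 4))  # 56.6667 56.6667
```

Note that the variances themselves do not add up: only the sums of squares and the degrees of freedom do.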

The basic principle of ANOVA
If the variation between the groups explained by the factor is significantly larger than the variation within the groups, then the factor is assumed to be statistically relevant in explaining the differences

The test statistic
The test statistic is computed as:
$$F = \frac{s_B^2}{s_W^2} = \frac{\text{Variance between groups}}{\text{Variance within groups}}$$
This test statistic compares the weight of the variance explained by the factor(s) to the weight of the variance not explained by the factor(s)
Distribution of the F-statistic (one-tailed test)
[Figure: distribution of the F statistic, with the rejection area in the right tail]

Characteristics of the F-distribution
F(df1, df2)
Its shape (and therefore the critical value) changes according to the degrees of freedom (numbers of observations and groups)
It is a non-negative statistic, so the test is one-tailed
For ANOVA testing:
df1 = g - 1
df2 = n - g
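Steps 3 and 4 of the procedure (critical value and interpretation) need the F distribution. A pure-Python sketch, written for illustration only (statistical packages use their own routines): it relies on the standard identity P(F > f) = I_x(df2/2, df1/2) with x = df2/(df2 + df1·f), where I is the regularized incomplete beta function, here evaluated by a standard continued-fraction expansion (modified Lentz method). The critical value is found by bisection, assuming it lies below the upper bound `hi`. All function names are hypothetical.

```python
from math import exp, lgamma, log

TINY = 1e-300  # guards against division by zero in the continued fraction


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > TINY else TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > TINY else TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > TINY else TINY
        h *= d * c
        # Odd term of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > TINY else TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > TINY else TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    # Use the continued fraction directly, or through the symmetry I_x(a,b) = 1 - I_{1-x}(b,a)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """P(F > f) for F ~ F(df1, df2): the p-value of the ANOVA F-test."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_critical(alpha, df1, df2, hi=1e4, tol=1e-10):
    """Critical value c with P(F > c) = alpha, by bisection (assumes c < hi)."""
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f_sf(mid, df1, df2) > alpha:
            lo = mid  # right tail still heavier than alpha: c lies further right
        else:
            hi = mid
    return (lo + hi) / 2.0


# SPSS output in this chapter (economic position of the HRP): F = 4.024, df = (5, 494)
p_value = f_sf(4.024, 5, 494)        # SPSS reports Sig. = .001
critical = f_critical(0.05, 5, 494)  # reject H0 at the 5% level if F exceeds this
print(round(p_value, 4), round(critical, 3), 4.024 > critical)
```

Comparing the p-value with α and comparing F with the critical value are two equivalent ways of interpreting the result.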
ANOVA in SPSS
[Screenshot: SPSS ANOVA dialog, with the fields for the target variable and the factor]

SPSS output
ANOVA
EFS: Total Alcoholic Beverages, Tobacco
                  Sum of Squares   df    Mean Square   F       Sig.
Between Groups    6171.784         5     1234.357      4.024   .001
Within Groups     151535.3         494   306.752
Total             157707.1         499
The Sum of Squares column shows the variation decomposition; the Mean Square column gives the variance between (Between Groups row) and the variance within (Within Groups row); df gives the degrees of freedom.
p-value < 0.05: the null is rejected.
Contrasts
Contrasts allow one to test hypotheses on specific subsets of the treatments (factor combinations).
They open the way to further exploring the sources of variability when the null hypothesis of mean equality is rejected.
Comparisons are usually based on a theory and planned before the analysis, thus they are also called planned comparisons.

Linear contrasts
Linear contrasts are linear combinations of the means, allowing one to test hypotheses other than mean equality
For example, one may want to test whether the mean for group one is double the mean for groups two and three, while the means of groups two and three are equal (i.e. $\mu_1 = 2\mu_2 = 2\mu_3$)

Linear contrasts
Contrasts are also useful after rejection of the null hypothesis of mean equality
Rejection of the null hypothesis means that at least two means differ, but it does not say which ones actually differ
Planned comparisons through linear contrasts can help

Example
Test whether chicken expenditure increases linearly with household size
Check whether there are significant differences:
Between households with one or two components and households with more components
Considering the following groups:
Households with one component
Households with two components
Households with more than two components
Considering the following comparison:
Households with four, five, six and seven components have equal means

Household sizes and means
Descriptives
In a typical week how much do you spend on fresh or frozen chicken (Euro)?
                                                          95% Confidence Interval for Mean
Household size  N     Mean     Std. Deviation  Std. Error  Lower Bound   Upper Bound   Minimum  Maximum
0               1     4.8000   .               .           .             .             4.80     4.80
1               82    4.2470   2.82338         .31179      3.6266        4.8673        .37      15.00
2               145   5.0548   3.41626         .28371      4.4941        5.6156        .00      20.00
3               93    6.3231   4.71695         .48912      5.3517        7.2946        .00      30.00
4               87    6.7334   3.87396         .41533      5.9078        7.5591        .00      18.00
5               24    7.5613   7.64258         1.56003     4.3341        10.7884       .00      30.00
6               10    6.2730   3.25606         1.02966     3.9438        8.6022        .00      10.49
7               1     6.7500   .               .           .             .             6.75     6.75
Total           443   5.6677   4.13383         .19640      5.2817        6.0537        .00      30.00

7 GROUPS

Example
1 and 2 components versus 3, 4, 5, 6 and 7
Weights (they need to sum to 0):
Size 1: -2.5
Size 2: -2.5
Size 3: 1
Size 4: 1
Size 5: 1
Size 6: 1
Size 7: 1
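The value of this contrast is the weighted sum of the group means. A small sketch, using the rounded means for household sizes 1 to 7 from the Descriptives table (the single household of size 0 is left out, as in the list of weights). Since the weights are -2.5, -2.5 and five 1s, the value equals 5 times the difference between the average group mean for 3 to 7 components and that for 1 and 2 components; its significance would still have to be assessed with a t-test, as SPSS does.

```python
# Mean weekly chicken expenditure (Euro) for household sizes 1 to 7,
# copied from the Descriptives table above
means = [4.2470, 5.0548, 6.3231, 6.7334, 7.5613, 6.2730, 6.7500]
# Contrast weights: households with 1 and 2 components versus 3 to 7 components
weights = [-2.5, -2.5, 1, 1, 1, 1, 1]

assert abs(sum(weights)) < 1e-12, "contrast weights must sum to zero"
contrast_value = sum(w * m for w, m in zip(weights, means))
print(round(contrast_value, 4))  # 10.3863
```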

Planned comparisons
Helmert contrasts: the first treatment is compared with all of the remaining treatments, the second treatment with all the remaining treatments but the first, the third treatment with all of the remaining ones but the first two, and so on.
By looking at the results of this battery of tests, it becomes possible to identify those groups whose difference from the others is most relevant.
Polynomial contrasts: it is possible to test whether the trend in means follows a linear, quadratic or cubic sequence, or any polynomial relationship between the treatments
Repeated contrasts: each treatment is compared with the one which follows
Reverse Helmert contrasts (or difference contrasts): Helmert contrasts going backwards
Simple contrasts: each treatment is compared with a benchmark treatment, which the user can choose between the first and the last category.
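A sketch of how the weights behind two of these batteries can be generated for k ordered treatments. It uses exact fractions and one common scaling of the weights (an assumption: a package may scale its coefficients differently, which leaves the contrast t-tests unchanged). Each row is one contrast and sums to zero; the function names are hypothetical.

```python
from fractions import Fraction


def helmert_weights(k):
    """Helmert contrasts for k ordered treatments: treatment i against the mean
    of all the treatments that follow it (k - 1 contrasts)."""
    rows = []
    for i in range(k - 1):
        remaining = k - i - 1
        rows.append([Fraction(0)] * i + [Fraction(1)] + [Fraction(-1, remaining)] * remaining)
    return rows


def repeated_weights(k):
    """Repeated contrasts: each treatment against the one which follows it."""
    return [
        [Fraction(0)] * i + [Fraction(1), Fraction(-1)] + [Fraction(0)] * (k - i - 2)
        for i in range(k - 1)
    ]


# Four treatments, e.g. the four income bands used later in this chapter
for row in helmert_weights(4):
    print([str(w) for w in row])
# ['1', '-1/3', '-1/3', '-1/3']
# ['0', '1', '-1/2', '-1/2']
# ['0', '0', '1', '-1']
```

Reversing each Helmert row (for example `row[::-1]`) gives the reverse Helmert contrasts: the last treatment against the mean of all the preceding ones, and so on.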
Post-hoc comparisons
Linear contrasts are carried out independently of each other
Post-hoc tests consist of a set of paired comparisons, where the critical values are corrected to account for the inflated Type I error risk (rejecting the null hypothesis when it is true), measured by the cumulative or family-wise Type I error.
The approach to correcting the critical values determines the type of test being used. In SPSS:
Scheffé's test
Bonferroni's test
Tukey's test.

Some post-hoc tests
Scheffé test: simultaneous comparisons for all potential pair-wise and linear combinations of treatments, using F critical values corrected to account for the number of treatments being compared
Bonferroni post-hoc method: (1) run the usual pair-wise t-tests; (2) to account for the inflated Type I error rate, adjust the significance level by dividing the family-wise error by the number of tests.
Tukey's test: also known as the Honestly Significant Difference (HSD) test, it can be used when samples are of equal size, but statistical packages usually provide variants for unequal sizes. With this test, significant differences are identified through an adjusted Studentized range distribution (an extension of the Student t statistic which uses a pooled estimate of the standard errors)
More tests in the textbook
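The Bonferroni adjustment is simple enough to sketch directly. With g groups there are g(g-1)/2 pairwise comparisons; each one is then run at level α/m, or equivalently each unadjusted p-value is multiplied by m and capped at 1. The p-values below are hypothetical, chosen only for illustration; the capping at 1 is consistent with the 1.000 entries in the Bonferroni rows of the SPSS post-hoc output in this chapter. The function name is hypothetical.

```python
from math import comb


def bonferroni(p_values, n_tests=None, alpha=0.05):
    """Bonferroni correction for m tests: each test is run at level alpha / m,
    or equivalently each p-value is multiplied by m (and capped at 1)."""
    m = len(p_values) if n_tests is None else n_tests
    adjusted = [min(1.0, p * m) for p in p_values]
    return alpha / m, adjusted


# With g = 4 income bands there are g(g-1)/2 pairwise comparisons
m = comb(4, 2)
# Hypothetical unadjusted p-values for three of those comparisons
threshold, adjusted = bonferroni([0.001, 0.0091, 0.0752], n_tests=m)
print(m, round(threshold, 4), [round(p, 4) for p in adjusted])  # 6 0.0083 [0.006, 0.0546, 0.4512]
```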
Effect size and power
The experimental factor matters, but how much? (effect size)
Larger F statistics do not necessarily imply larger effect sizes, because they also depend on sample sizes
A typical measure of effect size is η² (eta squared, the ratio between the between-groups variation and the total variation)
The power of a test is 1-β, where β is the probability of not rejecting the null hypothesis when the alternative is true (Type II error)
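As an illustration, η² can be computed directly from the sums of squares of the two SPSS ANOVA tables in this chapter (a small sketch; the function name is hypothetical). In both cases the factor explains only about 4 to 5 percent of the variation in alcohol and tobacco expenditure, even though both F-tests reject mean equality.

```python
def eta_squared(ss_between, ss_total):
    """Effect size eta^2: share of the total variation explained by the factor."""
    return ss_between / ss_total


# Sums of squares from the two SPSS ANOVA tables in this chapter
print(round(eta_squared(6171.784, 157707.1), 3))  # 0.039 (economic position of the HRP)
print(round(eta_squared(8289.482, 157707.1), 3))  # 0.053 (banded household income)
```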
Which post-hoc test?
One should check the probabilities of Type I error and the power (related to Type II errors)
There is a trade-off between power and Type I error
Conservative tests: low Type I error, low power (Scheffé, Bonferroni)
Tukey's test is more appropriate for a large number of means

ANOVA in SPSS
[Screenshot: SPSS ANOVA dialog, showing the target variable (scale), the factor (categorical), and the buttons for planned comparisons and post-hoc tests]
Planned comparisons (contrasts)
$w_1\mu_L + w_2\mu_{ML} + w_3\mu_{MH} + w_4\mu_H = 0$ (L, ML, MH, H: low, medium-low, medium-high and high income)
Polynomial contrasts: the polynomial contrast assumes that the mean follows a given polynomial (linear, quadratic, etc.). Note: the null hypothesis is that the polynomial contrast does not hold
Other contrasts: insert a weight for each subgroup; further sets of weights can be inserted (one set of weights per comparison). Note: the null hypothesis is that the contrast holds

ANOVA output
Descriptives
EFS: Total Alcoholic Beverages, Tobacco
                                                                      95% Confidence Interval for Mean
                     N     Mean     Std. Deviation  Std. Error  Lower Bound   Upper Bound   Minimum  Maximum
Low income           125   7.467    12.8693         1.1511      5.188         9.745         .0       70.0
Medium-low income    125   11.381   17.9038         1.6014      8.212         14.551        .0       93.9
Medium-high income   125   13.040   16.9137         1.5128      10.046        16.035        .0       79.4
High income          125   18.789   20.8025         1.8606      15.106        22.472        .0       92.5
Total                500   12.669   17.7777         .7950       11.107        14.231        .0       93.9

ANOVA
EFS: Total Alcoholic Beverages, Tobacco
                                                Sum of Squares  df    Mean Square  F        Sig.
Between Groups  (Combined)                      8289.482        3     2763.161     9.172    .000
                Linear Term   Contrast          7932.717        1     7932.717     26.333   .000
                              Deviation         356.765         2     178.382      .592     .554
Within Groups                                   149417.6        496   301.245
Total                                           157707.1        499
Mean equality is rejected (Combined: Sig. = .000).
The means are compatible with a linear polynomial: the linear term is significant, while the deviation from linearity is not (Sig. = .554), so there is no evidence of a non-linear trend.

Output
Contrast Coefficients
Anonymised hhold inc + allowances (Banded)
Contrast   Low income   Medium-low income   Medium-high income   High income
1          0            1                   -1                   0
2          1            0                   0                    -1

Contrast Tests
EFS: Total Alcoholic Beverages, Tobacco
                              Contrast  Value of Contrast  Std. Error  t       df        Sig. (2-tailed)
Assume equal variances        1         -1.659             2.1954      -.756   496       .450
                              2         -11.322            2.1954      -5.157  496       .000
Does not assume equal         1         -1.659             2.2029      -.753   247.202   .452
variances                     2         -11.322            2.1879      -5.175  206.788   .000

The first contrast (0, 1, -1, 0) holds (the null hypothesis is not rejected)
The second contrast (1, 0, 0, -1) does not hold (the null hypothesis is rejected)
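The "Assume equal variances" rows can be reproduced by hand: the contrast value is L = Σ w_j x̄_j, its standard error is the square root of MS_W · Σ w_j²/n_j (with MS_W the within-groups mean square, 301.245 on 496 df), and t = L/SE. A sketch using the rounded means and group sizes from the income-band output above (the function name is hypothetical); the p-values would additionally require the t distribution, and the unequal-variance rows use a different standard error and degrees of freedom.

```python
from math import sqrt


def contrast_t(weights, means, sizes, ms_within):
    """Value, standard error and t statistic of the linear contrast sum(w_j * mean_j),
    assuming equal variances (pooled within-groups mean square, df = n - g)."""
    value = sum(w * m for w, m in zip(weights, means))
    se = sqrt(ms_within * sum(w ** 2 / n for w, n in zip(weights, sizes)))
    return value, se, value / se


# Means, group sizes and within-groups mean square from the income-band output above
means = [7.467, 11.381, 13.040, 18.789]  # low, medium-low, medium-high, high income
sizes = [125, 125, 125, 125]
ms_within = 301.245

for w in ([0, 1, -1, 0], [1, 0, 0, -1]):
    value, se, t = contrast_t(w, means, sizes, ms_within)
    print(w, round(value, 3), round(se, 4), round(t, 3))
# [0, 1, -1, 0] -1.659 2.1954 -0.756
# [1, 0, 0, -1] -11.322 2.1954 -5.157
```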

SPSS Post-hoc tests

SPSS output (post-hoc tests)
Multiple Comparisons
Dependent Variable: EFS: Total Alcoholic Beverages, Tobacco
(I), (J): Anonymised hhold inc + allowances (Banded)
                                                         Mean Difference                       95% Confidence Interval
Test        (I)                  (J)                     (I-J)            Std. Error  Sig.    Lower Bound   Upper Bound
Tukey HSD   Low income           Medium-low income       -3.9147          2.1954      .283    -9.574        1.745
                                 Medium-high income      -5.5737          2.1954      .055    -11.233       .086
                                 High income             -11.3224*        2.1954      .000    -16.982       -5.663
            Medium-low income    Low income              3.9147           2.1954      .283    -1.745        9.574
                                 Medium-high income      -1.6590          2.1954      .874    -7.318        4.000
                                 High income             -7.4077*         2.1954      .004    -13.067       -1.748
            Medium-high income   Low income              5.5737           2.1954      .055    -.086         11.233
                                 Medium-low income       1.6590           2.1954      .874    -4.000        7.318
                                 High income             -5.7487*         2.1954      .045    -11.408       -.089
            High income          Low income              11.3224*         2.1954      .000    5.663         16.982
                                 Medium-low income       7.4077*          2.1954      .004    1.748         13.067
                                 Medium-high income      5.7487*          2.1954      .045    .089          11.408
Bonferroni  Low income           Medium-low income       -3.9147          2.1954      .451    -9.730        1.901
                                 Medium-high income      -5.5737          2.1954      .069    -11.389       .242
                                 High income             -11.3224*        2.1954      .000    -17.138       -5.507
            Medium-low income    Low income              3.9147           2.1954      .451    -1.901        9.730
                                 Medium-high income      -1.6590          2.1954      1.000   -7.474        4.156
                                 High income             -7.4077*         2.1954      .005    -13.223       -1.592
            Medium-high income   Low income              5.5737           2.1954      .069    -.242         11.389
                                 Medium-low income       1.6590           2.1954      1.000   -4.156        7.474
                                 High income             -5.7487          2.1954      .055    -11.564       .067
            High income          Low income              11.3224*         2.1954      .000    5.507         17.138
                                 Medium-low income       7.4077*          2.1954      .005    1.592         13.223
                                 Medium-high income      5.7487           2.1954      .055    -.067         11.564
*. The mean difference is significant at the .05 level.
Results for each paired comparison are reported, and the significance level is adjusted.

ANOVA: Fixed versus random effects
1. Explore differences in monthly food expenditure for different geographical regions
2. Explore differences in monthly food expenditure according to the point of purchase for the last food shopping
Case 1 is a fixed effect, which implies that the researcher can fully control the factor (treatment)
Case 2 is a random effect, where the factor (treatment) cannot be fully controlled and is subject to a (random) measurement error

ANOVA assumptions
Two key assumptions are needed for running analysis of variance safely:
1) the sub-samples defined by the treatment are independent
2) no big discrepancies exist among the variances of the different sub-samples
Normality within the sub-samples: within limits, departures from normality are not a serious issue
Different variances: results are still reliable if the sub-samples have equal sizes
Both variances and sample sizes differ: high risk of biased results
Adjustments: the Brown-Forsythe test and/or the Welch test instead of the usual F test
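A sketch of the Welch statistic mentioned above, using the usual Welch formulas for a one-way design with unequal variances (our own illustration, not SPSS's code; the function name and the example data are hypothetical). Groups are weighted by their precision n_j/s_j², and the statistic is referred to an F distribution with g - 1 and a non-integer number of denominator degrees of freedom. With two groups and equal variances and sizes it coincides with the usual F.

```python
from statistics import mean, variance


def welch_anova(groups):
    """Welch's one-way ANOVA, which does not assume equal variances.
    Returns (statistic, df1, df2); df2 is generally not an integer."""
    g = len(groups)
    sizes = [len(grp) for grp in groups]
    means = [mean(grp) for grp in groups]
    # Precision weights n_j / s_j^2 (s_j^2 is the sample variance of group j)
    weights = [n_j / variance(grp) for n_j, grp in zip(sizes, groups)]
    w_sum = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / w_sum
    numerator = sum(w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)) / (g - 1)
    lam = sum((1 - w / w_sum) ** 2 / (n_j - 1) for w, n_j in zip(weights, sizes))
    denominator = 1 + 2 * (g - 2) / (g ** 2 - 1) * lam
    df2 = (g ** 2 - 1) / (3 * lam)
    return numerator / denominator, g - 1, df2


# Hypothetical sub-samples with different spreads and sizes
f_w, df1, df2 = welch_anova([[3.1, 2.8, 3.5, 3.0], [4.2, 6.9, 1.5, 8.8, 5.0], [7.5, 7.9, 7.2]])
print(round(f_w, 3), df1, round(df2, 2))
```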
Adjustments for violated assumptions in SPSS
Click on OPTIONS to request descriptive statistics for a random effects model and the Brown-Forsythe and Welch tests (plus more plots and descriptive statistics)

Non-parametric ANOVA tests
Exclusive samples extracted from the same population
Kruskal-Wallis test: extends the Mann-Whitney test to the case of a higher number of sub-samples. It tests the null hypothesis that all the sub-populations have the same distribution function.
Jonckheere-Terpstra test: the same null hypothesis, but against the alternative that an increase in the treatment leads to an increase in the (median of the) dependent variable.
Related samples (the same respondent may appear in several treatment sub-samples)
Friedman test, Kendall test or Cochran Q test: they extend to the multiple-sample case some of the non-parametric tests for mean comparisons
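A sketch of the Kruskal-Wallis statistic: all observations are ranked together (tied values get the average of their ranks), and H compares the rank sums of the sub-samples; under the null hypothesis and for reasonably large sub-samples, H is approximately chi-square with g - 1 degrees of freedom. The function name is hypothetical and the usual correction for ties is applied.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for g independent sub-samples, using average
    ranks for tied values and the usual correction for ties."""
    # Pool all observations, remembering which sub-sample each one comes from
    pooled = sorted((x, j) for j, grp in enumerate(groups) for x in grp)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0  # sum of (t^3 - t) over runs of tied values
    i = 0
    while i < n_total:
        # Find the run of tied values pooled[i..k]
        k = i
        while k + 1 < n_total and pooled[k + 1][0] == pooled[i][0]:
            k += 1
        avg_rank = (i + k) / 2 + 1  # ranks are 1-based: average of i+1, ..., k+1
        for p in range(i, k + 1):
            rank_sums[pooled[p][1]] += avg_rank
        t = k - i + 1
        tie_term += t ** 3 - t
        i = k + 1
    h = 12 / (n_total * (n_total + 1)) * sum(
        r ** 2 / len(grp) for r, grp in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
    # Tie correction; it equals 1 when there are no ties
    return h / (1 - tie_term / (n_total ** 3 - n_total))


print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6]]))  # 27/7, about 3.857
```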

Non-parametric ANOVA in SPSS
[Screenshots: SPSS dialogs for non-parametric tests on exhaustive sub-samples and on related samples]

The class of ANOVA techniques

Other ANOVA designs
Multi-way (factorial) ANOVA
Multivariate ANOVA (MANOVA)
(Multivariate) Analysis of Covariance (MANCOVA)

General linear model
One-way ANOVA is equivalent to a linear model where the target variable is the dependent variable and each of the treatments is transformed into a dummy variable, which takes the value one if the respondent is subject to that treatment (in the example, if they belong to that economic condition) and zero otherwise.
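A minimal sketch of the dummy coding just described. The function name `dummy_code` and the example labels are hypothetical. One level is left out as the baseline (reference) because, with an intercept in the model, a dummy for every level would be perfectly collinear with the intercept; which level is omitted is a choice made here for illustration. In the resulting linear model the intercept estimates the baseline group's mean and each coefficient the difference between a group's mean and the baseline mean.

```python
def dummy_code(labels, baseline):
    """Turn a categorical factor into 0/1 dummy variables, one per level except
    the baseline level, which is absorbed by the intercept of the linear model."""
    levels = sorted(set(labels) - {baseline})
    rows = [[1 if lab == level else 0 for level in levels] for lab in labels]
    return levels, rows


# Hypothetical economic positions of five respondents (not taken from the EFS data)
labels = ["full-time", "retired", "unemployed", "full-time", "self-employed"]
levels, X = dummy_code(labels, baseline="full-time")
print(levels)  # ['retired', 'self-employed', 'unemployed']
for lab, row in zip(labels, X):
    print(lab, row)
```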

GLM example
Target variable: Alcohol and tobacco expenditure
Factor: employment status
$$y_i = \beta_0 + \beta_1 SE_i + \beta_2 FT_i + \beta_3 PT_i + \beta_4 UN_i + \beta_5 RE_i + \beta_6 UA_i + \varepsilon_i$$
$y_i$ is the amount spent on alcohol and tobacco by the i-th respondent
$SE_i$ = 1 if the respondent is self-employed
$FT_i$ = 1 for full-time employees
$PT_i$ = 1 for part-time employees
$UN_i$ = 1 for unemployed respondents
$RE_i$ = 1 for retired or inactive respondents and
$UA_i$ = 1 for those under working age
(each dummy is 0 otherwise)
Tests on the GLM coefficients
t-test on each coefficient: bivariate mean comparison
F-test: one-way ANOVA
Other analyses of variance
Multi-way (Factorial) ANOVA: more than one factor (interactions)
MANOVA: more than one target variable; it allows one to test whether the factors lead to significant differences in a set of variables.
ANCOVA
A final generalization which is quite interesting for consumer research is the Analysis of Covariance (ANCOVA), which is the appropriate technique when some of the factors are continuous quantitative variables instead of being measured on a nominal or ordinal scale

GLM and ANOVA techniques in SPSS
Univariate GLM: ANOVA, n-way ANOVA, ANCOVA
Multivariate GLM: MANOVA, MANCOVA

Univariate GLM
[Screenshot: SPSS Univariate GLM dialog, with fields for the target variable, the factors (more than one for n-way ANOVA; random factors are allowed) and the scale variables for ANCOVA]

Multivariate GLM
[Screenshot: SPSS Multivariate GLM dialog, where more than one target variable can be entered for MANOVA or MANCOVA]
