Experimental Design

Experimental Design
Nilesh B. Patel, PhD

Dept Medical Physiology
University of Nairobi
Kenya
What is science?
The systemic study of the structure and
behavior of the physical world, involving
experimentation and measurement and
the development of theories to describe
the results of these activities.
Cambridge International Dictionary of English
What is an experiment?
A test done in order to learn something or
to discover whether something works or is
true (plausible).
Experiments are done to prove the
hypothesis is false.
Only falsification is possible with certainty;
proving something can never be done with
certainty
What can be studied with the

methods of science?
1.
2.
3.
4.
5.
6.
The capital of Kenya is Nairobi.

Holland is twice the size of Kenya.
All humans are mortal.
All rabbits are grey.
All ziwats are blue.
Mentally ill people are possessed by an evil
spirit that
7. Mentally ill people are possessed by an evil
spirit that cannot be detected by any known
means.
Adapted from: Circadian Physiology. Roberto Refinetti. CRC Press 2000
Galileo Galilei
(1564-1642)
Father of modern physics
Father of modern
observational astronomy
Father of modern science
Pioneered the use of
quantitative experiments
whose results could be
analyzed with
mathematical precision
How many of you accept that the sun goes round the earth?
How many of you accept that the earth goes round the sun?
Types of Experiments
Experiments are done to test a hypothesis;
a prediction.
Observational
Experimental
Quasi-experimental
Experimental Design
Whether or not a studys findings are
useful or not depends crucially on design.
No matter how ingenious or important an
idea for an experiment might be, if the
study is badly designed, its worthless.
An excellent procedure for determining
cause and effect.
Does night cause day or does day cause night?
Three Aims of Research

Validity
Results actually show what it is that you intent them to
show
Reliable
Potentially replicable by yourself or anyone else
Important
Subjective
Research can possess all the above qualities

but still be essentially trivial.
Research cannot be important if its findings are
unreliable or invalid.
Score or Measurement
Consists of a number made up of different

components
1) A true measure of the thing we want to measure
2) A measure of other things
3) Systemic (non-random) basis: measuring other
things inadvertently.
4) Random (non-systemic) error, which should cancel
out over large number of observations.
We want the score or measurement to consist

of as much true score, as possible and little of
the other factors (validity and reliability).
Types of Measurement
Nominal
Ordinal
Interval
Ratio
Is what you measured, a real measure of

what you are interested in?
It is a measure what you are interested in.
Random (non-systematic) errors.
Non-random errors (systemic) errors.
Evidence for the intellectual

inferiority of women
Paul Broca (19th Century) make careful
measurement of brain weight and found
Caucasian men have larger brains than
Caucasian women, who in turn have a larger
brain than negros or any other non-Caucasian
for that matter.
Modern brains were supposedly heavier than
medieval brains, and French brains were
heavier than German brains.
The brain weights differences were considered
to reflect differences in intelligence between
these different groups.
MWM and disruption of biological clock

where the measure is the time to reach the
platform.
Anesthesia and learning.
Experimental Design
This is important so that the results can be
interpreted and other causes can be
excluded from the most likely conclusion.
The results have to be valid, reliable and
generalizable (important)
Simplest Experiment
Measurement
Baseline
Treatment
Measurement
There is a difference so my treatment

has an effect
Time
Habituation
animal handler effect
The need of control group
Lazaro Sapallanzani 1793

Found that bats and owls differ in their ability to
avoid objects in the dark.
With hoods the bats could not avoid the objects.
So the bats have better night vision than owls.
Covered with transparent hoods; the bats have a
problem (key control experiment).
Blind the bats and they are fine
Wax in the ears
Pierce invented the ultrasound microphone and
Griffin walked in with a bat.
Association is not causation but can be.
Experiment with control group
Treatment
Control
Measurement
Treatment
Experimental
Measurement
Great!!! There is a difference so my

treatment has an effect.
frequency
Population distribution
treatment
control
2cm
height
100 cm
Randomization
Post-test/control group
Independent variable and dependent variable
Treatment
Control
Measurement
Treatment
Experimental
Measurement
Random allocation
Measurement
Measurement
Treatment
Control
Measurement
Treatment
Experimental
Measurement
Between-group Design
Advantages
Simplicity
Less chance of practice or fatigue
Useful when it is impossible for an individual
to participate in all experimental conditions
Disadvantages
Expense in terms of time, effort and number
Repeated-Measure Design
Treatment
Measurement
No Treatment
Measurement
No Treatment
Measurement
Treatment
Measurement
Repeated-Measures Design
Advantages
Economy
Sensitivity
Disadvantages
Carry-over effect
The need for the condition to be reversible
Repeated-Measure Design
Treatment
Measurement
No Treatment
Measurement
No Treatment
Measurement
Treatment
Measurement
Multiple Levels of Independent Variable

4 levels
Random allocation
Measurement
Measurement
Measurement
Measurement
Treatment
Control
Treatment
Level A
Treatment
Level B
Treatment
Level C
Measurement
Measurement
Measurement
Measurement
Latin Square Design

To deal with the problem of order effects in
within-subject designs.
The order of the various conditions is
counter-balanced so each possible order
of condition occurs just once.
Example
Three conditions A, B,
and C.
Avoid systemic order
effect
6 possible
combinations of order
Order of conditions
or trials
Group 1
Group 2
Group 3
James Brown EDGAR programme

www.jic.bbsrc.ac.uk/services/statistics/edgar.htm
Multi-Factorial Design
Two or three independent variables in the
same study
Not recommended to use more than 2 as
the analysis gets quite complicated and
conclusion have to be qualified.
Gender and treatment
Day and performance
Correlations
How changes in one variable alters
another variable?
Can have more than one variable and
determine the contribution or weight or %
that each variable has on the measured
value.
Multi-factorial, multi-dimensions, e.g.
physics
Correlations are not cause and effect
Statistical Analysis
STATISTICS
Descriptive
Inferential
Graphs/tables
Numerical summary
Calculation of probability
Tests of significance
Descriptive statistics summarizes the

data obtained
Mode, Median, Mean
Range, Quartiles, Squared Sum, Variance,
Standard deviation
Inferential statistics is used to reach

certain conclusions that can be applied to
public health planning or patient care.
Modern knowledge methods and
objective decision making process
Measurement
Types of measurements
Nominal
Ordinal
Interval
Ratio
Is what you measured, a real measure of

what you are interested in?
It is a measure what you are interested in.
Random (non-systematic) errors.
Non-random errors (systemic) errors.
Descriptive
Statistics
Summarize the data
Measure of central tendency
Spread of data
Graphs
Tables
Mean
Mode
Median
Standard deviation (sd)

Standard error of the mean (sem)
rannge
Line graphs
Histograms
Pie charts
Measuring the accuracy of the

mean
The mean is a statistic that predicts the

likely score of a person or measurement.
Most of the time, it will not a number in the
data or measurement a hypothetical
value.
Mean number of children is 2.5 per family
in the UK.
The mean will only be a prefect

representation of the data if all of the
scores collected are the same.
The closer the mean is to all the data points the more representative is
the mean of the data
mean
5
4
mean
3
2
1
0
0
Distribution of data
Means are the
same
Distribution of the
scores is different
Range
Distribution measures
Range
Quartile
Mean deviation
Sum of squared deviation
Mean of the sum of squared deviation
variance
Standard deviation
Standard deviation and percentage

of data
1 sd
68%
2 sd
96%
1.96 sd 95%
Population and sample parameters
Population
Parameters
(mu, mean)
(sigma, standard
deviation)
Sample
Parameters
X (mean)
Sd (standard
deviation
estimate

How well does the sample mean represent the
population mean?
This is after all the purpose of research.
If we take different samples from the same
population, the mean for some samples will be
similar and for other different.
But usually we do not take many samples, we
take usually 1 sample.
So how good an estimate of the population
mean is it?
Population and sample parameters
X = 10
X = 11
X=9
X = 12
X = 11
X = 10
X = 10
Why Inferential Statistics?
Descriptive statistics does not help to answer

research questions.
Most of modern research is hypothesis driven.
There are two common hypothesis
(predictions) inherent
1. Experimental hypothesis (HA)
The experimental manipulation will have an effect.
2. Null hypothesis (H0)

The experimental manipulation will have no effect
So what if there is a difference

between the means?
Two samples from the same population will have
different means.
Samples taken from the extremes of the
distribution will have large difference in the
means.
So large differences suggest that the means are
from two different populations, but how can we
be certain that it not due to sampling from the
extreme of the distribution.
Application of inferential statistic.
The question
what is the probability that the difference
between the means is due to chance and
not due to my manipulation?
Carry out tests of significance

The tests of significance give the probability of obtaining
the difference between the means by chance.
Choice of test depends on the type of measurement and
the experimental design.
Whichever test you use you will end up with a number
the test statistic,e.g. t, z, F.
What is the probability of obtaining that value of the test
statistic for the means and the spread of data that I
have?
On what basis do I or anyone else accept that my
manipulation did indeed have an effect?
Test Statistic
A test statistic (t, F, z) is a statistic with a known
distribution.
e.g. Age and death
Test statistic = systemic variation / unsystemic

variation.
The exact form of this equation changes from
test to test, but essentially the comparison is the
amount of variance created by an experimental
effect against the amount of variance due to
random factors.
If the experiment has had an effect then it will
create more variance than random factors alone.
Go back
Ronald Aylmer Fisher

1890-1962
"To call in the
statistician after the
experiment is done
may be no more
than asking him to
perform a postmortem
examination: he
may be able to say
what the experiment
died of."
Fisher tea test

Is the milk added after or before the tea is
added?
The level of significance p < 0.05.
There is a 5% or less probability that the
difference between the means is due to chance.
I am therefore willing to accept with a 95%
confidence that the difference I obtained was
due to my manipulation and not due to chance.
My results are significant but are they important?
Which test to use?
Sample size
Data distribution
Homogeneity of variance
Independent measurements or repeated
measurements
Comparison between 2 groups
Comparison between 3 or more groups
Type of data
Parametric or Non-parametric
Parametric Tests
1. Arithmetic means
Interval or ratio data
2. Variance in the one condition = variance in the

other condition
Homogeneity of variance (Levenes test)
Sphericity assumption (Mauchlys test)
3. Normal distribution
1. Tests
Plot histogram
Kolmogorov-Smirnov test
Shapiro-Wilk test
Parametric tests
Students t-test
Independent t-test
Dependent t-test
Analysis of Variance (ANOVA)

One-Way ANOVA
Two-Way ANOVA
Repeated measures ANOVA
Mixed ANOVA
Students t-test
Only two groups of data
Sample size < 30
Ratio difference between the means /
estimated s.e. of the difference between
those two sample means.
Independent t-test
unpaired
Two groups
Different participants, i.e. each participant
contributes only one data point
Normal distribution
Levenes test not significant (p = 0.492)
Presenting the data:
m = 24.2, se = 1.49; m = 20.00, se = 1.30.
t(18) = 2.12, p < 0.05, r = 0.45.
Analysis of Variance (ANOVA)

For 3 or more experimental group
Generates a F ratio
Tells whether the means of the samples are significantly
different.
But not which means.
Additional tests need to be done to determine which
means are statistically different (planned comparison or
post-hoc tests).
Types of ANOVA
One-Way ANOVA
Two-Way ANOVA
Repeated measures ANOVA
Mixed ANOVA
One-Way ANOVA
3 or more groups
Different participants in each group
One-way for 1 independent variable
Groups are independent, i.e. data in one group
does not influence the data in another group
Homogeneity of variance
If Levenes test is significant
Use non-parametric tests or transform the data
Output from SPSS

Sum of
squares
df
Mean
square
Sig
Between
groups
450.664
90.133
269.733
0.000
Within
groups
38.094
114
0.334
Total
488.758
Presentation of results
F (5, 114) = 269.73, p < 0.05
Post-hoc tests
9 REGWQ or HSD
Equal sample size and homogeneity of variance
9 Bonferroni
Conservative, effected by group variance
9 Gabriels Procedure
Slight differences in sample size
9 Hochbergs GT2
Big differences in sample size
9 Games-Howell Procedure
If the variances are not equal
Assumption of sphericity
Equality of variance of the differences between
treatment levels
Mauchlys test
Violation of sphericity is loss of power, i.e.

increased probability of Type II error
Correction df by
Greenhouse-Geisser estimate of sphericity
Huyn-Feldt estimate of sphericity
Lowest possible estimate of sphericity (lower-bound)
If > 0.75 use Huyn-Feldt
If < 0.75 use Greenhouse-Geiser
Some sort of result

Mauchlys test indicated that the
assumption of sphericity had been
violated, 2(5) = 13.12, p < 0.05,therefore
degrees of freedom were corrected using
Huyn-Feldt estimates of sphericity ( =
0.85). The result showed that the number
of women eyed-up was significantly
affected by the amount of alcohol drunk,
F(2.55, 48.40) = 4.73, p < 0.05, r = 0.40)
Non-Parametric Test
Assumption-free tests
Less strict about the distribution of the data
being analyzed.
Thus raw scores are used
Data is ranked
Lowest score rank 1, next lowest rank 2, etc
Analysis is carried out on the ranks

Less power than parameteric tests hence higher
chance of Type II error.
Basic tests
Mann-Whitney test
= independent t-test
Wilcoxon Signed-Rank test

= dependent t-test
Kruskal-Wallis test
= one-way ANOVA
Friedmans ANOVA
= one-way repeated measures ANOVA
Mann-Whitney Test
= independent t-test
Two conditions and different participants in each
condition
Men (M dn = 27) did not seem to differ from

dogs (M dn = 24) in the amount of dog-like
behaviour they displayed (U = 194.5, ns)
Graphs are usually box and whisker plots.
Use of the median
Non-parametric tests are testing the difference in
ranks; not the difference in means.
Means are biased by outliers; whereas ranks are not.
Kruskal-Wallis Test
= one-way ANOVA
More than 2 conditions and different participants have been used in
each condition.
The test statistic is H and has chi-squared distribution

Childrens fear beliefs about clowns were significantly
affected by the format of information given to them (H(3)
= 17.06, p < 0.01). Mann-Whitney test were used to
follow-up this finding. A Bonferroni correction was
applied and so all effects are reported at a 0.167 level of
significance. It appears that fear beliefs were significantly
higher after the adverts compared to the control, U =
37.50, r = -.60.
Claude Bernard
(1813-1878)
Those whose minds are
bound and cramped. They
make poor observations,
because they choose
among the results of their
experiments, observation,
and reading only what suits
their object, neglecting
whatever is unrelated to it
and carefully setting aside
everything which might
tend toward the idea they
wish to combat.
Ignaz Semmelweis
(1818-1865)
Savior of mothers
10-35% mortality
Death of colleague Jacob
Kolletschka (1847)
Introduced lime washing
of hands
12.24% to 2.38%.
The Etiology, Concept
and Prophylaxis of
Childhood Fever (1861).
Semmelweis Reflex
Dismissing or rejecting out of hand
information, automatically, without thought,
inspection, or experiment.
References
Andy Field and Graham Hole. How to Design
and Report Experiments. Sage Publications.
London. 2006.
Martin Bland. An Introduction to Medical
Statistics. 2nd ed. Oxford Medical Publications.
1996.
Norman T. J. Bailey. Statistical Methods in
Biology. 3rd Ed. Cambridge University Press.
1995.
Beth Dawson and Robert G. Trapp. Basic and
Clinical Biostatistics. Lange Medical
Books/McGraw-Hill. 3rd ed. 2001.

Experimental Design

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Experimental Design

Uploaded by

Copyright:

Available Formats

Experimental Design

Nilesh B. Patel, PhD

What can be studied with the

The capital of Kenya is Nairobi.

Three Aims of Research

Research can possess all the above qualities

Consists of a number made up of different

We want the score or measurement to consist

Is what you measured, a real measure of

Evidence for the intellectual

MWM and disruption of biological clock

There is a difference so my treatment

animal handler effect

The need of control group

Lazaro Sapallanzani 1793

Experiment with control group

Great!!! There is a difference so my

Multiple Levels of Independent Variable

Latin Square Design

James Brown EDGAR programme

Descriptive statistics summarizes the

Inferential statistics is used to reach

Is what you measured, a real measure of

Measure of central tendency

Standard deviation (sd)

Measuring the accuracy of the

The mean is a statistic that predicts the

The mean will only be a prefect

Standard deviation and percentage

Population and sample parameters

Standard error of the mean (sem)

Population and sample parameters

Why Inferential Statistics?

Descriptive statistics does not help to answer

2. Null hypothesis (H0)

So what if there is a difference

Carry out tests of significance

Test statistic = systemic variation / unsystemic

Ronald Aylmer Fisher

Fisher tea test

Which test to use?

2. Variance in the one condition = variance in the

Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA)

Output from SPSS

Violation of sphericity is loss of power, i.e.

Some sort of result

Analysis is carried out on the ranks

Wilcoxon Signed-Rank test

Men (M dn = 27) did not seem to differ from

The test statistic is H and has chi-squared distribution

You might also like