Lecture Notes
Joshua M. Tebbs
Department of Statistics
University of South Carolina
TABLE OF CONTENTS

3 Modeling Random Behavior
    3.1 Introduction
    3.2 Probability
        3.2.1 Sample spaces and events
        3.2.2 Unions and intersections
        3.2.3 Axioms of probability
        3.2.4 Conditional probability and independence
        3.2.5 Bayes' rule
    3.3 Random variables
    3.4 Discrete distributions
        3.4.1 Binomial distribution
        3.4.2 Geometric distribution
        3.4.3 Negative binomial distribution
        3.4.4 Hypergeometric distribution
        3.4.5 Poisson distribution
    3.5 Continuous distributions
        3.5.1 Exponential distribution
        3.5.2 Gamma distribution
        3.5.3 Normal distribution
    3.6 Reliability and lifetime distributions
        3.6.1 Weibull distribution
        3.6.2 Reliability functions

4 Statistical Inference
    4.1 Introduction
    4.2 Parameters and statistics
    4.3 Point estimators and sampling distributions
    4.4 The sampling distribution of Ȳ
        4.4.1 Central Limit Theorem
        4.4.2 t distribution
        4.4.3 Robustness of the t distribution result
    4.5 Confidence intervals for a population mean μ
        4.5.1 Known population variance σ²
        4.5.2 Unknown population variance σ²
    4.10 Confidence interval for the ratio of two population variances σ₂²/σ₁²: Independent samples
    4.11 Confidence intervals for the difference of two population means μ₁ − μ₂: Dependent samples (Matched pairs)
    4.12 One-way analysis of variance
        4.12.1 Overall F test
        4.12.2 Follow-up analysis: Tukey pairwise confidence intervals

6 Linear regression
    6.1 Introduction

7 Factorial Experiments
    7.1 Introduction
CHAPTER 3: Modeling Random Behavior

3.1 Introduction
TERMINOLOGY: Statistics is the development and application of theory and methods to the collection, design, analysis, and interpretation of observed information from planned and unplanned studies.

"Statisticians get to play in everyone else's back yard." (John Tukey, Princeton)
Here are some examples where statistics could be used:
1. In a reliability (time to event) study, an engineer is interested in quantifying the time until failure for a jet engine fan blade.

2. In an agricultural study in Iowa, researchers want to know which of four fertilizers (which vary in their nitrogen contents) produces the highest corn yield.

3. In a clinical trial, physicians want to determine which of two drugs is more effective for treating HIV in the early stages of the disease.

4. In a public health study, epidemiologists want to know whether smoking is linked to a particular demographic class in high school students.

5. A food scientist is interested in determining how different feeding schedules (for pigs) could affect the spread of salmonella during the slaughtering process.

6. A pharmacist posits that administering caffeine to premature babies in the ICU at Richland Hospital will reduce the incidence of necrotizing enterocolitis.

7. A research dietician wants to determine if academic achievement is related to body mass index (BMI) among African American students in the fourth grade.
8. An economist, as part of President Obama's re-election campaign, is trying to forecast the monthly unemployment and under-employment rates for 2012.
REMARK: Statisticians use their skills in mathematics and computing to formulate statistical models and analyze data for a specific problem at hand. These models are then used to estimate important quantities of interest (to the researcher), to test the validity of important conjectures, and to predict future behavior. Being able to identify and model sources of variability is an important part of this process.
TERMINOLOGY: A deterministic model is one that makes no attempt to explain variability. For example, in chemistry, the ideal gas law states that

PV = nRT,

where P = pressure of a gas, V = volume, n = the amount of substance of gas (number of moles), R = the universal gas constant, and T = temperature. In circuit analysis, Ohm's law states that

V = IR,

where V = voltage, I = current, and R = resistance.

In both of these models, the relationship among the variables is completely determined, without any ambiguity.
In real life, this is rarely true for the obvious reason: there is natural variation that
arises in the measurement process.
For example, a common electrical engineering experiment involves setting up a simple circuit with a known resistance R. For a given current I, different students will then calculate the voltage V.

With a sample of n = 20 students conducting the experiment in succession, we might very well get 20 different measured voltages!

A deterministic model is too simplistic for real life; it does not acknowledge the inherent variability that arises in the measurement process.
3.2 Probability

3.2.1 Sample spaces and events
(c) Four equally qualified applicants (a, b, c, d) are competing for two positions. If the positions are identical (so that selection order does not matter), then

S = {ab, ac, ad, bc, bd, cd}.

The size of the set of all possible outcomes is n_S = 6. If the positions are different (e.g., project leader, assistant project leader, etc.), then

S = {ab, ba, ac, ca, ad, da, bc, cb, bd, db, cd, dc}.

In this case, the size of the set of all possible outcomes is n_S = 12.
TERMINOLOGY: Suppose that S is a sample space for a random experiment. We say that A is an event in S if A ⊆ S.

GOAL: We would like to develop a mathematical framework so that we can assign probability to an event A. This will quantify how likely the event is. The probability that the event A occurs is denoted by P(A).

INTUITIVE: Suppose that a sample space S contains n_S < ∞ outcomes, each of which is equally likely. If the event A contains n_A outcomes, then

P(A) = n_A / n_S.

This is called an equiprobability model. Its main requirement is that all outcomes in S are equally likely.

Important: If the outcomes in S are not equally likely, then this result is not applicable.
Example 3.2. In the random experiments from Example 3.1, we use the previous result
to assign probabilities to events (if applicable).
(a) The Michigan state lottery calls for a three-digit integer to be selected:
S = {000, 001, 002, ..., 998, 999}.
The size of the set of all possible outcomes is n_S = 1000. Let the event

A = {000, 005, 010, 015, ..., 990, 995} = {winning number is a multiple of 5}.

There are n_A = 200 outcomes in A. It is reasonable to assume that each outcome in S is equally likely. Therefore,

P(A) = 200/1000 = 0.20.
INTERPRETATION: In general, what does P(A) really measure? There are two main interpretations:

1. P(A) measures the likelihood that A will occur on any given experiment.

2. If the experiment is performed many times, then P(A) can be interpreted as the percentage of times that A will occur over the long run. This is called the relative frequency interpretation.

If we are using the former interpretation, then it is common to use a decimal representation, e.g., P(A) = 0.50. If we are using the latter, it is commonly accepted to say something like "the event A will occur 50 percent of the time." This gives the impression that A will occur, on average, 1 out of every 2 times the experiment is performed. It does not mean that the event will occur exactly 1 out of every 2 times the experiment is performed.
3.2.2 Unions and intersections

If we computed

P(A ∪ B) = n_{A∪B} / n_S = 6/8 = 0.75,

we would be making the assumption that each outcome in S is equally likely! This would only be true if each system (primary and both backups) functions with probability equal to 1/2. This is extremely unlikely! Therefore, we cannot compute probabilities in this example without additional information about the system-specific failure rates.
Example 3.4. Hemophilia is a sex-linked hereditary blood defect of males characterized by delayed clotting of the blood, which makes it difficult to control bleeding. When a woman is a carrier of classical hemophilia, there is a 50 percent chance that a male child will inherit this disease. If a carrier gives birth to two males (not twins), what is the probability that either will have the disease? Both will have the disease?
Solution. We can envision the process of having two male children as an experiment with sample space

S = {++, +−, −+, −−},

where "+" means the male offspring has the disease and "−" means the male does not have the disease. To compute the probabilities requested in this problem, we will assume that each outcome in S is equally likely. Define the events:

A = {first child has disease} = {++, +−}
B = {second child has disease} = {++, −+}.

The union and intersection of A and B are, respectively,

A ∪ B = {either child has disease} = {++, +−, −+}
A ∩ B = {both children have disease} = {++}.

The probability that either male child will have the disease is

P(A ∪ B) = n_{A∪B} / n_S = 3/4 = 0.75.

The probability that both male children will have the disease is

P(A ∩ B) = n_{A∩B} / n_S = 1/4 = 0.25.

3.2.3 Axioms of probability
If A_1, A_2, ... are pairwise mutually exclusive events, then

P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i),

where ∪_{i=1}^n A_i = A_1 ∪ A_2 ∪ ··· ∪ A_n denotes the union of the events A_1, A_2, ..., A_n.
3.2.4 Conditional probability and independence
IDEA: In some situations, we may be fortunate enough to have prior knowledge about
the likelihood of other events related to the event of interest. We can then incorporate
this information into a probability calculation.
TERMINOLOGY: Let A and B be events in a sample space S with P(B) > 0. The conditional probability of A, given that B has occurred, is

P(A|B) = P(A ∩ B) / P(B).

Similarly,

P(B|A) = P(A ∩ B) / P(A).
Example 3.5. In a company, 36 percent of the employees have a degree from a SEC
university, 22 percent of the employees that have a degree from the SEC are also engineers,
and 30 percent of the employees are engineers. An employee is selected at random.
(a) Compute the probability that the employee is an engineer and is from the SEC.
(b) Compute the conditional probability that the employee is from the SEC, given that
s/he is an engineer.
Solution: Define the events

A = {employee is an engineer}
B = {employee is from the SEC}.
From the information in the problem, we are given P(A) = 0.30, P(B) = 0.36, and P(A|B) = 0.22. In part (a), we want P(A ∩ B). Note that

0.22 = P(A|B) = P(A ∩ B) / P(B) = P(A ∩ B) / 0.36.

Therefore,

P(A ∩ B) = 0.22(0.36) = 0.0792.
In part (b), we want P(B|A). From the definition of conditional probability:

P(B|A) = P(A ∩ B) / P(A) = 0.0792 / 0.30 = 0.264.
IMPORTANT: Note that, in this example, the conditional probability P(B|A) and the unconditional probability P(B) are not equal. In other words, knowledge that A has occurred has changed the likelihood that B occurs. In other situations, it might be that the occurrence (or non-occurrence) of a companion event has no effect on the probability of the event of interest. This leads us to the definition of independence.
TERMINOLOGY: When the occurrence or non-occurrence of B has no effect on whether or not A occurs, and vice versa, we say that the events A and B are independent. Mathematically, we define A and B to be independent if and only if

P(A ∩ B) = P(A)P(B).

Note that if A and B are independent,

P(A|B) = P(A ∩ B) / P(B) = P(A)P(B) / P(B) = P(A)

and

P(B|A) = P(B ∩ A) / P(A) = P(B)P(A) / P(A) = P(B).

Note: These results only apply if A and B are independent. In other words, if A and B are not independent, then these rules do not apply.
Example 3.6. In an engineering system, two components are placed in series; that is, the system is functional as long as both components are. Each component is functional with probability 0.95. Define the events

A1 = {component 1 is functional}
A2 = {component 2 is functional}

so that P(A1) = 0.95 and P(A2) = 0.95. The probability that the system is functional is given by P(A1 ∩ A2).

If the components operate independently, then A1 and A2 are independent events, so that

P(A1 ∩ A2) = P(A1)P(A2) = 0.95(0.95) = 0.9025.

If the components do not operate independently (e.g., failure of one component wears on the other), then we cannot compute P(A1 ∩ A2) without additional knowledge.
3.2.5 Bayes' rule

BAYES' RULE: For events A and B with P(A) > 0 and 0 < P(B) < 1,

P(B|A) = P(A|B)P(B) / P(A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B′)P(B′)],

where B′ denotes the complement of B.
Example 3.7. The probability that train 1 is on time is 0.95. The probability that train 2 is on time is 0.93. The probability that both are on time is 0.90. Define the events

A1 = {train 1 is on time}
A2 = {train 2 is on time}.

We are given that P(A1) = 0.95, P(A2) = 0.93, and P(A1 ∩ A2) = 0.90.

(a) What is the probability that train 1 is not on time?

P(A1′) = 1 − P(A1) = 1 − 0.95 = 0.05.

(b) What is the probability that at least one train is on time?

P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2) = 0.95 + 0.93 − 0.90 = 0.98.
(c) What is the probability that train 1 is on time given that train 2 is on time?

P(A1|A2) = P(A1 ∩ A2) / P(A2) = 0.90/0.93 ≈ 0.968.

(d) What is the probability that train 2 is on time given that train 1 is not on time?

P(A2|A1′) = P(A1′ ∩ A2) / P(A1′) = [P(A2) − P(A1 ∩ A2)] / [1 − P(A1)] = (0.93 − 0.90)/(1 − 0.95) = 0.60.
Example 3.8. An insurance company classifies 30 percent of its policy-holders as "accident-prone." In a given year, an accident-prone policy-holder has an accident with probability 0.4, while a policy-holder who is not accident-prone has an accident with probability 0.2. Define

A = {policy-holder has an accident}
B = {policy-holder is accident-prone},

so that P(B) = 0.3, P(A|B) = 0.4, and P(A|B′) = 0.2.

(b) Suppose that the policy-holder does have an accident. What is the probability that s/he was accident-prone?

Solution: We want P(B|A). By Bayes' rule,

P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B′)P(B′)]
       = 0.4(0.3) / [0.4(0.3) + 0.2(0.7)] ≈ 0.46.
3.3 Random variables

Example 3.9. Classify the following random variables as discrete or continuous and specify the support of each random variable:

W = pH of an aqueous solution
X = length of time between accidents at a factory
3.4 Discrete distributions

Recall that a probability mass function (pmf) p_Y(y) assigns probability to each support point and satisfies

Σ_{all y} p_Y(y) = 1.
Example 3.10. A mail-order computer business has six telephone lines. Let Y denote the number of lines in use at a specific time. Suppose that the probability mass function (pmf) of Y is given by

y        0     1     2     3     4     5     6
p_Y(y)   0.10  0.15  0.20  0.25  0.20  0.06  0.04

Figure 3.1 (left) displays p_Y(y), the pmf of Y. The height of the bar above y is equal to p_Y(y) = P(Y = y). If y is not equal to 0, 1, 2, 3, 4, 5, or 6, then p_Y(y) = 0.
Figure 3.1: PMF (left) and CDF (right) of Y in Example 3.10.

The cdf of Y is obtained by accumulating the pmf:

y        0     1     2     3     4     5     6
p_Y(y)   0.10  0.15  0.20  0.25  0.20  0.06  0.04
F_Y(y)   0.10  0.25  0.45  0.70  0.90  0.96  1.00
(a) What is the probability that exactly two lines are in use?

p_Y(2) = P(Y = 2) = 0.20.

(b) What is the probability that at most two lines are in use?

P(Y ≤ 2) = P(Y = 0) + P(Y = 1) + P(Y = 2) = p_Y(0) + p_Y(1) + p_Y(2) = 0.10 + 0.15 + 0.20 = 0.45.

Note: This is also equal to F_Y(2) = 0.45.

(c) What is the probability that at least five lines are in use?

P(Y ≥ 5) = P(Y = 5) + P(Y = 6) = p_Y(5) + p_Y(6) = 0.06 + 0.04 = 0.10.

It is also important to note that in part (c), we could have computed

P(Y ≥ 5) = 1 − P(Y ≤ 4) = 1 − F_Y(4) = 1 − 0.90 = 0.10.
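These pmf calculations are easy to reproduce in R; a minimal sketch, assuming we store the pmf in two vectors (the names y and p are ours, not from the text):

y <- 0:6
p <- c(0.10, 0.15, 0.20, 0.25, 0.20, 0.06, 0.04)
sum(p)          # check: the pmf sums to 1
p[y == 2]       # (a) P(Y = 2) = 0.20
sum(p[y <= 2])  # (b) P(Y <= 2) = 0.45
sum(p[y >= 5])  # (c) P(Y >= 5) = 0.10
cumsum(p)       # cdf values F_Y(y) at y = 0, 1, ..., 6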
TERMINOLOGY: Let Y be a discrete random variable with pmf p_Y(y). The expected value of Y is given by

μ = E(Y) = Σ_{all y} y p_Y(y).

The expected value of a discrete random variable Y is simply a weighted average of the possible values of Y. Each value y is weighted by its probability p_Y(y). In statistical applications, μ = E(Y) is commonly called the population mean.
Example 3.11. In Example 3.10, we examined the distribution of Y, the number of lines in use at a specified time. The pmf of Y is given by

y        0     1     2     3     4     5     6
p_Y(y)   0.10  0.15  0.20  0.25  0.20  0.06  0.04
The expected number of lines in use is

μ = E(Y) = Σ_{all y} y p_Y(y) = 0(0.10) + 1(0.15) + 2(0.20) + 3(0.25) + 4(0.20) + 5(0.06) + 6(0.04) = 2.64.

FUNCTIONS: Let Y be a discrete random variable with pmf p_Y(y) and let g be a real-valued function. Then g(Y) is a random variable and

E[g(Y)] = Σ_{all y} g(y) p_Y(y).
Example. Let Y denote the number of gallons of a toxic chemical that is produced per hour, with pmf

y        0    1    2    3
p_Y(y)   0.2  0.3  0.3  0.2

Figure 3.2: PMF (left) and CDF (right) of Y.

(a) The expected value of Y is

μ = E(Y) = Σ_{all y} y p_Y(y) = 0(0.2) + 1(0.3) + 2(0.3) + 3(0.2) = 1.5.
We would expect 1.5 gallons of the toxic chemical to be produced per hour (on average).

(b) The cost (in hundreds of dollars) to produce Y gallons is given by the cost function g(Y) = 3 + 12Y + 2Y². What is the expected cost in a one-hour period?

Solution: We want to compute E[g(Y)]. We first compute E(Y²):

E(Y²) = Σ_{all y} y² p_Y(y) = 0²(0.2) + 1²(0.3) + 2²(0.3) + 3²(0.2) = 3.3.

Therefore,

E[g(Y)] = E(3 + 12Y + 2Y²) = 3 + 12E(Y) + 2E(Y²) = 3 + 12(1.5) + 2(3.3) = 27.6.

The expected hourly cost is $2,760.00.
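Because these expectations are just weighted sums, they are one-liners in R; a quick sketch (vector names are ours):

y <- 0:3
p <- c(0.2, 0.3, 0.3, 0.2)
EY  <- sum(y * p)        # E(Y) = 1.5
EY2 <- sum(y^2 * p)      # E(Y^2) = 3.3
3 + 12 * EY + 2 * EY2    # expected cost E[g(Y)] = 27.6 (hundreds of dollars)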
TERMINOLOGY: Let Y be a discrete random variable with pmf p_Y(y) and expected value E(Y) = μ. The population variance of Y is given by

σ² ≡ var(Y) ≡ E[(Y − μ)²] = Σ_{all y} (y − μ)² p_Y(y).

The population standard deviation of Y is the positive square root of the variance:

σ = √σ² = √var(Y).

For the toxic chemical pmf above,

y        0    1    2    3
p_Y(y)   0.2  0.3  0.3  0.2

the population variance is σ² = E(Y²) − [E(Y)]² = 3.3 − (1.5)² = 1.05, and the population standard deviation is

σ = √σ² = √1.05 ≈ 1.025.
3.4.1 Binomial distribution
Albino rats used to study the hormonal regulation of a metabolic pathway are injected with a drug that inhibits body synthesis of protein. The probability that a rat will die from the drug before the study is complete is 0.20.

rat = trial
dies before study is over = "success"
p = P(success) = P(dies early) = 0.20.

PMF: If Y ∼ b(n, p), then the pmf of Y is given by

p_Y(y) = (n choose y) p^y (1 − p)^{n−y}, y = 0, 1, 2, ..., n,

and p_Y(y) = 0 otherwise.
MEAN/VARIANCE: If Y ∼ b(n, p), then

E(Y) = np and var(Y) = np(1 − p).
Example 3.15. In an agricultural study, it is determined that 40 percent of all plots
respond to a certain treatment. Four plots are observed. In this situation, we interpret
plot of land = trial
plot responds to treatment = success
p = P (success) = P (responds to treatment) = 0.4.
Figure 3.3: PMF (left) and CDF (right) of Y ∼ b(n = 4, p = 0.4) in Example 3.15.
If the Bernoulli trial assumptions hold (independent plots, same response probability for each plot), then

Y = the number of plots which respond ∼ b(n = 4, p = 0.4).
(a) What is the probability that exactly two plots respond?

P(Y = 2) = p_Y(2) = (4 choose 2)(0.4)^2 (1 − 0.4)^{4−2} = 6(0.4)^2 (0.6)^2 = 0.3456.

(b) What is the probability that at least one plot responds?

P(Y ≥ 1) = 1 − P(Y = 0) = 1 − (4 choose 0)(0.4)^0 (1 − 0.4)^{4−0} = 1 − 1(1)(0.6)^4 = 0.8704.

(c) What are E(Y) and var(Y)?

E(Y) = np = 4(0.4) = 1.6
var(Y) = np(1 − p) = 4(0.4)(0.6) = 0.96.
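These binomial calculations can be double-checked numerically in R, using the dbinom function that is introduced formally in the next example:

dbinom(2, size = 4, prob = 0.4)   # (a) P(Y = 2) = 0.3456
1 - dbinom(0, 4, 0.4)             # (b) P(Y >= 1) = 0.8704
4 * 0.4                           # E(Y) = np = 1.6
4 * 0.4 * 0.6                     # var(Y) = np(1 - p) = 0.96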
Example 3.16. An electronics manufacturer claims that 10 percent of its power supply units need servicing during the warranty period. To investigate this claim, technicians at a testing laboratory purchase 30 units and subject each one to an accelerated testing protocol to simulate use during the warranty period. In this situation, we interpret

power supply unit = trial
supply unit needs servicing during warranty period = "success"
p = P(success) = P(supply unit needs servicing) = 0.1.

If the Bernoulli trial assumptions hold (independent units, same probability of needing service for each unit), then

Y = the number of units that need servicing ∼ b(n = 30, p = 0.1).

BINOMIAL R CODE: Suppose that Y ∼ b(n, p).

p_Y(y) = P(Y = y): dbinom(y,n,p)
F_Y(y) = P(Y ≤ y): pbinom(y,n,p)
(a) What is the probability that exactly five of the 30 power supply units require servicing during the warranty period?

p_Y(5) = P(Y = 5) = (30 choose 5)(0.1)^5 (1 − 0.1)^{30−5}

dbinom(5,30,0.1) = 0.1023048.

(b) What is the probability that at most five of the 30 power supply units require servicing during the warranty period?

F_Y(5) = P(Y ≤ 5) = Σ_{y=0}^{5} (30 choose y)(0.1)^y (1 − 0.1)^{30−y}

pbinom(5,30,0.1) = 0.9268099.
PAGE 26
0.0
0.00
0.2
0.05
0.4
0.10
p(y)
F(y)
0.6
0.15
0.8
0.20
1.0
0.25
CHAPTER 3
10
15
20
25
30
10
15
20
25
30
Figure 3.4: PMF (left) and CDF (right) of Y b(n = 30, p = 0.1) in Example 3.16.
(c) What is the probability that at least five of the 30 power supply units require service?

P(Y ≥ 5) = 1 − P(Y ≤ 4) = 1 − Σ_{y=0}^{4} (30 choose y)(0.1)^y (1 − 0.1)^{30−y}

1-pbinom(4,30,0.1) = 0.1754949.

(d) What is P(2 ≤ Y ≤ 8)?

P(2 ≤ Y ≤ 8) = Σ_{y=2}^{8} (30 choose y)(0.1)^y (1 − 0.1)^{30−y}.

In R, this can be computed as pbinom(8,30,0.1)-pbinom(1,30,0.1).
3.4.2 Geometric distribution
NOTE: The geometric distribution also arises in experiments involving Bernoulli trials:

1. Each trial results in a "success" or a "failure."
2. The trials are independent.
3. The probability of success, denoted by p, 0 < p < 1, is the same on every trial.

PMF: If Y ∼ geom(p), then the pmf of Y is given by

p_Y(y) = (1 − p)^{y−1} p, y = 1, 2, 3, ...,

and p_Y(y) = 0 otherwise.
MEAN/VARIANCE: If Y ∼ geom(p), then

E(Y) = 1/p and var(Y) = (1 − p)/p².
Example 3.17. Biology students are checking the eye color of fruit flies. For each fly, the probability of observing white eyes is p = 0.25. In this situation, we interpret

fruit fly = trial
fly has white eyes = "success"
p = P(success) = P(white eyes) = 0.25.
Figure 3.5: PMF (left) and CDF (right) of Y ∼ geom(p = 0.25) in Example 3.17.
If the Bernoulli trial assumptions hold (independent flies, same probability of white eyes for each fly), then

Y = the number of flies checked until the first white-eyed fly is observed ∼ geom(p = 0.25).

(a) What is the probability the first white-eyed fly is observed on the fifth fly checked?

p_Y(5) = P(Y = 5) = (1 − 0.25)^{5−1}(0.25) = (0.75)^4(0.25) ≈ 0.079.

(b) What is the probability the first white-eyed fly is observed before the fourth fly is examined? Note: For this to occur, we must observe the first white-eyed fly ("success") on either the first, second, or third fly.

F_Y(3) = P(Y ≤ 3) = P(Y = 1) + P(Y = 2) + P(Y = 3)
       = (1 − 0.25)^{1−1}(0.25) + (1 − 0.25)^{2−1}(0.25) + (1 − 0.25)^{3−1}(0.25)
       = 0.25 + 0.1875 + 0.140625 ≈ 0.578.
GEOMETRIC R CODE: Suppose that Y ∼ geom(p).

p_Y(y) = P(Y = y): dgeom(y-1,p)
F_Y(y) = P(Y ≤ y): pgeom(y-1,p)
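For instance, the calculations in Example 3.17 can be verified in R (note the y-1 shift: R's geometric functions count failures before the first success):

dgeom(5 - 1, prob = 0.25)   # (a) P(Y = 5) ~ 0.079
pgeom(3 - 1, prob = 0.25)   # (b) P(Y <= 3) ~ 0.578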
3.4.3 Negative binomial distribution

NOTE: The negative binomial distribution also arises in experiments involving Bernoulli trials:

1. Each trial results in a "success" or a "failure."
2. The trials are independent.
3. The probability of success, denoted by p, 0 < p < 1, is the same on every trial.

PMF: If Y ∼ nib(r, p), where Y counts the number of trials needed to observe the rth success, then the pmf of Y is given by

p_Y(y) = (y−1 choose r−1) p^r (1 − p)^{y−r}, y = r, r + 1, r + 2, ...,

and p_Y(y) = 0 otherwise.
Example 3.18. At an automotive paint plant, 15 percent of all batches sent to the lab for chemical analysis do not conform to specifications. In this situation, we interpret

batch = trial
batch does not conform = "success"
p = P(success) = P(not conforming) = 0.15.

If the Bernoulli trial assumptions hold (independent batches, same probability of nonconforming for each batch), then

Y = the number of batches observed until the third nonconforming batch ∼ nib(r = 3, p = 0.15).
(a) What is the probability the third nonconforming batch is observed on the tenth batch sent to the lab?

p_Y(10) = P(Y = 10) = (10−1 choose 3−1)(0.15)^3 (1 − 0.15)^{10−3}
        = (9 choose 2)(0.15)^3 (0.85)^7 ≈ 0.039.

(b) What is the probability that no more than two nonconforming batches will be observed among the first 30 batches sent to the lab? Note: This means the third nonconforming batch must be observed on the 31st batch observed, the 32nd, the 33rd, etc.

P(Y ≥ 31) = 1 − P(Y ≤ 30) = 1 − Σ_{y=3}^{30} (y−1 choose 3−1)(0.15)^3 (0.85)^{y−3} ≈ 0.151.
Figure 3.6: PMF (left) and CDF (right) of Y ∼ nib(r = 3, p = 0.15) in Example 3.18.
NEGATIVE BINOMIAL R CODE: Suppose that Y ∼ nib(r, p).

p_Y(y) = P(Y = y): dnbinom(y-r,r,p)
F_Y(y) = P(Y ≤ y): pnbinom(y-r,r,p)
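For instance, Example 3.18 can be verified in R (the y-r shift reflects that R's negative binomial functions count failures before the rth success):

dnbinom(10 - 3, size = 3, prob = 0.15)      # (a) P(Y = 10) ~ 0.039
1 - pnbinom(30 - 3, size = 3, prob = 0.15)  # (b) P(Y >= 31) ~ 0.151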
3.4.4 Hypergeometric distribution

SETTING: Consider a population of N objects and suppose that each object belongs to one of two dichotomous classes: Class 1 and Class 2. For example, the objects (classes) might be people (infected/not), parts (conforming/not), plots of land (respond to treatment/not), etc. Suppose that r of the N objects belong to Class 1, and select n objects from the population at random and without replacement. Let Y denote the number of Class 1 objects in the sample, so that Y ∼ hyper(N, n, r).

PMF: If Y ∼ hyper(N, n, r), then the pmf of Y is given by

p_Y(y) = (r choose y)(N−r choose n−y) / (N choose n), for y ≤ r and n − y ≤ N − r,

and p_Y(y) = 0 otherwise.
MEAN/VARIANCE: If Y ∼ hyper(N, n, r), then

E(Y) = n(r/N)
var(Y) = n(r/N)[(N − r)/N][(N − n)/(N − 1)].
Example 3.19. A supplier ships parts to a company in lots of 100 parts. The company has an acceptance sampling plan which adopts the following acceptance rule:

"...sample 5 parts at random and without replacement. If there are no defectives in the sample, accept the entire lot; otherwise, reject the entire lot."

In this example, the population size is N = 100. The sample size is n = 5. Define the random variable

Y = the number of defective parts in the sample ∼ hyper(N = 100, n = 5, r),

where r denotes the number of defective parts in the lot.
Figure 3.7: PMF (left) and CDF (right) of Y ∼ hyper(N = 100, n = 5, r = 10) in Example 3.19.
(a) If r = 10, what is the probability that the lot will be accepted? Note: The lot will be accepted only if Y = 0.

p_Y(0) = P(Y = 0) = (10 choose 0)(90 choose 5) / (100 choose 5) = 1(43949268)/75287520 ≈ 0.584.
(b) If r = 10, what is the probability that at least 3 of the 5 parts sampled are defective?

P(Y ≥ 3) = 1 − P(Y ≤ 2)
         = 1 − [(10 choose 0)(90 choose 5) + (10 choose 1)(90 choose 4) + (10 choose 2)(90 choose 3)] / (100 choose 5).

HYPERGEOMETRIC R CODE: Suppose that Y ∼ hyper(N, n, r).

p_Y(y) = P(Y = y): dhyper(y,r,N-r,n)
F_Y(y) = P(Y ≤ y): phyper(y,r,N-r,n)
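For instance, Example 3.19 can be verified in R (in dhyper/phyper, the arguments are the Class 1 count, the Class 2 count, and the sample size):

dhyper(0, m = 10, n = 90, k = 5)      # (a) P(Y = 0) ~ 0.584
1 - phyper(2, m = 10, n = 90, k = 5)  # (b) P(Y >= 3)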
3.4.5 Poisson distribution
Figure 3.8: PMF (left) and CDF (right) of Y ∼ Poisson(λ = 2.5) in Example 3.20.
PMF: If Y ∼ Poisson(λ), then the probability mass function of Y is given by

p_Y(y) = λ^y e^{−λ} / y!, y = 0, 1, 2, ...,

and p_Y(y) = 0 otherwise.

MEAN/VARIANCE: If Y ∼ Poisson(λ), then

E(Y) = λ and var(Y) = λ.
Example 3.20. Let Y denote the number of times per month that a detectable amount of radioactive gas is recorded at a nuclear power plant. Suppose that Y follows a Poisson distribution with mean λ = 2.5 times per month.

(a) What is the probability that a detectable amount of gas is recorded exactly three times in a given month?

P(Y = 3) = p_Y(3) = (2.5)^3 e^{−2.5} / 3! = 15.625 e^{−2.5} / 6 ≈ 0.214.
(b) What is the probability that a detectable amount of gas is recorded no more than four times in a given month?

P(Y ≤ 4) = Σ_{y=0}^{4} (2.5)^y e^{−2.5} / y!
         = (2.5)^0 e^{−2.5}/0! + (2.5)^1 e^{−2.5}/1! + (2.5)^2 e^{−2.5}/2! + (2.5)^3 e^{−2.5}/3! + (2.5)^4 e^{−2.5}/4!
         ≈ 0.891.

POISSON R CODE: Suppose that Y ∼ Poisson(λ).

p_Y(y) = P(Y = y): dpois(y,λ)
F_Y(y) = P(Y ≤ y): ppois(y,λ)
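For instance, Example 3.20 in R:

dpois(3, lambda = 2.5)   # (a) P(Y = 3) ~ 0.214
ppois(4, lambda = 2.5)   # (b) P(Y <= 4) ~ 0.891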
3.5 Continuous distributions

CDF: If Y is a continuous random variable with pdf f_Y(y), then the cumulative distribution function (cdf) of Y is

F_Y(y0) = P(Y ≤ y0) = ∫_{−∞}^{y0} f_Y(y) dy.

4. If y1 < y2, then

P(y1 ≤ Y ≤ y2) = ∫_{y1}^{y2} f_Y(y) dy = F_Y(y2) − F_Y(y1).

5. If y0 is a specific value, then P(Y = y0) = 0. In other words, in continuous probability models, specific points are assigned zero probability (see #4 above; this will make perfect mathematical sense). An immediate consequence of this is that if Y is continuous,

P(y1 ≤ Y ≤ y2) = P(y1 ≤ Y < y2) = P(y1 < Y ≤ y2) = P(y1 < Y < y2),

and each is equal to

∫_{y1}^{y2} f_Y(y) dy.
Example. Suppose that Y is a continuous random variable with pdf

f_Y(y) = 3y², 0 < y < 1,

and f_Y(y) = 0 otherwise.

Figure 3.9: PDF (left) and CDF (right) of Y.
Find the cumulative distribution function (cdf) of Y.

Solution. For 0 < y < 1,

F_Y(y) = ∫_{−∞}^{y} f_Y(t) dt = ∫_{0}^{y} 3t² dt = t³ |_{0}^{y} = y³.

Therefore,

F_Y(y) = 0, for y < 0
F_Y(y) = y³, for 0 ≤ y < 1
F_Y(y) = 1, for y ≥ 1.
(a) Calculate P(Y < 0.3).

Method 1: PDF:

P(Y < 0.3) = ∫_{0}^{0.3} 3y² dy = y³ |_{0}^{0.3} = (0.3)³ − 0³ = 0.027.

Method 2: CDF:

P(Y < 0.3) = F_Y(0.3) = (0.3)³ = 0.027.
(b) Calculate P(Y > 0.8).

Method 1: PDF:

P(Y > 0.8) = ∫_{0.8}^{1} 3y² dy = y³ |_{0.8}^{1} = 1³ − (0.8)³ = 0.488.

Method 2: CDF:

P(Y > 0.8) = 1 − P(Y ≤ 0.8) = 1 − F_Y(0.8) = 1 − (0.8)³ = 0.488.

(c) Calculate P(0.3 < Y < 0.8).

Method 1: PDF:

P(0.3 < Y < 0.8) = ∫_{0.3}^{0.8} 3y² dy = y³ |_{0.3}^{0.8} = (0.8)³ − (0.3)³ = 0.485.

Method 2: CDF:

P(0.3 < Y < 0.8) = F_Y(0.8) − F_Y(0.3) = (0.8)³ − (0.3)³ = 0.485.
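Since each of these probabilities is an integral of the pdf, R's numerical integrator reproduces them; a small sketch (the function name fy is ours):

fy <- function(y) 3 * y^2                      # pdf on (0, 1)
integrate(fy, lower = 0,   upper = 0.3)$value  # (a) 0.027
integrate(fy, lower = 0.8, upper = 1)$value    # (b) 0.488
integrate(fy, lower = 0.3, upper = 0.8)$value  # (c) 0.485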
TERMINOLOGY: Let Y be a continuous random variable with pdf f_Y(y). The expected value (or mean) of Y is given by

μ = E(Y) = ∫_{−∞}^{∞} y f_Y(y) dy.

NOTE: The limits of the integral in this definition, while technically (−∞, ∞), will in practice reduce to the lower and upper limits corresponding to the nonzero part of the pdf.
FUNCTIONS: Let Y be a continuous random variable with pdf f_Y(y). Suppose that g is a real-valued function. Then, g(Y) is a random variable and

E[g(Y)] = ∫_{−∞}^{∞} g(y) f_Y(y) dy.

TERMINOLOGY: Let Y be a continuous random variable with pdf f_Y(y) and expected value E(Y) = μ. The population variance of Y is given by

σ² ≡ var(Y) ≡ E[(Y − μ)²] = ∫_{−∞}^{∞} (y − μ)² f_Y(y) dy.

The population standard deviation of Y is the positive square root of the variance:

σ = √σ² = √var(Y).
For the pdf f_Y(y) = 3y², 0 < y < 1, from the last example, the mean of Y is

μ = E(Y) = ∫_{−∞}^{∞} y f_Y(y) dy = ∫_{0}^{1} y(3y²) dy = ∫_{0}^{1} 3y³ dy = (3y⁴/4) |_{0}^{1} = 3/4.

To find var(Y), we will use the computing formula var(Y) = E(Y²) − [E(Y)]². We already have E(Y) = 3/4. Next,

E(Y²) = ∫_{−∞}^{∞} y² f_Y(y) dy = ∫_{0}^{1} y²(3y²) dy = ∫_{0}^{1} 3y⁴ dy = (3y⁵/5) |_{0}^{1} = 3/5.

Therefore,

σ² = var(Y) = E(Y²) − [E(Y)]² = 3/5 − (3/4)² = 0.0375

and σ = √0.0375 ≈ 0.194.
QUANTILES: Suppose that Y is a continuous random variable with cdf F_Y(y) and let 0 < p < 1. The pth quantile of the distribution of Y, denoted by φ_p, solves

F_Y(φ_p) = P(Y ≤ φ_p) = ∫_{−∞}^{φ_p} f_Y(y) dy = p.

The median of the distribution of Y is the p = 0.5 quantile. That is, the median φ_{0.5} solves

F_Y(φ_{0.5}) = P(Y ≤ φ_{0.5}) = ∫_{−∞}^{φ_{0.5}} f_Y(y) dy = 0.5.

NOTE: Another name for the pth quantile is the 100pth percentile.

REMARK: When Y is discrete, there are some potential problems with the definition that φ_p solves F_Y(φ_p) = P(Y ≤ φ_p) = p. The reason is that there may be many values of φ_p that satisfy this equation. By convention, in discrete distributions, the pth quantile φ_p is taken to be the smallest value satisfying F_Y(φ_p) = P(Y ≤ φ_p) ≥ p.
3.5.1 Exponential distribution

PDF: If Y ∼ exponential(λ), then the pdf of Y is given by

f_Y(y) = λe^{−λy}, y > 0,

and f_Y(y) = 0 otherwise. Shorthand notation is Y ∼ exponential(λ). Important: The exponential distribution is used to model the distribution of positive quantities (e.g., lifetimes, etc.).
MEAN/VARIANCE: If Y ∼ exponential(λ), then

E(Y) = 1/λ and var(Y) = 1/λ².

CDF: Suppose that Y ∼ exponential(λ). Then, the cdf of Y exists in closed form and is given by

F_Y(y) = 0, for y ≤ 0
F_Y(y) = 1 − e^{−λy}, for y > 0.
Figure 3.10: Exponential pdfs with λ = 1, λ = 1/2, and λ = 1/5.
Example 3.23. Extensive experience with fans of a certain type used in diesel engines has suggested that the exponential distribution provides a good model for time until failure (i.e., lifetime). Suppose that the lifetime of a fan, denoted by Y (measured in 10000s of hours), follows an exponential distribution with λ = 0.4.

(a) What is the probability that a fan lasts longer than 30,000 hours?

Method 1: PDF:

P(Y > 3) = ∫_{3}^{∞} 0.4e^{−0.4y} dy = −e^{−0.4y} |_{3}^{∞} = e^{−0.4(3)} = e^{−1.2} ≈ 0.301.
Figure 3.11: PDF (left) and CDF (right) of Y ∼ exponential(λ = 0.4) in Example 3.23.
Method 2: CDF:

P(Y > 3) = 1 − P(Y ≤ 3) = 1 − F_Y(3) = 1 − [1 − e^{−0.4(3)}] = e^{−1.2} ≈ 0.301.

(b) What is the probability that a fan will last between 20,000 and 50,000 hours?

Method 1: PDF:

P(2 < Y < 5) = ∫_{2}^{5} 0.4e^{−0.4y} dy = −e^{−0.4y} |_{2}^{5} = −[e^{−0.4(5)} − e^{−0.4(2)}] = e^{−0.8} − e^{−2} ≈ 0.314.
Method 2: CDF:

P(2 < Y < 5) = F_Y(5) − F_Y(2) = [1 − e^{−0.4(5)}] − [1 − e^{−0.4(2)}] = e^{−0.8} − e^{−2} ≈ 0.314.
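Both probabilities can be verified in R with the exponential cdf function pexp:

1 - pexp(3, rate = 0.4)                    # (a) P(Y > 3) ~ 0.301
pexp(5, rate = 0.4) - pexp(2, rate = 0.4)  # (b) P(2 < Y < 5) ~ 0.314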
MEMORYLESS PROPERTY: Suppose that Y ∼ exponential(λ), and let r and s be positive constants. Then

P(Y > r + s | Y > r) = P(Y > s).

If Y measures time (e.g., time to failure, etc.), then the memoryless property says that the distribution of additional lifetime (s time units beyond time r) is the same as the original distribution of the lifetime. In other words, the fact that Y has "made it" to time r has been "forgotten." For example, in Example 3.23,

P(Y > 5 | Y > 2) = P(Y > 3) ≈ 0.301.
EXPONENTIAL R CODE: Suppose that Y ∼ exponential(λ).

F_Y(y) = P(Y ≤ y): pexp(y,λ)
φ_p: qexp(p,λ)

NOTE: The command qexp(0.9,12) gives the 0.90 quantile (90th percentile) of the exponential(λ = 12) distribution. In Example 3.24, this means that 90 percent of the waiting times will be less than approximately 0.192 hours (only 10 percent will exceed this value).
3.5.2 Gamma distribution

TERMINOLOGY: The gamma function is defined by

Γ(α) = ∫_{0}^{∞} t^{α−1} e^{−t} dt,

for all α > 0. The gamma function satisfies the recursive relationship

Γ(α) = (α − 1)Γ(α − 1),

for α > 1. Therefore, if α is an integer, then

Γ(α) = (α − 1)!.
Figure 3.12: Gamma pdfs for various choices of α and λ.

PDF: If Y ∼ gamma(α, λ), then the pdf of Y is given by

f_Y(y) = [λ^α / Γ(α)] y^{α−1} e^{−λy}, y > 0,

and f_Y(y) = 0 otherwise. Shorthand notation is Y ∼ gamma(α, λ).

By changing the values of α and λ, the gamma pdf can assume many shapes. This makes the gamma distribution popular for modeling positive random variables (it is more flexible than the exponential).

Note that when α = 1, the gamma pdf reduces to the exponential(λ) pdf.
MEAN/VARIANCE: If Y ∼ gamma(α, λ), then

E(Y) = α/λ and var(Y) = α/λ².
CDF: The cdf of a gamma random variable does not exist in closed form. Therefore,
probabilities involving gamma random variables and gamma quantiles must be computed
numerically (e.g., using R, etc.).
GAMMA R CODE: Suppose that Y ∼ gamma(α, λ).

F_Y(y) = P(Y ≤ y): pgamma(y,α,λ)
φ_p: qgamma(p,α,λ)
Example 3.25. When a certain transistor is subjected to an accelerated life test, the lifetime Y (in weeks) is well modeled by a gamma distribution with α = 4 and λ = 1/6.

(a) Find the probability that a transistor will last at least 50 weeks.

P(Y ≥ 50) = 1 − P(Y < 50) = 1 − F_Y(50)
          = 1-pgamma(50,4,1/6)
          = 0.03377340.

(b) Find the probability that a transistor will last between 12 and 24 weeks.

P(12 ≤ Y ≤ 24) = F_Y(24) − F_Y(12)
               = pgamma(24,4,1/6)-pgamma(12,4,1/6)
               = 0.4236533.

(c) Twenty percent of the transistor lifetimes will be below which time? Note: I am asking for the 0.20 quantile (20th percentile) of the lifetime distribution.

> qgamma(0.2,4,1/6)
[1] 13.78072
Figure 3.13: PDF (left) and CDF (right) of Y ∼ gamma(α = 4, λ = 1/6) in Example 3.25.
3.5.3 Normal distribution

PDF: If Y ∼ N(μ, σ²), then the pdf of Y is given by

f_Y(y) = [1/(σ√(2π))] e^{−(1/2)[(y−μ)/σ]²}, −∞ < y < ∞.

Shorthand notation is Y ∼ N(μ, σ²).
Figure 3.14: Normal pdfs with (μ = 0, σ = 1), (μ = 2, σ = 2), and (μ = 1, σ = 3).
CDF: The cdf of a normal random variable does not exist in closed form. Probabilities
involving normal random variables and normal quantiles can be computed numerically
(e.g., using R, etc.).
NORMAL R CODE: Suppose that Y ∼ N(μ, σ²).

F_Y(y) = P(Y ≤ y): pnorm(y,μ,σ)
φ_p: qnorm(p,μ,σ)
Example 3.26. The time it takes for a driver to react to the brake lights on a decelerating vehicle is critical in helping to avoid rear-end collisions. A recently published study suggests that this time during in-traffic driving, denoted by Y (measured in seconds), follows a normal distribution with mean μ = 1.5 and variance σ² = 0.16.
Figure 3.15: PDF (left) and CDF (right) of Y ∼ N(μ = 1.5, σ² = 0.16) in Example 3.26.
(a) What is the probability that reaction time is less than 1 second?

P(Y < 1) = F_Y(1)
         = pnorm(1,1.5,sqrt(0.16))
         = 0.1056498.

(b) What is the probability that reaction time is between 1.1 and 2.5 seconds?

P(1.1 ≤ Y ≤ 2.5) = F_Y(2.5) − F_Y(1.1)
                 = pnorm(2.5,1.5,sqrt(0.16))-pnorm(1.1,1.5,sqrt(0.16))
                 = 0.835135.

(c) Five percent of all reaction times will exceed which time? Note: I am asking for the 0.95 quantile (95th percentile) of the reaction time distribution.

> qnorm(0.95,1.5,sqrt(0.16))
[1] 2.157941
EMPIRICAL RULE: In Example 3.26, with μ = 1.5 and σ = 0.4:

About 68 percent of all reaction times will be between 1.1 and 1.9 seconds.
About 95 percent of all reaction times will be between 0.7 and 2.3 seconds.
About 99.7 percent of all reaction times will be between 0.3 and 2.7 seconds.
TERMINOLOGY: A random variable Z is said to have a standard normal distribution if its pdf is given by

f_Z(z) = [1/√(2π)] e^{−z²/2}, −∞ < z < ∞.

Shorthand notation is Z ∼ N(0, 1). A standard normal distribution is a special normal distribution, that is, a normal distribution with mean μ = 0 and variance σ² = 1. The variable Z is called a standard normal random variable.
RESULT: If Y ∼ N(μ, σ²), then

Z = (Y − μ)/σ ∼ N(0, 1).

The result says that Z follows a standard normal distribution; i.e., Z ∼ N(0, 1). In this context, Z is called the standardized value of Y. Important: Therefore, any normal random variable Y ∼ N(μ, σ²) can be converted to a standard normal random variable Z by applying this transformation.
Consequently, if Y ∼ N(μ, σ²), then probabilities involving Y can be computed by standardizing:

P(y1 < Y < y2) = P((y1 − μ)/σ < Z < (y2 − μ)/σ) = F_Z((y2 − μ)/σ) − F_Z((y1 − μ)/σ),

where F_Z(·) denotes the N(0, 1) cdf.
3.6 Reliability and lifetime distributions

TERMINOLOGY: Reliability analysis deals with failure time (i.e., lifetime, time-to-event) data. For example,

T = time from start of product service until failure
T = time from sale of a product until a warranty claim
T = number of hours in use/cycles until failure.

We call T a lifetime random variable if it measures the time to an event, e.g., failure, death, eradication of some infection/condition, etc. Engineers are often involved with reliability studies in practice, because reliability is related to product quality.
3.6.1 Weibull distribution

PDF: If T ∼ Weibull(β, η), then the pdf of T is given by

f_T(t) = (β/η)(t/η)^{β−1} e^{−(t/η)^β}, t > 0,

and f_T(t) = 0 otherwise. Shorthand notation is T ∼ Weibull(β, η).
We call β the shape parameter and η the scale parameter. By changing the values of β and η, the Weibull pdf can assume many shapes. The Weibull distribution is very popular among engineers in reliability applications.

Note that when β = 1, the Weibull pdf reduces to the exponential(λ = 1/η) pdf.
MEAN/VARIANCE: If T ∼ Weibull(β, η), then

E(T) = η Γ(1 + 1/β)
var(T) = η² {Γ(1 + 2/β) − [Γ(1 + 1/β)]²}.
Figure 3.16: Weibull pdfs with (β = 2, η = 5), (β = 2, η = 10), and (β = 3, η = 10).
CDF: Suppose that T ∼ Weibull(β, η). Then, the cdf of T exists in closed form and is given by

F_T(t) = 0, for t ≤ 0
F_T(t) = 1 − e^{−(t/η)^β}, for t > 0.
Example 3.27. Suppose that T, the lifetime of a battery (measured in hours), follows a Weibull distribution with β = 2 and η = 10.

(a) What is the mean battery lifetime?

E(T) = 10 Γ(1 + 1/2) = 10 Γ(3/2) ≈ 8.862 hours.

(b) What is the probability that a battery is still functional at time t = 20?

P(T ≥ 20) = 1 − P(T < 20) = 1 − F_T(20) = 1 − [1 − e^{−(20/10)²}] = e^{−4} ≈ 0.018.
PAGE 55
0.0
0.00
0.2
0.02
0.4
0.04
f(y)
F(y)
0.6
0.06
0.8
0.08
1.0
0.10
CHAPTER 3
10
15
20
25
10
15
20
25
Figure 3.17: PDF (left) and CDF (right) of T Weibull( = 2, = 10) in Example 3.27.
(c) What is the probability that a battery is still functional at time t = 20, given that the battery is functional at time t = 10?

P(T ≥ 20 | T ≥ 10) = P(T ≥ 20 and T ≥ 10) / P(T ≥ 10) = P(T ≥ 20) / P(T ≥ 10)
                   = [1 − F_T(20)] / [1 − F_T(10)] = e^{−(20/10)²} / e^{−(10/10)²} ≈ 0.050.

(d) What is the 0.99 quantile of this battery lifetime distribution? We set

F_T(φ_{0.99}) = 1 − e^{−(φ_{0.99}/10)²} = 0.99.

Solving for φ_{0.99} gives φ_{0.99} ≈ 21.460 hours. Only one percent of the battery lifetimes will exceed this value.
WEIBULL R CODE: Suppose that T ∼ Weibull(β, η).

F_T(t) = P(T ≤ t): pweibull(t,β,η)
φ_p: qweibull(p,β,η)
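For instance, Example 3.27 can be verified in R (gamma() is R's Γ function):

10 * gamma(1 + 1/2)                      # (a) E(T) ~ 8.862 hours
1 - pweibull(20, shape = 2, scale = 10)  # (b) P(T >= 20) ~ 0.018
qweibull(0.99, shape = 2, scale = 10)    # 0.99 quantile ~ 21.460 hours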
3.6.2 Reliability functions
DESCRIPTION: We now describe some different, but equivalent, ways of defining the distribution of a (continuous) lifetime random variable T.

The cumulative distribution function (cdf):

F_T(t) = P(T ≤ t).

This can be interpreted as the proportion of units that have failed by time t.

The survivor function:

S_T(t) = P(T > t) = 1 − F_T(t).

This can be interpreted as the proportion of units that have not failed by time t; e.g., the unit is still functioning, a warranty claim has not been made, etc.

The probability density function (pdf):

f_T(t) = (d/dt) F_T(t) = −(d/dt) S_T(t).

Also, recall that

F_T(t) = ∫_{0}^{t} f_T(u) du and S_T(t) = ∫_{t}^{∞} f_T(u) du.
The hazard function:

h_T(t) = lim_{ε→0} P(t ≤ T < t + ε | T ≥ t)/ε
       = [1/P(T ≥ t)] lim_{ε→0} [F_T(t + ε) − F_T(t)]/ε
       = f_T(t)/S_T(t).

The hazard function can be interpreted as the instantaneous rate of failure at time t, given that the unit has survived to time t. We can therefore describe the distribution of the continuous lifetime random variable T by using either f_T(t), F_T(t), S_T(t), or h_T(t).
Example 3.28. In this example, we find the hazard function for T ∼ Weibull(β, η). Recall that the pdf of T is

f_T(t) = (β/η)(t/η)^{β−1} e^{−(t/η)^β}, t > 0,

and f_T(t) = 0 otherwise. The cdf of T is

F_T(t) = 0, for t ≤ 0
F_T(t) = 1 − e^{−(t/η)^β}, for t > 0.

The survivor function of T is therefore

S_T(t) = 1, for t ≤ 0
S_T(t) = e^{−(t/η)^β}, for t > 0.
Figure 3.18: Weibull hazard functions with η = 1. Upper left: β = 3. Upper right: β = 1.5. Lower left: β = 1. Lower right: β = 0.5.
For t > 0, the hazard function is

h_T(t) = f_T(t)/S_T(t) = (β/η)(t/η)^{β−1} e^{−(t/η)^β} / e^{−(t/η)^β} = (β/η)(t/η)^{β−1}.

Plots of Weibull hazard functions are given in Figure 3.18. It is easy to show that

h_T(t) is increasing if β > 1 (wear-out; the population of units gets weaker with aging)
h_T(t) is constant if β = 1 (constant hazard; exponential distribution)
h_T(t) is decreasing if β < 1 (infant mortality; the population of units gets stronger with aging).
Example 3.29. We consider the data in Example 3.23 of Vining and Kowalski (pp 162). The data are times, denoted by T (measured in months), to the first failure for 20 electric carts used for internal delivery and transportation in a large manufacturing facility.

0.9   1.5   2.3   3.2   3.9   5.0   6.2   7.5   8.3   10.4
16.3  19.3  22.6  24.8  31.5  38.1  53.0

Fitting a Weibull model to these data yields the estimates β̂ ≈ 1.110 and η̂ ≈ 15.271 (see Figure 3.19).
(b) Use the estimated distribution to find the 90th percentile of the cart lifetimes. We set

F_T(φ_{0.90}) = 1 − e^{−(φ_{0.90}/15.271)^{1.110}} = 0.90.

Solving for φ_{0.90} gives φ_{0.90} ≈ 32.373 months. Only ten percent of the cart lifetimes will exceed this value.
Figure 3.19: Weibull functions with β̂ ≈ 1.110 and η̂ ≈ 15.271. Upper left: PDF. Upper right: CDF. Lower left: Survivor function. Lower right: Hazard function.
Figure 3.20: Weibull qq plot for the electric cart data in Example 3.29. The observed data are plotted versus the theoretical quantiles from a Weibull distribution with β̂ ≈ 1.110 and η̂ ≈ 15.271.
On the horizontal axis, we plot the (ordered) theoretical quantiles from the distribution (model) assumed for the observed data.

Our intuition should suggest the following:

If the observed data agree with the distribution's theoretical quantiles, then the qq plot should look like a straight line (the distribution is a good choice).

If the observed data do not agree with the theoretical quantiles, then the qq plot should have curvature in it (the distribution is not a good choice).

Interpretation: The Weibull qq plot in Figure 3.20 looks like a straight line. This suggests that a Weibull distribution is a good fit for the electric cart lifetime data.
CHAPTER 4: Statistical Inference
Complementary reading: Chapter 4 (VK); Sections 4.1-4.8. Read also Sections 3.6-3.7.
4.1 Introduction
OVERVIEW : This chapter is about statistical inference. This deals with making
(probabilistic) statements about a population of individuals based on information that
is contained in a sample taken from the population.
Example 4.1. Suppose that we wish to study the performance of lithium batteries used in a certain calculator. The purpose of our study is to determine the mean lifetime of these batteries so that we can place a limited warranty on them in the future. Since this type of battery has not been used in this calculator before, no one (except the Oracle) can tell us the distribution of Y, the battery's lifetime. In fact, not only is the distribution not known, but all parameters which index this distribution aren't known either.
TERMINOLOGY : A population refers to the entire group of individuals (e.g., parts,
people, batteries, etc.) about which we would like to make a statement (e.g., proportion
defective, median weight, mean lifetime, etc.).
It is generally accepted that the entire population cannot be measured. It is too large and/or it would be too time-consuming to do so.
To draw inferences (make probabilistic statements) about a population, we therefore
observe a sample of individuals from the population.
We will assume that the sample of individuals constitutes a random sample.
Mathematically, this means that all observations are independent and follow the
same probability distribution. Informally, this means that each sample (of the same
size) has the same chance of being selected. Our hope is that a random sample of
individuals is representative of the entire population of individuals.
The data below are a random sample of n = 50 battery lifetimes (measured in hours):

2066  2584  1009   318  1429   981  1402  1137   414   564
 604    14  4152   737   852  1560  1786   520   396  1278
 209   349   478  3032  1461   701  1406   261    83   205
 602  3770   726  3894  2662   497    35  2778  1379  3920
1379    99   510   582   308  3367    99   373   454
In Figure 4.1, we display a histogram of the battery lifetime data. We see that the (empirical) distribution of the battery lifetimes is skewed towards the high side.

Which continuous probability distribution seems to display the same type of pattern that we see in the histogram? An exponential(λ) model seems reasonable here (based on the histogram shape). What is λ?

In this example, λ is called a (population) parameter. It describes the theoretical distribution which is used to model the entire population of battery lifetimes. In general, (population) parameters which index probability distributions (like the exponential) are unknown. All of the probability distributions that we discussed in Chapter 3 are meant to describe (model) population behavior.
Figure 4.1: Histogram of the n = 50 battery lifetime data (measured in hours).
4.2 Parameters and statistics
TERMINOLOGY: Suppose that Y1, Y2, ..., Yn is a random sample. The sample mean is

Ȳ = (1/n) Σ_{i=1}^{n} Yi.

The sample variance is

S² = [1/(n − 1)] Σ_{i=1}^{n} (Yi − Ȳ)².

The sample standard deviation is the positive square root of the sample variance; i.e.,

S = √S² = √{[1/(n − 1)] Σ_{i=1}^{n} (Yi − Ȳ)²}.

Important: Unlike their population analogues, these quantities can be computed from a sample of data Y1, Y2, ..., Yn.
TERMINOLOGY: A statistic is a numerical quantity that can be calculated from a sample of data. Some very common examples are:

Ȳ = sample mean
S² = sample variance
p̂ = sample proportion.

For example, with the battery lifetime data (a random sample of n = 50 lifetimes),

ȳ = 1274.14 hours
s² = 1505156 (hours)²
s ≈ 1226.85 hours.
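These statistics are computed directly in R; a minimal sketch, assuming the 50 lifetimes are stored in a vector called lifetimes (the name is ours, not from the text):

mean(lifetimes)  # sample mean ybar = 1274.14
var(lifetimes)   # sample variance s^2 = 1505156
sd(lifetimes)    # sample standard deviation s ~ 1226.85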
Group of individuals    Numerical quantity    Status
Population              Parameter             Unknown
Sample (Observed)       Statistic             Calculated from data
4.3 Point estimators and sampling distributions
NOTATION: To keep our discussion as general as possible (as the material in this subsection can be applied to many situations), we will let θ denote a population parameter. For example, θ could denote a population mean, a population variance, a population proportion, a Weibull or gamma model parameter, etc. It could also denote a parameter in a regression context (Chapters 6-7).
TERMINOLOGY: A point estimator θ̂ is a statistic that is used to estimate a population parameter θ. Common examples of point estimators are:

Ȳ, a point estimator for the population mean μ
S², a point estimator for the population variance σ²
S, a point estimator for the population standard deviation σ.
GOAL: Not only do we desire to use point estimators θ̂ which are unbiased, but we would also like for them to have small variability. In other words, when θ̂ misses θ, we would like for it to not miss by much. This deals with precision.

MAIN POINT: Accuracy and precision are the two main mathematical characteristics that arise when evaluating the quality of a point estimator θ̂. We desire point estimators θ̂ which are unbiased (perfectly accurate) and have small variance (highly precise).

TERMINOLOGY: The standard error of a point estimator θ̂ is equal to

se(θ̂) = √var(θ̂).

In other words, the standard error is equal to the standard deviation of the sampling distribution of θ̂. An estimator's standard error measures the amount of variability in the point estimator θ̂. Therefore,

smaller se(θ̂) ⟺ θ̂ more precise.
4.4 The sampling distribution of Ȳ

For a random sample Y1, Y2, ..., Yn from a population with mean μ and variance σ², the standard error of the sample mean Ȳ is

se(Ȳ) = √var(Ȳ) = √(σ²/n) = σ/√n.
Figure 4.2: Population distribution of in-traffic driver reaction times, along with the sampling distributions of Ȳ when n = 5 and n = 25.
This distribution describes the values of Ȳ we would expect to see in repeated sampling, that is, if we repeatedly sampled n = 5 individuals from this population of in-traffic drivers and calculated the sample mean Ȳ each time.
Question. Suppose that we take a random sample of n = 25 drivers with times Y1, Y2, ..., Y25. What is the distribution of the sample mean Ȳ?

Solution. If the sample size is n = 25, then with μ = 1.5 and σ² = 0.16, we have

Ȳ ∼ N(μ, σ²/n) ⟹ Ȳ ∼ N(1.5, 0.0064).

The sampling distributions of Ȳ when n = 5 and when n = 25 are shown in Figure 4.2.
4.4.1 Central Limit Theorem

RESULT: Suppose that Y1, Y2, ..., Yn is a random sample from a population with mean μ and variance σ². When the sample size n is large,

Ȳ ∼ AN(μ, σ²/n).

The symbol AN is read "approximately normal." This result is called the Central Limit Theorem (CLT).

Result 1 guarantees that when the underlying population distribution is N(μ, σ²), the sample mean

Ȳ ∼ N(μ, σ²/n).

The Central Limit Theorem (Result 2) says that even if the population distribution is not normal (Gaussian), the sampling distribution of the sample mean Ȳ will be approximately normal (Gaussian) when the sample size is sufficiently large.
Example 4.3. The time to death for rats injected with a toxic substance, denoted by Y (measured in days), follows an exponential distribution with λ = 1/5. That is, Y ∼ exponential(λ = 1/5).
Figure 4.3: Rat death times. Population distribution: Y ∼ exponential(λ = 1/5). Also depicted are the sampling distributions of Ȳ when n = 5 and n = 25.
This is the population distribution; that is, this distribution describes the time to death for all rats in the population.

In Figure 4.3, I have shown the exponential(1/5) population distribution (solid curve). I have also depicted the theoretical sampling distributions of Ȳ when n = 5 and when n = 25.

Main point: Notice how the sampling distribution of Ȳ begins to (albeit distantly) resemble a normal distribution when n = 5. When n = 25, the sampling distribution of Ȳ looks very much to be normal (Gaussian). This is precisely what is conferred by the CLT. The larger the sample size n, the better a normal (Gaussian) distribution approximates the true sampling distribution of Ȳ.
Example 4.4. When a batch of a certain chemical product is prepared, the amount of a particular impurity in the batch (measured in grams) is a random variable Y with the following population parameters:

μ = 4.0 g
σ² = (1.5 g)².

Suppose that n = 50 batches are prepared (independently). What is the probability that the sample mean impurity amount Ȳ is greater than 4.2 grams?

Solution. With n = 50, μ = 4, and σ² = (1.5)², the CLT says that

Ȳ ∼ AN(μ, σ²/n) ⟹ Ȳ ∼ AN(4, 0.045).

Therefore,

P(Ȳ > 4.2) = 1 − P(Ȳ < 4.2)
           = 1-pnorm(4.2,4,sqrt(0.045)) = 0.1728893.

Important: Note that in making this (approximate) probability calculation, we never made an assumption about the underlying population distribution shape.
4.4.2 t distribution

RESULT: Suppose that Y1, Y2, ..., Yn is a random sample from a N(μ, σ²) population. Then

Z = (Ȳ − μ)/(σ/√n) ∼ N(0, 1).

Replacing the population standard deviation σ with the sample standard deviation S, we get a new sampling distribution:

t = (Ȳ − μ)/(S/√n) ∼ t(n − 1),

a t distribution with n − 1 degrees of freedom.
Figure 4.4: Probability density functions of N(0, 1), t(2), and t(10).
t DISTRIBUTION R CODE: Suppose that T ∼ t(ν).

F_T(t) = P(T ≤ t): pt(t,ν)
φ_p: qt(p,ν)
Example 4.5. Hollow pipes are to be used in an electrical wiring project. In testing 1-inch pipes, the data below were collected by a design engineer. The data are measurements of Y, the outside diameter of this type of pipe (measured in inches). These n = 25 pipes were randomly selected and measured, all in the same location.

1.296  1.320  1.311  1.298  1.315
1.305  1.278  1.294  1.311  1.290
1.284  1.287  1.289  1.292  1.301
1.298  1.287  1.302  1.304  1.301
1.313  1.315  1.306  1.289  1.291
From their extensive experience, the manufacturers of this pipe claim that the population distribution is normal (Gaussian) and that the mean outside diameter is μ = 1.29 inches. Under this assumption (which may or may not be true), calculate the value of

t = (ȳ − μ)/(s/√n).

Solution. We use R to find the sample mean ȳ and the sample standard deviation s:

t = (1.29908 − 1.29)/(0.01108272/√25) ≈ 4.096.
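A minimal R sketch of this computation, assuming the 25 diameters are stored in a vector called pipes (the name is ours, not from the text):

ybar <- mean(pipes)              # 1.29908
s <- sd(pipes)                   # 0.01108272
(ybar - 1.29) / (s / sqrt(25))   # t ~ 4.096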
Figure 4.5: t(24) probability density function. A mark at t = 4.096 has been added.
Analysis. If the manufacturer's claim is true (that is, if μ = 1.29 inches), then

t = (ȳ − μ)/(s/√n)

comes from a t(24) distribution. The t(24) pdf is displayed above in Figure 4.5.

Key question: Does t = 4.096 seem like a value you would expect to see from this distribution? If not, what might this suggest? Recall that t was computed under the assumption that μ = 1.29 inches (the manufacturer's claim).

Question. The value t = 4.096 is what percentile of the t(24) distribution?

> pt(4.096,24)
[1] 0.9997934

Answer: t = 4.096 is approximately the 99.98th percentile of the t(24) distribution.
4.4.3 Robustness of the t distribution result

RECALL: If Y1, Y2, ..., Yn is a random sample from a N(μ, σ²) population, then

t = (Ȳ − μ)/(S/√n) ∼ t(n − 1).
An obvious question therefore arises: What if Y1, Y2, ..., Yn are non-normal (i.e., non-Gaussian)?

Answer: The t distribution result still approximately holds, even if the underlying population distribution is not perfectly normal. The approximation improves when

1. the sample size is larger
2. the population distribution is more symmetric (not highly skewed).

Because the normality assumption (for the population distribution) is not absolutely critical for this t sampling distribution result to hold, we say that Result 3 is robust to the normality assumption.

REMARK: Robustness is a nice property; it assures us that the underlying assumption of normality is not an absolute requirement for the t distribution result to hold. Other sampling distribution results (coming up) are not always robust to normality departures.
TERMINOLOGY: Just as we used Weibull qq plots to assess the Weibull model assumption in the last chapter, we can use a normal quantile-quantile (qq) plot to assess the normal distribution assumption. The plot is constructed as follows:

On the vertical axis, we plot the observed data, ordered from low to high.
On the horizontal axis, we plot the (ordered) theoretical quantiles from the distribution (model) assumed for the observed data (here, normal).

ILLUSTRATION: Figure 4.6 shows the normal qq plot for the pipe diameter data in Example 4.5. The ordered data do not match up perfectly with the normal quantiles, but the plot doesn't set off any serious alarms (insofar as a departure from normality is concerned).
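In R, a plot like Figure 4.6 is produced with qqnorm and qqline (using the same hypothetical vector name as above):

qqnorm(pipes)  # normal qq plot of the pipe diameters
qqline(pipes)  # adds a line through the first and third quartiles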
PAGE 77
1.30
1.28
1.29
Sample Quantiles
1.31
1.32
CHAPTER 4
Theoretical Quantiles
Figure 4.6: Normal qq plot for the pipe diameter data in Example 4.5. The observed
data are plotted versus the theoretical quantiles from a normal distribution. The line
added passes through the rst and third theoretical quartiles.
4.5 Confidence intervals for a population mean μ

4.5.1 Known population variance σ²

SETTING: To get things started, we will assume that Y1, Y2, ..., Yn is a random sample from a N(μ, σ²) population distribution. We will assume that

1. the population variance σ² is known (largely unrealistic), and
2. the goal is to estimate the population mean μ.
Under these assumptions,

Ȳ ∼ N(μ, σ²/n)
Figure 4.7: N(0, 1) pdf. The upper 0.025 and lower 0.025 areas have been shaded. The associated quantiles are z_{0.025} ≈ 1.96 and −z_{0.025} ≈ −1.96, respectively.
and therefore

Z = (Ȳ − μ)/(σ/√n) ∼ N(0, 1).
We now introduce new notation that identifies quantiles from this distribution.

NOTATION: Let z_{α/2} denote the upper α/2 quantile from the N(0, 1) distribution. Because the N(0, 1) distribution is symmetric about z = 0, we know that −z_{α/2} is the lower α/2 quantile. For example, if α = 0.05 (see Figure 4.7), we know that

z_{0.05/2} = z_{0.025} ≈ 1.96
−z_{0.05/2} = −z_{0.025} ≈ −1.96.

To see where these values come from, you can use Table 1 (VK, pp 593-594) or R.
RESULT: The interval

(Ȳ − z_{α/2} σ/√n, Ȳ + z_{α/2} σ/√n)

is a 100(1 − α) percent confidence interval for the population mean μ. This is sometimes written (more succinctly) as

Ȳ ± z_{α/2} σ/√n.

Note the form of the interval:

point estimate ± quantile × standard error.

Many confidence intervals we will study follow this same general form.

Here is how we interpret this interval: We say, "We are 100(1 − α) percent confident that the population mean μ is in this interval."
Unfortunately, the word "confident" does not mean "probability." The term "confidence" in confidence interval means that if we were able to sample from the population over and over again, each time computing a 100(1 − α) percent confidence interval for μ, then 100(1 − α) percent of the intervals we would compute would contain the population mean μ.

That is, "confidence" refers to long-term behavior of many intervals, not probability for a single interval. Because of this, we call 100(1 − α) the confidence level. Typical confidence levels are

90 percent (α = 0.10) ⟹ z_{0.05} ≈ 1.645
95 percent (α = 0.05) ⟹ z_{0.025} ≈ 1.96
99 percent (α = 0.01) ⟹ z_{0.005} ≈ 2.576.
NOTE: The length of the interval

Ȳ ± z_{α/2} σ/√n

is equal to 2z_{α/2} σ/√n. Therefore,

the larger the sample size n, the smaller the interval length;
the larger the population variance σ², the larger the interval length;
the larger the confidence level 100(1 − α), the larger the interval length.

Clearly, shorter confidence intervals are preferred. They are more informative!
Example 4.6. Civil engineers have found that the ability to see and read a sign at night depends in part on its surround luminance, i.e., the light intensity near the sign. The data below are n = 30 measurements of the random variable Y, the surround luminance (in candela per m²). The 30 measurements constitute a random sample from all signs in a large metropolitan area.
10.9   1.7   9.5   2.9   9.1   3.2
 9.1   7.4  13.3  13.1   6.6  13.7
 1.5   6.3   7.4   9.9  13.6  17.3
 3.6   4.9  13.1   7.8  10.3  10.3
 9.6   5.7   2.6  15.1   2.9  16.2
Based on past experience, the engineers assume a normal population distribution (for the population of all signs) with known population variance σ² = 20.

Question. Find a 90 percent confidence interval for μ, the mean surround luminance.

Solution. We first use R to calculate the sample mean ȳ:

> mean(intensity) ## sample mean
[1] 8.62

For a 90 percent confidence level, i.e., with α = 0.10, we use

z_{0.10/2} = z_{0.05} ≈ 1.645.

This can be determined from Table 1 (VK, pp 593-594) or from R:

> qnorm(0.95,0,1) ## z_{0.05} ## upper 0.05 quantile
[1] 1.644854

With n = 30 and σ² = 20, a 90 percent confidence interval for the mean surround luminance μ is

ȳ ± z_{α/2} σ/√n = 8.62 ± 1.645 √(20/30) ⟹ (7.28, 9.96).

Interpretation: We are 90 percent confident that the mean surround luminance for all signs in the population is between 7.28 and 9.96 candela/m².
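The interval itself is a two-line computation in R; a sketch:

ybar <- 8.62
me <- qnorm(0.95) * sqrt(20 / 30)  # margin of error: z_{0.05} * sigma / sqrt(n)
c(ybar - me, ybar + me)            # (7.28, 9.96)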
Further analysis: Recall that the engineers claimed that the population of all light
intensities is described by a normal (Gaussian) distribution. In Figure 4.8, we display a
normal qq plot to check this assumption. The qq plot might cause some concern about
the normality assumption. There is some mild evidence of disagreement with the normal
model (although with a small sample, it is always hard to be sure).
Figure 4.8: Normal qq plot for the light intensity data in Example 4.6. The observed data are plotted versus the theoretical quantiles from a normal distribution. The line added passes through the first and third theoretical quartiles.
Is this a serious cause for concern? Probably not. Recall that even if the population distribution (here, the distribution of all light intensity measurements in the city) is not perfectly normal, we still have

Ȳ ∼ AN(μ, σ²/n),

for n large, by the Central Limit Theorem. Therefore, our 90 percent confidence interval is still approximately valid. A sample of size n = 30 is pretty large. In other words, at n = 30, the CLT approximation above is usually kicking in rather well unless the underlying population distribution is grossly skewed (and I mean very grossly). This is not the case here.
PAGE 84
4.5.2 Unknown population variance

RECALL: When the population distribution is N(μ, σ²) and σ² is known, the result

$$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$$

led to the 100(1 − α) percent confidence interval

$$\bar{Y} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}.$$

The obvious problem is that, because σ² is now unknown, we cannot calculate the interval. Not to worry; we just need a different starting point. Recall that

$$t = \frac{\bar{Y} - \mu}{S/\sqrt{n}} \sim t(n-1),$$

where S is the sample standard deviation (a point estimator for the population standard deviation). This result is all we need; in fact, it is straightforward to reproduce the known-σ² derivation and tailor it to this (now more realistic) case. A 100(1 − α) percent confidence interval for μ is given by

$$\bar{Y} \pm t_{n-1,\alpha/2}\,\frac{S}{\sqrt{n}}.$$

The symbol t_{n−1,α/2} denotes the upper α/2 quantile from a t distribution with ν = n − 1 degrees of freedom. This value can be easily obtained from R. For those of you that like probability tables, VK tables the t distributions in Table 2 (pp 595).
Note the form of the interval again: point estimate Ȳ, quantile t_{n−1,α/2}, and standard error S/√n.
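For instance, a minimal R sketch of how to get this quantile (here for the 99 percent interval in Example 4.7 below, where n = 35, so ν = 34 and α/2 = 0.005):

> qt(0.995,34)   ## t_{34,0.005} ## upper 0.005 quantile; approximately 2.728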
Example 4.7. Acute exposure to cadmium produces respiratory distress and kidney and liver damage (and possibly death). For this reason, the level of airborne cadmium dust and cadmium oxide fume in the air, denoted by Y (measured in milligrams of cadmium per m³ of air), is closely monitored. A random sample of n = 35 measurements from a large factory are given below:
0.044  0.030  0.052  0.044  0.046  0.020  0.066
0.052  0.049  0.030  0.040  0.045  0.039  0.039
0.039  0.057  0.050  0.056  0.061  0.042  0.055
0.037  0.062  0.062  0.070  0.061  0.061  0.058
0.053  0.060  0.047  0.051  0.054  0.042  0.051
Based on past experience, engineers assume a normal population distribution (for the
population of all cadmium measurements).
Question. Find a 99 percent confidence interval for μ, the mean level of airborne cadmium.
Solution. We first use R to calculate the sample mean ȳ and the sample standard deviation s:
> mean(cadmium) ## sample mean
[1] 0.04928571
> sd(cadmium) ## sample standard deviation
[1] 0.0110894
With n = 35 and t_{34,0.005} ≈ 2.728, a 99 percent confidence interval for μ is

$$\bar{y} \pm t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}} = 0.049 \pm 2.728\left(\frac{0.011}{\sqrt{35}}\right) = (0.044, 0.054)\ \text{mg/m}^3.$$

Interpretation: We are 99 percent confident that the population mean level of airborne cadmium is between 0.044 and 0.054 mg/m³.
NOTE : It is possible to implement the t interval procedure entirely in R:
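A minimal sketch of that one-line implementation (assuming the measurements are stored in the vector cadmium, as in the output above):

> t.test(cadmium,conf.level=0.99)$conf.int   ## 99 percent t interval; approximately (0.044, 0.054)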
Further analysis: Recall that the engineers claimed that the population of all airborne
cadmium concentrations is described by a normal (Gaussian) distribution. In Figure
4.9, we display the normal qq plot to check this assumption. The qq plot looks pretty
supportive of the normal assumption, although there is mild evidence of a slight departure
in the upper tail.
ROBUSTNESS: The t confidence interval is based on the population distribution being normal (Gaussian). However, this interval is robust to departures from normality; i.e., even if the population distribution is non-normal (non-Gaussian), we can still use the t confidence interval and get approximately valid results.
[Figure 4.9: Normal qq plot for the cadmium data in Example 4.7. The observed data are plotted versus the theoretical quantiles from a normal distribution. The line added passes through the first and third theoretical quartiles.]
4.5.3 Sample size determination

SETTING: In the known-σ² case, the 100(1 − α) percent confidence interval for μ is

$$\bar{Y} \pm \underbrace{z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}}_{=\,B,\ \text{say}}.$$

The quantity B is called the margin of error. If we specify the confidence level 100(1 − α) and the margin of error B in advance, then solving

$$B = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

for n gives the required sample size

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{B}\right)^2.$$

For example, with 95 percent confidence (z_{0.025} ≈ 1.96), σ = 8, and a prescribed margin of error B = 2,

$$n = \left(\frac{1.96 \times 8}{2}\right)^2 \approx 61.46,$$

which we round up to n = 62. The quantile z_{0.025} comes from R:

> qnorm(0.975,0,1)
[1] 1.959964
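A minimal R sketch of this calculation (the object names sigma and B are mine, not from the text):

> sigma = 8; B = 2
> n = (qnorm(0.975)*sigma/B)^2   ## approximately 61.46
> ceiling(n)                     ## round up to 62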
4.6 Confidence interval for a population proportion p

SITUATION: We now switch gears and focus on a new parameter: the population proportion p. This parameter emerges when the characteristic we measure on each individual is binary (i.e., only 2 outcomes possible). Here are some examples:

• p = proportion of defective circuit boards
• p = proportion of customers who are satisfied
• p = proportion of payments received on time
• p = proportion of HIV positives in SC.
To start our discussion, we need to recall the Bernoulli trial assumptions for each
individual in the sample:
1. each individual results in a success or a failure,
2. the individuals are independent, and
3. the probability of success, denoted by p, 0 < p < 1, is the same for every
individual.
In our examples above,

• success ≡ circuit board defective
• success ≡ customer satisfied
• success ≡ payment received on time
• success ≡ HIV positive individual.
RECALL: If the individual success/failure statuses in the sample adhere to the Bernoulli
trial assumptions, then
Y = the number of successes out of n sampled individuals
follows a binomial distribution, that is, Y ∼ b(n, p). The statistical problem at hand is to use the information in Y to estimate p.
POINT ESTIMATOR: A natural point estimator for p is

$$\hat{p} = \frac{Y}{n},$$

the sample proportion. This statistic is simply the proportion of successes in the sample (out of n individuals).
PROPERTIES: Fairly simple arguments can be used to show the following results:

$$E(\hat{p}) = p \qquad \text{and} \qquad se(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}.$$

The first result says that the sample proportion p̂ is an unbiased estimator of the population proportion p. The second (standard error) result quantifies the precision of p̂ as an estimator of p.
SAMPLING DISTRIBUTION: Knowing the sampling distribution of p̂ is critical if we are going to formalize statistical inference procedures for p. In this situation, we appeal to an approximate result (conferred by the CLT) which says that

$$\hat{p} \sim \mathcal{AN}\left(p, \frac{p(1-p)}{n}\right),$$

when the sample size n is large.
RESULT: An approximate 100(1 − α) percent confidence interval for p is given by

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$

This interval should be used only when the sample size n is large. A common rule of thumb (to use this interval formula) is to require

$$n\hat{p} \geq 5 \qquad \text{and} \qquad n(1-\hat{p}) \geq 5.$$

Under these conditions, the CLT should adequately approximate the true sampling distribution of p̂, thereby making the confidence interval formula above approximately valid.
Note the form again: point estimate p̂, quantile z_{α/2}, and standard error √(p̂(1 − p̂)/n).
Example 4.9. One source of water pollution is gasoline leakage from underground
storage tanks. In Pennsylvania, a random sample of n = 74 gasoline stations is selected
and the tanks are inspected; 10 stations are found to have at least one leaking tank.
Question. Calculate a 95 percent confidence interval for p, the population proportion of gasoline stations with at least one leaking tank.
Solution. In this situation, we interpret
gasoline station = individual trial
at least one leaking tank = success
p = population proportion of stations with at least one leaking tank.
For 95 percent confidence, we need z_{0.05/2} = z_{0.025} ≈ 1.96. The sample proportion of stations with at least one leaking tank is

$$\hat{p} = \frac{10}{74} \approx 0.135.$$

An approximate 95 percent confidence interval for p is

$$0.135 \pm 1.96\sqrt{\frac{0.135(1-0.135)}{74}} = (0.057, 0.213).$$
Interpretation: We are 95 percent confident that the population proportion of stations in Pennsylvania with at least one leaking tank is between 0.057 and 0.213.
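A minimal R sketch of this interval (the object names are mine):

> phat = 10/74
> me = qnorm(0.975)*sqrt(phat*(1-phat)/74)
> phat + c(-1,1)*me   ## approximately (0.057, 0.213)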
CLT approximation check: We have

$$n\hat{p} = 74\left(\frac{10}{74}\right) = 10 \qquad \text{and} \qquad n(1-\hat{p}) = 74\left(1 - \frac{10}{74}\right) = 64.$$

Both of these are larger than 5 ⟹ we can feel comfortable in using this interval formula.
QUESTION: Suppose that we would like to write a 100(1 − α) percent confidence interval for p, a population proportion. We know that

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

is an approximate 100(1 − α) percent confidence interval for p. What sample size n should we use?

SAMPLE SIZE DETERMINATION: To determine the necessary sample size, we first need to specify two pieces of information:

• the confidence level 100(1 − α)
• the margin of error:

$$B = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$

A small problem arises. Note that B depends on p̂. Unfortunately, p̂ can only be calculated once we know the sample size n. We overcome this problem by replacing p̂ with p₀, an a priori guess at its value. The last expression becomes

$$B = z_{\alpha/2}\sqrt{\frac{p_0(1-p_0)}{n}}.$$

Solving this equation for n, we get

$$n = \left(\frac{z_{\alpha/2}}{B}\right)^2 p_0(1-p_0).$$

This is the desired sample size n to find a 100(1 − α) percent confidence interval for p with a prescribed margin of error (roughly) equal to B. I say "roughly," because there may be additional uncertainty arising from our use of p₀ (our best guess).
For example, with 95 percent confidence (z_{0.025} ≈ 1.96) and a prescribed margin of error B = 0.02, the formula n = (z_{α/2}/B)² p₀(1 − p₀) gives

$$n = \left(\frac{1.96}{0.02}\right)^2 0.05(1-0.05) \approx 457$$

$$n = \left(\frac{1.96}{0.02}\right)^2 0.10(1-0.10) \approx 865$$

$$n = \left(\frac{1.96}{0.02}\right)^2 0.50(1-0.50) = 2401.$$

As we can see, the guessed value of p₀ has a substantial impact on the final sample size calculation.
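A minimal R sketch of these three calculations at once (vectorizing over the guesses; names are mine):

> p0 = c(0.05,0.10,0.50)
> ceiling((qnorm(0.975)/0.02)^2*p0*(1-p0))   ## 457, 865, 2401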
4.7 Confidence interval for a population variance σ²

MOTIVATION: In many situations, one is concerned not with the mean of an underlying (continuous) population distribution, but with the variance σ² instead. If σ² is excessively large, this could point to a potential problem with a manufacturing process, for example, where there is too much variation in the measurements produced. In a laboratory setting, chemical engineers might wish to estimate the variance σ² attached to a measurement system (e.g., scale, caliper, etc.). In field trials, agronomists are often interested in comparing the variability levels for different cultivars or genetically-altered varieties. In clinical trials, physicians are often concerned if there are substantial differences in the variation levels of patient responses at different clinic sites.

NEW RESULT: Suppose that Y₁, Y₂, ..., Yₙ is a random sample from a N(μ, σ²) distribution. The quantity

$$Q = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1),$$

a chi-square distribution with n − 1 degrees of freedom. In R, the cdf and quantile functions of the χ²(ν) distribution are pchisq(q,ν) and qchisq(p,ν), respectively.

NOTE: Table 3 (VK, pp 596-597) catalogues some quantiles from some χ² distributions.
[Figure: χ² probability density functions with df = 5, 10, and 20.]
DERIVATION: Because Q ∼ χ²(n − 1), we can write

$$1-\alpha = P\left(\chi^2_{n-1,\alpha/2} < \frac{(n-1)S^2}{\sigma^2} < \chi^2_{n-1,1-\alpha/2}\right) = P\left(\frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}}\right).$$

This argument shows that

$$\left(\frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}},\ \ \frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}}\right)$$

is a 100(1 − α) percent confidence interval for the population variance σ². Here, χ²_{n−1,α/2} and χ²_{n−1,1−α/2} denote the lower and upper α/2 quantiles, respectively, of the χ²(n − 1) distribution. A 100(1 − α) percent confidence interval for the population standard deviation σ is obtained by taking square roots of both endpoints.
Example 4.11. In the design of a swimming pool facility, the following acoustic standards have been set for the time it takes a low-frequency sound to die out:

• the population mean time that it takes for a low-frequency sound to die out is μ = 1.3 seconds
• the population standard deviation for the distribution of die-out times is σ = 0.6 seconds.
Computer simulations of a preliminary design are conducted to see whether these standards are being met; here are data from n = 20 independently-run simulations. The data
are obtained on the time (in seconds) it takes for the low-frequency sound to die out.
1.34  2.56  1.28  2.25  1.84  2.35  0.77  1.84  1.80  2.44
0.86  1.29  0.12  1.87  0.71  2.08  0.71  0.30  0.54  1.48
Question. Find a 95 percent confidence interval for the population standard deviation of times σ. What does this interval suggest about whether the preliminary design conforms to specifications (with respect to variability)?

Solution. For 95 percent confidence (i.e., using α = 0.05), we need to use

χ²_{n−1,α/2} = χ²_{19,0.025} ≈ 8.907
χ²_{n−1,1−α/2} = χ²_{19,0.975} ≈ 32.852.
I got these quantiles from R:
> qchisq(0.025,19) ## chi^2_{19,0.025}
[1] 8.906516
> qchisq(0.975,19) ## chi^2_{19,0.975}
[1] 32.85233
We also need to calculate the sample variance s². From R, we get s² ≈ 0.555.
> var(sounds) ## sample variance
[1] 0.554666
[Figure 4.11: Normal qq plot for the swimming pool sound time data in Example 4.11. The observed data are plotted versus the theoretical quantiles from a normal distribution. The line added passes through the first and third theoretical quartiles.]

A 95 percent confidence interval for the population variance σ² is

$$\left(\frac{(n-1)s^2}{\chi^2_{19,0.975}},\ \frac{(n-1)s^2}{\chi^2_{19,0.025}}\right) = \left(\frac{19(0.555)}{32.852},\ \frac{19(0.555)}{8.907}\right) = (0.321, 1.184).$$

Taking square roots of both endpoints, a 95 percent confidence interval for σ is (0.567, 1.088) seconds.
Interpretation: This analysis suggests that the design specifications are being met (with respect to the population standard deviation level). Note that σ = 0.6 is contained in the confidence interval for σ.
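A minimal R sketch of the interval computation (assuming the vector sounds holds the 20 simulated times, as in the output above):

> ci.var = 19*var(sounds)/qchisq(c(0.975,0.025),19)   ## CI for sigma^2
> sqrt(ci.var)                                        ## CI for sigma; approximately (0.57, 1.09)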
Major warning: Unlike the z and t confidence intervals for a population mean μ, the χ² interval for σ² (and for σ) is not robust to departures from normality. If the underlying population distribution is non-normal (non-Gaussian), then the confidence interval formulas for σ² and σ are not to be used. Therefore, it is very important to check the normality assumption with these interval procedures (e.g., use a qq plot).

Analysis: With only n = 20 measurements, it is somewhat hard to tell, but the qq plot in Figure 4.11 looks fairly straight. Small sample sizes make interpreting qq plots more difficult (e.g., the analyst may look for patterns that are not really there).
4.8 Confidence intervals for the difference of two population means μ₁ − μ₂: Independent samples

TWO-SAMPLE PROBLEM: Suppose that we have two independent random samples:

Sample 1: Y₁₁, Y₁₂, ..., Y₁ₙ₁, a random sample from population 1
Sample 2: Y₂₁, Y₂₂, ..., Y₂ₙ₂, a random sample from population 2.

GOAL: Construct a 100(1 − α) percent confidence interval for the difference of population means μ₁ − μ₂.

POINT ESTIMATORS: We define the statistics

$$\bar{Y}_{1+} = \frac{1}{n_1}\sum_{j=1}^{n_1} Y_{1j} = \text{sample mean for sample 1}$$

$$\bar{Y}_{2+} = \frac{1}{n_2}\sum_{j=1}^{n_2} Y_{2j} = \text{sample mean for sample 2}$$

$$S_1^2 = \frac{1}{n_1-1}\sum_{j=1}^{n_1}(Y_{1j} - \bar{Y}_{1+})^2 = \text{sample variance for sample 1}$$

$$S_2^2 = \frac{1}{n_2-1}\sum_{j=1}^{n_2}(Y_{2j} - \bar{Y}_{2+})^2 = \text{sample variance for sample 2}.$$
4.8.1 Equal population variances

GOAL: We want to write a confidence interval for μ₁ − μ₂, but how this interval is constructed depends on the values of σ₁² and σ₂². In particular, we consider two cases:

• σ₁² = σ₂²; that is, the two population variances are equal
• σ₁² ≠ σ₂²; that is, the two population variances are not equal.

We first consider the equal variance case. Addressing this case requires us to start with the following (sampling) distribution result:

$$T = \frac{(\bar{Y}_{1+} - \bar{Y}_{2+}) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim t(n_1 + n_2 - 2),$$

where

$$S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1 + n_2 - 2}$$

is the pooled sample variance estimator of the common population variance.
[Figure: Density functions for two population distributions (Distribution 1 and Distribution 2).]
NOTE: The confidence interval quantiles will come from this t distribution; note that this distribution depends on the sample sizes from both samples.

In particular, because T ∼ t(n₁ + n₂ − 2), we can find the value t_{n₁+n₂−2,α/2} that satisfies

$$P(-t_{n_1+n_2-2,\alpha/2} < T < t_{n_1+n_2-2,\alpha/2}) = 1 - \alpha.$$

Substituting T into the last expression and performing algebraic manipulations, we obtain

$$(\bar{Y}_{1+} - \bar{Y}_{2+}) \pm t_{n_1+n_2-2,\alpha/2}\,\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$

a 100(1 − α) percent confidence interval for μ₁ − μ₂. Note the form again: point estimate Ȳ₁₊ − Ȳ₂₊, quantile t_{n₁+n₂−2,α/2}, and standard error √(S_p²(1/n₁ + 1/n₂)).
Example 4.12. In the vicinity of a nuclear power plant, environmental engineers from the EPA would like to determine if there is a difference between the mean weight in fish (of the same species) from two locations. Independent samples are taken from each location and the following weights (in ounces) are observed:
Location 1:  21.9  18.5  12.3  16.7  21.0  15.1  18.2  23.0
Location 2:  22.0  20.6  15.4  17.9  24.4  15.6  11.4  17.5  36.8  26.6

[Figure: Side-by-side boxplots of fish weights (in ounces) at the two locations.]
We use R to compute the 90 percent equal-variance interval

$$(\bar{y}_{1+} - \bar{y}_{2+}) \pm t_{n_1+n_2-2,\alpha/2}\,\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

directly:

> t.test(loc.1,loc.2,conf.level=0.90,var.equal=TRUE)$conf.int
[1] -1.940438  7.760438

Interpretation: We are 90 percent confident that the difference of the population mean weights μ₁ − μ₂ is between −1.94 and 7.76 ounces. Because this interval includes 0, we do not have sufficient evidence that the two population means are different.
4.8.2 Unequal population variances

RESULT: When σ₁² ≠ σ₂², an approximate 100(1 − α) percent confidence interval for μ₁ − μ₂ is given by

$$(\bar{Y}_{1+} - \bar{Y}_{2+}) \pm t_{\nu,\alpha/2}\,\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}},$$

where the degrees of freedom ν is calculated as

$$\nu = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{(S_1^2/n_1)^2}{n_1-1} + \frac{(S_2^2/n_2)^2}{n_2-1}}.$$
Example 4.13. You are part of a recycling project that is examining how much paper is
being discarded (not recycled) by employees at two large plants. These data are obtained
on the amount of white paper thrown out per year by employees (data are in hundreds
of pounds). Samples of employees at each plant were randomly selected.
Plant 1:  3.01  2.58  3.04  1.75  2.87  2.57  2.51  2.93  2.85  3.09
          1.43  3.36  3.18  2.74  2.25  1.95  3.68  2.29  1.86  2.63
          2.83  2.04  2.23  1.92  3.02

Plant 2:  3.79  2.08  3.66  1.53  4.07  4.31  2.62  4.52  3.80  5.30
          3.41  0.82  3.03  1.95  6.45  1.86  1.87  3.78  2.74  3.81
Question. Are there differences in the mean amounts of white paper discarded by employees at the two plants? Answer this question by finding a 95 percent confidence interval for the mean difference μ₁ − μ₂. Here, μ₁ (μ₂) denotes the population mean amount of white paper discarded per employee at Plant 1 (2).

Solution. In order to visually assess the equal variance assumption, we again use boxplots to display the data in each sample; see Figure 4.14.

[Figure 4.14: Boxplots of discarded white paper amounts (in 100s lb) in Example 4.13.]

For these data, the equal variance assumption would be highly suspect; the spread in the distribution of Plant 2 values is much larger than that of Plant 1. We again use R to calculate the (unequal variance) confidence interval for μ₁ − μ₂:
> t.test(plant.1,plant.2,conf.level=0.95,var.equal=FALSE)$conf.int
[1] -1.35825799 -0.01294201
Interpretation: We are 95 percent confident that the difference of the population mean amounts μ₁ − μ₂ is between −135.8 and −1.3 lb. This interval does not include 0. Therefore, we have evidence that the population mean amounts (μ₁ and μ₂) at the two plants are different.
REMARK: In this subsection, we have presented two confidence intervals for μ₁ − μ₂: one that assumes σ₁² = σ₂² (equal variance assumption) and one that assumes σ₁² ≠ σ₂² (unequal variance assumption). If you are unsure about which interval to use, go with the unequal variance interval. The penalty for using it when σ₁² = σ₂² is much smaller than the penalty for using the equal variance interval when σ₁² ≠ σ₂².
4.9 Confidence interval for the difference of two population proportions p₁ − p₂: Independent samples

NOTE: We also can extend our confidence interval procedure for a single population proportion p to two populations. Define

p₁ = population proportion of successes in Population 1
p₂ = population proportion of successes in Population 2.

For example, we might want to compare the proportion of

• defective circuit boards for two different suppliers
• satisfied customers before and after a product design change (e.g., Facebook, etc.)
• on-time payments for two classes of customers
• HIV positives for individuals in two demographic classes.

POINT ESTIMATORS: We assume that there are two independent random samples of individuals (one sample from each population to be compared). Define

Y₁ = number of successes in Sample 1 (out of n₁ individuals) ∼ b(n₁, p₁)
Y₂ = number of successes in Sample 2 (out of n₂ individuals) ∼ b(n₂, p₂).
The point estimators for p₁ and p₂ are the sample proportions, defined by

$$\hat{p}_1 = \frac{Y_1}{n_1} \qquad \text{and} \qquad \hat{p}_2 = \frac{Y_2}{n_2}.$$

GOAL: We would like to write a 100(1 − α) percent confidence interval for p₁ − p₂, the difference of two population proportions.

IMPORTANT: To accomplish this goal, we need the following distributional result. When the sample sizes n₁ and n₂ are large,

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}} \sim \mathcal{AN}(0, 1).$$

If this sampling distribution holds approximately, then

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

is an approximate 100(1 − α) percent confidence interval for p₁ − p₂. Note the form again: point estimate p̂₁ − p̂₂, quantile z_{α/2}, and standard error √(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂).
Example 4.14. A programmable lighting control system is being designed. The purpose of the system is to reduce electricity consumption costs in buildings. The system eventually will entail the use of a large number of transceivers (a device comprised of both a transmitter and a receiver). Two types of transceivers are being considered. In life testing, 200 transceivers (randomly selected) were tested for each type:

Transceiver 1: 20 failures out of n₁ = 200 tested
Transceiver 2: 14 failures out of n₂ = 200 tested.

The sample proportions of failures are

$$\hat{p}_1 = \frac{20}{200} = 0.10 \qquad \text{and} \qquad \hat{p}_2 = \frac{14}{200} = 0.07.$$

An approximate 95 percent confidence interval for p₁ − p₂ is

$$(0.10 - 0.07) \pm 1.96\sqrt{\frac{0.10(0.90)}{200} + \frac{0.07(0.93)}{200}} = (-0.025, 0.085).$$
Interpretation: We are 95 percent confident that the difference of the population failure rates for the two transceivers is between −0.025 and 0.085. Because this interval includes 0, we do not have sufficient evidence that the two failure rates p₁ and p₂ are different.
CLT approximation check: We have

$$n_1\hat{p}_1 = 200\left(\frac{20}{200}\right) = 20 \qquad n_1(1-\hat{p}_1) = 200\left(1 - \frac{20}{200}\right) = 180$$

$$n_2\hat{p}_2 = 200\left(\frac{14}{200}\right) = 14 \qquad n_2(1-\hat{p}_2) = 200\left(1 - \frac{14}{200}\right) = 186.$$

All of these quantities are larger than 5 ⟹ we can feel comfortable in using this interval formula.
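A minimal R sketch of this two-proportion interval (the object names are mine):

> p1 = 20/200; p2 = 14/200
> se = sqrt(p1*(1-p1)/200 + p2*(1-p2)/200)
> (p1-p2) + c(-1,1)*qnorm(0.975)*se   ## approximately (-0.025, 0.085)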
4.10 Confidence interval for the ratio of two population variances σ₂²/σ₁²: Independent samples

IMPORTANCE: You will recall that when we wrote a confidence interval for μ₁ − μ₂, the difference of the population means (with independent samples), we proposed two different intervals:

• one interval that assumed σ₁² = σ₂²
• one interval that assumed σ₁² ≠ σ₂².

We now propose a confidence interval procedure that can be used to determine which assumption is more appropriate. This confidence interval is used to compare the population variances in two independent samples.
TWO-SAMPLE PROBLEM: Suppose that we have two independent samples:

Sample 1: Y₁₁, Y₁₂, ..., Y₁ₙ₁, a random sample from a N(μ₁, σ₁²) population
Sample 2: Y₂₁, Y₂₂, ..., Y₂ₙ₂, a random sample from a N(μ₂, σ₂²) population.

GOAL: Construct a 100(1 − α) percent confidence interval for the ratio of population variances σ₂²/σ₁².
RESULT: Under these assumptions, the quantity

$$Q = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1-1, n_2-1),$$

an F distribution with n₁ − 1 (numerator) and n₂ − 1 (denominator) degrees of freedom. In R, the cdf and quantile functions of the F(ν₁, ν₂) distribution are pf(q,ν₁,ν₂) and qf(p,ν₁,ν₂), respectively.

NOTATION: Let F_{n₁−1,n₂−1,α/2} and F_{n₁−1,n₂−1,1−α/2} denote the lower and upper quantiles, respectively, of the F(n₁ − 1, n₂ − 1) distribution; i.e., these values satisfy

$$P(Q < F_{n_1-1,n_2-1,\alpha/2}) = \alpha/2 \qquad \text{and} \qquad P(Q > F_{n_1-1,n_2-1,1-\alpha/2}) = \alpha/2,$$

respectively. Similar to the χ² distribution, the F distribution is not symmetric. Therefore, different notation is needed to identify the quantiles of F distributions.
[Figure: F probability density functions with df = (5,5), (5,10), and (10,10).]

DERIVATION: Because

$$Q = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F(n_1-1, n_2-1),$$

we can write

$$1-\alpha = P\left(F_{n_1-1,n_2-1,\alpha/2} < \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} < F_{n_1-1,n_2-1,1-\alpha/2}\right) = P\left(\frac{S_2^2}{S_1^2}F_{n_1-1,n_2-1,\alpha/2} < \frac{\sigma_2^2}{\sigma_1^2} < \frac{S_2^2}{S_1^2}F_{n_1-1,n_2-1,1-\alpha/2}\right).$$

This shows that

$$\left(\frac{S_2^2}{S_1^2}F_{n_1-1,n_2-1,\alpha/2},\ \ \frac{S_2^2}{S_1^2}F_{n_1-1,n_2-1,1-\alpha/2}\right)$$

is a 100(1 − α) percent confidence interval for the ratio of the population variances σ₂²/σ₁². We interpret the interval in the same way:

We are 100(1 − α) percent confident that the ratio σ₂²/σ₁² is in this interval.
• If the confidence interval for σ₂²/σ₁² includes 1, this does not suggest that the variances σ₁² and σ₂² are different.
• If the confidence interval for σ₂²/σ₁² does not include 1, it does.
Example 4.15. We consider again the recycling project in Example 4.13 that examined the amount of white paper discarded per employee at two large plants. The data (presented in Example 4.13) were obtained on the amount of white paper thrown out per year by employees (data are in hundreds of pounds). Samples of employees at each plant (n₁ = 25 and n₂ = 20) were randomly selected. The boxplots in Figure 4.14 did suggest that the population variances may be different.

Question. Find a 95 percent confidence interval for σ₂²/σ₁², the ratio of the population variances. Here, σ₁² (σ₂²) denotes the population variance of the amount of white paper discarded by employees at Plant 1 (Plant 2).
Solution. We use R to get the sample variances s₁² and s₂²:

> var(plant.1) ## sample variance (Plant 1)
[1] 0.3071923
> var(plant.2) ## sample variance (Plant 2)
[1] 1.878411

That is, s₁² ≈ 0.307 and s₂² ≈ 1.878. We also use R to get the necessary F quantiles:

> qf(0.025,24,19) ## F_{24,19,0.025} ## lower 0.025 quantile
[1] 0.4264113
> qf(0.975,24,19) ## F_{24,19,0.975} ## upper 0.025 quantile
[1] 2.452321

The lower bound of the 95 percent confidence interval for σ₂²/σ₁² is

$$\frac{s_2^2}{s_1^2}\,F_{24,19,0.025} = \frac{1.878}{0.307}(0.4264113) \approx 2.607,$$

and the upper bound is

$$\frac{s_2^2}{s_1^2}\,F_{24,19,0.975} = \frac{1.878}{0.307}(2.452321) \approx 14.995.$$
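A minimal R sketch of both bounds in one line (assuming the vectors plant.1 and plant.2 from above):

> (var(plant.2)/var(plant.1))*qf(c(0.025,0.975),24,19)   ## approximately (2.607, 14.995)

The same interval should be produced by var.test(plant.2,plant.1)$conf.int; note that var.test computes an interval for the variance ratio of its first argument to its second, so the ordering matters here.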
[Figure 4.16: Normal quantile-quantile (qq) plots for the employee recycle data for the two plants in Example 4.15.]
Interpretation: We are 95 percent confident that the ratio of the population variances σ₂²/σ₁² is between 2.607 and 14.995. This interval does not include 1. Therefore, we have sufficient evidence that the population variances (σ₁² and σ₂²) at the two plants are different.

Discussion: This finding supports our use of the unequal-variance interval for μ₁ − μ₂ in Example 4.13. Some statisticians recommend using this equal/unequal variance test before deciding which confidence interval to use for μ₁ − μ₂. Some statisticians (including your authors) do not.
Major warning: Like the χ² interval for a single population variance σ², the two-sample F interval for the ratio of two variances is not robust to departures from normality. If the underlying population distributions are non-normal (non-Gaussian), then this interval should not be used.

Discussion: Figure 4.16 (above) displays the normal qq plots for data from the two plants. I am somewhat worried about the normal assumption for Plant 2.
4.11 Confidence intervals for the difference of two population means μ₁ − μ₂: Dependent samples (Matched pairs)
Example 4.16. Creatine is an organic acid that helps to supply energy to cells in the
body, primarily muscle. Because of this, it is commonly used by those who are weight
training to gain muscle mass. Does it really work? Suppose that we are designing an
experiment involving USC male undergraduates who exercise/lift weights regularly.
Design 1 (Independent samples): Recruit 30 students who are representative of the
population of USC male undergraduates who exercise/lift weights. For a single weight
training session, we will
assign 15 students to take creatine.
assign 15 students an innocuous substance that looks like creatine (but has no positive/negative effect on performance).
For each student, we will record
Y = maximum bench press weight (MBPW).
We will then have two samples of data (with n1 = 15 and n2 = 15):
Sample 1 (Creatine):
Sample 2 (Control):
With this design, we would compare the two treatments using a two independent-sample confidence interval for μ₁ − μ₂ from Section 4.8; e.g., the unequal-variance interval

$$(\bar{Y}_{1+} - \bar{Y}_{2+}) \pm t_{\nu,\alpha/2}\,\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}.$$

Design 2 (Matched pairs): Alternatively, we could recruit 15 students and have each student perform under both conditions (creatine and control), giving paired measurements on the same individuals; see Table 4.2.
Table 4.1: Creatine example. Sources of variation in the two independent sample and matched pairs designs.

Design                 Sources of Variation
Independent samples    among students and within students
Matched pairs          within students
ADVANTAGE: When you remove extra variability, this enables you to do a better job at comparing the two experimental conditions (treatments). By "better job," I mean you can more precisely estimate the difference between the treatments (excess variability that naturally arises among individuals is not getting in the way). This gives you a better chance of identifying a difference between the treatments if one really exists.

NOTE: In matched pairs experiments, it is important to randomize the order in which treatments are assigned. This may eliminate common patterns that may be seen when always following, say, Treatment 1 with Treatment 2. In practice, the experimenter could flip a fair coin to determine which treatment is applied first.
IMPLEMENTATION: Data from matched pairs experiments are analyzed by examining the difference in responses of the two treatments. Specifically, compute

$$D_j = Y_{1j} - Y_{2j},$$

for each individual j = 1, 2, ..., n. After doing this, we have essentially created a one-sample problem, where our data are

D₁, D₂, ..., Dₙ,

the so-called data differences. The one-sample 100(1 − α) percent confidence interval

$$\bar{D} \pm t_{n-1,\alpha/2}\,\frac{S_D}{\sqrt{n}},$$

where D̄ and S_D are the sample mean and sample standard deviation of the differences, respectively, is an interval estimate for

μ_D = mean difference between the 2 treatments.
Table 4.2: Creatine data. Maximum bench press weight (in lbs) for creatine and control treatments with 15 students. Note: These are not real data.

Student j   Creatine MBPW   Control MBPW   Difference D_j
    1            230             200             30
    2            140             155            -15
    3            215             205             10
    4            190             190              0
    5            200             170             30
    6            230             225              5
    7            220             200             20
    8            255             260             -5
    9            220             240            -20
   10            200             195              5
   11             90             110            -20
   12            130             105             25
   13            255             230             25
   14             80              85             -5
   15            265             255             10
INTERPRETATION: The parameter μ_D describes the difference in means for the two treatment groups. If there are no differences between the two treatments, μ_D = 0.

• If the confidence interval for μ_D includes 0, this does not suggest that the two treatment means are different.
• If the confidence interval for μ_D does not include 0, it does.

To analyze the creatine data, let's compute a 95 percent confidence interval for μ_D:

> t.test(diff,conf.level=0.95)$conf.int
[1] -3.227946 15.894612

Interpretation: We are 95 percent confident that the mean difference in MBPW is between −3.2 and 15.9 lb. Because this interval includes 0, this does not suggest that taking creatine leads to a larger mean MBPW.
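The vector diff above holds the 15 differences from Table 4.2; a minimal sketch of how it might be formed (the vectors creatine and control are my assumed names for the two MBPW columns):

> diff = creatine - control   ## D_j = Y_1j - Y_2j
> mean(diff)                  ## 6.33, the center of the interval above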
4.12 One-way analysis of variance

REVIEW: So far in this chapter, we have discussed confidence intervals for a single population mean μ and for the difference of two population means μ₁ − μ₂. When there are two means, we have recently seen that the design of the experiment/study completely determines how the data are to be analyzed.

• When the two samples are independent, this is called a (two) independent-sample design.
• When the two samples are obtained on the same individuals (so that the samples are dependent), this is called a matched pairs design.

Confidence interval procedures for μ₁ − μ₂ depend on the design of the study.
TERMINOLOGY: More generally, the purpose of an experiment is to investigate differences between or among two or more treatments. In a statistical framework, we do this by comparing the population means of the responses to each treatment. In order to detect treatment mean differences, we must try to control the effects of error so that any variation we observe can be attributed to the effects of the treatments rather than to differences among the individuals.

BLOCKING: Designs involving meaningful grouping of individuals, that is, blocking, can help reduce the effects of experimental error by identifying systematic components of variation among individuals.

• The matched pairs design for comparing two treatments is an example of such a design. In this situation, individuals themselves are treated as blocks.

The analysis of data from experiments involving blocking will not be covered in this course (see, e.g., STAT 506, STAT 525, and STAT 706). We focus herein on a simpler setting, that is, a one-way classification model. This is an extension of the two independent-sample design to more than two treatments.
Example 4.17. The data below are strength measurements (in MPa) for samples of four mortar types: OCM, PIM, RM, and PCM.

OCM:  48.06  38.27  38.88  42.74  49.62
PIM:  52.79  64.87  53.27  51.24  55.87  61.76  67.15
RM:   48.95  62.41  52.11  60.45  58.07  52.16  61.71  61.06  57.63  56.80  50.99  51.52  52.85  46.75  48.31
PCM:
[Figure 4.17: Boxplots of strength measurements (MPa) for four mortar types (OCM, PIM, RM, PCM).]
In this example,

• Treatment = mortar type (OCM, PIM, RM, and PCM). There are t = 4 treatment groups.
• Individuals = mortar specimens.

This is an example of an observational study, not an experiment. That is, we do not physically apply a treatment here; instead, the mortar specimens are inherently different to begin with. We simply take random samples of each mortar type.

QUERY: An initial question that we might have is the following:

Are the treatment (mortar type) population means equal? Or, are the treatment population means different?
This question can be answered by performing a hypothesis test, that is, by testing

H₀: μ₁ = μ₂ = μ₃ = μ₄
versus
Hₐ: the population means μᵢ are not all equal.

GOAL: We now develop a statistical procedure that allows us to test this type of hypothesis in a one-way classification model.
4.12.1 Overall F test

NOTATION: Suppose there are t treatment groups, with nᵢ individuals sampled from group i and N = n₁ + n₂ + ⋯ + n_t in total. Define

$$\bar{Y}_{i+} = \frac{1}{n_i}\sum_{j=1}^{n_i} Y_{ij}, \qquad S_i^2 = \frac{1}{n_i-1}\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_{i+})^2, \qquad \bar{Y}_{++} = \frac{1}{N}\sum_{i=1}^{t}\sum_{j=1}^{n_i} Y_{ij}.$$

The statistics Ȳᵢ₊ and Sᵢ² are simply the sample mean and sample variance, respectively, of the ith sample. The statistic Ȳ₊₊ is the sample mean of all the data (across all t treatment groups).
DATA: The data from a one-way classification are t independent samples:

Sample 1: Y₁₁, Y₁₂, ..., Y₁ₙ₁
Sample 2: Y₂₁, Y₂₂, ..., Y₂ₙ₂
   ⋮
Sample t: Y_t1, Y_t2, ..., Y_tn_t.
The procedure we develop is formulated by deriving two estimators for σ². These two estimators are formed by (1) looking at the variance of the observations within samples, and (2) looking at the variance of the sample means across the t samples.

WITHIN ESTIMATOR: To estimate σ² within samples, we take a weighted average (weighted by the sample sizes) of the t sample variances; that is, we pool all variance estimates together to form one estimate. Define

$$SS_{res} = (n_1-1)S_1^2 + (n_2-1)S_2^2 + \cdots + (n_t-1)S_t^2 = \sum_{i=1}^{t}\underbrace{\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y}_{i+})^2}_{(n_i-1)S_i^2}.$$
We call SS_res the residual sum of squares. Our first point estimator for σ² is the residual mean squares

$$MS_{res} = \frac{SS_{res}}{N-t}.$$

ACROSS ESTIMATOR: Suppose for the moment that the sample sizes are equal (n₁ = n₂ = ⋯ = n_t = n, say) and that H₀ is true, so that each group has the same population mean μ. Then each sample mean satisfies

$$\bar{Y}_{i+} \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$

If H₀ is true, then the t sample means Ȳ₁₊, Ȳ₂₊, ..., Ȳ_t₊ are a random sample of size t from a normal distribution with mean μ and variance σ²/n. The sample variance of this random sample is given by

$$\frac{1}{t-1}\sum_{i=1}^{t}(\bar{Y}_{i+} - \bar{Y}_{++})^2$$

and satisfies

$$E\left[\frac{1}{t-1}\sum_{i=1}^{t}(\bar{Y}_{i+} - \bar{Y}_{++})^2\right] = \frac{\sigma^2}{n}.$$

Therefore,

$$MS_{trt} = \frac{1}{t-1}\underbrace{\sum_{i=1}^{t} n(\bar{Y}_{i+} - \bar{Y}_{++})^2}_{SS_{trt}}$$

is an unbiased estimator of σ² when H₀ is true.
TERMINOLOGY: We call SS_trt the treatment sums of squares and MS_trt the treatment mean squares. MS_trt is our second point estimator for σ². Recall that MS_trt is an unbiased estimator of σ² only when H₀: μ₁ = μ₂ = ⋯ = μ_t is true (this is important!). If we have different sample sizes, we simply adjust MS_trt to

$$MS_{trt} = \frac{1}{t-1}\underbrace{\sum_{i=1}^{t} n_i(\bar{Y}_{i+} - \bar{Y}_{++})^2}_{SS_{trt}}.$$

KEY POINT: When H₀ is true, MS_trt and MS_res are both unbiased estimators of σ², so we would expect

$$F = \frac{MS_{trt}}{MS_{res}} \approx 1.$$
When H₀ is not true (i.e., the treatment means are different), then

$$E(MS_{trt}) > \sigma^2 \qquad \text{and} \qquad E(MS_{res}) = \sigma^2.$$

These two facts suggest that when H₀ is not true,

$$F = \frac{MS_{trt}}{MS_{res}} > 1.$$

SAMPLING DISTRIBUTION: When H₀ is true,

$$F = \frac{MS_{trt}}{MS_{res}} \sim F(t-1, N-t).$$

DECISION: We reject H₀ and conclude the treatment population means are different if the F statistic is far out in the right tail of the F(t − 1, N − t) distribution. Why? Because a large value of F is not consistent with H₀ being true! Large values of F (far out in the right tail) are more consistent with Hₐ.
[Figure 4.18: The F(3, 32) probability density function. This is the distribution of F in Example 4.17 if H₀ is true. A mark at F = 16.848 has been added.]
MORTAR DATA: We now use R to calculate the F statistic for the strength/mortar type data in Example 4.17.

> anova(lm(strength ~ mortar.type))
Analysis of Variance Table

            Df  Sum Sq Mean Sq F value    Pr(>F)
mortar.type  3 1520.88  506.96  16.848 9.576e-07
Residuals   32  962.86   30.09
CONCLUSION : F = 16.848 is not an observation we would expect from the F (3, 32)
distribution (the distribution of F when H0 is true); see Figure 4.18. Therefore, we reject
H0 and conclude the population mean strengths for the four mortar types are dierent.
In other words, the evidence from the data suggests that Ha is true; not H0 .
ANOVA TABLE: In general, the one-way ANOVA table has the following form:

Source       df     SS         MS                        F
Treatments   t-1    SS_trt     MS_trt = SS_trt/(t-1)     F = MS_trt/MS_res
Residuals    N-t    SS_res     MS_res = SS_res/(N-t)
Total        N-1    SS_total
MORTAR DATA: For the strength/mortar type data in Example 4.17 (from the R output), we see that
p-value = 0.0000009576.
This is obviously quite small which suggests that we have an enormous amount of
evidence against H0 .
In this example, this p-value is calculated as the area to the right of F = 16.848 on
the F (3, 32) probability density function.
Therefore, this probability is interpreted as follows: If H0 is true, this is the
probability that we would get a test statistic equal to or larger than F = 16.848.
Since this is extremely unlikely (p-value = 0.0000009576), this strongly suggests
that H0 is not true.
P-VALUE RULES: Probability values are used in more general hypothesis test settings in statistics (not just in one-way classification).

Q: How low does a p-value have to get before we reject H₀?
A: Unfortunately, there is no right answer to this question. What is commonly done is the following.

• First choose a significance level α that is small. This represents the probability that we will reject a true H₀, that is,

α = P(Reject H₀ | H₀ true).

• Common values of α chosen beforehand are α = 0.10, α = 0.05 (the most common), and α = 0.01.
• The smaller α is chosen to be, the more evidence one requires to reject H₀. This is a true statement because of the following well-known decision rule:

p-value < α ⟹ reject H₀.

• Therefore, the value of α chosen by the experimenter determines how low the p-value must get before H₀ is ultimately rejected.

For the strength/mortar type data, there is no ambiguity in our decision. For other situations (e.g., p-value = 0.052), the decision may not be as clear cut.
4.12.2 Follow-up analysis: Tukey pairwise confidence intervals

RECALL: In a one-way classification, the overall F test is used to test the hypotheses

H₀: μ₁ = μ₂ = ⋯ = μ_t
versus
Hₐ: the population means μᵢ are not all equal.

QUESTION: If we do reject H₀, have we really learned anything that is all that relevant? All we have learned is that at least one of the population treatment means is different. We have no idea which one(s) or how many. In this light, rejecting H₀ is largely an uninformative conclusion.
FOLLOW-UP ANALYSES: If H₀ is rejected, that is, we conclude at least one population treatment mean μᵢ is different, the obvious game becomes determining which one(s) and how they are different. To do this, we will construct Tukey pairwise confidence intervals for all population treatment mean differences μᵢ − μᵢ′, 1 ≤ i < i′ ≤ t. If there are t treatments, then there are

$$\binom{t}{2} = \frac{t(t-1)}{2}$$

pairwise confidence intervals to construct. For example, with the strength/mortar type example, there are t = 4 treatments and 6 pairwise intervals:

μ₁ − μ₂,  μ₁ − μ₃,  μ₁ − μ₄,  μ₂ − μ₃,  μ₂ − μ₄,  μ₃ − μ₄,

where

μ₁ = population mean strength for mortar type OCM
μ₂ = population mean strength for mortar type PIM
μ₃ = population mean strength for mortar type RM
μ₄ = population mean strength for mortar type PCM.
PROBLEM: If we construct multiple confidence intervals (here, 6 of them), and if we construct each one at the 100(1 − α) percent confidence level, then the overall confidence level for all 6 intervals will be less than 100(1 − α) percent. In statistics, this is known as the multiple comparisons problem.

GOAL: Construct confidence intervals for all pairwise differences μᵢ − μᵢ′, 1 ≤ i < i′ ≤ t, and have our overall confidence level still be at 100(1 − α) percent.

SOLUTION: Simply increase the confidence level associated with each individual interval! Tukey's method is designed to do this. Even better, R automates the construction of Tukey's intervals. The intervals are of the form

$$(\bar{Y}_{i+} - \bar{Y}_{i'+}) \pm q_{t,N-t,\alpha}\,\sqrt{MS_{res}\left(\frac{1}{n_i} + \frac{1}{n_{i'}}\right)},$$

where q_{t,N−t,α} is the Tukey quantile which gives an overall confidence level of 100(1 − α) percent (overall for the set of all possible pairwise intervals).

MORTAR DATA: For the strength/mortar type data in Example 4.17, the R output below gives all pairwise intervals. Note that the overall confidence level is 95 percent.
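The output below can be produced with TukeyHSD; a minimal sketch (assuming the strength and mortar.type objects used in the anova call earlier):

> TukeyHSD(aov(strength ~ mortar.type), conf.level = 0.95)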
             diff        lwr       upr     p adj
PCM-OCM   2.48000  -4.950955  9.910955 0.8026758
PIM-OCM  15.21575        ...       ...       ...
RM-OCM   12.99875        ...       ...       ...
PIM-PCM  12.73575        ...       ...       ...
RM-PCM   10.51875        ...       ...       ...
RM-PIM   -2.21700  -8.863448  4.429448 0.8029266
ANALYSIS: In the R output, the columns labeled lwr and upr give, respectively, the lower and upper limits of the pairwise confidence intervals.
• We are (at least) 95 percent confident that the difference between the population mean strengths for the PCM and OCM mortars is between −4.95 and 9.91 MPa. Note that this confidence interval includes 0, which suggests that these two population means are not different. An equivalent finding is that the adjusted p-value, given in the p adj column, is large.
• We are (at least) 95 percent confident that the difference between the population mean strengths for the PIM and OCM mortars is between 8.17 and 22.27 MPa. Note that this confidence interval does not include 0, which suggests that these two population means are different. An equivalent finding is that the adjusted p-value, given in the p adj column, is small.
• Interpretations for the remaining 4 confidence intervals are formed similarly.
The main point is this:

• If a pairwise confidence interval (for two population means) includes 0, then these population means are declared not to be different.
• If a pairwise interval does not include 0, then the population means are declared to be different.
• The conclusions we make for all possible pairwise comparisons are at the 100(1 − α) percent confidence level.

Therefore, for the strength/mortar type data, the following pairs of population means are declared to be different:

PIM-OCM, RM-OCM, PIM-PCM, RM-PCM.

The following pairs of population means are declared to be not different:

PCM-OCM, RM-PIM.
CHAPTER 6
Linear regression
6.1
Introduction
GENERAL SETTING: We wish to model a response variable Y as a function of k independent variables x₁, x₂, ..., x_k. Specifically, we consider models of the form

$$Y = g(x_1, x_2, ..., x_k) + \epsilon = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon;$$

that is, g is a linear function of β₀, β₁, ..., β_k. We call this a linear regression model.
• The response variable Y is assumed to be random (but we do get to observe its value).
• The regression parameters β₀, β₁, ..., β_k are assumed to be fixed and unknown.
• The independent variables x₁, x₂, ..., x_k are assumed to be fixed (not random).
• The error term ε is assumed to be random (and not observed).
For example, each of the models

$$Y = \underbrace{\beta_0 + \beta_1 x}_{g(x)} + \epsilon$$

$$Y = \underbrace{\beta_0 + \beta_1 x + \beta_2 x^2}_{g(x)} + \epsilon$$

$$Y = \underbrace{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2}_{g(x_1,x_2)} + \epsilon$$

is a linear regression model.

Main point: The term "linear" does not refer to the shape of the regression function g. It refers to how the regression parameters β₀, β₁, ..., β_k enter the g function.
6.2 Simple linear regression
Example 6.1. As part of a waste removal project, a new compression machine for processing sewage sludge is being studied. In particular, engineers are interested in the following variables:

Y = moisture content of compressed pellets (measured as a percentage)
x = machine filtration rate (measured in kg-DS/m/hr).
Obs     x      Y      Obs     x      Y
 1    125.3   77.9    11    159.5   79.9
 2     98.2   76.8    12    145.8   79.0
 3    201.4   81.5    13     75.1   76.7
 4    147.3   79.8    14    151.4   78.2
 5    145.9   78.2    15    144.2   79.5
 6    124.7   78.3    16    125.0   78.1
 7    112.2   77.5    17    198.8   81.5
 8    120.2   77.0    18    132.5   77.0
 9    161.2   80.1    19    159.6   79.0
10    178.9   80.2    20    110.7   78.6

Table 6.1: Sewage data. Moisture (Y, measured as a percentage) and machine filtration rate (x, measured in kg-DS/m/hr). There are n = 20 observations.
Figure 6.1 displays the data in a scatterplot. This is the most common graphical display
for bivariate data like those seen above. From the plot, we see that
the variables Y and x are positively related, that is, an increase in x tends to be
associated with an increase in Y .
the variables Y and x are linearly related, although there is a large amount of
variation that is unexplained.
this is an example where a simple linear regression model may be adequate.
[Figure 6.1: Scatterplot of moisture (Y, percentage) versus filtration rate (x, kg-DS/m/hr) for the sewage data in Example 6.1.]

6.2.1 Least squares estimation
METHOD OF LEAST SQUARES: To estimate β₀ and β₁, we choose the values that minimize

$$Q(\beta_0, \beta_1) = \sum_{i=1}^{n}[Y_i - (\beta_0 + \beta_1 x_i)]^2.$$

Denote the least squares estimators by b₀ and b₁, respectively, that is, the values of β₀ and β₁ that minimize Q(β₀, β₁). A two-variable minimization argument can be used to find closed-form expressions for b₀ and b₁. Taking partial derivatives of Q(β₀, β₁), we obtain

$$\frac{\partial Q(\beta_0,\beta_1)}{\partial \beta_0} = -2\sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1 x_i) \overset{set}{=} 0$$

$$\frac{\partial Q(\beta_0,\beta_1)}{\partial \beta_1} = -2\sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1 x_i)x_i \overset{set}{=} 0.$$

Solving this system gives the least squares estimates, which R computes for us:

> fit = lm(moisture ~ filtration.rate)
> fit
Coefficients:
    (Intercept)  filtration.rate
       72.95855          0.04103

From the output, we see the least squares estimates (to 3 dp) for the sewage data are

b₀ = 72.959 and b₁ = 0.041.
PAGE 137
80
79
77
78
Moisture (Percentage)
81
CHAPTER 6
80
100
120
140
160
180
200
Therefore, the equation of the least squares line that relates moisture percentage Y to the filtration rate x is

$$\hat{Y} = 72.959 + 0.041x,$$

or, in other words,

$$\widehat{\text{Moisture}} = 72.959 + 0.041\,\text{Filtration rate}.$$

NOTE: Your authors call the least squares line the prediction equation. This is because we can predict the value of Y (moisture) for any value of x (filtration rate). For example, when the filtration rate is x = 150 kg-DS/m/hr, we would predict the moisture percentage to be

$$\hat{Y}(150) = 72.959 + 0.041(150) \approx 79.109.$$
6.2.2 Sampling distributions of b₀ and b₁

FACTS: Under our model assumptions, the least squares estimators satisfy

$$b_0 \sim N(\beta_0, c_{00}\sigma^2) \qquad \text{and} \qquad b_1 \sim N(\beta_1, c_{11}\sigma^2),$$

where

$$c_{00} = \frac{1}{n} + \frac{\bar{x}^2}{SS_{xx}} \qquad \text{and} \qquad c_{11} = \frac{1}{SS_{xx}},$$

with SS_xx = Σᵢ(xᵢ − x̄)².
6.2.3 Fitted values and residuals

TERMINOLOGY: The fitted value for observation i is Ŷᵢ = b₀ + b₁xᵢ, and the residual is eᵢ = Yᵢ − Ŷᵢ.

INTERESTING: In the simple linear regression model (provided that the model includes an intercept term β₀), we have the following algebraic result:

$$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n}(Y_i - \hat{Y}_i) = 0,$$

that is, the sum of the residuals (from a least squares fit) is equal to zero.
Obs     x      Y     Ŷ = b₀+b₁x   e = Y−Ŷ     Obs     x      Y     Ŷ = b₀+b₁x   e = Y−Ŷ
 1    125.3   77.9     78.100      -0.200     11    159.5   79.9     79.503       0.397
 2     98.2   76.8     76.988      -0.188     12    145.8   79.0     78.941       0.059
 3    201.4   81.5     81.223       0.277     13     75.1   76.7     76.040       0.660
 4    147.3   79.8     79.003       0.797     14    151.4   78.2     79.171      -0.971
 5    145.9   78.2     78.945      -0.745     15    144.2   79.5     78.876       0.624
 6    124.7   78.3     78.075       0.225     16    125.0   78.1     78.088       0.012
 7    112.2   77.5     77.563      -0.062     17    198.8   81.5     81.116       0.384
 8    120.2   77.0     77.891      -0.891     18    132.5   77.0     78.396      -1.396
 9    161.2   80.1     79.573       0.527     19    159.6   79.0     79.508      -0.508
10    178.9   80.2     80.299      -0.099     20    110.7   78.6     77.501       1.099

Table 6.2: Sewage data. Fitted values and residuals from the least squares fit.
SEWAGE DATA: In Table 6.2, I have used R to calculate the fitted values and residuals for each of the n = 20 observations in the sewage sludge data set.
TERMINOLOGY: We define the residual sum of squares by

$$SS_{res} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2.$$

Our estimator of the error variance σ² is

$$\hat{\sigma}^2 = MS_{res} = \frac{SS_{res}}{n-2},$$

and σ̂ = √MS_res estimates σ and is called the residual standard error.
SEWAGE DATA: For the sewage data in Example 6.1, we use R to calculate MS_res:
> fitted.values = predict(fit)
> residuals = moisture-fitted.values
> # Calculate MS_res
> sum(residuals^2)/18
[1] 0.4426659
Therefore, σ̂ = √MS_res = √0.4426659 ≈ 0.6653.
> summary(fit)
Call: lm(formula = moisture ~ filtration.rate)
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     72.958547   0.697528 104.596  < 2e-16
filtration.rate  0.041034   0.004837   8.484 1.052e-07

Residual standard error: 0.6653 on 18 degrees of freedom
...
p-value: 1.052e-07

6.2.4 Statistical inference for β₁
CONFIDENCE INTERVAL FOR β₁: Under our model assumptions, the following sampling distribution arises:

$$t = \frac{b_1 - \beta_1}{\sqrt{MS_{res}/SS_{xx}}} \sim t(n-2).$$

This result can be used to derive a 100(1 − α) percent confidence interval for β₁, which is given by

$$b_1 \pm t_{n-2,\alpha/2}\sqrt{MS_{res}/SS_{xx}}.$$

The value t_{n−2,α/2} is the upper α/2 quantile from the t(n − 2) distribution. Note the form of the interval: point estimate b₁, quantile t_{n−2,α/2}, and standard error √(MS_res/SS_xx).
HYPOTHESIS TEST: To test H₀: β₁ = 0 versus Hₐ: β₁ ≠ 0, we use the t statistic t = b₁/se(b₁) = 8.484 from the summary(fit) output above.
ANALYSIS: Figure 6.3 shows the t(18) distribution, that is, the distribution of t when H₀: β₁ = 0 is true for the sewage sludge example. Clearly, t = 8.484 is not an expected outcome from this distribution (p-value = 0.000000105)! In other words, there is strong evidence that the moisture percentage is linearly related to machine filtration rate.

ANALYSIS: A 95 percent confidence interval for β₁ is calculated as follows:

$$b_1 \pm t_{18,0.025}\,se(b_1) = 0.0410 \pm 2.1009(0.0048) = (0.0309, 0.0511).$$

We are 95 percent confident that the population regression slope β₁ is between 0.0309 and 0.0511. Note that this interval does not include 0.

> qt(0.975,18)
[1] 2.100922
[Figure 6.3: t(18) probability density function. A mark at t = 8.484 has been added.]
6.2.5 Confidence and prediction intervals

SETTING: Fix a value x₀ of the independent variable. We might be interested in estimating the mean response E(Y|x₀) = β₀ + β₁x₀ at x = x₀; at this value, the response follows the probability distribution

$$Y(x_0) \sim N(\beta_0 + \beta_1 x_0, \sigma^2).$$

We might also be interested in predicting a new response Y when x = x₀. This predicted response is denoted by Y(x₀). This value is a new outcome from Y(x₀) ∼ N(β₀ + β₁x₀, σ²).
In the first problem, we are interested in estimating the mean of the response variable Y at a certain value of x. In the second problem, we are interested in predicting the value of a new random variable Y at a certain value of x. Conceptually, the second problem is far more difficult than the first.

GOALS: We would like to create 100(1 − α) percent intervals for the mean E(Y|x₀) and for the new value Y(x₀). The former is called a confidence interval (since it is for a mean response) and the latter is called a prediction interval (since it is for a new random variable).

POINT ESTIMATOR/PREDICTOR: To construct either interval, we start with the same quantity:

$$\hat{Y}(x_0) = b_0 + b_1 x_0,$$

where b₀ and b₁ are the least squares estimates from the fit of the model.

• In the confidence interval for E(Y|x₀), we call Ŷ(x₀) a point estimator.
• In the prediction interval for Y(x₀), we call Ŷ(x₀) a point predictor.

The primary difference in the intervals arises in assessing the variability of Ŷ(x₀).
CONFIDENCE INTERVAL: A 100(1 − α) percent confidence interval for the mean E(Y|x₀) is given by

$$\hat{Y}(x_0) \pm t_{n-2,\alpha/2}\sqrt{MS_{res}\left[\frac{1}{n} + \frac{(x_0-\bar{x})^2}{SS_{xx}}\right]}.$$

PREDICTION INTERVAL: A 100(1 − α) percent prediction interval for the new value Y(x₀) is given by

$$\hat{Y}(x_0) \pm t_{n-2,\alpha/2}\sqrt{MS_{res}\left[1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{SS_{xx}}\right]}.$$
COMPARISON: The two intervals are identical except for the extra "1" in the standard error part of the prediction interval. This extra "1" arises from the additional uncertainty associated with predicting a new response from the N(β₀ + β₁x₀, σ²) distribution. Therefore, a 100(1 − α) percent prediction interval for Y(x₀) will be wider than the corresponding 100(1 − α) percent confidence interval for E(Y|x₀).

INTERVAL LENGTH: The length of both intervals clearly depends on the value of x₀. In fact, the standard error of Ŷ(x₀) will be smallest when x₀ = x̄ and will get larger the farther x₀ is from x̄ in either direction. This implies that the precision with which we estimate E(Y|x₀) or predict Y(x₀) decreases the farther we get away from x̄. This makes intuitive sense, namely, we would expect to have the most confidence in our fitted model near the center of the observed data.

TERMINOLOGY: It is sometimes desired to estimate E(Y|x₀) or predict Y(x₀) based on the fit of the model for values of x₀ outside the range of x values used in the experiment/study. This is called extrapolation and can be very dangerous. In order for our inferences to be valid, we must believe that the straight line relationship holds for x values outside the range where we have observed data. In some situations, this may be reasonable. In others, we may have no theoretical basis for making such a claim without data to support it.
Example 6.1 (continued). In our sewage sludge example, suppose that we are interested in estimating E(Y|x₀) and predicting a new Y(x₀) when the filtration rate is x₀ = 150 kg-DS/m/hr.

• E(Y|x₀) denotes the mean moisture percentage for compressed pellets when the machine filtration rate is x₀ = 150 kg-DS/m/hr. In other words, if we were to repeat the experiment over and over again, each time using a filtration rate of x₀ = 150 kg-DS/m/hr, then E(Y|x₀) denotes the mean value of Y (moisture percentage) that would be observed.
• Y(x₀) denotes a possible value of Y for a single run of the machine when the filtration rate is set at x₀ = 150 kg-DS/m/hr.

R automates the calculation of confidence and prediction intervals, as seen below.
> predict(fit,data.frame(filtration.rate=150),level=0.95,interval="confidence")
       fit  lwr  upr
1 79.11365  ...  ...
> predict(fit,data.frame(filtration.rate=150),level=0.95,interval="prediction")
       fit  lwr  upr
1 79.11365  ...  ...
[Figure: Scatterplot of the sewage data with the least squares line and 95% confidence and 95% prediction bands superimposed.]
6.3 Multiple linear regression

6.3.1 Introduction

SETTING: We now extend the simple linear regression model to allow for k independent variables. The data consist of the response Y and the k independent variables x₁, x₂, ..., x_k measured on each of n individuals:

Individual    Y     x1     x2    ...    xk
    1         Y1    x11    x12   ...    x1k
    2         Y2    x21    x22   ...    x2k
    .         .     .      .            .
    n         Yn    xn1    xn2   ...    xnk
6.3.2 Matrix representation

MODEL: In matrix notation, the multiple linear regression model can be written as Y = Xβ + ε, where

$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}, \quad \boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}, \quad \boldsymbol{\epsilon} = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}.$$
LEAST SQUARES: The notion of least squares is the same as it was in the simple linear regression model. To fit a multiple linear regression model, we want to find the values of β₀, β₁, ..., β_k that minimize

$$Q(\beta_0, \beta_1, ..., \beta_k) = \sum_{i=1}^{n}[Y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik})]^2.$$
Minimizing Q leads to the normal equations X′Xb = X′Y; provided X′X is invertible, the solution is

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ \vdots \\ b_k \end{pmatrix}.$$

This is the least squares estimator of β. The fitted regression model is

$$\hat{\mathbf{Y}} = \mathbf{X}\mathbf{b},$$

or, equivalently,

$$\hat{Y}_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \cdots + b_k x_{ik},$$

for i = 1, 2, ..., n.
TECHNICAL NOTE: For the least squares estimator

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$

to be unique, we need X to be of full column rank; i.e., r(X) = p = k + 1. This will occur when there are no linear dependencies among the columns of X. If r(X) < p, then X′X does not have a unique inverse. In this case, the normal equations cannot be solved uniquely.
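To make the matrix formula concrete, here is a minimal R sketch with a tiny made-up data set (x1, x2, and y are illustrative only, not from the text):

> x1 = c(1,2,3,4); x2 = c(0,1,0,1); y = c(2.1,3.9,6.2,8.0)
> X = cbind(1,x1,x2)               ## n x p design matrix (p = 3)
> b = solve(t(X)%*%X, t(X)%*%y)    ## solves the normal equations X'Xb = X'y
> cbind(b, coef(lm(y ~ x1 + x2)))  ## the two columns agree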
Example 6.2. The taste of matured cheese is related to the concentration of several chemicals in the final product. In a study from the LaTrobe Valley of Victoria, Australia, samples of cheddar cheese were analyzed for their chemical composition and were subjected to taste tests. For each specimen, the taste Y was obtained by combining the scores from several tasters. Data were collected on the following variables:

Y = taste score (TASTE)
x₁ = concentration of acetic acid (ACETIC)
x₂ = concentration of hydrogen sulfide (H2S)
x₃ = concentration of lactic acid (LACTIC).

Variables ACETIC and H2S were both measured on the log scale. The variable LACTIC has not been transformed. Table 6.3 contains concentrations of the various chemicals in n = 30 specimens of cheddar cheese and the observed taste score.
Specimen  TASTE  ACETIC    H2S  LACTIC    Specimen  TASTE  ACETIC    H2S  LACTIC
    1      12.3   4.543  3.135    0.86        16     40.9   6.365  9.588    1.74
    2      20.9   5.159  5.043    1.53        17     15.9   4.787  3.912    1.16
    3      39.0   5.366  5.438    1.57        18      6.4   5.412  4.700    1.49
    4      47.9   5.759  7.496    1.81        19     18.0   5.247  6.174    1.63
    5       5.6   4.663  3.807    0.99        20     38.9   5.438  9.064    1.99
    6      25.9   5.697  7.601    1.09        21     14.0   4.564  4.949    1.15
    7      37.3   5.892  8.726    1.29        22     15.2   5.298  5.220    1.33
    8      21.9   6.078  7.966    1.78        23     32.0   5.455  9.242    1.44
    9      18.1   4.898  3.850    1.29        24     56.7   5.855  10.20    2.01
   10      21.0   5.242  4.174    1.58        25     16.8   5.366  3.664    1.31
   11      34.9   5.740  6.142    1.68        26     11.6   6.043  3.219    1.46
   12      57.2   6.446  7.908    1.90        27     26.5   6.458  6.962    1.72
   13       0.7   4.477  2.996    1.06        28      0.7   5.328  3.912    1.25
   14      25.9   5.236  4.942    1.30        29     13.4   5.802  6.685    1.08
   15      54.9   6.151  6.752    1.52        30      5.5   6.176  4.787    1.25

Table 6.3: Cheese data. ACETIC, H2S, and LACTIC are independent variables. The response variable is TASTE.
MODEL: Researchers postulate that each of the three chemical composition variables x₁, x₂, and x₃ is important in describing the taste and consider the multiple linear regression model

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i,$$

for i = 1, 2, ..., 30. We now use R to fit this model using the method of least squares:
> fit = lm(taste~acetic+h2s+lactic)
> fit
Coefficients:
(Intercept)       acetic          h2s       lactic
    -28.877        0.328        3.912       19.670

This output gives

$$\mathbf{b} = \begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} -28.877 \\ 0.328 \\ 3.912 \\ 19.670 \end{pmatrix}.$$

Therefore, the fitted least squares regression model is

$$\hat{Y} = -28.877 + 0.328x_1 + 3.912x_2 + 19.670x_3,$$

or, in other words,

$$\widehat{\text{TASTE}} = -28.877 + 0.328\,\text{ACETIC} + 3.912\,\text{H2S} + 19.670\,\text{LACTIC}.$$
6.3.3 Estimating the error variance

TERMINOLOGY: As in simple linear regression, define the residual sum of squares by

$$SS_{res} = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2.$$

Our estimator of the error variance σ² is

$$\hat{\sigma}^2 = MS_{res} = \frac{SS_{res}}{n-p},$$

where p = k + 1 is the number of regression parameters.
6.3.4

6.3.5 Analysis of variance

ALGEBRAIC IDENTITY: It can be shown that

$$\underbrace{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}_{SS_{total}} = \underbrace{\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2}_{SS_{reg}} + \underbrace{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}_{SS_{res}}.$$

• SS_total is the total sum of squares. SS_total is the numerator of the sample variance of Y₁, Y₂, ..., Yₙ. It measures the total variation in the response data.
• SS_reg is the regression sum of squares. SS_reg measures the variation in the response data explained by the linear regression model.
• SS_res is the residual sum of squares. SS_res measures the variation in the response data not explained by the linear regression model.
ANOVA TABLE: These sums of squares are summarized in the analysis of variance (ANOVA) table:

Source       df     SS         MS                       F
Regression   k      SS_reg     MS_reg = SS_reg/k        F = MS_reg/MS_res
Residual     n-p    SS_res     MS_res = SS_res/(n-p)
Total        n-1    SS_total

• The degrees of freedom add down: k + (n − p) = n − 1, because p = k + 1.
• $MS_{res} = \frac{SS_{res}}{n-p} = \frac{SS_{res}}{n-k-1}$ is an unbiased estimator of σ².
• The sums of squares (SS) also add down. This follows from the algebraic identity noted earlier.
• Mean squares (MS) are the sums of squares divided by their degrees of freedom.
• The F statistic is formed by taking the ratio of MS_reg and MS_res. More on this in a moment.
COEFFICIENT OF DETERMINATION: Since SS_total = SS_reg + SS_res, the proportion of the total variation in the data explained by the linear regression model is

$$R^2 = \frac{SS_{reg}}{SS_{total}}.$$

This statistic is called the coefficient of determination. Clearly,

0 ≤ R² ≤ 1.

The larger the R², the better the regression model explains the variability in the data.
IMPORTANT: It is critical to understand what R² does and does not measure. Its value is computed under the assumption that the multiple linear regression model is correct and assesses how much of the variation in the data may be attributed to that relationship rather than to inherent variation.

• If R² is small, it may be that there is a lot of random inherent variation in the data, so that, although the multiple linear regression model is reasonable, it can explain only so much of the observed overall variation.
• Alternatively, R² may be close to 1 (e.g., in a simple linear regression model fit), but this may not be the best model. In fact, R² could be very high, but ultimately not relevant because it assumes the simple linear regression model is correct. In reality, a better model may exist (e.g., a quadratic model, etc.).
F STATISTIC: The F statistic in the ANOVA table is used to test

H₀: β₁ = β₂ = ⋯ = β_k = 0
versus
Hₐ: at least one of the βⱼ is nonzero.

In other words, F tests whether or not at least one of the independent variables x₁, x₂, ..., x_k is important in describing the response Y. If H₀ is rejected, we do not know which one or how many of the βⱼ's are nonzero; only that at least one is.

SAMPLING DISTRIBUTION: When H₀: β₁ = β₂ = ⋯ = β_k = 0 is true,

$$F = \frac{MS_{reg}}{MS_{res}} \sim F(k, n-p).$$
CHEESE DATA: For the cheese data in Example 6.2, the ANOVA table is

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Regression         3       4994.50861    1664.83620     16.22   <.0001
Residual          26       2668.37806     102.62993
Corrected Total   29       7662.88667

The coefficient of determination is

$$R^2 = \frac{SS_{reg}}{SS_{total}} = \frac{4994.51}{7662.89} \approx 0.652.$$

Interpretation: About 65.2 percent of the variability in the taste data is explained by the linear regression model that includes ACETIC, H2S, and LACTIC. The remaining 34.8 percent of the variability in the taste data is explained by other sources.
[Figure 6.5: F(3, 26) probability density function. A mark at F = 16.22 has been added.]
> anova(fit)
Analysis of Variance Table

           Df  Sum Sq  Mean Sq  F value     Pr(>F)
acetic      1 2314.14  2314.14  22.5484  6.528e-05 ***
h2s         1 2147.11  2147.11  20.9209  0.0001035 ***
lactic      1  533.26   533.26   5.1959  0.0310870 *
Residuals  26 2668.38   102.63
NOTE: R partitions the regression sum of squares into sums of squares for each of the three independent variables ACETIC, H2S, and LACTIC, as they are added sequentially to the model (these are called sequential sums of squares). The sequential sums of squares for the independent variables add to the SS_reg for the model (up to rounding error), that is,

SS_reg = 4994.51 = 2314.14 + 2147.11 + 533.26 = SS(ACETIC) + SS(H2S) + SS(LACTIC).

In words,

• SS(ACETIC) is the sum of squares added when compared to a model that includes only an intercept term.
• SS(H2S) is the sum of squares added when compared to a model that includes an intercept term and ACETIC.
• SS(LACTIC) is the sum of squares added when compared to a model that includes an intercept term, ACETIC, and H2S.

In other words, we can use the sequential sums of squares to assess the impact of adding independent variables ACETIC, H2S, and LACTIC to the model in sequence.
6.3.6 Inference for individual regression parameters

STANDARD ERRORS: The estimated standard error of bⱼ is

$$\widehat{se}(b_j) = \sqrt{\hat{\sigma}^2 c_{jj}},$$

where cⱼⱼ = (X′X)⁻¹ⱼⱼ is the jth diagonal element of (X′X)⁻¹ and

$$\hat{\sigma}^2 = MS_{res} = \frac{SS_{res}}{n-p}.$$
> summary(fit)
Call: lm(formula = taste ~ acetic + h2s + lactic)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -28.877     19.735  -1.463  0.15540
acetic         0.328      4.460   0.074  0.94193
h2s            3.912      1.248   3.133  0.00425 **
lactic        19.670      8.629   2.279  0.03109 *

Residual standard error: 10.13 on 26 degrees of freedom
...
p-value: 3.810e-06
CHAPTER 6
OUTPUT : The Estimate output gives the values of the least squares estimates:
b0 28.877
b1 0.328
b2 3.912
b3 19.670.
se(b
b 0 ) = c00
b2
b2
se(b
b 1 ) = c11
b2
se(b
b 2 ) = c22
se(b
b 3 ) = c33
b2
19.735 =
4.460 =
1.248 =
8.629 =
=
b2 (X X)1
00
=
b2 (X X)1
11
=
b2 (X X)1
22
=
b2 (X X)1
33 ,
where
b2 = M Sres =
SSres
= (10.13)2 102.63
30 4
is the square of the Residual standard error. The t value output gives the t statistics
b0 0
t = 1.463 =
c00
b2
b1 0
t = 0.074 =
c11
b2
b2 0
t = 3.133 =
c22
b2
b3 0
t = 2.279 =
.
c33
b2
These t statistics can be used to test H₀: βᵢ = 0 versus Hₐ: βᵢ ≠ 0, for i = 0, 1, 2, 3.
Two-sided probability values are in Pr(>|t|). At the α = 0.05 level,
we do not reject H₀: β₀ = 0 (p-value = 0.155). Interpretation: In the model which includes all three independent variables, the intercept term β₀ is not statistically different from zero.
we do not reject H₀: β₁ = 0 (p-value = 0.942). Interpretation: ACETIC is not statistically important in a model that already includes H2S and LACTIC.
we reject H₀: β₂ = 0 (p-value = 0.004). Interpretation: H2S is statistically important in a model that already includes ACETIC and LACTIC.
we reject H₀: β₃ = 0 (p-value = 0.031). Interpretation: LACTIC is statistically important in a model that already includes ACETIC and H2S.
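Each entry in Pr(>|t|) can be recovered from its t statistic; a minimal sketch using the H2S statistic:

> # Two-sided p-value for t = 3.133 with n - p = 26 degrees of freedom
> 2 * pt(abs(3.133), df = 26, lower.tail = FALSE)   # approximately 0.0042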
6.3.7 Confidence and prediction intervals for a given x₀
GOALS: We would like to create 100(1 − α) percent intervals for the mean E(Y|x₀)
and for the new value Y(x₀). As in the simple linear regression case, the former is
called a confidence interval (since it is for a mean response) and the latter is called a
prediction interval (since it is for a new random variable).
CHEESE DATA: Suppose that we are interested in estimating E(Y|x₀) and predicting a
new Y(x₀) when ACETIC = 5.5, H2S = 6.0, and LACTIC = 1.4, so that

x₀ = (5.5, 6.0, 1.4)′.
[R output omitted in the source: the fitted value at x₀, with lwr and upr bounds, for both the confidence interval and the prediction interval.]
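Output of this form comes from R's predict function; a minimal sketch, assuming fit is the lm object for taste ~ acetic + h2s + lactic:

> # New observation at ACETIC = 5.5, H2S = 6.0, LACTIC = 1.4
> x0 = data.frame(acetic = 5.5, h2s = 6.0, lactic = 1.4)
> predict(fit, newdata = x0, interval = "confidence")   # CI for the mean E(Y|x0)
> predict(fit, newdata = x0, interval = "prediction")   # PI for a new Y(x0)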
6.4 Model diagnostics
IMPORTANCE : We now discuss certain diagnostic techniques for linear regression. The
term diagnostics refers to the process of checking the model assumptions. This is an
important exercise because if the model assumptions are violated, then our analysis (and
all subsequent interpretations) could be compromised.
MODEL ASSUMPTIONS: We first recall the model assumptions on the error terms in
the linear regression model

Yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + · · · + βₖxᵢₖ + εᵢ,

for i = 1, 2, ..., n. Specifically, we have assumed that the error terms ε₁, ε₂, ..., εₙ are independent, normally distributed, with zero mean and a common variance σ²; that is, εᵢ ∼ iid N(0, σ²).
RESIDUAL PLOT : By the phrase residual plot, I mean the plot of the residuals (on
the vertical axis) versus the predicted values (on the horizontal axis). This plot is simply
the scatterplot of the residuals and the predicted values.
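Both the residual plot and the normal qq-plot in Figure 6.6 are easy to produce; a minimal sketch, assuming fit is the fitted lm object:

> e = resid(fit)              # least squares residuals
> plot(fitted(fit), e)        # residual plot: residuals versus fitted values
> abline(h = 0)               # horizontal reference line at zero
> qqnorm(e); qqline(e)        # normal qq-plot of the residuals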
Figure 6.6: Cheese data. Normal qq-plot of the least squares residuals.
Advanced linear model arguments show that if the model does a good job at describing the data, then the residuals and fitted values are independent.
This means that a plot of the residuals versus the fitted values should reveal no noticeable patterns; that is, the plot should appear to be random in nature (e.g., a random scatter of points).
On the other hand, if there are definite (non-random) patterns in the residual plot, this suggests that the model is inadequate in some way or it could point to a violation in the model assumptions.
The plot in Figure 6.7 does not suggest any obvious model inadequacies! It looks completely random in appearance.
Figure 6.7: Cheese data. Residual plot for the multiple linear regression model fit. A horizontal line at zero has been added.
COMMON VIOLATIONS : Although there are many ways to violate the statistical assumptions associated with linear regression, the most common violations are
non-constant variance (heteroscedasticity)
misspecifying the true regression function
correlated observations over time.
Example 6.3. An electric company is interested in modeling peak hour electricity
demand (Y ) as a function of total monthly energy usage (x). This is important for
planning purposes because the generating system must be large enough to meet the
maximum demand imposed by customers. Data for n = 53 residential customers for a
given month are shown in Figure 6.8.
Figure 6.8: Electricity data. Left: Scatterplot of peak demand (Y, measured in kWh) versus monthly usage (x, measured in kWh) with least squares simple linear regression line superimposed. Right: Residual plot for the simple linear regression model fit.
Problem: There is a clear problem with non-constant variance here. Note how the
residual plot fans out like the bell of a trumpet. This violation may have been missed
by looking at the scatterplot alone, but the residual plot highlights it.
Remedy: A common course of action to handle non-constant variance is to apply a
transformation to the response variable Y. Common transformations are logarithmic (log Y) and square root (√Y); the square root transformation is applied here.
Figure 6.9: Electricity data. Left: Scatterplot of the square root of peak demand (√Y) versus monthly usage (x, measured in kWh) with the least squares simple linear regression line superimposed. Right: Residual plot for the simple linear regression model fit with transformed response.
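The transformed fit summarized below can be obtained by transforming the response inside the lm call; a minimal sketch (the response name peak.demand is an assumption; monthly.usage matches the output):

> # Regress the square root of peak demand on monthly usage
> fit.2 = lm(sqrt(peak.demand) ~ monthly.usage)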
> fit.2

Coefficients:
  (Intercept)  monthly.usage
     0.580831       0.000953
ANALYSIS: Figure 6.9 above shows the scatterplot (left) and the residual plot (right)
from fitting the transformed model. The fanning out shape that we saw previously (in
the untransformed model) is now largely absent. The fitted transformed model is

Ŵ = 0.580831 + 0.000953x,

where W = √Y is the square root of peak demand; in other words, the predicted square root of peak demand increases by 0.000953 for each additional kWh of monthly usage.
Figure 6.10: Windmill data. Left: Scatterplot of DC output Y versus wind velocity (x, measured in mph) with least squares simple linear regression line superimposed. Right: Residual plot for the simple linear regression model fit.
Example 6.4. A research engineer is investigating the use of a windmill to generate
electricity. He has collected data on the direct current (DC) output Y from his windmill
and the corresponding wind velocity (x, measured in mph). Data for n = 25 observation
pairs are shown in Figure 6.10.
Problem: There is a clear quadratic relationship between DC output and wind velocity,
so a simple linear regression model fit (as shown above) is inappropriate. The residual
plot shows a pronounced quadratic pattern; this pattern is not accounted for in fitting a
straight line model.
Remedy: Fit a multiple linear regression model with two independent variables: wind
velocity x and its square x²; that is, consider the quadratic regression model

Yᵢ = β₀ + β₁xᵢ + β₂xᵢ² + εᵢ,

for i = 1, 2, ..., 25. It is straightforward to fit a quadratic model in R. We simply regress
Y on x and x².
Figure 6.11: Windmill data. Left: Scatterplot of DC output Y versus wind velocity (x, measured in mph) with least squares quadratic regression curve superimposed. Right: Residual plot for the quadratic regression model fit.
> wind.velocity.sq = wind.velocity^2
> fit.2 = lm(DC.output ~ wind.velocity + wind.velocity.sq)
> fit.2

Coefficients:
     (Intercept)     wind.velocity  wind.velocity.sq
        -1.15590           0.72294          -0.03812
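Reading the coefficient estimates off this output, the fitted quadratic model is

Ŷ = −1.15590 + 0.72294x − 0.03812x²,

where x is wind velocity (mph). The negative coefficient on x² produces the concave, bending-over shape visible in Figure 6.11.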
Figure 6.12: Global temperature data. Left: Time series plot of the temperature Y measured one time per year. The independent variable x is year, measured as 1900, 1901, ..., 1997. A simple linear regression model fit has been superimposed. Right: Residual plot from the simple linear regression model fit.
Example 6.5. The data in Figure 6.12 (left) are temperature readings (in deg C) on
land-air average temperature anomalies, collected once per year from 1900-1997. To
emphasize that the data are collected over time, I have used straight lines to connect the
observations; this is called a time series plot.
Unfortunately, it is all too common that people fit linear regression models to time
series data and then blindly use them for prediction purposes.
One needs neither a meteorology degree nor an engineering degree to know that temperature observations collected over time are probably correlated. Not surprisingly, residuals from a simple linear regression fit display clear correlation over time.
Regression techniques (as we have learned in this chapter) are generally not appropriate when analyzing time series data for this reason. More advanced modeling techniques are needed.
CHAPTER 7
Factorial Experiments
7.1 Introduction
REMARK: In engineering experiments, particularly those carried out in industrial settings, there are often several factors of interest, and the goal is to assess the effects of
these factors on a continuous response Y (e.g., yield, lifetime, fill weights, etc.). A factorial treatment structure is an efficient way of defining treatments in these types of
experiments.
One example of a factorial treatment structure uses k factors, where each factor has two levels. This is called a 2ᵏ factorial experiment.
Factorial experiments are often used in the early stages of experimental work. For this reason, factorial experiments are also called factor screening experiments.
Example 7.1. A nickel-titanium alloy is used to make components for jet turbine
aircraft engines. Cracking is a potentially serious problem in the final part, as it can lead
to nonrecoverable failure. A test is run at the parts producer to determine the effect of
k = 4 factors on cracks: pouring temperature (A), titanium content (B), heat treatment
method (C), and amount of grain refiner used (D).
Factor A has 2 levels: low temperature and high temperature
Factor B has 2 levels: low content and high content
Factor C has 2 levels: Method 1 and Method 2
Factor D has 2 levels: low amount and high amount.
The response variable in the experiment is
Y = length of largest crack (in mm) induced in a piece of sample material.
NOTE: In this example, there are 4 factors, each with 2 levels. Thus, there are

2 × 2 × 2 × 2 = 2⁴ = 16

different treatment combinations. These are listed here:
a1b1c1d1   a1b2c1d1   a2b1c1d1   a2b2c1d1
a1b1c1d2   a1b2c1d2   a2b1c1d2   a2b2c1d2
a1b1c2d1   a1b2c2d1   a2b1c2d1   a2b2c2d1
a1b1c2d2   a1b2c2d2   a2b1c2d2   a2b2c2d2
For example, the treatment combination a1 b1 c1 d1 holds each factor at its low level, the
treatment combination a1 b1 c2 d2 holds Factors A and B at their low level and Factors
C and D at their high level, and so on.
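A listing like this can be generated mechanically; a minimal sketch in base R, using the level labels above:

> # All 2^4 = 16 treatment combinations of four two-level factors
> expand.grid(A = c("a1","a2"), B = c("b1","b2"),
+             C = c("c1","c2"), D = c("d1","d2"))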
TERMINOLOGY: In a 2ᵏ factorial experiment, one replicate of the experiment uses 2ᵏ
runs, one at each of the 2ᵏ treatment combinations.
Therefore, in Example 7.1, one replicate of the experiment would require 16 runs
(one at each treatment combination listed above).
Two replicates would require 32 runs, three replicates would require 48 runs, and
so on.
TERMINOLOGY: There are different types of effects of interest in factorial experiments:
main effects and interaction effects. For example, in a 2⁴ factorial experiment,
there is 1 effect that does not depend on any of the factors (the overall mean).
there are 4 main effects: A, B, C, and D.
there are 6 two-way interaction effects: AB, AC, AD, BC, BD, and CD.
there are 4 three-way interaction effects: ABC, ABD, ACD, and BCD.
there is 1 four-way interaction effect: ABCD.
In general, with k factors, the numbers of effects of each type are binomial coefficients: there are

C(k, 1) = k main effects,
C(k, 2) = k(k − 1)/2 two-way interaction effects,
C(k, 3) three-way interaction effects,

and so on, where C(k, j) denotes the binomial coefficient "k choose j" and C(k, 0) = 1 counts the overall mean effect. Note that

C(k, 0) + C(k, 1) + C(k, 2) + · · · + C(k, k) = Σⱼ₌₀ᵏ C(k, j) = 2ᵏ,

and additionally that C(k, 0), C(k, 1), ..., C(k, k) are the entries in the (k + 1)th row of Pascal's
Triangle. Observe also that 2ᵏ grows quickly in size as k increases. For example, if there
are k = 10 factors (A, B, C, D, E, F, G, H, I, and J, say), then performing just one
replicate of the experiment would require 2¹⁰ = 1024 runs! In real life, rarely would this
type of experiment be possible.
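These counts are easy to verify; a minimal sketch in R for k = 4 and k = 10:

> choose(4, 0:4)          # 1 4 6 4 1: mean, main, 2-way, 3-way, 4-way effects
> sum(choose(10, 0:10))   # 1024 = 2^10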
7.2 2² experiments
NOTE: We first consider 2ᵏ factorial experiments where k = 2; that is, there are only
two factors, denoted by A and B. This is called a 2² experiment. We illustrate with an
agricultural example.
Example 7.2. Predicting corn yield prior to harvest is useful for making feed supply and
marketing decisions. Corn must have an adequate amount of nitrogen (Factor A) and
phosphorus (Factor B) for profitable production and also for environmental concerns.
Each of the four treatment combinations was replicated five times; the treatment sample mean yields (bushels/plot) are:

Treatment   Mean yield (Y)
a1b1             30
a1b2             36
a2b1             32
a2b2             44

The one-way ANOVA fit, treating the four combinations as separate treatments, gives:

           Df Sum Sq Mean Sq F value Pr(>F)
treatment   3    575  191.67  9.5833 0.0007
Residuals  16    320   20.00
Figure 7.1: Boxplots of corn yields (bushels/plot) for the four treatment groups.
OMNIBUS CONCLUSION: The value F = 9.5833 is not what we would expect from an
F(3, 16) distribution, the distribution of F when

H₀: μ₁₁ = μ₁₂ = μ₂₁ = μ₂₂

is true (p-value ≈ 0.0007). Therefore, we would conclude that at least one of the factorial
treatment population means is different.
REMARK: As we have discussed before in one-way classification experiments, an overall
F test provides very little information. With a factorial treatment structure, it is possible
to explore the data further; in particular, we can learn about the main effects due to
nitrogen (Factor A) and due to phosphorus (Factor B). We can also learn about the
interaction between nitrogen and phosphorus.
PARTITION: Let us first recall the treatment sum of squares from the one-way ANOVA:
SStrt = 575.
The way we learn more about specific effects is to partition SStrt into the following
pieces: SSA, SSB, and SSAB. By partition, I mean that we will write

SStrt = SSA + SSB + SSAB.

In words,
SSA is the sum of squares due to the main effect of A (nitrogen)
SSB is the sum of squares due to the main effect of B (phosphorus)
SSAB is the sum of squares due to the interaction effect of A and B (nitrogen and phosphorus).
We can use R to write this partition in a richer ANOVA table (mathematical details omitted):
> fit = lm(yield ~ nitrogen*phosphorus)
> anova(fit)
Analysis of Variance Table

                    Df Sum Sq Mean Sq F value    Pr(>F)
nitrogen             1    125     125    6.25 0.0236742 *
phosphorus           1    405     405   20.25   < 0.001 ***
nitrogen:phosphorus  1     45      45    2.25 0.1530877
Residuals           16    320      20
The F statistics FA, FB, and FAB can be used to test for the two main effects and the interaction
effect, respectively. Each F statistic above has an F(1, 16) distribution under the assumption that the associated effect is zero. Small p-values (e.g., p-value < 0.05) indicate
that the effect is nonzero. Effects with large p-values can be treated as not significant.
Figure 7.2: F(1, 16) probability density function. A mark at FAB = 2.25 has been added.
INTERACTION PLOT: An interaction plot graphs the treatment sample means, with the levels of one factor on the horizontal axis and a separate line (trace) for each level of the other factor.

Figure 7.3: Interaction plot for nitrogen and phosphorus in Example 7.2.
If Factors A and B do not interact at all, the interaction plot should display parallel lines. That is, the effect of one factor stays constant across the levels of the other factor. This is essentially what it means to have no interaction.
If the interaction plot displays a departure from parallelism (including an overwhelming case where the lines intersect), then this is visual evidence of interaction. That is, the effect of one factor depends on the levels of the other factor.
The F test that uses FAB provides numerical evidence of interaction. The interaction plot provides visual evidence.
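A display like Figure 7.3 can be produced with base R's interaction.plot function; a minimal sketch, assuming yield, nitrogen, and phosphorus are the objects used in the fits above:

> # Mean yield at each nitrogen level, one trace line per phosphorus level
> interaction.plot(x.factor = nitrogen, trace.factor = phosphorus, response = yield)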
CONCLUSION: We do not have strong evidence that nitrogen and phosphorus interact.
The FAB statistic is not significant, and the interaction plot does not show a substantial
departure from parallelism.
GENERAL STRATEGY: The following are guidelines for analyzing data from 2² factorial experiments. Start by looking at whether or not the interaction contribution is
significant. This can be done by using an interaction plot and an F test that uses FAB.
If the interaction is significant, then formal analysis of main effects is not all that meaningful because their interpretations depend on the interaction. In this situation, the best approach is to just redo the entire analysis as a one-way ANOVA with 4 treatments. Tukey pairwise confidence intervals can help you formulate an ordering among the 4 treatment population means.
If the interaction is not significant, I prefer to refit the model without the interaction term present and then examine the main effects. This can be done numerically by examining the sizes of FA and FB, respectively.
ANALYSIS: Here is the ANOVA table for the corn yield data, leaving out the nitrogen/phosphorus interaction term:

> fit = lm(yield ~ nitrogen + phosphorus)
> anova(fit)
Analysis of Variance Table

           Df Sum Sq Mean Sq F value   Pr(>F)
nitrogen    1    125  125.00  5.8219 0.027403 *
phosphorus  1    405  405.00 18.8630   0.0004 ***
Residuals  17    365   21.47
Comparing this to the ANOVA table with interaction, note that the interaction sum of
squares, SSAB = 45, has now been absorbed into the residual sum of squares.
The main effect of nitrogen (Factor A) is significant in describing yield (FA = 5.8219, p-value ≈ 0.0274).
The main effect of phosphorus (Factor B) is strongly significant in describing yield (FB = 18.8630, p-value = 0.0004).
Figure 7.4: Left: Side-by-side boxplots for nitrogen (Factor A). Right: Side-by-side boxplots for phosphorus (Factor B).
CONFIDENCE INTERVALS: A 95 percent confidence interval for μA1 − μA2, the difference in means for the two levels of nitrogen (Factor A), is given by

(Ȳ_A1 − Ȳ_A2) ± t₁₇,₀.₀₂₅ √( MSres (1/10 + 1/10) ).

A 95 percent confidence interval for μB1 − μB2, the difference in means for the two levels of phosphorus (Factor B), is given by

(Ȳ_B1 − Ȳ_B2) ± t₁₇,₀.₀₂₅ √( MSres (1/10 + 1/10) ).
Note that neither of these intervals includes zero. This is expected because FA and FB
are both significant at the α = 0.05 level.
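The numerical intervals can be computed directly; a minimal sketch, assuming the same yield and nitrogen objects (with level labels a1 and a2) and MSres = 21.47 from the no-interaction fit:

> # 95% CI for the difference in mean yield between the two nitrogen levels
> ybar = tapply(yield, nitrogen, mean)       # level means (10 observations each)
> margin = qt(0.975, df = 17) * sqrt(21.47 * (1/10 + 1/10))
> (ybar["a1"] - ybar["a2"]) + c(-margin, margin)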
REGRESSION: In Example 7.2, there were two levels of nitrogen (a1 = 10 and a2 = 15)
and two levels of phosphorus (b1 = 2 and b2 = 4) used in the experiment. These levels,
which we generically called "low" and "high" when analyzing the data using ANOVA,
are actually numerical in nature (measured in pounds per plot). In this light, there is
nothing to prevent us from fitting the following multiple linear regression model
using the numerical values of nitrogen and phosphorus:

Yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + εᵢ,

where the independent variables are

x₁ = nitrogen amount (10 or 15 pounds)
x₂ = phosphorus amount (2 or 4 pounds).
Doing so in R gives the following output:

            Estimate Std. Error t value Pr(>|t|)
(Intercept)   9.5000     6.1297   1.550 0.139599
nitrogen      1.0000     0.4144   2.413 0.027403 *
phosphorus    4.5000     1.0361   4.343   0.0004 ***

p-value: 0.0004886

The vector of least squares estimates is

β̂ = (β̂₀, β̂₁, β̂₂)′ = (9.5, 1.0, 4.5)′.
Therefore, the fitted least squares regression model for the corn yield data is

Ŷ = 9.5 + 1.0x₁ + 4.5x₂,

or, in other words,

YIELD-hat = 9.5 + 1.0 NITROGEN + 4.5 PHOSPHORUS.
This equation can subsequently be used to make predictions about future yields based on
given values of nitrogen and phosphorus. In doing so, be careful about extrapolation;
for example, you would not want to make a prediction when x₁ = 25 and x₂ = 10. These
values are not representative of those used in the actual experiment, so this model may
not be a good description of yield for these values of nitrogen and phosphorus.
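Predictions at new in-range values can be obtained with predict; a minimal sketch, assuming fit is the regression object above (the values x₁ = 12 and x₂ = 3 are purely illustrative):

> # Predicted yield, with 95% prediction interval, at nitrogen = 12, phosphorus = 3
> new = data.frame(nitrogen = 12, phosphorus = 3)
> predict(fit, newdata = new, interval = "prediction")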
INTERESTING: I have below constructed the analysis of variance table for the multiple
linear regression fit:

> anova(fit)
Analysis of Variance Table

Response: yield
           Df Sum Sq Mean Sq F value   Pr(>F)
nitrogen    1    125  125.00  5.8219 0.027403 *
phosphorus  1    405  405.00 18.8630   0.0004 ***
Residuals  17    365   21.47
You will note that this table is identical to the two-way ANOVA table (without interaction) given earlier. This is no coincidence! In fact, the two-way ANOVA model
(without interaction) and the multiple linear regression model

Yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + εᵢ

are actually identical models. Therefore, fitting each one gives the same analysis and the
same conclusions.
7.3 Unreplicated 2ᵏ experiments
Example 7.3. A chemical product is produced in a pressure vessel. A factorial experiment is carried out to study the factors thought to influence the filtration rate of this
product. The four factors are temperature (A), pressure (B), concentration of formaldehyde (C), and stirring rate (D). Each factor is present at two levels (e.g., low and
high). A 2⁴ experiment is performed with one replication; the data are shown below.
Run   Run label     Filtration rate (Y, gal/hr)
 1    a1b1c1d1       45
 2    a2b1c1d1       71
 3    a1b2c1d1       48
 4    a2b2c1d1       65
 5    a1b1c2d1       68
 6    a2b1c2d1       60
 7    a1b2c2d1       80
 8    a2b2c2d1       65
 9    a1b1c1d2       43
10    a2b1c1d2      100
11    a1b2c1d2       45
12    a2b2c1d2      104
13    a1b1c2d2       75
14    a2b1c2d2       86
15    a1b2c2d2       70
16    a2b2c2d2       96
NOTE: In this experiment, there are k = 4 factors, so there are 15 effects to estimate:
the 4 main effects: A, B, C, and D
the 6 two-way interactions: AB, AC, AD, BC, BD, and CD
the 4 three-way interactions: ABC, ABD, ACD, and BCD
the 1 four-way interaction: ABCD.
In this 2⁴ experiment, we have 16 values of Y, and estimating the overall mean plus all 15
effects uses up all 16 degrees of freedom. This leaves us with no degrees of freedom to
estimate the error variance or to perform statistical tests. This is an obvious problem! Why? Because we have no way to judge
which effects are significant, and we cannot learn about how these factors interact.
TERMINOLOGY: A single replicate of a 2ᵏ factorial experiment is called an unreplicated factorial. With only one replicate, as in Example 7.3, there is no internal error
estimate, so we cannot perform statistical tests to judge significance. What do we do?
One approach to the analysis of an unreplicated factorial is to assume that certain higher-order interactions are negligible and then combine their mean squares to estimate the error.
This is an appeal to the sparsity of effects principle; that is, most systems are dominated by some of the main effects and low-order interactions, and most high-order interactions are negligible.
To learn about which effects may be negligible, we can fit the full ANOVA model and obtain the SS attached to each of these 15 effects (see the table below).
Effects with large SS can be retained. Effects with small SS can be discarded. A smaller model with only the large effects can then be fit. This smaller model will have an error estimate formed by taking all of the effects with small SS and combining them together.
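In R, the full model and the sums of squares for all 15 effects (shown in the table below) come from a one-line fit; a minimal sketch, assuming filtration and the factors A, B, C, D are in the workspace:

> # Full factorial model: all main effects and all interactions up to ABCD
> full = lm(filtration ~ A * B * C * D)
> anova(full)   # with a single replicate, the Residuals line has 0 df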
            Df  Sum Sq Mean Sq
A            1 1870.56 1870.56
B            1   39.06   39.06
C            1  390.06  390.06
D            1  855.56  855.56
A:B          1    0.06    0.06
A:C          1 1314.06 1314.06
B:C          1   22.56   22.56
A:D          1 1105.56 1105.56
B:D          1    0.56    0.56
C:D          1    5.06    5.06
A:B:C        1   14.06   14.06
A:B:D        1   68.06   68.06
A:C:D        1   10.56   10.56
B:C:D        1   27.56   27.56
A:B:C:D      1    7.56    7.56
Residuals    0    0.00
Figure 7.5: Left: Interaction plot for temperature (Factor A) and concentration of formaldehyde (Factor C). Right: Interaction plot for temperature (Factor A) and stirring rate (Factor D).
> # Fit smaller model
> fit = lm(filtration ~ A + C + D + A:C + A:D)
> anova(fit)
Analysis of Variance Table

          Df  Sum Sq Mean Sq F value   Pr(>F)
A          1 1870.56 1870.56   95.86  < 0.001 ***
C          1  390.06  390.06  19.990 0.001195 **
D          1  855.56  855.56   43.85  < 0.001 ***
A:C        1 1314.06 1314.06   67.34  < 0.001 ***
A:D        1 1105.56 1105.56   56.66  < 0.001 ***
Residuals 10  195.13   19.51
NOTE: It is clear that these five effects are each significant (note that the p-values are
all very close to zero). Interaction plots for temperature (Factor A) and concentration of
formaldehyde (Factor C), and for temperature (Factor A) and stirring rate (Factor D), are in
Figure 7.5. These plots depict the strong pairwise interactions that exist.
REGRESSION: In Example 7.3, there were no numerical values attached to the levels
of temperature (Factor A), concentration of formaldehyde (Factor C), and stirring rate
(Factor D). Therefore, if we want to fit a regression model (e.g., for prediction purposes,
etc.), we can use the following variables with arbitrary numerical codings assigned:

x₁ = temperature (−1 = low; +1 = high)
x₂ = concentration of formaldehyde (−1 = low; +1 = high)
x₃ = stirring rate (−1 = low; +1 = high).

With these values, we can fit the multiple linear regression model

Yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + β₃xᵢ₃ + β₄xᵢ₁xᵢ₂ + β₅xᵢ₁xᵢ₃ + εᵢ.
Doing so in R gives the following output:

> fit = lm(filtration ~ temp + conc + stir.rate + temp:conc + temp:stir.rate)
> summary(fit)

                Estimate Std. Error t value Pr(>|t|)
(Intercept)       70.062      1.104   63.46  < 0.001 ***
temp              10.812      1.104    9.79  < 0.001 ***
conc               4.938      1.104    4.471  0.00120 **
stir.rate          7.313      1.104    6.62  < 0.001 ***
temp:conc         -9.062      1.104   -8.21  < 0.001 ***
temp:stir.rate     8.312      1.104    7.53  < 0.001 ***

p-value: 5.14e-07

The vector of least squares estimates is

β̂ = (β̂₀, β̂₁, β̂₂, β̂₃, β̂₄, β̂₅)′ = (70.062, 10.812, 4.938, 7.313, −9.062, 8.312)′.
Therefore, the fitted least squares regression model for the filtration rate data is

Ŷ = 70.062 + 10.812x₁ + 4.938x₂ + 7.313x₃ − 9.062x₁x₂ + 8.312x₁x₃,

or, in other words,

FILT-hat = 70.062 + 10.812 TEMP + 4.938 CONC + 7.313 STIR − 9.062 TEMP*CONC + 8.312 TEMP*STIR.

This fitted regression model can be used to write confidence intervals or prediction intervals for future values of temperature, concentration, and stirring rate.
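For example, a prediction interval with all three factors at their high levels can be requested as follows; a minimal sketch using the coded variables:

> # Predicted filtration rate, with 95% prediction interval,
> # at temp = +1, conc = +1, stir.rate = +1
> new = data.frame(temp = 1, conc = 1, stir.rate = 1)
> predict(fit, newdata = new, interval = "prediction")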
ANOVA TABLE: I have constructed the analysis of variance table for the multiple linear
regression fit in this example:

> anova(fit)
Analysis of Variance Table

               Df  Sum Sq Mean Sq F value   Pr(>F)
temp            1 1870.56 1870.56   95.86  < 0.001 ***
conc            1  390.06  390.06  19.990 0.001195 **
stir.rate       1  855.56  855.56   43.85  < 0.001 ***
temp:conc       1 1314.06 1314.06   67.34  < 0.001 ***
temp:stir.rate  1 1105.56 1105.56   56.66  < 0.001 ***
Residuals      10  195.13   19.51
Note that this table is identical to the factor-effects ANOVA table given earlier.
REMARK: In this chapter, we have only just scratched the surface when it comes
to discussing factorial treatment structures. Specialized courses in experimental design,
such as STAT 506, delve into more advanced designs and analysis techniques.
For example, a design that often arises in industrial experiments is that of running
a 2ᵏ factorial experiment in less than 2ᵏ runs; these are called fractional factorial
experiments. In these experiments, the engineer acknowledges a priori that the highest-order
interactions are negligible, and the goal is to assess main effects and lower-order
interactions only. For those who are interested, Section 7.3 (VK) introduces this concept.
An illustrative example is Example 7.7 on pp 499 (VK).