

Chapter Thirteen

McGraw-Hill/Irwin © 2005 The McGraw-Hill Companies, Inc., All Rights Reserved.


Chapter Thirteen
Linear Regression and Correlation
GOALS
When you have completed this chapter, you will be able to:
ONE    Draw a scatter diagram.
TWO    Understand and interpret the terms dependent variable and independent variable.
THREE  Calculate and interpret the coefficient of correlation, the coefficient of determination, and the standard error of estimate.
FOUR   Conduct a test of hypothesis to determine if the population coefficient of correlation is different from zero.
Goals

Chapter Thirteen continued
Linear Regression and Correlation
GOALS
When you have completed this chapter, you will be able to:
FIVE   Calculate the least squares regression line and interpret the slope and intercept values.
SIX    Construct and interpret a confidence interval and prediction interval for the dependent variable.
SEVEN  Set up and interpret an ANOVA table.
Goals

Correlation Analysis is a group of statistical techniques to measure the association between two variables.

A Scatter Diagram is a chart that portrays the relationship between two variables.

[Figure: scatter diagram "Advertising Minutes and $ Sales"; x-axis Advertising Minutes (70 to 190), y-axis Sales ($thousands) (0 to 30)]

The Dependent Variable is the variable being predicted or estimated.
The Independent Variable provides the basis for estimation. It is the predictor variable.
Correlation Analysis

The Coefficient of Correlation (r) is a measure of the strength of the relationship between two variables.
It is also called Pearson's r and Pearson's product moment correlation coefficient.
It requires interval- or ratio-scaled data.
It can range from -1.00 to 1.00.
Values of -1.00 or 1.00 indicate perfect correlation.
Values close to 0.0 indicate weak correlation.
Negative values indicate an inverse relationship and positive values indicate a direct relationship.
[Figure: scale of Pearson's r running from -1 through 0 to 1]
The Coefficient of Correlation, r

[Figure: Perfect Negative Correlation. Scatter plot of Y (0 to 10) against X (0 to 10).]

[Figure: Perfect Positive Correlation. Scatter plot of Y (0 to 10) against X (0 to 10).]

[Figure: Zero Correlation. Scatter plot of Y (0 to 10) against X (0 to 10).]

[Figure: Strong Positive Correlation. Scatter plot of Y (0 to 10) against X (0 to 10).]

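We calculate the coefficient of correlation from the following formula:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n-1)s_x s_y}$$

where X̄ and Ȳ are the sample means and s_x and s_y are the sample standard deviations of X and Y.

Formula for r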
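As a minimal sketch (Python, standard library only), the formula above can be coded directly. The function name `pearson_r` and the five-point data sets are hypothetical illustrations, not part of the original slides. `statistics.stdev` computes the sample standard deviation with divisor n - 1, which matches the (n - 1) in the denominator.

```python
import statistics


def pearson_r(x, y):
    """Pearson's r = sum((X - Xbar)(Y - Ybar)) / ((n - 1) * s_x * s_y)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples with at least two observations")
    n = len(x)
    x_bar = statistics.mean(x)
    y_bar = statistics.mean(y)
    # Sum of cross-products of deviations from the means (numerator)
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Sample standard deviations, divisor n - 1 (denominator)
    s_x = statistics.stdev(x)
    s_y = statistics.stdev(y)
    return cross / ((n - 1) * s_x * s_y)


# Hypothetical data: Y rises exactly with X, then falls exactly with X
xs = [1, 2, 3, 4, 5]
print(round(pearson_r(xs, [2, 4, 6, 8, 10]), 6))   # perfect positive correlation
print(round(pearson_r(xs, [10, 8, 6, 4, 2]), 6))   # perfect negative correlation
```

The two calls should give 1.0 and -1.0, the perfect positive and perfect negative cases. On Python 3.10 or later, `statistics.correlation(x, y)` also computes Pearson's r and can serve as a cross-check.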

The coefficient of determination (r²) is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X).
It is the square of the coefficient of correlation.
It ranges from 0 to 1.
It does not give any information on the direction of the relationship between the variables.
Coefficient of Determination

Dan Ireland, the student body president at Toledo State University, is concerned about the cost to students of textbooks. He believes there is a relationship between the number of pages in the text and the selling price of the book. To provide insight into the problem, he selects a sample of eight textbooks currently on sale in the bookstore. Draw a scatter diagram. Compute the correlation coefficient.
Example 1

Book                          Page   Price ($)
Introduction to History        500          84
Basic Algebra                  700          75
Introduction to Psychology     800          99
Introduction to Sociology      600          72
Business Management            400          69
Introduction to Biology        600          81
Fundamentals of Jazz           600          63
Principles of Nursing          800          93

Example 1 continued

[Figure: Scatter Diagram of Number of Pages and Selling Price of Text; x-axis Page (400 to 800), y-axis Price ($) (60 to 100)]

Example 1 continued

Excel descriptive statistics
                      Page      Price
Mean                   625      79.50
Standard Error          49       4.32
Median                 600         78
Mode                   600       #N/A
Standard Deviation     139      12.21
Sample Variance     19,286     149.14
Kurtosis             -0.55      -0.77
Skewness             -0.16       0.40
Range                  400         36
Minimum                400         63
Maximum                800         99
Sum                  5,000        636
Count                    8          8

Example 1 continued
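The summary figures above can be reproduced with Python's `statistics` module as a rough check. This sketch assumes the Introduction to Biology text has 600 pages, the value consistent with the Sum (5,000) and Mean (625) reported here. Excel's kurtosis and skewness are left out because the standard library has no direct equivalent.

```python
import statistics

# Example 1 data; Introduction to Biology taken as 600 pages (consistent with Sum = 5,000)
pages = [500, 700, 800, 600, 400, 600, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]

for name, data in (("Page", pages), ("Price", price)):
    n = len(data)
    sd = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    print(
        f"{name}: mean={statistics.mean(data):.2f}, "
        f"standard error={sd / n ** 0.5:.2f}, "
        f"median={statistics.median(data)}, "
        f"sd={sd:.2f}, variance={statistics.variance(data):.2f}, "
        f"range={max(data) - min(data)}, sum={sum(data)}, count={n}"
    )
```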

Σ(X - X̄)(Y - Ȳ)
(a)     (b)      (c)                  (d)                      (c)*(d)
Page    Price    Page - Mean(Page)    Price - Mean(Price)
500     84       -125                   4.5                     -562.5
700     75         75                  -4.5                     -337.5
800     99        175                  19.5                    3,412.5
600     72        -25                  -7.5                      187.5
400     69       -225                 -10.5                    2,362.5
600     81        -25                   1.5                      -37.5
600     63        -25                 -16.5                      412.5
800     93        175                  13.5                    2,362.5
Total                                                          7,800.0

Example 1 continued

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n-1)s_x s_y} = \frac{7800}{7(138.87)(12.21)} = .657$$

$$r^2 = (.657)^2 = .432$$

The correlation between the number of pages and the selling price of the book is 0.657. This indicates a moderate association between the variables.

Example 1 continued
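A short sketch of the same worksheet in Python, assuming 600 pages for the Introduction to Biology text (the value used in the deviation table). It rebuilds column (c)*(d), sums it, and divides by (n - 1)s_x s_y.

```python
import statistics

# Example 1 data (X = pages, Y = price); Biology text taken as 600 pages
pages = [500, 700, 800, 600, 400, 600, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]
n = len(pages)

x_bar = statistics.mean(pages)
y_bar = statistics.mean(price)

# Columns (c), (d) and (c)*(d) of the worksheet
products = [(x - x_bar) * (y - y_bar) for x, y in zip(pages, price)]
sum_products = sum(products)

r = sum_products / ((n - 1) * statistics.stdev(pages) * statistics.stdev(price))
r_squared = r ** 2
print(f"sum of products = {sum_products:,.1f}, r = {r:.3f}, r^2 = {r_squared:.3f}")
```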

Did a computed r come from a population of paired observations with zero correlation?

H0: ρ = 0 (The correlation in the population is zero.)
H1: ρ ≠ 0 (The correlation in the population is different from zero.)

Here ρ (rho) denotes the population coefficient of correlation. The t test for the coefficient of correlation, with n - 2 degrees of freedom, is:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

T-test of significance of r

Computed r = .657. Test the hypothesis that there is no correlation in the population. Use a .02 significance level.

Step 1  H0: The correlation in the population is zero.
        H1: The correlation in the population is not zero.
Step 2  The significance level is .02.
Step 3  The statistic to use follows the t distribution.
Step 4  H0 is rejected if t > 3.143 or if t < -3.143, or if p < .02. There are 6 degrees of freedom, found by n - 2 = 8 - 2 = 6.

Step 5  Find the value of the test statistic.

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{.657\sqrt{8-2}}{\sqrt{1-.657^2}} = 2.135$$

P(|t| > 2.135) = .077 (two-tailed, 6 degrees of freedom)

H0 is not rejected. We cannot reject the hypothesis that there is no correlation in the population. The amount of association could be due to chance.
Example 1 continued
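Steps 4 and 5 can be sketched in a few lines of Python. The critical value 3.143 is copied from the slide's t table rather than computed; obtaining the p-value .077 would need a t-distribution function such as SciPy's `scipy.stats.t.sf`, which is left out so the sketch needs only the standard library.

```python
import math

r = 0.657            # sample coefficient of correlation from Example 1
n = 8                # number of paired observations
t_critical = 3.143   # two-tailed critical t for alpha = .02, 6 df (from the slide's t table)

df = n - 2
t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Decision rule from Step 4: reject H0 if t > 3.143 or t < -3.143
reject_h0 = abs(t) > t_critical
print(f"t = {t:.3f} with {df} df; reject H0: {reject_h0}")
```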

In Regression Analysis we use the independent variable (X) to estimate the dependent variable (Y).
The relationship between the variables is linear.
Both variables must be at least interval scale.
The least squares criterion is used to determine the equation. That is, the term Σ(Y - Y')² is minimized.
Regression Analysis

The regression equation is Y' = a + bX,

where
Y' is the average predicted value of Y for any X.
a is the Y-intercept. It is the estimated Y value when X = 0.
b is the slope of the line, or the average change in Y' for each change of one unit in X.
The least squares principle is used to obtain a and b.
Regression Analysis

The least squares principle is used to obtain a and b. The equations to determine a and b are:

$$b = r\frac{s_y}{s_x} \qquad a = \bar{Y} - b\bar{X}$$

Regression Analysis

Develop a regression equation for the information given in Example 1 that can be used to estimate the selling price based on the number of pages.

$$b = r\frac{s_y}{s_x} = (.657)\frac{12.21}{138.87} = .0578$$

$$a = \bar{Y} - b\bar{X} = 79.5 - .05778(625) = 43.39$$

(The intercept uses the unrounded slope .05778; the rounded .0578 gives 43.38.)

Example 1 revisited
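A minimal sketch of the two least squares formulas applied to the Example 1 data, again assuming 600 pages for the Biology text. Carrying full precision gives b = .05778 and a = 43.39, matching the Minitab and Excel output at the end of the chapter.

```python
import statistics

# Example 1 data (X = pages, Y = price); Biology text taken as 600 pages
pages = [500, 700, 800, 600, 400, 600, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]
n = len(pages)

x_bar, y_bar = statistics.mean(pages), statistics.mean(price)
s_x, s_y = statistics.stdev(pages), statistics.stdev(price)

# r from its definition, then b = r * s_y / s_x and a = Ybar - b * Xbar
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(pages, price)) / ((n - 1) * s_x * s_y)
b = r * s_y / s_x
a = y_bar - b * x_bar
print(f"Y' = {a:.2f} + {b:.4f}X")
```

On Python 3.10 or later, `statistics.linear_regression(pages, price)` should return the same slope and intercept and can serve as a cross-check.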

The regression equation is:
Y' = 43.39 + .0578X
The sign of the b value and the sign of r will always be the same.
The slope of the line is .0578. Each additional page costs about a nickel.
The equation crosses the Y-axis at $43.39. A book with no pages would cost $43.39.
Example 1 revisited

We can use the regression equation to estimate values of Y.

The estimated selling price of an 800-page book is $89.61, found by

Price = $43.39 + .0578(Number of Pages)
      = $43.39 + .0578(800)
      = $89.61

(This figure carries the unrounded coefficients 43.3889 and .05778; the rounded values shown give $89.63.)

Example 1 revisited
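The estimate can be wrapped in a small helper. The name `estimate_price` is hypothetical, and the coefficients 43.3889 and .05778 are the unrounded values from the Minitab and Excel output in this chapter; the second line prints the result with the rounded coefficients for comparison.

```python
# Unrounded coefficients as reported in the Minitab/Excel output for Example 1
A = 43.3889   # intercept a
B = 0.05778   # slope b


def estimate_price(num_pages):
    """Hypothetical helper: estimated selling price Y' = a + bX."""
    return A + B * num_pages


print(f"800 pages: ${estimate_price(800):.2f}")
print(f"Rounded coefficients: ${43.39 + 0.0578 * 800:.2f}")
```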

The Standard Error of Estimate measures the scatter, or dispersion, of the observed values around the line of regression.
The formula that is used to compute the standard error is:

$$s_{y \cdot x} = \sqrt{\frac{\sum (Y - Y')^2}{n-2}}$$

The Standard Error of Estimate

Find the standard error of estimate for the problem involving the number of pages in a book and the selling price.

Actual price   Estimated price   Deviation   Deviation squared
(Y)            (Y')              (Y - Y')    (Y - Y')²
84             72.28              11.72      137.41
75             83.83              -8.83       78.03
99             89.61               9.39       88.15
72             78.06              -6.06       36.67
69             66.50               2.50        6.25
81             78.06               2.94        8.67
63             78.06             -15.06      226.67
93             89.61               3.39       11.48
Total                              0.00      593.33

$$s_{y \cdot x} = \sqrt{\frac{\sum (Y - Y')^2}{n-2}} = \sqrt{\frac{593.33}{8-2}} = 9.944$$

Example 1 revisited
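A sketch that recomputes the residual table and s_y·x from the raw data. It assumes 600 pages for the Biology text and computes the slope as b = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)², an algebraically equivalent form of b = r(s_y/s_x).

```python
import math

# Example 1 data (X = pages, Y = price); Biology text taken as 600 pages
pages = [500, 700, 800, 600, 400, 600, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]
n = len(pages)

# Least squares coefficients: b = Sxy / Sxx, a = Ybar - b * Xbar
x_bar = sum(pages) / n
y_bar = sum(price) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(pages, price))
s_xx = sum((x - x_bar) ** 2 for x in pages)
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residuals Y - Y' and the standard error of estimate
residuals = [y - (a + b * x) for x, y in zip(pages, price)]
sse = sum(e ** 2 for e in residuals)
s_yx = math.sqrt(sse / (n - 2))
print(f"SSE = {sse:.2f}, s_y.x = {s_yx:.3f}")
```

As in the table, the residuals sum to zero (up to floating-point rounding), a property of any least squares line with an intercept.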

Assumptions Underlying Linear Regression
For each value of X, there is a group of Y values, and these Y values are normally distributed.
The Y values are statistically independent. This means that in the selection of a sample, the Y values chosen for a particular X value do not depend on the Y values for any other X values.
The means of these normal distributions of Y values all lie on the straight line of regression.
The standard deviations of these normal distributions are the same.

The confidence interval for the mean value of Y for a given value of X is given by:

$$Y' \pm t(s_{y \cdot x})\sqrt{\frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X - \bar{X})^2}}$$

where
Y' is the predicted value for any selected X value
X is a selected value of X
X̄ is the mean of the Xs
n is the number of observations
s_y·x is the standard error of the estimate
t is the value of t at n - 2 degrees of freedom
Confidence Interval

For our earlier price estimate of $89.61, the confidence interval, assuming a desired 95% confidence, is calculated as follows.

Page - Mean(Page)    (Page - Mean(Page))²
-125                 15,625
  75                  5,625
 175                 30,625
 -25                    625
-225                 50,625
 -25                    625
 -25                    625
 175                 30,625
Total               135,000

Example 1

Y', the predicted value, is $89.61
X is 800 pages
X̄ is 625, the mean of the pages
n is 8, the number of observations
s_y·x is 9.944, the standard error of the estimate
t is 2.447 at 8 - 2 = 6 degrees of freedom and 95% confidence

$$Y' \pm t(s_{y \cdot x})\sqrt{\frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X - \bar{X})^2}}$$

$$= 89.61 \pm 9.944(2.447)\sqrt{\frac{1}{8} + \frac{(800 - 625)^2}{135{,}000}} = 89.61 \pm 14.433$$

Example 1 revisited
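The arithmetic above, sketched in Python with the values listed on the slide. The t value 2.447 is taken from a t table, as on the slide, rather than computed.

```python
import math

y_hat = 89.61    # Y', predicted price for an 800-page book
x0 = 800         # selected value of X
x_bar = 625      # mean number of pages
n = 8            # number of observations
s_yx = 9.944     # standard error of estimate
t = 2.447        # t for 95% confidence with n - 2 = 6 df (t table value from the slide)
s_xx = 135_000   # sum of (X - Xbar)^2 from the worksheet

half_width = t * s_yx * math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)
lower, upper = y_hat - half_width, y_hat + half_width
print(f"{y_hat:.2f} +/- {half_width:.2f} -> ({lower:.2f}, {upper:.2f})")
```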

The prediction interval for an individual value of Y for a given value of X is:

$$Y' \pm t(s_{y \cdot x})\sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X - \bar{X})^2}}$$

$$= 89.61 \pm 9.944(2.447)\sqrt{1 + \frac{1}{8} + \frac{(800 - 625)^2}{135{,}000}} = 89.61 \pm 28.292$$

Prediction Interval
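A sketch comparing the two half-widths using the slide's values. The only difference is the extra 1 under the square root, which accounts for the scatter of an individual book's price around the mean price; that is why the prediction interval is wider than the confidence interval.

```python
import math

y_hat, x0, x_bar, n = 89.61, 800, 625, 8
s_yx, t, s_xx = 9.944, 2.447, 135_000   # values from the slides

# Shared term 1/n + (X - Xbar)^2 / sum (X - Xbar)^2
h = 1 / n + (x0 - x_bar) ** 2 / s_xx

ci_half = t * s_yx * math.sqrt(h)        # confidence interval: mean Y at X = 800
pi_half = t * s_yx * math.sqrt(1 + h)    # prediction interval: one book at X = 800
print(f"CI: {y_hat:.2f} +/- {ci_half:.2f} -> ({y_hat - ci_half:.2f}, {y_hat + ci_half:.2f})")
print(f"PI: {y_hat:.2f} +/- {pi_half:.2f} -> ({y_hat - pi_half:.2f}, {y_hat + pi_half:.2f})")
```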

Summarizing The Results
• The estimated selling price for a book with 800 pages is $89.61.
• The standard error of estimate is $9.94.
• The 95 percent confidence interval for the mean price of all books with 800 pages is $89.61 ± $14.43. This means the limits are between $75.18 and $104.04.
• The 95 percent prediction interval for a particular book with 800 pages is $89.61 ± $28.29. This means the limits are between $61.32 and $117.90.
• These results appear in the following Minitab and Excel outputs.
Example 1 revisited

Minitab output
Regression Analysis
The regression equation is
Price = 43.4 + 0.0578 No of Pages

Predictor      Coef      StDev     T      P
Constant       43.39     17.28     2.51   0.046
No of Pages    0.05778   0.02706   2.13   0.077

S = 9.944   R-Sq = 43.2%   R-Sq(adj) = 33.7%

Analysis of Variance
Source       DF   SS        MS       F      P
Regression    1   450.67    450.67   4.56   0.077
Error         6   593.33    98.89
Total         7   1044.00

Fit     StDev Fit   95.0% CI          95.0% PI
89.61   5.90        (75.17, 104.05)   (61.31, 117.91)

Example 1 revisited
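A sketch that rebuilds the Analysis of Variance table from the raw data, assuming 600 pages for the Biology text. It uses the identity SS Total = SS Regression + SS Error, R² = SS Regression / SS Total (the same .432 as r² earlier), and adjusted R² = 1 - (1 - R²)(n - 1)/(n - 2) for one independent variable. In simple regression F equals the square of the slope's t statistic (4.56 ≈ 2.135²).

```python
# Example 1 data (X = pages, Y = price); Biology text taken as 600 pages
pages = [500, 700, 800, 600, 400, 600, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]
n = len(pages)

x_bar = sum(pages) / n
y_bar = sum(price) / n
s_xx = sum((x - x_bar) ** 2 for x in pages)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(pages, price))
b = s_xy / s_xx
a = y_bar - b * x_bar

sst = sum((y - y_bar) ** 2 for y in price)                       # Total SS
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(pages, price))  # Error (residual) SS
ssr = sst - sse                                                  # Regression SS

df_reg, df_err = 1, n - 2
msr, mse = ssr / df_reg, sse / df_err
f_stat = msr / mse
r_sq = ssr / sst
adj_r_sq = 1 - (1 - r_sq) * (n - 1) / (n - 2)

print(f"{'Source':<11}{'DF':>3}{'SS':>10}{'MS':>9}{'F':>7}")
print(f"{'Regression':<11}{df_reg:>3}{ssr:>10.2f}{msr:>9.2f}{f_stat:>7.2f}")
print(f"{'Error':<11}{df_err:>3}{sse:>10.2f}{mse:>9.2f}")
print(f"{'Total':<11}{n - 1:>3}{sst:>10.2f}")
print(f"R-Sq = {r_sq:.1%}  R-Sq(adj) = {adj_r_sq:.1%}")
```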

Excel output
Regression Statistics
Multiple R           0.657
R Square             0.432
Adjusted R Square    0.337
Standard Error       9.944
Observations         8

ANOVA
             df   SS        MS       F           Significance F
Regression    1   450.67    450.67   4.5573034   0.0767
Residual      6   593.33    98.89
Total         7   1044

             Coefficients   Standard Error   t Stat   P-value
Intercept    43.3889        17.277           2.511    0.0458193
Page         0.0578         0.027            2.135    0.0767009

Example 1 revisited
