
PY206: Statistical Methods for Psychology Multiple Regression

Multiple Regression: Part I


• Howell: Chapter 15
• Last updated: 9/27/04

Slide 1 Luiz Pessoa, Brown University © 2004



Multiple regression
• In simple regression we have a single predictor:

  Ŷ = b0 + b1X1

• In multiple regression, several explanatory variables are employed simultaneously:

  Ŷ = b0 + b1X1 + b2X2 + … + bpXp
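In code, the multiple-regression prediction is just a weighted sum of the predictors plus an intercept. A minimal sketch (the helper name `predict` is ours, not from Howell):

```python
def predict(b0, coefs, xs):
    """Yhat = b0 + b1*x1 + b2*x2 + ... + bp*xp."""
    return b0 + sum(b * x for b, x in zip(coefs, xs))

# Simple regression is just the p = 1 special case:
yhat_simple = predict(1.0, [2.0], [3.0])            # 1 + 2*3 = 7
yhat_multi  = predict(1.0, [2.0, 0.5], [3.0, 4.0])  # 1 + 2*3 + 0.5*4 = 9
```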


Example
• Course evaluation: 5-point scale
  1. Overall quality of lectures (Overall)
  2. Teaching skills of the instructor (Teach)
  3. Quality of tests and exams (Exam)
  4. Instructor's knowledge of the subject matter (Knowledge)
  5. Student's expected grade (Grade; F = 1, A = 5)
  6. Enrollment (Enroll)

  - Consider Overall as the dependent measure and let's try to predict it based on the other five variables


Example
• Let's consider the pattern of pairwise correlations

           Overall   Teach    Exam     Knowl    Grade    Enroll
  Overall  1         0.804    0.596    0.682    0.301    -0.240
  Teach              1        0.720    0.526    0.469    -0.451
  Exam                        1        0.451    0.610    -0.558
  Knowl                                1        0.224    -0.128
  Grade                                         1        -0.337
  Enroll                                                 1

• Teach, Knowledge, and Exam have the highest correlations with Overall
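Each entry in such a table is an ordinary Pearson correlation. A pure-Python sketch (the function name `pearson_r` is ours, not from the course materials):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: cov(x, y) / (sd(x) * sd(y))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# A perfect linear relation gives r = 1; reversing the direction gives r = -1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

Applying this to every pair of columns reproduces the full correlation matrix above.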


Example
• Some variables have relatively high correlations with each other
  - Exam and Teach: r = 0.72
  - Exam and Grade: r = 0.61
  - Exam and Enrollment: r = -0.558

• Collinearity
  - When the other variables are included as predictors, Exam will have little to offer in the way of explaining variability in Overall


Example
• Fitted regression equation:

  Ŷ = -1.195 + 0.763·Teach + 0.132·Exam + 0.498·Knowledge - 0.184·Grade + 0.0005·Enroll

• If all other variables are held constant:
  - Predicted Y would be 0.763 higher for every one-unit increase in Teach
  - Predicted Y would be 0.0005 higher for every one-unit increase in Enroll

• Improving variable i by one unit does not necessarily improve Overall by bi units
  - Changing one variable will likely affect the other variables because of the pattern of intercorrelations


Example
• Common interpretation mistake: assuming the relative magnitudes of the bi provide an index of the relative importance of the individual predictors
  - "Teach is more important than Enroll because its b is larger"

• The relative magnitudes of the coefficients are in part a function of the standard deviations of the corresponding variables
  - Because the SD of Enroll is much larger than the SDs of the other variables, its regression coefficient is small regardless of the importance of that variable


Example
• Intuitive reason:
  - For one instructor to have a Teach rating one point higher is a major accomplishment (range: 2.2 to 4.5)
  - Having one additional student is a trivial matter (range: 7 to 800)


Standardized regression coefficients
• The question of relative importance has several answers…
• Suppose that before applying the multiple regression equation, we had standardized each of our variables
  - mean = 0 and SD = 1

• A one-unit difference on one variable would then be comparable to a one-unit difference on any other variable
• Solving the regression using standardized variables:

  ŶZ = 0.662Z1 + 0.106Z2 + 0.325Z3 - 0.105Z4 + 0.124Z5
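Rather than refitting on standardized data, the standardized coefficient can be obtained directly from the raw one: beta_j = b_j · SD(Xj) / SD(Y), a standard identity. A sketch under that assumption (function names are ours):

```python
from math import sqrt

def sd(v):
    """Sample standard deviation (denominator n - 1)."""
    m = sum(v) / len(v)
    return sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))

def standardized_coef(b, xj, y):
    """beta_j = b_j * SD(X_j) / SD(Y): the raw coefficient rescaled so
    that one unit means one standard deviation on both sides."""
    return b * sd(xj) / sd(y)

# With y exactly 2*x, the raw slope is 2 but the standardized slope is 1
# (a one-SD change in x corresponds to exactly one SD of y).
print(standardized_coef(2.0, [1, 2, 3], [2, 4, 6]))
```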


Example

  ŶZ = 0.662Z1 + 0.106Z2 + 0.325Z3 - 0.105Z4 + 0.124Z5

• Although the relative magnitude of the coefficients is not necessarily the best indicator of importance, it has a simple interpretation and provides a rough estimate of the relative contributions of the variables
  - Commonly used


Residual variance
• The predictor variables will not predict Y perfectly
• Error: residual variance or residual error

  s²0.12345 = Σ(Y - Ŷ)² / (N - p - 1) = MS_residual = MSE

  (the subscript 0.12345 denotes variable 0, Overall, predicted from variables 1-5)

• The goal is to minimize residual error
• The square root of MS_residual is sometimes called the standard error of estimate
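The formula above translates directly into code; a small sketch (function names are ours):

```python
def ms_residual(y, yhat, p):
    """MS_residual = sum((y - yhat)^2) / (N - p - 1),
    where p is the number of predictors."""
    n = len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    return ss_res / (n - p - 1)

def std_error_of_estimate(y, yhat, p):
    """Square root of MS_residual."""
    return ms_residual(y, yhat, p) ** 0.5

# Five cases, one predictor, every prediction off by exactly 1:
# SS_residual = 5, df = 5 - 1 - 1 = 3, so MS_residual = 5/3.
print(ms_residual([1, 2, 3, 4, 5], [0, 1, 2, 3, 4], 1))
```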


Residual variance
• Remember from simple regression that

  s²Y·X = Σ(Yi - Ŷi)² / (N - 2) = SS_residual / df

• In the case of multiple regression: df = N - p - 1 (where p is the number of regressors)


Multiple correlation coefficient
• The correlation coefficient in the context of multiple regression
  - Ex.: between Overall and the other predictors: 0.869
• What does this mean?
• R can be defined as the correlation between the criterion (Y) and the best linear combination of the predictors:

  R = r(Y, Ŷ),  where  Ŷ = b0 + b1X1 + b2X2 + … + bpXp


Multiple correlation coefficient

  R² = SSR / SSTO = 1 - SSE / SSTO

• R² can be interpreted in terms of the percentage of accountable variation
• R² = 0.755: we can say that 75.5% of the variation in Overall quality of the lectures can be predicted on the basis of the five predictors
  - 11% more than on the basis of Teach alone (the best predictor)



Partitioning of deviations
• Sum of squares that is related to regression (SSR):

  Σ(Ŷi - Ȳ)² = SS_Ŷ

• Sum of squares that is related to residual (error; SSE):

  Σ(Yi - Ŷi)² = SS_residual

• Sum of squares related to the total deviation (SSTO):

  Σ(Yi - Ȳ)² = SS_Y

Multiple correlation coefficient
• Adding more X variables to a regression model can only increase R², never reduce it, because SSE can never become larger with more X variables and SSTO is always the same
• It is therefore often suggested that R² be adjusted to take into account the number of X variables in the model: the adjusted coefficient of multiple determination ("Adjusted R Square")
• Adjust R² by dividing each sum of squares by its degrees of freedom:

  R²a = 1 - [SSE / (N - p - 1)] / [SSTO / (N - 1)]

• Not commonly reported in papers… (but probably should be)
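Since R² = 1 - SSE/SSTO, the ratio of sums of squares above simplifies to a formula that needs only R², N, and p. A sketch of that equivalent form (function name is ours):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R²: 1 - (SSE/(N-p-1)) / (SSTO/(N-1)), which simplifies
    to 1 - (1 - R²)(N - 1)/(N - p - 1) because R² = 1 - SSE/SSTO."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Slide example: R² = 0.755 with N = 50 cases and p = 5 predictors.
print(round(adjusted_r2(0.755, 50, 5), 3))  # 0.727
```

The adjustment always pulls R² down, and more so as p grows relative to N.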


Testing the significance of R²
• Does the set of variables taken together predict Y at better-than-chance levels?

• H0: R* = 0, where R* is the correlation in the population

  F = (N - p - 1)R² / [p(1 - R²)]

which is distributed as the standard F distribution on p and N - p - 1 degrees of freedom


F distribution
(figure: F distribution)

Example

  F = (50 - 5 - 1)(0.755) / [5(0.245)] = 44(0.755) / 1.225 = 27.118

• We can reject the null hypothesis
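The F statistic for R² is a one-liner to compute; a sketch (function name is ours):

```python
def f_for_r2(r2, n, p):
    """F = (N - p - 1) R² / (p (1 - R²)), on (p, N - p - 1) df."""
    return (n - p - 1) * r2 / (p * (1 - r2))

# Course-evaluation example: R² = 0.755, N = 50, p = 5.
print(round(f_for_r2(0.755, 50, 5), 2))
```

With R² = 0.755, N = 50, and p = 5 this gives F ≈ 27.12 on (5, 44) degrees of freedom, far beyond any conventional critical value.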


F test for regression relation
• Is there a regression relation between the response variable Y and the set of X variables?

  H0: b1 = b2 = … = bp = 0
  Ha: not all bi equal zero

  F = MSR / MSE

• MSR: mean square regression: SSR/df (df = p)
• MSE: mean square error: SSE/df (df = N - p - 1)

Partitioning of deviations
(figure: three scatterplots with the fitted line Ŷ = a + bX, showing for sample points Y1 and Y2 the total deviation Yi - Ȳ, the residual deviation Yi - Ŷi, and the regression deviation Ŷi - Ȳ)


Tests on regression coefficients
• If a coefficient is not significantly different from zero, the predictor serves no use
• To determine this we need to know the standard error of the bi
  - Remember that the standard error refers to the variability of the statistic over repeated sampling
• We can then form a t test on the regression coefficients:

  t = (bj - bj*) / s_bj

where bj* is the true value of the coefficient in the population and s_bj is the standard error of bj
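The t statistic itself is trivial to compute once the standard error is known; a sketch (function name is ours, values from the course-evaluation example):

```python
def t_for_coef(b, se_b, b_null=0.0):
    """t = (b_j - b_j*) / s_bj, evaluated on N - p - 1 df."""
    return (b - b_null) / se_b

# Teach in the course-evaluation example: b = 0.763, s_b = 0.133.
print(round(t_for_coef(0.763, 0.133), 2))
```

With the rounded slide values this gives t ≈ 5.74 on 44 df, comfortably significant.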

Tests on regression coefficients
• Typically we test H0: bj* = 0:

  t = bj / s_bj

• For Teach we have: t = 0.763/0.133 = 5.742
  - N - p - 1 = 50 - 5 - 1 = 44 df
  - We can reject the null hypothesis

• Exam is not significant: t = 0.8
  - Drop Exam from the model?
  - A test of a predictor is done in the context of all the other variables in the equation


Example
• Note that Exam is actually significantly correlated with Overall (r = 0.596)
• But it has nothing useful to contribute once the other variables are included


Sample sizes
• When there is no relation between the criterion and the predictors, the expected value of R² is not zero!
  - E(R²) = p/(N - 1)
• With 5 predictors and 51 cases: E(R²) = 0.1
• Large samples are needed (to push the "chance" R² toward 0)
• Power of a statistical test: 1 - β: the probability of rejecting the null hypothesis when it is in fact false and should be rejected
• Example: with 5 predictors, a population correlation of 0.3 would require N = 187 for 0.8 power
• A reasonable amount of power requires fairly large samples!


Partial correlation
• Coefficients in multiple regression are called partial regression coefficients
• Example: b1 is the coefficient for the regression of Y on X1 when we partial out the effects of X2, …, Xp: b01.23…p
  - The other variables are held constant
• Common mistake: equating b1 in the context of the other Xi with the simple regression coefficient obtained when ignoring the other Xi:

  b01.2 ≠ b01


Partial correlation
• Example (somewhat exaggerated)

  Y   2  1  4  3  6  5
  X1  1  2  3  4  5  6
  X2  2  2  4  4  6  6


Example
• For any fixed value of X2, the slope of the regression line of Y on X1 is negative (in fact b01.2 = -1)
• However, the regression of Y on X1 when ignoring X2 is positive (b01 = 0.829)

  Partialling out and ignoring are very different!

• When X1 and X2 are independent, the regression coefficients will be equal in both cases
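The contrast can be verified numerically on the toy data above. A self-contained sketch that fits both regressions by solving the normal equations X′Xb = X′y directly (all helper names are ours, not from the text):

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(y, rows):
    """Least-squares coefficients [b0, b1, ..., bp] via X'X b = X'y."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

Y  = [2, 1, 4, 3, 6, 5]
X1 = [1, 2, 3, 4, 5, 6]
X2 = [2, 2, 4, 4, 6, 6]

b_ignoring   = ols(Y, [[x] for x in X1])    # Y on X1 alone: slope ≈ 0.829
b_partialled = ols(Y, list(zip(X1, X2)))    # Y on X1 and X2: slope of X1 = -1
print(b_ignoring[1], b_partialled[1])
```

The same six data points yield a positive slope when X2 is ignored and a negative one when it is controlled, exactly the reversal the slide describes.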

Interpretation
• Suppose we regress Y on X2 and obtain the residual values Yr = Yi - Ŷi
• The residual values represent the part of Y that cannot be predicted by X2: they are independent of X2
• Now regress X1 on X2, generating X1r = X1i - X̂1i
• Again, the residual values represent the part of X1 that is independent of X2
• We now have two sets of residuals: the part of Y and the part of X1 that are independent of X2
  - We have partialled X2 out of Y and out of X1


Interpretation
• Now regress Yr on X1r: the regression coefficient will be the partial coefficient b01.2
• The correlation between Yr and X1r is the partial correlation of Y and X1, with X2 partialled out: r01.2

  b_YrX1r = cov(Yr, X1r) / s²_X1r = -1 = b01.2
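The two-step residual procedure translates directly into code. A pure-Python sketch (function names are ours), using the toy data from the earlier slide:

```python
from math import sqrt

def residuals(y, x):
    """Residuals of the simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - c) * (b - my) for (a, c), b in zip(((v, mx) for v in x), y))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def partial_r(y, x1, x2):
    """r_01.2: correlate the X2-free part of Y with the X2-free part of X1."""
    yr, x1r = residuals(y, x2), residuals(x1, x2)
    num = sum(a * b for a, b in zip(yr, x1r))
    den = sqrt(sum(a * a for a in yr) * sum(b * b for b in x1r))
    return num / den

Y  = [2, 1, 4, 3, 6, 5]
X1 = [1, 2, 3, 4, 5, 6]
X2 = [2, 2, 4, 4, 6, 6]
# Here Y is fit perfectly by X1 and X2, so the partial correlation is exactly -1.
print(partial_r(Y, X1, X2))
```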


Example
• Suppose an investigator established a significant correlation between earned income and success in college
• Should students study more??
• Both variables are probably linked to IQ
  - People with high IQ likely do well in college and earn more
  - The relationship between income and college success is potentially spurious
• How can this issue be addressed?
  - Partial out the effect of IQ


Example
• Compute the partial correlation between Income and Success with IQ partialled out:
  1. Regress Income on IQ and obtain the residuals
  2. Regress Success on IQ and obtain the residuals

• In each case, the residuals indicate the variation of Income and Success that cannot be attributed to IQ
• Question: can variation in Income-independent-of-IQ be predicted by variation in Success-independent-of-IQ?
• The correlation between the two residual variables is the partial correlation of Income and Success
  - Partialling out the effect of IQ

Another example
• Example: compute the partial correlation between Y and X1: r01.2

  r = cov(X1r, Yr) / (s_X1r · s_Yr) = 0.738
