
CHAPTER 6

Multiple Regression

1. The model
2. The Least Squares Method of Determining Sample Regression Coefficients
3. Using Matrix Algebra to Solve the System of Least Squares Normal Equations
4. Standard Error of Estimate
5. Sampling Properties of the Least Squares Estimators
5.1. The Variances and Covariances of the Least Squares Estimators
6. Inferences about the Population Regression Parameters
6.1. Interval Estimates for the Coefficients of the Regression
6.2. Interval Estimate for the Mean Value of y for Given Values of x
6.3. Prediction Interval for the Individual Value of y for Given Values of x
6.4. Interval Estimation for a Linear Combination of Coefficients
6.5. Hypothesis Testing for a Single Coefficient
6.5.1. Two-Tail Test of Significance
6.5.2. One-Tailed Tests
6.5.2.1. Example 1: Testing for Price Elasticity of Demand
6.5.2.2. Example 2: Testing for Effectiveness of Advertising
6.5.2.3. Example 3: Testing for a Linear Combination of Coefficients
7. Extension of the Regression Model
7.1. The Optimal Level of Advertising
7.2. Interaction Variables
7.3. Log-Linear Model
8. Measuring Goodness of Fit
8.1. Adjusted R²

1. The model
In multiple regression models we may include two or more independent variables. We will start with the model that includes two independent variables, x₂ and x₃. The main features of, and assumptions regarding, the simple regression model extend to multiple regression models. Here the dependent variable is influenced by the variations in the independent variables through the three parameters β₁, β₂, and β₃. The disturbance term u is still present, because the variations in y continue to be influenced by the random component u. Thus,

y = β₁ + β₂x₂ + β₃x₃ + u

Given that E(u) = 0, the following holds (dropping the subscript i),

E(y) = β₁ + β₂x₂ + β₃x₃

The sample regression equation is

y = b₁ + b₂x₂ + b₃x₃ + e

To estimate the parameters from the sample data we need formulas to determine values for the estimators of the population parameters. The estimators are the sample regression coefficients b₁, b₂, and b₃.



2. The Least Squares Method of Determining Sample Regression Coefficients
We want to find the values for b₁, b₂, and b₃ such that the sum of the squared deviations of the observed y from the fitted plane is minimized. The deviation takes the same familiar form as in the simple regression model,

e = y − ŷ

where ŷ is the predicted value, which lies on the regression plane:

ŷ = b₁ + b₂x₂ + b₃x₃

Substituting for ŷ in the residual equation above, squaring both sides, and then summing over all observations, we have,

e = y − b₁ − b₂x₂ − b₃x₃

Σe² = Σ(y − b₁ − b₂x₂ − b₃x₃)²

Taking three partial derivatives, one for each coefficient, and setting them equal to zero, we obtain three normal equations.

∂Σe²/∂b₁ = −2Σ(y − b₁ − b₂x₂ − b₃x₃) = 0

∂Σe²/∂b₂ = −2Σx₂(y − b₁ − b₂x₂ − b₃x₃) = 0

∂Σe²/∂b₃ = −2Σx₃(y − b₁ − b₂x₂ − b₃x₃) = 0

The normal equations are:

Σy − nb₁ − b₂Σx₂ − b₃Σx₃ = 0

Σx₂y − b₁Σx₂ − b₂Σx₂² − b₃Σx₂x₃ = 0

Σx₃y − b₁Σx₃ − b₂Σx₂x₃ − b₃Σx₃² = 0

To find the solutions for the three b's, we can rewrite the normal equations by taking the terms not involving the b's to the right-hand side:

nb₁ + (Σx₂)b₂ + (Σx₃)b₃ = Σy

(Σx₂)b₁ + (Σx₂²)b₂ + (Σx₂x₃)b₃ = Σx₂y

(Σx₃)b₁ + (Σx₂x₃)b₂ + (Σx₃²)b₃ = Σx₃y

Here we have a system of three equations with three unknowns, b₁, b₂, and b₃. Now we need a method to find the solution for these unknowns. For this, a brief introduction to matrix algebra is called for. (See the appendix at the end of this chapter.)



3. Using Matrix Algebra to Solve the System of Least Squares Normal Equations

We can write the system of normal equations in matrix format:

[ n     Σx₂     Σx₃   ] [b₁]   [ Σy   ]
[ Σx₂   Σx₂²    Σx₂x₃ ] [b₂] = [ Σx₂y ]
[ Σx₃   Σx₂x₃   Σx₃²  ] [b₃]   [ Σx₃y ]

with the following shorthand notation:

Xb = d

where,

    [ n     Σx₂     Σx₃   ]        [b₁]        [ Σy   ]
X = [ Σx₂   Σx₂²    Σx₂x₃ ]    b = [b₂]    d = [ Σx₂y ]
    [ Σx₃   Σx₂x₃   Σx₃²  ]        [b₃]        [ Σx₃y ]

Solutions for b are obtained by premultiplying d by the inverse of X: b = X⁻¹d.

Example:
We want to obtain a regression of sales (y), in $1,000's, on price (x₂), in dollars, and advertising expenditure (x₃), in $1,000's, for a fast food restaurant. The data are contained in the Excel file CH6 DATA in the worksheet tab burger.

Use Excel to compute the values for the elements of the matrix X and the vector d:

[ n     Σx₂     Σx₃   ]   [ 75      426.5      138.3   ]
[ Σx₂   Σx₂²    Σx₂x₃ ] = [ 426.5   2445.707   787.381 ]
[ Σx₃   Σx₂x₃   Σx₃²  ]   [ 138.3   787.381    306.21  ]

[ Σy   ]   [ 5803.1  ]
[ Σx₂y ] = [ 32847.7 ]
[ Σx₃y ]   [ 10789.6 ]

Thus, our system of normal equations can be written as:

[ 75      426.5      138.3   ] [b₁]   [ 5803.1  ]
[ 426.5   2445.707   787.381 ] [b₂] = [ 32847.7 ]
[ 138.3   787.381    306.21  ] [b₃]   [ 10789.6 ]

Since b = X⁻¹d, we need to find X⁻¹. Now that we know what the inverse of a matrix means and how to find it, we can have Excel do the hard work for us. In Excel use the array type function =MINVERSE(). First highlight a block of 3 × 3 cells, then call up the function =MINVERSE(). When it asks for the array, lasso the block of cells containing the elements of the X matrix and then press the Ctrl-Shift-Enter keys together. The result is the following:



 1.689828  -0.28462  -0.03135
-0.28462    0.050314 -0.00083
-0.03135   -0.00083   0.019551

which is the matrix X⁻¹. Then, premultiplying d by X⁻¹, we have

[b₁]   [ 1.689828  -0.28462  -0.03135] [ 5803.1 ]   [ 118.9136 ]
[b₂] = [-0.28462    0.050314 -0.00083] [ 32847.7] = [  -7.90785]
[b₃]   [-0.03135   -0.00083   0.019551] [ 10789.6]   [  1.862584]

ŷ = 118.9136 − 7.90785x₂ + 1.862584x₃

ŷ = 118.914 − 7.908x₂ + 1.863x₃

b₂ = −7.908 implies that for each dollar increase in price (advertising held constant) monthly sales would fall by $7,908; equivalently, a 10¢ increase in price would result in a decrease in monthly sales of $790.80. b₃ = 1.863 implies that (holding price constant) for each additional $1,000 of advertising, sales would increase by $1,863.
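The same arithmetic can be verified outside Excel. Below is a minimal NumPy sketch (assuming NumPy is available; the variable names are ours) that solves the normal equations using the cross-product sums computed above:

```python
import numpy as np

# Cross-product matrix X and vector d for the burger data (n = 75), from the text
X = np.array([[75.0,   426.5,    138.3],
              [426.5,  2445.707, 787.381],
              [138.3,  787.381,  306.21]])
d = np.array([5803.1, 32847.7, 10789.6])

# b = X^{-1} d; np.linalg.solve is the numerically safer equivalent
b = np.linalg.solve(X, d)
print(b)   # approximately [118.9136, -7.9079, 1.8626]
```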

The following is the Excel regression output showing the estimated coefficients.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.669521
R Square 0.448258
Adjusted R Square 0.432932
Standard Error 4.886124
Observations 75

ANOVA
df SS MS F Significance F
Regression 2 1396.5389 698.26946 29.247859 5.04086E-10
Residual 72 1718.9429 23.874207
Total 74 3115.4819

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept 118.91361 6.35164 18.72172 0.00000 106.25185 131.57537
PRICE -7.90785 1.09599 -7.21524 0.00000 -10.09268 -5.72303
ADVERT 1.86258 0.68320 2.72628 0.00804 0.50066 3.22451

4. Standard Error of Estimate


In simple regression we learned that the disturbance term u for each given x is normally distributed about the population regression line, with an expected value E(u) = 0 and variance var(u). The unbiased estimator of var(u), var(e), was obtained by using the least squares residuals e = y − ŷ:

var(e) = Σe²/(n − 2) = Σ(y − ŷ)²/(n − 2)

The square root of the estimated variance was called the standard error of estimate, se(e). Here, with two independent variables, the least squares residuals take the same form:

e = y − ŷ = y − (b₁ + b₂x₂ + b₃x₃)



Thus the estimated variance of the disturbance or error term is:

var(e) = Σe²/(n − k) = Σ(y − ŷ)²/(n − k)

where k is the number of parameters of the regression being estimated. Here k = 3, for estimating β₁, β₂, and β₃. Given the estimated regression equation in the last example,

ŷ = 118.9136 − 7.90785x₂ + 1.862584x₃

the predicted value of sales (ŷ), for price x₂ = $6.20 and advertising x₃ = $3.0 (observation #10), is:

ŷ = 118.9136 − 7.90785(6.2) + 1.862584(3) = $75.473 (thousand)

The observed value of sales is y₁₀ = $76.4. Thus, the residual is e = 76.4 − 75.473 = 0.927. Computing such residuals for all 75 observations and finding the sum of squared residuals, the estimated variance and the standard error of estimate are, respectively,

var(e) = Σ(y − ŷ)²/(n − k) = 1718.943/72 = 23.874

se(e) = √23.874 = 4.886
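These figures can be sketched in a few lines (the numbers come from the text; the helper names are ours):

```python
import numpy as np

b = np.array([118.9136, -7.90785, 1.862584])
x10 = np.array([1.0, 6.2, 3.0])     # observation #10: price = 6.20, advert = 3.0
y10 = 76.4

y_hat = b @ x10                     # predicted sales, about 75.473
e10 = y10 - y_hat                   # residual, about 0.927

sse, n, k = 1718.943, 75, 3         # sum of squared residuals from the ANOVA table
var_e = sse / (n - k)               # 23.874
se_e = np.sqrt(var_e)               # 4.886
```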

5. Sampling Properties of the Least Squares Estimators


As in simple regression, the least squares estimators, the coefficients of the estimated regression, are sample statistics obtained from a random sample. Thus they are all random variables, each with its own expected value and variance. Again, like the simple regression coefficients, these estimators are BLUE (best linear unbiased estimators):

b₁, b₂, and b₃ are each a linear function of the dependent variable y.

The coefficients are unbiased estimators: E(b₁) = β₁, E(b₂) = β₂, and E(b₃) = β₃.

var(b₁), var(b₂), and var(b₃) are all minimum variances.

Once again, the linearity assumption is important for inferences about the parameters of the regression. If the disturbance terms u, and correspondingly the y values, are normally distributed about the regression plane, then the coefficients, which are linear functions of y, are also normally distributed. If the disturbance terms are not normal, then according to the central limit theorem the coefficients will be approximately normal for large samples.

5.1. The Variances and Covariances of the Least Squares Estimators


Since the least squares estimators are random variables, their variances show how closely or widely dispersed they tend to scatter around their respective population parameters. In a multiple regression model with three parameters β₁, β₂, and β₃, the variance of each estimator b₁, b₂, and b₃ is estimated from the sample data. We can determine the variances and covariances of the least squares estimators using elementary matrix algebra. Using the inverse matrix X⁻¹ determined above, we can generate a covariance matrix as follows:



[ var(b₁)        covar(b₁,b₂)   covar(b₁,b₃) ]
[ covar(b₁,b₂)   var(b₂)        covar(b₂,b₃) ] = var(e)·X⁻¹
[ covar(b₁,b₃)   covar(b₂,b₃)   var(b₃)      ]

                       [ 1.6898  -0.2846  -0.0313]   [ 40.3433  -6.7951  -0.7484]
var(e)·X⁻¹ = 23.87421  [-0.2846   0.0503  -0.0008] = [ -6.7951   1.2012  -0.0197]
                       [-0.0313  -0.0008   0.0196]   [ -0.7484  -0.0197   0.4668]

From the covariance matrix on the extreme right-hand side we have the following variances and covariances:

var(b₁) = 40.3433    covar(b₁,b₂) = −6.7951
var(b₂) = 1.2012     covar(b₁,b₃) = −0.7484
var(b₃) = 0.4668     covar(b₂,b₃) = −0.0197
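In NumPy the whole covariance matrix comes out of one line (a sketch under the same assumptions as before):

```python
import numpy as np

X = np.array([[75.0,   426.5,    138.3],
              [426.5,  2445.707, 787.381],
              [138.3,  787.381,  306.21]])
var_e = 23.874207                    # SSE/(n - k) from the regression above

cov_b = var_e * np.linalg.inv(X)     # variance-covariance matrix of b1, b2, b3
se_b = np.sqrt(np.diag(cov_b))       # standard errors, about [6.352, 1.096, 0.683]
```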

6. Inferences about the Population Regression Parameters


To build an interval estimate or perform a hypothesis test about the β's in the population regression, the least squares estimators of these parameters must be normally distributed. Normality is established if the disturbance terms u, and correspondingly the y values for the various values of the explanatory variables, are normally distributed. And, as long as the least squares estimators are linear functions of the y values, the estimators are also normally distributed.

6.1. Interval Estimates for the Coefficients of the Regression


Following the same reasoning as in simple regression, the interval estimate for each parameter takes the following form, for a given confidence level 1 − α:

For β₁:  L, U = b₁ ± t(α/2, n−k) se(b₁)

For β₂:  L, U = b₂ ± t(α/2, n−k) se(b₂)

For β₃:  L, U = b₃ ± t(α/2, n−k) se(b₃)

The standard error for each regression coefficient can be obtained from the variance-covariance matrix. The 95% interval estimates for the slope coefficients in the above example, based on the following information, are given below.

b₁ = 118.914    se(b₁) = 6.352
b₂ = −7.908     se(b₂) = 1.096
b₃ = 1.863      se(b₃) = 0.683
t(0.025, 72) = 1.993

The 95% interval estimate for the response of sales to a change in price, β₂, is

L, U = b₂ ± t(0.025, n−3) se(b₂)

L, U = −7.908 ± (1.993)(1.096) = [−10.093, −5.723]



This implies that a $1 decrease in price will result in an increase in revenue of somewhere between $5,723 and $10,093.

The 95% interval estimate for the response of sales to a change in advertising, β₃, is

L, U = b₃ ± t(0.025, n−3) se(b₃)

L, U = 1.863 ± (1.993)(0.683) = [0.501, 3.225]

An increase in advertising expenditure of $1,000 would increase sales by between $501 and $3,225. This is a relatively wide interval and, therefore, does not convey good information about the range of responses of sales to advertising. The wide interval arises from the fact that the sampling variability of the b₃ coefficient is large. One way to reduce this variability is to increase the sample size, but in many cases this is impractical because of limitations on data availability. An alternative is to introduce some kind of non-sample information on the coefficient, to be explained later.
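If SciPy is available, the critical value and both interval estimates can be reproduced as follows (a sketch; the coefficient values are taken from the regression output):

```python
from scipy import stats

n, k = 75, 3
t_crit = stats.t.ppf(0.975, df=n - k)        # about 1.993

for name, b_j, se_j in [("price", -7.90785, 1.09599),
                        ("advert", 1.86258, 0.68320)]:
    lo, hi = b_j - t_crit * se_j, b_j + t_crit * se_j
    print(f"{name}: [{lo:.3f}, {hi:.3f}]")   # [-10.093, -5.723] and [0.501, 3.225]
```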

6.2. Interval Estimate for the Mean Value of y for Given Values of x

The mean value of y for given values of the independent variables is ŷ₀, the predicted value of y for the given values x₀₂, x₀₃, .... The confidence interval for the mean value of y is therefore,

L, U = ŷ₀ ± t(α/2, n−k) se(ŷ₀)

To determine se(ŷ₀), start with var(ŷ₀). For a model with two independent variables,

ŷ₀ = b₁ + b₂x₀₂ + b₃x₀₃

var(ŷ₀) = var(b₁ + b₂x₀₂ + b₃x₀₃)

var(ŷ₀) = var(b₁) + x₀₂² var(b₂) + x₀₃² var(b₃) + 2x₀₂ covar(b₁,b₂) + 2x₀₃ covar(b₁,b₃) + 2x₀₂x₀₃ covar(b₂,b₃)

To build a 95% interval for the mean value of y in the model under consideration,

ŷ = 118.914 − 7.908x₂ + 1.863x₃

let x₀₂ = price = 5 and x₀₃ = advert = 3. Then ŷ₀ = 84.96:

ŷ₀ = 118.914 − 7.908(5) + 1.863(3) = 84.96

From the covariance matrix,

[ 40.3433  -6.7951  -0.7484]
[ -6.7951   1.2012  -0.0197]
[ -0.7484  -0.0197   0.4668]

we have,

var(ŷ₀) = 40.3433 + (5²)(1.2012) + (3²)(0.4668) + 2(5)(−6.7951) + 2(3)(−0.7484) + 2(5)(3)(−0.0197)

var(ŷ₀) = 1.5407

se(ŷ₀) = √1.5407 = 1.241



L, U = 84.96 ± (1.993)(1.241) = [82.49, 87.44]
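The expanded variance formula above is just the quadratic form x₀′·cov(b)·x₀, which is how one would compute it in practice (a sketch using the covariance matrix estimated above):

```python
import numpy as np

cov_b = np.array([[40.3433, -6.7951, -0.7484],
                  [-6.7951,  1.2012, -0.0197],
                  [-0.7484, -0.0197,  0.4668]])
b = np.array([118.914, -7.908, 1.863])
x0 = np.array([1.0, 5.0, 3.0])         # intercept term, price = 5, advert = 3

y0_hat = b @ x0                        # about 84.96
var_y0 = x0 @ cov_b @ x0               # quadratic form, about 1.54
se_y0 = np.sqrt(var_y0)                # about 1.24
lo, hi = y0_hat - 1.993 * se_y0, y0_hat + 1.993 * se_y0   # [82.49, 87.44]
```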

6.3. Prediction Interval for the Individual Value of y for Given Values of x

For the given values x₀, the difference between an individual value y₀ and the mean value ŷ₀ is the error term e₀,

y₀ − ŷ₀ = e₀

Therefore,

var(e₀) = var(e) + var(ŷ₀)

In the above example,

var(e₀) = 23.8742 + 1.5407 = 25.4149

se(e₀) = √25.4149 = 5.0413

L, U = 84.96 ± (1.993)(5.0413) = [74.91, 95.01]

6.4. Interval Estimation for a Linear Combination of Coefficients


In the above example, suppose we want to increase advertising expenditure by $800 and reduce the price by $0.40. Build a 95% interval estimate for the resulting change in mean (expected) sales. The change in mean sales is:

λ = E(y₁) − E(y₀)

λ = [β₁ + β₂(x₀₂ − 0.4) + β₃(x₀₃ + 0.8)] − [β₁ + β₂x₀₂ + β₃x₀₃]

λ = −0.4β₂ + 0.8β₃

The interval estimate is:

L, U = λ̂ ± t(0.025, n−k) se(λ̂)

L, U = (−0.4b₂ + 0.8b₃) ± t(0.025, n−k) se(−0.4b₂ + 0.8b₃)

var(−0.4b₂ + 0.8b₃) = (−0.4)² var(b₂) + (0.8)² var(b₃) + 2(−0.4)(0.8) covar(b₂,b₃)

var(−0.4b₂ + 0.8b₃) = (−0.4)²(1.2012) + (0.8)²(0.4668) + 2(−0.4)(0.8)(−0.0197)

var(−0.4b₂ + 0.8b₃) = 0.5036

se(−0.4b₂ + 0.8b₃) = 0.7096

λ̂ = −0.4b₂ + 0.8b₃ = −0.4(−7.908) + 0.8(1.8626) = 4.653

L, U = 4.653 ± 1.993 × 0.7096 = [3.239, 6.068]

We are 95% confident the mean increase in sales will be between $3,239 and $6,068.
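The same quadratic-form shortcut works for the variance of any linear combination of coefficients (a sketch using the estimates above):

```python
import numpy as np

b23 = np.array([-7.908, 1.8626])            # b2, b3
cov23 = np.array([[1.2012, -0.0197],
                  [-0.0197, 0.4668]])
w = np.array([-0.4, 0.8])                   # weights in -0.4*b2 + 0.8*b3

lam_hat = w @ b23                           # about 4.653
se_lam = np.sqrt(w @ cov23 @ w)             # about 0.7096
lo, hi = lam_hat - 1.993 * se_lam, lam_hat + 1.993 * se_lam   # [3.239, 6.068]
```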



6.5. Hypothesis Testing for a Single Coefficient

6.5.1. Two-Tail Test of Significance

Generally, a computer output such as the Excel regression output provides the test statistic for the test of the null hypothesis H₀: βⱼ = 0 against H₁: βⱼ ≠ 0. If the null hypothesis is not rejected, then we conclude that the independent variable xⱼ does not influence y. The test statistic for such a test is

t = bⱼ / se(bⱼ)

And, since this is a two-tail test, the critical value for a level of significance α is t_c = t(α/2, n−k). For example,

H₀: β₂ = 0    H₁: β₂ ≠ 0

|t| = |−7.908/1.096| = 7.215, and t_c = t(0.025, 72) = 1.993. Since |t| > t_c, we reject the null hypothesis and conclude that revenue is related to price. Also note that the probability value for the test statistic is 2·P(t > 7.215) ≈ 0. Since this is less than α = 0.05, we reject the null. The test for β₃ is as follows:

H₀: β₃ = 0    H₁: β₃ ≠ 0

t = 1.863/0.683 = 2.726 and t_c = 1.993. Since t > t_c, we reject the null hypothesis and conclude that revenue is related to advertising expenditure. The probability value for the test statistic is 2·P(t > 2.726) = 0.0080.

All these figures are presented in the Excel regression output shown above.
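The t statistics and p-values in the output can be reproduced with SciPy (a sketch; df = n − k = 72):

```python
from scipy import stats

df = 72
for name, b_j, se_j in [("price", -7.90785, 1.09599),
                        ("advert", 1.86258, 0.68320)]:
    t = b_j / se_j
    p = 2 * stats.t.sf(abs(t), df)               # two-tail p-value
    print(f"{name}: t = {t:.3f}, p = {p:.4f}")   # -7.215 (p ~ 0) and 2.726 (p = 0.0080)
```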

6.5.2. One-Tailed Tests

We will use the current example to illustrate one-tailed hypothesis tests in regression.

6.5.2.1. Example 1: Testing for Price Elasticity of Demand

Suppose we are interested in testing the hypothesis that demand is price inelastic against the alternative hypothesis that demand is price elastic. According to the total-revenue test of elasticity, if demand is inelastic, then a decrease in price leads to a decrease in revenue; if demand is elastic, a decrease in price leads to an increase in revenue.

H₀: β₂ ≥ 0 (When demand is price inelastic, price and revenue change in the same direction.)

H₁: β₂ < 0 (When demand is price elastic, price and revenue change in opposite directions.)

Note that here we are giving the benefit of the doubt to inelasticity. If demand is elastic, we want strong evidence of it. Also note that the regression result already provides some evidence that demand is elastic (b₂ = −7.908 < 0). But is this evidence significant?

The test statistic is



t = (b₂ − (β₂)₀) / se(b₂)

Since by the null hypothesis (β₂)₀ is equal to (or greater than) zero, the test statistic simplifies to t = b₂/se(b₂). However, since this is a one-tailed (left-tail) test, the critical value is −t_c = −t(α, n−k).

t = −7.908/1.096 = −7.215 < −t_c = −t(0.05, 72) = −1.666

This leads us to reject the null hypothesis and conclude that there is strong evidence that β₂ is negative, and hence demand is elastic.

6.5.2.2. Example 2: Testing for Effectiveness of Advertising

We can also test for the effectiveness of advertising. Does an increase in advertising expenditure bring an increase in total revenue greater than the amount spent on advertising? That is, is β₃ > 1? Again, the sample regression provides some evidence: b₃ = 1.863 > 1. But we want to determine whether this evidence is significant. Thus,

H₀: β₃ ≤ 1
H₁: β₃ > 1

The test statistic is,

t = (b₃ − (β₃)₀) / se(b₃) = (1.863 − 1)/0.683 = 1.263

Since t = 1.263 < t_c = t(0.05, 72) = 1.666, do not reject the null hypothesis. The p-value for the test is P(t > 1.263) = 0.1053, which exceeds α = 0.05. The test does not establish that advertising is effective; that is, b₃ is not significantly greater than 1.

6.5.2.3. Example 3: Testing for a Linear Combination of Coefficients

Test the hypothesis that dropping the price by $0.20 will be more effective for increasing sales revenue than increasing advertising expenditure by $500; that is, −0.2β₂ > 0.5β₃.

Note that the regression estimates provide some evidence: −0.2b₂ = 1.582 > 0.5b₃ = 0.931. The test is to determine whether this difference is significant. The null and alternative hypotheses are:

H₀: −0.2β₂ − 0.5β₃ ≤ 0
H₁: −0.2β₂ − 0.5β₃ > 0

The test statistic is,

t = [−0.2b₂ − 0.5b₃ − (−0.2β₂ − 0.5β₃)₀] / se(−0.2b₂ − 0.5b₃)

t = (−0.2b₂ − 0.5b₃) / se(−0.2b₂ − 0.5b₃)

The problem here is to find the standard error of the linear combination of the two coefficients in the denominator. For that, first determine var(−0.2b₂ − 0.5b₃):

var(−0.2b₂ − 0.5b₃) = (−0.2)² var(b₂) + (−0.5)² var(b₃) + 2(−0.2)(−0.5) covar(b₂,b₃)



From the covariance matrix above, we obtain the following:

var(−0.2b₂ − 0.5b₃) = (−0.2)²(1.2012) + (−0.5)²(0.4668) + 2(−0.2)(−0.5)(−0.0197)

var(−0.2b₂ − 0.5b₃) = 0.1608

se(−0.2b₂ − 0.5b₃) = 0.4010

Then,

t = [(−0.2)(−7.9079) − (0.5)(1.8626)] / 0.4010 = 1.622

t_c = t(0.05, 72) = 1.666

Since t = 1.622 < t_c = 1.666, do not reject the null hypothesis. The p-value for the test is P(t > 1.622) = 0.055, which exceeds α = 0.05. The test does not establish that −0.2β₂ > 0.5β₃.
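All three one-tailed tests follow the same pattern; for example, this linear-combination test can be written compactly as follows (a sketch using the covariance estimates above):

```python
import numpy as np
from scipy import stats

b23 = np.array([-7.9079, 1.8626])           # b2, b3
cov23 = np.array([[1.2012, -0.0197],
                  [-0.0197, 0.4668]])
w = np.array([-0.2, -0.5])                  # weights in -0.2*b2 - 0.5*b3

t = (w @ b23) / np.sqrt(w @ cov23 @ w)      # about 1.622
p = stats.t.sf(t, df=72)                    # one-tail p-value, about 0.055
```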

7. Extension of the Regression Model


We can extend a regression model by transforming the existing independent variables and treating the transformed terms as new variables. Let's use the current sales-price-advertising model. We want to address the fact that sales do not rise indefinitely, and at a constant rate, in response to increases in advertising expenditure. As expenditure on advertising rises, revenues may rise at a decreasing (rather than constant) rate, implying diminishing returns to advertising expenditure. One way to take into account the impact of diminishing returns to advertising is to include the squared value of advertising, x₃², in the model.

y = β₁ + β₂x₂ + β₃x₃ + β₄x₃² + u

E(y) = β₁ + β₂x₂ + β₃x₃ + β₄x₃²

Thus,

∂E(y)/∂x₃ = β₃ + 2β₄x₃

We expect revenues to increase with each additional unit increase in advertising expenditure; therefore, β₃ > 0. We also expect the rate of increase in revenues to decrease with each additional unit increase in advertising; therefore, β₄ < 0.

Having pointed out the characteristics of the extended model, we can treat x₃² as a new variable x₄. Using the same data, the Excel regression output is shown below.



SUMMARY OUTPUT
Regression Statistics
Multiple R 0.7129061
R Square 0.5082352
Adjusted R Square 0.4874564
Standard Error 4.645283
Observations 75

ANOVA
df SS MS F Significance F
Regression 3 1583.397408 527.79914 24.459316 5.59996E-11
Residual 71 1532.084459 21.578654
Total 74 3115.481867

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%


Intercept 109.719 6.799 16.137 1.87E-25 96.162 123.276
PRICE -7.640 1.046 -7.304 3.236E-10 -9.726 -5.554
ADVERT 12.151 3.556 3.417 0.0010516 5.060 19.242
ADVERT² -2.768 0.941 -2.943 0.0043927 -4.644 -0.892

From the regression table, the coefficient of x₃² is b₄ = −2.768. It has the anticipated sign, and it is also significantly different from zero (p-value = 0.0044 < 0.05). There are diminishing returns to advertising.

To interpret the roles of the coefficients b₃ = 12.151 and b₄ = −2.768, consider the following table, where ŷ (sales) is computed for a fixed value of x₂ (price) = $5 and for various values of x₃ (advertising). Starting from (x₃)₀ = $1.8 and increasing advertising by an increment of Δx₃ = 0.1 to x₃ = 1.9, predicted sales increase by Δŷ = $0.19. As advertising expenditure is increased by the same increment again, the increment in predicted sales falls to 0.14 and then to 0.08.

x₃     ŷ       Δŷ
1.8    84.42
1.9    84.61   0.19
2.0    84.75   0.14
2.1    84.83   0.08
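The diminishing increments in the table can be generated directly from the fitted quadratic (a sketch; the coefficients come from the output above):

```python
import numpy as np

b = np.array([109.719, -7.640, 12.151, -2.768])   # intercept, price, advert, advert^2
price = 5.0

prev = None
for advert in [1.8, 1.9, 2.0, 2.1]:
    y_hat = b @ np.array([1.0, price, advert, advert**2])
    delta = "" if prev is None else f"  delta = {y_hat - prev:.2f}"
    print(f"advert = {advert:.1f}  sales = {y_hat:.2f}{delta}")
    prev = y_hat
```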

7.1. The Optimal Level of Advertising

What is the optimal level of advertising? Optimality in economics always requires that the marginal benefit of an action equal its marginal cost. If marginal benefit exceeds marginal cost, the action should be expanded. If marginal benefit is less than marginal cost, the action should be curtailed. The optimum is, therefore, where the two are equal.

The marginal benefit of advertising is the contribution of each additional one thousand dollars of advertising expenditure to total revenue. From the model,

y = β₁ + β₂x₂ + β₃x₃ + β₄x₃² + u

the marginal benefit of advertising is:



∂E(y)/∂x₃ = β₃ + 2β₄x₃

Ignoring the cost of producing the additional sales, the marginal cost is each additional $1 (thousand) spent on advertising. Thus, the optimality requirement MB = MC is:

β₃ + 2β₄x₃ = 1

Using the estimated least squares coefficients, we thus have:

12.151 + 2(−2.768)x₃ = 1

yielding x₃ = $2.014 thousand.

We want to build an interval estimate for the optimal level of advertising:

x₃ = (1 − b₃)/(2b₄)

Note that this sample statistic is a nonlinear combination of the two coefficients b₃ and b₄. Therefore, we cannot use the formula for the variance of a linear combination of two coefficients. The (approximate) variance of a nonlinear combination of two random variables is obtained using the delta method. Let

λ̂ = (1 − b₃)/(2b₄)

Then,

var(λ̂) = (∂λ̂/∂b₃)² var(b₃) + (∂λ̂/∂b₄)² var(b₄) + 2(∂λ̂/∂b₃)(∂λ̂/∂b₄) covar(b₃,b₄)

Using partial derivatives,

∂λ̂/∂b₃ = −1/(2b₄)    and    ∂λ̂/∂b₄ = −(1 − b₃)/(2b₄²)

Hence,

var(λ̂) = [−1/(2b₄)]² var(b₃) + [−(1 − b₃)/(2b₄²)]² var(b₄) + 2[−1/(2b₄)][−(1 − b₃)/(2b₄²)] covar(b₃,b₄)

We can obtain var(b₃) and var(b₄) by simply squaring the standard errors from the regression output. Unfortunately, the Excel regression output does not provide the covariance value. We can, however, still use Excel to compute covar(b₃,b₄). Recall that using matrix algebra we can determine the variance-covariance matrix var(e)·X⁻¹. Adding x₃² to the model, we now have three independent variables. This poses no problem, because the X matrix can be expanded to incorporate any number of independent variables. For a three-variable model we have,



    [ n     Σx₂     Σx₃     Σx₄   ]
X = [ Σx₂   Σx₂²    Σx₂x₃   Σx₂x₄ ]
    [ Σx₃   Σx₂x₃   Σx₃²    Σx₃x₄ ]
    [ Σx₄   Σx₂x₄   Σx₃x₄   Σx₄²  ]

Using Excel we can compute these quantities as the elements of the X matrix:

75.0 426.5 138.3 306.2


426.5 2445.7 787.4 1746.5
138.3 787.4 306.2 755.0
306.2 1746.5 755.0 1982.6

Then determine the inverse matrix X⁻¹:

2.1423 -0.2978 -0.5376 0.1362


-0.2978 0.0507 0.0139 -0.0040
-0.5376 0.0139 0.5861 -0.1524
0.1362 -0.0040 -0.1524 0.0410

and find the variance-covariance matrix by finding the product

var(e)·X⁻¹ = 21.579 × X⁻¹ =

46.227 -6.426 -11.601 2.939


-6.426 1.094 0.300 -0.086
-11.601 0.300 12.646 -3.289
2.939 -0.086 -3.289 0.885

Thus,

var(λ̂) = [−1/(2(−2.768))]²(12.646) + [−(1 − 12.151)/(2(−2.768)²)]²(0.885) + 2[−1/(2(−2.768))][−(1 − 12.151)/(2(−2.768)²)](−3.289)

var(λ̂) = (0.18064)²(12.646) + (0.72773)²(0.885) + 2(0.18064)(0.72773)(−3.28875) = 0.01657

se(λ̂) = √0.01657 = 0.12872

The 95% confidence interval for the optimal level of advertising is then,

L, U = λ̂ ± t(α/2, n−4) se(λ̂)

L, U = 2.014 ± (1.994)(0.12872) = ($1.757, $2.271)
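The delta-method variance is again a quadratic form, this time in the gradient of λ̂ (a sketch using the variance and covariance estimates above):

```python
import numpy as np

b3, b4 = 12.151, -2.768
var_b3, var_b4, cov_b34 = 12.646, 0.885, -3.289

lam_hat = (1 - b3) / (2 * b4)                  # optimal advertising, about 2.014
g = np.array([-1 / (2 * b4),                   # d(lambda)/d(b3)
              -(1 - b3) / (2 * b4**2)])        # d(lambda)/d(b4)
cov = np.array([[var_b3, cov_b34],
                [cov_b34, var_b4]])

var_lam = g @ cov @ g                          # about 0.0166
se_lam = np.sqrt(var_lam)                      # about 0.129
lo, hi = lam_hat - 1.994 * se_lam, lam_hat + 1.994 * se_lam   # about (1.76, 2.27)
```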

7.2. Interaction Variables

An example that illustrates the use of interaction variables is the life-cycle model of consumption behavior. Here the model involves the response of pizza expenditure (y) to the age (x₂) and income (x₃) of the consumer. First consider the simple model without an interaction variable.

y = β₁ + β₂x₂ + β₃x₃ + u

Using the data in the Excel file CH6 DATA in the tab pizza, the estimated regression equation is



ŷ = 342.88 − 7.576x₂ + 1.832x₃

Thus, holding income constant, for each additional year of age the expenditure on pizza decreases by $7.58:

∂ŷ/∂x₂ = b₂ = −7.576

And, holding age constant, for each additional $1,000 of income the expenditure on pizza rises by $1.83.

However, we would expect people of different ages to spend different amounts on pizza out of each additional $1,000 of income. It is reasonable to expect that older persons spend a smaller share of additional income on pizza than younger persons. Thus, we expect an interaction between the age and income variables. This gives rise to an interaction variable in the model, represented by the product of the age and income variables: (x₂x₃).

y = β₁ + β₂x₂ + β₃x₃ + β₄x₂x₃ + u

The estimated regression equation is now:

ŷ = 161.465 − 2.977x₂ + 6.980x₃ − 0.123x₂x₃

The marginal effects of age and income are now:

∂ŷ/∂x₂ = b₂ + b₄x₃ = −2.977 − 0.123x₃

∂ŷ/∂x₃ = b₃ + b₄x₂ = 6.980 − 0.123x₂

For x₃ = 30 ($1,000): ∂ŷ/∂x₂ = −2.977 − 0.123(30) = −6.67. When income is $30,000, for each additional year of age, expenditure on pizza is reduced by $6.67.

For x₃ = 80 ($1,000): ∂ŷ/∂x₂ = −2.977 − 0.123(80) = −12.84. When income is $80,000, for each additional year of age, expenditure on pizza is reduced by $12.84.

For x₂ = 25 years: ∂ŷ/∂x₃ = 6.98 − 0.123(25) = 3.90. When age is 25, for each additional $1,000 of income, expenditure on pizza is increased by $3.90.

For x₂ = 50 years: ∂ŷ/∂x₃ = 6.98 − 0.123(50) = 0.82. When age is 50, for each additional $1,000 of income, expenditure on pizza is increased by $0.82.
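A small helper makes the interaction logic explicit (a sketch; the coefficients are the estimates above, and the function name is ours):

```python
def pizza_marginal_effects(age, income):
    """Marginal effects from the interaction model; income is in $1,000's."""
    b2, b3, b4 = -2.977, 6.980, -0.123
    d_age = b2 + b4 * income      # effect of one more year of age, income held fixed
    d_income = b3 + b4 * age      # effect of one more $1,000 of income, age held fixed
    return d_age, d_income

print(pizza_marginal_effects(age=25, income=30))   # about (-6.67, 3.90)
```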

7.3. Log-Linear Model

Let's start with a model where the percentage change in wage (y) is a stochastic function of two independent variables, education (x₂) and years of experience (x₃):

ln(y) = β₁ + β₂x₂ + β₃x₃ + u



Now if we believe that the effect of each additional year of experience also depends on the level of education, then we may include the interaction variable education × experience (x₂x₃) in the model as a third explanatory variable.

ln(y) = β₁ + β₂x₂ + β₃x₃ + β₄x₂x₃ + u

Using the data in the corresponding tab of CH6 DATA, the estimated regression equation is

ln(ŷ) = 1.3923 + 0.09494x₂ + 0.0063295x₃ − 0.0000364x₂x₃

The effect of another year of education, holding experience constant, is,

(1/ŷ)(∂ŷ/∂x₂) = b₂ + b₄x₃

For example, given experience of x₃ = 5 years, the increase in wage from an extra year of education is about 9.48%:

∂ln(ŷ)/∂x₂ = 0.09494 − 0.000036(5) = 0.09476 (9.476%)

At a higher level of experience, say x₃ = 10 years, the percentage increase in wage for an additional year of education decreases slightly to 9.46%:

∂ln(ŷ)/∂x₂ = 0.09494 − 0.000036(10) = 0.09457 (9.457%)

Note that these percentage changes in wage are the result of a very small (marginal) change in education. The results of a discrete change in education, Δx₂ = 1, are shown in the calculations in the following table. They show that at a higher level of experience, any additional year of education has a slightly smaller impact on wage.

                                 x₃ = 5              x₃ = 10
                Coefficient   (x₂)₀   (x₂)₁      (x₂)₀   (x₂)₁
Intercept         1.39232        1       1          1       1
EDUC (x₂)         0.09494       16      17         16      17
EXPER (x₃)        0.00633        5       5         10      10
EDUC×EXPER       -0.000036      80      85        160     170

ln(ŷ)                         2.94    3.03       2.97    3.06
ŷ                            18.917  20.797     19.469  21.400
Δŷ                                    1.880              1.931
(Δŷ/ŷ)%                               9.94%              9.92%

The effect of another year of experience, holding education constant, is

(1/ŷ)(∂ŷ/∂x₃) = b₃ + b₄x₂

Holding education constant at x₂ = 8,



∂ln(ŷ)/∂x₃ = 0.00633 − 0.000036(8) = 0.00604 (0.604%)

Holding education constant at x₂ = 16,

∂ln(ŷ)/∂x₃ = 0.00633 − 0.000036(16) = 0.00575 (0.575%)

The results of a discrete change in experience, Δx₃ = 1, are shown in the calculations in the following table.

                                 x₂ = 8              x₂ = 16
                Coefficient   (x₃)₀   (x₃)₁      (x₃)₀   (x₃)₁
Intercept         1.39232        1       1          1       1
EDUC (x₂)         0.09494        8       8         16      16
EXPER (x₃)        0.00633       10      11         10      11
EDUC×EXPER       -0.000036      80      88        160     176

ln(ŷ)                         2.212   2.218      2.969   2.975
ŷ                             9.136   9.192     19.469  19.585
Δŷ                                    0.055              0.116
(Δŷ/ŷ)%                              0.617%             0.598%

The greater the number of years of education, the less valuable is an extra year of experience.
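The discrete-change tables can be reproduced by exponentiating the fitted log equation (a sketch with the rounded coefficients used in the tables; the helper name is ours):

```python
import numpy as np

b1, b2, b3, b4 = 1.39232, 0.09494, 0.00633, -0.000036

def wage_hat(educ, exper):
    return np.exp(b1 + b2 * educ + b3 * exper + b4 * educ * exper)

# discrete effect of one more year of education at exper = 5
w0, w1 = wage_hat(16, 5), wage_hat(17, 5)      # about 18.917 and 20.797
print(f"{100 * (w1 - w0) / w0:.2f}%")          # about 9.94%
```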

8. Measuring Goodness of Fit


The coefficient of determination R² in a multiple regression measures the combined effect of all independent variables on y. It measures the proportion of the total variation in y that is explained by the regression model.

R² = SSR/SST = Σ(ŷ − ȳ)² / Σ(y − ȳ)²

These quantities are easily calculated and they are also shown in the ANOVA part of the regression output for
the example.

ANOVA
df SS
Regression 2 1396.539
Residual 72 1718.943
Total 74 3115.482

R² = 1396.539/3115.482 = 0.4483

This implies that nearly 45% of the variation in y (monthly sales) is explained by the variations in price and advertising expenditure.

In Chapter 3 it was shown that R² is a measure of goodness of fit, that is, how well the estimated regression fits the data: R² = s²(ŷ)/s²(y), the ratio of the variance of the predicted values to the variance of the observed values. The same argument applies here. A high R² means there is a close association between the predicted and observed values of y.



8.1. Adjusted R²
In multiple regression, R² is affected by the number of independent variables. As we add more explanatory variables to the model, R² will increase even if the added variables have little real explanatory power. This can artificially improve the apparent fit of the model. To see this, consider the regression model

ŷ = b₁ + b₂x₂ + b₃x₃

According to the formula

R² = 1 − Σ(y − ŷ)² / Σ(y − ȳ)²

if we add another variable to the regression model, the quantity SSE = Σ(y − ŷ)² becomes smaller and R² = 1 − SSE/SST becomes larger.

An alternative measure devised to address this problem is the adjusted R²:¹

R̄² = 1 − [SSE/(n − k)] / [SST/(n − 1)]

For our example,

R̄² = 1 − (1718.943/72)/(3115.482/74) = 0.4329

This figure is shown in the computer regression output right below the regular R².
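Both measures follow directly from the ANOVA sums of squares (a sketch):

```python
ssr, sse, sst = 1396.539, 1718.943, 3115.482    # from the ANOVA table
n, k = 75, 3

r2 = ssr / sst                                  # 0.4483
r2_adj = 1 - (sse / (n - k)) / (sst / (n - 1))  # 0.4329
```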

¹ Note that R² = 1 − SSE/SST. Divide the numerator and denominator of the quotient on the right-hand side by their respective degrees of freedom. Thus,

R̄² = 1 − [SSE/(n − k)] / [SST/(n − 1)]

To show why R̄² does not automatically increase with k (the number of independent variables), we can rewrite it as:

R̄² = 1 − [(n − 1)/(n − k)]·(SSE/SST) = 1 − [(n − 1)/(n − k)](1 − R²)

As k increases, the negative adjustment on the right-hand side rises, reducing R̄².



Appendix

A Brief Introduction to Matrix Algebra


1. Introduction
1.1. The Algebra of Matrices
1.1.1. Addition and subtraction of matrices
1.1.1.1. Scalar Multiplication
1.1.1.2. Matrix Multiplication
1.1.2. Identity Matrices
1.1.3. Transpose of a Matrix
1.1.4. Inverse of a Matrix
1.1.4.1. How to Find the Inverse of a Matrix
1.1.4.1.1. The Determinant of a Matrix
1.1.4.1.2. Properties of Determinants
1.1.5. How to use the Inverse of Matrix A to find the solutions for an equation system

1. Introduction
Matrix algebra enables us to:
write an equation system in a compact way,
develop a method to test the existence of a solution by evaluating a determinant, and
devise a method to find the solution.

For example, consider the following equation system with three variables x₁, x₂, and x₃:

6x₁ + 3x₂ + 1x₃ = 22
1x₁ + 4x₂ − 2x₃ = 12
4x₁ − 1x₂ + 5x₃ = 10

This equation system can be written in matrix format as follows:

[6  3  1] [x₁]   [22]
[1  4 -2] [x₂] = [12]
[4 -1  5] [x₃]   [10]

The lead matrix on the left-hand side is the coefficient matrix, denoted by A. The lag matrix is the variable matrix x, and the matrix on the right-hand side is the matrix of the constant terms, d.

    [6  3  1]        [x₁]        [22]
A = [1  4 -2]    x = [x₂]    d = [12]
    [4 -1  5]        [x₃]        [10]

Thus, the shorthand version of an equation system is:

Ax = d

Any given matrix can be defined by its dimension: the number of rows (m) and the number of columns (n). The dimension of A is 3 × 3; the dimensions of x and d are both 3 × 1.

1.1. The Algebra of Matrices


1.1.1. Addition and subtraction of matrices
Two matrices can be added if and only if they are conformable for addition. That is, they have the same
dimension. For example:



    [4   5]        [6  10]            [4+6   5+10 ]   [10  15]
A = [6  12]    B = [3  15]    A + B = [6+3  12+15] = [ 9  27]

    [6   8]        [8   5]            [6−8   8−5  ]   [-2   3]
A = [12 19]    B = [3  11]    A − B = [12−3  19−11] = [ 9   8]

1.1.1.1. Scalar Multiplication


When every element of a matrix is multiplied by a number (scalar), then we are performing a scalar
multiplication.

  [4   5]   [16  20]
4 [6  12] = [24  48]

1.1.1.2. Matrix Multiplication


To multiply two matrices, they must be conformable for multiplication. This requires that the number of columns of the lead matrix equal the number of rows of the lag matrix. For example, let the dimension of A be 3 × 2 and that of B be 2 × 1. Then A and B are conformable for multiplication because A has two columns and B has two rows. The resulting product matrix has dimension 3 × 1.

A(3×2) B(2×1) = AB(3×1)

The following example shows how the multiplication rule applies to two matrices.

    [a₁₁  a₁₂]        [b₁₁]         [a₁₁b₁₁ + a₁₂b₂₁]
A = [a₂₁  a₂₂]    B = [b₂₁]    AB = [a₂₁b₁₁ + a₂₂b₂₁]
    [a₃₁  a₃₂]                      [a₃₁b₁₁ + a₃₂b₂₁]

    [5  4]        [7]         [5(7) + 4(5)]   [55]
A = [6  1]    B = [5]    AB = [6(7) + 1(5)] = [47]
    [2  9]                    [2(7) + 9(5)]   [59]

Note that if B is used as the lead matrix and A as the lag matrix, the two are no longer conformable for multiplication: B is 2 × 1 and A is 3 × 2, so the number of columns of the lead matrix is not the same as the number of rows of the lag matrix. Even when switching the lead and lag matrices preserves conformability, the resulting product matrix is generally not the same. That is, AB ≠ BA: matrix multiplication is not commutative.

Another example:

A(2×3) = [1  2  1]    B(3×2) = [2  5]
         [3  1  4]             [4  3]
                               [2  1]

AB(2×2) = [12  12]    BA(3×3) = [17   9  22]
          [18  22]              [13  11  16]
                                [ 5   5   6]

Also note that we have used the matrix multiplication rule to write an equation system in matrix format:

[6  3  1] [x₁]   [22]      6x₁ + 3x₂ + 1x₃ = 22
[1  4 -2] [x₂] = [12]  ⇔   1x₁ + 4x₂ − 2x₃ = 12
[4 -1  5] [x₃]   [10]      4x₁ − 1x₂ + 5x₃ = 10



A(3×3) x(3×1) = d(3×1)

1.1.2. Identity Matrices


An identity matrix is a matrix with the same number of rows and columns (a square matrix) that has 1's in its principal diagonal and 0's everywhere else. The identity matrix is denoted by I. The numeric subscript, if shown, indicates the dimension.

     [1  0]         [1  0  0]
I₂ = [0  1]    I₃ = [0  1  0]
                    [0  0  1]

Pre- or post-multiplying a matrix A by an identity matrix leaves the matrix unchanged:

IA = AI = A

Example:

    [2  4  6]
A = [3  2  3]
    [1  4  9]

     [2  4  6] [1  0  0]   [2  4  6]
AI = [3  2  3] [0  1  0] = [3  2  3] = A
     [1  4  9] [0  0  1]   [1  4  9]

     [1  0  0] [2  4  6]   [2  4  6]
IA = [0  1  0] [3  2  3] = [3  2  3] = A
     [0  0  1] [1  4  9]   [1  4  9]

1.1.3. Transpose of a Matrix


A matrix is transposed by interchanging its rows and columns. The transpose of the matrix A is denoted by A′.

    [a₁₁  a₁₂]
A = [a₂₁  a₂₂]    A′ = [a₁₁  a₂₁  a₃₁]
    [a₃₁  a₃₂]         [a₁₂  a₂₂  a₃₂]

For example,

    [5  4]
A = [6  1]    A′ = [5  6  2]
    [2  9]         [4  1  9]

1.1.4. Inverse of a Matrix


The inverse of the square matrix A (if it exists) is another matrix, denoted by A⁻¹, such that if A is pre- or post-multiplied by A⁻¹, the resulting product is an identity matrix.

A A⁻¹ = A⁻¹ A = I

If A does not have an inverse, it is called a singular matrix. Otherwise it is a nonsingular matrix.



1.1.4.1. How to Find the Inverse of a Matrix
Finding the inverse of a matrix is a complicated process. First we must understand several concepts required
in determining the inverse of a matrix.

1.1.4.1.1. The Determinant of a Matrix


The determinant of a matrix A is a scalar quantity (a number), denoted by |A|. It is obtained by summing various products of the elements of A. For example, the determinant of a 2 × 2 matrix is defined to be:

|A| = |a₁₁  a₁₂| = a₁₁a₂₂ − a₁₂a₂₁
      |a₂₁  a₂₂|

The determinant of a 3 × 3 matrix is obtained as follows:

      |a₁₁  a₁₂  a₁₃|
|A| = |a₂₁  a₂₂  a₂₃| = a₁₁(a₂₂a₃₃ − a₂₃a₃₂) − a₁₂(a₂₁a₃₃ − a₂₃a₃₁) + a₁₃(a₂₁a₃₂ − a₂₂a₃₁)
      |a₃₁  a₃₂  a₃₃|

|A| = a₁₁a₂₂a₃₃ − a₁₁a₂₃a₃₂ − a₁₂a₂₁a₃₃ + a₁₂a₂₃a₃₁ + a₁₃a₂₁a₃₂ − a₁₃a₂₂a₃₁

In the latter case, each element in the top row is multiplied by a sub-determinant. The first sub-determinant,

|a₂₂  a₂₃|
|a₃₂  a₃₃|

multiplied by a₁₁, is obtained by eliminating the first row and the first column. The sub-determinant associated with an element is called the minor of that element; the minor of a₁₁ is denoted |M₁₁|. The minor of a₁₂, |M₁₂|, is obtained by eliminating the first row and the second column, and so on. If the sum of the subscripts of the element is odd, the sign attached is negative. The calculation of the determinant of A can then be presented as:

|A| = a₁₁|M₁₁| − a₁₂|M₁₂| + a₁₃|M₁₃|

|2  1  3|
|4  5  6| = 2(5·9 − 6·8) − 1(4·9 − 6·7) + 3(4·8 − 5·7) = 2(−3) − (−6) + 3(−3) = −9
|7  8  9|

Here the determinant is obtained by expanding along the first row. The same determinant can be found by expanding along any other row or column. In the previous example, find the determinant by expanding along the third column:

|A| = a₁₃|M₁₃| − a₂₃|M₂₃| + a₃₃|M₃₃|

|2  1  3|
|4  5  6| = 3(4·8 − 5·7) − 6(2·8 − 1·7) + 9(2·5 − 1·4) = 3(−3) − 6(9) + 9(6) = −9
|7  8  9|

Now let's introduce a concept related to the minor, called the cofactor. A cofactor, denoted |Cᵢⱼ|, is a minor with a prescribed algebraic sign attached. If the sum of the two subscripts of the minor is odd, the cofactor is negative:

|Cᵢⱼ| = (−1)^(i+j) |Mᵢⱼ|

In short, the value of a determinant |A| of order n can be found by expansion along any row or any column as follows:



|A| = Σⱼ aᵢⱼ|Cᵢⱼ|    [expansion along the i-th row]

|A| = Σᵢ aᵢⱼ|Cᵢⱼ|    [expansion along the j-th column]

For example, for n = 3,

Expansion along the first row:    |A| = Σⱼ a₁ⱼ|C₁ⱼ| = a₁₁|C₁₁| + a₁₂|C₁₂| + a₁₃|C₁₃|

Expansion along the first column: |A| = Σᵢ aᵢ₁|Cᵢ₁| = a₁₁|C₁₁| + a₂₁|C₂₁| + a₃₁|C₃₁|

1.1.4.1.2. Properties of Determinants


1) Transposing a matrix does not affect the value of the determinant: |A′| = |A|

|A| = |1  2| = 1·4 − 2·3 = −2        |A′| = |1  3| = 1·4 − 3·2 = −2
      |3  4|                                |2  4|

2) The interchange of any two rows (or any two columns) will alter the sign, leaving the absolute value of the determinant unchanged.

|1  2| = 1·4 − 2·3 = −2        |3  4| = 3·2 − 4·1 = 2
|3  4|                         |1  2|

3) Multiplying any one row (or any one column) by a scalar will change the value of the determinant by the multiple of that scalar.

|1  2| = 1·4 − 2·3 = −2        |1·5  2·5| = 5·4 − 10·3 = −10
|3  4|                         |3    4  |

4) The addition (subtraction) of a multiple of any row (or column) to another row (or column) will leave the value of the determinant unchanged.

In the determinant |1  2|, multiply the first row by −3 and add it to the second row:
                   |3  4|

|1    2  |   |1   2|
|3−3  4−6| = |0  -2| = −2

5) If one row (or column) is a multiple of another row (or column), the value of the determinant is zero; the determinant will vanish. In other words, if two rows (or columns) are linearly dependent, the determinant vanishes.

In the following example the second row is the first row multiplied by 4:

|1  2| = 1·8 − 2·4 = 0
|4  8|

A very important conclusion relates the existence of a non-vanishing determinant to the existence of a unique solution for an equation system:



Consider the equation system shown at the beginning of the discussion of matrices:

6x₁ + 3x₂ + 1x₃ = 22
1x₁ + 4x₂ − 2x₃ = 12
4x₁ − 1x₂ + 5x₃ = 10

The coefficient matrix of the equation system is

    [6  3  1]
A = [1  4 -2]
    [4 -1  5]

This equation system has a unique solution because there is a non-vanishing determinant: |A| = 52.

The matrix A is a nonsingular matrix.

Now consider the following equation system:

6x₁ + 3x₂ + 1x₃ = 22
12x₁ + 6x₂ + 2x₃ = 11
4x₁ − 1x₂ + 5x₃ = 10

The determinant of the coefficient matrix is

      |6   3  1|
|A| = |12  6  2| = 6(6·5 + 2) − 3(12·5 − 2·4) + (−1·12 − 6·4) = 0
      |4  -1  5|

The determinant vanishes because rows 1 and 2 are linearly dependent: the second row is the first row multiplied by 2. Thus the equation system will not have a unique solution.

6) The expansion of the determinant by alien cofactors (the cofactors of a "wrong" row or column) always yields a value of zero.

Expand the following determinant using the first-row elements but the cofactors of the second-row elements:

    [6  3  1]
A = [1  4 -2]
    [4 -1  5]

a₁₁ = 6    |C₂₁| = −[3·5 − 1·(−1)] = −16    a₁₁|C₂₁| = −96
a₁₂ = 3    |C₂₂| = (6·5 − 1·4) = 26         a₁₂|C₂₂| = 78
a₁₃ = 1    |C₂₃| = −[6·(−1) − 3·4] = 18     a₁₃|C₂₃| = 18

Σⱼ a₁ⱼ|C₂ⱼ| = −96 + 78 + 18 = 0

This last property of determinants finally leads us to the method for finding the inverse of a matrix. Finding the inverse of A involves the following steps:

    [a₁₁  a₁₂  a₁₃]
A = [a₂₁  a₂₂  a₂₃]
    [a₃₁  a₃₂  a₃₃]



1) Replace each element of A by its cofactor |Cᵢⱼ|:

    [|C₁₁|  |C₁₂|  |C₁₃|]
C = [|C₂₁|  |C₂₂|  |C₂₃|]
    [|C₃₁|  |C₃₂|  |C₃₃|]

2) Find the transpose of C. This transpose matrix, C′, is called the adjoint matrix of A:

             [|C₁₁|  |C₂₁|  |C₃₁|]
adj A = C′ = [|C₁₂|  |C₂₂|  |C₃₂|]
             [|C₁₃|  |C₂₃|  |C₃₃|]

3) Multiply A by adj A:

           [a₁₁  a₁₂  a₁₃] [|C₁₁|  |C₂₁|  |C₃₁|]   [Σa₁ⱼ|C₁ⱼ|  Σa₁ⱼ|C₂ⱼ|  Σa₁ⱼ|C₃ⱼ|]
A(adj A) = [a₂₁  a₂₂  a₂₃] [|C₁₂|  |C₂₂|  |C₃₂|] = [Σa₂ⱼ|C₁ⱼ|  Σa₂ⱼ|C₂ⱼ|  Σa₂ⱼ|C₃ⱼ|]
           [a₃₁  a₃₂  a₃₃] [|C₁₃|  |C₂₃|  |C₃₃|]   [Σa₃ⱼ|C₁ⱼ|  Σa₃ⱼ|C₂ⱼ|  Σa₃ⱼ|C₃ⱼ|]

Now note that the elements on the principal diagonal of the product matrix are simply the determinant of A. All elements outside the principal diagonal are expansions of the determinant by alien cofactors; thus, they are all zeros.

[Σa₁ⱼ|C₁ⱼ|  Σa₁ⱼ|C₂ⱼ|  Σa₁ⱼ|C₃ⱼ|]   [|A|  0    0  ]        [1  0  0]
[Σa₂ⱼ|C₁ⱼ|  Σa₂ⱼ|C₂ⱼ|  Σa₂ⱼ|C₃ⱼ|] = [0   |A|  0  ] = |A|  [0  1  0] = |A| I
[Σa₃ⱼ|C₁ⱼ|  Σa₃ⱼ|C₂ⱼ|  Σa₃ⱼ|C₃ⱼ|]   [0   0    |A|]        [0  0  1]

Thus,

A(adj A) = |A| I

Now divide both sides of the equation by the determinant |A|:

A (adj A / |A|) = I

Premultiply both sides by A⁻¹:

A⁻¹A (adj A / |A|) = A⁻¹ I

Since A⁻¹A = I, I(adj A / |A|) = adj A / |A|, and A⁻¹I = A⁻¹, then:

A⁻¹ = adj A / |A|

The inverse of matrix A is obtained by dividing the adjoint of A by the determinant of A.

Find the inverse of

    [6  3  1]
A = [1  4 -2]
    [4 -1  5]

First find the determinant |A|:



|A| = 6 |4  -2| − 3 |1  -2| + 1 |1   4| = 6(18) − 3(13) + (−17) = 52
        |-1  5|     |4   5|     |4  -1|

Next find the cofactor matrix, where each element is |Cᵢⱼ| = (−1)^(i+j)|Mᵢⱼ|; for example, |C₁₁| = 4·5 − (−2)(−1) = 18 and |C₁₂| = −(1·5 − (−2)·4) = −13:

    [ 18  -13  -17]
C = [-16   26   18]
    [-10   13   21]

Find adj A = C′:

        [ 18  -16  -10]
adj A = [-13   26   13]
        [-17   18   21]

                                 [ 18  -16  -10]
A⁻¹ = adj A / |A| = (1/52) [-13   26   13]
                                 [-17   18   21]

With each element in A⁻¹ rounded to three decimal places,

      [ 0.346  -0.308  -0.192]
A⁻¹ = [-0.250   0.500   0.250]
      [-0.327   0.346   0.404]

1.1.5. How to use the Inverse of Matrix A to find the solutions for an equation system
Note that in the previous example the matrix A was the coefficient matrix of the equation system

6x₁ + 3x₂ + 1x₃ = 22        [6  3  1] [x₁]   [22]
1x₁ + 4x₂ − 2x₃ = 12        [1  4 -2] [x₂] = [12]
4x₁ − 1x₂ + 5x₃ = 10        [4 -1  5] [x₃]   [10]

Now premultiply both sides of the matrix form of the equation system by A⁻¹:

A⁻¹A x = A⁻¹d

which results in

x = A⁻¹d

[x₁]   [ 0.346  -0.308  -0.192] [22]   [2]
[x₂] = [-0.250   0.500   0.250] [12] = [3]
[x₃]   [-0.327   0.346   0.404] [10]   [1]

Thus, x₁ = 2, x₂ = 3, and x₃ = 1.
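The determinant test and the solution can both be verified with NumPy (a sketch):

```python
import numpy as np

A = np.array([[6, 3,  1],
              [1, 4, -2],
              [4, -1, 5]])
d = np.array([22, 12, 10])

print(np.linalg.det(A))       # 52: nonsingular, so a unique solution exists
print(np.linalg.inv(A) @ d)   # [2. 3. 1.]
```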

