
MANG6008 Quantitative Research Methods in Finance

Overview of Classical
Linear Regression
Dr Raju Chinthalapati
Email: v.l.r.chinthalapati@soton.ac.uk
Office hour: Wednesday (15.00-16.00), Thursday (14.00-15.00),
Friday (13.00-14.00) in 2/3063
Learning Objectives
•  To understand what regression analysis can do.

•  To form a regression equation.

•  To explain the properties of the various components of a regression.
What is regression?
•  Regression is probably the single most important tool at the
econometrician’s disposal.

•  What is regression analysis?


–  It is concerned with describing and evaluating the
relationship between a given variable (usually called the
dependent variable) and one or more other variables (usually
known as the independent variable(s)).

Denote the dependent variable by y and the independent variable(s) by x1, x2, ..., xk, where there are k independent variables.
How is regression different from correlation?
•  If we say y and x are correlated, we are treating y and x in a completely symmetrical way: causality could run in either direction, and uncertainty plays little role.

•  In regression, we treat the dependent variable (y) and the


independent variable(s) (x’s) very differently. The y
variable is assumed to be random or “stochastic” in some
way, i.e. to have a probability distribution. The x variables
are, however, assumed to have fixed (“non-stochastic”)
values in repeated samples.

•  In regression, the direction of causality is important, and uncertainty also plays a big role.
Simple Regression
•  There can be many x variables. For simplicity, we will start with the case where there is only one x variable, so that y depends on a single x.

•  Examples of the kind of relationship that may be of interest include:
–  How asset returns vary with their level of market risk.
–  Measuring the long-term relationship between stock prices and dividends.
–  How teaching quality can affect students’ performance.
An Example from Finance
•  Suppose that we have the following data on the excess returns on
a fund manager’s portfolio (fund XXX) together with the excess
returns on a market index:
Year t    Excess return (Ri − Rf): Y    Market premium (Rm − Rf): X
1         17.8                          13.7
2         39.0                          23.2
3         12.8                           6.9
4         24.2                          16.8
5         17.2                          12.3

•  We have some intuition that the beta on this fund is positive, and we therefore want to find whether there appears to be a relationship between X and Y given the data that we have. The first stage would be to form a scatter plot of the two variables.
Graph (Scatter Diagram)
[Scatter diagram: excess return on fund XXX (vertical axis) plotted against excess return on the market portfolio (horizontal axis) for the five observations.]
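For reference, a minimal Python sketch (assuming NumPy and Matplotlib are available) that reproduces this scatter plot from the table above:

import numpy as np
import matplotlib.pyplot as plt

# The five observations from the table above, in percentage points
X = np.array([13.7, 23.2, 6.9, 16.8, 12.3])  # market premium
Y = np.array([17.8, 39.0, 12.8, 24.2, 17.2])  # excess return on fund XXX

plt.scatter(X, Y)
plt.xlabel("Excess return on market portfolio")
plt.ylabel("Excess return on fund XXX")
plt.show()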
How to model this?
•  We need to use what we call the Line of Best Fit.

•  We can use the general equation for a straight line to get the line
that best “fits” the data.
y = a + bx

•  However, this equation (y=a+bx) is completely deterministic.

•  Is this realistic? No. So what we do is add a random disturbance term, u, into the equation:
yt = α + β xt + ut,   where t = 1, ..., 5
Why do we include a disturbance term?
•  The disturbance term can capture a number of features:
–  We always leave out some determinants of yt
–  There may be errors in the measurement of yt that
cannot be modelled.
–  Random outside influences on yt which we cannot
model.

Determining the Regression Coefficients
•  So how do we determine what α and β are?

•  Choose α and β so that the (vertical) distances from the data points to the fitted line are minimised (so that the line fits the data as closely as possible):

[Diagram: scatter of data points in the (x, y) plane with a fitted straight line; the vertical distances from the points to the line are the quantities to be minimised.]
Ordinary Least Squares
•  The most common method used to fit a line to the data is
known as OLS (ordinary least squares).

•  What we actually do is take each distance and square it (i.e.


take the area of each of the squares in the diagram) and
minimise the total sum of the squares (hence least
squares).

•  Tightening up the notation, let
–  yt denote the actual data point for observation t
–  ŷt denote the fitted value from the regression line
–  et denote the residual, yt − ŷt
Actual and Fitted Value
[Diagram: actual and fitted values. For a given value of x, the vertical gap between the actual value yt and the fitted value ŷt on the regression line is the residual et.]
How does OLS work?
•  So we min(e1² + e2² + e3² + e4² + e5²), or minimise Σt=1..5 et². This is known as the residual sum of squares, or RSS.

•  But what was et? It was the difference between the actual point and the fitted line, yt − ŷt.

•  So minimising Σ(yt − ŷt)² is equivalent to minimising Σ et² with respect to α̂ and β̂.
Simple regression model: The Mechanism
•  To begin with, we will draw the fitted line so as to minimize
the sum of the squares of the residuals, RSS. This is
described as the least squares criterion.
RSS = Σi=1..n ei² = e1² + ... + en²

•  Can we just minimize the following instead?
Σi=1..n ei = e1 + ... + en
(No: positive and negative residuals would cancel out, so this sum could be close to zero even for a very poorly fitting line.)
Deriving linear regression coefficients
•  This sequence shows how the regression coefficients for a simple
regression model are derived.

[Scatter diagram of three observations: (1, 3), (2, 5), (3, 6).]
Deriving linear regression coefficients
•  We will determine the values of b1 and b2 that minimize
RSS:

[Diagram: the three observations (1, 3), (2, 5), (3, 6) with a candidate fitted line ŷ = b1 + b2 X.]
Deriving linear regression coefficients
•  Given our choice of b1 and b2, the residuals are as shown.

[Diagram: the residuals, i.e. the vertical distances from the three points to the fitted line.]
Deriving linear regression coefficients
•  For a minimum, the partial derivatives of RSS with respect to b1
and b2 should be zero. (We should also check a second-order
condition.)
RSS = e1² + e2² + e3² = (3 − b1 − b2)² + (5 − b1 − 2b2)² + (6 − b1 − 3b2)²
    = 70 + 3b1² + 14b2² − 28b1 − 62b2 + 12b1b2

∂RSS/∂b1 = 6b1 + 12b2 − 28
∂RSS/∂b2 = 12b1 + 28b2 − 62
Deriving linear regression coefficients
•  When we establish the expression for RSS, we do so as a function
of b1 and b2. At this stage, b1 and b2 are not specific values.
Our task is to determine the particular values that minimize
RSS.
RSS = e1² + e2² + e3² = (3 − b1 − b2)² + (5 − b1 − 2b2)² + (6 − b1 − 3b2)²
    = 70 + 3b1² + 14b2² − 28b1 − 62b2 + 12b1b2

∂RSS/∂b1 = 0  ⇒  6b1^OLS + 12b2^OLS − 28 = 0
∂RSS/∂b2 = 0  ⇒  12b1^OLS + 28b2^OLS − 62 = 0

Solving these two equations gives b1^OLS = 1.67 and b2^OLS = 1.50.
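As a quick check, a minimal Python/NumPy sketch that solves these two first-order conditions numerically:

import numpy as np

# First-order conditions from the slide: 6*b1 + 12*b2 = 28 and 12*b1 + 28*b2 = 62
A = np.array([[6.0, 12.0],
              [12.0, 28.0]])
c = np.array([28.0, 62.0])

b1, b2 = np.linalg.solve(A, c)
print(b1, b2)  # approximately 1.67 and 1.50, as on the slide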
Deriving linear regression coefficients
•  Now consider the general case of n observations instead of just the 3 used above.
Deriving linear regression coefficients
•  The residual for the first observation is defined.

Deriving linear regression coefficients
•  Similarly we define the residuals for the remaining
observations. That for the last one is marked.

Deriving linear regression coefficients
RSS = e1² + ... + en² = (Y1 − b1 − b2X1)² + ... + (Yn − b1 − b2Xn)²
    = Y1² + b1² + b2²X1² − 2b1Y1 − 2b2X1Y1 + 2b1b2X1
      + ...
      + Yn² + b1² + b2²Xn² − 2b1Yn − 2b2XnYn + 2b1b2Xn
    = ΣYi² + nb1² + b2²ΣXi² − 2b1ΣYi − 2b2ΣXiYi + 2b1b2ΣXi
Deriving linear regression coefficients
RSS = ΣYi² + nb1² + b2²ΣXi² − 2b1ΣYi − 2b2ΣXiYi + 2b1b2ΣXi

∂RSS/∂b1 = 0  ⇒  2nb1 − 2ΣYi + 2b2ΣXi = 0
           ⇒  nb1 = ΣYi − b2ΣXi
           ⇒  b1 = Ȳ − b2X̄
Deriving linear regression coefficients
RSS = ΣYi² + nb1² + b2²ΣXi² − 2b1ΣYi − 2b2ΣXiYi + 2b1b2ΣXi

∂RSS/∂b2 = 0  ⇒  2b2ΣXi² − 2ΣXiYi + 2b1ΣXi = 0
           ⇒  b2ΣXi² − ΣXiYi + (Ȳ − b2X̄)ΣXi = 0
           ⇒  b2ΣXi² − ΣXiYi + (Ȳ − b2X̄)nX̄ = 0
           ⇒  b2ΣXi² − ΣXiYi + nX̄Ȳ − nb2X̄² = 0
           ⇒  b2(ΣXi² − nX̄²) = ΣXiYi − nX̄Ȳ

           ⇒  b2 = (ΣXiYi − nX̄Ȳ) / (ΣXi² − nX̄²)
Deriving linear regression coefficients
•  We might use the alternative expression for b2:

b2 = (ΣXiYi − nX̄Ȳ) / (ΣXi² − nX̄²)  ⇒  b2 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

since the numerators are equal:
Σ(Xi − X̄)(Yi − Ȳ) = ΣXiYi − ΣXiȲ − ΣX̄Yi + ΣX̄Ȳ
                   = ΣXiYi − ȲΣXi − X̄ΣYi + nX̄Ȳ
                   = ΣXiYi − Ȳ(nX̄) − X̄(nȲ) + nX̄Ȳ
                   = ΣXiYi − nX̄Ȳ
Deriving linear regression coefficients
•  We might use the alternative expression for b2:

b2 = (ΣXiYi − nX̄Ȳ) / (ΣXi² − nX̄²)  ⇒  b2 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

and the denominators are also equal:
Σ(Xi − X̄)² = Σ(Xi² − 2X̄Xi + X̄²)
            = ΣXi² − 2X̄ΣXi + ΣX̄²
            = ΣXi² − 2X̄(nX̄) + nX̄²
            = ΣXi² − nX̄²
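A quick numerical check of this equivalence (a NumPy sketch reusing the three illustrative points (1, 3), (2, 5), (3, 6) from the earlier derivation):

import numpy as np

# Check that the two expressions for b2 coincide
X = np.array([1.0, 2.0, 3.0])
Y = np.array([3.0, 5.0, 6.0])
n = len(X)

b2_first = (np.sum(X * Y) - n * X.mean() * Y.mean()) / (np.sum(X**2) - n * X.mean()**2)
b2_second = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
print(b2_first, b2_second)  # both equal 1.50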
Deriving linear regression coefficients
•  These are the particular values of b1 and b2 that minimize RSS, and we should distinguish them from other candidate values by giving them special names, for example b1^OLS and b2^OLS. (See Appendix I: Matrix Expression of OLS.)
Simple model with no intercept

ei = Yi − Ŷi = Yi − b2Xi

RSS = Σ(Yi − b2Xi)² = ΣYi² − 2b2ΣXiYi + b2²ΣXi²

dRSS/db2 = 2b2ΣXi² − 2ΣXiYi = 0
⇒ 2b2^OLS ΣXi² − 2ΣXiYi = 0
⇒ b2^OLS = ΣXiYi / ΣXi²
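A minimal sketch of this estimator, again reusing the three illustrative points from the earlier derivation (the data are illustrative only):

import numpy as np

# Regression through the origin: b2 = sum(X*Y) / sum(X^2)
X = np.array([1.0, 2.0, 3.0])
Y = np.array([3.0, 5.0, 6.0])

b2_no_intercept = np.sum(X * Y) / np.sum(X ** 2)
print(b2_no_intercept)  # 31/14 ≈ 2.21 for these illustrative points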
An Example from Finance
•  Suppose that we have the following data on the excess returns on
a fund manager’s portfolio (fund XXX) together with the excess
returns on a market index:
Year t    Excess return (Ri − Rf): Y    Market premium (Rm − Rf): X
1         17.8                          13.7
2         39.0                          23.2
3         12.8                           6.9
4         24.2                          16.8
5         17.2                          12.3

•  We have some intuition that the beta on this fund is positive, and we therefore want to find whether there appears to be a relationship between X and Y given the data that we have. The first stage would be to form a scatter plot of the two variables.
Estimation of coefficients for CAPM example
•  In the CAPM example used above, plugging the 5 observations into the formulae given above leads to the estimates
α̂ = −1.74 and β̂ = 1.64
Then we have the fitted line: ŷt = −1.74 + 1.64 xt

•  Question: If an analyst tells you that she expects the market to yield a return 20% higher than the risk-free rate next year, what would you expect the return on fund XXX to be?

•  Solution: The expected value of y is “−1.74 + 1.64 × value of x”. Since the returns in the data are measured in percentage points, plug x = 20 into the equation to get the expected value for y:
ŷt = −1.74 + 1.64 × 20 = 31.06, i.e. an expected excess return of about 31%.
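These estimates can be verified with a minimal NumPy sketch using the five observations from the table:

import numpy as np

# Excess fund returns (Y) and market premium (X) from the table above, in percentage points
X = np.array([13.7, 23.2, 6.9, 16.8, 12.3])
Y = np.array([17.8, 39.0, 12.8, 24.2, 17.2])

beta_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
alpha_hat = Y.mean() - beta_hat * X.mean()
print(alpha_hat, beta_hat)        # approximately -1.74 and 1.64

# Predicted excess return when the market premium is 20 (i.e. 20%)
print(alpha_hat + beta_hat * 20)  # approximately 31.1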
Estimator Vs Estimate
•  Estimators are the formula used to calculate the
coefficients, for example,

b^OLS = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

•  Estimates are the actual numerical values for the


coefficients, for example,

α̂ = −1.74 and β̂ = 1.64
Accuracy of Intercept Estimate
•  Care needs to be exercised when considering the intercept
estimate, particularly if there are no or few observations
close to the y-axis:
[Diagram: scatter of data points located far from the y-axis, with the fitted line extended back to x = 0.]
The Population and the Sample
•  The population is the total collection of all objects or people
to be studied.

•  A sample is a selection of just some items from the


population.

•  A random sample is a sample in which each individual item


in the population is equally likely to be drawn.

The DGP and the PRF
•  The population regression function (PRF) is a description of the
model that is thought to be generating the actual data and the
true relationship between the variables (i.e. the true values of α
and β).

•  The PRF is yt = α + β xt + ut

•  The sample regression function (SRF) is ŷt = α̂ + β̂ xt

and we also know that et = yt − ŷt

•  We use the SRF to infer likely values of the PRF.

•  We also want to know how “good” our estimates of α and β are.


Linearity
•  In order to use OLS, we need a model which is linear in the
parameters (α and β ). It does not necessarily have to be linear
in the variables (y and x).

•  Linear in the parameters means that the parameters are not


multiplied together, divided, squared or cubed etc.

•  Some models can be transformed to linear ones by a suitable


substitution or manipulation, e.g. the exponential regression
model
Yt = e^α Xt^β e^(ut)  ⇔  ln Yt = α + β ln Xt + ut

•  Then let yt = ln Yt and xt = ln Xt, so that
yt = α + β xt + ut
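A minimal sketch with simulated data (the “true” values 0.5 and 1.2 below are purely illustrative) showing the substitution in practice:

import numpy as np

# Simulate data from Y_t = e^alpha * X_t^beta * e^{u_t}, then estimate by OLS on logs
rng = np.random.default_rng(0)
alpha_true, beta_true = 0.5, 1.2
Xt = rng.uniform(1.0, 10.0, size=200)
Yt = np.exp(alpha_true) * Xt**beta_true * np.exp(rng.normal(0, 0.1, size=200))

y, x = np.log(Yt), np.log(Xt)          # the substitution yt = ln Yt, xt = ln Xt
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()
print(alpha_hat, beta_hat)             # close to 0.5 and 1.2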
Linear and Non-linear Models
•  This is known as the exponential regression model. Here, the
coefficients can be interpreted as elasticities.

•  Similarly, if theory suggests that y and x should be inversely


related:
yt = α + β/xt + ut

then the regression can be estimated using OLS by substituting zt = 1/xt, which gives yt = α + β zt + ut.

•  But some models are intrinsically non-linear, e.g.
yt = α + xt^β + ut
The Assumptions Underlying the
Classical Linear Regression Model (CLRM)
•  We observe data for xt, but since yt also depends on ut, we must be specific
about how the ut are generated.

•  We usually make the following set of assumptions about the ut’s (the
unobservable error terms):

•  Technical Notation Interpretation

1. E(ut) = 0 The errors have zero mean

2. Var (ut) = σ2 The variance of the errors is constant and finite over all
values of xt

3. Cov (ui, uj) = 0 for i ≠ j      The errors are statistically independent of one another
4. Cov (ut, xt) = 0      No relationship between the error and the corresponding x variate
The Assumptions Underlying the CLRM
•  An alternative assumption to 4., which is slightly stronger,
is that the xt’s are non-stochastic or fixed in repeated
samples, or E(ut|xt)=0 for all t.

•  A fifth assumption is required if we want to make


inferences about the population parameters (the actual α and β) from the sample estimates (α̂ and β̂), that is:
“5. ut is normally distributed”

Properties of the OLS Estimator
•  If assumptions 1. through 4. hold, then the estimators α̂ and β̂ determined by OLS are known as Best Linear Unbiased Estimators (BLUE). What does the acronym stand for?

•  “Best” - means that the OLS estimator β̂ has minimum variance among the class of linear unbiased estimators. The Gauss-Markov theorem proves that the OLS estimator is best.

•  “Linear” - β̂ is a linear estimator.

•  “Unbiased” - on average, the estimates α̂ and β̂ will be equal to the true values.

•  “Estimator” - β̂ is an estimator of the true value of β.
Consistency/Unbiasedness/Efficiency
•  Consistency

The least squares estimators α̂ and β̂ are consistent. That is, the estimates will converge in probability to their true values as the sample size increases to infinity. The assumptions Cov(xt, ut) = 0 and Var(ut) = σ² < ∞ are needed to prove this. Consistency implies that
lim(T→∞) Pr[ |β̂ − β| > δ ] = 0   ∀ δ > 0

•  Unbiasedness

The least squares estimators α̂ and β̂ are unbiased. That is, E(α̂) = α and E(β̂) = β. Thus, on average, the estimated values will be equal to the true values. To prove this also requires the assumption that E(ut) = 0. Unbiasedness is a stronger condition than consistency. (A proof is given in Appendix II.)

•  Efficiency

An estimator β̂ of a parameter β is said to be efficient if it is unbiased and no other unbiased estimator has a smaller variance. If the estimator is efficient, we are minimising the probability that it is a long way off from the true value of β.
Precision and Standard Errors
•  Any set of regression estimates α̂ and β̂ are specific to the sample used in their estimation.

•  Recall that the estimators of α and β obtained from the sample, α̂ and β̂, are given by

β̂ = Σ(xt − x̄)(yt − ȳ) / Σ(xt − x̄)²   and   α̂ = ȳ − β̂ x̄

•  What we need is some measure of the reliability or precision of the estimators (α̂ and β̂). The precision of an estimate is given by its standard error. Given assumptions 1-4 above, the standard errors can be shown to be given by

SE(α̂) = s √[ Σxt² / (T Σ(xt − x̄)²) ] = s √[ Σxt² / (T Σxt² − T² x̄²) ],

SE(β̂) = s √[ 1 / Σ(xt − x̄)² ] = s √[ 1 / (Σxt² − T x̄²) ]

where s is the standard error of the regression, defined below.
Estimating the Variance of the Disturbance Term
•  The variance of the random variable ut is given by
Var(ut) = E[(ut − E(ut))²]
which, given E(ut) = 0, reduces to
Var(ut) = E(ut²)

•  We could estimate this using the average of ut²:
σ̂² = (1/T) Σ ut²

•  Unfortunately this is not workable, since ut is not observable. We can use the sample counterpart to ut, which is ût (the residual):
s² = (1/T) Σ ût²

•  But this is a biased estimator of σ².
Estimating the Variance of the Disturbance Term
•  An unbiased estimator of σ² is given by
s² = Σ ût² / (T − 2)
where Σ ût² is the residual sum of squares (RSS) and T is the sample size, so the standard error of the regression is
s = √( RSS / (T − 2) )
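As an illustration of the bias, a small Monte Carlo sketch (the data-generating values below, such as σ² = 4 and T = 10, are purely illustrative): dividing the RSS by T underestimates σ², while dividing by T − 2 does not.

import numpy as np

rng = np.random.default_rng(1)
T, sigma2, reps = 10, 4.0, 20000
x = np.linspace(1, 10, T)
biased, unbiased = [], []
for _ in range(reps):
    u = rng.normal(0, np.sqrt(sigma2), T)
    y = 1.0 + 2.0 * x + u
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    rss = np.sum((y - a - b * x) ** 2)
    biased.append(rss / T)        # RSS/T: biased downwards
    unbiased.append(rss / (T - 2))  # RSS/(T-2): unbiased
print(np.mean(biased), np.mean(unbiased))  # roughly 3.2 versus 4.0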
Some Comments on the Standard Error Estimators

•  Both SE(α̂) and SE(β̂) depend on s² (or s). The greater the variance s², the more dispersed the errors are about their mean value, and therefore the more dispersed y will be about its mean value.

•  The sum of the squares of the xt about their mean appears in both formulae. The larger this sum of squares, the smaller the coefficient variances.

SE(α̂) = s √[ Σxt² / (T Σ(xt − x̄)²) ] = s √[ Σxt² / (T Σxt² − T² x̄²) ],

SE(β̂) = s √[ 1 / Σ(xt − x̄)² ] = s √[ 1 / (Σxt² − T x̄²) ]
Some Comments on the Standard Error Estimators

Example: Consider what happens if Σ(xt − x̄)² is small or large:

[Diagram: two scatter plots of y against x, one in which the x values are tightly bunched together and one in which they are widely spread out, illustrating how the spread of x affects the precision of the fitted line.]
Some Comments on the Standard Error Estimators
•  The larger the sample size, T, the smaller will be the coefficient
variances.
SE(α̂) = s √[ Σxt² / (T Σ(xt − x̄)²) ] = s √[ Σxt² / (T Σxt² − T² x̄²) ],

SE(β̂) = s √[ 1 / Σ(xt − x̄)² ] = s √[ 1 / (Σxt² − T x̄²) ]

•  The term Σxt² appears in SE(α̂). The reason is that Σxt² measures how far the points are away from the y-axis.
Example: How to Calculate the Parameters and
Standard Errors
•  Assume we have the following data calculated from a regression of y on
a single variable x and a constant over 22 observations.

•  Data:
Σ xt yt = 830102,  T = 22,  x̄ = 416.5,  ȳ = 86.65,  Σ xt² = 3919654,  RSS = 130.6

•  Calculations:
β̂ = (Σ xt yt − T x̄ ȳ) / (Σ xt² − T x̄²) = (830102 − 22 × 416.5 × 86.65) / (3919654 − 22 × 416.5²) = 0.35

α̂ = ȳ − β̂ x̄ = 86.65 − 0.35 × 416.5 = −59.12

•  And we have the fitted line:
ŷt = −59.12 + 0.35 xt
Example (cont’d)
•  SE(regression):
s = √( RSS / (T − 2) ) = √( 130.6 / (22 − 2) ) = 2.55

SE(α̂) = s √[ Σxt² / (T Σxt² − T² x̄²) ] = 2.55 × √[ 3919654 / ((22 × 3919654) − (22 × 416.5)²) ] = 3.35

SE(β̂) = s √[ 1 / (Σxt² − T x̄²) ] = 2.55 × √[ 1 / (3919654 − 22 × 416.5²) ] = 0.0079

•  We now write the results as:
ŷt = −59.12 + 0.35 xt
      (3.35)   (0.0079)
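These calculations can be reproduced with a short NumPy sketch, using only the summary statistics given above:

import numpy as np

# Summary statistics given on the slide
sum_xy, T, xbar, ybar = 830102.0, 22, 416.5, 86.65
sum_x2, RSS = 3919654.0, 130.6

beta_hat = (sum_xy - T * xbar * ybar) / (sum_x2 - T * xbar**2)
alpha_hat = ybar - beta_hat * xbar

s = np.sqrt(RSS / (T - 2))                                     # standard error of the regression
se_alpha = s * np.sqrt(sum_x2 / (T * sum_x2 - T**2 * xbar**2))
se_beta = s * np.sqrt(1.0 / (sum_x2 - T * xbar**2))

print(beta_hat, alpha_hat)   # approx 0.35 and -59.1 (the slide rounds the slope to 0.35 first, giving -59.12)
print(s, se_alpha, se_beta)  # approx 2.55, 3.35, 0.0079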
Goodness of fit
•  We would like some measure of how well our regression model actually
fits the data.

•  We have goodness of fit statistics to test this: i.e. how well the sample
regression function (SRF) fits the data.

•  The most common goodness of fit statistic is known as R²; it is the ratio of the explained variation to the total variation:

R² = 1 − RSS / Σ(yt − ȳ)²
Defining R²
•  Define the total sum of squares (TSS) as a measure of the total sample variation:  TSS = Σ(yt − ȳ)²

•  We can split the TSS into two parts: the part which we have explained (known as the explained sum of squares, ESS) and the part which we did not explain using the model (the RSS).

Σ(yt − ȳ)² = Σ[(yt − ŷt) + (ŷt − ȳ)]² = Σ[ût + (ŷt − ȳ)]²
           = Σût² + 2Σût(ŷt − ȳ) + Σ(ŷt − ȳ)²

Let ESS = Σ(ŷt − ȳ)². Then
TSS = RSS + 2Σût(ŷt − ȳ) + ESS
and since the cross-product term is zero by the OLS first-order conditions,
TSS = RSS + ESS,  so that  R² = ESS / TSS = 1 − RSS / TSS
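To see the decomposition numerically, a minimal NumPy sketch using the CAPM data from the earlier example:

import numpy as np

X = np.array([13.7, 23.2, 6.9, 16.8, 12.3])
Y = np.array([17.8, 39.0, 12.8, 24.2, 17.2])

beta_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
alpha_hat = Y.mean() - beta_hat * X.mean()
Y_fit = alpha_hat + beta_hat * X

TSS = np.sum((Y - Y.mean()) ** 2)
ESS = np.sum((Y_fit - Y.mean()) ** 2)
RSS = np.sum((Y - Y_fit) ** 2)

print(np.isclose(TSS, ESS + RSS))  # True: the decomposition holds
print(ESS / TSS, 1 - RSS / TSS)    # the two equivalent definitions of R^2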
The Limit Cases: R² = 0 and R² = 1
[Diagram: two scatter plots of yt against xt, one illustrating R² = 0 (the fitted line explains none of the variation in yt) and one illustrating R² = 1 (every point lies exactly on the fitted line).]
Problems with R²
•  R² is defined in terms of variation about the mean of y, so that if a model is reparameterised (rearranged) and the dependent variable changes, R² will change.

•  R² never falls if more regressors are added to the regression, for example:
Regression 1: yt = β1 + β2 x2t + β3 x3t + ut
Regression 2: yt = β1 + β2 x2t + β3 x3t + β4 x4t + ut

•  R² will always be at least as high for regression 2 relative to regression 1.

•  R² quite often takes on values of 0.9 or higher for time series regressions.
Adjusted R²
•  In order to get around these problems, a modification is often made which takes into account the loss of degrees of freedom associated with adding extra variables. This is known as R̄², or adjusted R²:

R̄² = 1 − [ (T − 1)/(T − k) × (1 − R²) ]

where k is the number of regressors (including the constant).

•  So if we add an extra regressor, k increases, and unless R² increases by a more than offsetting amount, R̄² will actually fall.

•  There are still problems with the criterion:
–  It is a “soft” rule.
–  There is no distribution for R̄² or R².
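A tiny Python helper implementing this formula; the R² values passed in below are hypothetical, purely to illustrate how R̄² can fall when a weak regressor is added:

def adjusted_r2(r2: float, T: int, k: int) -> float:
    """Adjusted R^2 for a regression with T observations and k regressors
    (including the constant), as defined on the slide."""
    return 1 - (T - 1) / (T - k) * (1 - r2)

# Hypothetical illustration: adding a regressor that raises R^2 only slightly
print(adjusted_r2(0.90, 22, 2))   # 0.895
print(adjusted_r2(0.901, 22, 3))  # about 0.891 -- adjusted R^2 falls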
Supplementary Reading

Chapters 1 and 2 of Introductory Econometrics for Finance, 2nd edition, by C. Brooks.
APPENDIX I
Matrix Expression of OLS
Multiple Regression and the Constant Term
•  Now we have:
yt = β1 + β2 x2t + β3 x3t + ... + βk xkt + ut

where β1 is the coefficient attached to the constant term (which we called α before).

•  It looks like:
y1 = β1 + β2 x21 + β3 x31 + ... + βk xk1 + u1
y2 = β1 + β2 x22 + β3 x32 + ... + βk xk2 + u2
⋮
yT = β1 + β2 x2T + β3 x3T + ... + βk xkT + uT

•  Normally we write: y = Xβ + u

•  where y is T × 1, X is T × k, β is k × 1, and u is a T × 1 column vector.
Inside the Matrices of the MLRM
•  e.g. if k is 3, we have 3 regressors (including 1 constant),
one of which is a column of ones:

•  Therefore we have:
y = Xβ +u

•  That is:
⎡ y1 ⎤   ⎡ 1  x21  x31 ⎤ ⎡ β1 ⎤   ⎡ u1 ⎤
⎢ y2 ⎥ = ⎢ 1  x22  x32 ⎥ ⎢ β2 ⎥ + ⎢ u2 ⎥
⎢ ⋮  ⎥   ⎢ ⋮   ⋮    ⋮  ⎥ ⎣ β3 ⎦   ⎢ ⋮  ⎥
⎣ yT ⎦   ⎣ 1  x2T  x3T ⎦          ⎣ uT ⎦
How to calculate the parameters in this case?
•  Previously, we took the residual sum of squares, and minimised
it w.r.t. 𝛼 and 𝛽.

•  In matrix notation, we collect the residuals into the T × 1 column vector
û = (û1, û2, ..., ûT)′

•  The RSS would then be given by
û′û = û1² + û2² + ... + ûT² = Σ ût²
The OLS Estimator for MLRM
•  In order to obtain the parameter estimates β̂1, β̂2, ..., β̂k, we would minimise the RSS with respect to all of the βs.

•  It can be shown that
β̂ = (β̂1, β̂2, ..., β̂k)′ = (X′X)⁻¹ X′y
An Example
•  The following model with k=3 is estimated over 15 observations:
y = β1 + β2 x2 + β3 x3 + u
and the following data have been calculated from the original X’s:

(X′X)⁻¹ = ⎡  2.0  3.5  −1.0 ⎤       X′y = ⎡ −3.0 ⎤       û′û = 10.96
          ⎢  3.5  1.0   6.5 ⎥             ⎢  2.2 ⎥
          ⎣ −1.0  6.5   4.3 ⎦             ⎣  0.6 ⎦

•  Calculate the coefficient estimates and their standard errors.

•  To calculate the coefficients, just multiply the matrix by the vector to obtain
β̂ = (X′X)⁻¹ X′y

•  To calculate the standard errors, we need an estimate of σ²:
s² = RSS / (T − k) = 10.96 / (15 − 3) = 0.91
An Example (cont’d)
•  The variance-covariance matrix of β̂ is given by

s²(X′X)⁻¹ = 0.91 (X′X)⁻¹ = ⎡  1.83  3.20  −0.91 ⎤
                           ⎢  3.20  0.91   5.94 ⎥
                           ⎣ −0.91  5.94   3.93 ⎦

•  The variances are on the leading diagonal:
Var(β̂1) = 1.83  ⇒  SE(β̂1) = √Var(β̂1) = 1.35
Var(β̂2) = 0.91  ⇒  SE(β̂2) = √Var(β̂2) = 0.96
Var(β̂3) = 3.93  ⇒  SE(β̂3) = √Var(β̂3) = 1.98

•  We then write:
ŷ = 1.10 − 4.40 x2t + 19.88 x3t
    (1.35)  (0.96)    (1.98)
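The same calculations can be reproduced with a short NumPy sketch using the quantities supplied above:

import numpy as np

# Quantities supplied on the slide (k = 3 regressors, T = 15 observations)
XtX_inv = np.array([[ 2.0, 3.5, -1.0],
                    [ 3.5, 1.0,  6.5],
                    [-1.0, 6.5,  4.3]])
Xty = np.array([-3.0, 2.2, 0.6])
RSS, T, k = 10.96, 15, 3

beta_hat = XtX_inv @ Xty              # (X'X)^{-1} X'y
s2 = RSS / (T - k)                    # estimate of the error variance
var_beta = s2 * XtX_inv               # variance-covariance matrix of beta-hat
se_beta = np.sqrt(np.diag(var_beta))  # standard errors on the leading diagonal

print(beta_hat)  # approx [ 1.10, -4.40, 19.88]
print(se_beta)   # approx [ 1.35,  0.96,  1.98]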
APPENDIX II
Proof of Unbiasedness
Unbiasedness of OLS
•  We need to prove that E(β̂) = β:

β̂2^OLS = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
        = [Σ(Xi − X̄)Yi − Σ(Xi − X̄)Ȳ] / Σ(Xi − X̄)²
        = [Σ(Xi − X̄)Yi − Σ(XiȲ − X̄Ȳ)] / Σ(Xi − X̄)²
        = [Σ(Xi − X̄)Yi − (ȲΣXi − nX̄Ȳ)] / Σ(Xi − X̄)²
        = Σ(Xi − X̄)Yi / Σ(Xi − X̄)²            (since ȲΣXi − nX̄Ȳ = nX̄Ȳ − nX̄Ȳ = 0)
        = Σ(Xi − X̄)(β1 + β2Xi + ui) / Σ(Xi − X̄)²
        = [β1Σ(Xi − X̄) + β2Σ(Xi − X̄)Xi + Σ(Xi − X̄)ui] / Σ(Xi − X̄)²
Unbiasedness of OLS
Given Σ(Xi − X̄) = 0, we have

β̂2^OLS = [β2 Σ(Xi − X̄)Xi + Σ(Xi − X̄)ui] / Σ(Xi − X̄)²

Given Σ(Xi − X̄)² = ΣXi² − nX̄² = Σ(Xi − X̄)Xi, we have

β̂2^OLS = β2 + Σ(Xi − X̄)ui / Σ(Xi − X̄)²

Given E(ui) = 0 and the Xi non-stochastic, we have

E(β̂2^OLS) = E(β2) + E[ Σ(Xi − X̄)ui / Σ(Xi − X̄)² ] = E(β2) = β2

Q.E.D.
Unbiasedness of OLS
•  To prove E(β̂1) = β1:

Averaging Yi = β1 + β2Xi + ui over all i, we have
ΣYi/n = β1 + β2 ΣXi/n + Σui/n
thus  Ȳ = β1 + β2X̄ + ū

Given β̂1^OLS = Ȳ − β̂2^OLS X̄,
β̂1^OLS = β1 + β2X̄ + ū − β̂2^OLS X̄
        = β1 + (β2 − β̂2^OLS)X̄ + ū

Therefore E(β̂1^OLS) = E(β1) + E[(β2 − β̂2^OLS)X̄] + E(ū) = E(β1) = β1,
since E(β̂2^OLS) = β2 (from the previous result) and E(ū) = 0.

Q.E.D.
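The result can also be illustrated numerically. A small Monte Carlo sketch (with purely illustrative true values β1 = 1.0 and β2 = 0.5) in which the average of the OLS estimates across many samples is close to the true parameters:

import numpy as np

rng = np.random.default_rng(2)
beta1, beta2, n, reps = 1.0, 0.5, 30, 20000
X = np.linspace(0, 10, n)
b1_draws, b2_draws = [], []
for _ in range(reps):
    Y = beta1 + beta2 * X + rng.normal(0, 1, n)
    b2 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b1 = Y.mean() - b2 * X.mean()
    b1_draws.append(b1)
    b2_draws.append(b2)
print(np.mean(b1_draws), np.mean(b2_draws))  # close to 1.0 and 0.5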
