A preliminary result
Suppose our data set $D_n$ is a random sample of size $n$ on the scalar random variables $(x_i, y_i)$ with finite means, variances, and covariance. Let:
$$\widehat{\operatorname{cov}}(x_i, y_i) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
Prove that $\operatorname{plim} \widehat{\operatorname{cov}}(x_i, y_i) = \operatorname{cov}(x_i, y_i)$.

We will use this result repeatedly in this problem set and in the future. So once you have proved this result, please feel free to take it as given for the remainder of this course. You can also take as given that if you define $\widehat{\operatorname{var}}(x_i) = \widehat{\operatorname{cov}}(x_i, x_i)$, then $\operatorname{plim} \widehat{\operatorname{var}}(x_i) = \operatorname{var}(x_i)$.
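Although the problem asks for a proof, a quick Monte Carlo run can make the result concrete before you prove it. The sketch below (assuming NumPy is available; the seed, the choice of a bivariate normal, and the value 0.8 are all illustrative) computes the sample covariance with the $1/n$ normalization above and watches it settle near the population covariance as $n$ grows.

```python
import numpy as np

def cov_hat(x, y):
    # Sample covariance with the 1/n normalization used in the problem.
    return np.mean((x - x.mean()) * (y - y.mean()))

rng = np.random.default_rng(0)
true_cov = 0.8  # illustrative population covariance
mean = [0.0, 0.0]
cov = [[1.0, true_cov], [true_cov, 1.0]]

# As n grows, cov_hat should settle near the population value 0.8.
for n in (100, 10_000, 1_000_000):
    x, y = rng.multivariate_normal(mean, cov, size=n).T
    print(n, cov_hat(x, y))
```

This is a numerical illustration of consistency, not a substitute for the Law of Large Numbers argument the problem asks for.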
In many cases, the best way to understand various issues in regression analysis (measurement error, proxy variables, omitted variable bias, etc.) is to work through the issue in the special case of a single explanatory variable. That way, we can develop intuition without getting lost in the linear algebra. Once we have the basics down, we can then look at the multivariate case to see if anything changes. This problem goes through the main starting results.
Suppose our regression model has an intercept and a single explanatory variable, i.e.:
$$y_i = \beta_0 + \beta_1 x_i + u_i$$
where $(y_i, x_i, u_i)$ are scalar random variables. To keep things fairly general, we will assume this is a model of the best linear predictor, i.e. $E(u_i) = E(x_i u_i) = \operatorname{cov}(x_i, u_i) = 0$.
Our data $D_n$ consists of a random sample of size $n$. Let:
$$X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad \hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix} = (X'X)^{-1}X'y \tag{1}$$
The population coefficients of the best linear predictor are
$$\beta_1 = \frac{\operatorname{cov}(x_i, y_i)}{\operatorname{var}(x_i)}, \qquad \beta_0 = E(y_i) - \beta_1 E(x_i)$$
and the OLS coefficients can be written as
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\widehat{\operatorname{cov}}(x_i, y_i)}{\widehat{\operatorname{var}}(x_i)}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
The idea for this problem is that you get a little practice translating between different ways of writing the
same model, so even if you know another way to get these results please start with equation (1).
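One way to see that the two ways of writing the model really are the same is to compute both numerically. The sketch below (assuming NumPy; the data-generating coefficients 2.0 and 3.0 are made up for illustration) checks that the matrix formula in equation (1) reproduces the scalar $\widehat{\operatorname{cov}}/\widehat{\operatorname{var}}$ formulas.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)   # illustrative data-generating process

# Matrix form, equation (1): beta_hat = (X'X)^(-1) X'y
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Scalar form: beta1_hat = cov_hat / var_hat, beta0_hat = ybar - beta1_hat * xbar
cov_hat = np.mean((x - x.mean()) * (y - y.mean()))
var_hat = np.mean((x - x.mean()) ** 2)
beta1_hat = cov_hat / var_hat
beta0_hat = y.mean() - beta1_hat * x.mean()

# The two parameterizations agree to numerical precision.
assert np.allclose(beta_hat, [beta0_hat, beta1_hat])
```

The agreement here is exact algebra, not a statistical approximation, so it holds for any sample.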
c) Without using linear algebra (i.e., just apply Slutsky's theorem and the Law of Large Numbers to the result from part (b) of this question), prove that
$$\operatorname{plim} \hat{\beta}_1 = \beta_1, \qquad \operatorname{plim} \hat{\beta}_0 = \beta_0$$
Often variables are measured with error. Let $(y_i, x_i, u_i)$ be scalar random variables such that
$$y_i = \beta_0 + \beta_1 x_i + u_i$$
where $\operatorname{cov}(x_i, u_i) = 0$. Rather than observing $(y_i, x_i)$, we observe:
$$\tilde{y}_i = y_i + v_i, \qquad \tilde{x}_i = x_i + w_i$$
where $w_i$ and $v_i$ are scalar random variables representing measurement error. We assume classical measurement error¹:
$$\operatorname{cov}(v_i, x_i) = \operatorname{cov}(v_i, u_i) = \operatorname{cov}(v_i, w_i) = \operatorname{cov}(w_i, x_i) = \operatorname{cov}(w_i, u_i) = 0$$
¹Strictly speaking, the classical model of measurement error also assumes independence and normality, but we won't need those for our results.
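The headline consequence of classical measurement error in $x$ is attenuation bias: the OLS slope from the mismeasured data converges to $\beta_1 \cdot \operatorname{var}(x_i)/(\operatorname{var}(x_i) + \operatorname{var}(w_i))$, shrunk toward zero, while error in $y$ alone does not bias the slope. A simulation sketch of that result (assuming NumPy; all variances and coefficients below are illustrative choices, not values from the problem):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
beta0, beta1 = 1.0, 2.0          # illustrative true coefficients

x = rng.normal(size=n)           # true regressor, var(x) = 1
u = rng.normal(size=n)
y = beta0 + beta1 * x + u        # true model

w = rng.normal(size=n)           # measurement error in x, var(w) = 1
v = rng.normal(size=n)           # measurement error in y
x_tilde = x + w
y_tilde = y + v

# OLS slope from the mismeasured data
slope = (np.mean((x_tilde - x_tilde.mean()) * (y_tilde - y_tilde.mean()))
         / np.mean((x_tilde - x_tilde.mean()) ** 2))
print(slope)  # near beta1 * var(x)/(var(x)+var(w)) = 2 * 0.5 = 1, not 2
```

With $\operatorname{var}(x_i) = \operatorname{var}(w_i) = 1$, the attenuation factor is $1/2$, which the simulation recovers; halving $\operatorname{var}(w_i)$ would move the slope back toward 2.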
In applied work one is often faced with choosing units for our variables. Should we express proportions as decimals or percentages? Miles or kilometers? The short answer is that it doesn't matter if we are comparing across linearly related scales; the OLS coefficients will scale accordingly, so one can choose units according to convenience.
Suppose our data set Dn is a sample (random or otherwise; this question is about an algebraic property
of OLS and not a statistical property) of size n on the scalar random variables (yi , xi ). Let the regression
coefficients for the OLS regression of yi on xi be:
$$\hat{\beta}_1 = \frac{\widehat{\operatorname{cov}}(x_i, y_i)}{\widehat{\operatorname{var}}(x_i)}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
Now let's suppose we take a linear transformation of our data. That is, let:
$$\tilde{x}_i = a x_i + b, \qquad \tilde{y}_i = c y_i + d$$
where $(a, b, c, d)$ are a set of scalars (both $a$ and $c$ must be nonzero), and let the regression coefficients for the OLS regression of $\tilde{y}_i$ on $\tilde{x}_i$ be:
$$\tilde{\beta}_1 = \frac{\widehat{\operatorname{cov}}(\tilde{x}_i, \tilde{y}_i)}{\widehat{\operatorname{var}}(\tilde{x}_i)}, \qquad \tilde{\beta}_0 = \bar{\tilde{y}} - \tilde{\beta}_1 \bar{\tilde{x}}$$
where $\bar{\tilde{y}} = \frac{1}{n}\sum_{i=1}^{n} \tilde{y}_i$ and $\bar{\tilde{x}} = \frac{1}{n}\sum_{i=1}^{n} \tilde{x}_i$.
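Working through the algebra gives $\tilde{\beta}_1 = (c/a)\hat{\beta}_1$ and $\tilde{\beta}_0 = c\hat{\beta}_0 + d - (cb/a)\hat{\beta}_1$. A numerical sketch of that claim (assuming NumPy; the sample size and the transformation constants, loosely a miles-to-kilometers and decimal-to-percent rescaling, are made up):

```python
import numpy as np

def ols_simple(x, y):
    # Simple-regression OLS: slope = cov_hat / var_hat, intercept = ybar - slope * xbar.
    b1 = np.mean((x - x.mean()) * (y - y.mean())) / np.mean((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

rng = np.random.default_rng(3)
x = rng.normal(size=200)
y = 1.5 + 0.5 * x + rng.normal(size=200)   # illustrative data

a, b, c, d = 1.60934, 2.0, 100.0, -5.0     # illustrative rescaling constants
b0, b1 = ols_simple(x, y)
b0_t, b1_t = ols_simple(a * x + b, c * y + d)

# The transformed coefficients follow directly from the untransformed ones.
assert np.isclose(b1_t, (c / a) * b1)
assert np.isclose(b0_t, c * b0 + d - (c * b / a) * b1)
```

As with equation (1), this is exact algebra, so it holds sample by sample, not just in the limit.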
The results from the previous question carry over when there is more than one explanatory variable: adding
a constant to either yi or xi only changes the intercept, multiplying yi by a constant c ends up multiplying
all coefficients by c, and multiplying any variable in xi by a constant a ends up multiplying the coefficient
on that variable by 1/a. In this question we will prove that result. In working through this question,
please remember that our results and arguments will be analogous to those in the previous (much easier)
question, and that the purpose of this question is to get some practice with more advanced tools like the
Frisch-Waugh-Lovell theorem.
Let $y$ be an $n \times 1$ matrix of outcomes, and $X$ be an $n \times K$ matrix of explanatory variables. Let:
$$\hat{\beta} = (X'X)^{-1}X'y$$
be the vector of coefficients from the OLS regression of $y$ on $X$. We are interested in what will happen if we apply some linear transformation to our variables.
a) We start by seeing what happens if we take some multiplicative transformation. Let:
$$\tilde{X} = XA, \qquad \tilde{y} = cy$$
where $A$ is a $K \times K$ matrix² with full rank (i.e., $A^{-1}$ exists) and $c$ is a nonzero scalar. Let:
$$\tilde{\beta} = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'\tilde{y}$$
be the vector of coefficients from the OLS regression of $\tilde{y}$ on $\tilde{X}$. Show that $\tilde{\beta} = cA^{-1}\hat{\beta}$.
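Before proving the identity, you can check it numerically. The sketch below (assuming NumPy; the dimensions, coefficients, and random full-rank $A$ are all illustrative) verifies that the coefficients from regressing $\tilde{y} = cy$ on $\tilde{X} = XA$ equal $cA^{-1}\hat{\beta}$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 300, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)   # illustrative model

A = rng.normal(size=(K, K))   # a random K x K matrix is full rank almost surely
c = 7.0
X_t, y_t = X @ A, c * y

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
beta_t = np.linalg.solve(X_t.T @ X_t, X_t.T @ y_t)

# The claimed identity: beta_tilde = c * A^{-1} @ beta_hat
assert np.allclose(beta_t, c * np.linalg.solve(A, beta_hat))
```

The identity falls out of $(A'X'XA)^{-1}A'X'cy = cA^{-1}(X'X)^{-1}X'y$, which is the calculation the problem asks you to write down.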
Recall the single-regressor design matrix and OLS coefficients:
$$X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \qquad \hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix} = (X'X)^{-1}X'y$$
Now let:
$$\tilde{X} = X + \iota_n b', \qquad \tilde{y} = y + d\,\iota_n$$
and let:
$$\tilde{\beta} = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'\tilde{y}$$
be the vector of coefficients from the OLS regression of $\tilde{y}_i$ on $\tilde{x}_i$.
2 We are mostly interested in the case where A is diagonal, i.e., we are multiplying each column in X by some number. But
notice that this setup includes a lot of other redefinitions of variables.