
3 Multiple Regression Model. Inference
3.1 Sampling Distribution of OLS Estimator
In this section, we will learn how to use the OLS estimates to test various
hypotheses about the coefficients. This is called inference. Inference is crucial
for economic analysis as it allows us to test economic theories and evaluate the
impact of policies.
Throughout this chapter, we will maintain the classical linear regression
assumptions 6-10. In addition, we make the following assumption:
Assumption 11 (Normality): The disturbances $\{\varepsilon_i\}$ are i.i.d. $\varepsilon_i \sim N(0, \sigma^2)$,
and independent of the explanatory variables $x$.
This assumption is needed to derive the sampling distribution of the OLS
estimator, which in turn is required for testing hypotheses. This assumption
implies that the dependent variable is distributed normally conditional on $X$:
$$y \mid X \sim N(X\beta, \sigma^2 I_n),$$
or
$$y_i \mid X \sim N(\mu_i, \sigma^2),$$
$$\mu_i = E(y_i \mid X) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik}.$$
Why do we assume a normal distribution? Normality is justified by a central
limit theorem (CLT), which says that the normalized sum of i.i.d. variables with
finite variances can be approximated by a normal distribution. This means that
even if, in reality, the disturbances $\varepsilon_i$ are not normal, for large $n$, their distribution can be
approximated by a normal distribution. Moreover, normal distributions have a
convenient property: any linear combination of normal variables is again normal.
Recall that the standard normal distribution has a bell shape and is symmetric
about 0. See Figure B.7.
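To see the CLT at work numerically, here is a minimal simulation sketch (the sample size, number of replications, and the choice of a skewed, centered exponential error distribution are all illustrative assumptions, not from the text): standardized means of clearly non-normal disturbances already behave almost like $N(0,1)$ draws.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 200, 10_000  # illustrative sample size and number of replications

# Draw i.i.d. non-normal disturbances: centered exponential, mean 0, variance 1
u = rng.exponential(scale=1.0, size=(reps, n)) - 1.0

# Standardized means: sqrt(n) * mean / sd -> approximately N(0, 1) by the CLT
z = np.sqrt(n) * u.mean(axis=1) / u.std(axis=1, ddof=1)

# Compare simulated tail probabilities with the standard normal value 0.025
print("P(Z >  1.96) simulated:", np.mean(z > 1.96))
print("P(Z < -1.96) simulated:", np.mean(z < -1.96))  # close, up to skewness
```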
Assumption 11 implies that the sampling distribution of the OLS estimator
is normal:
Theorem 14 Under Assumptions 6-11, the conditional distribution of the OLS
estimator is normal with mean $E(\hat{\beta} \mid X) = \beta$ and variance $Var(\hat{\beta} \mid X) = \sigma^2 (X'X)^{-1}$:
$$\hat{\beta} \mid X \sim N\left(\beta, \sigma^2 (X'X)^{-1}\right)$$
and
$$\hat{\beta}_j \mid X \sim N\left(\beta_j, \frac{\sigma^2}{SST_j (1 - R_j^2)}\right).$$
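Theorem 14 is easy to check by Monte Carlo. The sketch below (a minimal illustration; the design with $k = 2$ regressors, $\sigma = 1$, and the true coefficients are assumptions chosen for the simulation) holds $X$ fixed, regenerates normal errors many times, and compares the simulated mean and variance of $\hat{\beta}$ with $\beta$ and $\sigma^2 (X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, sigma, reps = 100, 2, 1.0, 10_000
beta = np.array([1.0, 0.5, -0.3])  # true coefficients (intercept first)

# Fixed design matrix X with an intercept column; regressors drawn once
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])

# Regenerate normal disturbances and re-estimate beta-hat many times
betahats = np.empty((reps, k + 1))
for r in range(reps):
    y = X @ beta + sigma * rng.normal(size=n)
    betahats[r] = np.linalg.lstsq(X, y, rcond=None)[0]

# Theorem 14: beta-hat | X ~ N(beta, sigma^2 (X'X)^{-1})
theory_var = sigma**2 * np.linalg.inv(X.T @ X)
print("Monte Carlo mean:    ", betahats.mean(axis=0))         # close to beta
print("Monte Carlo variance:", betahats.var(axis=0, ddof=1))  # close to:
print("Theoretical variance:", np.diag(theory_var))
```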
We can standardize the distribution of $\hat{\beta}_j$:
$$\frac{\hat{\beta}_j - \beta_j}{sd(\hat{\beta}_j)} \sim N(0, 1) \qquad (50)$$
where $sd(\hat{\beta}_j) = \sqrt{Var(\hat{\beta}_j)}$ is the standard deviation of $\hat{\beta}_j$. However, in practice
$sd(\hat{\beta}_j)$ is not known, and therefore, we need to replace it with its estimate:
$$se(\hat{\beta}_j) = \sqrt{\widehat{Var}(\hat{\beta}_j)}. \qquad (51)$$
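Concretely, $se(\hat{\beta}_j)$ is the square root of the $j$-th diagonal element of $\widehat{Var}(\hat{\beta} \mid X) = \hat{\sigma}^2 (X'X)^{-1}$ with $\hat{\sigma}^2 = SSR/(n - k - 1)$. A minimal numpy sketch of this computation (the simulated data at the bottom are purely illustrative):

```python
import numpy as np

def ols_se(X, y):
    """OLS estimates and standard errors: se(beta_j) is the square root of
    the j-th diagonal element of sigma2_hat * (X'X)^{-1}."""
    n, p = X.shape                            # p = k + 1 columns, incl. intercept
    betahat = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ betahat
    sigma2_hat = resid @ resid / (n - p)      # SSR / (n - k - 1)
    var_betahat = sigma2_hat * np.linalg.inv(X.T @ X)
    return betahat, np.sqrt(np.diag(var_betahat))

# Illustrative use on simulated data
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=500)
betahat, se = ols_se(X, y)
print(betahat, se)
```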
Then, it turns out that the distribution of
$$\frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \qquad (52)$$
is no longer normal, but is a t-distribution. More specifically, we have the
following theorem:
Theorem 15 Under Assumptions 6-11,
$$\frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \sim t_{n-k-1} \qquad (53)$$
where $n - k - 1$ are the degrees of freedom.
In general, the t-distribution (or Student's distribution) has a similar shape to
the standard normal distribution, symmetric about 0, but has heavier tails; see
Figure B.10. As the degrees of freedom increase, the t-distribution approaches a
normal distribution. As a rule of thumb, for $df \geq 100$ a t-distribution can be
approximated reasonably well by a normal distribution. A $t_p$ distribution can
be viewed as the ratio of two random variables: the numerator, which has the
standard normal distribution $N(0, 1)$, and the denominator, which is a random
variable that is independent of the numerator and is equal to the square root of
the sum of squares of $p$ independent $N(0, 1)$ variables divided by $p$. That is, loosely speaking,
$$t_p = \frac{Z}{\sqrt{\left(X_1^2 + \dots + X_p^2\right)/p}}$$
where $Z \sim N(0, 1)$, $X_i \sim N(0, 1)$, $i = 1, \dots, p$, all independent of each other.
As you know, the distribution of the sum of squares of $p$ independent $N(0, 1)$
variables is called the $\chi_p^2$ distribution with $p$ degrees of freedom. Hence, the $t_p$ distribution is the ratio of a standard normal $N(0, 1)$ variable and the square root
of a $\chi_p^2$ variable divided by its degrees of freedom, which is independent of the numerator:
$$t_p \sim \frac{N(0, 1)}{\sqrt{\chi_p^2 / p}}$$
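This construction is easy to verify numerically. The sketch below (the choice $p = 5$ and the tail point 2.015, which is roughly the 95th percentile of $t_5$, are illustrative assumptions) builds draws as a standard normal over the square root of an independent $\chi_p^2/p$ and compares a tail probability with scipy's tabulated t-distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
p, reps = 5, 200_000  # illustrative degrees of freedom and replications

Z = rng.normal(size=reps)              # numerator: N(0, 1)
chi2 = rng.chisquare(df=p, size=reps)  # independent chi-square with p df
t_draws = Z / np.sqrt(chi2 / p)        # t_p = N(0,1) / sqrt(chi2_p / p)

# Tail probability of the simulated ratio vs. the t distribution
print("simulated P(t_5 > 2.015):", np.mean(t_draws > 2.015))
print("exact     P(t_5 > 2.015):", 1 - stats.t.cdf(2.015, df=p))  # about 0.05
```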
Thus, the intuition for the last theorem is as follows: the random variable in
(52) can be rewritten as:
$$T = \frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} = \frac{\left(\hat{\beta}_j - \beta_j\right)/sd(\hat{\beta}_j)}{se(\hat{\beta}_j)/sd(\hat{\beta}_j)}.$$
The numerator has the standard normal distribution, while the denominator is
the square root of a sum of independent squared $N(0, 1)$ variables divided by the
degrees of freedom. The degrees of freedom is not $n$ but $n - k - 1$ to account for the fact that we have
used $k + 1$ degrees of freedom (or data points) to estimate the $k + 1$ coefficients.
3.2 Testing Hypotheses about a Single Parameter
3.2.1 Two-Sided Test
With the sampling distribution of $\hat{\beta}_j$ at hand, we can now test hypotheses. The
simplest hypothesis is the so-called statistical significance test:
$$H_0: \beta_j = 0 \qquad (54)$$
$$H_1: \beta_j \neq 0$$
If the hypothesis is valid, then $x_j$ has no effect (no significance) on the expected
value of $y$ when all other factors are accounted for. $H_0$ is called the null hypothesis, or the null. Don't write $H_0: \hat{\beta}_j = 0$; it is a mistake, since we are making
inferences about the true parameter, not its random estimate. In addition, we
need to formulate an alternative hypothesis $H_1$, the conclusion that we would
make if $H_0$ is not true. The alternative in (54) is called a two-sided alternative.
Though the true parameter $\beta_j$ is not known, we use its estimate $\hat{\beta}_j$ to check if
the hypothesis is true.
Example 1. Consider the wage equation:
$$\log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + u$$
Suppose we want to test the hypothesis that experience has no effect on wage
once education is accounted for:
$$H_0: \beta_2 = 0$$
$$H_1: \beta_2 \neq 0$$
Suppose for a moment that the true value of $\beta_2 = 0$. Then, $\hat{\beta}_2$ will be close
to 0 with a high probability, and the statistic
$$T = \frac{\hat{\beta}_2 - \beta_2}{se(\hat{\beta}_2)} \stackrel{H_0}{=} \frac{\hat{\beta}_2}{se(\hat{\beta}_2)} \qquad (55)$$
will also be close to 0. Note also that if $\hat{\beta}_2 > 0$, then $T > 0$, and if $\hat{\beta}_2 < 0$, then $T < 0$.
In other words, $T$ mimics the behavior of $\hat{\beta}_2$. In contrast to $\hat{\beta}_2$, $T$ has the
advantage that we know its distribution. By Theorem 15, under the null $H_0$, $T$
has a t-distribution with $n - 3$ degrees of freedom. See Figure C.4.
If $H_0$ is true, the value of the statistic $T$ will fall into the unshaded segment
in the neighborhood of zero, see Figure C.6. Thus, small (in absolute value)
values of $T$ will provide evidence in support of $H_0$. On the other hand, if $H_0$ is
not true, i.e., $\beta_2 \neq 0$, then $|\hat{\beta}_2|$ will be large with a high probability, and fall into
the region on the x-axis corresponding to the shaded area, see graph. In this
case, we would reject $H_0$. But how "large" should "large $|\hat{\beta}_2|$" be to reject
the null hypothesis? In other words, we need to specify the rejection region for
the hypothesis. In our example, this amounts to choosing the cut-off or critical
value $c$ of the t-distribution such that if $|T| > c$ we reject $H_0$ in favor of $H_1$,
and if $|T| \leq c$, we don't reject $H_0$. [Note on terminology: it is better to say "not
reject" instead of "accept" $H_0$.] The probability that $T$ falls in the rejection region,
$$\Pr(|T| > c \mid H_0) = \alpha, \qquad (56)$$
is called the significance level of the test. Note that the conditioning on $H_0$
is added to stress that the distribution of $T$ is derived under the assumption
that $H_0$ is valid. Recall from the stat course that $\alpha$ is also the probability of a Type I error: the
probability of rejecting $H_0$ when it is in fact true. Note that (56) is equivalent
to
$$\Pr(T > c) = \frac{\alpha}{2} \text{ and } \Pr(T < -c) = \frac{\alpha}{2}, \qquad (57)$$
or
$$\Pr(|T| \leq c) = 1 - \alpha.$$
From (56), it is clear that choosing $c$ is equivalent to choosing $\alpha$. The choice
of $\alpha$ is at the discretion of the researcher. The typical choices are: $\alpha = 0.1$ (or
10%), $\alpha = 0.05$ (or 5%), and $\alpha = 0.01$ (or 1%). For instance, if $\alpha = 0.05$ (or 5%)
under the two-sided alternative, the probability of each of the right-tail and left-tail
rejection regions of the t-distribution (see Figure C.4) is 0.025. In total,
the probability of both regions is 0.05.
After selecting $\alpha$, one looks up the table for the t-distribution and finds the
critical value $c$ that leaves $\alpha/2$ probability in the right tail, then computes $T$
and checks the rejection rule: $|T| > c$. If $\alpha = 0.05$ and $H_0$ is rejected, then $\hat{\beta}_2$
is said to be statistically significant at the 5% significance level. Note that if $\hat{\beta}_2$
is statistically significant at $\alpha = 5\%$, it will also be significant at any $\alpha > 5\%$.
This is because the rejection region for $\alpha = 5\%$ is a subset of the rejection region
for any $\alpha > 5\%$, and hence $T$ falls automatically into the rejection region of
$\alpha > 5\%$. However, if $\hat{\beta}_2$ is statistically significant at $\alpha = 5\%$, it may fail to be
significant at $\alpha < 5\%$; this needs to be checked separately.
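Instead of a printed table, the critical value that leaves $\alpha/2$ probability in the right tail can be read off scipy's t-distribution. A short sketch for the usual choices of $\alpha$ (the value $df = 137$ anticipates Example 2 below; note the exact $t_{137}$ critical value at 5%, about 1.98, is slightly larger than the normal approximation 1.96 used in the text):

```python
from scipy import stats

df = 137  # degrees of freedom, n - k - 1
for alpha in (0.10, 0.05, 0.01):
    # Two-sided test: c leaves alpha/2 probability in the right tail
    c = stats.t.ppf(1 - alpha / 2, df)
    print(f"alpha = {alpha:.2f}: c = {c:.3f}")
    # Check the equivalence in (57): P(|T| <= c) = 1 - alpha
    assert abs((stats.t.cdf(c, df) - stats.t.cdf(-c, df)) - (1 - alpha)) < 1e-9
```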
To illustrate the main steps in hypothesis testing, consider the following example.
Example 2. Determinants of College GPA. Consider the following
estimated regression for college GPA:
$$\widehat{colGPA} = \underset{(0.33)}{1.39} + \underset{(0.094)}{0.412}\, hsGPA + \underset{(0.011)}{0.015}\, ACT - \underset{(0.026)}{0.083}\, skipped$$
$$n = 141, \quad R^2 = 0.234$$
where $hsGPA$ is the high school GPA, $ACT$ is the score on the ACT exam, and
$skipped$ is the number of skipped classes. The standard errors of the coefficient
estimates are reported in parentheses. We want to test if high school GPA has
any effect on college GPA.
1. The first step is to formulate the hypothesis in mathematical terms:
$$H_0: \beta_1 = 0$$
$$H_1: \beta_1 \neq 0$$
So, the null $H_0$ is that $hsGPA$ has no effect on $colGPA$. The alternative is
two-sided, i.e., we allow both positive and negative correlation between $hsGPA$
and $colGPA$.
2. The second step is to construct a test statistic and establish its distribution. In general, a statistic is a function of the sample (data) that does not
involve any unknown parameters. A test statistic should meet two criteria: (i) it
should allow us to check the validity of $H_0$, and (ii) it should have a known (tabulated)
distribution. In this example, a convenient test statistic is the t-statistic:
$$T = \frac{\hat{\beta}_1 - \beta_1}{se(\hat{\beta}_1)} \stackrel{H_0}{=} \frac{\hat{\beta}_1}{se(\hat{\beta}_1)} \sim t_{137}$$
because, first, it mimics $H_0$ and allows us to check its validity, and second, it has a known
t-distribution for which tables exist. In our example, the degrees of freedom of
the $T$ statistic is $n - k - 1 = 141 - 4 = 137$.
3. The third step is to choose a rejection region or rejection rule. For
example, let $\alpha = 0.05$ (or 5%), so that the rejection rule under the two-sided
alternative is:
reject $H_0$ if $|T| > c$.
The 5% critical value of the t-distribution with 137 degrees of freedom for
the two-sided test is approximately 1.96: $c = 1.96$.
4. The final step is to compute the value of the test statistic and check
the rejection rule:
$$|T| = \frac{\hat{\beta}_1}{se(\hat{\beta}_1)} = \frac{0.412}{0.094} = 4.38 > c = 1.96$$
Thus, we reject $H_0$, and conclude that $\hat{\beta}_1$ is statistically significant at the 5%
(and hence also at the 10%) significance level, and that high school GPA has a
positive, statistically significant effect on college GPA. Note that the two-sided test of
statistical significance is included in all regression packages.
Let's now test the significance of the coefficient on the ACT score at the 5% significance
level. The t-statistic is now:
$$T = \frac{0.015}{0.011} = 1.36 < 1.96$$
Hence, we cannot reject the null, and the coefficient on ACT, $\hat{\beta}_2$, is not significant
at the 5% level. This means that even though the estimate $\hat{\beta}_2 = 0.015$ is positive,
the data are consistent with the true $\beta_2$ being 0: we may have obtained the positive estimate
only by chance (because of the randomness of the sample).
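The four steps of both tests can be reproduced in a few lines of code. A minimal sketch using the estimates and standard errors reported in the regression above (the normal-approximation critical value 1.96 follows the text):

```python
# Two-sided significance tests for Example 2, using the reported
# estimates and standard errors from the fitted colGPA regression
coefs = {"hsGPA": (0.412, 0.094), "ACT": (0.015, 0.011)}
c = 1.96  # 5% two-sided critical value (normal approximation to t_137)

for name, (betahat, se) in coefs.items():
    T = betahat / se  # t-statistic under H0: beta_j = 0
    decision = "reject H0" if abs(T) > c else "do not reject H0"
    print(f"{name}: T = {T:.2f} -> {decision} at the 5% level")
# hsGPA: T = 4.38 -> reject H0;  ACT: T = 1.36 -> do not reject H0
```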