Identificación y Causalidad en Modelos Empíricos

Stanislao Maldonado
University of California, Berkeley

Elizabeth Servan (INDECOPI)
Antonio Campos (ADEX)
Isaac Martinez (UNMSM)

Lima, August 9-13, 2010
The course

• Objective: an introduction to econometric methods for the identification and estimation of causal effects
• Emphasis on concepts and on empirical applications
• Schedule:
  Monday: 9 a.m. to 1 p.m.
  Tuesday-Friday: 8 a.m. to 1 p.m.
• Organization:
  8-10 a.m. lab / 10 a.m.-1 p.m. theory
• Evaluation:
  1 reading quiz + 4 theoretical/practical assignments
  1. Reading quiz (Tuesday, 8-8:15 a.m.): Holland (1986)
  2. Assignment 1: Experiments (Wednesday, 8 a.m.)
  3. Assignment 2: Selection on observables (Thursday, 8 a.m.)
  4. Assignment 3: Panel methods (Friday, 8 a.m.)
  5. Assignment 4: Selection on unobservables (Saturday, 9 a.m., to smaldonadoz@berkeley.edu)
• Final course grade: simple average of the 5 marks
  o Group discussion is allowed, but each participant must submit his or her own written version
  o Submissions by e-mail will not be accepted
  o Penalty for late submission
• Certificates:
  Issued in the name of CIES and INEI for those who obtain a passing grade
Introductions

• Name, home institution, research topics of interest (1 minute)
Research Design, Causality and Structural Models versus Potential Outcomes in Economics

Stanislao Maldonado
University of California, Berkeley

Curso CIES-INEI "Microeconometría: Identificación y Causalidad en Modelos Empíricos"
Lima, August 9, 2010
1. What is economics (and econometrics) about?

• Economics is all about "economic" questions.
• What is econometrics for?
  Econometrics provides credible quantitative answers to interesting economic questions
• What are the types of empirical research in economics (Angrist and Krueger 1999)?
  a. Descriptive analysis
  b. Causal inference

a. Descriptive analysis
   Establishes facts about economic reality that need to be explained by theoretical reasoning, yielding new insights about economic phenomena
b. Causal inference
   Seeks to determine the effects of particular interventions and policies, or to estimate the behavioral relationships suggested by economic theory
• These are not competing methods at all, but the most interesting questions in economics are about cause-effect relationships.
• Typical questions in economics (which type is each?):
  – Does reducing class size improve students' performance?
  – Is there racial discrimination in the labor market?
  – What will the rate of inflation be next year?
  – What was the evolution of inequality in the world during the 20th century?
• Another use of econometrics is forecasting; I will say a bit more about it later.
• This course is about using econometrics to provide credible quantitative answers to interesting causal questions
2. Two approaches to causality in econometrics

• Two theories of causality (Pearl 2000):
  o Structural modeling (Haavelmo 1943, Cowles Commission)
  o Potential outcomes framework (Neyman 1923, Rubin 1974, Holland 1986)
• But what is causality?
  "A causal effect is defined to be the effect on an outcome of a given action or a treatment, as measured in an ideal randomized controlled experiment. In such an experiment, the only systematic reason for differences in outcomes between the treatment and control groups is the treatment itself." (Stock and Watson 2007)

• Structural modeling
  o Relies heavily on economic theory to guide empirical work
  o Is interested in recovering the primitives of economic theory, and seeks to estimate decision rules derived from economic models
• Potential outcomes or "experimentalist" approach
  o Uses economic theory to frame questions
  o The emphasis is on the problem of identifying causal effects from specific events and situations
  o The ideal is to approximate a randomized experiment
• Modern econometrics is increasingly based on the experimentalist approach
• However, the research frontier is based on experimental work using structural models.
• See Heckman (2005) and Pearl (2000) for a philosophical discussion of these issues
3. Problems in estimating causal effects

• Correlation does not imply causation!
  o X → Y                 causality
  o Y → X                 reverse causality
  o X → Y and Y → X       simultaneity
  o Z → X and Z → Y       omitted variables / confounding
  o X* → Y                measurement error
• Example: what is the effect of schooling on earnings?
Omitted Variables

• Suppose we want to estimate E(y|X,W), assumed to be linear in (X,W):
  E(y|X,W) = Xβ + Wγ
• But you estimate y = Xβ + μ, where μ = Wγ + ε
• We will have:
  β̂ = (X'X)⁻¹X'y = β + (X'X)⁻¹X'Wγ + (X'X)⁻¹X'ε
• Asymptotically:
  plim β̂ = β + Σ_XX⁻¹ Σ_XW γ
• When there is only one omitted variable:
  plim β̂ = β + [Cov(W,X)/Var(X)]·γ
• The extent of omitted variables bias is related to:
  o the size of the correlation between X and W
  o the strength of the relationship between y and W
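The one-variable bias formula can be illustrated with a short simulation. This is a sketch in Python rather than the course's Stata, and the data-generating values (β = 1, γ = 2, Cov(W, X) = 0.5, Var(X) = 1.25) are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
gamma, beta = 2.0, 1.0

w = rng.normal(size=n)
x = 0.5 * w + rng.normal(size=n)          # X correlated with the omitted W
y = beta * x + gamma * w + rng.normal(size=n)

# Short regression of y on x alone omits W
b_short = np.cov(x, y)[0, 1] / np.var(x)

# Theoretical plim: beta + gamma * Cov(W, X) / Var(X)
b_theory = beta + gamma * np.cov(w, x)[0, 1] / np.var(x)

print(round(b_short, 2), round(b_theory, 2))
```

Here the bias term γ·Cov(W,X)/Var(X) = 2·0.5/1.25 = 0.8, so the short regression settles near 1.8 rather than the true β = 1.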
Reverse Causality

• The idea is that the correlation between y and X may arise because it is y that causes X, not the other way round
• We are interested in the causal model:
  y = Xβ + ε
• But there is also a causal relationship in the other direction:
  X = yα + μ
• The reduced form is:
  X = (μ + αε)/(1 − αβ)
• X is correlated with ε. As we know, this leads to bias in OLS estimates
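The reduced form above implies that OLS on y = Xβ + ε is inconsistent whenever α ≠ 0. A minimal Python sketch (parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
beta, alpha = 1.0, 0.5          # y = X*beta + eps,  X = y*alpha + mu

eps = rng.normal(size=n)
mu = rng.normal(size=n)

# Solve the two simultaneous equations: the reduced form for X
x = (mu + alpha * eps) / (1 - alpha * beta)
y = x * beta + eps

b_ols = np.cov(x, y)[0, 1] / np.var(x)
print(round(b_ols, 2))          # biased away from beta = 1
```

With β = 1 and α = 0.5, the OLS slope converges to Cov(X, y)/Var(X) = 1.2, not 1.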
Measurement Error

• Most of our data are measured with error.
• Suppose the causal model is:
  y = X*β + ε
• But we only observe X: X* plus some error:
  X = X* + μ
• Classical measurement error:
  E(μ|X*) = 0
• We can write the causal relationship as:
  y = (X − μ)β + ε
    = Xβ − μβ + ε
    = Xβ + υ
• Note that X is correlated with the composite error υ.
• This leads to bias/inconsistency in the OLS estimator
• We want E(y|X*) but can only estimate E(y|X)
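Classical measurement error produces attenuation bias: plim b = β·Var(X*)/(Var(X*) + Var(μ)). A minimal Python check (values illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
beta = 2.0

x_star = rng.normal(size=n)                 # true regressor
x = x_star + rng.normal(size=n)             # observed with classical error
y = beta * x_star + rng.normal(size=n)

b_ols = np.cov(x, y)[0, 1] / np.var(x)
# Attenuation: plim b = beta * Var(X*) / (Var(X*) + Var(mu)) = 2 * (1/2) = 1
print(round(b_ols, 2))
```

With equal variances for X* and μ, the estimated slope is cut in half.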
Common Features of These Problems

• All the problems have an expression in everyday language: omitted variables, reverse causality, etc.
• All have an econometric form, and it is the same one:
  a correlation of X with the 'error' term
4. Potential outcomes framework

• The model was proposed originally by Neyman (1923) and further developed by Rubin (1974). We introduce here the basic terminology:
  – i is an index for individuals in a population.
  – Di is the treatment, or the potential cause whose effect we want to estimate.
    • Di = 1 if individual i has been exposed to the treatment.
    • Di = 0 if individual i has not been exposed to the treatment.
  – Yi(Di) is the outcome, or the effect we want to attribute to the treatment.
    • Yi(1) is the outcome in case of treatment.
    • Yi(0) is the outcome in case of no treatment.
  – Note that the outcome for each individual can be written as follows:
    (1) Yi = Di·Yi(1) + (1 − Di)·Yi(0)
  – Or simply: Yi = Yi(1) if Di = 1
               Yi = Yi(0) if Di = 0
The fundamental problem of causal inference

• Definition 1: Causal effect
  For every individual i, the causal effect of D = 1 is:
  Δi = Yi(1) − Yi(0)
• Proposition 1: Fundamental problem of causal inference (Holland 1986)
  It is not possible to observe for the same individual i the values D = 1 and D = 0, and hence both Y(1) and Y(0). Therefore, it is not possible to estimate the effect of D on Y for each individual i.
• We need to think in terms of counterfactuals!

Solutions to the fundamental problem of causal inference

• Two solutions:
  – The scientific solution (see Holland 1986)
  – The statistical solution
• The statistical solution is based on estimating the average effect of the treatment instead of estimating it at the individual level.
1. Average treatment effect (ATE)
   ATE = E[Δi] = E[Yi(1) − Yi(0)]
• Not yet estimable!
• A conditional version is also available: ATE(X)

2. Average treatment effect on the treated (ATT)
   ATT = E[Δi|Di = 1] = E[Yi(1) − Yi(0)|Di = 1]
• Not yet estimable!
• A conditional version is also available: ATT(X)

3. Average treatment effect on the untreated (ATU)
   ATU = E[Δi|Di = 0] = E[Yi(1) − Yi(0)|Di = 0]
• Not yet estimable!
• A conditional version is also available: ATU(X)

The selection problem

• Suppose we want to estimate the ATE using observational data. We compute the simple mean difference in outcomes (MDO), or naïve estimator:
  MDO = E[Yi|Di = 1] − E[Yi|Di = 0]
      = E[Yi(1)|Di = 1] − E[Yi(0)|Di = 1] + E[Yi(0)|Di = 1] − E[Yi(0)|Di = 0]
      = ATT + selection bias
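The decomposition MDO = ATT + selection bias is easy to see in a simulation where units self-select on a characteristic that also drives Y(0). A Python sketch (the selection rule and parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

ability = rng.normal(size=n)
y0 = ability + rng.normal(size=n)
y1 = y0 + 1.0                               # true individual effect = 1 for everyone

# Selection: high-ability units are more likely to take the treatment
d = (ability + rng.normal(size=n) > 0).astype(int)
y = d * y1 + (1 - d) * y0

mdo = y[d == 1].mean() - y[d == 0].mean()   # naive mean difference in outcomes
att = (y1 - y0)[d == 1].mean()              # true ATT = 1
sel_bias = y0[d == 1].mean() - y0[d == 0].mean()

print(round(mdo, 2), round(att, 2), round(sel_bias, 2))
```

The naive comparison recovers ATT plus the (here positive) selection bias, overstating the effect of the treatment.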
The experimental benchmark

• Key idea of this course:
  how to make our research strategy approximate a situation that resembles an experiment in which the treatment is randomly assigned
• Angrist and Pischke (2009): random assignment is the most credible and influential research design because it solves the "selection problem"
• Recall the fundamental problem of causal inference:

  Group             Y(1)              Y(0)
  Treatment (D=1)   Observable as Y   Counterfactual
  Control (D=0)     Counterfactual    Observable as Y

• We have:
  E[Yi|Di = 1] = E[Yi(1)|Di = 1]
  E[Yi|Di = 0] = E[Yi(0)|Di = 0]
• Computing the ATT requires knowing E[Yi(0)|Di = 1]
• Generally, none of these conditions holds with observational (non-experimental) data, due to the existence of selection
• But!
  There is an important case in which these conditions are met: a randomized experiment.
  In this case, the treatment D is independent of the potential outcomes Y(1) and Y(0)
• Therefore, under random assignment:
  E[Yi(1)|Di = 1] = E[Yi(1)|Di = 0] = E[Yi(1)]
  E[Yi(0)|Di = 1] = E[Yi(0)|Di = 0] = E[Yi(0)]
• Then, we can compute the ATE:
  (10) ATE = E[Δi] = E[Yi(1) − Yi(0)] = E[Yi(1)] − E[Yi(0)]
           = E[Yi(1)|Di = 1] − E[Yi(0)|Di = 0]
           = E[Yi|Di = 1] − E[Yi|Di = 0]
• Notice that with random assignment:
  ATE = ATT = ATU
• When there is no random assignment, we must assume that the treatment is "as good as randomly assigned". This assumption can be written as follows:
  (11) (Yi(1), Yi(0)) ⊥ Di
• Notice that estimating the ATT and the ATU requires only a weaker version of (11)
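Equation (10) suggests a direct estimator under random assignment: the difference in sample means. A Python sketch with heterogeneous treatment effects (all parameter values illustrative), showing that ATE, ATT and ATU coincide:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

y0 = rng.normal(size=n)
y1 = y0 + 1.0 + 0.5 * rng.normal(size=n)    # heterogeneous effects, ATE = 1

d = rng.integers(0, 2, size=n)              # coin-flip assignment
y = np.where(d == 1, y1, y0)

ate_hat = y[d == 1].mean() - y[d == 0].mean()
att = (y1 - y0)[d == 1].mean()
atu = (y1 - y0)[d == 0].mean()
print(round(ate_hat, 2), round(att, 2), round(atu, 2))  # all close to 1
```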
• With non-experimental data, we must assume some form of (11)
  – One way: argue that the treatment is ignorable after conditioning on a set of covariates. This is known as selection on observables
Potential outcomes and regression

• Re-write the observed outcome as follows:
  (1') Yi = Di·Yi(1) + (1 − Di)·Yi(0)
          = Yi(0) + {Yi(1) − Yi(0)}·Di
• This is similar to:
  (14) Yi = α + β·Di + εi
• Re-write (1') as follows:
  (1'') Yi = E{Yi(0)} + {Yi(1) − Yi(0)}·Di + [Yi(0) − E{Yi(0)}]
        with α = E{Yi(0)}, β = Yi(1) − Yi(0), εi = Yi(0) − E{Yi(0)}
• Taking expectations conditional on D:
  (15) E(Yi|Di = 1) = α + β + E(εi|Di = 1)
       E(Yi|Di = 0) = α + E(εi|Di = 0)
• Estimating β using OLS gives:
  (16) E(Yi|Di = 1) − E(Yi|Di = 0) = β + [E(εi|Di = 1) − E(εi|Di = 0)]
       where β is the treatment effect and the bracketed term is the selection bias
• When the treatment is randomly assigned, the selection bias is zero:
  E(εi|Di = 1) − E(εi|Di = 0) = 0
• Therefore:
  (17) E(Yi|Di = 1) − E(Yi|Di = 0) = β  (the treatment effect)
The stable unit treatment value assumption (SUTVA)

• It states that:
  the potential outcomes of an individual are unaffected by potential changes in the treatment exposures of other individuals (Morgan and Winship 2007, section 2.4)
• One way to understand SUTVA: no general equilibrium effects due to the treatment.
• Example: Miguel and Kremer (2004) on deworming

"No causation without manipulation" (Holland 1986)

• Poorly defined treatments are those that cannot, even potentially, be manipulated.
• Examples:
  – She scored highly on the exam because she is female.
  – She scored highly on the exam because she studied.
  – She scored highly on the exam because her teacher tutored her.
• In which cases are the potential outcomes correctly defined?
5. Research design and causality

• Research design:
  o Experimental
  o Non-experimental
    – Selection on observables: regression, matching
    – Selection on unobservables: IV, fixed effects/DD, regression discontinuity
6. Validity issues in econometrics

• "Validity" refers to the approximate truth of an inference (Shadish, Cook and Campbell 2002).
• Types:
  – Internal validity
  – External validity
  – Statistical conclusion validity
  – Construct validity
• We will focus on the first two.
Internal versus external validity

• Definition 2: Internal validity
  Refers to the validity of inferences about whether the observed covariation between X and Y reflects a causal relationship from X to Y in the form in which the variables were manipulated or measured.
• Definition 3: External validity
  Concerns the validity of inferences about the extent to which a causal relationship holds over variations in persons, settings, treatments and outcomes.

Threats to internal validity

• Ambiguous temporal precedence: lack of clarity about which variable occurred first
• Selection: systematic differences in respondent characteristics across conditions
• History: events occurring concurrently with the treatment
• Maturation: naturally occurring changes over time that could be attributed incorrectly to the treatment
• Attrition: loss of respondents to treatment or to measurement produces biased treatment effects
• Testing: exposure to a test can affect scores on subsequent exposures to that test, a fact that can be correlated with the treatment
• Instrumentation: the nature of a measure may change over time or across conditions in a way that can be confused with the treatment
Threats to external validity

• Interaction of the causal relationship with units: an effect found for certain kinds of units might not hold if other kinds of units had been studied
• Interaction of the causal relationship over treatment variations: an effect found with one treatment variation might not hold with other variations of that treatment
• Interaction of the causal relationship with outcomes: an effect found on one kind of outcome may not hold if other outcome measures were used
• Interaction of the causal relationship with settings: an effect found in one kind of setting may not hold if other kinds of settings were used
• Context-dependent mediation: an explanatory mediator of a causal relationship in one context may not mediate in another context
Source: Roe and Just (2009)
Lecture II:
Experiments and causality

Stanislao Maldonado
University of California, Berkeley

Curso CIES-INEI "Microeconometría: Identificación y Causalidad en Modelos Empíricos"
Lima, August 10, 2010
1. Randomized experiments in economics

• Experiments are increasingly used in several fields in economics (labor, economics of education, health economics, development, behavior, political economy, industrial organization, public economics, etc.)
• Examples:
  o Effect of school inputs on learning (Glewwe and Kremer 2002)
  o Adoption of new technologies in agriculture (Duflo et al 2010)
  o Corruption in licenses (Bertrand et al 2006)
  o Moral hazard and adverse selection in consumer markets (Karlan et al 2005)
• Economics is becoming more experimental!
• Experiments have a long tradition in the natural sciences, and they are central to our modern view of "science"
• Some vocabulary (Shadish, Cook and Campbell 2002):
  o Experiment: a study in which an intervention is deliberately introduced to observe its effects
  o Randomized experiment: an experiment in which units are assigned to receive the treatment by a random process
  o Quasi-experiment: an experiment in which units are not randomly assigned to treatment
  o Natural experiment: not really an experiment, since the cause cannot be manipulated, but it contrasts a naturally/institutionally occurring event with a comparison group
  o Observational study: a study that simply observes the size and direction of a relationship among variables
• In this lecture, we will pay attention to randomized experiments
Randomized experiments

• Other names:
  o Randomized assignment studies
  o Randomized controlled trials (RCT)
  o Randomized controlled experiments
  o Social experiments
• Definition 1: Random assignment (Shadish, Cook and Campbell 2002)
  Any procedure that assigns units to treatment/control status based only on chance, in which each unit has a nonzero probability of being assigned to each status
• Is random assignment the same as random sampling?
• The answer is no!
  o Random sampling ensures that the selected sample is similar to the population
  o Random assignment makes the samples of treatment and control units similar to each other
• Why does randomization work?
  o It ensures that alternative causes are not confounded with the treatment
  o It reduces the plausibility of validity threats by distributing them randomly
  o It equates groups on the expected value of all pre-treatment characteristics
  o It allows the researcher to know and model the selection process
  o It allows the computation of a valid estimate of the error variance that is also orthogonal to the treatment
• Despite its power, random assignment is only one part of an experimental design
• A typical experiment involves (JPAL-MIT):
  o Designing the study
  o Randomly assigning units to treatment and control status
  o Collecting baseline data
  o Verifying the randomization
  o Monitoring the process to make sure that the original design is not compromised during implementation
  o Collecting follow-up data
  o Estimating the impacts of the treatment, assessing whether the impacts are statistically and practically significant
Ideal randomization
When to randomize? (SCC 2002)

• When demand outstrips supply
  Randomization can be used as a tool for distributing a service fairly. Ex: educational/training programs
• When an innovation cannot be delivered to all units at once
  Ex: a curricular change
• When experimental units are spatially separated
  Ex: family planning programs in isolated rural areas in Peru
• When a change is needed but solutions are acknowledged to be unknown
  Ex: domestic violence programs

• When not to randomize?
  o When quick answers are needed
  o When great precision in estimating an effect is not needed
  o When the treatment of interest cannot be manipulated
  o When the contribution of the experiment to scientific/policy knowledge is expected to be low compared to its costs
Random assignment in practice!

• Based on Banerjee et al (2007), "Remedying education: Evidence from two randomized experiments in India", Quarterly Journal of Economics.
• Data from the Balsakhi program
  o Launched in 1994 by the NGO Pratham
  o Provides tutors (typically young women from the community) for children at risk
  o Other potential effects: class-size reduction and ability tracking
• Let's play with Stata, doing random assignment!
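The lab uses Stata; as a language-neutral reference, the same seeded shuffle-and-split assignment can be sketched in Python (the sample size and unit of randomization below are illustrative, not the actual Balsakhi design):

```python
import numpy as np

rng = np.random.default_rng(20100810)       # fix a seed so the assignment is reproducible

n_schools = 98                              # illustrative size only
school_ids = np.arange(n_schools)

# Simple (unstratified) assignment: shuffle the IDs, then split in half
shuffled = rng.permutation(school_ids)
treated_ids = set(shuffled[: n_schools // 2])
assign = np.array([1 if s in treated_ids else 0 for s in school_ids])

print(assign.sum(), n_schools - assign.sum())   # balanced arm sizes
```

In practice one would randomize within strata (e.g. by city and baseline score) to improve balance; the idea is the same, applied stratum by stratum.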
Idealized experiments and causal effects

• Recall the key idea:
  causal effects can be measured by randomly selecting individuals from a population and then randomly giving some of the individuals the treatment
• The effect of random assignment:
  (1) Yi = β0 + β1·Xi + ui
  Where:
    Yi : outcome
    Xi : treatment level
    ui : all additional determinants of Y
• You know:
  If X is randomly assigned: E(ui|Xi) = 0
  o X is distributed independently of the omitted factors in u
  o Random assignment of X implies that the orthogonality condition holds
• Causal effect on Y of treatment level X:
  β1 = E(Y|X = x) − E(Y|X = 0)

• The Differences Estimator (DE)
  If X is binary:
  o the causal effect can be estimated by the difference in the sample average outcomes between the treatment and control groups
  o equivalently, β1 can be estimated by the OLS estimator b1 if the treatment is randomly assigned
2. Potential problems with experiments

• There is no free lunch in economic research!
• Experiments have many advantages:
  o Less subject to methodological debates
  o Easier to convey
  o More convincing to policy-makers
• However, experiments may be subject to internal and external validity threats

Threats to internal validity

1. Failure to randomize
   Ex: using last names to assign the treatment
2. Failure to follow the treatment protocol
   People don't do what they are asked to do
   o Partial compliance
     Individuals assigned to the treatment may refuse to take it up; the same holds for control units
   o Incorrect measurement of the treatment
3. Attrition
   Subjects dropping out of the study after being randomly assigned to treatment
   o Random attrition
     Ex: a person selected for a training program who gets sick
   o Endogenous attrition
     Ex: more able individuals dropping out of the training program after getting a job
4. Experimental effects
   Being in an experiment changes behavior: in the treatment group (Hawthorne effect), in the control group (John Henry effect)
   o Double-blind experiments: placebos
5. Small sample
   No bias, but causal effects are imprecisely estimated
Threats to external validity

1. Non-representative sample
   The population studied and the population of interest must be similar to justify generalizing the results
2. Non-representative program or policy
   Small-scale experiments can be quite different from the program/policy to be implemented
3. General equilibrium effects
   Turning a small and temporary experimental program into a widespread and permanent one might change the economic environment
4. Treatment vs. eligibility effects
   Participation in an actual program is voluntary, so a different effect should be expected
• Other problems with experiments:
  o Costly
  o Ethical issues
3. Regression estimators of causal effects using experimental data

• If the treatment is randomly received:
  o the differences estimator is unbiased
  o but is it efficient?
• When the experiment has internal validity issues, the differences estimator is biased
• Solution:
  the Differences Estimator with additional Regressors (DER):
  (2) Yi = β0 + β1·Xi + β2·W1i + ... + β(r+1)·Wri + ui
• W is a set of "control variables"
• What is the difference between a "treatment" and a "control" variable?
  o Conditional mean-zero assumption:
    E(ui|Xi) = 0
  o Conditional mean independence assumption:
    E(ui|Xi, W1i, ..., Wri) = γ0 + γ1·W1i + ... + γr·Wri
• Conditional mean independence implies:
  o u can be correlated with W
  o given W, u does not depend on X
• When is this assumption true?
  o When E(ui|Xi) = 0
  o When X is randomly assigned
  o When X is assigned randomly conditional on W
• Taking conditional expectations on both sides of equation (2):
  (3) E(Yi|Xi, W1i, ..., Wri)
      = β0 + β1·Xi + β2·W1i + ... + β(r+1)·Wri + E(ui|Xi, W1i, ..., Wri)
      = β0 + β1·Xi + β2·W1i + ... + β(r+1)·Wri + γ0 + γ1·W1i + ... + γr·Wri
• Evaluating at X = 1 and at X = 0:
  β1 = E(Y|X = 1, W1i, ..., Wri) − E(Y|X = 0, W1i, ..., Wri)
• W must reflect non-experimental, predetermined outcomes
• Reasons for using the DER:
  o Efficiency
    The OLS estimator of β1 obtained using the DER has a smaller variance than the one obtained using the DE
  o Checking the randomization
    If there is a failure to randomize, there will be a large difference between the β1 estimated using the DER and the DE
  o Adjusting for "conditional" randomization
    W can be used to control for differences between the treatment and control groups that were not eliminated by the random assignment
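The efficiency argument can be checked by Monte Carlo: with a predetermined covariate W that explains much of Y, both estimators are centered on β1 but the DER estimate is much tighter. A Python sketch (all parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def one_experiment(n=500):
    w = rng.normal(size=n)                  # predetermined covariate
    d = rng.integers(0, 2, size=n)          # random assignment
    y = 1.0 * d + 2.0 * w + rng.normal(size=n)
    # DE: regress y on a constant and d
    X_de = np.column_stack([np.ones(n), d])
    b_de = np.linalg.lstsq(X_de, y, rcond=None)[0][1]
    # DER: add the covariate w
    X_der = np.column_stack([np.ones(n), d, w])
    b_der = np.linalg.lstsq(X_der, y, rcond=None)[0][1]
    return b_de, b_der

draws = np.array([one_experiment() for _ in range(500)])
print(draws.mean(axis=0).round(2))          # both centered on beta1 = 1
print(draws.std(axis=0).round(2))           # the DER column has the smaller spread
```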
• Estimation of causal effects for different groups:
  o add interaction effects when the characteristic is observable
• Estimation when there is partial compliance:
  o X can be correlated with u, so the OLS estimator is no longer consistent
  o Solution: IV
  o The assigned treatment serves as an instrument for the actual treatment
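With partial compliance, the Wald/IV estimator divides the reduced-form effect of assignment on Y by the first-stage effect of assignment on take-up. A Python sketch with a constant treatment effect (the compliance rule and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000

z = rng.integers(0, 2, size=n)              # assigned treatment (the instrument)
ability = rng.normal(size=n)

# Partial compliance: take-up depends on assignment AND on ability
d = ((0.8 * z + 0.3 * ability + rng.normal(size=n)) > 0.5).astype(int)
y = 1.0 * d + ability + rng.normal(size=n)  # true effect of taking the treatment = 1

b_ols = np.cov(d, y)[0, 1] / np.var(d)      # biased: d is correlated with ability

# Wald/IV estimator: reduced form over first stage
b_iv = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
print(round(b_ols, 2), round(b_iv, 2))      # OLS overstates; IV is close to 1
```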
• Testing for randomization
  o Testing for random receipt of the treatment:
    Xi = γ0 + γ1·W1i + ... + γr·Wri + vi
    F-test of the null hypothesis that the treatment was received randomly (γ1 = ... = γr = 0)
  o Testing for random assignment
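The receipt-of-treatment test can be run with a hand-rolled F statistic: regress X on the covariates W and compare restricted and unrestricted sums of squared residuals. A Python sketch (covariates simulated, so the null is true by construction):

```python
import numpy as np

rng = np.random.default_rng(7)
n, r = 2000, 3

W = rng.normal(size=(n, r))                   # pre-treatment covariates
x = rng.integers(0, 2, size=n).astype(float)  # treatment truly received at random

# Unrestricted: regress x on a constant and W
X1 = np.column_stack([np.ones(n), W])
resid1 = x - X1 @ np.linalg.lstsq(X1, x, rcond=None)[0]
resid0 = x - x.mean()                         # restricted model: constant only

ssr1, ssr0 = resid1 @ resid1, resid0 @ resid0
F = ((ssr0 - ssr1) / r) / (ssr1 / (n - r - 1))
print(round(F, 2))                            # small (around 1) under randomization
```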
Example: Effect of class size reductions

• Project STAR (Student-Teacher Achievement Ratio)
  o 4-year study, $12 million
  o Upon entering the school system, a student was randomly assigned to one of three groups:
    • regular class (22-25 students)
    • regular class + aide
    • small class (13-17 students)
  o Regular-class students were re-randomized after the first year to regular or regular + aide
  o Y = Stanford Achievement Test scores
• Internal validity issues:
  o Partial compliance
  o Attrition
• Empirical estimation:
  Yi = β0 + β1·SmallClassi + β2·RegAidei + ui
  SmallClassi = 1 if in a small class
  RegAidei = 1 if in a regular class with an aide
• Replicating the results in Stata
4. Using experiments as a benchmark for evaluating non-experimental methods

• Experimental data can be exploited to assess the bias of non-experimental techniques
• Seminal work by LaLonde (1986) showed that many econometric procedures and comparison groups used in the literature provide estimates that are often far from experimental results
• Other studies:
  o Propensity score matching (Heckman et al 1997, Heckman et al 1998, Dehejia and Wahba 1999, Smith and Todd 2005, Diaz and Handa 2006, among others)
  o RDD (Buddelmeyer and Skoufias 2003)
  o Matching (Abadie and Imbens 2006, McKenzie et al 2010, Arceneaux et al 2000)
  o Difference in differences (Glewwe et al 2004)
  o IV (McKenzie et al 2010)

LaLonde (AER, 1986)

• Analyzes data from a randomized experiment evaluating a job training program, the National Supported Work Demonstration (NSW), to assess whether standard econometric procedures can reproduce experimental results
• To do so, he:
  o constructs alternative control groups from household surveys
  o tests standard methods: difference in differences, the Heckman sample selection model, and IV
• He shows that the experimental results cannot be replicated using non-experimental techniques and control groups
• Experimental effect: 800-900 US$
[Tables: simple difference; simple difference adjusted by age, schooling and ethnicity; difference-in-differences]
McKenzie, Gibson and Stillman (JEEA, 2010)

• How much do migrants stand to gain in income from moving across borders?
• Empirical problems:
  o Selection: income differences may be due to unobserved differences in ability, skills, motivation, etc.
• This paper uses experimental data (random selection of immigrants) from the Pacific Access Category (PAC):
  o The PAC allows Tongans to participate in a visa lottery to migrate permanently to New Zealand
  o Surveys of winners and losers + data on non-applicants
• They use the experimental data to study the performance of non-experimental methods: first differences, OLS, DD, matching and IV
5. Running regressions without apology

• Without random assignment, a regression may or may not have a causal interpretation
• But what's wrong with not having a causal interpretation for an OLS coefficient?
  o Description
  o Prediction
• Example: schooling and earnings
  o On average, people with more schooling tend to earn more than people with less schooling
  o Education predicts earnings in a narrow statistical sense
• Predictive power is summarized by the Conditional Expectation Function (CEF)
Lecture III:
Regression

Stanislao Maldonado
University of California, Berkeley

Curso CIES-INEI "Microeconometría: Identificación y Causalidad en Modelos Empíricos"
Lima, August 11, 2010
1. Agnostic Regression
1. Agnostic Regression
• Another look at regression: regression as an statistical
rather than econometric tool
• Conditions needed to run a regression are fairly simple
provided you interpret the result appropriately
provided you interpret the result appropriately
• I will show you that most of the assumptions you were
told to believe in order to run a regression are not
told to believe in order to run a regression are not
needed (and there is nothing wrong with that!)
• Example: schooling and earnings
o On average, people with more schooling tend to earn more than
people with less schooling
o Education predicts earnings in a narrow statistical sense
99
Conditional Expectation Function (CEF)
• We are interested in the relationship between the
dependent variable y and the explanatory variables x
• Some reasons:
o Description: what is the observed relationship between y and x?
o Prediction: can we use x to create a good predictor of y?
o Causality: what happens to y if we experimentally manipulate x?
• If we are not interested in causality, we may be
interested in studying the expected value of y conditional
on x: E(y | x)
• This relationship is given by the CEF
100
• Define the CEF as: E(y_i | x_i) = h(x_i)
• And define the CEF residual as: ε_i = y_i − h(x_i)
• Where: E(ε_i | x_i) = 0
• Notice that this condition holds by definition (no
exogeneity assumptions are needed)
• Proof: ε_i = y_i − h(x_i) implies h(x_i) = y_i − ε_i, so
h(x_i) = E(h(x_i) | x_i) = E(y_i − ε_i | x_i) = E(y_i | x_i) − E(ε_i | x_i)
Then: E(y_i | x_i) = E(y_i | x_i) − E(ε_i | x_i), so E(ε_i | x_i) = 0
101
• The CEF residual always has zero conditional expectation
• Theorem 1: CEF Decomposition Property
y_i = E(y_i | x_i) + ε_i
Where:
o The error term is mean independent of x_i: E(ε_i | x_i) = 0
o The error term is uncorrelated with any function of x_i
102
• Reason: the CEF is the best predictor of y given x
• Theorem 2: CEF Prediction Property
The CEF solves: E(y_i | x_i) = argmin_{h(x)} E[(y_i − h(x_i))²]
So it is the minimum mean squared error (MMSE) predictor of y
given x
• Proof: See Angrist and Pischke (2009), page 33.
103
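Theorems 1 and 2 are easy to verify numerically. Below is a minimal NumPy sketch (illustrative, not from the slides): with a discrete x, the CEF can be estimated exactly by within-cell means; its residual is mean-zero given x, and it beats an arbitrary rival predictor in mean squared error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete x, so E[y|x] can be estimated by within-cell means
x = rng.integers(0, 4, size=100_000)            # e.g. four schooling levels
y = 2.0 * x + x**2 + rng.normal(0, 1, x.size)   # nonlinear CEF: h(x) = 2x + x^2

cef = np.array([y[x == k].mean() for k in range(4)])  # estimated CEF
eps = y - cef[x]                                      # CEF residual

# Theorem 1: the residual is mean independent of x (exact within cells here)
cond_means = np.array([eps[x == k].mean() for k in range(4)])
assert np.allclose(cond_means, 0.0)

# Theorem 2: the CEF has smaller MSE than a rival function of x
mse_cef = np.mean(eps**2)
mse_rival = np.mean((y - 3.0 * x)**2)
assert mse_cef < mse_rival
```

The rival predictor 3x is arbitrary; any other function of x would also lose to the CEF in MSE.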
Linear regression and the CEF
• We know that the CEF has nice properties, but what is its
connection with linear regression, and why do we want
to run a linear regression?
• Regression is closely linked with the CEF, and the CEF
provides a natural summary of empirical relationships
• Theorem 3: Linear CEF
Suppose the CEF is linear. Then the population regression
function is also linear:
If E(y_i | x_i) = x_i'β, then β = E[x_i x_i']⁻¹ E[x_i y_i]
104
• Proof: again, Angrist and Pischke (2009), page 37.
• Of course, there is no reason the CEF has to be linear, but
we run regressions anyway, mainly for practical reasons
• But also for good theoretical reasons!
• Theorem 4: Best Linear Predictor Theorem
The function x_i'β is the best linear predictor of
y given x in the MMSE sense. Formally,
β = E[x_i x_i']⁻¹ E[x_i y_i] = argmin_b E[(y_i − x_i'b)²]
105
• Proof: you know where to look!
• This last property is nice if you are interested in
prediction. But we are no macro guys (right?)
• We want to use regression as a tool for estimating the
CEF as a summary of the underlying relationship
between y and x
• Theorem 5: The Regression CEF Theorem
The function x_i'β provides the MMSE linear
approximation to E(y_i | x_i), that is:
β = argmin_b E[(E(y_i | x_i) − x_i'b)²]
106
• Proof: ... OK, you already know
• Regression provides the best linear approximation to the
CEF even when the CEF is non‐linear
• Notice that this result depends on almost nothing!
o Whether your data are i.i.d.
o Whether your explanatory variables are fixed or random
o Whether your regressors are correlated with the CEF residuals
o Whether the CEF is linear or not
o Whether your dependent variable is continuous, discrete, etc.
107
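Theorem 5 can be checked directly in a small simulation (a sketch, with an arbitrary non-linear CEF): regressing y on x and regressing the estimated CEF E[y|x] on x give the same coefficients, because the CEF residual is orthogonal to any function of x.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.integers(0, 4, size=n).astype(float)
y = np.exp(0.5 * x) + rng.normal(0, 1, n)       # non-linear CEF: exp(x/2)

X = np.column_stack([np.ones(n), x])
beta_y, *_ = np.linalg.lstsq(X, y, rcond=None)  # regression of y on x

# Regression of the (estimated) CEF itself on x
cef = np.array([y[x == k].mean() for k in range(4)])
beta_cef, *_ = np.linalg.lstsq(X, cef[x.astype(int)], rcond=None)

# Regression CEF theorem: both give the same linear approximation
assert np.allclose(beta_y, beta_cef)
```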
2. Regression and causality
• When can we use a regression to recover a causal
effect?
• A regression is causal when the CEF it approximates is
causal
• When is the CEF causal?
The CEF is causal when it describes differences in average
potential outcomes for a fixed reference population
• Recall from previous lectures:
Potential Outcome = Y_i(1) if D_i = 1; Y_i(0) if D_i = 0
108
• The observed outcome: Y_i = Y_i(0) + {Y_i(1) − Y_i(0)}·D_i
• The mean difference in outcomes (MDO):
(1) E[Y_i | D_i = 1] − E[Y_i | D_i = 0] = E[Y_i(1) | D_i = 1] − E[Y_i(0) | D_i = 1]   (ATT)
    + {E[Y_i(0) | D_i = 1] − E[Y_i(0) | D_i = 0]}   (selection bias)
• Assuming that the selection process into the treatment is
given by characteristics observed by the researcher
leads to a way of solving the selection bias problem
109
• This assumption is known as the “Conditional Independence
Assumption” (CIA):
(2) {Y_i(1), Y_i(0)} ⊥ D_i | X
• Therefore,
(3) E[Y_i | X, D_i = 1] − E[Y_i | X, D_i = 0] = E[Y_i(1) − Y_i(0) | X]
• Regression provides an easy empirical strategy that
automatically turns the CIA into causal effects
110
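A simulation sketch of how this works (the design and numbers are invented for illustration): treatment is selected on an observed x, so the raw comparison of means is contaminated by selection bias, while a regression that controls for x recovers the causal effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x = rng.normal(size=n)                           # observed confounder
d = (x + rng.normal(size=n) > 0).astype(float)   # selection on observables
y = 1.0 * d + 2.0 * x + rng.normal(size=n)       # true causal effect = 1

naive = y[d == 1].mean() - y[d == 0].mean()      # MDO: ATT + selection bias
X = np.column_stack([np.ones(n), d, x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

assert abs(naive - 1.0) > 0.5     # naive difference is badly biased
assert abs(beta[1] - 1.0) < 0.05  # controlling for x recovers ~1
```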
3. Applications: Krueger (1993)
• Krueger (1993) tries to estimate the returns to using
computers at work using US household survey data. He uses
the following empirical model:
ln(W_i) = X_i'β + αC_i + ε_i
• Problem: workers who use computers may be abler, and
would earn more even in the absence of computers
• Despite being a carefully executed empirical analysis, the
author fails to recover the causal effect of using computers on
labor earnings
• Results are highly sensitive to changes in control variables
111
112
Robustness check 1: Controlling for using a computer at home
113
Applications: DiNardo and Pischke (1997)
• The authors revisited the question suggested by Krueger
(1993) using German data
• They find a similar association between computers and
wages, but –taking advantage of a richer dataset– they
show that office tools (pencils, calculators, etc.) have
returns in some cases higher than computers
• Although they are not able to prove that the returns to
computers are illusory, they do show how Krueger’s
research design is unable to distinguish between a causal
relationship and a relationship due to selection
114
115
Lecture IV:
Matching
Stanislao Maldonado
University of California, Berkeley
Curso CIES‐INEI “Microeconometría: Identificación y
Causalidad en Modelos Empíricos”
Lima, August 11, 2010
116
1. Introduction
• Matching offers a way to estimate ATE when:
o Controlled randomization is impossible
o There are no convincing natural experiments
• Key idea: compute treatment effects using carefully
selected “matches” between treatment and control units
• Problem: selection is based on observables, so it depends
on a strong assumption (some form of exogeneity):
o Selection into treatment is completely determined by variables
that can be observed by the researcher
117
o “Conditioning” on these observable variables, the assignment to
treatment is random
• Some “names” for this assumption:
o Unconfoundedness (Rosenbaum and Rubin 1983, Imbens 2004)
o Selection on observables
o Conditional independence
118
2. Identification
• Assumption A.1: Unconfoundedness (Imbens 2004)
Assignment to treatment is unconfounded given pre‐
treatment variables if:
(1) {Y_i(1), Y_i(0)} ⊥ D_i | X
• Equivalent to saying:
o Within each cell defined by X, treatment is random
o Selection into treatment depends only on the observables X
• Assumption A.2: Overlap (Imbens 2004)
(2) 0 < Pr{D_i = 1 | X} < 1
119
• Are these assumptions plausible? (what do you think?)
• Given A.1 and A.2:
(3) E[Y_i(0) | D_i = 0, X] = E[Y_i(0) | D_i = 1, X] = E[Y_i(0) | X]
• These assumptions suggest the following estimation
strategy:
o Stratify the data into cells defined by particular values of X
o Within each cell, compute the ATE by comparing treated and controls
o Average these differences with respect to the distribution of X in
the population of treated units
121
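The three-step strategy above can be sketched in a few lines of NumPy (a toy design with a discrete covariate; since the weights below use the distribution of X among the treated, the estimand is the ATT):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
x = rng.integers(0, 5, size=n)                   # discrete covariate X
p = 0.2 + 0.12 * x                               # P(D=1|X): overlap holds (A.2)
d = (rng.random(n) < p).astype(int)
y = 1.0 * d + 0.5 * x + rng.normal(size=n)       # true effect = 1

cells = np.arange(5)
# Within-cell treated-control differences (valid under A.1 within each cell)
diffs = np.array([y[(x == k) & (d == 1)].mean() - y[(x == k) & (d == 0)].mean()
                  for k in cells])
# Average over the distribution of X among the treated units
w = np.array([(x[d == 1] == k).mean() for k in cells])
att = (w * diffs).sum()

assert abs(att - 1.0) < 0.05
```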
3. Matching estimators
• Some extra notation:
N_T: number of treated units
N_C: number of control units
• For each treated unit i, define a set of weights over the controls:
w_i(j), (i = 1, 2, ..., N_T; j = 1, 2, ..., N_C)
• Where: Σ_j w_i(j) = 1
• The generic matching estimator:
(6) τ_M = (1/N_T) Σ_{i∈{D=1}} [ y_i − Σ_{j∈{D=0}} w_i(j) y_j ]
122
• Just a simple average difference between the treated
units and the composite comparison units!
• Key: how to choose w_i(j), which works as a “distance
measure” (how “close” control units are to a treated one
for serving as a good counterfactual)
123
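As an illustration (a sketch, not a production estimator), one-nearest-neighbour matching sets w_i(j) = 1 for the control closest to treated unit i and 0 for all others:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000
x = rng.normal(size=n)
d = (x + rng.normal(size=n) > 0).astype(int)     # selection on observable x
y = 1.0 * d + 2.0 * x + rng.normal(size=n)       # true effect = 1

xt, yt = x[d == 1], y[d == 1]                    # treated units
xc, yc = x[d == 0], y[d == 0]                    # control units

# 1-NN weights: w_i(j) = 1 for the control closest to treated unit i
nn = np.abs(xt[:, None] - xc[None, :]).argmin(axis=1)
tau = np.mean(yt - yc[nn])                       # matching estimator (ATT)

assert abs(tau - 1.0) < 0.15
```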
4. The “curse of dimensionality” and the
propensity score
• When the sample size is small, the set of covariates is large, and
some of them are multi‐valued or (even worse) continuous, you
are in trouble (“the curse of dimensionality”):
o With K binary variables the number of cells is 2^K
o If some X are multi‐valued, the number of cells increases further
o Some cells may contain only treated (or only control) units, so
treatment effects cannot be computed
• Solution: use the propensity score to solve the
dimensionality problem (Rosenbaum and Rubin 1983)
124
• Definition 1: Propensity score (Rosenbaum and Rubin
1983)
The propensity score is the conditional probability of receiving
the treatment given the pre‐treatment variables:
(6) p(X) = Pr(D_i = 1 | X) = E(D_i | X)
• Two important properties:
o Lemma 1: Balancing of pre‐treatment characteristics given the
propensity score (see proof in Rosenbaum and Rubin 1983)
(6') D_i ⊥ X | p(X)
125
o Lemma 2: Unconfoundedness given the propensity score (see
proof in Rosenbaum and Rubin 1983)
Suppose that assignment to treatment is unconfounded, i.e.
(6'') {Y_i(1), Y_i(0)} ⊥ D_i | X
Then assignment to treatment is unconfounded given the
propensity score, i.e.
(6''') {Y_i(1), Y_i(0)} ⊥ D_i | p(X)
• Using p(X), we have:
(7) E[Y_i(0) | D_i = 0, p(X)] = E[Y_i(0) | D_i = 1, p(X)] = E[Y_i(0) | p(X)]
127
Estimation strategy with the p(X)
1. Estimation of the propensity score
The true p(X) is unknown!
o For a given propensity score, assignment to treatment is random,
and therefore treated and control units are on average
observationally identical
o Any standard probability model can be used to estimate
p(X), e.g. a logit model:
(10) p(X) = Pr(D_i = 1 | X) = e^{λh(X_i)} / (1 + e^{λh(X_i)})
128
o Where h(X_i) is a function of the covariates with linear and higher‐
order terms
2. Estimation of the treatment effect (ATE, ATT or ATU)
given the propensity score
a. Match treated and control units with the same (estimated)
p(X)
b. Compute the treatment effect for each value of the
(estimated) p(X)
c. Obtain the average of these conditional effects
• Problem: this is unfeasible, since it is hard to find two units
with exactly the same p(X)
129
• Some alternative procedures:
o Stratification on the score
o Nearest neighbor matching on the score
o Radius matching on the score
o Kernel matching on the score
o Weighting on the basis of the score
130
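A compact sketch of the two-step strategy (illustrative simulation; the logit is fitted by Newton-Raphson so only NumPy is needed, and step 2 uses the last of the listed procedures, weighting on the basis of the score):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(0.5 + x)))            # true propensity score
d = (rng.random(n) < p_true).astype(int)
y = 1.0 * d + 2.0 * x + rng.normal(size=n)       # true effect = 1

# Step 1: estimate the propensity score with a logit (Newton-Raphson)
X = np.column_stack([np.ones(n), x])
b = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ b))
    grad = X.T @ (d - p)                         # score of the logit likelihood
    hess = (X * (p * (1 - p))[:, None]).T @ X    # information matrix
    b += np.linalg.solve(hess, grad)
p_hat = 1 / (1 + np.exp(-X @ b))

# Step 2: weighting on the basis of the score (inverse-probability weighting)
ate = np.mean(d * y / p_hat - (1 - d) * y / (1 - p_hat))
assert abs(ate - 1.0) < 0.1
```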
5. Applications: Arceneaux et al. (2006)
• They use data from a large‐scale field experiment
(60,000 treated vs. 2 million control units!) and a rich
set of covariates in order to assess the performance of
matching estimators
• Matching estimators are severely biased!
• The experiment:
o Conducted in Iowa and Michigan before the 2002 election
o Treatment: Get‐out‐the‐Vote campaign
131
Lecture V:
Panel Data: Fixed Effects and Differences‐in‐
Differences
Stanislao Maldonado
University of California, Berkeley
Curso CIES‐INEI “Microeconometría: Identificación y
Causalidad en Modelos Empíricos”
Lima, August 12, 2010
135
1. Motivation
• Our goal: to approximate an experimental design with
observational data
• Panel data allow us to exploit the temporal and cohort
dimensions to control for unobservables
• Two basic approaches:
o Fixed Effects
o Differences in Differences
136
2. Fixed effects
• Motivation: what is the relationship between union
membership and wages?
• Consider:
Y_it: observed outcome (e.g. log earnings of worker i at time t)
D_it: treatment status (e.g. union status)
• We know that Y_it can be Y_it(1) or Y_it(0) depending on
treatment status
• Assume also:
E[Y_it(0) | A_i, X_it, t, D_it] = E[Y_it(0) | A_i, X_it, t]
137
• Where:
X_it: vector of observed time‐varying covariates
A_i: vector of unobserved but fixed confounders
• Union status is as good as randomly assigned conditional
on X_it and A_i
• Key for fixed effects: A_i enters without depending on t in a
linear model for E[Y_it(0) | A_i, X_it, t]:
E[Y_it(0) | A_i, X_it, t] = α + λ_t + A_i'γ + X_it'β
• We also assume that the causal effect is additive and
constant: E[Y_it(1) | A_i, X_it, t] = E[Y_it(0) | A_i, X_it, t] + ρ
138
• Replacing: E[Y_it | A_i, X_it, t, D_it] = α + λ_t + ρD_it + A_i'γ + X_it'β
• Where ρ is the causal effect of interest
• This implies:
Y_it = α_i + λ_t + ρD_it + X_it'β + ε_it
• Where:
ε_it ≡ Y_it(0) − E[Y_it(0) | A_i, X_it, t]
α_i ≡ α + A_i'γ
139
• α_i and λ_t are treated as parameters to be estimated in
the fixed effects model
• Too many parameters to estimate?
o We can make this simple by looking for ways to “eliminate” the
individual fixed effects
• Two ways:
o Deviations from means
o Differencing
• Deviations from means:
o Compute individual averages:
Ȳ_i = α_i + λ̄ + ρD̄_i + X̄_i'β + ε̄_i
o Then subtract them from the original equation, which removes α_i:
Y_it − Ȳ_i = (λ_t − λ̄) + ρ(D_it − D̄_i) + (X_it − X̄_i)'β + (ε_it − ε̄_i)
• Differencing:
ΔY_it = Δλ_t + ρΔD_it + ΔX_it'β + Δε_it
o Where Δ denotes the first difference, e.g. ΔY_it = Y_it − Y_i,t−1
141
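Both transformations are one line each in code. The sketch below (simulated data, invented numbers) shows pooled OLS being confounded by α_i while the deviations-from-means (within) estimator recovers ρ = 1:

```python
import numpy as np

rng = np.random.default_rng(6)
N, T = 5_000, 5
alpha = rng.normal(size=N)                       # unobserved fixed effect A_i
p = 1 / (1 + np.exp(-alpha[:, None]))            # treatment more likely if alpha high
d = (rng.random((N, T)) < p).astype(float)
y = 1.0 * d + alpha[:, None] + rng.normal(size=(N, T))   # rho = 1

# Pooled OLS slope: confounded because Cov(d, alpha) > 0
b_pool = np.cov(d.ravel(), y.ravel())[0, 1] / d.ravel().var()

# Within (deviations-from-means) estimator: alpha_i drops out
d_dm = d - d.mean(axis=1, keepdims=True)
y_dm = y - y.mean(axis=1, keepdims=True)
b_fe = (d_dm * y_dm).sum() / (d_dm * d_dm).sum()

assert b_pool - 1.0 > 0.2          # biased upward by the fixed effect
assert abs(b_fe - 1.0) < 0.05      # within estimator recovers rho = 1
```

With T = 2 the within estimator and the first-difference estimator coincide exactly.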
Example: Freeman (1984)
142
• Problems with fixed effects:
o Measurement error (attenuation bias)
143
3. Differences in Differences (DD)
• Motivation: what is the effect of the minimum wage on
employment?
• Card and Krueger (1994):
– Exploit a dramatic change in the New Jersey state minimum
wage (from $4.25 to $5.05)
– Data from fast food restaurants on the border between New
Jersey and Pennsylvania (February 1992, November 1992)
• The potential outcomes:
Y_ist(1): fast food employment at restaurant i in state s at period t
if there is a high state minimum wage
144
Y_ist(0): fast food employment at restaurant i in state s at period t
if there is a low state minimum wage
• We assume:
E[Y_ist(0) | s, t] = γ_s + λ_t
• Critical for DD: an additive structure for potential outcomes
in the no‐treatment state
• Let:
D_st = 1 for high minimum wage states and periods; 0 otherwise
145
• We also assume that:
E[Y_ist(1) − Y_ist(0) | s, t] = δ
• Then, the observed outcome:
Y_ist = γ_s + λ_t + δD_st + ε_ist
• Where:
E[ε_ist | s, t] = 0
• Then, we get:
E[Y_ist | s = PA, t = Nov] − E[Y_ist | s = PA, t = Feb] = λ_Nov − λ_Feb
E[Y_ist | s = NJ, t = Nov] − E[Y_ist | s = NJ, t = Feb] = λ_Nov − λ_Feb + δ
• Where δ is the causal effect of interest
147
[Figure: DD diagram — outcome over time for the treatment and control groups; the gap between them after treatment is the average treatment effect on the treated]
148
Example: Card and Krueger (1994)
149
• Is this result convincing? The key is the common trends
assumption
150
• Using regressions to estimate DD:
Y_ist = α + γNJ_s + λd_t + δ(NJ_s × d_t) + ε_ist
• Where: NJ_s × d_t = D_st
• In terms of potential outcomes:
α = E[Y_ist | s = PA, t = Feb] = γ_PA + λ_Feb
γ = E[Y_ist | s = NJ, t = Feb] − E[Y_ist | s = PA, t = Feb] = γ_NJ − γ_PA
151
δ = {E[Y_ist | s = NJ, t = Nov] − E[Y_ist | s = NJ, t = Feb]}
  − {E[Y_ist | s = PA, t = Nov] − E[Y_ist | s = PA, t = Feb]}
• Advantages:
o A better way to compute standard errors
o More useful when there are more treated and control units
o An easy way to control for covariates
152
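The regression version of DD is easy to verify in a simulation (a sketch with made-up numbers): in the saturated model, the coefficient on the interaction equals the double difference of the four cell means exactly.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 40_000
s = rng.integers(0, 2, size=n)          # state: 1 = "NJ" (treated), 0 = "PA"
t = rng.integers(0, 2, size=n)          # period: 1 = "Nov", 0 = "Feb"
dst = s * t                             # treated state after the policy change
y = 20 + 1.0 * s - 0.5 * t + 2.0 * dst + rng.normal(size=n)   # delta = 2

X = np.column_stack([np.ones(n), s, t, dst])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
delta_hat = beta[3]                     # coefficient on the interaction

# The same number from the four cell means (the double difference)
m = lambda si, ti: y[(s == si) & (t == ti)].mean()
dd = (m(1, 1) - m(1, 0)) - (m(0, 1) - m(0, 0))

assert np.isclose(delta_hat, dd)        # saturated OLS == double difference
assert abs(delta_hat - 2.0) < 0.1
```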
Example: Galiani et al. (2008)
• Galiani, Sebastian, Paul Gertler, and Ernesto Schargrodsky.
2008. “School Decentralization: Helping the Good Get
Better, but Leaving the Poor Behind”, Journal of Public
Economics 92(10‐11): 2106‐2120.
153
Lecture VI:
Instrumental Variables
Stanislao Maldonado
University of California, Berkeley
Curso CIES‐INEI “Microeconometría: Identificación y
Causalidad en Modelos Empíricos”
Lima, August 12, 2010
154
1. Motivation for IV
• IV was developed by Philip Wright in 1928, who was
interested in studying the effect of a tax on imported
agricultural goods
• To do that, you need to estimate the demand and supply
curves, but you only observe equilibrium points
(simultaneity)! So, running OLS of quantity on prices will
provide biased estimators
• He solved this problem by exploiting a third variable that
shifted supply but not demand. This variable is correlated
with prices (it affects the supply curve) but uncorrelated with
the unobservable variables (demand remains stable)
155
[Figure: Demand and supply in 3 periods — both curves shift, so the equilibria do not trace out either curve]
156
[Figure: Equilibrium price and quantity when only the supply curve shifts — the equilibria trace out the demand curve]
157
• IV allows consistent estimation of the parameter of
interest
• Recall: OLS is consistent when Cov(x, u) = 0, but is no
longer consistent when Cov(x, u) ≠ 0
158
• IV logic: [diagram]
159
2. Standard IV
• Consider:
(1) y = β_0 + β_1x_1 + ... + β_Kx_K + u
(2) E(u) = 0 and Cov(x_j, u) = 0, j = 1, 2, ..., K − 1
• But: Cov(x_K, u) ≠ 0
• Suppose there exists a variable q in the error term that
creates endogeneity in x_K
• Problem: all the betas are inconsistent!
• IV provides a general solution to the endogeneity
problem
160
• We need to observe a variable z, not in (1), that satisfies:
1. z must be uncorrelated with u:
(3) Cov(z, u) = 0
2. The relationship between z and x_K must satisfy:
(4) x_K = δ_0 + δ_1x_1 + ... + δ_{K−1}x_{K−1} + θ_1z + r_K (reduced form)
where θ_1 ≠ 0
• When there is only one explanatory variable, this condition
reduces to: Cov(z, x_K) ≠ 0
161
• Replacing (4) in (1):
(5) y = α_0 + α_1x_1 + ... + α_{K−1}x_{K−1} + λ_1z + v (reduced form for y)
• Where:
v = u + β_Kr_K,  α_j = β_j + β_Kδ_j,  λ_1 = β_Kθ_1
• v is uncorrelated with all the explanatory variables, so OLS
can consistently estimate the reduced‐form parameters
in (5)
• Sometimes, reduced‐form estimates are interesting in themselves
• How does IV solve the identification problem for the β_j in (1)?
162
• Re‐write (1) as follows:
(6) y = Xβ + u
• Where: X ≡ (1, x_1, ..., x_K)
• Define: Z ≡ (1, x_1, ..., x_{K−1}, z)
• From (2) and (3), we have the following orthogonality
condition:
(7) E(Z'u) = 0
• Multiplying (6) by Z', taking expectations, and using (7):
(8) E(Z'y) = [E(Z'X)]β + E(Z'u) = [E(Z'X)]β
163
• This system has a unique solution iff E(Z'X) has full
rank. Therefore:
(9) β = [E(Z'X)]⁻¹E(Z'y)
• Given a random sample on X, y and Z, the instrumental
variables estimator of β is:
(10) β_IV = [Z'X]⁻¹Z'y
164
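Equation (10) is one line of linear algebra. A sketch with simulated data (one endogenous regressor, one instrument; the numbers are invented) shows OLS being inconsistent while β_IV = (Z'X)⁻¹Z'y is on target:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100_000
z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # omitted variable in the error
x = 0.8 * z + u + rng.normal(size=n)         # endogenous: Cov(x, u) != 0
y = 1.0 + 2.0 * x + u + rng.normal(size=n)   # true slope = 2

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)     # beta_IV = (Z'X)^(-1) Z'y

assert abs(b_ols[1] - 2.0) > 0.2             # OLS is inconsistent
assert abs(b_iv[1] - 2.0) < 0.05             # IV recovers the slope
```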
Where do instruments come from?
• In practice, it is hard to find good instruments. There are
two approaches:
o Use economic theory to suggest instruments
Ex: Wright’s insight about agricultural markets
o Look for an exogenous source of variation that induces changes in
the endogenous regressor
Some examples:
• Distance to school (Card 1995)
• Rainfall variation (Miguel et al. 2004)
• Colonization process (Acemoglu et al. 2001)
165
3. The Angrist‐Imbens‐Rubin (AIR) Causal Model
• Notation:
o N units denoted by i
o Two levels of treatment: D=0 and D=1
o Y is a measure of the outcome
o Z is assignment to treatment: Z=0 and Z=1. Notice that:
• Assignment to treatment may or may not be random
• The correspondence between assignment and treatment may be imperfect
• Participation in treatment: D_i = D_i(Z)
• Outcome: Y_i = Y_i(Z, D)
• Notice that 3 causal effects can be defined:
o The effect of assignment Z on treatment D
o The effect of assignment Z on outcome Y
o The effect of treatment D on outcome Y
• The first two are called “intention to treat” (ITT) effects
• The AIR model defines the set of assumptions that
ensures the identification of these effects
167
Assumptions of the AIR causal model
• Assumption A.1: Stable unit treatment value
assumption (SUTVA)
The potential outcomes and treatments of unit i are
independent of the potential assignments, treatments and
outcomes of units j ≠ i
Therefore: (1) D_i(Z) = D_i(Z_i)
(2) Y_i(Z, D) = Y_i(Z_i, D_i)
• Then, we can define the ITT effects as follows:
• Definition 1: The causal effect of Z on D for unit i is
D_i(1) − D_i(0)
168
• Definition 2: The causal effect of Z on Y for unit i is
Y_i(1, D_i(1)) − Y_i(0, D_i(0))
• Counterfactual logic requires thinking, for each individual, of:
o Potential outcomes: [Y_i(1,1), Y_i(1,0), Y_i(0,1), Y_i(0,0)]
o Potential treatments: [D_i(0), D_i(1)], each taking the value 0 or 1
o Potential assignments: [Z_i = 0, Z_i = 1]
• Only one state is actually observed!
• If SUTVA holds, then we can classify individuals as
follows:
Source: Ichino (2006)
170
• Assumption A.2: Random assignment
All individuals have the same probability of being assigned
to the treatment:
Pr(Z_i = 1) = Pr(Z_j = 1)
• Using A.1 and A.2, we can consistently estimate the two
ITT effects:
(3) E(D_i | Z_i = 1) − E(D_i | Z_i = 0) = Cov(D_i, Z_i) / Var(Z_i)
(4) E(Y_i | Z_i = 1) − E(Y_i | Z_i = 0) = Cov(Y_i, Z_i) / Var(Z_i)
171
• Note that the ratio between (4) and (3) gives the
conventional IV estimator:
(5) β_IV = [Cov(Y_i, Z_i)/Var(Z_i)] / [Cov(D_i, Z_i)/Var(Z_i)] = Cov(Y_i, Z_i) / Cov(D_i, Z_i)
• Questions:
o Under which assumptions does this IV estimator give an estimate of
the average causal effect of D on Y, and for which group?
o Does this estimate depend on the instrument we use?
172
• Assumption A.3: Non‐zero average causal effect of Z on
D
The probability of treatment must be different in the two
assignment groups:
Pr{D_i(1) = 1} ≠ Pr{D_i(0) = 1}
• This is similar to the requirement of having the
instrument correlated with the endogenous regressor
• Assumption A.4: Exclusion restriction
The assignment affects the outcome only through the
treatment:
Y_i(1, D_i) = Y_i(0, D_i) = Y_i(D_i)
173
• As in the standard IV case, A.3 can be tested but A.4
cannot
• Definition 3: The causal effect of D on Y for unit i is
Y_i(1) − Y_i(0)
• Again: we cannot compute this because the counterfactual is
not observed!
• Solution: compare sample averages of the two
components for individuals who are in the two treatment
groups only because of different assignments (compliers
and defiers)
• Are these assumptions enough?
174
• Using A.1 to A.4:
(6) Y_i(1, D_i(1)) − Y_i(0, D_i(0)) = Y_i(D_i(1)) − Y_i(D_i(0))
(the left side is the effect of Z on Y; the right side combines the
effect of Z on D with the effect of D on Y)
• This holds at the individual level!
• Using sample averages:
175
(7) E[Y_i(1, D_i(1)) − Y_i(0, D_i(0))] = E{[D_i(1) − D_i(0)]·[Y_i(1) − Y_i(0)]}
• We still have an identification problem! (the average effect
for compliers may cancel with the average effect for defiers)
• Assumption A.5: Monotonicity
No one does the opposite of her assignment, no matter
what the assignment is:
D_i(1) ≥ D_i(0), ∀i
176
• The ATE for defiers is zero (under A.5 there are none)
• Notice that A.3 + A.5 imply strong monotonicity:
o There is no defier
o There exists at least one complier
177
Local average treatment effect (LATE)
• Given A.5, we can write equation (7) as:
(8) E[Y_i(1, D_i(1)) − Y_i(0, D_i(0))] = E{[D_i(1) − D_i(0)]·[Y_i(1) − Y_i(0)]}
• Re‐arranging this expression, we obtain an expression for
LATE:
(9) E{Y_i(1) − Y_i(0) | D_i(1) − D_i(0) = 1} = E[Y_i(1, D_i(1)) − Y_i(0, D_i(0))] / Pr{D_i(1) − D_i(0) = 1}
178
• Definition 4: LATE
LATE is the average effect of the treatment for those who change
treatment status because of a change in the instrument, i.e.
the average effect of the treatment for compliers
• It can be shown that:
(10) E{Y_i(1) − Y_i(0) | D_i(1) − D_i(0) = 1} = Cov(Y, Z) / Cov(D, Z) = β_IV
• The IV estimand is the LATE. LATE is the only treatment effect that
can be estimated by IV
179
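A simulation sketch of Definition 4 (the compliance shares and effect sizes are invented for illustration): with random assignment, no defiers, and heterogeneous effects, the Wald ratio of the two ITT effects recovers the average effect for compliers only.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 200_000
z = rng.integers(0, 2, size=n)                     # random assignment Z

# Compliance types (A.5 holds: no defiers)
typ = rng.choice(["never", "complier", "always"], size=n, p=[0.3, 0.4, 0.3])
d = np.where(typ == "always", 1, np.where(typ == "complier", z, 0))

y0 = rng.normal(size=n)
effect = np.where(typ == "complier", 2.0, 0.0)     # effect = 2 for compliers
y = y0 + effect * d

itt_y = y[z == 1].mean() - y[z == 0].mean()        # ITT effect of Z on Y
itt_d = d[z == 1].mean() - d[z == 0].mean()        # ITT effect of Z on D
late = itt_y / itt_d                               # Wald ratio = IV estimand

assert abs(itt_d - 0.4) < 0.02     # share of compliers
assert abs(late - 2.0) < 0.1       # recovers the complier effect only
```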
4. Weak instruments
• An instrument is weak when its correlation with the
treatment is low
• Consequences:
o If the assumptions required for consistency are satisfied:
o The standard error of the IV estimate increases with the weakness
of the instrument
o In finite samples the IV estimate is biased in the same direction as the
OLS estimate, and the weaker the instrument, the closer the IV bias
is to the OLS bias
o If the assumptions required for consistency are violated, the
weakness of the instrument exacerbates the inconsistency of
the IV estimate
180
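A small Monte Carlo sketch of the finite-sample problem (the design is invented; `pi` is the first-stage coefficient): with n = 50, a strong instrument keeps the IV sampling distribution centred near the true effect of 1, while a very weak one drags it toward the OLS probability limit (about 2 in this design).

```python
import numpy as np

rng = np.random.default_rng(10)

def iv_slope(n, pi):
    """One draw of the just-identified IV estimator with first stage x = pi*z + u."""
    u = rng.normal(size=n)                  # error shared by x and y -> endogeneity
    z = rng.normal(size=n)
    x = pi * z + u
    y = 1.0 * x + u                         # true causal effect = 1
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

reps = 2_000
weak = np.median([iv_slope(50, 0.05) for _ in range(reps)])
strong = np.median([iv_slope(50, 1.0) for _ in range(reps)])

assert abs(strong - 1.0) < 0.15    # strong first stage: centred on the truth
assert weak - 1.0 > 0.5            # weak first stage: dragged toward OLS (~2)
```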
5. Application: Miguel et al. (2004)
• Miguel et al. (2004) study the impact of economic shocks
on the likelihood of civil conflict
• They exploit variation in economic conditions due to
rainfall variation (a weather shock) in order to overcome
endogeneity issues (omitted variable bias and reverse causality)
using data for 41 African countries
• The first stage uses the following empirical model:
181
First stage
Placebo test
F‐stat (eq. 3) = 4.5 (weak instrument!)
182
Reduced form
183
• The second stage:
184
185
Lecture VII:
Regression Discontinuity Designs
Stanislao Maldonado
University of California, Berkeley
Curso CIES‐INEI “Microeconometría: Identificación y
Causalidad en Modelos Empíricos”
Lima, August 12, 2010
186
1. Motivation
• Goal: to approximate an experimental design using
observational data
• In the absence of random assignment, causal effects can be
estimated by exploiting characteristics of the assignment
rule
– Examples: fellowship programs, poverty programs based on
scores, etc.
• LATE can be estimated at the discontinuity that
determines which individuals are assigned to treatment
and to control
187
Regression discontinuity
• Examples: Angrist and Lavy (1999), Van der Klaauw
(2002), DiNardo and Lee (2005), among others
• When to use this method?
o The treated/non‐treated can be ordered along a quantifiable
dimension
o This dimension can be used to compute a well‐defined index or
parameter
o The index/parameter has a cut‐off point for eligibility
o The index value is what drives the assignment of a unit to the
treatment (or to non‐treatment)
188
• Intuitive explanation of the method:
o The treated units just above the cut‐off point are very similar to
the control units just below the cut‐off point
o We compare outcomes for units just above and below the cut‐off
point
o This estimates the effect of the treatment for units AT the cut‐
off point, and may not be generalizable
189
• Indexes are common in social programs:
o Anti‐poverty programs
targeted to households below a given poverty index
o Pension programs
targeted to population above a certain age
o Scholarships
targeted to students with high scores on a standardized test
o CDD programs
awarded to NGOs that achieve the highest scores
190
Example: effect of a cash transfer on
consumption
• Goal: target the transfer to the poorest households
• Method:
o Construct a poverty index from 1 to 100 with pre‐intervention
characteristics
o Households with a score <= 50 are poor
o Households with a score > 50 are non‐poor
• Implementation:
o Cash transfer to poor households
• Evaluation:
o Measure outcomes (e.g. consumption, school attendance rates)
before and after the transfer, comparing households just above and
below the cut‐off point
191
[Figure: Regression Discontinuity Design – Baseline. Outcome vs. score; non‐poor above the cut‐off, poor below]
192
[Figure: Regression Discontinuity Design – Post Intervention. Outcome vs. score]
193
[Figure: Regression Discontinuity Design – Post Intervention, with the treatment effect shown as the jump at the cut‐off]
194
2. Treatment effects in RDD
• We already know:
o Y_i(1), Y_i(0) are the potential outcomes
o β_i = Y_i(1) − Y_i(0) is the causal effect
o I_i is the treatment status
• If assignment is determined by randomization and there is
full compliance with the treatment:
{Y_i(1), Y_i(0)} ⊥ I
• The mean impact:
E{β_i} = E{Y_i(1) | I = 1} − E{Y_i(0) | I = 0}
195
• RDD arises when:
– Treatment status depends on an observable unit characteristic S
– There exists a known point in the support of S where the
probability of participation changes discontinuously
• Let s̄ be the discontinuity point; then:
Pr(I = 1 | s̄⁺) ≠ Pr(I = 1 | s̄⁻)
• WLOG:
Pr(I = 1 | s̄⁺) − Pr(I = 1 | s̄⁻) > 0
196
Sharp and Fuzzy Discontinuity
Sharp and Fuzzy Discontinuity
• Sharp discontinuity
o The discontinuity precisely determines treatment
o Equivalent to random assignment in a neighborhood
o E.g. social security payments depend directly and immediately
on a person’s age
• Fuzzy discontinuity
o Discontinuity is highly correlated with treatment
o Use the assignment as an IV for program participation.
o E.g. Rules determine eligibility but there is a margin of
administrative error.
Sharp RDD
• The probability of treatment conditional on S steps from zero
to one as S crosses the threshold s̄. Therefore:
I = 1(S ≥ s̄)
• Observed outcome: Yi = Yi(0) + Ii(s̄)β
• The difference of observed mean outcomes marginally above
and below s̄ is:
E{Yi | s̄⁺} − E{Yi | s̄⁻} = E{Yi(0) | s̄⁺} − E{Yi(0) | s̄⁻}
+ E{Ii(s̄)β | s̄⁺} − E{Ii(s̄)β | s̄⁻}
= E{Yi(0) | s̄⁺} − E{Yi(0) | s̄⁻} + E{β | s̄⁺}
• The mean treatment effect at s̄⁺ is identified if the
following condition is true:
Condition I: The mean value of Y(0) conditional on S is a
continuous function of S at s̄
E{Yi(0) | s̄⁺} = E{Yi(0) | s̄⁻}
• Counterfactual world: no discontinuity takes place at the
threshold for selection
• Then, we can compute:
E{βi | s̄⁺} = E{Yi | s̄⁺} − E{Yi | s̄⁻}
• If sample is large enough: compute these expressions
using data for subjects in a neighborhood of the
discontinuity
• If sample is small: use some parametric assumptions
about the regression curve away from the discontinuity point
Fuzzy RDD
• Arises when there is no perfect compliance with the
assignment rule
• Treatment status depends not only on S but also on some
unobservable characteristics
• A new condition is needed:
Condition II: The triple {Y(1), Y(0), I(s̄)} is stochastically
independent of S in a neighborhood of s̄
• Standard exclusion restriction in an IV setup:
o S affects the outcome only through its effect on the treatment I
• If Condition II is true:
E{Yi | s̄⁺} − E{Yi | s̄⁻} =
E{β | I(s̄⁺) > I(s̄⁻)} Pr{I(s̄⁺) > I(s̄⁻)}
− E{β | I(s̄⁺) < I(s̄⁻)} Pr{I(s̄⁺) < I(s̄⁻)}
• Where:
o First RHS term: average effect for compliers, times the probability
of compliance
o Second RHS term: average effect for defiers, times the probability
of defiance
• Remember from your IV class:
o Always-takers and never-takers do not contribute because their
potential treatment status does not change at the discontinuity
o We need a strong monotonicity assumption to rule out the
defiers
• The additional assumption:
Condition III: Participation in the program is monotone
around the discontinuity
• Then, the outcome comparison of subjects above and
below the threshold gives:
E{βi | I(s̄⁺) ≠ I(s̄⁻)} = (E{Yi | s̄⁺} − E{Yi | s̄⁻}) / (E{Ii | s̄⁺} − E{Ii | s̄⁻})
• This recovers the mean impact of the treatment for those
individuals in a neighborhood of s̄ who would switch their
treatment status if the threshold for participation
switched from just above their score to just below it
• It is analogous to the LATE
• The RHS denominator is the proportion of compliers at the
discontinuity
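This Wald-type ratio can be illustrated with a small simulation in which all numbers (the compliance rates 0.2 and 0.8, the true effect of 10, the bandwidth) are assumptions: the jump in the outcome at the cut-off, divided by the jump in the treatment probability, recovers the effect for compliers.

```python
import random

random.seed(2)
CUTOFF, H = 50.0, 2.0  # threshold and bandwidth (both illustrative)

# Hypothetical fuzzy design: crossing the threshold raises the probability
# of treatment from 0.2 to 0.8 (imperfect compliance); assumed effect is 10
def draw(s):
    p = 0.8 if s >= CUTOFF else 0.2
    treated = random.random() < p
    return treated, 40 + 10 * treated + random.gauss(0, 2)

data = [(s, *draw(s)) for s in (random.uniform(40, 60) for _ in range(50000))]

def mean(xs):
    return sum(xs) / len(xs)

above = [(t, y) for s, t, y in data if CUTOFF <= s <= CUTOFF + H]
below = [(t, y) for s, t, y in data if CUTOFF - H <= s < CUTOFF]

num = mean([y for _, y in above]) - mean([y for _, y in below])  # jump in Y
den = mean([t for t, _ in above]) - mean([t for t, _ in below])  # jump in Pr(T)
wald = num / den
print(round(wald, 1))  # near the assumed effect of 10
```

Note that `num` alone (≈ 6 here) would understate the effect, because only about 60% of subjects change treatment status at the cut-off; dividing by `den` rescales to the compliers.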
A regression framework for fuzzy RDD
• Under the assumptions above:
Y = g(S) + βT + ε
• Where:
o Y is the observed outcome
o g(S) is a polynomial in the score S
o T is a binary indicator that denotes actual exposure to
treatment
o I = 1(S ≥ s̄) indicates the side of the threshold on which each
subject is located (it serves as the instrument for T)
3. Examples of RD
• Effect of class size on scholastic achievement (Angrist and
Lavy, 1999)
• The long‐term effects of colonial mita in Peru
(Dell, 2010)
Angrist and Lavy (1999): Using Maimonides’ Rule
• Effect of class size on learning outcomes – i.e. test scores
in 3rd and 4th grade
• Use Maimonides’ rule:
o When there are fewer than 40 pupils: one class
o When there are more than 40 pupils: split the group into two
classes
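The rule described above can be formalized as a predicted class size. The closed form below is one common way to write it, assuming each cohort is split into the smallest number of classes with at most 40 pupils each, divided evenly; this predicted size is what serves as the instrument:

```python
import math

# Predicted class size under Maimonides' rule, as described on the slide:
# use the smallest number of classes with at most 40 pupils each
def predicted_class_size(enrollment):
    n_classes = math.floor((enrollment - 1) / 40) + 1
    return enrollment / n_classes

print(predicted_class_size(40))  # 40.0 -> one class
print(predicted_class_size(41))  # 20.5 -> split into two classes
print(predicted_class_size(80))  # 40.0 -> still two classes
print(predicted_class_size(81))  # 27.0 -> three classes
```

The sharp drops at enrollments of 41, 81, ... are the discontinuities the design exploits.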
• Maimonides’ rule is not being used in all cases, e.g. there
are classes of 42 pupils
• Fuzzy discontinuity: use an instrumental variable
o First, use Maimonides’ rule to predict the size of the class (T)
o Then, explain the test results (y) with the predicted class size (T hat)
Dell (2010): the effect of mita
• Goal: to estimate the long‐term effects of the mining mita in
Peru
• In Dell’s words:
Advantages of RD
• RD yields an unbiased estimate of the treatment effect at the
discontinuity
• Can take advantage of a known rule for assigning the
benefit
o This is common in the design of social interventions
o No need to “exclude” a group of eligible households/ individuals
from treatment
Potential Disadvantages of RD
• Local average treatment effects
o We estimate the effect of the program around the cut‐off point
o This is not always generalizable
• Power:
o The effect is estimated at the discontinuity, so we generally have
fewer observations than in a randomized experiment with the
same sample size
• Specification can be sensitive to functional form: make
sure the relationship between the assignment variable
and the outcome variable is correctly modeled, including:
o Nonlinear Relationships
o Interactions
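The functional-form sensitivity can be demonstrated with a small experiment in which the data-generating process is entirely assumed: the outcome is smoothly curved in the score with NO true discontinuity, yet a linear specification on each side produces a sizable spurious "jump" at the cut-off, while a higher-order polynomial essentially removes it.

```python
import math
import random

random.seed(4)

def polyfit(xs, ys, deg):
    # Least-squares polynomial fit via the normal equations (pure stdlib);
    # returns coefficients [c0, c1, ...] for c0 + c1*x + c2*x**2 + ...
    k = deg + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv], b[col], b[piv] = A[piv], A[col], b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * c for a, c in zip(A[r], A[col])]
            b[r] -= f * b[col]
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, k))) / A[r][r]
    return coef

# Smoothly curved outcome with NO true discontinuity; the score is centered
# at the cut-off, so each side's intercept is its prediction at the threshold
x = [random.uniform(-30, 30) for _ in range(8000)]             # x = S - s_bar
y = [math.exp(0.04 * (v + 50)) + random.gauss(0, 0.5) for v in x]

jumps = {}
for deg in (1, 3):
    cl = polyfit([v for v in x if v < 0], [w for v, w in zip(x, y) if v < 0], deg)
    cr = polyfit([v for v in x if v >= 0], [w for v, w in zip(x, y) if v >= 0], deg)
    jumps[deg] = cr[0] - cl[0]   # estimated "jump" at the cut-off
    print(deg, round(jumps[deg], 2))
```

Here the linear fit reports a clear spurious discontinuity while the cubic fit's estimate is close to zero, which is exactly why the relationship between the assignment variable and the outcome must be modeled flexibly.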