http://www.aeaweb.org/articles.php?doi=10.1257/jel.48.2.281
282 Journal of Economic Literature, Vol. XLVIII (June 2010)
boundaries) to estimate the willingness to pay for good schools. Following these early papers in the area of education, the past five years have seen a rapidly growing literature using RD designs to examine a range of questions. Examples include the labor supply effect of welfare, unemployment insurance, and disability programs; the effects of Medicaid on health outcomes; the effect of remedial education programs on educational achievement; the empirical relevance of median voter models; and the effects of unionization on wages and employment.

One important impetus behind this recent flurry of research is a recognition, formalized by Jinyong Hahn, Petra Todd, and van der Klaauw (2001), that RD designs require seemingly mild assumptions compared to those needed for other nonexperimental approaches. Another reason for the recent wave of research is the belief that the RD design is not "just another" evaluation strategy, and that causal inferences from RD designs are potentially more credible than those from typical "natural experiment" strategies (e.g., difference-in-differences or instrumental variables), which have been heavily employed in applied research in recent decades. This notion has a theoretical justification: David S. Lee (2008) formally shows that one need not assume the RD design isolates treatment variation that is "as good as randomized"; instead, such randomized variation is a consequence of agents' inability to precisely control the assignment variable near the known cutoff.

So while the RD approach was initially thought to be "just another" program evaluation method with relatively little general applicability outside of a few specific problems, recent work in economics has shown quite the opposite.1 In addition to providing a highly credible and transparent way of estimating program effects, RD designs can be used in a wide variety of contexts covering a large number of important economic questions. These two facts likely explain why the RD approach is rapidly becoming a major element in the toolkit of empirical economists.

Despite the growing importance of RD designs in economics, there is no single comprehensive summary of what is understood about RD designs—when they succeed, when they fail, and their strengths and weaknesses.2 Furthermore, the "nuts and bolts" of implementing RD designs in practice are not (yet) covered in standard econometrics texts, making it difficult for researchers interested in applying the approach to do so. Broadly speaking, the main goal of this paper is to fill these gaps by providing an up-to-date overview of RD designs in economics and creating a guide for researchers interested in applying the method.

A reading of the most recent research reveals a certain body of "folk wisdom" regarding the applicability, interpretation, and recommendations of practically implementing RD designs. This article represents our attempt at summarizing what we believe to be the most important pieces of this wisdom, while also dispelling misconceptions that could potentially (and understandably) arise for those new to the RD approach.

We will now briefly summarize the most important points about RD designs to set the stage for the rest of the paper where we systematically discuss identification, interpretation, and estimation issues. Here, and throughout the paper, we refer to the assignment variable as X. Treatment is, thus,

1 See Thomas D. Cook (2008) for an interesting history of the RD design in education research, psychology, statistics, and economics. Cook argues the resurgence of the RD design in economics is unique as it is still rarely used in other disciplines.

2 See, however, two recent overview papers by van der Klaauw (2008b) and Guido W. Imbens and Thomas Lemieux (2008) that have begun bridging this gap.
Lee and Lemieux: Regression Discontinuity Designs in Economics 283
tilted toward either finding an effect or finding no effect.

It has become standard to summarize RD analyses with a simple graph showing the relationship between the outcome and assignment variables. This has several advantages. The presentation of the "raw data" enhances the transparency of the research design. A graph can also give the reader a sense of whether the "jump" in the outcome variable at the cutoff is unusually large compared to the bumps in the regression curve away from the cutoff. Also, a graphical analysis can help identify why different functional forms give different answers, and can help identify outliers, which can be a problem in any empirical analysis. The problem with graphical presentations, however, is that there is some room for the researcher to construct graphs making it seem as though there are effects when there are none, or hiding effects that truly exist. We suggest later in the paper a number of methods to minimize such biases in presentation.

• Nonparametric estimation does not represent a "solution" to functional form issues raised by RD designs. It is therefore helpful to view it as a complement to—rather than a substitute for—parametric estimation.

When the analyst chooses a parametric functional form (say, a low-order polynomial) that is incorrect, the resulting estimator will, in general, be biased. When the analyst uses a nonparametric procedure such as local linear regression—essentially running a regression using only data points "close" to the cutoff—there will also be bias.4 With a finite sample, it is impossible to know which case has a smaller bias without knowing something about the true function. There will be some functions where a low-order polynomial is a very good approximation and produces little or no bias, and therefore it is efficient to use all data points—both "close to" and "far away" from the threshold. In other situations, a polynomial may be a bad approximation, and smaller biases will occur with a local linear regression. In practice, parametric and nonparametric approaches lead to the computation of the exact same statistic.5 For example, the procedure of regressing the outcome Y on X and a treatment dummy D can be viewed as a parametric regression (as discussed above), or as a local linear regression with a very large bandwidth. Similarly, if one wanted to exclude the influence of data points in the tails of the X distribution, one could call the exact same procedure "parametric" after trimming the tails, or "nonparametric" by viewing the restriction in the range of X as a result of using a smaller bandwidth.6 Our main suggestion in estimation is to not rely on one particular method or specification. In any empirical analysis, results that are stable across alternative

4 Unless the underlying function is exactly linear in the area being examined.

5 See section 1.2 of James L. Powell (1994), where it is argued that it is more helpful to view models rather than particular statistics as "parametric" or "nonparametric." It is shown there how the same least squares estimator can simultaneously be viewed as a solution to parametric, semiparametric, and nonparametric problems.

6 The main difference, then, between a parametric and nonparametric approach is not in the actual estimation but rather in the discussion of the asymptotic behavior of the estimator as sample sizes tend to infinity. For example, standard nonparametric asymptotics considers what would happen if the bandwidth h—the width of the "window" of observations used for the regression—were allowed to shrink as the number of observations N tended to infinity. It turns out that if h → 0 and Nh → ∞ as N → ∞, the bias will tend to zero. By contrast, with a parametric approach, when one is not allowed to make the model more flexible with more data points, the bias would generally remain—even with infinite samples.
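The point above, that regressing Y on X and a treatment dummy D over all observations computes the same statistic as a uniform-kernel local linear regression with a very large bandwidth, can be sketched in a few lines. All numeric values (cutoff, slopes, a treatment effect of 2.0) are invented for illustration and are not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, c = 4000, 0.0                          # sample size and cutoff (illustrative)
x = rng.uniform(-1, 1, n)                 # assignment variable X
d = (x >= c).astype(float)                # sharp treatment assignment D
y = 0.5 + 0.3 * x + 2.0 * d + rng.normal(0, 0.2, n)   # true effect tau = 2.0

def rd_estimate(h):
    """OLS of Y on (1, X, D) using only observations with |X - c| <= h.
    Regressing on the full sample is the h = infinity special case of a
    uniform-kernel local linear regression."""
    keep = np.abs(x - c) <= h
    Z = np.column_stack([np.ones(keep.sum()), x[keep], d[keep]])
    coef, *_ = np.linalg.lstsq(Z, y[keep], rcond=None)
    return coef[2]                        # coefficient on D: estimate of tau

tau_parametric = rd_estimate(np.inf)      # "parametric": every data point
tau_trimmed = rd_estimate(0.5)            # "nonparametric": smaller bandwidth
```

The two calls run the identical estimation routine; only the bandwidth label changes, which is exactly the sense in which trimming the tails can be described either way.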
and equally plausible specifications are generally viewed as more reliable than those that are sensitive to minor changes in specification. RD is no exception in this regard.

• Goodness-of-fit and other statistical tests can help rule out overly restrictive specifications.

Often the consequence of trying many different specifications is that it may result in a wide range of estimates. Although there is no simple formula that works in all situations and contexts for weeding out inappropriate specifications, it seems reasonable, at a minimum, not to rely on an estimate resulting from a specification that can be rejected by the data when tested against a strictly more flexible specification. For example, it seems wise to place less confidence in results from a low-order polynomial model when it is rejected in favor of a less restrictive model (e.g., separate means for each discrete value of X). Similarly, there seems little reason to prefer a specification that uses all the data if using the same specification, but restricting to observations closer to the threshold, gives a substantially (and statistically) different answer.

Although we (and the applied literature) sometimes refer to the RD "method" or "approach," the RD design should perhaps be viewed as more of a description of a particular data generating process. All other things (topic, question, and population of interest) equal, we as researchers might prefer data from a randomized experiment or from an RD design. But in reality, like the randomized experiment—which is also more appropriately viewed as a particular data generating process rather than a "method" of analysis—an RD design will simply not exist to answer a great number of questions. That said, as we show below, there has been an explosion of discoveries of RD designs that cover a wide range of interesting economic topics and questions.

The rest of the paper is organized as follows. In section 2, we discuss the origins of the RD design and show how it has recently been formalized in economics using the potential outcome framework. We also introduce an important theme that we stress throughout the paper, namely that RD designs are particularly compelling because they are close cousins of randomized experiments. This theme is more formally explored in section 3 where we discuss the conditions under which RD designs are "as good as a randomized experiment," how RD estimates should be interpreted, and how they compare with other commonly used approaches in the program evaluation literature. Section 4 goes through the main "nuts and bolts" involved in implementing RD designs and provides a "guide to practice" for researchers interested in using the design. A summary "checklist" highlighting our key recommendations is provided at the end of this section. Implementation issues in several specific situations (discrete assignment variable, panel data, etc.) are covered in section 5. Based on a survey of the recent literature, section 6 shows that RD designs have turned out to be much more broadly applicable in economics than was originally thought. We conclude in section 7 by discussing recent progress and future prospects in using and interpreting RD designs in economics.

2. Origins and Background

In this section, we set the stage for the rest of the paper by discussing the origins and the basic structure of the RD design, beginning with the classic work of Thistlethwaite and Campbell (1960) and moving to the recent interpretation of the design using modern tools of program evaluation in economics (potential outcomes framework). One of
the main virtues of the RD approach is that it can be naturally presented using simple graphs, which greatly enhances its credibility and transparency. In light of this, the majority of concepts introduced in this section are represented in graphical terms to help capture the intuition behind the RD design.

2.1 Origins

The RD design was first introduced by Thistlethwaite and Campbell (1960) in their study of the impact of merit awards on the future academic outcomes (career aspirations, enrollment in postgraduate programs, etc.) of students. Their study exploited the fact that these awards were allocated on the basis of an observed test score. Students with test scores X, greater than or equal to a cutoff value c, received the award, while those with scores below the cutoff were denied the award. This generated a sharp discontinuity in the "treatment" (receiving the award) as a function of the test score. Let the receipt of treatment be denoted by the dummy variable D ∈ {0, 1}, so that we have D = 1 if X ≥ c and D = 0 if X < c.

At the same time, there appears to be no reason, other than the merit award, for future academic outcomes, Y, to be a discontinuous function of the test score. This simple reasoning suggests attributing the discontinuous jump in Y at c to the causal effect of the merit award. Assuming that the relationship between Y and X is otherwise linear, a simple way of estimating the treatment effect τ is by fitting the linear regression

(1) Y = α + Dτ + Xβ + ε,

where ε is the usual error term that can be viewed as a purely random error generating variation in the value of Y around the regression line α + Dτ + Xβ. This case is depicted in figure 1, which shows both the true underlying function and numerous realizations of ε.

Thistlethwaite and Campbell (1960) provide some graphical intuition for why the coefficient τ could be viewed as an estimate of the causal effect of the award. We illustrate their basic argument in figure 1. Consider an individual whose score X is exactly c. To get the causal effect for a person scoring c, we need guesses for what her Y would be with and without receiving the treatment.

If it is "reasonable" to assume that all factors (other than the award) are evolving "smoothly" with respect to X, then B′ would be a reasonable guess for the value of Y of an individual scoring c (and hence receiving the treatment). Similarly, A′′ would be a reasonable guess for that same individual in the counterfactual state of not having received the treatment. It follows that B′ − A′′ would be the causal estimate. This illustrates the intuition that the RD estimates should use observations "close" to the cutoff (e.g., in this case at points c′ and c′′).

There is, however, a limitation to the intuition that "the closer to c you examine, the better." In practice, one cannot "only" use data close to the cutoff. The narrower the area that is examined, the less data there are. In this example, examining data any closer than c′ and c′′ will yield no observations at all! Thus, in order to produce a reasonable guess for the treated and untreated states at X = c with finite data, one has no choice but to use data away from the discontinuity.7 Indeed, if the underlying function is truly linear, we know that the best linear unbiased estimator of τ is the coefficient on D from OLS estimation (using all of the observations) of equation (1).

This simple heuristic presentation illustrates two important features of the RD

7 Interestingly, the very first application of the RD design by Thistlethwaite and Campbell (1960) was based on discrete data (interval data for test scores). As a result, their paper clearly points out that the RD design is fundamentally based on an extrapolation approach.
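Equation (1) can be made concrete with a minimal simulation in which the outcome is generated exactly as in (1) and τ is recovered as the OLS coefficient on D. All parameter values (α, τ, β, the cutoff c) are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
alpha, tau, beta, c = 1.0, 0.4, 0.25, 50.0    # invented parameter values
x = rng.uniform(0, 100, n)                    # observed test score X
d = (x >= c).astype(float)                    # D = 1 if X >= c, else 0
eps = rng.normal(0, 0.5, n)                   # purely random error term
y = alpha + d * tau + x * beta + eps          # equation (1)

# OLS of Y on a constant, D, and X; the coefficient on D estimates tau
Z = np.column_stack([np.ones(n), d, x])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
tau_hat = coef[1]
```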
[Figure 1. Outcome variable (Y) plotted against the assignment variable (X): the point B′ lies at c′ just above the cutoff c, the point A″ at c″ just below it, and τ marks the discontinuity gap at c.]
design. First, in order for this approach to work, "all other factors" determining Y must be evolving "smoothly" with respect to X. If the other variables also jump at c, then the gap τ will potentially be biased for the treatment effect of interest. Second, since an RD estimate requires data away from the cutoff, the estimate will be dependent on the chosen functional form. In this example, if the slope β were (erroneously) restricted to equal zero, it is clear the resulting OLS coefficient on D would be a biased estimate of the true discontinuity gap.

2.2 RD Designs and the Potential Outcomes Framework

While the RD design was being imported into applied economic research by studies such as van der Klaauw (2002), Black (1999), and Angrist and Lavy (1999), the identification issues discussed above were formalized in the theoretical work of Hahn, Todd, and van der Klaauw (2001), who described the RD evaluation strategy using the language of the treatment effects literature. Hahn, Todd, and van der Klaauw (2001) noted the key assumption of a valid RD design was that "all other factors" were "continuous" with respect to X, and suggested a nonparametric procedure for estimating τ that did not assume underlying linearity, as we have in the simple example above.

The necessity of the continuity assumption is seen more formally using the "potential outcomes framework" of the treatment effects literature with the aid of a graph. It is typically imagined that, for each individual i, there exists a pair of "potential" outcomes: Yi(1) for what would occur if the unit were exposed to the treatment and Yi(0) if not exposed. The causal effect of the treatment is represented by the difference Yi(1) − Yi(0).
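The potential outcomes notation can be mirrored in a short sketch: each simulated unit carries both Yi(0) and Yi(1), only one of which is ever observed, and the gap between mean observed outcomes just above and just below an illustrative cutoff approximates the causal effect at the threshold. The functional forms and the effect of 0.7 are assumptions for the example, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
n, c = 10000, 0.5
x = rng.uniform(0, 1, n)
d = (x >= c).astype(int)
# Each unit i carries a pair of potential outcomes; only one is observed.
y0 = 1.0 + 0.2 * x + rng.normal(0, 0.1, n)   # Y_i(0): outcome if untreated
y1 = y0 + 0.7                                # Y_i(1): causal effect is 0.7
y_obs = np.where(d == 1, y1, y0)             # the observed outcome

# Y_i(1) - Y_i(0) is never observed for any single unit, but the gap
# between conditional means on either side of the cutoff recovers it.
h = 0.05
below = y_obs[(x >= c - h) & (x < c)].mean()     # approximates E[Y(0)|X = c]
above = y_obs[(x >= c) & (x < c + h)].mean()     # approximates E[Y(1)|X = c]
gap = above - below
```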
[Figure 2. Nonlinear RD. Outcome variable (Y) plotted against the assignment variable: the observed outcomes follow E[Y(0)|X] below the cutoff and E[Y(1)|X] above it; points A, A′, B, B′, D, E, F, and Xd label the curves.]
this continuity condition enables us to use the average outcome of those right below the cutoff (who are denied the treatment) as a valid counterfactual for those right above the cutoff (who received the treatment).

Although the potential outcome framework is very useful for understanding how RD designs work in a framework applied economists are used to dealing with, it also introduces some difficulties in terms of interpretation. First, while the continuity assumption sounds generally plausible, it is not completely clear what it means from an economic point of view. The problem is that since continuity is not required in the more traditional applications used in economics (e.g., matching on observables), it is not obvious what assumptions about the behavior of economic agents are required to get continuity.

Second, RD designs are a fairly peculiar application of a "selection on observables" model. Indeed, the view in James J. Heckman, Robert J. Lalonde, and Jeffrey A. Smith (1999) was that "[r]egression discontinuity estimators constitute a special case of selection on observables," and that the RD estimator is "a limit form of matching at one point." In general, we need two crucial conditions for a matching/selection on observables approach to work. First, treatment must be randomly assigned conditional on observables (the ignorability or unconfoundedness assumption). In practice, this is typically viewed as a strong, and not particularly credible, assumption. For instance, in a standard regression framework this amounts to assuming that all relevant factors are controlled for, and that no omitted variables are correlated with the treatment dummy. In an RD design, however, this crucial assumption is trivially satisfied. When X ≥ c, the treatment dummy D is always equal to 1. When X < c, D is always equal to 0. Conditional on X, there is no variation left in D, so it cannot, therefore, be correlated with any other factor.9

At the same time, the other standard assumption of overlap is violated since, strictly speaking, it is not possible to observe units with either D = 0 or D = 1 for a given value of the assignment variable X. This is the reason the continuity assumption is required—to compensate for the failure of the overlap condition. So while we cannot observe treatment and nontreatment for the same value of X, we can observe the two outcomes for values of X around the cutoff point that are arbitrarily close to each other.

2.3 RD Design as a Local Randomized Experiment

When looking at RD designs in this way, one could get the impression that they require some assumptions to be satisfied, while other methods such as matching on observables and IV methods simply require other assumptions.10 From this point of view, it would seem that the assumptions for the RD design are just as arbitrary as those used for other methods. As we discuss throughout the paper, however, we do not believe this way of looking at RD designs does justice to their important advantages over most other existing methods. This point becomes much clearer once we compare the RD design to the "gold standard" of program evaluation methods, randomized experiments. We will show that the RD design is a much closer cousin of randomized experiments than other competing methods.

9 In technical terms, the treatment dummy D follows a degenerate (concentrated at D = 0 or D = 1), but nonetheless random distribution conditional on X. Ignorability is thus trivially satisfied.

10 For instance, in the survey of Angrist and Alan B. Krueger (1999), RD is viewed as an IV estimator, thus having essentially the same potential drawbacks and pitfalls.
[Figure. A randomized experiment depicted as an RD design: outcome variable (Y) plotted against the assignment variable (a random number, X), with the observed control outcomes tracing a flat E[Y(0)|X] and the observed treatment outcomes a flat E[Y(1)|X].]
within a month of receiving the treatment. If people with a larger monetary compensation can afford to take more time looking for a job, the potential outcome curves will no longer be flat and will slope upward. The reason is that having a higher random number, i.e., a lower monetary compensation, increases the probability of finding a job. So in this "smoothly contaminated" randomized experiment, the potential outcome curves will instead look like the classical RD design case depicted in figure 2.

Unlike a classical randomized experiment, in this contaminated experiment a simple comparison of means no longer yields a consistent estimate of the treatment effect. By focusing right around the threshold, however, an RD approach would still yield a consistent estimate of the treatment effect associated with job search assistance. The reason is that since people just above or below the cutoff receive (essentially) the same monetary compensation, we still have locally a randomized experiment around the cutoff point. Furthermore, as in a randomized experiment, it is possible to test whether randomization "worked" by comparing the local values of baseline covariates on the two sides of the cutoff value.

Of course, this particular example is highly artificial. Since we know the monetary compensation is a continuous function of X, we also know the continuity assumption required for the RD estimates of the treatment effect to be consistent is also satisfied. The important result, due to Lee (2008), that we will show in the next section is that the conditions under which we locally have a randomized experiment (and continuity) right around the cutoff point are remarkably weak. Furthermore, in addition to being weak, the conditions for local randomization are testable in the same way global randomization is testable in a randomized experiment by looking at whether baseline covariates are balanced. It is in this sense that the RD design is more closely related to randomized experiments than to other popular program evaluation methods such as matching on observables, difference-in-differences, and IV.

3. Identification and Interpretation

This section discusses a number of issues of identification and interpretation that arise when considering an RD design. Specifically, the applied researcher may be interested in knowing the answers to the following questions:

1. How do I know whether an RD design is appropriate for my context? When are the identification assumptions plausible or implausible?

2. Is there any way I can test those assumptions?

3. To what extent are results from RD designs generalizable?

On the surface, the answers to these questions seem straightforward: (1) "An RD design will be appropriate if it is plausible that all other unobservable factors are 'continuously' related to the assignment variable," (2) "No, the continuity assumption is necessary, so there are no tests for the validity of the design," and (3) "The RD estimate of the treatment effect is only applicable to the subpopulation of individuals at the discontinuity threshold, and uninformative about the effect anywhere else." These answers suggest that the RD design is no more compelling than, say, an instrumental variables approach, for which the analogous answers would be (1) "The instrument must be uncorrelated with the error in the outcome equation," (2) "The identification assumption is ultimately untestable," and (3) "The estimated treatment effect is applicable
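The "smoothly contaminated" experiment described above is easy to mimic numerically: when the re-employment probability rises in X even absent treatment, a raw difference in means overstates the effect, while a comparison confined to a narrow window around the cutoff roughly recovers it. All numbers here (a true effect of 0.1, slopes, bandwidth) are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
n, c = 50000, 0.5
x = rng.uniform(0, 1, n)                  # randomly assigned number
d = (x >= c).astype(float)                # job search assistance iff X >= c
# A higher X means lower monetary compensation, so the re-employment
# probability rises in X even without treatment: the "contamination".
p = 0.3 + 0.3 * x + 0.1 * d               # true treatment effect is 0.1
y = rng.binomial(1, p)                    # found a job within a month?

naive = y[d == 1].mean() - y[d == 0].mean()         # global comparison: biased
h = 0.05                                            # narrow window at cutoff
local = (y[(x >= c) & (x < c + h)].mean()
         - y[(x >= c - h) & (x < c)].mean())        # local comparison
```

With these values the naive difference is roughly 0.25 (the true 0.1 plus the compensation gradient), while the local contrast stays near 0.1.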
plausible that, among those scoring near the threshold, it is a matter of "luck" as to which side of the threshold they land. Type A students can exert more effort—because they know a scholarship is at stake—but they do not know the exact score they will obtain. In this scenario, it would be reasonable to argue that those who marginally failed and passed would be otherwise comparable, and that an RD analysis would be appropriate and would yield credible estimates of the impact of the scholarship.

These two examples make it clear that one must have some knowledge about the mechanism generating the assignment variable beyond knowing that, if it crosses the threshold, the treatment is "turned on." It is "folk wisdom" in the literature to judge whether the RD is appropriate based on whether individuals could manipulate the assignment variable and precisely "sort" around the discontinuity threshold. The key word here is "precise" rather than "manipulate." After all, in both examples above, individuals do exert some control over the test score. And indeed, in virtually every known application of the RD design, it is easy to tell a plausible story that the assignment variable is to some degree influenced by someone. But individuals will not always be able to have precise control over the assignment variable. It should perhaps seem obvious that it is necessary to rule out precise sorting to justify the use of an RD design. After all, individual self-selection into treatment or control regimes is exactly why simple comparison of means is unlikely to yield valid causal inferences. Precise sorting around the threshold is self-selection.

What is not obvious, however, is that, when one formalizes the notion of having imprecise control over the assignment variable, there is a striking consequence: the variation in the treatment in a neighborhood of the threshold is "as good as randomized." We explain this below.

3.1.1 Randomized Experiments from Nonrandom Selection

To see how the inability to precisely control the assignment variable leads to a source of randomized variation in the treatment, consider a simplified formulation of the RD design:11

(2) Y = Dτ + Wδ1 + U

    D = 1[X ≥ c]

    X = Wδ2 + V,

where Y is the outcome of interest, D is the binary treatment indicator, and W is the vector of all predetermined and observable characteristics of the individual that might impact the outcome and/or the assignment variable X.

This model looks like a standard endogenous dummy variable set-up, except that we observe the assignment variable, X. This allows us to relax most of the other assumptions usually made in this type of model. First, we allow W to be endogenously determined as long as it is determined prior to V. Second, we take no stance as to whether some elements of δ1 or δ2 are zero (exclusion restrictions). Third, we make no assumptions about the correlations between W, U, and V.12

In this model, individual heterogeneity in the outcome is completely described by the pair of random variables (W, U); anyone with the same values of (W, U) will have one of two values for the outcome, depending on whether they receive treatment. Note that,

11 We use a simple linear endogenous dummy variable setup to describe the results in this section, but all of the results could be stated within the standard potential outcomes framework, as in Lee (2008).

12 This is much less restrictive than textbook descriptions of endogenous dummy variable systems. It is typically assumed that (U, V) is independent of W.
[Figure 4. Density of the assignment variable X conditional on W = w, U = u, in three cases: "complete control" (a degenerate spike), precise control (a density truncated at the threshold), and imprecise control (a continuous density).]
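The three densities sketched in figure 4 can be mimicked with simulated test scores; the distributions below (a point mass, a normal truncated at the cutoff, and an untouched normal) are illustrative stand-ins for complete, precise, and imprecise control:

```python
import numpy as np

rng = np.random.default_rng(5)
n, c = 100000, 0.0

# "Complete control": a degenerate distribution; every trial yields the
# same chosen score.
x_complete = np.full(n, 0.3)

# Precise control over treatment status: the density is zero just below
# the threshold but positive just above it (scores below c never occur).
draws = rng.normal(0.5, 1.0, n)
x_precise = draws[draws >= c]

# Imprecise control: score = intent plus continuous error, so the density
# is continuous (and positive) on both sides of c.
x_imprecise = rng.normal(0.5, 1.0, n)

def mass_just_below(scores):
    """Fraction of scores falling in the band just below the cutoff."""
    return np.mean((scores >= c - 0.1) & (scores < c))
```

Only in the imprecise-control case is there mass immediately on both sides of the threshold, which is what the continuity definition below formalizes.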
since RD designs are implemented by running regressions of Y on X, equation (2) looks peculiar since X is not included with W and U on the right hand side of the equation. We could add a function of X to the outcome equation, but this would not make a difference since we have not made any assumptions about the joint distribution of W, U, and V. For example, our setup allows for the case where U = Xδ3 + U′, which yields the outcome equation Y = Dτ + Wδ1 + Xδ3 + U′. For the sake of simplicity, we work with the simple case where X is not included on the right hand side of the equation.13

Now consider the distribution of X, conditional on a particular pair of values W = w, U = u. It is equivalent (up to a translational shift) to the distribution of V conditional on W = w, U = u. If an individual has complete and exact control over X, we would model it as having a degenerate distribution, conditional on W = w, U = u. That is, in repeated trials, this individual would choose the same score. This is depicted in figure 4 as the thick line.

If there is some room for error but individuals can nevertheless have precise control about whether they will fail to receive the treatment, then we would expect the density of X to be zero just below the threshold, but positive just above the threshold, as depicted in figure 4 as the truncated distribution. This density would be one way to model the first example described above for the type A students. Since type A students know about the scholarship, they will double-check their answers and make sure they answer the easy questions, which comprise 50 percent of the test. How high they score above the passing threshold will be determined by some randomness.

Finally, if there is stochastic error in the assignment variable and individuals do not have precise control over the assignment variable, we would expect the density of X (and hence V), conditional on W = w, U = u to be continuous at the discontinuity threshold, as shown in figure 4 as the untruncated distribution.14 It is important to emphasize that, in this final scenario, the individual still has control over X: through her efforts, she can choose to shift the distribution to the right. This is the density for someone with W = w, U = u, but may well be different—with a different mean, variance, or shape of the density—for other individuals, with different levels of ability, who make different choices. We are assuming, however, that all individuals are unable to precisely control the score just around the threshold.

Definition: We say individuals have imprecise control over X when conditional on W = w and U = u, the density of V (and hence X) is continuous.

When individuals have imprecise control over X this leads to the striking implication that variation in treatment status will be randomized in a neighborhood of the threshold. To see this, note that by Bayes' Rule, we have

(3) Pr[W = w, U = u | X = x] = f(x | W = w, U = u) Pr[W = w, U = u] / f(x),

where f(∙) and f(∙ | ∙) are marginal and conditional densities for X. So when f(x | W = w, U = u) is continuous in x, the right hand side will be continuous in x, which therefore means that the distribution of W, U conditional on X will be continuous in x.15 That is, all observed and unobserved predetermined characteristics will have identical distributions on either side of x = c, in the limit, as we examine smaller and smaller neighborhoods of the threshold.

In sum,

Local Randomization: If individuals have imprecise control over X as defined above, then Pr[W = w, U = u | X = x] is continuous in x: the treatment is "as good as" randomly assigned around the cutoff.

In other words, the behavioral assumption that individuals do not precisely manipulate X around the threshold has the prediction that treatment is locally randomized.

This is perhaps why RD designs can be so compelling. A deeper investigation into the real-world details of how X (and hence D) is determined can help assess whether it is plausible that individuals have precise or imprecise control over X. By contrast, with

14 For example, this would be plausible when X is a test score modeled as a sum of Bernoulli random variables, which is approximately normal by the central limit theorem.

15 Since the potential outcomes Y(0) and Y(1) are functions of W and U, it follows that the distribution of Y(0) and Y(1) conditional on X is also continuous in x when individuals have imprecise control over X. This implies that the conditions usually invoked for consistently estimating the treatment effect (the conditional means E[Y(0) | X = x] and E[Y(1) | X = x] being continuous in x) are also satisfied. See Lee (2008) for more detail.
296 Journal of Economic Literature, Vol. XLVIII (June 2010)
most nonexperimental evaluation contexts, learning about how the treatment variable is determined will rarely lead one to conclude that it is "as good as" randomly assigned.

3.2 Consequences of Local Random Assignment

There are three practical implications of the above local random assignment result.

3.2.1 Identification of the Treatment Effect

First and foremost, it means that the discontinuity gap at the cutoff identifies the treatment effect of interest. Specifically, we have

limε↓0 E[Y | X = c + ε] − limε↑0 E[Y | X = c + ε]

    = τ + limε↓0 Σw,u (wδ1 + u) Pr[W = w, U = u | X = c + ε]

        − limε↑0 Σw,u (wδ1 + u) Pr[W = w, U = u | X = c + ε]

    = τ,

where the last line follows from the continuity of Pr[W = w, U = u | X = x].

As we mentioned earlier, nothing changes if we augment the model by adding a direct impact of X itself in the outcome equation, as long as the effect of X on Y does not jump at the cutoff. For example, in the example of Thistlethwaite and Campbell (1960), we can allow higher test scores to improve future academic outcomes (perhaps by raising the probability of admission to higher quality schools) as long as that probability does not jump at precisely the same cutoff used to award scholarships.

3.2.2 Testing the Validity of the RD Design

An almost equally important implication of the above local random assignment result is that it makes it possible to empirically assess the prediction that Pr[W = w, U = u | X = x] is continuous in x. Although it is impossible to test this directly—since U is unobserved—it is nevertheless possible to assess whether Pr[W = w | X = x] is continuous in x at the threshold. A discontinuity would indicate a failure of the identifying assumption.

This is akin to the tests performed to empirically assess whether the randomization was carried out properly in randomized experiments. It is standard in these analyses to demonstrate that treatment and control groups are similar in their observed baseline covariates. It is similarly impossible to test whether unobserved characteristics are balanced in the experimental context, so the most favorable statement that can be made about the experiment is that the data "failed to reject" the assumption of randomization.

Performing this kind of test is arguably more important in the RD design than in the experimental context. After all, the true nature of individuals' control over the assignment variable—and whether it is precise or imprecise—may well be somewhat debatable even after a great deal of investigation into the exact treatment-assignment mechanism (which itself is always advisable to do). Imprecision of control will often be nothing more than a conjecture, but thankfully it has testable predictions.

There is a complementary, and arguably more direct and intuitive, test of the imprecision of control over the assignment variable: examination of the density of X itself, as suggested in Justin McCrary (2008). If the density of X for each individual is continuous, then the marginal density of X over the population should be continuous as well. A jump in the density at the threshold is probably the most direct evidence of some degree
Lee and Lemieux: Regression Discontinuity Designs in Economics 297
of sorting around the threshold, and should provoke serious skepticism about the appropriateness of the RD design.16 Furthermore, one advantage of the test is that it can always be performed in an RD setting, while testing whether the covariates W are balanced at the threshold depends on the availability of data on these covariates.

16 Another possible source of discontinuity in the density of the assignment variable X is selective attrition. For example, John DiNardo and Lee (2004) look at the effect of unionization on wages several years after a union representation vote was taken. In principle, if firms that were unionized because of a majority vote are more likely to close down, then conditional on firm survival at a later date, there will be a discontinuity in X (the vote share) that could threaten the validity of the RD design for estimating the effect of unionization on wages (conditional on survival). In that setting, testing for a discontinuity in the density (conditional on survival) is similar to testing for selective attrition (linked to treatment status) in a standard randomized experiment.

This test is also a partial one. Whether each individual's ex ante density of X is continuous is fundamentally untestable since, for each individual, we only observe one realization of X. Thus, in principle, at the threshold some individuals' densities may jump up while others may sharply fall, so that in the aggregate, positives and negatives offset each other, making the density appear continuous. In recent applications of RD such occurrences seem far-fetched. Even if this were the case, one would certainly expect to see, after stratifying by different values of the observable characteristics, some discontinuities in the density of X. These discontinuities could be detected by performing the local randomization test described above.

3.2.3 Irrelevance of Including Baseline Covariates

A consequence of a randomized experiment is that the assignment to treatment is, by construction, independent of the baseline covariates. As such, it is not necessary to include them to obtain consistent estimates of the treatment effect. In practice, however, researchers will include them in regressions, because doing so can reduce the sampling variability in the estimator. Arguably the greatest potential for this occurs when one of the baseline covariates is a pre-random-assignment observation on the dependent variable, which may likely be highly correlated with the post-assignment outcome variable of interest.

The local random assignment result allows us to apply these ideas to the RD context. For example, if the lagged value of the dependent variable was determined prior to the realization of X, then the local randomization result will imply that that lagged dependent variable will have a continuous relationship with X. Thus, performing an RD analysis on Y minus its lagged value should also yield the treatment effect of interest. The hope, however, is that the differenced outcome measure will have a sufficiently lower variance than the level of the outcome, so as to lower the variance in the RD estimator.

More formally, we have

limε↓0 E[Y − Wπ | X = c + ε] − limε↑0 E[Y − Wπ | X = c + ε]

    = τ + limε↓0 Σw,u (w(δ1 − π) + u) Pr[W = w, U = u | X = c + ε]

        − limε↑0 Σw,u (w(δ1 − π) + u) Pr[W = w, U = u | X = c + ε]

    = τ,

where Wπ is any linear function, and W can include a lagged dependent variable, for example. We return to how to implement this in practice in section 4.4.
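The identification result and the two validity checks above lend themselves to a short simulation. The sketch below is not from the paper: the data generating process, the bandwidth h, and the helper rd_gap are illustrative assumptions. It estimates the discontinuity gap with a local linear regression on each side of the cutoff, once for the outcome Y (where the gap should recover τ) and once for a predetermined covariate W (where the gap should be approximately zero):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
c, tau = 0.0, 2.0                      # cutoff and true treatment effect

w = rng.normal(size=n)                 # predetermined covariate W
x = 0.5 * w + rng.normal(size=n)       # assignment variable X = W*delta2 + V
d = (x >= c).astype(float)             # sharp rule D = 1[X >= c]
y = tau * d + w + rng.normal(size=n)   # outcome Y = D*tau + W*delta1 + U

def rd_gap(v, x, c=0.0, h=0.5):
    """Difference of local linear intercepts at the cutoff,
    fitted separately within a bandwidth h on each side."""
    sides = ((x >= c) & (x < c + h), (x < c) & (x >= c - h))
    above, below = (np.polyfit(x[m] - c, v[m], 1)[1] for m in sides)
    return above - below

print(round(rd_gap(y, x), 2))  # close to tau = 2.0
print(round(rd_gap(w, x), 2))  # close to 0: W is "balanced" at the cutoff
```

Under precise manipulation of X, the same rd_gap applied to W (or a McCrary-style comparison of the density of X on the two sides) would instead show a discontinuity, which is exactly the red flag the tests above are designed to catch.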
… covariates that can predict the assignment could be used in conjunction with the RD gap to learn about average treatment effects for the overall population. The understanding of the RD gap as a weighted average treatment effect serves to highlight that RD causal evidence is not somehow fundamentally disconnected from the average treatment effect that is often of interest to researchers.

It is important to emphasize that the RD gap is not informative about the treatment if it were defined as "receipt of a scholarship that is awarded by scoring 90 percent or higher on the scholarship exam." This is not so much a "drawback" of the RD design as a limitation shared with even a carefully controlled randomized experiment. For example, if we randomly assigned financial aid awards to low-achieving students, whatever treatment effect we estimate may not be informative about the effect of financial aid for high-achieving students.

In some contexts, the treatment effect "away from the discontinuity threshold" may not make much practical sense. Consider the RD analysis of incumbency in congressional elections of Lee (2008). When the treatment is "being the incumbent party," it is implicitly understood that incumbency entails winning the previous election by obtaining at least 50 percent of the vote.17 In the election context, the treatment "being the incumbent party by virtue of winning an election, whereby 90 percent of the vote is required to win" simply does not apply to any real-life situation. Thus, in this context, it is awkward to interpret the RD gap as "the effect of incumbency that exists at the 50 percent vote-share threshold" (as if there is an effect at a 90 percent threshold). Instead it is more natural to interpret the RD gap as estimating a weighted average treatment effect of incumbency across all districts, where more weight is given to those districts in which a close election race was expected.

17 For this example, consider the simplified case of a two-party system.

3.4 Variations on the Regression Discontinuity Design

To this point, we have focused exclusively on the "classic" RD design introduced by Thistlethwaite and Campbell (1960), whereby there is a single binary treatment and the assignment variable perfectly predicts treatment receipt. We now discuss two variants of this base case: (1) when there is so-called "imperfect compliance" of the rule and (2) when the treatment of interest is a continuous variable.

In both cases, the notion that the RD design generates local variation in treatment that is "as good as randomly assigned" is helpful because we can apply known results for randomized instruments to the RD design, as we do below. The notion is also helpful for addressing other data problems, such as differential attrition or sample selection, whereby the treatment affects whether or not you observe the outcome of interest. The local random assignment result means that, in principle, one could extend the ideas of Joel L. Horowitz and Charles F. Manski (2000) or Lee (2009), for example, to provide bounds on the treatment effect, accounting for possible sample selection bias.

3.4.1 Imperfect Compliance: The "Fuzzy" RD

In many settings of economic interest, treatment is determined partly by whether the assignment variable crosses a cutoff point. This situation is very important in practice for a variety of reasons, including cases of imperfect take-up by program participants or when factors other than the threshold rule affect the probability of program participation. Starting with William M. K. Trochim (1984), this setting has been referred to as a "fuzzy" RD design. In the case we have discussed so far—the "sharp" RD design—the
probability of treatment jumps from 0 to 1 when X crosses the threshold c. The fuzzy RD design allows for a smaller jump in the probability of assignment to the treatment at the threshold and only requires

limε↓0 Pr(D = 1 | X = c + ε) ≠ limε↑0 Pr(D = 1 | X = c + ε).

Since the probability of treatment jumps by less than one at the threshold, the jump in the relationship between Y and X can no longer be interpreted as an average treatment effect. As in an instrumental variable setting, however, the treatment effect can be recovered by dividing the jump in the relationship between Y and X at c by the fraction induced to take up the treatment at the threshold—in other words, the discontinuity jump in the relation between D and X. In this setting, the treatment effect can be written as

τF = [limε↓0 E[Y | X = c + ε] − limε↑0 E[Y | X = c + ε]]
       / [limε↓0 E[D | X = c + ε] − limε↑0 E[D | X = c + ε]],

where the subscript "F" refers to the fuzzy RD design.

There is a close analogy between how the treatment effect is defined in the fuzzy RD design and in the well-known "Wald" formulation of the treatment effect in an instrumental variables setting. Hahn, Todd and van der Klaauw (2001) were the first to show this important connection and to suggest estimating the treatment effect using two-stage least-squares (TSLS) in this setting. We discuss estimation of fuzzy RD designs in greater detail in section 4.3.3.

Hahn, Todd and van der Klaauw (2001) furthermore pointed out that the interpretation of this ratio as a causal effect requires the same assumptions as in Imbens and Angrist (1994). That is, one must assume "monotonicity" (i.e., X crossing the cutoff cannot simultaneously cause some units to take up and others to reject the treatment) and "excludability" (i.e., X crossing the cutoff cannot impact Y except through impacting receipt of treatment). When these assumptions are made, it follows that18

τF = E[Y(1) − Y(0) | unit is complier, X = c],

where "compliers" are units that receive the treatment when they satisfy the cutoff rule (Xi ≥ c), but would not otherwise receive it.

18 See Imbens and Lemieux (2008) for a more formal exposition.

In summary, if there is local random assignment (e.g., due to the plausibility of individuals' imprecise control over X), then we can simply apply all of what is known about the assumptions and interpretability of instrumental variables. The difference between the "sharp" and "fuzzy" RD design is exactly parallel to the difference between the randomized experiment with perfect compliance and the case of imperfect compliance, when only the "intent to treat" is randomized.

For example, in the case of imperfect compliance, even if a proposed binary instrument Z is randomized, it is necessary to rule out the possibility that Z affects the outcome, outside of its influence through treatment receipt, D. Only then will the instrumental variables estimand—the ratio of the reduced form effects of Z on Y and of Z on D—be properly interpreted as a causal effect of D on Y. Similarly, supposing that individuals do not have precise control over X, it is necessary to assume that whether X crosses the threshold c (the instrument) has no impact on Y except by influencing D. Only then will the ratio of the two RD gaps in Y and D be properly interpreted as a causal effect of D on Y.

In the same way that it is important to verify a strong first-stage relationship in an IV design, it is equally important to verify
that a discontinuity exists in the relationship between D and X in a fuzzy RD design.

Furthermore, in this binary-treatment–binary-instrument context with unrestricted heterogeneity in treatment effects, the IV estimand is interpreted as the average treatment effect "for the subpopulation affected by the instrument" (or LATE). Analogously, the ratio of the RD gaps in Y and D (the "fuzzy design" estimand) can be interpreted as a weighted LATE, where the weights reflect the ex ante likelihood the individual's X is near the threshold. In both cases, the exclusion restriction and monotonicity condition must hold.

3.4.2 Continuous Endogenous Regressor

In a context where the "treatment" is a continuous variable—call it T—and there is a randomized binary instrument (that can additionally be excluded from the outcome equation), an IV approach is an obvious way of obtaining an estimate of the impact of T on Y. The IV estimand is the reduced-form impact of Z on Y divided by the first-stage impact of Z on T.

The same is true for an RD design when the regressor of interest is continuous. Again, the causal impact of interest will still be the ratio of the two RD gaps (i.e., the discontinuities in Y and T).

To see this more formally, consider the model

(6)  Y = Tγ + Wδ1 + U1

     T = Dϕ + Wγ + U2

     D = 1[X ≥ c]

     X = Wδ2 + V,

which is the same set-up as before, except with the added second equation, allowing for imperfect compliance or other factors (observables W or unobservables U2) to impact the continuous regressor of interest T. If γ = 0 and U2 = 0, then the model collapses to a "sharp" RD design (with a continuous regressor).

Note that we make no additional assumptions about U2 (in terms of its correlation with W or V). We do continue to assume imprecise control over X (conditional on W and U1, the density of X is continuous).19

19 Although it would be unnecessary to do so for the identification of γ, it would probably be more accurate to describe the situation of imprecise control with the continuity of the density of X conditional on the three variables (W, U1, U2). This is because U2 is now another variable characterizing heterogeneity in individuals.

Given the discussion so far, it is easy to show that

(7)  limε↓0 E[Y | X = c + ε] − limε↑0 E[Y | X = c + ε]

        = [limε↓0 E[T | X = c + ε] − limε↑0 E[T | X = c + ε]] γ.

The left hand side is simply the "reduced form" discontinuity in the relation between Y and X. The term preceding γ on the right hand side is the "first-stage" discontinuity in the relation between T and X, which is also estimable from the data. Thus, analogous to the exactly identified instrumental variable case, the ratio of the two discontinuities yields the parameter γ: the effect of T on Y. Again, because of the added notion of imperfect compliance, it is important to assume that D (X crossing the threshold) does not directly enter the outcome equation.

In some situations, more might be known about the rule determining T. For example, in Angrist and Lavy (1999) and Miguel Urquiola and Eric A. Verhoogen (2009), class size is an increasing function of total school enrollment, except for discontinuities at various enrollment thresholds. But
additional information about characteristics such as the slope and intercept of the underlying function (apart from the magnitude of the discontinuity) generally adds nothing to the identification strategy.

To see this, change the second equation in (6) to T = Dϕ + g(X), where g(∙) is any continuous function in the assignment variable. Equation (7) will remain the same and, thus, knowledge of the function g(∙) is irrelevant for identification.20

There is also no need for additional theoretical results in the case when there is individual-level heterogeneity in the causal effect of the continuous regressor T. The local random assignment result allows us to borrow from the existing IV literature and interpret the ratio of the RD gaps as in Angrist and Krueger (1999), except that we need to add the note that all averages are weighted by the ex ante relative likelihood that the individual's X will land near the threshold.

3.5 Summary: A Comparison of RD and Other Evaluation Strategies

We conclude this section by comparing the RD design with other evaluation approaches. We believe it is helpful to view the RD design as a distinct approach rather than as a special case of either IV or matching/regression-control. Indeed, in important ways the RD design is more similar to a randomized experiment, which we illustrate below.

Consider a randomized experiment where subjects are assigned a random number X and are given the treatment if X ≥ c. By construction, X is independent and not systematically related to any observable or unobservable characteristic determined prior to the randomization. This situation is illustrated in panel A of figure 5. The first column shows the relationship between the treatment variable D and X, a step function, going from 0 to 1 at the X = c threshold. The second column shows the relationship between the observables W and X. This is flat because X is completely randomized. The same is true for the unobservable variable U, depicted in the third column. These three graphs capture the appeal of the randomized experiment: treatment varies while all other factors are kept constant (on average). And even though we cannot directly test whether there are no treatment-control differences in U, we can test whether there are such differences in the observable W.

Now consider an RD design (panel B of figure 5) where individuals have imprecise control over X. Both W and U may be systematically related to X, perhaps due to the actions taken by units to increase their probability of receiving treatment. Whatever the shape of the relation, as long as individuals have imprecise control over X, the relationship will be continuous. And therefore, as we examine Y near the X = c cutoff, we can be assured that like an experiment, treatment varies (the first column) while other factors are kept constant (the second and third columns). And, like an experiment, we can test this prediction by assessing whether observables truly are continuous with respect to X (the second column).21

We now consider two other commonly used nonexperimental approaches, referring to the model (2):

Y = Dτ + Wδ1 + U

D = 1[X ≥ c]

X = Wδ2 + V.
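The ratio-of-discontinuities estimand of sections 3.4.1 and 3.4.2 can also be sketched in a few lines. The simulation below is illustrative only (the take-up probabilities, parameter values, and the gap helper are assumptions, not taken from the paper): crossing the cutoff raises take-up by 50 percentage points, so the reduced-form gap in Y understates the effect of treatment, and dividing by the first-stage gap in D restores it:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60_000
c, tau = 0.0, 2.0

w = rng.normal(size=n)
x = 0.5 * w + rng.normal(size=n)       # assignment variable
z = (x >= c).astype(float)             # crossing the cutoff ("intent to treat")
# imperfect compliance: take-up probability jumps from 20% to 70% at c
d = (rng.random(n) < 0.2 + 0.5 * z).astype(float)
y = tau * d + w + rng.normal(size=n)   # outcome depends on actual receipt D

def gap(v, x, c=0.0, h=0.5):
    # difference of local linear intercepts at the cutoff
    sides = ((x >= c) & (x < c + h), (x < c) & (x >= c - h))
    above, below = (np.polyfit(x[m] - c, v[m], 1)[1] for m in sides)
    return above - below

tau_f = gap(y, x) / gap(d, x)          # reduced-form gap / first-stage gap
print(round(gap(d, x), 2))             # ≈ 0.5, the jump in take-up
print(round(tau_f, 2))                 # ≈ tau = 2.0
```

Because compliance here is random noise rather than a choice tied to the gains from treatment, the local Wald ratio coincides with the overall effect; with heterogeneous effects it would instead be interpreted as the weighted LATE described in the text.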
[Figure 5. Each panel plots three graphs. A. Randomized Experiment: E[D | X], E[W | X], and E[U | X] against X. B. Regression Discontinuity Design: the same three conditional expectations against X. C. Matching on Observables: E[D | W] against W, and E[U | W, D = 1] versus E[U | W, D = 0] against W. D. Instrumental Variables: E[D | Z], E[W | Z], and E[U | Z] against Z.]
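Panels A and B embody the two checks that travel with any RD application: when control over X is imprecise, both the density of X and the local distribution of predetermined characteristics should run continuously through c. A minimal sketch of both checks follows (the data generating process and bin width h are assumptions for illustration; in this setup control is imprecise, so both checks should pass):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
c, h = 0.0, 0.1

w = rng.normal(size=n)                # predetermined characteristic W
x = 0.5 * w + rng.normal(size=n)      # imprecise control: continuous noise in X

lo = (x >= c - h) & (x < c)           # bin just below the cutoff
hi = (x >= c) & (x < c + h)           # bin just above the cutoff

# McCrary-style check: the density of X should not jump at c
print(round(hi.mean() / lo.mean(), 2))        # ≈ 1: no bunching at the threshold

# balance check: local means of W on the two sides should (nearly) agree
print(round(w[hi].mean() - w[lo].mean(), 2))  # ≈ 0: W continuous through c
```

In an application one would use finer bins, formal standard errors, and the full set of covariates, but a jump in either quantity at c is precisely the evidence of sorting that should provoke skepticism about the design.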
3.5.1 Selection on Observables: Matching/Regression Control

The basic idea of the "selection on observables" approach is to adjust for differences in the W's between treated and control individuals. It is usually motivated by the fact that it seems "implausible" that the unconditional mean Y for the control group represents a valid counterfactual for the treatment group. So it is argued that, conditional on W, treatment-control contrasts may identify the (W-specific) treatment effect.

The underlying assumption is that conditional on W, U and V are independent. From this it is clear that

E[Y | D = 1, W = w] − E[Y | D = 0, W = w]

    = τ + E[U | W = w, V ≥ c − wδ2] − E[U | W = w, V < c − wδ2]

    = τ.

Two issues arise when implementing this approach. The first is one of functional form: how exactly to control for the W's? When the W's take on discrete values, one possibility is to compute treatment effects for each distinct value of W, and then average these effects across the constructed "cells." This will not work, however, when W has continuous elements, in which case it is necessary to implement multivariate matching, propensity score, reweighting procedures, or nonparametric regressions.22

22 See Hahn (1998) on including covariates directly with nonparametric regression.

Regardless of the functional form issue, there is arguably a more fundamental question of which W's to use in the analysis. While it is tempting to answer "all of them" and hope that more W's will lead to less biased estimates, this is obviously not necessarily the case. For example, consider estimating the economic returns to graduating high school (versus dropping out). It seems natural to include variables like parents' socioeconomic status, family income, year, and place of birth in the regression. Including more and more family-level W's will ultimately lead to a "within-family" sibling analysis; extending it even further by including date of birth leads to a "within-twin-pair" analysis. And researchers have been critical—justifiably so—of this source of variation in education. The same reasons causing discomfort about the twin analyses should also cause skepticism about "kitchen sink" multivariate matching/propensity score/regression control analyses.23

23 Researchers question the twin analyses on the grounds that it is not clear why one twin ends up having more education than the other, and that the assumption that education differences among twins are purely random (as ignorability would imply) is viewed as far-fetched. We thank David Card for pointing out this connection between twin analyses and matching approaches.

It is also tempting to believe that, if the W's do a "good job" in predicting D, the selection on observables approach will "work better." But the opposite is true: in the extreme case when the W's perfectly predict X (and hence D), it is impossible to construct a treatment-control contrast for virtually all observations. For each value of W, the individuals will either all be treated or all control. In other words, there will be literally no overlap in the support of the propensity score for the treated and control observations. The propensity score would take the values of either 1 or 0.

The "selection on observables" approach is illustrated in panel C of figure 5. Observables W can help predict the probability of treatment (first column), but ultimately one must assume that unobservable factors U must be the same for treated and control units for
every value of W. That is, the crucial assumption is that the two lines in the third column be on top of each other. Importantly, there is no comparable graph in the second column because there is no way to test the design since all the W's are used for estimation.

3.5.2 Selection on Unobservables: Instrumental Variables and "Heckit"

A less restrictive modeling assumption is to allow U and V to be correlated, conditional on W. But because of the arguably "more realistic"/flexible data generating process, another assumption is needed to identify τ. One such assumption is that some elements of W (call them Z) enter the selection equation, but not the outcome equation and are also uncorrelated with U. An instrumental variables approach utilizes the fact that

E[Y | W* = w*, Z = z]

    = E[D | W* = w*, Z = z]τ + w*γ + E[U | W* = w*, Z = z]

    = E[D | W* = w*, Z = z]τ + w*γ

When Z is continuous, there is an additional approach to identifying τ. The "Heckit" approach uses the fact that

E[Y | W* = w*, Z = z, D = 1]

    = τ + w*γ + E[U | W* = w*, Z = z, V ≥ c − wδ2]

E[Y | W* = w*, Z = z, D = 0]

    = w*γ + E[U | W* = w*, Z = z, V < c − wδ2].

If we further assume a functional form for the joint distribution of U, V, conditional on W* and Z, then the "control function" terms E[U | W = w, V ≥ c − wδ2] and E[U | W = w, V < c − wδ2] are functions of observed variables, with the parameters then estimable from the data. It is then possible, for any value of W = w, to identify τ as … on W*) independent of U.24

24 For IV, violation of this assumption essentially means that Z varies with Y for reasons other than its influence on D. For the textbook "Heckit" approach, it is typically assumed that U, V have the same distribution for any value of Z. It is also clear that the "identification at infinity" approach will only work if Z is uncorrelated with U, otherwise the last two terms in equation (8) would not cancel. See also the framework of Heckman and Edward Vytlacil (2005), which maintains the assumption of the independence of the error terms and Z, conditional on W*.

There does not seem to be any way of testing the validity of this assumption. Different, but equally "plausible" Z's may lead to different answers, in the same way that including different sets of W's may lead to different answers in the selection on observables approach.

Even when there is a mechanism that justifies an instrument Z as "plausible," it is often unclear which covariates W* to include in the analysis. Again, when different sets of W* lead to different answers, the question becomes which is more plausible: Z is independent of U conditional on W*, or Z is independent of U conditional on a subset of the variables in W*? While there may be some situations where knowledge of the mechanism dictates which variables to include, in other contexts, it may not be obvious.

The situation is illustrated in panel D of figure 5. It is necessary that the instrument Z is related to the treatment (as in the first column). The crucial assumption is regarding the relation between Z and the unobservables U (the third column). In order for an IV or a "Heckit" approach to work, the function in the third column needs to be flat. Of course, we cannot observe whether this is true. Furthermore, in most cases, it is unclear how to interpret the relation between W and Z (second column). Some might argue the observed relation between W and Z should be flat if Z is truly exogenous, and that if Z is highly correlated with W, then it casts doubt on Z being uncorrelated with U. Others will argue that using the second graph as a test is only appropriate when Z is truly randomized, and that the assumption invoked is that Z is uncorrelated with U, conditional on W. In this latter case, the design seems fundamentally untestable, since all the remaining observable variables (the W's) are being "used up" for identifying the treatment effect.

3.5.3 RD as "Design" not "Method"

RD designs can be valid under the more general "selection on unobservables" environment, allowing an arbitrary correlation among U, V, and W, but at the same time not requiring an instrument. As discussed above, all that is needed is that, conditional on W, U, the density of V is continuous, and the local randomization result follows.

How is an RD design able to achieve this, given these weaker assumptions? The answer lies in what is absolutely necessary in an RD design: observability of the latent index X. Intuitively, given that both the "selection on observables" and "selection on unobservables" approaches rely heavily on modeling X and its components (e.g., which W's to include, and the properties of the unobservable error V and its relation to other variables, such as an instrument Z), actually knowing the value of X ought to help.

In contrast to the "selection on observables" and "selection on unobservables" modeling approaches, with the RD design the researcher can avoid taking any strong stance about what W's to include in the analysis, since the design predicts that the W's are irrelevant and unnecessary for identification. Having data on W's is, of course, of some use, as they allow testing of the underlying assumption (described in section 4.4).

For this reason, it may be more helpful to consider RD designs as a description of a particular data generating process, rather than a "method" or even an "approach." In virtually any context with an outcome variable Y, treatment status D, and other observable variables W, in principle a researcher can construct a regression-control or instrumental variables
(after designating one of the W variables a valid instrument) estimator, and state that the identification assumptions needed are satisfied.

This is not so with an RD design. Either the situation is such that X is observed, or it is not. If not, then the RD design simply does not apply.25 If X is observed, then one has little choice but to attempt to estimate the expectation of Y conditional on X on either side of the cutoff. In this sense, the RD design forces the researcher to analyze it in a particular way, and there is little room for researcher discretion—at least from an identification standpoint. The design also predicts that the inclusion of W's in the analysis should be irrelevant. Thus it naturally leads the researcher to examine the density of X or the distribution of W's, conditional on X, for discontinuities as a test for validity.

25 Of course, sometimes it may seem at first that an RD design does not apply, but a closer inspection may reveal that it does. For example, see Per Pettersson (2000), which eventually became the RD analysis in Pettersson-Lidbom (2008b).

The analogy of the truly randomized experiment is again helpful. Once the researcher is faced with what she thinks is a properly carried out randomized controlled trial, the analysis is quite straightforward. Even before running the experiment, most researchers agree it would be helpful to display the treatment-control contrasts in the W's to test whether the randomization was carried out properly, then to show the simple mean comparisons, and finally to verify that the inclusion of the W's makes little difference in the analysis, even if they might reduce sampling variability in the estimates.

4. Presentation, Estimation, and Inference

In this section, we systematically discuss the nuts and bolts of implementing RD designs in practice. An important virtue of RD designs is that they provide a very transparent way of graphically showing how the treatment effect is identified. We thus begin the section by discussing how to graph the data in an informative way. We then move to arguably the most important issue in implementing an RD design: the choice of the regression model. We address this by presenting the various possible specifications, discussing how to choose among them, and showing how to compute the standard errors.

Next, we discuss a number of other practical issues that often arise in RD designs. Examples of questions discussed include whether we should control for other covariates and what to do when the assignment variable is discrete. We discuss a number of tests to assess the validity of the RD design, which examine whether covariates are "balanced" on the two sides of the threshold, and whether the density of the assignment variable is continuous at the threshold. Finally, we summarize our recommendations for implementing the RD design.

Throughout this section, we illustrate the various concepts using an empirical example from Lee (2008), who uses an RD design to estimate the causal effect of incumbency in U.S. House elections. We use a sample of 6,558 elections over the 1946–98 period (see Lee 2008 for more detail). The assignment variable in this setting is the fraction of votes awarded to Democrats in the previous election. When the fraction exceeds 50 percent, a Democrat is elected and the party becomes the incumbent party in the next election. Both the share of votes and the probability of winning the next election are considered as outcome variables.

4.1 Graphical Presentation

A major advantage of the RD design over competing methods is its transparency, which can be illustrated using graphical methods. A standard way of graphing the data is to divide the assignment variable into a number of bins, making sure there are two separate
bins on each side of the cutoff point (to avoid having treated and untreated observations mixed together in the same bin). Then, the average value of the outcome variable can be computed for each bin and graphed against the mid-points of the bins.

More formally, for some bandwidth h, and for some number of bins K0 and K1 to the left and right of the cutoff value, respectively, the idea is to construct bins (bk, bk+1], for k = 1, . . . , K = K0 + K1, where

bk = c − (K0 − k + 1)h.

The average value of the outcome variable in the bin is

Ȳk = (1/Nk) ∑_{i=1}^{N} Yi 1{bk < Xi ≤ bk+1}.

It is also useful to calculate the number of observations in each bin,

Nk = ∑_{i=1}^{N} 1{bk < Xi ≤ bk+1},

to detect a possible discontinuity in the assignment variable at the threshold, which would suggest manipulation.

There are several important advantages in graphing the data this way before starting to run regressions to estimate the treatment effect. First, the graph provides a simple way of visualizing what the functional form of the regression function looks like on either side of the cutoff point. Since the mean of Y in a bin is, for nonparametric kernel regression estimators, evaluated at the bin mid-point using a rectangular kernel, the set of bin means literally represent nonparametric estimates of the regression function. Seeing what the nonparametric regression looks like can then provide useful guidance in choosing the functional form of the regression models.

A second advantage is that comparing the mean outcomes just to the left and right of the cutoff point provides an indication of the magnitude of the jump in the regression function at this point, i.e., of the treatment effect. Since an RD design is "as good as a randomized experiment" right around the cutoff point, the treatment effect could be computed by comparing the average outcomes in "small" bins just to the left and right of the cutoff point. If there is no visual evidence of a discontinuity in a simple graph, it is unlikely the formal regression methods discussed below will yield a significant treatment effect.

A third advantage is that the graph also shows whether there are unexpected comparable jumps at other points. If such evidence is clearly visible in the graph and cannot be explained on substantive grounds, this calls into question the interpretation of the jump at the cutoff point as the causal effect of the treatment. We discuss below several ways of testing explicitly for the existence of jumps at points other than the cutoff.

Note that the visual impact of the graph is typically enhanced by also plotting a relatively flexible regression model, such as a polynomial model, which is a simple way of smoothing the graph. The advantage of showing both the flexible regression line and the unrestricted bin means is that the regression line better illustrates the shape of the regression function and the size of the jump at the cutoff point, and laying this over the unrestricted means gives a sense of the underlying noise in the data.

Of course, if bins are too narrow the estimates will be highly imprecise. If they are too wide, the estimates may be biased as they fail to account for the slope in the regression line (negligible for very narrow bins). More importantly, wide bins make the comparisons on both sides of the cutoff less credible, as we are no longer comparing observations just to the left and right of the cutoff point.

This raises the question of how to choose the bandwidth (the width of the bin). In practice, this is typically done informally by trying to pick a bandwidth that makes the graphs look informative in the sense that bins
are wide enough to reduce the amount of noise, but narrow enough to compare observations "close enough" on both sides of the cutoff point. While it is certainly advisable to experiment with different bandwidths and see how the corresponding graphs look, it is also useful to have some formal guidance in the selection process.

One approach to bandwidth choice is based on the fact that, as discussed above, the mean outcomes by bin correspond to kernel regression estimates with a rectangular kernel. Since the standard kernel regression is a special case of a local linear regression where the slope term is equal to zero, the cross-validation procedure described in more detail in section 4.3.1 can also be used here by constraining the slope term to equal zero.26 For reasons we discuss below, however, one should not solely rely on this approach to select the bandwidth, since other reasonable subjective goals should be considered when choosing how to plot the data.

Furthermore, a range of bandwidths often yields similar values of the cross-validation function in practical applications (see below). A researcher may, therefore, want to use some discretion in choosing a bandwidth that provides a particularly compelling illustration of the RD design. An alternative approach is to choose a bandwidth based on a more heuristic visual inspection of the data, and then perform some tests to make sure this informal choice is not clearly rejected.

We suggest two such tests. Consider the case where one has decided to use K′ bins based on a visual inspection of the data. The first test is a standard F-test comparing the fit of a regression model with K′ bin dummies to one where we further divide each bin into two equal sized smaller bins, i.e., increase the number of bins to 2K′ (reduce the bandwidth from h′ to h′/2). Since the model with K′ bins is nested in the one with 2K′ bins, a standard F-test with K′ degrees of freedom can be used. If the null hypothesis is not rejected, this provides some evidence that we are not oversmoothing the data by using only K′ bins.

Another test is based on the idea that if the bins are "narrow enough," then there should not be a systematic relationship between Y and X, which we capture using a simple regression of Y on X, within each bin. Otherwise, this suggests the bin is too wide and that the mean value of Y over the whole bin is not representative of the mean value of Y at the boundaries of the bin. In particular, when this happens in the two bins next to the cutoff point, a simple comparison of the two bin means yields a biased estimate of the treatment effect. A simple test for this consists of adding a set of interactions between the bin dummies and X to a base regression of Y on the set of bin dummies, and testing whether the interactions are jointly significant. The test statistic once again follows an F distribution with K′ degrees of freedom.

Figures 6–11 show the graphs for the share of Democrat vote in the next election and the probability of Democrats winning the next election, respectively. Three sets of graphs with different bandwidths are reported using a bandwidth of 0.02 in figures 6 and 9, 0.01 in figures 7 and 10, and 0.005

26 In section 4.3.1, we consider the cross-validation function CV_Y(h) = (1/N) ∑_{i=1}^{N} (Yi − Ŷ(Xi))², where Ŷ(Xi) is the predicted value of Yi based on a regression using observations within a bin of width h on either the left (for observations on the left of the cutoff) or the right (for observations on the right of the cutoff) of observation i, but not including observation i itself. In the context of the graph discussed here, the only modification to the cross-validation function is that the predicted value Ŷ(Xi) is based only on a regression with a constant term, which means Ŷ(Xi) is the average value of Y among all observations in the bin (excluding observation i). Note that this is slightly different from the standard cross-validation procedure in kernel regressions, where the left-out observation is in the middle instead of at the edge of the bin (see, for example, Richard Blundell and Alan Duncan 1998). Our suggested procedure is arguably better suited to the RD context since estimation of the treatment effect takes place at boundary points.
[Figures 6–11: bin means of the outcome plotted against the assignment variable over the range −0.5 to 0.5]

Figure 10. Winning the Next Election, Bandwidth of 0.01 (100 bins)

Figure 11. Winning the Next Election, Bandwidth of 0.005 (200 bins)
in figures 8 and 11. In all cases, we also show the fitted values from a quartic regression model estimated separately on each side of the cutoff point. Note that the assignment variable is normalized as the difference between the share of vote to Democrats and Republicans in the previous election. This means that a Democrat is the incumbent when the assignment variable exceeds zero. We also limit the range of the graphs to winning margins of 50 percent or less (in absolute terms), as data become relatively sparse for larger winning (or losing) margins.

All graphs show clear evidence of a discontinuity at the cutoff point. While the graphs are all quite informative, the ones with the smallest bandwidth (0.005, figures 8 and 11) are more noisy and likely provide too many data points (200) for optimal visual impact.

The results of the bandwidth selection procedures are presented in table 1. Panel A shows that the cross-validation procedure always suggests using a bandwidth of 0.02 or more, which corresponds to similar or wider bins than those used in figures 6 and 9 (those with the largest bins). This is true irrespective of whether we pick a separate bandwidth on each side of the cutoff (first two rows of the panel), or pick the bandwidth that minimizes the cross-validation function for the entire data range on both the left and right sides of the cutoff. In the case where the outcome variable is winning the next election, the cross-validation procedure for the data to the right of the cutoff point and for the entire range suggests using a very wide bin (0.049) that would only yield about ten bins on each side of the cutoff.

As it turns out, the cross-validation function for the entire data range has two local minima at 0.021 and 0.049 that correspond to the optimal bandwidths on the left and right hand side of the cutoff. This is illustrated in figure 12, which plots the cross-validation function as a function of the bandwidth. By contrast, the cross-validation function is better behaved and shows a global minimum around 0.020 when the outcome variable is the vote share (figure 13). For both outcome variables, the value of the cross-validation function grows quickly for bandwidths smaller than 0.02, suggesting that the graphs with narrower bins (figures 7, 8, 10, and 11) are too noisy.

Panel B of table 1 shows the results of our two suggested specification tests. The tests based on doubling the number of bins and running regressions within each bin yield remarkably similar results. Generally speaking, the results indicate that only fairly wide bins are rejected. Looking at both outcome variables, the tests systematically reject models with bandwidths of 0.05 or more (twenty bins over the −0.5 to 0.5 range). The models are never rejected for either outcome variable once we hit bandwidths of 0.02 (fifty bins) or less. In practice, the testing procedure rules out bins that are larger than those reported in figures 6–11.

At first glance, the results in the two panels of table 1 appear to be contradictory. The cross-validation procedure suggests bandwidths ranging from 0.02 to 0.05, while the bin and regression tests suggest that almost all bandwidths of less than 0.05 are acceptable. The reason for this discrepancy is that while the cross-validation procedure tries to balance precision and bias, the bin and regression tests only deal with the "bias" part of the equation by checking whether the value of Y is more or less constant within a given bin. Models with small bins easily pass this kind of test, although they may yield a very noisy graph. One alternative approach is to choose the largest possible bandwidth that passes the bin and the regression test, which turns out to be 0.033 in table 1, a bandwidth that is within the range of those suggested by the cross-validation procedure.

From a practical point of view, it seems to be the case that formal procedures, and in particular cross-validation, suggest bandwidths that are wider than those one would likely choose based on a simple visual examination
Table 1
Choice of Bandwidth in Graph for Voting Example

No. of bins | Bandwidth | Bin test | Regr. test | Bin test | Regr. test

Notes: Estimated over the range of the forcing variable (Democrat to Republican difference in the share of vote in the previous election) ranging between −0.5 and 0.5. The "bin test" is computed by comparing the fit of a model with the number of bins indicated in the table to an alternative where each bin is split in 2. The "regression test" is a joint test of significance of bin-specific regression estimates of the outcome variable on the share of vote in the previous election.
of the data. In particular, both figures 7 and 10 (bandwidth of 0.01) look visually acceptable but are clearly not recommended on the basis of the cross-validation procedure. This likely reflects the fact that one important goal of the graph is to show how the raw data look, and too much smoothing would defy the purpose of such a data illustration exercise. Furthermore, the regression estimates of the treatment effect accompanying the graphical results are a formal way of smoothing the data to get precise estimates. This suggests that there is probably little harm in undersmoothing (relative to what formal bandwidth selection procedures would suggest) to better illustrate the variation in the raw data when graphically illustrating an RD design.

4.2 Regression Methods

4.2.1 Parametric or Nonparametric Regressions?

When we introduced the RD design in section 2, we followed Thistlethwaite and Campbell (1960) in assuming that the
[Figures 12–13: cross-validation function plotted against the bandwidth]
and Lemieux (2008), we focus on the convenient case of the rectangular kernel. In this setting, computing kernel regressions simply amounts to computing the average value of Y in the bin illustrated in figure 2. The resulting local average is depicted as the horizontal line EF, which is very close to the true value of Y evaluated at X = Xd on the regression line.

Applying this local averaging approach is problematic, however, for the RD design. Consider estimating the value of the regression function just on the right of the cutoff point. Clearly, only observations on the right of the cutoff point that receive the treatment should be used to compute mean outcomes on the right hand side. Similarly, only observations on the left of the cutoff point that do not receive the treatment should be used to compute mean outcomes on the left hand side. Otherwise, regression estimates would mix observations with and without the treatment, which would invalidate the RD approach.

In this setting, the best thing is to compute the average value of Y in the bin just to the right and just to the left of the cutoff point. These two bins are shown in figure 2. The RD estimate based on kernel regressions is then equal to B′ − A′. In this example where the regression lines are upward sloping, it is clear, however, that the estimate B′ − A′ overstates the true treatment effect represented as the difference B − A at the cutoff point. In other words, there is a systematic bias in kernel regression estimates of the treatment effect. Hahn, Todd, and van der Klaauw (2001) provide a more formal derivation of the bias (see also Imbens and Lemieux 2008 for a simpler exposition when the kernel is rectangular). In practical terms, the problem is that in finite samples the bandwidth has to be large enough to encompass enough observations to get a reasonable amount of precision in the estimated average values of Y. Otherwise, attempts to reduce the bias by shrinking the bandwidth will result in extremely noisy estimates of the treatment effect.28

As a solution to this problem, Hahn, Todd, and van der Klaauw (2001) suggest running local linear regressions to reduce the importance of the bias. In our setup with a rectangular kernel, this suggestion simply amounts to running standard linear regressions within the bins on both sides of the cutoff point to better predict the value of the regression function right at the cutoff point. In this example, the regression lines within the bins around the cutoff point are close to linear. It follows that the predicted values of the local linear regressions at the cutoff point are very close to the true values of A and B. Intuitively, this means that running local linear regressions instead of just computing averages within the bins reduces the bias by an order of magnitude. Indeed, Hahn, Todd, and van der Klaauw (2001) show that the remaining bias is of an order of magnitude lower, and is comparable to the usual bias in kernel estimation at interior points like D (the small difference between the horizontal line EF and the true value of the regression line evaluated at D).

In the literature on nonparametric estimation at boundary points, local linear regressions have been introduced as a means of reducing the bias in standard kernel regression methods.29 One of the several contributions of Hahn, Todd, and van der Klaauw (2001) is to show how the same bias-reducing

28 The trade-off between bias and precision is a fundamental feature of kernel regressions. A larger bandwidth yields more precise, but potentially biased, estimates of the regression. At an interior point like D, however, we see that the bias is of an order of magnitude lower than at the cutoff (boundary) point. In more technical terms, it can be shown (see Hahn, Todd, and van der Klaauw 2001 or Imbens and Lemieux 2008) that the usual bias is of order h² at interior points, but of order h at boundary points, where h is the bandwidth. In other words, the bias dies off much more quickly when h goes to zero when we are at interior, as opposed to boundary, points.

29 See Jianqing Fan and Irene Gijbels (1996).
the RD design, as data from the right of the cutoff would be used to estimate αl, which is defined as a limit when approaching from the left of the cutoff, and vice versa.

In practice, however, estimates where the regression slope or, more generally, the regression function f(X − c) are constrained to be the same on both sides of the cutoff point are often reported. One possible justification for doing so is that if the functional form is indeed the same on both sides of the cutoff, then more efficient estimates of the treatment effect τ are obtained by imposing that constraint. Such a constrained specification should only be viewed, however, as an additional estimate to be reported for the sake of completeness. It should not form the core basis of the empirical approach.

4.3.1 Local Linear Regressions and Bandwidth Choice

As discussed above, local linear regressions provide a nonparametric way of consistently estimating the treatment effect in an RD design (Hahn, Todd, and van der Klaauw 2001; Jack Porter 2003). Following Imbens and Lemieux (2008), we focus on the case of a rectangular kernel, which amounts to estimating a standard regression over a window of width h on both sides of the cutoff point. While other kernels (triangular, Epanechnikov, etc.) could also be used, the choice of kernel typically has little impact in practice. As a result, the convenience of working with a rectangular kernel compensates for efficiency gains that could be achieved using more sophisticated kernels.30

The regression model on the left hand side of the cutoff point is

Y = αl + βl (X − c) + ε,  where c − h ≤ X < c,

while the regression model on the right hand side of the cutoff point is

Y = αr + βr (X − c) + ε,  where c ≤ X ≤ c + h.

As before, it is also convenient to estimate the pooled regression

Y = αl + τD + βl (X − c) + (βr − βl) D (X − c) + ε,  where c − h ≤ X ≤ c + h,

since the standard error of the estimated treatment effect can be directly obtained from the regression.

While it is straightforward to estimate the linear regressions within a given window of width h around the cutoff point, a more difficult question is how to choose this bandwidth. In general, choosing a bandwidth in nonparametric estimation involves finding an optimal balance between precision and bias. On the one hand, using a larger bandwidth yields more precise estimates, as more observations are available to estimate the regression. On the other hand, the linear specification is less likely to be accurate

30 It has been shown in the statistics literature (Fan and Gijbels 1996) that a triangular kernel is optimal for estimating local linear regressions at the boundary. As it turns out, the only difference between regressions using a rectangular or a triangular kernel is that the latter puts more weight (in a linear way) on observations closer to the cutoff point. It thus involves estimating a weighted, as opposed to an unweighted, regression within a bin of width h. An arguably more transparent way of putting more weight on observations close to the cutoff is simply to reestimate a model with a rectangular kernel using a smaller bandwidth. In practice, it is therefore simpler and more transparent to just estimate standard linear regressions (rectangular kernel) with a variety of bandwidths, instead of trying out different kernels corresponding to particular weighted regressions that are more difficult to interpret.
when a larger bandwidth is used, which can bias the estimate of the treatment effect. If the underlying conditional expectation is not linear, the linear specification will provide a close approximation over a limited range of values of X (small bandwidth), but an increasingly bad approximation over a larger range of values of X (larger bandwidth).

As the number of observations available increases, it becomes possible to use an increasingly small bandwidth, since linear regressions can be estimated relatively precisely over even a small range of values of X. As it turns out, Hahn, Todd, and van der Klaauw (2001) show the optimal bandwidth is proportional to N^{−1/5}, which corresponds to a fairly slow rate of convergence to zero. For example, this suggests that the bandwidth should only be cut in half when the sample size increases by a factor of 32 (2^5). For technical reasons, however, it would be preferable to undersmooth by shrinking the bandwidth at a faster rate, requiring that h ∝ N^{−δ} with 1/5 < δ < 2/5, in order to eliminate an asymptotic bias that would remain when δ = 1/5. In the presence of this bias, the usual formula for the variance of a standard least squares estimator would be invalid.31

In practice, however, knowing at what rate the bandwidth should shrink in the limit does not really help, since only one actual sample with a given number of observations is available. The importance of undersmoothing only has to do with a thought experiment of how much the bandwidth should shrink if the sample size were larger, so that one obtains asymptotically correct standard errors; it does not help one choose a particular bandwidth in a particular sample.32

In the econometrics and statistics literature, two procedures are generally considered for choosing bandwidths. The first procedure consists of characterizing the optimal bandwidth in terms of the unknown joint distribution of all variables. The relevant components of this distribution can then be estimated and plugged into the optimal bandwidth function.33 In the context of local linear regressions, Fan and Gijbels (1996) show this involves estimating a number of parameters, including the curvature of the regression function. In practice, this can be done in two steps. In step one, a rule-of-thumb (ROT) bandwidth is estimated over the whole relevant data range. In step two, the ROT bandwidth is used to estimate the optimal bandwidth right at the cutoff point. For the rectangular kernel, the ROT bandwidth is given by:

h_ROT = 2.702 [ σ̃² R / ∑_{i=1}^{N} (m̃″(x_i))² ]^{1/5},

31 See Hahn, Todd, and van der Klaauw (2001) and Imbens and Lemieux (2008) for more details.

32 The main purpose of asymptotic theory is to use the large sample properties of estimators to approximate the distribution of an estimator in the real sample being considered. The issue is a little more delicate in a nonparametric setting, where one also has to think about how fast the bandwidth should shrink when the sample size approaches infinity. The point about undersmoothing is simply that one unpleasant property of the optimal bandwidth is that it does not yield the convenient least squares variance formula. But this can be fixed by shrinking the bandwidth a little faster as the sample size goes to infinity. Strictly speaking, this is only a technical issue with how to perform the thought experiment (what happens when the sample size goes to infinity?) required for using asymptotics to approximate the variance of the RD estimator in the actual sample. This does not say anything about what bandwidth should be chosen in the actual sample available for implementing the RD design.

33 A well known example of this procedure is the "rule-of-thumb" bandwidth selection formula in kernel density estimation, where an estimate of the dispersion in the variable (standard deviation or the interquartile range), σ̂, is plugged into the formula 0.9 · σ̂ · N^{−1/5}. Bernard W. Silverman (1986) shows that this formula is the closed form solution for the optimal bandwidth choice problem when both the actual density and the kernel are Gaussian. See also Imbens and Karthik Kalyanaraman (2009), who derive an optimal bandwidth for this RD setting, and propose a data-dependent method for choosing the bandwidth.
Table 2
RD Estimates of the Effect of Winning the Previous Election on the
Share of Votes in the Next Election
Bandwidth: 1.00 0.50 0.25 0.15 0.10 0.05 0.04 0.03 0.02 0.01
Polynomial of order:
Zero 0.347 0.257 0.179 0.143 0.125 0.096 0.080 0.073 0.077 0.088
(0.003) (0.004) (0.004) (0.005) (0.006) (0.009) (0.011) (0.012) (0.014) (0.015)
[0.000] [0.000] [0.000] [0.000] [0.003] [0.047] [0.778] [0.821] [0.687]
One 0.118 0.090 0.082 0.077 0.061 0.049 0.067 0.079 0.098 0.096
(0.006) (0.007) (0.008) (0.011) (0.013) (0.019) (0.022) (0.026) (0.029) (0.028)
[0.000] [0.332] [0.423] [0.216] [0.543] [0.168] [0.436] [0.254] [0.935]
Two 0.052 0.082 0.069 0.050 0.057 0.100 0.101 0.119 0.088 0.098
(0.008) (0.010) (0.013) (0.016) (0.020) (0.029) (0.033) (0.038) (0.044) (0.045)
[0.000] [0.335] [0.371] [0.385] [0.458] [0.650] [0.682] [0.272] [0.943]
Three 0.111 0.068 0.057 0.061 0.072 0.112 0.119 0.092 0.108 0.082
(0.011) (0.013) (0.017) (0.022) (0.028) (0.037) (0.043) (0.052) (0.062) (0.063)
[0.001] [0.335] [0.524] [0.421] [0.354] [0.603] [0.453] [0.324] [0.915]
Four 0.077 0.066 0.048 0.074 0.103 0.106 0.088 0.049 0.055 0.077
(0.013) (0.017) (0.022) (0.027) (0.033) (0.048) (0.056) (0.067) (0.079) (0.063)
[0.014] [0.325] [0.385] [0.425] [0.327] [0.560] [0.497] [0.044] [0.947]
Optimal order of the polynomial 6 3 1 2 1 2 0 0 0 0
Observations 6,558 4,900 2,763 1,765 1,209 610 483 355 231 106
Notes: Standard errors in parentheses. P-values from the goodness-of-fit test in square brackets. The goodness-of-fit
test is obtained by jointly testing the significance of a set of bin dummies included as additional regressors in the
model. The bin width used to construct the bin dummies is 0.01. The optimal order of the polynomial is chosen using
Akaike’s criterion (penalized cross-validation).
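The goodness-of-fit test described in the table notes can be sketched as follows (simulated data; the homoskedastic F-statistic shown here is a simplified stand-in for the joint test of the bin dummies):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000
x = rng.uniform(-1, 1, n)               # assignment variable, cutoff at 0
t = (x >= 0).astype(float)
y = 0.1 * t + 0.3 * x + rng.normal(0, 0.1, n)  # correctly specified linear model

def ssr(Z, v):
    """Sum of squared residuals from an OLS fit of v on Z."""
    beta, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return float(np.sum((v - Z @ beta) ** 2))

# restricted model: the linear RD specification
R = np.column_stack([np.ones(n), t, x])
# unrestricted model adds bin dummies (bin width 0.1); the first bin on
# each side of the cutoff is dropped to avoid collinearity with the
# intercept and the treatment dummy
bins = np.digitize(x, np.arange(-1, 1.01, 0.1))
D = np.column_stack([(bins == b).astype(float)
                     for b in range(1, 21) if b not in (1, 11)])
U = np.column_stack([R, D])

q = D.shape[1]
f_stat = ((ssr(R, y) - ssr(U, y)) / q) / (ssr(U, y) / (n - U.shape[1]))
# under correct specification, f_stat ~ F(q, n - k); large values reject
```

With a correctly specified model the statistic should be close to one; a badly misspecified polynomial would inflate it, mirroring the small p-values in the wide-bandwidth columns of the table.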
on very wide bandwidths (0.5 or 1) are systematically larger than those for the smaller bandwidths (in the 0.05 to 0.25 range) that are still large enough for the estimates to be reasonably precise. A closer examination of figures 6–11 also suggests that the estimates for very wide bandwidths are larger than what the graphical evidence would suggest.35 This is consistent with a substantial bias for these estimates linked to the fact that the linear approximation does not hold over a wide data range. This is particularly clear in the case of winning the next election where figures 9–11 show some clear curvature in the regression function.

Table 4 shows the optimal bandwidth obtained using the ROT and cross-validation procedure. Consistent with the above
35 In the case of the vote share, the quartic regression shown in figures 6–8 implies a treatment effect of 0.066, which is substantially smaller than the local linear regression estimates with a bandwidth of 0.5 (0.090) or 1 (0.118). Similarly, the quartic regression shown in figures 9–11 for winning the next election implies a treatment effect of 0.375, which is again smaller than the local linear regression estimates with a bandwidth of 0.5 (0.566) or 1 (0.689).
Table 3
RD Estimates of the Effect of Winning the Previous Election on
Probability of Winning the Next Election
Bandwidth: 1.00 0.50 0.25 0.15 0.10 0.05 0.04 0.03 0.02 0.01
Polynomial of order:
Zero 0.814 0.777 0.687 0.604 0.550 0.479 0.428 0.423 0.459 0.533
(0.007) (0.009) (0.013) (0.018) (0.023) (0.035) (0.040) (0.047) (0.058) (0.082)
[0.000] [0.000] [0.000] [0.000] [0.011] [0.201] [0.852] [0.640] [0.479]
One 0.689 0.566 0.457 0.409 0.378 0.378 0.472 0.524 0.567 0.453
(0.011) (0.016) (0.026) (0.036) (0.047) (0.073) (0.083) (0.099) (0.116) (0.157)
[0.000] [0.000] [0.126] [0.269] [0.336] [0.155] [0.400] [0.243] [0.125]
Two 0.526 0.440 0.375 0.391 0.450 0.607 0.586 0.589 0.440 0.225
(0.016) (0.023) (0.039) (0.055) (0.072) (0.110) (0.124) (0.144) (0.177) (0.246)
[0.075] [0.145] [0.253] [0.192] [0.245] [0.485] [0.367] [0.191] [0.134]
Three 0.452 0.370 0.408 0.435 0.472 0.566 0.547 0.412 0.266 0.172
(0.021) (0.031) (0.052) (0.075) (0.096) (0.143) (0.166) (0.198) (0.247) (0.349)
[0.818] [0.277] [0.295] [0.115] [0.138] [0.536] [0.401] [0.234] [0.304]
Four 0.385 0.375 0.424 0.529 0.604 0.453 0.331 0.134 0.050 0.168
(0.026) (0.039) (0.066) (0.093) (0.119) (0.183) (0.214) (0.254) (0.316) (0.351)
[0.965] [0.200] [0.200] [0.173] [0.292] [0.593] [0.507] [0.150] [0.244]
Optimal order of the polynomial 4 3 2 1 1 2 0 0 0 1
Observations 6,558 4,900 2,763 1,765 1,209 610 483 355 231 106
Notes: Standard errors in parentheses. P-values from the goodness-of-fit test in square brackets. The goodness-of-fit
test is obtained by jointly testing the significance of a set of bin dummies included as additional regressors in the
model. The bin width used to construct the bin dummies is 0.01. The optimal order of the polynomial is chosen using
Akaike’s criterion (penalized cross-validation).
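The "optimal order of the polynomial" rows of Tables 2 and 3 rely on Akaike's criterion; the idea can be sketched in a simplified one-sided setting (simulated data; the Gaussian AIC formula below is standard, but the example is ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(-1, 1, n)
y = 1 + 0.5 * x + 2 * x**2 + rng.normal(0, 0.2, n)  # true curve is quadratic

def aic(order):
    """Gaussian AIC up to a constant: N * ln(SSR / N) + 2 * (order + 1)."""
    Z = np.column_stack([x**k for k in range(order + 1)])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    ssr = np.sum((y - Z @ beta) ** 2)
    return n * np.log(ssr / n) + 2 * (order + 1)

best_order = min(range(5), key=aic)  # order minimizing the criterion
```

In the paper's application the polynomial is fit separately on each side of the cutoff with a treatment dummy; the sketch only illustrates how the penalty trades off fit against the number of parameters.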
discussion, the suggested bandwidth ranges from 0.14 to 0.28, which is large enough to get precise estimates, but narrow enough to minimize the bias. Two interesting patterns can be observed in table 4. First, the bandwidth chosen by cross-validation tends to be a bit larger than the one based on the rule-of-thumb. Second, the bandwidth is generally smaller for winning the next election (second column) than for the vote share (first column). This is particularly clear when the optimal bandwidth is constrained to be the same on both sides of the cutoff point. This is consistent with the graphical evidence showing more curvature for winning the next election than the vote share, which calls for a smaller bandwidth to reduce the estimation bias linked to the linear approximation.

Figures 14 and 15 plot the value of the cross-validation function over a wide range of bandwidths. In the case of the vote share where the linearity assumption appears more accurate (figures 6–8), the cross-validation function is fairly flat over a sizable range of values for the bandwidth (from about 0.16 to 0.29). This range includes the optimal bandwidth suggested by cross-validation (0.282) at the upper end, and the ROT
Table 4
Optimal Bandwidth for Local Linear Regressions,
Voting Example
Notes: Estimated over the range of the forcing variable (Democrat to Republican difference in the share of vote in
the previous election) ranging between –0.5 and 0.5. See the text for a description of the rule-of-thumb and cross-
validation procedures for choosing the optimal bandwidth.
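The cross-validation procedure referenced in Table 4 can be illustrated with a minimal sketch (simulated data; the one-sided prediction rule follows the description in the text, while the bandwidth grid and all names are ours):

```python
import numpy as np

def cv_criterion(x, y, h, cutoff=0.0):
    """Imbens-Lemieux-style criterion: predict each Y_i from a linear
    fit that uses only observations within h on the interior side of
    x_i, mimicking estimation at the cutoff boundary, and average the
    squared prediction errors."""
    errors = []
    for i in range(x.size):
        if x[i] < cutoff:
            m = (x >= x[i] - h) & (x < x[i])
        else:
            m = (x > x[i]) & (x <= x[i] + h)
        if m.sum() < 2:
            continue
        Z = np.column_stack([np.ones(m.sum()), x[m]])
        beta, *_ = np.linalg.lstsq(Z, y[m], rcond=None)
        errors.append((y[i] - (beta[0] + beta[1] * x[i])) ** 2)
    return float(np.mean(errors))

rng = np.random.default_rng(1)
x = rng.uniform(-0.5, 0.5, 400)
y = 0.1 * (x >= 0) + 0.5 * x + rng.normal(0, 0.1, 400)
bandwidths = (0.05, 0.10, 0.20, 0.30)
cv_values = [cv_criterion(x, y, h) for h in bandwidths]
h_star = bandwidths[int(np.argmin(cv_values))]  # bandwidth minimizing CV
```

Plotting `cv_values` against `bandwidths` reproduces the shape of the cross-validation functions shown in figures 14 and 15.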
bandwidth (0.180) at the lower end. In the case of winning the next election (figure 15), the cross-validation procedure yields a sharper suggestion of optimal bandwidth around 0.15, which is quite close to both the optimal cross-validation bandwidth (0.172) and the ROT bandwidth (0.141).

The main difference between the two outcome variables is that larger bandwidths start getting penalized more quickly in the case of winning the election (figure 15) than in the case of the vote share (figure 14). This is consistent with the graphical evidence in figures 6–11. Since the regression function looks fairly linear for the vote share, using larger bandwidths does not get penalized as much since they improve efficiency without generating much of a bias. But in the case of winning the election where the regression function exhibits quite a bit of curvature, larger bandwidths are quickly penalized for introducing an estimation bias. Since there is a real tradeoff between precision and bias, the cross-validation procedure is quite informative. By contrast, there is not much of a tradeoff when the regression function is more or less linear, which explains why the optimal bandwidth is larger in the case of the vote share.

This example also illustrates the importance of first graphing the data before running regressions and trying to choose the optimal bandwidth. When the graph shows a more or less linear relationship, it is natural to expect different bandwidths to yield similar results and the bandwidth selection procedure not to be terribly informative. But when the graph shows substantial curvature, it is natural to expect the results to be more sensitive to the choice of bandwidth and that bandwidth selection procedures will play a more important role in selecting an appropriate empirical specification.

4.3.2 Order of Polynomial in Local Polynomial Modeling

In the case of polynomial regressions, the equivalent to bandwidth choice is the choice of the order of the polynomial. As in the case of local linear regressions, it is advisable to try and report a number of specifications to see to what extent the results are sensitive to the order of the polynomial. For the same reason
Figure 14. Cross-Validation Function for Local Linear Regression: Share of Vote at Next Election (vertical axis: cross-validation function, 77.2 to 78.2; horizontal axis: bandwidth, 0 to 0.45)
Figure 15. Cross-Validation Function for Local Linear Regression: Winning the Next Election (vertical axis: cross-validation function, 428 to 435; horizontal axis: bandwidth, 0.00 to 0.45)
the presence of discontinuities in the regression function at points other than the cutoff point. In that sense, it provides a falsification test of the RD design by examining whether there are other unexpected discontinuities in the regression function at randomly chosen points (the bin thresholds). To see this, rewrite ∑_{k=1}^{K} ϕ_k B_k as

∑_{k=1}^{K} ϕ_k B_k = ϕ_1 + ∑_{k=2}^{K} (ϕ_k − ϕ_{k−1}) B_k^+ ,

where B_k^+ = ∑_{j=k}^{K} B_j is a dummy variable indicating that the observation is in bin k or above, i.e., that the assignment variable X is above the bin cutoff b_k. Testing whether all the ϕ_k − ϕ_{k−1} are equal to zero is equivalent to testing that all the ϕ_k are the same (the above test), which amounts to testing that the regression line does not jump at the bin thresholds b_k.

Tables 2 and 3 show the estimates of the treatment effect for the voting example. For the sake of completeness, a wide range of bandwidths and specifications are presented, along with the corresponding p-values for the goodness-of-fit test discussed above (a bandwidth of 0.01 is used for the bins used to construct the test). We also indicate at the bottom of the tables the order of the polynomial selected for each bandwidth using the AIC. Note that the estimates of the treatment effect for the “order zero” polynomials are just comparisons of means on the two sides of the cutoff point, while the estimates for the “order one” polynomials are based on (local) linear regressions.

Broadly speaking, the goodness-of-fit tests do a very good job ruling out clearly misspecified models, like the zero order polynomials with large bandwidths that yield upward biased estimates of the treatment effect. Estimates from models that pass the goodness-of-fit test mostly fall in the 0.05–0.10 range for the vote share (table 2) and 0.37–0.57 for the probability of winning (table 3). One set of models the goodness-of-fit test does not rule out, however, is higher order polynomial models with small bandwidths that tend to be imprecisely estimated as they “overfit” the data.

Looking informally at both the fit of the model (goodness-of-fit test) and the precision of the estimates (standard errors) suggests the following strategy: use higher order polynomials for large bandwidths of 0.50 and more, lower order polynomials for bandwidths between 0.05 and 0.50, and zero order polynomials (comparisons of means) for bandwidths of less than 0.05, since the latter specification passes the goodness-of-fit test for these very small bandwidths. Interestingly, this informal approach more or less corresponds to what is suggested by the AIC. In this specific example, it seems that given a specific bandwidth, the AIC provides reasonable suggestions on which order of the polynomial to use.

4.3.3 Estimation in the Fuzzy RD Design

As discussed earlier, in both the “sharp” and the “fuzzy” RD designs, the probability of treatment jumps discontinuously at the cutoff point. Unlike the case of the sharp RD where the probability of treatment jumps from 0 to 1 at the cutoff, in the fuzzy RD case, the probability jumps by less than one. In other words, treatment is not solely determined by the strict cutoff rule in the fuzzy RD design. For example, even if eligibility for a treatment solely depends on a cutoff rule, not all the eligibles may get the treatment because of imperfect compliance. Similarly, program eligibility may be extended in some cases even when the cutoff rule is not satisfied. For example, while Medicare eligibility is mostly determined by a cutoff rule (age 65 or older), some disabled individuals under the age of 65 are also eligible.

Since we have already discussed the interpretation of estimates of the treatment effect
in a fuzzy RD design in section 3.4.1, here we just focus on estimation and implementation issues. The key message to remember from the earlier discussion is that, as in a standard IV framework, the estimated treatment effect can be interpreted as a local average treatment effect, provided monotonicity holds.

In the fuzzy RD design, we can write the probability of treatment as

Pr(D = 1 | X = x) = γ + δT + g(x − c),

where T = 1[X ≥ c] indicates whether the assignment variable exceeds the eligibility threshold c.38 Note that the sharp RD is a special case where γ = 0, g(∙) = 0, and δ = 1. It is advisable to draw a graph for the treatment dummy D as a function of the assignment variable X using the same procedure discussed in section 4.1. This provides an informal way of seeing how large the jump in the treatment probability δ is at the cutoff point, and what the functional form g(∙) looks like.

Since D = Pr(D = 1 | X = x) + ν, where ν is an error term independent of X, the fuzzy RD design can be described by the two equation system:

(10) Y = α + τD + f(X − c) + ε,

(11) D = γ + δT + g(X − c) + ν.

Looking at these equations suggests estimating the treatment effect τ by instrumenting the treatment dummy D with T. Note also that substituting the treatment determining equation into the outcome equation yields the reduced form

(12) Y = α_r + τ_r T + f_r(X − c) + ε_r,

where τ_r = τδ. In this setting, τ_r can be interpreted as an “intent-to-treat” effect.

Estimation in the fuzzy RD design can be performed using either the local linear regression approach or polynomial regressions. Since the model is exactly identified, 2SLS estimates are numerically identical to the ratio of reduced form coefficients τ_r/δ, provided that the same bandwidth is used for equations (11) and (12) in the local linear regression case, and that the same order of polynomial is used for g(∙) and f(∙) in the polynomial regression case.

In the case of the local linear regression, Imbens and Lemieux (2008) recommend using the same bandwidth in the treatment and outcome regression. When we are close to a sharp RD design, the function g(∙) is expected to be very flat and the optimal bandwidth to be very wide. In contrast, there is no particular reason to expect the function f(∙) in the outcome equation to be flat or linear, which suggests the optimal bandwidth would likely be less than the one for the treatment equation. As a result, Imbens and Lemieux (2008) suggest focusing on the outcome equation for selecting the bandwidth, and then using the same bandwidth for the treatment equation.

While using a wider bandwidth for the treatment equation may be advisable on efficiency grounds, there are two practical reasons that suggest not doing so. First, using different bandwidths complicates the computation of standard errors since the outcome and treatment samples used for the estimation are no longer the same, meaning the usual 2SLS standard errors are no longer valid. Second, since it is advisable to explore the sensitivity of results to changes in the bandwidth, “trying out” separate bandwidths for each of the two equations would lead to a large and difficult-to-interpret number of specifications.

38 Although the probability of treatment is modeled as a linear probability model here, this does not impose any restrictions on the probability model since g(x − c) is unrestricted on both sides of the cutoff c, while T is a dummy variable. So there is no need to write the model using a probit or logit formulation.
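The numerical equivalence between 2SLS and the ratio of reduced-form coefficients can be verified on simulated data (a sketch with our own simulated numbers, using a global linear specification in place of local linear regression):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
x = rng.uniform(-1, 1, n)                 # assignment variable, cutoff c = 0
t = (x >= 0).astype(float)                # T = 1[X >= c]
# imperfect compliance: Pr(D = 1) jumps from 0.2 to 0.8 at the cutoff
d = (rng.uniform(size=n) < 0.2 + 0.6 * t).astype(float)
y = 1.0 + 0.5 * d + 0.3 * x + rng.normal(0, 0.5, n)  # true effect tau = 0.5

def ols(Z, v):
    beta, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return beta

Z = np.column_stack([np.ones(n), t, x])   # same specification in both equations
tau_r = ols(Z, y)[1]                      # reduced-form jump in Y, as in eq. (12)
delta = ols(Z, d)[1]                      # first-stage jump in D, as in eq. (11)
tau_hat = tau_r / delta                   # Wald ratio

# with one instrument and identical specifications, 2SLS is numerically
# the same ratio: regress Y on the first-stage fitted values of D
d_hat = Z @ ols(Z, d)
tau_2sls = ols(np.column_stack([np.ones(n), d_hat, x]), y)[1]
```

Because the model is exactly identified, `tau_2sls` and `tau_hat` agree up to floating-point precision, as the text predicts.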
The same broad arguments can be used in the case of local polynomial regressions. In principle, a lower order of polynomial could be used for the treatment equation (11) than for the outcome equation (12). In practice, however, it is simpler to use the same order of polynomial and just run 2SLS (and use 2SLS standard errors).

4.3.4 How to Compute Standard Errors?

As discussed above, for inference in the sharp RD case we can use standard least squares methods. As usual, it is recommended to use heteroskedasticity-robust standard errors (Halbert White 1980) instead of standard least squares standard errors. One additional reason for doing so in the RD case is to ensure the standard error of the treatment effect is the same when either a pooled regression or two separate regressions on each side of the cutoff are used to compute the standard errors. As we just discussed, it is also straightforward to compute standard errors in the fuzzy RD case using 2SLS methods, although robust standard errors should also be used in this case. Imbens and Lemieux (2008) propose an alternative way of computing standard errors in the fuzzy RD case, but nonetheless suggest using 2SLS standard errors readily available in econometric software packages.

One small complication that arises in the nonparametric case of local linear regressions is that the usual (robust) standard errors from least squares are only valid provided that h ∝ N^{−δ} with 1/5 < δ < 2/5. As we mentioned earlier, this is not a very important point in practice, and the usual standard errors can be used with local linear regressions.

4.4 Implementing Empirical Tests of RD Validity and Using Covariates

In this part of the section, we describe how to implement tests of the validity of the RD design and how to incorporate covariates in the analysis.

4.4.1 Inspection of the Histogram of the Assignment Variable

Recall that the underlying assumption that generates the local random assignment result is that each individual has imprecise control over the assignment variable, as defined in section 3.1.1. We cannot test this directly (since we will only observe one observation on the assignment variable per individual at a given point in time), but an intuitive test of this assumption is whether the aggregate distribution of the assignment variable is discontinuous, since a mixture of individual-level continuous densities is itself a continuous density.

McCrary (2008) proposes a simple two-step procedure for testing whether there is a discontinuity in the density of the assignment variable. In the first step, the assignment variable is partitioned into equally spaced bins and frequencies are computed within those bins. The second step treats the frequency counts as a dependent variable in a local linear regression. See McCrary (2008), who adopts the nonparametric framework for asymptotics, for details on this procedure for inference.

As McCrary (2008) points out, this test can fail to detect a violation of the RD identification condition if for some individuals there is a “jump” up in the density, offset by jumps “down” for others, making the aggregate density continuous at the threshold. McCrary (2008) also notes it is possible the RD estimate could remain unbiased even when there is important manipulation of the assignment variable causing a jump in the density. It should be noted, however, that in order to rely upon the RD estimate as unbiased, one needs to invoke other identifying assumptions and cannot rely upon the mild conditions we focus on in this article.39

39 McCrary (2008) discusses an example where students who barely fail a test are given extra points so that they barely pass. The RD estimator can remain unbiased if one assumes that those who are given extra points were chosen randomly from those who barely failed.
Figure 16. Density of the Forcing Variable (Vote Share in Previous Election) (vertical axis: density, 0.00 to 2.50; horizontal axis: forcing variable, −0.5 to 0.5)
One of the examples McCrary uses for his test is the voting model of Lee (2008) that we used in the earlier empirical examples. Figure 16 shows a graph of the raw densities computed over bins with a bandwidth of 0.005 (200 bins in the graph), along with a smooth second order polynomial model. Consistent with McCrary (2008), the graph shows no evidence of discontinuity at the cutoff. McCrary also shows that a formal test fails to reject the null hypothesis of no discontinuity in the density at the cutoff.

4.4.2 Inspecting Baseline Covariates

An alternative approach for testing the validity of the RD design is to examine whether the observed baseline covariates are “locally” balanced on either side of the threshold, which should be the case if the treatment indicator is locally randomized. A natural thing to do is conduct both a graphical RD analysis and a formal estimation, replacing the dependent variable with each of the observed baseline covariates in W. A discontinuity would indicate a violation in the underlying assumption that predicts local random assignment. Intuitively, if the RD design is valid, we know that the treatment variable cannot influence variables determined prior to the realization of the assignment variable and treatment assignment; if we observe it does, something is wrong in the design.

If there are many covariates in W, even abstracting from the possibility of misspecification of the functional form, some discontinuities will be statistically significant by random chance. It is thus useful to combine the multiple tests into a single test statistic to see if the data are consistent with no discontinuities for any of the observed covariates. A simple way to do this is with a Seemingly Unrelated Regression (SUR) where each equation represents a different baseline
covariate, and then perform a χ2 test for the discontinuity gaps in all equations being zero. For example, supposing the underlying functional form is linear, one would estimate the system

w_1 = α_1 + Dβ_1 + Xγ_1 + ε_1
⋮
w_K = α_K + Dβ_K + Xγ_K + ε_K

and test the hypothesis that β_1, …, β_K are jointly equal to zero, where we allow the ε’s to be correlated across the K equations. Alternatively, one can simply use the OLS estimates of β_1, …, β_K obtained from a “stacked” regression where all the equations for each covariate are pooled together, while D and X are fully interacted with a set of K dummy variables (one for each covariate w_k). Correlation in the error terms can then be captured by clustering the standard errors on individual observations (which appear in the stacked dataset K times). Under the null hypothesis of no discontinuities, the Wald test statistic N β̂′ V̂^{−1} β̂ (where β̂ is the vector of estimates of β_1, …, β_K, and V̂ is the cluster-and-heteroskedasticity consistent estimate of the asymptotic variance of β̂) converges in distribution to a χ2 with K degrees of freedom.

Of course, the importance of functional form for RD analysis means a rejection of the null hypothesis tells us either that the underlying assumptions for the RD design are invalid, or that at least some of the equations are sufficiently misspecified and too restrictive, so that nonzero discontinuities are being estimated, even though they do not exist in the population. One could use the parametric specification tests discussed earlier for each of the individual equations to see if misspecification of the functional form is an important problem. Alternatively, the test could be performed only for observations within a narrower window around the cutoff point, such as the one suggested by the bandwidth selection procedures discussed in section 4.3.1.

Figure 17 shows the RD graph for a baseline covariate, the Democratic vote share in the election prior to the one used for the assignment variable (four years prior to the current election). Consistent with Lee (2008), there is no indication of a discontinuity at the cutoff. The actual RD estimate using a quartic model is –0.004 with a standard error of 0.014. Very similar results are obtained using winning the election as the outcome variable instead (RD estimate of –0.003 with a standard error of 0.017).

4.5 Incorporating Covariates in Estimation

If the RD design is valid, the other use for the baseline covariates is to reduce the sampling variability in the RD estimates. We discuss two simple ways to do this. First, one can “residualize” the dependent variable—subtract from Y a prediction of Y based on the baseline covariates W—and then conduct an RD analysis on the residuals. Intuitively, this procedure nets out the portion of the variation in Y we could have predicted using the predetermined characteristics, making the question whether the treatment variable can explain the remaining residual variation in Y. The important thing to keep in mind is that if the RD design is valid, this procedure provides a consistent estimate of the same RD parameter of interest. Indeed, any combination of covariates can be used, and abstracting from functional form issues, the estimator will be consistent for the same parameter, as discussed above in equation (4). Importantly, this two-step approach also allows one to perform a graphical analysis of the residual.

To see this more formally in the parametric case, suppose one is willing to assume that the expectation of Y as a function of X is a polynomial, and the expectation of each
affects Y, it can be shown that the inclusion of these regressors will not affect the consistency of the estimator for τ.41 The advantage of this second approach is that under these functional form assumptions and with homoskedasticity, the estimator for τ is guaranteed to have a lower asymptotic variance.42 By contrast, the “residualizing” approach can in some cases raise standard errors.43

The disadvantage of solely relying upon this second approach, however, is that it does not help distinguish between an inappropriate functional form and discontinuities in W, as both could potentially cause the estimates of τ to change significantly when W is included.44 On the other hand, the “residualizing” approach allows one to examine how well the residuals fit the assumed order of polynomial (using, for example, the methods described in subsection 4.3.2). If it does not fit well, then it suggests that the use of that order of polynomial with the second approach is not justified. Overall, one sensible approach is to directly enter the covariates, but then to use the “residualizing” approach as an additional diagnostic check on whether the assumed order of the polynomial is justified.

As discussed earlier, an alternative approach to estimating the discontinuity involves limiting the estimation to a window of data around the threshold and using a linear specification within that window.45 We note that as the neighborhood shrinks, the true expectation of W conditional on X will become closer to being linear, and so equation (13) (with X̃ containing only the linear term) will become a better approximation.

For the voting example used throughout this paper, Lee (2008) shows that adding a set of covariates essentially has no impact on the RD estimates in the model where the outcome variable is winning the next election. Doing so does not have a large impact on the standard errors either, at least up to the third decimal. Using the procedure based on residuals instead actually slightly increases the second step standard errors—a possibility mentioned above. Therefore in this particular example, the main advantage of using baseline covariates is to help establish the validity of the RD design, as opposed to improving the efficiency of the estimators.

4.6 A Recommended “Checklist” for Implementation

Below is a brief summary of our recommendations for the analysis, presentation, and estimation of RD designs.

41 To see this, rewrite equation (13) as Y = Dτ + X̃γ + Da + X̃b + Wc + μ, where a, b, c, and μ are linear projection coefficients and the residual from a population regression of ε on D, X̃, and W. If a = 0, then adding W will not affect the coefficient on D. This will be true—applying the Frisch–Waugh theorem—when the covariance between ε and D − X̃d − We (where d and e are coefficients from projecting D on X̃ and W) is zero. This will be true when e = 0, because ε is by assumption orthogonal to both D and X̃. Applying the Frisch–Waugh theorem again, e is the coefficient obtained by regressing D on W − X̃δ ≡ u; by assumption u and D are uncorrelated, so e = 0.

42 The asymptotic variance for the least squares estimator (without including W) of τ is given by the ratio V(ε)/V(D̃), where D̃ is the residual from the population regression of D on X̃. If W is included, then the least squares estimator has asymptotic variance of σ²/V(D − X̃d − We), where σ² is the variance of the error when W is included, and d and e are coefficients from projecting D on X̃ and W. σ² cannot exceed V(ε), and as shown in the footnote above, e = 0, and thus D − X̃d = D̃, implying that the denominator in the ratio does not change when W is included.

43 From equation (14), the regression error variance will increase if V(ε − uπ) > V(ε) ⇔ V(uπ) − 2C(ε, uπ) > 0, which will hold when, for example, ε is orthogonal to u and π is nonzero.

44 If the true equation for W contains more polynomial terms than X̃, then e, as defined in the preceding footnotes (the coefficient obtained by regressing D on the residual from projecting W on X̃), will not be zero. This implies that including W will generally lead to inconsistent estimates of τ, and may cause the asymptotic variance to increase (since V(D − X̃d − We) ≤ V(D̃)).

45 And we have noted that one can justify this by assuming that in that specified neighborhood, the underlying function is in fact linear, and make standard parametric inferences. Or one can conduct a nonparametric inference approach by making assumptions about the rate at which the bandwidth shrinks as the sample size grows.
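The two ways of using baseline covariates discussed in section 4.5 (entering them directly versus residualizing the outcome) can be sketched on simulated data; all numbers and names below are ours:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4_000
x = rng.uniform(-1, 1, n)
d = (x >= 0).astype(float)
w = rng.normal(size=n)                    # baseline covariate, balanced at cutoff
y = 0.3 * d + 0.4 * x + 0.8 * w + rng.normal(0, 0.3, n)  # true effect = 0.3

def ols(Z, v):
    beta, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return beta

ones = np.ones(n)
tau_base = ols(np.column_stack([ones, d, x]), y)[1]       # no covariate
tau_direct = ols(np.column_stack([ones, d, x, w]), y)[1]  # W entered directly
Zw = np.column_stack([ones, w])
y_resid = y - Zw @ ols(Zw, y)             # "residualize": net out W's prediction
tau_resid = ols(np.column_stack([ones, d, x]), y_resid)[1]
```

Because the covariate is balanced around the cutoff, all three estimates are consistent for the same parameter; they differ only in sampling variability, in line with the discussion above.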
Figure 18. Local Linear Regression with Varying Bandwidth: Share of Vote at Next Election (vertical axis: estimated treatment effect, 0.00 to 0.18, with 95 percent confidence bands; horizontal axis: bandwidth, 0 to 0.5)
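An exercise in the spirit of Figure 18 can be reproduced on simulated data (a sketch; the rectangular kernel, the simulated discontinuity of 0.1, and the bandwidth grid are our own choices):

```python
import numpy as np

def local_linear_rd(x, y, h, cutoff=0.0):
    """Local linear RD estimate with a rectangular kernel: a linear fit
    with separate intercepts and slopes on each side, using only
    observations within h of the cutoff; returns the jump at the cutoff."""
    m = np.abs(x - cutoff) <= h
    xs, ys = x[m] - cutoff, y[m]
    t = (xs >= 0).astype(float)
    Z = np.column_stack([np.ones(xs.size), t, xs, t * xs])
    beta, *_ = np.linalg.lstsq(Z, ys, rcond=None)
    return beta[1]

rng = np.random.default_rng(5)
x = rng.uniform(-0.5, 0.5, 8_000)
y = 0.1 * (x >= 0) + 0.5 * x - 0.4 * x**2 + rng.normal(0, 0.1, 8_000)
estimates = {h: local_linear_rd(x, y, h) for h in (0.05, 0.1, 0.2, 0.4)}
```

Plotting the estimates (with confidence bands) against the bandwidth shows how precision improves and bias risk grows as the window widens.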
simpler here, as bin dummies are replaced by dummies for each value of the discrete assignment variable. In the presence of heteroskedasticity, the goodness-of-fit test can be computed by estimating the model and testing whether a set of dummies for each value of the discrete assignment variable are jointly significant. In that setting, the test statistic follows a chi-square distribution with J − K degrees of freedom.

In Lee and Card (2008), the difference between the true conditional expectation E[Y | X = x] and the estimated regression function forming the basis of the goodness-of-fit test is interpreted as a random specification error that introduces a group structure in the standard errors. One way of correcting the standard errors for group structure is to run the model on cell means.46 Another way is to "cluster" the standard errors. Note that in this setting, the goodness-of-fit test can also be interpreted as a test of whether standard errors should be adjusted for the group structure. In practice, it is nonetheless advisable to either group the data or cluster the standard errors in micro-data models irrespective of the results of the goodness-of-fit test. The main purpose of the test should be to help choose a reasonably accurate regression model.

Lee and Card (2008) also discuss a number of issues, including what to do when specification errors under treatment and control are correlated, and how to possibly adjust the RD estimates in the presence of specification errors. Since these issues are beyond the scope of this paper, interested readers should consult Lee and Card (2008) for more detail.

46 When the discrete assignment variable—and the "treatment" dummy solely dependent on this variable—is the only variable used in the regression model, standard OLS estimates will be numerically equivalent to those obtained by running a weighted regression on the cell means, where the weights are the number of observations (or the sum of individual weights) in each cell.

5.2 Panel Data and Fixed Effects

In some situations, the RD design will be embedded in a panel context, whereby period by period, the treatment variable is determined according to the realization of the assignment variable X. Again, it seems natural to propose the model

Yit = Dit τ + f(Xit; γ) + ai + εit

(where i and t denote the individuals and time, respectively), and simply estimate a fixed effects regression by including individual dummy variables to capture the unit-specific error component, ai. It is important to note, however, that including fixed effects is unnecessary for identification in an RD design. This sharply contrasts with a more traditional panel data setting, where the error component ai is allowed to be correlated with the observed covariates, including the treatment variable Dit, in which case including fixed effects is essential for consistently estimating the treatment effect τ.

An alternative is to simply conduct the RD analysis for the entire pooled cross-section dataset, taking care to account for within-individual correlation of the errors over time using clustered standard errors. The source of identification is a comparison between those just below and above the threshold, and it can be carried out with a single cross-section. Therefore, imposing a specific dynamic structure introduces more restrictions without any gain in identification.

Time dummies can also be treated like any other baseline covariate. This is apparent by applying the main RD identification result: conditional on what period it is, we are assuming the density of X is continuous at the threshold and, hence, conditional on X, the probability of an individual observation coming from a particular period is also continuous.
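As a rough sketch of this pooled approach, the panel model above can be estimated on the stacked cross-sections with standard errors clustered by individual. The simulated data, the quadratic in X, and all parameter values below are illustrative assumptions, not taken from any study discussed in the paper.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: n units observed for T periods; each period's treatment
# D_it is determined by whether that period's X_it crosses the cutoff (zero).
rng = np.random.default_rng(1)
n, T = 500, 4
df = pd.DataFrame({
    "unit": np.repeat(np.arange(n), T),
    "x": rng.uniform(-1, 1, n * T),
})
a = np.repeat(rng.normal(size=n), T)  # unit-specific error component a_i
df["d"] = (df["x"] >= 0).astype(int)
df["y"] = 0.5 * df["d"] + 0.8 * df["x"] + a + rng.normal(scale=0.5, size=n * T)

# Pooled RD: fixed effects are not needed for identification; instead,
# cluster the standard errors by unit to handle within-unit correlation.
pooled = smf.ols("y ~ d + x + I(x**2)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["unit"]})
print(pooled.params["d"], pooled.bse["d"])
```

Including unit dummies would be harmless for identification here, though it can inflate the variance of the estimate when treatment status varies little within units.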
We note that it becomes a little bit more awkward to use the justification proposed in subsection 4.5 for directly including dummies for individuals and time periods on the right hand side of the regression. This is because the assumption would have to be that the probability that an observation belonged to each individual (or the probability that an observation belonged to each period) is a polynomial function in X and, strictly speaking, nontrivial polynomials are not bounded between 0 and 1.

A more practical concern is that inclusion of individual dummy variables may lead to an increase in the variance of the RD estimator for another reason. If there is little "within-unit" variability in treatment status, then the variation in the main variable of interest (treatment after partialling out the individual heterogeneity) may be quite small. Indeed, seeing standard errors rise when including fixed effects may be an indication of a misspecified functional form.47

47 See the discussion in section 4.5.

Overall, since the RD design is still valid ignoring individual or time effects, the only rationale for including them is to reduce sampling variance. But there are other ways to reduce sampling variance by exploiting the structure of panel data. For instance, we can treat the lagged dependent variable Yit−1 as simply another baseline covariate in period t. In cases where Yit is highly persistent over time, Yit−1 may well be a very good predictor and has a very good chance of reducing the sampling error. As we have also discussed earlier, looking at possible discontinuities in baseline covariates is an important test of the validity of the RD design. In this particular case, since Yit can be highly correlated with Yit−1, finding a discontinuity in Yit but not in Yit−1 would be a strong piece of evidence supporting the validity of the RD design.

In summary, one can utilize the panel nature of the data by conducting an RD analysis on the entire dataset, using lagged variables as baseline covariates for inclusion as described in subsection 4.5. The primary caution in doing this is to ensure that for each period, the included covariates are the variables determined prior to the present period's realization of Xit.

6. Applications of RD Designs in Economics

In what areas has the RD design been applied in economic research? Where do discontinuous rules come from and where might we expect to find them? In this section, we provide some answers to these questions by surveying the areas of applied economic research that have employed the RD design. Furthermore, we highlight some examples from the literature that illustrate what we believe to be the most important elements of a compelling, "state-of-the-art" implementation of RD.

6.1 Areas of Research Using RD

As we suggested in the introduction, the notion that the RD design has limited applicability to a few specific topics is inconsistent with our reading of existing applied research in economics. Table 5 summarizes our survey of empirical studies on economic topics that have utilized the RD design. In compiling this list, we searched economics journals as well as listings of working papers from economists, and chose any study that recognized the potential use of an RD design in its given setting. We also included some papers by non-economists when the research was closely related to economic work.

Even with our undoubtedly incomplete compilation of over sixty studies, table 5 illustrates that RD designs have been applied in many different contexts. Table 5 summarizes the context of the study, the outcome
Lee and Lemieux: Regression Discontinuity Designs in Economics 339
Table 5
Regression Discontinuity Applications in Economics

Study | Context | Outcome(s) | Treatment | Assignment variable

Jacob and Lefgren (2004b) | Elementary schools, Chicago | Test scores | Summer school attendance, grade retention | Standardized test scores
Leuven, Lindahl, Oosterbeek, and Webbink (2007) | Primary schools, Netherlands | Test scores | Extra funding | Percent disadvantaged minority pupils
Matsudaira (2008) | Elementary schools, Northeastern United States | Test scores | Summer school, grade promotion | Test scores
Urquiola (2006) | Elementary schools, Bolivia | Test scores | Class size | Student enrollment
Urquiola and Verhoogen (2009) | Class size sorting, RD violations, Chile | Test scores | Class size | Student enrollment
Van der Klaauw (2002, 1997) | College enrollment, East Coast College | Enrollment | Financial aid offer | SAT scores, GPA
Van der Klaauw (2008a) | Elementary/middle schools, New York City | Test scores, student attendance | Title I federal funding | Poverty rates

Labor Market
Battistin and Rettore (2002) | Job training, Italy | Employment rates | Training program (computer skills) | Attitudinal test score
Behaghel, Crepon, and Sedillot (2008) | Labor laws, France | Hiring among age groups | Tax exemption for hiring firm | Age of worker
Black, Smith, Berger, and Noel (2003); Black, Galdo, and Smith (2007b) | UI claimants, Kentucky | Earnings, benefit receipt/duration | Mandatory reemployment services (job search assistance) | Profiling score (expected benefit duration)
Card, Chetty, and Weber (2007) | Unemployment benefits, Austria | Unemployment duration | Lump-sum severance pay, extended UI benefits | Months employed, job tenure
Chen and van der Klaauw (2008) | Disability insurance beneficiaries, United States | Labor force participation | Disability insurance benefits | Age at disability decision
De Giorgi (2005) | Welfare-to-work program, United Kingdom | Re-employment probability | Job search assistance, training, education | Age at end of unemployment spell
DiNardo and Lee (2004) | Unionization, United States | Wages, employment, output | Union victory in NLRB election | Vote share
Dobkin and Ferreira (2009) | Individuals, California and Texas | Educational attainment, wages | Age at school entry | Birthdate
Edmonds (2004) | Child labor supply and school attendance, South Africa | Child labor supply, school attendance | Pension receipt of oldest family member | Age
Hahn, Todd, and van der Klaauw (1999) | Discrimination, United States | Minority employment | Coverage of federal antidiscrimination law | Number of employees at firm
Lalive (2008) | Unemployment benefits, Austria | Unemployment duration | Maximum benefit duration | Age at start of unemployment spell, geographic location

Crime
Berk and DeLeeuw (1999) | Prisoner behavior in California | Inmate misconduct | Prison security levels | Classification score
Berk and Rauma (1983) | Ex-prisoners recidivism, California | Arrest, parole violation | Unemployment insurance benefit | Reported hours of work
Chen and Shapiro (2004) | Ex-prisoners recidivism, United States | Arrest rates | Prison security levels | Classification score
Lee and McCrary (2005) | Criminal offenders, Florida | Arrest rates | Severity of sanctions | Age at arrest
Hjalmarsson (2009) | Juvenile offenders, Washington State | Recidivism | Sentence length | Criminal history score

Environment
Chay and Greenstone (2003) | Health effects of pollution, United States | Infant mortality | Regulatory status | Pollution levels
Chay and Greenstone (2005) | Valuation of air quality, United States | Housing prices | Regulatory status | Pollution levels
Davis (2008) | Restricted driving policy, Mexico | Hourly air pollutant measures | Restricted automobile use | Time
Greenstone and Gallagher (2008) | Hazardous waste, United States | Housing prices | Superfund clean-up status | Ranking of level of hazard

Other
Battistin and Rettore (2008) | Mexican anti-poverty program (PROGRESA) | School attendance probability | Cash grants | Pre-assigned probability of being poor
Baum-Snow and Marion (2009) | Housing subsidies, United States | Residents' characteristics, new housing construction | Increased subsidies | Percentage of eligible households in area
Buddelmeyer and Skoufias (2004) | Mexican anti-poverty program (PROGRESA) | Child labor and school attendance | Cash grants | Pre-assigned probability of being poor
Buettner (2006) | Fiscal equalization across municipalities, Germany | Business tax rate | Implicit marginal tax rate on grants to localities | Tax base
Card, Mas, and Rothstein (2008) | Racial segregation, United States | Changes in census tract racial composition | Minority share exceeding "tipping" point | Initial minority share
Cole (2009) | Bank nationalization, India | Share of credit granted by public banks | Nationalization of private banks | Size of bank
Edmonds, Mammen, and Miller (2005) | Household structure, South Africa | Household composition | Pension receipt of oldest family member | Age
Ferreira (2007) | Residential mobility, California | Household mobility | Coverage of tax benefit | Age
Pence (2006) | Mortgage credit, United States | Size of loan | State mortgage credit laws | Geographic location
Pitt and Khandker (1998) | Poor households, Bangladesh | Labor supply, children school enrollment | Group-based credit program | Acreage of land
Pitt, Khandker, McKernan, and Latif (1999) | Poor households, Bangladesh | Contraceptive use, childbirth | Group-based credit program | Acreage of land
variable, the treatment of interest, and the assignment variable employed.

While the categorization of the various studies into broad areas is rough and somewhat arbitrary, it does appear that a large share come from the area of education, where the outcome of interest is often an achievement test score and the assignment variable is also a test score, either at the individual or group (school) level. The second clearly identifiable group are studies that deal with labor market issues and outcomes. This probably reflects that, within economics, the RD design has so far primarily been used by labor economists, and that the use of quasi-experiments and program evaluation methods in documenting causal relationships is more prevalent in labor economics research.

There is, of course, nothing in the structure of the RD design tying it specifically to labor economics applications. Indeed, as the rest of the table shows, the remaining half of the studies are in the areas of political economy, health, crime, environment, and other areas.

6.2 Sources of Discontinuous Rules

Where do discontinuous rules come from, and in what situations would we expect to encounter them? As table 5 shows, there is a wide variety of contexts where discontinuous rules determine treatments of interest. There are, nevertheless, some patterns that emerge. We organize the various discontinuous rules below.

Before doing so, we emphasize that a good RD analysis—as with any other approach to program evaluation—is careful in clearly spelling out exactly what the treatment is, and whether it is of any real salience, independent of whatever effect it might have on the outcome. For example, when a pretest score is the assignment variable, we could always define a "treatment" as "having passed the exam" (with a test score of 50 percent or higher), but this is not a very interesting "treatment" to examine, since it seems nothing more than an arbitrary label. On the other hand, if failing the exam meant not being able to advance to the next grade in school, the actual experience of treated and control individuals is observably different, no matter how large or small the impact on the outcome.

As another example, in the U.S. Congress, a Democrat obtaining the most votes in an election means something real—the Democratic candidate becomes a representative in Congress; otherwise, the Democrat has no official role in the government. But in a three-way electoral race, the treatment of the Democrat receiving the second-highest number of votes (versus receiving the lowest number) is not likely a treatment of interest: only the first-place candidate is given any legislative authority. In principle, stories could be concocted about the psychological effect of placing second rather than third in an election, but this would be an example where the salience of the treatment is more speculative than when treatment is a concrete and observable event (e.g., a candidate becoming the sole representative of a constituency).

6.2.1 Necessary Discretization

Many discontinuous rules come about because resources cannot, for all practical purposes, be provided in a continuous manner. For example, a school can only have a whole number of classes per grade. For a fixed level of enrollment, the moment a school adds a single class, the average class size drops. As long as the number of classes is an increasing function of enrollment, there will be discontinuities at enrollments where a teacher is added. If there is a mandated maximum for the student-to-teacher ratio, these discontinuities will be expected at enrollments that are exact multiples of the maximum. This is the essence of the discontinuous rules used in the analyses of Angrist and Lavy (1999), M. Niaz Asadullah (2005), Caroline M. Hoxby (2000), Urquiola (2006), and Urquiola and Verhoogen (2009).

Another example of necessary discretization arises when children begin their schooling years. Although there are certainly exceptions, school districts typically follow a guideline that aims to group children together by age, leading to a grouping of children born in year-long intervals, determined by a single calendar date (e.g., September 1). This means children who are essentially of the same age (e.g., those born on August 31 and September 1) start school one year apart. This allocation of students to grade cohorts is used in Elizabeth U. Cascio and Ethan G. Lewis (2006), Dobkin and Fernando Ferreira (2009), and McCrary and Royer (2003).

Choosing a single representative by way of an election is yet another example. When the law or constitution calls for a single representative of some constituency and there are many competing candidates, the choice can be made via a "first-past-the-post" or "winner-take-all" election. This is the typical system for electing government officials at the local, state, and federal level in the United States. The resulting discontinuous relationship between win/loss status and the vote share is used in the context of the U.S. Congress in Lee (2001, 2008), Lee, Enrico Moretti and Matthew J. Butler (2004), David Albouy (2009), Albouy (2008), and in the context of mayoral elections in Ferreira and Joseph Gyourko (2009). The same idea is used in examining the impacts of union recognition, which is also decided by a secret ballot election (DiNardo and Lee 2004).

6.2.2 Intentional Discretization

Sometimes resources could potentially be allocated on a continuous scale but, in practice, are instead allocated in discrete levels. Among the studies we surveyed, we identified three broad motivations behind the use of these discontinuous rules.

First, a number of rules seem driven by a compensatory or equalizing motive. For example, in Kenneth Y. Chay, Patrick J. McEwan, and Urquiola (2005), Edwin Leuven et al. (2007), and van der Klaauw (2008a), extra resources for schools were allocated to the neediest communities, on the basis of school-average test scores, disadvantaged minority proportions, or poverty rates. Similarly, Ludwig and Miller (2007), Erich Battistin and Enrico Rettore (2008), and Hielke Buddelmeyer and Emmanuel Skoufias (2004) study programs designed to help poor communities, where the eligibility of a community is based on poverty rates. In each of these cases, one could imagine providing the most resources to the neediest and gradually phasing them out as the need index declines, but in practice this is not done, perhaps because it was impractical to provide very small levels of the treatment, given the fixed costs in administering the program.

A second motivation for having a discontinuous rule is to allocate treatments on the basis of some measure of merit. This was the motivation behind the merit award in the analysis of Thistlethwaite and Campbell (1960), as well as recent studies of the effect of financial aid awards on college enrollment, where the assignment variable is some measure of student achievement or test score, as in Thomas J. Kane (2003) and van der Klaauw (2002).

Finally, we have observed that a number of discontinuous rules are motivated by the need to most effectively target the treatment. For example, environmental regulations or clean-up efforts naturally will focus on the most polluted areas, as in Chay and Michael Greenstone (2003), Chay and Greenstone (2005), and Greenstone and Justin Gallagher (2008). In the context of criminal behavior, prison security levels are often assigned based on an underlying score that quantifies
potential security risks, and such rules were used in Richard A. Berk and Jan de Leeuw (1999) and M. Keith Chen and Jesse M. Shapiro (2004).

6.3 Nonrandomized Discontinuity Designs

Throughout this article, we have focused on regression discontinuity designs that follow a certain structure and timing in the assignment of treatment. First, individuals or communities—potentially in anticipation of the assignment of treatment—make decisions and act, potentially altering their probability of receiving treatment. Second, there is a stochastic shock due to "nature," reflecting that the units have incomplete control over the assignment variable. And finally, the treatment (or the intention to treat) is assigned on the basis of the assignment variable.

We have focused on this structure because in practice most RD analyses can be viewed along these lines, and also because of the similarity to the structure of a randomized experiment. That is, subjects of a randomized experiment may or may not make decisions in anticipation of participating in a randomized controlled trial (although their actions will ultimately have no influence on the probability of receiving treatment). Then the stochastic shock is realized (the randomization). Finally, the treatment is administered to one of the groups.

A number of the studies we surveyed, though, did not seem to fit the spirit or essence of a randomized experiment. Since it is difficult to think of the treatment as being locally randomized in these cases, we will refer to the two research designs we identified in this category as "nonrandomized" discontinuity designs.

6.3.1 Discontinuities in Age with Inevitable Treatment

Sometimes program status is turned on when an individual reaches a certain age. Receipt of pension benefits is typically tied to reaching a particular age (see Eric V. Edmonds 2004; Edmonds, Kristin Mammen, and Miller 2005) and, in the United States, eligibility for the Medicare program begins at age 65 (see Card, Dobkin, and Maestas 2008) and young adults reach the legal drinking age at 21 (see Christopher Carpenter and Dobkin 2009). Similarly, one is subject to the less punitive juvenile justice system until the age of majority (typically, 18) (see Lee and McCrary 2005).

These cases stand apart from the typical RD designs discussed above because here assignment to treatment is essentially inevitable, as all subjects will eventually age into the program (or, conversely, age out of the program). One cannot, therefore, draw any parallels with a randomized experiment, which necessarily involves some ex ante uncertainty about whether a unit ultimately receives treatment (or the intent to treat).

Another important difference is that the tests of smoothness in baseline characteristics will generally be uninformative. Indeed, if one follows a single cohort over time, all characteristics determined prior to reaching the relevant age threshold are by construction identical just before and after the cutoff.48 Note that in this case, time is the assignment variable, and therefore cannot be manipulated.

48 There are exceptions to this. There could be attrition over time, so that in principle, the number of observations could discontinuously drop at the threshold, changing the composition of the remaining observations. Alternatively, when examining a cross-section of different birth cohorts at a given point in time, it is possible to have sharp changes in the characteristics of individuals with respect to birthdate.

This design and the standard RD share the necessity of interpreting the discontinuity as the combined effect of all factors that switch on at the threshold. In the example of Thistlethwaite and Campbell (1960), if passing a scholarship exam provides the symbolic
honor of passing the exam as well as a monetary award, the true treatment is a package of the two components, and one cannot attribute any effect to only one of the two. Similarly, when considering an age-activated treatment, one must consider the possibility that the age of interest is causing eligibility for potentially many other programs, which could affect the outcome.

There are at least two new issues that are irrelevant for the standard RD but are important for the analysis of age discontinuities. First, even if there is truly an effect on the outcome, if the effect is not immediate, it generally will not generate a discontinuity in the outcome. For example, suppose the receipt of Social Security benefits has no immediate impact but does have a long-run impact on labor force participation. Examining the labor force behavior as a function of age will not yield a discontinuity at age 67 (the full retirement age for those born after 1960), even though there may be a long-run effect. It is infeasible to estimate long-run effects because by the time we examine outcomes five years after receiving the treatment, for example, those individuals who were initially just below and just above age 67 will be exposed to essentially the same length of time of treatment (e.g., five years).49

49 By contrast, there is no such limitation with the standard RD design. One can examine outcomes defined at an arbitrarily long time period after the assignment to treatment.

The second important issue is that, because treatment is inevitable with the passage of time, individuals may fully anticipate the change in the regime and, therefore, may behave in certain ways prior to the time when treatment is turned on. Optimizing behavior in anticipation of a sharp regime change may either accentuate or mute observed effects. For example, simple life-cycle theories, assuming no liquidity constraints, suggest that the path of consumption will exhibit no discontinuity at age 67, when Social Security benefits commence payment. On the other hand, some medical procedures are too expensive for an under-65-year-old but would be covered under Medicare upon turning 65. In this case, individuals' greater awareness of such a predicament will tend to increase the size of the discontinuity in utilization of medical procedures with respect to age (e.g., see Card, Dobkin, and Maestas 2008).

At this time we are unable to provide any more specific guidelines for analyzing these age/time discontinuities, since it seems that how one models expectations, information, and behavior in anticipation of sharp changes in regimes will be highly context-dependent. But it does seem important to recognize these designs as being distinct from the standard RD design.

We conclude by emphasizing that when distinguishing between age-triggered treatments and a standard RD design, the involvement of age as an assignment variable is not as important as whether the receipt of treatment—or analogously, entering the control state—is inevitable. For example, on the surface, the analysis of the Medicaid expansions in Card and Lara D. Shore-Sheppard (2004) appears to be an age-based discontinuity since, effective July 1991, U.S. law requires states to cover children born after September 30, 1983, implying a discontinuous relationship between coverage and age, where the discontinuity in July 1991 was around 8 years of age. This design, however, actually fits quite easily into the standard RD framework we have discussed throughout this paper.

First, note that treatment receipt is not inevitable for those individuals born near the September 30, 1983, threshold. Those born strictly after that date were covered from July 1991 until their 18th birthday, while those born on or before the date received no such coverage. Second, the data generating process does follow the structure discussed
above. Parents do have some influence regarding when their children are born, but with only imprecise control over the exact date (and at any rate, it seems implausible that parents would have anticipated that such a Medicaid expansion would have occurred eight years in the future, with the particular birthdate cutoff chosen). Thus the treatment is assigned based on the assignment variable, which is the birthdate in this context.

Examples of other age-based discontinuities, where neither the treatment nor control state is guaranteed with the passage of time and that can also be viewed within the standard RD framework, include studies by Cascio and Lewis (2006), McCrary and Royer (2003), Dobkin and Ferreira (2009), and Phillip Oreopoulos (2006).

6.3.2 Discontinuities in Geography

Another "nonrandomized" RD design is one involving the location of residences, where the discontinuity threshold is a boundary that demarcates regions. Black (1999) and Patrick Bayer, Ferreira, and Robert McMillan (2007) examine housing prices on either side of school attendance boundaries to estimate the implicit valuation of different schools. Lavy (2006) examines adjacent neighborhoods that are in different cities, and therefore subject to different rules regarding student busing. Rafael Lalive (2008) compares unemployment duration in regions in Austria receiving extended benefits to adjacent control regions. Karen M. Pence (2006) examines census tracts along state borders to examine the impact of more borrower-friendly laws on mortgage loan sizes.

In each of these cases, it is awkward to view either houses or families as locally randomly assigned. Indeed, this is a case where economic agents have quite precise control over where to place a house or where to live. The location of houses will be planned in response to geographic features (rivers, lakes, hills) and in conjunction with the planning of streets, parks, commercial development, etc. In order for this to resemble a more standard RD design, one would have to imagine the relevant boundaries being set in a "random" way, so that it would be simply luck determining whether a house ended up on either side of the boundary. The concern over the endogeneity of boundaries is clearly recognized by Black (1999), who ". . . [b]ecause of concerns about neighborhood differences on opposite sides of an attendance district boundary, . . . was careful to omit boundaries from [her] sample if the two attendance districts were divided in ways that seemed to clearly divide neighborhoods; attendance districts divided by large rivers, parks, golf courses, or any large stretch of land were excluded." As one could imagine, the selection of which boundaries to include could quickly turn into more of an art than a science.

We have no uniform advice on how to analyze geographic discontinuities because it seems that the best approach would be particularly context-specific. It does, however, seem prudent for the analyst, in assessing the internal validity of the research design, to carefully consider three sets of questions. First, what is the process that led to the location of the boundaries? Which came first: the houses or the boundaries? Were the boundaries a response to some preexisting geographical or political constraint? Second, how might sorting of families or the endogenous location of houses affect the analysis? And third, what are all the things differing between the two regions other than the treatment of interest? An exemplary analysis and discussion of these latter two issues in the context of school attendance zones is found in Bayer, Ferreira, and McMillan (2007).

7. Concluding Remarks on RD Designs in Economics: Progress and Prospects

Our reading of the existing and active literature is that—after being largely ignored
50 For example, Trochim (1984) characterizes the three central assumptions of the RD design as: (1) perfect adherence to the cutoff rule, (2) having the correct functional form, and (3) no other factors (other than the program of interest) cause the discontinuity. More recently, William R. Shadish, Cook, and Campbell (2002) claim on page 243 that the proof of the unbiasedness of RD primarily follows from the fact that treatment is known perfectly once the assignment variable is known. They go on to argue that this deterministic rule implies omitted variables will not pose a problem. But Hahn, Todd, and van der Klaauw (2001) make it clear that the existence of a deterministic rule for the assignment of treatment is not sufficient for unbiasedness, and it is necessary to assume the influence of all other factors (omitted variables) is the same on either side of the discontinuity threshold (i.e., their continuity assumption).

51 Urquiola and Verhoogen (2009) emphasize that the sorting issues may well be specific to the liberalized nature of the Chilean primary school market, and that they may or may not be present in other countries.

52 See, for example, footnote 23 in van der Klaauw (1997) and page 549 in Angrist and Lavy (1999).
Lee and Lemieux: Regression Discontinuity Designs in Economics 349
is the notion that we can empirically examine the degree of sorting, and one way of doing so is suggested in McCrary (2008).

• RD Designs as Locally Randomized Experiments: Economists are hesitant to apply methods that have not been rigorously formalized within an econometric framework, and where crucial identifying assumptions have not been clearly specified. This is perhaps one of the reasons why RD designs were underutilized by economists for so long, since it is only relatively recently that the underlying assumptions needed for the RD were formalized.53 In the recent literature, RD designs were initially viewed as a special case of matching (Heckman, Lalonde, and Smith 1999), or alternatively as a special case of IV (Angrist and Krueger 1999), and these perspectives may have provided empirical researchers a familiar econometric framework within which identifying assumptions could be more carefully discussed.

Today, RD is increasingly recognized in applied research as a distinct design that is a close relative to a randomized experiment. As formally shown in Lee (2008), even when individuals have some control over the assignment variable, as long as this control is imprecise—that is, the ex ante density of the assignment variable is continuous—the consequence will be local randomization of the treatment. So in a number of nonexperimental contexts where resources are allocated based on a sharp cutoff rule, there may indeed be a hidden randomized experiment to utilize. And furthermore, as in a randomized experiment, this implies that all observable baseline covariates will locally have the same distribution on either side of the discontinuity threshold—an empirically testable proposition.

We view the testing of the continuity of the baseline covariates as an important part of assessing the validity of any RD design—particularly in light of the incentives that can potentially generate sorting—and as something that truly sets RD apart from other evaluation strategies. Examples of this kind of testing of the RD design include Jordan D. Matsudaira (2008), Card, Raj Chetty, and Andrea Weber (2007), DiNardo and Lee (2004), Lee, Moretti, and Butler (2004), McCrary and Royer (2003), Greenstone and Gallagher (2008), and Urquiola and Verhoogen (2009).
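The two empirical checks described here, continuity of baseline covariates at the threshold and continuity of the density of the assignment variable, can be illustrated with a short simulation. The sketch below is not code from the paper: the data-generating process, variable names, and bandwidth are all arbitrary assumptions, the boundary fits are plain local linear regressions with a uniform kernel, and the density check is a crude histogram-based comparison in the spirit of (not a reproduction of) the McCrary (2008) estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
n, cutoff, h = 20_000, 0.0, 0.25

# Assignment variable with a continuous (ex ante) density: no sorting.
x = rng.normal(0.0, 1.0, n)
# A baseline covariate that varies smoothly with x but is unaffected by
# treatment; under a valid RD it should be continuous at the cutoff.
w = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, n)

def local_linear_boundary(xs, ys, c, h, side):
    """Local linear fit within bandwidth h on one side; value at c."""
    m = (xs >= c) & (xs < c + h) if side == "right" else (xs >= c - h) & (xs < c)
    X = np.column_stack([np.ones(m.sum()), xs[m] - c])
    beta, *_ = np.linalg.lstsq(X, ys[m], rcond=None)
    return beta[0]  # intercept = fitted value at the cutoff

# (i) Covariate balance: estimated discontinuity in E[w | x] at the cutoff.
gap_w = (local_linear_boundary(x, w, cutoff, h, "right")
         - local_linear_boundary(x, w, cutoff, h, "left"))

# (ii) Density continuity: compare the share of observations in
# equal-width bins just right and just left of the cutoff.
share_right = np.mean((x >= cutoff) & (x < cutoff + h))
share_left = np.mean((x >= cutoff - h) & (x < cutoff))
density_ratio = share_right / share_left

print(f"covariate gap at cutoff: {gap_w:.3f}")
print(f"density ratio (right/left): {density_ratio:.3f}")
```

With no sorting and no effect of treatment on the covariate, both the estimated gap (near zero) and the density ratio (near one) behave as the local-randomization argument predicts; manipulation of the assignment variable would show up as a spike in the density on one side of the threshold.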
53 An example of how economists'/econometricians' notion of a proof differs from that in other disciplines is found in Cook (2008), who views the discussion in Arthur S. Goldberger (1972a) and Goldberger (1972b) as the first "proof of the basic design," quoting the following passage in Goldberger (1972a) (brackets from Cook 2008): "The explanation for this serendipitous result [no bias when selection is on an observed pretest score] is not hard to locate. Recall that z [a binary variable representing the treatment contrast at the cutoff] is completely determined by pretest score x [an obtained ability score]. It cannot contain any information about x* [true ability] that is not contained within x. Consequently, when we control on x as in the multiple regression, z has no explanatory power with respect to y [the outcome measured with error]. More formally, the partial correlation of y and z controlling on x vanishes although the simple correlation of y and z is nonzero" (p. 647). After reading the article, an econometrician will recognize the discussion above not as a proof of the validity of the RD, but rather as a restatement of the consequence of z being an indicator variable determined by an observed variable x, in a specific parameterized example. Today we know the existence of such a rule is not sufficient for a valid RD design, and a crucial necessary assumption is the continuity of the influence of all other factors, as shown in Hahn, Todd, and van der Klaauw (2001). In Goldberger (1972a), the role of the continuity of omitted factors was not mentioned (although it is implicitly assumed in the stylized model of test scores involving normally distributed and independent errors). Indeed, apparently Goldberger himself later clarified that he did not set out to propose the RD design, and was instead interested in the issues related to selection on observables and unobservables (Cook 2008).
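Goldberger's algebraic point in the quoted passage, that a z completely determined by x has no explanatory power once x is controlled for in a correctly specified regression, can be reproduced by simulation. The sketch below is an illustration under assumed normally distributed errors and a linear outcome model with no treatment effect, not a reproduction of Goldberger (1972a); all names and parameter values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Observed pretest score x; outcome y depends linearly on x with no
# treatment effect. Treatment indicator z is completely determined by x.
x = rng.normal(0.0, 1.0, n)
y = 2.0 + 1.5 * x + rng.normal(0.0, 1.0, n)
z = (x >= 0.0).astype(float)

# Simple regression of y on z alone: z picks up the effect of x,
# so its coefficient is far from zero (nonzero simple correlation).
X1 = np.column_stack([np.ones(n), z])
b_simple = np.linalg.lstsq(X1, y, rcond=None)[0][1]

# Multiple regression of y on x and z: the coefficient on z vanishes,
# since z carries no information about y beyond what x contains.
X2 = np.column_stack([np.ones(n), x, z])
b_partial = np.linalg.lstsq(X2, y, rcond=None)[0][2]

print(f"coefficient on z, not controlling for x: {b_simple:.3f}")
print(f"coefficient on z, controlling for x:     {b_partial:.3f}")
```

As the footnote stresses, this shows only the mechanical consequence of z being a deterministic function of x in a correctly specified linear model; it is not a proof of RD validity, which additionally requires continuity of all other factors at the threshold.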
References

Albouy, David. 2009. "Partisan Representation in Congress and the Geographic Distribution of Federal Funds." National Bureau of Economic Research Working Paper 15224.

Angrist, Joshua D. 1990. "Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records." American Economic Review, 80(3): 313–36.

Angrist, Joshua D., and Alan B. Krueger. 1999. "Empirical Strategies in Labor Economics." In Handbook of Labor Economics, Volume 3A, ed. Orley Ashenfelter and David Card, 1277–1366. Amsterdam; New York and Oxford: Elsevier Science, North-Holland.

Angrist, Joshua D., and Victor Lavy. 1999. "Using Maimonides' Rule to Estimate the Effect of Class Size on Scholastic Achievement." Quarterly Journal of Economics, 114(2): 533–75.

Asadullah, M. Niaz. 2005. "The Effect of Class Size on Student Achievement: Evidence from Bangladesh." Applied Economics Letters, 12(4): 217–21.

Battistin, Erich, and Enrico Rettore. 2002. "Testing for Programme Effects in a Regression Discontinuity Design with Imperfect Compliance." Journal of the Royal Statistical Society: Series A (Statistics in Society), 165(1): 39–57.

Battistin, Erich, and Enrico Rettore. 2008. "Ineligibles and Eligible Non-participants as a Double Comparison Group in Regression-Discontinuity Designs." Journal of Econometrics, 142(2): 715–30.

Baum-Snow, Nathaniel, and Justin Marion. 2009. "The Effects of Low Income Housing Tax Credit Developments on Neighborhoods." Journal of Public Economics, 93(5–6): 654–66.

Bayer, Patrick, Fernando Ferreira, and Robert McMillan. 2007. "A Unified Framework for Measuring Preferences for Schools and Neighborhoods." Journal of Political Economy, 115(4): 588–638.

Behaghel, Luc, Bruno Crépon, and Béatrice Sédillot. 2008. "The Perverse Effects of Partial Employment Protection Reform: The Case of French Older Workers." Journal of Public Economics, 92(3–4): 696–721.

Berk, Richard A., and Jan de Leeuw. 1999. "An Evaluation of California's Inmate Classification System Using a Generalized Regression Discontinuity Design." Journal of the American Statistical Association, 94(448): 1045–52.

Berk, Richard A., and David Rauma. 1983. "Capitalizing on Nonrandom Assignment to Treatments: A Regression-Discontinuity Evaluation of a Crime-Control Program." Journal of the American Statistical Association, 78(381): 21–27.

Black, Dan A., Jose Galdo, and Jeffrey A. Smith. 2007a. "Evaluating the Regression Discontinuity Design Using Experimental Data." Unpublished.

Black, Dan A., Jose Galdo, and Jeffrey A. Smith. 2007b. "Evaluating the Worker Profiling and Reemployment Services System Using a Regression Discontinuity Approach." American Economic Review, 97(2): 104–07.

Black, Dan A., Jeffrey A. Smith, Mark C. Berger, and Brett J. Noel. 2003. "Is the Threat of Reemployment Services More Effective Than the Services Themselves? Evidence from Random Assignment in the UI System." American Economic Review, 93(4): 1313–27.

Black, Sandra E. 1999. "Do Better Schools Matter? Parental Valuation of Elementary Education." Quarterly Journal of Economics, 114(2): 577–99.

Blundell, Richard, and Alan Duncan. 1998. "Kernel Regression in Empirical Microeconomics." Journal of Human Resources, 33(1): 62–87.

Buddelmeyer, Hielke, and Emmanuel Skoufias. 2004. "An Evaluation of the Performance of Regression Discontinuity Design on PROGRESA." World Bank Policy Research Working Paper 3386.

Buettner, Thiess. 2006. "The Incentive Effect of Fiscal Equalization Transfers on Tax Policy." Journal of Public Economics, 90(3): 477–97.

Campbell, Donald T., and Julian C. Stanley. 1963. "Experimental and Quasi-experimental Designs for Research on Teaching." In Handbook of Research on Teaching, ed. N. L. Gage, 171–246. Chicago: Rand McNally.

Canton, Erik, and Andreas Blom. 2004. "Can Student Loans Improve Accessibility to Higher Education and Student Performance? An Impact Study of the Case of SOFES, Mexico." World Bank Policy Research Working Paper 3425.

Card, David, Raj Chetty, and Andrea Weber. 2007. "Cash-on-Hand and Competing Models of Intertemporal Behavior: New Evidence from the Labor Market." Quarterly Journal of Economics, 122(4): 1511–60.

Card, David, Carlos Dobkin, and Nicole Maestas. 2008. "The Impact of Nearly Universal Insurance Coverage on Health Care Utilization: Evidence from Medicare." American Economic Review, 98(5): 2242–58.

Card, David, Carlos Dobkin, and Nicole Maestas. 2009. "Does Medicare Save Lives?" Quarterly Journal of Economics, 124(2): 597–636.

Card, David, Alexandre Mas, and Jesse Rothstein. 2008. "Tipping and the Dynamics of Segregation." Quarterly Journal of Economics, 123(1): 177–218.

Card, David, and Lara D. Shore-Sheppard. 2004. "Using Discontinuous Eligibility Rules to Identify the Effects of the Federal Medicaid Expansions on Low-Income Children." Review of Economics and Statistics, 86(3): 752–66.

Carpenter, Christopher, and Carlos Dobkin. 2009. "The Effect of Alcohol Consumption on Mortality: Regression Discontinuity Evidence from the Minimum Drinking Age." American Economic Journal: Applied Economics, 1(1): 164–82.

Cascio, Elizabeth U., and Ethan G. Lewis. 2006. "Schooling and the Armed Forces Qualifying Test: Evidence from School-Entry Laws." Journal of Human Resources, 41(2): 294–318.

Chay, Kenneth Y., and Michael Greenstone. 2003. "Air Quality, Infant Mortality, and the Clean Air Act of 1970." National Bureau of Economic Research Working Paper 10053.

Chay, Kenneth Y., and Michael Greenstone. 2005. "Does Air Quality Matter? Evidence from the Housing Market." Journal of Political Economy, 113(2): 376–424.

Chay, Kenneth Y., Patrick J. McEwan, and Miguel Urquiola. 2005. "The Central Role of Noise in Evaluating Interventions That Use Test Scores to Rank Schools." American Economic Review, 95(4): 1237–58.

Chen, M. Keith, and Jesse M. Shapiro. 2004. "Does Prison Harden Inmates? A Discontinuity-Based Approach." Yale University Cowles Foundation Discussion Paper 1450.

Chen, Susan, and Wilbert van der Klaauw. 2008. "The Work Disincentive Effects of the Disability Insurance Program in the 1990s." Journal of Econometrics, 142(2): 757–84.

Chiang, Hanley. 2009. "How Accountability Pressure on Failing Schools Affects Student Achievement." Journal of Public Economics, 93(9–10): 1045–57.

Clark, Damon. 2009. "The Performance and Competitive Effects of School Autonomy." Journal of Political Economy, 117(4): 745–83.

Cole, Shawn. 2009. "Financial Development, Bank Ownership, and Growth: Or, Does Quantity Imply Quality?" Review of Economics and Statistics, 91(1): 33–51.

Cook, Thomas D. 2008. "'Waiting for Life to Arrive': A History of the Regression-Discontinuity Design in Psychology, Statistics and Economics." Journal of Econometrics, 142(2): 636–54.

Davis, Lucas W. 2008. "The Effect of Driving Restrictions on Air Quality in Mexico City." Journal of Political Economy, 116(1): 38–81.

De Giorgi, Giacomo. 2005. "Long-Term Effects of a Mandatory Multistage Program: The New Deal for Young People in the UK." Institute for Fiscal Studies Working Paper 05/08.

DesJardins, Stephen L., and Brian P. McCall. 2008. "The Impact of the Gates Millennium Scholars Program on the Retention, College Finance- and Work-Related Choices, and Future Educational Aspirations of Low-Income Minority Students." Unpublished.

DiNardo, John, and David S. Lee. 2004. "Economic Impacts of New Unionization on Private Sector Employers: 1984–2001." Quarterly Journal of Economics, 119(4): 1383–1441.

Ding, Weili, and Steven F. Lehrer. 2007. "Do Peers Affect Student Achievement in China's Secondary Schools?" Review of Economics and Statistics, 89(2): 300–312.

Dobkin, Carlos, and Fernando Ferreira. 2009. "Do School Entry Laws Affect Educational Attainment and Labor Market Outcomes?" National Bureau of Economic Research Working Paper 14945.

Edmonds, Eric V. 2004. "Does Illiquidity Alter Child Labor and Schooling Decisions? Evidence from Household Responses to Anticipated Cash Transfers in South Africa." National Bureau of Economic Research Working Paper 10265.

Edmonds, Eric V., Kristin Mammen, and Douglas L. Miller. 2005. "Rearranging the Family? Income Support and Elderly Living Arrangements in a Low-Income Country." Journal of Human Resources, 40(1): 186–207.

Fan, Jianqing, and Irene Gijbels. 1996. Local Polynomial Modelling and Its Applications. London; New York and Melbourne: Chapman and Hall.

Ferreira, Fernando. Forthcoming. "You Can Take It With You: Proposition 13 Tax Benefits, Residential Mobility, and Willingness to Pay for Housing Amenities." Journal of Public Economics.

Ferreira, Fernando, and Joseph Gyourko. 2009. "Do Political Parties Matter? Evidence from U.S. Cities." Quarterly Journal of Economics, 124(1): 399–422.

Figlio, David N., and Lawrence W. Kenny. 2009. "Public Sector Performance Measurement and Stakeholder Support." Journal of Public Economics, 93(9–10): 1069–77.

Goldberger, Arthur S. 1972a. "Selection Bias in Evaluating Treatment Effects: Some Formal Illustrations." Unpublished.

Goldberger, Arthur S. 1972b. "Selection Bias in Evaluating Treatment Effects: The Case of Interaction." Unpublished.

Goodman, Joshua. 2008. "Who Merits Financial Aid?: Massachusetts' Adams Scholarship." Journal of Public Economics, 92(10–11): 2121–31.

Goolsbee, Austan, and Jonathan Guryan. 2006. "The Impact of Internet Subsidies in Public Schools." Review of Economics and Statistics, 88(2): 336–47.

Greenstone, Michael, and Justin Gallagher. 2008. "Does Hazardous Waste Matter? Evidence from the Housing Market and the Superfund Program." Quarterly Journal of Economics, 123(3): 951–1003.

Guryan, Jonathan. 2001. "Does Money Matter? Regression-Discontinuity Estimates from Education Finance Reform in Massachusetts." National Bureau of Economic Research Working Paper 8269.

Hahn, Jinyong. 1998. "On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects." Econometrica, 66(2): 315–31.

Hahn, Jinyong, Petra Todd, and Wilbert van der Klaauw. 1999. "Evaluating the Effect of an Antidiscrimination Law Using a Regression-Discontinuity Design." National Bureau of Economic Research Working Paper 7131.

Hahn, Jinyong, Petra Todd, and Wilbert van der Klaauw. 2001. "Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design." Econometrica, 69(1): 201–09.

Heckman, James J. 1978. "Dummy Endogenous Variables in a Simultaneous Equation System." Econometrica, 46(4): 931–59.

Heckman, James J., Robert J. Lalonde, and Jeffrey A. Smith. 1999. "The Economics and Econometrics of Active Labor Market Programs." In Handbook of Labor Economics, Volume 3A, ed. Orley Ashenfelter and David Card, 1865–2097. Amsterdam; New York and Oxford: Elsevier Science, North-Holland.

Heckman, James J., and Edward Vytlacil. 2005. "Structural Equations, Treatment Effects, and Econometric

Congress 2000 Contributed Paper 1373.

Pettersson-Lidbom, Per. 2008a. "Does the Size of the Legislature Affect the Size of Government? Evidence from Two Natural Experiments." Unpublished.

Pettersson-Lidbom, Per. 2008b. "Do Parties Matter for Economic Outcomes? A Regression-Discontinuity Approach." Journal of the European Economic Association, 6(5): 1037–56.

Pitt, Mark M., and Shahidur R. Khandker. 1998. "The Impact of Group-Based Credit Programs on Poor Households in Bangladesh: Does the Gender of Participants Matter?" Journal of Political Economy, 106(5): 958–96.

Pitt, Mark M., Shahidur R. Khandker, Signe-Mary McKernan, and M. Abdul Latif. 1999. "Credit Programs for the Poor and Reproductive Behavior in Low-Income Countries: Are the Reported Causal Relationships the Result of Heterogeneity Bias?" Demography, 36(1): 1–21.

Porter, Jack. 2003. "Estimation in the Regression Discontinuity Model." Unpublished.

Powell, James L. 1994. "Estimation of Semiparametric Models." In Handbook of Econometrics, Volume 4, ed. Robert F. Engle and Daniel L. McFadden, 2443–2521. Amsterdam; London and New York: Elsevier, North-Holland.

Shadish, William R., Thomas D. Cook, and Donald T. Campbell. 2002. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston: Houghton Mifflin.

Silverman, Bernard W. 1986. Density Estimation for Statistics and Data Analysis. London and New York: Chapman and Hall.

Snyder, Stephen E., and William N. Evans. 2006. "The Effect of Income on Mortality: Evidence from the Social Security Notch." Review of Economics and Statistics, 88(3): 482–95.

Thistlethwaite, Donald L., and Donald T. Campbell. 1960. "Regression-Discontinuity Analysis: An Alternative to the Ex Post Facto Experiment." Journal of Educational Psychology, 51(6): 309–17.

Trochim, William M. K. 1984. Research Design for Program Evaluation: The Regression-Discontinuity Approach. Beverly Hills: Sage Publications.

Urquiola, Miguel. 2006. "Identifying Class Size Effects in Developing Countries: Evidence from Rural Bolivia." Review of Economics and Statistics, 88(1): 171–77.

Urquiola, Miguel, and Eric A. Verhoogen. 2009. "Class-Size Caps, Sorting, and the Regression-Discontinuity Design." American Economic Review, 99(1): 179–215.

van der Klaauw, Wilbert. 1997. "A Regression-Discontinuity Evaluation of the Effect of Financial Aid Offers on College Enrollment." New York University C.V. Starr Center for Applied Economics Working Paper 10.

van der Klaauw, Wilbert. 2002. "Estimating the Effect of Financial Aid Offers on College Enrollment: A Regression-Discontinuity Approach." International Economic Review, 43(4): 1249–87.

van der Klaauw, Wilbert. 2008a. "Breaking the Link between Poverty and Low Student Achievement: An Evaluation of Title I." Journal of Econometrics, 142(2): 731–56.

van der Klaauw, Wilbert. 2008b. "Regression-Discontinuity Analysis: A Survey of Recent Developments in Economics." Labour, 22(2): 219–45.

White, Halbert. 1980. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." Econometrica, 48(4): 817–38.