Panel Data Econometrics
Manuel Arellano
Oxford University Press
© Manuel Arellano 2003. First published 2003.
ISBN 0-19-924528-2; ISBN 0-19-924529-0 (Pbk.)

To Olympia, Manuel, and Jaime

2 Unobserved Heterogeneity

The econometric interest in panel data, especially in microeconometric applications, has been the result of at least two different types of motivation:

• First, the desire to exploit panel data for controlling unobserved time-invariant heterogeneity in cross-sectional models.
• Second, the use of panel data as a way of disentangling components of variance and estimating transition probabilities among states, and more generally to study the dynamics of cross-sectional populations.

These motivations can be loosely associated with two strands of the panel data literature labelled fixed effects and random effects models. We next take these two motivations and model types in turn, first in the context of static models in Part I, and then in the context of dynamic models in Parts II and III.

2.1 Overview

A sizeable part of econometric activity deals with empirical description and forecasting, but another part aims at quantifying structural or causal relationships. Structural relations are needed for policy evaluation and often for testing theories.

The regression model is an essential statistical tool for both descriptive and structural econometrics. However, regression lines from economic data often cannot be given a causal interpretation. The reason is that in the relation of interest between observables and unobservables we might expect explanatory variables to be correlated with unobservables, whereas in a regression model regressors and unobservables are uncorrelated by construction.

There are several instances in which we would expect correlation between observables and unobservables. One is the classical supply-and-demand simultaneity problem due to time aggregation and market equilibrium. That is, a regression of quantity on price could not be interpreted as a demand equation because we would expect an unobservable exogenous shift in demand to affect not only purchases but also prices, through the supply-side effect of quantities on prices.
Another is measurement error: if the explanatory variable we observe is not the variable to which agents respond but an error-ridden measure of it, the unobservable term in the equation of interest will contain the measurement error, which will be correlated with the regressor.

Finally, there may be correlation due to unobserved heterogeneity. This has been a pervasive problem in cross-sectional regression analysis. If characteristics that have a direct effect on both left- and right-hand side variables are omitted, explanatory variables will be correlated with errors and regression coefficients will be biased measures of the structural effects. Thus researchers have often been confronted with massive cross-sectional data sets from which precise correlations can be determined but that, nevertheless, had no information about parameters of policy interest.

The traditional response of econometrics to these problems has been multiple regression and instrumental variable models. Regrettably, although the statistical theory of the problem is well understood, we often lack data on the conditioning variables or the instruments to achieve identification of structural parameters in that way.

A major motivation for using panel data has been the ability to control for possibly correlated, time-invariant heterogeneity without observing it. Suppose a cross-sectional regression of the form

$$y_{i1} = \beta x_{i1} + \eta_i + v_{i1} \tag{2.1}$$

such that $E(v_{i1} \mid x_{i1}, \eta_i) = 0$. If $\eta_i$ is observed, $\beta$ can be identified from a multiple regression of $y$ on $x$ and $\eta$.
If $\eta_i$ is not observed, identification of $\beta$ requires either lack of correlation between $x_{i1}$ and $\eta_i$, in which case

$$\beta = \frac{\mathrm{Cov}(x_{i1}, y_{i1})}{\mathrm{Var}(x_{i1})},$$

or the availability of an external instrument $z_i$ that is uncorrelated with both $v_{i1}$ and $\eta_i$ but correlated with $x_{i1}$, in which case

$$\beta = \frac{\mathrm{Cov}(z_i, y_{i1})}{\mathrm{Cov}(z_i, x_{i1})}.$$

Suppose that neither of these two options is available, but we observe $y_{i2}$ and $x_{i2}$ for the same individuals in a second period (so that $T = 2$) such that

$$y_{i2} = \beta x_{i2} + \eta_i + v_{i2} \tag{2.2}$$

and both $v_{i1}$ and $v_{i2}$ satisfy $E(v_{it} \mid x_{i1}, x_{i2}, \eta_i) = 0$. Then $\beta$ is identified in the regression in first-differences even if $\eta_i$ is not observed. We have

$$y_{i2} - y_{i1} = \beta (x_{i2} - x_{i1}) + (v_{i2} - v_{i1}) \tag{2.3}$$

and

$$\beta = \frac{\mathrm{Cov}(\Delta x_{i2}, \Delta y_{i2})}{\mathrm{Var}(\Delta x_{i2})}. \tag{2.4}$$

A Classical Example: Agricultural Cobb-Douglas Production Function (Mundlak, 1961; Hoch, 1962; Chamberlain, 1984). Suppose equation (2.1) represents a production function for an agricultural product. The index $i$ denotes farms and $t$ time periods (seasons or years). Also:

$y_{it}$ = Log output.
$x_{it}$ = Log of a variable input (labour).
$\eta_i$ = An input that remains constant over time (soil quality).
$v_{it}$ = A stochastic input which is outside the farmer's control (rainfall).

Suppose $\eta_i$ is known by the farmer but not by the econometrician. If farmers maximize expected profits there will be a cross-sectional correlation between labour and soil quality. Therefore, the population coefficient in a simple regression of $y_{i1}$ on $x_{i1}$ will differ from $\beta$. If $\eta_i$ were observed by the econometrician, the coefficient on $x$ in a multiple cross-sectional regression of $y_{i1}$ on $x_{i1}$ and $\eta_i$ would coincide with $\beta$. Now suppose that data on $y_{i2}$ and $x_{i2}$ for a second period become available. Moreover, suppose that rainfall in the second period is unpredictable from rainfall in the first period (permanent differences in rainfall can be made part of $\eta_i$), so that rainfall is independent of a farm's labour demand in the two periods. Thus, even in the absence of data on $\eta_i$, the availability of panel data affords the identification of the technological parameter $\beta$.
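The identification argument above can be checked numerically. The following is a minimal simulation sketch, not part of the original text: the data-generating values (a true $\beta$ of 0.5, a loading of 0.8 of the regressor on the effect, standard normal shocks) and the helper names `simulate_two_period_panel` and `slope` are illustrative assumptions. It compares the period-1 cross-sectional slope with the first-difference slope in (2.4).

```python
import numpy as np

def simulate_two_period_panel(n=20000, beta=0.5, seed=0):
    """Hypothetical T = 2 panel in which x is correlated with the effect eta."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=n)                        # time-invariant effect (e.g. soil quality)
    x = 0.8 * eta[:, None] + rng.normal(size=(n, 2))  # regressor loads on eta in both periods
    v = rng.normal(size=(n, 2))                     # shocks independent of x and eta (e.g. rainfall)
    y = beta * x + eta[:, None] + v
    return y, x

def slope(x, y):
    """Simple-regression slope Cov(x, y) / Var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

y, x = simulate_two_period_panel()
b_cross = slope(x[:, 0], y[:, 0])                     # period-1 cross-section
b_fd = slope(x[:, 1] - x[:, 0], y[:, 1] - y[:, 0])    # first differences, as in (2.4)
print(f"cross-section: {b_cross:.3f}   first differences: {b_fd:.3f}")
```

Under these assumed values the cross-sectional slope converges to $\beta + \mathrm{Cov}(x, \eta)/\mathrm{Var}(x)$ rather than to $\beta$, while the first-difference slope is centred on $\beta$ because differencing removes $\eta_i$.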
A Firm Money Demand Example (Mulligan, 1997; Bover and Watson, 2000). Suppose firms minimize cost for a given output $s_{it}$, subject to a production function $s_{it} = F(x_{it})$ and to some transaction services $s_{it} = (a_i m_{it}^{1-b} \ell_{it}^{b})^{1/c}$, where $x$ denotes a composite input, $m$ is demand for cash, $\ell$ is labour employed in transactions, and $a$ represents the firm's financial sophistication. There will be economies of scale in the demand for money by firms if $c \neq 1$. The resulting money demand equation is

$$\log m_{it} = k + c \log s_{it} - b \log(R_{it}/w_{it}) - \log a_i + v_{it}. \tag{2.5}$$

Here $k$ is a constant, $R$ is the opportunity cost of holding money, $w$ is the wage of workers involved in transaction services, and $v$ is a measurement error in the demand for cash. In general $a$ will be correlated with output through the cash-in-advance constraint. Thus, the coefficient of output (or sales) in a regression of $\log m$ on $\log s$ and $\log(R/w)$ will not coincide with the scale parameter of interest. However, if firm panel data are available and $a$ varies across firms but not over time in the period of analysis, economies of scale can be identified from the regression in changes.

An Example in which Panel Data Does Not Work: Returns to Education. "Structural" returns to education are important in the assessment of educational policies. It has been widely believed in the literature that cross-sectional regression estimates of the returns could not be trusted because of omitted "ability" that, if correlated with educational attainment, would bias returns (cf. Griliches, 1977). In the earlier notation:

$y_{it}$ = Log wage (or earnings).
$x_{it}$ = Years of full-time education.
$\eta_i$ = Unobserved ability.
$\beta$ = Returns to education.
The problem in this example is that $x_{it}$ typically lacks time series variation, so a regression in first-differences will not be able to identify $\beta$ in this case. In this context data on siblings and cross-sectional instrumental variables have proved more useful than panel data for identifying returns to schooling free of ability bias.

This example illustrates a more general problem. Information about $\beta$ in the regression in first-differences will depend on the ratio of the variances of $\Delta v$ and $\Delta x$. In the earnings-education equation we are in the extreme situation where $\mathrm{Var}(\Delta x) = 0$, but if $\mathrm{Var}(\Delta x)$ is small, regressions in changes may contain very little information about parameters of interest even if the cross-sectional sample size is very large.

¹ Writing the firm's cost as $C_{it} = [\ldots] + R_{it} m_{it} + w_{it} \ell_{it}$, with $\ell_{it}$ solved from the transaction technology, (2.5) is obtained from the first-order condition $\partial C_{it} / \partial m_{it} = 0$ [...]

Econometric Measurement versus Forecasting Problems. The previous examples suggest that the ability to control for unobserved heterogeneity is mainly an advantage in the context of problems of econometric measurement as opposed to problems of forecasting. This is an important distinction. By including individual effects we manage to identify certain coefficients at the expense of leaving part of the regression unmodelled (the part that only has cross-sectional variation). Note that the part of the variance of $y$ accounted for by $\beta x$ could be very small relative to $\eta$ and $v$ (5, 80, and 15 per cent would not be an unrealistic situation in, for example, intertemporal labour supply models of the type considered by Heckman and MaCurdy (1980)).²
In a case like this it is easy to obtain a higher $R^2$ by including lagged dependent variables or proxies for the fixed effects. Regressions of this type would be useful in cross-sectional forecasting exercises for the population from which the data come (as in credit scoring or in the estimation of probabilities of tax fraud), but they may be of no use if the objective is to measure the effect of $x$ on $y$ holding constant all time-invariant heterogeneity.

An equation with individual specific intercepts may still be useful when the interest is in forecasts for the same individuals in different time periods, but not when we are interested in forecasts for individuals other than those included in the sample.

Non-Exogeneity and Random Coefficients. The identification of causal effects through regression coefficients in differences or deviations depends on the lack of correlation between $x$ and $v$ at all lags and leads (strict exogeneity). If $x$ is measured with error (Chapter 4) or is correlated with lagged errors (Chapter 8), regressions in deviations may actually make things worse.

Another difficulty arises when the effect of $x$ on $y$ is itself heterogeneous. In such a case regression coefficients in differences cannot in general be interpreted as average causal effects. Specifically, suppose that $\beta$ is allowed to vary cross-sectionally in (2.1) and (2.2) so that

$$y_{it} = \beta_i x_{it} + \eta_i + v_{it} \quad (t = 1, 2), \qquad E(v_{it} \mid x_{i1}, x_{i2}, \eta_i, \beta_i) = 0. \tag{2.6}$$

In these circumstances, the regression coefficient (2.4) differs from $E(\beta_i)$ unless $\beta_i$ is mean independent of $\Delta x_{i2}$. The availability of panel data still affords identification of average causal effects in random coefficients models as long as $x$ is strictly exogenous. However, if $x$ is not exogenous and $\beta_i$ is heterogeneous we run into serious identification problems in short panels.⁴

2.2 Fixed Effects Models

2.2.1 Assumptions

Our basic assumptions for what we call the "static fixed effects model" are as follows.
We assume that $\{(y_{i1}, \ldots, y_{iT}, x_{i1}, \ldots, x_{iT}, \eta_i),\ i = 1, \ldots, N\}$ is a random sample and that

$$y_{it} = x_{it}'\beta + \eta_i + v_{it} \tag{2.7}$$

together with

Assumption A1: $E(v_{it} \mid x_i, \eta_i) = 0 \quad (t = 1, \ldots, T)$,

where $v_i = (v_{i1}, \ldots, v_{iT})'$ and $x_i = (x_{i1}', \ldots, x_{iT}')'$. We observe $y_{it}$ and the $k \times 1$ vector of explanatory variables $x_{it}$ but not $\eta_i$, which is therefore an unobservable time-invariant regressor.

Similarly, we shall refer to "classical" errors when the additional auxiliary assumption holds:

Assumption A2: $\mathrm{Var}(v_i \mid x_i, \eta_i) = \sigma^2 I_T$.

Under Assumption A2 the errors are conditionally homoskedastic and not serially correlated.

Under Assumption A1 we have

$$E(y_i \mid x_i, \eta_i) = X_i \beta + \eta_i \iota \tag{2.8}$$

where $y_i = (y_{i1}, \ldots, y_{iT})'$, $\iota$ is a $T \times 1$ vector of ones, and $X_i = (x_{i1}, \ldots, x_{iT})'$ is a $T \times k$ matrix. The implication of (2.8) for the expected value of $y_i$ given $x_i$ is

$$E(y_i \mid x_i) = X_i \beta + E(\eta_i \mid x_i)\,\iota. \tag{2.9}$$

Moreover, under Assumption A2

$$\mathrm{Var}(y_i \mid x_i, \eta_i) = \sigma^2 I_T \tag{2.10}$$

which implies

$$\mathrm{Var}(y_i \mid x_i) = \sigma^2 I_T + \mathrm{Var}(\eta_i \mid x_i)\,\iota\iota'. \tag{2.11}$$

² Since $x$ and $\eta$ are potentially correlated, the variance of $y$ need not coincide with the sum of the variances of $\beta x$, $\eta$, and $v$.

³ See Griliches and Mairesse (1998) for a cautionary tale on fixed effects solutions, and an assessment of empirical production functions based on firm panel data.

⁴ Chamberlain (1992) considers the estimation of random coefficients models with strictly exogenous variables. The problem of identification from short panels with non-exogenous $x$ is discussed in Chamberlain (1993), and Arellano and Honoré (2001). Estimation from long heterogeneous panels is considered in Pesaran and Smith (1995).

A weaker set of assumptions is:

Assumption A1′: $E(v_{it} \mid x_i) = 0 \quad (t = 1, \ldots, T)$.
Assumption A2′: $\mathrm{Var}(v_i \mid x_i) = \sigma^2 I_T$.

Although we shall often rely on the weaker assumption $E(v_{it} \mid x_i) = 0$ for convenience (since many results of interest can be obtained with it), in many applied instances it will be difficult to imagine how $E(v_{it} \mid x_i) = 0$ would hold without $E(v_{it} \mid x_i, \eta_i) = 0$ also holding.⁵

Another possibility is to replace mean independence assumptions by lack of correlation assumptions, but similar remarks apply: in practice it may be difficult to imagine the linear projection conditions $E^*(v_{it} \mid x_i) = 0$ or $E^*(v_{it} \mid x_i, \eta_i) = 0$ holding without the stronger mean independence conditions also holding. Nevertheless, lack of correlation may still be a convenient way of providing a focus for the presentation of essential identification results.

Notice that under Assumptions A1′ and A2′ we have the same expression for $E(y_i \mid x_i)$ as in (2.9) but a different one for $\mathrm{Var}(y_i \mid x_i)$, since $\eta_i$ and $v_i$ may be conditionally correlated given $x_i$:

$$\mathrm{Var}(y_i \mid x_i) = \sigma^2 I_T + \mathrm{Var}(\eta_i \mid x_i)\,\iota\iota' + \mathrm{Cov}(v_i, \eta_i \mid x_i)\,\iota' + \iota\,\mathrm{Cov}(\eta_i, v_i' \mid x_i). \tag{2.12}$$

A1 (or A1′) is the fundamental assumption in this context. It implies that the error $v$ at any period is uncorrelated with past, present, and future values of $x$ (or, conversely, that $x$ at any period is uncorrelated with past, present, and future values of $v$). A1 is, therefore, an assumption of strict exogeneity that rules out, for example, the possibility that current values of $x$ are influenced by past errors.

In the agricultural production function example, $x$ (labour) will be uncorrelated with $v$ (rainfall) at all lags and leads provided the latter is unpredictable from past rainfall (given permanent differences in rainfall that would be subsumed in the farm effects, and possibly seasonal or other deterministic components).
If rainfall in period $t$ is predictable from rainfall in period $t-1$ (which is known to the farmer in $t$), labour demand in period $t$ will in general depend on $v_{i(t-1)}$ (Chamberlain, 1984, 1258-1259). Conditional models without strictly exogenous explanatory variables will be considered in Part II.

Assumption A2 is, on the other hand, an auxiliary assumption under which classical least-squares results are optimal. However, lack of compliance with A2 is often to be expected in applications. Here, we first present results under A2, and subsequently discuss estimation and inference with heteroskedastic and serially correlated errors.

⁵ Note that formally an even weaker assumption would be $E(v_{it} - v_{i(t-1)} \mid x_i) = 0$ $(t = 2, \ldots, T)$, since this would be equivalent to saying that $E(v_{it} \mid x_i)$ could be an arbitrary function of $x_i$ which does not vary with $t$. However, if $E(v_{it} \mid x_i) = \varphi(x_i)$ for any $t$ we could always redefine $\eta_i$ and $v_{it}$ as $\eta_i^\dagger = \eta_i + \varphi(x_i)$ and $v_{it}^\dagger = v_{it} - \varphi(x_i)$, respectively, so that $\eta_i^\dagger$ would still be fixed over time and $E(v_{it}^\dagger \mid x_i) = 0$.

As for the nature of the effects, strictly speaking, the term fixed effects would refer to a sampling process in which the same units are (possibly) repeatedly sampled for a given period holding constant the effects. In such a context one often has in mind a distribution of individual effects chosen by the researcher. Here we imagine a sample randomly drawn from a multivariate population of observable data and unobservable effects. This notion may or may not correspond to the physical nature of data collection. It would be so, for example, in the case of some household surveys, but not with data on all quoted firms or OECD countries. In those cases, the multivariate population from which the data are supposed to come is a hypothetical one.
Moreover, we are interested in models which only specify features of the conditional distribution $f(y_i \mid x_i, \eta_i)$. Therefore, we are not concerned with whether the distribution that generates the data on $x_i$ and $\eta_i$, $f(x_i, \eta_i)$ say, is representative of some cross-sectional population or of the researcher's wishes. We just regard $(y_i, x_i, \eta_i)$ as a random sample from the (perhaps artificial) multivariate population with joint distribution $f(y_i, x_i, \eta_i) = f(y_i \mid x_i, \eta_i)\, f(x_i, \eta_i)$ and focus on the conditional distribution of $y_i$. So, in common with much of the econometric literature, we use the term fixed effects to refer to a situation in which $f(\eta_i \mid x_i)$ is left unrestricted.

2.2.2 Within-Group Estimation

With $T = 2$ there is just one equation after differencing. Under Assumptions A1 and A2, the equation in first-differences is a classical regression model and hence ordinary least-squares (OLS) in first-differences is the optimal estimator of $\beta$ in the standard least-squares sense. To see the irrelevance of the equations in levels in this model, note that a non-singular transformation of the original two-equation system is

$$E(y_{i1} \mid x_i) = x_{i1}'\beta + E(\eta_i \mid x_i)$$
$$E(\Delta y_{i2} \mid x_i) = \Delta x_{i2}'\beta.$$

Since $E(\eta_i \mid x_i)$ is an unknown unrestricted function of $x_i$, knowledge of the function $E(y_{i1} \mid x_i)$ is uninformative about $\beta$ in the first equation. Thus no information about $\beta$ is lost by only considering the equation in first-differences.

If $T \geq 3$ we have a system of $T - 1$ equations in first-differences:

$$\Delta y_{i2} = \Delta x_{i2}'\beta + \Delta v_{i2}$$
$$\vdots$$
$$\Delta y_{iT} = \Delta x_{iT}'\beta + \Delta v_{iT},$$

which in compact form can be written as

$$D y_i = D X_i \beta + D v_i, \tag{2.13}$$

where $D$ is the $(T-1) \times T$ matrix first-difference operator

$$D = \begin{pmatrix} -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{pmatrix}. \tag{2.14}$$

Provided each of the errors in first-differences is mean independent of the $x$s for all periods (under Assumption A1 or A1′), $E(D v_i \mid x_i) = 0$, OLS estimates of $\beta$ in this system given by

$$\widehat{\beta}_{OLS} = \left( \sum_{i=1}^{N} (D X_i)' D X_i \right)^{-1} \sum_{i=1}^{N} (D X_i)' D y_i \tag{2.15}$$

will be unbiased and consistent for large $N$.
However, if the $v$s are homoskedastic and non-autocorrelated classical errors (under Assumption A2 or A2′), the errors in first-differences will be correlated for adjacent periods with

$$\mathrm{Var}(D v_i \mid x_i) = \sigma^2 D D'. \tag{2.16}$$

Following standard regression theory, the optimal estimator in this case is given by generalized least-squares (GLS), which takes the form

$$\widehat{\beta}_{WG} = \left( \sum_{i=1}^{N} X_i' D' (D D')^{-1} D X_i \right)^{-1} \sum_{i=1}^{N} X_i' D' (D D')^{-1} D y_i. \tag{2.17}$$

Moreover, note that in this case GLS itself is a feasible estimator since $DD'$ does not depend on unknown coefficients.

The idempotent matrix $D'(DD')^{-1}D$ also takes the form⁶

$$D'(DD')^{-1}D = I_T - \iota\iota'/T \equiv Q, \text{ say.} \tag{2.18}$$

The matrix $Q$ is known as the deviations-from-time-means or within-group operator because it transforms the original time series into deviations from time means: $\widetilde{y}_i = Q y_i$, whose elements are given by

$$\widetilde{y}_{it} = y_{it} - \bar{y}_i$$

with $\bar{y}_i = T^{-1} \sum_{s=1}^{T} y_{is}$. Therefore, $\widehat{\beta}_{WG}$ can also be expressed as OLS in deviations from time means:

$$\widehat{\beta}_{WG} = \left[ \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)' \right]^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar{x}_i)(y_{it} - \bar{y}_i). \tag{2.19}$$

⁶ To verify this, note that the $T \times T$ matrix $H = \left( T^{-1/2}\iota,\; D'(DD')^{-1/2} \right)'$ is such that $HH' = I_T$, so that also $H'H = I_T$, or $\iota\iota'/T + D'(DD')^{-1}D = I_T$.

This is probably the most popular estimator in panel data analysis, and it is known under a variety of names including within-group and covariance estimator.⁷ It is also known as the dummy-variable least-squares or "fixed effects" estimator. This name reflects the fact that, since $\widehat{\beta}_{WG}$ is a least-squares estimator after subtracting individual means from the observations, it is numerically the same as the estimator of $\beta$ that would be obtained in an OLS regression of $y$ on $x$ and a set of $N$ dummy variables, one for each individual in the sample. Thus $\widehat{\beta}_{WG}$ can also be regarded as the result of jointly estimating by OLS $\beta$ and the realizations of the individual effects that appear in the sample.

To see this, consider the system of $T$ equations in levels

$$y_i = X_i \beta + \iota \eta_i + v_i$$

and write it in stacked form as

$$y = X\beta + C\eta + v, \tag{2.20}$$

where $y = (y_1', \ldots, y_N')'$ and $v = (v_1', \ldots, v_N')'$
ate NT 1 vectors, X= (Xf, X$)" Js an Tx k matrix, C is an NT N matrix of individual dummies given by C= [wor, and 1 = (ny,--.ny)’ is the Nx 1 vector of individual specific effects or intercepts. Using the result from partitioned regression, the OLS regression of y on X and C gives the following expression for estimated [X! (Ir — C(C'CY1C') X]* X! (Ive —C(C'CY"C') » (2.21) which clearly coincides with By; since Ivr — C(C'C)"1C' = Iy @Q. “The expressions for the estimated effects ate _ 13 = FD (we -Bwe) = ») (2.22 We do not need to go beyond standard regression theory to obtain the sampling properties of these estimators. The fact that Biv; is the GLS for the system of T ~ 1 equations in first-diferences tells us that it will be unbiased. he name “within-group” originated in Ue coutest of data witha group structure (ike tata om families and amy members}. Banc data can be regarded ne special case of ts ‘ope ada whch “ons fred by te ite sos arsine foe vm Individu 2.2 Fixed Effects Models 7 and optimal in finite samples, It will aso be consistent as N tends to infinity for fixed T and asymptotically normal under usual regulatity conditions. ‘The 4, will also be unbiased estimates of the 1, for samples of any size, but being time series averages, their variance can only tend to zero ac T tends to infinity. ‘Therefore, they cannot be consistent estimates for fixed T and large N. Clearly, the within-group estimates Byyc; will also be consistent as T tends to infinity regardless of whether N is fixed or not. Fixed effects or analysis-of covariance models have along tradition in econo- rmetvies. ‘Their use was frst nggasted in two Cowles Comm ssion papers by Hil dreth (1949, 1950), and early applications were conducted by Mundlak (1961) and Hoch (1962). The motivation in these two studies vas to rely on fixed effects in order to control for simultaneity bias in the estimation of agricultural production functions. 
Orthogonal Deviations. Finally, it is worth finding out the form of the transformation to the original data that results from doing first-differences and further applying a GLS transformation to the differenced data to remove the moving-average serial correlation induced by differencing (Arellano and Bover, 1995). The required transformation is given by the $(T-1) \times T$ matrix

$$A = (DD')^{-1/2} D.$$

If we choose $(DD')^{-1/2}$ to be the upper triangular Cholesky factorization, the operator $A$ can be shown to take the form $A = \mathrm{diag}\left[ (T-1)/T, \ldots, 1/2 \right]^{1/2} A^+$, where

$$A^+ = \begin{pmatrix} 1 & -(T-1)^{-1} & -(T-1)^{-1} & \cdots & -(T-1)^{-1} & -(T-1)^{-1} & -(T-1)^{-1} \\ 0 & 1 & -(T-2)^{-1} & \cdots & -(T-2)^{-1} & -(T-2)^{-1} & -(T-2)^{-1} \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & -1/2 & -1/2 \\ 0 & 0 & 0 & \cdots & 0 & 1 & -1 \end{pmatrix}.$$

Therefore, a $T \times 1$ time series error transformed by $A$, $v_i^* = A v_i$, will consist of $T - 1$ elements of the form

$$v_{it}^* = c_t \left[ v_{it} - \frac{1}{T-t} \left( v_{i(t+1)} + \cdots + v_{iT} \right) \right] \tag{2.23}$$

where $c_t^2 = (T-t)/(T-t+1)$. Clearly, $A'A = Q$ and $AA' = I_{T-1}$. We then refer to this transformation as forward orthogonal deviations. Thus, if $\mathrm{Var}(v_i) = \sigma^2 I_T$ we also have $\mathrm{Var}(v_i^*) = \sigma^2 I_{T-1}$. So orthogonal deviations can be regarded as an alternative transformation which, in common with first-differencing, eliminates individual effects but in contrast does not introduce serial correlation in the transformed errors. Moreover, the within-group estimator can also be regarded as OLS in orthogonal deviations. In terms of the within-group algebra it makes no difference whether forward or backward orthogonal deviations are used. However, forward orthogonal deviations will turn out to be especially useful in the discussion of dynamic models.

2.3 Heteroskedasticity and Serial Correlation

2.3.1 Robust Standard Errors for Within-Group Estimators

If Assumption A1 holds but A2 does not (that is, using orthogonal deviations, if $E(v_i^* \mid x_i) = 0$ but $\mathrm{Var}(v_i^* \mid x_i) \neq \sigma^2 I_{T-1}$), the ordinary regression formulae for estimating the within-group variance will lead to inconsistent standard errors. Such a formula is given by

$$\widehat{\mathrm{Var}}(\widehat{\beta}_{WG}) = \widehat{\sigma}^2 (X^{*\prime} X^*)^{-1} \tag{2.24}$$

where $X^* = (I_N \otimes A) X$, $y^* = (I_N \otimes A) y$, and $\widehat{\sigma}^2$
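is the unbiased residual variance

$$\widehat{\sigma}^2 = \frac{1}{N(T-1) - k} \left( y^* - X^* \widehat{\beta}_{WG} \right)' \left( y^* - X^* \widehat{\beta}_{WG} \right). \tag{2.25}$$

However, since

$$\sqrt{N} \left( \widehat{\beta}_{WG} - \beta \right) = \left( \frac{1}{N} X^{*\prime} X^* \right)^{-1} \frac{1}{\sqrt{N}} \sum_{i=1}^{N} X_i^{*\prime} v_i^*$$

and $E(X_i^{*\prime} v_i^*) = 0$, the right-hand side of the previous expression contains a sample average of zero-mean random variables to which a standard central limit theorem for multivariate iid observations can be applied for fixed $T$ as $N$ tends to infinity:

$$\frac{1}{\sqrt{N}} \sum_{i=1}^{N} X_i^{*\prime} v_i^* \xrightarrow{d} \mathcal{N}\left( 0,\ E(X_i^{*\prime} v_i^* v_i^{*\prime} X_i^*) \right).$$

Therefore, an estimate of the asymptotic variance of the within-group estimator that is robust to heteroskedasticity and serial correlation of arbitrary forms for fixed $T$ and large $N$ can be obtained as

$$\widetilde{\mathrm{Var}}(\widehat{\beta}_{WG}) = (X^{*\prime} X^*)^{-1} \left( \sum_{i=1}^{N} X_i^{*\prime} \widetilde{v}_i^* \widetilde{v}_i^{*\prime} X_i^* \right) (X^{*\prime} X^*)^{-1} \tag{2.26}$$

with $\widetilde{v}_i^* = y_i^* - X_i^* \widehat{\beta}_{WG}$ (Arellano, 1987). For large $T$ and fixed $N$, however, such an estimate of the variance would not be consistent and an alternative estimate will be required. We next discuss this case.

Robust Standard Errors for Large $T$ and Fixed $N$. The previous limiting distribution theory relies on cross-sectional independence. With large $T$ and fixed $N$ we can allow for arbitrary cross-sectional dependence by relying on sufficiently weak time series dependence. Let $\widehat{\delta} = (\widehat{\beta}_{WG}', \widehat{\eta}')'$ denote the within-group estimator of $\beta$ and $\eta$, and let $w_{it} = (x_{it}', d_i')'$, where $d_i$ is an $N \times 1$ vector with one in the $i$-th position and zero elsewhere. Moreover, let

$$V = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \sum_{s=1}^{T} \sum_{i=1}^{N} \sum_{j=1}^{N} E\left( v_{it} v_{js} w_{it} w_{js}' \right) \tag{2.27}$$

or equivalently

$$V = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \sum_{s=1}^{T} E\left( \omega_t \omega_s' \right), \tag{2.28}$$

where

$$\omega_t = \sum_{i=1}^{N} w_{it} v_{it}.$$

A positive semi-definite estimator of $V$ of the type suggested by Newey and West (1987) takes the form

$$\widehat{V} = \widehat{\Gamma}_0 + \sum_{\ell=1}^{m} \omega(\ell, m) \left( \widehat{\Gamma}_\ell + \widehat{\Gamma}_\ell' \right) \tag{2.29}$$

where

$$\omega(\ell, m) = 1 - \frac{\ell}{m+1}, \qquad \widehat{\Gamma}_\ell = \frac{1}{T} \sum_{t=\ell+1}^{T} \widehat{\omega}_t \widehat{\omega}_{t-\ell}', \tag{2.30}$$

$\widehat{\omega}_t = \sum_{i=1}^{N} w_{it} \widehat{v}_{it}$, and $\widehat{v}_{it} = y_{it} - x_{it}'\widehat{\beta}_{WG} - \widehat{\eta}_i$. The effect of the weight function $\omega(\ell, m)$ is to smooth the sample autocovariance function by assigning declining weights to sample autocovariances as $\ell$ increases.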
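The weighting scheme just described can be sketched in a few lines. This is a minimal illustration, not the book's code: the function name `newey_west` is hypothetical, and its input is assumed to be a $T \times p$ array whose $t$-th row plays the role of the period-$t$ sum of regressors times residuals. The demonstration data are white noise generated only to exercise the function.

```python
import numpy as np

def newey_west(omega, m):
    """Bartlett-weighted long-run variance of the rows of `omega` (T x p).

    Returns Gamma_0 + sum over lags 1..m of w(lag, m) * (Gamma_lag + Gamma_lag'),
    with w(lag, m) = 1 - lag / (m + 1).
    """
    T = omega.shape[0]
    V = omega.T @ omega / T                           # lag-0 term
    for lag in range(1, m + 1):
        weight = 1.0 - lag / (m + 1.0)                # weight declines as the lag grows
        gamma = omega[lag:].T @ omega[:-lag] / T      # sample autocovariance at this lag
        V += weight * (gamma + gamma.T)
    return V

# Illustrative input only: 200 periods of three-dimensional white noise.
rng = np.random.default_rng(3)
omega_demo = rng.normal(size=(200, 3))
V0 = newey_west(omega_demo, m=0)     # reduces to the lag-0 term
V4 = newey_west(omega_demo, m=4)
print(np.linalg.eigvalsh(V4))
```

With these weights the estimate is positive semi-definite by construction, which is the practical reason for preferring them to an unweighted sum of sample autocovariances.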
Provided the data are a mixing sequence, and the bound $m$ on the number of autocovariances used to form $\widehat{V}$ is chosen as a suitable function of $T$, $\widehat{V}$ can be shown to be a consistent estimator of $V$ as $T \to \infty$ for fixed $N$ using Newey and West's Theorem.

Therefore, an estimate of the asymptotic variance of the within-group estimator of $\beta$ and $\eta$ that is robust to cross-sectional dependence, heteroskedasticity, and serial correlation of arbitrary forms, for mixing data with large $T$ and fixed $N$, can be obtained as

$$\widehat{\mathrm{Var}}(\widehat{\delta}) = T \left( \sum_{i=1}^{N} \sum_{t=1}^{T} w_{it} w_{it}' \right)^{-1} \widehat{V} \left( \sum_{i=1}^{N} \sum_{t=1}^{T} w_{it} w_{it}' \right)^{-1}. \tag{2.31}$$

2.3.2 Optimal GLS with Heteroskedasticity and Autocorrelation of Unknown Form

Returning to a large $N$ and fixed $T$ environment, if $\mathrm{Var}(v_i^* \mid x_i) = \Omega(x_i)$, where $\Omega(x_i)$ is a symmetric matrix of order $T - 1$ containing unknown functions of $x_i$, the optimal estimator of $\beta$ will be of the form

$$\widehat{\beta}_{UGLS} = \left( \sum_{i=1}^{N} X_i^{*\prime} \Omega^{-1}(x_i) X_i^* \right)^{-1} \sum_{i=1}^{N} X_i^{*\prime} \Omega^{-1}(x_i) y_i^*. \tag{2.32}$$

This estimator is unfeasible because $\Omega(x_i)$ is unknown. A feasible semiparametric GLS estimator would use instead a nonparametric estimator of $E(v_i^* v_i^{*\prime} \mid x_i)$ based on within-group residuals. Under appropriate regularity conditions and a suitable choice of nonparametric estimator, feasible GLS can be shown to attain for large $N$ the same efficiency as $\widehat{\beta}_{UGLS}$ using the results in Robinson (1987).

A special case which gives rise to a straightforward feasible GLS (for small $T$ and large $N$), first discussed by Kiefer (1980), is one in which the conditional variance of $v_i^*$ is a constant but non-scalar matrix: $\mathrm{Var}(v_i^* \mid x_i) = \Omega$. This assumption rules out conditional heteroskedasticity, but allows for autocorrelation and unconditional time series heteroskedasticity in the original equation errors $v_{it}$.
In this case, a feasible GLS estimator takes the form

$$\widehat{\beta}_{FGLS} = \left( \sum_{i=1}^{N} X_i^{*\prime} \widehat{\Omega}^{-1} X_i^* \right)^{-1} \sum_{i=1}^{N} X_i^{*\prime} \widehat{\Omega}^{-1} y_i^* \tag{2.33}$$

where $\widehat{\Omega}$ is given by the orthogonal-deviation WG residual intertemporal covariance matrix

$$\widehat{\Omega} = \frac{1}{N} \sum_{i=1}^{N} \widetilde{v}_i^* \widetilde{v}_i^{*\prime}. \tag{2.34}$$

2.3.3 Improved GMM and Minimum Distance Estimation under Heteroskedasticity and Autocorrelation of Unknown Form

The basic condition $E(v_i^* \mid x_i) = 0$ implies that any function of $x_i$ is uncorrelated with $v_i^*$ and is therefore a potential instrumental variable. Thus, any list of moment conditions of the form

$$E\left[ h_t(x_i)\, v_{it}^* \right] = 0 \quad (t = 1, \ldots, T-1) \tag{2.35}$$

for given functions $h_t(x_i)$ such that $\beta$ is identified from (2.35) could be used to obtain a consistent GMM estimator of $\beta$.

Under $\Omega(x_i) = \sigma^2 I_{T-1}$ the optimal moment conditions are given by

$$E(X_i^{*\prime} v_i^*) = 0 \tag{2.36}$$

in the sense that the variance of the corresponding optimal method-of-moments estimator (which in this case is OLS in orthogonal deviations, or the WG estimator) cannot be reduced by using other functions of $x_i$ as instruments in addition to (2.36). For arbitrary $\Omega(x_i)$ the optimal moment conditions are

$$E\left[ X_i^{*\prime} \Omega^{-1}(x_i) v_i^* \right] = 0 \tag{2.37}$$

which give rise to the optimal GLS estimator $\widehat{\beta}_{UGLS}$ given in (2.32).⁸

The $k$ moment conditions (2.37), however, cannot be directly used because $\Omega(x_i)$ is unknown. The simple improved estimators that we consider in this section are based on the fact that optimal GMM from a wider list of moments than (2.36) can be asymptotically more efficient than WG when $\Omega(x_i) \neq \sigma^2 I_{T-1}$, although not as efficient as optimal GLS. In particular, it seems natural to consider GMM estimators of the system of $T - 1$ equations in orthogonal deviations (or first-differences) using the explanatory variables for all time periods as separate instruments for each equation:

$$E(v_i^* \otimes x_i) = 0. \tag{2.38}$$

Note that the $k$ moments in (2.36) are linear combinations of the much larger set of $kT(T-1)$ moments contained in (2.38). Also, it is convenient to write (2.38) as

$$E(Z_i' v_i^*) \equiv E\left[ Z_i' (y_i^* - X_i^* \beta) \right] = 0 \tag{2.39}$$

where $Z_i = (I_{T-1} \otimes x_i')$.
With this notation, the optimal GMM estimator from (2.38) or (2.39) is

$$\widehat{\beta}_{GMM} = \left[ \left( \sum_i X_i^{*\prime} Z_i \right) A_N \left( \sum_i Z_i' X_i^* \right) \right]^{-1} \left( \sum_i X_i^{*\prime} Z_i \right) A_N \left( \sum_i Z_i' y_i^* \right). \tag{2.40}$$

Optimality requires that the weight matrix $A_N$ is a consistent estimate, up to a multiplicative constant, of the inverse of the variance of the orthogonality conditions $E(Z_i' v_i^* v_i^{*\prime} Z_i)$.

Under Assumption A2, $E(Z_i' v_i^* v_i^{*\prime} Z_i) = \sigma^2 E(Z_i' Z_i)$, and therefore an optimal choice is $A_N = \left( \sum_i Z_i' Z_i \right)^{-1}$. In such a case the resulting estimator is numerically the same as the within-group estimator because the columns in $X_i^*$ are linear combinations of those in $Z_i$.

⁸ Appendix B provides a review of results on optimal instruments in conditional models.

More generally, an optimal choice under heteroskedasticity and serial correlation of unknown form is given by

$$A_N = \left( \sum_{i=1}^{N} Z_i' \widetilde{v}_i^* \widetilde{v}_i^{*\prime} Z_i \right)^{-1}. \tag{2.41}$$

The resulting estimator, $\widehat{\beta}_{OGMM}$ say, will be asymptotically equivalent to WG under Assumption A2 but strictly more efficient for large $N$ when the assumption is violated. It will, nevertheless, be inefficient relative to $\widehat{\beta}_{UGLS}$. The relationship among the large sample variances of the three estimators is therefore

$$\mathrm{Var}(\widehat{\beta}_{UGLS}) \leq \mathrm{Var}(\widehat{\beta}_{OGMM}) \leq \mathrm{Var}(\widehat{\beta}_{WG}),$$

with equality in both cases when Assumption A2 holds.

Minimum Distance. Estimators of the previous type were considered by Chamberlain (1982, 1984), who motivated them as minimum distance (MD) estimators from a linear projection of $y_i$ on [...]

[...]

$$\left[ y_i - g(x_i, \beta) \right]' A' \Omega^{-1}(x_i) A \left[ y_i - g(x_i, \beta) \right], \tag{2.66}$$

which is a nonlinear version of the unfeasible GLS estimator given in (2.32).

2.5.2 Linear Structural Equation

In the standard fixed effects model all the "endogeneity" of $x$ in the relationship between $y$ and $x$ is captured by the correlation between $x$ and $\eta$, since $x$ and the time-varying error $v$ are assumed to be uncorrelated at all lags and leads. We now consider an instrumental variable fixed effects model in which $x$ may also be correlated with $v$, but a vector of instruments $z_i = (z_{i1}', \ldots, z_{iT}')'$ is available (possibly overlapping with some of the components of $x$) which may be correlated with $\eta$ but not with $v$.
The form of the model is therefore
$$y_{it} = x_{it}'\beta + \eta_i + v_{it}$$
together with
$$E(v_i \mid z_i) = 0. \tag{2.67}$$
In common with the standard case, the levels are uninformative about $\beta$ in this model because $E(\eta_i \mid z_i)$ is an unknown unrestricted function of $z_i$. Thus, the basic condition is $E(v_i^* \mid z_i) = 0$ and the unfeasible optimal instrumental-variable estimator is
$$\hat\beta_{UIV} = \left(\sum_i B(z_i)'X_i^*\right)^{-1}\sum_i B(z_i)'y_i^* \tag{2.68}$$
where $B(z_i)$ denotes the $(T-1)\times k$ matrix of optimal instruments given by (cf. Newey, 1993, and Appendix B)
$$B(z_i) = \Omega^{-1}(z_i)E(X_i^* \mid z_i) \tag{2.69}$$
and $\Omega(z_i) = Var(v_i^* \mid z_i)$.

Feasible approaches to optimal estimation can be based on an estimator $\hat B(z_i)$ of $B(z_i)$. Alternatively, in parallel with the development in Section 2.3.3, we may consider GMM estimators based on the orthogonality conditions
$$E(v_i^* \otimes z_i) = 0. \tag{2.70}$$
The form of these estimators is the same as in (2.40) with $Z_i = (I_{T-1}\otimes z_i')$. Using the inverse of $(\sum_i Z_i'Z_i)$ as weight matrix, we obtain the 2SLS-type estimator
$$\hat\beta_{2SLS} = \left[\left(\sum_i X_i^{*\prime}Z_i\right)\left(\sum_i Z_i'Z_i\right)^{-1}\left(\sum_i Z_i'X_i^*\right)\right]^{-1}\left(\sum_i X_i^{*\prime}Z_i\right)\left(\sum_i Z_i'Z_i\right)^{-1}\left(\sum_i Z_i'y_i^*\right). \tag{2.71}$$
If $\Omega(z_i) = \sigma^2I_{T-1}$ and $E(X_i^* \mid z_i)$ is linear in $z_i$, then $\hat\beta_{2SLS}$ and $\hat\beta_{UIV}$ are asymptotically equivalent. On the other hand, if $z_i = x_i$ the statistic $\hat\beta_{2SLS}$ boils down to the ordinary within-group estimator.

2.5.3 Nonlinear Simultaneous Equations

Finally, we consider a system of $g$ nonlinear simultaneous equations with additive effects. The previous models can be regarded as special cases of this one with $g = 1$. We have:
$$\rho_t(w_i,\beta) = \eta_i + v_{it} \tag{2.72}$$
$$E(v_{it} \mid z_i) = 0 \quad (t = 1,\dots,T) \tag{2.73}$$
which can be stacked over time for individual $i$ to give the $gT$-equation system
$$\rho(w_i,\beta) = (\iota\otimes\eta_i) + v_i.$$
In this model $\eta_i$ denotes a $g\times 1$ vector of additive effects, and $v_i$ is a vector of errors for different equations and time periods of dimension $gT$. We consider estimation from the orthogonal-deviation conditional moment restrictions $E(v_i^* \mid z_i) = 0$ where $v_i^* = (A\otimes I_g)v_i$.
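The orthogonal-deviation transformation $A$ that turns $v_i$ into $v_i^*$ is used throughout this section, so a small numerical illustration may help. The sketch below assumes the usual forward orthogonal-deviations operator (each observation minus the mean of the remaining future observations, rescaled by $c_t = \sqrt{(T-t)/(T-t+1)}$); the function name `orthogonal_deviations` is purely illustrative. It checks that the transformation removes any time-invariant component and preserves the within-group sum of squares.

```python
import math

def orthogonal_deviations(y):
    """Forward orthogonal deviations of one individual's series y_1..y_T.

    Element t equals c_t * (y_t - mean of the later observations), with
    c_t^2 = (T - t) / (T - t + 1). The result has length T - 1.
    """
    out = []
    for t in range(len(y) - 1):          # t is 0-based here
        future = y[t + 1:]               # the T - t later observations
        c = math.sqrt(len(future) / (len(future) + 1))
        out.append(c * (y[t] - sum(future) / len(future)))
    return out

y = [1.0, 2.0, 4.0]
y_star = orthogonal_deviations(y)

# The sum of squares in orthogonal deviations equals the within-group
# sum of squares around the individual mean.
ybar = sum(y) / len(y)
within_ss = sum((v - ybar) ** 2 for v in y)
star_ss = sum(v ** 2 for v in y_star)

# Adding a constant (an "individual effect") leaves y_star unchanged.
shifted_star = orthogonal_deviations([v + 10.0 for v in y])
```

Because the transformed errors keep the same variance and remain uncorrelated when the original errors are iid, least squares on the transformed data reproduces the within-group estimator.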
The unfeasible optimal instrumental variable estimator in this case solves the estimating equations
$$\sum_{i=1}^N B(z_i)'(A\otimes I_g)\rho(w_i,\beta) = 0 \tag{2.74}$$
where
$$B(z_i) = \Omega^{-1}(z_i)E\left[(A\otimes I_g)\frac{\partial\rho(w_i,\beta)}{\partial\beta'}\,\Big|\, z_i\right] \tag{2.75}$$
and $\Omega(z_i) = Var(v_i^* \mid z_i)$ is a $g(T-1)\times g(T-1)$ covariance matrix. As before, feasible approaches include the use of estimated optimal instruments, and GMM estimation based on a particular choice of unconditional moment restrictions.

3 Error Components

The analysis in the previous chapter was motivated by the desire of identifying regression coefficients that are free from unobserved cross-sectional heterogeneity bias. Another major motivation for using panel data is the possibility of separating out permanent from transitory components of variation.

3.1 A Variance Decomposition

The starting point of our discussion is a simple variance-components model of the form
$$y_{it} = \mu + \eta_i + v_{it} \tag{3.1}$$
where $\mu$ is an intercept, $\eta_i \sim iid(0,\sigma_\eta^2)$, $v_{it} \sim iid(0,\sigma^2)$, and $\eta_i$ and $v_{it}$ are independent of each other.

The cross-sectional variance of $y_{it}$ in any given period is given by $(\sigma_\eta^2 + \sigma^2)$. This model tells us that a fraction $\sigma_\eta^2/(\sigma_\eta^2+\sigma^2)$ of the total variance corresponds to differences that remain constant over time, while the rest are differences that vary randomly over time and units.

Dividing total variance into two components that are either completely fixed or completely random will often be unrealistic, but this model and its extensions are at the basis of much useful econometric descriptive work. A prominent example is the study of earnings inequality and mobility (cf. Lillard and Willis, 1978).
In the analysis of transitions between earnings classes, the model allows us to distinguish between aggregate or unconditional transition probabilities and individual transition probabilities given certain values of the permanent characteristics represented by $\eta_i$.

Indeed, given $\eta_i$, the $y$'s are independent over time but with different means for different units, so that we have
$$y_i \mid \eta_i \sim id\left((\mu+\eta_i)\iota,\ \sigma^2I_T\right)$$
whereas unconditionally we have
$$y_i \sim id(\mu\iota,\ \Omega) \quad\text{with}\quad \Omega = \sigma^2I_T + \sigma_\eta^2\iota\iota'. \tag{3.2}$$
Thus the unconditional correlation between $y_{it}$ and $y_{is}$ for any two periods $t \neq s$ is given by
$$Corr(y_{it},y_{is}) = \frac{\sigma_\eta^2}{\sigma_\eta^2+\sigma^2} = \frac{\lambda}{1+\lambda} \tag{3.3}$$
with $\lambda = \sigma_\eta^2/\sigma^2$.

In this model individual transition probabilities given $\eta_i$ are independent of the state of origin:
$$\Pr(a \le y_{it} \le b \mid c \le y_{i(t-1)} \le d,\ \eta_i) = \Pr(a \le y_{it} \le b \mid \eta_i), \tag{3.4}$$
but not so unconditionally:
$$\Pr(a \le y_{it} \le b \mid c \le y_{i(t-1)} \le d) = \frac{F_{t,t-1}(b,d) - F_{t,t-1}(a,d) - F_{t,t-1}(b,c) + F_{t,t-1}(a,c)}{F_{t-1}(d) - F_{t-1}(c)} \tag{3.5}$$
where $F_{t,t-1}(b,d) = \Pr(y_{it} \le b,\ y_{i(t-1)} \le d)$ and $F_{t-1}(d) = \Pr(y_{i(t-1)} \le d)$. This is so because $F_{t,t-1}(b,d) \neq F_t(b)F_{t-1}(d)$, due to the correlation between $y_{it}$ and $y_{i(t-1)}$ induced by the permanent effects. In effect, letting $G(\cdot)$ be the cdf of $\eta$, we have
$$F_{t,t-1}(b,d) = \int \Pr(y_{it} \le b,\ y_{i(t-1)} \le d \mid \eta_i)\,dG(\eta_i) = \int \Pr(y_{it} \le b \mid \eta_i)\Pr(y_{i(t-1)} \le d \mid \eta_i)\,dG(\eta_i). \tag{3.6}$$
Thus, decomposition (3.1) allows us to distinguish probability statements for individuals with a given value of $\eta$ from (aggregate) probability statements for groups of observationally equivalent individuals.

Estimating the Variance-Components Model. One possibility is to approach estimation conditionally given $\eta_i$. That is, to estimate the realizations of the permanent effects that occur in the sample and $\sigma^2$. Natural unbiased estimates in this case would be
$$\hat\eta_i = \bar y_i - \bar y \quad (i = 1,\dots,N) \tag{3.7}$$
and
$$\hat\sigma^2 = \frac{1}{N(T-1)}\sum_{i=1}^N\sum_{t=1}^T (y_{it}-\bar y_i)^2 \tag{3.8}$$
where $\bar y_i = T^{-1}\sum_{t=1}^T y_{it}$ and $\bar y = N^{-1}\sum_{i=1}^N \bar y_i$.

However, typically both $\sigma_\eta^2$ and $\sigma^2$ will be parameters of interest. To obtain an estimator of $\sigma_\eta^2$
note that the variance of $\bar y_i$ is given by
$$\bar\sigma^2 \equiv Var(\bar y_i) = \sigma_\eta^2 + \frac{\sigma^2}{T}. \tag{3.9}$$
Therefore, an unbiased estimator of $\sigma_\eta^2$ can be obtained as the difference between the estimated variance of $\bar y_i$ and $\hat\sigma^2/T$:
$$\hat\sigma_\eta^2 = \frac{1}{N-1}\sum_{i=1}^N(\bar y_i - \bar y)^2 - \frac{\hat\sigma^2}{T}. \tag{3.10}$$
A problem with this estimator is that it is not guaranteed to be non-negative by construction.

The statistics (3.8) and (3.10) can be regarded as Gaussian ML estimates under $y_i \sim N(\mu\iota,\Omega)$. To see this, note that using transformation (2.68) in general we have:
$$\begin{pmatrix} y_i^* \\ \bar y_i \end{pmatrix} \sim id\left[\begin{pmatrix} 0 \\ \mu \end{pmatrix},\ \begin{pmatrix} \sigma^2I_{T-1} & 0 \\ 0 & \bar\sigma^2 \end{pmatrix}\right]. \tag{3.11}$$
Under normality, therefore,
$$\log f(y_i) = \log f(\bar y_i) + \log f(y_i^*), \tag{3.12}$$
so that the log likelihood of $(y_1,\dots,y_N)$ is given by
$$L(\mu,\bar\sigma^2,\sigma^2) = L_B(\mu,\bar\sigma^2) + L_W(\sigma^2), \tag{3.13}$$
where
$$L_B(\mu,\bar\sigma^2) \propto -\frac{N}{2}\log\bar\sigma^2 - \frac{1}{2\bar\sigma^2}\sum_{i=1}^N(\bar y_i-\mu)^2 \tag{3.14}$$
and
$$L_W(\sigma^2) \propto -\frac{N(T-1)}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^N y_i^{*\prime}y_i^*. \tag{3.15}$$
Clearly the ML estimates of $\sigma^2$ and $\bar\sigma^2$ are given by (3.8) and the sample variance of $\bar y_i$, respectively. Moreover, the ML estimator of $\sigma_\eta^2$ is given by (3.10) in view of the invariance property of maximum likelihood estimation.

Footnote: Note that $\sum_{i=1}^N y_i^{*\prime}y_i^* = \sum_{i=1}^N\sum_{t=1}^T(y_{it}-\bar y_i)^2$.

Note that with large $N$ and short $T$ we can obtain precise estimates of $\sigma_\eta^2$ and $\sigma^2$ but not of the individual realizations $\eta_i$. Conversely, with small $N$ and large $T$ we would be able to obtain accurate estimates of $\eta_i$ and $\sigma^2$ but not of $\sigma_\eta^2$, the intuition being that although we can estimate the individual $\eta_i$ well there may be too few of them to obtain a good estimate of their variance.

For large $N$, $\sigma_\eta^2$ is just-identified when $T = 2$, in which case we have $\sigma_\eta^2 = Cov(y_{i1},y_{i2})$.

3.2 Error-Components Regression

3.2.1 The Model

Often one is interested in the analysis of error-components models given some conditioning variables. The conditioning variables may be time-varying, time-invariant or both, denoted as $x_{it}$ and $f_i$, respectively. For example, we may be interested in separating out permanent and transitory components of individual earnings by labour market experience and educational categories.

This gives rise to a regression version of the previous model in which, in principle, not only $\mu$ but also $\sigma_\eta^2$ and $\sigma^2$ could be functions of $x_{it}$ and $f_i$.
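The variance-components estimators (3.8) and (3.10) of Section 3.1 are simple enough to compute by hand, and a short sketch makes the construction concrete. The code below is only an illustration under the assumptions of model (3.1) with a balanced panel; the function names and the simulated design (normal draws, $\sigma_\eta^2 = 1$, $\sigma^2 = 4$) are hypothetical choices, not taken from the text.

```python
import random

def variance_components(y):
    """y is a list of N lists, each holding T observations for one unit.

    Returns (sigma2_hat, sigma2_eta_hat) following (3.8) and (3.10).
    """
    n, t = len(y), len(y[0])
    ybar_i = [sum(row) / t for row in y]           # individual means
    ybar = sum(ybar_i) / n                         # overall mean
    within_ss = sum((v - m) ** 2 for row, m in zip(y, ybar_i) for v in row)
    sigma2 = within_ss / (n * (t - 1))             # (3.8)
    var_means = sum((m - ybar) ** 2 for m in ybar_i) / (n - 1)
    sigma2_eta = var_means - sigma2 / t            # (3.10); may be negative
    return sigma2, sigma2_eta

def simulate(n, t, mu, sd_eta, sd_v, seed):
    """Draw a balanced panel from y_it = mu + eta_i + v_it (assumed normal)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        eta = rng.gauss(0.0, sd_eta)
        data.append([mu + eta + rng.gauss(0.0, sd_v) for _ in range(t)])
    return data

panel = simulate(n=5000, t=4, mu=1.0, sd_eta=1.0, sd_v=2.0, seed=0)
sigma2_hat, sigma2_eta_hat = variance_components(panel)
```

With many units and few periods both variances are estimated precisely, whereas the individual means remain noisy estimates of $\mu + \eta_i$, in line with the discussion of large $N$ and short $T$.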
In the standard error-components regression model, however, $\mu_{it}$ is period-specific and made a linear function of $x_{it}$ and $f_i$, while the variance parameters are assumed not to vary with the regressors. The model is therefore
$$y_{it} = x_{it}'\beta + f_i'\gamma + u_{it} \tag{3.16}$$
$$u_{it} = \eta_i + v_{it} \tag{3.17}$$
together with the following assumption for the composite vector of errors $u_i = (u_{i1},\dots,u_{iT})'$:
$$u_i \mid w_i \sim iid(0,\ \sigma^2I_T + \sigma_\eta^2\iota\iota') \tag{3.18}$$
where $w_i = (x_{i1}',\dots,x_{iT}',f_i')'$.

This model is similar to the one discussed in the previous chapter except in one fundamental aspect. The individual effect in the unobserved-heterogeneity model was potentially correlated with $x_{it}$. Indeed, this was the motivation for considering such a model in the first place. In contrast, in the error-components model $\eta_i$ and $v_{it}$ are two components of a regression error and hence both are uncorrelated with the regressors.

Formally, this model is a specialization of the unobserved-heterogeneity model of the previous chapter under Assumptions A1 and A2 in which in addition
$$E(\eta_i \mid w_i) = 0 \tag{3.19}$$
$$Var(\eta_i \mid w_i) = \sigma_\eta^2. \tag{3.20}$$
To reconcile the notation used in the two instances, note that in the unobserved-heterogeneity model the time-invariant component of the regression $f_i'\gamma$ is subsumed under the individual effect $\eta_i$. Moreover, in the unobserved-heterogeneity model we did not specify an intercept so that $E(\eta_i)$ was not restricted, whereas for the error-components model $E(\eta_i) = 0$, and $f_i$ will typically contain a constant term.

Footnote: With $T = 2$, (3.10) coincides with the sample covariance between $y_{i1}$ and $y_{i2}$.

Note that in the error-components model $\beta$ is identified in a single cross section. The parameters that require panel data for identification in this model are the variances of the components of the error $\sigma_\eta^2$ and $\sigma^2$, which typically will be parameters of central interest in this context.

There are also applications of model (3.16)-(3.17) in which the main interest lies in the estimation of $\beta$ and $\gamma$.
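In these cases it is natural to regard the error-components model as a restrictive version of the unobserved-heterogeneity model of Chapter 2 with uncorrelated individual effects.

3.2.2 GLS and ML Estimation

Under the previous assumptions, OLS in levels provides unbiased and consistent but inefficient estimators of $\beta$ and $\gamma$:
$$\hat\delta_{OLS} = \left(\sum_{i=1}^N W_i'W_i\right)^{-1}\sum_{i=1}^N W_i'y_i \tag{3.21}$$
where $W_i = (X_i,\ \iota f_i')$, $X_i = (x_{i1},\dots,x_{iT})'$, and $\delta = (\beta',\gamma')'$. Optimal estimation is achieved through GLS, also known as the Balestra-Nerlove estimator:
$$\hat\delta_{GLS} = \left(\sum_{i=1}^N W_i'\Omega^{-1}W_i\right)^{-1}\sum_{i=1}^N W_i'\Omega^{-1}y_i. \tag{3.22}$$
Footnote: See Balestra and Nerlove (1966).

This GLS estimator is, nevertheless, unfeasible, since $\Omega$ depends on $\sigma_\eta^2$ and $\sigma^2$, which are unknown. Feasible GLS is obtained by replacing them by consistent estimates. Usually, the following are used:
$$\hat\sigma^2 = \frac{1}{N(T-1)}\sum_{i=1}^N \hat u_i^{*\prime}\hat u_i^* \tag{3.23}$$
$$\hat{\bar\sigma}^2 = \frac{1}{N}\sum_{i=1}^N \hat{\bar u}_i^2 \tag{3.24}$$
where $\hat u_i^* = y_i^* - X_i^*\hat\beta_{WG}$, $\hat{\bar u}_i = \bar y_i - \bar w_i'\hat\delta_{BG}$, and $\hat\delta_{BG}$ denotes the between-group estimator, which is given by the OLS regression of $\bar y_i$ on $\bar w_i = (\bar x_i',f_i')'$:
$$\hat\delta_{BG} = \left(\sum_{i=1}^N \bar w_i\bar w_i'\right)^{-1}\sum_{i=1}^N \bar w_i\bar y_i. \tag{3.25}$$
Alternatively, the full set of parameters $\beta$, $\gamma$, $\sigma^2$, and $\sigma_\eta^2$ may be jointly estimated by maximum likelihood. As in the case without regressors, the log likelihood can be decomposed as the sum of the between and within log likelihoods. In this case we have:
$$\begin{pmatrix} y_i^* \\ \bar y_i \end{pmatrix}\Big|\, w_i \sim N\left[\begin{pmatrix} X_i^*\beta \\ \bar x_i'\beta + f_i'\gamma \end{pmatrix},\ \begin{pmatrix} \sigma^2I_{T-1} & 0 \\ 0 & \bar\sigma^2 \end{pmatrix}\right] \tag{3.26}$$
so that under normality the error-components log likelihood equals:
$$L(\beta,\gamma,\sigma^2,\bar\sigma^2) = L_B(\beta,\gamma,\bar\sigma^2) + L_W(\beta,\sigma^2) \tag{3.27}$$
where
$$L_B(\beta,\gamma,\bar\sigma^2) \propto -\frac{N}{2}\log\bar\sigma^2 - \frac{1}{2\bar\sigma^2}\sum_{i=1}^N(\bar y_i - \bar x_i'\beta - f_i'\gamma)^2 \tag{3.28}$$
and
$$L_W(\beta,\sigma^2) \propto -\frac{N(T-1)}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^N(y_i^*-X_i^*\beta)'(y_i^*-X_i^*\beta). \tag{3.29}$$
Separate maximization of $L_W$ and $L_B$ gives rise to within-group and between-group estimation, respectively. Thus, the error-components likelihood can be regarded as enforcing the restriction that the parameter vectors $\beta$ that appear in $L_W$ and $L_B$ coincide. This immediately suggests a (likelihood-ratio) specification test that will be further discussed below.

Moreover, in the absence of individual effects $\sigma_\eta^2 = 0$, so that $\bar\sigma^2 = \sigma^2/T$. Thus, the OLS estimator in levels (3.21) can be regarded as the MLE that maximizes the log likelihood (3.27) subject to the restriction $\bar\sigma^2 = \sigma^2/T$. Again,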
this suggests a likelihood-ratio (LR) test of the existence of (uncorrelated) effects based on the comparison of the restricted and unrestricted likelihoods. Such a test will, nevertheless, be sensitive to distributional assumptions.

3.2.3 GLS, Within-Groups, and Between-Groups

We have motivated error-components regression models from a direct interest in the components themselves. Sometimes, however, correlation between individual effects and regressors can be regarded as an empirical issue. We shall address the testing of such a hypothesis in the next section. Now we note the algebraic connections between the within-group, between-group, and Balestra-Nerlove GLS estimators.

Transforming the original system by $H$ as in (3.26) we obtain $Hy_i = HW_i\delta + Hu_i$, or
$$y_i^* = X_i^*\beta + u_i^*$$
$$\bar y_i = \bar w_i'\delta + \bar u_i$$
with
$$\begin{pmatrix} u_i^* \\ \bar u_i \end{pmatrix}\Big|\, w_i \sim id\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix},\ \begin{pmatrix} \sigma^2I_{T-1} & 0 \\ 0 & \bar\sigma^2 \end{pmatrix}\right]. \tag{3.30}$$
The usefulness of this transformation is that the complete system is divided into two conditionally independent sub-systems: $T-1$ within-group equations, which are free from individual effects, and one average equation, which is not. Therefore, in terms of the transformed model, $\hat\delta_{GLS}$ can be written as a weighted least-squares estimator of the form:
$$\hat\delta_{GLS} = \left[\sum_{i=1}^N\left(W_i^{*\prime}W_i^* + \phi^2\bar w_i\bar w_i'\right)\right]^{-1}\sum_{i=1}^N\left(W_i^{*\prime}y_i^* + \phi^2\bar w_i\bar y_i\right) \tag{3.31}$$
where $\phi^2$ is the ratio of the within to the between error variances, $\phi^2 = \sigma^2/\bar\sigma^2$, $W_i^* = AW_i$, and $\bar w_i = T^{-1}W_i'\iota$. Thus $\hat\delta_{GLS}$ can be regarded as a matrix-weighted average of the within-group and between-group estimators (cf. Maddala, 1971). The statistic (3.31) is identical to (3.22). For feasible GLS, $\phi^2$ is replaced by the ratio of the within to the between sample residual variances, $\hat\phi^2 = \hat\sigma^2/\hat{\bar\sigma}^2$. For maximum likelihood, $\hat\phi^2$ corresponds to the ratio of residual variances evaluated at the MLE itself.

3.3 Testing for Correlated Unobserved Heterogeneity

Sometimes correlated unobserved heterogeneity is a basic property of the model of interest.
An example is a "$\lambda$-constant" labour supply or demand equation where $\eta_i$ is determined by the marginal utility of initial wealth, which according to the underlying life-cycle model will depend on wages or prices in all periods (cf. MaCurdy, 1981; Browning, Deaton, and Irish, 1985). Another example is when a regressor is a lagged dependent variable, as in the autoregressive models discussed in Part II. In cases like this, testing for lack of correlation between regressors and individual effects is not warranted since we wish the model to have this property.

Footnote: When $\phi^2 = T$ (or $\sigma_\eta^2 = 0$), (3.31) boils down to the OLS in levels estimator (3.21), whereas as $\phi^2 \to 0$ it tends to within-groups.

Footnote: With feasible GLS, the implied estimate of $\sigma_\eta^2$ may be negative. [...]

On other occasions, correlation between regressors and individual effects can be regarded as an empirical issue. In these cases testing for correlated unobserved heterogeneity can be a useful specification test for regression models estimated in levels. Researchers may have a preference for models in levels because estimates in levels are in general more precise than estimates in deviations (dramatically so when the time series variation in the regressors relative to the cross-sectional variation is small), or because of an interest in regressors that lack time series variation.

3.3.1 Specification Tests

We have already suggested a specification test of correlated effects from a likelihood ratio perspective. This was a test of equality of the coefficients $\beta$ appearing in the WG and BG likelihoods.
Similarly, from a least-squares perspective, we may consider the system
$$\bar y_i = \bar x_i'b + f_i'c + \varepsilon_i \tag{3.32}$$
$$y_i^* = X_i^*\beta + u_i^* \tag{3.33}$$
where $b$, $c$, and $\varepsilon_i$ are such that $E^*(\varepsilon_i \mid \bar x_i, f_i) = 0$, and formulate the problem as a (Wald) test of the null hypothesis
$$H_0: \beta = b. \tag{3.34}$$
The least-squares perspective is of interest because it can easily accommodate robust generalizations to heteroskedasticity and serial correlation, as we shall see below.

Under the unobserved-heterogeneity model
$$E(\bar y_i \mid w_i) = \bar x_i'\beta + f_i'\gamma + E(\eta_i \mid w_i),$$
so that (3.32) can be regarded as a specification of the alternative hypothesis of the form
$$H_1: E(\eta_i \mid w_i) = \bar x_i'\lambda_1 + f_i'\lambda_2 \tag{3.35}$$
with $b = \beta + \lambda_1$ and $c = \gamma + \lambda_2$. $H_0$ is, therefore, equivalent to $\lambda_1 = 0$. Note that $H_0$ does not specify that $\lambda_2 = 0$, which is not testable.

Footnote: Under the assumptions of the error-components model, $b = \beta$ and $c = \gamma$.

Under (3.35) and the additional assumption $Var(\eta_i \mid w_i) = \sigma_\eta^2$, the error covariance matrix of the system (3.32)-(3.33) is given by $Var(\varepsilon_i \mid w_i) = \bar\sigma^2$, $Cov(\varepsilon_i, u_i^* \mid w_i) = 0$, and $Var(u_i^* \mid w_i) = \sigma^2I_{T-1}$. Thus the optimal LS estimates of $(b,c)$ and $\beta$ are the BG and the WG estimators, respectively.

Explicit expressions for the BG estimator of $b$ and its estimated variance matrix are:
$$\hat b_{BG} = (\bar X'M\bar X)^{-1}\bar X'M\bar y \tag{3.36}$$
$$\hat V_{BG} = \widehat{Var}(\hat b_{BG}) = \hat{\bar\sigma}^2(\bar X'M\bar X)^{-1} \tag{3.37}$$
where $M = I - F(F'F)^{-1}F'$, $F = (f_1,\dots,f_N)'$, $\bar X = (\bar x_1,\dots,\bar x_N)'$, and $\bar y = (\bar y_1,\dots,\bar y_N)'$. Likewise, the estimated variance matrix of the WG estimator is
$$\hat V_{WG} = \widehat{Var}(\hat\beta_{WG}) = \hat\sigma^2\left(\sum_{i=1}^N X_i^{*\prime}X_i^*\right)^{-1}. \tag{3.38}$$
Moreover, since $Cov(\hat\beta_{WG},\hat b_{BG}) = 0$, the Wald test of (3.34) is given by
$$h = (\hat b_{BG}-\hat\beta_{WG})'(\hat V_{BG}+\hat V_{WG})^{-1}(\hat b_{BG}-\hat\beta_{WG}). \tag{3.39}$$
Under $H_0$, the statistic $h$ will have a $\chi^2$ distribution with $k$ degrees of freedom in large samples. Clearly, $h$ will be sensitive to the nature of the variables included in $f_i$.
For example, $H_0$ might be rejected when $f_i$ only contains a constant term, but not when a larger set of time-invariant regressors is included.

Hausman (1978) originally motivated the testing of correlated effects as a comparison between WG and the Balestra-Nerlove GLS estimator, suggesting a statistic of the form
$$h = (\hat\beta_{GLS}-\hat\beta_{WG})'(\hat V_{WG}-\hat V_{GLS})^{-1}(\hat\beta_{GLS}-\hat\beta_{WG}), \tag{3.40}$$
where $\hat V_{GLS} = \hat\sigma^2(X^{*\prime}X^* + \hat\phi^2\bar X'M\bar X)^{-1}$ and $X^* = (X_1^{*\prime},\dots,X_N^{*\prime})'$. Under $H_0$ both estimators are consistent, so we would expect the difference $\hat\beta_{GLS}-\hat\beta_{WG}$ to be small. Moreover, since $\hat\beta_{GLS}$ is efficient, the variance of the difference must be given by the difference of variances. Otherwise, we could find a linear combination of the two estimators that would be more efficient than GLS. Under $H_1$ the WG estimator remains consistent but GLS does not, so their difference and the test statistic will tend to be large.

A statistic of the form given in (3.40) is known as a Hausman test statistic. As shown by Hausman and Taylor (1981), (3.40) is in fact the same statistic as (3.39). Thus $h$ can be regarded both as a Hausman test and as a Wald test of the restriction $\lambda_1 = 0$ from OLS estimates of the model under the alternative (Arellano, 1993).

Figure 3.1: Within-group and between-group lines

Fixed Effects versus Random Effects. These specification tests are sometimes described as tests of random effects against fixed effects. However, according to the previous discussion, for typical econometric panels we shall not be testing the nature of the sampling process but the dependence between individual effects and regressors. Thus, for our purposes individual effects may be regarded as being random without loss of generality.
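The equivalence between the Wald form (3.39) and the Hausman form (3.40) can be checked numerically in the simplest case. The sketch below assumes one time-varying regressor, a constant as the only element of $f_i$, and a balanced panel; the function names, the degrees-of-freedom corrections, and the simulated design are illustrative assumptions rather than the text's own choices. In this scalar case the GLS slope is the weighted average of the WG and BG slopes implied by (3.31).

```python
import random

def wald_and_hausman(x, y):
    """x, y: lists of N lists of length T. Returns (beta_wg, b_bg, wald, hausman)."""
    n, t = len(x), len(x[0])
    xbar = [sum(r) / t for r in x]
    ybar = [sum(r) / t for r in y]
    mx, my = sum(xbar) / n, sum(ybar) / n

    # Within-group: OLS on deviations from individual means.
    pairs = [(a - xb, b - yb)
             for rx, ry, xb, yb in zip(x, y, xbar, ybar)
             for a, b in zip(rx, ry)]
    sxx_w = sum(dx * dx for dx, _ in pairs)
    beta_wg = sum(dx * dy for dx, dy in pairs) / sxx_w
    sigma2 = sum((dy - beta_wg * dx) ** 2 for dx, dy in pairs) / (n * (t - 1) - 1)

    # Between-group: OLS of individual means on a constant and xbar_i.
    means = [(xb - mx, yb - my) for xb, yb in zip(xbar, ybar)]
    sxx_b = sum(dx * dx for dx, _ in means)
    b_bg = sum(dx * dy for dx, dy in means) / sxx_b
    sigma2_bar = sum((dy - b_bg * dx) ** 2 for dx, dy in means) / (n - 2)

    v_wg, v_bg = sigma2 / sxx_w, sigma2_bar / sxx_b
    wald = (b_bg - beta_wg) ** 2 / (v_bg + v_wg)            # scalar (3.39)

    phi2 = sigma2 / sigma2_bar                              # within/between ratio
    denom = sxx_w + phi2 * sxx_b
    beta_gls = (sxx_w * beta_wg + phi2 * sxx_b * b_bg) / denom
    v_gls = sigma2 / denom
    hausman = (beta_gls - beta_wg) ** 2 / (v_wg - v_gls)    # scalar (3.40)
    return beta_wg, b_bg, wald, hausman

def simulate(n, t, beta, rho, seed):
    """Effects are correlated with the regressor whenever rho != 0."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)                  # permanent component of x
        eta = rho * a + rng.gauss(0.0, 0.5)      # individual effect
        xi = [a + rng.gauss(0.0, 1.0) for _ in range(t)]
        x.append(xi)
        y.append([beta * v + eta + rng.gauss(0.0, 1.0) for v in xi])
    return x, y

x, y = simulate(n=500, t=4, beta=1.0, rho=1.0, seed=1)
beta_wg, b_bg, wald, hausman = wald_and_hausman(x, y)
```

In this design the effects are strongly correlated with the regressor, so the between-group slope is biased upwards, the within-group slope stays close to the true value, and both forms of the statistic take the same large value relative to a $\chi^2$ distribution with one degree of freedom.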
Provided one has an interest in partial regression coefficients holding effects constant, what matters is whether the effects are independent of the observed regressors or not.

Footnote: More generally, the econometrics literature has tended to treat statistical models with incidental parameters (Neyman and Scott, 1948) as semiparametric random effects models in which the conditional distribution of the effects given some conditioning variables is left unrestricted (Chamberlain, 1992, and Lancaster, 2000).

Figure 3.1 provides a simple illustration for the scatter diagram of a panel data set with $N = 4$ and $T = 5$. In this example there is a marked difference between the positive slope of the within-group lines and the negative one of the between-group regression. This situation is the result of the strong negative association between the individual intercepts and the individual averages of the regressors.

3.3.2 Robust Alternatives

Robust Wald Testing. If the errors are heteroskedastic and/or serially correlated, the previous formulae for the large sample variances of the WG, BG, and GLS estimators are not valid. Moreover, WG and GLS cannot be ranked in terms of efficiency, so that the variance of the difference between the two does not coincide with the difference of variances.

Following the least-squares Wald approach, Arellano (1993) discussed a generalized test which is robust to heteroskedasticity and autocorrelation of arbitrary forms. To describe this procedure, it is convenient to write system (3.32)-(3.33) in the following matrix form:
$$\begin{pmatrix} \bar y_i \\ y_i^* \end{pmatrix} = \begin{pmatrix} \bar x_i' & \bar x_i' & f_i' \\ X_i^* & 0 & 0 \end{pmatrix}\begin{pmatrix} \beta \\ b-\beta \\ c \end{pmatrix} + \begin{pmatrix} \varepsilon_i \\ u_i^* \end{pmatrix} \tag{3.41}$$
or
$$y_i^+ = W_i^+\theta + u_i^+. \tag{3.42}$$
In this format, OLS estimation of $\theta$ directly provides the difference $\hat b_{BG}-\hat\beta_{WG}$ together with $\hat\beta_{WG}$ and $\hat c_{BG}$:
$$\hat\theta = \begin{pmatrix} \hat\beta_{WG} \\ \hat b_{BG}-\hat\beta_{WG} \\ \hat c_{BG} \end{pmatrix} = \left(\sum_{i=1}^N W_i^{+\prime}W_i^+\right)^{-1}\sum_{i=1}^N W_i^{+\prime}y_i^+. \tag{3.43}$$
A robust estimate of the asymptotic variance of $\hat\theta$ can be obtained using White's formulae (cf. White, 1984):
$$\widehat{Var}(\hat\theta) = \left(\sum_{i=1}^N W_i^{+\prime}W_i^+\right)^{-1}\left(\sum_{i=1}^N W_i^{+\prime}\hat u_i^+\hat u_i^{+\prime}W_i^+\right)\left(\sum_{i=1}^N W_i^{+\prime}W_i^+\right)^{-1} \tag{3.44}$$
where $\hat u_i^+ = y_i^+ - W_i^+\hat\theta$. Let $d = b - \beta$ and let $\hat V_{dd}$ denote the block of (3.44) that corresponds to $\hat d = \hat b_{BG}-\hat\beta_{WG}$. Hence, a generalized Wald test of the null (3.34) that is robust to heteroskedasticity and autocorrelation is given by
$$h_R = \hat d'\hat V_{dd}^{-1}\hat d. \tag{3.45}$$

GMM Estimation and Testing. Under the null of uncorrelated effects we may consider GMM estimation based on the orthogonality conditions
$$E[x_i(\bar y_i - \bar x_i'\beta - f_i'\gamma)] = 0 \tag{3.46}$$
$$E[f_i(\bar y_i - \bar x_i'\beta - f_i'\gamma)] = 0 \tag{3.47}$$
$$E[(y_i^* - X_i^*\beta)\otimes x_i] = 0. \tag{3.48}$$
Footnote: We could also add $E[(y_i^* - X_i^*\beta)\otimes f_i] = 0$, in which case the entire set of moments can be expressed in terms of the original equation system as $E[(y_i - X_i\beta - \iota f_i'\gamma)\otimes w_i] = 0$. When $f_i$ contains a constant term only, this amounts to including a set of time dummies in the instrument set.

In parallel with the development in Section 2.3.3, the resulting estimates of $\beta$ and $\gamma$ will be asymptotically equivalent to Balestra-Nerlove GLS with classical errors, but strictly more efficient when heteroskedasticity or autocorrelation is present. However, under the alternative of correlated effects, any GMM estimate that relies on the moments (3.46) will be inconsistent for $\beta$. Thus, we may test for correlated effects by considering an incremental test of the overidentifying restrictions (3.46). Note that under the alternative, GMM estimates based on (3.47)-(3.48) will be consistent for $\beta$ but not necessarily for $\gamma$.

Optimal GMM estimates in this context minimize a criterion of the form
$$s(\delta) = \left[\sum_{i=1}^N u_i^+(\delta)'Z_i\right]\left(\sum_{i=1}^N Z_i'\hat u_i^+\hat u_i^{+\prime}Z_i\right)^{-1}\left[\sum_{i=1}^N Z_i'u_i^+(\delta)\right] \tag{3.49}$$
where $u_i^+(\delta)$ stacks the average-equation error $\bar y_i - \bar w_i'\delta$ on top of the orthogonal-deviation errors $y_i^* - X_i^*\beta$, and $\hat u_i^+$ are some one-step consistent residuals. Under uncorrelated effects the instrument matrix $Z_i$ takes the form
$$Z_i = \begin{pmatrix} w_i' & 0 \\ 0 & (I_{T-1}\otimes x_i') \end{pmatrix} \tag{3.50}$$
whereas under correlated effects we shall use
$$Z_i = \begin{pmatrix} f_i' & 0 \\ 0 & (I_{T-1}\otimes x_i') \end{pmatrix}. \tag{3.51}$$

3.4 Models with Information in Levels

Sometimes it is of central interest to measure the effect of a time-invariant explanatory variable controlling for unobserved heterogeneity. Returns to schooling holding unobserved ability constant is a prominent example. In those cases,
Hausman and Tay- Jor (1981) argued, however, that panel data might still be useful in an indirect way ifthe model contained time-varying explanatory variables that were on- correlated with the effets Suppose there are subsets of the time-invariant and time-varying explana- tory variables, fy and aye = (2y..24¢r) respectively, that can Be assumed 4 priori to be uncorrelated with the efleets, In such case, the llowing subset of the orthogonality conditions (3.46)-(3.48) hold Bleu G,-78~ fen] =0 (352) Elfin Gi — 218 fin) (353) Eli - X78) @24 (354) ‘The parameter vector # will be ideutified from the moments for the errors in deviations (3.54). The basic point noted by Hausmnan and ‘Taylor is that the coefficients 7 may also be identified using the variables 2i¢ and fig as instruments for the errors in levels, provided the rank condition is satisfied Given identification, the coufficents and 7 can be estimated by GMM, ith classical errors, optimal GMM estimation based on the orthogonality tions (3.52)-(3.54) leads to n= (erm) e(on) Son) En] [Gr ef) Ema) Enso where m, = (2, fy)’. Note that with m; = (2), J), (8.58) coincides with the Balestra-Nerlove GLS estimator (3.31) because the variables in %, are linear combinations of those in (2%, f.), austin and Taylor cid in fact consider (8.55) with m, = (2. FY". Addi- tional estimators of this type were considered by other authors. Bhargava and Sorgat: (1983) andl Amensiya and MaCurdy (1986) suggested that a weaker identification condition and further efficiency could be achieved by using as instruments the full set of lags and leads of the x4 variables instead of theit ‘ime means, as described in our presentation. 
Moreover, Bhargava and Sargan (1983) and Breusch, Mizon, and Schmidt (1989) suggested using deviations from time means of correlated time-varying regressors as additional instruments, on the assumption of constant correlation with the effects. Finally, Arellano and Bover (1995) presented a GMM formulation for models containing instruments for errors in levels, and derived the information bound for these models.

The notion that a time-varying variable that is uncorrelated with an individual effect can be used at the same time as an instrument for itself and for a correlated time-invariant variable is potentially appealing. Nevertheless, the impact of these models in applied work has been limited, due to the difficulty in finding exogenous variables that can be convincingly regarded a priori as being uncorrelated with the individual effects. For example, Chowdhury and Nickell (1985) attempted to estimate returns to education using union status, sickness, and past unemployment variables as candidates for possible time-varying variables uncorrelated with the effect, but concluded that the instruments were either invalid or ineffective within their PSID sample.

3.5 Estimating the Error Component Distributions

If the motivation for using an error-components model is to study transition probabilities or first passage times, the emphasis on the variances seems misplaced since knowledge of the distributions is required in order to evaluate probabilities. Empirical work on earnings mobility often assumed a normal distribution for the components of log earnings errors, but the distributions can also be nonparametrically estimated using deconvolution techniques. Horowitz and Markatou (1996) proposed estimators along these lines, which we describe next.

Let us consider the error-components model (3.16)-(3.17)
$$y_{it} = x_{it}'\beta + f_i'\gamma + u_{it}, \qquad u_{it} = \eta_i + v_{it} \tag{3.56}$$
together with the assumption that $v_{it}$ is iid and independent of $\eta_i$,
and both are independent of the regressors $w_i$. Let $h_u(\tau)$, $h_\eta(\tau)$, and $h_v(\tau)$ denote the characteristic functions (cf) of $u_{it}$, $\eta_i$, and $v_{it}$, respectively. We then have
$$h_u(\tau) = E(e^{i\tau u_{it}}) = h_\eta(\tau)h_v(\tau)$$
and
$$h_{\Delta u}(\tau) = E(e^{i\tau\Delta u_{it}}) = |h_v(\tau)|^2$$
where $i = \sqrt{-1}$ (when not used as a subscript for individuals) and $|z|$ denotes the modulus of the complex variable $z$. Under the assumption that $v_{it}$ has a symmetric distribution (so that $h_v(\tau) > 0$ and real for all $\tau$), we have
$$h_v(\tau) = h_{\Delta u}(\tau)^{1/2}, \qquad h_\eta(\tau) = \frac{h_u(\tau)}{h_{\Delta u}(\tau)^{1/2}}.$$

[...]

Error in Variables

This type of model is relevant in at least two conceptually different situations. One corresponds to instances of actual measurement errors due to misreporting, rounding-off errors, etc. The other arises when the variable of economic interest is a latent variable which does not correspond exactly to the one that is available in the data.

In either case, the worry is that the variable to which agents respond does not coincide with the one that is entered as a regressor in the model. The result is that the unobservable component in the relationship between $y_i$ and $x_i$ will contain a multiple of the measurement error in addition to the error term in the original relationship:
$$y_i = \alpha + \beta x_i + u_i \tag{4.5}$$
$$u_i = v_i - \beta\varepsilon_i. \tag{4.6}$$
Clearly, the observed regressor $x_i$ will be correlated with $u_i$ even if the latent variable $x_i^\dagger$ is not.

This problem is often of practical significance, especially in regression analysis with micro data, since the resulting biases may be large. Note that the magnitude of the bias does not depend on the absolute magnitude of the measurement error variance but on the "noise to signal" ratio $\lambda$. For example, if $\lambda = 1$, so that 50 per cent of the total variance observed in $x_i$ is due to measurement error (which is not an uncommon situation), the population regression coefficient of $y_i$ on $x_i$ will be half the value of $\beta$.

As for solutions, suppose we have the means of assessing the extent of the measurement error, so that $\lambda$ or $\sigma_\varepsilon^2$ are known or can be estimated.
Then $\beta$ can be determined as
$$\beta = (1+\lambda)\frac{Cov(y_i,x_i)}{Var(x_i)} = \frac{Cov(y_i,x_i)}{Var(x_i)-\sigma_\varepsilon^2}. \tag{4.7}$$
More generally, in a model with $k$ regressors and a conformable vector of coefficients $\beta$,
$$y_i = x_i'\beta + (v_i - \varepsilon_i'\beta) \tag{4.8}$$
with $\Omega = E(\varepsilon_i\varepsilon_i')$ and $\Lambda = [E(x_i^\dagger x_i^{\dagger\prime})]^{-1}\Omega$, we have
$$\beta = (I_k+\Lambda)[E(x_ix_i')]^{-1}E(x_iy_i) = [E(x_ix_i')-\Omega]^{-1}E(x_iy_i). \tag{4.9}$$
In this notation, $x_i$ will include a constant term, and possibly other regressors without measurement error. This situation will be reflected by the occurrence of zeros in the corresponding elements of $\Omega$.

The expression (4.9) suggests an estimator of the form
$$\hat\beta = \left(\frac{1}{N}\sum_{i=1}^N x_ix_i' - \hat\Omega\right)^{-1}\frac{1}{N}\sum_{i=1}^N x_iy_i \tag{4.10}$$
where $\hat\Omega$ denotes a consistent estimate of $\Omega$.

Alternatively, if we have a second noisy measure of $x_i^\dagger$
$$z_i = x_i^\dagger + \zeta_i \tag{4.11}$$
such that the measurement error $\zeta_i$ is independent of $\varepsilon_i$ and the other unobservables, it can be used as an instrumental variable. In effect, for scalar $x_i$, we have
$$\frac{Cov(y_i,z_i)}{Cov(x_i,z_i)} = \frac{Cov(\alpha+\beta x_i^\dagger+v_i,\ x_i^\dagger+\zeta_i)}{Cov(x_i^\dagger+\varepsilon_i,\ x_i^\dagger+\zeta_i)} = \beta. \tag{4.12}$$
Moreover, since also
$$\frac{Cov(y_i,x_i)}{Cov(z_i,x_i)} = \beta \tag{4.13}$$
there is one overidentifying restriction in this problem.

In some ways the instrumental variable solution is not different from the previous one. Indeed, the availability of two noisy measures helps to identify the systematic and measurement error variances. Note that since
$$Var\begin{pmatrix} x_i \\ z_i \end{pmatrix} = \begin{pmatrix} \sigma_\dagger^2+\sigma_\varepsilon^2 & \sigma_\dagger^2 \\ \sigma_\dagger^2 & \sigma_\dagger^2+\sigma_\zeta^2 \end{pmatrix} \tag{4.14}$$
we can determine the variances of the unobservables as
$$\sigma_\dagger^2 = Cov(z_i,x_i) \tag{4.15}$$
$$\sigma_\varepsilon^2 = Var(x_i) - Cov(z_i,x_i) \tag{4.16}$$
$$\sigma_\zeta^2 = Var(z_i) - Cov(z_i,x_i). \tag{4.17}$$
In econometrics the instrumental variable approach is the most widely used technique. Thus, the response to measurement error bias problems is akin to the response to simultaneity bias. [...]

4.2 Measurement Error Bias and Unobserved Heterogeneity Bias

Let us consider a cross-sectional model that combines measurement error and unobserved heterogeneity:
$$y_i = \beta x_i^\dagger + \eta_i + v_i, \qquad x_i = x_i^\dagger + \varepsilon_i, \tag{4.18}$$
where all unobservables are independent, except $x_i^\dagger$ and $\eta_i$. The population regression coefficient of $y_i$ on $x_i$ is given by
$$\frac{Cov(y_i,x_i)}{Var(x_i)} = \beta - \frac{\beta\lambda}{1+\lambda} + \frac{Cov(\eta_i,x_i^\dagger)}{\sigma_\dagger^2+\sigma_\varepsilon^2}. \tag{4.19}$$
Note that there are two components to the bias. The first one is due to measurement error and depends on $\lambda$. The second is due to unobserved heterogeneity and depends on $Cov(\eta_i,x_i^\dagger)$. Sometimes these two biases tend to offset each other. For example, if $\beta > 0$ and $Cov(\eta_i,x_i^\dagger) > 0$, the measurement error bias will be negative while the unobserved heterogeneity bias will be positive. A full offsetting would only occur if $Cov(\eta_i,x_i^\dagger) = \sigma_\varepsilon^2\beta$, something that could only happen by chance.

Measurement Error Bias in First-Differences. Suppose we have panel data with $T = 2$ and consider a regression in first-differences as a way of removing unobserved heterogeneity bias. In such a case we obtain
$$\frac{Cov(\Delta y_{i2},\Delta x_{i2})}{Var(\Delta x_{i2})} = \frac{\beta}{1+\lambda_\Delta} \tag{4.20}$$
where $\lambda_\Delta = Var(\Delta\varepsilon_{i2})/Var(\Delta x_{i2}^\dagger)$.

The main point to take here is that first-differencing may exacerbate the measurement error bias. The reason is as follows. If $\varepsilon_{it}$ is an iid error then $Var(\Delta\varepsilon_{i2}) = 2\sigma_\varepsilon^2$. If $x_{it}^\dagger$ is also iid then $\lambda_\Delta = \lambda$, and the measurement error bias in levels and first-differences will be of the same magnitude. However, if $x_{it}^\dagger$ is a stationary time series with positive serial correlation
$$Var(\Delta x_{i2}^\dagger) = 2\left[\sigma_\dagger^2 - Cov(x_{i1}^\dagger,x_{i2}^\dagger)\right] < 2\sigma_\dagger^2 \tag{4.21}$$
and therefore $\lambda_\Delta > \lambda$.

A related example of this situation in data with a group structure arises in the analysis of the returns to schooling with data on twin siblings (Taubman, 1976a, 1976b; Goldberger, 1978; Ashenfelter and Krueger, 1994). Regressions in differences remove genetic ability bias but may exacerbate measurement error bias in schooling if the siblings' measurement errors are independent but their true schooling attainments are highly correlated (Griliches, 1977, 1979).

Under the same circumstances, the within-group measurement error bias with $T > 2$ will be smaller than that in first-differences but higher than the measurement error bias in levels (cf. Griliches and Hausman, 1986).
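The comparison between the attenuation in levels, $1/(1+\lambda)$, and in first-differences, $1/(1+\lambda_\Delta)$, is a matter of simple arithmetic, which the following sketch spells out. It assumes a stationary latent regressor with first-order autocorrelation `rho`, an iid measurement error, and $T = 2$; the function name and the numerical values are illustrative only.

```python
def attenuation_factors(var_latent, var_error, rho):
    """Return the (levels, first-differences) attenuation factors.

    var_latent: variance of the latent regressor
    var_error:  variance of the iid measurement error
    rho:        first-order autocorrelation of the latent regressor (< 1)
    """
    lam = var_error / var_latent                      # noise-to-signal ratio
    var_d_latent = 2.0 * var_latent * (1.0 - rho)     # as in (4.21)
    var_d_error = 2.0 * var_error                     # iid measurement error
    lam_delta = var_d_error / var_d_latent
    return 1.0 / (1.0 + lam), 1.0 / (1.0 + lam_delta)

# Latent regressor without persistence: differencing changes nothing.
levels_iid, fd_iid = attenuation_factors(1.0, 1.0, rho=0.0)

# Persistent latent regressor: differencing removes signal but not noise.
levels_pers, fd_pers = attenuation_factors(1.0, 1.0, rho=0.8)
```

With equal signal and noise variances the levels coefficient is halved in both cases, while with `rho = 0.8` the first-difference coefficient shrinks to one sixth of $\beta$, which is the sense in which differencing exacerbates the bias.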
Grilcles antl Hansa, 1986) bas in schooling eae sibbngs” measeurean inte prosace o h 4.3 Instrumental Variable Estimation with Panel Data st ‘Therefore, the finding of significantly different results in regressions in fest= differences and orthogonal deviations niay be an indication of the presence of 4.3. Instrumental Panel Data Variable Estimation with ‘The availability of panel data helps to solve the probleu of measurement ervor bins by providing internal instruwents as loug as we are wil serial depevidence in the measurement error. In a model without unobserved lieterogencity the following orthogonality conditions are valid provided the measurement ertor is white noise and T > 2: ng to restrict the 1 EL) tiem | ow enn a eu8)] 0 (0 nN, (42) Note that this situation is compatible with the presence of seria correlation in the distorbuatee term in the relationship between y aad. “Phis isso beoense the disturbance is unade af two components: anal only the secoud is quired 10 be white ‘conditions above ois for the validity of the mouse Also vote that ilentifiction of J from the peesions: montents rites tla 1 i predhctable trom its past aud farune values. ‘This, the rani eotalition for ‘dentification would Ea the intent variable 2, was also white noise, Thy model sith stushservel beteroge ity and a white noise measurement fertor, we eau rely on the following moments for the estore i tinstlivenees provided 1232 2 Esror in Variables EY} THD | (yy — Ard] et) (t= QT (423) Moments of this type anid GMA estimators based! on hem were proposed by Griliches and Heasiiat (1986). See also Wansbeek and) Meijer (2000), aad ‘Woutsbeek (2001) for further discrssion and some extensions. With T= 3 we would lave the following two orthogonality conditions Blais (Aye — Ar l= Cen) Elen (Ana ~ Avi) 0 (23) [As in the previons cas, if xf, were white woine the yak condition fr idemifeation would not be satisfied. Abo, if a1, was a random valk thea Corint, Diya) Dat Conn. dia) #0. 
Note that these instrumental variable methods can be expected to be useful in the same circumstances under which differencing exacerbates measurement error bias. Namely, when there is more time series dependence in $x_{it}^{\dagger}$ than in $\varepsilon_{it}$. If the persistence in $x_{it}^{\dagger}$ is exclusively due to unobserved heterogeneity, however, the situation is not different from the homogeneous white noise case and the rank condition will still fail. Specifically, suppose that

$$x_{it}^{\dagger} = \mu_i + \xi_{it} \tag{4.26}$$

where $\xi_{it}$ is iid over $i$ and $t$, and independent of $\mu_i$. Then $\mathrm{Cov}(x_{i1}, \Delta x_{i3}) = \mathrm{Cov}(x_{i3}, \Delta x_{i2}) = 0$, with the result that $\beta$ is unidentifiable from (4.24) and (4.25). This situation was discussed by Chamberlain (1984, 1985), who noted the observational equivalence between the measurement error and the fixed effects models when the process for $x_{it}^{\dagger}$ is as in (4.26).

Finally, note that the assumptions about the measurement error properties can be relaxed somewhat provided the panel is sufficiently long and there is suitable dependence in the latent regressor. For example, $\varepsilon_{it}$ could be allowed to be a first-order moving average process, in which case the valid instruments in the first-difference equation for period $t$ would be

$$(x_{i1}, \dots, x_{i(t-3)}, x_{i(t+2)}, \dots, x_{iT}). \tag{4.27}$$

4.4 Illustration: Measuring Economies of Scale in Firm Money Demand

As an illustration of the previous discussion, we report some estimates from Bover and Watson (2000) concerning economies of scale in a firm money demand equation of the type discussed in Chapter 2. The equations estimated by Bover and Watson are of the general form given in (2.5):

$$\log m_{it} = c(t) \log s_{it} + b(t) + \eta_i + v_{it}. \tag{4.28}$$

The scale coefficient $c(t)$ is specified as a second-order polynomial in $t$ to allow for changes in economies of scale over the sample period. The year dummies $b(t)$ capture changes in relative interest rates together with other aggregate effects. The individual effect is meant to represent permanent differences across firms in the production of transaction services (so that $\eta_i = -\log a_i$),
and $v_{it}$ contains measurement errors in cash holdings and sales. We would expect a non-negative correlation between sales and $a_i$, implying $\mathrm{Cov}(\log s_{it}, \eta_i) \leq 0$ and a downward unobserved heterogeneity bias in economies of scale.

Table 4.1 Firm Money Demand Estimates. Sample period 1986–1996.
Columns: (1) OLS, levels; (2) OLS, orthogonal deviations; (3) OLS, 1st-diff.; (4) GMM, 1st-diff.; (5) GMM, 1st-diff., m. error; (6) GMM, levels, m. error.
Rows: Log sales; Log sales × trend; Log sales × trend²; Sargan (p-value).
[Numerical entries are not legible in the source.]
Notes: t-ratios in brackets robust to heteroskedasticity and serial correlation. N = 5649. Source: Bover and Watson (2000).

All the estimates in Table 4.1 are obtained from an unbalanced panel of 5649 Spanish firms with at least four consecutive annual observations during the period 1986–1996.

The comparison between OLS in levels and orthogonal deviations (columns 1 and 2) is consistent with a positive unobserved heterogeneity bias (the opposite to what we expected), but the smaller sales effect obtained by OLS in first-differences (column 3) suggests that measurement error bias may be important.

Column 4 shows two-step robust GMM estimates based on the moments $E(\log s_{is}\,\Delta v_{it}) = 0$ for all $t$ and $s$ (in addition to time dummies). These estimates are of the form given in (2.40) with weight matrix (2.41). In the absence of measurement error, we would expect them to be consistent for the same parameters as OLS in orthogonal deviations and first-differences. In fact, in the case of Table 4.1 the last two differ, the GMM sales coefficient lies between the two, and the test statistic of overidentifying restrictions (Sargan) is marginal.

Column 5 shows GMM estimates based on

$$E(\log s_{is}\,\Delta v_{it}) = 0 \quad (t = 2, \dots, T;\ s = 1, \dots, t-2, t+1, \dots, T) \tag{4.29}$$

thus allowing for both correlated firm effects and serially independent multiplicative measurement errors in sales.
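The index set in (4.29) follows the same rule as in (4.23): for the first-difference equation at period $t$, the regressor dated $s$ is a valid instrument only if its measurement error is uncorrelated with the errors dated $t$ and $t-1$. A minimal Python sketch of that bookkeeping is given below. The function name `instrument_periods` and its `ma_order` argument are hypothetical, introduced here only for illustration; `ma_order=0` corresponds to white noise measurement error and `ma_order=1` to the first-order moving average case in (4.27).

```python
def instrument_periods(t, T, ma_order=0):
    """Periods s for which x_is is a valid instrument in the first-difference
    equation for period t, when the measurement error is MA(ma_order)."""
    if not 2 <= t <= T:
        raise ValueError("t must satisfy 2 <= t <= T")
    # The differenced error involves periods t and t-1; exclude every s whose
    # measurement error can be correlated with either of them.
    return [s for s in range(1, T + 1)
            if s <= t - 2 - ma_order or s >= t + 1 + ma_order]

T = 5
for t in range(2, T + 1):
    print(t, instrument_periods(t, T), instrument_periods(t, T, ma_order=1))
```

With $T = 3$ and white noise errors this returns period 3 for the equation at $t = 2$ and period 1 for the equation at $t = 3$, matching (4.25) and (4.24). Allowing more dependence in the measurement error shrinks the instrument set, so a longer panel is needed for identification.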
Interestingly, now the leading sales coefficient is much higher and close to unity, and the Sargan test has a p-value close to 40 per cent.

Finally, column 6 shows GMM estimates based on

$$E(\log s_{is}\,v_{it}) = 0 \quad (t = 1, \dots, T;\ s = 1, \dots, t-1, t+1, \dots, T). \tag{4.30}$$

In this case, as with the other estimates in levels, firm effects in (4.28) are replaced by industry effects. Therefore, the estimates in column 6 allow for serially uncorrelated measurement error in sales but not for correlated effects. The leading sales effect in this case is close to OLS in levels, suggesting that in levels the measurement error bias is not as important as in the estimation in differences. The Sargan test provides a sound rejection, which can be interpreted as a rejection of the null of lack of correlation between sales and firm effects, allowing for measurement error.

What is interesting about this example is that a comparison between estimates in levels and deviations without consideration of the possibility of measurement error (e.g. restricted to compare columns 1 and 2, or 1 and 3, using Hausman-type testing) would lead to the conclusion of correlated effects, but with biases going in entirely the wrong direction.

Part II
Time Series Models with Error Components
