Tutorial Week 6: Stata Data Analysis Examples on Panel data regression models: Pooled model

A researcher is interested in examining the effect of public spending on education on educational outcomes for the years 1992 through 1998. The data set is the district-level analogue of the school-level data used by Papke (2005).

- The response variable of interest in this question is:

$\mathit{math4}_{it}$: the percentage of fourth graders in district $i$ receiving a passing score on a standardized math test in year $t$.

- The key explanatory variable is:

$\mathit{rexpp}_{it}$: the real expenditures per pupil in district $i$ in year $t$.

- Other independent variables are:

$\mathit{rexpp}_{i,t-1}$: real expenditures per pupil in district $i$, lagged one year.

$\mathit{enrol}_{it}$: total enrolment in district $i$ in year $t$.

$\mathit{lunch}_{it}$: the percentage of students in district $i$ in year $t$ eligible for the school lunch program. (So $\mathit{lunch}_{it}$ is a proxy for the district-wide poverty rate.)

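- The model of interest is

$$
\mathit{math4}_{it} = \delta_1 \mathit{y94}_t + \dots + \delta_5 \mathit{y98}_t + \gamma_1 \log(\mathit{rexpp}_{it}) + \gamma_2 \log(\mathit{rexpp}_{i,t-1}) + \psi_1 \log(\mathit{enrol}_{it}) + \psi_2 \mathit{lunch}_{it} + a_i + u_{it}
$$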


The data are in the file MATHPNL.dta. Open the data file in Stata and answer the following questions:

a. How many cross-section units and how many time series periods are in the data set? Is the panel data balanced?

b. Estimate a pooled regression which includes an intercept along with the year dummies to allow $a_i$ to have a nonzero expected value.

c. Is the sign of the lunch coefficient what you expected? Interpret the magnitude of the coefficient.

d. What are the estimated effects of the spending variables? Is the sign of the lunch coefficient what you expected? Interpret the magnitude of the coefficient.

e. Does the district poverty rate have a large impact on test pass rates? Explain.

f. Use first differencing to estimate the model in part (b). The simplest approach is to allow an intercept in the first-differenced equation and to include dummy variables for the years 1994 through 1998. Interpret the coefficient on the spending variable.

g. Now, add one lag of the spending variable to the model and reestimate using first differencing. Note that you lose another year of data, so you are only using changes starting in 1994. Discuss the coefficients and significance on the current and lagged spending variables.

h. Obtain heteroskedasticity-robust standard errors for the first-differenced regression in part (g). How do these standard errors compare with those from part (g) for the spending variables?
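As a starting point for parts (a) and (b), the following is a minimal Stata sketch, not a definitive solution. It assumes that MATHPNL.dta sits in the working directory and that the district identifier, year, and analysis variables are named distid, year, math4, rexpp, enrol, and lunch; these names are assumptions to check with describe. The ln_ variables are created here for illustration. The regressor list uses current spending only, as parts (f) and (g) imply; if lagged spending belongs in the pooled model you are asked to estimate, L.ln_rexpp could be added once the panel is declared.

```stata
* Sketch only: distid, year, math4, rexpp, enrol and lunch are ASSUMED
* variable names; run -describe- to confirm them in your copy of the file.
use MATHPNL.dta, clear

* Part (a): declare the panel, then inspect units, periods and balance
xtset distid year
xtdescribe

* Part (b): pooled OLS with an intercept and a full set of year dummies
* (the ln_ prefix is hypothetical, chosen to avoid clashing with existing names)
generate double ln_rexpp = ln(rexpp)
generate double ln_enrol = ln(enrol)
regress math4 i.year ln_rexpp ln_enrol lunch
```

The output of xtset states whether the panel is balanced, and xtdescribe lists the number of districts, the number of periods, and the participation patterns.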
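For parts (c) to (e), it may help to recall how coefficients are read when the outcome is a percentage and a regressor enters in logs. As a sketch, write the coefficient on log spending as $\gamma_1$ and the coefficient on the lunch variable as $\psi_2$ (labels assumed here for illustration); then, holding other factors fixed:

```latex
\Delta \mathit{math4} \approx \frac{\gamma_1}{100}\,\bigl(\%\Delta \mathit{rexpp}\bigr),
\qquad
\Delta \mathit{math4} = \psi_2\,\Delta \mathit{lunch}
```

For instance, with a purely hypothetical value $\gamma_1 = 5$, a 10% rise in spending would go with a rise of about $0.5$ percentage points in the pass rate. Because the pass rate and the lunch variable are both measured in percent, $\psi_2$ is read directly as percentage points of pass rate per percentage point of lunch eligibility.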
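Parts (f) through (h) could be approached along the following lines. This is a hedged sketch under the same assumptions about variable names (distid, year, math4, rexpp, enrol, lunch); the ln_ variables and the stored-estimate names fd_lag and fd_lag_robust are hypothetical. It relies on Stata's time-series operators, which become available after xtset: D. takes the first difference within a district and LD. takes the lag of that difference.

```stata
* Sketch only: variable names are ASSUMED, as are the ln_ and fd_ names
use MATHPNL.dta, clear
xtset distid year
generate double ln_rexpp = ln(rexpp)
generate double ln_enrol = ln(enrol)

* Part (f): first-differenced equation with an intercept and year dummies
* (i.year omits one base year, so the intercept is identified)
regress D.math4 i.year D.ln_rexpp D.ln_enrol D.lunch

* Part (g): add the lagged change in spending; one more year is lost
regress D.math4 i.year D.ln_rexpp LD.ln_rexpp D.ln_enrol D.lunch
estimates store fd_lag

* Part (h): same regression with heteroskedasticity-robust standard errors
regress D.math4 i.year D.ln_rexpp LD.ln_rexpp D.ln_enrol D.lunch, vce(robust)
estimates store fd_lag_robust

* Side-by-side comparison: coefficients match, standard errors differ
estimates table fd_lag fd_lag_robust, b(%9.3f) se(%9.3f)
```

Storing both sets of estimates keeps the comparison in part (h) to a single table, in which only the standard-error rows should change between the two columns.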
