
Assignment 1 Econometrics 1 JE1A17 2013

Important! Before you start to answer these questions, read the assignment instructions in the lab compendium. Also notice that some of the page references in the following text may differ somewhat (but will be fairly similar) between editions of Gujarati. Clearly write out Q1a, Q1b, Q1c etc. for every answered question!!! If nothing else is stated, test at the 5% level of significance. I accept minor mistakes as long as I can see that you have made a serious attempt. Motivate all your answers! This is a part of the examination; therefore these exercises should be solved within each group, without help from any of the teachers. The deadlines are available in the lab compendium. Notice that some page references may be somewhat inexact, since the pages in the lab compendium shift whenever a section is added to or removed from the most updated version of the document. There is also a risk that a couple of the page references in Gujarati are from edition 4. Notice that some of the wf1-files have been changed for this exercise (this is why the files have the suffix _2013). If you, for instance, use the data files from Gujarati's website, without this suffix, the estimations may differ. These exercises are fairly simple since this is your first assignment. They may be slightly more complicated for the next hand-in, but we start with these warm-up exercises. Good luck!

Question 1 Bivariate ordinary least squares regression analysis

Do storks deliver babies? Open the file Storks_2013.wf1. The variable Storks represents the number of nesting storks in a sample of American cities, and the variable Birthrates refers to the number of children born per year in the same cities. In this (somewhat silly) exercise you should study the relationship between the number of nesting storks and the number of children born. The exercise illustrates that there is a difference between correlation and causal relationships. Similar spurious regression analyses can easily be found when browsing the web, even in papers that are serious attempts; the only difference is that storks and birth rates are replaced with variables such as GDP, R&D, unemployment, FDI or the oil price, without any solid theoretical economic model. Therefore, do not make a statistical study where the theoretical arguments for causality are very weak. Base your study on previous research, and make sure that the variables are not trending. When writing your bachelor or master thesis you should test an economic theory; do not try to find correlations by data mining, since many variables correlate systematically with irrelevant variables without any causal relationship whatsoever. If you just compare a sufficient number of totally unrelated variables, you will sooner or later find two unrelated variables that are systematically correlated.

In this example, we are interested in the following regression model: (Birth rates)i = β0 + β1 (Nesting Storks)i + εi. If nothing else is stated, conduct the significance tests at the 5% significance level.

Q1a) Scatter plot the data with the number of nesting storks on the X-axis and the number of children born on the Y-axis (include a regression line too). Does the relationship (the functional form) appear to be linear? Are there any strong outliers/extreme values?

Q1b) Estimate the following regression model: (Birth rates)i = β0 + β1 (Nesting Storks)i + εi
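A minimal command-line sketch for Q1a and Q1b (assuming the series keep the workfile names Storks and Birthrates):

    scat storks birthrates   ' Q1a: scatter plot (add the regression line via the graph options if it is not drawn automatically)
    ls birthrates c storks   ' Q1b: OLS regression of Birthrates on a constant and Storks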

Q1c) How high is the coefficient of determination (R²)? Give an interpretation of this value. Give two reasons why we do not use the adjusted R² in this case.

Q1d) Explain the relationship between the t-statistic (15.06120853113408), the slope coefficient (b = 4.435216270161431) and the standard error (std. error = 0.2944794410749366).

Q1e) Is the slope coefficient significantly different from zero? Use a t-test that is compared to a critical value from a t-table. Thus, test H0: β1 = 0 against H1: β1 ≠ 0 by the use of a t-test. Draw conclusions and motivate your answer. [This implies that we have a two-tailed test, so if we test at the 5% significance level we have 2.5% in each tail.]

Q1f) Is the slope coefficient significantly different from zero? Use a p-value ("Prob." in EViews) to test H0: β1 = 0 against H1: β1 ≠ 0. This should give the same answer as in the previous question.

Q1g) If we assume that the model is appropriate, how would you interpret the estimated coefficient for β1? [By "interpret" I mean: if the independent variable increases by one unit, how much does the dependent variable increase, in the correct units?] Write out the specific numbers and make an interpretation of what this implies.

Q1h) Conduct the same type of interpretation of the estimated value of β0 (as you did for the estimated value of β1 in the previous question).

Q1i) Write down the estimated model, (Birth rates)i = β0 + β1 (Nesting Storks)i, where you substitute β0 and β1 with the estimated values. If you study the spreadsheet you will see that there is an observation with 1520 nesting storks in a city where the birth rate is 8400 babies. Apply the estimated model to predict the birth rate in a city with 1520 nesting storks (according to our estimated model). How far is our regression-line prediction from this observation? What do you call this distance (which is often denoted by an epsilon)?

Q1j) Can we conclude that, in (large) cities with high numbers of nesting storks, the storks deliver more babies than in smaller cities with fewer nesting storks? Use the words causality and correlation in your answer when you discuss this phenomenon.
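A small command sketch for Q1d and Q1i (assuming the regression has just been estimated with ls birthrates c storks, so that c(1) holds the intercept and c(2) the slope):

    scalar b1 = 4.435216270161431
    scalar se1 = 0.2944794410749366
    scalar tstat = b1/se1            ' = 15.0612..., i.e. the t-statistic is the slope coefficient divided by its standard error
    scalar yhat = c(1) + c(2)*1520   ' Q1i: predicted birth rate for a city with 1520 nesting storks
    scalar e1520 = 8400 - yhat       ' the distance between the observation and the regression line (the residual)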

Question 2 Multiple regression analysis and tests of linear restrictions

Since this is an introductory course and we are still learning, the primary objective is not yet to test your knowledge. Therefore, as a learning exercise, replicate problem 9 about the Big Mac index in the lab compendium (it starts somewhere around page 35). Obviously, this is based on The Economist's version of the purchasing power parity model, the Big Mac index (www.economist.com/content/big-macindex). Hand in the EViews outputs with interpretations so that I can see that you have completed the exercise. [The exercise is based on the data set BigMac_2013.sav, which differs from the data set in Gujarati.]

Question 3 Seasonal dummy variables (and multicollinearity)

Open the file SeasonalDummies_2013.wf1. The data consist of eight years of quarterly time-series observations (32 observations over 1999-2006), where y represents sales figures in units of boats sold, x1 is the number of people on the sales staff, and x2 is spending on advertising (in thousands of SEK). It is recommended that you first look over the example from the lab compendium (chapter 9). (Disregard any non-stationarity issues, which are out of the scope of this course at this point.) Apply a 10% significance level for this entire exercise.

Q3a) Run the following regression: yt = α0 + α1 x1t + α2 x2t + e1t. Which of the coefficients are significant/non-significant at the 10% level of significance? (When you run the regression, the residuals are saved in the series RESID. You may save (or actually copy) this series into a new variable named epsilon1 with the command genr epsilon1 = resid. This creates a new residual variable called epsilon1, which is used in forthcoming problems.)

Q3b) Can you detect any multicollinearity problems in the regression? Base this decision on the variance inflation factor (VIF). You may obtain the VIF value by first running the regression ls y c x1 x2 and then choosing View>Coefficient Diagnostics>Variance Inflation Factors. Look at the centered VIF. Read more about VIF on pages 328 and 340 (approximately the same pages in different versions of Gujarati). When you evaluate whether there are multicollinearity problems you should apply Gujarati's rule of thumb, which is equal to 10 (see page 340 in Gujarati). If you want to understand what you just did, it is advised that you calculate equation (10.5.1) in Gujarati step by step. You will of course obtain exactly the same result by calculating VIF = 1/(1 − r2,3²), but you will understand what VIF means. First calculate the correlation between the independent variables x1 and x2 (that is, r2,3) and save it in a scalar r: scalar r = @cor(x1,x2). Then calculate the VIF scalar with the command scalar VIF = 1/(1-r^2) or scalar VIF = 1/(1-r*r), where r^2 is equal to r*r, that is, the square of r. This is just a small exercise that will show you how easy it is to use the syntax in EViews. If you do not know what syntax to use, just go to Help>Command & Programming Ref (pdf), where you will find many other types of commands. Moreover, it is good to know that Help>Users Guide I and Help>Users Guide II are very useful for more or less every problem that you may have when you use EViews. These manuals are very good. Anyway, back to the question: can you detect multicollinearity problems in the regression? If there were multicollinearity problems in a data set, what would the consequences be? Motivate!

Q3c) Create a new variable called Timetrend (that is, a variable that just increases over time) with the command genr Timetrend = @trend. Make a scatter plot with the variable Timetrend on the X-axis and the residuals epsilon1 (created in Q3a) on the Y-axis. Can we observe some form of systematic pattern (e.g., that quarter 1 is usually the lowest quarter, or that quarter 3 is often the highest quarter, or any other form of systematic pattern over time)? If there is a pattern in the residuals, what implications does that have? The commands for Q3a-Q3c are collected in the sketch below.
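For convenience, the commands quoted in Q3a-Q3c can be pasted into one program window (File>New>Program) and run in one go:

    ls y c x1 x2              ' Q3a: baseline regression
    genr epsilon1 = resid     ' Q3a: copy the residuals into epsilon1
    scalar r = @cor(x1,x2)    ' Q3b: correlation between x1 and x2
    scalar VIF = 1/(1-r^2)    ' Q3b: manual VIF calculation
    genr Timetrend = @trend   ' Q3c: time trend for the residual plot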
Q3d) Construct quarterly dummy variables and run the following regression model in EViews: Equation (*): yt = β0 + β1 x1t + β2 x2t + β3 D1 + β4 D2 + β5 D4 + e2t, where D1 is a dummy variable that filters out the seasonal effects of quarter 1, D2 is a dummy variable that filters out the seasonal effects of quarter 2, and D4 is a dummy variable that filters out the seasonal effects of quarter 4 (skip the dummy for quarter 3, which implies that the constant will automatically filter out possible seasonal effects in quarter 3). Use the same system of seasonal dummy variables that was presented in the lab compendium to construct these dummy variables manually. Just type in the 0s and 1s in the spreadsheet (copy and paste may speed this up). Remember that we cannot use 4 seasonal dummy variables when we have quarterly data; we can use at most 4 − 1 = 3 seasonal dummies. Disregard whether or not every dummy variable is significant. If you want to save time you may use some fast commands in EViews, as in the sketch below. Create seasonal dummy 1 (Q1) with genr D1 = @seas(1), seasonal dummy 2 (Q2) with genr D2 = @seas(2), and seasonal dummy 4 (Q4) with genr D4 = @seas(4). You may put all three commands in the same program window (File>New>Program, paste the syntax, click Run). This will create the three seasonal dummies. Also make sure that you save the residuals from the regression model in a variable called epsilon2. You have to figure out this command by yourself, since we did the same thing in a previous question.
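A sketch of the fast route (the residual-saving command is deliberately left out, since the exercise asks you to work it out yourself):

    genr D1 = @seas(1)        ' quarter 1 dummy
    genr D2 = @seas(2)        ' quarter 2 dummy
    genr D4 = @seas(4)        ' quarter 4 dummy
    ls y c x1 x2 d1 d2 d4     ' Equation (*)
    ' ...then save RESID into a variable called epsilon2, as in Q3a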

Q3e) Can you detect any multicollinearity problems in the new regression model, Equation (*), where the dummy variables are also included?

Q3f) Make a scatter plot with the variable Timetrend on the X-axis and the residuals from Equation (*) (that is, epsilon2 from the dummy-variable model) on the Y-axis. Can we see some form of systematic seasonal pattern (e.g., that quarter 1 is usually the lowest quarter, or that quarter 3 is usually the highest quarter, or any other form of systematic pattern over time)? If there is no seasonal pattern in the residuals, what conclusions can we draw from that?

Q3g) Compare the t-tests for the estimated coefficients of α1 and β1, and for α2 and β2. Why was it necessary to include dummy variables in the original model? What are the primary differences that we can observe when we compare the two models (the model with and the model without dummy variables)? Do the conclusions differ between the models when you look at the p-values for the alpha and beta coefficients?

Q3h) Compare the coefficients of determination R² for the two regression models (yt = α0 + α1 x1t + α2 x2t + e1t versus yt = β0 + β1 x1t + β2 x2t + β3 D1 + β4 D2 + β5 D4 + e2t). Can you draw any conclusions? Which other measure(s) would you suggest instead of R²?

Question 4 Elasticities and logged models

These questions are partly based on problem 7.19 in Gujarati. Read these questions in Gujarati. Apply Table 7-9_2013.wf1.

Q4a) 7.19 b. Hints: Read the example about (i) own-price, (ii) cross-price and (iii) income elasticities in the compendium (7.16). Make an interpretation that is consistent with (i), (ii) or (iii) for 7.19b, such as: if x increases by 1%, this leads to a p% increase/decrease in y.

Q4b) 7.19 g

Q4c) Model M1: ln(Yt) = β1 + β2 X2t + β3 ln(X3t) + β4 ln(X6t). Interpret the estimated coefficient for β2! Hints: Interpret the coefficient estimate in the following manner: ceteris paribus, if X2 increases by 1 unit (or percent), Y increases (or decreases) by ??? units (or percent). Notice that there is a difference between a unit and a percent. You must apply the section on the last page of the EViews lab compendium called "Coefficient interpretations of logged regression models".
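A sketch for Q4c (the series names Y, X2, X3 and X6 are assumed to match Table 7-9_2013.wf1):

    ls log(y) c x2 log(x3) log(x6)   ' Model M1: log-lin in X2, log-log in X3 and X6
    ' for the log-lin term, 100*c(2) approximates the percentage change in Y from a one-unit change in X2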

Q4d) Model M2: Yt = β1 + β2 ln(X2t) + β3 ln(X3t) + β4 ln(X6t). Interpret the estimated coefficient for β2! Hints: As in Q4c, interpret the coefficient estimate in the following manner: ceteris paribus, if X2 increases by 1 unit (or percent), Y increases (or decreases) by ??? units (or percent). Notice again that there is a difference between a unit and a percent, and apply the section on the last page of the EViews lab compendium called "Coefficient interpretations of logged regression models".

Q4e) Model M3: Yt = β1 + β2 ln(X2t) + β3 ln(X3t) + β4 ln(X4t) + β5 ln(X5t) + β6 ln(X6t). For this model (M3), test the following hypothesis with a Wald test at the 1% level of significance: H0: β4 = β5 = β6 = 0 against H1: at least one βi ≠ 0, where i = 4, 5, 6. Read pages 35-39 and pages 68-70 in the lab compendium. Choose the automatic Wald test in EViews (or alternatively do it manually if you like). Run model M3 and choose View>Coefficient Diagnostics>Wald Coefficient Restrictions. Type in the restrictions c(4)=c(5)=c(6)=0. Draw conclusions! Should we include these variables if we test at the 1% level of significance?

Question 5 Semielasticities

These questions are partly based on problem 6.18 in Gujarati. Read these questions in Gujarati. Apply Table 6-3_2013.wf1. Y1=EXPSERVICES, Y2=EXPDUR, Y3=EXPNONDUR, X=PCEXP.

Q5a) 6.18. Hint: Apply the principles from page 100 in the compendium regarding semielasticities. Estimate the semielasticity model ln(Expnondur)t = β1 + β2 T by using the EViews command ls log(Expnondur) c @trend. You do not need to make any comparison with 6.17. However, you must make an interpretation of β2; that is, what is the quarterly and the yearly growth of expenditure on non-durable goods?

Q5b) Extra question for 6.18: Test H0: β2 = 0.01 against H1: β2 ≠ 0.01. That is, form a t-ratio to obtain the test statistic. Compare this test statistic to the t-distribution table at the end of Gujarati. Draw conclusions! Can we reject or not reject H0?
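Command-line sketches for Q4e, Q5a and Q5b (the equation names eq3 and eq5 are arbitrary, and the Q4e series names are assumed to match Table 7-9_2013.wf1):

    ' Q4e: estimate M3 and run the Wald test from the command line
    equation eq3.ls y c log(x2) log(x3) log(x4) log(x5) log(x6)
    eq3.wald c(4)=0, c(5)=0, c(6)=0

    ' Q5a/Q5b: semielasticity and t-test of H0: beta2 = 0.01
    equation eq5.ls log(expnondur) c @trend
    scalar growth_q = 100*eq5.@coefs(2)                       ' approximate quarterly growth in percent
    scalar tstat = (eq5.@coefs(2) - 0.01)/eq5.@stderrs(2)     ' compare to the t-table with n-2 degrees of freedom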
