DEPARTMENT OF ECONOMICS
Exam:
ECON 301B - APPLIED STATISTICS AND ECONOMETRICS
Date of exam: Monday, 9 December 2002
Time for exam:
9 a.m. - 3 p.m.
calculators.
All questions should be answered.
The grade scale is A, B, C, D, Fail (with A as the best grade).
Comments are given in arial font after each question.
Some of the views expressed in the comments could have been different. It is
the coherence and quality in the argument that matters.
Scientific journals constitute the medium of communication between scientists, and
also the memory (storage) of science. The economics of (scientific) journals is
interesting. Bergstrom [1] argues that journals owned by private publishers are grossly
overpriced, and he recommends several actions to reduce the large profits made by
these publishers. Bergstrom provides data to substantiate his case. There are 180
economic journals in his database, of which 16 are published by scholarly societies
such as the American Economic Association. These 16 journals are published on a
non-profit basis, as opposed to the remaining journals that have private publishers. We
shall concentrate on the following variables:
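P: price
Y: subscriptions
C: citations
A: age
N: number of pages
S: society indicator (1 for a non-profit society journal, 0 for a privately published journal)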
[1] Bergstrom, T.C. 2000. Free Labor for Costly Journals? Journal of Economic Perspectives 15: 183-198.
It is rare that an article in an economics journal is as explicit as Bergstrom's in its
policy recommendations aimed at reducing the profits of economic agents, but
Bergstrom clearly has a dual role: as a disinterested analyst, and as an academic
economist with an economic interest. In his section "What can we do", Bergstrom
suggests: (i) to expand the much cheaper and also generally better non-profit journals
owned by professional societies; (ii) to support new electronic journals; and (iii) to
punish overpriced journals by cancelling library subscriptions, defecting from their
editorial boards, not sending good papers to them, and refusing to referee papers for
them.
(a)
Price is clearly rising with number of pages for both private and non-profit
journals, but apparently more so for private journals. The graph shows that
variability in price increases with number of pages. Whether price is
linearly related to number of pages is hard to tell. Too much noise in the
graph!
(b)
Figure 1 does not show a relationship between P and N that agrees well with
the classical assumptions behind OLS. Why? Explain from the figure why
LP = ln(P) might be close to linear in LN = ln(N), and that the classical
assumptions might be better satisfied on this log-log scale. Use L as a prefix to
denote logged variables throughout.
The OLS assumption of homoscedasticity is clearly not met for price versus
number of pages, according to Figure 1. The log-transformation will stretch
the lower end and contract the upper end of variation. It is concave! To take
log of price will counteract the obvious higher variation at higher number of
pages. To also take log of number of pages will hopefully preserve, and
perhaps improve on the possible linear pattern in the graph.
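The argument can be sketched under one assumed data-generating process. This is an illustration only, not something stated in the exam: suppose the log-log model holds with an error u that is independent of N and has constant variance. The symbols \alpha, \beta and \sigma are hypothetical and are not the coefficients of the models considered elsewhere.

```latex
\ln P = \alpha + \beta \ln N + u, \qquad \operatorname{Var}(u) = \sigma^2
\quad\Longrightarrow\quad
P = e^{\alpha} N^{\beta} e^{u}, \qquad
\operatorname{sd}(P \mid N) = e^{\alpha} N^{\beta}\, \operatorname{sd}(e^{u}).
```

Under this assumption the spread of P grows in proportion to N^\beta on the original scale, which would produce the fan shape seen in Figure 1, while the spread of LP around the line in LN is constant.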
(c)
A matrix of pair-wise scatter plots for logged variables is given in Figure 2 for
non-profit journals, and in Figure 3 for privately published journals. Regarding
LP as the response variable, how does this variable seem to respond to the
other variables? You might comment further on the plots, but be brief.
The upper right corners in Figures 2 and 3 show the scatter of LP versus LN.
The scatter in Figure 2 has a small number of points, and one must therefore
guard against over-interpretation. The scatter in Figure 3 shows a fairly nice
linear and homoscedastic scatter for the 164 private journals, perhaps with
slightly more variability in the lower end. This pattern is not contradicted by
the scatter in Figure 2, other than that the variability there seems larger at the
upper end. Both these possible violations of homoscedasticity might be
small-sample illusions.
Upper rows in both graphs, from left to right: LP seems uncorrelated with LA
and nearly so with LC. The latter is surprising, since many citations add
value to a journal. But this effect might be masked by other variables, as we
will see from the regressions. LC is positively correlated with both LA and
LN. There is thus potential for masking the effect of LC. Figures 2 and 3
show no reason not to analyse these data by linear methods (regression) on
the log scale for the variables price, age, citations and number of pages.
(d)
$\beta_2$ measures the effect on the expected value of LP of a journal being
non-profit versus private, given the same number of pages. The empirical result is
that a non-profit journal is priced at about exp(-1.18) = 0.31 of a private journal
with the same number of pages. $\beta_3$ measures the price elasticity with respect
to number of pages. The elasticity in this model is assumed the same for both
private and non-profit journals.
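The step from a coefficient on the log scale to a price ratio can be checked directly. The sketch below uses the estimate and the 95% interval for S from Table 1; the variable names are illustrative, and exponentiating the interval endpoints is a simple back-transformation that relies on the model assumptions holding.

```python
import math

# Estimates for the society dummy S in model (d), copied from Table 1.
b_s = -1.183172
ci_low, ci_high = -1.618767, -0.7475775

# On the log scale a dummy coefficient is a difference in ln(price);
# exponentiating turns it into a price ratio (non-profit / private).
ratio = math.exp(b_s)
ratio_ci = (math.exp(ci_low), math.exp(ci_high))

print(f"price ratio: {ratio:.3f}")
print(f"back-transformed 95% interval: ({ratio_ci[0]:.3f}, {ratio_ci[1]:.3f})")
```

The ratio of about 0.31 corresponds to the figure quoted in the comment.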
(e)
$\beta_2$ does not have quite the same basic interpretation. The expected
difference in log price for two otherwise identical journals, one being
non-profit and the other private, is now $\beta_2 + \beta_6 LN + \beta_7 LA + \beta_8 LC$,
and not only $\beta_2$. It is surprising that the estimate of $\beta_2$ is now
positive, which taken alone would suggest that non-profit journals are more
expensive than private ones. But note the large standard error.
The effect is not statistically significant!
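To see why the coefficient of S alone says little in a model with interactions, the estimated difference can be evaluated at chosen covariate values. In the sketch below the four coefficients are taken from the Stata output for model (e); the covariate values passed to the function are hypothetical, picked only to lie roughly inside the ranges shown in the scatter plots.

```python
# Coefficients of S and its interactions in model (e), from the Stata output.
b_S, b_SLN, b_SLA, b_SLC = 2.866088, -0.4852809, 0.0218138, -0.1073762


def society_gap(ln_pages, ln_age, ln_cites):
    """Estimated E[LP | non-profit] - E[LP | private] at the given covariates."""
    return b_S + b_SLN * ln_pages + b_SLA * ln_age + b_SLC * ln_cites


# The bare coefficient is the gap at LN = LA = LC = 0, far outside the data.
gap_at_zero = society_gap(0.0, 0.0, 0.0)

# Hypothetical mid-range values (assumed for illustration, not sample means).
gap_mid = society_gap(6.0, 3.5, 6.0)

print(round(gap_at_zero, 3), round(gap_mid, 3))
```

At these illustrative values the estimated gap is negative even though the coefficient of S is positive, so the sign of that coefficient should not be read as the overall effect.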
4.87 9.01,
The results by OLS are given in Table 3. Which of the three models considered
so far would you prefer? Discuss and test!
As measured by Adj R-squared, model (g) is the superior of the three. It fits
the data much better than model (d), and only slightly worse than model (e)
(as measured by R-squared, or equivalently by the residual sum of squares).
Model (g) is simpler and slightly easier to interpret than model (e). In model (g),
S affects price only through a differential effect of citations. It is nice to isolate
the effect of the most interesting covariate, and it is nice to get strong significance
(and the expected sign) for this covariate. This is the case in model (g). This
discussion leads to model (g) as the preferred one.
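The fit measures being compared can be recomputed from the sums of squares in the Stata output. The following is a minimal sketch; the dictionary layout and function name are illustrative choices.

```python
# (residual SS, residual df) for each model, from the Stata ANOVA blocks.
models = {
    "d": (119.232662, 177),
    "e": (98.4771338, 171),
    "g": (99.3105799, 174),
}
total_ss, total_df = 156.068423, 179


def fit_measures(rss, df_resid):
    """Return (R-squared, adjusted R-squared, Root MSE)."""
    r2 = 1.0 - rss / total_ss
    adj_r2 = 1.0 - (rss / df_resid) / (total_ss / total_df)
    root_mse = (rss / df_resid) ** 0.5
    return r2, adj_r2, root_mse


results = {name: fit_measures(*vals) for name, vals in models.items()}
for name, (r2, adj, rmse) in results.items():
    print(f"model ({name}): R2={r2:.4f}  adj R2={adj:.4f}  Root MSE={rmse:.5f}")
```

R-squared can only fall when regressors are dropped, so it favours (e); the adjusted R-squared and Root MSE both penalise the three extra coefficients and favour (g).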
The three models are only partially nested: (g) is obtained from (e) by setting
three coefficients to zero, and (d) is obtained from (e) by setting six coefficients
to zero. But model (d) cannot be obtained from (g) by setting coefficients to
zero. Since model (d) obviously is inferior, we only test model (g) versus
model (e). The test problem is $H_0: \beta_2 = \beta_6 = \beta_7 = 0$ versus at least
one of the coefficients being non-zero. The F-statistic is
$F = \frac{(57.591 - 56.758)/(8 - 5)}{0.576} = 0.482$.
This number is compared to the F-distribution with 3 and 171 degrees of freedom,
and the conclusion is that there is hardly any evidence for claiming that one or
more of the three coefficients are non-zero ($H_0$ is not rejected at any
meaningful level).
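The F-statistic can be reproduced from the residual sums of squares in the Stata output, which is equivalent to using the model sums of squares because the total is the same in both fits. The variable names below are illustrative.

```python
# Residual sums of squares and residual degrees of freedom from the Stata output.
rss_restricted, df_restricted = 99.3105799, 174   # model (g)
rss_full, df_full = 98.4771338, 171               # model (e)

# Number of restrictions under the null hypothesis.
q = df_restricted - df_full

# F = [(RSS_restricted - RSS_full) / q] / [RSS_full / df_full]
f_stat = ((rss_restricted - rss_full) / q) / (rss_full / df_full)

print(q, round(f_stat, 3))
```

A value below 1 means the three extra coefficients reduce the residual sum of squares by less, per coefficient, than the residual mean square itself.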
(h)
Table 4 gives the variance inflation factors for model (g). What do these
numbers tell you? Suggest a change of variables that will reduce the unwanted
effects of large inflation factors, but without changing the essence of model (g).
The first two VIFs are terribly high. They tell us that LN2 and LN are strongly
linearly related, as they certainly must be. That the p-value for LN (testing its
coefficient being non-zero) is as low as 12% is really impressive in the
presence of this strong collinearity. The collinearity could be reduced by
replacing LN2 by $LN2^* = LN2 - \widehat{LN2 \mid LN}$, which is the residual obtained by
regressing LN2 on LN by OLS. This would alter the coefficient for LN and
also its SE, but the coefficient for LN2 would not be changed. The fit
(R-squared, sums of squares) would remain unchanged.
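What a VIF of this size means can be made concrete with the standard identity VIF = 1/(1 - R2), where R2 comes from regressing one covariate on all the others. The sketch below inverts the Table 4 values; the dictionary names are illustrative.

```python
# Variance inflation factors for model (g), copied from Table 4.
vif = {"LN2": 423.92, "LN": 419.79, "LC": 2.11, "LA": 1.24, "SLC": 1.21}

# VIF_j = 1 / (1 - R2_j), with R2_j from regressing covariate j on the
# remaining covariates; invert the identity to recover R2_j.
aux_r2 = {name: 1.0 - 1.0 / v for name, v in vif.items()}

# The standard error of coefficient j is inflated by sqrt(VIF_j) relative
# to a covariate that is uncorrelated with the others.
se_inflation = {name: v ** 0.5 for name, v in vif.items()}

for name in vif:
    print(f"{name:>4}: R2 = {aux_r2[name]:.5f}, SE factor = {se_inflation[name]:.2f}")
```

For LN and LN2 the auxiliary R2 is above 0.997 and the standard errors are about twenty times larger than they would be without the collinearity.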
(i)
Returning to Bergstrom's paper: do you agree that private journals are overpriced?
Based on your preferred model, describe the pricing policy and profit
generation in private journals.
(j)
Are economists in academia loyal to their non-profit journals in the sense that
university libraries are more prone to subscribe to a journal published by a
scholarly society when everything else is equal? To address this question, the
following model is considered.
$LY = \beta_1 + \beta_2 S + \beta_3 LP + \beta_4 LN + \beta_5 LA + \beta_6 LC + u$
The OLS results for this model are given in Table 5. Discuss the issue raised.
Note that the supplier side in the journal market is a mixed bag. Non-profit
journals are generally priced according to real production cost, with the hard
work of editing and refereeing done on a no-pay basis. These journals are thus
priced with little regard to what could have been their market price.
The signs for LP, LN, LA, and LC do make sense. However, libraries do
seem to subscribe less to non-profit journals than to privately published ones,
everything else equal. This effect is not statistically significant. It could be due
to heavier marketing by private publishers. For private publishers, we also have a
problem with simultaneity. Private publishers are likely to fix their price in
response to what they perceive as the demand function. A privately published
journal of the same quality as a society journal is thus likely to be priced
higher, see point (i). The number of subscribers would then be reduced, and it
is thus likely that subscription is higher for non-profit journals than for private
journals of the same quality.
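The size of the estimated effects in model (j) can be read off with a little arithmetic. The sketch below uses the Table 5 estimates for S and LP; the 10% price change is a hypothetical scenario chosen for illustration.

```python
import math

# Estimates from Table 5 (model (j), response LY = ln(subscriptions)).
b_S, se_S = -0.2190234, 0.2061648   # society dummy
b_LP = -0.4394106                   # coefficient of LP, a price elasticity

# t-ratio for the society dummy, as reported in the table.
t_S = b_S / se_S

# Estimated ratio of subscriptions, non-profit relative to private,
# with price, pages, age and citations held fixed.
subscription_ratio = math.exp(b_S)

# Relative change in subscriptions for a hypothetical 10% price increase.
effect_10pct = 1.10 ** b_LP - 1.0

print(round(t_S, 2), round(subscription_ratio, 3), round(effect_10pct, 3))
```

The point estimate says non-profit journals have about 20% fewer subscriptions at given covariates, but with a t-ratio near -1 this is well within sampling noise, whereas the price elasticity of about -0.44 is estimated precisely.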
(k)
Inspecting the empirical residuals from model (j), a pattern is noted. The pattern
seems to be
Our data consists of 180 journals in economics. This is pretty much the
collection of academic journals in this field that use the English language. This
collection is thus not a random sample from some existing population. Explain
the statistical meaning of a confidence interval, say that in point (e), and discuss
the difficulties involved in this interpretation since we do not sample in a
simplistic sense.
confidence. This interpretation might be extended slightly. The particular
study might not be reproducible by drawing an independent sample. But, in
studies where the assumed model indeed represents the uncertainties
involved, the fraction of studies leading to their produced confidence intervals
covering the true values (which might vary from study to study) will match the
degree of confidence (which is kept fixed) in the long run.
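The long-run reading of a confidence interval described above can be illustrated by simulation. Everything in the sketch below is hypothetical: each "study" draws its own true mean, observes normal data with known standard deviation 1, and reports a 95% interval for the mean. None of it uses the journal data.

```python
import random


def coverage(n_studies=2000, n_obs=30, seed=1):
    """Fraction of simulated studies whose 95% interval covers the truth."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # The true value is allowed to differ from study to study.
        true_mean = rng.uniform(-5.0, 5.0)
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n_obs)]
        centre = sum(sample) / n_obs
        # Known standard deviation 1, so the normal quantile 1.96 applies.
        half_width = 1.96 / n_obs ** 0.5
        hits += centre - half_width <= true_mean <= centre + half_width
    return hits / n_studies


rate = coverage()
print(rate)
```

The coverage rate comes out close to 0.95 only because the simulated model is exactly the one used to build the intervals, which is the condition the comment stresses.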
Repeated sampling makes no sense in the present study. The question is
whether the assumed model, say (e), correctly represents the uncertainties
involved. I have my doubts. There are, for example, reasons why some
journals are privately published and others are society journals. The society
journals tend to be more general in scope than the private ones. It is easier to
carve out a market segment in a limited area, say labour economics or
fisheries economics, and to establish a private journal with a dominating
position there, than in the more general areas. Perhaps such conditions are more
important for the pricing than the covariates in (e).
The big question is rather: to what extent are our results externally valid? That
is, to what extent do our results have validity for extrapolation in time,
language area and field of science? It is hard to give a good answer here. The
best approach might be to study the journals in a few other fields of science
and see if the same pattern emerges.
APPENDIX
(Output based on Stata)
[Figure 1. Scatter plot of price (P) against number of pages (N). Legend: price_NonProfit, price_Private.]
[Scatter-plot matrix; panels in order: LP, LA, LC, LN.]
Figure 2. Scatter plots for logged variables. The plot for LC (on the y-axis)
versus LN is, for example, found in row 3 and column 4. Non-profit journals.
[Figure 3. Scatter-plot matrix for the logged variables; panels in order: LP, LA, LC, LN. Privately published journals.]
       Source |          SS     df          MS      Number of obs =     180
--------------+-------------------------------      F(2, 177)     =   27.34
        Model |  36.8357611      2  18.4178806      Prob > F      =  0.0000
     Residual |  119.232662    177  .673630857      R-squared     =  0.2360
--------------+-------------------------------      Adj R-squared =  0.2274
        Total |  156.068423    179  .871890631      Root MSE      =  .82075

--------------------------------------------------------------------------------
    price  LP |      Coef.  Std. Err.        t   P>|t|      [95% Conf. Interval]
--------------+-----------------------------------------------------------------
   society  S |  -1.183172   .2207267    -5.36   0.000     -1.618767   -.7475775
    pages  LN |    .812689   .1315476     6.18   0.000      .5530854    1.072293
        _cons |   .3748265   .8661977     0.43   0.666     -1.334578    2.084231
--------------------------------------------------------------------------------
Table 1. Regression results for model (d). OLS. Stata output with variable text added
(price as a reminder that LP is ln( price) etc.).
       Source |          SS     df          MS      Number of obs =     180
--------------+-------------------------------      F(8, 171)     =   12.50
        Model |  57.5912891      8  7.19891114      Prob > F      =  0.0000
     Residual |  98.4771338    171  .575889671      R-squared     =  0.3690
--------------+-------------------------------      Adj R-squared =  0.3395
        Total |  156.068423    179  .871890631      Root MSE      =  .75887

--------------------------------------------------------------------------------
    price  LP |      Coef.  Std. Err.        t   P>|t|      [95% Conf. Interval]
--------------+-----------------------------------------------------------------
   society  S |   2.866088   2.946723     0.97   0.332     -2.950549    8.682725
    pages  LN |  -4.143584   2.472981    -1.68   0.096
      age  LA |  -.4891446   .1040427    -4.70   0.000      -.694518   -.2837712
citations  LC |  -.0161615    .064683    -0.25   0.803     -.1438415    .1115185
          SLN |  -.4852809   .4685923    -1.04   0.302     -1.410251    .4396893
          SLC |  -.1073762   .2245599    -0.48   0.633     -.5506426    .3358902
          SLA |   .0218138   .3993713     0.05   0.957     -.7665187    .8101464
          LN2 |   .3910415   .1870774     2.09   0.038      .0217631    .7603198
        _cons |   17.69035   8.153315     2.17   0.031      1.596248    33.78446
--------------------------------------------------------------------------------
Table 2. Regression results for model (e). OLS.
       Source |          SS     df          MS      Number of obs =     180
--------------+-------------------------------      F(5, 174)     =   19.89
        Model |  56.7578429      5  11.3515686      Prob > F      =  0.0000
     Residual |  99.3105799    174  .570750459      R-squared     =  0.3637
--------------+-------------------------------      Adj R-squared =  0.3454
        Total |  156.068423    179  .871890631      Root MSE      =  .75548

--------------------------------------------------------------------------------
    price  LP |      Coef.  Std. Err.        t   P>|t|      [95% Conf. Interval]
--------------+-----------------------------------------------------------------
    pages  LN |  -3.750736   2.416129    -1.55   0.122      -8.51943    1.017957
      age  LA |  -.4836628   .0991712    -4.88   0.000     -.6793961   -.2879295
citations  LC |  -.0071069   .0617511    -0.12   0.909     -.1289846    .1147708
          SLC |  -.1690801   .0313662    -5.39   0.000     -.2309874   -.1071729
          LN2 |   .3557514   .1820416     1.95   0.052     -.0035425    .7150454
        _cons |   16.57367   7.994926     2.07   0.040      .7941495    32.35318
--------------------------------------------------------------------------------
Table 3. Regression results for model (g). OLS.
     Variable |       VIF       1/VIF
--------------+----------------------
          LN2 |    423.92    0.002359
           LN |    419.79    0.002382
           LC |      2.11    0.474096
           LA |      1.24    0.803837
          SLC |      1.21    0.823833
--------------+----------------------
     Mean VIF |    169.65
Table 4. Variance inflation factors for model (g).
           Source |          SS     df          MS      Number of obs =     180
------------------+-------------------------------      F(5, 174)     =   56.40
            Model |  139.755649      5  27.9511297      Prob > F      =  0.0000
         Residual |   86.234402    174  .495600012      R-squared     =  0.6184
------------------+-------------------------------      Adj R-squared =  0.6074
            Total |  225.990051    179  1.26251425      Root MSE      =  .70399

------------------------------------------------------------------------------------
subscriptions  LY |      Coef.  Std. Err.        t   P>|t|      [95% Conf. Interval]
------------------+-----------------------------------------------------------------
       society  S |  -.2190234   .2061648    -1.06   0.290     -.6259291    .1878823
        price  LP |  -.4394106   .0695569    -6.32   0.000     -.5766944   -.3021268
        pages  LN |   .3482928   .1591566     2.19   0.030      .0341669    .6624188
          age  LA |   .4272673   .0983372     4.34   0.000      .2331801    .6213546
    citations  LC |   .4110117   .0571062     7.20   0.000      .2983018    .5237217
            _cons |   1.209037   .8558679     1.41   0.160     -.4801824    2.898256
------------------------------------------------------------------------------------
Table 5. Regression results for model (j). OLS.
                                                        Number of obs =     180
                                                        F(5, 174)     =   84.20
                                                        Prob > F      =  0.0000
                                                        R-squared     =  0.7076
                                                        Adj R-squared =  0.6992
                                                        Root MSE      =  .61124

------------------------------------------------------------------------------------
subscriptions  LY |      Coef.  Std. Err.        t   P>|t|      [95% Conf. Interval]
------------------+-----------------------------------------------------------------
       society  S |  -.1202159   .1696996    -0.71   0.480     -.4551505    .2147186
        price  LP |  -.4362878   .0587879    -7.42   0.000      -.552317   -.3202587
        pages  LN |   .3057825   .1365795     2.24   0.026      .0362167    .5753484
          age  LA |   .5066978    .095116     5.33   0.000      .3189682    .6944274
    citations  LC |   .4090865   .0528861     7.74   0.000      .3047057    .5134673
            _cons |   1.212265   .7035282     1.72   0.087     -.1762821    2.600813
------------------------------------------------------------------------------------