You are on page 1of 20

International Review of Economics Education 12 (2013) 6079

Contents lists available at SciVerse ScienceDirect

International Review of Economics


Education
journal homepage: www.elsevier.com/locate/iree

Teaching basic econometric concepts using Monte Carlo


simulations in Excel
Genevieve Briand a,*, R. Carter Hill b
a

Instructor, School of Economic Sciences, Washington State University, Pullman, Washington 99164-6210, United States
Ourso Family Professor of Econometrics and Thomas Singletary Professor of Economics, Economics Department,
Louisiana State University, Baton Rouge, Louisiana 70803, United States

A R T I C L E I N F O

A B S T R A C T

Article history:
Available online 8 April 2013

Monte Carlo experiments can be a valuable pedagogical tool for


undergraduate econometrics courses. Today this tool can be used in
the classroom without the need to acquire any specialized
econometrics software. This paper argues that Microsoft Excel,
which is already available at many ofce and home computer
stations, offers the opportunity to run meaningful Monte Carlo
simulations and to successfully teach students basic econometric
concepts. The reader is guided, step-by-step, through two different
exercises. The rst one is a repeated sampling exercise showing that
least squares estimators are unbiased. The second one expands on
the rst to explain the true meaning of condence interval
estimates of least squares estimators.
2013 Elsevier Ltd. All rights reserved.

Keywords:
Teaching
Econometrics
Excel
Monte Carlo simulations

1. Introduction
Monte Carlo experiments rely on repeated random sampling to simulate and compute results of
interest to researchers, instructors or students. Undergraduate econometrics textbooks make use of
Monte Carlo simulations to help teach basic econometric concepts. In Gujarati and Porter (2009), the
authors state that the reader will be asked to conduct Monte Carlo experiments using different
statistical packages (p.12). Hill et al. (2011) illustrate the sampling properties of the least squares
and interval estimators in the beginning chapters of their textbook (pp. 8893 and pp. 127129). In
later chapters, they make use of Monte Carlo simulations to explore the properties of the least

* Corresponding author.
E-mail address: gbriand@wsu.edu (G. Briand).
1477-3880/$ see front matter 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.iree.2013.04.001

G. Briand, R.C. Hill / International Review of Economics Education 12 (2013) 6079

61

squares estimator in the case of random independent variables (pp. 2734, 280) and limited
dependent variables (pp. 4424). Murray (1999) points out that Monte Carlo techniques offer a
rich device for discovery learning (p. 3089). He further states that as I explored the potential of
Monte Carlo techniques for learning econometrics, I realized how much Monte Carlo methods
highlight the central role of sampling properties in econometrics (p. 309). In his preface for
teachers, Murray (2006) explains that his textbook starts out with the Monte Carlo approach to
estimators and returns to Monte Carlo analyses to facilitate learning about heteroskedasticity,
errors in variables, and consistency (p. xxvii and p. xxx). Kennedy (2008, 1998a, 1998b) is a strong
advocate of Monte Carlo experiments as a pedagogical tool for undergraduate econometrics courses.
He suggests the use of explain how to do a Monte Carlo study problems to teach student the
sampling-distribution concept which, he argues, is the statistical lens allowing students to make
sense of the statistics world (1998a, 1998b). He proposes an array of such problems in Appendix D of
his textbook (2008).
Kennedy recommends that instructors do not ask students to actually do a Monte Carlo study
(1998a). He cautions of the high opportunity cost of having them learn how to program (1998b).
Although Murray (1999) eagerly champions using computers to teach econometrics and allow a
hands-on, discovery mode of learning, he also warns that a computer classroom can be very costly in
faculty time and institutional dollars (p. 308).
We, on the other hand, following Judge (1999) and Craft (2003), argue that Microsoft Excel offers
the means to run meaningful Monte Carlo simulations and to successfully teach students basic
econometric concepts, at a relatively low opportunity cost. Cahill and Kosicki (2000 p. 771) offer this
perfect summary of arguments for using Microsoft Excel:
From a practical perspective, spreadsheet software such as Excel is a natural choice to use in
exploring economic models because it is widely available on most campuses. This availability
eliminates the task of seeking funding for the purchase and support of specialized software
packages. In addition, spreadsheet software is relatively easy to use, and its exibility makes it
useful in many different courses at all levels of the traditional economics curriculum. Most
economics students almost certainly will use it after graduation in both career and personal
settings. Most important, it minimizes black-box features that characterize much computerassisted learning software.
Our paper differs from Judge (1999) and Crafts (2003) in that we restrict ourselves to presenting
how Monte Carlo simulations can be run in Excel. Instructors will decide for themselves how to
incorporate them in their econometrics courses. Our exposition includes many screen shots and
provides step-by-step instructions for using Excel to run the Monte Carlo simulation exercises.
The rst Monte Carlo simulation exercise we go through is a repeated sampling exercise showing
that least squares estimators are unbiased. The second one expands on the rst to explain the true
meaning of condence interval estimates of least squares estimators.
2. Repeated sampling and unbiasedness
Following Hill et al. (2011) and Briand and Hill (2012), the examples that follow are developed
around the idea of studying the relationship between household weekly income and their
corresponding weekly expenditure on food. We consider the experiment of randomly selecting
households from a population, and subsequently estimating the simple linear regression model:
y = b1 + b2x + e, where y represents weekly food expenditure and x represents weekly income. In a
Monte Carlo experiment the repeated sampling properties of estimators and tests are observed
directly, by creating many samples of data, applying an estimator or test to each sample, recording the
outcomes, and then summarizing the outcomes. Samples are created using a specic data generating
process (DGP) by which we create a sample of N values. The ingredients of the DGP include (i) choosing
the sample size N, (ii) choosing x values (which may be xed in repeated samples, or not) (iii) selecting
parameter values, and (iv) randomly selecting values of the regression error terms e from a probability
distribution with a given mean and variance. Given these elements we can create an outcome y via a
process that resembles a controlled experimental outcome.

G. Briand, R.C. Hill / International Review of Economics Education 12 (2013) 6079

62

Fig. 1. Probability distribution functions of food expenditure given income level and linear relationship between expected food
expenditure and income.
Table 1
Monte Carlo experiment parameters.

1
2
3
4
5
6
...
26

N=
x1=
x2=

40
1000
2000

x
=$B$2
...
=$B$3

s=
b1=
b2=

50
100
0.10

We will work with random samples of N = 40 households: 20 of them with weekly income
x = $1000 and 20 of them with weekly income x = $2000. These values will remain xed in repeated
samples.
To illustrate the unbiasedness of the least squares estimators, the actual values of the regression
parameters b1 and b2, and the variance of the regression error s2 do not matter. We choose b1 = 100
and b2 = 0.10. Then E(y|x = 1000) = $200, and E(y|x = 2000) = $300. In keeping with the assumptions of
the normal linear regression model, the random errors e will be chosen to have independent normal
distributions with mean 0 and constant, homoskedastic, variance s2 = 2500. Because x is xed in
repeated samples, the distribution of y is normal with mean E(y|x = 1000 or 2000) and variance
s2 = 2500. These distributions and the regression function E(y|x) = b1 + b2x are shown in Fig. 1.
The objective of the rst Monte Carlo exercise is to show that if we draw many samples of size
N = 40 using the specied data generation process, the average value of the least squares estimates b1
and b2 will be close to their true parameter values b1 and b2. If, for example, we use 1000 Monte Carlo
P
samples, then the sample average 1=1000 1000
s1 b2s b2 , where b2s is the least squares estimate of
b2 in the sth Monte Carlo sample, will be close to b2 = 0.10. The expected value of the least squares
estimator is based on an innite number of repetitions. If we could compute an innite number of least
squares estimates b1 and b2, their average value would equal their parameter values b1 and b2. A
Monte Carlo simulation is only based on a nite number of repetitions, and thus the sample average b2
will not exactly equal E(b2) = b2, however the sample average b2 will converge towards b2 as the
number of Monte Carlo samples is increased.
To begin the Monte Carlo simulation exercise, we rst enter the following labels, values and
formulas in cells A1:B3,1 D1:E3, A5:B6 and A26, as shown in Table 1.
We then Copy the cell reference from A6 into A7:A25 and the cell reference from A26 into A27:A45.
1

A1:B3 refers to the range of cells between A1 and B3, inclusively. For more on Excel basic skills, please see Appendix A.

G. Briand, R.C. Hill / International Review of Economics Education 12 (2013) 6079

63

We have specied the sample size N, the x values and the parameter values of the data generating
process, b1, b2 and s. Next, we randomly select error terms e and generate a random sample of
households food expenditure values, ys.
2.1. Generating a random sample
We use Excel functions2 NORMINV and RAND to generate random values using what is called the
inversion method. This technique, briey described in Appendix B,3 can be used to generate random
deviates from a probability distribution for a continuous random variable for which the cumulative
distribution function, or cdf, has an inverse. Let u be a uniformly distributed random variable on the
interval [0,1] and let F(x) denote the cdf for a N(m,s2) random variable X, such that P(X  x) = F(x). Then a
random value x from the distribution N(m,s2) is created by solving the equation u = F(x) for x as
x = F1(u), where F1 denotes the inverse function. In Excel the function RAND creates a uniform
random value, u, and NORMINV is the inverse function for the normal distribution, F1. Thus by
nesting these two functions we can create a random value from a N(m,s2) distribution. The statistical
basis for this result is discussed in Appendix B.
The general syntax of the NORMINV function is:
=NORMINV(probability, m, s)

Fct. (1)

The NORMINV function computes the value x of a normally distributed variable X with mean m,
standard deviation s, and with probability =P(Xx), where 0 probability 1.
We obtain random error values from the N(m,s2) distribution by specifying the probability
argument of the NORMINV function to be the RAND function. The general syntax of the RAND function
is as follows:
=RAND()

Fct. (2)

The RAND function4 returns a uniformly distributed random number u, with 0  u < 1.
Nesting the RAND function into the NORMINV function we have5
=NORMINV(RAND(), m, s)

Fct. (3)

Our interest here lies specically in the normal distribution of random errors, e, with mean m = 0
and standard deviation of s = 50. Having these, the nal step is to generate a random weekly food
expenditure value drawn from households with one of two weekly income (x = $1000 or x = $2000) via
the known linear function: y = b1 + b2x + e, where b1 = 100 and b2 = 0.10.
In cells B6 and B26 we enter the equations shown in Table 2.
We Copy the formula from B6 into B7:B25 and from B26 into B27:B45.
The random sample that we obtain6 is presented on Screen Shot 1 and its scatter plot on Fig. 2. Fig. 2
can be compared to Fig. 1. The rst group of points from the scatter plot (on the left in Fig. 2) was
2
Using Excel functions to generate random numbers is a better alternative than using the Excel Random Number Generation
(RNG) Data Analysis tool. This approach has two main advantages: (1) avoiding using an add-in, which may not be installed by
default in students computer labs, and (2) avoiding falling back into the black box trap. However, one advantage of the RNG
Data Analysis tool is that it makes it easier to repeat the exercise with different sample size.
3
For a bit more see, for example, Wikipedia: http://en.wikipedia.org/wiki/Inverse_transform_sampling.
4
For a description of the algorithm that is used in the RAND function, Excel random number generator function, see http://
support.microsoft.com/kb/828795 Microsoft (2011). Microsoft acknowledges that there were some problems with RAND in
previous versions of Excel, but it has been improved. For a brief introduction to the issues see, for example, Wikipedia: http://
en.wikipedia.org/wiki/Random_number_generation.
5
The Box-Muller transformation is an alternative to using the inversion method. However it is specic to generating values
from the normal distribution, while the inversion method can be used to generate random values from many distributions. For
more on the Box-Muller transformation see, for example, http://www.exceluser.com/explore/statsnormal.htm, Kyd (2011); or
http://mathworld.wolfram.com/Box-MullerTransformation.html Weisstein (2011).
6
RAND() generates different random values each time it is calculated. Thus the users random sample values will differ from ours.

64

G. Briand, R.C. Hill / International Review of Economics Education 12 (2013) 6079

Table 2
Specifying the regression function.
6
...
26

B
=$E$2+$E$3*$B$2+NORMINV(RAND(),0,$E$1)
=$E$2+$E$3*$B$3+NORMINV(RAND(),0,$E$1)

Fig. 2. Scatter plot of random sample.

drawn from the probability distribution function of food expenditure for income level x = $1000 (in the
front in Fig. 1). The second group of points from the scatter plot (on the right in Fig. 2) was drawn from
the probability distribution function of food expenditure for income level x = $2000 (in the back in
Fig. 1). The tted regression line, which runs between the two groups of points in Fig. 2, is an estimate
of the true regression line depicted in Fig. 1
Screen Shot 1:

G. Briand, R.C. Hill / International Review of Economics Education 12 (2013) 6079

65

Table 3
How LINEST reports parameter estimates.

row 1

column 1

column 2

b2

b1

Table 4
How to report the estimates.

47
48

b1=
b2=

=INDEX(LINEST(B6:B45,A6:A45),2)
=INDEX(LINEST(B6:B45,A6:A45),1)

2.2. Generating least squares estimates


Next, use the LINEST function to obtain the least squares estimates for the intercept and slope
parameters, based on the random sample just drawn. For this purpose, the general syntax of the
LINEST function is as follows:
=LINEST(ys, xs)

Fct. (4)

The rst argument of the LINEST function species the y values, and the second argument species
the x values, the least squares estimates are based on. In our case, we specify7:
=LINEST(B6:B45,A6:A45)

Fct. (5)

The LINEST function creates a table where the least squares estimates are stored in Excel, rst the
slope coefcient estimate, and then the intercept coefcient estimate. The estimates are reported as
shown in Table 3.
We nest the LINEST function in the INDEX function to get the estimated coefcients, one at a time.
The INDEX function returns values from within a table. In the case of a table with only one row, the
INDEX function general syntax is as follows:
=INDEX(table of results, column_num)

Fct. (6)

The rst argument of the INDEX function species the source table. In our case, this is the table of
results generated by the LINEST function above. So, replace table of results by LINEST(B6:B5,A6:A5). The second argument indicates from which column of the table to retrieve the
result of interest. If we want to retrieve the estimate of the intercept coefcient, b1, from the table
above, we would indicate that it can be found in column 2 by replacing column_num by 2.
We report the estimated coefcients at the bottom of our worksheet. In cell A47:B48 enter the
labels and equations shown in Table 4.
The estimates of the intercept and slope coefcients we obtain from our sample of data are:
Screen Shot 2:

7
Note that because we will nest the LINEST function in the INDEX function, we will effectively be working with regular
formulas, as opposed to the more complex array formulas. In addition, we choose to work with the LINEST function instead of
the SLOPE and INTERCEPT functions so we can also generate standard errors estimates (see Section 3.1).

G. Briand, R.C. Hill / International Review of Economics Education 12 (2013) 6079

66

Table 5
Key cell formulas to report estimates.
Cell

Formula

Copied to

A6
A26
B6
B26
B47
B48

=$B$2
=$B$3
=$E$2+$E$3*$B$2+NORMINV(RAND(),0,$E$1)
=$E$2+$E$3*$B$3+NORMINV(RAND(),0,$E$1)
=INDEX(LINEST(B6:B45,A6:A45),2)
=INDEX(LINEST(B6:B45,A6:A45),1)

A7:A25
A27:A45
B7:B25
B27:B45

Recall that each random sample is different and will yield different estimates, which may or may
not be close to the true parameter values. The property of unbiasedness is about the average values of
b1 and b2 if many samples of the same size are drawn from the same population. In the next section, we
thus repeat our sampling and least squares estimation exercise.
Screen Shot 3 presents the generation of a random sample and its least squares estimates
accompanied by Table 5 of the formulas used and addresses where the formulas were copied to.8
Screen Shot 3:

2.3. Repeated sampling


We would like to draw 9 additional random samples. For that, Copy the formula from B6 into
C6:K25 and the formula from B26 into C26:K45.
Next, before copying the formula to obtain coefcient estimates for the new samples, transform
the Relative cell reference A6:A45 into an Absolute cell reference $A6:$A45this retains the same
8

This presentation is drawn from Ragsdale (2008) Managerial Decision Modeling textbook.

G. Briand, R.C. Hill / International Review of Economics Education 12 (2013) 6079

67

Table 6
Average the estimates.

47
48
50
51

b1=
b2=

=INDEX(LINEST(B6:B45,$A6:$A45),2)
=INDEX(LINEST(B6:B45,$A6:$A45),1)

K
...
...
=AVERAGE(B47:K47)
=AVERAGE(B48:K48)

x-values for the next 9 rounds of least squares estimations. Copy the formulas from B47:B48 into
C47:K48. In cells K50:K51, we compute the AVERAGEs of the estimates from the 10 samples by
entering the formulas shown in Table 6.
The estimates and average values that we obtain for the 10 samples are:
Screen Shot 4:

Screen Shot 5 presents the repeated sampling of 10 random samples and their least squares
estimatesaccompanied by Table 7 of the formulas used and addresses where the formulas were
copied to.
Screen Shot 5:

Taking the averages of estimates from many samples, the averages will approach the true
parameter values b1 and b2. To show that this is the case, we repeated the exercise againand you

68

G. Briand, R.C. Hill / International Review of Economics Education 12 (2013) 6079

Table 7
Key cell formulas: 10 random samples.
Cell

Formula

Copied to

A6
A26
B6
B26
B47
B48
L47
L48

=$B$2
=$B$3
=$E$2+$E$3*$B$2+NORMINV(RAND(),0,$E$1)
=$E$2+$E$3*$B$3+NORMINV(RAND(),0,$E$1)
=INDEX(LINEST(B6:B45,$A6:$A45),2)
=INDEX(LINEST(B6:B45,$A6:$A45),1)
=AVERAGE(B47:K47)
=AVERAGE(B48:K48)

A7:A25
A27:A45
B7:B25, C6:K25
B27:B45, C26:K45
C47:K47
C48:K48

Table 8
Monte Carlo average values.
Number of samples

10

100

1000

Parameter Values

Average value of b1
Average value of b2

89.87141
0.105656

97.52929
0.101557

99.09254
0.100546

100
0.1

could easily ask your students to do the same.9 We encourage a reader who has so far been replicating
the Excel steps along with us to continue reading, and abstain for now from repeating the exercise. The
cell references that are given next in this paper assume that this is the case. In Table 8 are the average
values of b1 and b2 obtained when the number of samples is increased from 10 to 100, and nally to
1000.
Note that for 1000 random samples, or even 100, the average of the least squares estimates are very
close to the true values.
3. Repeated sampling and interval estimation
In the second Monte Carlo exercise we will illustrate the meaning of 95% level of condence in
interval estimation. A 95% interval estimator is bk  t(0.975,NK)se(bk), where bk is the least squares
estimator of bk, t(0.975,NK) is the 97.5 percentile from a t-distribution with NK degrees of freedom and
se(bk) is the standard error of bk. In a large number of repeated samples from the same population, 95% of
interval estimates will contain the true underlying population parameter.
This section provides step-by-step instructions for constructing a Monte Carlo simulation template
in Excel. If the following exercise were proposed to students, it would require of them to spend more
time familiarizing themselves with new Excel functions as well as guring how to use them to
properly design a Monte Carlo simulation template. The authors believe that the time spent on those
two activities will enhance their grasp of basic econometric concepts as well as present an opportunity
for them to rene their spreadsheet software skills.
This time, we would like to draw 90 additional random samples. Copy the formula from K6 into
L6:CW25 and the formula from K26 into L26:CW45.
3.1. The LINEST function revisited
The LINEST function will obtain the least squares estimates and their standard errors with one
additional option. The general syntax of the LINEST function is:

9
As an alternative to the method used here, see Barreto and Howland (2005) add-in: http://www3.wabash.edu/
econometrics/EconometricsBook/Basic%20Tools/ExcelAddIns/MCSim.htm.

G. Briand, R.C. Hill / International Review of Economics Education 12 (2013) 6079

69

Table 9
How LINEST reports estimates and standard errors.

row 1
row 2

column 1

column 2

b2
se(b2)

b1
se(b1)

=LINEST(ys, xs,,TRUE)

Fct. (7)

The rst argument of LINEST function species the y values; the second argument species the x
values; we ignore the third argument by putting a space between the second and third commas; and the
fourth argument, TRUE, indicates that we would like LINEST to return additional regression statistics.
The LINEST function creates a table where it stores the least squares and standard errors estimates.
The order in which they are reported is shown in Table 9.
We nest the LINEST function in the INDEX function to get the estimated coefcients, one at a time.
The INDEX function returns values from within a table. The INDEX function general syntax is as
follows:
=INDEX(table of results, row_num, column_num)

Fct. (8)

The rst argument of the INDEX function species the source table. The second argument and third
argument indicate the intersection of a row and a column at which the result of interest can be found.
The nested commands are:
b1:
se(b1):
b2:
se(b2):

=INDEX(LINEST(y-values,x-values,
=INDEX(LINEST(y-values,x-values,
=INDEX(LINEST(y-values,x-values,
=INDEX(LINEST(y-values,x-values,

TRUE),1,2)
TRUE),2,2)
TRUE),1,1)
TRUE),2,1)

Fct.
Fct.
Fct.
Fct.

(9)
(10)
(11)
(12)

3.2. The simulation template


The template shown in Table 1010 reports estimated coefcients, standard errors, t-percentile
values and limits of the interval estimates (Lower Limit: LL and Upper Limit: UL). Next, we count how
many of the 100 interval estimates contain the true parameters values. Finally, we compute summary
statistics for our estimated coefcients and standard errors. Specify cells A47:B68 as shown in Table 10
(some cells are outlined in different shades of gray only to distinguish groups of similar or related cells
which we comment on shortly).
In cells A47:B48, the sample size (N) and a value are specied, for a 100(1  a)% condence
interval. The t-distribution degrees of freedom (d.f.) and t-percentile value (tc) are computed and
reported in cells A49:B50.
Cells A51:B52 and A60:B61 are used to report and compute coefcient estimates (bk, k = 1,2) and
standard errors (se(bk), k = 1,2). In the formula typed in cells B51:B52 and B60:B61, the cell references
to the x values are in Absolute format, $A6:$A45, as opposed to Relative format, as we will be using
the same x values for all 100 repetitions.
Cells A53:B54 and A62:B63 are used to compute and report interval estimates. The t-critical value
tc will be the same over all repetitions, so its cell reference in the formulas of the intervals limits is
specied in Absolute format, $B$50.
In cells A55:B55 and A64:B64, we establish and report whether the true parameters values are
contained in the interval estimates. In cells A56:B56 and A65:B65, we keep track of the number of
interval estimates that do contain the true parameters values.
10

Available from the authors upon request: gbriand@wsu.edu.

70

G. Briand, R.C. Hill / International Review of Economics Education 12 (2013) 6079

Table 10
The simulation template.

47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68

N=
a=
d.f. =
tc =
b1 =
se(b1) =
LL =
UL =
b1 in CI
Yes
average b1s =
std. dev. (b1s) =
average se(b1)s =
b2 =
se(b2) =
LL =
UL =
b2 in CI
Yes
average b2s =
std. dev. (b2s) =
average se(b2)s =

40
0.05
=B47-2
=TINV(B48,B49)
=INDEX(LINEST(B6:B45,$A6:$A45, TRUE),1,2)
=INDEX(LINEST(B6:B45,$A6:$A45, TRUE),2,2)
=B51-$B$50*B52
=B51+$B$50*B52
=IF(OR(100<B53,100>B54),No, Yes)
=COUNTIF(B55:CW55, Yes)
=AVERAGE(B51:CW51)
=STDEV(B51:CW51)
=AVERAGE(B52:CW52)
=INDEX(LINEST(B6:B45,$A6:$A45, TRUE),1,1)
=INDEX(LINEST(B6:B45,$A6:$A45, TRUE),2,1)
=B60-$B$50*B61
=B60+$B$50*B61
=IF(OR(0.1<B62,0.1>B63),No, Yes)
=COUNTIF(B64:CW64, Yes)
=AVERAGE(B60:CW60)
=STDEV(B60:CW60)
=AVERAGE(B61:CW61)

In cells A57:B58 and A66:B67 we compute and report the average and standard deviation of the
100 estimates of the intercept and slope coefcients. The average of the parameter estimates should
be close to the true parameter values because the least squares estimator is unbiased. The standard
deviations of the estimates are the actual nite sample variability of estimates in the Monte Carlo
experiment. In cells A59:B59 and A68:B68 we compute and report the average of the 100 standard
errors of the intercept and slope coefcients, which should be close to the estimates standard
deviations. This provides an opportunity to remind students that standard errors reect the sampling
variability of the estimates.
3.3. The TINV function
The TINV function returns 100(1  a/2) percentile values for a t-distribution (t-critical values) with
specied degrees of freedom. The syntax of the TINV function is:
=TINV(a, degrees of freedom)

Fct. (13)

where a is the two-tail probability.


3.4. The IF function
Use the IF and OR logical functions to indicate, for each interval estimate, whether or not it contains
the true parameter value. The general syntax for the IF function is
IF(logical_test,value_if_true,value_if_false)

Fct. (14)

where:
Logical_test is any value or expression that can be evaluated to be TRUE or FALSE. In this
exercise we want to determine whether or not the true parameter value, bk, is within the
estimated interval [LL,UL], where LL = bk  tcse(bk) and UL = bk + tcse(bk). The logical expression we

G. Briand, R.C. Hill / International Review of Economics Education 12 (2013) 6079

71

use is: if bk < LL or bk > UL. If bk is outside [LL,UL], then this expression is TRUE. Otherwise, the
expression is FALSE.
Value_if_true is the value that is returned if logical_test is TRUE. For example, if this argument is
the text string No and the logical_test argument is TRUE, then the IF function displays the text No.
Value_if_false is the value that is returned if logical_test is FALSE. For example, if this argument is the
text string Yes, and the logical_test argument is FALSE, then the IF function displays the text Yes.
3.5. The OR function
Use the OR function to write the logical_test. The general syntax of the OR function is
OR(argument_1,argument_2)

Fct. (15)

If the rst logical expression, argument_1, or the second logical expression, argument_2, is TRUE,
then the OR function returns TRUE. It returns FALSE only if both arguments are FALSE.
The general syntax for the OR function, nested in the IF function, is:
IF(OR(argument_1,argument_2),value_if_true,value_if_false)

Fct. (16)

Applied to our exercise, the nested function looks like this (which is what we have in cells B55 and
B64):
IF(OR(bk < LL, bk > UL),No,Yes)

Fct. (17)

If bk is outside [LL,UL], then the logical_test bk < LL or bk > UL is TRUE, and No is returned to
indicate that bk is not in the estimated condence interval. Otherwise, the logical expression is FALSE,
and Yes is returned to indicate that bk is in the estimated condence interval.
3.6. The COUNTIF function
Finally, we use the COUNTIF function to count the number of times bk is within the interval
estimate [LL,UL]. The COUNTIF function is a statistical function that counts the number of cells within
a range that meet a given criteria. Its general syntax is:
COUNTIF(cell_range,criteria)

Fct. (18)

Cell_Range is one or more cells to count. Criteria is the number, expression, cell reference, or text
that denes which cells will be counted. Since we are interested in counting how many interval
estimates, among all the ones we construct, actually contain the true parameter value, we count the
Yes that are generated following the application of our logical_test (this is what we do in cells B56
and B65):
COUNTIF(cell_range,Yes)

Fct. (19)

Review with students the meaning of the formulas and values in B47:B68. Copy the content of
B51:B55 to C51:CW55 and copy the content of B60:B64 to C60:CW64.
3.7. Results
Screen Shot 6 presents the repeated sampling of 100 random samples and 95% condence interval
estimatesthe accompanying table of the formulas used and addresses where the formulas were
copied to can be found in Appendix C.

72

G. Briand, R.C. Hill / International Review of Economics Education 12 (2013) 6079

Screen Shot 6:

We nd that 94 out of our 100 condence intervals contain b1, and 95 out of 100 interval estimates
contained b2 (see Screen Shot 6). Note that each replication will result in different random samples,
different interval estimates and thus a different number of intervals that will contain the true
parameters values.
With our simulation of 1000 samples, we nd that 954 out of 1000 condence intervals contained
the true parameter value, both for the intercept and slope coefcients. With our simulation of 10,000
samples we nd that 95% of both the intercept and slope coefcients interval estimates contained the
true parameters values.
Finally by computing the summary statistics of the Monte Carlo estimates we can illustrate that the
standard errors se(bs,k), s = 1, . . ., S and k = 1,2, measure the sampling variation in the estimates bs,k. The

G. Briand, R.C. Hill / International Review of Economics Education 12 (2013) 6079

73

average of the standard errors for each coefcient is close to the standard deviation of S estimates of
each coefcient (B59 versus B58, and B68 versus B67 on Screen Shot 6).
3.8. Extensions
The exercise developed thus far can be modied or extended to investigate additional concepts.
Some suggestions are:
 Effect of changes in variation of the x values on se(bk);
 Effect of random errors from alternative distributions on the sampling distribution of the bk
estimators, as shown in Appendix B;
 Effect of rescaling x and/or y values on bk estimates; and
 Effect of changes in sample size on se(bk).
4. Conclusion
This paper presented a step-by-step guide to Microsoft Excel fundamentals and two Monte Carlo
simulations exercises. The rst Monte Carlo simulation is a repeated sampling exercise showing that
least squares estimators are unbiased. The second one expands on the rst to explain the true meaning
of condence interval estimates of least squares estimators. In these two exercises, the sampling
distribution concept is presented in the context of the regression analysis, as Kennedy (2001) suggests
it should be introduced.
The use of Excel requires students to understand what they are doing. We hope that the experience
of using Excel in an undergraduate econometrics course will be akin to that of programming
procedures in a graduate econometrics text such as in Mittelhammer et al. (2000 p.713). The way we
suggest to use Excel in an undergraduate econometrics course is different than what Barreto and
Howland (2006) propose. They use Excel workbooks powered by Visual Basic macros that enable
Monte Carlo simulations to be run by students with a click of a button (p. i). While Barreto and
Howland use of Excel would seem paramount to what Day (1987) refers to as a canned program in
the context of macro-economic simulation exercises, ours would rather presents the advantages of a
student-built model (p. 351), giving students active learning opportunities.
Becker and Greene (2001) point out that the starting point for any course in statistics and
econometrics is the calculation and use of descriptive statistics and mastery of basic spreadsheets
skills (emphasis added) (p. 173). Some students are still not familiar with Excel, and a few are even
reluctant to use it; but at the end of a course using Excel to teach undergraduate econometrics,
students and instructors alike, come out of the experience with a stronger understanding of core
econometrics concepts and with better Excel skills.
Acknowledgement
We are grateful to anonymous referees for helpful comments and suggestions.
Appendix A. Microsoft Excel fundamentals (Using Excel 2007)
A.1. Starting Excel
Find the Excel shortcut on your desktop and double click on it to start Excel (left clicks).
Screen Shot A1:

74

G. Briand, R.C. Hill / International Review of Economics Education 12 (2013) 6079

Alternatively, left-click the Windows Start button at the bottom left corner of your computer
screen.
Screen Shot A2:

Slide your mouse over All programs, Microsoft Ofce, and nally Microsoft Ofce Excel 2007.
Left-click on the last one to start Excel. To create a shortcut instead, right-click on it; slide the mouse
over Send to, and then select (i.e. drag the mouse over and left-click on) Desktop (create shortcut).
From the Windows Start button, an easier way yet to start Excel is to select Run, type Excel in the
Open window of the Run dialog box and select the OK button or simply press your Enter key.
Screen Shot A3:

Excel opens to a new le, titled Book1. Find the name of the open le on the very top of the Excel
window, on the Title bar. An Excel le like Book1 contains several sheets. By default, Excel opens to
Sheet1 of Book1. Determine which sheet is open by looking at the Sheet tabs found in the lower left
corner of the Excel window.
Your screen might look slightly different than the one shown below. If your computer screen is
bigger, Excel will automatically display more of its available options. For example, in the Styles group
of command, instead of the Cell styles button, there might be a colorful display of cell styles.
Screen Shot A4:

G. Briand, R.C. Hill / International Review of Economics Education 12 (2013) 6079

75

The Active cell is surrounded by a border and is in Column A and Row 1; its Cell reference is
A1. Below the title bar is a Tab list. The Home tab is the one Excel opens to. Under each tab we
nd groups of commands. Under the home tab, the rst one is the Clipboard group of commands,
named after the tasks it relates to. The wide bar including the tab list and the groups of commands
is referred to as the Ribbon. The content of the Active cell shows up in the Formula bar (as shown
above, there is nothing in it). Perhaps the most important of all of this is to locate the Help button
on the upper right corner of the Excel window. Finally, we can use the Scroll bars and the arrows
around them to navigate up-down and right-left in the worksheet. And we potentially have a long
way to go: each worksheet in Microsoft Excel 2007 contains 1,048,576 rows and 16,384
columns!!!!
A.2. Entering data and carrying out calculations
To enter labels and data into an Excel worksheet move the cursor to a cell (i.e. drag our mouse over
and left-click on) and begin typing. Type X in cell A1, then press the Enter key to get to cell A2 or
navigate by moving the cursor with the mouse, or use the Arrow keys (to move right, left, up or down).
Fill in the rest of the worksheet as shown below in Table A1.
Excels primary usefulness is to carry out repeated calculations. We can add, subtract, multiply and
divide; and apply mathematical and statistical functions to the data in a worksheet. To illustrate, we
are going to compute the squares of the numbers we just entered. There are two main ways to perform
calculations in Excel. One is to write formulas using arithmetic operators, which we demonstrate
below; the other is to write formulas using mathematical functionsthese and other functions will be
used in the main section of this paper.
Place the cursor in cell B2. We want to compute the square of the value from cell A2. Let us
emphasize that the trick to using Excel efciently is NOT to re-type values already stored in the
worksheet, but instead to use references of cells where the values are stored. So, to compute the square
of 10, which is the value stored in cell A2, instead of typing the formula =10*10, type the formula
=A2*A2 or =A2^2 (the asterisk and the caret can be typed by simultaneously pressing the Shift key and
the * or ^ key) as shown in Table A2.
Press Enter. Note that: (1) a formula always starts with an equal sign; this is how Excel
recognizes it is a formula, and (2) formulas are not case sensitive, so we could also have typed
=a2^2 instead.
The way Excel understands the instructions we gave in cell B2 is square the value found at the
address A2. It is important to fully understand how Excel interprets address A2. To Excel address
A2 means from where you are at, go left by one cellbecause this is where A2 is located vis-a`-vis
B2. In other words, an address gives directions: left-right, up-down, and distances: number of cells
awayall in reference to the cell where the formula is entered.
Table A1
Entering data.

1
2
3

x
10
20

Table A2
Carrying out calculations.

1
2
3

x
10
20

y
=A2^2

76

G. Briand, R.C. Hill / International Review of Economics Education 12 (2013) 6079

Table A3
Absolute cell reference.

1
2
3

x
10
20

y
=$A$2^2

To copy the formula in cell B2 to cell B3 place the cursor back into cell B2, and move it to the southeast corner of the cell, until the fat cross turns into a skinny one, as shown below:
Screen Shot A5:

Then left-click, hold it, drag it down to the next cell below, and release!
Excel has copied the formula we typed in cell B2 into the cell below. The formula entered in cell B2
instructed Excel to collect the value stored one-cell away from its left, and then square itthose exact
same instructions are now found in cell B3. Place your cursor back into B3, and look at the Formula
bar. You can see that, in this cell, these same instructions translate into =A3^2, and the value
computed is thus 400.
Screen Shot A6:

Now, if we had wanted the values in B2 and B3 to be equal to the square of the value stored in cell
A2, then we would transform the cell reference in cell B2 from a Relative Cell Reference into an
Absolute Cell Reference, before copying it to cell B3. This ensures that an address does NOT change
when we copy a formula from one cell to another.
A Relative Cell Reference is made into an Absolute Cell Reference by preceding both the row
and column references by a dollar sign. Place the cursor back in cell B2 (i.e. move the mouse over
and left-click), and in the Formula bar, place the cursor before the A, insert a dollar sign (by
pressing the Shift key and the $ key at the same time); then move the cursor before the rst 2 and
insert another dollar sign; and nally, place the cursor at the end of the formula and press Enter.
See Table A3.
Copy the formula from cell B2 to cell B3. Place the cursor back into B3, and look at the Formula bar.
You can see that, this time, the formula is still =$A$2^2which means the instruction (or address)
given to Excel has not changed. The value computed is thus 100.
Appendix B. The inversion method for generating random deviates
Using the inversion method is an opportunity to explain to students the usefulness of change-ofvariable techniques and cumulative distribution functions.

G. Briand, R.C. Hill / International Review of Economics Education 12 (2013) 6079

77

B.1. Distributions of functions of random variables


Let X be a continuous random variable with probability density function, pdf, f(x). Let Y = g(X) be a
function that is strictly increasing or strictly decreasing. This condition ensures that the function is
one-to-one, so that there is exactly one Y value for each X value, and exactly one X value for each Y
value. The importance of this condition on g(X) is that we can solve Y = g(X) for X. That is, we can nd an
inverse function X = w(Y). Then the pdf for Y is given by


dwy

hy f wy  
(A1)
dy 
where j j denotes the absolute value. This is called the change-of-variable technique.11 As an example,
let X be a continuous random variable with pdf f(x) = 2x for 0 < x < 1. Let Y = g(X) = 2X. The inverse
function is X wY Y=2 and dwy=dy 1=2, so hy f wy  jdwy=dtj 2y=2  j1=2j y=2
for 0 < x < 2.
B.2. The inversion method for generating random values
This method is an application of the change-of-variable technique. We wish to draw a random
value from a continuous probability distribution with pdf f(y) and cdf F(y). We will use the fact that
dF(y)/dy = f(y). Also, pdf of a uniformly distributed random variable, u, on the (0,1) interval is f(u) = 1 for
0<u<1 otherwise f(u) = 0. The steps are
1.
2.
3.
4.

Draw a uniform random number u1 in the (0,1) interval.


Let u1 = F(y1) where F(y) is the cdf of the desired random deviate.
Solve the equation in step 2 for y1 using y1 = F1(u1) = g(u1).
The value y1 is a random value from the target pdf f(y).

From the change-of-variable technique, the inverse function wy Fy, so that


f wy  jdwy=dyj 1  jdFy=dyj f y.
As an illustration suppose our target distribution is the pdf f(y) = 2y for 0 < y < 1. The corresponding
cdf is F(y) = y2. Set a uniformly distributed random number u1 Fy1 y21 . Solve for y1 = F1(u1) = (u1)1/2.
This value is a random draw from the distribution f(y) = 2y.
Exactly the same strategy is used for creating a random draw from a normal distribution. Suppose
our objective is to draw a random value from a normal distribution, with mean m and variance s2. Let
the pdf of the normal density be denoted f(y) and the corresponding cdf F(y). Using the inversion
method we obtain a random draw from this density by nding y1 = F1(u1) where u1 is a uniform
random number in the (0,1) interval. The Excel function NORMINV is F1 and RAND provides the
uniform random value.
It is worth noting that once this technique is in place there are endless possibilities for modifying
the structure of the Monte Carlo experiment. We might have students examine the probability
distribution of the least squares estimators when the regression disturbances are distributed
something other than normal. Several examples for using errors with mean 0 and variance s2 are:
 Triangular random errors. Generate random values from the distribution f(y) = 2y for 0 < y < 1 as
discussed above. The mean and variance of this distribution are 2/3 and 1/18,prespectively.
For the

regression random errors to have mean 0 and variance s2 use e s y  2=3= 18. Students will be
surprised to nd that even with modest sized samples that the least squares estimators
distributions are not far from normal.
 Chi-square random errors.12 Generate standard normal random values y with mean 0 and variance
1, the standard normal, following the steps described above. Then y2 has a chi-square distribution

11
12

For a description of the method at the undergraduate level, see Hill et al. (2011).
See, for example, Cameron and Trivedi (2010).

G. Briand, R.C. Hill / International Review of Economics Education 12 (2013) 6079

78

with 1 degree of freedom, implying that it has mean 1 andpvariance


2. For the regression random

errors to have mean 0 and variance s2 use e s y2  1= 2


 Extreme value random errors. The extreme value density is
f y expy  expexpy

(A2)

which has cdf F(y) = exp( exp( y)). Despite its imposing form, we can obtain random values from
this distribution using the inversion method, since y = F1(u) =  ln( ln(u)) where u is a uniformly
distributed random value. The mean and variance of this distribution are 0.57722 and p2/6.13 For the
p
regression random errors to have mean 0 and variance s2 use e s y  0:57722= p2 =6
Random values from many distributions can be similarly formed. Books of statistical distributions
provide formulas for generating random deviates given uniform random values.
With Monte Carlo studies based on these alternative non-normal distributions, it is easy to
illustrate using a histogram how effective the Central Limit Theorem can be, resulting in nearly
normally distributed least squares estimators in even moderately sized samples. Furthermore the
performance of interval estimators, or hypothesis tests, can be studied under these alternative
assumptions as we have illustrated in Section 3.
Appendix C. Repeated sampling and interval estimation
Screen Shot 6 accompanying table of the formulas used and addresses where the formulas were
copied to.
Key cell formulas
Cell

formula

Copied to

A6
A26
B6
B26
B49
B50
B51
B52
B53
B54
B55
B56
B57
B58
B59
B60
B61
B62
B63
B64
B65
B66
B67
B68

=$B$2
=$B$3
=$E$2+$E$3*$B$2+NORMINV(RAND(),0,$E$1)
=$E$2+$E$3*$B$3+NORMINV(RAND(),0,$E$1)
=B47-2
=TINV(B48,B49)
=INDEX(LINEST(B6:B45,$A6:$A45,TRUE),1,2)
=INDEX(LINEST(B6:B45,$A6:$A45,TRUE),2,2)
=B51-$B$50*B52
=B51+$B$50*B52
=IF(OR(100<B53,100>B54),No,Yes)
=COUNTIF(B55:CW55, Yes)
=AVERAGE(B51:CW51)
=STD(B51:CW51)
=AVERAGE(B52:CW52)
=INDEX(LINEST(B6:B45,$A6:$A45,TRUE),1,1)
=INDEX(LINEST(B6:B45,$A6:$A45,TRUE),2,1)
=B60-$B$50*B61
=B60+$B$50*B61
=IF(OR(0.1<B62,0.1>B63),No, Yes)
=COUNTIF(B64:CW64, Yes)
=AVERAGE(B60:CW60)
=STD(B60:CW60)
=AVERAGE(B61:CW61)

A7:A25
A27:A45
B7:B25, C6:CW25
B27:B45, C26:CW45

C51:CW51
C52:CW52
C53:CW53
C54:CW54
C55:CW55

C60:CW60
C61:CW61
C62:CW62
C63:CW63
C64:CW64

References
Barreto, H., Howland, F.M., 2005. Excel Add-Ins for Introductory Econometrics using Monte Carlo Simulation with Microsoft
Excel. Cambridge University Press, New York Available at: http://www3.wabash.edu/econometrics/EconometricsBook/
Basic%20Tools/ExcelAddIns/index.htm.

13

For example, Forbes et al. (2011).

G. Briand, R.C. Hill / International Review of Economics Education 12 (2013) 6079

79

Barreto, H., Howland, F.M., 2006. Introductory Econometrics using Monte Carlo Simulation with Microsoft Excel. Cambridge
University Press, New York.
Becker, W.E., Greene, W.H., 2001. Teaching statistics and econometrics to undergraduates. Journal of Economic Perspectives 15
(4) 169182.
Briand, G., Hill, R.C., 2012. Using Excel 2007 for Principles of Econometrics, third edition. John Wiley & Sons, New York.
Cahill, M., Kosicki, G., 2000. Exploring economic models using Excel. Southern Economic Journal 66 (3) 770792.
Cameron, A.C., Trivedi, P., 2010. Microeconomics Using Sata, Revised Edition. Stata Press, College Station, TX.
Craft, R.K., 2003. Using spreadsheets to conduct Monte Carlo experiments for teaching introductory econometrics. Southern
Economic Journal 69 (3) 716735.
Day, E., 1987. A note on simulation models in the economics classroom. Journal of Economic Education 18 (4) 351356.
Forbes, C., Evans, M., Hastings, N., Peacok, B., 2011. Statistical Distributions, fourth edition. John Wiley & Sons, Hoboken, NJ
pp. 99.
Gujarati, D.N., Porter, D.C., 2009. Basic Econometrics, fth edition. McGraw-Hill/Irwin, New York.
Hill, R.C., Grifths, W.E., Lim, G.C., 2011. Principles of Econometrics, fourth edition. John Wiley & Sons, New York.
Judge, G., 1999. Simple Monte Carlo studies on a spreadsheet. Computers in Higher Education Economics Review (online) 13 (2) ,
Available at:In: http://www.economicsnetwork.ac.uk/cheer/ch13_2/ch13_2p12.htm.
Kennedy, P.E., 1998a. Using Monte Carlo studies for teaching econometrics. In: Becker, W.E., Watts, M. (Eds.), Teaching
Economics to Undergraduates: Alternatives to Chalk and Talk. Edward Elgar, Cheltenham and Northampton, pp. 141159.
Kennedy, P.E., 1998b. Teaching undergraduate econometrics: a suggestion for fundamental change. American Economic
Review, Papers and Proceedings 88 (2) 487491.
Kennedy, P.E., 2001. Bootstrapping Student understanding of what is going on in econometrics. Journal of Economic Education
32 (2) 110123.
Kennedy, P.E., 2008. A Guide to Econometrics, 6th edition. Blackwell Publishing, Malden, MA.
Kyd, C. 2011. An Excel tutorial: an introduction to Excels normal distribution functions. ExcelUSerFor Business Users of
Microsoft Excel. Excel User, Inc. Available at: http://www.exceluser.com/explore/statsnormal.htm.
Microsoft. 2011. Description of the RAND Function in Excel. Microsoft Support. Microsoft Corporation. Available at: http://
support.microsoft.com/kb/828795.
Mittelhammer, R.C., Judge, G.G., Miller, D.J., 2000. Econometric Foundations. Cambridge University Press, New York.
Murray, M.P., 1999. Econometrics lectures in a computer classroom. Journal of Economic Education 30 (3) 308321.
Murray, M.P., 2006. Econometrics: A Modern Introduction. Pearson Education, Boston.
Ragsdale, C.T., 2008. Managerial Decision Modeling. South-Western Cengage Learning, Mason.
Weisstein, E.W. 2011. BoxMuller Transformation. MathworldA Wolfram Web Resource. Wolfram Research, Inc. Available at:
http://mathworld.wolfram.com/Box-MullerTransformation.html.
Wikipedia. 2011. Inverse Transform Sampling. WikipediaThe Free Encyclopedia. Wikimedia Foundation, Inc. Available at:
http://en.wikipedia.org/wiki/Inverse_transform_sampling.
Wikipedia. 2011. Random Number Generation. WikipediaThe Free Encyclopedia. Wikimedia Foundation, Inc. Available at:
http://en.wikipedia.org/wiki/Random_number_generation.

You might also like