Professional Documents
Culture Documents
probability to solve problems. The term Monte Carlo Method was coined by S. Ulam and
Nicholas Metropolis in reference to games of chance, a popular attraction in Monte
Carlo, Monaco (Hoffman, 1998; Metropolis and Ulam, 1949).
Computer simulation has to do with using computer models to imitate real life or
make predictions. When you create a model with a spreadsheet like Excel, you have a
certain number of input parameters and a few equations that use those inputs to give you
a set of outputs (or response variables). This type of model is usually deterministic,
meaning that you get the same results no matter how many times you re-calculate. [
Example 1: A Deterministic Model for Compound Interest ]
most closely matches data we already have, or best represents our current state of
knowledge. The data generated from the simulation can be represented as probability
distributions (or histograms) or converted to error bars, reliability predictions, tolerance
zones, and confidence intervals. (See Figure 2).
Uncertainty Propagation
Step 2:
Step 3:
Step 4:
Step 5:
Analyze the results using histograms, summary statistics, confidence intervals, etc.
breaking Income and Expenses down into more fundamental and measurable quantities,
we found that the number of leads (L) affected both income and expenses. Therefore,
income and expenses are not independent. We could probably break the problem down
even further, but we won't in this example. We'll assume that L, R, P, H, and C are all
independent.
Note: In my opinion, it is easier to decompose a model into independent variables
(when possible) than to try to mess with correlation between random inputs.
Let's say we want to run n=5000 evaluations of our model. This is a fairly low number
when it comes to Monte Carlo simulation, and you will see why once we begin to analyze
the results.
A very convenient way to organize the data in Excel is to make a column for each
variable as shown in the screen capture below.
Although we still need to analyze the data, we have essentially completed a Monte Carlo
simulation. Because we have used the volatile RAND() formula, to re-run the simulation
all we have to do is recalculate the worksheet (F9 is the shortcut).
This may seem like a strange way to implement Monte Carlo simulation, but think about
what is going on behind the scenes every time the Worksheet recalculates: (1) 5000 sets
of random inputs are generated (2) The model is evaluated for all 5000 sets. Excel is
handling all of the iteration.
Until I get around to providing another example that uses macros, let me just say that if
your model is not simple enough to include in a single formula you can create your own
custom Excel function (see my article on user-defined functions), or you can create a
macro to iteratively evaluate your model and dump the data into a worksheet in a similar
format to this example.
In practice, it is usually more convenient to buy an add-on for Excel than to do a Monte
Carlo analysis from scratch every time. But not everyone has the money to spend, and
hopefully the skills you will learn from this example will aid in future data analysis and
modeling.
The histogram tells a good story, but in many cases, we want to estimate the probability
of being below or above some value, or between a set of specification limits.
[ Running MC ]
[ Summary Statistics ]
This is probably the easiest method, but you have to re-run the tool each to you do a new
simulation. AND, you still need to create an array of bins (which will be discussed
below).
Method 2: Using the FREQUENCY function in Excel.
This is the method used in the spreadsheet for the sales forecast example. One of the
reasons I like this method is that you can make the histogram dynamic, meaning that
every time you re-run the MC simulation, the chart will automatically update. This is how
you do it:
Step 1: Create an array of bins
The figure below shows how to easily create a dynamic array of bins. This is a basic
technique for creating an array of N evenly spaced numbers.
To create the dynamic array, enter the following formulas:
B6 = $B$2
B7 = B6+($B$3-$B$2)/5
Then, copy cell B7 down to B11
The next figure is a screen shot from the example Monte Carlo simulation. I'm not going
to explain the FREQUENCY function in detail since you can look it up in the Excel's
help file. But, one thing to remember is that it is an array function, and after you enter the
formula, you will need to press Ctrl+Shift+Enter. Note that the simulation results (Profit)
are in column G and there are 5000 data points ( Points: J5=COUNT(G:G) ).
Figure 4: Example Histogram Created Using a Scatter Plot and Error Bars.
[ Running MC ]
[ Summary Statistics ]
Summary Statistics
Sales Forecast Example - Part IV of V
In Part III of this Monte Carlo Simulation example, we plotted the results as a histogram
in order to visualize the uncertainty in profit. In order to provide a concise summary of
the results, it is customary to report the mean, median, standard deviation, standard
error, and a few other summary statistics to describe the resulting distribution. The
screenshot below shows these statistics calculated using simple Excel formulas.
The mean is also known as the "First Moment" of the distribution. In relation to physics, if the probability distribution represented mass, then the mean would be the balancing
point, or the center of mass.
If you sort the results from lowest to highest, the median is the "middle" value or the
50th Percentile, meaning that 50% of the results from the simulation are less than the
median. If there is an even number of data points, then the median is the average of the
middle two points.
Extreme values can have a large impact on the mean, but the median only depends upon
the middle point(s). This property makes the median useful for describing the center of
skewed distributions such as the Lognormal distribution. If the distribution is symmetric
(like the Normal distribution), then the mean and median will be identical.
Skewness
Skewness describes the asymmetry of the distribution relative to the mean. A positive
skewness indicates that the distribution has a longer right-hand tail (skewed towards more
positive values). A negative skewness indicates that the distribution is skewed to the left.
Kurtosis
Kurtosis describes the peakedness or flatness of a distribution relative to the Normal
distribution. Positive kurtosis indicates a more peaked distribution. Negative kurtosis
indicates a flatter distribution.
Standard Error
If you repeated the Monte Carlo simulation and recorded the sample mean each time, the
distribution of the sample mean would end up following a Normal distribution (based
upon the Central Limit Theorem). The standard error is a good estimate of the
standard deviation of this distribution, assuming that the sample is sufficiently large (n
>= 30).
The standard error is calculated using the following formula:
In Excel: =STDEV(G:G)/SQRT(COUNT(G:G))
To get a 90% or 99% confidence interval, you would change the value 1.96 to 1.645 or
2.575, respectively. The value 1.96 represents the 97.5 percentile of the standard normal
distribution. (You may often see this number rounded to 2). To calculate a different
percentile of the standard normal distribution, you can use the NORMSINV() function in
Excel.
Commentary
Keep in mind that confidence intervals make no sense (except to statisticians), but they
tend to make people feel good. The correct interpretation: "We can be 95% confident
that the true mean of the population falls somewhere between the lower and upper
limits." What population? The population we artificially created! Lest we forget, the
results depend completely on the assumptions that we made in creating the model and
choosing input distributions. "Garbage in ... Garbage out ..." So, I generally just stick to
using the standard error as a measure of the uncertainty in the mean. Since I tend to use
Monte Carlo simulation for prediction purposes, I often don't even worry about the
mean. I am more concerned with the overall uncertainty (i.e. the spread).