M 2008 Meyer Folleto Statistics and Data Analysis in Geology

Statistics and Data Analysis in Geology
6. Normal Distribution, Probability Plots, Central Limit Theorem
Dr. Franz J. Meyer, Earth and Planetary Remote Sensing, University of Alaska Fairbanks
Statistics & Data Analysis in Geology
Franz Meyer
Normal Distribution
An Enormously Important Distribution
The normal distribution is the most commonly used distribution in statistics.

This is partly because the normal distribution is a reasonable description of many processes, from industrial processes to intelligence test scores. Also, under specific conditions one can assume that sampling distributions are normally distributed even if the samples are drawn from populations that are not normally distributed (this is discussed further under the Central Limit Theorem). The normal distribution is also referred to as the bell curve; a few examples are shown below.
There are infinitely many normal distributions, which differ according to their mean ($\mu$) and variance ($\sigma^2$).
Many natural processes approximately follow the normal distribution. The shape of a normal distribution corresponds to that of a binomial distribution with p = 0.5 (compare the coin-toss example of lecture 5). As N becomes large, the function becomes continuous and can be represented by the following equation:

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}, \qquad -\infty < X < \infty$$

It can also be thought of as the large-N limit of

$$P(X) = \frac{N!}{X!\,(N-X)!}\, p^X q^{N-X}, \qquad p = q = 0.5$$
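The statement that the binomial distribution with p = 0.5 approaches the normal curve can be checked numerically. The sketch below (Python 3.8+, standard library only; the function names are illustrative, not from the lecture) compares the binomial probabilities with the normal density of matching mean $Np$ and standard deviation $\sqrt{Npq}$, and reports the largest gap for increasing N.

```python
import math

def binomial_pmf(x: int, n: int, p: float = 0.5) -> float:
    """N!/(X!(N-X)!) p^X q^(N-X): probability of x successes in n trials."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density f(X) with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def max_abs_difference(n: int, p: float = 0.5) -> float:
    """Illustrative helper: largest gap between the binomial pmf and the matching normal density."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return max(abs(binomial_pmf(x, n, p) - normal_pdf(x, mu, sigma)) for x in range(n + 1))

for n in (10, 100, 1000):
    print(n, max_abs_difference(n))
```

The gap shrinks steadily as N grows, which is the convergence described above.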
Figure: Approximation of a histogram by a normal distribution.
Figure: Two normal distributions with different means but the same standard deviation.
A normal distribution is fully characterized by only two parameters, $\mu$ and $\sigma$.

Figure: Two normal distributions with the same mean but different standard deviations.
The Standard Normal Distribution or Z Distribution
It is often useful to standardize the variables so that populations can be compared. Standardization means transforming the data so that the mean is $\mu = 0$ and the standard deviation is $\sigma = 1$. The equation then becomes:

$$f(X) = \frac{1}{\sqrt{2\pi}}\, e^{-X^2/2}, \qquad -\infty < X < \infty$$

and the curve is expressed in numbers of standard deviations from the mean.
So you convert a normal distribution to the Z distribution by converting the original values to standard scores, which allows comparison among populations with different means and variances. This works because all normal distributions share the following characteristics:
- symmetry
- unimodality
- a continuous range from $-\infty$ to $+\infty$
- a total area under the curve of 1
- a common value for the mean, median, and mode

We can therefore make some statements about how the data are distributed within any normal distribution:
- about 68% of the data fall within $\pm 1\sigma$ of the mean
- about 95% of the data fall within $\pm 2\sigma$
- about 99.7% of the data fall within $\pm 3\sigma$
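A quick numerical check of these percentages, as a minimal sketch: for a standard normal variable, the fraction within $\pm k\sigma$ of the mean is $\mathrm{erf}(k/\sqrt{2})$, which Python's `math.erf` evaluates directly. The helper name `fraction_within` is hypothetical.

```python
import math

def fraction_within(k: float) -> float:
    """Hypothetical helper: fraction of a normal population within +/- k standard deviations.

    For a standard normal variable Z, P(|Z| <= k) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {fraction_within(k):.4f}")
```

This prints approximately 0.6827, 0.9545 and 0.9973, i.e. the 68%, 95% and 99.7% quoted above.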
Figure: Standardization of normal random variables.
For any sample, standardizing the data is called the Z-transformation. For every point we calculate a Z-score, which measures how many standard deviations the point lies from the mean:

$$Z_i = \frac{X_i - \bar{X}}{S} \qquad \text{or} \qquad Z_i = \frac{X_i - \mu}{\sigma}$$

depending on whether you are dealing with a sample or a population. Z-scores can be positive or negative.
Example:
A shell specimen with a value of 12 mm ($X = 12$) is drawn from a population with $\mu = 10$, $\sigma = 2$. What is that specimen's Z-score?

$$Z = \frac{12 - 10}{2} = \frac{2}{2} = 1$$

That is, the specimen is one standard deviation longer than the mean. What if the same specimen is drawn from a population with $\mu = 10$, $\sigma = 1$ (same mean, different variance)?

$$Z = \frac{12 - 10}{1} = \frac{2}{1} = 2$$

In absolute terms the specimen is the same distance from the mean; relative to the population as a whole, however, it is farther away (more anomalous).
Example (continued):
What if a different specimen ($X = 14$) is drawn from the population of the first case, with $\mu = 10$, $\sigma = 2$?

$$Z = \frac{14 - 10}{2} = \frac{4}{2} = 2$$

So this specimen is in the same position relative to its population as the specimen in the second case.
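The Z-transformation is a one-line formula. The sketch below reproduces the three shell examples with the population form, then applies the sample form to a small data set using `statistics.mean` and `statistics.stdev` (which divides by n - 1). The helper names and the five values in `lengths` are hypothetical illustration data, not from the lecture.

```python
import statistics

def z_score(x: float, mean: float, sd: float) -> float:
    """Hypothetical helper: number of standard deviations by which x lies from the mean."""
    return (x - mean) / sd

# Shell examples from the text (mm)
print(z_score(12, 10, 2))  # one standard deviation above the mean
print(z_score(12, 10, 1))  # same distance in mm, but more anomalous
print(z_score(14, 10, 2))  # same relative position as the previous case

def z_transform(values: list) -> list:
    """Hypothetical helper: Z-transform a sample using its mean and sample standard deviation S."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - mean) / s for v in values]

lengths = [8.0, 9.5, 10.0, 11.0, 12.5]  # made-up shell lengths (mm)
z = z_transform(lengths)
print([round(v, 2) for v in z])
```

After the transformation the values have mean 0 and standard deviation 1, whatever the original units.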
Figure: The specimens plotted on a length axis (10 to 16 mm) and the corresponding Z-score axis.
For each normal distribution, the area under the curve is equal to 1; that is, the total probability is equal to 1 (as it was with the binomial distribution). Mathematically we can express this as:

$$\int_{-\infty}^{+\infty} f(X)\,dX = 1$$

For Z-transformed data this is:

$$\int_{-\infty}^{+\infty} f(X)\,dX = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-X^2/2}\,dX = 1$$
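The unit area can be confirmed by numerical integration. This is a minimal sketch using a hand-written composite trapezoidal rule (an assumption of this example, not a method from the lecture), with the infinite limits replaced by $\mu \pm 10\sigma$, beyond which the tails are negligible. The second call uses the ore-grade parameters ($\mu = 12$, $\sigma = 1.6$) from the examples that follow.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Normal density f(X); the defaults give the standard (Z) form."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def trapezoid(f, a: float, b: float, steps: int = 10_000) -> float:
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# mu +/- 10 sigma stands in for the infinite integration limits
area_standard = trapezoid(normal_pdf, -10.0, 10.0)
area_ore = trapezoid(lambda x: normal_pdf(x, 12.0, 1.6), 12.0 - 16.0, 12.0 + 16.0)
print(area_standard, area_ore)
```

Both results agree with 1 to many decimal places, independent of $\mu$ and $\sigma$.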
Similarly, we can calculate the probability of a sample being less than or equal to some preset value Z as

$$P(X \le Z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z} e^{-X^2/2}\,dX$$

A different way to represent the normal distribution is by cumulative probability: plots of the area under the curve versus X. They can be made for any distribution. These plots are called ogive plots, and I will come back to them later.
For the normal distribution, it is a pain in the neck to calculate this integral for every problem that we are going to do, so tables have been constructed.
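Such a table can also be generated in a few lines. The sketch below uses the standard closed form $\Phi(z) = \tfrac{1}{2}\left[1 + \mathrm{erf}(z/\sqrt{2})\right]$ for the cumulative probability instead of integrating numerically; the function names and the table range and step are illustrative choices (Python 3.9+).

```python
import math

def phi(z: float) -> float:
    """Cumulative standard normal probability P(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cumulative_table(z_min: float = -3.0, z_max: float = 3.0,
                     step: float = 0.1) -> list[tuple[float, float]]:
    """Illustrative helper: rows of (z, P(Z <= z)) from z_min to z_max in increments of step."""
    n = round((z_max - z_min) / step)
    return [(round(z_min + i * step, 10), phi(z_min + i * step)) for i in range(n + 1)]

for z, p in cumulative_table(0.0, 3.0, 0.5):
    print(f"{z:5.2f}  {p:.4f}")
```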
Table: Cumulative probability of the standard normal distribution, from $-\infty$ up to Z.
The numbers in the table below are answers to the question: What is the Z value corresponding to a particular area under the curve?
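The inverse lookup (area in, Z out) is available in Python's standard library as `statistics.NormalDist.inv_cdf` (Python 3.8+). A minimal sketch; the wrapper name `z_for_area` and the chosen areas are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def z_for_area(area: float) -> float:
    """Hypothetical wrapper: Z whose cumulative area from minus infinity equals `area` (0 < area < 1)."""
    return standard_normal.inv_cdf(area)

for area in (0.50, 0.90, 0.95, 0.975, 0.99):
    print(f"{area:.3f} -> Z = {z_for_area(area):.3f}")
```

For example, an area of 0.975 gives Z of about 1.96, the value behind the familiar 95% two-sided range.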
Table: Z values corresponding to selected areas under the standard normal curve.
Example of Cumulative Probability
Grades of chip samples from a body of ore have a normal distribution with a mean of 12% ($\mu$) and a standard deviation of 1.6% ($\sigma$).
(The curve to the right helps visualize the distribution.)
Problem 1: Find the probability of a specimen of 15% or less

Calculate the Z-score:
$Z = (15 - 12)/1.6 = +1.875 \approx +1.88$
The chart on slide 13 gives the cumulative probability from very small values (minus infinity) up to the given value:
$Z = +1.88 \rightarrow 0.97$ (we have to interpolate between +1.8 and +1.9)
Make a sketch to see if this makes sense. So the probability of finding a sample with 15% ore or less is about 97%.
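The table interpolation in Problem 1 can be compared with the exact value. A minimal sketch, assuming a table in steps of 0.1 (as the interpolation between +1.8 and +1.9 suggests) and using the error-function form of the cumulative probability; variable names are illustrative.

```python
import math

def phi(z: float) -> float:
    """Cumulative standard normal probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 12.0, 1.6          # ore grade, percent
z = (15.0 - mu) / sigma        # 1.875

# Linear interpolation between tabulated values at Z = 1.8 and Z = 1.9
z_lo, z_hi = 1.8, 1.9
p_interp = phi(z_lo) + (z - z_lo) / (z_hi - z_lo) * (phi(z_hi) - phi(z_lo))

p_exact = phi(z)
print(f"Z = {z:.3f}, interpolated P = {p_interp:.4f}, exact P = {p_exact:.4f}")
```

Linear interpolation slightly underestimates here because the curve is concave above the mean, but both round to 0.97.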
Problem 2: What is the probability of finding ore greater than 14%?

$Z = (14 - 12)/1.6 = +1.25$. The probability associated with this Z-score is 0.895; this is the probability of 14% or less. The probability of more than 14% is therefore $1 - 0.895 = 0.105$, so the probability of finding a sample with more than 14% ore is 10.5%.
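Instead of standardizing by hand, `statistics.NormalDist` (Python 3.8+) can evaluate the cumulative probability of the ore-grade distribution directly, which is equivalent to computing Z first. A minimal sketch of Problem 2 using the complement rule; variable names are illustrative.

```python
from statistics import NormalDist

ore = NormalDist(mu=12.0, sigma=1.6)   # ore grade in percent

p_at_most_14 = ore.cdf(14.0)           # P(X <= 14%), i.e. Z = 1.25
p_more_than_14 = 1.0 - p_at_most_14    # complement rule for the upper tail
print(f"P(X <= 14%) = {p_at_most_14:.4f}, P(X > 14%) = {p_more_than_14:.4f}")
```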
Problem 3: What is the probability of finding ore grade of less than 8%?
$Z = (8 - 12)/1.6 = -2.5$. The probability associated with this Z-score is 0.0062, so the probability of finding a sample with less than 8% ore is 0.62%: not very likely.
Problem 4: What is the probability of a sample being between 8% and 15%?

Calculate the Z-score for each value:
$Z_8 = (8 - 12)/1.6 = -2.5 \rightarrow 0.62\%$
$Z_{15} = (15 - 12)/1.6 = +1.88 \rightarrow 97\%$
Subtract the smaller from the larger: $97\% - 0.62\% = 96.38\%$, so about 96% of all samples fall in that range.
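The same subtraction of two cumulative probabilities, written as a small function. A minimal sketch using `statistics.NormalDist`; the helper name `prob_between` is hypothetical.

```python
from statistics import NormalDist

ore = NormalDist(mu=12.0, sigma=1.6)   # ore grade in percent

def prob_between(dist: NormalDist, low: float, high: float) -> float:
    """Hypothetical helper: P(low <= X <= high) as a difference of cumulative probabilities."""
    return dist.cdf(high) - dist.cdf(low)

p = prob_between(ore, 8.0, 15.0)
print(f"P(8% <= X <= 15%) = {p:.4f}")
```

Without the rounding of the 15% value to 0.97, the result is about 96.3% rather than 96.38%; both round to about 96%.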
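Area under the curve within $\pm k\sigma$ of the mean:
- $\pm 1\sigma$: 0.8413 - 0.1587 = 0.6826, about 68%
- $\pm 2\sigma$: 0.9773 - 0.0228 = 0.9545, about 95.5%
- $\pm 1.96\sigma$: 95%
- $\pm 3\sigma$: 0.9987 - 0.0014 = 0.9973, about 99.7%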
Normal Distribution: The Central Limit Theorem
If we draw a number of samples from a normally distributed population, we find that the sample means also form a normal distribution.
But we don't always know the distribution of the population.
Central Limit Theorem:

The CLT states that, independent of their original statistical distribution, the normalized sum (or average) of a sufficiently large number of independent, identically distributed random variables with finite variance will be approximately normally distributed. In other words, if sufficiently large sets of random samples are taken from any population with finite variance, and the means are calculated for those samples, then these sample means will tend to be normally distributed.
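A small simulation illustrates the theorem. The sketch below (Python 3.9+, standard library only) assumes a strongly skewed parent, the exponential distribution with mean 1 and standard deviation 1, draws 5000 samples of size 50, and checks that the sample means cluster around 1 with a spread near $1/\sqrt{50}$ and with roughly 68% of them within one standard deviation, as for a normal distribution. The parent distribution, sample size, trial count, seed, and helper names are all illustrative choices.

```python
import random
import statistics

def exponential(r: random.Random) -> float:
    """Skewed parent distribution: exponential with mean 1 and standard deviation 1."""
    return r.expovariate(1.0)

def sample_means(draw, n: int, trials: int, rng: random.Random) -> list[float]:
    """Illustrative helper: means of `trials` independent samples of size n, each value from draw(rng)."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]

rng = random.Random(42)  # fixed seed so the run is reproducible
means = sample_means(exponential, n=50, trials=5000, rng=rng)
m = statistics.fmean(means)
s = statistics.stdev(means)

# For a normal distribution about 68% of values lie within one standard deviation
within_one_sd = sum(abs(x - m) <= s for x in means) / len(means)
print(f"mean of means = {m:.3f}, sd of means = {s:.3f}, within 1 sd = {within_one_sd:.3f}")
```

A histogram of `means` would look bell-shaped even though the exponential parent is far from symmetric.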

Again in other words: if we take all possible samples of size n from any population with a mean of $\mu$ and a standard deviation of $\sigma$, the distribution of sample means will have:
- a mean of $\bar{X}$ equal to $\mu$, also written as $\mu_{\bar{X}} = \mu$
- a standard deviation of the means of $s_{\bar{X}} = \sigma/\sqrt{n}$, also called the standard error of the mean, $s_e$

The distribution of sample means
- will be normal when the parent population is normal
- will approach a normal distribution as n approaches infinity, regardless of the distribution of the parent population.
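Both properties can be verified exactly, without simulation, by enumerating all possible samples from a tiny population. This sketch assumes sampling with replacement (independent draws), for which the result is exact; without replacement a finite-population correction would apply. The four-value population and the sample size are hypothetical toy choices.

```python
import itertools
import math
import statistics

population = [1.0, 2.0, 3.0, 4.0]   # hypothetical toy population
n = 2                               # sample size

mu = statistics.fmean(population)
sigma = statistics.pstdev(population)    # population standard deviation

# Every ordered sample of size n drawn with replacement
means = [statistics.fmean(s) for s in itertools.product(population, repeat=n)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.pstdev(means)   # all possible samples form a complete population
standard_error = sigma / math.sqrt(n)

print(mu, mean_of_means)
print(sd_of_means, standard_error)
```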
Some animated examples (uniform, log-normal, and parabolic parent distributions):

http://www.statisticalengineering.com/central_limit_theorem.htm
This means that if we average enough, we can always reduce data of unknown statistics to data of known properties. In practice, we can take our Z-statistic

$$Z_i = \frac{X_i - \mu}{\sigma} \qquad (1)$$

which is useful when we want to infer something from single values taken from a normal population ($X_i$ drawn from the population), and adapt it via the CLT for a sample of size n drawn from a population with known mean and standard deviation:

$$Z = \frac{\bar{X} - \mu}{\sigma\sqrt{1/n}} \qquad (2)$$

You can see that equation (1) is the same as (2) if n = 1 (a single sample). So both equations are more specific forms of the general equation

$$Z = \frac{\bar{X} - \mu}{s_e}$$

where $s_e$ is the standard deviation of the means, $s_e = \sigma\sqrt{1/n} = \sigma/\sqrt{n}$.
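The general equation translates directly into code. A minimal sketch; the function name `z_for_mean` is hypothetical. With the default n = 1 it reduces to the single-value Z-score of equation (1).

```python
import math

def z_for_mean(sample_mean: float, mu: float, sigma: float, n: int = 1) -> float:
    """Hypothetical helper: Z = (xbar - mu) / s_e with standard error s_e = sigma / sqrt(n).

    With n = 1 this is the single-value Z-score (X_i - mu) / sigma.
    """
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu) / standard_error

print(z_for_mean(14.0, 12.0, 1.6))       # single value: Z = 1.25
print(z_for_mean(14.0, 12.0, 1.6, n=4))  # mean of four: Z = 2.5
```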
For the example from earlier:

A sample with a value of 14% ($X = 14$) is drawn from a population with $\mu = 12$, $\sigma = 1.6$. What is the probability of finding a single sample equal to or greater than 14% ore? First calculate that sample's Z-score:

$$Z = \frac{14 - 12}{1.6} = \frac{2}{1.6} = 1.25$$

So the probability of finding one such sample or greater is about 10.5%.
Now, what if we selected 4 samples (n = 4) and the mean of those specimens was 14%?

$$Z = \frac{14 - 12}{1.6\sqrt{1/4}} = \frac{2}{1.6 \cdot (1/2)} = \frac{2}{0.8} = 2.5$$

The probability of finding four such specimens (a mean of 14% or more) is much lower: only about 0.62%.
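The two probabilities can be computed from the sampling distribution of the mean itself: a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. A minimal sketch using `statistics.NormalDist`; the helper name is hypothetical. Because the parent population in this example is already normal, the result is exact rather than a CLT approximation.

```python
import math
from statistics import NormalDist

mu, sigma = 12.0, 1.6   # ore grade (percent)

def prob_mean_at_least(value: float, n: int) -> float:
    """Hypothetical helper: P(mean of n independent specimens >= value).

    The sample mean is modeled as normal with mean mu and standard
    deviation sigma / sqrt(n).
    """
    sampling_dist = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))
    return 1.0 - sampling_dist.cdf(value)

print(f"single specimen: {prob_mean_at_least(14.0, 1):.4f}")
print(f"mean of four:    {prob_mean_at_least(14.0, 4):.4f}")
```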