Statistics and Data Analysis in Geology

6. Normal Distribution, Probability Plots, Central Limit Theorem

Dr. Franz J. Meyer, Earth and Planetary Remote Sensing, University of Alaska Fairbanks

Normal Distribution

An Enormously Important Distribution

The normal distribution is the most commonly used distribution in statistics.

Partly this is because the normal distribution is a reasonable description of many processes, from industrial processes to intelligence test scores.
Also, under specific conditions one can assume that sampling distributions are normally distributed even if the samples are drawn from populations that are not normally distributed (this is discussed further when we talk about the Central Limit Theorem).
The normal distribution is also referred to as the bell curve; a few examples appear in the figures below.

There are an infinite number of normal distributions that differ according to their mean ($\mu$) and variance ($\sigma^2$).

Many natural processes approximately follow the normal distribution.
The shape of a normal distribution corresponds to a binomial distribution with p = 0.5 (compare to the coin toss example of lecture 5).
As N becomes large, the function becomes continuous and can be represented by the following equation:

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}} \quad \text{for } -\infty < X < \infty$$

It can also be thought of as the large-N form of

$$\frac{N!}{X!\,(N-X)!}\, p^X q^{N-X} \quad \text{for } p = 0.5$$
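To make the comparison with the binomial distribution concrete, here is a minimal Python sketch that evaluates both expressions side by side. It assumes the standard results that a binomial distribution has mean Np and standard deviation sqrt(Npq); the function names and the choice of 100 trials are illustrative and not part of the lecture.

```python
import math

def binomial_pmf(x, n_trials, p=0.5):
    """Probability of exactly x successes in n_trials trials."""
    return math.comb(n_trials, x) * p**x * (1 - p) ** (n_trials - x)

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

n_trials, p = 100, 0.5                       # illustrative values (assumption)
mu = n_trials * p                            # binomial mean, N*p
sigma = math.sqrt(n_trials * p * (1 - p))    # binomial standard deviation, sqrt(N*p*q)

# Largest pointwise gap between the two curves over all possible counts
max_abs_diff = max(
    abs(binomial_pmf(x, n_trials, p) - normal_pdf(x, mu, sigma))
    for x in range(n_trials + 1)
)

for x in (40, 45, 50, 55, 60):
    print(x, round(binomial_pmf(x, n_trials, p), 4), round(normal_pdf(x, mu, sigma), 4))
print("largest difference:", max_abs_diff)
```

For 100 trials the two columns should agree to roughly three decimal places, which is the sense in which the normal curve approximates the histogram.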

Approximation of histogram by normal distribution

Two normal distributions with different mean but same standard deviation

A normal distribution can be characterized by only two parameters, $\mu$ and $\sigma$.


Two normal distributions with same mean but different standard deviation

The Standard Normal Distribution or Z Distribution

It is often useful to standardize the variables so that populations can be compared. Standardization means that the mean $\mu = 0$ and the standard deviation $\sigma = 1$. Then the equation becomes:

$$f(X) = \frac{1}{\sqrt{2\pi}}\, e^{-X^2/2} \quad \text{for } -\infty < X < \infty$$

and the curve is expressed in numbers of standard deviations from the mean.

So you convert the normal distribution to the Z distribution by converting the original values to standard scores, which allows comparison among populations with different means and variances.
That's interesting, as all normal distributions share the following characteristics:
Symmetry
Unimodality
Continuous range from $-\infty$ to $+\infty$
A total area under the curve of 1
A common value for the mean, median, and mode

We can make some statements about how the data are distributed within any normal distribution:
About 68% of the data fall within $\pm 1\sigma$
About 95% of all data fall within $\pm 2\sigma$
About 99.7% of all data fall within $\pm 3\sigma$
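The percentages for one, two, and three standard deviations can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `area_within` is an illustrative choice.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # the Z distribution

def area_within(k):
    """Area under the standard normal curve between -k and +k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

The three areas come out near 0.6827, 0.9545, and 0.9973.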

Standardization of normal random variables

For any sample, the way to standardize the data is called the Z-transformation. For every point we calculate a Z-score, which is a measure of how many standard deviations a point is from the mean:

$$Z_i = \frac{X_i - \bar{X}}{S} \quad \text{or} \quad Z_i = \frac{X_i - \mu}{\sigma}$$

depending on whether you are dealing with a sample or a population. Z scores can be positive or negative.

Example:

A shell specimen with a value of 12 mm ($X = 12$) is drawn from a population with $\mu = 10$, $\sigma = 2$. What is that specimen's Z score?

$$Z = \frac{12 - 10}{2} = \frac{2}{2} = 1$$

So the specimen is one standard deviation longer than the mean.

What if that same specimen is drawn from a population with $\mu = 10$, $\sigma = 1$ (same mean, different variance)?

$$Z = \frac{12 - 10}{1} = \frac{2}{1} = 2$$

In absolute terms the specimen is the same distance from the mean; however, relative to the population as a whole, it is further away (more anomalous).

Example cont.:
What if a different specimen ($X = 14$) is drawn from the population in example 1 with $\mu = 10$, $\sigma = 2$?

$$Z = \frac{14 - 10}{2} = \frac{4}{2} = 2$$

So this specimen is in the same position relative to the population as that from example 2.
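The three shell-length cases can be reproduced with a one-line function. This is a minimal sketch; the name `z_score` and its argument names are illustrative.

```python
def z_score(x, mean, std):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std

example_1 = z_score(12, 10, 2)  # 12 mm specimen, standard deviation 2
example_2 = z_score(12, 10, 1)  # same specimen, standard deviation 1
example_3 = z_score(14, 10, 2)  # 14 mm specimen, standard deviation 2

print(example_1, example_2, example_3)  # 1.0 2.0 2.0
```

Examples 2 and 3 give the same Z score even though the raw lengths differ, which is the point of standardizing.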

[Figure: specimen length in mm (axis ticks 10 to 16) with the corresponding Z scores]

For each normal distribution, the area under the curve is equal to 1. That is, the total probability is equal to 1 (as it was with the binomial distribution). Mathematically we can express this as:

$$\int_{-\infty}^{+\infty} f(X)\,dX = 1$$

For Z-transformed data this is:

$$\int_{-\infty}^{+\infty} f(X)\,dX = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-X^2/2}\,dX = 1$$
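As a numerical sanity check on the unit area, the sketch below integrates the standard normal density with a simple trapezoid rule. The integration limits of -8 and +8 and the step count are assumptions chosen so that the neglected tails are negligible; the function names are illustrative.

```python
import math

def std_normal_pdf(x):
    """Density of the standard normal (Z) distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def trapezoid(f, a, b, steps=10_000):
    """Approximate the integral of f from a to b with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

# Finite limits stand in for minus and plus infinity (assumption)
area = trapezoid(std_normal_pdf, -8.0, 8.0)
print(area)
```

The printed area should be very close to 1.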

Similarly, we can calculate the probability of a sample being less than or equal to some preset value Z as

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z} e^{-X^2/2}\,dX$$

A different way to represent the normal distribution is by cumulative probability: a plot of the area under the curve versus X. Such plots can be made for any distribution. They are called ogive plots, and I will come back to them later.
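The cumulative probability has no elementary closed form, but it can be written in terms of the error function, which Python provides as `math.erf`. The sketch below tabulates a few points of the cumulative curve; the function name `std_normal_cdf` and the chosen Z values are illustrative.

```python
import math

def std_normal_cdf(z):
    """P(X <= z) for the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A few points on the cumulative (ogive) curve
for z in (-2.5, -1.0, 0.0, 1.0, 1.25, 1.88):
    print(f"Z = {z:+.2f}  cumulative probability = {std_normal_cdf(z):.4f}")
```

Plotting these values against Z would trace the S-shaped cumulative curve.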

For the normal distribution, it is a pain in the neck to calculate this integral for every problem that we are going to do, so tables have been constructed.

The numbers in the table below are answers to the question: What is the Z value corresponding to a particular area under the curve?
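The same inverse lookup can be done in code instead of with a printed table. This sketch uses `NormalDist.inv_cdf` from the Python standard library (Python 3.8 or later); the list of areas is an illustrative choice.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# For each cumulative area, find the Z value that has that much area to its left
z_for_area = {area: std_normal.inv_cdf(area) for area in (0.025, 0.5, 0.975)}

for area, z in z_for_area.items():
    print(f"area = {area:.3f}  ->  Z = {z:+.2f}")
```

An area of 0.975 corresponds to Z of about +1.96, and 0.025 to about -1.96.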

Example of Cumulated Probability

Grades of chip samples from a body of ore have a normal distribution with a mean of 12% ($\mu$) and a standard deviation of 1.6% ($\sigma$).
(The curve to the right helps to visualize the distribution.)

Problem 1: Find the probability of a specimen of 15% or less


Calculate the Z score:
Z = (15 - 12)/1.6 = +1.88

The chart on slide 13 gives the cumulative probability from minus infinity up to that value:
Z = +1.88 corresponds to 0.97 (we have to interpolate between +1.8 and +1.9)

Make a sketch to see if this makes sense.
So the probability of finding a sample with an ore grade of 15% or less is 97%.

Problem 2: What is the probability of finding ore greater than 14%?


Z = (14 - 12)/1.6 = +1.25
The probability associated with this Z score is 0.895. This is the probability of 14% or less.
The probability of 14% or more is 1 - 0.895 = 0.105.
So the probability of finding a sample with more than 14% ore is 10.5%.

Problem 3: What is the probability of finding ore grade of less than 8%?
Z = (8 - 12)/1.6 = -2.5
The probability associated with this Z score is 0.0062.
So the probability of finding a sample with less than 8% ore is 0.62%, not very likely.

Problem 4: What is the probability of a sample being between 8% and 15%?


Calculate the Z scores for each value:
Z8 = (8 - 12)/1.6 = -2.5 --> 0.62%
Z15 = (15 - 12)/1.6 = 1.88 --> 97%

Subtract the smaller from the larger: 97% - 0.62% = 96.38%, so about 96% of all samples fall in that range.
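The four ore-grade problems can be cross-checked without a table. The sketch below assumes the stated population (mean 12%, standard deviation 1.6%) and uses `statistics.NormalDist`; the variable names are illustrative. Small differences from the slide values come from table interpolation and rounding.

```python
from statistics import NormalDist

ore = NormalDist(mu=12.0, sigma=1.6)  # ore grade in percent

p_15_or_less = ore.cdf(15.0)                  # Problem 1
p_more_than_14 = 1.0 - ore.cdf(14.0)          # Problem 2
p_less_than_8 = ore.cdf(8.0)                  # Problem 3
p_between_8_and_15 = ore.cdf(15.0) - ore.cdf(8.0)  # Problem 4

print(f"P(grade <= 15%)      = {p_15_or_less:.4f}")
print(f"P(grade > 14%)       = {p_more_than_14:.4f}")
print(f"P(grade < 8%)        = {p_less_than_8:.4f}")
print(f"P(8% < grade <= 15%) = {p_between_8_and_15:.4f}")
```

The results should be close to 0.97, 0.105, 0.0062, and 0.96 respectively.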

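Area under the curve:

within $\pm 1\sigma$: 0.8413 - 0.1587 = 0.6826, about 68%
within $\pm 2\sigma$: 0.9773 - 0.0228 = 0.9545, about 95.5%
within $\pm 1.96\sigma$: 95%
within $\pm 3\sigma$: 0.9987 - 0.0014 = 0.9973, about 99.7%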

The Central Limit Theorem

If you draw a number of samples from a normally distributed population, the sample means will form a normal distribution.
BUT we don't always know the distribution of the population.

Central Limit Theorem:

The CLT states that, independent of their original statistical distribution, the rescaled sum (or mean) of a sufficiently large number of independent, identically distributed random variables with finite variance will be approximately normally distributed. In other words, if sufficiently large random samples are taken from any such population, and the means are calculated for those samples, then these sample means will tend to be normally distributed.

Again in other words: if we take all possible samples of size n from any population with a mean of $\mu$ and a standard deviation of $\sigma$, the distribution of sample means will have:

a mean of

$$\mu_{\bar{X}} = \mu$$

a standard deviation of means of

$$s_{\bar{X}} = \sigma\sqrt{1/n} = \frac{\sigma}{\sqrt{n}}$$

This is also called the standard error of the mean, $s_e$.

The distribution of sample means will be normal when the parent population is normal, and will approach a normal distribution as n approaches infinity regardless of the distribution of the parent population (provided its variance is finite).
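A small simulation can illustrate these statements. The sketch below draws repeated samples from a clearly non-normal population and compares the spread of the sample means with the standard error formula. The exponential population (mean 1, standard deviation 1), the sample size of 30, the 5000 repeats, and the fixed random seed are all assumptions made for this illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed (arbitrary choice) so repeated runs agree

def sample_means(n, repeats):
    """Means of `repeats` independent samples of size n drawn from an
    exponential population with mean 1 and standard deviation 1."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(repeats)
    ]

mu, sigma = 1.0, 1.0        # parameters of the assumed parent population
n = 30                      # sample size (assumption)
means = sample_means(n, repeats=5000)

observed_mean = statistics.fmean(means)   # should be close to mu
observed_se = statistics.stdev(means)     # spread of the sample means
expected_se = sigma / math.sqrt(n)        # standard error of the mean

# Share of sample means within one standard error of mu (about 0.68 if normal)
share_within_1se = sum(abs(m - mu) <= expected_se for m in means) / len(means)

print("mean of sample means:", round(observed_mean, 3))
print("observed vs expected standard error:", round(observed_se, 3), round(expected_se, 3))
print("share within one standard error:", round(share_within_1se, 3))
```

Even though the parent population is strongly skewed, the sample means should cluster around 1 with a spread close to the expected standard error.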

Some animated examples:


http://www.statisticalengineering.com/central_limit_theorem.htm

Uniform distribution:

Log-normal distribution:

Parabolic distributions:

This means that if we average enough, we can reduce data of unknown statistics to data of known properties. Practically, we can use our Z-statistic

$$Z_i = \frac{X_i - \mu}{\sigma} \qquad (1)$$

which is useful when we want to infer something from single values taken from a normal population ($X_i$ drawn from the population), and adapt it for the CLT for a sample of size n drawn from a population with known mean and standard deviation:

$$Z = \frac{\bar{X} - \mu}{\sigma\sqrt{1/n}} \qquad (2)$$

You can see that equation (1) is the same as (2) if n = 1 (a single sample). So both equations are just more specific forms of the general equation

$$Z = \frac{\bar{X} - \mu}{s_e}$$

where $s_e$ is the standard deviation of means, $s_e = \sigma\sqrt{1/n}$.

For the example from earlier:


A sample with a value of 14% ($X = 14$) is drawn from a population with $\mu = 12$, $\sigma = 1.6$. What is the probability of finding a single sample equal to or greater than 14% ore? First calculate that sample's Z score.

$$Z = \frac{14 - 12}{1.6} = \frac{2}{1.6} = 1.25$$

So the probability of finding one such sample or greater was about 10.5%.

Now, what if we selected 4 samples (n = 4) and the mean of those specimens was 14%?

$$Z = \frac{14 - 12}{1.6\sqrt{1/4}} = \frac{2}{1.6\,(1/2)} = \frac{2}{0.8} = 2.5$$

The probability of four specimens having a mean of 14% or greater is much smaller: it is only 0.62%!
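The single-sample and four-sample cases differ only in the standard error used in the denominator. The sketch below computes both Z scores and the corresponding upper-tail probabilities for the ore example (mean 12%, standard deviation 1.6%); the function name `z_for_mean` is an illustrative choice.

```python
import math
from statistics import NormalDist

def z_for_mean(sample_mean, mu, sigma, n):
    """Z score of a sample mean, using the standard error sigma * sqrt(1/n)."""
    standard_error = sigma * math.sqrt(1.0 / n)
    return (sample_mean - mu) / standard_error

std_normal = NormalDist()

z_single = z_for_mean(14.0, 12.0, 1.6, n=1)   # one specimen at 14%
z_four = z_for_mean(14.0, 12.0, 1.6, n=4)     # mean of four specimens at 14%

p_single = 1.0 - std_normal.cdf(z_single)     # chance of 14% or more, n = 1
p_four = 1.0 - std_normal.cdf(z_four)         # chance of a mean of 14% or more, n = 4

print(f"n = 1: Z = {z_single:.2f}, upper-tail probability = {p_single:.4f}")
print(f"n = 4: Z = {z_four:.2f}, upper-tail probability = {p_four:.4f}")
```

Quadrupling the sample size halves the standard error, so the same mean of 14% is twice as many standard errors from the population mean.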
