
STAT7055

LECTURE 5
Sampling Distributions

STAT7055 - Lecture 5

1 / 69

Introduction

Statistical Inference

- Statistical inference: Determining the behaviour of a population by studying a sample from that population.
- Population parameters describe populations, e.g., population mean $\mu$ and population variance $\sigma^2$.
  - Population parameters are usually unknown.
  - To find out more about them, we take a random sample of data.
- Calculate statistics from the sample, e.g., sample mean $\bar{X}$ and sample variance $s^2$.
- Use $\bar{X}$ and $s^2$ to estimate $\mu$ and $\sigma^2$.

- So we make inferences about population parameters using sample statistics.
- But suppose we take a new sample: we would then get a different $\bar{X}$ and $s^2$.
- There is variability associated with these sample statistics.
- It would be nice to know how they differ from sample to sample.

Sampling Distribution

- Sampling distribution: The distribution of a sample statistic that would arise if we were to repeatedly take samples from a population.
- Knowing the sampling distribution of a sample statistic can tell us how accurately we are estimating a population parameter.
- We will focus on the sampling distribution of the sample mean and the sampling distribution of the sample proportion.

Sampling Distribution of Sample Mean

Example 1

- Suppose that we roll a fair four-sided die and let $X$ be the number that comes up.
- The population can be thought of as an infinite number of rolls of the die.
- Let's repeatedly draw samples of size 2, i.e., roll the die twice.
- Find the sampling distribution of the sample mean, $\bar{X}$.

Probability Distribution of X

x       1     2     3     4
p(x)    1/4   1/4   1/4   1/4

- Remember that probability distributions represent populations.
- We can calculate the population mean and population variance from the probability distribution in the usual way.

Mean and Variance of X

$$\mu = E(X) = \sum_{\text{all } x} x \, p(x) = \frac{5}{2}$$

$$\sigma^2 = V(X) = E(X^2) - (E(X))^2 = \frac{15}{2} - \left(\frac{5}{2}\right)^2 = \frac{5}{4}$$
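The population mean and variance of the die can be checked numerically. The following is a minimal sketch using exact fractions; the helper names `mean` and `variance` are illustrative and not part of the lecture.

```python
from fractions import Fraction

# Probability distribution of a fair four-sided die: p(x) = 1/4 for x = 1, 2, 3, 4.
dist = {x: Fraction(1, 4) for x in (1, 2, 3, 4)}

def mean(dist):
    """E(X) = sum of x * p(x) over all x."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """V(X) = E(X^2) - (E(X))^2."""
    second_moment = sum(x * x * p for x, p in dist.items())
    return second_moment - mean(dist) ** 2

mu = mean(dist)
sigma2 = variance(dist)
print(mu, sigma2)  # 5/2 5/4
```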

Sampling Distribution of $\bar{X}$

- Let's calculate the sample mean $\bar{X}$ for all possible samples of size 2:

                       Second roll
                  1       2       3       4
First roll   1    1.0     1.5     2.0     2.5
             2    1.5     2.0     2.5     3.0
             3    2.0     2.5     3.0     3.5
             4    2.5     3.0     3.5     4.0

- Note that each possible sample of size 2 has probability 1/16 of being drawn.

- We can determine the sampling distribution of $\bar{X}$ from the previous table:

$\bar{x}$       1      1.5    2      2.5    3      3.5    4
$p(\bar{x})$    1/16   2/16   3/16   4/16   3/16   2/16   1/16

- We can calculate this sampling distribution exactly since we can completely list all possible samples of size 2.

Mean of $\bar{X}$

$$\mu_{\bar{X}} = E(\bar{X}) = \sum_{\text{all } \bar{x}} \bar{x} \, p(\bar{x}) = \frac{5}{2} = E(X) = \mu$$

Variance of $\bar{X}$

$$\sigma^2_{\bar{X}} = V(\bar{X}) = E(\bar{X}^2) - (E(\bar{X}))^2 = \frac{55}{8} - \left(\frac{5}{2}\right)^2 = \frac{5}{8} = \frac{V(X)}{2} = \frac{\sigma^2}{2}$$
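The exact sampling distribution of $\bar{X}$ for the four-sided die can be reproduced by listing every ordered sample of size 2. This is a sketch only; the variable names are illustrative, and the probabilities print in lowest terms (e.g., 1/8 rather than 2/16).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = (1, 2, 3, 4)   # fair four-sided die
n = 2                  # sample size used in the example

# Each ordered sample is equally likely: probability (1/4)^n = 1/16.
p_sample = Fraction(1, len(faces) ** n)

# Accumulate P(X-bar = value) over all ordered samples.
sampling_dist = Counter()
for sample in product(faces, repeat=n):
    xbar = Fraction(sum(sample), n)
    sampling_dist[xbar] += p_sample

mean_xbar = sum(x * p for x, p in sampling_dist.items())
var_xbar = sum(x * x * p for x, p in sampling_dist.items()) - mean_xbar ** 2

for x in sorted(sampling_dist):
    print(float(x), sampling_dist[x])
print(mean_xbar, var_xbar)  # 5/2 5/8
```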

Example 2

- What happens when we can't list all possible samples of a certain size?
  - The best we can do is to repeatedly draw samples, calculate sample statistics and approximate the sampling distribution.
- Suppose that $X \sim U(0, 100)$.
  - $X$ is continuous, so we clearly cannot list all possible samples (of any size!).
- Let's run a little experiment...

Experiment

1. Generate some data.
2. Display this data in a histogram.
3. Analyse some sample statistics.
4. Repeat with different sample sizes ($n$).
5. See what happens!
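The experiment can be sketched in a few lines of Python. This is not the tool used to produce the lecture's figures: the function name `run_experiment` and the seed are assumptions for illustration, the histogram step is left out, and the numbers it prints will differ from those on the slides.

```python
import random
import statistics

def run_experiment(n, seed=0):
    """Draw n observations from U(0, 100) and return (mean, sd, variance)."""
    rng = random.Random(seed)          # arbitrary seed, for repeatability only
    data = [rng.uniform(0, 100) for _ in range(n)]
    xbar = statistics.mean(data)
    s2 = statistics.variance(data)     # sample variance (divides by n - 1)
    return xbar, s2 ** 0.5, s2

# Repeat with the sample sizes used in the lecture.
for n in (20, 1000, 32000):
    xbar, s, s2 = run_experiment(n)
    print(f"n={n:>6}  mean={xbar:8.3f}  s={s:7.3f}  s^2={s2:9.3f}")
```

With larger $n$ the printed statistics should settle near the population values of 50 and 833.33.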

n = 20

Histogram and Some Statistics

n            20
$\bar{X}$    53.23572497
$s$          26.900781
$s^2$        723.6520183

- Why does the histogram not look more like a uniform distribution?
- Why is the sample mean not equal (or close) to:

$$E(X) = \frac{a + b}{2} = \frac{0 + 100}{2} = 50?$$

- Why is the sample variance not equal (or close) to:

$$V(X) = \frac{(b - a)^2}{12} = \frac{(100 - 0)^2}{12} = 833.33?$$

n = 1000

Histogram and Some Statistics

n            1000
$\bar{X}$    49.3829
$s$          29.35405
$s^2$        861.66

n = 32000

Histogram and Some Statistics

n            32000
$\bar{X}$    49.97777
$s$          28.86928
$s^2$        833.4354

Summary

- As $n$ increases, the distribution of the observations in the sample will look more like the distribution of the population from which the sample was drawn.
- As $n$ increases, the sample statistics (such as the sample mean and sample variance) will converge towards their population values.

Another Experiment

- Let's now re-do the previous experiment, but with several samples at the same time.
- We'll examine the sample means of the samples, and analyse their characteristics.
- See what happens when we change both the sample size ($n$) and the number of samples generated ($m$).
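A rough version of this second experiment is sketched below. The helper names `sample_means` and `summarise` and the seed are assumptions for illustration; the output will not match the lecture's figures exactly, since those came from different random draws.

```python
import random
import statistics

def sample_means(n, m, seed=0):
    """Return the means of m independent samples of size n from U(0, 100)."""
    rng = random.Random(seed)  # arbitrary seed, for repeatability only
    return [
        statistics.mean([rng.uniform(0, 100) for _ in range(n)])
        for _ in range(m)
    ]

def summarise(means):
    """Mean, standard deviation and variance of a list of sample means."""
    return statistics.mean(means), statistics.stdev(means), statistics.variance(means)

# The (n, m) combinations examined in the lecture.
for n, m in ((20, 20), (20, 256), (100, 256)):
    centre, sd, var = summarise(sample_means(n, m))
    print(f"n={n:>3} m={m:>3}  mean={centre:7.3f}  sd={sd:6.3f}  var={var:7.3f}")
```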

n = 20 and m = 20

Histogram and Some Statistics of the Sample Means

n                      20
m                      20
$\bar{X}_{\bar{X}}$    52.6569567
$s_{\bar{X}}$          7.19465383
$s^2_{\bar{X}}$        51.7630437

n = 20 and m = 256

- Distribution looks funny...
  - Let's re-try the exercise, but increase the number of samples generated from $m = 20$ to $m = 256$.
  - We will keep the number of observations in each sample at $n = 20$, as before.

Histogram and Some Statistics of the Sample Means

n                      20
m                      256
$\bar{X}_{\bar{X}}$    50.39852
$s_{\bar{X}}$          6.387126
$s^2_{\bar{X}}$        40.79538

Summary

- The sampling distribution of the sample means looks like a normal distribution!
  - This phenomenon, described by the Central Limit Theorem, will occur regardless of the underlying population distribution (in this case, a uniform distribution), provided it has a finite variance.
- Also, the mean of the sampling distribution of the sample means is equal to the population mean of $X$.

n = 100 and m = 256

- Let's see what happens when we increase the number of observations in each sample from $n = 20$ to $n = 100$.
- We'll leave the number of samples at $m = 256$, as before.

Histogram and Some Statistics of the Sample Means

n                      100
m                      256
$\bar{X}_{\bar{X}}$    49.96038
$s_{\bar{X}}$          3.000858
$s^2_{\bar{X}}$        9.005149

Comparison of n = 20 to n = 100

- Compare the results from when $n = 20$ and $m = 256$ to when $n = 100$ and $m = 256$.

                       n = 20, m = 256    n = 100, m = 256
$\bar{X}_{\bar{X}}$    50.39852           49.96038
$s_{\bar{X}}$          6.387126           3.000858
$s^2_{\bar{X}}$        40.79538           9.005149

Summary

- The variance and standard deviation of the sample means reduce significantly when the sample size $n$ increases!
- This is because the variance of the sampling distribution of the sample means is equal to:

$$\sigma^2_{\bar{X}} = \frac{\sigma^2}{n}$$

  with the standard deviation equal to:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
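As a quick check, the theoretical value $\sigma / \sqrt{n}$ can be compared with the simulated standard deviations quoted in the lecture for $m = 256$. This is a small sketch; the function name `standard_error` is illustrative.

```python
import math

def standard_error(sigma2, n):
    """Standard deviation of the sample mean: sqrt(sigma^2 / n)."""
    return math.sqrt(sigma2 / n)

sigma2 = (100 - 0) ** 2 / 12   # V(X) for U(0, 100), about 833.33

# Simulated standard deviations of the sample means quoted in the lecture (m = 256).
simulated_sd = {20: 6.387126, 100: 3.000858}

for n, s_xbar in simulated_sd.items():
    print(f"n={n:>3}  theoretical SE={standard_error(sigma2, n):.3f}  simulated sd={s_xbar:.3f}")
```

The theoretical values (about 6.455 and 2.887) sit close to the simulated ones, and the gap is what we would expect from using only 256 samples.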

Standard Error

- Note that when we talk about the standard deviation of a sample statistic, this is not the same as the population standard deviation of $X$ (our data).
- To make this distinction clear, the standard deviation of the sampling distribution of a statistic is called the standard error (SE) of the statistic.
- And that's why we use the notation $\sigma_{\bar{X}}$, where the subscript indicates that it is the standard deviation of $\bar{X}$.

Central Limit Theorem

Mean and Variance of Sample Mean (slide 41)

- So the sample mean is given by:

$$\bar{X} = \frac{1}{n}(X_1 + \ldots + X_n)$$

- Since we have a random sample, each $X_i$ is independent and $E(X_i) = \mu$ and $V(X_i) = \sigma^2$.
- Therefore, we can use the laws of expected value and variance to calculate $\mu_{\bar{X}} = E(\bar{X})$ and $\sigma^2_{\bar{X}} = V(\bar{X})$.

Mean and Variance of Sample Mean (slides 42 and 43)

$$
\begin{aligned}
\mu_{\bar{X}} = E(\bar{X}) &= E\left(\frac{1}{n}(X_1 + \ldots + X_n)\right) \\
&= \frac{1}{n} E(X_1 + \ldots + X_n) \\
&= \frac{1}{n}\left(E(X_1) + \ldots + E(X_n)\right) \\
&= \frac{1}{n}(\mu + \ldots + \mu) \\
&= \frac{1}{n} n\mu = \mu
\end{aligned}
$$

$$
\begin{aligned}
\sigma^2_{\bar{X}} = V(\bar{X}) &= V\left(\frac{1}{n}(X_1 + \ldots + X_n)\right) \\
&= \frac{1}{n^2} V(X_1 + \ldots + X_n) \\
&= \frac{1}{n^2}\left(V(X_1) + \ldots + V(X_n)\right) \\
&= \frac{1}{n^2}\left(\sigma^2 + \ldots + \sigma^2\right) \\
&= \frac{1}{n^2} n\sigma^2 = \frac{\sigma^2}{n}
\end{aligned}
$$

Central Limit Theorem

- If $X$ is a random variable (discrete or continuous) with mean $\mu$ and variance $\sigma^2$, then in general:

$$\bar{X} \sim N\left(\mu_{\bar{X}} = \mu, \ \sigma^2_{\bar{X}} = \frac{\sigma^2}{n}\right) \quad \text{as } n \to \infty$$

- Which means if we standardise:

$$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = Z \sim N(0, 1) \quad \text{as } n \to \infty$$

How Large Should n Be?

- Generally, it depends on the population distribution of $X$.
  - If $X$ has a normal distribution, then the sample mean has a normal distribution for all sample sizes.
  - If $X$ has a distribution that is close to normal, the approximation is good for small sample sizes (e.g., $n$ of about 20).
  - If $X$ has a distribution that is far from normal, the approximation requires larger sample sizes (e.g., $n > 50$).

Example 1

- Consider the marks of all students who took an economics test. If marks are normally distributed, with mean equal to 72 and standard deviation equal to 9, find:
  (a) The probability that any one student will have a mark over 78.
  (b) The probability that a sample of 10 students will have an average mark over 78.

Solution - Part (a)

- Let $X$ be the mark of a randomly selected student.
- Then $X \sim N(\mu = 72, \sigma^2 = 9^2)$.

$$
\begin{aligned}
P(X > 78) &= P\left(\frac{X - \mu}{\sigma} > \frac{78 - 72}{9}\right) \\
&= P(Z > 0.67) \\
&= 1 - P(Z < 0.67) \\
&= 1 - 0.7486 \\
&= 0.2514
\end{aligned}
$$

Solution - Part (b)

- Let $\bar{X}$ be the average mark of the sample of 10 students.
- By CLT, $\bar{X} \sim N\left(\mu_{\bar{X}} = \mu = 72, \ \sigma^2_{\bar{X}} = \frac{\sigma^2}{n} = \frac{9^2}{10}\right)$.

$$
\begin{aligned}
P(\bar{X} > 78) &= P\left(\frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} > \frac{78 - 72}{9 / \sqrt{10}}\right) \\
&= P(Z > 2.11) \\
&= 1 - P(Z < 2.11) \\
&= 1 - 0.9826 \\
&= 0.0174
\end{aligned}
$$
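Both parts can be checked without a normal table using `statistics.NormalDist` from the Python standard library. This is only a sketch: the results differ slightly from the table-based answers because the z-score is not rounded to two decimal places first.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 72, 9, 10

# Part (a): a single mark, X ~ N(72, 9^2).
p_single = 1 - NormalDist(mu, sigma).cdf(78)

# Part (b): the mean of n = 10 marks, with standard error sigma / sqrt(n).
p_mean = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(78)

print(round(p_single, 4), round(p_mean, 4))
```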

- Why does the probability drop from 25.14% to 1.74%?
  - Interpretation:
    - The standard deviation of sample means (or the standard error) is always smaller than the standard deviation of single observations (for $n > 1$).
    - Sample means have less variation than single observations.
- E.g., it is much more likely to find an individual who is very smart than a random sample of 10 students who are all very smart.

Example 2

- Given a normal distribution with a mean of 50 and a variance of 100, find the probability that the sample mean of a sample of 25 observations will differ from the population mean by less than 4 units.

Solution

- We only care about the distance between the sample mean and population mean.
- That is, we don't care if the sample mean is larger than the population mean or smaller than the population mean, as long as the difference between them is less than 4 units.
- So we want to find $P(|\bar{X} - \mu| < 4)$.

$$
\begin{aligned}
P(|\bar{X} - \mu| < 4) &= P(-4 < \bar{X} - \mu < 4) \\
&= P\left(\frac{-4}{10 / \sqrt{25}} < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < \frac{4}{10 / \sqrt{25}}\right) \\
&= P(-2 < Z < 2) \\
&= 0.9544
\end{aligned}
$$
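The same calculation can be sketched in Python. The variable names are illustrative, and the last decimal place may differ from the table-based answer because of rounding in the table.

```python
from math import sqrt
from statistics import NormalDist

sigma, n, tolerance = sqrt(100), 25, 4

se = sigma / sqrt(n)          # standard error of the mean: 10 / 5 = 2
z = tolerance / se            # 4 / 2 = 2

std_normal = NormalDist()     # N(0, 1)
prob = std_normal.cdf(z) - std_normal.cdf(-z)   # P(-z < Z < z)
print(round(prob, 4))
```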

Example 3

- The number of accidents per week at a hazardous intersection varies randomly with mean 2.2 and standard deviation 1.4.
- This distribution is discrete and so is certainly not normal.
- What is the approximate probability that there are fewer than 100 accidents at the intersection in a year?

Solution

- Let $X_i$ be the number of accidents that occur in the $i$th week of the year.
  - We know $E(X_i) = 2.2$ and $V(X_i) = 1.4^2$ for all $i$.
- We want to find the probability of fewer than 100 accidents in the year, i.e.:

$$
\begin{aligned}
P\left(\sum_{i=1}^{52} X_i < 100\right) &= P\left(\frac{\sum_{i=1}^{52} X_i}{52} < \frac{100}{52}\right) \\
&= P(\bar{X} < 1.923)
\end{aligned}
$$

- But we know from the CLT that

$$\bar{X} \sim N\left(2.2, \ \frac{1.4^2}{52}\right)$$

- Therefore,

$$
\begin{aligned}
P(\bar{X} < 1.923) &= P\left(\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < \frac{1.923 - 2.2}{1.4 / \sqrt{52}}\right) \\
&= P(Z < -1.43) \\
&= 0.0764
\end{aligned}
$$
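A short numerical check of this approximation is sketched below. The variable names are illustrative, and the probability differs slightly from the table-based 0.0764 because the z-score is not rounded before the lookup.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, weeks = 2.2, 1.4, 52
threshold = 100 / weeks             # fewer than 100 accidents <=> weekly mean below about 1.923

se = sigma / sqrt(weeks)            # standard error of the weekly mean
z = (threshold - mu) / se
prob = NormalDist().cdf(z)          # P(Z < z)
print(round(z, 2), round(prob, 4))
```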

Sampling Distribution of Sample Proportion

Binomial Distribution

- Recall the Binomial distribution that was introduced in Week 3.
  - If $X \sim \text{Bin}(n, p)$, then $X$ was counting the number of successes in $n$ independent Bernoulli trials.
  - We assumed that $p$ was known.
- But in reality, $p$ is an unknown population parameter.

Sample Proportion

- Therefore, just like we did with the population mean $\mu$, we need to estimate the population proportion $p$ from a sample.
- For example, suppose we are interested in whether Coke or Pepsi is the more popular soft drink.
- Let $p$ denote the population proportion of people who prefer Coke over Pepsi.
- If $X$ denotes the number of people who prefer Coke over Pepsi in a randomly selected sample, what is a reasonable estimate of $p$?
- If $n$ is the size of the sample, then a reasonable estimate of $p$ is the sample proportion of people who prefer Coke over Pepsi:

$$\hat{p} = \frac{X}{n}$$

- Let's investigate $\hat{p}$ in a little more detail...

- Another way we can express $X$ is:

$$X = \sum_{i=1}^{n} X_i$$

  where

$$X_i = \begin{cases} 1 & \text{person } i \text{ prefers Coke over Pepsi} \\ 0 & \text{otherwise} \end{cases}$$

- So $X$ is a sum of independent Bernoulli random variables.

Probability Distribution of $X_i$

$x_i$       0        1
$p(x_i)$    1 - p    p

$$\mu = E(X_i) = 0 \times (1 - p) + 1 \times p = p$$

$$\sigma^2 = V(X_i) = E(X_i^2) - (E(X_i))^2 = 0^2 \times (1 - p) + 1^2 \times p - p^2 = p(1 - p)$$

Sample Proportion

- Notice that $\hat{p}$ is just the sample mean of the Bernoulli variables:

$$\hat{p} = \frac{X}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$

- Therefore, from slides 41, 42 and 43, we know that:

$$\mu_{\hat{p}} = E(\hat{p}) = \mu = p$$

$$\sigma^2_{\hat{p}} = V(\hat{p}) = \frac{\sigma^2}{n} = \frac{p(1 - p)}{n}$$

Central Limit Theorem

- And we can also apply the CLT to find the sampling distribution of $\hat{p}$!
- The CLT tells us that for sufficiently large $n$:

$$\hat{p} \sim N\left(\mu_{\hat{p}} = \mu, \ \sigma^2_{\hat{p}} = \frac{\sigma^2}{n}\right) = N\left(p, \ \frac{p(1 - p)}{n}\right)$$

- If we standardise $\hat{p}$, we then get:

$$\frac{\hat{p} - p}{\sqrt{\frac{p(1 - p)}{n}}} = Z \sim N(0, 1)$$

Example 1

- (Q9.41 p327) A psychologist believes that 80% of male drivers when lost continue to drive hoping to find the location they seek rather than ask directions. To examine this belief, he took a random sample of 350 male drivers and asked each what they did when lost. If the belief is true, determine the probability that less than 75% said they continue driving.

Solution

- Let $\hat{p}$ be the sample proportion of drivers that keep driving.
  - We know that $n = 350$ and the population proportion is $p = 0.8$.
  - We want to find $P(\hat{p} < 0.75)$.
- Since $n = 350$ is very large, we can apply the CLT and conclude that:

$$\hat{p} \sim N\left(p, \ \frac{p(1 - p)}{n}\right) = N\left(0.8, \ \frac{0.8(1 - 0.8)}{350}\right)$$

- Therefore:

$$
\begin{aligned}
P(\hat{p} < 0.75) &= P\left(\frac{\hat{p} - p}{\sqrt{\frac{p(1 - p)}{n}}} < \frac{0.75 - 0.8}{\sqrt{\frac{0.8(1 - 0.8)}{350}}}\right) \\
&= P(Z < -2.34) \\
&= 0.0096
\end{aligned}
$$
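The standardisation step for a sample proportion can be wrapped in a small helper. This is a sketch under the CLT approximation used above; the function name `proportion_below` is an illustration, not something from the lecture or the textbook.

```python
from math import sqrt
from statistics import NormalDist

def proportion_below(p, n, cutoff):
    """CLT approximation to P(p_hat < cutoff) when p_hat ~ N(p, p(1 - p)/n)."""
    se = sqrt(p * (1 - p) / n)       # standard error of the sample proportion
    z = (cutoff - p) / se
    return z, NormalDist().cdf(z)

z, prob = proportion_below(p=0.8, n=350, cutoff=0.75)
print(round(z, 2), round(prob, 4))
```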

Example 2

- (Q9.43 p327) An accounting professor claims that no more than one-quarter of undergraduate business students will major in accounting. What is the probability that in a random sample of 1200 undergraduate business students, 336 or more will major in accounting?

Solution

- Let $X$ be the number of students that major in accounting out of the random sample of 1200.
  - Then $X \sim \text{Bin}(n = 1200, p = 0.25)$.
  - We want:

$$P(X \ge 336) = P(X = 336) + P(X = 337) + \ldots + P(X = 1200)$$

- We can use the CLT to approximate this probability:

$$
\begin{aligned}
P(X \ge 336) &= P\left(\frac{X}{1200} \ge \frac{336}{1200}\right) \\
&\approx P(\hat{p} > 0.28) \\
&= P\left(\frac{\hat{p} - p}{\sqrt{\frac{p(1 - p)}{n}}} > \frac{0.28 - 0.25}{\sqrt{\frac{0.25(1 - 0.25)}{1200}}}\right) \\
&= P(Z > 2.40) \\
&= 1 - 0.9918 \\
&= 0.0082
\end{aligned}
$$
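Since $X$ is binomial, the long sum $P(X = 336) + \ldots + P(X = 1200)$ can also be evaluated directly and compared with the CLT approximation. The sketch below uses exact integer arithmetic to avoid floating-point underflow; the variable names are illustrative and no continuity correction is applied, matching the worked solution.

```python
from math import comb, sqrt
from statistics import NormalDist

n, k = 1200, 336          # sample size and threshold from the example
num, den = 1, 4           # p = 1/4, kept as a ratio so the exact sum uses integers

# Exact: P(X >= k) = sum over x = k..n of C(n, x) * num^x * (den - num)^(n - x), all over den^n.
tail = sum(comb(n, x) * num**x * (den - num) ** (n - x) for x in range(k, n + 1))
exact = tail / den**n     # big-integer division, rounded to a float

# CLT approximation, as in the worked solution.
p = num / den
z = (k / n - p) / sqrt(p * (1 - p) / n)
approx = 1 - NormalDist().cdf(z)

print(round(z, 2), round(approx, 4), round(exact, 4))
```

The two values are of the same order but not identical, which is a reminder that the CLT result is an approximation to the exact binomial tail.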

Reference

- Chapter 9.
  - Keller, 9th edition.
