
Slides Prepared by

JOHN S. LOUCKS
St. Edward's University

© 2002 South-Western College Publishing/Thomson Learning

Chapter 21
Sample Survey

Terminology Used in Sample Surveys


Types of Surveys and Sampling Methods
Survey Errors
Simple Random Sampling
Stratified Simple Random Sampling
Cluster Sampling
Systematic Sampling

Terminology Used in Sample Surveys

An element is the entity on which data are
collected.
A population is the collection of all elements of
interest.
A sample is a subset of the population.

Terminology Used in Sample Surveys

The target population is the population we
want to make inferences about.
The sampled population is the population from
which the sample is actually selected.
These two populations are not always the
same.
If inferences from a sample are to be valid, the
sampled population must be representative of
the target population.

Terminology Used in Sample Surveys

The population is divided into sampling units,
which are groups of elements or the elements
themselves.
A list of the sampling units for a particular
study is called a frame.
The choice of a particular frame is often
determined by the availability and reliability of
a list.
The development of a frame can be one of the
most difficult and important steps in
conducting a sample survey.

Types of Surveys

Surveys Involving Questionnaires


Three common types are mail surveys,
telephone surveys, and personal interview
surveys.
Survey costs are lower for mail and
telephone surveys.
With well-trained interviewers, higher
response rates and longer questionnaires
are possible with personal interviews.
The design of the questionnaire is critical.

Types of Surveys

Surveys Not Involving Questionnaires


Often, someone simply counts or measures
the sampled items and records the results.
An example is sampling a company's
inventory of parts to estimate the total
inventory value.

Sampling Methods

Sample surveys can also be classified in terms
of the sampling method used.
The two categories of sampling methods are:
Probabilistic sampling
Nonprobabilistic sampling

Nonprobabilistic Sampling Methods

The probability of obtaining each possible
sample cannot be computed.
Statistically valid statements cannot be made
about the precision of the estimates.
Sampling cost is lower and implementation is
easier.
Methods include convenience and judgment
sampling.

Nonprobabilistic Sampling Methods

Convenience Sampling
The units included in the sample are chosen
because of accessibility.
In some cases, convenience sampling is the
only practical approach.


Nonprobabilistic Sampling Methods

Judgment Sampling
A knowledgeable person selects sampling
units that he/she feels are most
representative of the population.
The quality of the result is dependent on the
judgment of the person selecting the
sample.
Generally, no statistical statement should
be made about the precision of the result.


Probabilistic Sampling Methods

The probability of obtaining each possible
sample can be computed.
Confidence intervals can be developed which
provide bounds on the sampling error.
Methods include simple random, stratified
simple random, cluster, and systematic
sampling.


Survey Errors

Two types of errors can occur in conducting a
survey:
Sampling error
Nonsampling error


Survey Errors

Sampling Error
It is defined as the magnitude of the
difference between the point estimate,
developed from the sample, and the
population parameter.
It occurs because not every element in the
population is surveyed.
It cannot occur in a census.
It cannot be avoided, but it can be
controlled.


Survey Errors

Nonsampling Error
It can occur in both a census and a sample
survey.
Examples include:
Measurement error
Errors due to nonresponse
Errors due to lack of respondent
knowledge
Selection error
Processing error


Survey Errors

Nonsampling Error
Measurement Error
Measuring instruments are not properly
calibrated.
People taking the measurements are not
properly trained.


Survey Errors

Nonsampling Error
Errors Due to Nonresponse
They occur when no data can be
obtained, or only partial data are
obtained, for some of the units surveyed.
The problem is most serious when a bias
is created.


Survey Errors

Nonsampling Error
Errors Due to Lack of Respondent
Knowledge
These errors are common in technical
surveys.
Some respondents might be more
capable than others of answering
technical questions.


Survey Errors

Nonsampling Error
Selection Error
An inappropriate item is included in the
survey.
For example, in a survey of small truck
owners some interviewers include SUV
owners while other interviewers do not.


Survey Errors

Nonsampling Error
Processing Error
Data are incorrectly recorded.
Data are incorrectly transferred from
recording forms to computer files.


Simple Random Sampling

A simple random sample of size n from a finite
population of size N is a sample selected such
that every possible sample of size n has the
same probability of being selected.
We begin by developing a frame or list of all
elements in the population.
Then a selection procedure, based on the use
of random numbers, is used to ensure that
each element in the sampled population has
the same probability of being selected.
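
As a minimal sketch of the selection step, the Python snippet below draws a simple random sample from a frame with the standard library's `random.sample`, which selects without replacement. The frame of 200 numbered element IDs, the function name, and the seed are illustrative assumptions and do not come from the slides.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct sampling units from the frame, without replacement."""
    # The seed is only there to make this sketch repeatable.
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical frame: element IDs 1..200 (an assumption for illustration).
frame = list(range(1, 201))
sample = simple_random_sample(frame, n=40, seed=1)
print(len(sample), len(set(sample)))
```

Because the draw is made without replacement, no element can appear twice in the sample.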


Simple Random Sampling


We will see in the upcoming slides how to:
Estimate the following population parameters:
Population mean
Population total
Population proportion
Determine the appropriate sample size


Simple Random Sampling

In a sample survey it is common practice to
provide an approximate 95% confidence
interval estimate of the population parameter.
Assuming the sampling distribution of the
point estimator can be approximated by a
normal probability distribution, we use a value
of z = 2 for a 95% confidence interval.
The interval estimate is:
Point Estimator ± 2(Estimate of the Standard Error of the Point Estimator)

The bound on the sampling error is:
2(Estimate of the Standard Error of the Point Estimator)


Simple Random Sampling

Population Mean
Point Estimator

$$\bar{x}$$

Estimate of the Standard Error of the Mean

$$s_{\bar{x}} = \sqrt{\frac{N-n}{N}}\left(\frac{s}{\sqrt{n}}\right)$$


Simple Random Sampling

Population Mean
Interval Estimate

$$\bar{x} \pm z_{\alpha/2}\, s_{\bar{x}}$$

Approximate 95% Confidence Interval Estimate

$$\bar{x} \pm 2 s_{\bar{x}}$$


Simple Random Sampling

Population Total
Point Estimator

$$\hat{X} = N\bar{x}$$

Estimate of the Standard Error of the Total

$$s_{\hat{X}} = N s_{\bar{x}}$$


Simple Random Sampling

Population Total
Interval Estimate

$$N\bar{x} \pm z_{\alpha/2}\, s_{\hat{X}}$$

Approximate 95% Confidence Interval Estimate

$$N\bar{x} \pm 2 s_{\hat{X}}$$
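
The total estimator and its approximate 95% interval can be sketched in Python as follows. The function name and the input values (500 accounts, 50 sampled, sample mean 20, sample standard deviation 5) are hypothetical and chosen only for illustration.

```python
import math


def srs_total_interval(N, n, xbar, s, z=2.0):
    """Estimate a population total from a simple random sample.

    Returns (point_estimate, standard_error, lower, upper).
    """
    se_mean = math.sqrt((N - n) / N) * s / math.sqrt(n)  # standard error of the mean
    total = N * xbar                                     # point estimate of the total
    se_total = N * se_mean                               # standard error of the total
    bound = z * se_total                                 # bound on the sampling error
    return total, se_total, total - bound, total + bound


# Hypothetical inputs, not taken from the slides.
total, se_total, lower, upper = srs_total_interval(N=500, n=50, xbar=20.0, s=5.0)
print(round(total, 1), round(se_total, 1), round(lower, 1), round(upper, 1))
```

The default `z=2.0` mirrors the approximate 95% convention used in these slides; a caller could pass another value for a different confidence level.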


Simple Random Sampling

Population Proportion
Point Estimator

$$\bar{p}$$

Estimate of the Standard Error of the Proportion

$$s_{\bar{p}} = \sqrt{\left(\frac{N-n}{N}\right)\left(\frac{\bar{p}(1-\bar{p})}{n-1}\right)}$$


Simple Random Sampling

Population Proportion
Interval Estimate

$$\bar{p} \pm z_{\alpha/2}\, s_{\bar{p}}$$

Approximate 95% Confidence Interval Estimate

$$\bar{p} \pm 2 s_{\bar{p}}$$
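
A short Python sketch of the proportion estimate follows, assuming the standard error uses the finite population correction (N - n)/N and the divisor n - 1. The function name and the inputs (N = 1000, n = 101, sample proportion .5) are hypothetical.

```python
import math


def srs_proportion_interval(N, n, p, z=2.0):
    """Return (standard_error, lower, upper) for a sample proportion p."""
    # Finite population correction times p(1 - p) / (n - 1).
    se = math.sqrt(((N - n) / N) * (p * (1 - p) / (n - 1)))
    bound = z * se  # bound on the sampling error
    return se, p - bound, p + bound


# Hypothetical inputs, not taken from the slides.
se_p, lower_p, upper_p = srs_proportion_interval(N=1000, n=101, p=0.5)
print(round(se_p, 4), round(lower_p, 4), round(upper_p, 4))
```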


Determining the Sample Size

An important consideration in sample design is
the choice of sample size.
The best choice usually involves a tradeoff
between cost and precision (size of the
confidence interval).
Larger samples provide greater precision, but
are more costly.
A budget might dictate how large the sample
can be.
A specified level of precision might dictate how
small a sample can be.


Determining the Sample Size

Smaller confidence intervals provide more
precision.
The size of the approximate confidence
interval depends on the bound B on the
sampling error.
Choosing a level of precision amounts to
choosing a value for B.
Given a desired level of precision, we can solve
for the value of n.


Simple Random Sampling

Necessary Sample Size for Estimating the Population Mean

$$B = 2\sqrt{\frac{N-n}{N}}\left(\frac{s}{\sqrt{n}}\right)$$

Hence,

$$n = \frac{Ns^2}{N\left(\dfrac{B^2}{4}\right) + s^2}$$
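
The algebra between the bound and the sample size is not spelled out on the slide. One way to fill in the steps, using only the symbols B, N, n, and s already defined, is sketched below.

```latex
\begin{aligned}
B &= 2\sqrt{\frac{N-n}{N}}\left(\frac{s}{\sqrt{n}}\right) \\
\frac{B^2}{4} &= \left(\frac{N-n}{N}\right)\frac{s^2}{n} \\
nN\,\frac{B^2}{4} &= Ns^2 - ns^2 \\
n\left(N\,\frac{B^2}{4} + s^2\right) &= Ns^2 \\
n &= \frac{Ns^2}{N\left(\dfrac{B^2}{4}\right) + s^2}
\end{aligned}
```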


Example: Innis Investments

Simple Random Sampling


Innis is a financial advisor for 200 clients. A
sample of 40 clients has been taken to obtain
various demographic data and information
about the clients' investment objectives.
Statistics of particular interest are the clients'
age, the clients' total net worth, and the
proportion favoring fixed-income investments.


Example: Innis Investments

Simple Random Sampling


For the sample, the mean age was 52 (with
a standard deviation of 10), the mean net
worth was $480,000 (with a standard deviation
of $120,000), and the proportion favoring
fixed-income investments was .30.


Example: Innis Investments

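Estimate of Standard Error of Mean Age

$$s_{\bar{x}} = \sqrt{\frac{N-n}{N}}\left(\frac{s}{\sqrt{n}}\right) = \sqrt{\frac{200-40}{200}}\left(\frac{10}{\sqrt{40}}\right) = 1.41$$

Approximate 95% Confidence Interval for Mean Age

$$\bar{x} \pm 2 s_{\bar{x}} = 52 \pm 2(1.41) = 52 \pm 2.82 = 49.18 \text{ to } 54.82$$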
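
The mean-age calculation can be reproduced with a few lines of Python. This is a sketch rather than part of the original slides; the function name is an assumption, while N = 200, n = 40, the sample mean of 52, and the standard deviation of 10 come from the Innis example.

```python
import math


def srs_mean_interval(N, n, xbar, s, z=2.0):
    """Return (standard_error, lower, upper) for the mean of a simple random sample."""
    se = math.sqrt((N - n) / N) * s / math.sqrt(n)  # includes the finite population correction
    bound = z * se                                  # bound on the sampling error
    return se, xbar - bound, xbar + bound


# Innis Investments: mean age of clients.
se_age, lower_age, upper_age = srs_mean_interval(N=200, n=40, xbar=52.0, s=10.0)
print(round(se_age, 2), round(lower_age, 2), round(upper_age, 2))
```

Carrying the unrounded standard error gives limits of roughly 49.17 and 54.83. They differ from the slide's figures in the second decimal place because the slide rounds the standard error to 1.41 before doubling it.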


Example: Innis Investments

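Point Estimate of Total Net Worth (TNW) of Clients

$$\hat{X} = N\bar{x} = 200(480) = 96{,}000 \text{ thousand} = \$96{,}000{,}000$$

Estimate of Standard Error of TNW

$$s_{\hat{X}} = N s_{\bar{x}} = N\sqrt{\frac{N-n}{N}}\left(\frac{s}{\sqrt{n}}\right) = 200\sqrt{\frac{200-40}{200}}\left(\frac{120}{\sqrt{40}}\right) = 3{,}394.1 \text{ thousand} = \$3{,}394{,}100$$

Approximate 95% Confidence Interval for TNW

$$N\bar{x} \pm 2 s_{\hat{X}} = 96{,}000 \pm 2(3{,}394.1) = 89{,}211.8 \text{ to } 102{,}788.2 \text{ thousand}$$

= $89,211,800 to $102,788,200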


Example: Innis Investments

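Point Estimate of Population Proportion Favoring Fixed-Income Investments

$$\bar{p} = .30$$

Estimate of Standard Error of Proportion

$$s_{\bar{p}} = \sqrt{\left(\frac{N-n}{N}\right)\left(\frac{\bar{p}(1-\bar{p})}{n-1}\right)} = \sqrt{\left(\frac{200-40}{200}\right)\left(\frac{.3(1-.3)}{40-1}\right)} = .0656$$

Approximate 95% Confidence Interval

$$\bar{p} \pm 2 s_{\bar{p}} = .300 \pm 2(.0656) = .1688 \text{ to } .4312$$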


Example: Innis Investments


One year later Innis wants to again survey
his clients. He now has 250 clients and wants
to set a bound of $30,000 on the error of the
estimate of their mean net worth.

Necessary Sample Size

$$n = \frac{Ns^2}{N\left(\dfrac{B^2}{4}\right) + s^2} = \frac{250(120)^2}{250\left(\dfrac{(30)^2}{4}\right) + (120)^2} = 50.96$$

He will need a sample size of 51.
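
The same sample-size calculation can be sketched in Python, with s and B both expressed in thousands of dollars. The function name is an assumption; N = 250, s = 120, and B = 30 come from the example.

```python
import math


def srs_sample_size_for_mean(N, s, B):
    """Return (exact_n, rounded_up_n) for a bound B on the error of the mean."""
    n_exact = (N * s**2) / (N * (B**2 / 4) + s**2)
    # Round up so the achieved bound is no larger than B.
    return n_exact, math.ceil(n_exact)


# Innis Investments, one year later.
n_exact, n_needed = srs_sample_size_for_mean(N=250, s=120.0, B=30.0)
print(round(n_exact, 2), n_needed)
```

Rounding up rather than to the nearest integer is the conservative choice, since a smaller sample would exceed the desired bound.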


Stratified Simple Random Sampling

The population is first divided into H groups,
called strata.
Then for stratum h, a simple random sample of
size $n_h$ is selected.
The data from the H simple random samples
are combined to develop an estimate of a
population parameter.
If the variability within each stratum is smaller
than the variability across the strata, a
stratified simple random sample can lead to
greater precision.
The basis for forming the various strata
depends on the judgment of the designer of
the sample.

Stratified Simple Random Sampling

Population Mean
Point Estimator

$$\bar{x}_{st} = \sum_{h=1}^{H}\left(\frac{N_h}{N}\right)\bar{x}_h$$

where: H = number of strata
$\bar{x}_h$ = sample mean for stratum h
$N_h$ = number of elements in the population in stratum h
N = total number of elements in the population (all strata)

Stratified Simple Random Sampling

Population Mean
Estimate of the Standard Error of the Mean

$$s_{\bar{x}_{st}} = \sqrt{\frac{1}{N^2}\sum_{h=1}^{H} N_h(N_h - n_h)\frac{s_h^2}{n_h}}$$


Stratified Simple Random Sampling

Population Mean
Interval Estimate

$$\bar{x}_{st} \pm z_{\alpha/2}\, s_{\bar{x}_{st}}$$

Approximate 95% Confidence Interval Estimate

$$\bar{x}_{st} \pm 2 s_{\bar{x}_{st}}$$


Stratified Simple Random Sampling

Population Total
Point Estimator

$$\hat{X} = N\bar{x}_{st}$$

Estimate of the Standard Error of the Total

$$s_{\hat{X}} = N s_{\bar{x}_{st}}$$


Stratified Simple Random Sampling

Population Total
Interval Estimate

$$\hat{X} \pm z_{\alpha/2}\, s_{\hat{X}}$$

Approximate 95% Confidence Interval Estimate

$$\hat{X} \pm 2 s_{\hat{X}}$$


Stratified Simple Random Sampling

Population Proportion
Point Estimator

$$\bar{p}_{st} = \sum_{h=1}^{H}\left(\frac{N_h}{N}\right)\bar{p}_h$$

where: H = number of strata
$\bar{p}_h$ = sample proportion for stratum h
$N_h$ = number of elements in the population in stratum h
N = total number of elements in the population (all strata)

Stratified Simple Random Sampling

Population Proportion
Estimate of Standard Error of the Proportion

$$s_{\bar{p}_{st}} = \sqrt{\frac{1}{N^2}\sum_{h=1}^{H} N_h(N_h - n_h)\frac{\bar{p}_h(1-\bar{p}_h)}{n_h - 1}}$$


Stratified Simple Random Sampling

Population Proportion
Interval Estimate

$$\bar{p}_{st} \pm z_{\alpha/2}\, s_{\bar{p}_{st}}$$

Approximate 95% Confidence Interval Estimate

$$\bar{p}_{st} \pm 2 s_{\bar{p}_{st}}$$


Stratified Simple Random Sampling

Total Sample Size When Estimating Population Mean

$$n = \frac{\left(\sum_{h=1}^{H} N_h s_h\right)^2}{N^2\left(\dfrac{B^2}{4}\right) + \sum_{h=1}^{H} N_h s_h^2}$$


Stratified Simple Random Sampling

Total Sample Size When Estimating Population Total

$$n = \frac{\left(\sum_{h=1}^{H} N_h s_h\right)^2}{\dfrac{B^2}{4} + \sum_{h=1}^{H} N_h s_h^2}$$


Stratified Simple Random Sampling

Allocating Total Sample Size When Estimating Population Mean or Total

$$n_h = n\left(\frac{N_h s_h}{\sum_{h=1}^{H} N_h s_h}\right)$$
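
A sketch of how the total sample size for estimating a mean and its allocation across strata might be computed is given below. It assumes the allocation is proportional to the product of stratum size and stratum standard deviation. The function names, the three strata, their planning standard deviations, and the bound B = 20 are hypothetical values chosen for illustration.

```python
def stratified_sample_size_for_mean(N_h, s_h, B):
    """Total sample size n for a bound B on the error of the stratified mean."""
    N = sum(N_h)
    sum_Ns = sum(Nh * sh for Nh, sh in zip(N_h, s_h))
    sum_Ns2 = sum(Nh * sh**2 for Nh, sh in zip(N_h, s_h))
    return sum_Ns**2 / (N**2 * (B**2 / 4) + sum_Ns2)


def allocate_sample(n, N_h, s_h):
    """Split n across strata in proportion to N_h * s_h (unrounded)."""
    sum_Ns = sum(Nh * sh for Nh, sh in zip(N_h, s_h))
    return [n * Nh * sh / sum_Ns for Nh, sh in zip(N_h, s_h)]


# Hypothetical planning values, not taken from the slides.
N_h = [100, 250, 125]
s_h = [75.0, 100.0, 130.0]
n_total = stratified_sample_size_for_mean(N_h, s_h, B=20.0)
allocation = allocate_sample(86, N_h, s_h)
print(round(n_total, 2), [round(a, 2) for a in allocation])
```

The allocation is left unrounded here; in practice each stratum size would be rounded to a whole number, which can shift the total by one or two units.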


Stratified Simple Random Sampling

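Total Sample Size When Estimating Population Proportion

$$n = \frac{\left(\sum_{h=1}^{H} N_h\sqrt{\bar{p}_h(1-\bar{p}_h)}\right)^2}{N^2\left(\dfrac{B^2}{4}\right) + \sum_{h=1}^{H} N_h\,\bar{p}_h(1-\bar{p}_h)}$$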


Stratified Simple Random Sampling

Allocating Total Sample Size When Estimating Population Proportion

$$n_h = n\left(\frac{N_h\sqrt{\bar{p}_h(1-\bar{p}_h)}}{\sum_{h=1}^{H} N_h\sqrt{\bar{p}_h(1-\bar{p}_h)}}\right)$$


Example: Mill Creek Co.

Stratified Simple Random Sampling


Mill Creek Co. has used stratified simple
random sampling to obtain demographic
information and preferences regarding health
care coverage for its employees and their
families.
The population of employees has been
divided into 3 strata on the basis of age: under
30, 30-49, and 50 or over. Some of the sample
data are shown on the next slide.


Example: Mill Creek Co.

Data

                              Annual Family Dental Expense
Stratum       N_h    n_h      Mean      St.Dev.      Proportion Married
Under 30      100     30      $250      $75          .60
30-49         250     45       400      100          .70
50 or Over    125     30       425      130          .68
Total         475    105

Example: Mill Creek Co.

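Point Estimate of Mean Annual Dental Expense

$$\bar{x}_{st} = \sum_{h=1}^{3}\left(\frac{N_h}{N}\right)\bar{x}_h = \frac{100}{475}(250) + \frac{250}{475}(400) + \frac{125}{475}(425) = \$375$$

Estimate of Standard Error of Mean

$$s_{\bar{x}_{st}} = \sqrt{\frac{1}{N^2}\sum_{h=1}^{3} N_h(N_h - n_h)\frac{s_h^2}{n_h}} = \sqrt{\frac{1}{(475)^2}(19{,}390{,}972)} = 9.27$$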


Example: Mill Creek Co.

Approximate 95% Confidence Interval for Mean Annual Dental Expense

$$\bar{x}_{st} \pm 2 s_{\bar{x}_{st}} = 375 \pm 2(9.27) = 356.46 \text{ to } 393.54$$

An approximate 95% confidence interval for
mean annual family dental expense is $356.46
to $393.54.
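
The stratified mean and its standard error for the Mill Creek data can be checked with a short Python sketch. The function name is an assumption; the stratum sizes, sample sizes, means, and standard deviations are the ones in the data slide.

```python
import math


def stratified_mean_interval(N_h, n_h, xbar_h, s_h, z=2.0):
    """Return (estimate, standard_error, lower, upper) for a stratified mean."""
    N = sum(N_h)
    # Weight each stratum mean by its share of the population.
    xbar_st = sum(Nh / N * xb for Nh, xb in zip(N_h, xbar_h))
    var_sum = sum(Nh * (Nh - nh) * sh**2 / nh
                  for Nh, nh, sh in zip(N_h, n_h, s_h))
    se = math.sqrt(var_sum) / N
    bound = z * se
    return xbar_st, se, xbar_st - bound, xbar_st + bound


# Mill Creek Co.: annual family dental expense by age stratum.
N_h = [100, 250, 125]
n_h = [30, 45, 30]
xbar_h = [250.0, 400.0, 425.0]
s_h = [75.0, 100.0, 130.0]
xbar_st, se_st, lower_st, upper_st = stratified_mean_interval(N_h, n_h, xbar_h, s_h)
print(round(xbar_st, 2), round(se_st, 2))
```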


Example: Mill Creek Co.

Point Estimate of Total Family Expense for All Employees

$$\hat{X} = N\bar{x}_{st} = 475(375) = \$178{,}125$$

Approximate 95% Confidence Interval

$$\hat{X} \pm 2N s_{\bar{x}_{st}} = 178{,}125 \pm 2(475)(9.27) = 178{,}125 \pm 8{,}807$$

= $169,318 to $186,932


Example: Mill Creek Co.

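Point Estimate of Proportion Married

$$\bar{p}_{st} = \sum_{h=1}^{3}\left(\frac{N_h}{N}\right)\bar{p}_h = \frac{100}{475}(.6) + \frac{250}{475}(.7) + \frac{125}{475}(.68) = .6737$$

Estimate of Standard Error of Proportion

$$s_{\bar{p}_{st}} = \sqrt{\frac{1}{N^2}\sum_{h=1}^{3} N_h(N_h - n_h)\frac{\bar{p}_h(1-\bar{p}_h)}{n_h - 1}} = \sqrt{\frac{1}{(475)^2}(391.637)} = .0417$$

Approximate 95% Confidence Interval for Proportion

$$\bar{p}_{st} \pm 2 s_{\bar{p}_{st}} = .6737 \pm 2(.0417) = .5903 \text{ to } .7571$$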
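
The proportion-married estimate can be reproduced the same way. In this sketch the function name is an assumption, and the stratum sizes, sample sizes, and sample proportions come from the Mill Creek data.

```python
import math


def stratified_proportion_interval(N_h, n_h, p_h, z=2.0):
    """Return (estimate, standard_error, lower, upper) for a stratified proportion."""
    N = sum(N_h)
    p_st = sum(Nh / N * p for Nh, p in zip(N_h, p_h))
    var_sum = sum(Nh * (Nh - nh) * p * (1 - p) / (nh - 1)
                  for Nh, nh, p in zip(N_h, n_h, p_h))
    se = math.sqrt(var_sum) / N
    bound = z * se
    return p_st, se, p_st - bound, p_st + bound


# Mill Creek Co.: proportion married by age stratum.
p_st, se_pst, lower_pst, upper_pst = stratified_proportion_interval(
    N_h=[100, 250, 125], n_h=[30, 45, 30], p_h=[0.60, 0.70, 0.68]
)
print(round(p_st, 4), round(se_pst, 4))
```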


Cluster Sampling

Cluster sampling requires that the population
be divided into N groups of elements called
clusters.
We would define the frame as the list of N
clusters.
We then select a simple random sample of n
clusters.
We would then collect data for all elements in
each of the n clusters.


Cluster Sampling

Cluster sampling tends to provide better
results than stratified sampling when the
elements within the clusters are
heterogeneous.
A primary application of cluster sampling
involves area sampling, where the clusters are
counties, city blocks, or other well-defined
geographic sections.


Cluster Sampling

Notation
N = number of clusters in the population
n = number of clusters selected in the sample
$M_i$ = number of elements in cluster i
M = number of elements in the population
$\bar{M}$ = M/N = average number of elements in a cluster
$x_i$ = total of all observations in cluster i
$a_i$ = number of observations in cluster i with a certain characteristic

Cluster Sampling

Population Mean
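Point Estimator

$$\bar{x}_c = \frac{\sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} M_i}$$

Estimate of Standard Error of the Mean

$$s_{\bar{x}_c} = \sqrt{\left(\frac{N-n}{Nn\bar{M}^2}\right)\frac{\sum_{i=1}^{n}(x_i - \bar{x}_c M_i)^2}{n-1}}$$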

Cluster Sampling

Population Mean
Interval Estimate

$$\bar{x}_c \pm z_{\alpha/2}\, s_{\bar{x}_c}$$

Approximate 95% Confidence Interval Estimate

$$\bar{x}_c \pm 2 s_{\bar{x}_c}$$


Cluster Sampling

Population Total
Point Estimator

$$\hat{X} = M\bar{x}_c$$

Estimate of Standard Error of the Total

$$s_{\hat{X}} = M s_{\bar{x}_c}$$


Cluster Sampling

Population Total
Interval Estimate

$$M\bar{x}_c \pm z_{\alpha/2}\, s_{\hat{X}}$$

Approximate 95% Confidence Interval Estimate

$$M\bar{x}_c \pm 2 s_{\hat{X}}$$


Cluster Sampling

Population Proportion
Point Estimator

$$\bar{p}_c = \frac{\sum_{i=1}^{n} a_i}{\sum_{i=1}^{n} M_i}$$


Cluster Sampling

Population Proportion
Estimate of Standard Error of the Proportion

$$s_{\bar{p}_c} = \sqrt{\left(\frac{N-n}{Nn\bar{M}^2}\right)\frac{\sum_{i=1}^{n}(a_i - \bar{p}_c M_i)^2}{n-1}}$$


Cluster Sampling

Population Proportion
Interval Estimate

$$\bar{p}_c \pm z_{\alpha/2}\, s_{\bar{p}_c}$$

Approximate 95% Confidence Interval Estimate

$$\bar{p}_c \pm 2 s_{\bar{p}_c}$$


Example: Cooper County Schools


Cluster Sampling
There are 40 high schools in Cooper County.
School officials are interested in the effect of
participation in athletics on academic
preparation for college. A cluster sample of 5
schools has been taken and a questionnaire
administered to all the seniors on the football
team at those schools. There are a total of
1200 high school seniors in the county playing
football.

Example: Cooper County Schools

Data

School    Number of Players    Average SAT Score    Number Planning to Attend College
1         45                   840                  15
2         20                   980                  16
3         30                   905                  12
4         38                   880                  18
5         40                   970                  23
Total     173                                       84


Example: Cooper County Schools

Point Estimate of Mean SAT Score

$$\bar{x}_c = \frac{\sum_{i=1}^{5} x_i}{\sum_{i=1}^{5} M_i} = \frac{45(840) + 20(980) + \cdots + 40(970)}{45 + 20 + 30 + 38 + 40} = 906$$


Example: Cooper County Schools

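Estimate of Standard Error of the Mean

$$s_{\bar{x}_c} = \sqrt{\left(\frac{N-n}{Nn\bar{M}^2}\right)\frac{\sum_{i=1}^{5}(x_i - \bar{x}_c M_i)^2}{n-1}} = \sqrt{\left(\frac{40-5}{(40)(5)(30)^2}\right)\frac{18{,}541{,}944}{5-1}} = 30.02$$

Example: Cooper County Schools

Approximate 95% Confidence Interval

$$\bar{x}_c \pm 2 s_{\bar{x}_c} = 906 \pm 2(30.02) = 846 \text{ to } 966$$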

Point Estimate of Proportion Planning to Attend College

$$\bar{p}_c = \frac{\sum_{i=1}^{5} a_i}{\sum_{i=1}^{5} M_i} = \frac{84}{173} = .49$$
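
The cluster estimates for the Cooper County data can be sketched in Python as below. Following the notation slide, the sketch treats N and n as counts of clusters (40 schools in the county, 5 sampled) and takes the average cluster size as 1200/40 = 30. The function name is an assumption, and one helper serves both the mean and the proportion because the two estimators have the same ratio form.

```python
import math


def cluster_ratio_estimate(N, M_bar, totals, sizes, z=2.0):
    """Ratio estimate from a cluster sample.

    totals: per-cluster totals (x_i for a mean, a_i for a proportion)
    sizes:  per-cluster element counts M_i
    Returns (estimate, standard_error, lower, upper).
    """
    n = len(sizes)                                   # number of sampled clusters
    est = sum(totals) / sum(sizes)
    ss = sum((t - est * m) ** 2 for t, m in zip(totals, sizes))
    se = math.sqrt((N - n) / (N * n * M_bar**2) * ss / (n - 1))
    return est, se, est - z * se, est + z * se


sizes = [45, 20, 30, 38, 40]            # players per sampled school
avg_sat = [840, 980, 905, 880, 970]     # average SAT score per school
planning = [15, 16, 12, 18, 23]         # players planning to attend college

# x_i is the cluster total, so multiply each average by its cluster size.
sat_totals = [m * a for m, a in zip(sizes, avg_sat)]
mean_sat, se_sat, lo_sat, hi_sat = cluster_ratio_estimate(40, 1200 / 40, sat_totals, sizes)
p_c, se_pc, lo_pc, hi_pc = cluster_ratio_estimate(40, 1200 / 40, planning, sizes)
print(round(mean_sat, 1), round(p_c, 2))
```

The sketch carries the unrounded point estimate into the standard error, so its result can differ slightly from a hand calculation that first rounds the mean to 906.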


Systematic Sampling

Systematic sampling is often used as an
alternative to simple random sampling, which
can be time-consuming if a large population is
involved.
If a sample size of n from a population of size
N is desired, we might sample one element for
every N/n elements in the population.
We would randomly select one of the first N/n
elements and then select every (N/n)th
element thereafter.
Since the first element selected is a random
choice, a systematic sample is often assumed
to have the properties of a simple random
sample.
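
The selection rule can be sketched in Python as follows. The function name, the frame of 200 numbered elements, and the seed are illustrative assumptions, and the sampling interval is taken as the integer part of N/n.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick a random start among the first k elements, then every k-th element."""
    N = len(frame)
    k = N // n                    # sampling interval (integer part of N/n)
    rng = random.Random(seed)     # seeded only so the sketch is repeatable
    start = rng.randrange(k)      # 0-based position within the first k elements
    return [frame[start + i * k] for i in range(n)]


# Hypothetical frame: element IDs 1..200 (an assumption for illustration).
frame = list(range(1, 201))
sample = systematic_sample(frame, n=40, seed=7)
print(sample[:5])
```

Only one random number is needed for the whole sample, which is what makes the method quicker than drawing each element at random.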

End of Chapter 21
