
INFERENTIAL STATISTICS

Statistical inference may be divided into two major areas:
Estimation
Testing of Hypothesis
ESTIMATION Terms

In the theory of estimation, a STATISTIC is called an ESTIMATOR (for example, the sample mean $\bar{x}$).
The value of an estimator is called an ESTIMATE.
Introductory Case:
Fuel Usage of Ultra-Green Cars
File mpg
A car manufacturer advertises that its new ultra-green car obtains an average of 100 mpg and, based on its fuel emissions, has earned an A+ rating from the Environmental Protection Agency.
Pinnacle Research, an independent consumer advocacy firm, obtains a sample of 25 cars for testing purposes.
Each car is driven the same distance in identical conditions in order to obtain the car's mpg.
The mpg for each Ultra-Green car is given below.

Jared would like to use the data in this sample to:
Estimate with 90% confidence:
    the mean mpg of all ultra-green cars;
    the proportion of all ultra-green cars that obtain over 100 mpg.
Determine the sample size needed to achieve a specified level of precision in the mean and proportion estimates.
Point Estimate
A point estimate is a SINGLE VALUE of the estimator, obtained from the available sample observations.
Example: the proportion of vegetarians in a random sample of 50 PGP students is a point estimate of the corresponding proportion in the population of all PGP students.

Interval Estimate (Confidence Interval)
An interval estimate is an INTERVAL that provides a lower and an upper bound for a specific unknown population parameter.
Example: with 95% confidence, the interval (45%, 52%) contains the true proportion of vegetarians among all PGP students.

For intervals of the form point estimate ± margin of error, the point estimate always lies within the interval estimate.


POINT ESTIMATION
Point Estimators and Their Properties
An estimator of a parameter is a statistic used to estimate the parameter. The most commonly used estimators are:

Population parameter                     Estimator (statistic)
Mean ($\mu$)                             Sample mean ($\bar{x}$)
Variance ($\sigma^2$)                    Sample variance ($s^2$)
Standard deviation ($\sigma$)            Sample standard deviation ($s$)
Proportion ($p$)                         Sample proportion ($\hat{p}$)
Difference of means ($\mu_1 - \mu_2$)    Difference of sample means ($\bar{x}_1 - \bar{x}_2$)

Desirable properties of estimators include:


Unbiasedness
Efficiency
Consistency
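Unbiased and Biased Estimators
An unbiased estimator is on target on average.
A biased estimator is off target on average.
[Figure: an unbiased estimator centered on the target and a biased estimator centered away from it, with the gap labeled "Bias".]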
Properties of estimator: Unbiasedness

T is said to be an unbiased estimator of $\theta$ iff $E(T) = \theta$.

Example:

SAMPLE MEAN IS THE ESTIMATOR OF POPULATION MEAN

$$E(\bar{x}) = \frac{1}{n} E\left(\sum_{i=1}^{n} x_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \frac{1}{n}\sum_{i=1}^{n} \mu = \mu$$
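Example of biased estimator: Sample variance.
Given a sample of size n from a population with unknown mean ($\mu$) and variance ($\sigma^2$), we estimate the mean by $\bar{x}$, as we already know, and the variance (intuitively) as:

$$T = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$$

This estimator is not an unbiased estimator for the population variance:

$$\begin{aligned}
E(T) &= \frac{1}{n}\sum_{i=1}^{n} E(x_i^2) - E(\bar{x}^2) \\
     &= \frac{1}{n}\sum_{i=1}^{n}\left[\mathrm{var}(x_i) + (E(x_i))^2\right] - \left[\mathrm{var}(\bar{x}) + (E(\bar{x}))^2\right] \\
     &= \frac{1}{n}\sum_{i=1}^{n}\left[\sigma^2 + \mu^2\right] - \left[\frac{\sigma^2}{n} + \mu^2\right] \\
     &= \frac{n-1}{n}\,\sigma^2
\end{aligned}$$

That is why, when the mean and variance are unknown, the following equation is used for the sample variance:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$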
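The bias factor (n − 1)/n can be checked exactly on a small case. The sketch below is only an illustration: the three-value population and the sample size n = 2 are hypothetical choices, and sampling is assumed to be with replacement so that the observations are iid. It enumerates every possible sample and averages both estimators using exact fractions.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    """Exact arithmetic mean of a sequence of integers."""
    return Fraction(sum(values), len(values))

def sum_sq_dev(sample):
    """Sum of squared deviations of a sample from its own mean."""
    m = mean(sample)
    return sum((Fraction(x) - m) ** 2 for x in sample)

population = [1, 2, 3]   # hypothetical toy population
n = 2                    # hypothetical sample size

mu = mean(population)
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / len(population)

# All equally likely samples of size n, drawn with replacement (iid draws).
samples = list(product(population, repeat=n))

# Average of T (divisor n) and of s^2 (divisor n - 1) over all samples.
expected_T = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
expected_s2 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print("sigma^2 =", sigma2)
print("E(T)    =", expected_T, "= (n-1)/n * sigma^2:", expected_T == Fraction(n - 1, n) * sigma2)
print("E(s^2)  =", expected_s2, "= sigma^2:", expected_s2 == sigma2)
```

In this toy case the divisor-n estimator averages to half of the population variance, while the divisor-(n − 1) estimator averages to the population variance itself.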
Example
Suppose you want to estimate the mean and sd of a batsman's score in one-day cricket. You randomly choose 5 different innings and record the scores below:
20 52 8 63 11
Find the unbiased estimates of the mean and variance.

$$\bar{x} = 30.8$$

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} = 25.07, \qquad s^2 = 628.7$$
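The same figures can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative only. `statistics.variance` and `statistics.stdev` use the n − 1 divisor, while `statistics.pvariance` uses the divisor n and is shown only for contrast.

```python
import statistics

scores = [20, 52, 8, 63, 11]   # the five innings from the example

x_bar = statistics.mean(scores)             # sample mean
s2 = statistics.variance(scores)            # unbiased variance, divisor n - 1
s = statistics.stdev(scores)                # square root of s2
biased_var = statistics.pvariance(scores)   # divisor n, for comparison only

print(f"mean = {x_bar:.1f}")
print(f"s^2  = {s2:.1f}")
print(f"s    = {s:.2f}")
print(f"divisor-n variance = {biased_var:.2f}")
```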
Interval Estimation
Consider the following statements:
$\bar{x}$ = 550
    A single-valued estimate that conveys little information about the actual value of the population mean.
We are 99% confident that $\mu$ is in the interval [449, 551]
    An interval estimate which locates the population mean within a narrow interval, with a high level of confidence.
We are 90% confident that $\mu$ is in the interval [400, 700]
    An interval estimate which locates the population mean within a broader interval, with a lower level of confidence.
Interval Estimator = Point Estimator ± Margin of Error
Elements of Interval Estimation

A probability that the population parameter falls somewhere within the interval.
[Figure: a confidence interval running from the lower confidence limit to the upper confidence limit, with the sample statistic (point estimator) inside it.]
Confidence Coefficient/Level: probability that the confidence interval will contain the true parameter.
Denoted by $(1-\alpha)100\%$, e.g. 90%, 95%, 99%.
$\alpha$: probability that the interval does not contain the parameter.
The confidence coefficient is the area under the curve of the sampling distribution between the confidence limits.
What is happening?
[Figure: Sampling distribution of the mean, centered at $\mu$. The central 95% of the area lies between $\mu - 1.96\,\sigma/\sqrt{n}$ and $\mu + 1.96\,\sigma/\sqrt{n}$, with 2.5% in each tail.]
Of the sample means $\bar{x}$: 2.5% fall below the interval, 2.5% fall above the interval, and 95% fall within the interval.


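Background

We define $z_{\alpha/2}$ as the z value that cuts off a right-tail area of $\alpha/2$ under the standard normal curve.

$$P(z > z_{\alpha/2}) = \alpha/2$$
$$P(z < -z_{\alpha/2}) = \alpha/2$$
$$P(-z_{\alpha/2} < z < z_{\alpha/2}) = 1 - \alpha$$

$(1-\alpha)100\%$ Confidence Interval:

$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

[Figure: Standard normal distribution with central area $(1-\alpha)$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$ and area $\alpha/2$ in each tail.]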
Critical Values of z and Levels of Confidence

$(1-\alpha)$    $\alpha/2$    $z_{\alpha/2}$
0.99            0.005         2.576
0.98            0.010         2.326
0.95            0.025         1.960
0.90            0.050         1.645
0.80            0.100         1.282

[Figure: Standard normal distribution with central area $(1-\alpha)$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$.]
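The critical values in this table can be regenerated from the standard normal quantile function. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the helper name `z_critical` is an illustrative choice, not something from the slides.

```python
from statistics import NormalDist

def z_critical(confidence):
    """z value that leaves a right-tail area of alpha/2 under the standard normal curve."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

# Rebuild the table: confidence level -> z_{alpha/2}, rounded to 3 decimals.
table = {c: round(z_critical(c), 3) for c in (0.99, 0.98, 0.95, 0.90, 0.80)}

for confidence, z in table.items():
    print(f"{confidence:.2f}  {(1 - confidence) / 2:.3f}  {z:.3f}")
```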


Interval Estimator

Confidence Intervals
    Mean
        $\sigma$ known
        $\sigma$ unknown
    Proportion
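Confidence Interval of $\mu$ ($\sigma$ known)
Assumptions
Population standard deviation, $\sigma$, is known
Sample size is large

Confidence Interval Estimator of $\mu$:

Let $x_1, x_2, \ldots, x_n$ be an iid random sample of size n, drawn from a population with mean $\mu$ and sd $\sigma$.
The $100(1-\alpha)\%$ Confidence Interval of $\mu$ is

$$\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

The quantity $z_{\alpha/2}\,\sigma/\sqrt{n}$ is often called the margin of error or the sampling error.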

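A 95% confidence interval:

For example, if n = 30, $\sigma$ = 20 and $\bar{x}$ = 122, then $\alpha$ = 0.05, $\alpha/2$ = 0.025 and $z_{0.025} = 1.96$:

$$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}} = 122 \pm 1.96\,\frac{20}{\sqrt{30}} = 122 \pm (1.96)(3.65) = 122 \pm 7.15 = [114.85,\ 129.15]$$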
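The calculation above can be wrapped in a small helper. This is a sketch under the slide's assumptions (σ known, large sample); the function name `z_interval` and its signature are illustrative, not part of the original material.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper, margin) for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    margin = z * sigma / math.sqrt(n)                     # margin of error
    return x_bar - margin, x_bar + margin, margin

# Worked example: n = 30, sigma = 20, x_bar = 122, 95% confidence.
low, high, margin = z_interval(122, 20, 30, 0.95)
print(f"122 ± {margin:.2f} -> [{low:.2f}, {high:.2f}]")
```

Without intermediate rounding the margin comes out as about 7.16 rather than 7.15; the small difference arises because the hand calculation rounds the standard error to 3.65 before multiplying.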
Confidence level and the Width of the Confidence Interval
When sampling from the same population, using a fixed sample size, the higher the confidence level, the wider the confidence interval.
[Figure: two standard normal distributions, one shading the central 80% of the area and one shading the central 95%.]

80% Confidence Interval: $\bar{x} \pm 1.28\,\sigma/\sqrt{n}$
95% Confidence Interval: $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$
Sample Size and the Width of the Confidence Interval
When sampling from the same population, using a fixed confidence level, the larger the sample size, n, the narrower the confidence interval.
[Figure: two sampling distributions of the mean with the 95% confidence interval marked, one for n = 20 and one for n = 40.]
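A quick numerical check of this effect, as a sketch: the value σ = 20 is a hypothetical choice used only to produce concrete widths, and `ci_width` is an illustrative helper. Because the width is 2·z·σ/√n, doubling n from 20 to 40 shrinks the interval by a factor of 1/√2 regardless of σ.

```python
import math

def ci_width(z, sigma, n):
    """Full width of the interval x_bar ± z * sigma / sqrt(n)."""
    return 2 * z * sigma / math.sqrt(n)

sigma = 20    # hypothetical population standard deviation
z_95 = 1.96   # critical value for 95% confidence

width_n20 = ci_width(z_95, sigma, 20)
width_n40 = ci_width(z_95, sigma, 40)
ratio = width_n40 / width_n20   # equals 1 / sqrt(2), independent of sigma

print(f"n = 20: width = {width_n20:.2f}")
print(f"n = 40: width = {width_n40:.2f}")
print(f"ratio  = {ratio:.4f}")
```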


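Confidence Interval of $\mu$ ($\sigma$ unknown)
Assumptions
Population standard deviation, $\sigma$, is unknown
Sample size is large

Confidence Interval Estimator of $\mu$:

Let $x_1, x_2, \ldots, x_n$ be an iid random sample of large size n, drawn from a population with mean $\mu$ and sd $\sigma$. The $100(1-\alpha)\%$ Confidence Interval of $\mu$ is

$$\bar{x} - z_{\alpha/2}\,\frac{s}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + z_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

where

$$s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

(for large n, using the divisor n − 1 in place of n changes s only slightly).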
Practice Problem 1:
A manufacturer of light bulbs claims that its light bulbs have a mean life of $\mu$ hours with a standard deviation of 85 hours. A random sample of 40 such bulbs is selected for testing. If the sample produces a mean value of 1505 hours, find the 95% Confidence Interval of $\mu$.
Solution: Given n = 40 (large), $\sigma$ = 85 (known), $1-\alpha$ = 0.95, $\alpha$ = 0.05, $\bar{x}$ = 1505 and $z_{\alpha/2} = z_{0.025} = 1.96$.
Therefore, the 95% CI of $\mu$ is given by

$$\left[\,1505 - 1.96\,\frac{85}{\sqrt{40}},\ \ 1505 + 1.96\,\frac{85}{\sqrt{40}}\,\right] = [1478.66,\ 1531.34]$$
Practice Problem 2:
Waiting times at a popular restaurant, for a sample of 50 customers, are found to have a mean of 1.52 hours with sd 2.25 hours. Construct the 99% confidence interval for the population mean.
Solution: Given n = 50 (large), s = 2.25 (estimated), $1-\alpha$ = 0.99, $\alpha$ = 0.01, $\bar{x}$ = 1.52 and $z_{\alpha/2} = z_{0.005} = 2.58$.
Therefore, the 99% CI of $\mu$ is given by

$$\left[\,1.52 - 2.58\,\frac{2.25}{\sqrt{50}},\ \ 1.52 + 2.58\,\frac{2.25}{\sqrt{50}}\,\right] = [0.70,\ 2.34]$$
Large-Sample Confidence Intervals for the Population Proportion, p

The estimator of the population proportion, $p$, is the sample proportion, $\hat{p}$. If the sample size is large, $\hat{p}$ has an approximately normal distribution, with $E(\hat{p}) = p$ and $V(\hat{p}) = pq/n$, where $q = 1 - p$. When the population proportion is unknown, use the estimated value, $\hat{p}$, to estimate the standard deviation of $\hat{p}$.

For estimating $p$, a sample is considered large enough when both $np$ and $nq$ are greater than 5.
Assumptions
Two categorical outcomes
Population follows a binomial distribution
Large sample

The $100(1-\alpha)\%$ Confidence Interval for the population proportion $p$ is given by

$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \;\le\; p \;\le\; \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
8.3 Confidence Interval for the Population Proportion
Example: Recall that Jared Beane wants to estimate the proportion of all ultra-green cars that obtain over 100 mpg. Use the sample information to construct a 90% confidence interval of the population proportion.
Solution: Note that $\hat{p} = 7/25 = 0.28$. In addition, the normality assumption is met since $n\hat{p} = 7 > 5$ and $n(1-\hat{p}) = 18 > 5$. Thus,

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.28 \pm 1.645\sqrt{\frac{0.28(1-0.28)}{25}} = 0.28 \pm 0.148$$

that is, [0.132, 0.428].
Practice Problem 3:
A marketing research firm wants to estimate the share that foreign companies have in the Indian market for certain products. A random sample of 100 consumers is obtained, and it is found that 34 people in the sample are users of foreign-made products; the rest are users of domestic products. Give a 95% confidence interval for the share of foreign products in this market.

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.34 \pm 1.96\sqrt{\frac{(0.34)(0.66)}{100}} = 0.34 \pm (1.96)(0.04737) = 0.34 \pm 0.0928 = [0.2472,\ 0.4328]$$

Thus, the firm may be 95% confident that foreign manufacturers control anywhere from 24.72% to 43.28% of the market.
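The large-sample proportion interval can be sketched as a short function. The name `proportion_interval` and its arguments are illustrative assumptions; the critical value z is passed in from the table of critical values, and the large-sample check mirrors the rule that both counts must exceed 5.

```python
import math

def proportion_interval(successes, n, z):
    """Large-sample CI for a proportion: p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    # n * p_hat and n * (1 - p_hat) are the success and failure counts.
    if successes <= 5 or n - successes <= 5:
        raise ValueError("sample too small for the normal approximation")
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Practice Problem 3: 34 of 100 consumers, 95% confidence (z = 1.96).
low, high = proportion_interval(34, 100, 1.96)
print(f"[{low:.4f}, {high:.4f}]")

# Ultra-green cars: 7 of 25 cars over 100 mpg, 90% confidence (z = 1.645).
low_cars, high_cars = proportion_interval(7, 25, 1.645)
print(f"[{low_cars:.3f}, {high_cars:.3f}]")
```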
Reducing the Width of Confidence Intervals - The Value of Information
The width of a confidence interval can be reduced only at the price of:
a lower level of confidence, or
a larger sample.

Lower Level of Confidence: 90% Confidence Interval (n = 100)

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.34 \pm 1.645\sqrt{\frac{(0.34)(0.66)}{100}} = 0.34 \pm (1.645)(0.04737) = 0.34 \pm 0.07792 = [0.2621,\ 0.4179]$$

Larger Sample Size: n = 200 (95% Confidence Interval)

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} = 0.34 \pm 1.96\sqrt{\frac{(0.34)(0.66)}{200}} = 0.34 \pm (1.96)(0.03350) = 0.34 \pm 0.0657 = [0.2743,\ 0.4057]$$
8.2 Confidence Interval for the Population Mean When $\sigma$ Is Unknown
LO 8.4 Discuss features of the t distribution.

The t Distribution
If repeated samples of size n are taken from a normal population with a finite variance, then the statistic T follows the t distribution with (n − 1) degrees of freedom, df:

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

Degrees of freedom determine the extent of the broadness of the tails of the distribution; the fewer the degrees of freedom, the broader the tails.
Summary of the $t_{df}$ Distribution
Bell-shaped and symmetric around 0 with asymptotic tails (the tails get closer and closer to the horizontal axis, but never touch it).
Has slightly broader tails than the z distribution.
Consists of a family of distributions where the actual shape of each one depends on the df. As df increases, the $t_{df}$ distribution becomes similar to the z distribution; it converges to the z distribution as df approaches infinity.
[Figure: The $t_{df}$ Distribution with Various Degrees of Freedom]
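Confidence Interval of $\mu$ ($\sigma$ unknown)
Assumptions
Population distribution is normal
Population standard deviation, $\sigma$, is unknown

Confidence Interval Estimator of $\mu$:

Let $x_1, x_2, \ldots, x_n$ be a random sample of size n, drawn from a normal population with mean $\mu$ and sd $\sigma$. The $100(1-\alpha)\%$ Confidence Interval of $\mu$ is

$$\bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$

where $t_{\alpha/2,\,n-1}$ is the value of the t distribution with n − 1 degrees of freedom that cuts off a tail area of $\alpha/2$ to its right, and

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$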


Practice Problem 4:
A stock market analyst wants to estimate the average return on a certain stock. A random sample of 15 days yields an average (annualized) return of $\bar{x}$ = 10.37% and a standard deviation of s = 3.5%. Assuming a normal population of returns, give a 95% confidence interval for the average return on this stock.

The critical value of t for df = (n − 1) = (15 − 1) = 14 and a right-tail area of 0.025 is:

$$t_{0.025} = 2.145$$

Rescaling the sample variance by n/(n − 1):

$$s^2 = \frac{15}{14}(3.5)^2 = 13.125, \qquad s = 3.623$$

The corresponding confidence interval or interval estimate is:

$$\bar{x} \pm t_{0.025}\,\frac{s}{\sqrt{n}} = 10.37 \pm 2.145\,\frac{3.623}{\sqrt{15}} = 10.37 \pm 2.01 = [8.36,\ 12.38]$$
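The arithmetic of this problem can be traced step by step in code. This is a sketch: the Python standard library has no t quantile function, so the critical value 2.145 is taken from a t table and passed in as an argument, and the helper name `t_interval` is illustrative.

```python
import math

def t_interval(x_bar, s, n, t_crit):
    """Return (lower, upper, margin) for x_bar ± t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin, margin

n = 15
t_crit = 2.145                                  # t table value for df = 14, right tail 0.025
s_rescaled = math.sqrt(n / (n - 1) * 3.5 ** 2)  # sqrt(15/14 * 3.5^2), about 3.623

low, high, margin = t_interval(10.37, s_rescaled, n, t_crit)
print(f"s = {s_rescaled:.3f}")
print(f"10.37 ± {margin:.2f} -> [{low:.2f}, {high:.2f}]")
```

With these inputs the half-width evaluates to about 2.01, giving an interval of roughly 8.36% to 12.38%.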
Sample-Size Determination
Before determining the necessary sample size, three questions must be answered:
How close do you want your sample estimate to be to the unknown parameter? (What is the desired bound, B?)
What do you want the desired confidence level $(1-\alpha)$ to be so that the distance between your estimate and the parameter is less than or equal to B?
What is your estimate of the variance (or standard deviation) of the population in question?

For example, in the $(1-\alpha)100\%$ Confidence Interval for $\mu$,

$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

the bound is $B = z_{\alpha/2}\,\sigma/\sqrt{n}$.
Minimum Sample Size: Mean and Proportion
Minimum required sample size in estimating the population mean, $\mu$:

$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{B^2}$$

Bound of estimate: B (known)

Minimum required sample size in estimating the population proportion, $p$:

$$n = \frac{z_{\alpha/2}^2\,pq}{B^2}$$
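8.4 Selecting the Required Sample Size
Example: Recall that Jared Beane wants to construct a 90% confidence interval of the mean mpg of all ultra-green cars.
Suppose Jared would like to constrain the margin of error to within 2 mpg. Further, the lowest mpg in the population is 76 mpg and the highest is 118 mpg.
How large a sample does Jared need to compute the 90% confidence interval of the population mean?

Estimating the standard deviation as range/4, $\hat{\sigma} = (118 - 76)/4 = 10.50$, and writing D = 2 for the desired margin of error:

$$n = \left(\frac{z_{\alpha/2}\,\hat{\sigma}}{D}\right)^2 = \left(\frac{1.645 \times 10.50}{2}\right)^2 = 74.58, \text{ which is rounded up to } 75$$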
Example 1
A marketing research firm wants to conduct a survey to estimate the average amount spent on entertainment by each person visiting a popular resort. The people who plan the survey would like to determine the average amount spent by all people visiting the resort to within $120, with 95% confidence. From past operation of the resort, an estimate of the population standard deviation is σ = $400. What is the minimum required sample size?

$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{B^2} = \frac{(1.96)^2 (400)^2}{120^2} = 42.684, \text{ which is rounded up to } 43$$
Example 2
The manufacturers of a sports car want to estimate the proportion of people in a given income bracket who are interested in the model. The company wants to know the population proportion, p, to within 0.10 with 99% confidence. Current company records indicate that the proportion p may be around 0.25. What is the minimum required sample size for this survey?

$$n = \frac{z_{\alpha/2}^2\,pq}{B^2} = \frac{2.576^2 (0.25)(0.75)}{0.10^2} = 124.42, \text{ which is rounded up to } 125$$
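Both minimum-sample-size formulas reduce to one line each once the critical value is known. The sketch below uses illustrative function names and rounds up with `math.ceil`, since a fractional sample size must be rounded up to guarantee the bound. The proportion example uses the bound B = 0.10 that appears in the arithmetic above.

```python
import math

def sample_size_mean(z, sigma, bound):
    """Smallest n with z * sigma / sqrt(n) <= bound."""
    return math.ceil((z * sigma / bound) ** 2)

def sample_size_proportion(z, p, bound):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= bound, for a planning value p."""
    return math.ceil(z ** 2 * p * (1 - p) / bound ** 2)

n_resort = sample_size_mean(1.96, 400, 120)          # Example 1
n_cars = sample_size_mean(1.645, 10.50, 2)           # ultra-green cars, 90% confidence
n_sports = sample_size_proportion(2.576, 0.25, 0.10) # Example 2 with B = 0.10

print(n_resort, n_cars, n_sports)
```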
