
Chapter 17

Inference about a Population Mean

Conditions for inference


Previously, when making inferences about the population mean, $\mu$, we were
assuming the following simple conditions:

(1) Our data (observations) are a simple random sample (SRS) of size n
from the population of interest.

(2) The variable we measure has an exactly normal distribution with
parameters $\mu$ and $\sigma$.

(3) The population standard deviation $\sigma$ is known.

Then we constructed a confidence interval for the population mean based
on the _________ distribution (one-sample z statistic):

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

This holds approximately for large samples even if assumption (2) is not
satisfied. Why?

Issue: In a more realistic setting, assumption (3) is not satisfied, i.e., the
standard deviation $\sigma$ is unknown.

So what can we do to handle real-life problems?

We replace the population standard deviation, $\sigma$, by its estimate, the
sample standard deviation $s$.

When $\sigma$ is known, the standard deviation of the sample mean $\bar{x}$ is
$\sigma / \sqrt{n}$.

When $\sigma$ is unknown, we estimate the standard deviation of $\bar{x}$ by
$s / \sqrt{n}$.
(This quantity is called the _____________ of the sample mean $\bar{x}$.)

We get the one-sample t statistic:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

When making inferences about the population mean $\mu$ with unknown $\sigma$, we
use the one-sample t statistic. (Note that we still need assumptions 1 and 2.)

But the one-sample t statistic doesn't have a normal distribution; it has a
t-distribution with n - 1 degrees of freedom.

The t-distributions
We specify a particular t-distribution by giving its degrees of freedom (d.f.).

How does the t-distribution compare with the standard normal distribution?


Similarities:

Difference:

As the d.f. k increases, the $t_k$ distribution approaches the Normal(0,1) distribution.

Notation: $t_k$ represents the t-distribution with k d.f.
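
The convergence of $t_k$ to Normal(0,1) can be checked numerically. The sketch below is only an illustration, not part of the original notes: it evaluates the standard textbook density of the t-distribution at the center (x = 0) and in the tail (x = 3) for a few values of k and prints the Normal(0,1) density at the same points. The function names `t_pdf` and `normal_pdf` are our own.

```python
import math

def t_pdf(x, k):
    """Density of the t-distribution with k degrees of freedom at x."""
    # Work on the log scale so that large k does not overflow the gamma function.
    log_const = (math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
                 - 0.5 * math.log(k * math.pi))
    return math.exp(log_const - (k + 1) / 2 * math.log(1 + x * x / k))

def normal_pdf(x):
    """Density of the standard normal distribution at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Compare the two densities at the center and in the tail.
for k in (1, 5, 30, 100):
    print(f"t_{k}: f(0) = {t_pdf(0, k):.4f}, f(3) = {t_pdf(3, k):.4f}")
print(f"N(0,1): f(0) = {normal_pdf(0):.4f}, f(3) = {normal_pdf(3):.4f}")
```

For small k the t density is lower at the center and higher in the tail than the normal density; both gaps shrink as k grows.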

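Confidence Intervals for a Population Mean (when standard deviation is unknown)

Confidence interval for $\mu$ when $\sigma$ is unknown (t-CI)
A level C confidence interval for $\mu$ is given by

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$

where $t^*$ is the upper (1-C)/2 critical value for the $t_{n-1}$ distribution, i.e.,
the area under the $t_{n-1}$ density between $-t^*$ and $t^*$ is C.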

Ex: What critical value t* from Table C would you use to make a CI for the
population mean in each of the following situations?
a) A 95% CI based on n = 10 observations.

b) A 90% CI based on n = 26 observations.

c) An 80% CI from a sample of size 7.
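
If Table C is not at hand, a critical value can be approximated numerically. The following is a rough sketch under our own assumptions, not the method of these notes: it integrates the t density with Simpson's rule and finds $t^*$ by bisection, using only the Python standard library. The helper names `t_pdf`, `t_cdf` and `t_critical` are hypothetical, and the search is limited to $t^*$ below 100.

```python
import math

def t_pdf(x, k):
    """Density of the t-distribution with k degrees of freedom at x."""
    log_const = (math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
                 - 0.5 * math.log(k * math.pi))
    return math.exp(log_const - (k + 1) / 2 * math.log(1 + x * x / k))

def t_cdf(x, k):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x]."""
    steps = max(200, 2 * int(50 * x))  # even number of steps, width <= 0.01
    h = x / steps
    total = t_pdf(0, k) + t_pdf(x, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    return 0.5 + total * h / 3  # by symmetry, P(T <= 0) = 0.5

def t_critical(conf_level, n):
    """Critical value t* for a level-C interval from n observations (df = n - 1)."""
    k = n - 1
    target = 1 - (1 - conf_level) / 2  # area to the left of t*
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection on the increasing function t_cdf
        mid = (lo + hi) / 2
        if t_cdf(mid, k) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for conf_level, n in ((0.95, 10), (0.90, 26), (0.80, 7)):
    print(f"C = {conf_level}, n = {n}: t* is about {t_critical(conf_level, n):.3f}")
```

The printed values can be compared against the Table C entries for 9, 25 and 6 degrees of freedom.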

Ex: Suppose JC-Penney wishes to know the average income of the
households in the Dallas area before deciding to open another store there. A
random sample of 21 households is taken, and the income of these sampled
households turns out to average $45,000 with a standard deviation of $15,000.

(a) Give a 90% confidence interval for the unknown average income of the
households in the Dallas area.

(b) Is there evidence at the 10% level that the average income of the households
in the Dallas area differs from $48,000? Use the four-step process.
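
The arithmetic for this example can be sketched in a few lines. This is one way to check a hand calculation, not a substitute for the four-step process. It assumes the two-sided hypotheses $H_0: \mu = 48{,}000$ versus $H_a: \mu \neq 48{,}000$, and it takes the critical value 1.725 (20 degrees of freedom, 90% confidence) from a standard t table.

```python
import math

# Summary statistics from the example.
n = 21
x_bar = 45_000.0   # sample mean income
s = 15_000.0       # sample standard deviation
mu_0 = 48_000.0    # hypothesized mean in part (b)

# Assumption: t* for df = n - 1 = 20 and 90% confidence, read from a t table.
t_star = 1.725

se = s / math.sqrt(n)                      # standard error of the sample mean
margin = t_star * se                       # margin of error
ci = (x_bar - margin, x_bar + margin)      # 90% confidence interval, part (a)

t_stat = (x_bar - mu_0) / se               # one-sample t statistic, part (b)
reject = abs(t_stat) > t_star              # two-sided test at the 10% level

print(f"standard error = {se:.2f}")
print(f"90% CI = ({ci[0]:.0f}, {ci[1]:.0f})")
print(f"t = {t_stat:.3f}, reject H0 at the 10% level: {reject}")
```

A two-sided test at the 10% level rejects exactly when the hypothesized value falls outside the 90% confidence interval, so parts (a) and (b) can be used to check each other.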

Matched Pairs t Procedures


As we mentioned in Chapter 9, comparative studies are more convincing than
single-sample investigations. For that reason, one-sample inference is less
common than comparative inference.
In a matched pairs design, subjects are matched in pairs and each treatment is
given to one subject in each pair. The experimenter can toss a coin to assign two
treatments to the two subjects in each pair.

Example 1. Suppose a college placement center wants to estimate $\mu$, the
difference in mean starting salaries for men and women graduates who seek jobs
through the center. If it independently samples men and women, the starting
salaries may vary because of their different college majors and differences in
grade point averages. To eliminate these sources of variability, the placement
center could match male and female job-seekers according to their majors and
GPAs. Then the differences between the starting salaries of each pair in the
sample could be used to make an inference about $\mu$.
Example 2. Suppose you wish to estimate the difference in mean absorption rate
into the bloodstream for two drugs that relieve pain. If you independently sample
people, the absorption rates might vary because of age, weight, sex, etc. It may
be possible to obtain two measurements on the same person. First, we administer
one of the two drugs and record the time until absorption. After a sufficient
amount of time, the other drug is administered and a second measurement on
absorption time is obtained. The differences between the measurements for each
person in the sample could then be used to estimate $\mu$.
Another situation calling for matched pairs is before-and-after observations on the
same subjects.
Example 3. Suppose you wish to estimate the difference in mean blood pressure
before and after taking a drug. We obtain the first measurement before the
patient starts taking the drug and the second measurement after the patient has
been taking it for a sufficient amount of time. The differences between the
measurements for each person in the sample could then be used to estimate $\mu$.
If the samples are matched pairs, find the difference between the
responses within each pair, then apply one-sample t procedures to
those differences of observed responses.

Example. An experiment is conducted to compare the starting salaries of male and
female college graduates who find jobs. Pairs are formed by choosing a male and
a female with the same major and similar GPA. Suppose a random sample of 10
pairs is formed in this manner and the starting annual salary of each person is
recorded. Let $\mu_1$ be the mean starting salary for males and let $\mu_2$ be the
mean starting salary for females.
Pair   Male (in $)   Female (in $)   Difference (male - female)
 1       29300          28800              500
 2       41500          41600             -100
 3       40400          39800              600
 4       38500          38500                0
 5       43500          42600              900
 6       37800          38000             -200
 7       69500          69200              300
 8       41200          40100             1100
 9       38400          38200              200
10       59200          58500              700

(a) Compute a 95% confidence interval for the mean difference $\mu = \mu_1 - \mu_2$.
The sample average of the paired differences

$\bar{x}$ =
and the sample standard deviation of the paired differences

s =

The 95% paired difference CI for $\mu = \mu_1 - \mu_2$ is

(b) Is there evidence at 5% level that the male starting salary is significantly
different from the female starting salary? Use the four-step process.
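
The matched pairs recipe (take within-pair differences, then apply one-sample t procedures to them) can be sketched with the salary data from the table. This is only an illustration for checking hand calculations. It assumes the critical value 2.262 (9 degrees of freedom, 95% confidence) from a standard t table and the two-sided hypotheses $H_0: \mu = 0$ versus $H_a: \mu \neq 0$.

```python
import math
import statistics

# Starting salaries for the 10 matched pairs, in dollars.
male = [29300, 41500, 40400, 38500, 43500, 37800, 69500, 41200, 38400, 59200]
female = [28800, 41600, 39800, 38500, 42600, 38000, 69200, 40100, 38200, 58500]

# Step 1: reduce each pair to a single difference (male - female).
diffs = [m - f for m, f in zip(male, female)]

# Step 2: one-sample t procedures applied to the differences.
n = len(diffs)
d_bar = statistics.mean(diffs)     # sample mean of the differences
s_d = statistics.stdev(diffs)      # sample standard deviation (divisor n - 1)
se = s_d / math.sqrt(n)            # standard error of the mean difference

# Assumption: t* for df = n - 1 = 9 and 95% confidence, read from a t table.
t_star = 2.262

ci = (d_bar - t_star * se, d_bar + t_star * se)   # part (a)
t_stat = d_bar / se                               # part (b), H0: mu = 0
reject = abs(t_stat) > t_star                     # two-sided test at the 5% level

print(f"mean difference = {d_bar}, s = {s_d:.2f}, SE = {se:.2f}")
print(f"95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"t = {t_stat:.3f}, reject H0 at the 5% level: {reject}")
```

Note that the two salary columns are never analyzed separately; once the differences are formed, the problem is an ordinary one-sample problem with n = 10.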

Robustness of t procedures
A confidence interval is called robust if the confidence level does not change very
much when the conditions for use of the procedure are violated.
The t confidence interval is exact when the distribution of the population is exactly
_________. However, no real data are exactly ________.
The usefulness of the t procedures in practice therefore depends on

Here are some practical guidelines for inference on population means:


***Always make a plot to check for skewness and outliers before using the t
procedures for small samples. ***

Using the t procedures


Except in the case of small samples, the condition that the data are an SRS
from the population of interest is more important than the condition that the
population distribution is normal.
Sample size less than 15: Use t procedures if the data appear close to
normal (roughly symmetric, single peak, no outliers). If the data are clearly
skewed or if outliers are present, do not use t procedures.
Sample size at least 15: The t procedures can be used except in the
presence of outliers or strong skewness.
Large samples: The t procedures can be used even for clearly skewed
distributions when the sample size is large, say $n \geq 40$.
