
Biostatistics I: Basic for Public Health

Lecture No.: KUI-6011


Starting Date: 22/09/2015

Statistical Inference: Interval Estimation


Laboratory Session 4

Copyright © 2015, S.A. Wilopo, Department of Public Health


Faculty of Medicine, Gadjah Mada University, Yogyakarta, Indonesia
Contents
1 Learning objectives:

2 Activities:

3 Class Exercises
3.1 Standard Deviation and Confidence Intervals
3.2 Odds Ratio
3.2.1 Example
3.2.2 Example
3.3 Risk Reduction and Number Needed to Treat
3.3.1 Example

4 Homework:
4.1 Critical Appraisal of Confidence Intervals
4.2 Calculating and Interpreting Confidence Intervals using STATA
4.2.1 Data Preparation
4.2.2 Descriptive Analysis

5 Outputs

6 References:

7 Log sheet:

1 Learning objectives:
Upon completion of the exercise unit, students should be able to:

1. draw inferences concerning unknown population parameters based on observed sample
data

2. describe the concepts of statistical inference in the form of interval estimation

3. assess unknown population parameters using interval estimation

4. evaluate the results of interventions reported using standard deviations, relative
risks, odds ratios, confidence intervals, chi-squared tests, and P values

5. demonstrate the relationship between hypothesis testing and interval estimation

6. appraise published research articles that use interval estimation

2 Activities:
1. Discussion: Statistical Inference – Interval Estimation

2. Estimating unknown population parameters based on observed sample data

3. Interval estimation: confidence intervals from a sample for continuous data

4. Interval estimation: confidence intervals from a sample for discrete data

5. Estimating standard deviations, relative risks, odds ratios, confidence intervals, and P
values

6. Exercise on the relationship between hypothesis testing and interval estimation

3 Class Exercises
3.1 Standard Deviation and Confidence Intervals
1. In a sample of 100 healthy women between 25 and 29 years of age, systolic blood
pressure was found to follow a normal distribution. If the sample mean blood
pressure was 120 mm Hg and the population standard deviation was 10 mm Hg,
what interval of blood pressure would represent an approximate 95% confidence
interval for the true mean µ?

a. 118 to 122 mm Hg
b. 100 to 140 mm Hg
c. 119 to 121 mm Hg
d. 100 to 130 mm Hg
e. 90 to 150 mm Hg

2. How will the length of a confidence interval for the population mean change when

a. sample size is increased
1) increases
2) decreases
3) stays the same
b. variation in the population is higher
1) increases
2) decreases
3) stays the same
c. confidence level is increased from 95% to 99%
1) increases
2) decreases
3) stays the same
d. sample mean is larger
1) increases
2) decreases
3) stays the same

3. A 99% confidence interval implies

a. The probability the given interval contains the true parameter is 0.99.
b. The probability that 99% of the individuals are contained in the interval is 0.99.
c. The probability that 99% of the sample means are contained in the interval is
0.99.
d. On average, we would expect 99 out of 100 of similarly constructed intervals
to contain the true parameter.
e. The probability the given interval contains the true parameter is 0.01.

4. Weight loss is a major manifestation of HIV infection. A study was carried out to
analyze weight change in 9 male subjects with stage IV HIV infection. The average
weight loss per month for the 9 men was 5.97 kg/month with a sample standard
deviation of 2.687 kg/month. Construct a 95% confidence interval for the true mean
weight loss per month of HIV-positive males with stage IV HIV infection. Interpret
the results.

5. There is a general concern about the escalating costs of providing health care in
Indonesia. One of the components contributing to the increasing costs is the length
of the hospital stay of a patient. In a sample of 23 patients, the mean stay was 4.5
days with a sample standard deviation of 1.3 days. Find a 95% confidence interval
for the true mean length of hospital stay.
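
As a reference for the exercises above, the calculation behind a confidence interval for a mean with a known population standard deviation can be sketched in Python as follows. The numbers (n = 64, mean 70, standard deviation 8) are hypothetical and chosen only for illustration; they are not taken from any exercise, and the function name z_interval is likewise an assumption. When only a sample standard deviation is available and the sample is small, the normal quantile would be replaced by the corresponding t quantile.

```python
import math
from statistics import NormalDist


def z_interval(sample_mean, population_sd, n, confidence=0.95):
    """Confidence interval for a mean when the population SD is known."""
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * population_sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Hypothetical illustration values, not data from the exercises
lower, upper = z_interval(sample_mean=70, population_sd=8, n=64)
print(f"95% CI: {lower:.2f} to {upper:.2f}")
```

The interval is the sample mean plus or minus the critical value times the standard error (the standard deviation divided by the square root of n).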

3.2 Odds Ratio


3.2.1 Example

A cohort of 1000 regular football players and 1000 non-footballers were followed to see if
playing football was significant in the injuries that they received. After 1 year of follow-up
there had been 12 broken legs in the football players and only four in the non-footballers.
The risk of a footballer breaking a leg was therefore 12/1000 or 0.012. The risk of a
non-footballer breaking a leg was 4/1000 or 0.004. The risk ratio of breaking a leg was
therefore 0.012/0.004 which equals three. The 95% CI was calculated to be 0.97 to 9.41. As
the CI includes the value 1 we cannot exclude the possibility that there was no difference
in the risk of footballers and non-footballers breaking a leg. However, given these results
further investigation would clearly be warranted.
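
The risk ratio in this example can be reproduced with a short Python sketch. It assumes the common log-scale (Wald-type) approximation for the confidence interval; the handout does not state which method produced the quoted 0.97 to 9.41, so the upper limit computed here need not match it exactly. The function name risk_ratio_ci is hypothetical.

```python
import math


def risk_ratio_ci(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Risk ratio with an approximate 95% CI computed on the log scale."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of ln(RR) for two independent groups (assumed method)
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# 12 broken legs in 1000 footballers, 4 in 1000 non-footballers
rr, lower, upper = risk_ratio_ci(12, 1000, 4, 1000)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Under this approximation the lower limit falls just below 1, which is why the possibility of no difference in risk cannot be excluded.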

1. How important is it?

2. How easy is it to understand?

3. When is it used?

4. What does it mean?

3.2.2 Example

A group of 100 patients with knee injuries, “cases”, was matched for age and sex to 100
patients who did not have injured knees, “controls”. In the cases, 40 skied and 60 did not,
giving the odds of being a skier for this group of 40:60 or 0.66. In the controls, 20 patients
skied and 80 did not, giving the odds of being a skier for the control group of 20:80 or
0.25. We can therefore calculate the odds ratio as 0.66/0.25 = 2.64. The 95% CI is 1.41 to
5.02.
If you cannot follow the maths, do not worry! The odds ratio of 2.64 means that the
number of skiers in the cases is higher than the number of skiers in the controls, and as
the CI does not include 1 (no difference in risk) this is statistically significant. Therefore,
we can conclude that skiers are more likely to get a knee injury than non-skiers.
Authors may give the percentage change in the odds ratio rather than the odds ratio
itself. In the example above, the odds ratio of 2.64 means the same as a 164% increase in
the odds of injured knees among skiers.
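
For readers who do want to follow the maths, the odds ratio and its interval can be sketched in Python as below. The sketch assumes the usual log-scale (Woolf-type) approximation, which the handout does not name, and the function name odds_ratio_ci is hypothetical. Working with unrounded odds gives an odds ratio of about 2.67 (roughly a 167% increase in the odds); the 2.64 quoted in the text arises from rounding 40:60 down to 0.66 before dividing.

```python
import math


def odds_ratio_ci(cases_exposed, cases_unexposed,
                  controls_exposed, controls_unexposed, z=1.96):
    """Odds ratio with an approximate 95% CI computed on the log scale."""
    odds_cases = cases_exposed / cases_unexposed
    odds_controls = controls_exposed / controls_unexposed
    odds_ratio = odds_cases / odds_controls
    # Standard error of ln(OR): square root of the summed reciprocal cell counts
    se_log_or = math.sqrt(
        1 / cases_exposed + 1 / cases_unexposed
        + 1 / controls_exposed + 1 / controls_unexposed
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Cases: 40 skiers, 60 non-skiers; controls: 20 skiers, 80 non-skiers
odds_ratio, lower, upper = odds_ratio_ci(40, 60, 20, 80)
percent_increase = (odds_ratio - 1) * 100
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
print(f"Increase in odds: {percent_increase:.0f}%")
```

Because the whole interval lies above 1, the conclusion in the text is unchanged.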
Odds ratios are often interpreted by the reader in the same way as risk ratios. This is
reasonable when the odds are low, but for common events the odds and the risks (and
therefore their ratios) will give very different values. For example, the odds of giving
birth to a boy are 1, whereas the risk is 0.5. However, for a rare event such as a side-effect
with a risk of 0.01, the odds are 0.0101, a very similar value. For this reason, if you are
looking at a case–control study, check that the authors have used odds ratios rather than
risk ratios.
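
The relationship between risk and odds described above can be checked with a minimal Python sketch. The helper names risk_to_odds and odds_to_risk are hypothetical; the conversion itself is simply odds = risk / (1 - risk).

```python
def risk_to_odds(risk):
    """Convert a risk (probability) into odds."""
    return risk / (1 - risk)


def odds_to_risk(odds):
    """Convert odds back into a risk (probability)."""
    return odds / (1 + odds)


# A common event (birth of a boy) and a rare event (risk of 0.01)
for risk in (0.5, 0.01):
    print(f"risk = {risk}: odds = {risk_to_odds(risk):.4f}")
```

For the common event the odds are double the risk, while for the rare event the two are almost identical, which is why odds ratios approximate risk ratios only when events are rare.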

3.3 Risk Reduction and Number Needed to Treat


3.3.1 Example

One hundred women with vaginal candidiasis were given an oral antifungal; 100 were given
placebo. They were reviewed 3 days later. The results are given in Table 1.

Table 1: Results of placebo-controlled trial of oral antifungal agent

            Given antifungal              Given placebo
      Improved   No improvement     Improved   No improvement
         80            20              60            40

ARR = improvement rate in the intervention group – improvement rate in the control
group = 80% – 60% = 20%

NNT = 100/ARR = 100/20 = 5

So five women have to be treated for one to get benefit.

The incidence of candidiasis was reduced from 40% with placebo to 20% with treatment,
i.e. by half.
Thus, the RRR is 50%.

In another trial young men were treated with an expensive lipid-lowering agent. Five
years later the death rate from ischaemic heart disease (IHD) was recorded. See Table 2
for the results.

Table 2: Results of placebo-controlled trial of Cleverstatin

            Given Cleverstatin            Given placebo
        Survived       Died          Survived       Died
      998 (99.8%)    2 (0.2%)      996 (99.6%)    4 (0.4%)

ARR = improvement rate in the intervention group – improvement rate in the control
group = 99.8% – 99.6% = 0.2%
NNT = 100/ARR = 100/0.2 = 500
So 500 men have to be treated for 5 years for one to survive who would otherwise have
died.
The incidence of death from IHD is reduced from 0.4% with placebo to 0.2% with
treatment – i.e. by half.
Thus, the RRR is 50%.
The RRR and NNT from the same study can have opposing effects on prescribing habits.
The RRR of 50% in this example sounds fantastic. However, thinking of it in terms of an
NNT of 500 might sound less attractive: for every life saved, 499 patients had unnecessary
treatment for 5 years.
Usually the necessary percentages are given in the abstract of the paper. Calculating
the ARR is easy: subtract the percentage that improved without treatment from the
percentage that improved with treatment.
Again, dividing that figure into 100 gives the NNT.
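
The calculations for both trials can be sketched in Python as follows. This is a minimal illustration: the function name treatment_effect is hypothetical, the counts are those of the unwanted outcome (no improvement in Table 1, death in Table 2), and exact fractions are used to avoid rounding error. Rounding the NNT up to a whole patient is a common convention assumed here.

```python
from fractions import Fraction
import math


def treatment_effect(bad_treated, n_treated, bad_control, n_control):
    """Return (ARR, RRR, NNT) for an unwanted outcome, using exact fractions."""
    risk_treated = Fraction(bad_treated, n_treated)
    risk_control = Fraction(bad_control, n_control)
    arr = risk_control - risk_treated   # absolute risk reduction
    rrr = arr / risk_control            # relative risk reduction
    nnt = math.ceil(1 / arr)            # number needed to treat, rounded up
    return arr, rrr, nnt


trials = {
    "Antifungal (Table 1)": (20, 100, 40, 100),
    "Cleverstatin (Table 2)": (2, 1000, 4, 1000),
}
for name, counts in trials.items():
    arr, rrr, nnt = treatment_effect(*counts)
    print(f"{name}: ARR = {float(arr):.1%}, RRR = {float(rrr):.0%}, NNT = {nnt}")
```

Both trials share the same RRR of 50%, yet the NNT differs a hundredfold, which is the contrast the text draws between the two measures.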

With an NNT you need to know:

a. What treatment?

b. What are the side-effects?

c. What is the cost?

d. For how long?

e. To achieve what?

f. How serious is the event you are trying to avoid?

g. How easy is it to treat if it happens?

For treatments, the lower the NNT the better – but look at the context.
(a) NNT of 10 for treating a sore throat with expensive blundamycin: not attractive
(b) NNT of 10 for prevention of death from leukaemia with a non-toxic chemotherapy
agent: worthwhile
Expect NNTs for prophylaxis to be much larger. For example, an immunization may have
an NNT in the thousands but still be well worthwhile. Numbers needed to harm (NNH)
may also be important.
In the example above, 6% of those on Cleverstatin had peptic ulceration as opposed to 1%
of those on placebo, an absolute risk increase of 5%. The NNH is therefore 100/5 = 20,
i.e. for every 20 patients treated, one peptic ulcer was caused.
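
The NNH arithmetic, and how it sits alongside the NNT of 500 from the same trial, can be sketched in Python. The function name number_needed_to_harm is hypothetical, and the final comparison assumes that the ulcer figures and the mortality figures describe the same treated population.

```python
from fractions import Fraction


def number_needed_to_harm(harm_pct_treated, harm_pct_control):
    """NNH = 1 / absolute risk increase; inputs are whole-number percentages."""
    absolute_risk_increase = Fraction(harm_pct_treated - harm_pct_control, 100)
    return 1 / absolute_risk_increase


nnh = number_needed_to_harm(6, 1)   # 6% ulcers on Cleverstatin, 1% on placebo
nnt = 500                           # NNT for Cleverstatin, from the text
extra_ulcers = Fraction(nnt) / nnh  # ulcers caused while preventing one death
print(f"NNH = {nnh}; ulcers caused per death prevented = {extra_ulcers}")
```

Setting the NNT against the NNH in this way is one reason the context questions listed above matter.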
You may see ARR and RRR given as a proportion instead of a percentage. So, an ARR of
20% is the same as an ARR of 0.2.
Be prepared to calculate RRR, ARR and NNT from a set of results. You may find that it
helps to draw a simple table and work from there.

4 Homework:
4.1 Critical Appraisal of Confidence Intervals
Read the following article:

Melnyk, B. M., Jacobson, D., Kelly, S., Belyea, M., Shaibi, G., Small, L., . . . Marsiglia,
F. F. (2013). Promoting Healthy Lifestyles in High School Adolescents: A
Randomized Controlled Trial. American Journal of Preventive Medicine, 45(4), 407-415.
doi: http://dx.doi.org/10.1016/j.amepre.2013.05.013

Please answer the following questions:

1. In Figure 1, please examine the number of subjects enrolled and analyzed at the
6-month follow-up. What is your comment regarding the results of randomization
in this study? How did the investigators try to avoid an insufficient sample size in
this study?

2. Examine Table 1 on the baseline findings and demographic characteristics of the
subjects compared. Please comment on the results of randomization in this
study. Do you think the investigators conducted a proper randomization?
How did they do it?

3. Please read the outcomes presented in Table 3. The authors calculated 95% confidence
intervals for each outcome but did not describe these numbers in words.
Please read their 95% confidence intervals for between-group
differences.

a. Write your reading of the 95% confidence intervals in plain language so that
a non-statistician can understand the meaning.
b. Please pay special attention to the 95% confidence interval of the between-group
difference for the marijuana category. What is your comment?

4.2 Calculating and Interpreting Confidence Intervals using STATA


The data to be used in this exercise consist of observations on 9 variables for 118 female
psychiatric patients and are taken from Hand, D. J., Daly, F., Lunn, A. D., McConway, K.
J., & Ostrowski, E. 1994. A Handbook of Small Data Sets. London: Chapman & Hall.
The variables are as follows:

1. id: identification number

2. age: age in years

3. iq: intelligence score (IQ)

4. anxiety: anxiety (1=none, 2=mild, 3=moderate, 4=severe)

5. depress: depression (1=none, 2=mild, 3=moderate, 4=severe)

6. sleep: can you sleep normally? (1=yes, 2=no)

7. sex: have you lost interest in sex? (1=no, 2=yes)

8. life: have you thought recently about ending your life? (1=no, 2=yes)

9. weight: increase in weight over last six months (in lbs)

The data are given in an ASCII data set (fem.dat), and missing values are coded as -99.

4.2.1 Data Preparation

1. How many cases have missing values in this data set? Can you compare the
summary statistics before and after missing values are deleted?

2. The variable sleep has been entered incorrectly as ‘3’ for subject number 3. Can you
correct this file? Hint: use the codebook or assert command to check for this error.

3. Since we do not know what the correct code for sleep should have been, we can
replace the incorrect value of 3 by ‘missing’ (i.e., in STATA: replace sleep=. if sleep==3).

4. In order to have consistent coding for ‘yes’ and ‘no’, recode the variable sleep so
that answer 1 becomes 2 and answer 2 becomes 1 (i.e., in STATA: recode sleep 1=2 2=1).

5. To avoid confusion in the answers, you could label the values as no or yes, for
example by executing the following STATA commands:

label define yn 1 no 2 yes
label values sex yn
label values life yn
label values sleep yn

The last three commands could also have been carried out in a foreach loop as
follows:

foreach x in sex life sleep {
    label values `x' yn
}

4.2.2 Descriptive Analysis

There are a variety of questions that might be addressed by these data; for example, how
are the variables related? Do women who have recently contemplated suicide differ in
any respects from those who have not? Also of interest are the correlations between
anxiety and depression and between weight change, age, and IQ.
The data contain a number of interval scale or continuous variables (weight change,
age, and IQ), ordinal variables (anxiety and depression), and dichotomous variables (sex
and sleep) that we wish to compare between two groups of women: those who have
thought about ending their lives and those who have not.

1. Please compare the suicidal women with the non-suicidal women by simply tabulating
suitable summary statistics in each group. Interpret the results and write them up as
you would for submission to a journal. For example, for IQ the required instruction
is:
table life, contents(mean iq sd iq)

2. A more formal approach to comparing the two groups on, say, weight gain over the
last six months might involve an independent samples t-test. Before using such a
test, you need to check whether the assumptions needed for the t-test appear to be
satisfied for weight gain. One of these assumptions is a normal distribution of the
outcome (dependent) variable. What is the easiest way to check this normality
assumption?

3. One way to check the assumption is by plotting the variable weight as a boxplot for
each group after defining appropriate labels. What is your interpretation of the
boxplot comparing the two groups on weight gain over the last six months?

4. You can also check the assumption of normality more formally by plotting a normal
quantile plot of suitably defined residuals. Here the difference between the
observed weight changes and the group-specific mean weight changes can be used.
If the normality assumption is satisfied, the quantiles of the residuals should be
linearly related to the quantiles of the normal distribution. Is this the case? What is
your conclusion from reading this graph?
Hint: use the following commands to obtain a Q-Q plot:

egen res=mean(weight), by(life)


replace res=weight-res
label variable res "residuals of t-test for weight"
qnorm res, title("Normal Q-Q plot")

5. You could also test whether the variances differ significantly and calculate 95%
confidence intervals using this command:
sdtest weight, by(life)
Please interpret the meaning of the 95% confidence intervals in your output. Can you
conclude from these numbers that the two groups are different?

5 Outputs
1. Discussion on Confidence interval calculation

2. Understanding concepts of OR, RR, ARR, RRR, and NNT.

3. Reading and understanding articles that present confidence intervals and their
interpretation

4. Conducting data analysis for confidence intervals and hypothesis testing

6 References:
1. Bewick, V., Cheek, L., & Ball, J. (2004). Statistics review 11: Assessing risk. Critical
Care, 8(4), 287-291. Available online: http://ccforum.com/content/8/4/287

2. Marshall, G., & Jonker, L. (2011). An introduction to inferential statistics: A review
and practical guide. Radiography, 17(1), e1-e6.
doi: http://dx.doi.org/10.1016/j.radi.2009.12.006

7 Log sheet:

No  Activities                                              Date        Signature

1   Discussion on confidence interval calculation and
    hypothesis testing
2   Laboratory session: Calculation and interpretation
    of interval estimation
3   Critical appraisal to understand interpretation and
    relationships between hypothesis testing and
    interval estimation
4   Examining relationships between hypothesis testing
    and interval estimation
Score : ____________________

Instructor,
