Professional Documents
Culture Documents
Lecture 2
Section A:Continous Data: Useful Summary Statistics
Learning Objectives
The sample mean is easily computed by adding up the five values and
dividing by 5: in stat notation the sample mean is frequently
represented by a letter with a line over it: for example
(pronounced x bar)
Lecture 2, Statistical Reasoning for Public Health: Estimation, Inference, & Interpretation
Mean, Example
Where
In the formula to find the mean, we use the
summation sign: This is just mathematical shorthand for add
up all of the observations
The median is the middle number (also called the 50th percentile)
80 90
95
110 120
Sample Median
Sample Median
For example, if 120 became 200, the median would remain the
same, but the mean would change from 99 mmHg to 115 mmHg
80
10
90
95
80
90
95
110 200
Median
11
12
Lecture 2, Statistical Reasoning for Public Health: Estimation, Inference, & Interpretation
Describing Variability
Describing Variability
13
14
Describing Variability
Recall, the 5 systolic blood pressures (mm Hg) with sample mean (
) of 99 mmHg
15
Describing Variability
16
Notes on s
Sample variance
The units of s are the same as the units of the data (for example,
mm Hg)
Often abbreviated SD or sd
s = 15.97 16 (mmHg)
17
18
Lecture 2, Statistical Reasoning for Public Health: Estimation, Inference, & Interpretation
Notes on s
The formula
20
19
Percentiles
(Estimate of ):
(Estimate of ):
21
22
The 10th percentile for these 113 blood pressure measurements is 107
mmHg, meaning that approximately 10% of the men in the sample
have SBP 107 mmHg, and (100-10) = 90% of the men have SBP >
107 mmHg
The 75h percentile for these 113 blood pressure measurements is 132
mmHg, meaning that approximately 75% of the men in the sample
have SBP 132 mmHg, and (100-75) = 25% of the men have SBP >
132 mmHg
23
Percentile
Sample Estimate
2.5th
100.7 mmHg
25th
114 mmHg
50th
123 mmHg
75th
132 mmHg
97.5th
151.2 mmHg
24
Lecture 2, Statistical Reasoning for Public Health: Estimation, Inference, & Interpretation
(Estimate of ):
(Estimate of ):
(Estimate of population median):
25
26
Summary
Sample Estimate
2.5th
1 day
25th
1 day
50th
2 days
75th
5 days
97.5th
20 days
27
28
Learning Objectives
29
30
Lecture 2, Statistical Reasoning for Public Health: Estimation, Inference, & Interpretation
31
32
Example 1: Histogram
33
Example 1: Histogram
34
Example 1: Histogram
35
36
Lecture 2, Statistical Reasoning for Public Health: Estimation, Inference, & Interpretation
Example 1: Histogram
Example 1: Boxplot
37
38
1
1
http://inclass.kaggle.com/
http://inclass.kaggle.com/
39
40
Anatomy of a boxplot:
(Large)
Outliers
Largest non-outlying value
(upper tail)
75th percentile (upper
hinge)
Median(50th percentile)
41
Lecture 2, Statistical Reasoning for Public Health: Estimation, Inference, & Interpretation
Outlier Cutoffs
Large outliers: values > upper hinge +1.5*IQR
i.e. > 75th percentile +1.5*(75th percentile 25th
percentile)
Small outliers: values < lower hinge - 1.5*IQR
i.e. < 25th percentile -1.5*(75th percentile 25th
percentile)
2 Kelsall JE, Samet J, Zeger SL, Xu J: Air pollution and mortality in Philadelphia, 1974-1988. American
Journal of Epidemiology 146(9):750-762, 1997.
44
Boxplot
45
46
Uniform
47
48
Lecture 2, Statistical Reasoning for Public Health: Estimation, Inference, & Interpretation
Summary
49
Learning Objectives
50
51
52
53
54
Lecture 2, Statistical Reasoning for Public Health: Estimation, Inference, & Interpretation
Sample
n=113
n=213
n=1,000
Sample
n=113
n=213
n=1,000
124.4
123.5
122.6
12.9
13.1
12.4
121.7
123.4
123.6
14.4
11.6
13.3
122.6
123.1
122.6
12.0
13.1
13.1
123.1
123.4
122.8
14.5
12.9
12.9
120.5
123.1
123.1
13.6
13.6
13.1
55
56
(Estimate of ):
(Estimate of ):
(Estimate of population median):
57
58
59
60
10
Lecture 2, Statistical Reasoning for Public Health: Estimation, Inference, & Interpretation
Sample
n=200
n=500
n=1,000
4.05
4.19
4.15
4.97
4.21
4.12
4.06
4.31
4.28
4.03
3.99
4.14
3.96
4.16
4.24
61
62
(Estimate of ):
Sample
n=200
n=500
n=1,000
4.68
4.78
5.02
5.67
4.84
4.70
4.90
5.10
5.10
4.50
4.78
4.74
4.45
4.65
4.73
(Estimate of ):
(Estimate of population median):
63
64
65
66
11
Lecture 2, Statistical Reasoning for Public Health: Estimation, Inference, & Interpretation
Sample
n=200
n=500
n=1,000
56.1
54.3
53.9
52.3
53.2
55.0
54.0
54.4
54.7
52.7
54.3
54.3
55.0
53.7
53.7
67
68
Sample
n=200
n=500
n=1,000
Sample
n=200
n=500
n=1,000
18.1
18.1
17.9
56.2
56.5
54.3
18.4
17.8
18.0
53.5
52.9
55.5
17.8
17.1
17.7
55.6
55.5
56.6
17.3
16.9
17.8
51.8
54.0
55.0
17.4
18.1
18.0
55.5
54.5
54.1
69
Wrap Up
70
Wrap Up
71
72
12
Lecture 2, Statistical Reasoning for Public Health: Estimation, Inference, & Interpretation
Learning Objectives
73
Motivation
74
Motivation
75
76
Boxplot
77
Histograms
78
13
Lecture 2, Statistical Reasoning for Public Health: Estimation, Inference, & Interpretation
Histograms
79
80
81
82
Histograms
83
Mean Difference
84
14
Lecture 2, Statistical Reasoning for Public Health: Estimation, Inference, & Interpretation
Histograms
From Abstract
85
From Abstract
87
Article Graphic
86
88
From abstract
From abstract
Jagsi R, et al. Gender Differences in the Salaries of Physician Researchers. Journal of the American
Medical Association (2012); 307(22); 2410-2417.
89
90
15
Lecture 2, Statistical Reasoning for Public Health: Estimation, Inference, & Interpretation
Article Table
Article Table
91
92
Summary
93
94
16