Descriptive Statistics
Part 2: Descriptive Measures
Learning Objectives
Measures of centre
Measures of dispersion (spread)
Standardized (Z) scores
Identifying potential outliers
Box & whisker plots (boxplots)
Goal 2 of Descriptive
Statistics
In part 1, we looked at grouping and
graphing data.
Methods depended on data type.
Measures of Centre
Most common measure of centre is the
MEAN (average).
Mean =
(sum of numbers)/(# of numbers)
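Suppose the data are $x_1, x_2, x_3, \ldots, x_n$.
Then: $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$
Example: $x_1 = 5,\ x_2 = 9,\ x_3 = 13.5,\ x_4 = 18.2$
Then: $x_1 + x_2 + x_3 + x_4 = 5 + 9 + 13.5 + 18.2 = 45.7$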
Sigma Notation
In other words, sigma notation just
means add them up.
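As a minimal sketch of "add them up, then divide by how many there are", the mean of the four example values 5, 9, 13.5 and 18.2 could be computed as follows. The variable names are illustrative only.

```python
# Example values from the slide: x1 = 5, x2 = 9, x3 = 13.5, x4 = 18.2
data = [5, 9, 13.5, 18.2]

total = sum(data)         # sigma notation: add up every x_i
mean = total / len(data)  # divide the sum by n, the number of values

print(f"sum  = {total:.1f}")
print(f"mean = {mean:.3f}")
```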
Median
The value that separates the top and
bottom halves of ORDERED data.
To find the median ($\tilde{x}$):
Arrange data into increasing order.
Odd # of data points: Take middle number.
Even # of data points: Take average of the
two middle numbers.
Median: Example
Suppose you have the following data:
4, 6, 0, 3.1, 5.5, 7, 4
Median: Example
Suppose you now have this data:
5, 1, 8, 9, 0, 7, 2, 1
Step 1: Arrange.
0, 1, 1, 2, 5, 7, 8, 9
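Step 2: Even # of data points, so take the average of the
two middle numbers: $\tilde{x} = (2 + 5)/2 = 3.5$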
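The arrange-then-pick-the-middle procedure can be sketched in a few lines. This is an illustrative helper (the function name `median` is chosen here, not taken from the slides), applied to both example data sets.

```python
def median(values):
    """Return the median using the by-hand rule from the slides."""
    ordered = sorted(values)  # Step 1: arrange in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of data points: take the middle number
        return ordered[mid]
    # Even number of data points: average the two middle numbers
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([4, 6, 0, 3.1, 5.5, 7, 4]))   # odd number of points
print(median([5, 1, 8, 9, 0, 7, 2, 1]))    # even number of points
```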
Mode
Most frequent data point.
A data set can have any number of
modes.
0 modes: All data points occur once.
1 mode: One observation occurs more
than the others.
2 modes: Two observations occur equally
more often than the others.
Etc
Mode
Example: For the following data sets,
find the mode.
(a) 5, 9, 1, 4, 9, 4, 9, 6.
(b) 5, 9, 1, 4, 9, 4, 9, 6, 4.
(c) 1, 2, 3, 4, 5, 6, 7.
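One way to check the three data sets is to count how often each value occurs. The sketch below follows the convention used here that a data set in which every value occurs once has no mode. The helper name `modes` is an assumption made for illustration.

```python
from collections import Counter

def modes(values):
    """Return a sorted list of modes; empty if every value occurs once."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # 0 modes: all data points occur once
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([5, 9, 1, 4, 9, 4, 9, 6]))     # (a)
print(modes([5, 9, 1, 4, 9, 4, 9, 6, 4]))  # (b)
print(modes([1, 2, 3, 4, 5, 6, 7]))        # (c)
```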
Robustness
Measures that are NOT sensitive to
outliers are called Robust.
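Therefore, the following measures of
centre are robust: the median and the mode.
The mean is NOT robust, since a single extreme
value can shift it substantially.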
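A small sketch of what sensitivity to an outlier looks like in practice. The two data sets below are hypothetical (they do not come from the slides): the second is the first with its largest value replaced by an extreme one.

```python
import statistics

# Hypothetical data, chosen only for illustration
clean = [4, 5, 6, 7, 8]
with_outlier = [4, 5, 6, 7, 80]  # largest value replaced by an outlier

mean_clean = statistics.mean(clean)
mean_outlier = statistics.mean(with_outlier)
median_clean = statistics.median(clean)
median_outlier = statistics.median(with_outlier)

print("mean:  ", mean_clean, "->", mean_outlier)      # moves a lot
print("median:", median_clean, "->", median_outlier)  # does not move
```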
is possible.
Measures of Centre:
Graphically
Where are the measures of centre on
the following distributions?
Using Minitab
Minitab quickly calculates measures of
centre for you (seen in section 1 for the
average of the circles).
Measures of Spread
Does the mean or median tell you how
your data is spread out (dispersed)?
NO! For example:
Consider the following data:
49, 50, 51
Mean = 50, median = 50.
Measures of Spread
It is very important in statistical
analyses to be able to describe how the
data is spread out.
Three main ways of doing this:
Range
Standard Deviation
Interquartile Range
Range
Range: Max value - Min value of the
data.
Example: 34, 10, 49, 28, 51, 19.
Range = 51 - 10 = 41.
Is it sensitive to outliers?
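A quick sketch of the range calculation for the example data, followed by a check of the outlier question. The extra value 500 is hypothetical, added only to show how a single extreme observation changes the range.

```python
data = [34, 10, 49, 28, 51, 19]
data_range = max(data) - min(data)  # max value minus min value
print(data_range)

# Hypothetical outlier (not in the slides) appended to the same data
data_with_outlier = data + [500]
range_with_outlier = max(data_with_outlier) - min(data_with_outlier)
print(range_with_outlier)
```

Because the range depends only on the two most extreme values, one unusual observation is enough to change it completely.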
Standard Deviation
Range only involves the maximum and
minimum observations.
It therefore ignores ALMOST ALL of
your data!
Standard deviation takes ALL
observations into account.
Measures how much, on average, each
value differs from the mean.
Sample standard deviation:
$S = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$
Example Calculation
A company conducted a survey to
determine how long it takes their
employees to get to work. The data is
recorded in minutes. Find the variance
and standard deviation of the data.
Include units.
13.0, 17.5, 24.6, 18.0, 20.4, 17.7.
Example Calculation
Step 1: Find the mean. DO NOT
round intermediate calculations, but
round your final answer (in this case,
the standard deviation) to two decimal
places.
Example Calculation
(Data was 13, 17.5, 24.6, 18, 20.4, 17.7 with
mean 18.53333333).
For the rest of the calculation, use a table:
$x_i$     $x_i - \bar{x}$    $(x_i - \bar{x})^2$
13        -5.53333333        30.61777774
17.5      -1.03333333        1.06777777
24.6      6.06666667         36.80444449
18        -0.53333333        0.28444444
20.4      1.86666667         3.48444446
17.7      -0.83333333        0.69444444
TOTAL =                      72.95333334
Example Calculation
Divide that total by n - 1. This gives the
variance.
Variance: $S^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$
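The whole calculation for the commute-time data can be sketched step by step, keeping full precision until the end and rounding only the final answers. Variable names are illustrative.

```python
import math

# Commute times in minutes, from the example
times = [13.0, 17.5, 24.6, 18.0, 20.4, 17.7]
n = len(times)

mean = sum(times) / n                         # Step 1: mean (not rounded)
squared_devs = [(x - mean) ** 2 for x in times]
total = sum(squared_devs)                     # total of the squared deviations
variance = total / (n - 1)                    # divide by n - 1
std_dev = math.sqrt(variance)                 # square root gives S

print(f"variance = {variance:.2f} min^2")
print(f"std dev  = {std_dev:.2f} min")
```

Note the units: the variance is in squared minutes, while the standard deviation is back in minutes.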
Standard Deviation:
Graphically
The mean measures where the
CENTRE of your data is.
Standard deviation measures how
SPREAD OUT your data is.
Large S => lots of spread, and vice
versa.
Standard Deviation:
Graphically
Example: The datasets graphed on the
next slide have the same mean, and
are graphed on the same scales.
Which one has the larger standard
deviation?
Standard Deviation:
Graphically
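Example: deviations from the mean for the data 2, 5, 10, 12, 17 (mean 9.2):
$x_i$     $x_i - \bar{x}$    $(x_i - \bar{x})^2$
2         -7.2               51.84
5         -4.2               17.64
10        0.8                0.64
12        2.8                7.84
17        7.8                60.84
Total =   0                  138.80
The deviations $(x_i - \bar{x})$ total 0. This always happens!
That is why they are squared before being added up.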
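The "always happens" observation can be checked numerically. The sketch below assumes the data values are 2, 5, 10, 12 and 17; the value 5 is inferred from the listed deviation of -4.2 and the mean of 9.2, since it did not survive in the slide text.

```python
# Assumed data: 2, 5, 10, 12, 17 (the 5 is inferred, see above)
data = [2, 5, 10, 12, 17]
mean = sum(data) / len(data)

deviations = [x - mean for x in data]
squared = [d ** 2 for d in deviations]

# The plain deviations cancel out (up to floating-point rounding),
# so they cannot measure spread on their own.
print(f"sum of deviations         = {abs(sum(deviations)):.4f}")
print(f"sum of squared deviations = {sum(squared):.2f}")
```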
S and S
Population Parameters
The MEAN of a POPULATION is
calculated in the same way as for a
SAMPLE.
The STANDARD DEVIATION of a
POPULATION is slightly different than
that of a sample.
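Sample: $S = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$
Population: $\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}$
(Here $\mu$ is the population mean and $N$ is the population size.)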
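To see the difference between the two formulas, the sketch below computes both for the commute-time data from the earlier example, using Python's standard `statistics` module (`stdev` divides by n - 1, `pstdev` divides by N). Treating that data as a whole population is an assumption made purely for comparison.

```python
import statistics

# Commute times in minutes, reused from the earlier example
times = [13.0, 17.5, 24.6, 18.0, 20.4, 17.7]

sample_sd = statistics.stdev(times)       # divides by n - 1
population_sd = statistics.pstdev(times)  # divides by N

print(f"sample S         = {sample_sd:.2f}")
print(f"population sigma = {population_sd:.2f}")
```

Dividing by n - 1 rather than N always gives the slightly larger value.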
17 , 2
Example
$Z = \dfrac{x - \mu}{\sigma}$
Suppose $\mu = 28.1$ and $\sigma = 5.83$.
Calculate the Z scores of the data
points: x = 39.4, x = 13.6, x = 28.1
What is the significance of the SIGN
(+, -, or zero) of your Z score?
Z Scores
If a Z score is POSITIVE, then the
observation it came from was ABOVE
the mean.
If a Z score is NEGATIVE, then the
observation was BELOW the mean.
If a Z score is ZERO, then the
observation EQUALLED the mean.
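The sign rule can be checked on the three example data points. This sketch takes 28.1 as the mean and 5.83 as the standard deviation, which is an assumption about the example, since the symbols attached to those two numbers did not survive in the slide text. The helper name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

mean, sd = 28.1, 5.83  # assumed to be the mean and standard deviation

for x in (39.4, 13.6, 28.1):
    z = z_score(x, mean, sd)
    if z > 0:
        position = "above the mean"
    elif z < 0:
        position = "below the mean"
    else:
        position = "equal to the mean"
    print(f"x = {x}: Z = {z:.2f} ({position})")
```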
$\mu = 11,\ \sigma = 3.1$
(a) Calculate the Z scores of ALL the
observations (round each to 1 decimal).
(b) Find the mean and population standard
deviation of these Z scores (1 decimal).
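The pattern behind part (b) can be previewed with a sketch. The data for this exercise did not survive in the slide text, so the commute times from the earlier example are used instead, treated as a population (an assumption for illustration).

```python
import statistics

# Stand-in data: commute times from the earlier example
data = [13.0, 17.5, 24.6, 18.0, 20.4, 17.7]

mu = statistics.mean(data)
sigma = statistics.pstdev(data)  # population standard deviation

z_scores = [(x - mu) / sigma for x in data]
z_mean = statistics.mean(z_scores)
z_sd = statistics.pstdev(z_scores)

# Adding 0.0 turns a possible -0.0 into 0.0 for display
print(round(z_mean, 1) + 0.0, round(z_sd, 1))
```

When every observation is standardized with the data's own mean and population standard deviation, the Z scores have mean 0 and population standard deviation 1, apart from rounding.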
Interquartile Range
Standard deviation uses the MEAN to
determine spread.
Interquartile Range uses the MEDIAN.
Also provides a method of determining
potential outliers.
Quartiles: Definitions
A common practice in statistics is to
divide your data into QUARTERS.
First Quartile (Q1): The observation
below which is the bottom 25% of the
data.
Second Quartile (Q2): Same, but 50%.
Note that Q2 = the median.
Third Quartile (Q3): Same, but 75%.
Quartiles: Example
Find the quartiles for the following data
sets:
(a) 12, 34, 21, 9, 20, 16, 80, 45, 32.
(b) 105, 51, 142, 88, 100, 97.
First: Arrange
(a) 9, 12, 16, 20, 21, 32, 34, 45, 80.
(b) 51, 88, 97, 100, 105, 142.
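One by-hand method for quartiles can be sketched as follows: Q2 is the median of all the data, Q1 is the median of the lower half, and Q3 is the median of the upper half, leaving the middle value out of both halves when the number of data points is odd. This method is an assumption, chosen because it reproduces the 5 number summary {9, 14, 21, 39.5, 80} quoted for data set (a). The function names are illustrative.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)  # First: arrange
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle value so it is in neither half
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)

print(quartiles([12, 34, 21, 9, 20, 16, 80, 45, 32]))  # data set (a)
print(quartiles([105, 51, 142, 88, 100, 97]))          # data set (b)
```

Software packages often use slightly different interpolation rules, so their Q1 and Q3 may not match this method exactly.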
5 Number Summary
The following five values of a data set
make up its 5 number summary:
Minimum
Q1
Q2
Q3
Maximum
5 Number Summary
Example: Find the 5 number
summary for the data set (a) of our
quartile example.
You must arrange the data first, which was:
9, 12, 16, 20, 21, 32, 34, 45, 80.
Potential Outliers
Data that falls WITHIN the lower limit
(LL = Q1 - 1.5 × IQR) and the upper limit
(UL = Q3 + 1.5 × IQR), where IQR = Q3 - Q1,
is considered OK.
The following data points are
considered POTENTIAL OUTLIERS:
Higher than UL.
Lower than LL.
[Figure: number line for data set (a) with LL = -24.25 and UL = 77.75; data between the two limits is OK]
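The limits for data set (a) can be reproduced with a short sketch. It assumes the usual 1.5 × IQR rule, which gives exactly the LL and UL values quoted here when applied to Q1 = 14 and Q3 = 39.5 from the 5 number summary.

```python
data = [9, 12, 16, 20, 21, 32, 34, 45, 80]
q1, q3 = 14, 39.5  # quartiles of data set (a)

iqr = q3 - q1                  # interquartile range
lower_limit = q1 - 1.5 * iqr   # LL
upper_limit = q3 + 1.5 * iqr   # UL

# Potential outliers: lower than LL or higher than UL
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print("IQR =", iqr)
print("LL =", lower_limit, "UL =", upper_limit)
print("Potential outliers:", outliers)
```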
Adjacent Values
Adjacent values are the two values
WITHIN the LL and UL, but CLOSEST
to them.
Example: In our quartile example:
9, 12, 16, 20, 21, 32, 34, 45, 80:
LL was -24.25 and UL was 77.75.
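Finding the adjacent values amounts to keeping only the data inside the limits and taking the smallest and largest of what is left. A minimal sketch for this example:

```python
data = [9, 12, 16, 20, 21, 32, 34, 45, 80]
lower_limit, upper_limit = -24.25, 77.75  # LL and UL from the example

# Keep only the observations within the limits
inside = [x for x in data if lower_limit <= x <= upper_limit]

# Adjacent values: the values inside the limits closest to LL and UL
adjacent_low = min(inside)
adjacent_high = max(inside)

print(adjacent_low, adjacent_high)
```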
Modified Boxplots
A way to picture the 5 number
summary, adjacent values, and
potential outliers.
Modified boxplot:
Make a box from the three quartiles.
The whiskers (lines) are drawn from the
box to the ADJACENT values.
Mark the potential outliers as *.
Modified Boxplots
[Figure: modified boxplot of the data below, vertical axis from 0 to 80]
Data (arranged):
9, 12, 16, 20, 21, 32, 34, 45, 80.
5 number summary was {9, 14, 21, 39.5, 80}
Adjacent values: 9, 45
Potential outlier: 80
Find:
5 number summary.
Lower limit and upper limit.
Adjacent values.
Potential outliers.
Boxplots in Minitab
Minitab makes modified boxplots.
Warning: Its mechanism for finding Q1
and Q3 is a bit different from the way
we do it by hand.
They will be fairly close to what you
would find by hand.