
Descriptive Statistics

Tabular and Graphical Presentations


Summarizing QUALITATIVE Data
Frequency Distribution
Relative Frequency Distribution
Percent Frequency Distribution
Bar Graphs
Pie Charts
Frequency Distribution
Tabular summary of data showing the number
of items in each of several categories.
Example Soft Drink Purchases
Get Excel File

Soft Drink      Frequency
Coke Classic    19
Diet Coke       8
Dr. Pepper      5
Pepsi           13
Sprite          5
TOTAL           50
How To: Using Excel
Step 1: Add two columns:
Soft Drink (in C1), with the names of the soft drinks typed starting in C2
Frequency (in D1)

Step 2: Select cell D2

Step 3: Enter =COUNTIF($A$2:$A$51,C2)
Copy cell D2 to cells D3 through the last soft drink row
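The COUNTIF step can also be sketched outside Excel. The Python snippet below is a minimal illustration, not part of the course materials: the 50 purchases are rebuilt from the frequency table (an assumption, since the worksheet itself is not shown), and collections.Counter plays the role of one COUNTIF per soft drink.

```python
from collections import Counter

# Assumption: the raw column A is rebuilt from the frequency table,
# so the order of the 50 purchases here is arbitrary.
purchases = (
    ["Coke Classic"] * 19
    + ["Diet Coke"] * 8
    + ["Dr. Pepper"] * 5
    + ["Pepsi"] * 13
    + ["Sprite"] * 5
)

# Counter tallies each category, like one COUNTIF formula per row.
frequency = Counter(purchases)

for drink in sorted(frequency):
    print(drink.ljust(14) + str(frequency[drink]).rjust(3))
print("TOTAL".ljust(14) + str(sum(frequency.values())).rjust(3))
```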
Relative and Percent Frequency
The relative frequency of a class is the fraction
or proportion of the total number of data items
belonging to the class.
relative frequency of a class = frequency of the class / n,
where n is the total number of data items

The percent frequency of a class is the relative
frequency multiplied by 100.
How To: Using Excel
For Relative Frequency:
Step 1: Calculate the frequency as above

Step 2: Create a new column

Step 3: Enter the formula frequency / n

For Percent Frequency:

Step 1: Create a new column

Step 2: Enter the formula relative frequency * 100


Example Using Soft Drink Purchases
Soft Drink      Relative Frequency   Percent Frequency
Coke Classic    .38                  38
Diet Coke       .16                  16
Dr. Pepper      .10                  10
Pepsi           .26                  26
Sprite          .10                  10
Total           1.00                 100
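As a quick check on the table above, the same two formulas can be sketched in Python. This is only an illustration; the variable names are made up for the example, and the frequencies are the ones from the soft drink frequency distribution.

```python
# Frequencies from the soft drink frequency distribution.
frequency = {
    "Coke Classic": 19,
    "Diet Coke": 8,
    "Dr. Pepper": 5,
    "Pepsi": 13,
    "Sprite": 5,
}

n = sum(frequency.values())  # total number of data items (50)

# relative frequency = frequency of the class / n
relative = {drink: f / n for drink, f in frequency.items()}

# percent frequency = relative frequency * 100
percent = {drink: 100 * r for drink, r in relative.items()}

for drink in frequency:
    print(drink.ljust(14), format(relative[drink], ".2f"), format(percent[drink], ".0f"))
```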
Bar Chart
A bar graph is a graphical device for depicting
qualitative data.
On one axis (usually the horizontal axis), we specify
the labels that are used for each of the classes.
A frequency, relative frequency, or percent
frequency scale can be used for the other axis
(usually the vertical axis).
Example
[Figure: bar graph "Soft Drink Purchases"; horizontal axis: Brand (Coke Classic, Diet Coke, Dr. Pepper, Pepsi, Sprite); vertical axis: Frequency]
How To: Using Excel
Step 1: Calculate the frequency as above
Step 2: Click Insert, then Charts, then Column
Step 3: Click Next
Step 4: Change the chart title, x-axis label, and y-axis label
Step 5: Click Next
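For readers without Excel at hand, the idea of a bar graph (one bar per class, bar length proportional to frequency) can be sketched as plain text. The helper name text_bar_chart is hypothetical and the output is only a rough stand-in for a real chart.

```python
# Frequencies from the soft drink frequency distribution.
frequency = {
    "Coke Classic": 19,
    "Diet Coke": 8,
    "Dr. Pepper": 5,
    "Pepsi": 13,
    "Sprite": 5,
}

def text_bar_chart(freq, symbol="#"):
    """Return one line per class: label, a bar of `symbol`s, and the count."""
    width = max(len(label) for label in freq)
    return [
        f"{label.ljust(width)} | {symbol * count} {count}"
        for label, count in freq.items()
    ]

for line in text_bar_chart(frequency):
    print(line)
```

Here the class labels run down the page rather than along the horizontal axis, which is simply the easier orientation for text output.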
Pie Chart
The pie chart is a commonly used graphical device
for presenting relative frequency distributions
for qualitative data.
[Figure: pie chart of soft drink purchases; Coke Classic 38%, Pepsi 26%, Diet Coke 16%, Dr. Pepper 10%, Sprite 10%]
How To: Using Excel
Step 1: Calculate the percent frequency as above
Step 2: Highlight the columns of interest
Step 3: Click Insert, then Charts, then Pie
Step 4: Click Next
Step 5: Change the chart title and legend labels
Step 6: Click Next
You Try It:
Exercise #4: get the data (syndicated, from the module folder)

Exercise #7: you need to create it from scratch.

Any Questions?
Summarizing QUANTITATIVE Data
Frequency Distribution
Relative Frequency Distribution
Percent Frequency Distribution
Dot Plot
Histogram
Cumulative Distributions
Ogive
Frequency Distribution
With quantitative data, we must be careful to define non-overlapping classes.

Three necessary steps:

1) Determine the number of non-overlapping classes
classes are formed by specifying the ranges used to group the data
use between 5 and 20 classes

2) Determine the width of each class
usually the same for each class (i.e., equal width)
approximate class width =
(largest data value - smallest data value) / number of classes

3) Determine the class limits
lower class limit: smallest possible data value in the class
upper class limit: largest possible data value in the class

Class Midpoints
Example Using Audit Data
Preliminary work:
1. Determine the number of non-overlapping classes
since there are only 20 pieces of data, use 5 classes

2. Determine the width of the classes
(33 - 12) / 5 = 4.2, so round up to 5

3. Determine the upper and lower class limits
choose 10 as the lowest class limit, so the classes are:
10-14
15-19
20-24
25-29
30-34
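The preliminary work above can be sketched in Python as a sanity check. This is a minimal illustration that assumes whole-number data (so each upper limit is one less than the next lower limit); the variable names are made up for the example.

```python
import math

smallest, largest = 12, 33  # extremes of the audit data, from the example
num_classes = 5

# approximate class width = (largest - smallest) / number of classes
approx_width = (largest - smallest) / num_classes  # 4.2
width = math.ceil(approx_width)                    # rounded up to 5

first_lower = 10  # convenient starting value chosen in the example

# Assumption: integer data, so a class runs from lower to lower + width - 1.
classes = [
    (first_lower + i * width, first_lower + (i + 1) * width - 1)
    for i in range(num_classes)
]

for lower, upper in classes:
    print(f"{lower}-{upper}")
```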
How To: Using Excel
Step 1: Add three columns:
Audit Time (in C1), with the class ranges typed starting in C2
Upper Limits (in D1), with the upper class limits typed starting in D2
Frequency (in E1)

Step 2: Select cells E2:E6

Step 3: Type the formula (but don't press Enter yet)
=FREQUENCY(A2:A21,D2:D6)
Press CTRL+SHIFT+ENTER to enter it as an array formula
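The same binning can be sketched in Python. Note the assumptions: the raw audit times are not listed in these notes, so the 20 values below are illustrative stand-ins chosen to have a minimum of 12 and a maximum of 33, and bin_counts is a hypothetical helper, not an Excel or library function. Like FREQUENCY, it counts a value in the first class whose upper limit is greater than or equal to it.

```python
from bisect import bisect_left

# Illustrative stand-in data (assumption): 20 audit times in days,
# with minimum 12 and maximum 33 as in the example.
audit_times = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
               22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

upper_limits = [14, 19, 24, 29, 34]  # upper class limits, in increasing order

def bin_counts(values, uppers):
    """Count how many values fall in each class defined by its upper limit."""
    counts = [0] * len(uppers)
    for v in values:
        i = bisect_left(uppers, v)  # first class with upper limit >= v
        if i == len(uppers):
            raise ValueError(f"{v} is above the largest upper limit")
        counts[i] += 1
    return counts

print(bin_counts(audit_times, upper_limits))
```

Unlike Excel's FREQUENCY, which can return an extra count for values above the last bin, this sketch raises an error in that case.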
Relative and Percent Frequency
Same as with qualitative data:
relative frequency of class = frequency of class/n

percent frequency of class = relative frequency * 100


Histogram
The variable of interest is placed on the horizontal axis.

A rectangle is drawn above each class interval with its height corresponding to the interval's frequency, relative frequency, or percent frequency.

Similar to a bar graph, BUT a histogram has no natural separation between rectangles of adjacent classes.

One of its most important uses is to provide information on the shape of the distribution.
[Figure: histogram of the audit data; horizontal axis: Audit Time (days), classes 10-14, 15-19, 20-24, 25-29, 30-34; vertical axis: Frequency]
NOTE that histograms may be skewed slightly
or a lot.

Examples:
Skewed left (longer tail to left)
Skewed right (longer tail to right)
How To: Using Excel
Step 1: Highlight the class column
Step 2: Highlight the frequency column
Step 3: Click Insert, Charts, Column, then click Clustered Column
Step 4: Click Next, then select Rows
Step 5: Click Next
Step 6: Change the title and label the axes
Cumulative Distributions
Rather than showing the frequency of each class, a cumulative distribution shows the number of items with values less than or equal to the upper limit of each class.

Audit Time (days)           Cumulative   Cumulative Relative   Cumulative Percent
                            Frequency    Frequency             Frequency
Less than or equal to 14    4            .20                   20
Less than or equal to 19    12           .60                   60
Less than or equal to 24    17           .85                   85
Less than or equal to 29    19           .95                   95
Less than or equal to 34    20           1.00                  100
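A cumulative distribution is just a running total of the class frequencies. The sketch below uses the class frequencies implied by the cumulative table (4, 8, 5, 2, 1, obtained by differencing the cumulative column); the variable names are illustrative.

```python
from itertools import accumulate

upper_limits = [14, 19, 24, 29, 34]
frequency = [4, 8, 5, 2, 1]  # class frequencies implied by the cumulative table

n = sum(frequency)  # 20 audits

cum_freq = list(accumulate(frequency))  # running total of frequencies
cum_rel = [c / n for c in cum_freq]     # cumulative relative frequency
cum_pct = [100 * r for r in cum_rel]    # cumulative percent frequency

for upper, c, r, p in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"Less than or equal to {upper}: {c}  {r:.2f}  {p:.0f}")
```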


Also know the following from the book:
Dot Plot
Ogive
You Try It
Exercise #15: you need to create it from scratch

Exercise #20: get the data (ports, from Excel)

Any Questions?


Exploratory Data Analysis
Stem and leaf displays
Cross tabulation
Scatter Diagram
Stem and Leaf Plots
A stem-and-leaf display shows both the rank order and the shape of the distribution of the data.
It is similar to a histogram on its side, but has the advantage of showing the actual data values.
The first digits of each data item are arranged to the left of a vertical line.
To the right of the vertical line we record the last digit for each item in rank order.
Each line in the display is referred to as a stem.
Each digit on a stem is a leaf.
Example with Ap test

6 8 9
7 2 3 3 5 6 6
8 0 1 1 2 3 4 5 6
9 1 2 2 2 4 5 5 6 7 8 8
10 0 0 2 4 6 6 6 7 8 8
11 2 3 5 5 8 9 9
12 4 6 7 8
13 2 4
14 1
Advantage over Histogram
Similar to a histogram, but:
Easy to construct by hand
Displays actual data
If we believe the original stem-and-leaf display
has condensed the data too much, we can
stretch the display by using two stems for each
leading digit(s).
Whenever a stem value is stated twice, the first
value corresponds to leaf values of 0 - 4, and the
second value corresponds to leaf values of 5 - 9.
Stretched Stem-and-Leaf Display

5 2
5 7
6 2 2 2 2
6 5 6 7 8 8 8 9 9 9
7 1 1 2 2 3 4 4
7 5 5 5 6 7 8 9 9 9
8 0 0 2 3
8 5 8 9
9 1 3
9 7 7 7 8 9
10 1 4
10 5 5 9
Example: Leaf Unit = 0.1
If we have data with values such as
8.6 11.7 9.4 9.1 10.2 11.0 8.8

a stem-and-leaf display of these data will be

Leaf Unit = 0.1


8 6 8
9 1 4
10 2
11 0 7
Example: Leaf Unit = 10

If we have data with values such as

1806 1717 1974 1791 1682 1910 1838

a stem-and-leaf display of these data will be

Leaf Unit = 10
16 8
17 1 9
18 0 3
19 1 7

The 82 in 1682 is rounded down to 80 and is represented as an 8.
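Both leaf-unit examples follow the same recipe: divide each value by the leaf unit, drop anything left over, then split off the last digit as the leaf. A minimal Python sketch follows; stem_and_leaf is a hypothetical helper that assumes non-negative data, and it uses Decimal so that values such as 8.6 divide by 0.1 exactly.

```python
from collections import defaultdict
from decimal import Decimal

def stem_and_leaf(values, leaf_unit=1):
    """Return the display as a list of text lines, one per stem.

    Digits below the leaf unit are dropped (rounded down), not rounded.
    Assumes non-negative values; stems with no leaves are omitted.
    """
    unit = Decimal(str(leaf_unit))
    rows = defaultdict(list)
    for v in sorted(values):
        scaled = int(Decimal(str(v)) // unit)  # e.g. 1682 with unit 10 gives 168
        stem, leaf = divmod(scaled, 10)        # 168 gives stem 16, leaf 8
        rows[stem].append(leaf)
    return [
        str(stem).rjust(3) + " | " + " ".join(str(leaf) for leaf in rows[stem])
        for stem in sorted(rows)
    ]

for line in stem_and_leaf([1806, 1717, 1974, 1791, 1682, 1910, 1838], leaf_unit=10):
    print(line)
```

Calling the same helper with the values 8.6, 11.7, 9.4, 9.1, 10.2, 11.0, 8.8 and leaf_unit=0.1 gives the stems 8 through 11 of the Leaf Unit = 0.1 example.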
Crosstabulations
A crosstabulation looks at the relationship between two variables at a time.
Crosstabulation can be used when:
one variable is qualitative and the other is quantitative,
both variables are qualitative, or
both variables are quantitative.
Crosstabulation

Example: Finger Lakes Homes

The number of Finger Lakes homes sold for each style and price for the past two years is shown below.
Price Range is the quantitative variable; Home Style is the qualitative variable.

               Home Style
Price Range    Colonial   Log   Split   A-Frame   Total
≤ $99,000      18         6     19      12        55
> $99,000      12         14    16      3         45

Total          30         20    35      15        100
Example conclusions
The greatest number of homes (19) in the sample are a split-level style and priced at less than or equal to $99,000.
Only three homes in the sample are an A-Frame style and priced at more than $99,000.
Crosstabulation

The Total column (right margin) is the frequency distribution for the price variable.
The Total row (bottom margin) is the frequency distribution for the home style variable.

               Home Style
Price Range    Colonial   Log   Split   A-Frame   Total
≤ $99,000      18         6     19      12        55
> $99,000      12         14    16      3         45

Total          30         20    35      15        100
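The margins of the Finger Lakes crosstabulation can be recomputed from the cell counts. The sketch below is illustrative only: the row labels "at most $99,000" and "over $99,000" are wording chosen for the example, and the cell counts are taken from the table.

```python
styles = ["Colonial", "Log", "Split", "A-Frame"]

# Cell counts from the Finger Lakes Homes crosstabulation.
counts = {
    "at most $99,000": [18, 6, 19, 12],
    "over $99,000": [12, 14, 16, 3],
}

# Row totals: frequency distribution for the price variable.
row_totals = {price: sum(row) for price, row in counts.items()}

# Column totals: frequency distribution for the home style variable.
col_totals = {
    style: sum(row[i] for row in counts.values())
    for i, style in enumerate(styles)
}

grand_total = sum(row_totals.values())

# Largest single cell (price range, style, count).
largest_cell = max(
    ((price, styles[i], c) for price, row in counts.items() for i, c in enumerate(row)),
    key=lambda cell: cell[2],
)

print(row_totals, col_totals, grand_total, largest_cell)
```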
How To
Pages 89-103 of the text (Working with Excel)
Simpson's Paradox
In some cases the conclusions based upon an aggregated
crosstabulation can be completely reversed if we look at the
unaggregated data.

Marital Status (X)   Eat Candy Regularly (Y)   Do Not Eat Candy Regularly (Y)   Total
Single               750 (75%)                 249 (25%)                        999
Married              1365 (63%)                745 (37%)                        2010

Single people appear more likely to eat candy regularly, but the nature of the association changes when age is taken into account.

Age <25
Marital Status   Eat Candy Regularly   Do Not Eat Candy Regularly   Total
Single           632 (79%)             167 (21%)                    799
Married          407 (81%)             96 (19%)                     503

Age >25
Marital Status   Eat Candy Regularly   Do Not Eat Candy Regularly   Total
Single           120 (60%)             80 (40%)                     200
Married          873 (58%)             634 (42%)                    1507

When age is considered, marital status makes little difference.
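The effect can be checked numerically from the two age-group tables. In the sketch below the pooled percentages are obtained by summing the age-group counts (an assumption made for the illustration, so they need not match the aggregated table exactly); the dictionary keys and helper name are made up for the example.

```python
# (eat candy regularly, do not eat candy regularly) counts by age group.
by_age = {
    "under 25": {"Single": (632, 167), "Married": (407, 96)},
    "over 25": {"Single": (120, 80), "Married": (873, 634)},
}

def pct_eat(eat, not_eat):
    """Percent of the group that eats candy regularly."""
    return 100 * eat / (eat + not_eat)

# Percentages within each age group.
within = {
    age: {status: pct_eat(*cell) for status, cell in group.items()}
    for age, group in by_age.items()
}

# Pooled percentages, ignoring age (counts summed over both age groups).
pooled = {}
for status in ("Single", "Married"):
    eat = sum(group[status][0] for group in by_age.values())
    not_eat = sum(group[status][1] for group in by_age.values())
    pooled[status] = pct_eat(eat, not_eat)

print(within)
print(pooled)
```

Pooled, single people look roughly ten points more likely to eat candy regularly, while within each age group the gap is only about two points.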


Scatter Plots
A scatter diagram is a graphical presentation of the relationship between two quantitative variables.

One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.

The general pattern of the plotted points suggests the overall relationship between the variables.

A trendline is an approximation of the relationship.
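One common way to obtain a linear trendline is a least-squares fit, which is what Excel's linear trendline option uses. The notes do not say which method is intended, so treat the sketch below as an assumption; the data points are hypothetical and the function name is made up for the example.

```python
def least_squares_trendline(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data (not from the text): y rises by 2 for each unit of x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

slope, intercept = least_squares_trendline(x, y)
print(f"trendline: y = {slope:.2f}x + {intercept:.2f}")
```

A positive slope corresponds to a positive relationship, a negative slope to a negative relationship, and a slope near zero to no apparent linear relationship.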


Scatter Diagrams and Trendline

[Figure: three scatter diagrams of y against x, titled "Positive relationship", "Negative relationship", and "No apparent relationship"]
How To: Using Stereo Data
See text p. 104
You Try It
#25
#36: get the scatter data from the Excel file
Tabular and Graphical Procedures

Data
  Qualitative Data
    Tabular Methods: Frequency Distribution, Relative Freq. Distribution, Percent Freq. Distribution, Crosstabulation
    Graphical Methods: Bar Graph, Pie Chart
  Quantitative Data
    Tabular Methods: Frequency Dist., Rel. Freq. Dist., % Freq. Dist., Cum. Freq. Dist., Cum. Rel. Freq. Distribution, Cum. % Freq. Distribution, Crosstabulation
    Graphical Methods: Dot Plot, Histogram, Ogive, Stem-and-Leaf Display, Scatter Diagram
