Some definitions
Descriptive Statistics: methods for describing a data set
as it is.
Inferential Statistics: methods for reaching conclusions that extend beyond
the immediate data.
Sample data insight → infers population insight
More definitions!!
Population: the full set of units that we are interested in studying
(e.g., registered voters, college students, MLB players)
Sample: a subset of the population
Variables: characteristics of the population or sample
Measures: assignment of a value for each characteristic
Could be continuous (e.g., age, weight) or categorical (A, B, C, etc.)
Exercise One
A survey shows that the average age of TV viewers is
51. Fox believes the average age of their viewership is
less than 51. To test this, they sample 200 viewers.
Describe the population
Describe the variable of interest
Describe the sample
Describe the inference
Reliability
As noted earlier, a sample statistic may not correctly
approximate a population parameter
(i.e., the average age in the Fox sample may not match the
average age of ALL Fox viewers)
We use measures of reliability to reflect the degree of
uncertainty about a statistical inference
This is often reported as a margin of error or a
confidence interval
We will discuss these in more detail later
Data types
Quantitative (continuous)
Data that can be measured on a numerical scale
Exercise Two
The Army Corps of Engineers is measuring toxicity in
fish. They captured a total of 144 fish and noted the
following variables
Exercise Three
Patient #
Quantitative Data
Notation
In statistics, notation is often used as shorthand for a variety
of calculations
Often, the summation sign is used (capital sigma, Σ;
we will use lowercase sigma, σ, to denote a different item
later).
We then denote the measurements as $x_1, x_2, x_3, \ldots, x_n$,
where n is the total number of measurements.
To sum these up, we use the summation sign as such:
$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$
Let's assume the following column of values
45
23
8
56
9
22
65
24
What is n?
What value would correspond to $x_4$?
What is the value of $\sum x$?
What is n? n is 8
What value would correspond to $x_4$? 56
What is the value of $\sum x$? 252
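The three answers can be checked with a short Python sketch. The variable names here are illustrative, and note that Python counts positions from 0, so the fourth measurement sits at index 3.

```python
# Measurements x_1 ... x_n from the column of values in this example.
x = [45, 23, 8, 56, 9, 22, 65, 24]

n = len(x)       # total number of measurements
x4 = x[4 - 1]    # x_4 is the fourth measurement (index 3 in Python)
total = sum(x)   # the summation of x_i for i = 1..n

print(n, x4, total)
```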
Exercise Four
Assume a dataset of 3,8,4,5,3,4,6
Find
Find ^2
Find
Exercise Five
5
8
6
8
9
5
7
2
5
6
3
5
8
9
10
Sample mean
The sample mean is denoted $\bar{x}$ and is the sum of the measurements divided by n:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Median
When n is odd, the median is simply the middle number
after arranging all observations either in ascending or
descending order
When n is even, then we use the average of the middle
two numbers
Ex: 1,5,6,8,9,9
The median would be 7 (Average of 6,8)
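The odd/even rule can be written out as a small Python function. This is a minimal sketch for illustration; the function name is hypothetical and the second data set is an extra example that is not from the slides.

```python
def median(values):
    """Return the median using the odd/even rule described above."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd n: the middle number
    # even n: average of the two middle numbers
    return (ordered[mid - 1] + ordered[mid]) / 2

even_example = median([1, 5, 6, 8, 9, 9])   # the slide's example
odd_example = median([3, 4, 5])             # extra illustrative example
print(even_example, odd_example)
```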
Mean or Median?
If the mean and median are close, we typically use the
mean
However, the mean is sensitive to extreme values
(outliers) whereas the median is not. In that case, the
median is often used.
Often, the median is used when considering household
incomes, which can have extreme outliers (Virginia sociology
majors)
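To see the sensitivity in numbers, the sketch below uses made-up household incomes (in thousands of dollars; these figures are hypothetical, not real data) and adds a single extreme value. The mean moves a great deal, while the median barely changes.

```python
import statistics

# Hypothetical household incomes in thousands of dollars (illustrative only).
incomes = [42, 48, 51, 55, 60]
with_outlier = incomes + [2000]   # add one extreme household

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```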
Symmetric vs Skewed
When the mean is to the right of the median, the distribution is right
skewed (the skewness number would be positive)
When the mean is to the left of the median, it is left skewed
(the skewness number would be negative)
Exercise Six
Consider {3,7,8,5,12,14,21,13,18,14}
Calculate:
Min
Max
Range
1st quartile
3rd quartile
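One way to check these by computer is sketched below in Python. Quartiles can be defined in several ways that give slightly different answers; this sketch assumes the "inclusive" interpolation method from Python's statistics module, which may not be the method intended in class.

```python
import statistics

data = [3, 7, 8, 5, 12, 14, 21, 13, 18, 14]

minimum = min(data)
maximum = max(data)
data_range = maximum - minimum

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
# The "inclusive" method is an assumption; other methods give other values.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(minimum, maximum, data_range, q1, q3)
```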
Variance
Central measures are only part of the story
We are also interested in the spread of the distribution
We earlier considered range
Now we want to consider the variance and standard deviation
Variance
The distance from an observation to the mean of all
observations is called a deviation
Each deviation is written as $x - \bar{x}$ (note that this
could be positive or negative)
Ex.
Imagine a plot with three points {3,4,5}
The mean is 4, so the deviation for 3 is negative one and the
deviation for 5 is plus one (The deviation for 4 is 0)
Example of variance
Now consider these two data sets
What are the raw deviations for each number in each set?
{-2,-1,0,1,2} and {-1,0,0,0,1}
Calculating variance
Remember our last set {3,4,5}
We had deviations of -1,0,1
If we summed them up, they would equal zero, which would wrongly
suggest there is no spread, even though the values clearly differ
To resolve that, we square each of the deviations, sum them
up and then divide by n-1 (in this case, 2)
Therefore our variance is (1+0+1)/2 = 1
Our formula looks like this:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
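The same steps for the set {3, 4, 5} can be sketched in Python: compute the deviations, square them, sum them, and divide by n-1. The variable names are illustrative; the last line compares the result with the standard library's sample variance.

```python
import statistics

data = [3, 4, 5]
n = len(data)
mean = sum(data) / n

deviations = [x - mean for x in data]      # -1, 0, 1
squared = [d ** 2 for d in deviations]     # 1, 0, 1
variance = sum(squared) / (n - 1)          # divide by n-1 for a sample

print(sum(deviations), variance, statistics.variance(data))
```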
Six sigma
Quality control methodology
Based on standard deviations
1 Standard Deviation encompasses about 68% of values
2 Standard Deviations encompass about 95% of values
3 Standard Deviations encompass about 99.7% of values
(These percentages assume approximately normally distributed data)
Remember, these are on either side of the mean, so a total of six
standard deviations (6 sigmas)
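These percentages can be reproduced for a normal distribution with the error function from Python's math module. This is a sketch under the assumption that the data are normally distributed; the function name is hypothetical.

```python
import math

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.4f}")
```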
Exercise seven
Using Excel
Enter the following numbers into a column
25,28,24,26,29,27,31,20,28,29
Using Excel functions, return the mean, median, mode,
variance and standard deviation
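For anyone who wants to cross-check the spreadsheet results, here is a rough Python equivalent using the standard statistics module. It is not the Excel approach the exercise asks for. Note that this data set has two values tied for most frequent, so a tool that reports a single mode will show only one of them.

```python
import statistics

data = [25, 28, 24, 26, 29, 27, 31, 20, 28, 29]

mean = statistics.mean(data)
median = statistics.median(data)
modes = statistics.multimode(data)     # all values tied for most frequent
variance = statistics.variance(data)   # sample variance (divides by n-1)
std_dev = statistics.stdev(data)       # sample standard deviation

print(mean, median, modes, round(variance, 4), round(std_dev, 4))
```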
Lunch Break
Probability
Why do professional poker players always seem to win?
LUCK??
Some, but they also have a great understanding of
probability and odds
Simple probability
Let's take our last example: we flipped a coin and it
came up heads
We now want to flip the coin again
What is the probability that it will come up heads again?
If we flip it 10 times and it is heads all ten times, what is the
probability that it will come up heads the 11th time?
Experiments
These coin tosses are acts or observations that lead to a
single outcome, but that cannot be predicted with
certainty.
Each observation is called a sample point or simple
event
Sample points
In the case of the coin, there were two sample points:
heads and tails
What are the sample points on a single die?
We have to be careful when considering the sample
points that are possible (See next slide)
Sample points
Problem: List all of the sample points if we flip TWO coins
We might think there are three sample points when
flipping these coins
Tails/Tails
Heads/Heads
Heads/Tails
In fact there are four, because Heads/Tails and Tails/Heads are distinct sample points
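Listing the outcomes while keeping track of which coin is which shows four sample points rather than three. A minimal Python sketch, where "H" and "T" are just labels chosen for illustration:

```python
from collections import Counter
from itertools import product

# Every ordered outcome for (first coin, second coin).
sample_space = list(product("HT", repeat=2))

# Group the sample points by how many heads they contain.
heads_counts = Counter(outcome.count("H") for outcome in sample_space)

print(sample_space)
print(heads_counts)
```

One head and one tail can happen two ways, which is why it is twice as likely as two heads or two tails.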
Exercise eight
Denote the sample space of a single die
Denote the sample space of the face cards (J, Q, K) in a
standard deck of cards
Probability defined
The probability of an event is the likelihood that an
outcome will occur when the experiment is performed
Probability is usually denoted as P
What is the likelihood that when we roll a standard die,
the roll will result in a 3?
It is 1 in 6, or approx. 16.7%
Probability expanded
Remember the 10 consecutive coin flips that resulted in
heads?
If we were to conduct that experiment (Flipping the coin)
one million times, we would very likely see a result that
suggests roughly half the flips were heads and half the
flips were tails.
This is the law of large numbers which states that the
relative frequency of an outcome approaches its true or
theoretical probability the more times you repeat it.
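A small simulation illustrates the idea. This sketch assumes a fair coin and uses Python's pseudo-random number generator with a fixed seed (an arbitrary choice) so the run is repeatable. The relative frequency of heads for a handful of flips can be far from 0.5, but it settles close to 0.5 as the number of flips grows.

```python
import random

def heads_frequency(num_flips, seed=0):
    """Simulate fair coin flips and return the relative frequency of heads."""
    rng = random.Random(seed)   # fixed seed so results are repeatable
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

frequencies = {n: heads_frequency(n) for n in (10, 1_000, 100_000)}
for n, freq in frequencies.items():
    print(f"{n:>7} flips: relative frequency of heads = {freq:.4f}")
```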
Probability rules
ALL probabilities, whether subjective or not, must obey
2 basic rules.
The probability of a sample point MUST lie between 0 and 1
The probabilities of all sample points within a sample space
must sum to 1
Events
First, what are the sample points for a single die?
S: {1,2,3,4,5,6}
Suppose that instead of the probability of a single
sample point, we were interested in something like the
probability that the number will be even (or odd). This is
called an event
An event is typically denoted as A.
Event A (even in this case) contains 3 sample points,
all with probabilities of 1/6. We simply add them up to
get P(A) = 1/2
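The addition can be sketched in Python with exact fractions, assuming a fair die so every sample point has probability 1/6. The variable names are illustrative.

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]
# Assumption: a fair die, so each sample point has probability 1/6.
p = {point: Fraction(1, 6) for point in sample_space}

# Event A: the roll is even.
event_a = [point for point in sample_space if point % 2 == 0]
p_a = sum(p[point] for point in event_a)

print(event_a, p_a)
```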
Exercise ten
This is a study of divorced people
Group   Description           Proportion
PP                            .12
CC      Occasional conflict   .38
AA                            .25
FF                            .25
So what do we do?
Combinatorial math
We utilize a combination formula
From our last scenario, let's assign N=100 (total number of
items in the jar) and n=5 (total number of items we wish to
select)
The formula we use is*:
N! / (n! (N-n)!)
The exclamation point is called a factorial (see next slide)
*This assumes that once an item is selected it is NOT replaced. If it were replaced, the number of ordered selections would be
100^5, but that wouldn't make sense in this context because we need 5 DIFFERENT items
Factorial
Factorial simply means that we multiply a number by
each positive whole number below it
Ex 5!
This means 5 x 4 x 3 x 2 x 1 (0! is 1 by definition)
So 5! = 120
Back to combinations
So in our example, N=100 and n=5
So we would have 100! / (5! (100 - 5)!)
This equals a REALLY big number, so let's open up Excel
and use the FACT function to calculate it
You should get 75,287,520
That means there are over 75 million combinations
The probability of selecting any ONE of the
combinations is 1/75,287,520
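The same count can be checked in Python. The helper below (a hypothetical name) applies the factorial formula directly, and the result is compared with the standard library's math.comb, which computes the same quantity.

```python
import math

def combinations(N, n):
    """Number of ways to choose n items from N without replacement: N! / (n! (N-n)!)."""
    return math.factorial(N) // (math.factorial(n) * math.factorial(N - n))

count = combinations(100, 5)
print(count)                    # number of possible samples
print(count == math.comb(100, 5))
print(1 / count)                # probability of any ONE combination
```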
Exercise eleven
Calculate how many samples of 5 items out of a
possible 20 there are.
(Answer is 15,504)
Let's calculate LOTTO!!
Assume 53 numbers and you need the right
combination of 6 to win. What is the probability?
Multiplicative probability of
independent events
What if we have multiple events occurring?
For example, the probability of surviving heart surgery is .9, and
the probability of surviving the recovery is .94. What is the
probability that if you have heart surgery, you will
actually go home?
Treating the two events as independent, we simply multiply the probabilities, so the probability of going
home is .9 * .94 = .846 (so the hospital would say their
procedures have an 85% success rate)
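As a quick check of the arithmetic, here is a minimal Python sketch. It assumes the two stages are independent, as the slide does, and the variable names are illustrative.

```python
import math

p_surgery = 0.9     # probability of surviving the surgery
p_recovery = 0.94   # probability of surviving the recovery

# For independent events, multiply the individual probabilities.
p_home = math.prod([p_surgery, p_recovery])

print(round(p_home, 3))
```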
Exercise 11 (Lotto)
53! / (6! × 47!)
This equals 22,957,480
The probability is 1/22,957,480
The ticket would state that the odds of winning are
roughly 1 in 23 million
[Figure: Toss of two coins, with outcomes HH (0), 1 H and 1 T (1), and TT (2)]
Exercise twelve
Let's say you work for an insurance company and you
sell a one year $10,000 policy with a premium of $290.
The policy pays out if the customer dies, but the p of
that happening is .001. What is the expected gain of
this transaction?
Gain x                    Sample point     p
290                       Customer lives   .999
-9,710 (290 - 10,000)     Customer dies    .001
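The expected gain is the sum of each gain multiplied by its probability. A short Python sketch of that calculation, using the values from the exercise (the variable names are illustrative):

```python
# (gain in dollars, probability) for each sample point
outcomes = [
    (290, 0.999),            # customer lives: company keeps the premium
    (290 - 10_000, 0.001),   # customer dies: premium minus the payout
]

expected_gain = sum(gain * p for gain, p in outcomes)
print(round(expected_gain, 2))
```

So on average the company gains about $280 per policy, even though any single policy results in either +$290 or -$9,710.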
Exercise thirteen
A certain type of chemotherapy is successful 70% of the
time. Let x equal the number of successful cures out of
5.
x      0      1      2      3      4      5
P(x)   .002   .029   .132   .309   .360   .168
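Probabilities like these can be computed with the binomial formula. This assumes the 5 treatments are independent and that each succeeds with probability .7, which the exercise implies but does not state outright. The sketch below uses a hypothetical helper name, and its exact values agree with the tabulated ones to about three decimal places.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials with success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, 0.7
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(pmf):
    print(x, round(prob, 5))
```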
Conclusion of Session 1
Q&A
Session 2 Agenda
Appendix
Website with statistical symbols
http://www.rapidtables.com/math/symbols/Statistical_Symbols.htm
Greek letters
Z score table
T-table