
Probability and statistics crash course

http://www.comp.leeds.ac.uk/hannah/mathsclub
Probability 1 (for dummies:-)
Stats 1 (averages and deviations)
Probability 2 (Trials and distributions)
Stats 2 (significance)
Stats 3 (errors)
p. 1/19
Random variables
A random variable is an abstraction: it is how probability
theorists refer to things that can take more than one state,
usually with associated probabilities.
A fair six-sided die can be modelled as a random
variable with the possible outcomes {1, 2, 3, 4, 5, 6}. The
probability of each state is 1/6.
Random variables can be discrete, like dice...
...or continuous, like the height of a sunflower.
Continuous random variables can take an infinite number of
different values.
Probability functions
For discrete random variables, we can represent the
probability that each possible state occurs with a probability
mass function, or pmf.
For continuous random variables, we can represent the
distribution of probabilities with a probability density
function, or pdf.
To find the probability of a particular continuous random
variable falling between two values, calculate the area
under the pdf between those values.
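The area rule is easy to check numerically. A minimal sketch (not from the slides; the uniform pdf on [0, 10] and the interval [2, 5] are arbitrary example choices, and the integral is approximated with a midpoint rule):

```python
# The pdf of a uniform distribution on [0, 10] is a constant 1/10.
def pdf(x):
    return 0.1 if 0.0 <= x <= 10.0 else 0.0

# Approximate the area under the pdf between 2 and 5 with thin rectangles.
n = 100_000
lo, hi = 2.0, 5.0
width = (hi - lo) / n
area = sum(pdf(lo + (i + 0.5) * width) for i in range(n)) * width
print(round(area, 4))  # 0.3, i.e. P(2 <= X <= 5) = (5 - 2) / 10
```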
Bernoulli trials
Any experiment where there's a random outcome and this
can be either success or failure is known as a Bernoulli
trial. A discrete random variable with 2 values is all you
need.
Heads or tails?
Female or male?
Throwing a 6?
The expectation E of a Bernoulli distribution with probability p
is E(X) = p, and the variance is V(X) = p(1 - p).
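Both formulas can be checked empirically. A minimal Python sketch (not part of the slides; the seed, sample size and p = 0.2 are my own arbitrary choices):

```python
import random

random.seed(0)
p = 0.2  # an arbitrary success probability

# 100,000 Bernoulli trials: 1 for success, 0 for failure.
samples = [1 if random.random() < p else 0 for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)

print(round(mean, 2))  # close to p = 0.2
print(round(var, 2))   # close to p(1 - p) = 0.16
```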
Bernoulli distribution
The Bernoulli distribution is the distribution of outcomes in a
Bernoulli trial. It's really very simple. It's indexed by a single
parameter p, which is the probability of success.
Figure 1: pmf of a Bernoulli trial with p=0.2
Trials
Think about throwing a die repeatedly. How long are you
likely to have to wait to get a 6?
Figure 2: Waiting for a 6
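The data behind Figure 2 came from the author's C++ code (linked on a later slide); the same experiment can be re-sketched in a few lines of Python (seed and sample size are my own choices):

```python
import random

random.seed(1)

def throws_until_six():
    """Roll a fair die until a 6 appears; return how many throws it took."""
    throws = 0
    while True:
        throws += 1
        if random.randint(1, 6) == 6:
            return throws

waits = [throws_until_six() for _ in range(10_000)]
print(sum(waits) / len(waits))  # average wait, close to 6 throws
```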
Trials 2
Think about throwing a die 50 times. How many sixes will
you get?
Figure 3: Counting the number of 6s in 50 throws, 1000 times
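The Figure 3 experiment can likewise be re-sketched in Python (again my own seed and repetition count, not the author's C++):

```python
import random

random.seed(2)

def sixes_in_50_throws():
    """Throw a fair die 50 times and count the sixes."""
    return sum(1 for _ in range(50) if random.randint(1, 6) == 6)

counts = [sixes_in_50_throws() for _ in range(1000)]
mean = sum(counts) / len(counts)
print(round(mean, 1))  # close to 50/6, roughly 8.3 sixes on average
```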
Distributions
The previous two slides are examples of the kind of
distribution you get when you ask particular questions
about a Bernoulli variable, and then carry out the
experiments to see what happens.
If you want to play with the parameters, the C++ code to
generate the data is on the web at
http://www.comp.leeds.ac.uk/hannah/mathsclub
Perhaps unsurprisingly, these distributions can be modelled
theoretically...
Waiting for a 6 revisited
The probability of getting a 6 on the first throw is...
The probability of NOT getting a 6 on the first throw then
getting one on the second throw is...
The probability of NOT getting a 6 on the first or second
throws then getting one on the third throw is...
Geometric distribution
Waiting for success in a Bernoulli trial is governed by a
Geometric distribution.
P(X = j) = (1 - p)^(j-1) p
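Plugging numbers into this pmf is straightforward. A sketch (the function name is mine; p = 1/6 matches the dice example):

```python
def geometric_pmf(j, p):
    """P(X = j): j - 1 failures, then a success on throw j."""
    return (1 - p) ** (j - 1) * p

p = 1 / 6  # probability of rolling a 6
print(round(geometric_pmf(1, p), 3))  # 0.167: a 6 on the very first throw

# The probabilities over all waiting times sum to 1:
total = sum(geometric_pmf(j, p) for j in range(1, 1000))
print(round(total, 6))  # 1.0 (up to truncation at j = 999)
```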
Probability distribution: Geometric
Figure 4: Probability of rolling a 6 in X throws; theoretical
Note: Unlike a few slides back, this pmf sums to 1, and is a
bit tidier!
Binomial distribution
The pmf for a sum of n independent Bernoulli random
variables with success probability p is the Binomial
distribution.
p(x) = (n choose x) p^x (1 - p)^(n-x)

As you will remember from a few weeks back, n choose x
is defined as

(n choose x) = n! / (x!(n - x)!)
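In Python, math.comb gives n choose x directly, so the pmf is a one-liner. A sketch (the function name is mine; 8 sixes in 50 throws mirrors the earlier experiment):

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for the sum of n independent Bernoulli(p) variables."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Chance of exactly 8 sixes in 50 throws of a fair die:
print(round(binomial_pmf(8, 50, 1 / 6), 3))

# Like any pmf, the probabilities over all outcomes sum to 1:
total = sum(binomial_pmf(x, 50, 1 / 6) for x in range(51))
print(round(total, 6))  # 1.0
```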
pdfs for continuous random variables
Three main types of pdf are found with continuous random
variables. There are more, but these are the three big ones.
Continuous uniform distribution
Exponential distribution
Normal distribution (the famed Gaussian)
The probability of something falling between two values in
one of these distributions is the area under the pdf between
those values.
Continuous uniform distribution
A continuous uniform distribution on the interval [a, b] has
pdf given by

f(x) = 1 / (b - a)

At all points in [a, b], the probability density is the same, and
equal to 1/(b - a).
Exponential distribution
The pdf of a continuous random variable can sometimes be
modelled as an exponential distribution
Figure 5: Exponential distributions: lambda = 0.2 and 0.5
Exponential distribution: The sums bit
f(x) = λe^(-λx), for x ≥ 0

Mean of an exponential distribution = 1/λ.
Variance of an exponential distribution = 1/λ².
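Python's random.expovariate draws from this distribution, so both formulas can be checked by simulation. A sketch (lambda = 0.5, the seed and the sample size are my own arbitrary choices):

```python
import random

random.seed(3)
lam = 0.5  # the rate parameter lambda

samples = [random.expovariate(lam) for _ in range(200_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)

print(round(mean, 1))  # close to 1/lambda = 2
print(round(var, 1))   # close to 1/lambda^2 = 4
```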
Normal distribution
Most things are normally distributed (blanket statement
alert). A normal distribution (Gaussian) is dened by two
parameters: mean and standard deviation.
Figure 6: Normal distributions: sd = 2, 4 and 1, means at either 5 or 7
Normal distribution: The sums bit
The random variable X is normally distributed with mean μ
and variance σ² if the pdf is given by...

f(x) = (1 / (σ√(2π))) e^(-(1/2)((x - μ)/σ)²)
If the normal distribution has mean 0 and variance 1, it's
called the Standard Normal distribution and is referred to as
Z.
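Written as code, the pdf above is only a couple of lines (a sketch; the function name is mine):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """pdf of a normal distribution with mean mu and standard deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

# At its mean, the Standard Normal Z peaks at 1/sqrt(2*pi):
print(round(normal_pdf(0, 0, 1), 4))  # 0.3989
```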
Central Limit Theorem
The central limit theorem is what a lot of statistics is based
upon.
The distribution of an average (over a large enough sample)
is approximately normal, even if the distribution from which
the average is drawn is totally strange
This bit is magic, and probably best addressed with an
animation:
http://www.statisticalengineering.com/central_limit_theorem.htm
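The animation's point can also be made with a quick simulation (a sketch, not the author's code: exponential samples are deliberately very non-normal, and the seed, batch size and repetition count are my own choices):

```python
import random

random.seed(4)

# Averages of 50 draws from a heavily skewed (exponential) distribution:
def sample_mean(n=50):
    return sum(random.expovariate(1.0) for _ in range(n)) / n

means = [sample_mean() for _ in range(20_000)]
m = sum(means) / len(means)
sd = (sum((x - m) ** 2 for x in means) / len(means)) ** 0.5

# The CLT says the averages are roughly normal; for a normal
# distribution about 68% of values fall within one s.d. of the mean.
within = sum(1 for x in means if abs(x - m) < sd) / len(means)
print(round(m, 2))       # close to 1, the exponential's mean
print(round(within, 2))  # close to 0.68
```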
