
2017/2018

2016/2017
Politecnico di Milano Faculty of Civil, Environmental and Territorial Engineering Laboratory of Transport and Mobility

! Statistics Contents

Content of the lesson

1 – Descriptive statistics
- data organization
- graphical techniques
- position indices
- variance, covariance
2 – Random variables
3 – Main characteristics of some major random variables
- Gaussian distribution
- Gumbel distribution
4 - Modeling
- Random variables used as models
- QQ-plot
5 – Example: modeling modal choice


! Statistics Introduction

Introduction to statistics
Statistics is the science that says that
if you have your head in the oven and your feet in the refrigerator, then
on average you are fine

BAD STATISTICS

Only indices of synthesis


GOOD STATISTICS

- assessment of variability of the phenomenon


- search for correlations between cause and effect
- estimate of indices of synthesis only when significant


! Statistics 1. Descriptive statistics

Progress of the lesson

1 – Descriptive statistics
- data organization
- graphical techniques
- position indices
- variance, covariance
One-dimensional:
- mean, median, quartiles, quantiles
- boxplot
- sample variance
- standardization
- histograms
- cumulative histograms


! Statistics 1. Descriptive statistics

Sample data
Motor Trend Car Road Tests
a single data set of examples is used throughout the presentation to clarify the concepts

Description
evaluation of different aspects of the performance of American cars of the
years 1973-74

Contents
- mileage [km/l]
- displacement [cm3]
- power [kW]
- mass [kg]
- ¼ mile journey time [seconds]


! Statistics 1. Descriptive statistics

Data representation
                     mileage  displacement  power   mass  journey time
Mazda RX4 Wag           9,36          1000    110   1293         17,02
Datsun 710             10,16           675     93   1044         18,61
Hornet 4 Drive          9,54          1612    110   1446         19,44
Hornet Sportabout       8,33          2250    175   1548         17,02

                       Var 1   Var 2   Var 3   …   Var p
Experimental unit 1    x11     x12     x13     …   x1p
Experimental unit 2    x21     x22     x23     …   x2p
…                      …       …       …       …   …
Experimental unit n    xn1     xn2     xn3     …   xnp


! Statistics 1. Descriptive statistics

Data representation

                     mileage  displacement  power     mass  journey time
Mazda RX4 Wag           9,36          1000    110  1293,75         17,02
Datsun 710             10,16           675     93  1044,00         18,61
Hornet 4 Drive          9,54          1612    110  1446,75         19,44
Hornet Sportabout       8,33          2250    175  1548,00         17,02
...                      ...           ...    ...      ...           ...

Evaluation of a single variable



one-dimensional analysis

n = 32 p=1


! Statistics 1. Descriptive statistics

Data representation
Dot diagram


- intuitive representation of the dispersion
- useful for few data


! Statistics 1. Descriptive statistics

Sample mean
ratio of the sum of the values to the number of values

$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
- indicates the "center of gravity" of the data
- is affected by "outliers"

sample mean= 8,96 km/l


! Statistics 1. Descriptive statistics

Median
value that occupies the central position in an ordered group of data

n odd: $\tilde{x} = x_{\frac{n+1}{2}}$
n even: $\tilde{x} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2}$

- measure of position that divides the ordered data into two equal parts
- robust to outliers

median = 8,56 km/l


! Statistics 1. Descriptive statistics

Quartiles - Quantiles
a quartile is any of the three values that divide the sorted data set into four
equal parts, so that each part contains one fourth of the sample

•  first quartile = lower quartile = cuts off lowest 25% of data = 25th percentile
•  second quartile = median = cuts data set in half = 50th percentile
•  third quartile = upper quartile = cuts off highest 25% of data, or lowest 75% = 75th percentile
Interquartile range
difference between the 3rd and the 1st quartile (measure of variability of data),
robust to outliers
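
The position indices above can be sketched in a few lines of Python. This is only an illustration: it uses the mileage of the four cars listed in the data table rather than the full 32-car sample, and it assumes the "inclusive" interpolation rule of the standard library, which is one of several quartile conventions in use.

```python
import statistics

# Mileage [km/l] of the four cars shown in the data table (not the full sample).
mileage = [9.36, 10.16, 9.54, 8.33]

mean = statistics.mean(mileage)      # sample mean
median = statistics.median(mileage)  # central value of the ordered data

# Quartiles: the three cut points that split the ordered data into four parts.
# method="inclusive" is an assumption; other conventions give slightly different values.
q1, q2, q3 = statistics.quantiles(mileage, n=4, method="inclusive")

iqr = q3 - q1  # interquartile range

print(f"mean={mean:.4f} median={median:.4f} Q1={q1:.4f} Q3={q3:.4f} IQR={iqr:.4f}")
```

Note that the second quartile coincides with the median, as stated in the definition.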


! Statistics 1. Descriptive statistics

Data representation
Box plot

simultaneous and intuitive representation


- center (median)
- dispersion (1st-3rd quartile and interquartile range)
- symmetry

! Statistics 1. Descriptive statistics

Sample variance
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Sample standard deviation

$s = \sqrt{s^2}$
- indicate the "variability" or "dispersion" of data around the sample mean
- s has the same units of measurement as the data (useful for quantifying the
phenomenon)

s = 2,69 km/l
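
A minimal sketch of the two formulas, assuming only the four mileage values listed in the data table (so the result differs from the s = 2,69 km/l of the full sample). The helper name `sample_variance` is introduced here for illustration.

```python
import math
import statistics

def sample_variance(values):
    """Sample variance with the n - 1 denominator, as in the formula above."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

# Mileage [km/l] of the four cars shown in the data table (not the full sample).
mileage = [9.36, 10.16, 9.54, 8.33]

s2 = sample_variance(mileage)
s = math.sqrt(s2)  # same unit as the data: km/l

# Cross-check against the standard library, which also divides by n - 1.
reference = statistics.variance(mileage)
print(f"s2={s2:.4f} s={s:.4f}")
```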

! Statistics 1. Descriptive statistics

Standardization of data
$x_i^{*} = \frac{x_i - \bar{x}}{s}$

- expression of the values in units of standard deviation
- elimination of the units of measurement
New reference system
- average = 0 → centering of data on their average
- standard deviation = 1 → distances expressed in standard deviations

Diagram of the standardized data points
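
As a hedged illustration, standardization can be checked numerically: after the transformation the sample mean should be 0 and the sample standard deviation 1. The values are the four mileage figures from the data table, and `standardize` is a hypothetical helper name.

```python
import statistics

# Mileage [km/l] of the four cars shown in the data table (not the full sample).
mileage = [9.36, 10.16, 9.54, 8.33]

def standardize(values):
    """Subtract the sample mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # n - 1 denominator
    return [(x - mean) / s for x in values]

z = standardize(mileage)
z_mean = statistics.mean(z)   # expected to be 0 up to rounding
z_std = statistics.stdev(z)   # expected to be 1 up to rounding
print([round(v, 3) for v in z])
```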


! Statistics 1. Descriptive statistics

Effects of standardization
Original data

Standardized data


! Statistics 1. Descriptive statistics

Division into classes

$\Delta x = \frac{x_{max} - x_{min}}{m}$
- m = number of intervals
- Δx = width of each interval when all intervals are equal
- the width is not necessarily the same for all intervals

m=7


! Statistics 1. Descriptive statistics

Absolute frequency
ni = number of values of X falling in the i-th interval

Relative frequency
fi = number of values of X falling in the i-th interval, normalized by the total
number of values

$f_i = \frac{n_i}{n}$
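
The division into classes and the two frequencies can be sketched as follows. The input values are illustrative numbers chosen for this example (they are not the course data set), equal-width classes are assumed, and `class_frequencies` is a hypothetical helper name.

```python
def class_frequencies(values, m):
    """Split [x_min, x_max] into m equal classes and count values per class."""
    x_min, x_max = min(values), max(values)
    width = (x_max - x_min) / m  # delta x, assuming equal-width classes
    counts = [0] * m
    for x in values:
        # The maximum value is assigned to the last class instead of a new one.
        i = min(int((x - x_min) / width), m - 1)
        counts[i] += 1
    n = len(values)
    relative = [c / n for c in counts]  # f_i = n_i / n
    return width, counts, relative

# Illustrative values only, not taken from the course data set.
values = [5.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 12.0]
width, counts, relative = class_frequencies(values, m=7)
print(width, counts, relative)
```

The sketch assumes that the values are not all identical; otherwise the class width would be zero.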


! Statistics 1. Descriptive statistics

Histograms
!
information contained in the area of rectangles
Mode
!
class characterized by the highest frequency


! Statistics 1. Descriptive statistics

Cumulative frequency
Absolute cumulative frequency: $X_j = \sum_{i=1}^{j} n_i$
Relative cumulative frequency: $F_j = \sum_{i=1}^{j} f_i$
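
A short sketch of the two running sums, assuming a list of illustrative absolute frequencies for m = 7 classes (the numbers are made up for the example).

```python
from itertools import accumulate

# Illustrative absolute frequencies n_i for m = 7 classes (not course data).
counts = [1, 1, 2, 2, 2, 1, 1]
n = sum(counts)

cum_abs = list(accumulate(counts))   # running sum of the n_i
cum_rel = [c / n for c in cum_abs]   # running sum of the f_i, ends at 1

print(cum_abs)
print(cum_rel)
```

The last absolute cumulative frequency equals n and the last relative one equals 1, which is a quick sanity check on any cumulative histogram.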

Frequency

Cumulative Frequency


! Statistics 1. Descriptive statistics

Cumulative histogram


! Statistics 1. Descriptive statistics

Mileage histogram

Mileage cumulative histogram


! Statistics 2. Random variables

Progress of the lesson

2 – Random variables
- introduction
- from sample to population
- probability density function
- cumulative distribution function
- expected value
- variance
- comparison between sample and population


! Statistics 2. Random variables

Role of random variables


need to answer questions such as:

- what was the average consumption of U.S. automobiles in the years 1973-74?
- did consumption vary greatly between different models, or were they all
pretty similar?
- what was the distribution of consumption?


! Statistics 2. Random variables

Definitions
Population or Universe
whole group of elements under investigation
e.g. all models of cars circulating in the U.S. in 1973-74

analysis of the whole population is often impossible or uneconomical

Sample
small part of the universe
extracted using appropriate sampling techniques
e.g. 32 models

Inference
process of generalizing results (obtained by observing a sample) to the
entire population (or universe) from which the sample was extracted


! Statistics 2. Random variables

Definitions
Descriptive statistics
set of scientific methods to collect,
sort, analyze and represent a group of data

Inferential statistics
set of scientific methods designed to draw conclusions about
a population from a representative sample


! Statistics 2. Random variables

Hypothesis
the consumption of all U.S. car models is known

the universe is known

it is possible to describe the consumption of U.S. automobiles with a random
variable with a known probability distribution


! Statistics 2. Random variables

Probability density function

$f_X(x)$


! Statistics 2. Random variables

Probability density function


Properties

$f_X(x) \ge 0$
$\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$
$\Pr(a < X \le b) = \int_{a}^{b} f_X(x)\,dx$


! Statistics 2. Random variables

Expected value
or Mean
indicates the "center of gravity" of the distribution

$\mu_X = E[X] = \int_{-\infty}^{+\infty} x \cdot f_X(x)\,dx$


! Statistics 2. Random variables

Median - Quartiles


! Statistics 2. Random variables

Comparison between sample and population


! Statistics 2. Random variables

Variance
shows the degree of concentration of the distribution
around the mean

$\sigma_X^2 = Var[X] = \int_{-\infty}^{+\infty} (x - \mu_X)^2 \cdot f_X(x)\,dx$

Standard deviation
$\sigma_X = \sqrt{\sigma_X^2}$
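
The integrals for the expected value and the variance can be approximated numerically. The sketch below assumes a uniform density on a hypothetical interval [6, 12] km/l, chosen only because its mean (9) and variance (3) are known in closed form; the midpoint rule and the function names are illustrative choices.

```python
def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

# Hypothetical density: uniform on [A, B]; zero elsewhere.
A, B = 6.0, 12.0

def f(x):
    return 1.0 / (B - A) if A <= x <= B else 0.0

# The density is zero outside [A, B], so integrating over [A, B] is enough.
total = integrate(f, A, B)                                 # should be 1
mu = integrate(lambda x: x * f(x), A, B)                   # expected value
var = integrate(lambda x: (x - mu) ** 2 * f(x), A, B)      # variance
sigma = var ** 0.5                                         # standard deviation

print(f"total={total:.6f} mu={mu:.6f} var={var:.6f} sigma={sigma:.6f}")
```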


! Statistics 2. Random variables

Comparison between sample and population


! Statistics 2. Random variables

Cumulative distribution function

$F_X(x) = \Pr(X \le x) = \int_{-\infty}^{x} f_X(t)\,dt$


! Statistics 2. Random variables

Cumulative distribution function


Properties
- is defined over the interval (-∞, +∞)
- takes values in the range [0, 1]
- tends to 0 for x → -∞
- tends to 1 for x → +∞
- FX(a) ≤ FX(b) for any a and b with a ≤ b
  → monotonically nondecreasing function
- FX(b) - FX(a) = Pr(a < X ≤ b)

$f_X(x) = \frac{dF_X(x)}{dx}$
$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$


! Statistics 2. Random variables

Probability density function

Cumulative distribution function


! Statistics 1. Descriptive statistics 2D

Progress of the lesson

1 – Descriptive statistics
Two-dimensional:
- correlation
- dispersion matrix


! Statistics 1. Descriptive statistics 2D

Data representation

                     mileage  displacement  power   mass  journey time
Mazda RX4 Wag           9,36          1000    110   1293         17,02
Datsun 710             10,16           675     93   1044         18,61
Hornet 4 Drive          9,54          1612    110   1446         19,44
Hornet Sportabout       8,33          2250    175   1548         17,02
...                      ...           ...    ...    ...           ...

Evaluation of two variables



Two-dimensional analysis

n = 32 p=2


! Statistics 1. Descriptive statistics 2D

Data representation
Dot diagram

Mean, median, 1st – 3rd quartile


! Statistics 1. Descriptive statistics 2D

Data representation
Histogram


! Statistics 1. Descriptive statistics 2D

Data representation
BoxPlot Displacement


! Statistics 1. Descriptive statistics 2D

Representation of bivariate data

Mileage [km/l]

Displacement [cm3]

! Statistics 1. Descriptive statistics 2D

Representation of bivariate data

Scatter plot

Mileage [km/l]

Displacement [cm3]

! Statistics 1. Descriptive statistics 2D

Covariance
a measure of how much two variables change together

$s_{xy} = cov(x, y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y})$
sxy = 0 ⇒ lack of correlation
sxy > 0 ⇒ positive correlation
sxy < 0 ⇒ negative correlation

the unit of measure has no direct interpretation [(km/l)·(cm3)]


! Statistics 1. Descriptive statistics 2D

Covariance

$s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y})$

sxy = −1764 [(km/l)·cm3]

Displacement [cm3]

Mileage [km/l]


! Statistics 1. Descriptive statistics 2D

Correlation
measure of the linear association between two variables, obtained by standardizing their covariance

$r_{xy} = cor(x, y) = \frac{s_{xy}}{s_x \cdot s_y}$
rxy = 0 ⇒ lack of correlation
rxy > 0 ⇒ positive correlation
rxy < 0 ⇒ negative correlation
rxy = ±1 ⇒ exact linear correlation

No unit

ability to compare different variables

−1 ≤ rxy ≤ 1        rxy = −0,847
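
A minimal sketch of covariance and correlation for two variables, assuming the n − 1 denominator (the same one used for the sample variance) and using only the four mileage and displacement pairs listed in the data table. The result therefore differs from the value computed on the full sample; the function names are illustrative.

```python
import math

def covariance(x, y):
    """Sample covariance with the n - 1 denominator (assumed here)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    sx = math.sqrt(covariance(x, x))  # cov(x, x) is the sample variance
    sy = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sx * sy)

# Four rows from the data table (not the full 32-car sample).
mileage = [9.36, 10.16, 9.54, 8.33]            # km/l
displacement = [1000.0, 675.0, 1612.0, 2250.0]  # cm3

s_xy = covariance(mileage, displacement)
r_xy = correlation(mileage, displacement)
print(f"s_xy={s_xy:.2f} r_xy={r_xy:.3f}")
```

The covariance changes if the units change, while the correlation does not, which is why only the latter can be compared across pairs of variables.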


! Statistics 1. Descriptive statistics 2D

Dispersion matrix and correlation matrix

n = 32, p = 2

                     mileage  displacement
Mazda RX4 Wag           9,36          1000
Datsun 710             10,16           675
Hornet 4 Drive          9,54          1612
Hornet Sportabout       8,33          2250
…                          …             …

Symmetrical matrix p×p (2×2)

       V1    V2
V1    s11   s12
V2    s21   s22

Dispersion matrix
                  Mileage  Displacement
Mileage              7,22      -1764,40
Displacement     -1764,40     600031,24

Correlation matrix
                  Mileage  Displacement
Mileage             1,000        -0,847
Displacement       -0,847         1,000


! Statistics 1. Descriptive statistics 2D

Correlation
Example of weakly correlated variables

                Journey time     Mass
Journey time           1,000   -0,175
Mass                  -0,175    1,000


! Statistics 2. Random variables

Progress of the lesson

2 – Random variables
Two-dimensional:
- joint probability density function
- correlation


! Statistics 2. Random variables

Transition from sample


data set of 32 models

to population or universe
all models of cars circulating in the U.S. in 1973-74

! Statistics 2. Random variables

Population or Universe
joint probability density function


! Statistics 2. Random variables

Definitions
Expected value

$\mu = [\mu_1, \mu_2]$
Joint probability density function
$P(a < X < b,\; c < Y < d) = \int_{a}^{b} \int_{c}^{d} f(x, y)\,dy\,dx$
Covariance

$Cov(X_1, X_2) = E[(X_1 - E[X_1])(X_2 - E[X_2])]$

Correlation

$\rho_{X_1 X_2} = \frac{Cov(X_1, X_2)}{\sqrt{\sigma_1^2 \sigma_2^2}}, \qquad -1 \le \rho_{X_1 X_2} \le 1$


! Statistics 1. Descriptive statistics 3D

Progress of the lesson

1 – Descriptive statistics
Three-dimensional and multidimensional:
- correlation
- dispersion matrix


! Statistics 1. Descriptive statistics 3D

Data organization

                     mileage  displacement  power     mass  journey time
Mazda RX4 Wag           9,36          1000    110  1293,75         17,02
Datsun 710             10,16           675     93  1044,00         18,61
Hornet 4 Drive          9,54          1612    110  1446,75         19,44
Hornet Sportabout       8,33          2250    175  1548,00         17,02
...                      ...           ...    ...      ...           ...

Evaluation of three variables



Three-dimensional analysis

n = 32 p=3


! Statistics 1. Descriptive statistics 3D

Data representation
[Figure: dot diagram of Power [kW]]

[Figure: box plot of Power [kW], with an outlier marked]


! Statistics 1. Descriptive statistics 3D

Data representation

[Figure: plot of mileage, displacement and power]


! Statistics 1. Descriptive statistics 3D

Analysis of correlations
Dispersion matrix

                mileage  displacement   power
mileage            7.22         -1764    -143
displacement      -1764        600031   42007
power              -143         42007    4700

Correlation matrix

                mileage  displacement   power
mileage           1.000        -0.848  -0.776
displacement     -0.848         1.000   0.791
power            -0.776         0.791   1.000
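
The correlation matrix follows from the dispersion matrix by dividing each entry by the two corresponding standard deviations. The sketch below applies this to the rounded values shown in the dispersion matrix, so small rounding differences with respect to the printed correlations are expected; `correlation_matrix` is an illustrative name.

```python
import math

LABELS = ["mileage", "displacement", "power"]

# Rounded dispersion matrix as shown on the slide.
DISPERSION = [
    [7.22, -1764.0, -143.0],
    [-1764.0, 600031.0, 42007.0],
    [-143.0, 42007.0, 4700.0],
]

def correlation_matrix(s):
    """r_ij = s_ij / (s_i * s_j), where s_i is the square root of s_ii."""
    p = len(s)
    std = [math.sqrt(s[i][i]) for i in range(p)]
    return [[s[i][j] / (std[i] * std[j]) for j in range(p)] for i in range(p)]

R = correlation_matrix(DISPERSION)
for label, row in zip(LABELS, R):
    print(f"{label:>12}", " ".join(f"{v:7.3f}" for v in row))
```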


! Statistics 1. Descriptive statistics 3D

Data organization

                     mileage  displacement  power   mass  journey time
Mazda RX4 Wag           9,36          1000    110   1293         17,02
Datsun 710             10,16           675     93   1044         18,61
Hornet 4 Drive          9,54          1612    110   1446         19,44
Hornet Sportabout       8,33          2250    175   1548         17,02
...                      ...           ...    ...    ...           ...

Evaluation of all variables



Analysis in 5 dimensions

n = 32 p=5


! Statistics 1. Descriptive statistics 3D

Data representation


! Statistics 1. Descriptive statistics 3D

Analysis of correlations
Correlation matrix
Symmetrical matrix p×p (5×5)

                mileage  displacement   power    mass  journey time
mileage           1,000        -0,848  -0,776  -0,868         0,419
displacement     -0,848         1,000   0,791   0,888        -0,434
power            -0,776         0,791   1,000   0,659        -0,708
mass             -0,868         0,888   0,659   1,000        -0,175
journey time      0,419        -0,434  -0,708  -0,175         1,000


! Statistics 3. Major random variables

Progress of the lesson

3 - Main characteristics of some major random variables

- Gaussian distribution
  - Univariate
  - Multivariate
- Gumbel distribution


! Statistics 3. Major random variables

Gaussian (or normal) distribution


Examples
- diffusion of pollutants in the atmosphere
- distribution of random errors
- speed of molecules in an ideal gas

Standard normal random variable
- mean µ = 0
- variance σ² = 1
- named Z
Normal random variable
- mean µ
- variance σ²
- named N(µ, σ²)


! Statistics 3. Major random variables

Gaussian (or normal) distribution

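Probability density function

$f(x) = \frac{1}{\sqrt{2\pi} \cdot \sigma_{X_N}} \cdot e^{-\frac{(x - \mu_{X_N})^2}{2\sigma_{X_N}^2}}$

Expected value
$\mu_{X_N} = E[X_N] = \int_{-\infty}^{+\infty} x \cdot f(x)\,dx$

Variance
$\sigma_{X_N}^2 = Var[X_N] = \int_{-\infty}^{+\infty} (x - \mu_{X_N})^2 \cdot f(x)\,dx$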
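
As a sketch, the Gaussian density can be written directly from its formula and compared with the implementation in the Python standard library. The parameter values used in the check (µ = 2, σ = 0,7) are arbitrary and chosen only for illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Gaussian probability density with mean mu and standard deviation sigma."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Peak of the standard normal density: 1 / sqrt(2 * pi).
peak = normal_pdf(0.0, 0.0, 1.0)

# Comparison with the standard library for arbitrary illustrative parameters.
ours = normal_pdf(1.3, 2.0, 0.7)
reference = NormalDist(mu=2.0, sigma=0.7).pdf(1.3)
print(peak, ours, reference)
```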


! Statistics 3. Major random variables

Gaussian (or normal) distribution


Properties

- f(x) is continuous and defined on (-∞, +∞)
- if x → -∞ ⇒ f(x) → 0
- if x → +∞ ⇒ f(x) → 0
- f(x) is symmetrical with respect to the vertical axis passing through µ
- f(x) has its maximum at µ
- f(x) has two inflection points, at µ - σ and µ + σ


! Statistics 3. Major random variables

Gaussian (or normal) distribution


! Statistics 3. Major random variables

Gaussian (or normal) distribution


Trend with respect to σ 2


! Statistics 3. Major random variables

Gaussian (or normal) distribution


Trend with respect to µ


! Statistics 3. Major random variables

Gaussian (or normal) distribution


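Cumulative distribution function

$F_{X_N}(x) = \frac{1}{\sqrt{2\pi} \cdot \sigma_{X_N}} \cdot \int_{-\infty}^{x} e^{-\frac{(t - \mu_{X_N})^2}{2\sigma_{X_N}^2}}\,dt$

Property
the integral cannot be solved analytically in terms of elementary functions

Numerical methods
- statistical software
- tabulated values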
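
One common numerical route, sketched here under the assumption that the error function of the Python standard library is acceptable as the "statistical software", expresses the Gaussian cumulative distribution function through `math.erf`. The function name `normal_cdf` is illustrative.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Gaussian CDF written with the error function: 0.5 * (1 + erf(z / sqrt(2)))."""
    z = (x - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

# A few values of the standard normal CDF, as one would read them from a table.
for x in (-1.96, 0.0, 1.0, 1.96):
    print(f"F({x:+.2f}) = {normal_cdf(x):.4f}")
```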


! Statistics 3. Major random variables

Gaussian (or normal) distribution


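Bivariate variable
N(µ, σ)

First argument: vector of means
$\mu = [\mu_1, \mu_2]$

Second argument: dispersion matrix
$\sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix}$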


! Statistics 3. Major random variables

Gaussian (or normal) distribution


Bivariate variable
N(µ, σ)

$\mu = [0, 0]$

$\sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

Section plane


! Statistics 3. Major random variables

Gaussian (or normal) distribution


Bivariate variable
N(µ, σ)

$\mu = [0, 0]$

$\sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

Section planes


! Statistics 3. Major random variables

Gaussian (or normal) distribution


Bi-variate variable


! Statistics 3. Major random variables

Gaussian (or normal) distribution


Bivariate variable with correlation
N(µ, σ)

$\mu = [0, 0]$

$\sigma = \begin{bmatrix} 1 & 0{,}5 \\ 0{,}5 & 1 \end{bmatrix}$

Section plane


! Statistics 3. Major random variables

Gaussian (or normal) distribution


Bivariate variable with correlation
N(µ, σ)

$\mu = [0, 0]$

$\sigma = \begin{bmatrix} 1 & 0{,}5 \\ 0{,}5 & 1 \end{bmatrix}$

Section planes


! Statistics 3. Major random variables

Gaussian (or normal) distribution


Bi-variate variable with correlation


! Statistics 3. Major random variables

Gaussian (or normal) distribution


Bivariate variable - comparison with correlated variables


! Statistics 3. Major random variables

Gumbel distribution
Variable used in many random utility models
- has analytical characteristics suitable for many problems
- is crucial in behavioral random utility models
- is used in Generalized Extreme Value theory

Variable defined by two parameters
- location parameter: indicates where the distribution is positioned
- scale parameter: indicates how spread out the distribution is


! Statistics 3. Major random variables

Gumbel distribution
Probability density function

$f(x) = \frac{1}{\theta} \exp[-(x - V)/\theta - \varphi] \cdot \exp\{-\exp[-(x - V)/\theta - \varphi]\}$

Cumulative distribution function

$F_X(x) = \Pr(X \le x) = \exp\{-\exp[-(x - V)/\theta - \varphi]\}$

- V = location
- θ = scale factor
- φ = Euler constant (0,577)
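
The two Gumbel formulas can be sketched and cross-checked numerically. With the Euler constant inside the exponent, as in this parameterization, the expected value of the variable works out to V; the sketch verifies this by numerical integration for arbitrary illustrative parameters (V = 8, θ = 1). The function names and the integration range are assumptions of the example.

```python
import math

EULER = 0.5772156649015329  # Euler constant (phi in the formulas)

def gumbel_cdf(x, v, theta):
    """F(x) = exp(-exp(-(x - V)/theta - phi))."""
    return math.exp(-math.exp(-(x - v) / theta - EULER))

def gumbel_pdf(x, v, theta):
    """f(x) = (1/theta) * z * exp(-z), with z = exp(-(x - V)/theta - phi)."""
    z = math.exp(-(x - v) / theta - EULER)
    return z * math.exp(-z) / theta

def integrate(g, a, b, steps=60_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

V, THETA = 8.0, 1.0  # arbitrary illustrative parameters

# The range [V - 20*theta, V + 40*theta] holds essentially all the probability mass.
lo, hi = V - 20.0 * THETA, V + 40.0 * THETA
total = integrate(lambda x: gumbel_pdf(x, V, THETA), lo, hi)
mean = integrate(lambda x: x * gumbel_pdf(x, V, THETA), lo, hi)
print(f"total={total:.6f} mean={mean:.6f}")
```

Because the cumulative distribution function has a closed form, probabilities such as Pr(a < X ≤ b) can be obtained as `gumbel_cdf(b, V, THETA) - gumbel_cdf(a, V, THETA)` without any numerical integration.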


! Statistics 3. Major random variables

Gumbel distribution
Trend with respect to “location”


! Statistics 3. Major random variables

Gumbel distribution
Trend with respect to scale factor


! Statistics 3. Major random variables

Gumbel distribution
Cumulative distribution function
$F(x) = \int_{-\infty}^{x} f(t)\,dt$

Property
can be solved analytically


! Statistics 4. Modeling

Progress of the lesson

4 - Modeling
- Random variables used as models
- QQ-plot


! Statistics 4. Modeling

Description of a phenomenon
Example
speed along a highway link
problem: the speed is not constant

Variability is due to many factors
- type of vehicle
- condition of the driver
- weather
- traffic conditions
- …

speed can be described with a
random variable


! Statistics 4. Modeling

Random variable
Speed = mean speed Vµ + variation
- mean speed = constant
- variation = random component, unpredictable

Experiment
what is the speed of the next car that passes?

Random variable
numeric variable whose measured value can change between
different repetitions of the experiment


! Statistics 4. Modeling

Modeling
- description of the phenomenon with an appropriate random variable
- possibility of applying inference techniques to a sample
Example
- measurement of 100 transits
- attempt to reconstruct the random variable that best
approximates the phenomenon

Sample
- measured speed of 100 cars


! Statistics 4. Modeling

Sample

[Figure: measured speed versus progressive observation number]


! Statistics 4. Modeling

Sample
Histogram


! Statistics 4. Modeling

Identification of random variable


Which random variable best approximates the phenomenon?
Example
Gaussian distribution N(x̄, s)
- mean = 110,5 km/h
- standard deviation = 18,7 km/h
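
Once the Gaussian model is fixed, it can be used to answer questions about speeds that were never directly tabulated. The sketch below uses the mean and standard deviation from this slide; the 130 km/h threshold and the 85th percentile are illustrative choices introduced here, not values from the lesson.

```python
from statistics import NormalDist

# Gaussian model fitted to the sample: mean 110.5 km/h, standard deviation 18.7 km/h.
speed_model = NormalDist(mu=110.5, sigma=18.7)

# Probability that a vehicle exceeds an illustrative threshold of 130 km/h.
p_over_130 = 1.0 - speed_model.cdf(130.0)

# Speed not exceeded by 85% of vehicles under the model (illustrative percentile).
v85 = speed_model.inv_cdf(0.85)

# Median of the model, which coincides with the mean for a Gaussian.
v50 = speed_model.inv_cdf(0.50)

print(f"P(speed > 130) = {p_over_130:.3f}, v85 = {v85:.1f} km/h, v50 = {v50:.1f} km/h")
```

Whether the Gaussian is an adequate model for the measured speeds still has to be checked against the sample, for instance by comparing the histogram with the model density.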


! Statistics 5. Modeling modal choice

Progress of the lesson

5 - Example of modeling modal choice

- phenomenon to model
- sample
- model = random variable


! Statistics 5. Modeling modal choice

Phenomenon to model
Modal choice between three different alternatives

Bus

Car Tramway


! Statistics 5. Modeling modal choice

Sample
Interview of 100 people

Question: rate the "usefulness" of each mode with a score from 0 to 10

            Car    Bus   Tram
Fabio       6,0    5,5    6,0
Cristina    8,5    7,0    9,5
Roberto     9,5    6,0    5,5
Marco       5,5    6,5    9,5
…             …      …      …

n = 100, p = 3
multivariate random variable (3 dimensions)

! Statistics 5. Modeling modal choice

Sample
Data analysis


! Statistics 5. Modeling modal choice

Sample


! Statistics 5. Modeling modal choice

Sample

Dispersion matrix (upper triangle shown; the matrix is symmetrical)
          Car     Bus    Tram
Car      1,04   -0,05   -0,09
Bus              1,88    0,79
Tram                     1,40


! Statistics 5. Modeling modal choice

Model
Multivariate normal random variable

N(x̄, s)

Vector of averages µ
Car     8
Bus     6
Tram    7

Dispersion matrix (upper triangle shown; the matrix is symmetrical)
          Car     Bus    Tram
Car      1,00    0,00    0,00
Bus              1,90    0,80
Tram                     1,40
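
A possible use of this model, sketched under an assumption that goes beyond the slide: each simulated person chooses the mode with the highest drawn "usefulness". The sketch draws correlated scores from the multivariate normal above through a Cholesky factorization and estimates the resulting choice shares. The function names, the number of draws and the seed are illustrative.

```python
import math
import random

MODES = ["car", "bus", "tram"]
MEAN = [8.0, 6.0, 7.0]
COV = [  # full symmetrical dispersion matrix of the model
    [1.0, 0.0, 0.0],
    [0.0, 1.9, 0.8],
    [0.0, 0.8, 1.4],
]

def cholesky(a):
    """Lower-triangular L with L * L^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low

def draw(rng, mean, low):
    """One draw from N(mean, L * L^T): mean + L * z, with z standard normal."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(low[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mean)]

def choice_shares(n_draws=20_000, seed=0):
    """Share of draws in which each mode has the highest score (assumed choice rule)."""
    rng = random.Random(seed)
    low = cholesky(COV)
    counts = dict.fromkeys(MODES, 0)
    for _ in range(n_draws):
        scores = draw(rng, MEAN, low)
        counts[MODES[scores.index(max(scores))]] += 1
    return {mode: c / n_draws for mode, c in counts.items()}

shares = choice_shares()
print(shares)
```

The simulated scores are not clipped to the 0 to 10 range of the interview, which is a simplification of this sketch.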


! Statistics Bibliography

Bibliography

- Applied multivariate statistical analysis; Johnson, Wichern
