2016/2017
Politecnico di Milano Faculty of Civil, Environmental and Territorial Engineering Laboratory of Transport and Mobility
Statistics: Contents
1 – Descriptive statistics
- data organization
- graphical techniques
- position indices
- variance, covariance
2 – Random variables
3 – Main characteristics of some major random variables
- Gaussian distribution
- Gumbel distribution
4 – Modeling
- Random variables used as models
- QQ-plot
5 – Sample: modeling modal choice
Design department 2
Statistics: Introduction
Introduction to statistics
Statistics is the science that says:
if you have your head in the oven and your feet in the refrigerator,
then on average you are fine
↓
BAD STATISTICS
↓
GOOD STATISTICS
1 – Descriptive statistics
- data organization
- graphical techniques
- position indices
- variance, covariance
One-dimensional:
- mean, median, quartiles, quantiles
- boxplot
- sample variance
- standardization
- histograms
- cumulative histograms
Sample data
Motor Trend Car Road Tests
a data set used as a running example to clarify concepts throughout the presentation
Description
evaluation of different aspects of the performance of American cars of the years 1973-74
Contents
- mileage [km/l]
- displacement [cm³]
- power [kW]
- mass [kg]
- ¼ mile journey time [seconds]
Data representation
[Table: the data set, with columns mileage, displacement, power, mass, journey time]
Data representation
n = 32, p = 1
Data representation
Dot diagram
↓
- intuitive representation of the dispersion
- useful when there are few data points
Sample mean
ratio of the sum of the values to the number of values
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
- indicates the "center of gravity" of the data
- is affected by "outliers"
Median
value that occupies the central position in an ordered group of data
n odd: $$\tilde{x} = x_{\frac{n+1}{2}}$$
n even: $$\tilde{x} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2}$$
- measure of position that divides the data into two equal parts
- robust to outliers
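The different sensitivity of mean and median to outliers can be checked with a minimal sketch. It uses the four mileage values that appear in the data table of these slides; the extra value 30.0 is a hypothetical outlier added only for illustration.

```python
from statistics import mean, median

# Mileage values [km/l] of the four cars listed in the slides' data table.
mileage = [9.36, 10.16, 9.54, 8.33]
# Hypothetical outlier, added only to illustrate the effect.
with_outlier = mileage + [30.0]

# How far each position index moves when the outlier is added.
mean_shift = mean(with_outlier) - mean(mileage)
median_shift = median(with_outlier) - median(mileage)

print(f"mean:   {mean(mileage):.2f} -> {mean(with_outlier):.2f}")
print(f"median: {median(mileage):.2f} -> {median(with_outlier):.2f}")
```

The mean moves by several km/l, while the median barely changes.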
Quartiles - Quantile
a quartile is any of the three values which divide the sorted data set into four
equal parts, so that each part represents one fourth of the sampled population
• first quartile = lower quartile = cuts off lowest 25% of data = 25th percentile
• second quartile = median = cuts data set in half = 50th percentile
• third quartile = upper quartile = cuts off highest 25% of data, or lowest 75% = 75th percentile
Interquartile range
difference between the 3rd and 1st quartile (a measure of the variability of the data);
robust to outliers
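As a rough illustration, quartiles and the interquartile range can be computed with Python's standard library. The data are hypothetical, and the "inclusive" interpolation method is only one of several quartile conventions; other software may return slightly different values.

```python
from statistics import median, quantiles

# Hypothetical ordered data, chosen only for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Three cut points that divide the sorted data into four parts.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# Interquartile range: third quartile minus first quartile.
iqr = q3 - q1

print(q1, q2, q3, iqr)
```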
Data representation
Box plot
Sample variance
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
Sample standard deviation
$$s = \sqrt{s^2}$$
- indicate the "variability" or "dispersion" of the data around the sample mean
- s has the same units of measurement as the data (useful for quantifying the phenomenon)
s = 2,69 km/l
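A short sketch of the two formulas, applied to the four mileage values shown in the slides' data table (not to the full set of 32 cars, so the result differs from the value quoted above). The comparison with `statistics.variance` relies on that function using the same n − 1 denominator.

```python
import math
from statistics import stdev, variance

# Mileage values [km/l] of the four cars listed in the slides' data table.
mileage = [9.36, 10.16, 9.54, 8.33]

n = len(mileage)
x_bar = sum(mileage) / n

# Sample variance with the n - 1 denominator, then its square root.
s2 = sum((x - x_bar) ** 2 for x in mileage) / (n - 1)
s = math.sqrt(s2)

print(f"s^2 = {s2:.3f} (km/l)^2, s = {s:.3f} km/l")
```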
Standardization of data
$$x_i' = \frac{x_i - \bar{x}}{s}$$
↓
- values are expressed in units of standard deviation
- units of measurement are eliminated
New reference system
- mean = 0 → the data are centered on their mean
- distances are expressed in standard deviations
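The two properties of the new reference system can be verified with a minimal sketch, again on the four mileage values shown in the slides' data table.

```python
from statistics import mean, stdev

# Mileage values [km/l] of the four cars listed in the slides' data table.
mileage = [9.36, 10.16, 9.54, 8.33]

x_bar = mean(mileage)
s = stdev(mileage)  # sample standard deviation (n - 1 denominator)

# Standardized values: dimensionless distances from the mean, in units of s.
standardized = [(x - x_bar) / s for x in mileage]

print([round(z, 2) for z in standardized])
```

After standardization the sample mean is 0 and the sample standard deviation is 1, whatever the original units were.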
Effects of standardization
[Figure: original data compared with standardized data]
$$\Delta x = \frac{x_{max} - x_{min}}{m}$$
- m = number of intervals
- Δx = width of the intervals
- the width is not necessarily the same for all intervals
example: m = 7
Absolute frequency
n_i = number of values of X falling in the i-th interval
Relative frequency
f_i = number of values of X falling in the i-th interval, normalized by the total number of values
$$f_i = \frac{n_i}{n}$$
Histograms
the information is contained in the area of the rectangles
Mode
class characterized by the highest frequency
Cumulative frequency
Absolute cumulative frequency: $$X_j = \sum_{i=1}^{j} n_i$$
Relative cumulative frequency: $$F_j = \sum_{i=1}^{j} f_i$$
[Figure: frequency and cumulative frequency]
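The steps from class width to cumulative frequencies can be sketched as follows. The data values and the choice m = 5 are hypothetical; all intervals are given the same width, which is a simplification.

```python
from itertools import accumulate

# Hypothetical mileage-like values [km/l], for illustration only.
data = [4.4, 5.5, 6.1, 6.5, 7.7, 8.3, 8.9, 9.4, 9.5, 10.2, 11.6, 14.4]
m = 5  # number of intervals (an arbitrary choice)

x_min, x_max = min(data), max(data)
width = (x_max - x_min) / m  # interval width, equal for all intervals here

# Absolute frequencies n_i; the maximum value is assigned to the last interval.
counts = [0] * m
for x in data:
    i = min(int((x - x_min) / width), m - 1)
    counts[i] += 1

n = len(data)
rel = [c / n for c in counts]           # relative frequencies f_i = n_i / n
cum_abs = list(accumulate(counts))      # absolute cumulative frequencies
cum_rel = list(accumulate(rel))         # relative cumulative frequencies F_j
mode_class = counts.index(max(counts))  # index of the most frequent class

print(counts, cum_abs, mode_class)
```

The last absolute cumulative frequency equals n and the last relative cumulative frequency equals 1.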
Cumulative histogram
Mileage histogram
2 – Random variables
- introduction
- from sample to population
- probability density function
- cumulative distribution function
- expected value
- variance
- comparison between sample and population
Definitions
Population or Universe
whole group of elements under investigation
e.g. all models of cars circulating in the U.S. in 1973-74
↓
analyzing it in full is often impossible or uneconomical
Sample
small part of the universe,
extracted using appropriate sampling techniques
e.g. 32 models
Inference
process of generalizing results (obtained by observing a sample) to the
entire population (or universe) from which the sample was extracted
Definitions
Descriptive statistics
set of scientific methods to collect, sort, analyze and represent a group of data
Inferential statistics
set of scientific methods designed to draw conclusions about
a population from a representative sample
Hypothesis
the consumption of all U.S. car models is known
↓
the universe is known
↓
it is possible to describe the consumption of U.S. automobiles with a random
variable with a known probability distribution
[Figure: probability density function $f_X(x)$]
$$f_X(x) \ge 0, \qquad \int_{-\infty}^{+\infty} f_X(x)\,dx = 1$$
$$\Pr(a < X \le b) = \int_{a}^{b} f_X(x)\,dx$$
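These properties can be checked numerically. The sketch below assumes a standard normal density as the example of $f_X$ (the slides do not fix a particular density here) and replaces the infinite integration range with [-10, 10], where practically all of the probability lies.

```python
import math

def pdf(x):
    """Standard normal density, used here only as an example of f_X."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def integrate(f, a, b, steps=20_000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    inner = sum(f(a + k * h) for k in range(1, steps))
    return h * (0.5 * (f(a) + f(b)) + inner)

total_area = integrate(pdf, -10.0, 10.0)  # stands in for (-inf, +inf)
prob = integrate(pdf, -1.0, 1.0)          # Pr(-1 < X <= 1)

print(f"area = {total_area:.6f}, Pr(-1 < X <= 1) = {prob:.4f}")
```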
Expected value
or Mean
indicates the "center of gravity" of the distribution
$$\mu_X = E[X] = \int_{-\infty}^{+\infty} x \cdot f_X(x)\,dx$$
Median - Quartiles
Variance
shows the degree of concentration of the distribution
around the mean
$$\sigma_X^2 = Var[X] = \int_{-\infty}^{+\infty} (x - \mu_X)^2 \cdot f_X(x)\,dx$$
Standard deviation
$$\sigma_X = \sqrt{\sigma_X^2}$$
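The integrals that define expected value and variance can be evaluated numerically when the density is known. This sketch assumes a normal density with hypothetical parameters (mean 3, standard deviation 2) and integrates over ten standard deviations on each side of the mean instead of the whole real line.

```python
import math

MU, SIGMA = 3.0, 2.0  # hypothetical parameters of the example density

def pdf(x):
    """Normal density with mean MU and standard deviation SIGMA."""
    z = (x - MU) / SIGMA
    return math.exp(-0.5 * z * z) / (SIGMA * math.sqrt(2.0 * math.pi))

def integrate(f, a, b, steps=20_000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    inner = sum(f(a + k * h) for k in range(1, steps))
    return h * (0.5 * (f(a) + f(b)) + inner)

a, b = MU - 10 * SIGMA, MU + 10 * SIGMA
mean = integrate(lambda x: x * pdf(x), a, b)                    # E[X]
variance = integrate(lambda x: (x - mean) ** 2 * pdf(x), a, b)  # Var[X]
std_dev = math.sqrt(variance)

print(f"mean = {mean:.4f}, variance = {variance:.4f}, std dev = {std_dev:.4f}")
```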
$$F_X(x) = \Pr(X \le x) = \int_{-\infty}^{x} f_X(t)\,dt$$
$$f_X(x) = \frac{dF_X(x)}{dx}$$
$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$$
1 – Descriptive statistics
Two-dimensional:
- correlation
- dispersion matrix
Data representation
n = 32, p = 2
Data representation
Dot diagram
Data representation
Histogram
Data representation
Box plot: displacement
[Figure: mileage [km/l] and displacement [cm³]]
[Figure: mileage [km/l] and displacement [cm³]]
Covariance
a measure of how much two variables change together
$$s_{xy} = cov(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y})$$
s_xy = 0 ⇒ lack of correlation
s_xy > 0 ⇒ positive correlation
s_xy < 0 ⇒ negative correlation
Covariance
$$s_{xy} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y})$$
[Figure: axis label mileage [km/l]]
Correlation
measure of the linear association between two standardized variables
$$r_{xy} = cor(x, y) = \frac{s_{xy}}{s_x \cdot s_y}$$
r_xy = 0 ⇒ lack of correlation
r_xy > 0 ⇒ positive correlation
r_xy < 0 ⇒ negative correlation
r_xy = ±1 ⇒ exact linear correlation
dimensionless (no units)
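A minimal sketch of covariance and correlation, using the four (mileage, displacement) pairs that appear in the slides' data table. As an assumption made here, the standard deviations in the denominator use the same 1/n normalization as the covariance, so that the ratio stays within [-1, 1].

```python
import math

# The four rows shown in the slides' data table.
mileage = [9.36, 10.16, 9.54, 8.33]             # km/l
displacement = [1000.0, 675.0, 1612.0, 2250.0]  # cm^3

def cov(x, y):
    """Covariance with the 1/n normalization used in the slides."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

def cor(x, y):
    """Correlation: covariance divided by the two standard deviations (1/n form)."""
    return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))

s_xy = cov(mileage, displacement)
r_xy = cor(mileage, displacement)

print(f"s_xy = {s_xy:.1f} (km/l * cm^3), r_xy = {r_xy:.3f} (no units)")
```

For these four cars the covariance is negative: larger displacement goes with lower mileage.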
n = 32, p = 2

                   mileage  displacement
Mazda RX4 Wag      9,36     1000
Datsun 710         10,16    675
Hornet 4 Drive     9,54     1612
Hornet Sportabout  8,33     2250
…                  …        …

Symmetrical matrix p×p (2×2)
     V1   V2
V1   s11  s12
V2   s21  s22
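The p×p matrix can be built by computing the covariance of every pair of columns. The sketch below uses only the four rows visible in the table (the remaining rows are elided in the slides) and the 1/n covariance; the function name `dispersion_matrix` is hypothetical.

```python
# The four visible rows of the table: (mileage [km/l], displacement [cm^3]).
rows = [(9.36, 1000.0), (10.16, 675.0), (9.54, 1612.0), (8.33, 2250.0)]

def dispersion_matrix(rows):
    """Return the p x p matrix of covariances s_jk (1/n normalization)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / n
         for k in range(p)]
        for j in range(p)
    ]

S = dispersion_matrix(rows)
for line in S:
    print([round(v, 3) for v in line])
```

The diagonal holds the variances of the two variables, and the two off-diagonal entries are equal.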
Correlation
Example of uncorrelated variables
2 – Random variables
Two-dimensional:
- joint probability density function
- correlation
From sample to population or universe
all car models circulating in the U.S. in 1973-74
Population or Universe
joint probability density function
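Definitions
Expected value
$$\mu = [\mu_1, \mu_2]$$
joint probability density function
$$P(a < X < b,\ c < Y < d) = \int_{a}^{b} \int_{c}^{d} f(x, y)\,dy\,dx$$
Covariance and correlation coefficient
$$\rho_{X_1 X_2} = \frac{Cov(X_1, X_2)}{\sqrt{\sigma_1^2 \sigma_2^2}}, \qquad -1 \le \rho_{X_1 X_2} \le 1$$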
1 – Descriptive statistics
Three-dimensional and multidimensional:
- correlation
- dispersion matrix
Data organization
n = 32, p = 3
Data representation
Dot diagram
[Figure: axis label power [kW]]
Data representation
[Figure: mileage, displacement and power]
Analysis of correlations
Dispersion matrix
Correlation matrix
Data organization
n = 32, p = 5
Data representation
Analysis of correlations
Correlation matrix
Symmetrical matrix p×p (5×5)
3 – Main characteristics of some major random variables
- Gaussian distribution
- Gumbel distribution
Expected value
$$\mu_{X_N} = E[X_N] = \int_{-\infty}^{+\infty} x \cdot f(x)\,dx$$
Variance
$$\sigma_{X_N}^2 = Var[X_N] = \int_{-\infty}^{+\infty} (x - \mu_{X_N})^2 \cdot f(x)\,dx$$
Property
the integral cannot be solved analytically
Numerical methods
- statistical software
- tabulated values
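As an example of the software route, the cumulative distribution function of a normal variable can be evaluated through the error function, which Python's standard library exposes as `math.erf`. The function name `normal_cdf` and the evaluation points are choices made here for illustration.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x) for a normal variable, evaluated through the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# A few values, in the spirit of a printed table for the standard normal.
for x in (0.0, 1.0, 1.96, 3.0):
    print(f"F({x:.2f}) = {normal_cdf(x):.4f}")
```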
$$N(\mu, \sigma), \qquad \mu = [\mu_1, \mu_2]$$
[Figure: axis label second variable]
$$\mu = [0, 0], \qquad \sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
[Figure: section plane]
$$\mu = [0, 0], \qquad \sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
[Figure: section planes]
$$\mu = [0, 0], \qquad \sigma = \begin{bmatrix} 1 & 0{,}5 \\ 0{,}5 & 1 \end{bmatrix}$$
[Figure: section plane]
$$\mu = [0, 0], \qquad \sigma = \begin{bmatrix} 1 & 0{,}5 \\ 0{,}5 & 1 \end{bmatrix}$$
[Figure: section planes]
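The effect of the off-diagonal term can be explored by evaluating the bivariate normal density directly. The sketch assumes zero means and unit variances, as in the two cases above, so that the off-diagonal entry of the matrix is the correlation rho; the function name is hypothetical.

```python
import math

def bivariate_normal_pdf(x, y, rho):
    """Bivariate normal density with zero means, unit variances, correlation rho."""
    norm = 1.0 / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))
    q = (x * x - 2.0 * rho * x * y + y * y) / (1.0 - rho * rho)
    return norm * math.exp(-0.5 * q)

# Height of the density at the origin in the two cases of the slides.
peak_independent = bivariate_normal_pdf(0.0, 0.0, rho=0.0)
peak_correlated = bivariate_normal_pdf(0.0, 0.0, rho=0.5)

# With rho = 0.5 the density is stretched along the line y = x.
along = bivariate_normal_pdf(1.0, 1.0, rho=0.5)
across = bivariate_normal_pdf(1.0, -1.0, rho=0.5)

print(peak_independent, peak_correlated, along, across)
```

With rho = 0 the contour lines are circles; with rho = 0.5 the density is higher at (1, 1) than at (1, -1), so the contours become ellipses aligned with the diagonal.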
Gumbel distribution
Variable used in many random utility models
- has analytical characteristics suitable for many problems
- is crucial in behavioral models of random utility
- is used in the theory of Generalized Extreme Value
Gumbel distribution
Probability density function
$$f(u) = \frac{1}{\theta} \exp\left[-(u - V)/\theta - \phi\right] \exp\left\{-\exp\left[-(u - V)/\theta - \phi\right]\right\}$$
- V = location
- θ = scale factor
- φ = Euler's constant (≈ 0,577)
Gumbel distribution
Behavior as the location varies
Gumbel distribution
Behavior as the scale factor varies
Gumbel distribution
Cumulative distribution function
$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$
Property
the integral can be solved analytically
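Integrating the Gumbel density with the parametrization of these slides gives the closed form F(u) = exp{-exp[-(u - V)/θ - φ]}. The slides do not write this expression out, so the sketch below checks it numerically: the derivative of F is compared with the density, and the mean is estimated by numerical integration. The values V = 2 and θ = 1.5 are hypothetical.

```python
import math

EULER = 0.5772156649015329  # Euler's constant (phi in the slides)

def gumbel_pdf(u, V, theta):
    """Density with location V and scale factor theta, as written in the slides."""
    z = -(u - V) / theta - EULER
    return math.exp(z) * math.exp(-math.exp(z)) / theta

def gumbel_cdf(u, V, theta):
    """Closed form F(u) = exp{-exp[-(u - V)/theta - phi]}."""
    return math.exp(-math.exp(-(u - V) / theta - EULER))

V, THETA = 2.0, 1.5  # hypothetical parameter values

# Central difference of F at one point, to compare with the density.
u0, h = 1.3, 1e-5
slope = (gumbel_cdf(u0 + h, V, THETA) - gumbel_cdf(u0 - h, V, THETA)) / (2 * h)

# Mean by the trapezoidal rule on a wide interval around V.
a, b, steps = V - 10 * THETA, V + 40 * THETA, 20_000
step = (b - a) / steps
g = lambda u: u * gumbel_pdf(u, V, THETA)
mean_estimate = step * (0.5 * (g(a) + g(b)) + sum(g(a + k * step) for k in range(1, steps)))

print(slope, gumbel_pdf(u0, V, THETA), mean_estimate)
```

With the φ term inside the exponent, the estimated mean comes out equal to V, which suggests that in this parametrization V is the expected value of the variable.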
Statistics: 4. Modeling
4 – Modeling
- Random variables used as models
- QQ-plot
Description of a phenomenon
Example
speed along a highway link
problem: the speed is not constant
Random variable
Speed = mean speed $V_\mu$ + variation
- mean speed = constant
- variation = random component, unpredictable
Experiment
what is the speed of the next car that passes?
Random variable
numeric variable whose measured value can change between
different repetitions of the experiment
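The decomposition "speed = mean speed + variation" can be simulated with a few lines. The mean speed of 100 km/h, the spread of 10 km/h and the normal shape of the variation are all assumptions made for this sketch, not values taken from the slides.

```python
import random

rng = random.Random(42)  # fixed seed so that the run is repeatable
MEAN_SPEED = 100.0       # hypothetical constant mean speed [km/h]
SPREAD = 10.0            # hypothetical std. dev. of the variation [km/h]

def next_car_speed():
    """One repetition of the experiment: speed of the next passing car."""
    variation = rng.gauss(0.0, SPREAD)  # random, unpredictable component
    return MEAN_SPEED + variation

speeds = [next_car_speed() for _ in range(100)]
sample_mean = sum(speeds) / len(speeds)

print(f"first speeds: {[round(v, 1) for v in speeds[:5]]}")
print(f"sample mean over 100 cars: {sample_mean:.1f} km/h")
```

Each call returns a different value, while the average over many cars stays close to the constant mean speed.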
Modeling
- description of the phenomenon with an appropriate random variable
- possibility to adopt techniques of inference from a sample
Example
- measurement of 100 transits
- attempt to reconstruct the random variable that best
approximates the phenomenon
Sample
- measured speed of 100 cars
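One way to judge whether a candidate random variable approximates the sample is the QQ-plot listed in the outline of this section: sample quantiles are paired with the quantiles of the fitted model. The sketch below simulates the 100 speeds (no measured data are available here), assumes a normal model, and uses the plotting positions (i − 0.5)/n, which is one common convention among several.

```python
import random
from statistics import NormalDist

rng = random.Random(0)
# Hypothetical sample: 100 speeds [km/h], simulated in place of real measurements.
speeds = sorted(rng.gauss(100.0, 10.0) for _ in range(100))

# Candidate model: a normal random variable fitted to the sample.
model = NormalDist.from_samples(speeds)

# QQ-plot pairs: (model quantile, observed quantile) at probability (i - 0.5) / n.
n = len(speeds)
qq_pairs = [(model.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(speeds, start=1)]

# If the model fits, the points lie close to the line y = x.
max_gap = max(abs(q - x) for q, x in qq_pairs)

print(f"fitted mean = {model.mean:.1f}, fitted std dev = {model.stdev:.1f}")
print(f"largest distance from the line y = x: {max_gap:.2f} km/h")
```

Plotting the pairs gives the QQ-plot; systematic curvature away from the diagonal would suggest trying a different model.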
Sample
[Figure: measured speeds]
Sample
Histogram
- phenomenon to model
- sample
- model = random variable
Phenomenon to model
Modal choice between three different alternatives
- Car
- Bus
- Tramway
Sample
Interview of 100 people
n = 100, p = 3
multivariate random variable (3 dimensions)
Sample
Data analysis
Sample
Sample
Dispersion matrix
        Car    Bus    Tram
Car     1,04   -0,05  -0,09
Bus            1,88   0,79
Tram                  1,40
Model
Multivariate normal random variable
$$N(\bar{x}, s)$$

µ
Car     8
Bus     6
Tram    7

        Car    Bus    Tram
Car     1,00   0,00   0,00
Bus            1,90   0,80
Tram                  1,40
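Once the model is fixed, values can be drawn from it and compared with the sample. The sketch below draws from a multivariate normal with the mean vector and matrix of the slide, using a Cholesky factor written in plain Python; the lower triangle of the matrix is filled in by symmetry, and the number of draws and the seed are arbitrary choices.

```python
import math
import random

MU = [8.0, 6.0, 7.0]       # Car, Bus, Tram (from the slide)
SIGMA = [[1.0, 0.0, 0.0],  # symmetric matrix; the slide shows
         [0.0, 1.9, 0.8],  # only its upper triangle
         [0.0, 0.8, 1.4]]

def cholesky(a):
    """Lower-triangular factor low with low * low^T = a (a positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def draw(rng, mu, low):
    """One draw x = mu + low * z, with z a vector of independent N(0, 1) values."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + sum(low[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mu)]

rng = random.Random(1)
factor = cholesky(SIGMA)
draws = [draw(rng, MU, factor) for _ in range(20_000)]

# Sample means and the Bus-Tram covariance of the simulated values.
means = [sum(d[i] for d in draws) / len(draws) for i in range(3)]
cov_bus_tram = sum((d[1] - means[1]) * (d[2] - means[2]) for d in draws) / len(draws)

print([round(m, 2) for m in means], round(cov_bus_tram, 2))
```

The simulated means and the Bus-Tram covariance should come out close to the model values 8, 6, 7 and 0,80.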