Guiding Principles Behind Parameter Estimation Method
A model structure is a parameterized set of models $\mathcal{M}^* = \{\mathcal{M}(\theta) \mid \theta \in D_{\mathcal{M}}\}$.

Suppose the model is given by its one-step-ahead predictor:

$$\mathcal{M}(\theta):\quad \hat{y}(t\mid\theta) = W_y(q,\theta)\,y(t) + W_u(q,\theta)\,u(t)$$

where

$$W_y(q,\theta) = 1 - H^{-1}(q,\theta), \qquad W_u(q,\theta) = H^{-1}(q,\theta)\,G(q,\theta)$$
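As a concrete illustration (not from the original slides), the sketch below evaluates this predictor for a first-order ARX model, where $G(q,\theta) = b q^{-1}/(1 + a q^{-1})$ and $H(q,\theta) = 1/(1 + a q^{-1})$, so that $\hat{y}(t\mid\theta) = -a\,y(t-1) + b\,u(t-1)$; the model structure and parameter values are assumptions chosen for the example.

```python
import numpy as np

def arx1_predictor(y, u, a, b):
    """One-step-ahead predictor for the first-order ARX model
    y(t) + a*y(t-1) = b*u(t-1) + e(t):
    yhat(t|theta) = -a*y(t-1) + b*u(t-1),  theta = (a, b)."""
    yhat = np.zeros_like(y)
    yhat[1:] = -a * y[:-1] + b * u[:-1]
    return yhat

# Illustrative data generated from the same structure (values assumed for demo)
rng = np.random.default_rng(0)
a0, b0, N = 0.7, 0.5, 200
u = rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a0 * y[t - 1] + b0 * u[t - 1] + 0.1 * rng.standard_normal()

yhat = arx1_predictor(y, u, a0, b0)   # predictions at the true parameters
eps = y - yhat                        # prediction errors eps(t, theta)
```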
The parameter estimation problem is to determine a mapping from the data $Z^N$ to the parameter estimate, $Z^N \to \hat{\theta}_N \in D_{\mathcal{M}}$.

Define the prediction error

$$\varepsilon(t,\theta) = y(t) - \hat{y}(t\mid\theta)$$

When the data set $Z^N$ is known, these errors can be computed for $t = 1, 2, \ldots, N$. Based on $Z^t$ we can compute the prediction error $\varepsilon(t,\theta)$. Select $\hat{\theta}_N$ so that the prediction errors $\varepsilon(t,\hat{\theta}_N)$, $t = 1, 2, \ldots, N$, become as small as possible.

How should "small" be quantified? We describe two approaches:

1. Form a scalar-valued criterion function that measures the size of $\varepsilon$.
2. Make $\varepsilon(t,\hat{\theta}_N)$ uncorrelated with a given data sequence.
Minimizing Prediction Error
$$\varepsilon(t,\theta) = y(t) - \hat{y}(t\mid\theta)$$

For a given $\theta$, these errors are computed from the data $Z^N$. Let the prediction error be filtered through a stable linear filter $L(q)$:

$$\varepsilon_F(t,\theta) = L(q)\,\varepsilon(t,\theta)$$

Then use the following norm:

$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} l\big(\varepsilon_F(t,\theta)\big)$$

where $l(\cdot)$ is a scalar-valued positive function. The estimate is defined by

$$\hat{\theta}_N = \hat{\theta}_N(Z^N) = \arg\min_{\theta \in D_{\mathcal{M}}} V_N(\theta, Z^N)$$
Generally, the term prediction-error identification methods (PEM) is used for this family of approaches. Two design variables remain: the prefilter $L(q)$ and the norm $l(\cdot)$.
Choice of L

The effect of $L$ is best understood via a frequency-domain interpretation: $L$ acts as a frequency weighting of the prediction errors.
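As a brief supporting calculation (standard, not from the slides): for the quadratic norm and large $N$, Parseval's relation gives approximately

$$V_N(\theta, Z^N) \approx \frac{1}{4\pi}\int_{-\pi}^{\pi} |L(e^{i\omega})|^2\,\Phi_\varepsilon(\omega,\theta)\,d\omega$$

so the filter reweights the contribution of the prediction-error spectrum $\Phi_\varepsilon(\omega,\theta)$ across frequencies.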
Choice of l

A standard choice, convenient for both computation and analysis, is the quadratic norm

$$l(\varepsilon) = \tfrac{1}{2}\,\varepsilon^2$$

One can also parameterize the norm independently of the model parameterization.
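To make the procedure concrete, here is a minimal sketch (not from the slides) that evaluates $V_N$ with $L(q) = 1$ and the quadratic norm for the first-order ARX predictor used earlier, and minimizes it numerically; scipy and the data-generating values are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def V_N(theta, y, u):
    """Quadratic PEM criterion with L(q) = 1 for the ARX(1) predictor
    yhat(t|theta) = -a*y(t-1) + b*u(t-1), theta = (a, b)."""
    a, b = theta
    yhat = np.zeros_like(y)
    yhat[1:] = -a * y[:-1] + b * u[:-1]
    eps = y - yhat
    return 0.5 * np.mean(eps[1:] ** 2)

# Data simulated as in the earlier sketch (assumed true values a0 = 0.7, b0 = 0.5)
rng = np.random.default_rng(0)
a0, b0, N = 0.7, 0.5, 500
u = rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a0 * y[t - 1] + b0 * u[t - 1] + 0.1 * rng.standard_normal()

theta_hat = minimize(V_N, x0=[0.0, 0.0], args=(y, u)).x
print(theta_hat)  # should be close to (0.7, 0.5)
```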
Linear Regressions and the Least-Squares Method
Least-squares criterion

For the linear regression $\hat{y}(t\mid\theta) = \varphi^T(t)\,\theta$, with $L(q) = 1$ and the quadratic norm, the criterion becomes

$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N} l\big(\varepsilon_F(t,\theta)\big) = \frac{1}{N}\sum_{t=1}^{N}\frac{1}{2}\big[y(t) - \varphi^T(t)\,\theta\big]^2$$
The least-squares estimate (LSE) is

$$\hat{\theta}_N^{LS} = \arg\min_{\theta} V_N(\theta, Z^N) = \left[\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\,\varphi^T(t)\right]^{-1}\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\,y(t)$$

which can be written compactly as

$$\hat{\theta}_N^{LS} = R^{-1}(N)\,f(N), \qquad R(N) = \frac{1}{N}\sum_{t=1}^{N}\varphi(t)\,\varphi^T(t), \quad f(N) = \frac{1}{N}\sum_{t=1}^{N}\varphi(t)\,y(t)$$
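A minimal numpy sketch (illustrative, not from the slides) forming the ARX(1) regressor $\varphi(t) = [-y(t-1),\ u(t-1)]^T$ and computing $\hat{\theta}_N^{LS}$ from the normal equations; the data-generating values are assumed as before.

```python
import numpy as np

rng = np.random.default_rng(0)
a0, b0, N = 0.7, 0.5, 500
u = rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a0 * y[t - 1] + b0 * u[t - 1] + 0.1 * rng.standard_normal()

# Regressor phi(t) = [-y(t-1), u(t-1)]^T, stacked for t = 1..N-1
Phi = np.column_stack([-y[:-1], u[:-1]])
Y = y[1:]

R = Phi.T @ Phi / len(Y)          # R(N)
f = Phi.T @ Y / len(Y)            # f(N)
theta_ls = np.linalg.solve(R, f)  # LS estimate, close to (0.7, 0.5)
print(theta_ls)
```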
Properties of the LSE

The least-squares method is a special case of PEM, so its properties follow from the general prediction-error framework.
The prediction errors may also be weighted, giving the criterion

$$V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\alpha_t\big[y(t) - \varphi^T(t)\,\theta\big]^2$$

or, with weights that may depend on the sample size,

$$V_N(\theta, Z^N) = \sum_{t=1}^{N}\beta(N,t)\big[y(t) - \varphi^T(t)\,\theta\big]^2$$

with the weighted least-squares estimate

$$\hat{\theta}_N^{LS} = \left[\sum_{t=1}^{N}\beta(N,t)\,\varphi(t)\,\varphi^T(t)\right]^{-1}\sum_{t=1}^{N}\beta(N,t)\,\varphi(t)\,y(t)$$
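A small sketch (illustrative; the exponential-forgetting weights are an assumed example, not prescribed by the slides) solving the weighted problem by rescaling the regression rows with $\sqrt{\beta(N,t)}$:

```python
import numpy as np

def weighted_ls(Phi, Y, w):
    """Weighted LS: minimize sum_t w[t] * (Y[t] - Phi[t] @ theta)**2
    by rescaling the rows of the regression with sqrt(w[t])."""
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(Phi * sw[:, None], Y * sw, rcond=None)
    return theta

# Reusing Phi, Y from the previous sketch, with exponential-forgetting
# weights beta(N, t) = lam**(N - t) (an assumed, illustrative choice):
lam = 0.99
w = lam ** np.arange(len(Y) - 1, -1, -1)
theta_wls = weighted_ls(Phi, Y, w)
```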
Suppose now that the equation error in the model is colored,

$$A(q)\,y(t) = B(q)\,u(t) + v(t), \qquad v(t) = K(q)\,e(t)$$

where $e(t)$ is white noise. This new model takes us outside the LS environment, except in two cases:

1. Known noise properties
2. High-order models
Suppose that the noise $v$ can be well described by $K(q) = 1/D(q)$, where $D(q)$ is a polynomial of order $r$. Then we have

$$A(q)\,y(t) = B(q)\,u(t) + \frac{1}{D(q)}\,e(t)$$

or, equivalently,

$$D(q)A(q)\,y(t) = D(q)B(q)\,u(t) + e(t)$$

which is again a linear regression with white equation noise, now of higher order.

Note: Since there are infinitely many matrices that describe the same system (they are related by similarity transformations), we will have to fix the coordinate basis of the state-space realization.
A Statistical Framework for Parameter Estimation
and the Maximum Likelihood Method
Estimation and the Principle of Maximum Likelihood
The area of statistical inference deals with the problem of extracting information from observations that may themselves be unreliable.

Suppose that the observation $y^N = (y(1), y(2), \ldots, y(N))$ has the probability density function (PDF)

$$f_y(\theta; x_1, x_2, \ldots, x_N) = f_y(\theta; x^N)$$

That is,

$$P(y^N \in A) = \int_{x^N \in A} f_y(\theta; x^N)\,dx^N$$

This is a deterministic function of $\theta$ once the observed numerical value $y_*^N$ is inserted, and it is then called the likelihood function. A reasonable estimator of $\theta$ is

$$\hat{\theta}(y_*^N) = \arg\max_{\theta} f_y(\theta; y_*^N)$$

where the maximization is performed for fixed $y_*^N$. This function of the observations is known as the maximum likelihood estimator (MLE).
Example: Let $y(i)$, $i = 1, \ldots, N$, be independent random variables, normally distributed with unknown mean $\theta_0$ and known variances $\lambda_i$:

$$y(i) \in N(\theta_0, \lambda_i)$$

A common estimator is the sample mean:

$$\hat{\theta}_{SM}(y^N) = \frac{1}{N}\sum_{i=1}^{N} y(i)$$

To calculate the MLE, we start by determining the joint PDF of the observations. The PDF of $y(i)$ is

$$\frac{1}{\sqrt{2\pi\lambda_i}}\exp\left(-\frac{(x_i - \theta)^2}{2\lambda_i}\right)$$

Since the $y(i)$ are independent, the joint PDF of the observations is

$$f_y(\theta; x^N) = \prod_{i=1}^{N}\frac{1}{\sqrt{2\pi\lambda_i}}\exp\left(-\frac{(x_i - \theta)^2}{2\lambda_i}\right)$$
So the likelihood function is $f_y(\theta; y^N)$. Maximizing the likelihood function is the same as maximizing its logarithm, so

$$\hat{\theta}_{ML}(y^N) = \arg\max_{\theta}\,\log f_y(\theta; y^N) = \arg\max_{\theta}\left[-\frac{N}{2}\log 2\pi - \frac{1}{2}\sum_{i=1}^{N}\log\lambda_i - \frac{1}{2}\sum_{i=1}^{N}\frac{(y(i) - \theta)^2}{\lambda_i}\right]$$

which gives the weighted mean

$$\hat{\theta}_{ML}(y^N) = \frac{\sum_{i=1}^{N} y(i)/\lambda_i}{\sum_{i=1}^{N} 1/\lambda_i}$$
Continuing the example, suppose $N = 15$ and the $y(i)$ are generated randomly (normal distribution) with mean 10 and variances

$$\lambda_i = 10,\ 2,\ 3,\ 4,\ 61,\ 11,\ 0.1,\ 121,\ 10,\ 1,\ 6,\ 9,\ 11,\ 13,\ 15$$
The estimated means for 10 different experiments are shown in the figure below.

[Figure: estimated means $\hat{\theta}_{SM}(y^N)$ and $\hat{\theta}_{ML}(y^N)$ for 10 different experiments; horizontal axis: experiment number (1 to 10), vertical axis: estimated mean (roughly $-20$ to $40$).]
Exercise: Repeat the same procedure for other experiments and draw the corresponding figure, this time supposing all variances equal to 10.
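A possible sketch of this experiment (illustrative; the random seed and the number of repetitions are assumptions) comparing the sample mean with the inverse-variance-weighted ML estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
theta0 = 10.0
lam = np.array([10, 2, 3, 4, 61, 11, 0.1, 121, 10, 1, 6, 9, 11, 13, 15], dtype=float)

def estimates(y, lam):
    """Return (sample mean, ML weighted mean) for one experiment."""
    theta_sm = y.mean()
    theta_ml = np.sum(y / lam) / np.sum(1.0 / lam)
    return theta_sm, theta_ml

for experiment in range(10):
    y = theta0 + np.sqrt(lam) * rng.standard_normal(lam.size)  # y(i) ~ N(theta0, lam_i)
    print(experiment + 1, *estimates(y, lam))
```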
Consider the covariance matrix of the estimation error,

$$P = E\big[\hat{\theta}(y^N) - \theta_0\big]\big[\hat{\theta}(y^N) - \theta_0\big]^T$$

where $\theta_0$ is the true value of $\theta$. We may be interested in selecting estimators that make $P$ small. The Cramér-Rao inequality gives a lower bound for $P$ (for unbiased estimators):

$$P \geq M^{-1}, \qquad M = E\left[\frac{d}{d\theta}\log f_y(\theta; y^N)\right]\left[\frac{d}{d\theta}\log f_y(\theta; y^N)\right]^T\Bigg|_{\theta = \theta_0}$$

where $M$ is the Fisher information matrix.
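As a check on the Gaussian example above (a standard computation, added here for illustration):

$$\frac{d}{d\theta}\log f_y(\theta; y^N) = \sum_{i=1}^{N}\frac{y(i) - \theta}{\lambda_i} \quad\Rightarrow\quad M = \sum_{i=1}^{N}\frac{1}{\lambda_i}$$

and the weighted-mean ML estimate attains this bound, since $\operatorname{Var}\hat{\theta}_{ML}(y^N) = 1\big/\sum_{i=1}^{N}(1/\lambda_i) = M^{-1}$.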
Asymptotic Properties of the MLE

Calculation of

$$P = E\big[\hat{\theta}(y^N) - \theta_0\big]\big[\hat{\theta}(y^N) - \theta_0\big]^T$$

is not an easy task. Therefore, limiting properties as the sample size tends to infinity are calculated instead. For the MLE in the case of independent observations, Wald and Cramér obtained the following result. Suppose that the random variables $\{y(i)\}$ are independent and identically distributed, so that

$$f_y(\theta; x_1, x_2, \ldots, x_N) = \prod_{i=1}^{N} f_{y(i)}(\theta; x_i)$$

Suppose also that the distribution of $y^N$ is given by $f_y(\theta_0; x^N)$ for some value $\theta_0$. Then $\hat{\theta}_{ML}(y^N)$ tends to $\theta_0$ with probability 1 as $N$ tends to infinity, and

$$\sqrt{N}\,\big(\hat{\theta}_{ML}(y^N) - \theta_0\big)$$

converges in distribution to the normal distribution with zero mean and covariance matrix given by the Cramér-Rao lower bound $M^{-1}$.
Probabilistic Models of Dynamical Systems

Suppose the model is

$$\mathcal{M}(\theta):\quad \hat{y}(t\mid\theta) = g(t, Z^{t-1}; \theta)$$

and that the prediction errors $\varepsilon(t,\theta) = y(t) - \hat{y}(t\mid\theta)$ are independent with PDF $f_e(x, t; \theta)$. Then the joint PDF of the observations is

$$f_m(\theta; y^t \mid u^t) = \prod_{k=1}^{t} f_e\big(y(k) - g(k, Z^{k-1}; \theta),\, k;\, \theta\big) \qquad (\mathrm{I})$$

To see this, consider two consecutive observations:

$$p(x_t, x_{t-1} \mid Z^{t-2}) = p\big(x_t \mid y(t-1) = x_{t-1}, Z^{t-2}\big)\, p\big(x_{t-1} \mid Z^{t-2}\big) = f_e\big(x_t - g(t, Z^{t-1}),\, t\big)\, f_e\big(x_{t-1} - g(t-1, Z^{t-2}),\, t-1\big)$$

Continuing similarly, we derive (I).
The likelihood of the whole data record is therefore

$$f_y(\theta; y^N) = \prod_{t=1}^{N} f_e\big(y(t) - g(t, Z^{t-1};\theta),\, t;\, \theta\big) = \prod_{t=1}^{N} f_e\big(\varepsilon(t,\theta),\, t;\, \theta\big)$$
Maximizing this function is the same as maximizing

$$\frac{1}{N}\log f_y(\theta; y^N) = \frac{1}{N}\sum_{t=1}^{N}\log f_e\big(\varepsilon(t,\theta),\, t;\, \theta\big)$$

If we define

$$\ell(\varepsilon, \theta, t) = -\log f_e(\varepsilon, t; \theta)$$

we may write

$$\hat{\theta}_{ML}(y^N) = \arg\min_{\theta}\frac{1}{N}\sum_{t=1}^{N}\ell\big(\varepsilon(t,\theta),\, t;\, \theta\big)$$

Thus the MLE can be seen as a prediction-error estimate with the particular norm $\ell$ determined by the noise PDF.
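For instance (a standard special case, stated here for concreteness): if $f_e(x, t; \theta)$ is Gaussian with zero mean and variance $\lambda$, then

$$\ell(\varepsilon, \theta, t) = \tfrac{1}{2}\log 2\pi\lambda + \frac{\varepsilon^2}{2\lambda}$$

so for fixed $\lambda$ the MLE coincides with the quadratic-norm PEM estimate with $l(\varepsilon) = \tfrac{1}{2}\varepsilon^2$ and $L(q) = 1$.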
Correlating Prediction Errors with Past Data

Ideally, the prediction error $\varepsilon(t,\theta)$ of a good model should be independent of the past data $Z^{t-1}$. If $\varepsilon(t,\theta)$ is correlated with $Z^{t-1}$, then there was more information available in $Z^{t-1}$ about $y(t)$ than was picked up by $\hat{y}(t\mid\theta)$. We therefore require that, for a chosen sequence $\zeta(t)$ derived from the data,

$$\frac{1}{N}\sum_{t=1}^{N}\zeta(t)\,\varepsilon(t,\theta) = 0$$

The $\hat{\theta}$ so derived would then be the best estimate based on the observed data.
More generally, the correlation vectors may depend on the data and on $\theta$,

$$\zeta(t,\theta) = \zeta(t, Z^{t-1}, \theta)$$

Choose such a sequence $\zeta(t,\theta)$ and define

$$f_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\zeta(t,\theta)\,\varepsilon_F(t,\theta)$$

where $\varepsilon_F(t,\theta) = L(q)\,\varepsilon(t,\theta)$ as before. Then calculate the estimate as

$$\hat{\theta}_N = \operatorname*{sol}_{\theta \in D_{\mathcal{M}}}\big\{f_N(\theta, Z^N) = 0\big\}$$
Pseudolinear Regressions
We saw in Chapter 4 that a number of common prediction models can be written as

$$\hat{y}(t\mid\theta) = \varphi^T(t,\theta)\,\theta$$

Choosing $\zeta(t,\theta) = \varphi(t,\theta)$ gives the pseudolinear-regression (PLR) estimate

$$\hat{\theta}_N^{PLR} = \operatorname*{sol}_{\theta}\left\{\frac{1}{N}\sum_{t=1}^{N}\varphi(t,\theta)\big[y(t) - \varphi^T(t,\theta)\,\theta\big] = 0\right\}$$
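As an illustration of how such an equation can be solved in practice (this specific fixed-point iteration for an ARMAX(1,1,1) model is an assumed example, not prescribed by the slides): the regressor contains past prediction errors, which are recomputed from the current parameter estimate at each iteration.

```python
import numpy as np

def plr_armax11(y, u, n_iter=20):
    """Pseudolinear regression (iterative LS) for the ARMAX model
    y(t) + a*y(t-1) = b*u(t-1) + e(t) + c*e(t-1), theta = (a, b, c),
    with regressor phi(t, theta) = [-y(t-1), u(t-1), eps(t-1, theta)]."""
    N = len(y)
    # Initialize (a, b) by an ordinary ARX least-squares fit, c = 0
    Phi_arx = np.column_stack([-y[:-1], u[:-1]])
    ab, *_ = np.linalg.lstsq(Phi_arx, y[1:], rcond=None)
    theta = np.array([ab[0], ab[1], 0.0])
    eps = np.zeros(N)
    for _ in range(n_iter):
        a, b, c = theta
        for t in range(1, N):                      # prediction errors at current theta
            eps[t] = y[t] - (-a * y[t-1] + b * u[t-1] + c * eps[t-1])
        Phi = np.column_stack([-y[:-1], u[:-1], eps[:-1]])
        theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
    return theta

# Illustrative data from an assumed ARMAX system with (a0, b0, c0) = (0.7, 0.5, 0.3)
rng = np.random.default_rng(2)
N = 2000
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -0.7 * y[t-1] + 0.5 * u[t-1] + e[t] + 0.3 * e[t-1]

print(plr_armax11(y, u))  # roughly (0.7, 0.5, 0.3)
```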
Instrumental Variable Methods
Consider again the linear regression

$$\hat{y}(t\mid\theta) = \varphi^T(t)\,\theta$$

The least-squares estimate of $\theta$ is given by

$$\hat{\theta}_N^{LS} = \operatorname*{sol}_{\theta}\left\{\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\big[y(t) - \varphi^T(t)\,\theta\big] = 0\right\}$$

Suppose the data are actually generated by

$$y(t) = \varphi^T(t)\,\theta_0 + v_0(t)$$

We found in Section 7.3 that the LSE $\hat{\theta}_N$ will not tend to $\theta_0$ in typical cases.
To remedy this, replace $\varphi(t)$ in the defining equation by a vector of instruments $\zeta(t)$:

$$\hat{\theta}_N^{IV} = \operatorname*{sol}_{\theta}\left\{\frac{1}{N}\sum_{t=1}^{N}\zeta(t)\big[y(t) - \varphi^T(t)\,\theta\big] = 0\right\}$$

Such an application of the correlation approach to a linear regression is called the instrumental-variable (IV) method. The estimate is

$$\hat{\theta}_N^{IV} = \left[\frac{1}{N}\sum_{t=1}^{N}\zeta(t)\,\varphi^T(t)\right]^{-1}\frac{1}{N}\sum_{t=1}^{N}\zeta(t)\,y(t)$$
Does $\hat{\theta}_N \to \theta_0$ as $N \to \infty$ in the IV method?

Exercise: Show that $\hat{\theta}_N^{IV}$ exists and tends to $\theta_0$ if the following conditions hold:

$$E\,\zeta(t)\,\varphi^T(t) \text{ is nonsingular}, \qquad E\,\zeta(t)\,v_0(t) = 0$$
Choices of instruments

A typical choice is to build the instruments from lagged values of the input $u(t)$ and of a signal $x(t)$, passed through a linear filter $K(q)$, where $x(t)$ is generated from the input through a linear system:
$$N(q)\,x(t) = M(q)\,u(t)$$

Here

$$N(q) = 1 + n_1 q^{-1} + \cdots + n_{n_n} q^{-n_n}, \qquad M(q) = m_0 + m_1 q^{-1} + \cdots + m_{n_m} q^{-n_m}$$

so the instruments are constructed from past inputs only: $\zeta(t) = \zeta(t, u^{t-1})$.
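A numerical sketch (illustrative only; the first-order system, the MA(1) colored noise, and the choice $N(q) = 1$, $M(q) = q^{-1}$, $K(q) = 1$ are all assumptions) comparing LS and IV when the equation noise is colored:

```python
import numpy as np

rng = np.random.default_rng(3)
a0, b0, N = 0.7, 0.5, 5000
u = rng.standard_normal(N)
e = 0.3 * rng.standard_normal(N)
v = np.zeros(N)
v[1:] = e[1:] + 0.9 * e[:-1]          # colored equation noise v0(t)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a0 * y[t-1] + b0 * u[t-1] + v[t]

# Regressor phi(t) = [-y(t-1), u(t-1)], stacked for t = 2..N-1
Phi = np.column_stack([-y[1:-1], u[1:-1]])
Y = y[2:]

# Instruments from the input only: x(t) = u(t-1) (i.e. N(q) = 1, M(q) = q^-1),
# zeta(t) = [-x(t-1), u(t-1)] = [-u(t-2), u(t-1)]
Zeta = np.column_stack([-u[:-2], u[1:-1]])

theta_ls = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)     # biased when v0 is colored
theta_iv = np.linalg.solve(Zeta.T @ Phi, Zeta.T @ Y)   # close to the true values
print("LS:", theta_ls, " IV:", theta_iv, " true:", (a0, b0))
```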
If the input is generated independently of the noise $v_0$, then instruments constructed from past inputs satisfy $E\,\zeta(t)\,v_0(t) = 0$. Since both the $\varphi$-vector and the $\zeta$-vector are generated from the same input sequence, it might be expected that $E\,\zeta(t)\,\varphi^T(t)$ is nonsingular in general.
Model-dependent Instruments

It may be desirable to choose the filters $N$ and $M$ equal to those of the true system,

$$N(q) = A_0(q), \qquad M(q) = B_0(q)$$

These are clearly not known, but we may let the instruments depend on the model parameters in the obvious way.
$$\zeta(t,\theta) = K_u(q,\theta)\,u(t)$$

where $K_u(q,\theta)$ is a $d$-dimensional column vector of linear filters.

The IV method can then be summarized as follows:

$$\varepsilon_F(t,\theta) = L(q)\big[y(t) - \varphi^T(t)\,\theta\big]$$

$$\hat{\theta}_N^{IV} = \operatorname*{sol}_{\theta \in D_{\mathcal{M}}}\big\{f_N(\theta, Z^N) = 0\big\}$$

where

$$f_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\zeta(t,\theta)\,\varepsilon_F(t,\theta), \qquad \zeta(t,\theta) = K_u(q,\theta)\,u(t)$$