
Adaptive and Statistical Signal Processing

Contents

Course No:
1. Random Variables and Vectors
2. Random Variables and Vectors, Intro to Estimation
3. MVU, Cramer Rao Lower Bound
4. Linear Models, Best Linear Unbiased Estimator
5. Least Squares and Maximum Likelihood Estimation
6. Introduction to Bayes Estimation
7. Linear Bayes Estimation
8. Linear Bayes Estimation, Stochastic Processes
9. Stochastic Processes
10. Wiener Filter
11. Wiener Filter
12. Least Squares Filter

Page 2
Minimum Variance Unbiased Estimator (MVU)

 Remember: Quality Criterion for estimator:


 mean squared error: $E\big((\theta - \hat{\theta})^2\big)$ should be minimal
 let $e = \theta - \hat{\theta}$, then $E\big((\theta - \hat{\theta})^2\big) = \sigma_e^2 + \mu_e^2$
 with $\mu_e$ and $\sigma_e^2$ as the mean and variance of the estimation error $e$,
respectively; $\mu_e$ is also called the bias of the estimator.
 For a minimum variance unbiased estimator (MVU) our aim is to find
an estimator that has zero bias ($\mu_e = 0$), i.e. $E(\hat{\theta}) = \theta$, with minimum
variance $\sigma_e^2$ of the error
 Please note that an MVU is not necessarily the estimator with the
minimum mean squared error; in fact there might be an estimator
that is biased ($\mu_e \neq 0$) but has a lower mean squared error.
 However, from a practical point of view, MVUs are often easier to
find than estimators minimizing the mean squared error.
 Often it is even impossible to find an MVU. In this case additional
constraints have to be imposed, e.g. restricting the estimator to be
linear, which leads to the best linear unbiased estimator (BLUE).
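
The trade-off in the bullets above can be illustrated numerically. The following is a minimal Monte Carlo sketch (not part of the original slides); the values of A, σ, N and the shrinkage factor 0.8 are arbitrary assumptions, chosen so that a biased (shrunk) sample mean achieves a lower MSE than the unbiased sample mean for the DC-level-in-WGN model used later in the deck.

```python
import numpy as np

# Minimal Monte Carlo sketch (not from the slides): MSE = variance + bias^2,
# and a biased estimator can beat the unbiased one in MSE.
rng = np.random.default_rng(0)
A, sigma, N, trials = 1.0, 2.0, 10, 100_000   # assumed example values

x = A + sigma * rng.standard_normal((trials, N))   # DC level in WGN
theta_unbiased = x.mean(axis=1)                    # sample mean (unbiased)
theta_biased = 0.8 * theta_unbiased                # shrunk estimate (biased)

for name, est in [("sample mean", theta_unbiased), ("shrunk mean", theta_biased)]:
    bias = est.mean() - A
    var = est.var()
    print(f"{name:12s}  bias={bias:+.3f}  var={var:.3f}  mse={var + bias**2:.3f}")
```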
Page 3
Minimum Variance Unbiased Estimator (MVU)
 Please note that for an MVU estimator, σe2 must be minimum for every θ!!!

Taken from Kay: Fundamentals of Statistical Signal Processing, Vol 1: Estimation Theory, Prentice Hall, Upper Saddle River

 Similar definition of an MVU for a vector parameter θ:


 Unbiased: $E(\hat{\theta}) = \theta$
 Minimum variance: $\mathrm{var}(\hat{\theta}_i)$ is minimal for all $i$ among all unbiased
estimators
 Additional advantage of unbiased estimators:
 if multiple estimators $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_m$ for a parameter exist, they
can be combined by averaging: $\hat{\theta} = \frac{1}{m}\sum_{i=1}^{m} \hat{\theta}_i$
 this again yields an unbiased estimator; e.g. for $m$ independent estimators
with equal variance $\sigma^2$ the averaged estimator has variance $\sigma_{\hat{\theta}}^2 = \frac{\sigma^2}{m}$
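
As a quick numerical check (not part of the original slides; the values of θ, σ and m are arbitrary assumptions), the averaging rule above can be verified by Monte Carlo simulation:

```python
import numpy as np

# Minimal sketch (not from the slides): averaging m independent unbiased
# estimators keeps the estimate unbiased and divides the variance by m.
rng = np.random.default_rng(1)
theta, sigma, m, trials = 3.0, 1.5, 8, 200_000   # assumed example values

# m independent unbiased estimates per trial, each with variance sigma^2
estimates = theta + sigma * rng.standard_normal((trials, m))
theta_avg = estimates.mean(axis=1)

print("mean of averaged estimator    :", theta_avg.mean())   # ~ theta (unbiased)
print("variance of averaged estimator:", theta_avg.var())    # ~ sigma^2 / m
print("sigma^2 / m                   :", sigma**2 / m)
```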
Page 4
How to find the MVU?

 Two questions:
 Does the MVU exist?
 How to find it?
 Possible options to answer these questions:
 Determining a lower bound on the estimator’s variance and
checking if there is an estimator satisfying this bound
 Further restricting the estimator, e.g. to be linear
 Rao-Blackwell-Lehmann-Scheffe Theorem, Maximum likelihood
estimator,…

Page 5
Cramer-Rao-Lower Bound

 Provides an easy way to determine a lower bound on the
estimator performance, i.e. no unbiased estimator can do better than this bound
 The MVU estimator does not necessarily attain the CRLB
 Sometimes the best estimator is provided by the CRLB directly

Page 6
Cramer-Rao-Lower Bound

 The Cramer-Rao Lower Bound (CRLB) is given by 1)
$$\mathrm{var}(\hat{\theta}) \;\geq\; \frac{1}{I(\theta)}$$
 This means that the variance of every unbiased estimator must be
greater than or equal to the CRLB
 If the CRLB exists and one can write
$$\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta} = I(\theta)\,\big(g(\mathbf{x}) - \theta\big)$$
 then the MVU estimator is given by $\hat{\theta} = g(\mathbf{x})$ and has the variance $\frac{1}{I(\theta)}$,
where $I(\theta)$ is the so-called Fisher information:
$$I(\theta) = -E\!\left[\frac{\partial^2 \ln p(\mathbf{x};\theta)}{\partial \theta^2}\right]$$
i.e. the MVU then satisfies the CRLB with equality
1) if the pdf p(x;θ) satisfies the "regularity" condition: $E\!\left[\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}\right] = 0$ for all θ
Page 7
Cramer-Rao-Lower Bound

 cf. Information Theory: Information is –logb p(x)


 Intuitive explanation for $\mathrm{var}(\hat{\theta}) \geq \frac{1}{I(\theta)}$:
 The Fisher information can be seen as the "amount of information"
about θ that is available in the data
 The more information is available for estimation, the lower the
estimation variance!
 Important properties of $-\ln p(\mathbf{x};\theta)$:
 non-negativity
 additivity for independent RVs: $\ln p(\mathbf{x};\theta) = \sum_{n=0}^{N-1} \ln p(x[n];\theta)$
 and hence the Fisher information adds up as well:
$$-E\!\left[\frac{\partial^2 \ln p(\mathbf{x};\theta)}{\partial \theta^2}\right] = -\sum_{n=0}^{N-1} E\!\left[\frac{\partial^2 \ln p(x[n];\theta)}{\partial \theta^2}\right]$$
 the CRLB therefore decreases when additional (independent) random
variables are used
Page 8
Efficient Estimators

 An unbiased estimator is said to be efficient if it attains the CRLB:


 it uses all the available data efficiently
 An efficient estimator is always the MVU but the MVU is not necessarily
an efficient estimator

Taken from Kay: Fundamentals of Statistical Signal Processing, Vol 1: Estimation Theory, Prentice Hall, Upper Saddle River

Page 9
Cramer-Rao-Lower Bound

 Example: DC level in white Gaussian noise:
 $x[n] = A + w[n]$
 $w[n]$ is WGN with variance $\sigma^2$
 $p(\mathbf{x};A) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right)$
 Taking the first derivative:
$$\frac{\partial \ln p(\mathbf{x};A)}{\partial A} = \frac{\partial}{\partial A}\!\left[-\frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right] = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n]-A) = \frac{N}{\sigma^2}(\bar{x}-A)$$
 with $g(\mathbf{x}) = \bar{x} = \frac{1}{N}\sum_{n=0}^{N-1} x[n]$ as the sample mean.
Page 10
Cramer-Rao-Lower Bound

 Differentiating again gives:
$$\frac{\partial^2 \ln p(\mathbf{x};A)}{\partial A^2} = -\frac{N}{\sigma^2}$$
 This leads to the CRLB:
$$\mathrm{var}(\hat{A}) \geq \frac{\sigma^2}{N}$$
 By using the result from the first derivative one obtains:
$$\frac{\partial \ln p(\mathbf{x};A)}{\partial A} = \frac{N}{\sigma^2}(\bar{x}-A) = I(A)\big(g(\mathbf{x})-A\big)$$
 This means that for the DC level in WGN the sample mean is the MVU
estimator!
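
A quick numerical check (not part of the original slides; A, σ and N are arbitrary assumptions): the empirical variance of the sample mean should coincide with the CRLB σ²/N, confirming that it is the efficient MVU estimator for this problem.

```python
import numpy as np

# Minimal sketch (not from the slides): for a DC level in WGN the sample
# mean attains the CRLB sigma^2 / N, i.e. it is the efficient MVU estimator.
rng = np.random.default_rng(2)
A, sigma, N, trials = 1.0, 2.0, 50, 100_000   # assumed example values

x = A + sigma * rng.standard_normal((trials, N))
A_hat = x.mean(axis=1)                         # sample mean estimator g(x)

print("empirical variance of A_hat:", A_hat.var())
print("CRLB  sigma^2 / N          :", sigma**2 / N)   # the two should match
```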

Page 11
General CRLB for Signals in White Noise

 Assume a deterministic signal with an unknown parameter θ is
observed in WGN as:
$$x[n] = s[n;\theta] + w[n], \qquad n = 0,1,\ldots,N-1$$
 The pdf of x depending on the parameter θ is given as:
$$p(\mathbf{x};\theta) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\big(x[n]-s[n;\theta]\big)^2\right)$$
 Differentiating once produces:
$$\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}\big(x[n]-s[n;\theta]\big)\frac{\partial s[n;\theta]}{\partial \theta}$$
 and a second differentiation results in:
$$\frac{\partial^2 \ln p(\mathbf{x};\theta)}{\partial \theta^2} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}\left[\big(x[n]-s[n;\theta]\big)\frac{\partial^2 s[n;\theta]}{\partial \theta^2} - \left(\frac{\partial s[n;\theta]}{\partial \theta}\right)^2\right]$$

Page 12
General CRLB for Signals in White Noise

 Taking the expectation value results in:
$$E\!\left[\frac{\partial^2 \ln p(\mathbf{x};\theta)}{\partial \theta^2}\right] = -\frac{1}{\sigma^2}\sum_{n=0}^{N-1}\left(\frac{\partial s[n;\theta]}{\partial \theta}\right)^2$$
 This leads to the CRLB for signals in white noise:
$$\mathrm{var}(\hat{\theta}) \geq \frac{\sigma^2}{\displaystyle\sum_{n=0}^{N-1}\left(\frac{\partial s[n;\theta]}{\partial \theta}\right)^2}$$
 The form of the bound demonstrates the importance of the
signal's dependence on θ:
 Signals that change rapidly as the unknown parameter changes
result in accurate estimators
 E.g. as we have seen for the DC level in WGN: $s[n;\theta] = \theta$
produces a CRLB of $\sigma^2/N$.
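
The bound can be evaluated numerically for any differentiable signal model. The following sketch is illustrative only; the signal model s[n;θ] = exp(−θn) and the parameter values are assumptions, not taken from the slides.

```python
import numpy as np

# Minimal sketch (not from the slides): evaluating the general CRLB
#   var(theta_hat) >= sigma^2 / sum_n (ds[n;theta]/dtheta)^2
# for an assumed example signal s[n;theta] = exp(-theta * n) in WGN.
sigma, theta, N = 1.0, 0.1, 100          # assumed example values
n = np.arange(N)

ds_dtheta = -n * np.exp(-theta * n)      # analytic derivative of s[n;theta]
crlb = sigma**2 / np.sum(ds_dtheta**2)

print("CRLB for theta:", crlb)
```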

Page 13
Transformation of Parameters

 Assume we wish to estimate a parameter that is a function
g(θ) of some more fundamental parameter θ and already know
the CRLB for θ
 Then the CRLB for g(θ) can be obtained as follows (without proof):
 Let $\alpha = g(\theta)$ and $\hat{\alpha}$ be an estimator of α
 Then the CRLB for α is
$$\mathrm{var}(\hat{\alpha}) \;\geq\; \frac{\left(\frac{\partial g}{\partial \theta}\right)^2}{-E\!\left[\frac{\partial^2 \ln p(\mathbf{x};\theta)}{\partial \theta^2}\right]}$$
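
As a short worked consequence (not spelled out on this slide, but following directly from the formula above and the DC-level result I(A) = N/σ²), the CRLB for α = A² would be:

```latex
% CRLB for alpha = g(A) = A^2 with the DC level in WGN:
% dg/dA = 2A and the Fisher information is I(A) = N / sigma^2, hence
\mathrm{var}(\hat{\alpha}) \;\geq\; \frac{\left(\frac{\partial g}{\partial A}\right)^2}{I(A)}
  \;=\; \frac{(2A)^2}{N/\sigma^2} \;=\; \frac{4A^2\sigma^2}{N}
```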

Page 14
Transformation of Parameters

 But: be careful!
 Non-linear transformations destroy efficiency
 e.g. DC level in WGN, estimator for $A^2$:
◊ the square of the sample mean, $\bar{x}^2$, might be a reasonable estimator
◊ but it is not even unbiased anymore:
$$E(\bar{x}^2) = E^2(\bar{x}) + \mathrm{var}(\bar{x}) = A^2 + \frac{\sigma^2}{N} \neq A^2$$
 On the other hand, affine (linear) transformations $g(\theta) = a\theta + b$
preserve efficiency: the efficient estimator of $g(\theta)$ is $g(\hat{\theta}) = a\hat{\theta} + b$
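
A minimal Monte Carlo sketch (not part of the original slides; A, σ and N are arbitrary assumptions) confirming the bias formula for the squared sample mean:

```python
import numpy as np

# Minimal sketch (not from the slides): the square of the sample mean is a
# biased estimator of A^2, with E(xbar^2) = A^2 + sigma^2 / N.
rng = np.random.default_rng(3)
A, sigma, N, trials = 1.0, 2.0, 10, 200_000   # assumed example values

x = A + sigma * rng.standard_normal((trials, N))
xbar_sq = x.mean(axis=1) ** 2                 # square of the sample mean

print("empirical mean of xbar^2:", xbar_sq.mean())
print("A^2 + sigma^2 / N       :", A**2 + sigma**2 / N)   # matches the bias formula
print("A^2                     :", A**2)
```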

Page 15
CRLB for Vector Parameters

 Commonly: vector of parameters θ=[θ1,θ2,…., θp]T


 Without proof, the CRLB is found as the [i,i] element of the
inverse of the Fisher information matrix I(θ):
$$\mathrm{var}(\hat{\theta}_i) \geq \left[\mathbf{I}^{-1}(\boldsymbol{\theta})\right]_{ii}$$
 with the Fisher information matrix defined by:
$$[\mathbf{I}(\boldsymbol{\theta})]_{ij} = -E\!\left[\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial \theta_i\,\partial \theta_j}\right] \quad \text{for } i = 1,\ldots,p \text{ and } j = 1,\ldots,p$$
 The Fisher information matrix is always symmetric


 In practice I(θ) is assumed to be positive definite and hence
invertible

Page 16
CRLB for Vector Parameters

 More formally: the covariance matrix $\mathbf{C}_{\hat{\boldsymbol\theta}}$ of any unbiased
estimator $\hat{\boldsymbol\theta}$ satisfies: 1)
$$\mathbf{C}_{\hat{\boldsymbol\theta}} - \mathbf{I}^{-1}(\boldsymbol\theta) \geq \mathbf{0} \quad \text{(positive semidefinite)}$$
 Furthermore, an unbiased estimator $\hat{\boldsymbol\theta} = \mathbf{g}(\mathbf{x})$ may be found that
attains the bound if and only if
$$\frac{\partial \ln p(\mathbf{x};\boldsymbol\theta)}{\partial \boldsymbol\theta} = \mathbf{I}(\boldsymbol\theta)\big(\mathbf{g}(\mathbf{x}) - \boldsymbol\theta\big)$$
 Again, this MVU estimator is then efficient
 Here $\mathbf{g}(\mathbf{x})$ is a p-dimensional function!
 The CRLB represents a powerful tool to find efficient estimators
for vector parameters.
1) if the pdf p(x;θ) satisfies the "regularity" condition: $E\!\left[\frac{\partial \ln p(\mathbf{x};\boldsymbol\theta)}{\partial \boldsymbol\theta}\right] = \mathbf{0}$ for all θ
Page 17
Example: Line Fitting

 Consider the problem $x[n] = A + Bn + w[n]$
 $w[n]$ is again WGN
 The parameter vector is $\boldsymbol\theta = [A,\, B]^T$
 $p(\mathbf{x};\boldsymbol\theta) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A-Bn)^2\right)$
 from which the first derivatives follow as:
$$\frac{\partial \ln p(\mathbf{x};\boldsymbol\theta)}{\partial A} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n]-A-Bn)$$
$$\frac{\partial \ln p(\mathbf{x};\boldsymbol\theta)}{\partial B} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n]-A-Bn)\,n$$

Page 18
Example: Line Fitting

 and
$$\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol\theta)}{\partial A^2} = -\frac{N}{\sigma^2}, \qquad
\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol\theta)}{\partial A\,\partial B} = -\frac{1}{\sigma^2}\sum_{n=0}^{N-1} n, \qquad
\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol\theta)}{\partial B^2} = -\frac{1}{\sigma^2}\sum_{n=0}^{N-1} n^2$$
 This leads to a Fisher information matrix of:
$$\mathbf{I}(\boldsymbol\theta) = \frac{1}{\sigma^2}\begin{bmatrix} N & \sum_{n=0}^{N-1} n \\ \sum_{n=0}^{N-1} n & \sum_{n=0}^{N-1} n^2 \end{bmatrix}
= \frac{1}{\sigma^2}\begin{bmatrix} N & \frac{N(N-1)}{2} \\ \frac{N(N-1)}{2} & \frac{N(N-1)(2N-1)}{6} \end{bmatrix}$$

Page 19
Example: Line Fitting

 Inverting the Fisher information matrix yields:
$$\mathbf{I}^{-1}(\boldsymbol\theta) = \sigma^2\begin{bmatrix} \dfrac{2(2N-1)}{N(N+1)} & -\dfrac{6}{N(N+1)} \\[6pt] -\dfrac{6}{N(N+1)} & \dfrac{12}{N(N^2-1)} \end{bmatrix}$$
 This means
$$\mathrm{var}(\hat{A}) \geq \frac{2(2N-1)\,\sigma^2}{N(N+1)}, \qquad
\mathrm{var}(\hat{B}) \geq \frac{12\,\sigma^2}{N(N^2-1)}$$
 Important observations:
 The CRLB for A is higher than for a single DC level in WGN
  General result: the CRLB always increases with the number
of estimated parameters!
Page 20
Example: Line Fitting

 The CRLB for B is lower than for A (for N > 2)
◊ The lower bound for the estimation of B decreases with order 1/N³,
as opposed to 1/N for A
◊  B is easier to estimate than A
◊ Intuitive explanation: changes of B are magnified by n
 Finding estimators for A and B:
 The derivatives
$$\frac{\partial \ln p(\mathbf{x};\boldsymbol\theta)}{\partial \boldsymbol\theta} =
\begin{bmatrix} \dfrac{\partial \ln p(\mathbf{x};\boldsymbol\theta)}{\partial A} \\[8pt] \dfrac{\partial \ln p(\mathbf{x};\boldsymbol\theta)}{\partial B} \end{bmatrix} =
\begin{bmatrix} \dfrac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n]-A-Bn) \\[8pt] \dfrac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n]-A-Bn)\,n \end{bmatrix}$$
 can be rewritten (after some manipulations) in the form of the equality condition:
$$\frac{\partial \ln p(\mathbf{x};\boldsymbol\theta)}{\partial \boldsymbol\theta} = \mathbf{I}(\boldsymbol\theta)\begin{bmatrix} \hat{A} - A \\ \hat{B} - B \end{bmatrix}$$
Page 21
Example: Line Fitting

 with
$$\hat{A} = \frac{2(2N-1)}{N(N+1)}\sum_{n=0}^{N-1} x[n] - \frac{6}{N(N+1)}\sum_{n=0}^{N-1} n\,x[n]$$
$$\hat{B} = -\frac{6}{N(N+1)}\sum_{n=0}^{N-1} x[n] + \frac{12}{N(N^2-1)}\sum_{n=0}^{N-1} n\,x[n]$$
 This means that for these estimators the CRLB is satisfied with
equality, hence they are efficient MVU estimators.
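
As a closing check (not part of the original slides; A, B, σ and N are arbitrary assumptions), the closed-form estimators above can be simulated and their empirical variances compared against the CRLB entries derived before:

```python
import numpy as np

# Minimal Monte Carlo sketch (not from the slides): the closed-form line-fit
# estimators above are unbiased and their variances match the CRLB entries.
rng = np.random.default_rng(4)
A, B, sigma, N, trials = 1.0, 0.5, 1.0, 20, 100_000   # assumed example values
n = np.arange(N)

x = A + B * n + sigma * rng.standard_normal((trials, N))   # line in WGN

A_hat = (2 * (2 * N - 1) / (N * (N + 1))) * x.sum(axis=1) \
        - (6 / (N * (N + 1))) * (x * n).sum(axis=1)
B_hat = -(6 / (N * (N + 1))) * x.sum(axis=1) \
        + (12 / (N * (N**2 - 1))) * (x * n).sum(axis=1)

print("var(A_hat):", A_hat.var(), " CRLB:", 2 * (2 * N - 1) * sigma**2 / (N * (N + 1)))
print("var(B_hat):", B_hat.var(), " CRLB:", 12 * sigma**2 / (N * (N**2 - 1)))
```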

Page 22
