
Adaptive and Statistical Signal Processing

Contents

Course No:
1. Random Variables and Vectors
2. Random Variables and Vectors, Intro to Estimation
3. MVU, Cramer Rao Lower Bound
4. Linear Models, Best Linear Unbiased Estimator
5. Least Squares and Maximum Likelihood Estimation
6. Introduction to Bayes Estimation
7. Linear Bayes Estimation
8. Linear Bayes Estimation, Stochastic Processes
9. Stochastic Processes
10. Wiener Filter
11. Wiener Filter
12. Least Squares Filter

Page 2
Minimum Variance Unbiased Estimator (MVU)

 Remember: Quality Criterion for estimator:


 mean squared error: $E\big((\theta - \hat{\theta})^2\big)$ should be minimal
 let $e = \theta - \hat{\theta}$, then $E\big((\theta - \hat{\theta})^2\big) = \sigma_e^2 + \mu_e^2$
 with $\mu_e$ and $\sigma_e^2$ as the mean and variance of the estimation error $e$,
respectively; $\mu_e$ is also called the bias of the estimator.
 For a minimum variance unbiased estimator (MVU) our aim is to find
an estimator that has zero bias ($\mu_e = 0$), i.e. $E(\hat{\theta}) = \theta$, with minimum
variance $\sigma_e^2$ of the error
 Please note that an MVU is not necessarily the estimator with the
minimum mean squared error; in fact there might be an estimator
that is biased ($\mu_e \neq 0$) but has a lower mean squared error.
 However, from a practical point of view, MVUs are often easier to
find than estimators minimizing the mean squared error.
 Often it is even impossible to find an MVU. In this case additional
constraints have to be imposed, e.g. restricting the estimator to be
linear, which leads to the best linear unbiased estimator (BLUE).
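
The trade-off in the bullets above can be illustrated numerically. The following is a minimal Monte Carlo sketch (not part of the original slides); the values of A, σ, N and the shrinkage factor 0.8 are arbitrary assumptions, chosen so that a biased (shrunk) sample mean achieves a lower MSE than the unbiased sample mean for the DC-level-in-WGN model used later in the deck.

```python
import numpy as np

# Minimal Monte Carlo sketch (not from the slides): MSE = variance + bias^2,
# and a biased estimator can beat the unbiased one in MSE.
rng = np.random.default_rng(0)
A, sigma, N, trials = 1.0, 2.0, 10, 100_000   # assumed example values

x = A + sigma * rng.standard_normal((trials, N))   # DC level in WGN
theta_unbiased = x.mean(axis=1)                    # sample mean (unbiased)
theta_biased = 0.8 * theta_unbiased                # shrunk estimate (biased)

for name, est in [("sample mean", theta_unbiased), ("shrunk mean", theta_biased)]:
    bias = est.mean() - A
    var = est.var()
    print(f"{name:12s}  bias={bias:+.3f}  var={var:.3f}  mse={var + bias**2:.3f}")
```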
Page 3
Minimum Variance Unbiased Estimator (MVU)
 Please note that for an MVU estimator, σe2 must be minimum for every θ!!!

Taken from Kay: Fundamentals of Statistical Signal Processing, Vol 1: Estimation Theory, Prentice Hall, Upper Saddle River

 Similar definition of an MVU for a vector parameter θ:


 Unbiased: $E(\hat{\theta}) = \theta$
 Minimum variance: $\mathrm{var}(\hat{\theta}_i)$ is minimal for all $i$ among all unbiased
estimators
 Additional advantage of unbiased estimators:
 if multiple estimators $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_m$ for a parameter exist, they
can be combined by averaging: $\hat{\theta} = \frac{1}{m}\sum_{i=1}^{m} \hat{\theta}_i$
 this again yields an unbiased estimator; e.g. for $m$ independent estimators
with equal variance $\sigma^2$ the averaged estimator has variance $\sigma_{\hat{\theta}}^2 = \frac{\sigma^2}{m}$
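
As a quick numerical check (not part of the original slides; the values of θ, σ and m are arbitrary assumptions), the averaging rule above can be verified by Monte Carlo simulation:

```python
import numpy as np

# Minimal sketch (not from the slides): averaging m independent unbiased
# estimators keeps the estimate unbiased and divides the variance by m.
rng = np.random.default_rng(1)
theta, sigma, m, trials = 3.0, 1.5, 8, 200_000   # assumed example values

# m independent unbiased estimates per trial, each with variance sigma^2
estimates = theta + sigma * rng.standard_normal((trials, m))
theta_avg = estimates.mean(axis=1)

print("mean of averaged estimator    :", theta_avg.mean())   # ~ theta (unbiased)
print("variance of averaged estimator:", theta_avg.var())    # ~ sigma^2 / m
print("sigma^2 / m                   :", sigma**2 / m)
```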
Page 4
How to find the MVU?

 Two questions:
 Does the MVU exist?
 How to find it?
 Possible options to answer these questions:
 Determining a lower bound on the estimator’s variance and
checking if there is an estimator satisfying this bound
 Further restricting the estimator, e.g. to be linear
 Rao-Blackwell-Lehmann-Scheffe Theorem, Maximum likelihood
estimator,…

Page 5
Cramer-Rao-Lower Bound

 Provides an easy way to determine a lower bound on the
estimator performance, i.e. no unbiased estimator can do better than this bound
 The MVU estimator does not necessarily attain the CRLB
 Sometimes the best estimator is provided by the CRLB directly

Page 6
Cramer-Rao-Lower Bound

 The Cramer-Rao Lower Bound (CRLB) is given by 1)
$$\mathrm{var}(\hat{\theta}) \;\geq\; \frac{1}{I(\theta)}$$
 This means that the variance of every unbiased estimator must be
greater than or equal to the CRLB
 If the CRLB exists and one can write
$$\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta} = I(\theta)\,\big(g(\mathbf{x}) - \theta\big)$$
 then the MVU estimator is given by $\hat{\theta} = g(\mathbf{x})$ and has the variance $\frac{1}{I(\theta)}$,
where $I(\theta)$ is the so-called Fisher information:
$$I(\theta) = -E\!\left[\frac{\partial^2 \ln p(\mathbf{x};\theta)}{\partial \theta^2}\right]$$
i.e. the MVU then satisfies the CRLB with equality
1) if the pdf p(x;θ) satisfies the "regularity" condition: $E\!\left[\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta}\right] = 0$ for all θ
Page 7
Cramer-Rao-Lower Bound

 cf. Information Theory: Information is –logb p(x)


 Intuitive explanation for $\mathrm{var}(\hat{\theta}) \geq \frac{1}{I(\theta)}$:
 The Fisher information can be seen as the "amount of information"
about θ that is available in the data
 The more information is available for estimation, the lower the
estimation variance!
 Important properties of $-\ln p(\mathbf{x};\theta)$:
 non-negativity
 additivity for independent RVs: $\ln p(\mathbf{x};\theta) = \sum_{n=0}^{N-1} \ln p(x[n];\theta)$
 and hence the Fisher information adds up as well:
$$-E\!\left[\frac{\partial^2 \ln p(\mathbf{x};\theta)}{\partial \theta^2}\right] = -\sum_{n=0}^{N-1} E\!\left[\frac{\partial^2 \ln p(x[n];\theta)}{\partial \theta^2}\right]$$
 the CRLB therefore decreases when additional (independent) random
variables are used
Page 8
Efficient Estimators

 An unbiased estimator is said to be efficient if it attains the CRLB:


 it uses all the available data efficiently
 An efficient estimator is always the MVU but the MVU is not necessarily
an efficient estimator

Taken from Kay: Fundamentals of Statistical Signal Processing, Vol 1: Estimation Theory, Prentice Hall, Upper Saddle River

Page 9
Cramer-Rao-Lower Bound

 Example: DC level in white Gaussian noise:
 $x[n] = A + w[n]$
 $w[n]$ is WGN with variance $\sigma^2$
 $p(\mathbf{x};A) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right)$
 Taking the first derivative:
$$\frac{\partial \ln p(\mathbf{x};A)}{\partial A} = \frac{\partial}{\partial A}\!\left[-\frac{N}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right] = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n]-A) = \frac{N}{\sigma^2}(\bar{x}-A)$$
 with $g(\mathbf{x}) = \bar{x} = \frac{1}{N}\sum_{n=0}^{N-1} x[n]$ as the sample mean.
Page 10
Cramer-Rao-Lower Bound

 Differentiating again gives:
$$\frac{\partial^2 \ln p(\mathbf{x};A)}{\partial A^2} = -\frac{N}{\sigma^2}$$
 This leads to the CRLB:
$$\mathrm{var}(\hat{A}) \geq \frac{\sigma^2}{N}$$
 By using the result from the first derivative one obtains:
$$\frac{\partial \ln p(\mathbf{x};A)}{\partial A} = \frac{N}{\sigma^2}(\bar{x}-A) = I(A)\big(g(\mathbf{x})-A\big)$$
 This means that for the DC level in WGN the sample mean is the MVU
estimator!
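
A quick numerical check (not part of the original slides; A, σ and N are arbitrary assumptions): the empirical variance of the sample mean should coincide with the CRLB σ²/N, confirming that it is the efficient MVU estimator for this problem.

```python
import numpy as np

# Minimal sketch (not from the slides): for a DC level in WGN the sample
# mean attains the CRLB sigma^2 / N, i.e. it is the efficient MVU estimator.
rng = np.random.default_rng(2)
A, sigma, N, trials = 1.0, 2.0, 50, 100_000   # assumed example values

x = A + sigma * rng.standard_normal((trials, N))
A_hat = x.mean(axis=1)                         # sample mean estimator g(x)

print("empirical variance of A_hat:", A_hat.var())
print("CRLB  sigma^2 / N          :", sigma**2 / N)   # the two should match
```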

Page 11
General CRLB for Signals in White Noise

 Assume a deterministic signal with an unknown parameter θ is
observed in WGN as:
$$x[n] = s[n;\theta] + w[n], \qquad n = 0,1,\ldots,N-1$$
 The pdf of x depending on the parameter θ is given as:
$$p(\mathbf{x};\theta) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\big(x[n]-s[n;\theta]\big)^2\right)$$
 Differentiating once produces:
$$\frac{\partial \ln p(\mathbf{x};\theta)}{\partial \theta} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}\big(x[n]-s[n;\theta]\big)\frac{\partial s[n;\theta]}{\partial \theta}$$
 and a second differentiation results in:
$$\frac{\partial^2 \ln p(\mathbf{x};\theta)}{\partial \theta^2} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}\left[\big(x[n]-s[n;\theta]\big)\frac{\partial^2 s[n;\theta]}{\partial \theta^2} - \left(\frac{\partial s[n;\theta]}{\partial \theta}\right)^2\right]$$

Page 12
General CRLB for Signals in White Noise

 Taking the expectation value results in:
$$E\!\left[\frac{\partial^2 \ln p(\mathbf{x};\theta)}{\partial \theta^2}\right] = -\frac{1}{\sigma^2}\sum_{n=0}^{N-1}\left(\frac{\partial s[n;\theta]}{\partial \theta}\right)^2$$
 This leads to the CRLB for signals in white noise:
$$\mathrm{var}(\hat{\theta}) \geq \frac{\sigma^2}{\displaystyle\sum_{n=0}^{N-1}\left(\frac{\partial s[n;\theta]}{\partial \theta}\right)^2}$$
 The form of the bound demonstrates the importance of the
signal's dependence on θ:
 Signals that change rapidly as the unknown parameter changes
result in accurate estimators
 E.g. as we have seen for the DC level in WGN: $s[n;\theta] = \theta$
produces a CRLB of $\sigma^2/N$.
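
The bound can be evaluated numerically for any differentiable signal model. The following sketch is illustrative only; the signal model s[n;θ] = exp(−θn) and the parameter values are assumptions, not taken from the slides.

```python
import numpy as np

# Minimal sketch (not from the slides): evaluating the general CRLB
#   var(theta_hat) >= sigma^2 / sum_n (ds[n;theta]/dtheta)^2
# for an assumed example signal s[n;theta] = exp(-theta * n) in WGN.
sigma, theta, N = 1.0, 0.1, 100          # assumed example values
n = np.arange(N)

ds_dtheta = -n * np.exp(-theta * n)      # analytic derivative of s[n;theta]
crlb = sigma**2 / np.sum(ds_dtheta**2)

print("CRLB for theta:", crlb)
```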

Page 13
Transformation of Parameters

 Assume we wish to estimate a parameter that is a function
g(θ) of some more fundamental parameter θ and already know
the CRLB for θ
 Then the CRLB for g(θ) can be obtained as follows (without proof):
 Let $\alpha = g(\theta)$ and $\hat{\alpha}$ be an estimator of α
 Then the CRLB for α is
$$\mathrm{var}(\hat{\alpha}) \;\geq\; \frac{\left(\frac{\partial g}{\partial \theta}\right)^2}{-E\!\left[\frac{\partial^2 \ln p(\mathbf{x};\theta)}{\partial \theta^2}\right]}$$
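
As a short worked consequence (not spelled out on this slide, but following directly from the formula above and the DC-level result I(A) = N/σ²), the CRLB for α = A² would be:

```latex
% CRLB for alpha = g(A) = A^2 with the DC level in WGN:
% dg/dA = 2A and the Fisher information is I(A) = N / sigma^2, hence
\mathrm{var}(\hat{\alpha}) \;\geq\; \frac{\left(\frac{\partial g}{\partial A}\right)^2}{I(A)}
  \;=\; \frac{(2A)^2}{N/\sigma^2} \;=\; \frac{4A^2\sigma^2}{N}
```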

Page 14
Transformation of Parameters

 But: be careful!
 Non-linear transformations destroy efficiency
 e.g. DC level in WGN, estimator for $A^2$:
◊ the square of the sample mean, $\bar{x}^2$, might be a reasonable estimator
◊ but it is not even unbiased anymore:
$$E(\bar{x}^2) = E^2(\bar{x}) + \mathrm{var}(\bar{x}) = A^2 + \frac{\sigma^2}{N} \neq A^2$$
 On the other hand, affine (linear) transformations $g(\theta) = a\theta + b$
preserve efficiency: the efficient estimator of $g(\theta)$ is $g(\hat{\theta}) = a\hat{\theta} + b$
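
A minimal Monte Carlo sketch (not part of the original slides; A, σ and N are arbitrary assumptions) confirming the bias formula for the squared sample mean:

```python
import numpy as np

# Minimal sketch (not from the slides): the square of the sample mean is a
# biased estimator of A^2, with E(xbar^2) = A^2 + sigma^2 / N.
rng = np.random.default_rng(3)
A, sigma, N, trials = 1.0, 2.0, 10, 200_000   # assumed example values

x = A + sigma * rng.standard_normal((trials, N))
xbar_sq = x.mean(axis=1) ** 2                 # square of the sample mean

print("empirical mean of xbar^2:", xbar_sq.mean())
print("A^2 + sigma^2 / N       :", A**2 + sigma**2 / N)   # matches the bias formula
print("A^2                     :", A**2)
```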

Page 15
CRLB for Vector Parameters

 Commonly: vector of parameters θ=[θ1,θ2,…., θp]T


 Without proof, the CRLB is found as the [i,i] element of the
inverse of the Fisher information matrix I(θ):
$$\mathrm{var}(\hat{\theta}_i) \geq \left[\mathbf{I}^{-1}(\boldsymbol{\theta})\right]_{ii}$$
 with the Fisher information matrix defined by:
$$[\mathbf{I}(\boldsymbol{\theta})]_{ij} = -E\!\left[\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial \theta_i\,\partial \theta_j}\right] \quad \text{for } i = 1,\ldots,p \text{ and } j = 1,\ldots,p$$
 The Fisher information matrix is always symmetric


 In practice I(θ) is assumed to be positive definite and hence
invertible

Page 16
CRLB for Vector Parameters

 More formally: the covariance matrix $\mathbf{C}_{\hat{\boldsymbol\theta}}$ of any unbiased
estimator $\hat{\boldsymbol\theta}$ satisfies: 1)
$$\mathbf{C}_{\hat{\boldsymbol\theta}} - \mathbf{I}^{-1}(\boldsymbol\theta) \geq \mathbf{0} \quad \text{(positive semidefinite)}$$
 Furthermore, an unbiased estimator $\hat{\boldsymbol\theta} = \mathbf{g}(\mathbf{x})$ may be found that
attains the bound if and only if
$$\frac{\partial \ln p(\mathbf{x};\boldsymbol\theta)}{\partial \boldsymbol\theta} = \mathbf{I}(\boldsymbol\theta)\big(\mathbf{g}(\mathbf{x}) - \boldsymbol\theta\big)$$
 Again, this MVU estimator is then efficient
 Here $\mathbf{g}(\mathbf{x})$ is a p-dimensional function!
 The CRLB represents a powerful tool to find efficient estimators
for vector parameters.
1) if the pdf p(x;θ) satisfies the "regularity" condition: $E\!\left[\frac{\partial \ln p(\mathbf{x};\boldsymbol\theta)}{\partial \boldsymbol\theta}\right] = \mathbf{0}$ for all θ
Page 17
Example: Line Fitting

 Consider the problem $x[n] = A + Bn + w[n]$
 $w[n]$ is again WGN
 The parameter vector is $\boldsymbol\theta = [A,\, B]^T$
 $p(\mathbf{x};\boldsymbol\theta) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A-Bn)^2\right)$
 from which the first derivatives follow as:
$$\frac{\partial \ln p(\mathbf{x};\boldsymbol\theta)}{\partial A} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n]-A-Bn)$$
$$\frac{\partial \ln p(\mathbf{x};\boldsymbol\theta)}{\partial B} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n]-A-Bn)\,n$$

Page 18
Example: Line Fitting

 and
$$\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol\theta)}{\partial A^2} = -\frac{N}{\sigma^2}, \qquad
\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol\theta)}{\partial A\,\partial B} = -\frac{1}{\sigma^2}\sum_{n=0}^{N-1} n, \qquad
\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol\theta)}{\partial B^2} = -\frac{1}{\sigma^2}\sum_{n=0}^{N-1} n^2$$
 This leads to a Fisher information matrix of:
$$\mathbf{I}(\boldsymbol\theta) = \frac{1}{\sigma^2}\begin{bmatrix} N & \sum_{n=0}^{N-1} n \\ \sum_{n=0}^{N-1} n & \sum_{n=0}^{N-1} n^2 \end{bmatrix}
= \frac{1}{\sigma^2}\begin{bmatrix} N & \frac{N(N-1)}{2} \\ \frac{N(N-1)}{2} & \frac{N(N-1)(2N-1)}{6} \end{bmatrix}$$

Page 19
Example: Line Fitting

 Inverting the Fisher information matrix yields:
$$\mathbf{I}^{-1}(\boldsymbol\theta) = \sigma^2\begin{bmatrix} \dfrac{2(2N-1)}{N(N+1)} & -\dfrac{6}{N(N+1)} \\[6pt] -\dfrac{6}{N(N+1)} & \dfrac{12}{N(N^2-1)} \end{bmatrix}$$
 This means
$$\mathrm{var}(\hat{A}) \geq \frac{2(2N-1)\,\sigma^2}{N(N+1)}, \qquad
\mathrm{var}(\hat{B}) \geq \frac{12\,\sigma^2}{N(N^2-1)}$$
 Important observations:
 The CRLB for A is higher than for a single DC level in WGN
  General result: the CRLB always increases with the number
of estimated parameters!
Page 20
Example: Line Fitting

 The CRLB for B is lower than for A (for N > 2)
◊ The lower bound for the estimation of B decreases with order 1/N³,
as opposed to 1/N for A
◊  B is easier to estimate than A
◊ Intuitive explanation: changes of B are magnified by n
 Finding estimators for A and B:
 The derivatives
$$\frac{\partial \ln p(\mathbf{x};\boldsymbol\theta)}{\partial \boldsymbol\theta} =
\begin{bmatrix} \dfrac{\partial \ln p(\mathbf{x};\boldsymbol\theta)}{\partial A} \\[8pt] \dfrac{\partial \ln p(\mathbf{x};\boldsymbol\theta)}{\partial B} \end{bmatrix} =
\begin{bmatrix} \dfrac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n]-A-Bn) \\[8pt] \dfrac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n]-A-Bn)\,n \end{bmatrix}$$
 can be rewritten (after some manipulations) in the form of the equality condition:
$$\frac{\partial \ln p(\mathbf{x};\boldsymbol\theta)}{\partial \boldsymbol\theta} = \mathbf{I}(\boldsymbol\theta)\begin{bmatrix} \hat{A} - A \\ \hat{B} - B \end{bmatrix}$$
Page 21
Example: Line Fitting

 with
$$\hat{A} = \frac{2(2N-1)}{N(N+1)}\sum_{n=0}^{N-1} x[n] - \frac{6}{N(N+1)}\sum_{n=0}^{N-1} n\,x[n]$$
$$\hat{B} = -\frac{6}{N(N+1)}\sum_{n=0}^{N-1} x[n] + \frac{12}{N(N^2-1)}\sum_{n=0}^{N-1} n\,x[n]$$
 This means that for these estimators the CRLB is satisfied with
equality, hence they are efficient MVU estimators.
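
As a closing check (not part of the original slides; A, B, σ and N are arbitrary assumptions), the closed-form estimators above can be simulated and their empirical variances compared against the CRLB entries derived before:

```python
import numpy as np

# Minimal Monte Carlo sketch (not from the slides): the closed-form line-fit
# estimators above are unbiased and their variances match the CRLB entries.
rng = np.random.default_rng(4)
A, B, sigma, N, trials = 1.0, 0.5, 1.0, 20, 100_000   # assumed example values
n = np.arange(N)

x = A + B * n + sigma * rng.standard_normal((trials, N))   # line in WGN

A_hat = (2 * (2 * N - 1) / (N * (N + 1))) * x.sum(axis=1) \
        - (6 / (N * (N + 1))) * (x * n).sum(axis=1)
B_hat = -(6 / (N * (N + 1))) * x.sum(axis=1) \
        + (12 / (N * (N**2 - 1))) * (x * n).sum(axis=1)

print("var(A_hat):", A_hat.var(), " CRLB:", 2 * (2 * N - 1) * sigma**2 / (N * (N + 1)))
print("var(B_hat):", B_hat.var(), " CRLB:", 12 * sigma**2 / (N * (N**2 - 1)))
```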

Page 22
