
Introduction to the Wiener Filter

R01943128
Darcy Tsai
E-mail: raulshepherd@access.ee.ntu.edu.tw
Graduate Institute of Electronics Engineering
National Taiwan University, Taipei, Taiwan, ROC

Abstract
Wiener theory, formulated by Norbert Wiener in the 1940s, forms the foundation of
data-dependent linear least square error filters. Wiener filters play a central role in a
wide range of applications such as linear prediction, echo cancellation, signal
restoration, channel equalization and system identification. The coefficients of a
Wiener filter are calculated to minimize the average squared distance between the
filter output and a desired signal. In its basic form, the Wiener theory assumes that the
signals are stationary processes. However, if the filter coefficients are periodically
recalculated for every block of N signal samples then the filter adapts itself to the
average characteristics of the signals within the blocks and becomes block-adaptive. A
block-adaptive (or segment adaptive) filter can be used for signals such as speech and
image that may be considered almost stationary over a relatively small block of
samples. In this chapter, we study Wiener filter theory, and consider alternative
methods of formulation of the Wiener filter problem. We consider the application of
Wiener filters in channel equalization, time-delay estimation and additive noise
reduction. A case study of the frequency response of a Wiener filter, for additive noise
reduction, provides useful insight into the operation of the filter. We also deal with
some implementation issues of Wiener filters.

Keywords
Wiener filter, Optimum linear filters, Minimum mean squared error (MMSE)

Contents

1  Introduction
2  Linear Optimum Filtering
   2.1  Problem Statement
   2.2  Principle of Orthogonality
   2.3  Minimum Mean Squared Error
3  Wiener-Hopf Filters
   3.1  Wiener-Hopf Equations
   3.2  Matrix Formulation of the Wiener-Hopf Equations
   3.3  Error Performance Surface
   3.4  Numerical Example
4  Some Applications of Wiener Filters
   4.1  Wiener Filter for Additive Noise Reduction
   4.2  Wiener Channel Equalizer
   4.3  Time-Alignment of Signals in Multichannel Systems
5  Implementation of Wiener Filters
6  Further Comments

1 Introduction
Wiener filters are a class of optimum linear filters which involve linear
estimation of a desired signal sequence from another related sequence. In the
statistical approach to the solution of the linear filtering problem, we assume the
availability of certain statistical parameters (e.g. mean and correlation functions) of
the useful signal and unwanted additive noise. The problem is to design a linear filter
with the noisy data as input and the requirement of minimizing the effect of the noise
at the filter output according to some statistical criterion. A useful approach to this
filter-optimization problem is to minimize the mean-square value of the error signal
that is defined as the difference between some desired response and the actual filter
output. For stationary inputs, the resulting solution is commonly known as the Wiener
filter. Its main purpose is to reduce the amount of noise present in a signal by
comparison with an estimation of the desired noiseless signal.

2 Linear Optimum Filtering

2.1 Problem Statement

Consider the block diagram of Fig. 1, built around a linear discrete-time filter.
The filter input consists of a time series x(0), x(1), x(2), ..., and the filter itself is
characterized by the impulse response w0, w1, w2, .... At some discrete time n, the
filter produces an output denoted by y(n). This output is used to provide an estimate
of a desired response denoted by d(n). With the filter input and the desired response
representing single realizations of respective stochastic processes, the estimation is
accompanied by an error with statistical characteristics of its own. In particular, we
wish to make the estimation error e(n) as small as possible in some statistical sense.
Two restrictions have so far been placed on the filter:

1. The filter is linear, which makes the mathematical analysis easy to handle.
2. The filter operates in discrete time, which makes it possible for the filter to
be implemented using digital hardware/software.

Fig. 1 Block diagram representation of the statistical filtering problem

The final details of the filter specification, however, depend on two other choices
that have to be made:

1. Whether the impulse response of the filter has finite or infinite duration.
2. The type of statistical criterion used for the optimization.

The choice of a finite-duration impulse response (FIR) or an infinite-duration


impulse response (IIR) for the filter is dictated by practical considerations. The
choice of a statistical criterion for optimizing the filter design is influenced by
mathematical tractability. These two issues are considered in turn.

The theory could be developed for IIR filters, which include FIR filters as a special case.
However, for much of the material presented in this tutorial, we will confine our
attention to the use of FIR filters. We do so for the following reason. An FIR filter
is inherently stable, because its structure involves the use of forward paths only. In
other words, the only mechanism for input-output interaction in the filter is via
forward paths from the filter input to its output. Indeed, it is this form of signal
transmission through the filter that limits its impulse response to a finite duration.
On the other hand, an IIR filter involves both feedforward and feedback. The
presence of feedback means that portions of the filter output and possibly other
internal variables in the filter are fed back to the input. Consequently, unless it is
properly designed, feedback in the filter can indeed make it unstable, with the
result that the filter oscillates; this kind of operation is clearly unacceptable when
the requirement is that of filtering, for which stability is a must. By itself, the
stability problem in IIR filters is manageable in both theoretical and practical
terms. However, when the filter is required to be adaptive, bringing with it
stability problems of its own, the inclusion of adaptivity combined with the
feedback that is inherently present in an IIR filter makes a difficult problem that
much more difficult to handle. It is for this reason that, in the majority
of applications requiring adaptivity, an FIR filter is preferred over an IIR
filter, even though the latter is less demanding in computational
requirements.

Turning next to the issue of what criterion to choose for statistical
optimization, there are indeed several criteria that suggest themselves.
Specifically, we may consider optimizing the filter design by minimizing a cost
function, or index of performance, selected from the following short list of
possibilities:

1. Mean-square value of the estimation error
2. Expectation of the absolute value of the estimation error
3. Expectation of third or higher powers of the absolute value of the
estimation error

Option 1 has a clear advantage over the other two, because it leads to tractable
mathematics. In particular, the choice of the mean-square error criterion results in
a second order dependence for the cost function on the unknown coefficients in
the impulse response of the filter. Moreover, the cost function has a distinct
minimum that uniquely defines the optimum statistical design of the filter.

We may now summarize the essence of the filtering problem by making the
following statement:

Design a linear discrete-time filter whose output y(n) provides an estimate of a
desired response d(n), given a set of input samples x(0), x(1), x(2), ..., such
that the mean-square value of the estimation error e(n), defined as the difference
between the desired response d(n) and the actual response y(n), is minimized.

We may develop the mathematical solution to this statistical optimization
problem by following two entirely different approaches that are complementary.
One approach leads to the development of an important theorem commonly
known as the principle of orthogonality. The other approach highlights the
error-performance surface that describes the second-order dependence of the cost
function on the filter coefficients. We will proceed by deriving the principle of
orthogonality first, because the derivation is relatively simple and because the
principle of orthogonality is highly insightful.

2.2 Principle of Orthogonality

Consider again the statistical filtering problem described in Fig. 1. We have a
set of input samples {x(n)} and desired samples {d(n)} coming from a jointly wide-sense
stationary (WSS) process with zero mean. Suppose now we want to find a linear
estimate of d(n) based on the L most recent samples of x(n), i.e.,
y(n) = w^T X(n) = \sum_{k=0}^{L-1} w_k x(n-k), \qquad w, X(n) \in \mathbb{R}^L, \quad n = 0, 1, 2, \ldots   (1)

The introduction of a particular criterion to quantify how well d(n) is estimated


by y(n) would influence how the coefficients wk will be computed. We propose
to use the Mean Squared Error (MSE), which is defined by

J_{MSE}(w) = E[|e(n)|^2] = E[|d(n) - y(n)|^2]   (2)

where E[\cdot] is the expectation operator and e(n) is the estimation error. Then, the
estimation problem can be seen as finding the vector w that minimizes the cost
function JMSE(w). The solution to this problem is sometimes called the stochastic
least squares solution. If we choose the MSE cost function (2), the optimal
solution to the linear estimation problem can be presented as:

w_{opt} = \arg\min_{w \in \mathbb{R}^L} J_{MSE}(w)   (3)

Replacing (1) in (2), the latter can be expanded as

J_{MSE}(w) = E[|d(n)|^2 - 2 d(n) X^T(n) w + w^T X(n) X^T(n) w]   (4)

As this is a quadratic form, the optimal solution will be at the point where the
cost function has zero gradient, i.e.,

\nabla_w J_{MSE}(w) = \frac{\partial J_{MSE}}{\partial w} = 0_{L \times 1}   (5)

or in other words, the partial derivative of JMSE with respect to each coefficient
wk should be zero. Under this set of conditions the filter is said to be optimum in
the mean-squared-error sense. Using (1) in (2), we can compute the gradient as:

\frac{\partial J_{MSE}}{\partial w} = 2 E\left[ e(n) \frac{\partial e(n)}{\partial w} \right] = -2 E[e(n) X(n)]   (6)

Then, at the minimum, the condition that should hold is:

E[e_{min}(n) X(n)] = 0_{L \times 1}   (7)

or equivalently

E[e_{min}(n) x(n-k)] = 0, \qquad k = 0, 1, \ldots, L-1   (8)

This is called the principle of orthogonality, and it implies that the optimal
condition is achieved if and only if the error e(n) is decorrelated from the
samples x(n-k), k = 0, 1, ..., L-1. Actually, the error will also be decorrelated
from the estimate y(n), since:

E[e_{min}(n) y_{opt}(n)] = E[e_{min}(n) w_{opt}^T X(n)] = w_{opt}^T E[e_{min}(n) X(n)] = 0   (9)

Fig. 2 Geometric interpretation of the principle of orthogonality for the case L = 2

We may thus state the corollary to the principle of orthogonality as follows:


When the filter operates in its optimum condition, the estimate of the desired
response defined by the filter output yopt(n), and the corresponding estimation
error, emin(n), are orthogonal to each other.
Equation (9) offers an interesting geometric interpretation of the conditions that
exist at the output of the optimum filter, as illustrated in Fig. 2 for the case L = 2. In
this figure, the desired response, the filter output, and the corresponding
estimation error are represented by vectors labeled d, yopt, and emin, respectively.
We see that for the optimum filter the vector representing the estimation error is
normal (i.e. perpendicular) to the vector representing the filter output. It should,
however, be emphasized that the situation depicted in Fig. 2 is merely an analogy,
where random variables and expectations are replaced with vectors and vector
inner products, respectively. For obvious reasons, the geometry depicted in
this figure may be viewed as a "statistician's Pythagorean theorem".
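To make the principle of orthogonality concrete, the following minimal Python sketch (an illustration only; the signal model, filter length and record length are assumptions, not taken from the text) estimates the optimum coefficients from sample statistics and checks that the resulting error is uncorrelated with the input samples, as in (8).

```python
import numpy as np

rng = np.random.default_rng(0)
L, N = 2, 200_000                       # filter length and record length (assumed)

# Illustrative jointly WSS, zero-mean signals: d(n) depends linearly on x(n) plus noise
x = rng.standard_normal(N)
d = 0.7 * x + 0.3 * np.roll(x, 1) + 0.1 * rng.standard_normal(N)

# Input vectors X(n) = [x(n), x(n-1), ..., x(n-L+1)]^T, discarding the first L samples
X = np.column_stack([np.roll(x, k) for k in range(L)])[L:]
d = d[L:]

# Sample estimates of R_X = E[X X^T] and r_Xd = E[X d], then the optimum filter from (21)
R = X.T @ X / len(d)
r = X.T @ d / len(d)
w_opt = np.linalg.solve(R, r)

# Principle of orthogonality (8): E[e_min(n) x(n-k)] should be (numerically) zero
e_min = d - X @ w_opt
print(X.T @ e_min / len(d))             # both entries come out close to 0
```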

2.3 Minimum Mean-Squared Error


When the linear discrete-time filter in Fig. 1 operates in its optimum condition,
the estimation error takes on the following special form:

e_{min}(n) = d(n) - y_{opt}(n)   (10)

Rearranging the terms, we have:

d(n) = y_{opt}(n) + e_{min}(n)   (11)

Let J_{MMSE} denote the minimum mean-squared error (MMSE), defined by:

J_{MMSE} = E[|e_{min}(n)|^2]   (12)

Hence, evaluating the mean-square values of both sides of (11), and applying to it
the corollary to the principle of orthogonality described by (9), we get:

\sigma_d^2 = \sigma_{y_{opt}}^2 + J_{MMSE}   (13)

where \sigma_d^2 is the variance of the desired response, and \sigma_{y_{opt}}^2 is the variance of
the estimate y_{opt}; both of these random variables are assumed to be of zero mean.
Solving (13) for the MMSE, we get:

J_{MMSE} = \sigma_d^2 - \sigma_{y_{opt}}^2   (14)

This relation shows that, for the optimum filter, the MMSE equals the difference
between the variance of the desired response and the variance of the estimate that
the filter produces at its output.

It is convenient to normalize the expression in (14) in such a way that the
minimum value of the mean-squared error always lies between zero and one. We
may do this by dividing both sides of (14) by \sigma_d^2, obtaining

\frac{J_{MMSE}}{\sigma_d^2} = 1 - \frac{\sigma_{y_{opt}}^2}{\sigma_d^2}   (15)

Clearly, this is possible because \sigma_d^2 is never zero, except in the trivial case of a
desired response d(n) that is zero for all n. Let

\epsilon = \frac{J_{MMSE}}{\sigma_d^2}   (16)

The quantity \epsilon is called the normalized mean-squared error, in terms of which
we may rewrite (15) in the form

\epsilon = 1 - \frac{\sigma_{y_{opt}}^2}{\sigma_d^2}

We note that the ratio J_{MMSE}/\sigma_d^2 can never be negative, and the ratio
\sigma_{y_{opt}}^2/\sigma_d^2 is always positive. We therefore have

0 \le \epsilon \le 1   (17)

If \epsilon is zero, the optimum filter operates perfectly, in the sense that there is
complete agreement between the estimate yopt(n) at the filter output and the
desired response d(n). On the other hand, if \epsilon is unity, there is no agreement
whatsoever between these two quantities; this corresponds to the worst possible
situation.

3 Wiener-Hopf Filters

3.1 Wiener-Hopf Equations
Consider a signal x(n) as the input to a finite impulse response (FIR) filter
of length L, as shown in Fig. 3, with coefficients w_T = [w_{T,0}, w_{T,1}, \ldots, w_{T,L-1}]^T. This filtering
operation generates an output:

y(n) = w_T^T X(n)   (18)

with X(n) = [x(n), x(n-1), \ldots, x(n-L+1)]^T. As the output of the filter is observed,
it can be corrupted by an additive measurement noise v(n), leading to a linear
regression model for the observed output

d(n) = w_T^T X(n) + v(n)   (19)

Fig. 3 FIR filter structure

It should be noticed that this linear regression model can also be used even if the
input-output relation of the given data pairs [x(n), d(n)] is nonlinear, with w_T
being a linear approximation to the actual relation between them. In that case,
v(n) would contain a component associated with the additive noise perturbations,
but also another one representing, for example, modeling errors.

In the context of (19), we can look at w_T as the quantity to be estimated by a
linear filter w \in \mathbb{R}^L, with (1) giving the output of this filter. This output can still
be seen as an estimate of the reference signal d(n) or the system's output y(n).
Therefore, the problem of optimal filtering is analogous to the one of linear
estimation. When J_{MSE} is the cost function to be optimized, the orthogonality
principle (7) holds, which can be put as:

E[e_{min}(n) X(n)] = E[(d(n) - w_{opt}^T X(n)) X(n)] = 0_{L \times 1}   (20)

From (20) we can conclude that, given the signals x(n) and d(n), we can always
assume that d(n) was generated by the linear regression model (19). To do this,
the system w_T would be equal to the optimal filter w_{opt}, while v(n) would be
associated with the residual error e_{min}(n), which is uncorrelated with the input
x(n).

It should be noticed that (8) is not just a condition for the cost function to
reach its minimum, but also a means of testing whether a linear filter is operating
in the optimal condition. Here, the principle of orthogonality illustrated in Fig. 2
can be interpreted as follows: at time n the input vector X(n) = [x(n), x(n-1)]^T
passes through the optimal filter w_{opt} = [w_{opt,0}, w_{opt,1}]^T to generate the output
y_{opt}(n). Given d(n), y_{opt}(n) is the only element in the space spanned by x(n) and x(n-1)
that leads to an error e(n) that is orthogonal to x(n), x(n-1), and y_{opt}(n).
Now we focus on the computation of the optimal solution. From (20), we
have:

E[X(n) X^T(n)]\, w_{opt} = E[X(n) d(n)]   (21)

We introduce the following definitions

R_X = E[X(n) X^T(n)]   and   r_{Xd} = E[X(n) d(n)]   (22)

for the input autocorrelation matrix and the cross-correlation vector, respectively.

The two expectations in (21) may be interpreted as follows:

1. The (k, i) entry of the expectation E[X(n) X^T(n)], namely E[x(n-k) x(n-i)] for
k, i = 0, 1, \ldots, L-1, is equal to the autocorrelation function of the filter input for
a lag of i-k. We may thus express this expectation as

r(i-k) = E[x(n-k) x(n-i)]   (23)

2. The k-th entry of the expectation E[X(n) d(n)], namely E[x(n-k) d(n)] for
k = 0, 1, \ldots, L-1, is equal to the cross-correlation between the filter input x(n-k)
and the desired response d(n) for a lag of -k. We may thus express this second
expectation as

p(-k) = E[x(n-k) d(n)]   (24)

Accordingly, using the definitions (23) and (24) in (21), we obtain a system of L
equations as the necessary and sufficient condition for the optimality of the filter:

\sum_{i=0}^{L-1} w_{i,opt}\, r(i-k) = p(-k), \qquad k = 0, 1, \ldots, L-1   (25)

The system of equations (25) defines the optimum filter coefficients, in the most
general setting, in terms of two correlation functions: the autocorrelation function
of the filter input, and the cross-correlation between the filter input and the
desired response. These equations are called the Wiener-Hopf equations.

3.2 Matrix Formulation of the Wiener-Hopf Equations


Let R_X denote the L-by-L correlation matrix of the tap inputs x(n), x(n-1), \ldots,
x(n-L+1) in the FIR filter of Fig. 3. In expanded form, we have

R_X = \begin{bmatrix} r(0) & r(1) & \cdots & r(L-1) \\ r(1) & r(0) & \cdots & r(L-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(L-1) & r(L-2) & \cdots & r(0) \end{bmatrix}

Correspondingly, let r_{Xd} denote the L-by-1 cross-correlation vector between the
tap inputs of the filter and the desired response d(n). In expanded form, we have

r_{Xd} = [p(0), p(-1), \ldots, p(1-L)]^T

Note that, as the joint process is WSS, the matrix R_X is symmetric, positive
semi-definite and Toeplitz. Using these definitions, equation (25) can be put as:

R_X w_{opt} = r_{Xd}   (26)


This is the compact matrix form of the set of L equations known as the Wiener-Hopf
equations, and it provides a way of computing the optimal filter (in the MSE sense)
based on some statistical properties of the input and reference processes. Under
the assumption of positive definiteness of R_X (so that it is nonsingular),
the solution to (26) is:

w_{opt} = R_X^{-1} r_{Xd}   (27)

which is known as the Wiener filter. An alternative way to find it is the following.
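Before turning to that alternative derivation, the sketch below shows how (26)-(27) translate into code when R_X and r_{Xd} are replaced by sample estimates; the test signals, data length and filter length are illustrative assumptions, not taken from the text.

```python
import numpy as np

def wiener_fir(x, d, L):
    """Estimate the length-L Wiener filter from data records x(n), d(n) via (26)-(27)."""
    N = len(x)
    # Sample autocorrelations r(k) and cross-correlations p(-k) = E[x(n-k) d(n)]
    r = np.array([np.dot(x[k:], x[:N - k]) / N for k in range(L)])
    p = np.array([np.dot(x[:N - k], d[k:]) / N for k in range(L)])
    # Symmetric Toeplitz correlation matrix R_X and the Wiener solution (27)
    R = np.array([[r[abs(i - j)] for j in range(L)] for i in range(L)])
    return np.linalg.solve(R, p)

# Illustrative check: d(n) generated by a known 2-tap system plus weak noise
rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)
d = 0.9 * x - 0.4 * np.concatenate(([0.0], x[:-1])) + 0.05 * rng.standard_normal(100_000)
print(wiener_fir(x, d, L=2))    # close to [0.9, -0.4]
```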

3.3 Error Performance Surface

Using the definitions (22) in (4) results in

J_{MSE}(w) = E[|d(n)|^2] - 2 r_{Xd}^T w + w^T R_X w   (28)

In addition, it can be easily shown that the following factorization holds:

w^T R_X w - 2 r_{Xd}^T w = (R_X w - r_{Xd})^T R_X^{-1} (R_X w - r_{Xd}) - r_{Xd}^T R_X^{-1} r_{Xd}   (29)

Replacing (29) in (28) leads to

J_{MSE}(w) = E[|d(n)|^2] - r_{Xd}^T R_X^{-1} r_{Xd} + (w - R_X^{-1} r_{Xd})^T R_X (w - R_X^{-1} r_{Xd})   (30)

Using the fact that R_X is positive definite (and therefore, so is its inverse), it turns
out that the cost function reaches its minimum when the filter takes the form of
(27), i.e., the Wiener filter. The minimum MSE value (MMSE) on the surface (30)
is:

J_{MMSE} = J_{MSE}(w_{opt}) = E[|d(n)|^2] - r_{Xd}^T R_X^{-1} r_{Xd} = E[|d(n)|^2] - E[|y_{opt}(n)|^2]   (31)

We could have also arrived at this result by noticing that d(n) = y_{opt}(n) + e_{min}(n)
and using the orthogonality principle as in (9). Therefore, the MMSE
is given by the difference between the variance of the reference signal d(n) and
the variance of its optimal estimate y_{opt}(n).
It should be noticed that, if the signals x(n) and d(n) are orthogonal (r_{Xd} = 0),
the optimal filter will be the null vector and J_{MMSE} = E[|d(n)|^2]. This is
reasonable, since nothing can be done with the filter w if the input signal carries
no information about the reference signal (as they are orthogonal). Actually, (28)
shows that in this case, if any of the filter coefficients is nonzero, the MSE would
be increased by the term w^T R_X w, so it would not be optimal. On the other hand,
if the reference signal is generated by passing the input signal through a system
w_T as in (19), with the noise v(n) being uncorrelated with the input x(n), the
optimal filter will be:

w_{opt} = R_X^{-1} r_{Xd} = R_X^{-1} E[X(n)(X^T(n) w_T + v(n))] = w_T   (32)

This means that the Wiener solution will be able to identify the system w_T, with a
resulting error given by v(n). Therefore, in this case J_{MMSE} = E[|v(n)|^2].

Finally, it should be noticed that the autocorrelation matrix admits the
eigen-decomposition:

R_X = Q \Lambda Q^T   (33)

with \Lambda being a diagonal matrix determined by the eigenvalues \lambda_0, \lambda_1, \ldots, \lambda_{L-1}
of R_X, and Q a (unitary) matrix that has the associated eigenvectors q_0, q_1, \ldots,
q_{L-1} as its columns.

Let us define the misalignment vector (or weight-error vector)

\tilde{w} = w - w_{opt}   (34)

and its transformed version

u = Q^T \tilde{w}   (35)

Using (27), (30), (31), (33), (34) and (35),

J_{MSE}(w) = J_{MMSE} + u^T \Lambda u   (36)

This is called the canonical form of the quadratic form J_{MSE}(w), and it contains no
cross-product terms. Since the eigenvalues are non-negative, it is clear that the
surface describes an elliptic hyperparaboloid, with the eigenvectors being the
principal axes of the hyperellipses of constant MSE value.
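The canonical form (36) is easy to verify numerically. The sketch below uses an arbitrary (assumed) R_X, r_{Xd}, reference variance and trial filter w, and checks that the cost (28) equals J_{MMSE} plus u^T \Lambda u.

```python
import numpy as np

# Assumed (illustrative) second-order statistics and an arbitrary trial filter w
R = np.array([[2.0, 0.6],
              [0.6, 1.5]])
r = np.array([1.0, 0.3])
sigma_d2 = 2.0
w = np.array([0.2, -0.1])

w_opt = np.linalg.solve(R, r)                    # Wiener filter, (27)
J_mmse = sigma_d2 - r @ w_opt                    # minimum MSE, (31)
J_w = sigma_d2 - 2 * r @ w + w @ R @ w           # cost surface, (28)

# Canonical form (36): J(w) = J_MMSE + u^T Lambda u, with u = Q^T (w - w_opt)
lam, Q = np.linalg.eigh(R)
u = Q.T @ (w - w_opt)
print(J_w, J_mmse + u @ (lam * u))               # the two values agree
```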

3.4 Numerical Example
To illustrate the filtering theory developed above, we consider the example
depicted in Fig. 4. The desired response d(n) is modeled as an AR
(autoregressive) process of order 1; that is, it may be produced by applying a
white-noise process v1(n) of zero mean and variance \sigma_1^2 to the input of an
all-pole filter of order 1, whose transfer function equals:

H_1(z) = \frac{1}{1 + 0.8458 z^{-1}}   (37)

The process d(n) is applied to a communication channel modeled by the all-pole
transfer function

H_2(z) = \frac{1}{1 - 0.9458 z^{-1}}   (38)

The channel output x(n) is corrupted by an additive white-noise process v2(n) of
zero mean and variance \sigma_2^2, so a sample of the received signal u(n) equals

u(n) = x(n) + v_2(n)   (39)

The white-noise processes v1(n) and v2(n) are uncorrelated. It is also assumed that
d(n) and u(n), and therefore v1(n) and v2(n), are all real valued.

Fig. 4 (a) Autoregressive model of the desired response d(n); (b) model of the noisy
communication channel

The requirement is to specify a Wiener filter consisting of a transversal filter
with two taps, which operates on the received signal u(n) so as to produce an
estimate of the desired response that is optimum in the mean-square sense.

Statistical Characterization of the Desired Response and the Received Signal

We begin the analysis by considering the difference equations that characterize
the various processes described by the models of Fig. 4. First, the generation of the
desired response d(n) is governed by the first-order difference equation

d(n) + a_1 d(n-1) = v_1(n)   (40)

where a_1 = 0.8458. The variance of the process d(n) equals

\sigma_d^2 = \frac{\sigma_1^2}{1 - a_1^2} = 0.9486   (41)

The process d(n) acts as input to the channel. Hence, from Fig. 4, we find that the
channel output x(n) is related to the channel input d(n) by the first-order
difference equation

x(n) + b_1 x(n-1) = d(n)   (42)

where b_1 = -0.9458. We also observe from the two parts of Fig. 4 that the channel
output x(n) may be generated by applying the white-noise process v1(n) to a
second-order all-pole filter whose transfer function equals

H(z) = H_1(z) H_2(z)   (43)

Accordingly, x(n) is a second-order AR process described by the difference
equation

x(n) + a_1 x(n-1) + a_2 x(n-2) = v_1(n)   (44)

where a_1 = -0.1 and a_2 = -0.8. Note that both AR processes d(n) and x(n) are WSS.
To characterize the Wiener filter, we need to solve the Wiener-Hopf equations
(26). This set of equations requires knowledge of two quantities: (1) the
correlation matrix R pertaining to the received signal u(n), and (2) the
cross-correlation vector r_{Xd} between u(n) and the desired response d(n). In our
example, R is a 2-by-2 matrix and r_{Xd} is a 2-by-1 vector, since the FIR filter used
to implement the Wiener filter is assumed to have two taps.

The received signal u(n) consists of the channel output x(n) plus the additive
white noise v2(n). Since the processes x(n) and v2(n) are uncorrelated, it follows that
the correlation matrix R equals the correlation matrix of x(n) plus the correlation
matrix of v2(n). That is,

R = R_X + R_2   (45)

For the correlation matrix R_X, we write

R_X = \begin{bmatrix} r_x(0) & r_x(1) \\ r_x(1) & r_x(0) \end{bmatrix}

where r_x(0) and r_x(1) are the autocorrelation functions of the channel output x(n)
for lags of 0 and 1, respectively. We have:

r_x(0) = \sigma_x^2 = 1   and   r_x(1) = 0.5

Hence,

R_X = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}   (46)

Next, we observe that since v2(n) is a white-noise process of zero mean and
variance \sigma_2^2 = 0.1, the 2-by-2 correlation matrix R_2 of this process is

R_2 = \begin{bmatrix} 0.1 & 0 \\ 0 & 0.1 \end{bmatrix}   (47)

Thus, substituting (46) and (47) in (45), we find that the 2-by-2 correlation matrix of
the received signal u(n) equals

R = \begin{bmatrix} 1.1 & 0.5 \\ 0.5 & 1.1 \end{bmatrix}   (48)

For the 2-by-1 cross-correlation vector r_{Xd}, we write

r_{Xd} = \begin{bmatrix} p(0) \\ p(-1) \end{bmatrix}

where p(0) and p(-1) are the cross-correlation functions between d(n) and u(n) for
lags of 0 and -1, respectively. Since these two processes are real valued, we have

p(-k) = E[u(n-k) d(n)], \qquad k = 0, 1   (49)

Substituting (42) in (49), and recognizing that the channel output x(n) is
uncorrelated with the white-noise process v2(n), we get

p(-k) = r_x(k) + b_1 r_x(k-1), \qquad k = 0, 1   (50)

Putting b_1 = -0.9458 and using the element values of the correlation matrix R_X,
given in (46), we obtain

p(0) = r_x(0) + b_1 r_x(1) = 1 - 0.9458 \times 0.5 \approx 0.5272

p(-1) = r_x(1) + b_1 r_x(0) = 0.5 - 0.9458 \times 1 = -0.4458

Hence,

r_{Xd} = \begin{bmatrix} 0.5272 \\ -0.4458 \end{bmatrix}   (51)

Error-Performance Surface

The dependence of the mean-squared error on the 2-by-1 tap-weight vector w is
defined by (28). Hence, substituting (41), (48) and (51) into (28), we get

J(w_0, w_1) = 0.9486 - 2 [0.5272, -0.4458] \begin{bmatrix} w_0 \\ w_1 \end{bmatrix} + [w_0, w_1] \begin{bmatrix} 1.1 & 0.5 \\ 0.5 & 1.1 \end{bmatrix} \begin{bmatrix} w_0 \\ w_1 \end{bmatrix}
          = 0.9486 - 1.0544 w_0 + 0.8916 w_1 + w_0 w_1 + 1.1 (w_0^2 + w_1^2)

Using a three-dimensional computer plot, the mean-squared error J(w_0, w_1) is
plotted versus the tap weights w_0 and w_1. The result is shown in Fig. 5.

Fig. 6 shows contour plots of the tap weight w_1 versus w_0 for varying values of
the mean-squared error J. We see that the locus of w_1 versus w_0 for a fixed J is
in the form of an ellipse. The elliptical locus shrinks in size as the mean-squared
error J approaches the minimum value J_min. For J = J_min, the locus reduces to a
point with coordinates w_{o0} and w_{o1}.

Wiener Filter
The 2-by-1 optimum tap-weight vector wo of the Wiener filter is defined by
(27). In particular, it consists of the inverse matrix R-1 multiplied by the
cross-correlation vector rXd . Inverting the correlation matrix R of (48), we get


Fig. 5 Error performance surface of the two-tap FIR filter described in the numerical
example

Fig. 6 Contour plots of the error-performance surface depicted in Fig.5



R^{-1} = \frac{1}{r^2(0) - r^2(1)} \begin{bmatrix} r(0) & -r(1) \\ -r(1) & r(0) \end{bmatrix} = \begin{bmatrix} 1.1456 & -0.5208 \\ -0.5208 & 1.1456 \end{bmatrix}   (52)

Hence, substituting (51) and (52) into (27), we get the desired result:

w_o = \begin{bmatrix} 1.1456 & -0.5208 \\ -0.5208 & 1.1456 \end{bmatrix} \begin{bmatrix} 0.5272 \\ -0.4458 \end{bmatrix} = \begin{bmatrix} 0.8360 \\ -0.7853 \end{bmatrix}   (53)

Minimum Mean-Squared Error

To evaluate the minimum value of the mean-squared error, J_min, which results
from the use of the optimum tap-weight vector w_o, we use (31). Hence,
substituting (41), (51) and (53) into (31), we get

J_{MMSE} = 0.9486 - [0.5272, -0.4458] \begin{bmatrix} 0.8360 \\ -0.7853 \end{bmatrix} = 0.1579   (54)

The point represented jointly by the optimum tap-weight vector wo of (53) and the
minimum mean-squared error of (54) defines the bottom of the error-performance
surface in Fig.5, or the center of the contour plots in Fig.6.
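For readers who wish to reproduce the numbers above, a short NumPy check of (48), (51), (53) and (54) is sketched below; the values are taken directly from the example, and only the code itself is new.

```python
import numpy as np

R = np.array([[1.1, 0.5],
              [0.5, 1.1]])            # correlation matrix of u(n), eq. (48)
r_Xd = np.array([0.5272, -0.4458])    # cross-correlation vector, eq. (51)
sigma_d2 = 0.9486                     # variance of d(n), eq. (41)

w_o = np.linalg.solve(R, r_Xd)        # Wiener filter, eq. (27)/(53)
J_min = sigma_d2 - r_Xd @ w_o         # minimum MSE, eq. (31)/(54)

print(w_o)    # approximately [ 0.836, -0.785]
print(J_min)  # approximately 0.158
```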

Canonical Error-Performance Surface

The characteristic equation of the 2-by-2 correlation matrix R of (48) is

(1.1 - \lambda)^2 - (0.5)^2 = 0

The two eigenvalues of the correlation matrix R are therefore

\lambda_1 = 1.6, \qquad \lambda_2 = 0.6

The canonical error-performance surface is therefore defined by

J_{MSE}(u_1, u_2) = J_{MMSE} + 1.6 u_1^2 + 0.6 u_2^2   (55)

The locus of u_2 versus u_1, as defined in (55), traces an ellipse for a fixed value of
J_{MSE} - J_{MMSE}. In particular, the ellipse has a minor axis of [(J_{MSE} - J_{MMSE})/\lambda_1]^{1/2}
along the u_1 coordinate and a major axis of [(J_{MSE} - J_{MMSE})/\lambda_2]^{1/2} along the u_2
coordinate, since \lambda_1 > \lambda_2.

4 Some Applications of Wiener Filters

In this section, we consider some applications of the Wiener filter in reducing
broadband additive noise, in time-alignment of signals in multichannel or
multi-sensor systems, and in channel equalization.

4.1 Wiener Filter for Additive Noise Reduction

Consider a signal x(m) observed in broadband additive noise n(m), modeled as:

y(m) = x(m) + n(m)   (56)

Assuming that the signal and the noise are uncorrelated, it follows that the
autocorrelation matrix of the noisy signal is the sum of the autocorrelation matrix
of the signal x(m) and that of the noise n(m):

R_{yy} = R_{xx} + R_{nn}   (57)

and we can also write

r_{xy} = r_{xx}   (58)

where R_{yy}, R_{xx} and R_{nn} are the autocorrelation matrices of the noisy signal, the
noise-free signal and the noise, respectively, and r_{xy} is the cross-correlation vector
of the noisy signal and the noise-free signal. Substitution of (57) and (58) in the
Wiener filter equation (27) yields

w = (R_{xx} + R_{nn})^{-1} r_{xx}   (59)

Equation (59) is the optimal linear filter for the removal of additive noise. In the
following, a study of the frequency response of the Wiener filter provides useful
insight into the operation of the Wiener filter. In the frequency domain, the noisy
signal Y(f) is given by

Y(f) = X(f) + N(f)   (60)

where X(f) and N(f) are the signal and noise spectra. For a signal observed in
additive random noise, the frequency-domain Wiener filter is obtained as

W(f) = \frac{P_{XX}(f)}{P_{XX}(f) + P_{NN}(f)}   (61)

where P_{XX}(f) and P_{NN}(f) are the signal and noise power spectra. Dividing the
numerator and the denominator of Equation (61) by the noise power spectrum
P_{NN}(f) and substituting the variable SNR(f) = P_{XX}(f)/P_{NN}(f) yields

W(f) = \frac{SNR(f)}{SNR(f) + 1}   (62)

where SNR(f) is a signal-to-noise ratio measure. Note that the variable SNR(f) is
expressed in terms of the power-spectral ratio, and not in the more usual terms of
log power ratio. Therefore SNR(f) = 0 corresponds to -\infty dB.

From Fig. 7, the following interpretation of the Wiener filter frequency response
W(f) in terms of the signal-to-noise ratio can be deduced. For additive noise, the
Wiener filter frequency response is a real positive number in the range 0 \le W(f) \le 1.
Now consider the two limiting cases of (a) a noise-free signal, SNR(f) = \infty,
and (b) an extremely noisy signal, SNR(f) = 0. At very high SNR, W(f) \to 1, and the
filter applies little or no attenuation to the noise-free frequency component. At
the other extreme, when SNR(f) = 0, W(f) = 0. Therefore, for additive noise, the
Wiener filter attenuates each frequency component in proportion to an estimate
of the signal-to-noise ratio. Fig. 7 shows the variation of the Wiener filter
response W(f) with the signal-to-noise ratio SNR(f).

Fig. 7 Variation of the gain of Wiener filter frequency response with SNR

An alternative illustration of the variation of the Wiener filter frequency
response with SNR(f) is shown in Fig. 8. It illustrates the similarity between the
Wiener filter frequency response and the signal spectrum for the case of an
additive white noise disturbance. Note that at a spectral peak of the signal
spectrum, where the SNR(f) is relatively high, the Wiener filter frequency
response is also high, and the filter applies little attenuation. At a signal trough,
the signal-to-noise ratio is low, and so is the Wiener filter response. Hence, for
additive white noise, the Wiener filter response broadly follows the signal
spectrum.


Fig. 8 Illustration of the variation of the Wiener filter frequency response with the signal
spectrum for additive white noise. The Wiener filter response broadly follows the signal
spectrum.
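A minimal sketch of the noise-reduction filter of Equations (61)-(62) is given below. It assumes the noise power spectrum is already known from a noise-only period; the frame length, spectral floor and test signal are illustrative assumptions, not taken from the text.

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, floor=1e-3):
    """Frequency-domain Wiener gain, eq. (61)/(62): W(f) = SNR(f) / (SNR(f) + 1)."""
    signal_psd = np.maximum(noisy_psd - noise_psd, 0.0)    # estimate P_XX by subtracting the noise power
    snr = signal_psd / np.maximum(noise_psd, 1e-12)
    return np.maximum(snr / (snr + 1.0), floor)             # 0 <= W(f) <= 1, with a small spectral floor

# Illustrative use on one frame of a noisy signal y(m) = x(m) + n(m)
rng = np.random.default_rng(0)
N = 512
m = np.arange(N)
x = np.sin(2 * np.pi * 0.05 * m)                 # assumed clean tone
n = 0.5 * rng.standard_normal(N)                 # assumed white noise
y = x + n

noise_psd = np.full(N, 0.5 ** 2)                 # noise PSD assumed known from a noise-only period
Y = np.fft.fft(y)
W = wiener_gain(np.abs(Y) ** 2 / N, noise_psd)   # periodogram of the noisy frame
x_hat = np.real(np.fft.ifft(W * Y))              # filtered (noise-reduced) frame
```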

4.2 Wiener Channel Equalizer

Communication channel distortions may be modelled by a combination of a
linear filter and an additive random noise source, as shown in Fig. 9. The
input/output signals of a linear time-invariant channel can be modelled as

y(m) = \sum_{k=0}^{P-1} h_k x(m-k) + n(m)   (63)

where x(m) and y(m) are the transmitted and received signals, [hk] is the impulse
response of a linear filter model of the channel, and n(m) models the channel
noise. In the frequency domain, (63) becomes

Y(f) = X(f) H(f) + N(f)   (64)

where X(f), Y(f), H(f) and N(f) are the signal, noisy signal, channel and noise
spectra, respectively. To remove the channel distortions, the receiver is followed
by an equalizer. The equalizer input is the distorted channel output, and the
desired signal is the channel input. It is easy to show that the Wiener equalizer in
the frequency domain is given by

W(f) = \frac{P_{XX}(f) H^*(f)}{P_{XX}(f) |H(f)|^2 + P_{NN}(f)}   (65)

where it is assumed that the channel noise and the signal are uncorrelated. In the
absence of channel noise, P_{NN}(f) = 0, and the Wiener filter is simply the inverse
of the channel filter model, W(f) = H^{-1}(f).

Fig. 9 Illustration of a channel model followed by an equalizer
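The Wiener equalizer of Equation (65) can be evaluated directly on a frequency grid, as sketched below; the channel impulse response, signal and noise power spectra are illustrative assumptions, not taken from the text.

```python
import numpy as np

def wiener_equalizer(H, P_xx, P_nn):
    """Frequency-domain Wiener equalizer, eq. (65)."""
    return P_xx * np.conj(H) / (P_xx * np.abs(H) ** 2 + P_nn)

# Illustrative example: a 3-tap channel h = [1, 0.5, 0.2], flat signal spectrum, white noise
N = 256
H = np.fft.fft([1.0, 0.5, 0.2], N)     # channel frequency response on an N-point grid
P_xx = np.ones(N)                      # assumed signal power spectrum
P_nn = np.full(N, 0.01)                # assumed noise power spectrum

W = wiener_equalizer(H, P_xx, P_nn)
# With P_nn -> 0, W approaches 1/H, the inverse-channel equalizer noted in the text
print(np.allclose(wiener_equalizer(H, P_xx, np.zeros(N)), 1 / H))
```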

4.3 Time-Alignment of Signals in Multichannel Systems

In multichannel/multisensor signal processing there are a number of noisy and
distorted versions of a signal x(m), and the objective is to use all the observations
in estimating x(m), as illustrated in Fig. 10, where the phase and frequency
characteristics of each channel are modelled by a linear filter. As a simple example,
consider the problem of time-alignment of two noisy records of a signal, given as

y_1(m) = x(m) + n_1(m)   (66)

y_2(m) = A x(m-D) + n_2(m)   (67)

where y1(m) and y2(m) are the noisy observations from channels 1 and 2, n1(m)
and n2(m) are uncorrelated noise in each channel, D is the time delay of arrival of
the two signals, and A is an amplitude scaling factor. Now assume that y1(m) is

used as the input to a Wiener filter and that, in the absence of the signal x(m),
y2(m) is used as the desired signal. The error signal is given by

e(m) = y_2(m) - \sum_{k=0}^{P-1} w_k y_1(m-k)
     = \left[ A x(m-D) - \sum_{k=0}^{P-1} w_k x(m-k) \right] - \sum_{k=0}^{P-1} w_k n_1(m-k) + n_2(m)   (68)

The Wiener filter strives to minimize the terms shown inside the square brackets
in (68). Using the Wiener-Hopf equations, we have

w = R_{y_1 y_1}^{-1} r_{y_1 y_2} = (R_{xx} + R_{n_1 n_1})^{-1} A r_{xx}(D)   (69)

where r_{xx}(D) = E[x(m-D) x(m)]. The frequency-domain equivalent of (69) can be
derived as

W(f) = \frac{P_{XX}(f)}{P_{XX}(f) + P_{N_1 N_1}(f)} A e^{-j 2 \pi f D}   (70)

Note that, in the absence of noise, the Wiener filter becomes a pure phase (or
pure delay) filter with a flat magnitude response.

Fig. 10 Configuration of a multichannel system for the estimation of a signal x(m) using Wiener filters
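As a rough illustration of Equations (66)-(70), the sketch below estimates the relative delay D between the two noisy channels from the peak of their cross-correlation, which is the delay at which the Wiener filter of (69) concentrates its energy; the source signal, delay, scaling and noise levels are all assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
N, D_true, A = 4096, 25, 0.8            # assumed record length, delay and scaling

x = rng.standard_normal(N)              # assumed broadband source signal
y1 = x + 0.2 * rng.standard_normal(N)
y2 = A * np.roll(x, D_true) + 0.2 * rng.standard_normal(N)   # delayed, scaled, noisy copy

# Cross-correlation between the two channels; its peak lag estimates D
xcorr = np.correlate(y2, y1, mode="full")
lags = np.arange(-N + 1, N)
D_hat = lags[np.argmax(xcorr)]
print("estimated delay:", D_hat)        # close to 25
```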


5 Implementation of Wiener Filters

Fig. 11 A filter-bank implementation of a Wiener filter

The implementation of a Wiener filter for additive noise reduction, using
Equations (59) and (61), requires the autocorrelation functions, or equivalently the
power spectra, of the signal and noise. The noise power spectrum can be
obtained from the signal-inactive, noise-only, periods. The assumption is that the
noise is quasi-stationary, and that its power spectrum remains relatively stationary
between the update periods. This is a reasonable assumption for many noisy
environments, such as the noise inside a car emanating from the engine, aircraft
noise, office noise from computer machines, etc. The main practical problem in
the implementation of a Wiener filter is that the desired signal is often observed
in noise, and that the autocorrelation or power spectrum of the desired signal is
not readily available. Fig. 10 illustrates the block-diagram configuration of a
system for implementation of a Wiener filter for additive noise reduction. An
estimate of the desired signal power spectrum is obtained by subtracting an
estimate of the noise spectrum from that of the noisy signal. A filter-bank
implementation of the Wiener filter is shown in Fig. 11, where the incoming
signal is divided into N bands of frequencies. A first-order integrator, placed at
the output of each band-pass filter, gives an estimate of the power spectrum of the
noisy signal. The power spectrum of the original signal is obtained by
subtracting an estimate of the noise power spectrum from that of the noisy signal. In a
Bayesian implementation of the Wiener filter, prior models of speech and noise,
such as hidden Markov models, are used to obtain the power spectra of speech
and noise required for calculation of the filter coefficients.
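A crude sketch of the filter-bank idea described above is given below: the noisy signal is split into frequency bands by an FFT, the band powers are smoothed with a first-order integrator, the noise power in each band is subtracted to estimate the signal power, and the resulting per-band Wiener gains (61) are applied. The frame length, smoothing constant and the assumption that the noise power spectrum is already available are illustrative choices, not prescribed by the text.

```python
import numpy as np

def filterbank_wiener(y, noise_psd, frame=256, alpha=0.9):
    """Frame-by-frame Wiener noise reduction with per-band power smoothing (a sketch).

    y:          noisy signal (float array)
    noise_psd:  noise power in each band, length frame//2 + 1, from a noise-only period
    """
    out = np.zeros(len(y))
    smoothed = noise_psd.copy()                             # running estimate of the noisy-signal PSD
    for start in range(0, len(y) - frame + 1, frame):
        Y = np.fft.rfft(y[start:start + frame])
        power = np.abs(Y) ** 2 / frame
        smoothed = alpha * smoothed + (1 - alpha) * power   # first-order integrator per band
        signal_psd = np.maximum(smoothed - noise_psd, 0.0)  # subtract the noise power estimate
        gain = signal_psd / (signal_psd + noise_psd)        # Wiener gain, eq. (61)
        out[start:start + frame] = np.fft.irfft(gain * Y, frame)
    return out
```

In practice, noise_psd would be estimated during the signal-inactive periods mentioned above and updated whenever the noise changes.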

The Choice of Wiener Filter Order

The choice of Wiener filter order affects:
(a) the ability of the filter to remove distortions and reduce the noise;
(b) the computational complexity of the filter; and
(c) the numerical stability of the Wiener solution.

The choice of the filter length also depends on the application and the method of
implementation of the Wiener filter. For example, in a filter-bank
implementation of the Wiener filter for additive noise reduction, the number of
filter coefficients is equal to the number of filter banks, and typically the number
of filter banks is between 16 and 64. On the other hand, for many applications, a
direct implementation of the time-domain Wiener filter requires a larger filter
length, say between 64 and 256 taps.

A reduction in the required length of a time-domain Wiener filter can be


achieved by dividing the time domain signal into N sub-band signals. Each
sub-band signal can then be decimated by a factor of N. The decimation results
in a reduction, by a factor of N, in the required length of each sub-band Wiener
filter.


6 Further Comments

The MSE defined in (2) uses the linear estimator y(n) defined in (1). If we relax
the linear constraint on the estimator and look for a general function of the input, i.e.,
y(n) = g(x(n)), the optimal estimator in the mean-square sense is given by the conditional
expectation E[d(n)|x(n)]. Its calculation requires knowledge of the joint distribution
of d(n) and x(n), and in general it is a nonlinear function of x(n) (unless certain
symmetry conditions on the joint distribution are fulfilled, as is the case for
Gaussian distributions). Moreover, once calculated, it might be very hard to implement.
For all these reasons, linear estimators are usually preferred (and, as we have seen,
they depend only on second-order statistics).
On a historical note, Norbert Wiener solved a continuous-time prediction problem
under causality constraints by means of an elegant technique now known as the
Wiener-Hopf factorization technique. This is a much more complicated problem than
the one presented in (3). Later, Norman Levinson formulated the Wiener filter in
discrete time.

It should be noticed that the orthogonality principle used to derive the Wiener filter
does not apply to FIR filters only; it can be applied to IIR (infinite impulse response)
filtering, and even to noncausal filtering. For the general case, the output of the noncausal
filter can be put as:

y(n) = \sum_{k=-\infty}^{\infty} w_k x(n-k)   (71)

Then, minimizing the mean-square error leads to the Wiener-Hopf equations

\sum_{i=-\infty}^{\infty} w_{i,opt}\, r_x(k-i) = r_{xd}(k), \qquad -\infty < k < \infty   (72)

which can be solved using Z-transform methods. In addition, a general expression for
the minimum mean-square error is

J_{MMSE} = r_d(0) - \sum_{k=-\infty}^{\infty} w_{k,opt}\, r_{xd}(k)   (73)

From this general case, we can derive the FIR filter studied before (index i in the
summation and lag k in (72) go from 0 to L-1) and the causal IIR filter (index i in
the summation and lag k in (72) go from 0 to \infty). Finally, we would like to comment
on the stationarity of the processes. We assumed the input and reference processes were
WSS. If this were not the case, the statistics would be time-dependent. However, we
could still find the Wiener filter at each time n as the one that makes the estimation
error orthogonal to the input, i.e., the principle of orthogonality still holds. A less
costly alternative would be to recalculate the filter for every block of N signal samples.
However, nearly two decades after Wiener's work, Rudolf Kalman developed the
Kalman filter, which is the optimum mean-square linear filter for non-stationary
processes (evolving under a certain state-space model) as well as stationary ones
(converging in steady state to the Wiener solution).
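As a final sketch, the noncausal Wiener-Hopf equations (72) can be solved in the frequency domain, where they reduce to W(f) = P_xd(f)/P_xx(f); the toy signals and the simple periodogram spectral estimates below are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 8192
x = rng.standard_normal(N)
# Assumed reference: a noncausal relation (depends on past and future of x) plus noise
d = 0.5 * np.roll(x, -1) + 1.0 * x + 0.5 * np.roll(x, 1) + 0.1 * rng.standard_normal(N)

# Cross- and auto-spectra estimated from the data (simple periodogram estimates)
X, Df = np.fft.fft(x), np.fft.fft(d)
P_xx = np.abs(X) ** 2 / N
P_xd = np.conj(X) * Df / N

W = P_xd / P_xx                      # noncausal frequency-domain Wiener filter, from (72)
w = np.real(np.fft.ifft(W))          # impulse response; w[1], w[0], w[-1] should be near 0.5, 1.0, 0.5
print(w[1], w[0], w[N - 1])
```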

