Wiener Filter
R01943128
Darcy Tsai
E-mail: raulshepherd@access.ee.ntu.edu.tw
Graduate Institute of Electronics Engineering
National Taiwan University, Taipei, Taiwan, ROC
Abstract
Wiener theory, formulated by Norbert Wiener in 1940, forms the foundation of
data-dependent linear least square error filters. Wiener filters play a central role in a
wide range of applications such as linear prediction, echo cancellation, signal
restoration, channel equalization and system identification. The coefficients of a
Wiener filter are calculated to minimize the average squared distance between the
filter output and a desired signal. In its basic form, the Wiener theory assumes that the
signals are stationary processes. However, if the filter coefficients are periodically
recalculated for every block of N signal samples then the filter adapts itself to the
average characteristics of the signals within the blocks and becomes block-adaptive. A
block-adaptive (or segment adaptive) filter can be used for signals such as speech and
image that may be considered almost stationary over a relatively small block of
samples. In this chapter, we study Wiener filter theory, and consider alternative
methods of formulation of the Wiener filter problem. We consider the application of
Wiener filters in channel equalization, time-delay estimation and additive noise
reduction. A case study of the frequency response of a Wiener filter, for additive noise
reduction, provides useful insight into the operation of the filter. We also deal with
some implementation issues of Wiener filters.
Keywords
Wiener filter, Optimum linear filters, Minimum mean squared error (MMSE)
Contents
1 Introduction
2 Linear Optimum Filtering
  2.1 Problem Statement
  2.2 Principle of Orthogonality
  2.3 Minimum Mean Squared Error
3 Wiener-Hopf Filters
  3.1 Wiener-Hopf Equations
  3.2 Matrix Formulation of the Wiener-Hopf Equations
  3.3 Error Performance Surface
  3.4 Numerical Example
4 Some Applications of Wiener Filters
  4.1 Wiener Filter for Additive Noise Reduction
  4.2 Wiener Channel Equalizer
  4.3 Time-Alignment of Signals in Multichannel Systems
5 Implementation of Wiener Filters
6 Further Comments
1 Introduction
Wiener filters are a class of optimum linear filters which involve linear
estimation of a desired signal sequence from another related sequence. In the
statistical approach to the solution of the linear filtering problem, we assume the
availability of certain statistical parameters (e.g. mean and correlation functions) of
the useful signal and unwanted additive noise. The problem is to design a linear filter
with the noisy data as input and the requirement of minimizing the effect of the noise
at the filter output according to some statistical criterion. A useful approach to this
filter-optimization problem is to minimize the mean-square value of the error signal
that is defined as the difference between some desired response and the actual filter
output. For stationary inputs, the resulting solution is commonly known as the Wiener
filter. Its main purpose is to reduce the amount of noise present in a signal by
comparison with an estimate of the desired noiseless signal.
2 Linear Optimum Filtering
2.1 Problem Statement
Consider the block diagram of Fig. 1, built around a linear discrete-time filter.
The filter input consists of a time series x(0), x(1), x(2), ..., and the filter is itself
characterized by the impulse response w0, w1, w2, .... At some discrete time n, the
filter produces an output denoted by y(n). This output is used to provide an estimate
of a desired response denoted by d(n). With the filter input and the desired response
representing single realizations of respective stochastic processes, the estimation is
accompanied by an error with statistical characteristics of its own. In particular, we
wish to make the estimation error e(n) as small as possible in some statistical sense.
Two restrictions have so far been placed on the filter:
1. The filter is linear, which makes the mathematical analysis easy to handle.
2. The filter operates in discrete time, which makes it possible for the filter to
be implemented using digital hardware/software.
Fig. 1 Block diagram of the statistical filtering problem at time n: the input x(0), x(1),
x(2), ... is processed by the filter to produce the output y(n), which is subtracted from
the desired signal d(n) to form the estimation error e(n).
The final details of the filter specification, however, depend on two other choices
that have to be made:
1. Whether the impulse response of the filter has finite or infinite duration.
2. The type of statistical criterion used for the optimization.
The initial development of the theory applies to IIR filters and includes that for
FIR filters as a special case. However, for much of the material presented in this
tutorial, we will confine our attention to the use of FIR filters. We do so for the
following reason. An FIR filter is inherently stable, because its structure involves the
use of forward paths only. In other words, the only mechanism for input-output
interaction in the filter is via forward paths from the filter input to its output. Indeed,
it is this form of signal transmission through the filter that limits its impulse response
to a finite duration. On the other hand, an IIR filter involves both feedforward and
feedback. The presence of feedback means that portions of the filter output and
possibly other internal variables in the filter are fed back to the input. Consequently,
unless it is properly designed, feedback in the filter can indeed make it unstable, with
the result that the filter oscillates; this kind of operation is clearly unacceptable when
the requirement is that of filtering, for which stability is a must. By itself, the
stability problem in IIR filters is manageable in both theoretical and practical
terms. However, when the filter is required to be adaptive, bringing with it
stability problems of its own, the inclusion of adaptivity combined with the
feedback that is inherently present in an IIR filter makes a difficult problem that
much more difficult to handle. It is for this reason that we find that, in the majority
of applications requiring the use of adaptivity, an FIR filter is preferred
over an IIR filter, even though the latter is less demanding in computational
requirements.
As for the statistical criterion, the mean-square error has a clear advantage over
alternatives such as the mean absolute error or higher powers of the error, because it
leads to tractable mathematics. In particular, the choice of the mean-square error
criterion results in a second-order dependence of the cost function on the unknown
coefficients in the impulse response of the filter. Moreover, the cost function has a
distinct minimum that uniquely defines the optimum statistical design of the filter.
We may now summarize the essence of the filtering problem by making the
following statement: design a linear discrete-time filter whose output y(n) provides an
estimate of a desired response d(n), given a set of input samples x(0), x(1), x(2), ...,
such that the mean-square value of the estimation error e(n), defined as the difference
between the desired response d(n) and the actual output y(n), is minimized.
2.2 Principle of Orthogonality
The output of an FIR filter of length L is

    y(n) = w^T X(n) = Σ_{k=0}^{L-1} w_k x(n-k),   w, X(n) ∈ R^L,   n = 0, 1, 2, ...   (1)

where w = [w_0, w_1, ..., w_{L-1}]^T and X(n) = [x(n), x(n-1), ..., x(n-L+1)]^T. The
mean-square error (MSE) cost function is

    J_MSE(w) = E[|e(n)|^2],   e(n) = d(n) - y(n)   (2)

where E[·] is the expectation operator and e(n) is the estimation error. Then, the
estimation problem can be seen as finding the vector w that minimizes the cost
function J_MSE(w). The solution to this problem is sometimes called the stochastic
least squares solution. If we choose the MSE cost function (2), the optimal
solution to the linear estimation problem can be presented as

    w_opt = arg min_w J_MSE(w)   (3)

Expanding (2) using (1) gives

    J_MSE(w) = E[|d(n)|^2] - 2 E[d(n) X^T(n)] w + w^T E[X(n) X^T(n)] w   (4)
As this is a quadratic form, the optimal solution will be at the point where the
cost function has zero gradient, i.e.,

    ∇_w J_MSE(w) |_{w = w_opt} = 0_{L×1}   (5)

or in other words, the partial derivative of J_MSE with respect to each coefficient
w_k should be zero. Under this set of conditions the filter is said to be optimum in
the mean-squared-error sense. Using (1) in (2), we can compute the gradient as

    ∇_w J_MSE = 2 E[e(n) ∂e(n)/∂w] = -2 E[e(n) X(n)]   (6)

Setting this gradient to zero, the minimum error e_min(n) must satisfy

    E[e_min(n) X(n)] = 0_{L×1}   (7)
or equivalently

    E[e_min(n) x(n-k)] = 0,   k = 0, 1, 2, ..., L-1   (8)

This is called the principle of orthogonality, and it implies that the optimal
condition is achieved if and only if the error e(n) is decorrelated from the
samples x(n-k), k = 0, 1, ..., L-1. Actually, the error will also be decorrelated
from the estimate y(n), since

    E[e_min(n) y_opt(n)] = E[e_min(n) w_opt^T X(n)] = w_opt^T E[e_min(n) X(n)] = 0   (9)
Fig. 2 Geometric illustration of the principle of orthogonality: the optimal estimate
y_opt(n) is the projection of the desired response d(n) onto the space spanned by the
input samples, and the error e_min(n) is orthogonal to that space.

2.3 Minimum Mean Squared Error

From the definition of the estimation error, at the optimum we have

    e_min(n) = d(n) - y_opt(n)   (10)

or equivalently

    d(n) = y_opt(n) + e_min(n)   (11)

The minimum mean-square error is

    J_MMSE = E[|e_min(n)|^2]   (12)
Hence, evaluating the mean-square values of both sides of (11), and applying to it
the corollary to the principle of orthogonality described by (9), we get:
    σ_d^2 = σ_y_opt^2 + J_MMSE   (13)

Solving for the minimum mean-square error,

    J_MMSE = σ_d^2 - σ_y_opt^2   (14)
This relation shows that, for the optimum filter, the MMSE equals the difference
between the variance of the desired response and the variance of the estimate that
the filter produces at its output.
Dividing both sides of (14) by σ_d^2 gives

    J_MMSE / σ_d^2 = 1 - σ_y_opt^2 / σ_d^2   (15)

Clearly, this is possible because σ_d^2 is never zero, except in the trivial case of a
desired response d(n) that is zero for all n. Let

    ε = J_MMSE / σ_d^2 = 1 - σ_y_opt^2 / σ_d^2   (16)

The quantity ε is the normalized mean-square error. We note that the ratio ε can
never be negative, and the ratio σ_y_opt^2 / σ_d^2 is always positive. We therefore
have

    0 ≤ ε ≤ 1   (17)
If ε is zero, the optimum filter operates perfectly, in the sense that there is
complete agreement between the estimate y_opt(n) at the filter output and the
desired response d(n). On the other hand, if ε is unity, there is no agreement
whatsoever between these two quantities; this corresponds to the worst possible
situation.
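The decomposition (13)-(17) can be checked on synthetic data; the signal model below is an invented illustration, and sample averages stand in for the expectations:

```python
import numpy as np

# Invented illustrative model: estimate a zero-mean d(n) from two input taps.
rng = np.random.default_rng(1)
N = 100_000
x = rng.standard_normal(N + 1)
d = 0.9 * x[1:] + 0.5 * rng.standard_normal(N)

X = np.stack([x[1:], x[:-1]])
w_opt = np.linalg.solve(X @ X.T / N, X @ d / N)
y_opt = w_opt @ X
e_min = d - y_opt

msq_d = np.mean(d**2)                  # sigma_d^2 (signals are zero mean)
msq_y = np.mean(y_opt**2)
J_mmse = np.mean(e_min**2)
eps = J_mmse / msq_d                   # normalized MSE, cf. (16)
print(msq_d, msq_y + J_mmse)           # the two values agree, cf. (13)
print(eps)                             # lies between 0 and 1, cf. (17)
```

The agreement in (13) is exact here (up to floating point) because the sample error is exactly orthogonal to the sample estimate.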
3 Wiener-Hopf Filters
3.1 Wiener-Hopf Equations
Consider a signal x(n) as the input to a finite impulse response (FIR) filter of
length L, whose output is

    y(n) = w^T X(n)   (18)

with X(n) = [x(n), x(n-1), ..., x(n-L+1)]^T. As the output of the filter is observed,
it can be corrupted by an additive measurement noise v(n), leading to a linear
regression model for the observed output

    d(n) = w_T^T X(n) + v(n)   (19)

where w_T denotes the underlying system.
Fig. 3 FIR (transversal) filter: the delayed tap inputs x(n), x(n-1), ..., x(n-L+1) are
weighted by the coefficients w_0(n), w_1(n), ..., w_{L-1}(n) and summed to form the
output y(n), which is subtracted from d(n) to produce the error e(n).
It should be noticed that this linear regression model can also be used even if the
input-output relation of the given data pairs [x(n), d(n)] is nonlinear, with w_T
being a linear approximation to the actual relation between them. In that case,
v(n) would contain a component associated with the additive noise perturbations,
but also another one representing, for example, modeling errors.
In terms of the optimal filter, the orthogonality condition (7) can be written as

    E[e_min(n) X(n)] = E[(d(n) - w_opt^T X(n)) X(n)] = 0_{L×1}   (20)
From (20) we can conclude that, given the signals x(n) and d(n), we can always
assume that d(n) was generated by the linear regression model (19). To do this,
the system w_T would be equal to the optimal filter w_opt, while v(n) would be
associated with the residual error e_min(n), which is uncorrelated with the input
x(n).
It should be noticed that (8) is not just a condition for the cost function to
reach its minimum, but also a means of testing whether a linear filter is operating
in the optimal condition. Here, the principle of orthogonality illustrated in Fig. 2
can be interpreted as follows (for L = 2): at time n the input vector X(n) = [x(n), x(n-1)]^T
passes through the optimal filter w_opt = [w_opt,0, w_opt,1]^T to generate the output
y_opt(n). Given d(n), y_opt(n) is the only element in the space spanned by x(n) and
x(n-1) that leads to an error e(n) that is orthogonal to x(n), x(n-1), and y_opt(n).
Now we focus on the computation of the optimal solution. From (20), we
have E[X(n) X^T(n)] w_opt = E[X(n) d(n)]. Define

    R_X = E[X(n) X^T(n)]   (21)

and

    r_Xd = E[X(n) d(n)]   (22)

for the input autocorrelation matrix and the cross-correlation vector, respectively.
We can make the following two observations:
1. The expectation E[x(n-i) x(n-k)], with i, k = 0, 1, 2, ..., L-1, is equal to the
autocorrelation function of the filter input for a lag of i-k. We may thus express
this expectation as

    r(i-k) = E[x(n-i) x(n-k)]   (23)

2. The expectation E[x(n-k) d(n)], with k = 0, 1, 2, ..., L-1, is equal to the
cross-correlation between the filter input x(n-k) and the desired
response d(n) for a lag of -k. We may thus express this second expectation as

    p(-k) = E[x(n-k) d(n)]   (24)

With these definitions, the optimality condition (20) becomes

    Σ_{i=0}^{L-1} w_opt,i r(i-k) = p(-k),   k = 0, 1, 2, ..., L-1   (25)
The system of equations (25) defines the optimum filter coefficients, in the most
general setting, in terms of two correlation functions: the autocorrelation function
of the filter input, and the cross-correlation between the filter input and the
desired response. These equations are called Wiener-Hopf equations.
3.2 Matrix Formulation of the Wiener-Hopf Equations

Let R_X denote the L-by-L correlation matrix of the tap inputs X(n). In expanded
form,

    R_X = [ r(0)    r(1)    ...  r(L-1)
            r(1)    r(0)    ...  r(L-2)
            ...     ...     ...  ...
            r(L-1)  r(L-2)  ...  r(0)  ]

Correspondingly, let r_Xd denote the L-by-1 cross-correlation vector between the
tap inputs of the filter and the desired response d(n). In expanded form, we have

    r_Xd = [p(0), p(-1), ..., p(1-L)]^T

Note that, as the joint process is WSS, the matrix R_X is symmetric, positive
semidefinite and Toeplitz. Using these definitions, equation (25) can be put as:
    R_X w_opt = r_Xd   (26)

Assuming R_X is nonsingular, the solution is

    w_opt = R_X^{-1} r_Xd   (27)
which is known as the Wiener filter. An alternative way to find it is the following.
3.3 Error Performance Surface

Substituting (21) and (22) into (4), the cost function can be written as

    J_MSE(w) = E[|d(n)|^2] - 2 r_Xd^T w + w^T R_X w   (28)

Completing the square,

    w^T R_X w - 2 r_Xd^T w = (R_X w - r_Xd)^T R_X^{-1} (R_X w - r_Xd) - r_Xd^T R_X^{-1} r_Xd   (29)

so that

    J_MSE(w) = E[|d(n)|^2] - r_Xd^T R_X^{-1} r_Xd + (w - R_X^{-1} r_Xd)^T R_X (w - R_X^{-1} r_Xd)   (30)
Using the fact that R_X is positive definite (and therefore, so is its inverse), it turns
out that the cost function reaches its minimum when the filter takes the form of
(27), i.e., the Wiener filter. The minimum MSE value (MMSE) on the surface (30)
is:
    J_MMSE = J_MSE(w_opt) = E[|d(n)|^2] - r_Xd^T R_X^{-1} r_Xd = E[|d(n)|^2] - E[|y_opt(n)|^2]   (31)
Note that if the input and the reference signals are orthogonal, i.e., r_Xd = 0_{L×1},
the optimal filter is w_opt = 0_{L×1} and J_MMSE = E[|d(n)|^2]. This is
reasonable, since nothing can be done with the filter w if the input signal carries
no information about the reference signal (as they are orthogonal). Actually, (28)
shows that in this case, if any of the filter coefficients is nonzero, the MSE would
be increased by the term w^T R_X w. On the other hand,
if the reference signal is generated by passing the input signal through a system
w_T as in (19), with the noise v(n) being uncorrelated from the input x(n), the
optimal filter will be:

    w_opt = w_T   (32)

This means that the Wiener solution will be able to identify the system w_T, with a
resulting error given by v(n). Therefore, in this case J_MMSE = E[|v(n)|^2].
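This identification property can be illustrated numerically; the three-tap system w_T and the noise level below are invented for the sketch:

```python
import numpy as np

# Invented example of the regression model (19): d(n) = w_T^T X(n) + v(n).
rng = np.random.default_rng(2)
N, L = 200_000, 3
w_T = np.array([0.5, -0.4, 0.2])               # hypothetical "true" system
x = rng.standard_normal(N + L - 1)
X = np.stack([x[L - 1 - k: N + L - 1 - k] for k in range(L)])  # rows x(n-k)
v = 0.3 * rng.standard_normal(N)               # noise with E[v^2] = 0.09
d = w_T @ X + v

w_opt = np.linalg.solve(X @ X.T / N, X @ d / N)   # sample version of (27)
J_mmse = np.mean((d - w_opt @ X) ** 2)
print(w_opt)      # close to w_T, as predicted by (32)
print(J_mmse)     # close to E[v^2] = 0.09
```

The Wiener solution recovers the system coefficients, and the residual error power matches the noise power, exactly as the discussion above predicts.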
Since R_X is symmetric and positive definite, it admits the eigendecomposition

    R_X = Q Λ Q^T   (33)

where Q is the orthogonal matrix of eigenvectors and Λ is the diagonal matrix of
(non-negative) eigenvalues. Defining

    w̃ = w - w_opt   (34)

and

    u = Q^T w̃   (35)

and using (27), (30), (31) and (33)-(35), we obtain

    J_MSE(w) = J_MMSE + u^T Λ u   (36)
This is called the canonical form of the quadratic form J_MSE(w), and it contains no
cross-product terms. Since the eigenvalues are non-negative, it is clear that the
surface describes an elliptic hyperparaboloid, with the eigenvectors being the
principal axes of the hyperellipses of constant MSE value.
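The canonical form (36) can be verified at an arbitrary point of the error surface; the matrix R_X, the vector r_Xd and the value of E[|d(n)|^2] below are invented illustrative values:

```python
import numpy as np

# Check of the canonical form (36); R, r and Ed2 are invented values.
R = np.array([[2.0, 0.6], [0.6, 1.0]])   # symmetric positive definite
r = np.array([0.8, -0.2])
Ed2 = 1.5

def J(w):                                # cost function (28)
    return Ed2 - 2 * r @ w + w @ R @ w

w_opt = np.linalg.solve(R, r)
J_mmse = Ed2 - r @ w_opt                 # cf. (31)

lam, Q = np.linalg.eigh(R)               # R = Q diag(lam) Q^T, as in (33)
w = np.array([0.3, -1.2])                # arbitrary test point
u = Q.T @ (w - w_opt)                    # coordinates (34)-(35)
print(J(w) - (J_mmse + u @ (lam * u)))   # ~ 0: both sides of (36) agree
```

Since (36) is an algebraic identity, the two sides agree to machine precision for any test point w.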
3.4 Numerical Example

To illustrate the filtering theory developed above, we consider an example in
which the desired response d(n) is a first-order AR process, generated by applying
a zero-mean white-noise process v1(n) to the input of an all-pole filter with
transfer function

    H1(z) = 1 / (1 + 0.8458 z^{-1})   (37)
The process d(n) is applied to a communication channel modeled by the all-pole
transfer function

    H2(z) = 1 / (1 - 0.9458 z^{-1})   (38)

The received signal u(n) consists of the channel output x(n) plus an additive
white-noise process v2(n):

    u(n) = x(n) + v2(n)   (39)
The white-noise processes v1(n) and v2(n) are uncorrelated. It is also assumed that
d(n) and u(n), and therefore v1(n) and v2(n), are all real valued.
Fig. 4 (a) Autoregressive model of desired response d(n) (b) model of noisy
communication channel
From Fig. 4(a), the process d(n) is generated by the first-order difference equation

    d(n) + 0.8458 d(n-1) = v1(n)   (40)

The variance of d(n) is

    σ_d^2 = σ1^2 / (1 - (0.8458)^2) = 0.9486   (41)

where σ1^2 = 0.27 is the variance of the white noise v1(n).
The process d(n) acts as input to the channel. Hence, from Fig. 4(b), we find that the
channel output x(n) is related to the channel input d(n) by the first-order
difference equation

    x(n) + b1 x(n-1) = d(n)   (42)

where b1 = -0.9458. We also observe from the two parts of Fig. 4 that the channel
output x(n) may be generated by applying the white-noise process v1(n) to a
second-order all-pole filter whose transfer function equals

    H(z) = H1(z) H2(z) = 1 / ((1 + 0.8458 z^{-1})(1 - 0.9458 z^{-1}))   (43)

That is, x(n) is a second-order AR process described by the difference equation

    x(n) + a1 x(n-1) + a2 x(n-2) = v1(n)   (44)

where a1 = -0.1 and a2 = -0.8. Note that both AR processes d(n) and x(n) are WSS.
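The models of Fig. 4 can be simulated to check these second-order statistics; the only assumed value is the white-noise variance σ1^2 = 0.27 implied by (41):

```python
import numpy as np

# Simulate (40) and (42); sigma_1^2 = 0.27 is the v1(n) variance implied by (41).
rng = np.random.default_rng(3)
N = 400_000
v1 = np.sqrt(0.27) * rng.standard_normal(N)

d = np.zeros(N)
x = np.zeros(N)
for n in range(1, N):
    d[n] = -0.8458 * d[n - 1] + v1[n]    # (40): d(n) + 0.8458 d(n-1) = v1(n)
    x[n] = 0.9458 * x[n - 1] + d[n]      # (42) with b1 = -0.9458

d, x = d[N // 2:], x[N // 2:]            # discard the start-up transient
print(np.mean(d**2))                     # close to sigma_d^2 = 0.9486
print(np.mean(x**2))                     # close to r_x(0) = 1
print(np.mean(x[1:] * x[:-1]))          # close to r_x(1) = 0.5
```

The sample moments approach the theoretical values σ_d^2 = 0.9486, r_x(0) = 1 and r_x(1) = 0.5 used in the rest of this example.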
To characterize the Wiener filter, we need to solve the Wiener-Hopf equations
(26). This set of equations requires knowledge of two quantities: (1) the
correlation matrix R pertaining to the received signal u(n), and (2) the
cross-correlation vector r_Xd between u(n) and the desired response d(n). In our
example, R is a 2-by-2 matrix and r_Xd is a 2-by-1 vector, since the FIR filter used
to implement the Wiener filter is assumed to have two taps.
The received signal u(n) consists of the channel output x(n) plus the additive
white noise v2(n). Since the processes x(n) and v2(n) are uncorrelated, it follows that
the correlation matrix R equals the correlation matrix of x(n) plus the correlation
matrix of v2(n). That is,
    R = R_X + R_2   (45)
where

    R_X = [ r_x(0)  r_x(1)
            r_x(1)  r_x(0) ]

and r_x(0) and r_x(1) are the autocorrelation functions of the channel output x(n)
for lags of 0 and 1, respectively. We have r_x(0) = σ_x^2 = 1 and r_x(1) = 0.5.
Hence,

    R_X = [ 1    0.5
            0.5  1   ]   (46)
Next we observe that, since v2(n) is a white-noise process of zero mean and
variance σ2^2 = 0.1, the 2-by-2 correlation matrix R_2 of this process is

    R_2 = [ 0.1  0
            0    0.1 ]   (47)
Thus, substituting (46) and (47) into (45), we find that the 2-by-2 correlation
matrix of the received signal u(n) equals

    R = [ 1.1  0.5
          0.5  1.1 ]   (48)
For the cross-correlation vector, we have

    r_Xd = [p(0), p(-1)]^T

where p(0) and p(-1) are the cross-correlation functions between d(n) and u(n) for
lags of 0 and -1, respectively. Since these two processes are real valued, we have

    p(-k) = E[u(n-k) d(n)],   k = 0, 1   (49)

Substituting (42) into (49), and recognizing that the channel output x(n) is
uncorrelated with the white-noise process v2(n), we get

    p(-k) = r_x(k) + b1 r_x(k-1),   k = 0, 1   (50)

Putting b1 = -0.9458 and using the element values of the correlation matrix R_X
given in (46), we obtain p(0) = 0.5272 and p(-1) = -0.4458. Hence,

    r_Xd = [0.5272, -0.4458]^T   (51)
Error-Performance Surface
The dependence of the mean-squared error on the 2-by-1 tap-weight vector w is
defined by (28). Hence, substituting (41), (48) and (51) into (28), we get

    J(w0, w1) = 0.9486 - 2 [0.5272, -0.4458] [w0, w1]^T
                + [w0, w1] [ 1.1  0.5 ; 0.5  1.1 ] [w0, w1]^T
              = 0.9486 - 1.0544 w0 + 0.8916 w1 + w0 w1 + 1.1 (w0^2 + w1^2)
Fig. 6 shows contour plots of the tap weight w1 versus w0 for varying values of
the mean-squared error J. We see that the locus of w1 versus w0 for a fixed J is
in the form of an ellipse. The elliptical locus shrinks in size as the mean-squared
error J approaches the minimum value Jmin. For J = Jmin, the locus reduces to a
point with coordinates w_o0 and w_o1.
Wiener Filter
The 2-by-1 optimum tap-weight vector w_o of the Wiener filter is defined by
(27). In particular, it consists of the inverse matrix R^{-1} multiplied by the
cross-correlation vector r_Xd. Inverting the correlation matrix R of (48), we get
Fig. 5 Error performance surface of the two-tap FIR filter described in the numerical
example
    R^{-1} = (1 / (r^2(0) - r^2(1))) [ r(0)   -r(1)
                                       -r(1)   r(0) ]
           = [ 1.1456  -0.5208
               -0.5208  1.1456 ]   (52)
Hence, substituting (51) and (52) into (27), we get the desired result:

    w_o = R^{-1} r_Xd = [0.8360, -0.7853]^T   (53)

The corresponding minimum mean-squared error is

    J_MMSE = 0.9486 - [0.5272, -0.4458] [0.8360, -0.7853]^T = 0.1579   (54)
The point represented jointly by the optimum tap-weight vector wo of (53) and the
minimum mean-squared error of (54) defines the bottom of the error-performance
surface in Fig.5, or the center of the contour plots in Fig.6.
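The numbers in (48), (51), (53) and (54) are easy to reproduce directly from the Wiener-Hopf solution:

```python
import numpy as np

# Reproduce the numerical example: R from (48), r_Xd from (51), sigma_d^2 from (41).
R = np.array([[1.1, 0.5], [0.5, 1.1]])
r_Xd = np.array([0.5272, -0.4458])
sigma_d2 = 0.9486

w_o = np.linalg.solve(R, r_Xd)           # Wiener filter (27)
J_mmse = sigma_d2 - r_Xd @ w_o           # minimum MSE, cf. (31)
print(w_o)        # close to [0.8360, -0.7853] of (53)
print(J_mmse)     # close to 0.1579 of (54)
```

Small discrepancies in the last digit against (53)-(54) are rounding effects in the quoted intermediate values.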
The eigenvalues λ1 and λ2 of the correlation matrix R satisfy

    (1.1 - λ)^2 - (0.5)^2 = 0

so that λ1 = 1.6 and λ2 = 0.6. Using the canonical form (36), the error surface may
be written as

    J_MSE = J_MMSE + 1.6 u1^2 + 0.6 u2^2   (55)

The locus of u2 versus u1, as defined in (55), traces an ellipse for a fixed value of
J_MSE - J_MMSE. In particular, the ellipse has a minor axis of [(J_MSE - J_MMSE)/λ1]^{1/2}
along the u1 coordinate and a major axis of [(J_MSE - J_MMSE)/λ2]^{1/2} along the u2
coordinate; this is because λ1 > λ2.
4 Some Applications of Wiener Filters
4.1 Wiener Filter for Additive Noise Reduction

Consider a signal x(m) observed in a broadband additive noise n(m), modeled as

    y(m) = x(m) + n(m)   (56)
Assuming that the signal and the noise are uncorrelated, it follows that the
autocorrelation matrix of the noisy signal is the sum of the autocorrelation matrix
of the signal x(m) and the noise n(m):
    R_yy = R_xx + R_nn   (57)

Similarly,

    r_xy = r_xx   (58)
where Ryy, Rxx and Rnn are the autocorrelation matrices of the noisy signal, the
noise-free signal and the noise respectively, and rxy is the cross-correlation vector
of the noisy signal and the noise-free signal. Substitution of (57) and (58) in the
Wiener filter, yields
    w = (R_xx + R_nn)^{-1} r_xx   (59)
Equation (59) is the optimal linear filter for the removal of additive noise. In the
following, a study of the frequency response of the Wiener filter provides useful
insight into the operation of the Wiener filter. In the frequency domain, the noisy
signal Y(f) is given by
    Y(f) = X(f) + N(f)
(60)
where X(f) and N(f) are the signal and noise spectra. For a signal observed in
additive random noise, the frequency-domain Wiener filter is obtained as
    W(f) = P_XX(f) / (P_XX(f) + P_NN(f))   (61)
where PXX(f) and PNN(f) are the signal and noise power spectra. Dividing the
numerator and the denominator of Equation (61) by the noise power spectra
PNN(f) and substituting the variable SNR(f)=PXX(f)/PNN(f) yields
    W(f) = SNR(f) / (SNR(f) + 1)   (62)
where SNR is a signal-to-noise ratio measure. Note that the variable SNR(f) is
expressed in terms of the power-spectral ratio, and not in the more usual terms of
the log power ratio. Therefore SNR(f) = 0 corresponds to -∞ dB.
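The gain curve (62) can be sketched directly as a function of the power-ratio SNR:

```python
import numpy as np

# Wiener gain (62) with SNR(f) expressed as a power ratio (not dB).
def wiener_gain(snr):
    snr = np.asarray(snr, dtype=float)
    return snr / (snr + 1.0)

print(wiener_gain([0.0, 1.0, 9.0, 99.0]))   # gains 0, 0.5, 0.9, 0.99
```

At SNR = 0 the gain is 0 (full attenuation), at SNR = 1 it is 0.5, and it approaches 1 as the SNR grows, matching the limiting cases discussed below.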
From Fig. 7, the following interpretation of the Wiener filter frequency response
W(f) in terms of the signal-to-noise ratio can be deduced. For additive noise, the
Wiener filter frequency response is a real positive number in the range 0 ≤ W(f) ≤ 1.
Now consider the two limiting cases of (a) a noise-free signal, SNR(f) = ∞,
and (b) an extremely noisy signal, SNR(f) = 0. At very high SNR, W(f) → 1, and the
filter applies little or no attenuation to the noise-free frequency component. At
the other extreme, when SNR(f) = 0, W(f) = 0. Therefore, for additive noise, the
Wiener filter attenuates each frequency component in proportion to an estimate
of the signal-to-noise ratio. Fig. 7 shows the variation of the Wiener filter
response W(f) with the signal-to-noise ratio SNR(f).
Fig. 7 Variation of the gain of Wiener filter frequency response with SNR
Fig. 8 Illustration of the variation of the Wiener frequency response with the signal
spectrum for additive white noise. The Wiener filter response broadly follows the
signal spectrum.
4.2 Wiener Channel Equalizer

A communication channel can be modelled as a combination of a linear filter and
an additive random noise source, as shown in Fig. 9. The input/output signals of a
linear time-invariant channel can be modelled as

    y(m) = Σ_{k=0}^{P-1} h_k x(m-k) + n(m)   (63)
where x(m) and y(m) are the transmitted and received signals, [hk] is the impulse
response of a linear filter model of the channel, and n(m) models the channel
noise. In the frequency domain (63) becomes
    Y(f) = X(f) H(f) + N(f)
(64)
where X(f), Y(f), H(f) and N(f) are the signal, noisy signal, channel and noise
spectra respectively. To remove the channel distortions, the receiver is followed
by an equalizer. The equalizer input is the distorted channel output, and the
desired signal is the channel input. It is easy to show that the Wiener equalizer in
the frequency domain is given by
    W(f) = P_XX(f) H*(f) / (P_XX(f) |H(f)|^2 + P_NN(f))   (65)
where it is assumed that the channel noise and the signal are uncorrelated. In the
absence of channel noise, P_NN(f) = 0, and the Wiener filter is simply the inverse
of the channel filter model, W(f) = H^{-1}(f).
4.3 Time-Alignment of Signals in Multichannel Systems

Consider a signal x(m) received at two sensors over two noisy channels:

    y1(m) = x(m) + n1(m)   (66)
    y2(m) = A x(m-D) + n2(m)   (67)

where y1(m) and y2(m) are the noisy observations from channels 1 and 2, n1(m)
and n2(m) are uncorrelated noise in each channel, D is the time delay of arrival of
the two signals, and A is an amplitude scaling factor. Now assume that y1(m) is
used as the input to a Wiener filter and that y2(m) is used as the desired signal.
The error signal is given by
    e(m) = y2(m) - Σ_{k=0}^{P-1} w_k y1(m-k)
         = [A x(m-D) - Σ_{k=0}^{P-1} w_k x(m-k)] + [n2(m) - Σ_{k=0}^{P-1} w_k n1(m-k)]   (68)
The Wiener filter strives to minimize the terms shown inside the square brackets
in (68). Using the Wiener-Hopf equations, we have
    w = R_y1y1^{-1} r_y1y2   (69)

or, equivalently in the frequency domain,

    W(f) = P_XX(f) A e^{-j2πfD} / (P_XX(f) + P_N1N1(f))   (70)
Note that in the absence of noise, the Wiener filter becomes a pure phase (or a
pure delay) filter with a flat magnitude response.
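In this noise-free limit, (70) reduces to W(f) = A e^{-j2πfD}; the amplitude A = 0.8 and delay D = 5 below are invented for the sketch:

```python
import numpy as np

# Noise-free limit of (70): a pure delay filter with flat magnitude A.
A, D = 0.8, 5                              # invented amplitude and delay
f = np.fft.fftfreq(64)                     # normalized frequency grid
W = A * np.exp(-2j * np.pi * f * D)

print(np.allclose(np.abs(W), A))           # True: flat magnitude response
w_time = np.fft.ifft(W).real               # impulse response
print(np.argmax(w_time))                   # 5: a single tap at lag D
```

The impulse response is a single tap of height A at lag D, confirming that the filter does nothing but scale and delay the input.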
5 Implementation of Wiener Filters

The Wiener filter can be implemented in a filter-bank structure, in which the
signal is divided into a number of sub-bands; the energy at the output of each
band-pass filter gives an estimate of the power spectrum of the noisy signal. The
power spectrum of the original signal is obtained by subtracting an estimate of
the noise power spectrum from that of the noisy signal. In a Bayesian
implementation of the Wiener filter, prior models of speech and noise, such as
hidden Markov models, are used to obtain the power spectra of speech and noise
required for calculation of the filter coefficients.
The choice of the filter length also depends on the application and the method of
implementation of the Wiener filter. For example, in a filter-bank
implementation of the Wiener filter for additive noise reduction, the number of
filter coefficients is equal to the number of filter banks, and typically the number
of filter banks is between 16 and 64. On the other hand, for many applications, a
direct implementation of the time-domain Wiener filter requires a larger filter
length, say between 64 and 256 taps.
6 Further Comments
The MSE defined in (2) uses the linear estimator y(n) defined in (1). If we relax
the linearity constraint on the estimator and look for a general function of the input,
i.e., y(n) = g(x(n)), the optimal estimator in the mean-square sense is given by the
conditional expectation E[d(n)|x(n)]. Its calculation requires knowledge of the joint
distribution of d(n) and x(n), and in general it is a nonlinear function of x(n) (unless
certain symmetry conditions on the joint distribution are fulfilled, as is the case for
Gaussian distributions). Moreover, once calculated, it might be very hard to implement.
For all these reasons, linear estimators are usually preferred (and, as we have seen,
they depend only on second-order statistics).
On a historical note, Norbert Wiener solved a continuous-time prediction problem
under causality constraints by means of an elegant technique now known as the
Wiener-Hopf factorization technique. This is a much more complicated problem than
the one presented in (3). Later, Norman Levinson formulated the Wiener filter in
discrete time.
It should be noticed that the orthogonality principle used to derive the Wiener filter
does not apply to FIR filters only; it can be applied to IIR (infinite impulse response)
filtering, and even noncausal filtering. For the general case, the output of the noncausal
filter can be put as:
    y(n) = Σ_{k=-∞}^{∞} w_k x(n-k)   (71)
Then, minimizing the mean-square error leads to the Wiener-Hopf equations

    Σ_{i=-∞}^{∞} w_opt,i r_x(k-i) = r_xd(k),   -∞ < k < ∞   (72)
which can be solved using Z-transform methods. In addition, a general expression for
the minimum mean-square error is

    J_MMSE = r_d(0) - Σ_{k=-∞}^{∞} w_opt,k r_xd(k)   (73)
From this general case, we can derive the FIR filter studied before (index i in the
summation and lag k in (72) go from 0 to L-1) and the causal IIR filter (index i in
the summation and lag k in (72) go from 0 to ∞). Finally, we would like to comment
on the stationarity of the processes. We assumed the input and reference processes were
WSS. If this were not the case, the statistics would be time-dependent. However, we
could still find the Wiener filter at each time n as the one that makes the estimation
error orthogonal to the input, i.e., the principle of orthogonality still holds. A less
costly alternative would be to recalculate the filter for every block of N signal samples.
However, nearly two decades after Wiener's work, Rudolf Kalman developed the
Kalman filter, which is the optimum mean-square linear filter for non-stationary
processes (evolving under a certain state-space model) as well as stationary ones
(converging in steady state to Wiener's solution).