
Design and Implementation of an Adaptive FIR Filter

Based on Delayed Error LMS Algorithm


Yen-Tai Lai, Chi-Chou Kao, and Hao-Jan Chen
Department of Electrical Engineering
National Cheng Kung University
Tainan 701, Taiwan

Abstract - Adaptive filtering techniques are widely used in signal processing
and communication, for example in echo/noise cancellation and speech/image
coding. Adaptive filters usually must process signals in real time. This paper
presents a high-speed and flexible VLSI architecture for a digital adaptive
finite impulse response (FIR) filter based on the delayed error least mean
square (DELMS) algorithm. The architecture has a hardware utilization
efficiency (HUE) of 100%, and the filter can easily be scaled without reducing
the throughput rate. Timing simulation results demonstrate the effectiveness of
the architecture. The chip was implemented in a 0.6 μm CMOS SPTM standard-cell
technology.

INTRODUCTION

Adaptive filters have become increasingly popular because they adjust
themselves to the signals they process. They are widely used in fields such as
echo or noise cancellation in communications. An adaptive system is a
time-varying signal processing system, as shown in Figure 1. The adaptive
filter is designed such that its output approximates a desired signal. The
coefficients of an adaptive filter are adjusted according to the input and the
error between the desired signal and the filter output. The coefficients are
repeatedly updated by an adaptive algorithm.

Applications of adaptive systems usually require long filters and high signal
clock rates. A systolic architecture provides high throughput, so we design an
adaptive system with a systolic architecture based on the Delayed Error LMS
algorithm.

The rest of this paper is organized as follows. We describe the algorithm of
digital adaptive filters in Section 2. Section 3 presents the architecture of
the adaptive FIR filter. Experimental results are shown in Section 4. We
conclude the paper in Section 5.

ADAPTIVE ALGORITHM

An adaptive algorithm known as the least mean square (LMS) algorithm uses the
statistical properties of the data signals. The objective of this method is to
minimize the mean square error.

0-7803-5650-0/99/$10.00 © 1999 IEEE

The LMS algorithm [1] is widely used in adaptive filtering because of its
simple structure and its numerical robustness. A general formulation of the LMS
algorithm for an N-tap adaptive filter can be described by the following
equations:

$$y(n) = \sum_{k=0}^{N-1} w_k(n)\,x(n-k) = W^T(n)\,X(n)$$

$$w_k(n+1) = w_k(n) + \mu\,e(n)\,x(n-k), \qquad 0 \le k \le N-1,$$

where $n$ is the time index, $w_k(n)$ is the $k$-th coefficient, $x(n)$ is the
input sample, $y(n)$ is the output, $p(n)$ is the desired signal, $e(n)$ is the
estimation error at time $n$,
$W(n) = [w_0(n)\ w_1(n)\ \cdots\ w_{N-1}(n)]^T$ is the coefficient vector,
$X(n) = [x(n)\ x(n-1)\ \cdots\ x(n-N+1)]^T$ is the input vector, the
superscript $T$ denotes vector transposition, and $N$ is the filter length.
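The LMS recursion above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `lms_filter` and `demo`, the 3-tap target system, the uniform random input, and the step size 0.05 are all hypothetical choices made for the example. The error is computed as e(n) = p(n) - y(n), the relation the paper uses for the error computation block.

```python
import random


def lms_filter(x, p, num_taps, mu):
    """N-tap LMS adaptive FIR filter (illustrative sketch).

    x: input samples x(n); p: desired samples p(n); mu: step size.
    Returns the final coefficients and the error sequence e(n).
    """
    w = [0.0] * num_taps      # w[k] holds w_k(n)
    taps = [0.0] * num_taps   # taps[k] holds x(n-k)
    errors = []
    for x_n, p_n in zip(x, p):
        taps = [x_n] + taps[:-1]                        # shift in x(n)
        y_n = sum(wk * xk for wk, xk in zip(w, taps))   # y(n) = W^T(n) X(n)
        e_n = p_n - y_n                                 # e(n) = p(n) - y(n)
        # w_k(n+1) = w_k(n) + mu * e(n) * x(n-k)
        w = [wk + mu * e_n * xk for wk, xk in zip(w, taps)]
        errors.append(e_n)
    return w, errors


def demo():
    """Identify a hypothetical 3-tap system from noise-free data."""
    rng = random.Random(0)
    unknown = [0.5, -0.3, 0.2]   # assumed target coefficients
    x = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
    p = [sum(h * x[n - k] for k, h in enumerate(unknown) if n - k >= 0)
         for n in range(len(x))]
    w, errors = lms_filter(x, p, num_taps=3, mu=0.05)
    return unknown, w, errors
```

Note that each iteration needs e(n) before it can update the coefficients, and needs the updated coefficients before it can compute the next output. That dependency is the feedback loop discussed next.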

The LMS adaptive filter contains feedback operations, so the current recursion
must be completed before the next recursion is initiated. This computation
cannot be pipelined. This recurrent character of the computation prevents the
system from being used for high-speed real-time applications. This is called
the latency problem.

To overcome the latency problem, the algorithm has been modified to enable a
pipelined implementation. The modified algorithm is named Delayed LMS [2] and
can be described as follows:

$$w_k(n+1) = w_k(n) + \mu\,e(n-M)\,x(n-k-M), \qquad 0 \le k \le N-1,$$

where $N$ is the number of filter taps, $\mu$ is the step size, which governs
the stability and the rate of convergence, and $M$ is the number of delay units
inserted in the coefficient adaptation block. The delayed LMS (DLMS) algorithm
solves the latency problem by using the delayed input and the delayed residual
error signal to update the filter coefficients. One major disadvantage of this
approach, however, is that its convergence performance is worse than that of
the LMS algorithm.
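As a rough sketch of the DLMS rule, the update can be written so that each error sample and its matching input vector wait M samples before they are used. The function name `dlms_filter`, the queue-based delay line, the test system, and the parameter values are assumptions for illustration only.

```python
import random
from collections import deque


def dlms_filter(x, p, num_taps, mu, delay):
    """N-tap delayed LMS (DLMS) filter with an adaptation delay of M samples."""
    w = [0.0] * num_taps
    taps = [0.0] * num_taps
    pending = deque()   # (e(n), X(n)) pairs waiting M samples before use
    errors = []
    for x_n, p_n in zip(x, p):
        taps = [x_n] + taps[:-1]                        # taps[k] = x(n-k)
        y_n = sum(wk * xk for wk, xk in zip(w, taps))
        e_n = p_n - y_n
        errors.append(e_n)
        pending.append((e_n, taps))
        if len(pending) > delay:
            e_old, taps_old = pending.popleft()         # e(n-M), X(n-M)
            # w_k(n+1) = w_k(n) + mu * e(n-M) * x(n-k-M)
            w = [wk + mu * e_old * xk for wk, xk in zip(w, taps_old)]
    return w, errors


def demo(delay=3):
    """Identify a hypothetical 3-tap system with M = delay."""
    rng = random.Random(1)
    unknown = [0.5, -0.3, 0.2]   # assumed target coefficients
    x = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
    p = [sum(h * x[n - k] for k, h in enumerate(unknown) if n - k >= 0)
         for n in range(len(x))]
    w, errors = dlms_filter(x, p, num_taps=3, mu=0.05, delay=delay)
    return unknown, w, errors
```

With `delay=0` the sketch reduces to the ordinary LMS update, which makes it easy to compare error curves for different values of M.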

The delay imposed on the input samples serves only to align them in time with
the delayed error samples in the DLMS coefficient-updating rule; it does not
help to increase the sample rate. Accordingly, J. Thomas [3] presents a Delayed
Error Least Mean Square (DELMS) adaptation rule, which can be described as
follows:

$$w_k(n+1) = w_k(n) + \mu\,e(n-M)\,x(n-k), \qquad 0 \le k \le N-1,$$

where $M$ is the error delay in the error feedback path. This rule proposes a
possible realization that convolves delayed error samples with undelayed input
samples. If the delay element unit $D \le (k-1)$, the DELMS algorithm has
better convergence behavior than can be obtained with the DLMS technique.
Therefore, DELMS is a suitable choice for a pipelined architecture.
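The three update rules differ only in which error sample and which input sample they multiply. The sketch below tabulates those sample indices and applies a single DELMS update to hypothetical data. The names `update_operands` and `delms_step` and all numeric values are assumptions for illustration; the sketch makes no claim about convergence.

```python
def update_operands(rule, n, k, M):
    """Return (error index, input index) multiplied to update w_k at time n."""
    if rule == "LMS":
        return (n, n - k)            # e(n)   * x(n-k)
    if rule == "DLMS":
        return (n - M, n - k - M)    # e(n-M) * x(n-k-M)
    if rule == "DELMS":
        return (n - M, n - k)        # e(n-M) * x(n-k)
    raise ValueError("unknown rule: " + rule)


def delms_step(w, x_hist, e_hist, n, mu, M):
    """One DELMS update: w_k(n+1) = w_k(n) + mu * e(n-M) * x(n-k)."""
    return [wk + mu * e_hist[n - M] * x_hist[n - k] for k, wk in enumerate(w)]


# Hypothetical histories, indexed by time n = 0..3.
x_hist = [1.0, 2.0, 3.0, 4.0]
e_hist = [0.5, 0.25, 0.1, 0.0]
w_next = delms_step([0.0, 0.0], x_hist, e_hist, n=3, mu=0.1, M=2)
```

In this example the DELMS update at n = 3 with M = 2 pairs e(1) with the current inputs x(3) and x(2), whereas DLMS would pair e(1) with x(1) and x(0).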

ARCHITECTURE FOR THE DELMS FIR FILTER

According to J. Thomas [5], the signal flow graph (SFG) for N-tap DELMS
adaptive FIR filters can be divided into two blocks: the tap coefficient
adaptation block and the FIR filter output block. The output block computes the
sum of the products $w_k(n)\,x(n-k)$, $0 \le k \le N-1$. Fig. 2 shows this SFG.

We modify the architecture by the cut-set retiming transformation technique
[4]. Cut-set r extends from the boundary between tap r and tap r+1 to the right
of the last tap. When we insert a delay in one direction of the cut-set, a
delay must be subtracted in the opposite direction. We thus replace the delay
element unit D by 2D. We also move one of the two registers on the data input
edges of the cut-set and one of the (N+1) registers on the output edge of the
step size multiplier to the outputs of the cut-set, for each cut-set and for
the right of the last tap. To maintain the function $e(n) = p(n) - y(n)$ of the
original error computation block, we insert a delay unit into the desired
signal input. Because the input signal used in the coefficient adaptation block
and in the FIR filter output block is the same, we can combine the input signal
path of the coefficient adaptation block into the filter output block. After
these modifications, we obtain a systolic architecture for the DELMS adaptive
FIR filter, shown in Fig. 3. The circles represent the multiply-accumulate
(MAC) modules.
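The cut-set rule used above (delays added in one direction across the cut must be removed in the opposite direction) can be illustrated on a toy graph. This sketch is not the SFG of Fig. 2: the three-node loop, the node names, and the function `retime_cutset` are hypothetical and only show the bookkeeping.

```python
def retime_cutset(edge_delays, left_nodes, k):
    """Move k delays across a cut-set.

    edge_delays maps (src, dst) to the number of delay units on that edge.
    Edges leaving left_nodes gain k delays and edges entering left_nodes
    lose k delays, so the total delay around every loop is unchanged.
    """
    retimed = {}
    for (src, dst), delay in edge_delays.items():
        if src in left_nodes and dst not in left_nodes:
            delay += k
        elif src not in left_nodes and dst in left_nodes:
            delay -= k
        if delay < 0:
            raise ValueError("retiming would need a negative delay")
        retimed[(src, dst)] = delay
    return retimed


# Toy feedback loop A -> B -> C -> A with two delays on the feedback edge.
edges = {("A", "B"): 0, ("B", "C"): 0, ("C", "A"): 2}
retimed = retime_cutset(edges, left_nodes={"A"}, k=1)
```

After retiming, the forward edge A -> B carries one register, which breaks the combinational path, while the loop still contains two delays in total.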

Because the architecture has a time scaling factor of two, the utilization of
the circuit in the systolic array is only 50%. To restore the 100% efficiency
of the systolic array, we fold the upper MAC modules in the coefficient
adaptation block into the lower MAC modules in the filter output block for each
stage of the adaptive filter. The folded architecture is shown in Fig. 4. We
add a multiplexer at the output of the filter to obtain the valid output
samples in the folded architecture. The system consists of N identical
processing elements (PEs) connected in cascade. Each processing element
performs all of the computations associated with a single coefficient of the
adaptive filter. The proposed system thus provides a significant computational
speed-up over a single-processor LMS filter. Fig. 5 shows the processing
element (PE) used in Fig. 4.
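The utilization argument can be checked with a small schedule model. This is a sketch under assumptions: it supposes that, before folding, the filter MAC of a stage is busy on even clock cycles and the update MAC on odd cycles. The paper does not state which phase each operation occupies, and the names `utilization` and `fold` are hypothetical.

```python
def utilization(schedule):
    """Fraction of clock cycles in which a MAC unit does useful work."""
    return sum(op is not None for op in schedule) / len(schedule)


def fold(filter_schedule, update_schedule):
    """Map two half-idle MAC schedules onto one shared MAC unit."""
    folded = []
    for f_op, u_op in zip(filter_schedule, update_schedule):
        if f_op is not None and u_op is not None:
            raise ValueError("both MACs busy in the same cycle; cannot fold")
        folded.append(f_op if f_op is not None else u_op)
    return folded


cycles = 8
# Assumed phases: filter MAC on even cycles, update MAC on odd cycles.
filter_mac = ["filter" if c % 2 == 0 else None for c in range(cycles)]
update_mac = ["update" if c % 2 == 1 else None for c in range(cycles)]
shared_mac = fold(filter_mac, update_mac)
```

Under these assumptions each unfolded MAC is busy half of the time, and the shared MAC is busy in every cycle, which matches the 50% and 100% figures given above.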

EXPERIMENTAL RESULTS

We used the Verilog-XL tool to run the timing simulation and a 0.6 μm CMOS
SPTM standard-cell technology to implement the chip. The simulation results
demonstrated the effectiveness of the architecture in Figure 3. We set the tap
length of the DELMS adaptive filter to N = 5. The adaptation delay
MD = (N+1)D/2 = 3D was adopted in the systolic architecture. The main features
of this chip are summarized in Table 1. The whole chip layout is shown in
Figure 6.
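The adaptation delay quoted above follows directly from the tap count. The helper below is a small sketch with the hypothetical name `adaptation_delay_units`. The paper only gives the case N = 5, so how an even N would be handled is not covered here.

```python
from fractions import Fraction


def adaptation_delay_units(num_taps):
    """Adaptation delay M, in units of D, from MD = (N + 1) D / 2."""
    return Fraction(num_taps + 1, 2)


m_for_five_taps = adaptation_delay_units(5)   # the N = 5 case used in the paper
```

For N = 5 this evaluates to M = 3, that is, an adaptation delay of 3D.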

CONCLUSIONS

In this paper, we have presented a pipelined implementation of the adaptive
FIR filter based on the Delayed Error LMS (DELMS) algorithm. The properties of
the new architecture are systolic computation, scalability, high throughput,
and a high sampling rate.

References

[1] B. Widrow and S. D. Stearns, Adaptive Signal Processing, Englewood Cliffs,
NJ: Prentice-Hall, 1985.
[2] G. Long, F. Ling, and J. G. Proakis, “The LMS Algorithm with Delayed
Coefficient Adaptation,” IEEE Trans. Acoust., Speech, Signal Processing,
vol. 37, no. 9, pp. 1397-1405, Sept. 1989.
[3] J. Thomas, “A Delayed Error Least Mean Squares Adaptive Filtering
Algorithm and its Performance Analysis,” Proceedings of the IEEE Digital
Signal Processing Workshop, 1996.
[4] C. E. Leiserson, F. Rose, and J. B. Saxe, “Optimizing synchronous circuitry
by retiming,” Proceedings of the 3rd Caltech Conference on VLSI, Pasadena,
CA, March 1983.
[5] J. Thomas, “Pipelined Systolic Architectures for DLMS Adaptive Filtering,”
Journal of VLSI Signal Processing, vol. 12, no. 3, June 1996.
[6] S. Haykin, Adaptive Filter Theory, Englewood Cliffs, NJ: Prentice-Hall,
Second Edition, 1991.
[7] T. H.-Y. Meng and D. G. Messerschmitt, “Arbitrarily high sampling rate
adaptive filters,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 35,
no. 4, pp. 455-470, Apr. 1987.
[8] H. Herzberg, R. Haimi-Cohen, and Y. Be’ery, “A systolic array realization of
an LMS adaptive filter and the effects of delayed adaptation,” IEEE Trans.
Signal Processing, vol. 40, no. 11, pp. 2799-2803, Nov. 1992.
[9] G. Long, F. Ling, and J. G. Proakis, “Corrections to ‘The LMS algorithm
with delayed coefficient adaptation’,” IEEE Trans. Acoust., Speech, Signal
Processing, vol. 40, no. 1, pp. 230-232, Jan. 1992.

Legend:

x(n): input sequence
y(n): output
p(n): desired signal
e: error between the desired signal and the output

Fig. 1 Adaptive system diagram.

Legend:

x(n): the filter input signal.
y(n): the filter output signal.
p(n): the desired filter output signal.
e(n): the estimation error signal between p(n) and y(n).
$w_0(n), w_1(n), \ldots, w_{N-1}(n)$: the N-tap coefficients at time n.
D: the delay element unit.
M: the number of delay units by which the error is delayed to update the
coefficients.
$\mu$: the step size parameter of the coefficient adaptation.

Fig. 2 The SFG for the DELMS adaptive FIR filter

Legend:

x(n): the reference signal.
p(n): the desired signal.
y(n): the output signal of the adaptive filter.

Fig. 3 A systolic array for the DELMS adaptive FIR filter

Fig. 4 The folded systolic array for DELMS algorithm

Fig. 5 The i-th processing element (PE) for the pipelined DELMS adaptive
filter in Fig. 4

Fig. 6 Chip Layout View

Die size             1720 × 1720 μm²
Total area with      3120 × 3120 μm²
Pin count            40
Transistor gate      6950
Clock rate           23 MHz

Table 1: Chip Specification
