
CHAPTER 3

DISTRIBUTED ARITHMETIC FIR FILTER

Distributed Arithmetic (DA) is named so because the arithmetic operations that appear in signal processing (addition, multiplication, etc.) are not lumped in a comfortably familiar fashion but are distributed in an often unrecognizable fashion.


The most often encountered form of computation in digital signal processing is
the sum of products, also known as the vector dot product, inner product or
multiply-accumulate (MAC). The MAC operation is common to almost all digital signal
processing algorithms. Distributed arithmetic is a bit-level rearrangement of the
multiply-accumulate operation that hides the multiplications. It is a powerful
technique for reducing the size of a parallel hardware multiply-accumulate unit and
is well suited to FPGA designs. It can also be extended to other sum functions such
as complex multiplies, Fourier transforms and so on. The advantages of DA are best
exploited in data-path circuit design. Area savings from using DA can be up to 80%
and are seldom less than 50% in digital signal processing hardware designs. It is an
older technique that has been revived by the widespread use of FPGAs and DSPs.
One possible multiplierless filter approach is distributed arithmetic (DA), which
uses precalculated sub-results stored in a look-up table (LUT) to eliminate the need
for a multiplier in the filter. This chapter describes the principle of the DA
approach and how it can be used to implement an FIR filter.
3.1 DISTRIBUTED ARITHMETIC TECHNIQUE


DA is expected to have several advantages compared to normal multiplier-accumulate
structures, e.g. high throughput and lower energy consumption. The energy
consumption is reduced as the switching activity introduced by memory fetches is
also reduced. The principle of DA takes its outset in the general description of an
FIR filter as a sum of products:
$$y = \sum_{k=0}^{N-1} h[k]\,x[k] \tag{3.1}$$

If the coefficients h[k] are known a priori, then the partial product term h[k] x[k]
becomes a multiplication by a constant. This characteristic makes it possible to
use the distributed arithmetic technique.
To understand this paradigm, let us start by unfolding the FIR equation:
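$$y = \sum_{k=0}^{N-1} h[k]\,x[k] = h[0]x[0] + h[1]x[1] + h[2]x[2] + \cdots + h[N-1]x[N-1]$$

where each input sample is written in terms of its bits $x_b[k] \in \{0, 1\}$:

$$x[k] = \sum_{b=0}^{B-1} x_b[k]\,2^b \tag{3.2}$$

Then

$$y = \sum_{k=0}^{N-1} \left( h[k] \sum_{b=0}^{B-1} x_b[k]\,2^b \right) \tag{3.3}$$

$$\begin{aligned}
y ={} & h[0]\left(x_{B-1}[0]\,2^{B-1} + x_{B-2}[0]\,2^{B-2} + \cdots + x_0[0]\,2^0\right) \\
 & + h[1]\left(x_{B-1}[1]\,2^{B-1} + x_{B-2}[1]\,2^{B-2} + \cdots + x_0[1]\,2^0\right) \\
 & \;\;\vdots \\
 & + h[N-1]\left(x_{B-1}[N-1]\,2^{B-1} + x_{B-2}[N-1]\,2^{B-2} + \cdots + x_0[N-1]\,2^0\right)
\end{aligned} \tag{3.4}$$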

Now, the summation is redistributed as follows:


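$$\begin{aligned}
y ={} & \left(h[0]\,x_{B-1}[0] + h[1]\,x_{B-1}[1] + \cdots + h[N-1]\,x_{B-1}[N-1]\right) 2^{B-1} \\
 & + \left(h[0]\,x_{B-2}[0] + h[1]\,x_{B-2}[1] + \cdots + h[N-1]\,x_{B-2}[N-1]\right) 2^{B-2} \\
 & \;\;\vdots \\
 & + \left(h[0]\,x_0[0] + h[1]\,x_0[1] + \cdots + h[N-1]\,x_0[N-1]\right) 2^0 \\
={} & \sum_{b=0}^{B-1} \left( 2^b \sum_{k=0}^{N-1} h[k]\,x_b[k] \right) \\
={} & \sum_{b=0}^{B-1} \left( 2^b\, f(h[k], x_b[k]) \right)
\end{aligned}$$

where

$$f(h[k], x_b[k]) = \sum_{k=0}^{N-1} h[k]\,x_b[k], \qquad x_b = \left[\, x_b[0],\; x_b[1],\; \ldots,\; x_b[N-1] \,\right] \tag{3.5}$$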
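The redistribution in Eq. (3.5) can be checked numerically. The following is a minimal Python sketch, not a hardware description: the function name `da_dot_unsigned` and the sample values are assumptions chosen for illustration, and the inputs are taken to be unsigned `bits`-wide integers.

```python
def da_dot_unsigned(h, x, bits):
    """Evaluate sum(h[k] * x[k]) one bit plane at a time, following Eq. (3.5).

    h    : list of integer coefficients (illustrative values)
    x    : list of unsigned integer samples, each below 2**bits
    bits : number of bits B per sample
    """
    y = 0
    for b in range(bits):
        # f(h[k], x_b[k]): add h[k] wherever bit b of x[k] is 1.
        f_b = sum(hk for hk, xk in zip(h, x) if (xk >> b) & 1)
        # Weight the bit plane by 2^b and accumulate.
        y += f_b << b
    return y


h = [3, -1, 4, 2]   # hypothetical coefficients
x = [5, 9, 0, 15]   # hypothetical 4-bit unsigned samples
direct = sum(hk * xk for hk, xk in zip(h, x))
print(da_dot_unsigned(h, x, bits=4), direct)
```

No multiplication of a coefficient by a sample appears inside `da_dot_unsigned`; only additions and shifts are used, which is the property the rearrangement is meant to expose.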

3.1.1 SIGNED DA SYSTEM


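In 2's complement, the MSB tells the sign. Thus, we use the following (B+1)-bit
representation:

$$x[k] = -2^B\,x_B[k] + \sum_{b=0}^{B-1} x_b[k]\,2^b$$

The inner product y now becomes

$$y = \sum_{k=0}^{N-1} \left( h[k] \left( -2^B\,x_B[k] + \sum_{b=0}^{B-1} x_b[k]\,2^b \right) \right)$$

Using a similar procedure as in the previous case, the inner product results in

$$\begin{aligned}
y &= -2^B \sum_{k=0}^{N-1} h[k]\,x_B[k] + \sum_{b=0}^{B-1} \left( 2^b \sum_{k=0}^{N-1} h[k]\,x_b[k] \right) \\
  &= -2^B\, f(h[k], x_B[k]) + \sum_{b=0}^{B-1} \left( 2^b\, f(h[k], x_b[k]) \right)
\end{aligned}$$

Preferred implementation of $f(h[k], x_b[k])$: a $2^N$-word LUT is preprogrammed
to accept an N-bit input vector $x_b = [\,x_b[0],\; x_b[1],\; \ldots,\; x_b[N-1]\,]$ and
to produce the output $f(h[k], x_b[k])$. Then, each $f(h[k], x_b[k])$ is weighted by
$2^b$ and finally all of them are accumulated.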
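A small Python sketch of the signed case is given below. It is an illustration under stated assumptions, not the implementation used in this work: the names `build_lut` and `da_dot_signed` are hypothetical, and bit k of the LUT address is assumed to select h[k].

```python
def build_lut(h):
    """Return the 2^N-word table; bit k of the address selects h[k] (assumed order)."""
    n = len(h)
    return [sum(h[k] for k in range(n) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_dot_signed(h, x, B):
    """Inner product of h with (B+1)-bit two's-complement samples x."""
    lut = build_lut(h)
    mask = (1 << (B + 1)) - 1
    words = [xk & mask for xk in x]   # two's-complement bit patterns
    y = 0
    for b in range(B + 1):
        # Form the N-bit address from bit b of every sample.
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> b) & 1) << k
        term = lut[addr] << b         # weight by 2^b
        # The sign-bit plane (b == B) carries weight -2^B, so it is subtracted.
        y += -term if b == B else term
    return y


h = [2, -3, 5, 1]    # hypothetical coefficients
x = [-8, 7, -1, 3]   # hypothetical samples in the 4-bit range -8..7 (B = 3)
print(da_dot_signed(h, x, B=3), sum(hk * xk for hk, xk in zip(h, x)))
```

The only difference from the unsigned case is the sign of the last bit plane, which matches the $-2^B$ term in the signed expression.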

Figure 3.1 $2^N$-word LUT

3.2 FIR FILTER REALIZATION USING DA


The DA implementation of an FIR filter consists of a look-up table (LUT), shift
registers and a scaling accumulator. In DA, all the cumulative partial product
outcomes are precomputed and stored in a LUT which is addressed by the multiplier
bits. For a filter with N coefficients, the LUT has $2^N$ values.
We make use of a shift-adder as shown in Fig. 3.2:
1. A vector $x_b = [\,x_b[0],\; x_b[1],\; \ldots,\; x_b[N-1]\,]$ is fed into the
$2^N$-word LUT at each clock cycle.
2. Instead of shifting each intermediate value $f(h[k], x_b[k])$ by b bits (which
demands an expensive barrel shifter), it is more appropriate to shift the
accumulator content itself one bit to the right in each iteration.
3. The adder unit includes an add/sub control so that when b = B, it subtracts
$f(h[k], x_B[k])$ from the current result.
4. The shift-adder implementation requires N shift registers, each of length
B+1 bits.
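The four points above can be sketched in Python as follows. This is a behavioural illustration only. It assumes an integer accumulator that keeps B+1 extra low-order bits so that the right shift never discards information; the name `shift_add_da` and the address bit order (bit k selects h[k]) are likewise assumptions.

```python
def shift_add_da(h, x, B):
    """Bit-serial DA with a right-shifting accumulator.

    h : integer coefficients
    x : samples in (B+1)-bit two's complement range
    """
    n = len(h)
    width = B + 1
    lut = [sum(h[k] for k in range(n) if (addr >> k) & 1)
           for addr in range(1 << n)]
    # N shift registers of B+1 bits, loaded with the two's-complement patterns.
    regs = [xk & ((1 << width) - 1) for xk in x]
    acc = 0
    for b in range(width):
        # The LSB of every register forms the LUT address, then each register shifts.
        addr = 0
        for k in range(n):
            addr |= (regs[k] & 1) << k
            regs[k] >>= 1
        f = lut[addr]
        if b == B:
            f = -f              # add/sub control: subtract on the sign-bit cycle
        acc += f << width       # add the LUT word at the top of the accumulator
        acc >>= 1               # shift the accumulator one bit to the right
    return acc


print(shift_add_da([2, -3, 5, 1], [-8, 7, -1, 3], B=3))
```

Each LUT word enters at a fixed position and is then moved right once per remaining cycle, so after B+1 cycles the word from cycle b ends up weighted by $2^b$ without any barrel shifter.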
From equation 3.5, $f(h[k], x_b[k]) = \sum_{k=0}^{N-1} h[k]\,x_b[k]$ has only $2^N$
possible values. Hence it can be precalculated for all values and stored in a
look-up table of $2^N$ words addressed by N bits. For example, if the number of
inputs is 4, then the LUT will have $2^4 = 16$ memory words.

Figure 3.2 DA Architecture



Each product term consists of a variable (signal) and a constant (coefficient),
both in fixed-point binary format but not necessarily of the same word length. Rather
than computing the product on a term-by-term basis, the partial products of all terms
are computed simultaneously. Because each input bit is either 0 or 1, the partial
product of a term for a given bit is either zero or the filter coefficient itself.
These partial products of all terms are accumulated on a bit-by-bit basis. Finally,
the cumulative partial products of all bit positions are added to produce the result.
In DA, all the cumulative partial product outcomes are precomputed and
stored in a look-up table which is addressed by the multiplier bits. The bits of all
input variables are presented simultaneously, bit-serially and least significant bit
first, to address the LUT; the LUT output is added to the accumulated partial
products. The complete dot product computation takes B clocks, where B is the number
of input variable bits, and is independent of the number of input variables. During
the first iteration, the least significant bits $x_0[n], x_0[n-1], \ldots$ of the N
input samples form an N-bit address to the look-up table for f(x, 0), and the table's
output becomes the initial value of the accumulator. During the second iteration, the
next-to-least significant bits $x_1[n], x_1[n-1], \ldots$ of the N input samples form
another N-bit address to the look-up table for f(x, 1), and the adder sums the
look-up table output with the contents of the accumulator shifted by one bit. This
process continues until the last iteration, where the most significant bits
$x_{B-1}[n], x_{B-1}[n-1], \ldots$ of the N input samples form an N-bit address to
the look-up table for f(x, B-1), and the adder sums the look-up table output with the
contents of the accumulator after shifting it to the corresponding position.
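The iteration described above can be applied to a stream of samples to form a complete filter. The sketch below is a hedged Python illustration: `da_fir` is a hypothetical name, the samples are assumed to be unsigned and `bits` wide, and bit k of the LUT address is assumed to come from the sample delayed by k.

```python
def da_fir(h, samples, bits):
    """Filter unsigned `bits`-wide samples with coefficients h using one DA LUT."""
    taps = len(h)
    lut = [sum(h[k] for k in range(taps) if (addr >> k) & 1)
           for addr in range(1 << taps)]
    delay = [0] * taps                 # delay[k] holds x[n-k]
    out = []
    for s in samples:
        delay = [s] + delay[:-1]       # shift the new sample into the delay line
        acc = 0
        for b in range(bits):          # one LUT access per input bit, LSB first
            addr = 0
            for k in range(taps):
                addr |= ((delay[k] >> b) & 1) << k
            acc += lut[addr] << b
        out.append(acc)
    return out


# An impulse followed by a later sample; the coefficients are illustrative.
print(da_fir([1, 2, 3, 4], [1, 0, 0, 0, 5], bits=4))
```

The inner loop over `b` runs `bits` times per output sample regardless of how many taps the filter has, which mirrors the statement that the computation time is independent of the number of input variables. In hardware the address bits are formed in parallel rather than by the loop over `k`.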

3.3 DA TECHNIQUE FOR 3RD ORDER FIR FILTER


The 3rd order FIR filter has a tap length of four. The equation for the 3rd order
FIR filter is

$$y[n] = \sum_{k=0}^{3} h[k]\,x[n-k]$$

$$y[n] = h[0]\,x[n] + h[1]\,x[n-1] + h[2]\,x[n-2] + h[3]\,x[n-3]$$

The number of coefficients and inputs is as follows:

Coefficients = 4
No. of inputs = 4
LUT size = $2^4$ = 16 memory locations
In this method, the possible outputs are precomputed and stored in the LUT. The LUT
is addressed by the input bits of the filter. Each location holds the output for
the corresponding input combination. The possible inputs for this filter range from
0 (0000) to 15 (1111). For each input, the computation of the output is made easy by
this technique.
If Input = 1011, then
Output = 1·h0 + 0·h1 + 1·h2 + 1·h3
= h0 + h2 + h3
If Input = 1111, then
Output = h0 + h1 + h2 + h3
If Input = 0101, then
Output = h1 + h3
If Input = 1010, then
Output = h0 + h2
Each output is the sum of the coefficients whose address bits are high (1). We can
easily find the 16 outputs for the corresponding inputs without any multiplication.
Table 3.1 shows the content of the LUT for the 3rd order filter.
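The address-to-sum mapping just described can be generated mechanically. The short Python sketch below lists the symbolic LUT contents for a 4-tap filter; the helper name `lut_entry` is an assumption, and the leftmost address bit is taken to select h0, as in the examples above.

```python
def lut_entry(address):
    """Symbolic LUT word for an address string whose leftmost bit selects h0."""
    names = [f"h{k}" for k, bit in enumerate(address) if bit == "1"]
    return " + ".join(names) if names else "0"


# Print all 16 address/data pairs for a 4-tap filter.
for a in range(16):
    addr = format(a, "04b")
    print(addr, lut_entry(addr))
```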
For example,
Input = x[0], x[1], x[2], x[3]
x[0] = 1011 = 11
x[1] = 1101 = 13
x[2] = 1010 = 10
x[3] = 1001 = 9
h[0] = h[1] = h[2] = h[3] = 1

Table 3.1. LUT table for 3rd order filter

Address    Data
0000       0
0001       h3
0010       h2
0011       h2 + h3
0100       h1
0101       h1 + h3
0110       h1 + h2
0111       h1 + h2 + h3
1000       h0
1001       h0 + h3
1010       h0 + h2
1011       h0 + h2 + h3
1100       h0 + h1
1101       h0 + h1 + h3
1110       h0 + h1 + h2
1111       h0 + h1 + h2 + h3

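Step 1:
Store the values in the input buffer. Each row collects one bit position of the
four samples, starting with the least significant bit; these rows are the LUT
addresses.
$x_0[0]\; x_0[1]\; x_0[2]\; x_0[3]$ = 1101
$x_1[0]\; x_1[1]\; x_1[2]\; x_1[3]$ = 1010
$x_2[0]\; x_2[1]\; x_2[2]\; x_2[3]$ = 0100
$x_3[0]\; x_3[1]\; x_3[2]\; x_3[3]$ = 1111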

Step 2:
Read the values from the LUT for the corresponding values in the buffer.
Output of LUT:
O[0] = 0011 = 3
O[1] = 0010 = 2
O[2] = 0001 = 1
O[3] = 0100 = 4

Step 3:
Multiplying a value by 2 corresponds to a left shift by one bit.
Output = O[0] + (O[1] shifted left once) + (O[2] shifted left 2 times)
+ (O[3] shifted left 3 times).
Output = 3 + 4 + 4 + 32 = 43.
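The three steps can be reproduced with a few lines of Python. This is only a check of the arithmetic in the example, using the inputs and unit coefficients given there; the variable names (`planes`, `lut_outputs`) are illustrative.

```python
x = [0b1011, 0b1101, 0b1010, 0b1001]   # x[0]..x[3] = 11, 13, 10, 9
h = [1, 1, 1, 1]                       # h[0]..h[3]

# Step 1: one buffer row per bit position, LSB row first.
planes = [[(xk >> b) & 1 for xk in x] for b in range(4)]

# Step 2: LUT read, i.e. the sum of the coefficients selected by each row.
lut_outputs = [sum(hk for hk, bit in zip(h, row) if bit) for row in planes]

# Step 3: shift each LUT output left by its bit position and add.
result = sum(o << b for b, o in enumerate(lut_outputs))

print(planes[0], lut_outputs, result)
```

The result agrees with the direct sum 11 + 13 + 10 + 9, since all coefficients are 1.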
Disadvantage:
A filter with N coefficients requires a LUT with $2^N$ values. For higher-order
filters, the LUT size grows exponentially and requires more memory space.
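To give a feeling for this growth, the following Python sketch prints the number of LUT words for a few filter lengths. The chosen lengths and the helper name `lut_words` are illustrative assumptions.

```python
def lut_words(num_taps):
    """Number of LUT words needed for a single DA table with num_taps coefficients."""
    return 1 << num_taps


for n in (4, 8, 16, 32):
    print(f"N = {n:2d}: {lut_words(n)} words")
```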
