
DSP C5000

Chapter 14
Finite Impulse Response (FIR)
Filter Implementation

Copyright 2003 Texas Instruments. All rights reserved.

Outline
Digital Filters and FIR filters
Implementation of FIR Filters on C54x
Implementation of FIR Filters on C55x
Comparison of C54x and C55x
ESIEE, Slide 2


Outline of FIR Filters

Generalities on Digital Filters
FIR Filters with Matlab
Implementation of FIR Filters

Digital Filters

[Block diagram: x(t) -> analog antialiasing filter -> ADC (sampling frequency fS) -> xn -> digital filter -> yn -> DAC -> analog smoothing filter -> y(t)]

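Linear, Time-Invariant Digital Systems

Linearity: for all $\alpha_1 \in \mathbb{R}$ and $\alpha_2 \in \mathbb{R}$,
$\alpha_1 x_1(n) + \alpha_2 x_2(n) \rightarrow \alpha_1 y_1(n) + \alpha_2 y_2(n)$

Time invariance:
$x(n) \rightarrow y(n) \Rightarrow x(n - n_0) \rightarrow y(n - n_0)$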

Impulse Response

Impulse sequence $u_n$: $u_0 = 1$ and $u_n = 0$ for $n \neq 0$.

[Figure: the impulse $u_n$, located at n=0, is applied to the digital filter; the output is the impulse response $h_n$.]

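Input-Output Relationship, Convolution

Any input sequence is a sum of shifted, weighted impulses:
$x_n = \sum_{k} x_k u_{n-k}$

[Figure: $x_n$ decomposed into the components $x_{-1} u_{n+1}$, $x_0 u_n$, $x_1 u_{n-1}$, $x_2 u_{n-2}$, each plotted from n=-1.]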

Input-Output Relationship, Convolution

Using linearity and time invariance:
$y_n = \sum_{k} x_k \, \mathrm{output}(u_{n-k}) = \sum_{k} x_k h_{n-k}$

$y_n = \sum_{k} x_k h_{n-k} = \sum_{k} h_k x_{n-k}$
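The convolution sum above can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function name `convolve` and the test sequences are illustrative assumptions, and the sequences are taken to be zero outside their stored samples.

```python
def convolve(x, h):
    """Direct convolution y[n] = sum_k h[k] * x[n-k] for finite sequences.

    Samples outside the stored range of x are treated as zero.
    """
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                y[n] += h[k] * x[n - k]
    return y


# Feeding the impulse sequence u_n returns the impulse response h_n.
impulse = [1.0, 0.0, 0.0, 0.0]
h = [0.5, 0.25, 0.125]          # hypothetical impulse response
response = convolve(impulse, h)
```

Applying the impulse returns the coefficients of `h` followed by zeros, which is the definition of the impulse response used on the earlier slide.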

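Output for a Single Frequency Input

Single frequency input -> single frequency output ($T_e$ is the sampling period):

$x_n = e^{j \omega_0 n T_e}$
$y_n = x_n H(\omega_0)$
$H(\omega_0) = \sum_{k} h_k e^{-j \omega_0 k T_e}$
$H(\omega_0) = |H(\omega_0)| e^{j \arg(H(\omega_0))} = A(\omega_0) e^{j \varphi(\omega_0)}$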

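Frequency Transfer Function

For a digital filter the frequency transfer function is periodic.

$H(\omega) = |H(\omega)| e^{j \arg(H(\omega))} = A(\omega) e^{j \varphi(\omega)}$
$h_n = \frac{1}{2 \pi f_e} \int_{-\pi f_e}^{\pi f_e} H(\omega) e^{j \omega n T_e} \, d\omega$

Amplitude: $A(\omega) = |H(\omega)|$
Phase: $\varphi(\omega) = \arg H(\omega)$
Group delay: $\tau(\omega) = -\frac{d\varphi(\omega)}{d\omega}$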

Relationship Between Fourier Transforms of Input and Output

$X(\omega) = \sum_{n} x_n e^{-j \omega n T_e}$
$Y(\omega) = \sum_{n} y_n e^{-j \omega n T_e}$
$Y(\omega) = H(\omega) X(\omega)$

Z Transfer Function

$H(z) = \sum_{n} h_n z^{-n}$
$H(\omega) = \sum_{n} h_n e^{-j \omega n T_e} = H(z) \big|_{z = e^{j \omega T_e}}$
$Y(z) = X(z) H(z)$

Basic Relationships of a Digital Filter

$y_n = \sum_{k} x_k h_{n-k} = \sum_{k} h_k x_{n-k}$
$Y(\omega) = H(\omega) X(\omega)$
$Y(z) = X(z) H(z)$

Rational z Transfer Function

$H(z) = \frac{N(z)}{D(z)} = \frac{\sum_{i=0}^{Q} b_i z^{-i}}{1 + \sum_{k=1}^{P} a_k z^{-k}}$

Linear equation with constant coefficients:

$y_n = \sum_{i=0}^{Q} b_i x_{n-i} - \sum_{k=1}^{P} a_k y_{n-k}$

IIR and FIR Filters

IIR = Infinite Impulse Response
FIR = Finite Impulse Response

IIR: $H(z) = \frac{N(z)}{D(z)}$

FIR: same form with $D(z)$ constant.
$H(z) = \sum_{i=0}^{Q} b_i z^{-i} = \sum_{n} h_n z^{-n}$
$h_n = 0$ for $n \notin [0, Q]$
$h_n = b_n$ for $n \in [0, Q]$

FIR and IIR

FIR: output $y_n$ is a linear combination of a finite number of input samples.

$y_n = \sum_{i=0}^{Q} h_i x_{n-i} = \sum_{i=0}^{Q} b_i x_{n-i}, \quad b_i = h_i$

IIR: output $y_n$ is a linear combination of a finite number of input and of output samples. Recursive form.

$y_n = \sum_{i=0}^{Q} b_i x_{n-i} - \sum_{k=1}^{P} a_k y_{n-k}$
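As an illustration of the two recursions, here is a small Python sketch (not from the original slides). It assumes the sign convention D(z) = 1 + sum of a_k z^-k, so the feedback terms are subtracted; the function name and the example coefficients are hypothetical.

```python
def difference_equation(b, a, x):
    """Evaluate y[n] = sum_i b[i]*x[n-i] - sum_k a[k]*y[n-k].

    `a` holds a_1..a_P only (a_0 = 1 is implied). An empty `a` gives an FIR.
    Samples before n = 0 are taken as zero.
    """
    y = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)
        acc -= sum(a[k - 1] * y[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
        y.append(acc)
    return y


impulse = [1.0] + [0.0] * 5
fir_h = difference_equation([1.0, 2.0, 3.0], [], impulse)   # finite response
iir_h = difference_equation([1.0], [-0.5], impulse)         # y[n] = x[n] + 0.5*y[n-1]
```

The FIR response stops after Q+1 samples, while the recursive (IIR) response keeps decaying and never becomes exactly zero.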

Causality and Stability

A filter is causal if $h_n = 0$ for $n < 0$.
A filter is stable if the output is bounded for any bounded input.
Condition for stability is:
  All the poles of H(z) are inside the unit circle
  Or: $\sum_{n} |h_n| < \infty$
FIR filters are always stable.

Representation of Poles and Zeroes of H(z) in the Complex Plane

[Figure: pole-zero plot in the complex plane; axes: real part and imaginary part, each from -1 to 1.]

Some Useful Matlab Functions

Example for a FIR filter:
  $N(z) = b_0 + b_1 z^{-1} + b_2 z^{-2} + b_3 z^{-3}$
  b = [b0 b1 b2 b3] = [1 1 1 1].

Enter the filter coefficients vector b:
  b=[1 1 1 1]; a=1;

Calculate transfer function Hf, its amplitude and phase on 256 samples, with fs=1:
  [Hf,f]=freqz(b,a,256,1);
  HfA=abs(Hf);
  Hfphi=angle(Hf);

Some Useful Matlab Functions

Plot impulse response: stem(b)
Plot amplitude and phase of transfer function: plot(f,HfA) and plot(f,Hfphi)

[Figures: amplitude of the transfer function and phase of the transfer function, each plotted against frequency from 0 to 0.5 (FS=1).]
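For readers without Matlab, the frequency response of the 4-tap filter b = [1 1 1 1] can be evaluated directly from the definition of H. The Python sketch below is an illustration added here, not part of the slides; `freq_response` is a hypothetical helper, not a port of `freqz`.

```python
import cmath
import math


def freq_response(b, f, fs=1.0):
    """H(f) = sum_k b[k] * exp(-j*2*pi*f*k/fs) for an FIR with coefficients b."""
    w_te = 2.0 * math.pi * f / fs
    return sum(bk * cmath.exp(-1j * w_te * k) for k, bk in enumerate(b))


b = [1, 1, 1, 1]
dc_gain = abs(freq_response(b, 0.0))        # all coefficients add up
gain_quarter = abs(freq_response(b, 0.25))  # terms 1, -j, -1, +j cancel
gain_tenth = abs(freq_response(b, 0.1))
```

At f = 0.25 the four terms cancel, so this filter has a zero at a quarter of the sampling frequency, while f = 0.1 stays in the passband.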

Some Useful Matlab Functions

Generate a test signal = sum of cosines:
  x=cos(2*pi*[0:99]*0.25)+2*cos(2*pi*[0:99]*0.1);
Apply the filter to x. Output is y:
  y=filter(b,a,x);
Plot the results: plot(x); plot(y)

x is the sum of 2 frequencies: 0.25 and 0.1.
The filter cancels the frequency 0.25, so y has only the frequency 0.1.

[Figures: input x and output y plotted against time, samples 0 to 100.]

Calculation of a FIR using Matlab

For given attenuation and frequency response characteristics, the transfer function can be calculated using different methods:
  Mean square error, minimax (Chebyshev)
  Empirical window method

Corresponding Matlab functions:
  firls and remez.
  fir2 and fir1.

Example using Matlab

Design a low pass filter:
  Sampling frequency = 9600 Hz
  Maximum attenuation (passband) = 0.1 dB
  Minimum attenuation (stopband) = 50 dB
  Limit frequencies of passband and stopband = 1200 Hz and 2600 Hz.

[Figure: attenuation template in dB against f in Hz, with band edges at 1200 Hz and 2600 Hz.]

Example using Matlab

Vector of band-edge frequencies (normalized to half the sampling frequency):
  F=[0 1200 2600 4800]/4800;
Vector of required amplitudes:
  A=[1 1 0 0];
Least square calculation of filter:
  Bls=firls(23,F,A);
Minimax calculation of filter:
  Bre=remez(21,F,A);
Window method (Hamming):
  Bwin=fir1(25,(1200+2600)/9600);

Results of Matlab Example

The minimum orders to satisfy the constraints are 23 for LS, 21 for minimax and 25 for the window method.

[Figure: responses obtained with the least square method, the window method and the minimax method, plotted against frequency from 0 to 5000 Hz.]

Results of Matlab Example

[Figure: impulse response hn, plotted for n up to 25, with values between -0.1 and 0.4.]

FIR Filters with Constant Group Delay or Linear Phase

For many applications, it is desirable to use a filter with a constant group delay (independent of the frequency).
  The phase will be linear or affine.
2 possible cases: symmetrical or antisymmetrical FIR.
  Constant group delay = TS (N-1)/2
  Symmetrical: h(n)=h(N-1-n)
  Antisymmetrical: h(n)=-h(N-1-n)

FIR Filters with Constant Group Delay or Linear Phase

Symmetrical case: linear phase
  $\varphi(f) = k f$

Antisymmetrical case: affine phase
  $\varphi(f) = k f + \frac{\pi}{2}$

Fixed Point Implementation of FIR Filters, Numerical Issues

Fixed point implementation:
  16 bits for data and coefficients
  Accumulators have size 40 bits
Fixed point representation of data:
  Size B = 16 bits, Format Qk: k fractional bits
Quantization of coefficients:
  Maximum magnitude coefficient = hmax
  Number of bits of the integer part of coefficients is Bi: Bi = log2(hmax)
  Coefficients in Qk with k = 16-Bi

Matlab Example

The coefficients Bre can be quantized using 16-bit fixed point with 15 fractional bits:
  Ba=round(Bre*2^15);

To store the result in a text file for CCS:
  fp=fopen('coef.asm','wt');
  for i=1:22
    fprintf(fp,' .word %d \n',Ba(i));
  end
  fclose(fp);

Matlab Example

File coef.asm. Can be edited to be used with CCS.

 .word 39
 .word -92
 .word -242
 .word 25
 .word 668
 .word 579
 .word -978
 .word -2229
 .word 86
 .word 6374
 .word 12127
 .word 12127
 .word 6374
 .word 86
 .word -2229
 .word -978
 .word 579
 .word 668
 .word 25
 .word -242
 .word -92
 .word 39
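The quantization step can also be sketched outside Matlab. The Python fragment below is an illustration only: the helper names and the example coefficients are hypothetical, the saturation to the 16-bit range is an added safeguard that the slides do not mention, and Python's `round` resolves exact ties differently from Matlab's.

```python
def to_q15(coeffs):
    """Quantize real coefficients to Q15 integers (15 fractional bits)."""
    quantized = []
    for c in coeffs:
        v = int(round(c * 2 ** 15))
        # Saturate to the signed 16-bit range (assumption, not in the slides).
        v = max(-32768, min(32767, v))
        quantized.append(v)
    return quantized


def word_lines(values):
    """Format integers as assembler .word directives, one per line."""
    return [" .word %d" % v for v in values]


example = [0.37, -0.068, 0.0012]   # hypothetical coefficients
lines = word_lines(to_q15(example))
```

Each entry of `lines` has the same shape as a line of coef.asm; a coefficient of magnitude 1.0 or more would not fit in Q15, which is why the saturation is included.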

FIR Implementation, Numerical Issues, FRCT bit

Common case:
  Data and coefficients in Q15 format
  Product h(i)x(n-i) in Q30 (2 sign bits)
  By shifting products 1 bit left, the products are in Q31 format with only 1 sign bit.
  If the FRCT (fractional mode) bit is set to 1, products are automatically shifted 1 bit left.
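The effect of the FRCT shift on a single product can be illustrated with integer arithmetic. This Python sketch is an added example, not TI code; the function names are hypothetical and Python integers stand in for the 40-bit accumulator.

```python
def q15_product(h, x, frct=True):
    """Multiply two Q15 integers; with frct the Q30 product is shifted to Q31."""
    product = h * x          # Q30: two sign bits
    if frct:
        product <<= 1        # Q31: one sign bit
    return product


def high_word(acc):
    """Upper 16 bits of the 32-bit result, as stored by a high-part store."""
    return acc >> 16


half = 0x4000                                  # 0.5 in Q15
with_frct = high_word(q15_product(half, half, frct=True))
without_frct = high_word(q15_product(half, half, frct=False))
```

With FRCT set, the high word is 0x2000, which reads as 0.25 in Q15. Without it the high word is 0x1000, which is only correct if the result is interpreted as Q14.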

Structures for FIR Implementation

Common structures for FIR filters:
  Transversal structures
  Lattice structure: useful in some adaptive situations.
Transversal structures using:
  Linear buffers
  Circular buffers
Special case for symmetrical or antisymmetrical FIRs.

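Transversal Structures of FIR

[Diagram: structure with a delay line. The input xn passes through delays giving xn-1, xn-2, ..., xn-N+1; the taps are weighted by b0, b1, b2, b3, ..., bN-1 and summed to give yn.]

[Diagram: transposed structure. The input xn is weighted by bN-1, bN-2, ..., b3, b2, b1, b0 and the products are accumulated through the delay chain to give yn.]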

Implementation of a FIR with a Delay Line

Most common structure used in DSP.
The delay line can be implemented using a linear or a circular buffer.
Basic operations:
  Read a new data value x(n) every TS
  ACCU=0
  for i=0 to N-1:
    Multiply h(i) by x(n-i) and add it to accumulator
  Output y(n)

Implementation of FIR Filters on C54x

Implementation of general transversal FIR filters
  Using linear buffers
  Using circular buffers
Implementation of symmetrical FIR filters

Operations using a Linear Buffer for a FIR with N Coefficients

Length of the delay line = N samples
Read a new sample x(n) and store it in the delay line in the first position.
ACCU=0
for i=0 to N-1
  Read h(i) and x(n-i)
  Multiply h(i) by x(n-i) and add it to ACCU
Output y(n)
N-1 shifts in the delay line.

Linear Buffer, MACD Mode

Instead of shifting N-1 samples at the end, do the shift in the loop one by one.
Read a new sample xn and store it in the delay line in the first position.
ACCU=0
for i=N-1 to 0
  Read h(i) and x(n-i)
  Multiply h(i) by x(n-i) and add it to ACCU
  Shift x(n-i) in the delay line
Output y(n)
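The loop above, with the shift folded into each multiply-accumulate step, can be modelled in a few lines of Python. This is a behavioural sketch only, not C54x code; the function name is hypothetical and the delay line is assumed to have one extra word at the end to receive the copy of the oldest sample.

```python
def fir_macd(delay, coeffs, sample):
    """One output of an N-tap FIR with the shift done inside the MAC loop.

    delay[0] holds x(n), delay[N-1] holds x(n-N+1), delay[N] is a spare word.
    """
    n_taps = len(coeffs)
    delay[0] = sample                      # store the new sample first
    acc = 0
    for i in range(n_taps - 1, -1, -1):    # oldest sample first
        acc += coeffs[i] * delay[i]        # multiply-accumulate
        delay[i + 1] = delay[i]            # delay move: shift by one position
    return acc


coeffs = [1, 2, 3]
delay = [0] * (len(coeffs) + 1)
outputs = [fir_macd(delay, coeffs, s) for s in [1, 0, 0, 0]]
```

Processing the oldest sample first is what makes the in-loop shift safe: each word is read before the younger neighbour is copied over it.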

MACD Instruction

MACD: Multiply Accumulate and Delay move.
  MACD Smem,pmad,src
  src=src+Smem*pmad;
  T=Smem;
  (Smem+1)=Smem
If MACD is used in a loop with RPT, the program memory address (pmad) is automatically incremented.
MACD alone = 3 cycle times
In a RPT loop: 1 cycle time

Implementing a FIR with MACD

Memory organization of data and coefficients

Program Memory
  Address    Content
  i=pmad     b(N-1)
  i+1        b(N-2)
  i+2        b(N-3)
  ...
  i+N-1      b(0)

Data Memory
  Address    Content
  k=Smem     x(n)
  k+1        x(n-1)
  k+2        x(n-2)
  ...
  k+N-1      x(n-N+1)
  k+N        dummy place for copy of x(n-N+1)

Initialization of Registers

STM stores #value to the MMR early in the pipeline to avoid latencies.
  2 words, 2 cycles.
Initialization of FRCT bit (fractional mode):
  Instructions SSBX (Set Status Bit) and RSBX (Reset Status Bit).
Initialization of ACCU:
  Using RPTZ: RePeaT after initializing ACCU at 0
  Or via LD #0,A

RPT, RPTZ Instructions

RPT #n
  Repeat next instruction n+1 times.
  Repetition counter set to n and decreases until 0.
  1 or 2 cycles, not interruptible.
RPTZ src, #n
  Same as RPT, except that src ACCU is cleared to zero before repeat.
  2 cycles, not interruptible.
Some instructions execute faster when in repeat mode (pipeline).

Implementing a FIR Filter with MACD

                .bss    adr_debut_dat,N+1
adr_fin_dat     .set    adr_debut_dat+N-1
                .text
* Initialization of AR1 and FRCT
                STM     #adr_fin_dat, AR1
                SSBX    FRCT
* Filter loop
                RPTZ    A, #N-1
                MACD    *AR1-, adr_coef, A

Test with CCS
  Filter with N=32 coefficients all equal to 1/32
  Create a file fircoef.asm, address of coefficients in program mem = adr_coef

Implementing a FIR Filter with MACD

File containing coefficients fircoef.asm

                .global adr_coef
                .sect   ".coef"
adr_coef        .word   0X400, 0X400
                .word   0X400,0X400,0X400,0X400,0X400
                .word   0X400,0X400,0X400,0X400,0X400
                .word   0X400,0X400,0X400,0X400,0X400
                .word   0X400,0X400,0X400,0X400,0X400
                .word   0X400,0X400,0X400,0X400,0X400
                .word   0X400,0X400,0X400,0X400,0X400

Implementing a FIR Filter with MACD

File firmacd.asm with the program
2 files to compile and link: fircoef.asm and firmacd.asm
Test by associating files on the ports DRR0 and DXR0:
  File infir.dat attached to DRR0
  File outfir.dat attached to DXR0

Implementing a FIR Filter with MACD

Program file firmacd.asm: initializations

                .mmregs
                .global adr_debut_dat
                .global adr_fin_dat
                .global adr_coef
N               .set    32
                .bss    adr_debut_dat,N+1
adr_fin_dat     .set    adr_debut_dat+N-1

                .text
* Initialization of DP and FRCT
                LD      #0, DP
                SSBX    FRCT
* Initialization of AR0, AR1, AR2
                STM     #(adr_debut_dat),AR2
                STM     #(adr_debut_dat-1),AR1
                STM     #N, AR0

Implementing a FIR Filter with MACD

Program file firmacd.asm: endless loop

debut:
* set AR1 at adr_fin_dat
                MAR     *AR1+0
* Read x(n) at DRR
                LDM     DRR0, A
                STL     A,*AR2
* Filter loop
                RPTZ    A, #N-1
                MACD    *AR1-, adr_coef, A
* Write y(n) in DXR
* by saving the high part of ACCU in DXR
                STH     A,DXR0
* Go back to the beginning of the loop
                B       debut

See files firmacd.asm and fircoef.asm for the test in directory tutorial.

FIR with MACD, Test with CCS

Create project, create command file, compile and link.
To test the impulse response:
  Create a file infir.dat with:
    A value 0.5 (0x4000) then zeros (at least 40)
  Set 2 probe points:
    1 at reading of DRR: LDM DRR
    1 at end of loop: B debut
  Attach files to probe points:
    infir.dat at first probe point (read value stored at address 0x20 DRR)
    outfir.dat at second probe point (data at address 0x21 DXR is stored in the file)

Results

Let program run until end of file infir.dat
Load file outfir.dat at some address in the DSP data memory (File-Data-Load)
Plot the content of this memory area (View-Graph-Time/Frequency).
  Plot a time graph (Single Time)
  Plot a frequency graph (FFT: Magnitude and Phase)

Results for the impulse response and its FFT

[Figure: impulse response and its FFT as plotted in CCS.]

Second Test

New test with a sine input.
Replace infir.dat by file insinus.dat containing 80 samples of a sine with 40 samples per period of sine.
Name outsine.dat the result file.
Repeat the same operations as in the preceding test.

Second Test

Observe that the output is attenuated and is phase shifted by values corresponding to H(f) at f = fS/40.

Implementation using a Circular Buffer

A circular buffer of length N is a block of contiguous memory words addressed by a pointer using a modulo N addressing mode.
Characteristics of a circular buffer:
  The 2 extreme words of the memory block are considered as contiguous.
  Instead of moving the N data in memory, just modify the pointers.
  When a new data x(n) arrives, the pointer is incremented and the new data is written in place of the oldest one.

Trace of Memory and Pointer in a Circular Buffer of Length 3

  Word   Time n    Time n+1   Time n+2   Time n+3
  1      x(n-1)    x(n-1)     x(n+2)     x(n+2)
  2      x(n)      x(n)       x(n)       x(n+3)
  3      x(n-2)    x(n+1)     x(n+1)     x(n+1)

FIR with Circular Buffers

2 circular buffers:
  1 for data
  1 for coefficients

[Diagram: data memory buffer starting at adr_deb_data, addressed by pnt_data; coefficient memory buffer from adr_deb_coef to adr_fin_coef holding b(N-1), b(N-2), ..., b(0), addressed by pnt_coef.]

Operation of FIR with Circular Buffer

Read a new input sample x(n)
Store it at address of pnt_data
ACCU=0
for i=0 to N-1
  multiply data pointed by pnt_data by coefficient pointed by pnt_coef. Add product to ACCU
  decrement pointers pnt_data and pnt_coef
end
output y(n) from ACCU
increment pnt_data by 1
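The steps above can be modelled with modulo arithmetic. The Python class below is a behavioural sketch added for illustration, not C54x code; the class name is hypothetical, and for brevity the coefficients are indexed directly instead of through a second circular pointer.

```python
class CircularFir:
    """N-tap FIR whose delay line is a circular buffer of length N."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.n = len(self.coeffs)
        self.data = [0] * self.n
        self.pnt_data = 0                  # where the next sample is written

    def step(self, sample):
        self.data[self.pnt_data] = sample  # overwrite the oldest sample
        acc = 0
        p = self.pnt_data
        for i in range(self.n):
            acc += self.coeffs[i] * self.data[p]   # b(i) * x(n-i)
            p = (p - 1) % self.n                   # modulo-N decrement
        # After N decrements p is back at pnt_data; move on by one word.
        self.pnt_data = (self.pnt_data + 1) % self.n
        return acc


fir = CircularFir([1, 2, 3])
outputs = [fir.step(s) for s in [1, 0, 0, 0]]
```

No sample is ever moved: only the pointer changes, which is the advantage of the circular buffer over the shifting delay line.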

Instruction MAC with 2 Operands in Indirect Addressing Mode

MAC: Multiply and Accumulate
  MAC Xmem,Ymem,src[,dst]
  dst=src+Xmem*Ymem
  T=Xmem
  Can be executed in 1 cycle time.
Dual operand instructions: indirect addressing for Xmem, Ymem restricted to:
  AR2, AR3, AR4, AR5
  none, +, -, +0%

Circular Buffer with C54x

Circular indirect addressing mode:
  *ARi-%, *ARi+%, *ARi-0%, *ARi+0%, *ARi(lk)%
  In dual operand mode Xmem, Ymem:
    *ARi+0% only valid mode
    To perform a decrement, store a negative value in AR0.
BK register:
  Stores the size N of the circular buffer.
  Must be initialized before use.
  There may be several circular buffers at different addresses at the same time but with the same length.

Limitations on Start Addresses of Circular Buffers

If N is written on nb bits in binary, the start address must have its nb LSB at 0.
Examples:
  for N=32, 6 LSB of start address = 0
  for N=30, 5 LSB of start address = 0
To access a circular buffer:
  Initialize BK with N (nb bits)
  Choose 1 ARi as a pointer
  The effective start address of the buffer is the value in ARi with its nb LSB at 0.
  The end address = start address + N-1.
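The alignment rule can be computed rather than memorized. The following Python sketch is an added illustration with hypothetical function names; it derives nb from the binary length of N and masks the pointer accordingly.

```python
def circular_alignment(n):
    """Return (nb, alignment) for a circular buffer of length n.

    nb is the number of bits needed to write n in binary; the start address
    must be a multiple of 2**nb.
    """
    nb = n.bit_length()
    return nb, 1 << nb


def effective_start(ari, n):
    """Effective start address: the pointer value with its nb LSBs cleared."""
    mask = (1 << n.bit_length()) - 1
    return ari & ~mask


nb_32, align_32 = circular_alignment(32)   # 32 = 100000b needs 6 bits
nb_30, align_30 = circular_alignment(30)   # 30 = 11110b needs 5 bits
start = effective_start(0x0123, 32)
```

Note that N=32 needs a 64-word alignment because 32 itself takes 6 bits, whereas N=30 only needs a 32-word alignment.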

Circular Buffer on C54x

[Diagram: circular buffer in data memory for N=30 (BK = 11110 in binary). Start_address = xxxxxxxxxxx00000; ARi = xxxxxxxxxxx00010 points inside the buffer; End_address = start address + N-1 = xxxxxxxxxxx11101.]

Implementation of FIR Filter with 2 Circular Buffers

Same filter as in the preceding example, coefficients in section .coef (in program memory) in file fircoef.asm.
N=32
2 buffers are allocated in data memory for the coefficients and the data of the filter.
  Start addresses must be multiples of 64.
First step of program after initialization:
  Transfer coefficients from program to data memory, from adr_coef to adr_debut_coef.

Move Instructions

MVPD #pmad, Smem
  Copy values from program to data memory
  In RPT mode pmad is automatically incremented.

  Transfer              Instructions
  Program <-> Data      MVPD, MVDP, READA, WRITEA
  Data <-> Data         MVKD, MVDK, MVDD
  MMR <-> Data          MVMD, MVDM
  MMR <-> MMR           MVMM

Implementation of FIR with 2 Circular Buffers, Initializations

                .mmregs
                .global adr_debut_dat
                .global adr_fin_dat
                .global adr_debut_coef
                .global adr_fin_coef
                .global adr_coef

N               .set    32
adr_debut_dat   .usect  "buf_data", N
adr_debut_coef  .usect  "buf_coef", N
adr_fin_dat     .set    adr_debut_dat+N-1
adr_fin_coef    .set    adr_debut_coef+N-1

                .text
* Initialization of BK,AR0,FRCT
                STM     #N, BK
                STM     #-1, AR0
                SSBX    FRCT
* Initialization of AR2, AR3
                STM     #(adr_debut_dat),AR2
                STM     #(adr_fin_coef),AR3

Implementation of FIR with 2 Circular Buffers, Program

* Transfer of coefficients from
* program to data memory
                STM     #adr_debut_coef, AR4
                RPT     #N-1
                MVPD    adr_coef, *AR4+

* Endless loop
debut:
* Read x(n) at DRR
                LDM     DRR0, A
                STL     A, *AR2
* Calculation of y(n)
                RPTZ    A, #N-1
                MAC     *AR2+0%, *AR3+0%, A
* Write y(n) in DXR
* by saving high part of ACCU
                STH     A, DXR0
* Go back to the beginning of the loop
                MAR     *AR2+
                B       debut

See files fircirc.asm and fircoef.asm for the test.

Command File for Circular Buffer Addressing Constraint

The addresses adr_debut_dat and adr_debut_coef have to be aligned with a multiple of 64 in the example.
  adr_debut_dat is the start address of uninitialized section buf_data.
  adr_debut_coef is the start address of uninitialized section buf_coef.
  To align the 2 sections on a multiple of 64, in the command file add align(64) after the name of the sections in the SECTIONS directive, for example:
    buf_data : align(64) > DATA PAGE 1

Implementation of a Symmetrical FIR filter

The symmetry of coefficients is used to decrease the computational load:
  b(n)=b(N-1-n)
  N time cycles for a general FIR filter with N coefficients (in good conditions).
  N/2 time cycles for a symmetrical FIR filter.
  Use of specific instruction FIRS.

N even:
$y(n) = \sum_{i=0}^{\frac{N}{2}-1} b(i) \left[ x(n-i) + x(n-N+1+i) \right]$

N odd:
$y(n) = \sum_{i=0}^{\frac{N-1}{2}-1} b(i) \left[ x(n-i) + x(n-N+1+i) \right] + b\left(\frac{N-1}{2}\right) x\left(n - \frac{N-1}{2}\right)$
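The folded sums can be compared against the direct sum of products. The Python sketch below is an added illustration, not the FIRS implementation; the function names are hypothetical and `window[i]` is assumed to hold x(n-i).

```python
def fir_direct(b, window):
    """y(n) = sum_i b(i) * x(n-i), with window[i] = x(n-i)."""
    return sum(b[i] * window[i] for i in range(len(b)))


def fir_folded(b, window):
    """Same output for symmetric b, using one multiply per coefficient pair."""
    n = len(b)
    half = n // 2
    acc = sum(b[i] * (window[i] + window[n - 1 - i]) for i in range(half))
    if n % 2:                       # N odd: the centre tap has no partner
        acc += b[half] * window[half]
    return acc


b_even = [1, 2, 3, 3, 2, 1]
x_even = [5, -1, 4, 2, 7, 0]
b_odd = [1, 2, 3, 2, 1]
x_odd = [1, 2, 3, 4, 5]
```

The folded form needs about half as many multiplications, at the cost of one addition per pair before each multiply.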

FIRS Instruction to Work with RPT(Z)

FIRS Xmem,Ymem,pmad
Xmem, Ymem correspond to: x(n-i), x(n-N+1+i)
Coefficients in program memory at pmad
Operations of FIRS:
  pmad -> PAR
  while RC != 0
    B = B + A(32:16) x Pmem addressed by PAR
    A = (Xmem+Ymem)<<16
    PAR=PAR+1
    RC=RC-1

Using FIRS for a Symmetrical FIR Filter

3 arrays:
  N/2 first coefficients,
  N/2 newest data and N/2 oldest data.

[Diagram, example for N = 6. Program memory at adr_debut_coef (PAR): b(0), b(1), b(2). Data memory, 2 circular buffers: the buffer at adr_debut_dat0 (AR2) holds the newest data x(n), x(n-1), x(n-2); the buffer at adr_debut_dat1 (AR3) holds the oldest data x(n-3), x(n-4), x(n-5).]

Using FIRS for a Symmetrical FIR Filter

BK = N/2
At the beginning AR2 and AR3 point to:
  the newest data x(n)
  and the oldest data x(n-N+1)

[Diagram: contents of the two circular buffers (newest half starting with x(n), x(n-1); oldest half with x(n-N/2), ..., x(n-N+3), x(n-N+2), x(n-N+1)) and the positions of AR2 and AR3, shown at the beginning and after N/2 + 1 incrementations.]

Using FIRS for a Symmetrical FIR Filter

FIRS is repeated N/2 times
  The first sum x(n)+x(n-N+1) is done before entering the loop.
  N/2 iterations (AR2 and AR3 incremented by 1):
    At the first iteration AR2 points on x(n-1) and AR3 on x(n-N+2)
  After N/2 iterations: AR2 is decremented by 2 and AR3 by 1.
  The oldest sample x(n-N/2+1) of 1st buffer is stored in 2nd buffer in place of x(n-N+1).
  Then AR is incremented by 1.
  New sample x(n+1) is stored in place of x(n).

Symmetrical FIR Implementation with FIRS, Initializations

                .mmregs
                .global adr_debut_coef
                .global adr_debut_dat0
                .global adr_debut_dat1
N               .set    32
Nsur2           .set    16
adr_debut_coef  .set    adr_coef
adr_debut_dat0  .usect  "buf_data0", N
adr_debut_dat1  .usect  "buf_data1", N

                .text
* Initialization of BK, AR0,FRCT
                STM     #Nsur2, BK
                STM     #-2, AR0
                SSBX    FRCT
* Initialization of AR2, AR3
                STM     #(adr_debut_dat0),AR2
                STM     #(adr_debut_dat1),AR3

Symmetrical FIR Implementation using FIRS, Program

* Endless loop
debut:
* Read x(n) at DRR
                LDM     DRR0, A
                STL     A, *AR2
* Calculation of y(n)
* Calculation of the first sum
                ADD     *AR2+0%,*AR3+0%,A
* Repeat N/2 times FIRS
                RPTZ    B, #(Nsur2-1)
                FIRS    *AR2+0%, *AR3+0%, adr_coef
* Write y(n) at DXR
* by saving high part of ACCU in DXR
                STH     B, DXR0
* Transfer of the oldest value of first array
* to the oldest value of the 2nd array
                MAR     *+AR2(-2)%
                MAR     *AR3-%
                MVDD    *AR2, *AR3+0%
* Go back to the beginning of the loop
                B       debut

See files firsym.asm and fircoef.asm for the test.

Tutorial

The listing files for the preceding examples can be found in directory tutorial:
  Tutorial > Dsk5416 > Chapter 14 > Labs_fir

Implementation of FIR Filters on C55x

Implementation of block filters
Implementation of symmetrical or antisymmetrical FIR filters

Implementation of FIR Filters using C55x

2 MAC units accessed using 3 data buses D, B, C make it possible to:
  Calculate 2 output samples y at a time using same set of coefficients and different data x.
  Calculate 2 output samples y at a time using same input data x but 2 sets of coefficients.

[Diagram: the data read buses feed the two MAC units, which accumulate into AC0 and AC1.]

Using the 2 MAC Units

Use of block filtering in order to calculate 2 output samples at a time.

C55x (two outputs per pass, one per MAC unit):
  yn = b0 xn + b1 xn-1 + b2 xn-2 + b3 xn-3
  yn+1 = b0 xn+1 + b1 xn + b2 xn-1 + b3 xn-2
  MAC *AR2+, *CDP+, AC0 :: MAC *AR3+, *CDP+, AC1

C54x (one output per pass):
  yn = b0 xn + b1 xn-1 + b2 xn-2 + b3 xn-3
  MAC *AR2+, *AR3+, A

[Diagram: data read buses feeding the two MAC units, accumulating into AC0 and AC1.]

Block Filter

Calculate a block of M output samples:
  Avoids interrupts sample by sample
  Allows calculation of 2 samples at a time

$y_{n-m} = \sum_{i=0}^{N-1} b_i x_{n-m-i}, \quad m \in [0, M-1]$

M+N-1 inputs are necessary to calculate M output samples, because of the N-1 initial conditions.

Block Filter, example N=4, M=3

Coefficients (pointer CDP at b0): b0, b1, b2, b3
Input data (pointer AR2 at xn, pointer AR3 at xn-1): xn, xn-1, xn-2, xn-3, xn-4, xn-5

yn = b0xn+b1xn-1+b2xn-2+b3xn-3
yn-1 = b0xn-1+b1xn-2+b2xn-3+b3xn-4
yn-2 = b0xn-2+b1xn-3+b2xn-4+b3xn-5
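The block computation, two outputs per pass over the coefficients, can be sketched in Python. This is an added behavioural model, not C55x code: the function name is hypothetical, the input block is assumed to be stored in time order (oldest sample first), and the last output of an odd-sized block is computed on its own.

```python
def block_fir(b, x):
    """Compute the M = len(x) - N + 1 outputs of an N-tap FIR over block x.

    x is in time order (oldest first). Outputs are produced in pairs so that
    each coefficient fetch feeds two accumulators, as with a dual MAC.
    """
    n_taps = len(b)
    m_out = len(x) - n_taps + 1
    y = [0] * m_out
    m = 0
    while m + 1 < m_out:
        ac0 = ac1 = 0
        for i, coef in enumerate(b):
            ac0 += coef * x[m + n_taps - 1 - i]    # first MAC unit
            ac1 += coef * x[m + n_taps - i]        # second MAC, one sample later
        y[m], y[m + 1] = ac0, ac1
        m += 2
    if m < m_out:                                  # odd M: one output left
        y[m] = sum(coef * x[m + n_taps - 1 - i] for i, coef in enumerate(b))
    return y


b = [1, 2, 3, 4]                 # N = 4
x = [0, 0, 0, 1, 0, 0]           # M + N - 1 = 6 inputs give M = 3 outputs
y = block_fir(b, x)
```

With N=4 and 6 input samples, exactly M=3 outputs are available, matching the M+N-1 rule stated for block filters.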

Block Filter Example

Double loop:
  On coefficients and on m
Coefficients accessed by CDP:
  CDP (Cmem) modifications limited to: *CDP, *CDP+, *CDP-, *(CDP+T0).
  CDP uses B bus only for dual-MAC.
  Because B bus is internal only, coefficients must also be internal.
Place data operands carefully to avoid memory conflicts (SA/DARAM).

Using Dual MAC

yn = b0 xn + b1 xn-1 + b2 xn-2 + b3 xn-3
yn+1 = b0 xn+1 + b1 xn + b2 xn-1 + b3 xn-2

MAC *AR2+, *CDP+, AC0 :: MAC *AR3+, *CDP+, AC1

[Diagram: the coefficients b0 to b3 are read through CDP on the B bus; the input data xn to xn-5 are read through AR2 and AR3 on the C and D buses; the two MAC units accumulate into AC0 and AC1.]

Initialization of Pointers

Use AMOV to do transfers during the AD pipeline phase.
Init AR2 to point to the 1st value of input data (x)
Init AR3 to point to the 2nd value of input data (x+1)
Init CDP to point to coefficient array (a0)

        AMOV    #x,XAR2
        AMOV    #(x+1),XAR3
        AMOV    #a0,XCDP

Inner Loop on Coefficients

        RPT     #3
        MAC     *AR2+,*CDP+,AC0
     :: MAC     *AR3+,*CDP+,AC1

[Diagram: pointers at the end of the repeat instruction. CDP has stepped through b0 to b3; AR2 and AR3 have each advanced by 4 positions in the input data xn to xn-5.]

Reinitialization of pointers for next output sample:

        ASUB    #2,AR2
        ASUB    #2,AR3
        MOV     #a0,CDP

Circular Addressing Mode for Coefficients

Initialize size of the circular buffer: BK
Set up Buffer Start Address: BSA and Xeven
Set up ARi or CDP
No memory alignment constraint

[Diagram: buffer b0, b1, b2, b3 starting at Xeven:BSAxx, of length BKzz, addressed by ARn/CDP.]

Circular Buffer Addressing Mode

  Buffer Start Address = Xeven[22:16] : BSAxx[15:0]
  Offset into Buffer   = ARn/CDP
  Calculated Address   = Xeven[22:16] : (BSAxx + ARn/CDP)
  Buffer Length        = BKzz[15:0]

Circular Buffer Addressing Mode

  Offset       Xeven          Buffer Start Address   Block Size Register
  AR0, AR1     XAR0[22:16]    BSA01                  BK03
  AR2, AR3     XAR2[22:16]    BSA23                  BK03
  AR4, AR5     XAR4[22:16]    BSA45                  BK47
  AR6, AR7     XAR6[22:16]    BSA67                  BK47
  CDP          XCDP[22:16]    BSAC                   BKC

The even XARn (i.e. 0,2,4,6) determines the 64K page.

Selecting Circular or Linear Addressing Mode

Use the 9 least significant bits of status word ST2_55:

  Bits 15-9: other bits or reserved
  Bit 8: CDPLC
  Bits 7-0: AR7LC, AR6LC, AR5LC, AR4LC, AR3LC, AR2LC, AR1LC, AR0LC

  0 = linear mode (default)
  1 = circular mode

Set or reset status bits:

        BSET    AR5LC           ;AR5 in circular mode
        BCLR    AR3LC           ;AR3 in linear mode

Circular Buffer Exercise

Use AR4 as a circular pointer to x{5}:

        .sect data
x       .int 7,1,9,6,2          ;init data
        .sect code
        AMOV    #x,XAR4         ;init XAR
        MOV     #x,BSA45        ;init start addr
        MOV     #5,BK47         ;init length
        MOV     #0,AR4          ;init AR4 to top
        BSET    AR4LC           ;set AR4 to circ
        MOV     #3,T0           ;index
        MOV     *(AR4+T0),AC0   ;AC0 = 7, AR4 = 3
        MOV     *+AR4(#4h),AC1  ;AC1 = 9, AR4 = 2
        MOV     *AR4(T0),AC2    ;AC2 = 7, AR4 = 2

Buffer x: offsets 0, 1, 2, 3, 4 hold 7, 1, 9, 6, 2.
Results are cumulative.
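The three answers of the exercise can be reproduced with a small model of the pointer. The Python class below is an added sketch with hypothetical names; it only models the offset arithmetic (post-modify, pre-modify, and indexed access without modification) inside a buffer of length 5, not the C55x itself.

```python
class CircularPointer:
    """Offset into a circular buffer, with the three access styles of the exercise."""

    def __init__(self, buffer):
        self.buffer = buffer
        self.size = len(buffer)
        self.offset = 0

    def post_modify(self, step):
        """Like *(AR4+T0): read, then advance the pointer modulo the size."""
        value = self.buffer[self.offset]
        self.offset = (self.offset + step) % self.size
        return value

    def pre_modify(self, step):
        """Like *+AR4(#K): advance the pointer modulo the size, then read."""
        self.offset = (self.offset + step) % self.size
        return self.buffer[self.offset]

    def indexed(self, step):
        """Like *AR4(T0): read at pointer + step, pointer unchanged."""
        return self.buffer[(self.offset + step) % self.size]


ar4 = CircularPointer([7, 1, 9, 6, 2])
t0 = 3
ac0 = ar4.post_modify(t0)
after_ac0 = ar4.offset
ac1 = ar4.pre_modify(4)
after_ac1 = ar4.offset
ac2 = ar4.indexed(t0)
after_ac2 = ar4.offset
```

The wrap-around is visible in the second access: 3 + 4 = 7 wraps to offset 2 in a buffer of length 5.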

Circular Buffer for Coefficients

Table of coefficients b0 ... b3:
  Circular buffer addressed by CDP.
  Initialize XCDP: 7 MSB
  Initialize CDP to 0: offset in the buffer
  Set up CDP in circular addressing mode

s1:     AMOV    #x,XAR2
        AMOV    #a0,XCDP
        AMOV    #(x+1),XAR3
        MOV     #a0,BSAC
        MOV     #0,CDP
        MOV     #4,BKC
        BSET    CDPLC

Store Results, 32-bit Moves

Assuming fractional mode, 2 results are in high parts of AC0 and AC1.
AC0 and AC1 can be saved separately:
        MOV HI(AC0), *AR4+
        MOV HI(AC1), *AR4+
AC0, AC1 can be saved at the same time:
        MOV pair(hi(AC0)),dbl(*AR4+)
  Pairs: (AC0,AC1), (AC2,AC3)
  ARi incremented by 2
  Even align y

Block Filter Inner Loop

s1:     AMOV    #x,XAR2
        AMOV    #a0,XCDP
        AMOV    #(x+1),XAR3
        AMOV    #y,XAR4
        MOV     #a0,BSAC
        MOV     #0,CDP
        MOV     #4,BKC
        BSET    CDPLC

        MOV     #0,AC0
        MOV     #0,AC1
        RPT     #3
        MAC     *AR2+,*CDP+,AC0
     :: MAC     *AR3+,*CDP+,AC1
        ASUB    #2,AR2
        ASUB    #2,AR3
e1:     MOV     pair(hi(AC0)),dbl(*AR4+)

Outer Loop Using RPTB or RPTBlocal

Use RPTB Repeat Block instruction
We must specify:
  Start address of the block: next instruction
  End address: label specifies last instruction
  The number of repetitions:
    BRC0: loop counter initialized with count-1
    Min count = 2
RPTBlocal: executes from the IBU
  56 bytes maximum (if > 56 bytes use RPTB)
  Reduces power consumption

Outer Loop on m: Calculate M yn-m

s1:     AMOV    #x,XAR2
        AMOV    #a0,XCDP
        AMOV    #(x+1),XAR3
        AMOV    #y,XAR4
        MOV     #a0,BSAC
        MOV     #0,CDP
        MOV     #4,BKC
        BSET    CDPLC
        MOV     #((samps-taps)/2),BRC0
        RPTBLOCAL e1
        MOV     #0,AC0
        MOV     #0,AC1
        RPT     #3
        MAC     *AR2+,*CDP+,AC0
     :: MAC     *AR3+,*CDP+,AC1
        ASUB    #2,AR2
        ASUB    #2,AR3
e1:     MOV     pair(hi(AC0)),dbl(*AR4+)

More Nested Loops?

Nesting RPTB or RPTBlocal:
  2 levels supported using BRC0 (outer) and BRC1/BRS1 (inner)
  No saving of registers required for nested block repeat.

        MOV #outer_cnt,BRC0     ;load outer loop count
        MOV #inner_cnt,BRC1     ;load BRC1, auto-load BRS1
        RPTBLOCAL outer         ;use BRC0
        . . .
        RPTBLOCAL inner         ;BRC1: decrements, BRS1-no change
        . . .
inner:  last_inner
        . . .
outer:  last_outer

Laboratory on Block Filter

Implement a block FIR with 16 coefficients and input block size = 200.
Implement subroutine

[Figure: memory map of the C5510 used in the laboratory. All addresses and lengths are shown in bytes. Labels recoverable from the figure: ROM 64Kx8, FF_0000h, EPtable{16}, vectors, FF_FF00h, code, 1_0000h, 4000h, 6000h, SARAM0 8Kx8, DARAM2 8Kx8, DARAM3 8Kx8, a{16}, x{200}, SP/SSP, AC0, CE0 16Kx8, 5_0000h.]

Using the Stack and Subroutines

Subroutines require call and ret.
During a call the return address is stored in the stack (pointed to by SP).
Let us call fir the subroutine:
        call fir

Initialize the Stack

Declare an uninitialized section (.usect) of appropriate length to reserve space.
Initialize stack pointer to point to the top of stack +1.
Recommendation: place the stack in internal memory and align on a 4-byte boundary:
  ALIGN= specifies bytes

size    .set    100h
stack   .usect  "STK",size
        AMOV    #(stack+size),XSP

[Diagram: section STK in memory, with SP pointing to it.]

The System Stack SSP

When a call occurs PC[15:0] is pushed on the stack.
The upper 8 bits PC[23:16] are pushed on the system stack accessed by SSP (System Stack Pointer).
CFCT is used to store the active loop context.
XSP and XSSP share the same upper 7 bits.
Place SP and SSP with care to avoid dual-access delays.

Data Types

Byte: 8 bits
Word: 16 bits
Long: 32 bits
  Long access assumes address points to MSW
  LSW read from same address with LSB toggled.
    Ptr=100h, MSW=100h, LSW = 101h
    Ptr=101h, MSW=101h, LSW = 100h
To ensure proper alignment:
  Constants (int, long) are automatically aligned on type boundaries
  Variables:
    16 bits: no problem
    32 bits: use the even-align flag:
        .usect vars,Nwords,,1

Solution: Declarations

        .sect "indata"
x0      .copy in7.dat
        .def start
        .cpl_off
        .arms_off
        .c54cm_off
stklen  .set 100
a0      .usect "coeffs",16,1,1
y0      .usect "results",200,1,1
BOS     .usect "STK", stklen,1,1
BOSS    .usect "SSTK",stklen,1,1
        .sect "init"
table   .int 7FCh, 7FDh, 7FEh, 7FFh
        .int 800h, 801h, 802h, 803h
        .int 803h, 802h, 801h, 800h
        .int 7FFh, 7FEh, 7FDh, 7FCh

Solution: Code

        .sect "code"
        .DP a0

start:  AMOV #BOS+stklen,XSP    ;set up Stack +
        MOV #BOSS+stklen,SSP    ;System Stack Ptrs
        CALL copy               ;copy coeffs
        BSET FRCT               ;turn on mult. shift
        BSET M40                ;turn on 40 bit math
        BSET SXMD               ;turn on sign exten.
        CALL fir                ;perform fir
        nop
here:   B here                  ;stop

Solution: Subroutine copy

copy:   AMOV #table,XAR2            ;load pointers
        AMOV #a0,XAR3
        RPT #7
        MOV dbl(*AR2+),dbl(*AR3+)   ;move from table to a
        RET

Solution: Subroutine fir

fir:    MOV #92,BRC0                    ;block repeat count
        AMOV #x0,XAR2                   ;initialize pointers
        AMOV #x0+1,XAR3                 ;for data,
        AMOV #y0,XAR4                   ;results
        AMOV #a0,XCDP                   ;and coefficients
        MOV #a0,BSAC                    ;buffer start address
        MOV #16,BKC                     ;buffer size
        MOV #0, CDP                     ;index
        BSET CDPLC                      ;turn on circ adr CDP
        RPTBlocal end
        MPYM *AR2+,*CDP ,AC0            ;AC0 1st product
        MPYM *AR3+,*CDP+,AC1            ;AC1 gets 2nd prd
        RPT #14
        MAC *AR2+,*CDP+,AC0             ;form results
     :: MAC *AR3+,*CDP+,AC1
        MOV pair(hi(AC0)),dbl(*AR4+)    ;store AC0/AC1
        ASUB #14,AR2                    ;wrap data pointers
end:    ASUB #14,AR3                    ;next calculation
        RET

Implementation of Symmetrical and Anti-symmetrical FIR Filters on C55x

[Diagram: coefficient sets b0 b1 b2 b3 b4 b5 b6 b7 for the symmetrical and antisymmetrical cases, folded as b0 b1 b2 b3 against b4 b5 b6 b7.]

These filters may be folded and performed with N adds and N/2 MACs.
Filters need to be designed as even length.

$y(n) = \sum_{i=0}^{\frac{N}{2}-1} b(i) \left[ x(n-i) + x(n-N+1+i) \right]$, N even.

Instructions FIRSADD and FIRSSUB

FIRSADD Xmem,Ymem,coef,ACx,ACy
  ACy = ACy + (ACx x (*CDP))
  || ACx = Xmem + Ymem
  For symmetrical FIR
FIRSSUB Xmem,Ymem,coef,ACx,ACy
  ACy = ACy + (ACx x (*CDP))
  || ACx = Xmem - Ymem
  For anti-symmetrical FIR
If performing a block FIR, dual MAC has better performance than FIRS.
  A design consideration for migration from C54x.

Comparison of C54x and C55x

2 MAC in C55x versus 1 for C54x:
  Well suited for block filtering and 2 taps per cycle time instead of 1 (for large N).
Circular addressing modes:
  3 BK registers in C55x instead of 1 in C54x: allows for several simultaneous circular buffers with different sizes.
  In C54x, circular addressing mode is specified by the indirect addressing type % in the instructions.
  In C55x, the mode is set in status register ST2_55 for each register (linear or circular). No memory alignment constraint.

Comparison of C54x and C55x: Symmetrical and Anti-symmetrical FIR Filters

In C54x, instruction FIRS:
  Allows 2 taps/cycle for a symmetrical FIR
In C55x, instructions FIRSADD and FIRSSUB:
  Allow us to efficiently implement symmetrical and anti-symmetrical FIRs.
  Despite the 2 MACs, as there is only 1 ALU, again 2 taps/cycle for symmetrical or antisymmetrical FIRs.

Follow On Activities on 5416 DSK

Laboratory 3 for TMS320C5416 DSK
  To determine by experiment how many FIR coefficients are required for acceptable audio quality.
Laboratory 4 for TMS320C5416 DSK
  To determine by practical experiment the best FIR window functions for audio.
Application 4 for TMS320C5416 DSK
  Electronic crossover for multiple loudspeaker system. Divides audio signal into treble and bass at 16 different selectable frequencies using FIR filters.

Follow On Activities on 5510 DSK

Application delays and echo for TMS320C5510 DSK
  Simulates delays in communications networks and reflection of sound heard in a canyon. Introduces circular buffers and the configuration used for a Finite Impulse Response (FIR) filter.
