

Digital signal processors and applications
By
Prof. Sridhar Ranganathan
VIT, Chennai
Why DSP?
DSP algorithms need a lot of mathematical operations on every sample of data
These must be completed quickly [before the next sample of data arrives]
Deferred processing is NOT possible
General purpose processors provide add, subtract and shift operations
They also provide multiply and divide, but these typically take many cycles
DSP need
General purpose processors are also a poor fit where space is constrained
They also consume a lot of power

Hardware features visible through DSP
Hardware modulo addressing
Memory architecture designed for streaming data; it may support several memory accesses per cycle
DMA
Multiple arithmetic units
Harvard architecture

Features of DSP contd..
Special SIMD instructions
Some processors use VLIW [very long instruction word] techniques
Specialized DSP instructions [like MAC, multiply and accumulate] execute quickly
Common algorithms are packaged as libraries for quick reuse
Bit reversed addressing, which helps in calculating the FFT
Deliberate exclusion of a memory management unit; they do not support virtual memory




Data operations
Saturation arithmetic: operations that overflow produce the maximum [or minimum] representable number as the result instead of wrapping around
Fixed point arithmetic
Single cycle operations
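
The difference between saturating and wrapping on overflow can be sketched as follows. This is a minimal illustration, assuming a 16 bit signed word; the function names saturating_add and wrapping_add are hypothetical and do not correspond to any particular DSP instruction.

```python
def saturating_add(a: int, b: int, bits: int = 16) -> int:
    """Add two signed integers and clamp the result to a `bits`-wide word."""
    hi = (1 << (bits - 1)) - 1      # largest representable value, 32767 for 16 bits
    lo = -(1 << (bits - 1))         # smallest representable value, -32768 for 16 bits
    return max(lo, min(hi, a + b))


def wrapping_add(a: int, b: int, bits: int = 16) -> int:
    """Add two signed integers with two's complement wrap-around, for comparison."""
    mask = (1 << bits) - 1
    s = (a + b) & mask
    # Reinterpret the masked value as a signed number.
    return s - (1 << bits) if s >= (1 << (bits - 1)) else s
```

With these definitions, adding 30000 and 10000 saturates at 32767, while the wrapping version returns a large negative number, which is the kind of glitch saturation is meant to avoid in a signal path.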

History of DSP
Originally bit slice processors were used for implementing DSPs
Example: AMD Am2901 4 bit slice processor
By connecting several AMD20
Multiplier and Multiplier Accumulator [MAC]
Array multiplication is one of the common operations required in DSP
Example operations that require array multiplication are
Convolution
Correlation
An important requirement of an array multiplier is that it must process the signals in real time
Operations related to one sample need to be completed before the next sample arrives
If the sampling frequency is 100 Hz, the operations needed by the present sample must be completed within 0.01 s
The higher the sampling frequency, the less time is available for computation on the present sample
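
The real time budget described above is simple arithmetic, sketched here. The 40 MHz clock used in the usage note is an assumed figure for illustration only, and the function names are hypothetical.

```python
def sample_period_s(sample_rate_hz: int) -> float:
    """Time available per sample, in seconds."""
    return 1.0 / sample_rate_hz


def cycles_per_sample(clock_hz: int, sample_rate_hz: int) -> int:
    """Whole processor cycles available before the next sample arrives."""
    return clock_hz // sample_rate_hz
```

At 100 Hz the period is 0.01 s, matching the example above. With an assumed 40 MHz clock that leaves 400000 cycles per sample, but only 5000 cycles per sample at an 8 kHz sampling frequency.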
How to construct a real time array multiplier
Two approaches
A dedicated MAC unit may be implemented in hardware, which integrates multiplier and accumulator in a single unit. Example: Motorola DSP5600X
Have a separate multiplier and accumulator. Example: TI TMS320C5X. Here the output of the multiplier is stored in the product register, and the content of the product register is added to the accumulator register in the central ALU
In both approaches the MAC operation can be completed in one cycle
Thus the presence of H/W multipliers and multiplier accumulators is one of the mandatory requirements of a P-DSP [Programmable DSP]
How the array multiplier operates
Let the input signal [present sample and previous M-1 samples] be
Array $x = [x_n, x_{n-1}, x_{n-2}, \ldots, x_{n-M+3}, x_{n-M+2}, x_{n-M+1}]$
and the array corresponding to the impulse response sequence be
Array $h = [h_0, h_1, h_2, \ldots, h_{M-3}, h_{M-2}, h_{M-1}]$
The output at the nth sampling instant, $y_n$, is obtained by multiplying the array x with the array h element by element and summing the products
$y_n = \sum_{k=0}^{M-1} h_k \, x_{n-k}$

Array multiplier operation - II

The array $x_{n+1}$ is obtained by shifting the array $x_n$ so that the [n+1]th sample becomes the first element and all the elements of the x array are shifted right, such that the ith element of $x_n$ becomes the [i+1]th element of $x_{n+1}$
The content of the product register is added to the accumulator before the new product is stored
Further, the content of dma [data memory address] is copied to the next location, whose address is dma+1
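
The multiply, accumulate and data move sequence above can be sketched in software. This is a minimal model under stated assumptions: the list x stands in for the sample array in data memory, the variables product and acc stand in for the product register and accumulator, and the function name fir_step is hypothetical. It is not a cycle-accurate model of any processor.

```python
def fir_step(x: list, h: list, new_sample):
    """Compute one output y_n and shift the sample array for the next call.

    x[0] holds the newest sample and x[-1] the oldest; x is modified in place.
    """
    x[0] = new_sample
    acc = 0        # models the accumulator
    product = 0    # models the product register
    # Walk from the oldest sample to the newest, as a data move requires.
    for k in range(len(h) - 1, -1, -1):
        acc += product             # previous product is added first
        product = h[k] * x[k]      # then the new product is stored
        if k + 1 < len(x):
            x[k + 1] = x[k]        # copy location k to location k+1
    acc += product                 # add the final product
    return acc
```

Feeding a unit impulse followed by zeros into fir_step with h = [1, 2, 3] returns 1, 2, 3 and then 0, which is the impulse response itself. Walking from the oldest element towards the newest is what lets the copy to the next location happen in the same pass without overwriting a sample that is still needed.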


Harvard architecture
Harvard architecture explained
It employs entirely separate memory systems to store instructions and data
The CPU fetches the next instruction and fetches data at the same time
Its distinguishing feature is that the instruction address space and the data address space are separate
The same address can exist in each address space
So an address alone does NOT uniquely specify a memory location
You also need to specify which address space you are referring to
It uses two buses, one for accessing instructions and one for accessing data
Von Neumann architecture
Von Neumann architecture explained
It employs one address space
Instructions and data are stored in the same address space
The PC [program counter] refers to the next instruction
The CPU fetches the instruction and examines it; the instruction holds pointers to its operands
If a pointer gets corrupted, the program may terminate abnormally
As it fetches the instruction first and then the data, this architecture is slower
So P-DSPs rarely use this architecture
Modified Harvard architecture
In a pure Harvard architecture, mechanisms need to be provided to load programs into program memory and initial data into data memory
Modern machines use multiple buses
One accesses both program memory and data memory
One accesses only data memory
Data can also be transferred from one memory to the other
This feature is used in modern day P-DSPs
It is also helpful at start-up, as constant data can be transferred from program memory to data memory
Advantage of having multiple buses
The number of accesses per memory cycle can be increased
Motorola DSP5600X and DSP96002 have three memory buses and support three memory accesses/cycle
TMS320C54X has four memory buses and supports four memory accesses/cycle
Multiple access memory
Memory that permits more than one access per cycle is called multiple access memory
Dual access RAM technology permits two memory accesses per clock cycle
Four memory accesses per cycle are also possible if dual access RAM is connected to the P-DSP with two independent address and data buses
Multiported memory
The number of accesses can be increased using multiport memory
A typical 2 port memory has two address buses and two data buses
Thus two different chips need not be used in a Harvard architecture
Disadvantage
Increased complexity
More pins, more area and increased cost
VLIW architecture
VLIW: Very long instruction word
Transmeta Crusoe is a chip that uses this technique
TMS320C6X also uses a similar technique
The processor reads a relatively large group of instructions and executes them at the same time
For this purpose it has
Many ALUs
Many multipliers
Many shifters etc.
A VLIW is fetched from memory and specifies the operations and operands for the different data paths
It increases the number of instructions executed per cycle
The performance gain with VLIW depends on the parallelism achievable in the algorithm

Instruction pipelining
An instruction may have many phases
Fetch
Decode
Execute
Write
Throughput will be low if all these are executed serially, since when one stage is busy the others are idle
With pipelining these stages operate in parallel on successive instructions, which improves throughput
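
The throughput gain can be estimated with a small sketch. It assumes an idealised pipeline in which each of the four phases takes exactly one cycle and there are no stalls or hazards; the function names and the one-letter stage codes are illustrative choices only.

```python
STAGES = "FDEW"  # Fetch, Decode, Execute, Write


def serial_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Cycles needed when each instruction finishes before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Cycles needed when a new instruction enters the pipeline every cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def pipeline_diagram(n_instructions: int) -> list:
    """One text row per instruction; columns are cycles, '.' means idle."""
    total = pipelined_cycles(n_instructions)
    rows = []
    for i in range(n_instructions):
        row = ["."] * total
        for s, name in enumerate(STAGES):
            row[i + s] = name   # instruction i is in stage s during cycle i+s
        rows.append("".join(row))
    return rows
```

Under these assumptions ten instructions take 40 cycles serially but 13 cycles when pipelined. Calling pipeline_diagram(3) gives the rows FDEW.., .FDEW. and ..FDEW, where each column is one cycle.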


Pipelining diagram
Special addressing modes in P-DSPs
Short immediate addressing
Short direct addressing
Memory mapped addressing
Indirect addressing
Bit reversed addressing
Circular addressing
Special addressing modes explained - 1
Short immediate addressing
Operand is specified as a short constant
This forms part of the instruction
Its length depends on the P-DSP
Example: in the TMS320C5X an 8 bit constant can be used
Short direct addressing
The lower order bits of the operand address are specified as part of the instruction
The higher order bits are stored elsewhere, for example in a page pointer
Example
TI TMS320 DSP: lower 7 bits are specified in the instruction
Motorola DSP5600X: lower 6 bits are specified in the instruction
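
How a page pointer and a short offset combine into a full operand address can be sketched as follows. The function name direct_address is hypothetical, and the default of 7 offset bits follows the TMS320 example above; the page pointer width is not modelled.

```python
def direct_address(page_pointer: int, offset: int, offset_bits: int = 7) -> int:
    """Form an operand address from a page pointer and an in-instruction offset."""
    if not 0 <= offset < (1 << offset_bits):
        raise ValueError("offset does not fit in the instruction field")
    # Higher order bits come from the page pointer, lower order bits from the instruction.
    return (page_pointer << offset_bits) | offset
```

For example, page 4 with offset 0x12 gives address 0x212 when 7 offset bits are used, and the same call with offset_bits=6 gives 0x112, as in the 6 bit case above.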

Special addressing modes explained - 2
Memory mapped addressing
CPU registers and I/O registers are accessed as memory locations
This is done by mapping them into the initial or final page of memory
Example
TMS320C5x: page 0 corresponds to CPU registers and I/O registers
Motorola DSP5600X: the last page is used
Indirect addressing
Addresses of operands can be stored in registers called indirect access registers
When operands are fetched from the addresses held in these registers, the registers are updated
This is done by a separate arithmetic unit dedicated to updating these addresses
The increment can be 1 or an offset held in a special register
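
Indirect addressing with an automatic address update can be sketched as below. The register name AR0 and the function read_post_modify are hypothetical labels for illustration; in hardware the update is done by the dedicated address unit in parallel with the data access, which this sequential sketch does not capture.

```python
def read_post_modify(memory: list, regs: dict, reg: str, step: int = 1):
    """Fetch the operand addressed by a register, then update the register."""
    value = memory[regs[reg]]   # operand comes from the address held in the register
    regs[reg] += step           # address is then advanced by 1 or by a given offset
    return value
```

With memory [10, 20, 30, 40, 50] and AR0 starting at 0, three successive reads with steps 1, 2 and 1 return 10, 20 and 40, and leave AR0 at 4.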
Special addressing modes explained - 3
Bit reversed addressing
The bit reversed pattern of a number is obtained by writing its natural binary equivalent in reverse order
Therefore LSB becomes MSB and MSB becomes LSB
The address is incremented or decremented in bit reversed form
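
The bit reversal described above can be sketched in software. The function name bit_reverse is hypothetical; a P-DSP generates these addresses in hardware rather than with a loop like this.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Return `index` with its lowest `bits` bits written in reverse order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the current LSB into the result
        index >>= 1
    return result
```

For a 3 bit address, the natural order 0 to 7 maps to 0, 4, 2, 6, 1, 5, 3, 7, which is the ordering of data needed by an 8 point radix-2 FFT.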
Circular addressing mode
In real time, data arrives continuously
If samples are stored in a linear buffer, the buffer is eventually exhausted
If they are stored in a circular buffer, new data overwrites the oldest data
There is no need to check in software whether the end of the buffer has been reached
Use of linear buffer
Use of circular buffer
Example of circular addressing
Limitations of circular buffering
Methodology for a circular buffer
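
A circular buffer can be sketched in software as follows. The class name CircularBuffer and its methods are hypothetical; the modulo operation here plays the role that hardware modulo addressing plays on a P-DSP, where the wrap-around costs no extra instructions.

```python
class CircularBuffer:
    """Fixed-size buffer in which new samples overwrite the oldest ones."""

    def __init__(self, size: int):
        self.data = [0] * size
        self.head = 0  # index where the next sample will be written

    def push(self, sample) -> None:
        self.data[self.head] = sample
        # Wrap around to the start instead of running off the end.
        self.head = (self.head + 1) % len(self.data)

    def latest(self, k: int = 0):
        """Return the sample written k pushes ago (k = 0 is the newest)."""
        return self.data[(self.head - 1 - k) % len(self.data)]
```

After pushing the samples 1 to 6 into a buffer of size 4, the storage holds [5, 6, 3, 4]: samples 1 and 2 have been overwritten, latest(0) is 6 and latest(3) is 3.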
On Chip peripherals
On chip timer
It generates periodic interrupts to the DSP
It also generates sampling clocks for A/D converters
Serial port
It enables data communication between the P-DSP and peripherals such as an ADC, a DAC or an RS-232C device
These ports have buffers: the DSP writes and reads data in parallel form, while the data is sent out and read in on the line in serial form
On Chip peripherals contd..
TDM serial port
A special serial port which permits the P-DSP to communicate with other devices or other P-DSPs using time division multiplexing
Parallel ports
They are faster than serial ports
Bit I/O port
These are only a single bit wide
They can be individually set, reset or read
These bits are used for control purposes and also for data transfer

On Chip peripherals contd..
Host port
A special type of parallel port found on P-DSPs
It enables the P-DSP to communicate with a processor or a PC, which is called the host
Data can be exchanged through it
It can generate interrupts
It also helps the P-DSP to load a program from ROM to RAM
Comm ports [communication ports]
They are used for communication between many P-DSPs in a multiprocessor system
On chip ADCs and DACs
They enable the P-DSP to communicate with the analog world
They are used in cellular phones and tapeless answering machines
TMS320C50
Complex DSP operations
Sum of products is the key element in most DSP algorithms




Algorithm: Equation
Finite Impulse Response Filter: $y(n) = \sum_{k=0}^{M} a_k \, x(n-k)$
Infinite Impulse Response Filter: $y(n) = \sum_{k=0}^{M} a_k \, x(n-k) + \sum_{k=1}^{N} b_k \, y(n-k)$
Convolution: $y(n) = \sum_{k=0}^{N} x(k) \, h(n-k)$
Discrete Fourier Transform: $X(k) = \sum_{n=0}^{N-1} x(n) \exp[-j(2\pi/N)nk]$
Discrete Cosine Transform: $F(u) = \sum_{x=0}^{N-1} c(u) \, f(x) \cos\left[\frac{\pi}{2N} u (2x+1)\right]$
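
To make the sum of products structure concrete, the Discrete Fourier Transform from the table can be written as a plain multiply and accumulate loop. This is a direct, unoptimised sketch using the sign convention $\exp[-j(2\pi/N)nk]$; the function name dft is hypothetical, and a practical implementation would use an FFT instead.

```python
import cmath


def dft(x: list) -> list:
    """Direct DFT: each output X(k) is a sum of N products."""
    N = len(x)
    X = []
    for k in range(N):
        acc = 0j  # accumulator for output bin k
        for n in range(N):
            # One multiply-accumulate per input sample.
            acc += x[n] * cmath.exp(-2j * cmath.pi * n * k / N)
        X.append(acc)
    return X
```

A unit impulse [1, 0, 0, 0] transforms to four bins of value 1, and a constant input [1, 1, 1, 1] puts all its energy, 4, into bin 0. The inner loop performs N multiply-accumulate operations per output, which is the workload a single cycle MAC unit is designed for.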
