Unit 3

UNIT : 3 Architectures For Programmable DSP Devices
Architectures For Programmable DSP Devices

V. R. Gupta
Assistant Professor Department of Electronics & Telecomm. Engg. Y. C. College of Engineering, Nagpur.
Mr. Vikas R. Gupta, Assistant Professor, ET, YCCE, Nagpur
Outline
- Digital Signal Processors (DSPs) vs. General Purpose Processors (GPPs)
- Basic architectural features
- Multiplier and Multiplier Accumulator (MAC)
- Modified bus structure and memory access schemes in P-DSPs
- Multiple access memory
- Multiported memory
- VLIW architecture
- Pipelining:
  1) Pipelining and performance
  2) Pipeline depth
  3) Interlocking
  4) Branching effects
  5) Interrupt effects
  6) Pipeline programming models
- Special addressing modes in P-DSPs
- On-chip peripherals
Why do we need DSP processors?

Why not use a General Purpose Processor (GPP) such as a Pentium instead of a DSP processor?
- What is the power consumption of a Pentium and a DSP processor?
- What is the cost of a Pentium and a DSP processor?

Use a DSP processor when the following are required:
- Cost saving.
- Smaller size.
- Low power consumption.
- Processing of many high-frequency signals in real time.

Use a GPP when the following are required:
- Large memory.
- Advanced operating systems.
What are the typical DSP algorithms?

The Sum of Products (SOP) is the key element in most DSP algorithms:
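The formula that followed this sentence on the slide did not survive extraction. In the usual notation a sum of products is $y = \sum_{k=0}^{N-1} a_k x_k$; the symbols $a_k$ (coefficients) and $x_k$ (samples) are assumptions made here for illustration. A minimal Python sketch of the same computation, written as an explicit multiply-accumulate loop:

```python
def sum_of_products(a, x):
    """Return sum(a[k] * x[k]) using one multiply-accumulate per term."""
    if len(a) != len(x):
        raise ValueError("coefficient and sample arrays must have equal length")
    acc = 0  # accumulator, cleared before the loop
    for coeff, sample in zip(a, x):
        acc += coeff * sample  # one MAC operation per term
    return acc


print(sum_of_products([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```

On a DSP each loop iteration maps onto a single MAC instruction, which is why the rest of this unit concentrates on making that one operation fast.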
Hardware vs. Microcode multiplication

DSP processors are optimized to perform multiplication and addition operations. Multiplication and addition are done in hardware and in one cycle.

Example: 4-bit multiply (unsigned), 1011 x 1110.

Hardware (one cycle):
        1011
      x 1110
    --------
    10011010

Microcode (five cycles):
        1011
      x 1110
    --------
        0000    Cycle 1
       1011.    Cycle 2
      1011..    Cycle 3
     1011...    Cycle 4
    --------
    10011010    Cycle 5
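The microcoded case can be sketched as a shift-and-add loop. This is only an illustration of the idea, not the microcode of any particular processor; the function name and the `bits` parameter are hypothetical.

```python
def shift_add_multiply(multiplicand, multiplier, bits=4):
    """Unsigned multiply by generating one partial product per multiplier bit."""
    partials = []
    for i in range(bits):  # one "cycle" per multiplier bit, LSB first
        bit = (multiplier >> i) & 1
        partials.append((multiplicand << i) if bit else 0)
    product = sum(partials)  # final "cycle": add the partial products
    return product, partials


product, partials = shift_add_multiply(0b1011, 0b1110)
print(format(product, "08b"))  # 10011010
```

A hardware multiplier forms and sums all partial products combinationally, so the same result is available after a single cycle.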
Parameters to consider when choosing a DSP processor

Parameter                        | TMS320C6211 (@150MHz)              | TMS320C6711 (@150MHz)
---------------------------------|------------------------------------|-----------------------------------------
Arithmetic format                | 32-bit                             | 32-bit
Extended floating point          | N/A                                | 64-bit
Extended arithmetic              | 40-bit                             | 40-bit
Performance (peak)               | 1200 MIPS                          | 1200 MFLOPS
Number of hardware multipliers   | 2 (16 x 16-bit) with 32-bit result | 2 (32 x 32-bit) with 32 or 64-bit result
Number of registers              | 32                                 | 32
Internal L1 program memory cache | 32K                                | 32K
Internal L1 data memory cache    | 32K                                | 32K
Internal L2 cache                | 512K                               | 512K

C6711 Datasheet: \Links\TMS320C6711.pdf
C6211 Datasheet: \Links\TMS320C6211.pdf
Parameters to consider when choosing a DSP processor (continued)

Parameter                                   | TMS320C6211 (@150MHz) | TMS320C6711 (@150MHz)
--------------------------------------------|-----------------------|----------------------
I/O bandwidth: serial ports (number/speed)  | 2 x 75 Mbps           | 2 x 75 Mbps
DMA channels                                | 16                    | 16
Multiprocessor support                      | Not inherent          | Not inherent
Supply voltage                              | 3.3V I/O, 1.8V core   | 3.3V I/O, 1.8V core
Power management                            | Yes                   | Yes
On-chip timers (number/width)               | 2 x 32-bit            | 2 x 32-bit
Cost                                        | US$ 21.54             | US$ 21.54
Package                                     | 256-pin BGA           | 256-pin BGA
External memory interface controller        | Yes                   | Yes
JTAG                                        | Yes                   | Yes
Floating vs. Fixed point processors

Applications that require the following need a floating-point processor:
- High precision.
- Wide dynamic range.
- High signal-to-noise ratio.
- Ease of use.

Drawbacks of floating-point processors:
- Higher power consumption.
- Can be more expensive.
- Can be slower than fixed-point counterparts and larger in size.

It is the application that dictates which device and platform to use in order to achieve optimum performance at a low cost.
For educational purposes, use the floating-point device (C6711), as it can support both fixed- and floating-point operations.
General Purpose DSP vs. DSP in ASIC

Application Specific Integrated Circuits (ASICs) are semiconductors designed for dedicated functions. The advantages and disadvantages of using ASICs are listed below.

Advantages:
- High throughput
- Lower silicon area
- Lower power consumption
- Improved reliability
- Reduction in system noise
- Low overall system cost

Disadvantages:
- High investment cost
- Less flexibility
- Long time from design to market
Texas Instruments TMS320 family

Different families and sub-families exist to support different markets.

C2000: Lowest cost
- Control systems
- Motor control
- Storage
- Digital control systems

C5000: Efficiency (best MIPS per Watt / Dollar / Size)
- Wireless phones
- Internet audio players
- Digital still cameras
- Modems
- Telephony
- VoIP

C6000: Performance and best ease of use
- Multi-channel and multi-function applications
- Communications infrastructure
- Wireless base stations
- DSL
- Imaging
- Multimedia servers
- Video
TMS320C64x: The C64x fixed-point DSPs offer the industry's highest level of performance to address the demands of the digital age. At clock rates of up to 1 GHz, C64x DSPs can process information at rates up to 8000 MIPS with costs as low as $19.95. In addition to a high clock rate, C64x DSPs can do more work each cycle with built-in extensions. These extensions include new instructions to accelerate performance in key application areas such as digital communications infrastructure and video and image processing.

TMS320C62x: These first-generation fixed-point DSPs represent breakthrough technology that enables new equipment and energizes existing implementations for multi-channel, multi-function applications, such as wireless base stations, remote access servers (RAS), digital subscriber loop (xDSL) systems, personalized home security systems, advanced imaging/biometrics, industrial scanners, precision instrumentation and multichannel telephony systems.

TMS320C67x: For designers of high-precision applications, C67x floating-point DSPs offer the speed, precision, power savings and dynamic range to meet a wide variety of design needs. These dynamic DSPs are the ideal solution for demanding applications like audio, medical imaging, instrumentation and automotive.
C6000 Roadmap

Figure: C6000 roadmap, performance vs. time, with object code software compatibility across generations.
- 1st generation: C6201, C6202, C6203, C6204, C6205, C6211, C6701, C6711, C6712, C6713
- 2nd generation: C6411, C6412, C6414, C6415, C6416, DM642
- Beyond: multi-core, floating point, C64x DSP at 1.1 GHz
- C62x/C64x/DM642: fixed point. C67x: floating point.
DSPs Features
- High speed DSP computations
- Specialized instruction set
- High performance repetitive numeric calculations
- Fast and efficient memory accesses
- Special mechanism for real-time I/O
- Low power consumption
- Low cost in comparison with GPPs
DSPs General Applications
- Digital cellular phones
- Satellite communications
- Seismic analysis
- Vehicle collision avoidance
- Secure communications
- Voice mail
- Digital cameras
- Navigation equipment
- Modems (ISDN, cable, ...)
- Audio production
- Noise cancellation
- Videoconferencing
- Medical ultrasound
- Music synthesis, effects
- Radar
- Voice over Internet
- Motor control
- Sonar
DSPs Applications
- Speech and audio compression
- Filtering
- Modulation and demodulation
- Error correction coding and decoding
- Audio processing (e.g., surround sound, noise reduction, equalization, sample rate conversion, echo cancellation)
- Signaling (e.g., DTMF detection)
- Speech recognition
- Signal synthesis (e.g., music, speech synthesis)
DSPs Characteristics
1. Data path & internal ALU architecture
2. Specialized instruction set
3. External memory architecture
4. Specialized addressing modes
5. Specialized execution control
6. Specialized peripherals for DSP
Data Path

DSPs:
- Perform all key arithmetic operations in 1 cycle.
- Hardware support for managing numeric fidelity: shifters, guard bits, saturation.

GPPs:
- Multiplication often takes >1 cycle.
- Shifts often take >1 cycle.
- Other operations (e.g. saturation, rounding) typically take multiple cycles.
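To illustrate why saturation support matters for numeric fidelity, the following minimal sketch contrasts ordinary two's-complement wraparound with saturation. The function names and the 16-bit word width are assumptions chosen for illustration; they are not taken from any specific device.

```python
def wrap(value, bits=16):
    """Two's-complement wraparound, as an adder without saturation would produce."""
    mask = (1 << bits) - 1
    value &= mask
    if value & (1 << (bits - 1)):  # sign bit set: interpret as negative
        value -= 1 << bits
    return value


def saturate(value, bits=16):
    """Clamp to the most positive or most negative representable value."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, value))


total = 30000 + 10000  # exceeds the 16-bit signed maximum of 32767
print(wrap(total))      # -25536: a large positive sum becomes negative
print(saturate(total))  # 32767: clipped, but with the correct sign
```

Guard bits address the same problem from the other side: a wider accumulator lets intermediate sums exceed the word size, and saturation is applied only when the final result is stored.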
DSPs Data Path Example
Figure: A representative conventional fixed-point DSP processor data path (from the Motorola DSP560xx, a 24-bit, fixed-point processor family).
Instruction Set

DSPs:
- Specialized, complex instructions
- Multiple operations per instruction (e.g. using VLIW)

GPPs:
- General-purpose instructions
- Typically only one operation per instruction
VLIW
Very long instruction word (VLIW) architectures are garnering increased attention for DSP applications.
Major features:
- Multiple independent operations per cycle
- Packed into a single large instruction or packet
- More regular, orthogonal, RISC-like operations
- Large, uniform register sets
Memory Architecture

DSPs:
- Harvard architecture
- 2-4 memory accesses/cycle
- No caches; on-chip SRAM

GPPs:
- Von Neumann architecture
- Typically 1 access/cycle
- May use caches
Von Neumann Architecture

The Von Neumann memory architecture is common among microcontrollers. Since there is only one data bus, operands cannot be loaded while instructions are fetched, creating a bottleneck that slows the execution of DSP algorithms.

Harvard Architecture

The Harvard architecture is common to many DSP processors. The processor can simultaneously access the two memory banks using two independent sets of buses, allowing operands to be loaded while instructions are fetched.
Addressing Modes

DSPs:
- Dedicated address generation units
- Specialized addressing modes, e.g.:
  - Auto-increment
  - Modulo (circular)
  - Bit-reversed (for FFT)
- Good immediate data support

GPPs:
- Often, no separate address generation unit
- General-purpose addressing modes
Execution Control
- Hardware support for fast looping
- Fast interrupts for I/O handling
- Real-time debugging support
DSPs classifications (1)

By arithmetic format:
- Fixed-point
- Floating-point
By data width:
- Typical fixed-point DSPs: 16-bit
- Typical floating-point DSPs: 32-bit
By memory organization
By multiprocessor support

DSPs classifications (2)

By speed:
- Millions of instructions per second (MIPS)
- A basic operation (e.g. MAC)
- A basic algorithm (e.g. FFT, FIR or IIR filter)
By power consumption:
- Operating voltage
- Sleep or idle mode
- Programmable clock dividers
- Peripheral control
DSPs Evolution
- First generation (TI TMS32010)
- Second generation (Motorola DSP56001, AT&T DSP16A, Analog Devices ADSP-2100, TI TMS320C50)
- Third generation (Motorola DSP56301, TI TMS320C541, TI TMS320C80, Motorola MC68356)
- Fourth generation (TI TMS320C6201, Intel Pentium MMX)

First Generation (1982)
- 16-bit fixed-point
- Harvard architecture
- Accumulator
- Specialized instruction set
- 390 ns MAC time (228 ns today)
Second Generation (1987)
- 24-bit data, instructions
- 3 memory spaces (X, Y, P)
- Parallel moves
- Single- and multi-instruction hardware loops
- Modulo addressing
- 75 ns MAC (21 ns today)

Third Generation (1995)
- Enhanced conventional DSP architectures
- 3.0 or 3.3 volts
- More on-chip memory
- Application-specific function units in data path or as coprocessors
- More sophisticated debugging and application development tools
- DSP cores (Pine & Oak from DSP Group, cDSP from TI)
- 20 ns MAC (10 ns today)

Fourth Generation (1998)
- Blazing clock speeds and superscalar architectures
- VLIW-like architectures achieve top performance via high parallelism and increased clock speeds
- 3 ns MAC throughput
- Expensive, power-hungry

DSPs Evolution Chart
Multiplier and Multiplier Accumulator (MAC)

Array multiplication is the most common operation required in DSP applications, e.g. convolution and correlation.
An important requirement of array multipliers is that they process the signal in real time. This requires multiplication as well as accumulation to be carried out using hardware elements. Two approaches to solve this problem are:
1) Implement a dedicated MAC unit in hardware (Motorola DSP5600X)
2) Have a separate multiplier and accumulator (TI 320C5X)

In both approaches the MAC operation can be completed in one clock cycle. The presence of a hardware multiplier and/or MAC is one of the mandatory requirements of P-DSPs.

Figure: Implementation of convolver with single multiplier/adder (input samples $x_n, x_{n-1}, \ldots, x_{n-M+1}$, coefficients $h_0, h_1, \ldots, h_{M-1}$, a multiplier, an adder and a register).

The output $y_n$ at the nth sampling interval is obtained by multiplying the array
$x = [x_n, x_{n-1}, x_{n-2}, \ldots, x_{n-M+3}, x_{n-M+2}, x_{n-M+1}]$
corresponding to the present and past M-1 samples of the input with
$h = [h_0, h_1, h_2, h_3, \ldots, h_{M-3}, h_{M-2}, h_{M-1}]$
corresponding to the impulse response sequence.

To obtain $y_{n+1}$, the input signal array $x_{n+1}$ is multiplied with the array h.
After obtaining the product $x_{n-M+1} h_{M-1}$, the element $x_{n-M}$ may be made equal to $x_{n-M+1}$. Similarly, after obtaining the product $x_{n-M+2} h_{M-2}$, the element $x_{n-M+1}$ may be made equal to $x_{n-M+2}$, and so on.

In P-DSPs this can be achieved by using a special instruction, MACD: multiply and accumulate with data move. For example, in the TMS320C5X:
    MACD pma, dma
This instruction multiplies the content of the program memory with address pma with the content of the data memory with address dma and stores the result in the product register.
The content of the product register is added to the accumulator before the new product is stored. The content of dma is copied to the next location, whose address is dma+1.
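The combined multiply, accumulate and data move can be sketched in Python as one FIR filter step. This models the behaviour described above, not the actual TMS320C5X instruction: `fir_step`, the list `mem` standing in for data memory, and the spare slot at the end of `mem` are all assumptions made for illustration.

```python
def fir_step(h, mem, sample):
    """Compute one output sample while shifting the delay line in the same pass."""
    m = len(h)
    mem[0] = sample  # newest input sample
    acc = 0
    for k in range(m - 1, -1, -1):  # start from the oldest sample
        acc += h[k] * mem[k]        # multiply and accumulate
        mem[k + 1] = mem[k]         # data move: location k copied to k + 1
    return acc


h = [1, 2, 3]
mem = [0] * (len(h) + 1)  # one spare slot receives the oldest sample
out = [fir_step(h, mem, s) for s in [1, 0, 0, 0]]
print(out)  # impulse response of the filter: [1, 2, 3, 0]
```

Processing the oldest sample first is what makes the in-place shift safe: each location is read before it is overwritten by its younger neighbour.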
Modified Bus Structure in P-DSPs

DSP processors use special memory architectures, namely the Harvard architecture or a modified Von Neumann architecture, which allow fetching multiple data and/or instructions at the same time.

GPPs have used the Von Neumann architecture, in which there is one memory space connected to the processor core by one bus set consisting of an address bus and a data bus. The Von Neumann architecture is not well suited to DSP applications, as some DSP algorithms require more memory bandwidth.

Note that the MAC operation with data move (i.e. the MACD instruction) requires 4 memory accesses per instruction cycle. The 4 memory accesses per clock period required for the MACD instruction are as follows:
1) Fetch the MACD instruction from the program memory.
2) Fetch one of the operands from the program memory.
3) Fetch the second operand from the data memory.
4) Write the content of the data memory with address dma into the location with address dma+1.

Figure: Von Neumann Architecture. Blocks: Processing Unit, Control Unit, and a single Data Memory and Program Memory. Connections: Results, Operands, Status Bus, Data Bus, Opcode, Instructions, Data/Instructions, Address.

Figure: Harvard Architecture. Blocks: Processing Unit, Control Unit, Data Memory, Program Memory. Connections: Results/Operands, Address (data), Status Bus, Opcode, Instructions, Address (program).

Figure: Modified Harvard Architecture. Blocks: Processing Unit, Control Unit, Data Memory, Program/Data Memory. Connections: Results/Operands, Address (data), Status Bus, Opcode, Instructions, Address (program/data).
Memory Access Schemes in P-DSPs

The number of memory accesses per clock period can also be increased by using a high speed memory that permits more than one memory access per clock period. For example, DARAM (dual access RAM) permits two memory accesses per clock period. Multiple access RAM may be connected to the processing unit of the P-DSP by using the Harvard architecture.

Example: DARAM connected to a P-DSP with two independent data and address buses can be used to achieve four memory accesses per clock period.
Memory Access Schemes in P-DSPs

Multiported memory: adopted for increasing the number of accesses per clock period.
The major limitation of dual ported memory is the increase in cost compared to two single port memories of the same capacity, since the number of pins and the chip area are increased. The Motorola DSP561XX has a single ported program memory and a dual ported data memory.

Figure: Block diagram of a dual ported memory (Address bus 1, Data bus 1, Address bus 2, Data bus 2).
VLIW Architecture

Pipelining
Pipelining is a technique for increasing the performance of a processor by breaking a sequence of operations into smaller pieces and executing these pieces in parallel when possible, thereby decreasing the overall time required to complete the set of operations. It represents a trade-off between efficiency and ease of use. Let's look at a real-life example of pipelining.
Traditional Pipeline Concept

Laundry example: Ann, Brian, Cathy and Dave each have one load of clothes to wash, dry, and fold.
- Washer takes 30 minutes
- Dryer takes 40 minutes
- Folder takes 20 minutes

Figure: sequential laundry timeline from 6 PM to midnight; each of the loads A to D takes 30 + 40 + 20 minutes, one after another.
Sequential laundry takes 6 hours for 4 loads. If they learned pipelining, how long would laundry take?

Figure: pipelined laundry timeline starting at 6 PM; the stages overlap as 30, 40, 40, 40, 40, 20 minutes.
Pipelined laundry takes 3.5 hours for 4 loads.

- Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload.
- Pipeline rate is limited by the slowest pipeline stage.
- Multiple tasks operate simultaneously using different resources.
- Potential speedup = number of pipe stages.
- Unbalanced lengths of pipe stages reduce speedup.
- Time to fill the pipeline and time to drain it reduce speedup.
- Stall for dependences.
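The 6-hour and 3.5-hour figures in the laundry example can be checked with a small scheduling sketch. It assumes each stage handles one load at a time and that a finished load can wait between stages; the function names are hypothetical.

```python
def sequential_minutes(stage_times, loads):
    """Each load runs through every stage before the next load starts."""
    return loads * sum(stage_times)


def pipelined_minutes(stage_times, loads):
    """Each stage starts a load as soon as the load and the stage are both free."""
    stage_free_at = [0] * len(stage_times)
    for _ in range(loads):
        load_ready_at = 0
        for s, duration in enumerate(stage_times):
            start = max(load_ready_at, stage_free_at[s])
            stage_free_at[s] = start + duration
            load_ready_at = stage_free_at[s]
    return stage_free_at[-1]


stages = [30, 40, 20]  # washer, dryer, folder
print(sequential_minutes(stages, 4) / 60)  # 6.0 hours
print(pipelined_minutes(stages, 4) / 60)   # 3.5 hours
```

The pipelined total is 30 + 4 x 40 + 20 minutes: the 40-minute dryer is the slowest stage and sets the rate at which loads complete.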
Pipelining
An instruction cycle can be divided into a number of microinstructions. Execution of each microinstruction is referred to as one phase of an instruction. For example, an instruction cycle requiring four microinstructions can be said to have four phases, as follows:
1) Fetch phase: the instruction is fetched from the program memory.
2) Decode phase: the instruction is decoded.
3) Memory read phase: the operand required for the execution of the instruction may be read from the data memory.
4) Execution phase: execution, as well as the storage of the results in either one of the registers or memory, is carried out.
Instruction cycles of processor with no Pipelining

Value of T | 1   2   3   4   5   6   7   8   9   10  11  12
Fetch      | I-1 -   -   -   I-2 -   -   -   I-3 -   -   -
Decode     | -   I-1 -   -   -   I-2 -   -   -   I-3 -   -
Read       | -   -   I-1 -   -   -   I-2 -   -   -   I-3 -
Execute    | -   -   -   I-1 -   -   -   I-2 -   -   -   I-3
Instruction cycles of processor with Pipelining

Value of T | 1   2   3   4   5   6   7   8   9   10  11  12
Fetch      | I-1 I-2 I-3 I-4 I-5 I-6 I-7 I-8 I-9 -   -   -
Decode     | -   I-1 I-2 I-3 I-4 I-5 I-6 I-7 I-8 I-9 -   -
Read       | -   -   I-1 I-2 I-3 I-4 I-5 I-6 I-7 I-8 I-9 -
Execute    | -   -   -   I-1 I-2 I-3 I-4 I-5 I-6 I-7 I-8 I-9
Pipeline Performance
Let T denote the time required for each phase of the instruction. One clock cycle of the processor corresponds to T.
In a period of 12T, only three instructions can be executed in a machine with no pipeline. In the same period, nine instructions can be executed in a machine with a pipeline. Hence the throughput is increased by a factor of 3 in this case.

Also note that the initial latency of a machine with 4 phases is 4T. Hence the time required for executing a program with N instructions is (N+3)T, which is approximately NT for large N. With a non-pipelined machine, the time required for executing N instructions is 4NT.
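The timing relations can be checked against the charts with a short sketch. Counting from the pipelined chart, where nine instructions finish in 12T, a 4-phase pipeline with no stalls needs (N + 3)T for N instructions; the function names below are hypothetical and time is measured in units of T.

```python
def non_pipelined_time(n, phases=4):
    """Every instruction occupies the processor for all of its phases."""
    return n * phases


def pipelined_time(n, phases=4):
    """First instruction takes `phases` cycles; each later one adds one cycle."""
    return n + phases - 1 if n > 0 else 0


def speedup(n, phases=4):
    """Ratio of non-pipelined to pipelined execution time for n instructions."""
    return non_pipelined_time(n, phases) / pipelined_time(n, phases)


print(non_pipelined_time(3))  # 12: three instructions fill the 12T window
print(pipelined_time(9))      # 12: nine instructions fill the same window
print(speedup(9))             # 3.0
```

As N grows, the speedup approaches the number of phases (4 here), since the pipeline fill time becomes negligible.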
Branching Effect on Pipelining

Branching effects: when a branch is taken, the instructions already fetched behind it must be discarded, which reduces the throughput of the pipeline. To overcome this problem, some P-DSPs have special branch/call and return instructions, called delayed branch/call/return instructions.
The throughput efficiency may also be reduced because of conflicts between instructions that are in different phases of the instruction pipeline. This happens if the same memory is used to store the data and the program, and there is only a single address bus for addressing both the program and data memory. For example, an instruction in the fetch phase may try to fetch the instruction code from a memory chip that is also accessed by another instruction that is in the operand read phase. To avoid the conflict, the operand read is done first and the opcode fetch is repeated until there is no conflict.
Pipeline Hazards

Data hazards: an instruction uses the result of a previous instruction (RAW).
    ADD R1, R2, R3
    ADD R4, R1, R5
Control hazards: the location of an instruction depends on a previous instruction.
    JMP LOOP
    LOOP: ADD R1, R2, R3
Structural hazards: two instructions need access to the same resource, e.g. a single memory shared for instruction fetch and load/store (a collision in the reservation table).
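Data Hazards (RAW)
Figure: pipeline stages F, R, X, M, W plotted against cycle for ADD R1, R2, R3 followed by ADD R4, R1, R5. The first instruction writes data to R1 in its W stage; the second instruction reads from R1 in its R stage, before that write has happened.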
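A read-after-write dependence of this kind can be detected mechanically. The sketch below checks only adjacent instructions and assumes a three-operand form with the destination first, as in the ADD example; the tuple encoding and the function name are assumptions made for illustration.

```python
def raw_hazards(program):
    """Return (producer index, consumer index, register) for adjacent RAW pairs.

    Each instruction is a tuple (dest, src1, src2).
    """
    hazards = []
    for i in range(1, len(program)):
        prev_dest = program[i - 1][0]
        if prev_dest in program[i][1:]:  # next instruction reads that register
            hazards.append((i - 1, i, prev_dest))
    return hazards


program = [
    ("R1", "R2", "R3"),  # ADD R1, R2, R3
    ("R4", "R1", "R5"),  # ADD R4, R1, R5 reads R1 too early
]
print(raw_hazards(program))  # [(0, 1, 'R1')]
```

A real interlock would compare against every instruction still in flight, not just the previous one, with the window set by the pipeline depth.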
Pipeline Depth
The number of instructions that are processed simultaneously in the CPU is referred to as the depth of the instruction pipeline. It differs between families of P-DSPs. The pipeline depths of some P-DSPs are given below:

P-DSP Name/Family  | Pipeline Depth
-------------------|---------------
Analog Devices     | 2
Motorola DSP5600X  | 3
TI TMS320C5X       | 4
TI TMS320C54X      | 6
Special addressing Modes in P-DSPs

P-DSPs have special addressing modes that permit a single word per instruction format, thereby speeding up execution by making effective use of pipelining.
1) Short Immediate Addressing
2) Short Direct Addressing
3) Memory Mapped Addressing
4) Indirect Addressing
5) Bit Reversed Addressing
6) Circular Addressing
Short Immediate Addressing

Permits the operand to be specified using a short constant that forms part of a single word instruction. The length of the short constant depends on the instruction type and the P-DSP. For example, in the TMS320C5X an 8-bit constant can be specified as one of the operands in the single word instructions for addition, subtraction, AND, OR, XOR, etc.
Short Direct Addressing

Permits the lower order address of the operand of an instruction to be specified in the single word instruction. For example, in the TMS320 DSPs the higher order 9 bits of the memory address are stored in the data page pointer and only the lower 7 bits are specified as a part of the instruction.
Each contiguous block of 128 words is referred to as one page in the TI DSP. The argument in the instruction specifies only the location within the current page.
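The address formation described for short direct addressing can be sketched as follows: a 9-bit page number is concatenated with a 7-bit offset to give a 16-bit data address. The function and parameter names are hypothetical.

```python
def direct_address(data_page, offset):
    """Concatenate a 9-bit data page pointer with a 7-bit in-page offset."""
    if not 0 <= data_page < 512:
        raise ValueError("data page pointer must fit in 9 bits")
    if not 0 <= offset < 128:
        raise ValueError("offset must fit in 7 bits")
    return (data_page << 7) | offset


print(direct_address(4, 5))      # 517: word 5 of page 4 (4 * 128 + 5)
print(direct_address(511, 127))  # 65535: last word of the last page
```

Because only the 7-bit offset travels in the instruction word, the instruction fits in a single word as long as consecutive accesses stay within the current page.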
Memory Mapped Addressing

The CPU registers and the I/O registers of P-DSPs are also accessible as memory locations. This is achieved by mapping them into either the starting page or the final page of the memory space. For example, in the TMS320C5X, page 0 corresponds to the CPU registers and I/O registers.

In the case of the Motorola DSP5600X, the last page of the memory space, containing 64 locations, is used as the memory map for the CPU and I/O registers.
Indirect Addressing
Permits an array of data to be efficiently fetched and stored by the P-DSP. The address of the operand is stored in one of the registers called indirect address registers. In the case of TI processors, the indirect address registers are called auxiliary registers (ARs).

The content of the ARs may be incremented or decremented either in steps of 1 or in steps specified by the content of the offset register (TI processors: the INDX register). In the P-DSPs from Analog Devices it is called the modifier register. An additional ALU in the CPU core serves the indirect address registers.

The content of the indirect address registers may also be updated by a constant using the bit reversed addressing mode. In the TI 5X processors, the new address computed by the auxiliary ALU is not used for fetching the operand of the current instruction that is being decoded and executed. It is used by the next instruction that accesses an operand indirectly through this particular AR.
Examples of indirect addressing with post-increment and post-decrement:
    A0 = A0 + *R5++
    A0 = A0 + *R5--
    A0 = A0 + *R5++R17
Bit Reversed Addressing Mode

The most unusual of the addressing modes, bit reversed addressing is used only in very specialized circumstances. For the computation of the FFT, the data is to be arranged in bit reversed order and the 2-point DFT of the resulting sequence is to be computed first. For example, when an 8-point FFT is to be computed, the 2-point DFT of X(0) and X(4) is to be found, then the 2-point DFT of X(2) and X(6), and so on. Note that the values 0, 4, 2, 6, 1, 5, 3, 7 correspond to the consecutive numbers in the bit reversed number representation.
Bit Reversed Addressing Mode

In the bit reversed addressing mode, the address is incremented/decremented by the number represented in the bit reversed form.

Decimal No. | Binary representation | Reversed binary representation | Bit reversed address
------------|-----------------------|--------------------------------|---------------------
0           | 000                   | 000                            | 0
1           | 001                   | 100                            | 4
2           | 010                   | 010                            | 2
3           | 011                   | 110                            | 6
4           | 100                   | 001                            | 1
5           | 101                   | 101                            | 5
6           | 110                   | 011                            | 3
7           | 111                   | 111                            | 7
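The bit reversed address sequence in the table can be reproduced in software with a few shifts. This is a minimal sketch of the index mapping only; a P-DSP generates these addresses in its address generation unit at no cycle cost. The function name is hypothetical.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the LSB of index into result
        index >>= 1
    return result


order = [bit_reverse(i, 3) for i in range(8)]
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Reading an 8-point input array in this order delivers the pairs X(0), X(4), then X(2), X(6), and so on, which is the ordering the first FFT stage needs.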
Circular Addressing Mode

Let x(n) = [1 2 3 4 5 6 7 8 9 9 8 7 6 5 4 3 2 1]
    Y(n) = [9 8 7 6 5 4 3 2 1 1 2 3 4 5 6 7 8 9]
Figure: animated worked example on x(n) and Y(n), labelled "LCD Display" (intermediate frames not recoverable).
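The slide's worked example is not recoverable, so the following is a generic sketch of circular (modulo) addressing rather than a reconstruction of it. The pointer wraps back to the start of the buffer when it steps past the end; `base`, `length` and the function name are hypothetical.

```python
def circular_next(addr, step, base, length):
    """Advance `addr` by `step` inside the buffer [base, base + length)."""
    return base + (addr - base + step) % length


base, length = 100, 4
addr = base
visited = []
for _ in range(6):
    visited.append(addr)
    addr = circular_next(addr, 1, base, length)
print(visited)  # [100, 101, 102, 103, 100, 101]

print(circular_next(100, -1, base, length))  # 103: stepping back wraps to the end
```

With this mode a delay line never has to be shifted: the newest sample simply overwrites the oldest one and the pointer moves on.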
On-Chip Peripherals
P-DSPs have a number of on-chip peripherals that relieve the CPU of routine functions. They also help to reduce the chip count in a DSP system based around a P-DSP. Some of the on-chip peripherals in P-DSPs are as follows:
- On-chip Timer
- Serial Port
- TDM Serial Port
- Parallel Port
- Bit I/O Ports
- Host Port
- Comm Ports
- On-chip A/D, D/A converters
- P-DSPs with RISC and CISC
On chip Timer
Two common applications of on-chip timers are:
1) Generation of periodic interrupts to the P-DSP.
2) Generation of sampling clocks for the A/D converters.
The timer mode can be programmed by the P-DSP. The timers can generate a single pulse or a periodic train of pulses. They can also generate a single square wave or a periodic square wave. The period of the timer is also programmable.
Serial Port
This enables data communication between the P-DSP and an external peripheral such as an A/D converter, a D/A converter or an RS-232C device. These ports have input and output buffers, so that the P-DSP writes to or reads from the serial port in parallel form while the serial port sends and receives the data to and from the peripherals in serial form. These devices have parallel-to-serial and serial-to-parallel converters built in. The shift clock can be fed from the P-DSP or from an external clock generator. The port can operate in synchronous mode or asynchronous mode.
TDM Serial Port

P-DSPs have a special serial port called the TDM serial port. It permits a P-DSP to communicate with other devices or P-DSPs by using Time Division Multiplexing (TDM).
One of the devices can generate the frame sync pulse, which indicates the beginning of a TDM frame, and the bit clock, which sets the duration for which a bit is transmitted. The TDM frame is split into a number of equal slots, and each slot can be allotted to one of the devices.

Figure: One TDM frame (Ch 1 to Ch 8).

With 8 slots per frame, the port is referred to as a TDM port with eight channels. In each of the slots, a number of bits may be transmitted by a channel. The TDM serial port normally uses four lines for the purpose of serial communication:
- TFRM: the frame sync signal
- TClock: the bit clock
- TADD: the address of the serial device
- TDAT: the data transmitted into the TDM channel by the authorized device
TDM Serial Port
Parallel Port
Bit I/O Port
Host Port
Comm Port
On chip A/D & D/A Converters
P-DSPs with RISC and CISC
Comparison: CISC, RISC, VLIW
Web Links & Information

- http://www.bdti.com
- http://www.eg3.com/dsp
- Buyer's Guide to DSP Processors, Berkeley, California: Berkeley Design Technology, Inc., 1994, 1995, 1997, 1999.
- Phil Lapsley, Jeff Bier, Amit Shoham, and Edward A. Lee, DSP Processor Fundamentals: Architectures and Features, Berkeley, California: Berkeley Design Technology, Inc., 1996.
- An Introduction To Very-Long Instruction Word (VLIW) Computer Architecture, Philips Semiconductors, http://www.semiconductors.philips.com/acrobat_download/other/vliw-wp.pdf
THANK YOU !!
HAVE A NICE DAY!!!
