
Cliff Notes for Digital Signal Processor

C54x
1. With a neat diagram explain the important features of TMS320C54x Processors?

Overview: The C54x DSP has a high degree of operational flexibility and speed. It combines an advanced modified Harvard architecture (with one program memory bus, three data memory buses, and four address buses), a CPU with application-specific hardware logic, on-chip memory, on-chip peripherals, and a highly specialized instruction set. The C54x devices offer these advantages:
- Enhanced Harvard architecture built around one program bus, three data buses, and four address buses for increased performance and versatility
- Advanced CPU design with a high degree of parallelism and application-specific hardware logic for increased performance
- A highly specialized instruction set for faster algorithms and for optimized high-level language operation
- Modular architecture design for fast development of spin-off devices
- Advanced IC processing technology for increased performance and low power consumption
- Low power consumption and increased radiation hardness because of new static design techniques

Key Features: This section lists the key features of the C54x DSPs.

Key Features - CPU
- Advanced multibus architecture with one program bus, three data buses, and four address buses
- 40-bit arithmetic logic unit (ALU), including a 40-bit barrel shifter and two independent 40-bit accumulators
- 17 × 17-bit parallel multiplier coupled to a 40-bit dedicated adder for nonpipelined single-cycle multiply/accumulate (MAC) operation
- Compare, select, and store unit (CSSU) for the add/compare selection of the Viterbi operator
- Exponent encoder to compute the exponent of a 40-bit accumulator value in a single cycle
- Two address generators, including eight auxiliary registers and two auxiliary register arithmetic units
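As a rough C sketch (not TI code), the work the single-cycle MAC hardware performs each cycle is one step of the multiply/accumulate loop below; the 40-bit accumulator is modeled here with a 64-bit integer, and the array names are illustrative only:

    #include <stdint.h>

    /* Sketch: the multiply/accumulate pattern the C54x MAC unit performs
       in one cycle per tap. x and h are hypothetical 16-bit sample and
       coefficient arrays; acc stands in for a 40-bit accumulator. */
    int64_t mac_sum(const int16_t *x, const int16_t *h, int n)
    {
        int64_t acc = 0;
        for (int i = 0; i < n; i++)
            acc += (int32_t)x[i] * h[i];   /* 16x16 multiply feeding the 40-bit adder */
        return acc;
    }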

- Multiple-CPU/core architecture on some devices

Key Features - Memory
- 192K words of 16-bit addressable memory space (64K words program, 64K words data, and 64K words I/O), with extended program memory in the C548, C549, C5402, C5410, and C5420

Key Features - Instruction set
- Single-instruction repeat and block repeat operations
- Block memory move instructions for better program and data management
- Instructions with a 32-bit long operand
- Instructions with 2- or 3-operand simultaneous reads
- Arithmetic instructions with parallel store and parallel load
- Conditional-store instructions
- Fast return from interrupt

Key Features - On-chip peripherals
- Software-programmable wait-state generator
- Programmable bank-switching logic
- On-chip phase-locked loop (PLL) clock generator with internal oscillator or external clock source
- External bus-off control to disable the external data bus, address bus, and control signals
- Data bus with a bus holder feature
- Programmable timer
- Available ports: HPI (host port interface), synchronous serial ports, buffered serial port, multichannel buffered serial port, TDM (time-division multiplexed) serial port
- Speed supported: 25/20/15/12.5/10 ns execution time for a single-cycle fixed-point instruction (40/50/66/80/100 MIPS)
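As a quick check on these figures, the MIPS rating is simply the reciprocal of the single-cycle instruction time: 1 / 25 ns = 40 million instructions per second, and 1 / 10 ns = 100 MIPS.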

Key Features - Power
- Power consumption control with IDLE1, IDLE2, and IDLE3 instructions for power-down modes
- Control to disable the CLKOUT signal

Key Features - Emulation
- IEEE Standard 1149.1 boundary scan logic interfaced to on-chip scan-based emulation logic

References: TMS320C54x DSP Reference Set Volume 1: CPU and Peripherals, Chapter 1

2. With a neat diagram explain the bus-architecture of TMS320C54x processors?

The C54x DSPs use an advanced modified Harvard architecture that maximizes processing power with eight buses. Separate program and data spaces allow simultaneous access to program instructions and data, providing a high degree of parallelism. For example, three reads and one write can be performed in a single cycle. Instructions with parallel store and application-specific instructions fully utilize this architecture. In addition, data can be transferred between data and program spaces.

The C54x DSP architecture is built around eight major 16-bit buses (four program/data buses and four address buses):
- The program bus (PB) carries the instruction code and immediate operands from program memory.
- Three data buses (CB, DB, and EB) interconnect to various elements, such as the CPU, data address generation logic, program address generation logic, on-chip peripherals, and data memory.
  o The CB and DB carry the operands that are read from data memory.
  o The EB carries the data to be written to memory.

- Four address buses (PAB, CAB, DAB, and EAB) carry the addresses needed for instruction execution.

The C54x DSP can generate up to two data-memory addresses per cycle using the two auxiliary register arithmetic units (ARAU0 and ARAU1). The PB can carry data operands stored in program space (for instance, a coefficient table) to the multiplier and adder for multiply/accumulate operations, or to a destination in data space for data move instructions (MVPD and READA). This capability, in conjunction with dual-operand reads, supports the execution of single-cycle, 3-operand instructions such as the FIRS instruction. The C54x DSP also has an on-chip bidirectional bus for accessing on-chip peripherals. This bus is connected to DB and EB through the bus exchanger in the CPU interface. Accesses that use this bus can require two or more cycles for reads and writes, depending on the peripheral's structure.
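As a hedged C sketch (not TI code) of what the single-cycle FIRS capability buys, a symmetric FIR needs one add, one multiply, and one accumulate per coefficient; with dual-operand reads and the coefficient delivered over the PB, the C54x can process roughly one such tap pair per cycle. The function and array names below are illustrative only:

    #include <stdint.h>

    /* Sketch: one symmetric-FIR tap pair per loop iteration - the work a
       FIRS-class instruction covers, with coef[] fetched from program space. */
    int64_t sym_fir(const int16_t *x, const int16_t *coef, int taps)
    {
        int64_t acc = 0;
        for (int i = 0; i < taps / 2; i++) {
            int32_t pair = (int32_t)x[i] + x[taps - 1 - i]; /* samples sharing coef[i] */
            acc += (int64_t)pair * coef[i];                 /* multiply/accumulate */
        }
        return acc;
    }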

Reference: TMS320C54x DSP Reference Set Volume 1: CPU and Peripherals, Chapter 1

3. With a neat diagram explain the architecture of TMS320C54x processors?

The C54x DSPs use an advanced modified Harvard architecture that maximizes processing power with eight buses. Separate program and data spaces allow simultaneous access to program instructions and data, providing a high degree of parallelism. For example, three reads and one write can be performed in a single cycle. Instructions with parallel store and application-specific instructions fully utilize this architecture. In addition, data can be transferred between data and program spaces. Such parallelism supports a powerful set of arithmetic, logic, and bit-manipulation operations that can all be performed in a single machine cycle. Also, the C54x DSP includes the control mechanisms to manage interrupts, repeated operations, and function calling.

Bus Structure: The C54x DSP architecture is built around eight major 16-bit buses (four program/data buses and four address buses):
- The program bus (PB) carries the instruction code and immediate operands from program memory.
- Three data buses (CB, DB, and EB) interconnect to various elements, such as the CPU, data address generation logic, program address generation logic, on-chip peripherals, and data memory. The CB and DB carry the operands that are read from data memory. The EB carries the data to be written to memory.
- Four address buses (PAB, CAB, DAB, and EAB) carry the addresses needed for instruction execution.

Internal Memory Organization: The C54x DSP memory is organized into three individually selectable spaces: program, data, and I/O space. The C54x devices can contain random-access memory (RAM) and read-only memory (ROM). Among the devices, the following types of RAM are represented: dual-access RAM (DARAM), single-access RAM (SARAM), and two-way shared RAM. The DARAM or SARAM can be shared within subsystems of a multiple-CPU core device. You can configure the DARAM and SARAM as data memory or program/data memory. The C54x DSP also has 26 CPU registers plus peripheral registers that are mapped in data-memory space.

Central Processing Unit: The CPU is common to all C54x devices. The C54x CPU contains:
- 40-bit arithmetic logic unit (ALU)
- Two 40-bit accumulators
- Barrel shifter
- 17 × 17-bit multiplier
- 40-bit adder
- Compare, select, and store unit (CSSU)
- Data address generation unit
- Program address generation unit
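A note on the 40-bit accumulators (a standard observation about guard bits, not stated in these notes): each accumulator provides 8 guard bits above a 32-bit product, so on the order of 2^8 = 256 full-scale products can be summed before the accumulator itself can overflow.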

Data Addressing: The C54x DSP offers seven basic data addressing modes:
- Immediate addressing uses the instruction to encode a fixed value.
- Absolute addressing uses the instruction to encode a fixed address.
- Accumulator addressing uses accumulator A to access a location in program memory as data.
- Direct addressing uses seven bits of the instruction to encode the lower seven bits of an address. The seven bits are used with the data page pointer (DP) or the stack pointer (SP) to determine the actual memory address.
- Indirect addressing uses the auxiliary registers to access memory.
- Memory-mapped register addressing uses the memory-mapped registers without modifying either the current DP value or the current SP value.
- Stack addressing manages adding and removing items from the system stack.

During the execution of instructions using direct, indirect, or memory-mapped register addressing, the data-address generation logic (DAGEN) computes the addresses of data-memory operands.

Pipeline Operation: An instruction pipeline consists of a sequence of operations that occur during the execution of an instruction. The C54x DSP pipeline has six levels: prefetch, fetch, decode, access, read, and execute. At each of the levels, an independent operation occurs. Because these operations are independent, from one to six instructions can be active in any given cycle, each instruction at a different stage of completion. Typically, the pipeline is full with a sequential set of instructions, each at one of the six stages. When a PC discontinuity occurs, such as during a branch, call, or return, one or more stages of the pipeline may be temporarily unused.

On-chip Peripherals: All the C54x devices have a common CPU, but different on-chip peripherals are connected to their CPUs. The C54x devices may have these, or other, on-chip peripheral options:
- General-purpose I/O pins
- Software-programmable wait-state generator
- Programmable bank-switching logic
- Clock generator
- Timer
- Direct memory access (DMA) controller
- Standard serial port

- Time-division multiplexed (TDM) serial port
- Buffered serial port (BSP)
- Multichannel buffered serial port (McBSP)
- Host-port interface
  o 8-bit standard (HPI)
  o 8-bit enhanced (HPI8)
  o 16-bit enhanced (HPI16)

External Bus Interface: The interface's external ready input signal and software-generated wait states allow the processor to interface with memory and I/O devices of many different speeds. The interface's hold modes allow an external device to take control of the C54x DSP buses; in this way, an external device can access the resources in the program, data, and I/O spaces.

IEEE Standard 1149.1 Scanning Logic: The IEEE Standard 1149.1 scanning-logic circuitry is used for emulation and testing purposes only. This logic provides the boundary scan to and from the interfacing devices. Also, it can be used to test pin-to-pin continuity as well as to perform operational tests on devices peripheral to the C54x DSP. The IEEE Standard 1149.1 scanning logic is interfaced to internal scanning-logic circuitry that has access to all of the on-chip resources. Thus, the C54x DSP can perform on-board emulation using the IEEE Standard 1149.1 serial scan pins and the emulation-dedicated pins.

4. Explain the pipeline stages and phases of any DSP? What is meant by pipelining? Describe briefly the pipeline operation of TMS320C54x processors?

Processors with pipelining are organized internally into stages which can semi-independently work on separate jobs. Each stage is organized and linked into a 'chain' so each stage's output is fed to another stage until the job is done. This organization of the processor allows overall processing time to be significantly reduced. A deeper pipeline means that there are more stages in the pipeline, and therefore fewer logic gates in each stage. This generally means that the processor's frequency can be increased as the cycle time is lowered, because there are fewer components in each stage of the pipeline, so the propagation delay of each stage is decreased. Pipelining does not help in all cases; there are several possible disadvantages. An instruction pipeline is said to be fully pipelined if it can accept a new instruction every clock cycle. A pipeline that is not fully pipelined has wait cycles that delay the progress of the pipeline.

Advantages of Pipelining
- The cycle time of the processor is reduced, thus increasing the instruction issue rate in most cases.
- Some combinational circuits such as adders or multipliers can be made faster by adding more circuitry. If pipelining is used instead, it can save circuitry vs. a more complex combinational circuit.
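As an idealized estimate (general pipelining theory, not from the TI manuals): a k-stage pipeline executing n independent instructions with no stalls takes about k + n - 1 cycles instead of k*n, giving a speedup of k*n / (k + n - 1). With k = 6 and n = 100, that is 600 / 105, or roughly 5.7, approaching the 6x ideal for long instruction streams.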

Disadvantages of Pipelining
- A non-pipelined processor executes only a single instruction at a time. This prevents branch delays (in effect, every branch is delayed) and problems with serial instructions being executed concurrently. Consequently, the design is simpler and cheaper to manufacture.
- The instruction latency in a non-pipelined processor is slightly lower than in a pipelined equivalent, because extra flip-flops must be added to the data path of a pipelined processor.
- A non-pipelined processor has a stable instruction bandwidth. The performance of a pipelined processor is much harder to predict and may vary more widely between different programs.

C54x Pipeline The C54x DSP has a six-level deep instruction pipeline. The six stages of the pipeline are independent of each other, which allows overlapping execution of instructions. During any given cycle, from one to six different instructions can be active, each at a different stage of completion.

The six levels and functions of the pipeline structure are:
- Program prefetch: The program address bus (PAB) is loaded with the address of the next instruction to be fetched.
- Program fetch: An instruction word is fetched from the program bus (PB) and loaded into the instruction register (IR). This completes an instruction fetch sequence that consists of this and the previous cycle.
- Decode: The contents of the instruction register (IR) are decoded to determine the type of memory access operation and the control sequence at the data-address generation unit (DAGEN) and the CPU.
- Access: DAGEN outputs the read operand's address on the data address bus, DAB. If a second operand is required, the other data address bus, CAB, is also loaded with an appropriate address. Auxiliary registers in indirect addressing mode and the stack pointer (SP) are also updated. This is the first stage of the 2-stage operand read sequence.
- Read: The read data operand(s), if any, are read from the data buses, DB and CB. This completes the 2-stage operand read sequence. At the same time, the 2-stage operand write sequence begins: the data address of the write operand, if any, is loaded into the data write address bus (EAB). For memory-mapped registers, the read data operand is read from memory and written into the selected memory-mapped register using the DB.
- Execute: The operand write sequence is completed by writing the data using the data write bus (EB). The instruction is executed in this phase.

The first two stages of the pipeline, prefetch and fetch, are the instruction fetch sequence. In one cycle, the address of a new instruction is loaded. In the following cycle, an instruction word is read. In case of multiword instructions, several such instruction fetch sequences are needed.
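For example, a 2-word instruction that carries a 16-bit long-immediate operand occupies two consecutive prefetch/fetch sequences before it reaches the decode stage.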

During the third stage of the pipeline, decode, the fetched instruction is decoded so that the appropriate control sequences are activated for proper execution of the instruction. The next two pipeline stages, access and read, form an operand read sequence. If required by the instruction, the data addresses of one or two operands are loaded in the access phase, and the operand or operands are read in the following read phase. Any write operation is spread over two stages of the pipeline, the read and execute stages. During the read phase, the data address of the write operand is loaded onto EAB. In the following cycle, the operand is written to memory using EB. Each memory access is performed in two phases by the C54x DSP pipeline. In the first phase, an address bus is loaded with the memory address. In the second phase, a corresponding data bus reads from or writes to that memory address.
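The following toy C program (an illustration, not processor code) prints which of the six stages each of three back-to-back single-word, single-cycle instructions occupies on each cycle, which makes the overlap easy to see:

    #include <stdio.h>

    /* Toy model: three instructions flowing through the six C54x pipeline
       stages; instruction k enters the pipeline on cycle k. */
    int main(void)
    {
        const char *stage[6] = {"prefetch", "fetch", "decode", "access", "read", "execute"};
        for (int cycle = 0; cycle < 8; cycle++) {
            printf("cycle %d:", cycle + 1);
            for (int instr = 0; instr < 3; instr++) {
                int s = cycle - instr;           /* stage occupied by this instruction */
                if (s >= 0 && s < 6)
                    printf("  I%d=%s", instr + 1, stage[s]);
            }
            printf("\n");
        }
        return 0;
    }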

5. Explain briefly all the different addressing modes of the C54x Processor?

Data addressing: The TMS320C54x DSP offers seven basic addressing modes:
- Immediate addressing uses the instruction to encode a fixed value.
- Absolute addressing uses the instruction to encode a fixed address.
- Accumulator addressing uses an accumulator to access a location in program memory as data.
- Direct addressing uses seven bits of the instruction to encode an offset relative to DP or to SP. The offset plus DP or SP determines the actual address in data memory.
- Indirect addressing uses the auxiliary registers to access memory.
- Memory-mapped register addressing modifies the memory-mapped registers without affecting either the current DP value or the current SP value.
- Stack addressing manages adding and removing items from the system stack.

Data addressing - Immediate Addressing: In immediate addressing, the instruction syntax contains the specific value of the operand. Two types of values can be encoded in an instruction:
- Short immediate values can be 3, 5, 8, or 9 bits in length.
- Long immediate values are always 16 bits in length.

Immediate values can be encoded in 1-word or 2-word instructions. The 3-, 5-, 8-, or 9-bit values are encoded into 1-word instructions; 16-bit values are encoded into 2-word instructions.

Data addressing - Absolute Addressing: There are four types of absolute addressing:
- Data-memory address (dmad) addressing
- Program-memory address (pmad) addressing
- Port address (PA) addressing
- *(lk) addressing, used with all instructions that support the use of a single data-memory (Smem) operand

Data Addressing - Accumulator Addressing: Accumulator addressing uses the value in the accumulator as an address. This addressing mode is used to address program memory as data.

Data Addressing - Direct Addressing: In direct addressing, the instruction contains the lower seven bits of the data-memory address (dma). The 7-bit dma is an address offset that is combined with a base address, either the data-page pointer (DP) or the stack pointer (SP), to form a 16-bit data-memory address. Using this form of addressing, you can access any of 128 locations in random order without changing the DP or the SP. (A C sketch of this address formation appears at the end of this answer.)

Data Addressing - Indirect Addressing: In indirect addressing, any location in the 64K-word data space can be accessed using the 16-bit address contained in an auxiliary register. The C54x DSP has eight 16-bit auxiliary registers (AR0-AR7). Indirect addressing is used mainly when there is a need to step through sequential locations in memory in fixed-size steps.

Data Addressing - Memory-Mapped Register Addressing: Memory-mapped register addressing is used to modify the memory-mapped registers without affecting either the current data-page pointer (DP) value or the current stack-pointer (SP) value. Because DP and SP do not need to be modified in this mode, the overhead for writing to a register is minimal. Memory-mapped register addressing works for both direct and indirect addressing.

Data Addressing - Stack Addressing: The system stack is used to automatically store the program counter during interrupts and subroutines. It can also be used at your discretion to store additional items of context or to pass data values. The stack is filled from the highest to the lowest memory address. The processor uses a 16-bit memory-mapped register, the stack pointer (SP), to address the stack. SP always points to the last element stored onto the stack.

Program Memory Addressing: The following program control operations affect the value loaded in the PC:
- Branches
- Calls
- Returns
- Conditional operations
- Repeats of an instruction or a block of instructions
- Hardware reset
- Interrupts
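For the direct addressing mode described above, here is a minimal C sketch of how the 16-bit address could be formed; the CPL-bit selection between DP-relative and SP-relative addressing is an assumption taken from TI documentation rather than from these notes, and the function name is illustrative only:

    #include <stdint.h>

    /* Sketch: forming a 16-bit data-memory address in direct addressing.
       dma7 is the 7-bit offset encoded in the instruction; dp is the 9-bit
       data-page pointer; sp is the stack pointer; cpl selects DP- or
       SP-relative addressing (assumed behavior). */
    uint16_t direct_address(uint16_t dp, uint16_t sp, uint8_t dma7, int cpl)
    {
        dma7 &= 0x7F;                              /* keep only the 7-bit offset */
        dp   &= 0x1FF;                             /* DP is a 9-bit field */
        if (cpl == 0)
            return (uint16_t)((dp << 7) | dma7);   /* DP page of 128 words + offset */
        return (uint16_t)(sp + dma7);              /* SP-relative (stack frame) */
    }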

6. Explain with a neat diagram the architecture of 6x series of processors?

The C6000 devices execute up to eight 32-bit instructions per cycle. The C674x CPU consists of 64 general-purpose 32-bit registers and eight functional units. These eight functional units comprise:
- Two multipliers
- Six ALUs

Features of the C6000 devices:
- Advanced VLIW CPU with eight functional units, including two multipliers and six arithmetic units
  o Executes up to eight instructions per cycle for up to ten times the performance of typical DSPs
  o Allows designers to develop highly effective RISC-like code for fast development time

- Instruction packing
  o Gives code size equivalence for eight instructions executed serially or in parallel
  o Reduces code size, program fetches, and power consumption
- Conditional execution of most instructions
  o Reduces costly branching
  o Increases parallelism for higher sustained performance
- Efficient code execution on independent functional units
- 8/16/32-bit data support, providing efficient memory support for a variety of applications
- 40-bit arithmetic options add extra precision for vocoders and other computationally intensive applications
- Saturation and normalization provide support for key arithmetic operations (a C sketch of saturating addition follows this list)
- Field manipulation and instruction extract, set, clear, and bit-counting support common operations found in control and data manipulation applications
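As noted in the list above, here is a minimal C sketch (not the TI intrinsic) of saturating addition, the clamp-on-overflow behavior the C6000 provides in hardware through instructions such as SADD:

    #include <stdint.h>
    #include <limits.h>

    /* Sketch: 32-bit saturating addition - on overflow the result is clamped
       to the maximum or minimum representable value instead of wrapping. */
    int32_t sat_add32(int32_t a, int32_t b)
    {
        int64_t sum = (int64_t)a + b;          /* widen so overflow is visible */
        if (sum > INT32_MAX) return INT32_MAX; /* clamp positive overflow */
        if (sum < INT32_MIN) return INT32_MIN; /* clamp negative overflow */
        return (int32_t)sum;
    }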

The VelociTI architecture of the C6000 platform of devices makes them the first off-the-shelf DSPs to use advanced VLIW to achieve high performance through increased instruction-level parallelism. A traditional VLIW architecture consists of multiple execution units running in parallel, performing multiple instructions during a single clock cycle. Parallelism is the key to extremely high performance, taking these DSPs well beyond the performance capabilities of traditional superscalar designs. VelociTI is a highly deterministic architecture, having few restrictions on how or when instructions are fetched, executed, or stored. It is this architectural flexibility that is key to the breakthrough efficiency levels of the TMS320C6000 optimizing compiler.

The C674x CPU contains:
- Program fetch unit
- 16/32-bit instruction dispatch unit with advanced instruction packing
- Instruction decode unit
- Eight functional units (.L1, .L2, .S1, .S2, .M1, .M2, .D1, and .D2)
- Two load-from-memory data paths (LD1 and LD2)
- Two store-to-memory data paths (ST1 and ST2)
- Two data address paths (DA1 and DA2)
- Two register file data cross paths (1X and 2X)
- Two general-purpose register files (A and B)
- Control registers
- Control logic
- Test, emulation, and interrupt logic
- Internal DMA (IDMA) for transfers between internal memories

The program fetch, instruction dispatch, and instruction decode units can deliver up to eight 32-bit instructions to the functional units every CPU clock cycle. The processing of instructions occurs in each of the two data paths (A and B), each of which contains four functional units (.L, .S, .M, and .D) and 32 32-bit general-purpose registers.

General-Purpose Register Files: There are two general-purpose register files (A and B) in the CPU data paths. Each of these files contains 32 32-bit registers (A0-A31 for file A and B0-B31 for file B). The general-purpose registers can be used for data, data address pointers, or condition registers.

Functional Units: The eight functional units in the C6000 data paths can be divided into two groups of four; each functional unit in one data path is almost identical to the corresponding unit in the other data path. Each functional unit has its own 32-bit write port into a general-purpose register file, so all eight units can be used in parallel every cycle. All units ending in 1 (for example, .L1) write to register file A, and all units ending in 2 write to register file B. Each functional unit has two 32-bit read ports for source operands src1 and src2. Four units (.L1, .L2, .S1, and .S2) have an extra 8-bit-wide port for 40-bit long writes, as well as an 8-bit input for 40-bit long reads. Since each DSP multiplier can return up to a 64-bit result, an extra write port has been added from the multipliers to the register file.

Register File Cross Paths: Each functional unit reads directly from and writes directly to the register file within its own data path. That is, the .L1, .S1, .D1, and .M1 units write to register file A, and the .L2, .S2, .D2, and .M2 units write to register file B. The register files are connected to the opposite-side register file's functional units via the 1X and 2X cross paths. These cross paths allow functional units from one data path to access a 32-bit operand from the opposite-side register file. The 1X cross path allows the functional units of data path A to read their source from register file B, and the 2X cross path allows the functional units of data path B to read their source from register file A.

Memory, Load, and Store Paths: The DSP supports doubleword loads and stores. There are four 32-bit paths for loading data from memory to the register file. For side A, LD1a is the load path for the 32 LSBs and LD1b is the load path for the 32 MSBs. For side B, LD2a is the load path for the 32 LSBs and LD2b is the load path for the 32 MSBs. There are also four 32-bit paths for storing register values to memory from each register file. For side A, ST1a is the write path for the 32 LSBs and ST1b is the write path for the 32 MSBs. For side B, ST2a is the write path for the 32 LSBs and ST2b is the write path for the 32 MSBs.

Data Address Paths: The data address paths (DA1 and DA2) are each connected to the .D units in both data paths. This allows data addresses generated by either path to access data to or from any register. The DA1 and DA2 resources and their associated data paths are specified as T1 and T2, respectively. T1 consists of the DA1 address path and the LD1 and ST1 data paths. For the DSP, LD1 comprises LD1a and LD1b to support 64-bit loads, and ST1 comprises ST1a and ST1b to support 64-bit stores. Similarly, T2 consists of the DA2 address path and the LD2 and ST2 data paths. For the DSP, LD2 comprises LD2a and LD2b to support 64-bit loads, and ST2 comprises ST2a and ST2b to support 64-bit stores.

7. What is pipelining? Explain the pipeline stages of TMS320C6x Processors?

Processors with pipelining are organized internally into stages which can semi-independently work on separate jobs. Each stage is organized and linked into a 'chain' so each stage's output is fed to another stage until the job is done. This organization of the processor allows overall processing time to be significantly reduced. A deeper pipeline means that there are more stages in the pipeline, and therefore fewer logic gates in each stage. This generally means that the processor's frequency can be increased as the cycle time is lowered, because there are fewer components in each stage of the pipeline, so the propagation delay of each stage is decreased. Pipelining does not help in all cases; there are several possible disadvantages. An instruction pipeline is said to be fully pipelined if it can accept a new instruction every clock cycle. A pipeline that is not fully pipelined has wait cycles that delay the progress of the pipeline.

Advantages of Pipelining
- The cycle time of the processor is reduced, thus increasing the instruction issue rate in most cases.
- Some combinational circuits such as adders or multipliers can be made faster by adding more circuitry. If pipelining is used instead, it can save circuitry vs. a more complex combinational circuit.

Disadvantages of Pipelining
- A non-pipelined processor executes only a single instruction at a time. This prevents branch delays (in effect, every branch is delayed) and problems with serial instructions being executed concurrently. Consequently, the design is simpler and cheaper to manufacture.
- The instruction latency in a non-pipelined processor is slightly lower than in a pipelined equivalent, because extra flip-flops must be added to the data path of a pipelined processor.
- A non-pipelined processor has a stable instruction bandwidth. The performance of a pipelined processor is much harder to predict and may vary more widely between different programs.

Highlights of the C6000 Pipeline
- The pipeline can dispatch eight parallel instructions every cycle.
- Parallel instructions proceed simultaneously through each pipeline phase.
- Serial instructions proceed through the pipeline with a fixed relative phase difference between instructions.
- Load and store addresses appear on the CPU boundary during the same pipeline phase, eliminating read-after-write memory conflicts.

Pipeline Operation Overview: The pipeline phases are divided into three stages: fetch, decode, and execute. All instructions in the DSP instruction set flow through the fetch, decode, and execute stages of the pipeline. The fetch stage of the pipeline has four phases for all instructions, and the decode stage has two phases for all instructions. The execute stage of the pipeline requires a varying number of phases, depending on the type of instruction.

Fetch Stage: The fetch phases of the pipeline are:
- PG: Program address generate
- PS: Program address send
- PW: Program access ready wait
- PR: Program fetch packet receive
The DSP uses a fetch packet (FP) of eight words. All eight of the words proceed through fetch processing together, through the PG, PS, PW, and PR phases. During the PG phase, the program address is generated in the CPU. In the PS phase, the program address is sent to memory. In the PW phase, a memory read occurs. Finally, in the PR phase, the fetch packet is received at the CPU.

Decode Stage: The decode phases of the pipeline are:
- DP: Instruction dispatch
- DC: Instruction decode
In the DP phase of the pipeline, the fetch packets are split into execute packets. Execute packets consist of one instruction or from two to eight parallel instructions. During the DP phase, the instructions in an execute packet are assigned to the appropriate functional units. In the DC phase, the source registers, destination registers, and associated paths are decoded for the execution of the instructions in the functional units.

Execute Stage: The execute portion of the pipeline is subdivided into five phases (E1-E5). Different types of instructions require different numbers of these phases to complete their execution.
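For scale: a fetch packet is eight 32-bit words, that is 8 × 32 = 256 bits fetched together; if dispatch finds that all eight instructions can run in parallel, they form a single execute packet and the CPU issues eight instructions in one cycle.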

8. With a neat diagram explain the core architecture of ADSP 21xx DSP?

The ADSP-21xx has the following architectural features:
- Computation units: multiplier, ALU, shifter, and data register file
- Program sequencer with related instruction cache, interval timer, and data address generators (DAG1 and DAG2)
- Dual-blocked SRAM
- External ports for interfacing to off-chip memory, peripherals, and hosts
- Input/output (I/O) processor with integrated DMA controllers, serial ports (SPORTs), serial peripheral interface (SPI) ports, and a UART port
- JTAG Test Access Port for board test and emulation

ADSP-21xx Buses: The ADSP-21xx has three on-chip buses: the PM bus, the DM bus, and the DMA bus. The PM bus provides access to either instructions or data. During a single cycle, these buses let the processor access two data operands (one from PM and one from DM) and fetch an instruction (from the cache).

How the ADSP addresses DSP requirements:
- Fast, flexible arithmetic computation units
  o The ADSP-219x family DSPs execute all computational instructions in a single cycle. They provide both fast cycle times and a complete set of arithmetic operations.
- Unconstrained data flow to and from the computation units
  o The ADSP-219x has a modified Harvard architecture combined with a data register file. In every cycle, the DSP can read two values from memory or write one value to memory, complete one computation, and write up to three values back to the register file.
- Extended precision and dynamic range in the computation units
  o 40-bit extended precision. The DSP handles 16-bit integer and fractional formats (two's-complement and unsigned). The processors carry extended precision through result registers in their computation units, limiting intermediate data truncation errors.
- Dual address generators with circular buffering support
  o The DSP has two data address generators (DAGs) that provide immediate or indirect (pre- and post-modify) addressing. Modulus and bit-reverse operations are supported, with memory page constraints on data buffer placement only. (A C sketch of circular addressing follows this list.)

- Efficient program sequencing
  o In addition to zero-overhead loops, the DSP supports quick setup and exit for loops. Loops are both nestable (eight levels in hardware) and interruptible. The processors support both delayed and non-delayed branches.
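For the circular-buffering support mentioned above, here is a minimal C sketch of the post-modify, modulus-wrapped index update that a DAG performs in hardware; the function name and arguments are illustrative only:

    /* Sketch: post-modify circular addressing. The index advances through a
       buffer of the given length and wraps without the program having to test
       or branch (the DAG does the wrap automatically). Assumes |step| <= length. */
    int circular_advance(int index, int step, int length)
    {
        index += step;               /* post-modify */
        if (index >= length)
            index -= length;         /* wrapped past the end */
        else if (index < 0)
            index += length;         /* wrapped before the start */
        return index;
    }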
