You are on page 1of 12

Block diagram (You can simplify how the buses are depicted etc.. and the fig.

will become smaller!!) The TMS320C4x devices are 32-bit floating-point digital signal processors optimized for parallel processing. The C4x family combines a high performance CPU and DMA controller with up to six communication ports to meet the needs of multiprocessor and I/O-intensive applications. Each C4x device contains an on-chip analysis module, which supports hardware breakpoints for parallel processing development and debugging. The

C4x family is source-code compatible with the TMS320C3x family of floating-point DSPs. The TMS320C40 is the original member of the C4x family. It features a CPU that can deliver up to 30 MIPS/60 MFLOPS with a maximum I/O bandwidth of 384M bytes/s. The C40 has 2K words of on-chip RAM, 128 words of program cache and a bootloader. Two external buses provide an address reach of 4 gigawords of unified memory space. The C40 is available in a 325-pin CPGA package. The TMS320C44 The TMS320C44 is a lower cost version of the C40, for parallel processing applications that are more price sensitive. The C44 features four communication ports and has an external address reach of 32M words over two external buses. To further reduce cost, the C44 comes in a 304-pin PQFP package. The TMS320C44 can deliver up to 30 MIPS/60 MFLOPS performance with a maximum I/O bandwidth of 384M bytes/s. The C44 is source-code compatible with the C40. 1. Key Features of the TMS320C4x The TMS320C4x has several key features: _ Up to 40 MIPS/80 MFLOPS performance with 488-Mbytes/s I/O capability _ IEEE floating-point conversion for ease of use _ Register-based CPU _ Single-cycle byte and half-word manipulation capabilities _ Divide and square root support for improved performance _ On-chip memory includes 2K words of SRAM, 128 words of program cache, and bootloader _ Two external buses providing an address reach of up to 4 gigawords _ Two memory-mapped 32-bit timers _ 6 and 12 channel DMA _ Up to six communication ports for multiprocessor communication _ Idle mode for reduced power consumption

Central Processing Unit (CPU) The C4xs CPU has a register-based architecture. The CPU consists of the several components: 1.Floating-point/integer multiplier 2.Arithmetic Logic Unit (ALU) 3.32-bit barrel shifter 4.Internal buses (CPU1/CPU2 and REG1/REG2) 5.Auxiliary register arithmetic units (ARAUs) 6.CPU register file Floating-Point/Integer Multiplier The multiplier performs single-cycle multiplications on 32-bit integer and 40-bit floatingpoint values. The C4x implementation of floating-point arithmetic allows for floatingpoint operations at fixed-point speeds via a 25-ns instruction cycle and a high degree of parallelism. To gain even higher throughput, you can use parallel instructions to perform a multiply and ALU operation in a single cycle. When the multiplier performs floating-point multiplication, the inputs are 40-bit floatingpoint numbers, and the result is a 40-bit floating-point number. When the multiplier performs integer multiplication, the input data is 32 bits and yields either the 32 mostsignificant bits or the 32 least-significant bits of the resulting 64-bit product. Arithmetic Logic Unit (ALU) and Internal Buses The ALU performs single-cycle operations on 32-bit integer, 32-bit logical, and 40-bit floating-point data, including single-cycle integer and floating-point conversions. Results of the ALU are always maintained in 32-bit integer or 40-bit floating-point formats. The barrel shifter is used to shift up to 32 bits left or right in a single cycle. Four internal buses, CPU1, CPU2, REG1, and REG2, carry two operands from memory and two operands from the register file, thus allowing parallel multiplies and adds/subtracts on four integer or floating-point operands in a single cycle. Auxiliary Register Arithmetic Units (ARAUs) The two auxiliary register arithmetic units (ARAU0 and ARAU1) can generate two addresses in a single cycle. The ARAUs operate in parallel with the multiplier

and ALU. They support addressing with displacements, index registers (IR0 and IR1), and circular and bit-reversed addressing. CPU Primary Register File The C4x primary register file provides 32 registers in a multiport register file that is tightly coupled to the CPU. Table 21 lists register names and functions, followed by the section number and page of each description. All of the primary register file registers can be operated upon by the multiplier and ALU and can be used as general-purpose registers. However, the registers also have some special functions. For example, the 12 extended-precision registers are especially suited for maintaining floating-point results. The eight auxiliary registers support a variety of indirect addressing modes and can be used as general-purpose 32-bit integer and logical registers. The remaining registers provide system functions such as addressing, stack management, processor status, interrupts, and block repeat. The extended-precision registers (R0R11) are capable of storing and supporting operations on 32-bit integer and 40-bit floating-point numbers. Any instruction that assumes that the operands are floating-point numbers uses bits 390. If the operands are either signed or unsigned integers, only bits 310 are used, and bits 3932 remain unchanged. This is true for all shift operations. The 32-bit auxiliary registers (AR0AR7) can be accessed by the CPU and modified by the two auxiliary register arithmetic units (ARAUs). The primary function of the auxiliary registers is the generation of 32-bit addresses. They can also be used as loop counters or as 32-bit general-purpose registers that can be modified by the multiplier and ALU. The data page pointer (DP) is a 32-bit register. The 16 LSBs of the data page pointer are used by the direct addressing mode as a pointer to the page of data being addressed. The C4x can address up to 64K pages, each page containing 64K words The 32-bit index registers contain the value used by the auxiliary register arithmetic unit (ARAU) to compute an indexed address. The ARAU uses the 32-bit block size register (BK) in circular addressing to specify the data block size.

The system stack pointer (SP) is a 32-bit register that contains the address of the top of the system stack. The SP always points to the last element pushed onto the stack. A push performs a pre-increment, and a pop performs a post-decrement of the system stack pointer. The SP is manipulated by interrupts, traps, calls, returns, and the PUSH/PUSHF and POP/POPF instructions. The status register (ST) contains global information related to the state of the CPU. Typically, operations set the condition flags of the status register according to whether the result is zero, negative, etc. This includes register load and store operations as well as arithmetic and logical functions. When the status register is loaded, however, a bit-for-bit replacement is performed with the contents of the source operand, regardless of the state of any bits in the source operand. Therefore, following a load, the contents of the status register are identically equal to the contents of the source operand. The DMA coprocessor interrupt enable register (DIE) is a 32-bit register containing 2- and 3-bit fields to designate the interrupt synchronization scheme for each of the six DMA channels. It allows each DMA channel to service a corresponding input communication port and output communication port. Also, each DMA channel can be synchronized with external interrupts or the on-chip timers. The CPU internal interrupt enable register (IIE) is a 32-bit register that enables/ disables interrupts for the six communication ports, both timers, and the six DMA coprocessor channels. The IIOF flag register (IIF) controls the function (general-purpose I/O or interrupt) of the four external pins (IIOF0 to IIOF3). It also contains timer/DMA interrupt flags. The 32-bit repeat counter (RC) register specifies the number of times a block of code is to be repeated when a block repeat is performed. When the processor is operating in the repeat mode, the 32-bit repeat start address register (RS) contains the starting address of the block of program memory to be repeated, and the 32-bit repeat end address register (RE) contains the ending address of the block to be repeated. Block Repear (RS,RE) and Repeat Count (RC) Registers,

The program counter (PC) is a 32-bit register containing the address of the next instruction to be fetched. Although the PC is not part of the CPU register file, it is a register that can be modified by instructions that modify the program flow. CPU Expansion Register File Besides the CPU primary register file, the expansion register file contains two special registers that act as pointers: _ The IVTP register points to the interrupt-vector table (IVT), which defines vectors for all interrupts. _ The TVTP register points to the trap vector table (TVT), which defines vectors for 512 traps. Memory Organization The total memory reach of the C4x is 4G 32-bit words. Program memory (onchip RAM or ROM and external memory) as well as registers affecting timers, communication ports, and DMA channels are contained within this space. This allows tables, coefficients, program code, and data to be stored in either RAM or ROM. Thus, memory usage is maximized, and memory space allocated as desired. By manipulating one external pin (ROMEN), you can configure the first onemegaword area of memory (0000 0000h to 000F FFFFh) to address the local address bus or to address the on-chip ROM when you use the bootloader (with remaining space reserved). 2.1 RAM, ROM, and Cache The ROM block is reserved and contains a bootloader. Each RAM and ROM block is capable of supporting two accesses in a single cycle. The separate program buses, data buses, and DMA buses allow for parallel program fetches, data reads and writes, and DMA operations. For example: the CPU can access two data values in one RAM block and perform an external program fetch in parallel with the DMA coprocessor loading another RAM block, all within a single cycle. The reserved ROM block (upper right contains a bootloader. This loader supports loading of program and data at reset time. Loading is from 8-, 16-, or 32-bit wide memories or any one of the six communication

ports. A 128k, 32-bit instruction cache is provided to store often-repeated sections of code, thus greatly reducing the number of needed off-chip accesses. This allows for code to be stored off-chip in slower, lower-cost memories. By using the cache to execute your program, the external buses are freed for use by the DMA controller or CPU. Memory Maps For each processor, the level at the external pin ROMEN determines whether or not the first megaword of memory addresses the internal ROM or external memory. The maps illustrate the entire address space of the C40 and C44. The value of ROMEN affects only the first megaword of memory: _ A 1 at external pin ROMEN causes internal ROM to be enabled at 0000h with the one-megaword space reserved (0000 0000h 000F FFFFh). This is shown in the right side of the figure. _ A 0 at ROMEN causes addresses 0000 0000h 000F FFFFh to be accessible on the local bus. This is shown in the left side of the figure. The rest of the memory map is the same for either level of ROMEN: _ The second megaword of memory is devoted to peripherals _ The third megaword of memory contains the two 1K-word (4K-byte) blocks of RAM (BLK0 and BLK1 as shown at 002F F800h 002F FFFFh). _ The rest of the first 2 gigawords (0030 0000h 7FFF FFFFh) is on the local bus (external). _ The second 2 gigawords (8000 0000h FFFF FFFFh) are on the global bus (external). Caution Any access to a reserved area in the address space produces unpredictable results. Do not attempt to access reserved areas. Memory Aliasing (C44 only) Memory aliasing occurs in the C44, since both the global and local ports on that device have 24 pins, instead of the 31 pins on each port in the C40. Memory aliasing causes the first 16 M of each address space to be repeated in the memory map. Memory on the local bus occupies, and is aliased, in the first 2 G of address space, and memory on the global bus occupies, and is

aliased, in the second 2 G of address space. Figure 27 shows the alias regions on the local and global buses. Memory Addressing Modes The C4x supports a base set of general-purpose instructions as well as arithmeticintensive instructions that are particularly suited for digital signal processing and other numeric-intensive applications. Refer to Chapter 6, Addressing Modes, for detailed information on addressing. Four groups of addressing modes are provided on the C4x. Each group uses two or more of several different addressing types. The following list shows the addressing modes with their addressing types. _ General addressing modes: _ Register. The operand is a CPU register. _ Immediate. The operand is a 16-bit immediate value. _ Direct. The operand is the contents of a 32-bit address (concatenation of 16 bits of the data page pointer and a 16-bit operand). _ Indirect. A 32-bit auxiliary register indicates the address of the operand. _ Three-operand addressing modes: _ Register. (same as for general addressing mode). _ Indirect. (same as for general addressing mode). _ Immediate. The operand is an 8-bit immediate value. _ Parallel addressing modes: _ Register. The operand is an extended-precision register. _ Indirect. (same as for general addressing mode). _ Branch addressing modes: _ Register. (same as for general addressing mode). _ PC-relative. A signed 16-bit displacement or a 24-bit displacement is added to the PC. Internal Bus Operation A large portion of the C4xs high performance is due to internal busing and parallelism.

Separate buses allow for parallel program fetches, data accesses, and DMA accesses: _ Program buses PADDR and PDATA _ Data buses DADDR1, DADDR2, and DDATA _ DMA buses DMAADDR and DMADATA These buses connect all of the physical spaces (on-chip memory, off-chip memory, and on-chip peripherals) supported by the C4x. Figure 23 shows these internal buses and their connections to on-chip and off-chip memory blocks. The program counter (PC) is connected to the 32-bit program address bus (PADDR). The instruction register (IR) is connected to the 32-bit program data bus (PDATA). In this configuration, the buses can fetch a single instruction word every machine cycle. The 32-bit data address buses (DADDR1 and DADDR2) and the 32-bit data data bus (DDATA) support two data memory accesses every machine cycle. The DDATA bus carries data to the CPU over the CPU1 and CPU2 buses. The CPU1 and CPU2 buses can carry two data memory operands to the multiplier, ALU, and register file every machine cycle. Also internal to the CPU are register buses REG1 and REG2, which can carry two data values from the register file to the multiplier and ALU every machine cycle. Figure 22 shows the buses that are internal to the CPU section of the processor. The DMA controller is supported with a 32-bit address bus (DMAADDR) and a 32-bit data bus (DMADATA). These buses allow the DMA to perform memory accesses in parallel with the memory accesses occurring from the data and program buses. External Bus Operation The C4x provides two identical external interfaces: the global memory interface and the local memory interface. Each consists of a 32-bit data bus, a 31-bit (C40) or 24-bit (C44) address bus, and two sets of control signals. Both buses can be used to address external program/data memory or I/O space. The buses also have external RDY signals for wait-state generation with wait

states inserted under software control. Chapter 9, External Bus Operation, covers external bus operation. For multiple processors to access global memory and share data in a coherent manner, arbitration is necessary. This arbitration (handshaking) is the purpose of the C4xs interlocked operations, handled through interlocked instructions. Interrupts The C4x supports four external interrupts (IIOF30), a number of internal interrupts, a nonmaskable external NMI interrupt, and a nonmaskable external RESET signal, which sets the processor to a known state. The DMA and communication ports have their own internal interrupts. When the CPU responds to the interrupt, the IACK pin can be used to signal an external interrupt acknowledge. Peripherals All C4x on-chip peripherals are controlled through memory-mapped registers on a dedicated peripheral bus. This peripheral bus is composed of a 32-bit data bus and a 32-bit address bus. This peripheral bus permits straightforward communication to the peripherals. The C4x peripherals include two timers and six (C40) or four (C44) communication ports. Communication Ports Six (C40) or four (C44) high-speed communication ports provide rapid processorto-processor communication through each ports dedicated communication interfaces. Coupled with the C4xs two memory interfaces (global and local), this allows you to construct a parallel processor system that attains optimum system performance by distributing tasks among several processors. Each C4x can pass the results of its work to another C4x through a communication port, enabling each C4x to continue working. Chapter 12, Communication Ports, explains communication port operation in detail. The communication ports offer several features: _ 160-megabits/s (20-Mbytes or 5-Mwords per second) bidirectional data transfer operations (at 40-ns cycle time) _ Simple processor-to-processor communication via eight data lines and four control lines

_ Buffering of all data transfers, both input and output _ Automatic arbitration to ensure communication synchronization _ Synchronization between the CPU or the direct-memory access (DMA) coprocessor and the six communication ports via internal interrupts and internal ready signals. _ Port direction pin (CDIR) to ease interfacing (C44 only) Direct Memory Access (DMA) Coprocessor The six channels of the on-chip DMA coprocessor can read from or write to any location in the memory map without interfering with the operation of the CPU. This allows interfacing to slow external memories and peripherals without reducing throughput to the CPU. The DMA coprocessor contains its own address generators, source and destination registers, and transfer counter. Dedicated DMA address and data buses allow for minimization of conflicts between the CPU and the DMA coprocessor. A DMA operation consists of a block or single-word transfer to or from memory. A key feature of the DMA coprocessor is its ability to automatically reinitialize each channel following a data transfer. Timers The two timer modules are general-purpose 32-bit timer/event counters with two signaling modes and internal or external clocking. They can signal internally to the C4x or externally to the outside world at specified intervals, or they can count external events. Each timer has an I/O pin that can be used as an input clock to the timer, as an output signal driven by the timer, or as a general purpose I/O pin.

You might also like