
Fault Assisted Circuits for

Electronics Training (FACET)

Digital Signal Processor

Printed in Canada
Student Manual
31946-00

FAULT ASSISTED CIRCUITS FOR
ELECTRONICS TRAINING (FACET)

DIGITAL SIGNAL PROCESSOR

by
the Staff
of
Lab-Volt (Quebec) Ltd

Copyright © 2000 Lab-Volt Ltd

All rights reserved. No part of this publication may be
reproduced, in any form or by any means, without the prior
written permission of Lab-Volt Quebec Ltd.

Legal Deposit – Second Trimester 2000

ISBN 2-89289-480-8

FIRST EDITION, JUNE 2000

Table of Contents

Unit 1 DSP Trainer Familiarization

Ex. 1-1 Introduction to the DSP Circuit Board

Ex. 1-2 The Assembler and Debugger

Ex. 1-3 Processor Arithmetic

Unit 2 CPU Architecture

Ex. 2-1 The Central Arithmetic Logic Unit

Ex. 2-2 Memory Space

Ex. 2-3 Addressing

Unit 3 Program Execution

Ex. 3-1 The Program Controller

Ex. 3-2 The Pipeline

Unit 4 Basic I/O

Ex. 4-1 DSP Peripherals

Ex. 4-2 Digital Signal Processing: The FIR Filter

Appendix A Help Pages

Appendix B New Terms and Words

We Value Your Opinion
Unit 1

DSP Trainer Familiarization

UNIT OBJECTIVES

Upon completion of this unit, you will be able to explain the difference between a
digital signal processor (DSP) and a general-purpose processor. You will be
familiar with the design process for DSP programs.

UNIT FUNDAMENTALS

A Digital Signal Processor (DSP) is a fast, specialized microprocessor that,
like the human brain, can analyze signals in real time.

The internal design of DSPs, whose key element is the multiply-and-add
architecture, often makes them much faster at mathematical operations
than other microprocessors.
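
The multiply-and-add operation can be pictured with a short sketch. The Python below only illustrates the arithmetic; it does not show how the DSP hardware performs it. The function name multiply_accumulate and the sample values are assumptions made for this example.

```python
def multiply_accumulate(coefficients, samples):
    """Return the sum of products: one multiply and one add per pair."""
    if len(coefficients) != len(samples):
        raise ValueError("both lists must have the same length")
    accumulator = 0
    for c, x in zip(coefficients, samples):
        # A DSP's multiply-and-add hardware is built to do this
        # multiply/add pair very quickly.
        accumulator += c * x
    return accumulator


result = multiply_accumulate([1, 2, 3], [4, 5, 6])
print(result)  # 1*4 + 2*5 + 3*6 = 32
```

A general-purpose processor would usually need separate multiply and add steps for each pair of values, which is why this structure matters for signal processing.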

Digital Signal Processors are characterized by:

– specialized structures that let them execute commands rapidly and
efficiently,

– fast multiply instructions,

– a reduced instruction set, which makes the DSP programming process simpler.


DSPs have revolutionized telecommunications. They can be found inside cellular
phones, modems, speech recognition/synthesis devices, Digital Versatile Disc
(DVD) players and high-level security devices, to name a few.

DSPs are also commonly found in devices that people do not immediately
associate with them, such as hard disk drive controllers, vehicle suspension
systems, and the signal processing circuits of medical imagers and radar systems.


DSPs began to appear at the end of the 1970s and the beginning of the 1980s with
Bell Labs' DSP1, Intel's 2920, and NEC's µPD7720.

In 1982, Texas Instruments introduced the TMS32010, the first member of what
was to become a popular 16-bit fixed-point DSP family. This DSP had an average
calculation rate of 8 MIPS.

In 1998, DSPs, using parallelism, reached calculation speeds of up to 1600 MIPS.

The DSP used with the Lab-Volt DIGITAL SIGNAL PROCESSOR circuit board is
a Texas Instruments TMS320C50. The TMS320C50 is a third-generation DSP with
an internal design based on the first-generation TMS32010.


Also in 1982, the first floating-point DSPs were produced by Hitachi. This numeric
format greatly increased the dynamic calculation range of DSPs.

Two years later, NEC introduced the first 32-bit floating-point DSPs, which had
a calculation speed of 6.6 MIPS.

Generally, real world signals (e.g., radar and sonar) are better processed by
floating-point DSPs. Constructed signals (e.g., telecom, imaging and control) are
generally better processed using fixed-point DSPs.
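
To give a feel for what "dynamic range" means for a 16-bit fixed-point processor, here is a minimal sketch. It assumes the Q15 convention (a signed 16-bit integer representing a fraction between -1 and +1), which is a common fixed-point format; this manual does not state which format the Trainer's programs use. All names are hypothetical.

```python
import math

Q15_SCALE = 1 << 15        # 32768 steps per unit
Q15_MAX = Q15_SCALE - 1    # largest signed 16-bit value
Q15_MIN = -Q15_SCALE       # smallest signed 16-bit value


def to_q15(x):
    """Convert a real number to a saturated 16-bit Q15 integer."""
    return max(Q15_MIN, min(Q15_MAX, round(x * Q15_SCALE)))


def from_q15(n):
    """Convert a Q15 integer back to a real number."""
    return n / Q15_SCALE


def dynamic_range_db(bits):
    """Ratio of the largest magnitude to one step, in decibels."""
    return 20 * math.log10(2 ** (bits - 1))


print(to_q15(0.5), to_q15(1.0), to_q15(-1.0))
print(round(dynamic_range_db(16), 1), "dB")
```

Values outside the representable range are clipped (saturated) in this sketch. A floating-point format avoids much of this clipping by storing an exponent alongside the fraction, which is where its larger dynamic range comes from.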

The range of DSP applications has grown because:

– They allow for more complex processing than is possible with analog circuitry;

– They provide repeatable signal processing performance;

– Digital processing code can be easily modified, which makes design updates
and changes more flexible;

– They usually result in a lower development cost than analog designs with
equivalent performance levels.


A DSP cannot operate without the intelligence of a program giving it its commands.
The program tells the DSP which instructions it must execute to perform certain
functions. This program is stored as machine code inside of the DSP.

Which of the following choices is written in machine code format (a format
understood directly by the processor)?

a. ADD #214,4
b. F9E7h
c. 1011 1110 0001 0110
d. All of the above

Writing a DSP program directly in machine code would be very difficult.

For this reason, an assembler language is used to program the DSP.


This is a programming language whose instructions, called mnemonics, are
symbolic and usually in one-to-one correspondence with the machine instructions.

An assembler and a linker are used to translate the program written in assembler
language into DSP machine codes.

The assembler translates the program file into object files which are then linked
together to create the executable file.
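
The one-to-one translation from mnemonics to machine words can be sketched with a toy assembler. The mnemonics, the 16-bit opcodes, and the 12-bit operand field below are invented for illustration only; they are not the real TMS320C5x encodings, and the sketch skips the linking step.

```python
# Hypothetical opcode table: NOT the real TMS320C5x instruction encodings.
OPCODES = {"NOP": 0x0000, "LOAD": 0x1000, "ADD": 0x2000, "STORE": 0x3000}


def assemble(source):
    """Translate each source line into one 16-bit machine word."""
    words = []
    for line in source.splitlines():
        line = line.split(";")[0].strip()   # drop comments and blanks
        if not line:
            continue
        mnemonic, _, operand = line.partition(" ")
        operand = operand.strip()
        value = int(operand, 0) if operand else 0
        # The opcode fills the top 4 bits, the operand the low 12 bits.
        words.append(OPCODES[mnemonic.upper()] | (value & 0x0FFF))
    return words


program = "LOAD 5\nADD 0x10 ; add a constant\nNOP"
print([f"{word:04X}h" for word in assemble(program)])
```

Each source line produces exactly one machine word, which is the correspondence the text describes.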

Which of the following choices is written in assembler language?

a. IF (i.NE.27) THEN (omega = 2*sin(x))
b. 982Eh
c. 1011 1110 0001 0011
d. DMOV *,AR1

The C language is a high-level language that is increasingly used to program
complex DSPs or highly complex algorithms.

Programming in C simplifies the design of DSP applications because the
programmer is no longer limited by the small instruction set of low-level
languages (such as the assembler language).

A C compiler is used to translate the C source code into the appropriate DSP
assembler code.


The last part of programming involves checking your program for mistakes and
making changes until it correctly performs the desired function.

This final process is commonly known as debugging.

A program that aids software debugging is called a debugger.

A debugger gives the programmer the ability to diagnose problems in a DSP
program. This is done before committing the program to the DSP's memory.

The C5x Visual Development Environment, C5x VDE, is the debugger used with
the Digital Signal Processor.

DSP system developers rarely debug a DSP program without the aid of a debugger.
They also often use evaluation modules (EVMs), emulators, and simulators.


The DSP used with the circuit board is part of the TMS320C5x DSK (DSP Starter
Kit) evaluation module.

When using EVMs, emulators, and simulators, developers can change the model of
the DSP being tested during the development process.

Once the program is functional, the final tests are carried out on an actual
DSP system.

The programs included and used with the Digital Signal Processor are written in the
assembler language. This assembler language is specific to the TMS320C5x EVMs;
it includes additional instructions called DSK directives.

To run, or examine the function of, a Digital Signal Processor program, the
executable file (*.dsk) must be downloaded into the DSP through the C5x VDE, the
Trainer's debugger.

EQUIPMENT REQUIRED

In order to complete the following exercises, you will need:

– F.A.C.E.T. base unit
– DIGITAL SIGNAL PROCESSOR circuit board
– C5x VDE program
– ex1_1 and ex1_2 assembler (asm) and program (dsk) files
– Oscilloscope
– Multimeter

Exercise 1-1

Introduction to the DSP Circuit Board

EXERCISE OBJECTIVES

Upon completion of this exercise, you will be familiar with the location and the function
of each of the various components of the DIGITAL SIGNAL PROCESSOR training
system.

DISCUSSION

The circuit board has two functional sections: the section containing the
circuit board accessories, and the section containing the Digital Signal
Processor and its peripherals.


The circuit board accessories are the:

– POWER SUPPLY with AUXILIARY POWER INPUT
– DC SOURCE
– MICROPHONE PRE-AMPLIFIER
– AUDIO AMPLIFIER

The POWER SUPPLY circuit block delivers a filtered and regulated DC supply to
the entire circuit board.

The circuit board can be powered in two ways: the input voltage for the POWER
SUPPLY can come from a Lab-Volt FACET Base Unit, or from the external ±15 V
connections found on the AUXILIARY POWER INPUT block.


The DC SOURCE block delivers a DC voltage that varies between -3.5 Vdc and
+3.5 Vdc, depending on the position of the potentiometer.

The DC SOURCE can be used as the source of an input reference signal for
programs run on the DSP.

The MICROPHONE PRE-AMPLIFIER is used to adjust a microphone's signal to a
level suitable for input into the DSP.

The GAIN potentiometer varies the output level between a low and a high value.


The AUDIO AMPLIFIER is used to listen to the signal from the ANALOG OUTPUT,
located on the CODEC block. Either the speaker or the headphones can be used.

The second functional section of the circuit board, the DSP and its peripherals,
contains the:

– DSP
– CODEC
– I/O INTERFACE
– INTERRUPTS
– AUXILIARY I/O
– SERIAL PORT


The Digital Signal Processor is found at the heart of a digital signal processing
system.

The DSP block contains a TMS320C50 DSP integrated circuit (IC) in a 132-pin
surface mount package.

It may reach execution speeds of up to 50 MIPS.

There are many kinds of DSPs, and they vary in cycle speed.

The calculation speed is set by a DSP's clock.

However, the speed is limited by the IC's internal system design constraints.

Some DSPs use an internal oscillator to set the clock and others use an external
oscillator.


The DSP used on the circuit board is configured to use an external oscillator.

The oscillator located on the circuit board provides the DSP with a 40 MHz
reference signal.

The DSP divides this signal to make a 20 MHz internal one (the master clock
frequency) that it uses to time its instruction cycles.

What is the calculation speed of the DSP?

a. 20 MHz
b. 40 MHz
c. 20 MIPS
d. 40 MIPS


Some DSPs have their program written to internal ROM during the manufacturing
process; most, however, use external ROM to store their program.

Both types of DSPs access their ROM at boot-up and copy the program to RAM for
execution.

A DSP uses digital signals.

To interact with the outside world, it needs a translator that converts analog
signals to digital ones and back again.

A CODEC is the translator that is used for this purpose.


A CODEC is usually made up of the following components:

– a programmable input GAIN
– an ANTI-ALIASING FILTER
– an Analog-to-Digital converter
– a Digital-to-Analog converter
– a POST-FILTER
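
The two converter stages in the list above can be sketched as a pair of functions. This is an idealized model under stated assumptions: an 8-bit converter, a ±1 V input range, and no gain or filter stages. These values and the names adc and dac are hypothetical and do not describe the CODEC on the circuit board.

```python
def adc(voltage, v_ref=1.0, bits=8):
    """Analog-to-digital: map a voltage in [-v_ref, +v_ref] to an integer code."""
    levels = (1 << bits) - 1
    clipped = max(-v_ref, min(v_ref, voltage))   # out-of-range inputs saturate
    return round((clipped + v_ref) / (2 * v_ref) * levels)


def dac(code, v_ref=1.0, bits=8):
    """Digital-to-analog: map an integer code back to a voltage."""
    levels = (1 << bits) - 1
    return code / levels * 2 * v_ref - v_ref


code = adc(0.3)
print(code, dac(code))   # the round trip is close to, but not exactly, 0.3
```

The small difference between the input and the reconstructed output is the quantization error; it shrinks as the number of bits grows.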

The I/O INTERFACE is a means to display and to input program information.

The 8-position DIP switch enters an 8-bit number into the DSP.

Depending on the program being used the information will be processed in different
ways.

The 7-segment displays are used to show program information to the DSP user.
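
As an illustration of how eight switch positions form one number, consider the sketch below. It assumes that the first switch is the least significant bit and that the display shows the value as four decimal digits; both are assumptions for this example, since the actual behavior depends on the program loaded into the DSP.

```python
def dip_to_value(switches):
    """Combine 8 switch states (0 or 1) into one 8-bit number.

    switches[0] is assumed to be the least significant bit.
    """
    if len(switches) != 8:
        raise ValueError("expected exactly 8 switch positions")
    value = 0
    for position, state in enumerate(switches):
        if state:
            value |= 1 << position
    return value


def display_text(value):
    """Format a value as four decimal digits, as a 4-digit display might."""
    return f"{value:04d}"


print(display_text(dip_to_value([1, 1, 1, 1, 0, 0, 0, 0])))
```

With eight switches the largest value that can be entered is 255 (all switches in the 1 position).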


Like most microprocessors, DSPs have interrupt control capabilities.

Two push-buttons can be used as user input devices for a program.

When one of the push-buttons is pressed, an interrupt is signaled within the DSP
and the program code associated with that interrupt is executed.
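
The idea of "program code associated with an interrupt" can be modeled as a lookup table of handlers. The sketch below is a software analogy only: the class InterruptController, the handler name, and the state dictionary are hypothetical, and a real DSP dispatches interrupts in hardware.

```python
class InterruptController:
    """Toy model: each interrupt name maps to one handler routine."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        self._handlers[name] = handler

    def trigger(self, name):
        # An interrupt with no registered handler is simply ignored here.
        handler = self._handlers.get(name)
        return handler() if handler else None


# Hypothetical program state: the switch setting and the last value read.
state = {"dip_switch": 15, "latched": None}


def read_dip_switch():
    """Handler: copy the current switch setting into the program."""
    state["latched"] = state["dip_switch"]
    return state["latched"]


controller = InterruptController()
controller.register("INT1#", read_dip_switch)
controller.trigger("INT1#")      # models pressing the push-button
print(state["latched"])
```

Until the interrupt is triggered, changing the switch setting has no effect on the latched value, which mirrors pressing a push-button to pass a new value to a running program.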

The AUXILIARY I/O section was added for signal monitoring purposes and for
prototyping of additional DSP exercises done with the circuit board.


The headers of the AUXILIARY I/O block can be used to interface the DSP with an
external circuit.

The external circuit can be powered by the 10-pin header located in the AUXILIARY
I/O block.

The AUXILIARY I/O section has three headers.

±5 Vdc and ±15 Vdc connection points are available on the 10-pin right header;
these can be used to power an external circuit. The circuit board supplies have a
common ground.

The left header carries the 8 least significant bits (labeled D0 to D7) of the
external DSP data bus, along with 4 pre-decoded addresses (labeled PA0# to
PA3#) that can be used for prototype development.


The middle header has the following input/output (I/O) pins:

– Data, Program, and I/O space select (DS#, PS#, IS#)
– Timer output (TOUT)
– Read select and Write enable for external devices (RD#, WE#)
– Read/Write select for external accesses (R/W#)
– Interrupt acknowledge signal (IACK#)
– External interrupt input (INT4#)
– Directional and Chip select to control external data transfer (DIR, CS#)

The DSP on the circuit board is programmed to be the slave of a host computer.

For the DSP Trainer to be used, the circuit board SERIAL PORT must be connected
to one of your computer's serial ports.

Note: If the host computer does not have a second serial port connection
available, then at the appropriate times during the exercise procedures, you
can disconnect the Base Unit serial link and use it to connect the circuit board
SERIAL PORT to the computer.


The C5x Visual Development Environment (VDE) manages handshaking
between the circuit board and your computer. It controls all input and output from
the DSP's memory via the serial link.

Once the communication link between your computer and the DSP board is
established, the C5x VDE can be used to download a program into the DSP.

PROCEDURE

Circuit Board Introduction

In this procedure section, you will familiarize yourself with some of the components
and circuit blocks found on your DIGITAL SIGNAL PROCESSOR circuit board.


* 1. Locate, on the DIGITAL SIGNAL PROCESSOR circuit board, all of the
common terminals. Using an ohmmeter, verify whether the common terminals are
connected together.

See HELP Unit 01 shelp1

Are all of the common terminals connected together?

* Yes * No

* 2. Turn ON the power supply for the DIGITAL SIGNAL PROCESSOR circuit
board.


* 3. Using a DC voltmeter, and by varying the potentiometer of the DC
SOURCE from its smallest to its largest values, measure the DC voltage
range at the OUTPUT of the DC SOURCE.

See HELP Unit 01 shelp2

What are the minimum DC voltage (VDC min) and the maximum DC voltage
(VDC max) output from the DC SOURCE?

VDC min =__________ V

VDC max =__________ V

* 4. Make the connections to the DIGITAL SIGNAL PROCESSOR shown in the
figure.

Note: If you are in an area where an audio output from the
speaker is not desired then use the earphones provided with the
circuit board. These connect to the earphone jack located within
the AUDIO AMPLIFIER circuit block.

* 5. While talking into the microphone familiarize yourself with the use of the
potentiometers of the MICROPHONE PRE-AMPLIFIER and of the AUDIO
AMPLIFIER.

* 6. Remove all of the connections present on the circuit board.


Familiarization with the Circuit Board Using a DSP Program

In this procedure section, the C5x VDE will be used to load and run a program
inside of the DSP.

Note: Before using the C5x VDE please make certain the circuit
board power source is turned ON, and that the serial connection
is present between the host computer and the DIGITAL SIGNAL
PROCESSOR circuit block labeled SERIAL PORT.

* 7. Open the C5x Visual Development Environment (VDE) program.

* 8. Use the Load Program command in the File menu to load the ex1_1.dsk
program into the DSP.

What two display windows are now open inside of the C5x VDE?

a. The C5x Registers and the Peripheral Registers displays.
b. The Dis-Assembly Window and the Peripheral Registers displays.
c. The C5x Registers and the Dis-Assembly Window displays.
d. The Peripheral Registers display and the File Selection Window.


* 9. Make the circuit board connections shown in the figure. These will allow for
proper operation of the ex1_1.dsk program.

Note: Use the earphones if necessary.

* 10. Execute the RUN command found on the C5x VDE Toolbar.

* 11. Observe the read-out displayed within the I/O INTERFACE circuit block.
Adjust the DIP switch (every bit in the 0 position) so that the display reads
0000.

See HELP Unit 01 shelp3

* 12. Press the INT1# push-button, found in the INTERRUPTS circuit block, to
transmit to the DSP the value entered through the DIP switch.

* 13. Using the microphone, input a signal (your voice) into the DSP.

Note: Adjust the GAIN potentiometers of the MICROPHONE
PRE-AMPLIFIER and of the AUDIO AMPLIFIER to improve the
output sound.

* 14. Note that when you are talking into the microphone, dots on the display of
the I/O INTERFACE circuit block light up.


What do these dots indicate?

a. The DSP processing speed in MIPS.
b. The magnitude of the input signal received from the microphone.
c. Which I/O INTERFACE display, out of the four, is currently being
refreshed by the DSP.
d. None of the above choices describes what the dots are displaying.

* 15. Adjust the DIP switch such that the I/O INTERFACE display reads 0015.

* 16. Transmit the DIP switch value into the DSP by pressing the INT1# push-
button.

* 17. Observe the effect of the signal processing modification on the sound of
your voice.

* 18. Repeat steps 15 to 17 for each of the following I/O INTERFACE display
values:

0031, 0063, 0127, 0255

Remember to press the INT1# push-button after setting the DIP switch to
a new value.


Which of the following choices best describes the ex1_1.dsk program
loaded in the DSP?

a. It is a voice recorder
b. It is a Base Unit operating system
c. It is a function generator
d. It is an echo generator

To what is the number displayed on the I/O INTERFACE proportional?

a. The time delay (in milliseconds) between consecutive echoes.
b. The number of echoes created.
c. The time taken (in milliseconds) to generate the echoes for a sound.
d. The number of samples to be made on the input signal per second.

* 19. Execute the Halt command found on the C5x VDE Toolbar. Close the C5x
Visual Development Environment.

CONCLUSION

– The DIGITAL SIGNAL PROCESSOR has two sections: the circuit board
accessories section and the DSP with peripherals section.

– The circuit board is divided into individual circuit blocks.

– Before a DSP program can be loaded or used, the DIGITAL SIGNAL
PROCESSOR power supply must be turned ON and a serial connection
between the SERIAL PORT circuit block and the host computer must be made.

– The CODEC, I/O INTERFACE, INTERRUPT and AUXILIARY I/O circuit blocks
can only be used by the user if the program loaded into the DSP requires their
use.

– The user has access to the auxiliary circuit board headers for prototype
development.

REVIEW QUESTIONS

1. Before your DIGITAL SIGNAL PROCESSOR circuit board is ready to be used,
certain steps must be followed. Which of the following must be done before
using the circuit board?

a. Make certain that the I/O INTERFACE switches are all in the 0 position.
b. Make certain the serial connection is present between the host computer
and the DIGITAL SIGNAL PROCESSOR circuit block labeled SERIAL
PORT.
c. Make certain the circuit board power source is turned ON.
d. Statements b. and c.


2. Over what DC voltage range can the potentiometer of the DC SOURCE be
adjusted?

a. -3.3 V to +3.6 V
b. -3.0 V to +3.0 V
c. -3.5 V to +3.5 V
d. None of the above.

3. Which of the following pins are located on the middle header of the AUXILIARY
I/O circuit block?

a. The 4 pre-decoded addresses (labeled PA0# to PA3#)
b. TOUT, IACK#, INT4#, and RD#
c. DS#, D0, D1, and D2
d. CS#, INT4#, DS#, and PA1#

4. What does the TMS320C50 DSP found on the DIGITAL SIGNAL
PROCESSOR circuit board use to set the frequency of its master clock (recall
that it is the clock that sets the calculation speed of the DSP)?

a. The DSP uses its 20 MHz internal oscillator.
b. The DSP uses a 40 MHz external oscillator.
c. Through a serial connection, the DSP uses the 33.3 MHz internal oscillator
of the CODEC.
d. Through the SERIAL PORT circuit block connection, the DSP uses the
internal oscillator of the host computer.

5. Which of the following components is usually found in a CODEC?

a. An anti-aliasing filter
b. An analog-to-digital converter (ADC)
c. A digital-to-analog converter (DAC)
d. All of the above

Exercise 1-2

The Assembler and Debugger

EXERCISE OBJECTIVES

Upon completion of this exercise, you will understand basic DSP source file syntax.
You will be able to operate the debugger that accompanies the DIGITAL SIGNAL
PROCESSOR.

DISCUSSION

The source file for a DSP program can be written in a text editor; virtually any
ASCII editor can be used.


The instruction lines found in the source file and used in the assembler
programming language are called source statements.

A DSP program is a list of these assembled source statements.

The source statements used in the assembler language have a very precise syntax.
There are four fields that make up a statement:

– the label (optional)
– the instruction mnemonic
– the instruction mnemonic operands (the number of operands depends on the
instruction used)
– the comment (optional)


Each source statement field must be separated by one or more blanks.

The source statements themselves must either begin with a label or a blank.

The beginning of a comment line must be indicated by a semicolon or an asterisk.
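
The field rules above can be made concrete with a small parser. This is a simplified sketch, not the assembler's actual logic: it assumes that a label may end with a colon, that operands are separated by commas, that a semicolon starts a comment anywhere on a line, and that an asterisk starts a comment only in the first column. The function name parse_statement is hypothetical.

```python
def parse_statement(line):
    """Split one source statement into label, mnemonic, operands, comment."""
    # A line that starts with ';' or '*' is a comment line.
    if line[:1] in (";", "*"):
        return {"label": None, "mnemonic": None, "operands": [],
                "comment": line[1:].strip()}

    code, found, comment = line.partition(";")
    comment = comment.strip() if found else None

    # A statement begins with a label or a blank.
    has_label = bool(code) and not code[0].isspace()
    tokens = code.split(None, 2 if has_label else 1)

    label = tokens.pop(0).rstrip(":") if has_label and tokens else None
    mnemonic = tokens.pop(0) if tokens else None
    operands = [op.strip() for op in tokens[0].split(",")] if tokens else []
    return {"label": label, "mnemonic": mnemonic,
            "operands": operands, "comment": comment}


print(parse_statement("START:  ADD #214,4 ; add a constant"))
print(parse_statement("        DMOV *,AR1"))
```

Note that the second statement begins with blanks, so its first word is read as the mnemonic rather than as a label.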

A source file may also contain assembler directives.

Directives supply the program with data and control the assembly process.

Assembler directives permit the following to be done:

– initialize program instructions and data values into memory.
– define symbolic names for certain DSP registers (using the .mmregs directive).
– reserve space in memory for variables that have not been initialized.
– assemble conditional blocks.


The source files used with the DIGITAL SIGNAL PROCESSOR contain certain
assembler directives, some of the most common ones are:

What assembler directives declare the initial DSP memory addresses where
program instructions and data variables are stored?

a. The mnemonic and the operand directives respectively.
b. The .entry and the .end directives respectively.
c. The .entry directive.
d. The .ps and the .ds directives respectively.

The executable file dsk5a.exe is the assembler program used with the DIGITAL
SIGNAL PROCESSOR.

When a source file (*.asm) is assembled, a dsk file (*.dsk) and a listing file (*.lst)
are created.


The dsk file, also known as the program file, contains a list of machine code
corresponding to assembled source statements. To run a program, the program file
must be loaded into the DSP.

The DSP is loaded with the dsk file.

The listing file lists all source statements, line numbers and any errors that occurred
during assembly.

When the program is viewed inside of the debugger, the listing and the dsk files are
used to create a display of the source file statements.

If an MPY, multiply, instruction uses one operand, #031h, and is labeled OMEGA
then which of the following source statements has the correct syntax?

a. MPY #031h ;OMEGA
b. OMEGA: MPY #031h
c. #031h MPY ;OMEGA
d. None of the above

The C5x VDE is the debugger used with the Digital Signal Processor. It has the
following functions:

– Load dsk programs into memory and view the program code,
– run and halt the program and execute single step commands (execution of
single instructions),
– display in a viewing window the CPU registers and peripheral registers,
– display in a viewing window the DSP memory areas,
– graph DSP memory values while the DSP program is running,
– edit CPU registers, DSP program instructions and memory,
– place breakpoints at specific DSP source statements.


The C5x VDE uses the listing file to dis-assemble (the reverse of assemble) the
machine code contained within the dsk file. The dis-assembled code is then displayed.

When a dsk file is loaded into DSP memory the Dis-Assembly window automatically
opens.

The Dis-Assembly window displays four columns of information:

1. The address in memory where the instruction is found,
2. the instruction in machine code,
3. the instruction mnemonic,
4. the instruction operands.

The source statement highlighted with a yellow line represents the next instruction
that the DSP will execute.

A source statement highlighted with a purple line corresponds to an instruction
where a breakpoint has been set.


A toolbar located at the top of the debugger screen has commands that aid in the
control of program execution.

Run and Halt are used to begin and stop program execution.

StepInto: You can single step through the code by clicking on the StepInto button
on the Toolbar.

This will execute one program instruction for every click of the button.

StepOver: If you do not wish to single step through a subroutine, you can execute
the StepOver command once you reach a CALL instruction.

The entire subroutine will then be executed, after which single stepping can
resume.

StepOut: The StepOut command will execute all of the instructions remaining in
the current subroutine.

Execution will be halted once a RET (return from subroutine) assembler instruction
is encountered.
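
The difference between the three stepping commands can be sketched with a toy model. Everything here is hypothetical: the class ToyDebugger, the three-instruction set (NOP, CALL, RET), and the choice to execute the RET so that control returns to the caller. It illustrates the idea only and is not the C5x VDE's implementation.

```python
class ToyDebugger:
    """Toy stepping model for a program given as (mnemonic, operand) pairs."""

    def __init__(self, program):
        self.program = program
        self.pc = 0          # index of the next instruction to execute
        self.stack = []      # return addresses of active subroutine calls

    def step_into(self):
        """Execute exactly one instruction."""
        mnemonic, operand = self.program[self.pc]
        if mnemonic == "CALL":
            self.stack.append(self.pc + 1)
            self.pc = operand
        elif mnemonic == "RET":
            self.pc = self.stack.pop()
        else:
            self.pc += 1

    def step_over(self):
        """Execute one instruction; if it is a CALL, run the whole subroutine."""
        depth = len(self.stack)
        self.step_into()
        while len(self.stack) > depth:
            self.step_into()

    def step_out(self):
        """Run until the current subroutine has returned to its caller."""
        depth = len(self.stack)
        while depth > 0 and len(self.stack) >= depth:
            self.step_into()


program = [
    ("NOP", None),   # 0
    ("CALL", 4),     # 1: call the subroutine at index 4
    ("NOP", None),   # 2: execution resumes here after the call
    ("NOP", None),   # 3
    ("NOP", None),   # 4: subroutine body
    ("RET", None),   # 5: return from subroutine
]

debugger = ToyDebugger(program)
debugger.step_into()   # executes index 0
debugger.step_over()   # executes the CALL and the whole subroutine
print(debugger.pc)     # index of the instruction after the CALL
```

In this model, StepInto on the CALL would instead stop at the first instruction of the subroutine, and StepOut from there would return to the same place that StepOver reaches.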


The values of all CPU registers are shown in the C5x Registers window.

You will become familiar with many of the CPU registers as you advance through
the course.

For the moment, it is sufficient to know that these registers contain DSP system
information.

The registers displayed in the window contain values, DSP status and control bits
and instruction pointers.

Memory is viewed inside of the debugger by opening a Memory display window.

The memory addresses to be monitored are user selected.

As many memory windows as needed may be launched inside of the debugger.


When a dsk file is loaded inside of the C5x VDE, the following is true for the Dis-
Assembly and Memory display windows:

– All source statement labels, used to declare a variable within the source code,
appear in blue.
– All comments of labeled source statements appear in green.

The Memory display window can be used as a Watch Window. Variables stored in
memory may be watched and edited if necessary.

Within all viewing windows, the following is true:

– Memory addresses and registers appear in red when the values stored within
them are modified during the execution of the previous instruction.
– Memory addresses and registers (except the RAM, XF and INTM registers) can
be edited by simply double-clicking on the desired register or memory address.


The Graph command in the View menu can be used for graphical displays of data
values.

Signals can be viewed in either the time or frequency domain, at any point in your
program.

Breakpoints halt a program so that the debugger user can verify the status of
the loaded program after a certain instruction.

Double-clicking on an instruction in the Dis-Assembly window sets a
breakpoint on that instruction.


The Associate Breakpoints window can be launched by executing the Associate
Breakpoints command in the Options menu.

A window can be continuously refreshed by using the associate breakpoint feature.
A selected display window (Graph display, Memory display, CPU Register
display, ...) can be associated with any breakpoint.

When a breakpoint is executed any display windows that are associated with it are
updated. This effectively connects a probe to a specific point in the program.

PROCEDURE

The Assembler and Directives

In this procedure section, you will assemble a source file and familiarize yourself
with the assembler source code directives.


* 1. Open the ex1_2.asm file inside of a text editor.

* 2. Find, inside of the source code, the directives instructing the assembler
where to store program instructions and data variables.

See HELP Unit 01 shelp4

At what program memory address does the program code start?

a. 1280h
b. 080Ah
c. 0A80h
d. 0980h

Note: The source file contains entries that begin with .include.
The .include directive tells the assembler to read source
statements from a different file. The .include directive has been
used to eliminate complicated initialization subroutines from the
main source file.

* 3. Notice that the source code contains a wavetable within a .include
directive.

The .ds assembler directive used before the .include directive specifies
the data memory address (dma) at which the wavetable values begin to be
written.

* 4. Assemble the program by executing, within the c:\lv91027\exercise\ex1_2\
folder, the following command at a DOS prompt:

C:\lv91027\bin\dsk5a.exe ex1_2.asm -l


* 5. Confirm that a program file (ex1_2.dsk) and a listing file (ex1_2.lst) were
created when the source file was assembled.

* 6. Open the listing file inside of another text editor. Observe the contents of
the file.

What assembler directive is used to store the wavetable data variables to
DSP memory?

a. .word
b. SPLK
c. .entry
d. ADD

* 7. Close the source code and the listing file text editor windows.

Viewing Memory

In this procedure section, you will open a Memory display window inside of the C5x
VDE.

Note: Before using the C5x VDE please make certain the circuit
board power source is turned ON, and that the serial connection
is present between the host computer and the DIGITAL SIGNAL
PROCESSOR circuit block labeled SERIAL PORT.

* 8. Open the C5x VDE and using the Load Program command, found in the
C5x VDE File menu, load the ex1_2.dsk program into the DSP.


* 9. Open a Data Memory window to the first wavetable value held within the
dma labeled C0 (use a capital C). This window can be launched by
executing the Memory command in the View menu.

What do the blue symbols (C0, C9, C19, C29, ...) within the Data Memory
window represent?

a. Natural divisions of the C5x VDE Data Memory window.
b. Memory addresses.
c. Wavetable values.
d. Source statement labels contained in the source file.

A wavetable is used to generate a waveform. It is a list of sample points
representing one period of the waveform to be generated. The technique
makes a train of discrete samples become a seemingly continuous signal.
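
As a rough illustration of the idea, the following Python sketch builds a wavetable holding one period of a sine wave and then reads it repeatedly to produce a train of samples. The table length (64), the amplitude, and the names make_wavetable and play are assumptions chosen for this illustration; they are not taken from the ex1_2 program.

```python
import math

def make_wavetable(length=64, amplitude=32767):
    """Return one period of a sine wave as a list of integer sample points."""
    return [round(amplitude * math.sin(2 * math.pi * n / length))
            for n in range(length)]

def play(wavetable, step, count):
    """Read `count` samples, advancing `step` table entries per sample.

    A larger step walks through the stored period faster, which raises
    the frequency of the generated signal.
    """
    index = 0
    samples = []
    for _ in range(count):
        samples.append(wavetable[index])
        index = (index + step) % len(wavetable)  # wrap around: next period
    return samples

table = make_wavetable()
print(play(table, step=1, count=8))
```

Reading the same table with a different step is one simple way to change the output frequency without storing a second table.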

Graphing Memory

In this procedure section, you will view DSP memory graphically and use the
Graphical display to gather information about the wavetable of the current program.

* 10. Launch the Graphic Display window with the Graph command found in the
View menu. To visualize the wavetable data enter the setup information
that is found in the above figure (use a capital for C0).

* 11. Note that by clicking in the Graphic Display window a cursor line appears.
The coordinates of the point where the cursor line and the graphed curve
cross are displayed at the bottom of the window.

* 12. Locate the maximum value of the wave signal shown in the Graphic
Display window.

Input the x-coordinate (time) of the maximum value.

tMAX =__________ ms

* 13. Locate the minimum value of the wave signal shown in the Graphic Display
window.

Input the x-coordinate (time) of the minimum value.

tMIN =__________ ms

What is the frequency of the wavetable signal?

f =__________ Hz
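
One possible way to work out the frequency is sketched below in Python. It assumes that the maximum and minimum you located are adjacent extrema of a periodic wave, so that they lie half a period apart. The function name and the sample times are hypothetical; substitute the values you read from the Graphic Display window.

```python
def frequency_from_extrema(t_max_ms, t_min_ms):
    """Estimate the frequency (Hz) from the times (ms) of adjacent extrema."""
    half_period_s = abs(t_min_ms - t_max_ms) / 1000.0   # ms -> s
    return 1.0 / (2.0 * half_period_s)

# Hypothetical readings, for illustration only.
print(frequency_from_extrema(t_max_ms=2.5, t_min_ms=7.5))
```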

* 14. Inside the Data Memory window, edit the data at DSP memory
location C549 to 0 (by double-clicking) and refresh the wavetable Graphic
Display window (using the Windows toolbar menu). Observe the change
caused on the wave signal in the Graphic Display window.

* 15. Change the wavetable value back to the original value of 1F0E. Refresh
the Graphic Display window.

* 16. Connect the OUTPUT of the DC SOURCE to the ANALOG INPUT of the
CODEC.

* 17. Connect the ANALOG OUTPUT of the CODEC to the INPUT of the AUDIO
AMPLIFIER and to the input of an oscilloscope.

* 18. Run the DSP program (the RUN command can be found on the C5x VDE
toolbar). Use the GAIN of the AUDIO AMPLIFIER to adjust the volume
level of the generated signal.

* 19. Observe the generated wave signal on the oscilloscope. Observe that the
frequency of the generated signal is shown on the display of the I/O
INTERFACE circuit block.

* 20. Using the potentiometer of the DC SOURCE vary the frequency of the
generated signal.

* 21. Using the oscilloscope, compare the frequency displayed in the I/O
INTERFACE circuit block with the inferred frequency of the generated
wave signal.

What are the approximate frequency limits of the generated function?

fMIN =__________ Hz

fMAX =__________ Hz

Breakpoints and Associated Breakpoints

In this procedure section, you will create a breakpoint within the DSP program. With
the aid of an associated breakpoint, you will view the variation with time of certain
DSP register values.

* 22. Execute the HALT command found on the C5x VDE toolbar. Place a
breakpoint at the program memory address (pma) labeled MARKER1 by
double-clicking within the Dis-Assembly window on the label (you might
have to scroll down to find it). Run the DSP program.

Which of the following sentences is correct?

a. After a program is started the breakpoints disappear.
b. The program memory address (pma) labeled MARKER1 cannot
become a breakpoint.
c. The program is automatically halted when an execution line reaches
a breakpoint.
d. All of the above.

* 23. Open a Peripheral Registers window. The Peripheral Registers window
can be launched by executing the Peripheral Registers command inside
of the View menu.

* 24. Associate the MARKER1 breakpoint, set in step 22, with the Peripheral
Registers window. To do so, make certain that the Peripheral Registers
window is active (highlighted). Launch the Associate Breakpoint window by
executing the Associate Breakpoints command inside of the Options menu.
Fill in the menu as shown and press OK.

* 25. Execute the ANIMATE command found on the C5x VDE toolbar. Make a
note of the peripheral registers that are continuously updated.

The DXR register is the register where values are stored before
being sent through the CODEC to the ANALOG OUTPUT. It is the stream
of these values that creates the signal seen on the oscilloscope.

* 26. Halt the program. To generate a signal with a low frequency, adjust the
potentiometer of the DC SOURCE to the minimum position.

* 27. Make the Graphic Display the current window within the C5x VDE and
execute the Options command that is located on the Toolbar.

* 28. Change the settings of the Setup for Graphics window to the ones shown
in the figure above.

* 29. Associate the breakpoint, placed at MARKER1 in step 22, with the Graphic
Display window. Select to refresh the window only on the associated
breakpoint.

* 30. Animate the DSP program.

* 31. While the program is in Animate mode, execute the Graphic Display
Options command again. Change the graph from the Time Domain to the
Frequency Domain: FFT.

In the Frequency Domain: FFT Graphic Display mode, each spike
represents the component of an individual frequency within the signal
being observed. Because the program is currently generating a sine wave,
only one spike appears.
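
The single spike can be reproduced with a short Python sketch. It uses a direct discrete Fourier transform rather than a fast algorithm, and the window length (64 samples) and the number of sine periods in the window (5) are assumptions made for this illustration only. It does not model how the C5x VDE computes its display.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return the magnitude of each frequency bin (direct DFT, O(N^2))."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples)))
            for k in range(n)]

N = 64        # number of samples in the window (assumed)
CYCLES = 5    # whole sine periods inside the window (assumed)
sine = [math.sin(2 * math.pi * CYCLES * i / N) for i in range(N)]

mags = dft_magnitudes(sine)
half = mags[:N // 2]              # bins above N/2 mirror the lower ones
peak_bin = half.index(max(half))
print(peak_bin)                   # the only bin with significant energy
```

A signal made of several sine waves would show one spike per frequency component.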

* 32. To generate a signal with a high frequency, adjust the potentiometer of the
DC SOURCE to the maximum position. Observe the effect of the frequency
change inside of the Graphic Display window.

Editing Memory and Registers

In this procedure section, using the C5x VDE you will edit a memory location as
well as a CPU register.

* 33. Halt the animation. Observe that the content of the Program Counter (PC)
register is displayed within the C5x Registers window inside of the C5x
VDE. The PC register stores the address of the next source statement to
be executed.

* 34. Note that the PC value corresponds to the address highlighted in yellow
inside of the Dis-Assembly window.

* 35. Edit the PC register by double-clicking it within the C5x Registers window.
Edit the PC to the pma labeled MAIN.

* 36. Note that the source statement now highlighted in yellow corresponds to
the statement held within the pma labeled MAIN. This is a simple
technique used for moving from one part of code to another.

* 37. Close the C5x Visual Development Environment.

CONCLUSION

– A source statement has a very precise syntax. It contains a mnemonic, and the
mnemonic operands. It may also contain a label and a comment.

– Assembler directives supply the program with data and they control the
assembly process.

– When a source file (*.asm) is assembled, a program file (*.dsk) and a listing file
(*.lst) are created.

– The C5x VDE is the debugger used with the DIGITAL SIGNAL PROCESSOR
circuit board. It gives the programmer the ability to diagnose DSP program
problems and to control program execution.

REVIEW QUESTIONS

1. Out of the following possibilities which is the correct syntax for a source
statement?

a. MNEMONIC [OPERAND LIST] [LABEL] [; COMMENT]
b. [LABEL][:] [OPERAND LIST] MNEMONIC [; COMMENT]
c. [LABEL][:] MNEMONIC [; COMMENT] [OPERAND LIST]
d. [LABEL][:] MNEMONIC [OPERAND LIST] [; COMMENT]

2. Which of the following choices best describes the function of assembler
directives in the source code?

a. They initialize program instructions and data values into memory.
b. Assembler directives supply program data and control during the assembly
process.
c. They reserve space in memory for initialized variables.
d. All of the above.

3. What step(s) must you perform to execute (Run) a dsk program from within the
C5x VDE?

a. Turn power on to the DIGITAL SIGNAL PROCESSOR circuit board and
make the serial connection to the host computer.
b. Open the C5x VDE. Using the Load Program command found in the File
menu, load the dsk program into the DSP.
c. Execute the Run command from the C5x VDE toolbar.
d. All of the above.

4. Which among the following list of features of the C5x VDE is false?

The C5x VDE lets you:

a. Run and halt the program and execute single step commands (execution
of single instructions).
b. Edit, build, debug, profile and manage DSP projects (programs).
c. Load dsk programs into memory and view the program code.
d. Place breakpoints at DSP source statements.

5. Which of the following choices is the reason why Animate mode and Run mode
within the C5x VDE are not the same?

a. In Animate mode, the DSP is not used at all. The program is executed by
the C5x VDE.
b. In Run mode, the DSP is not used at all. The program is executed by the
C5x VDE.
c. In Animate mode, the DSP stops communication with the C5x VDE and the
DSP begins independent execution of the program.
d. In Run mode, the DSP stops communication with the C5x VDE and the
DSP begins independent execution of the program.

Exercise 1-3

Processor Arithmetic

EXERCISE OBJECTIVES

Upon completion of this exercise, you will be familiar with the numerical formats
and representations used within DSPs.

DISCUSSION

Digital Signal Processors are categorized by the way that their arithmetic is
performed. A DSP can be either:

– a fixed-point DSP, or
– a floating-point DSP

The type of DSP chosen for a specific application depends on the suitability of its
arithmetic for the task. The TMS320C50 is a fixed-point DSP.

Fixed-point DSPs are usually cheaper than their floating-point counterparts
because they contain less silicon and have fewer external pins.

Fixed-point devices generally have faster clock cycle rates.

In 1998, these clock cycles were as short as 10 ns, corresponding to a processor
cycle rate of 100 MHz.

Floating-point devices are usually more flexible because their arithmetic system has
access to a wider dynamic range and in many cases these systems are more
precise.

A typical 16-bit fixed-point processor stores coefficients and data values with 16-bit
precision.

However, within the internal arithmetic unit of the DSP, intermediate values are
kept at 32 bits of precision.

By so doing, the cumulative rounding error made during calculations is minimized.
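
The effect of the wider intermediate precision can be sketched in Python. The example below accumulates ten identical products in two ways: keeping every full product and rounding once at the end, or rounding each product to 16 bits before adding it. The function names and the operand values are assumptions chosen to make the difference visible; Python integers are not limited to 32 bits, so the widths are only modeled by the rounding steps.

```python
def mac_wide(coeffs, data):
    """Accumulate full-precision products, then round once to Q15."""
    acc = 0
    for c, d in zip(coeffs, data):
        acc += c * d                        # Q15 x Q15 product kept in full (Q30)
    return (acc + (1 << 14)) >> 15          # single rounding back to Q15

def mac_narrow(coeffs, data):
    """Round every product to Q15 before accumulating it."""
    acc = 0
    for c, d in zip(coeffs, data):
        acc += (c * d + (1 << 14)) >> 15    # rounding error added at each step
    return acc

coeffs = [100] * 10     # small Q15 values (raw integers), assumed for the demo
data = [100] * 10
print(mac_wide(coeffs, data))     # 3
print(mac_narrow(coeffs, data))   # 0
```

Each individual product is smaller than half of one 16-bit step, so rounding it immediately discards it. Only the wide accumulator lets the ten small contributions add up to a visible result.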

When you use your computer or your calculator you can calculate such values as:

(-1*23) or (3.453)

A DSP can also provide answers to the same types of questions.

A programmer must use certain numerical formats so that every value desired to
be used in the DSP has a binary representation associated with it.

This binary value will need at times to represent either a positive or negative,
fractional or integer number.

Since a DSP is a processor that specializes in doing rapid calculations, it is
essential to understand how the diverse range of numeric values can be expressed.

Integers, both negative and positive, are represented by the two's complement
integer format (2s-format).

Fractional numbers, both negative and positive, are represented by the two's
complement fractional format (Q-format).

These formats differ only by the associated weights that are given to each bit of
information.

In two's complement integer notation (2s-format) a negative sign is associated with
the most significant bit.

The 2s-format provides a numeric range covering:

$-2^{N-1}$ to $+(2^{N-1} - 1)$

where N represents the number of bits in the binary number.
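
The range formula and the encoding can be checked with a minimal Python sketch. The function names twos_range and to_twos are assumptions made for this illustration.

```python
def twos_range(n_bits):
    """Return (minimum, maximum) of an N-bit two's complement integer."""
    return -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1

def to_twos(value, n_bits):
    """Return the N-bit two's complement bit string of `value`."""
    lo, hi = twos_range(n_bits)
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {n_bits} bits")
    # Masking keeps the N low-order bits; for a negative value this yields
    # the pattern whose most significant bit carries the negative weight.
    return format(value & ((1 << n_bits) - 1), f"0{n_bits}b")

print(twos_range(8))     # (-128, 127)
print(to_twos(-3, 4))    # 1101
```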

Represent (-6) in a 4-bit 2s-format binary number.

-6 =__________

The two's complement fractional format (or Q-format) associates different weights
with each bit as well.

The existence of the binary point separating the fractional weighted values from
the integral weighted values is implied.

In Q15-format the most significant bit is the sign bit and it is given a weight of $-2^{0}$.

This implies that the binary point is located between the MSB and the 14th bit.

By changing the position of the binary point the weight given to each bit is also
changed. Consequently, the dynamic range and the precision of the two's
complement fractional format may vary with the type of format being used.

Note that by continuing to move the binary point further and further to the right a
handy relationship is uncovered. The 2s-format and the Q15-format decimal
representations are proportional by a scaling factor of $2^{15}$.
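
The scaling relationship can be sketched in Python as follows. The function names from_twos and from_q are assumptions made for this illustration: the first reads a 16-bit word as a 2s-format integer, and the second divides that integer by 2 raised to the number of fractional bits.

```python
def from_twos(word, n_bits=16):
    """Read an N-bit word as a two's complement integer."""
    sign_bit = 1 << (n_bits - 1)
    return (word & (sign_bit - 1)) - (word & sign_bit)

def from_q(word, frac_bits, n_bits=16):
    """Read an N-bit word as a Q-format number with `frac_bits` fractional bits."""
    return from_twos(word, n_bits) / (1 << frac_bits)

word = 0x4000
print(from_twos(word))     # 16384
print(from_q(word, 15))    # 0.5
```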

Which of the following choices represents the proportionality constant between the
2s-format and Q13-format, for the 16-bit binary number?

a. 13
b. $2^{15}$
c. $-3.0518 \times 10^{-5}$
d. 8192

The 2s- and Q-formats can be used by the fixed-point internal arithmetic units of
any DSP. These formats are numerical conventions used by programmers. The
binary arithmetic done inside of a fixed-point DSP is not affected by the format of
the binary number used.
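
For addition, this can be illustrated with a short Python sketch. The same 16-bit sum is produced whichever format the programmer has in mind; only the interpretation of the bits changes. The names add16 and as_signed and the operand values are assumptions made for this illustration.

```python
MASK16 = 0xFFFF

def add16(a, b):
    """Add two 16-bit words the way a 16-bit adder would, ignoring formats."""
    return (a + b) & MASK16

def as_signed(word):
    """Read a 16-bit word as a 2s-format integer."""
    return word - 0x10000 if word & 0x8000 else word

a, b = 0x2000, 0xF000
total = add16(a, b)
print(hex(total))                                      # 0x1000
# 2s-format reading:  8192 + (-4096) = 4096
print(as_signed(a), as_signed(b), as_signed(total))
# Q15-format reading: 0.25 + (-0.125) = 0.125
print(as_signed(a) / 2**15, as_signed(b) / 2**15, as_signed(total) / 2**15)
```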

Floating-point DSPs generally use a 32-bit format where the 24 left-most bits
represent the mantissa and the 8 remaining bits represent the exponent.

So that a continuous range of values is covered by a 32-bit floating-point number,
the mantissa must vary over -1 to 0 and +1 to +2.

This means that the bit weighted by $2^{0}$ will always be equal to 1. Therefore, it
becomes unnecessary to store it in memory and during calculations it becomes an
implied bit.
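
The idea of a normalized mantissa can be sketched in Python for positive numbers. The function name normalize is an assumption, and the sketch uses the host computer's floating-point support rather than the exact bit layout of any particular DSP. Every positive value is rewritten with a mantissa between 1 and 2, so its leading bit is always 1 and need not be stored.

```python
import math

def normalize(x):
    """Split a positive number into (mantissa, exponent) with 1 <= mantissa < 2."""
    if x <= 0:
        raise ValueError("this sketch handles positive numbers only")
    m, e = math.frexp(x)      # frexp gives 0.5 <= m < 1 with x == m * 2**e
    return m * 2, e - 1

for x in (0.15625, 1.0, 6.5, 1000.0):
    m, e = normalize(x)
    print(x, "=", m, "* 2 **", e)
```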

Floating-point processors are usually more precise and have a larger dynamic
range.

While in theory the choice between fixed- and floating-point arithmetic is
independent of the choice of precision, in practice floating-point processors usually
provide higher precision.

This arises because more bits are provided to define the mantissa (24 bits + 1
implied bit) compared to fixed-point DSPs that usually have 16 bits, although 20-
and 24-bit fixed-point DSPs exist.

PROCEDURE

Converting a Signed Fractional Number to Q14-Format

In this procedure section, you will learn how to convert a signed fractional number
to a binary number written in Q-format.

* 1. Follow steps 2 to 5 to convert the following decimal value:

-0.984375 to binary Q14-format.

* 2. Open the Microsoft® Calculator present in your version of Windows.

* 3. Make certain that the Scientific option under the View menu is checked.
Checking this option makes the Standard calculator become a Scientific
calculator.

* 4. Multiply (-0.984375) by $2^{14}$.

This scales the fractional decimal value to an integral decimal value,
allowing easy conversion to a binary number.

* 5. Use the Calculator conversion functions (Dec to Bin) to change the value
obtained in step 4 to a one Word (16 bits) binary value (make certain the
Word check box has been clicked).

What Q14-format binary value representing -0.984375d did you obtain?

a. 1100 0001 0000 0000
b. 1001 0001 0010 0001
c. 0111 0001 0000 0000

* 6. Verify that the binary number that you calculated (1100 0001 0000 0000)
is equal to the decimal value -0.984375 it was converted from. Use the
Q14-format bit-weights to calculate the decimal value.
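
The conversion and its verification can also be sketched in Python. The function names to_q and from_q are assumptions made for this illustration: to_q scales and masks the value as in steps 4 and 5, and from_q sums the bit-weights as asked in step 6.

```python
def to_q(value, frac_bits, n_bits=16):
    """Encode a real number as an N-bit Q-format word (two's complement)."""
    scaled = round(value * (1 << frac_bits))           # scale to an integer
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if not lo <= scaled <= hi:
        raise OverflowError("value out of range for this format")
    return scaled & ((1 << n_bits) - 1)                # keep N bits

def from_q(word, frac_bits, n_bits=16):
    """Decode an N-bit Q-format word by summing its bit-weights."""
    total = 0.0
    for bit in range(n_bits):
        if word & (1 << bit):
            weight = 2.0 ** (bit - frac_bits)
            # The most significant bit carries a negative weight.
            total += -weight if bit == n_bits - 1 else weight
    return total

word = to_q(-0.984375, 14)
print(format(word, "016b"))    # 1100000100000000
print(from_q(word, 14))        # -0.984375
```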

Converting a Binary Number to a Decimal Value

In this procedure section, you will learn how to convert a binary number to a
decimal value.

* 7. Follow steps 8 through 13 to convert the following number:

B093h = 1011 0000 1001 0011b to a decimal value when:

1. The hexadecimal number represents an unsigned integer.
2. The hexadecimal number is written in 2s-numerical format.
3. The hexadecimal number is written in Q15-numerical format.

* 8. What is the weight given to each bit of a binary word representing
unsigned integers?

a. $2^{0}, 2^{-1}, 2^{-2}, 2^{-3}, 2^{-4}, 2^{-5}, 2^{-6}, 2^{-7}, 2^{-8}, 2^{-9}, 2^{-10}, 2^{-11}, 2^{-12}, 2^{-13}, 2^{-14}, 2^{-15}$
b. $2^{15}, 2^{14}, 2^{13}, 2^{12}, 2^{11}, 2^{10}, 2^{9}, 2^{8}, 2^{7}, 2^{6}, 2^{5}, 2^{4}, 2^{3}, 2^{2}, 2^{1}, 2^{0}$
c. $-2^{15}, 2^{14}, 2^{13}, 2^{12}, 2^{11}, 2^{10}, 2^{9}, 2^{8}, 2^{7}, 2^{6}, 2^{5}, 2^{4}, 2^{3}, 2^{2}, 2^{1}, 2^{0}$

* 9. Consider that B093h was written with the above numerical format
(unsigned integer). Use the unsigned integer format bit-weights to calculate
the corresponding decimal value. What is the value of the calculated
decimal number?

=__________

* 10. What is the weight given to each bit of a binary word in 2s-format?

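a. $-2^{15}, 2^{14}, 2^{13}, 2^{12}, 2^{11}, 2^{10}, 2^{9}, 2^{8}, 2^{7}, 2^{6}, 2^{5}, 2^{4}, 2^{3}, 2^{2}, 2^{1}, 2^{0}$
b. $-2^{0}, 2^{1}, 2^{2}, 2^{3}, 2^{4}, 2^{5}, 2^{6}, 2^{7}, 2^{8}, 2^{9}, 2^{10}, 2^{11}, 2^{12}, 2^{13}, 2^{14}, 2^{15}$
c. $-2^{2}, 2^{1}, 2^{0}, 2^{-1}, 2^{-2}, 2^{-3}, 2^{-4}, 2^{-5}, 2^{-6}, 2^{-7}, 2^{-8}, 2^{-9}, 2^{-10}, 2^{-11}, 2^{-12}, 2^{-13}$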

* 11. Consider that B093h was written with the above numerical format (2s-
format). Use the 2s-format bit-weights to calculate the corresponding
decimal value. What is the value of the calculated decimal number?

=__________

* 12. What is the weight given to each bit of a binary word in Q15-format?

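a. $-2^{0}, 2^{-1}, 2^{-2}, 2^{-3}, 2^{-4}, 2^{-5}, 2^{-6}, 2^{-7}, 2^{-8}, 2^{-9}, 2^{-10}, 2^{-11}, 2^{-12}, 2^{-13}, 2^{-14}, 2^{-15}$
b. $-2^{0}, 2^{1}, 2^{2}, 2^{3}, 2^{4}, 2^{5}, 2^{6}, 2^{7}, 2^{8}, 2^{9}, 2^{10}, 2^{11}, 2^{12}, 2^{13}, 2^{14}, 2^{15}$
c. $2^{0}, 2^{-1}, 2^{-2}, 2^{-3}, 2^{-4}, 2^{-5}, 2^{-6}, 2^{-7}, 2^{-8}, 2^{-9}, 2^{-10}, 2^{-11}, 2^{-12}, 2^{-13}, 2^{-14}, 2^{-15}$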

* 13. Consider that B093h was written with the above numerical format (Q15-
format). Use the Q15-format bit-weights to calculate the corresponding
decimal value. What is the value of the calculated decimal number?

=__________

Making Numerical Conversions with the C5x VDE

In this procedure section, you will make the same numerical conversions done in
the previous section but this time using the C5x VDE.

Note: Before using the C5x VDE please make certain the circuit
board power source is turned ON, and that the serial connection
is present between the host computer and the DIGITAL SIGNAL
PROCESSOR circuit block labeled SERIAL PORT.

* 14. Open the C5x Visual Development Environment (VDE). Close the C5x
Registers window.

* 15. Open a Memory display window to dma 0x300 by executing the Memory
command found in the View menu. Use the Signed Integer display format.
This window will be used to make numerical conversions.

* 16. Edit data memory address 0x300 to the following value:

address    value
0x300      -16128 (-0.984375 x $2^{14}$)

The value entered into data memory corresponds to the decimal value that
you scaled by $2^{14}$ in step 4, and then converted to binary.

* 17. Open the Options window of the Data Memory display, and change the
display format to binary.

What is the binary value contained in data memory address 0x300?

a. 1100 0001 0000 0000
b. 1011 1001 0000 0001
c. 0000 0011 0000 0001

In step 5, you had converted -0.984375 to the same binary Q14-format
value: 1100 0001 0000 0000

* 18. Open the Options window of the Data Memory display and change the
display format to Hex (hexadecimal).

* 19. Edit data memory address 0x301 to the following value:

address    value
0x301      0xB093

The hexadecimal value entered into the data memory address
corresponds to the initial value in step 7 before you converted it into three
different decimal values.

* 20. Open the Options window of the Data Memory display, and change its
display format to Unsigned Integer.

What is the decimal value contained in data memory address 0x301?

=__________

In step 9, you had converted B093h to the same unsigned value: 45203

* 21. Open the Options window of the Data Memory display, and change its
display format to Signed Integer.

What is the decimal value contained in data memory address 0x301?

=__________

In step 11, you had converted B093h, written in 2s-format, to the same
decimal value: -20333

* 22. Open the Options window of the Data Memory display, and change its
display format to Fixed-Point Q15.

What is the decimal value contained in data memory address 0x301?

=__________

In step 13, you had converted B093h, written in Q15-format, to
approximately: -0.620513916 d

* 23. End the C5x VDE session.

CONCLUSION

– DSPs are categorized by the way that their arithmetic is performed. A DSP can
be either a fixed-point DSP or a floating-point DSP.

– Integers, both negative and positive, are represented by the two's complement
integer format (the 2s-format).

– Fractional numbers, both negative and positive, are represented by the two's
complement fractional format (the Q-format).

– Numerical formats are a mathematical convention, and they differ only by the
weights that are associated with each bit in a binary word.

REVIEW QUESTIONS

1. What is the range of decimal values that a 2s-format 16-bit binary number can
represent?

a. -65536 to 65535
b. 0 to 65535
c. -32768 to 32767
d. None of the above.

2. The following sentences make certain statements about floating-point Digital
Signal Processors. Which one of the following statements is true?

a. Floating-point processors are usually cheaper than their fixed-point
counterparts because they contain less silicon and have fewer external pins.
b. Floating-point devices generally have faster clock cycle rates.
c. Floating-point processors are usually more precise and have a larger
dynamic range compared with fixed-point processors.
d. A typical 24-bit floating-point processor stores coefficients and data values
with 24-bit precision.

3. Which one of the following choices lists the correct 2s-format bit-weights?

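a. $2^{15}, 2^{14}, 2^{13}, 2^{12}, 2^{11}, 2^{10}, 2^{9}, 2^{8}, 2^{7}, 2^{6}, 2^{5}, 2^{4}, 2^{3}, 2^{2}, 2^{1}, 2^{0}$
b. $-2^{15}, 2^{14}, 2^{13}, 2^{12}, 2^{11}, 2^{10}, 2^{9}, 2^{8}, 2^{7}, 2^{6}, 2^{5}, 2^{4}, 2^{3}, 2^{2}, 2^{1}, 2^{0}$
c. $-2^{0}, 2^{-1}, 2^{-2}, 2^{-3}, 2^{-4}, 2^{-5}, 2^{-6}, 2^{-7}, 2^{-8}, 2^{-9}, 2^{-10}, 2^{-11}, 2^{-12}, 2^{-13}, 2^{-14}, 2^{-15}$
d. $-2^{0}, 2^{1}, 2^{2}, 2^{3}, 2^{4}, 2^{5}, 2^{6}, 2^{7}, 2^{8}, 2^{9}, 2^{10}, 2^{11}, 2^{12}, 2^{13}, 2^{14}, 2^{15}$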

4. What is the decimal value of the above binary number in Q15-, Q14- and Q13-
format?

a. -19138, 46398, 13630, respectively
b. approximately 1.4160, 2.8319, 5.6638, respectively
c. approximately -0.58405, -1.1681, -2.3362, respectively
d. None of the above.

5. Which of the following statements describes correctly how the decimal
representation, $D_{Q14}$, of a Q14-format binary number is related to the decimal
representation, $D_{2S}$, of the same binary number read in 2s-format?

a. $D_{Q14} = 2^{-15} \times D_{2S}$
b. $D_{Q14} = 2^{-14} \times D_{2S}$
c. $D_{Q14} = 2^{15} \times D_{2S}$
d. $D_{Q14} = 2^{14} \times D_{2S}$

Unit Test

1. What characterizes a Digital Signal Processor?

a. Specialized internal structures that make them execute commands rapidly
and efficiently.
b. Fast multiply instructions.
c. Reduced numbers of commands making the DSP programming process
simpler.
d. All of the above.

2. Which among the following choices is not one of the four columns of information
displayed within the Dis-Assembly window of the C5x VDE?

a. The CPU register.
b. The machine code instruction.
c. The instruction mnemonic.
d. The instruction operand(s).

3. Represent (+25) in a 6-bit 2s-format binary number.

a. 010001
b. 011001
c. 011000
d. 011010

4. Represent -0.608978271484375 by a 16-bit binary number using Q15-format.

a. 1011 0010 0000 1101
b. 1100 1101 1111 0010
c. 0011 0010 0000 1101
d. 0100 1101 1111 0010

5. Which of the following choices about fixed-point and floating-point DSPs is true?

a. Fixed-point DSPs have a low dynamic range compared with floating-point
DSPs that have a very large dynamic range.
b. In practice, fixed-point DSPs usually have a higher precision than floating-
point DSPs.
c. Binary fixed-point representation is comprised of a mantissa and of an
exponent.
d. The number of external I/O interface pins belonging to a fixed-point DSP is
large and is small on a floating-point DSP.

6. Which of the following choices is a result of the difference in the individual
methods that fixed-point and floating-point devices have for storing data and
executing arithmetic operations?

a. Fixed-point processors are more precise and have a larger dynamic range
as compared to floating-point processors.
b. Floating-point DSPs have faster clock cycle rates than fixed-point DSPs.
c. Floating-point DSPs are cheaper than fixed-point DSPs.
d. A floating-point DSP has more external I/O interface pins than a fixed-point
DSP.

7. Which of the following is written in machine code?

a. IF (i.NE.27) THEN (omega = 2*sin(x))
b. 982Eh
c. 1011 1110 0001 0011
d. OMEGA: MPY #031h

8. Which of the following statements about Digital Signal Processors is not true?

a. They allow for more complex processing than is possible with analog
circuitry.
b. Their performance over time is affected by temperature changes and
component aging.
c. They provide repeatable performance and produce a higher signal quality.
d. Digital processing code can be easily modified, and with it design updates
or changes are more flexible.

9. What files are created by the dsk5a.exe assembler used with the Digital Signal
Processor?

a. The source code file (*.asm) and the listing file (*.lst).
b. The program file (*.dsk) and the source code file (*.asm).
c. The program file (*.dsk) and the listing file (*.lst).
d. None of the above.

10. Which of the following circuit blocks can be used to control a DSP program?

a. INTERRUPT
b. CODEC
c. AUXILIARY POWER INPUT
d. AUDIO AMPLIFIER

Unit 2

CPU Architecture

UNIT OBJECTIVES

Upon completion of this unit, you will understand the basic difference between the
architecture of a digital signal processor and that of a general-purpose processor.
You will be familiar with the layout of the internal elements of a DSP CPU.

UNIT FUNDAMENTALS

In the 1950s, analog signal-processing circuit designers began to look to computers
to simulate their designs. They were able to simulate the circuits, but not in real
time.

It was not until the mid-1970s that computers became powerful enough to do the
real-time signal processing of the analog circuits that they had been simulating.

DSPs today are in fact the result of years of research that even now is still a very
active field. Their specialized architecture allows them to implement signal
processing algorithms more effectively than general-purpose processors.

The basic processor architecture that is most often implemented in general-purpose
processors is known as the Von Neumann architecture.

The Von Neumann architecture has a single memory space that is used for both
data and instructions (instructions belonging to the program).

Digital Signal Processors have historically used a slightly different internal structure
known as the Harvard architecture.

The Harvard architecture, as opposed to the Von Neumann, has separate memory
spaces for data and program instructions.

The Harvard architecture differentiates between the types of information it stores
in memory.

The information is either a data word (an operand for an instruction) or a program
word (the instruction).

Data words are kept in data memory space and are read from and written to
different locations within the processor via the data bus (the DB).

Program words are kept in program memory space and are read from and
written to different locations within the processor via the program bus (the PB).

The Von Neumann architecture only uses one bus.

This bus accesses both data and program instructions.
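
The practical consequence of the two bus arrangements can be sketched with a deliberately simplified Python model. It counts bus cycles under the assumption that an instruction fetch and an operand read each take one cycle, and that they can overlap only when separate program and data buses exist. The function name and the sample program are hypothetical, and real processors add pipelining and other refinements that this sketch ignores.

```python
def bus_cycles(instructions, separate_buses):
    """Count bus cycles needed to fetch each instruction and its operand.

    With a single shared bus the two accesses happen one after the other;
    with separate program and data buses they can overlap.
    """
    cycles = 0
    for needs_operand in instructions:
        if separate_buses:
            cycles += 1                          # fetch and operand read overlap
        else:
            cycles += 2 if needs_operand else 1  # fetch, then operand read
    return cycles

program = [True, True, False, True]   # True: instruction reads a data operand
print(bus_cycles(program, separate_buses=False))   # 7 (single shared bus)
print(bus_cycles(program, separate_buses=True))    # 4 (separate buses)
```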

A typical DSP contains:

– Memory
– a Central Processing Unit (CPU)
– Peripherals
– a Bus structure

Memory consists of all of the addressable storage space inside of a processing
unit:

– Program Read-Only Memory (ROM)
– Data/program Single-Access RAM (SARAM)
– Data/program Dual-Access RAM (DARAM)

The Central Processing Unit (CPU) is the part of a processor that contains the
circuits controlling the interpretation and execution of instructions.

The peripherals are elements such as the timer, which is used by the CPU to
time the execution of instructions, and the serial ports, which are used to
communicate with devices external to the processor.

The bus structures of processors are differentiated by the way that the individual
processor buses are interconnected with the other elements of the processor (CPU,
memory and peripherals).

It is essentially the bus structure that differentiates a Harvard architecture from a
Von Neumann architecture.

The CPU of the TMS320C50 (C50) contains:

– Program control elements
– Memory-mapped registers
– an Auxiliary Register Arithmetic Unit (ARAU)
– a Central Arithmetic Logic Unit (CALU)
– a Parallel Logic Unit (PLU)

The CPU elements are found in practically all DSP models, but they might go under
different names. For example, the CALU of the DSP32xx family, designed by Lucent
Technologies, is named a Data Arithmetic Unit (DAU).

The Program Controller is the unit that controls processor instruction execution.

The PC (Program Counter register) and status and control registers are at the
heart of Program Controller unit operation.

Memory-mapped registers are on-chip registers mapped to (associated with) a data
memory address.

There are 28 core CPU registers, 17 peripheral registers, 16 I/O port registers, and
35 reserved registers in the C50.

In total, 96 registers are mapped into data memory.

Since memory-mapped registers are addressed in data memory space, they can
be written to, and read from, in the same way as any other data memory location.
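
A toy Python model can make the idea concrete. In the sketch below, a register is simply a name attached to a data memory address, so an ordinary memory write changes the register. The class, the register names (REG_A, REG_B) and their addresses are hypothetical and do not correspond to the actual C50 memory map.

```python
class DataMemory:
    """Toy data memory in which some addresses are also known by register names."""

    def __init__(self, size, register_map):
        self.cells = [0] * size
        self.register_map = register_map       # register name -> address

    def write(self, address, value):
        self.cells[address] = value & 0xFFFF   # 16-bit words

    def read(self, address):
        return self.cells[address]

    def register(self, name):
        """Read a register through its data memory address."""
        return self.read(self.register_map[name])

# Hypothetical register names and addresses, chosen for the illustration only.
memory = DataMemory(size=256, register_map={"REG_A": 0x10, "REG_B": 0x11})
memory.write(0x10, 0x1234)              # an ordinary data memory write...
print(hex(memory.register("REG_A")))    # ...changes the register: 0x1234
```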

The Auxiliary Register Arithmetic Unit (ARAU) is used to deduce (calculate and
compare) and keep track of the position of information held within DSP memory.

The C50 has eight Auxiliary Registers (ARs) which are used by the ARAU to store
important memory addresses.

The Central Arithmetic Logic Unit (CALU) is responsible for executing logic and all
arithmetic operations within a DSP.

For example on the TMS320C50 DSP, the CALU executes these operations with
a 16-bit x 16-bit multiplier, an accumulator, operand registers, binary shifters, and
a 32-bit 2s-complement Arithmetic Logic Unit (ALU).

The Parallel Logic Unit (PLU) is a 16-bit logic unit that executes logic operations
without interrupting the CALU (the main CPU arithmetic and logic unit).

EQUIPMENT REQUIRED

In order to complete the following exercises, you will need:

– F.A.C.E.T. base unit
– DIGITAL SIGNAL PROCESSOR circuit board
– C5x VDE program
– Ex2_1, ex2_2, ex2_2b and ex2_3 assembler and program files
– Oscilloscope
– Function generator

Exercise 2-1

The Central Arithmetic Logic Unit

EXERCISE OBJECTIVES

Upon completion of this exercise, you will be familiar with the role that the CALU
plays within a DSP.

DISCUSSION

Note: Some 'C50 assembler CALU instructions are briefly covered in this
exercise. It will be left up to you, the student, to cover the rest of the related
material. The material can be found in the following file:
C:\LV91027\DOC\TMS320C5x_UsersGuide.pdf.

The Central Arithmetic Logic Unit (CALU) is where the most important signal
processing manipulations take place.

The CALU, also known as the data path, is the principal arithmetic and logic
processing path for a DSP.

It lies along the data (operand) bus and is an integral part of the execution of nearly
every instruction.

A fixed-point CALU contains:

– Multiplier(s)
– Accumulator(s)
– Operand registers
– Shifters
– At least one Arithmetic Logic Unit (ALU)

Signal processing algorithms are almost entirely devoted to arithmetic and logic
operations. The CALU is designed to execute these types of operations extremely
rapidly.

A DSP is differentiated from a general-purpose processor by:

1. Its memory architecture (a DSP usually has a Harvard architecture).

2. The rapid execution time of the CALU (or data path).

Both the Multiplier and the ALU are simultaneously used during a MAC instruction.
The CALU is said to be using its entire computational bandwidth.

For most DSPs, when the entire computational bandwidth of the CALU is
repetitively used, a result is produced every clock cycle.

The operand registers play an important role within the CALU.

The registers are used to temporarily store operands, before they are supplied for
arithmetic operations to the ALU or Multiplier.

The CALU of the TMS320C50 ('C50) has 3 operand registers.

Memory-mapped Temporary REGister 0 (TREG0) is an operand register used by
the Multiplier.

It holds one of the multiplication operands for the Multiplier.

The Product REGister (PREG) is a 32-bit operand register which stores the
Multiplier result.

The value held in the PREG can be sent to the ALU for an arithmetic operation, or
it can be passed on to the Data Bus (DB) for another stage of processing.

ACCB (the ACCumulator Buffer) provides a temporary storage place for the value
held in the ACCumulator register (ACC).

The ACC register is designed to hold the last arithmetic result produced by the
ALU.

The ALU is designed to implement a wide range of arithmetic and logical
operations.


EXAMPLE   OPERAND 1    OPERAND 2    OPERATION   OUTPUT

1         1011 0100    0001 1101    ADD         1101 0001
2         1011 0100    0001 1010    SUBTRACT    1001 1010
3         0010 1001    1011 1101    AND         0010 1001
4         0010 1001    1011 1101    OR          1011 1101
5         0111 0101    –            NEGATE      1000 1011

Some operations that are commonly executed by the ALU include: addition,
subtraction, negation, and logical AND, OR, XOR, and NOT.
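The 8-bit examples in the table can be checked with a short Python sketch. The helper names alu and bits are assumptions made for this illustration; the real ALU of the 'C50 works on 32-bit words.

```python
MASK8 = 0xFF  # keep results within 8 bits, as in the table

def alu(op, a, b=0):
    """Return the 8-bit result of a basic ALU operation."""
    if op == "ADD":
        result = a + b
    elif op == "SUBTRACT":
        result = a - b
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    elif op == "NEGATE":
        result = -a                  # 2s-complement negation
    else:
        raise ValueError("unknown operation: " + op)
    return result & MASK8

def bits(value):
    """Format an 8-bit value as two groups of four bits."""
    text = format(value, "08b")
    return text[:4] + " " + text[4:]

print(bits(alu("ADD", 0b10110100, 0b00011101)))   # example 1: 1101 0001
print(bits(alu("NEGATE", 0b01110101)))            # example 5: 1000 1011
```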

The majority of ALU instructions execute within a single clock cycle.

Most of the ALU instructions that take more than one clock cycle rely on other units
for pre- or post-processing of data.

E.g., add a data value to the ACC and then execute a binary shift. The TMS320C50
requires 2 clock cycles to execute the operation. The binary shift is an example of
the type of processing that takes place after addition.

The ALUs of fixed-point DSPs execute 2s-complement arithmetic.

The ALU executes operations using twice the precision of the native word width of
the processor.

For example the ALU of the 'C50, a 16-bit fixed-point DSP, inputs, outputs, and
executes with a 32-bit word width.


Most DSPs have an ALU mode of operation called sign-extension mode.

When enabled, all ALU outputs are sign-extended.

Sign extension prevents a negative number from being mistaken for a positive one.

When the number of bits used to represent a word (e.g., 16 bits) is less than the
number of bits required to represent the same word inside of the CALU (32 bits),
sign extension copies the sign bit into the added MSBs.
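A minimal Python sketch of sign extension follows. The function name sign_extend and the two test patterns are assumptions chosen for this illustration.

```python
def sign_extend(value, from_bits=16, to_bits=32):
    """Sign-extend a from_bits-wide 2s-complement pattern to to_bits."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    if value & sign_bit:
        # Negative number: fill the added MSBs with ones.
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value

print(format(sign_extend(0x8001), "08X"))   # FFFF8001 (negative stays negative)
print(format(sign_extend(0x7FFF), "08X"))   # 00007FFF (positive is unchanged)
```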

If the following 16-bit 2s-format number:

1011 0111 0010 0001 b

was loaded using the ALU into a 32-bit Accumulator when sign-extension mode
was enabled, what would be the contents of the Accumulator register?

a. 0000 0000 0000 0000 1011 0111 0010 0001 b
b. 1011 0111 0010 0001 b
c. 1111 1111 1111 1111 1011 0111 0010 0001 b
d. None of the above.


The result of the last arithmetic or logical operation executed by the ALU is stored
in the ACCumulator (ACC).

The result held in the ACC can be stored in the ACC Buffer register (ACCB),
passed back to the ALU, or passed on to another stage of processing using the
Data Bus (DB).

In the case of the 'C50 DSP, two operands need to be input into the ALU to execute
any of its arithmetic or logical operations.

One of the operands is supplied by the ACCumulator register (ACC).

One of three other locations provides the other data operand for an ALU operation:

– Data path (e.g., to fetch an operand from memory)
– Multiplier Product REGister (PREG)
– ACCumulator Buffer (ACCB) register

Multiplication is an essential operation used in virtually all digital signal processing
applications.

In many of the applications where multiplication is used, half or more of the
instructions executed by the processor are multiplication operations.

Central to nearly all programmable digital signal processors is the single-cycle
Multiplier.

The Multiplier refers to the circuit within the DSP that executes the multiplication of
binary numbers.

Depending on operand size (8-bit or 16-bit for the 'C50), nearly all Multiplier
instructions can be executed within one clock cycle.


Multiplication in fixed-point DSPs is executed with 2s-complement arithmetic.

A Multiplier requires a minimum of two operands to execute a multiplication.

These operands are treated as 2s-complement numbers.

In the TMS320C50, register TREG0 is always used as one of the operand sources
for the Multiplier.

In certain cases, such as when the square instructions (SQRA and SQRS) are
executed, TREG0 is the only operand used by the Multiplier.

When another multiplication operand is required, it is fetched from one of two other
locations:

– Data memory using the Data Bus (DB)
– Program memory using the Program Bus (PB)


As previously stated, the Multiplier result is stored in a Product REGister (PREG).

The product register is twice as wide as the word width of the multiplication
operands (native data word width of the DSP).

OPERAND 1    OPERAND 2    OPERATION     RESULT                 PREG (AFTER SIGN EXT.)

0111 0111    0011 0111    MULTIPLIER    0001 1001 1001 0001    0001 1001 1001 0001
(+ 119)      (+ 55)                     (+ 6545)               (+ 6545)

0110 0110    1011 0111    MULTIPLIER    0010 0010 1110 1010    1110 0010 1110 1010
(+ 102)      (- 73)                     (+ 8938)               (- 7446)

All Multiplier results are sign-extended before they are stored in the Product
REGister (PREG).

This combined with the fact that the PREG has twice the operand word width
means that, by itself, the Multiplier does not introduce any errors into computations.
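The second row of the table can be reproduced with the Python sketch below. The helper names to_signed and multiply are assumptions made for this illustration; the sketch only models 2s-complement multiplication into a register twice as wide as the operands.

```python
def to_signed(value, bits):
    """Interpret a bit pattern as a 2s-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def multiply(a, b, bits=8):
    """Multiply two bits-wide 2s-complement patterns.

    The product is returned as a pattern twice as wide as the operands,
    which is the role played by the product register."""
    product = to_signed(a, bits) * to_signed(b, bits)
    return product & ((1 << (2 * bits)) - 1)

preg = multiply(0b01100110, 0b10110111)            # (+102) x (-73)
print(format(preg, "016b"), to_signed(preg, 16))   # 1110001011101010 -7446
```

Because the register is twice the operand width, every possible product fits without loss.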

To keep the level of arithmetic precision constant, the number of bits used
to represent the results of multiplication, accumulation and other arithmetic
operations needs to be increased.

That is why, in DSPs, the Multiplier Product Register and the ALU ACCumulator
(ACC) have a width twice that of the native data word width.


              OPERAND 1      OPERAND 2      OPERATION    ACCUMULATOR    OVM CORRECTION

OVER-FLOW     7FFF FFFF h    7FFF FFFF h    ADDITION     FFFF FFFE h    7FFF FFFF h
UNDER-FLOW    8000 0000 h    8000 0000 h    ADDITION     0000 0000 h    8000 0000 h

maximum positive value    7FFF FFFF h    2^31 - 1
maximum negative value    8000 0000 h    -2^31
an overflowed value       FFFF FFFF h    -1
an underflowed value      0000 0000 h    0

Most signal processing applications require the addition of a series of data values.
These operations, when executed within a fixed-point DSP, can easily lead to an
overflow or underflow.

In many processors, a mode of operation exists which is used to decrease the error
that is caused when overflow or underflow occurs. This mode within the
TMS320C50 DSP is named OVerflow saturation Mode (OVM).
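The effect of a saturation mode can be sketched in Python as follows. The function add32 and its ovm parameter are assumptions made for this illustration; they model a 32-bit 2s-complement addition with and without saturation, not the exact behavior of any 'C50 instruction.

```python
MAX32 = 0x7FFFFFFF           # maximum positive value, 2^31 - 1
MIN32 = -0x80000000          # maximum negative value, -2^31

def to_signed32(pattern):
    """Interpret a 32-bit pattern as a 2s-complement number."""
    pattern &= 0xFFFFFFFF
    return pattern - (1 << 32) if pattern & 0x80000000 else pattern

def add32(a, b, ovm=False):
    """Add two 32-bit patterns; return (result pattern, overflow flag)."""
    total = to_signed32(a) + to_signed32(b)
    overflow = total > MAX32 or total < MIN32
    if overflow and ovm:
        total = MAX32 if total > 0 else MIN32     # saturate
    return total & 0xFFFFFFFF, overflow

print(format(add32(0x7FFFFFFF, 0x7FFFFFFF)[0], "08X"))            # FFFFFFFE
print(format(add32(0x7FFFFFFF, 0x7FFFFFFF, ovm=True)[0], "08X"))  # 7FFFFFFF
```

Without saturation, the sum of two large positive values wraps around to a negative pattern; with saturation, it is held at the largest representable value.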

Which of the following operations produce overflow of a 32-bit accumulator?

a. (4000 0019 h + 3333 ABB4 h)
b. (3B56 FF5F h + 5432 1145 h)
c. (0455 E089 h + 0054 31AB h)
d. (1223 556F h + 2000 EF02 h)

Barring the occurrence of overflow or underflow, the precision within the ALU
and the Multiplier is kept at the same level as when the operands entered the
CALU.

However, at some point it is usually necessary to reduce the precision of the
results: the data bus is still only half the bit-width of the CALU results.

Therefore, the programmer must select the product register or accumulator bits
which will be passed on to the next stage of processing (via the data bus).


The selection of which bits to pass on is done with shifters that are located at the
exit of the PREG and of the ACC.
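The selection of 16 bits out of a 32-bit result can be sketched in Python as shown below. The helper names store_high and store_low are hypothetical; they only illustrate how a left shift applied before storing changes which bits reach a 16-bit bus.

```python
def store_high(acc, shift=0):
    """Return the upper 16 bits of a 32-bit value after a left shift."""
    shifted = (acc << shift) & 0xFFFFFFFF
    return (shifted >> 16) & 0xFFFF

def store_low(acc):
    """Return the lower 16 bits of a 32-bit value."""
    return acc & 0xFFFF

acc = 0x12345678
print(format(store_high(acc), "04X"), format(store_low(acc), "04X"))   # 1234 5678
print(format(store_high(acc, 4), "04X"))                               # 2345
```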

A shifter can shift a binary number to the right or to the left by a specified
number of bits.

Shifting a number n bits to the left effectively multiplies it by a power of
two (2^n).

Pre- and Postscalers are used to scale values before they are input to or output
from the Multiplier and ALU.

Scaling is an important operation in fixed-point DSPs because overflow can be
avoided by prescaling CALU inputs.
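The relation between shifting and scaling, and the way prescaling helps to avoid overflow, can be sketched in Python. The function scale and the sample values are assumptions made for this illustration.

```python
def scale(value, n):
    """Shift left by n bits (n >= 0) or right by -n bits (n < 0)."""
    return value << n if n >= 0 else value >> -n

assert scale(5, 3) == 5 * 2**3          # a left shift multiplies by 2^n
assert scale(-64, -2) == -64 // 2**2    # an arithmetic right shift divides by 2^n

# Prescaling example: add four large values in a 16-bit signed range.
values = [30000, 30000, 30000, 30000]
print(sum(values) > 32767)                          # True: the sum would overflow
print(sum(scale(v, -2) for v in values) > 32767)    # False: fits after prescaling
```

The price of prescaling is precision: the bits shifted out to the right are lost.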


DSP FAMILY                   METHOD USED TO AVOID OVERFLOW

AT&T DSP16xx                 4 guard bits
Analog Devices ADSP-21xx     8 guard bits
TI TMS320C2x and C5x         No guard bits. Intermediate results can be scaled.

Ideally, the size of an accumulator register should be larger than the size of the
multiplier product register by several bits.

The extra bits, named guard bits, allow the programmer to accumulate a number of
values without the risk of overflowing the accumulator and without the need to scale
intermediate results to avoid overflow.
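The benefit of guard bits can be estimated with a short Python sketch. The function safe_additions is an assumption made for this illustration; it counts how many worst-case positive values, each filling a 32-bit product register, can be accumulated before the accumulator overflows.

```python
def safe_additions(guard_bits, product_bits=32):
    """Count worst-case full-scale products that fit in the accumulator."""
    acc_max = (1 << (product_bits + guard_bits - 1)) - 1
    worst_case = (1 << (product_bits - 1)) - 1   # largest positive product value
    return acc_max // worst_case

for g in (0, 4, 8):
    print(g, safe_additions(g))   # 0 1, then 4 16, then 8 256
```

Under these assumptions, each additional guard bit doubles the number of additions that are guaranteed not to overflow.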

A single-bit field, present in the 'C50, and known as the carry bit (or the C bit), is
associated with the ACC register.

The C bit indicates whether an ALU operation generated a carry or a borrow.

The DSP can be programmed to conditionally test this bit.

The C bit, similar to a guard bit, is useful for extended-precision arithmetic.
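A minimal Python sketch of extended-precision addition with a carry follows. The functions add_with_carry and add64 are hypothetical helpers; they show the principle of chaining two 32-bit additions through a carry, not a specific 'C50 instruction sequence.

```python
MASK32 = 0xFFFFFFFF

def add_with_carry(a, b, carry_in=0):
    """32-bit addition; returns (32-bit result, carry out)."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32

def add64(a_hi, a_lo, b_hi, b_lo):
    """Add two 64-bit values held as (high, low) 32-bit words."""
    lo, carry = add_with_carry(a_lo, b_lo)       # low words first
    hi, _ = add_with_carry(a_hi, b_hi, carry)    # the carry feeds the high words
    return hi, lo

hi, lo = add64(0x00000001, 0xFFFFFFFF, 0x00000000, 0x00000001)
print(format(hi, "08X"), format(lo, "08X"))      # 00000002 00000000
```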


PROCEDURE

The ALU

In this procedure section, you will load the accumulator with a value input from the
DIP switch. Then, using the ALU, you will add three different values to the
accumulator, each one fetched from a different operand source.

Note: Before using the C5x VDE please make certain the circuit board
power source is turned ON, and that the serial connection is present
between the host computer and the DIGITAL SIGNAL PROCESSOR
circuit block labeled SERIAL PORT.

If at any time during the following procedure you realize that you did not correctly
follow a procedure step, use the C5x VDE to edit the PC (Program Counter
register) back to one of the last labeled program memory addresses
(either MAIN, MARKER1, MARKER2, or MARKER3).

Within WinFACET, using the Go to previous page button, return to the beginning
of the Procedure Section associated with the labeled program address that you
returned to and start following the procedure steps once again.

* 1. Open the ex2_1.asm assembler source file within an ASCII text editor. This
is the DSP program source file used for the exercise. You can refer to this
source file at any time during the procedure.

* 2. Open the C5x VDE, and load the ex2_1.dsk DSP program file.

* 3. Using the C5x VDE, open a Data Memory display to 0980h. This is the
data memory address where the constants and variables used by the
program begin being stored.


* 4. Using the C5x VDE, open a Program Memory display to 0A0Dh. This is the
program memory address where program machine code was stored.

* 5. Within the Dis-Assembly window, place a breakpoint at the program
address labeled MAIN and then execute the C5x VDE RUN command. This
will initialize the CPU registers required for the exercise.

* 6. Using the C5x VDE, edit the contents of the PREG and ACCB registers.
Input different 16-bit 2s-complement values (0000 XXXXh), of your
choosing, into the registers. They must be 16-bit values.

* 7. Position the eight on-off switches (DIP switch) located in the I/O
INTERFACE circuit block, found on the DIGITAL SIGNAL PROCESSOR
board, to a value of your choosing.

This value will be loaded into the accumulator register, ACC.

* 8. Using the C5x VDE, STEP OVER the CALL DIPSWITCH instruction:

CALL A80h,*

Observe that the binary value that the DIP switch was set to has been
loaded into the accumulator register (ACC).

What is the content of the data memory address labeled INPUT?

a. 0010 1111 1011 0011b
b. 01FEh
c. The same value that you input through the DIP switch.
d. None of the above

* 9. Using the C5x VDE, STEP OVER the ADD, APAC and ADDB instructions.

The following mathematical operation was executed by the above
instructions:

ACC = ACC + VALUE + PREG + ACCB

* 10. Write down the value contained in the ACC.


Sign-Extension Mode inside of the ALU

In this procedure section, you will enable sign-extension mode, execute the code
of the previous procedure section, and compare the results generated by the ALU
for both procedure sections.

* 11. Using the C5x VDE, edit the SXM bit to 1.

Sign-extension mode is enabled in the DSP.

* 12. Make certain that the PREG, ACCB and the DIP switch have the same
values as you had chosen before. If not, set them back to the same
values.

* 13. Using the C5x VDE, edit the contents of the PC register to the program
address labeled MAIN. The execution line will return to the instruction
labeled MAIN.

* 14. Using the C5x VDE, execute, once again, with the STEP OVER command,
the instruction: CALL A80h,* (and the three add instructions that follow).

* 15. Write down the value contained in the ACC.

* 16. Compare the first accumulator result that you wrote down (this one was
generated with sign-extension mode not enabled) with the second
accumulator result that you just wrote down (this one was generated with
sign-extension mode enabled).

* 17. Observe that the results are not the same. This is because, when the SXM
bit is enabled, the negative 16-bit wide value stored in the data address
labeled VALUE, and added to the accumulator with the ADD ch instruction,
is sign-extended and seen as a negative number by the ALU. This is
contrary to when SXM is not enabled.
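The difference observed in step 17 can be modeled with the Python sketch below. The function add_to_acc and the operand values are assumptions made for this illustration; they are not the values stored by ex2_1.asm.

```python
def add_to_acc(acc, operand16, sxm):
    """Add a 16-bit memory operand to a 32-bit accumulator pattern."""
    operand = operand16 & 0xFFFF
    if sxm and operand & 0x8000:
        operand |= 0xFFFF0000          # sign-extend into the upper 16 bits
    return (acc + operand) & 0xFFFFFFFF

acc = 0x00000100
value = 0xFFFF                          # -1 as a 16-bit 2s-complement number
print(format(add_to_acc(acc, value, sxm=False), "08X"))   # 000100FF
print(format(add_to_acc(acc, value, sxm=True), "08X"))    # 000000FF
```

With sign extension disabled, the operand is treated as the positive value 65535; with sign extension enabled, it is treated as -1.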

Prescaling and Postscaling inside of the ALU

In this procedure section, you will use the prescaler located at the input of the ALU.

* 18. Using the C5x VDE, make certain that the SXM bit is set (SXM = 1). If it is
not already set then edit the SXM bit to 1.


* 19. Note the ADD instructions that follow the program address labeled
MARKER1. They prescale an operand before adding it to the accumulator.

* 20. Using the C5x VDE, zero the contents of the ACC. STEP OVER the
instruction: ADD #1111h,3

The value 1111h was added to the ACC register that previously contained
zero. By observing the present contents of the ACC, which of the following
choices describes what the prescaler did to the added value?

a. The added value was scaled by 2^3.
b. The added value was scaled by 2^2.
c. The added value was sign-extended.
d. The added value was scaled by 2^-3.

* 21. Using the C5x VDE, again zero the contents of the ACC register. STEP
OVER the instruction: ADD #8111h,15

The value 8111h (corresponding to the negative value -32495) was added
to the ACC register that previously contained zero. By observing the
present contents of the ACC, which of the following choices describes how
the prescaler and ALU changed the added ACC value?

a. The added value was sign-extended.
b. The added value was shifted 15 bits to the right and then sign-extended.
c. The added value was scaled by 2^15 and sign-extended.
d. None of the above.

Both added values were taken from the 16-bit data bus and prescaled. The
bus between the prescaler and the ALU is 32 bits wide.

* 22. Using the C5x VDE, slowly STEP OVER (while watching the I/O
INTERFACE display) the instructions:

CALL DISPLAYHIGH,*
CALL DISPLAYLOW,*

The value of the ACC is output to the 7-segment display.

* 23. Using the C5x VDE, STEP OVER the ZAP instruction. The ZAP instruction
will zero the ACC and PREG registers.
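The behavior of a prescaler at the input of the ALU, as used in this procedure section, can be modeled in Python. The function prescale and the operand values are assumptions made for this illustration; they differ on purpose from the values used in steps 20 and 21.

```python
def prescale(operand16, shift, sxm=True):
    """Shift a 16-bit operand left onto a 32-bit bus, with optional sign extension."""
    operand = operand16 & 0xFFFF
    if sxm and operand & 0x8000:
        operand -= 0x10000             # treat the operand as a negative number
    return (operand << shift) & 0xFFFFFFFF

print(format(prescale(0x0123, 4), "08X"))              # 00001230
print(format(prescale(0xFFFE, 2), "08X"))              # FFFFFFF8
print(format(prescale(0xFFFE, 2, sxm=False), "08X"))   # 0003FFF8
```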


The Multiplier: Basic Operations

In this procedure section, you will execute basic multiplication operations, and by
enabling one of the product shift modes, shift the output of the product register.

There are eight data values used in this procedure section. Each value is written
as a Q14-format binary number.

The data values are stored in the data memory addresses (dmas) labeled X0 to X3 and B0 to B3.
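As a reminder of what a Q14-format number is, the following Python sketch converts between real numbers and 16-bit Q14 patterns. The function names to_q14 and from_q14 and the sample values are assumptions made for this illustration; they are not the values stored at X0 to X3 or B0 to B3.

```python
Q = 14                               # number of fractional bits

def to_q14(x):
    """Convert a real number in the range [-2, 2) to a 16-bit Q14 pattern."""
    return int(round(x * (1 << Q))) & 0xFFFF

def from_q14(pattern):
    """Convert a 16-bit Q14 pattern back to a real number."""
    value = pattern - 0x10000 if pattern & 0x8000 else pattern
    return value / (1 << Q)

print(format(to_q14(0.5), "04X"))     # 2000
print(from_q14(0xC000))               # -1.0
```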

* 24. Using the C5x VDE, make certain that the following is true:

SXM = 1
ACC = 0000 0000h
PREG = 0000 0000h
PM = 0

* 25. Using the C5x VDE, STEP OVER the LT instruction.

This loads the content of the data memory address labeled X0 into
TREG0. TREG0 is an operand source for the multiplier.

Are the contents of the data memory address labeled X0 and of the TREG0
register the same?

* Yes * No

* 26. Using the C5x VDE, STEP OVER the following MPY instruction.

The contents of the data memory address labeled B0 (the fifth data value)
are multiplied with the contents of TREG0.


Observe, using the C5x VDE, that the PREG holds the product of the
contents of the data memory address labeled B0 with TREG0.

* 27. Using the C5x VDE, STEP OVER the instruction: APAC.

The APAC instruction adds the PREG to the ACC.

Observe that the contents of the accumulator register (ACC) and of the
product register (PREG) are the same: both are equal to 01B0 7660h. ACC
was equal to 0000 0000h before executing the APAC instruction.

* 28. Using the C5x VDE, edit the PC register to return the execution line to the
program address labeled MARKER2. Edit the content of the PM bits to 1
(this enables the product shifter; the PREG output is shifted left by 1 bit).

* 29. Using the C5x VDE, once again zero the contents of the ACC register.
STEP OVER the LT, MPY and APAC instructions executed in steps 25 to
27.

Notice the effect that the postscaler has on the contents of the accumulator: the
PREG output was shifted left by 1 bit (multiplied by 2^1).

If the values entered into the multiplier were written in Q14-format and if a product-
shift of 1-bit to the left occurred, then what would be the numerical format of the
value contained in the accumulator register?

a. Q13-format
b. Q14-format
c. Q28-format
d. Q27-format


The Multiplier: Overflow and Overflow Saturation Mode

In this procedure section, you will make the accumulator overflow and then you will
enable OVerflow saturation Mode (OVM) to protect against it occurring again.

* 30. Using the C5x VDE, make certain that the following is true:

ACC = 0000 0000h
PREG = 0000 0000h
TREG0 = 0000h
PM = 0
OV = 0
SXM = 1
OVM = 0

* 31. Using the C5x VDE, edit the Program Counter register, PC, to the program
address labeled MARKER2.

* 32. Using the C5x VDE, STEP OVER the instructions located between the
program address labeled MARKER2 and the instruction:

B MARKER2,*

* 33. Execute the B MARKER2,* instruction. This will branch the execution line
back to the program address labeled MARKER2 (this has the same effect
as editing the PC).

* 34. Using the STEP OVER command, continue executing the LT, MPY, APAC,
and B MARKER2 instructions until the OV bit is equal to 1.


While executing these instructions, observe that the value held in the ACC
register is becoming larger. ACC overflow occurs when OV = 1.

What is the value of the ACC?

a. 44FD 8DC8h
b. 89FB 1B90h
c. 7838 9B90h
d. 8BAB 91F0h

* 35. Using the C5x VDE, edit the PC register to the program address labeled
MARKER2. Zero the OV bit and the ACC, PREG, TREG0 registers.

* 36. Set the OVM bit to 1.

This enables OVerflow saturation Mode (OVM) in the DSP.

* 37. Once again, STEP OVER the LT, MPY, APAC, and B MARKER2
instructions until the OV bit is equal to 1.


When the accumulator overflow occurred, and OVerflow saturation Mode (OVM)
was not enabled, the result contained in the ACC had a relative error of 186%
compared with the correct value.

However, when OVM was enabled and the same overflow occurred, the result
contained in the ACC only had a relative error of 7% compared with the correct
value.
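The way such relative errors are obtained can be sketched in Python. The values below are assumptions chosen for this illustration, so the percentages differ from those of the exercise; only the comparison between the two cases matters.

```python
def relative_error(measured, correct):
    """Relative error of a measured value, in percent of the correct value."""
    return abs(measured - correct) / abs(correct) * 100.0

correct = 0x7FFFFFFF + 0x10000000      # true sum, too large for 32 bits
wrapped = correct - (1 << 32)          # what a wrapping accumulator holds
saturated = 0x7FFFFFFF                 # what a saturating accumulator holds

print(round(relative_error(wrapped, correct), 1))
print(round(relative_error(saturated, correct), 1))
```

The wrapped result has the wrong sign, so its error is far larger than that of the saturated result.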

Multiplier Postscaling

In this procedure section, you will add 128 very large values together,
consecutively, and prevent an accumulator overflow by setting the Product-shift
Mode (PM) so that the output of the PREG is scaled by 2^-6.

* 38. Using the C5x VDE, edit the PC to the program address labeled
MARKER3. Clear the OV and OVM bits.

* 39. Using the C5x VDE, STEP OVER the instruction: ZAP

This zeroes the accumulator and product registers.

* 40. STEP OVER the instruction: SPM 3

This sets the Product-shift Mode (PM) bits to 3.

By setting the PM bits to 3, the output of the PREG will be shifted 6 bits to
the right, which is equivalent to dividing it by 2^6.


* 41. Place a breakpoint at the program address labeled END_BLOCK.

The maximum positive value that can be represented in the 16-bit 2s-
format, 7FFFh, is used as the operand for the LT and MPY instructions.
These instructions, located between the program address labeled
MARKER4 and the one labeled END_BLOCK, fetch the contents of the
data memory address labeled BIG_VALUE.

* 42. Using the C5x VDE, execute the RUN command.

The program is halted at the breakpoint. Observe that after executing 128
additions the accumulator register still has not overflowed (OV is still equal
to 0).

* 43. Edit the PC to the program address labeled MARKER4. STEP OVER, once
again, the LT, MPY, and APAC instructions (the 129th consecutive multiply-
accumulate).

Observe that the accumulator overflows this time. This implies that when the
output of the product register is scaled by 2^-6, up to 128
consecutive additions (of 7FFFh x 7FFFh) can be executed without
causing overflow.

* 44. End the C5x VDE session.
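The count of 128 additions observed in steps 42 and 43 can be checked with a short Python model. The function count_safe_macs is an assumption made for this illustration; it models a 32-bit accumulator and a right shift applied to the product before each addition.

```python
def count_safe_macs(operand=0x7FFF, shift=6, limit=1000):
    """Count consecutive multiply-accumulates before a 32-bit ACC overflows."""
    product = operand * operand        # value held in the product register
    scaled = product >> shift          # product register output after the shifter
    acc = 0
    for count in range(limit):
        if acc + scaled > 0x7FFFFFFF:  # the next addition would overflow
            return count
        acc += scaled
    return limit

print(count_safe_macs())           # 128
print(count_safe_macs(shift=0))    # 2
```

Without the product shift, the same accumulation overflows on the third addition.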


CONCLUSION

– Fixed-point multipliers and ALUs execute 2s-complement arithmetic.

– Multiplier results are sign-extended before they are stored in the product
register. Sign extension prevents a negative number from being mistaken for
a positive one.

– To keep the level of arithmetic precision constant within fixed-point DSPs, the
product and accumulator registers are at least twice the native word width of
the internal bus.

– In fixed-point DSPs, scaling is used to lower the risk of overflow and underflow
and to select subsets of the CALU output bits.

– Overflow saturation mode is used to decrease the error that is caused when
overflow or underflow occurs.

REVIEW QUESTIONS

1. Which of the following operand sources is always used by the ALU?

a. The accumulator register (ACC).
b. The product register (PREG).
c. The accumulator buffer register (ACCB).
d. A data memory address from the data bus.

2. What is the difference between using the DSP when sign-extension mode is
enabled and when it is disabled?

a. When enabled, all data values in the DSP are sign extended.
b. When enabled, the accumulator saturates to the most positive or negative
values when overflow or underflow occurs.
c. When enabled, the multiplier output is sign extended.
d. When enabled, the ALU output is sign extended.

3. Why, within the TMS320C50 DSP, are the accumulator and product registers
twice the bit-width (32 bits) of the internal buses (16 bits)?

a. To keep the level of arithmetic precision constant.
b. To avoid overflow or underflow from occurring.
c. All of the above.
d. None of the above.


4. Which of the following elements is not part of the Central Arithmetic Logic Unit
(CALU)?

a. Auxiliary Register Arithmetic Unit (ARAU)
b. Operand Registers
c. Multiplier
d. Arithmetic Logic Unit (ALU)

5. Which of the following choices is not used as a way of avoiding accumulator
overflow in a DSP?

a. Guard bits, extra bits in the accumulator.
b. Product shifter
c. Sign-extension mode
d. None of the above.

Exercise 2-2

Memory Space

EXERCISE OBJECTIVES

Upon completion of this exercise, you will be familiar with the basic characteristics
of the modified Harvard architecture, as used by DSPs.

DISCUSSION

Memory is an important part of any microcomputer or microprocessor. In computers
like the one you are using, memory is used to store program information, such as
the program code for the C5x VDE, and it is also used to store data information.

The DSP contains on-chip memory and is also able to access off-chip memory
through its external address and data buses. On-chip memory is usually of two
types:

– ROM (Read Only Memory) is used to store program code during the
manufacturing process. ROM is a non-volatile memory because it retains its
data after the processor has shut down.

– RAM (Random Access Memory) is used to store temporary program
information. RAM is a volatile memory because when power is removed the
stored information is lost.

Both categories of memory (ROM and RAM) are found on-chip (inside the DSP).
The allocation of these types of memory within the memory space can be
configured in various ways.


The TMS320C50 DSP uses two types of on-chip RAM:

– SARAM (Single-Access RAM) - An SARAM memory block can be written to or
read from once within one instruction cycle.

– DARAM (Dual-Access RAM) - A DARAM block can be read from and written to
in the same instruction cycle.

The Harvard architecture has two parallel buses.

One bus (the PB) is dedicated to addressing and transport of program
information and the other (the DB) is dedicated to addressing and transport of data.
Two parallel buses allow program and data memory to be accessed
simultaneously.


Each of the parallel buses of a 16-bit fixed-point DSP can allocate 2^16 addresses
to on-chip memory and peripherals.

If each address of a 16-bit data bus was allocated to on-chip 16-bit/word memory,
how many bits of storage could be used by the DSP?

a. 65536 bits
b. less than 1 million bits
c. more than 1 billion bits
d. more than 1 million bits

Most DSPs today use a modified Harvard architecture to increase their memory
bandwidth. The specific modifications present in the TMS320C5x that have been
added to the traditional Harvard structure are:

– a program/data memory, a memory that can be addressed by both the DB and
the PB;

– an instruction cache that supplements program/data memory.

HARVARD ARCHITECTURE MODIFICATION 1

In the case of the TMS320C50 DSP, certain SARAM blocks can be configured as
program/data memory. This implies that each memory element within the SARAM
block has been allocated a data bus address and a program bus address.

Program memory is addressed by the program bus. Operands can only be stored
in or read from program memory using the program bus.

Data memory is addressed by the data bus. Operands can only be stored in or read
from data memory using the data bus.

Program/data memory is addressed by both the program bus and the data bus.
Operands can be stored in or read from program/data memory by either using the
program bus or the data bus.
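The idea of a program/data memory can be sketched in Python as a single block of storage that is visible from both buses. The class SharedBlock, its method names, and the addresses used below are assumptions made for this illustration; they do not describe the actual memory map of the 'C50.

```python
class SharedBlock:
    """A memory block that has an address on the data bus and on the program bus."""

    def __init__(self, size, data_base, program_base):
        self.cells = [0] * size              # one physical storage array
        self.data_base = data_base
        self.program_base = program_base

    def write_data(self, address, value):
        """Write a 16-bit word through the data bus."""
        self.cells[address - self.data_base] = value & 0xFFFF

    def read_program(self, address):
        """Read a 16-bit word through the program bus."""
        return self.cells[address - self.program_base]

block = SharedBlock(size=0x100, data_base=0x0800, program_base=0x0800)
block.write_data(0x0810, 0xBEEF)
print(format(block.read_program(0x0810), "04X"))   # BEEF
```

Because both buses reach the same storage, a value written through the data bus can overwrite a program word.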

The 'C50 has four memory configuration bits that select how data and program
bus addresses are allocated among the different on-chip memories, I/O ports,
internal memory-mapped registers and external memory-mapped peripherals.

The memory configuration bits are:

CNF: Enables on-chip DARAM B0 to be addressed by the PB or the DB.

RAM: Enables/disables SARAM from being addressed by the PB.

OVLY: Enables/disables SARAM from being addressed by the DB.

MP/MC#: Enables/disables on-chip ROM from being addressed by the PB.


The memory configuration bits should be initialized (set or cleared) at the beginning
of a DSP program and then they should no longer be changed.

By altering the value of one of the bits, memory elements either become mapped
to other addresses (sometimes addresses on a different bus) or become no longer
address mapped at all.

HARVARD ARCHITECTURE MODIFICATION 2

In the case of the 'C50, the register named the Program Counter (PC) acts as an
instruction cache. It can store one instruction word (16 bits in width). The instruction,
once loaded into the PC, can be repeated the number of times specified by the
RePeaT Counter register (RPTC).


During a repeat loop, the program bus does not have to be used to read the next
instruction in the program. The DSP simply fetches the next instruction from the
instruction cache.

When the instruction from the cache is being executed, a Program Bus (PB) access
is freed. The PB is no longer required to fetch the next program instruction.

The freed memory access can be used to read another operand from program/data
memory. Specialized instructions like MAC (Multiply and ACcumulate), when
repeated, use the freed Program Bus access to fetch a total of two operands
during a single clock cycle.

When programming a DSP, the only memory space initializations that should be of
concern to the programmer are:

– The CNF, MP/MC#, RAM and OVLY bit initializations. These select the proper
memory configuration.

– The use of the DSK directives describing the memory locations where program
and data are stored: .entry, .ps, .text, .word, .byte, .data or .ds, .set.

PROCEDURE

IMPORTANT: At DSP power up, the memory configuration bits for the TMS320C50
DSP are set to default values. The default values for some of the configuration bits
are:

MP/MC# = 0
OVLY = 1
RAM = 1

The RAM bit must not be modified. The program code for the C5x VDE application
executes from internal program memory, so the C5x VDE application would not
function if this bit were changed.


Address Allocation of the Data and Program Buses

In this procedure section, you will familiarize yourself with the possible memory
configurations of the TMS320C50 DSP.

Note: Before using the C5x VDE please make certain the circuit board
power source is turned ON, and that the serial connection is present
between the host computer and the DIGITAL SIGNAL PROCESSOR
circuit block labeled SERIAL PORT.

* 1. Open the C5x VDE.

* 2. Open a memory display window with the following options:

Address: 0x0000
Type: Program Memory
Display Format: Hex

* 3. Note in the C5x Registers window that the MP/MC# bit is cleared. This is
the default value at DSP power up. The DSP is now operating in
microcomputer mode and program memory addresses 0h to 800h are
allocated to on-chip ROM.

The contents of the program memory addresses correspond to the
microcode instructions for the kernel used to establish communication
between the C5x VDE and the DSP.

* 4. Set the MP/MC# bit (i.e., make MP/MC# = 1). Highlight the memory display
window and refresh it (toolbar/Window/Refresh).


* 5. What value have the addresses between 0h and 800h been initialized to
after editing the MP/MC# bit?

a. 0x0000
b. 980h
c. B882h
d. 12103d

* 6. Open another memory display window with the following options selected:

Address: 800h
Type: Data Memory
Display Format: Hex

* 7. Change the address for the first Program Memory window to 800h, using
the window Options menu.

Note that the contents of program and data memory for addresses 0800h
to 2C00h are the same.

* 8. Edit a data memory address (between 800h and 2C00h).

When DSP memory is dually addressed, programmers must be vigilant not
to overwrite microcode instructions when writing values through the data
bus.

* 9. Clear the OVLY bit and note that data addresses 0800h-2C00h become
zero. They are now allocated for off-chip access.


* 10. Change the address within the Window Options for Program Memory to
FE00h. Change the address within the Window Options for Data Memory
to 0100h. Clear the CNF bit (make CNF = 0).

* 11. Observe that the contents of data memory addresses 0100h to 0300h are
not equal to zero and that the contents of program memory addresses
FE00h to FFFFh are equal to zero.

* 12. Set the CNF bit (make CNF = 1) and note the changes that take place in
the memory displays.

See HELP Unit 02 shelp14

* 13. What occurred after editing the CNF memory configuration bit?

a. The values in data memory were copied to program memory.
b. Memory allocated to the data bus was wiped clear of all information.
c. DARAM B0 that was address mapped by the data bus became
address mapped by the program bus.
d. Nothing occurred after editing the CNF memory configuration bit.

The Recorder

In this procedure section, you will use a Playback/Recorder program to familiarize
yourself with the memory architecture of the TMS320C50.

* 14. Open the ex2_2.asm assembler source file within an ASCII text editor.

This file is the assembler source code for a DSP program that makes the
DIGITAL SIGNAL PROCESSOR circuit board become a Playback/
Recorder. Refer to this source file at any time during the procedure.


* 15. Carefully read the description of the ex2_2.asm source file.

Important points:

The DIP switch is used to choose a recording mode.

(The modes are differentiated by their data compression methods.)

The signal input level of the microphone (proportional to the number of dots) is
output to the I/O INTERFACE display.

Pressing INT1# begins recording and pressing INT3# begins playback.

* 16. Make the following connections on the DIGITAL SIGNAL PROCESSOR
circuit board:

& Connect a microphone to the INPUT of the MICROPHONE PRE-AMP.

& Connect the OUTPUT of the MIC. PRE-AMP. to the ANALOG INPUT
of the CODEC.

& Connect the ANALOG OUTPUT of the CODEC to the INPUT of the
AUDIO AMPLIFIER.

* 17. Set the DIP switch on the I/O INTERFACE of the circuit board to zero.

* 18. Using the C5x VDE, load the ex2_2.dsk program into the DSP. Press the
Run command.

* 19. Press the INT1# push button to begin recording your voice. When you are
done recording, play it back with the INT3# push button.

Adjust the potentiometers in the MICROPHONE PRE-AMPLIFIER and the
AUDIO AMPLIFIER circuit blocks until the audio level during playback
(INT3#) is sufficient.

* 20. Repeat step 19 with the other two modes of recording.

DIP SWITCH VALUE    AUDIO RECORDING MODE

0                   16-bit sampling
1                   8-bit µ-law compression
2                   16-bit samples truncated to 8 bits
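
The difference between modes 1 and 2 can be illustrated off the board with a
short Python sketch. It is not taken from ex2_2.asm; this section does not show
how the program or the CODEC implements µ-law, so the sketch assumes the
textbook continuous µ-law curve with µ = 255. All function names are
hypothetical.

```python
import math

MU = 255  # assumption: the commonly used mu = 255 companding constant


def mu_law_compress(sample16: int) -> int:
    """Map a signed 16-bit sample to a signed 8-bit code (logarithmic curve)."""
    x = max(-1.0, min(1.0, sample16 / 32768.0))  # normalise to [-1, 1]
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round(y * 127))


def mu_law_expand(code8: int) -> int:
    """Approximate inverse of mu_law_compress, back to a 16-bit sample."""
    y = code8 / 127.0
    x = math.copysign((math.pow(1 + MU, abs(y)) - 1) / MU, y)
    return int(round(x * 32767))


def truncate_to_8_bits(sample16: int) -> int:
    """Keep only the upper 8 bits of a signed 16-bit sample."""
    return sample16 >> 8


def restore_from_8_bits(code8: int) -> int:
    """Put a truncated sample back on the 16-bit scale."""
    return code8 << 8


# Compare how a quiet and a loud sample survive each 8-bit mode.
for sample in (100, 20000):
    via_mu_law = mu_law_expand(mu_law_compress(sample))
    via_truncation = restore_from_8_bits(truncate_to_8_bits(sample))
    print(sample, via_mu_law, via_truncation)
```

Both 8-bit modes store twice as many samples per 16-bit word as mode 0. The
logarithmic curve spends more of its 8-bit codes on quiet signals, which plain
truncation discards.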

* 21. Make the following connections on the DIGITAL SIGNAL PROCESSOR
circuit board:

& Disconnect the ANALOG INPUT of the CODEC circuit block from the
OUTPUT of the MICROPHONE PRE-AMPLIFIER.

& Connect the ANALOG INPUT to the OUTPUT of a function generator.

* 22. Adjust the function generator to output a ~300 Hz sinusoidal signal at
~1.00 Vpp.

* 23. Set the position of the DIP switch to zero and record the generated signal.

* 24. Using the C5x VDE, Halt the DSP program. Open a Graphic Display with
the settings shown in the figure.

To maximize recording time, the recorded signal samples are stored in two
parts. One part is stored in DARAM B1.

* 25. Using the C5x VDE, open a second Graphic Display with the settings
shown in the figure.

The second part of the recorded signal samples is stored in SARAM.

Recorded Signal Sample Ranges

DARAM B1 03DF h - 04FF h

SARAM 09DD h - 2BFF h

If the Playback/Recorder program stores one recorded signal sample per
data memory address, how many samples can be stored in the address
ranges shown above?

975 = __________ samples
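
If you want to check your count, the arithmetic can be sketched in Python. The
only assumption is that both the first and the last address of each range hold
a sample, so each range is counted inclusively. The helper name is
hypothetical.

```python
def words_in_range(first_address: int, last_address: int) -> int:
    """Number of addresses from first_address to last_address, both included."""
    return last_address - first_address + 1


# Address ranges from the "Recorded Signal Sample Ranges" table.
RANGES = {
    "DARAM B1": (0x03DF, 0x04FF),
    "SARAM": (0x09DD, 0x2BFF),
}

total_samples = sum(words_in_range(lo, hi) for lo, hi in RANGES.values())
print("samples that fit:", total_samples)
```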

* 26. Close the text editor displaying the ex2_2.asm source file. End the C5x
VDE session.

The Instruction Cache

In this procedure section, you will observe the difference in the execution time of
a DSP algorithm that uses the instruction cache and one that does not.

* 27. Open a new C5x VDE session.

* 28. Load the ex2_2b.dsk program into the DSP.

* 29. Open three Data Memory displays, at the following addresses:

& Address: TIMER1

Within this data memory window there are three constants.

& Address: SARAM2

The constants named SARAM1 and SARAM2 are located in different
SARAM memory blocks.

& Address: DARAM

* 30. Set a breakpoint at the program memory address labeled SLOW (by
double-clicking on the Dis-Assembly window instruction line).

* 31. Execute the code corresponding to the DSP initialization sequence,
located between the execution line in the Dis-Assembly window and the
breakpoint labeled SLOW. Do this by executing the RUN command within
the C5x VDE.

* 32. Within the Dis-Assembly window of the C5x VDE, set a breakpoint at the
program memory address labeled FAST.

* 33. Execute the SLOW algorithm by executing the C5x VDE RUN command
once.

When the RUN command was pressed the SLOW algorithm was executed.
The SLOW algorithm is all of the code located between the instruction lines
labeled SLOW and FAST.

The SLOW algorithm executed ten consecutive MAC instructions. The MAC
instructions are read consecutively from memory; they do not use the
instruction cache to repeat the instruction.

What is the value of the operation (TIMER1 - TIMER2), converted to
decimal?

1014 __________ clock cycles

Which is the best definition of the meaning of the value (TIMER1 - TIMER2)?

a. The value of the ACCumulator register after the execution of 10
multiply and accumulate (MAC) instructions.
b. The number of memory accesses made during the execution of the
SLOW algorithm.
c. The speed of the DSP during the calculation of the SLOW algorithm.
d. The relative number of instruction cycles taken to execute the code
located between the SMMR instructions.
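
The subtraction and the hexadecimal-to-decimal conversion can be checked with a
short Python sketch. It assumes that TIMER1 and TIMER2 are 16-bit values read
from the memory display in hexadecimal and that TIMER1, the earlier reading, is
normally the larger one. The readings used below are hypothetical, not values
you should expect to see.

```python
def elapsed_counts(timer1: int, timer2: int) -> int:
    """Difference TIMER1 - TIMER2, kept within 16 bits in case of wrap-around."""
    return (timer1 - timer2) & 0xFFFF


# Hypothetical readings copied from the data memory window (hexadecimal).
timer1 = int("FFF0", 16)
timer2 = int("FC00", 16)

print("TIMER1 - TIMER2 =", elapsed_counts(timer1, timer2), "(decimal)")
```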

* 34. Execute the FAST algorithm by pressing the RUN command again.

When the RUN command was pressed the FAST algorithm was executed.
The FAST algorithm is all of the code located between the instruction line
labeled FAST and the instruction:

B MAIN,*

When the FAST algorithm executed, ten consecutive MAC instructions were
executed within a repeat loop. The program bus was freed because each
MAC instruction (except the first) was fetched directly from the program
cache.

What relative number of instruction cycles was taken to execute the fast
algorithm (TIMER1-TIMER2)?

1027 __________ clock cycles

* 35. Compare the code of each algorithm and the amount of time it took to
execute each.

Which of the following statements relating to the instruction cache is true?

a. It is used to differentiate between memory that stores data words and
memory that stores program words.
b. It is used to free up the data bus of an access.
c. It is a small memory within the processor core that is used for storing
program instructions.
d. It is present to increase the calculation speed of the DSP.

Measuring the Relative Memory Access Rates Provided by SARAM and DARAM

In this procedure section, you will observe the difference between SARAM and
DARAM access rates.

* 36. Open the ex2_2b.asm file inside of an ASCII text editor (such as Notepad).

* 37. Familiarize yourself with the source code.

The SLOW and the FAST algorithms are clearly identified. At the top of the
source file are the data constants that are or that can be used by the
program. The source code belonging to the initialization sequence is
clearly identified.

* 38. Edit the FAST algorithm within the ex2_2b.asm source code file so that
the multiply and accumulate (MAC) instruction calls two constants located in
different SARAM blocks, as follows:

MAC SARAM2,SARAM1

* 39. Save the modified source file to your personal student folder as:

ex2_2bv2.asm

* 40. Assemble the file from within your student folder. Execute from within your
student folder the following command at a DOS prompt:

c:\lv91027\bin\dsk5a.exe ex2_2bv2.asm -l

* 41. Answer the following question. The answer can be found by loading the
ex2_2bv2.dsk file into the DSP and executing the necessary part of the code.
Block the code off with breakpoints and press the RUN command inside
of the C5x VDE.

What is the number of cycles (TIMER1 - TIMER2) taken to execute the
FAST algorithm when the operands called by the MAC instruction are not
located in the same SARAM memory block?

1055 __________ clock cycles

* 42. Close the text editor open with the ex2_2bv2.asm source file. End the C5x
VDE session.

CONCLUSION

& SARAM and DARAM are divided up into different sized memory blocks.

& An SARAM memory block can be written to or read from once within one
instruction cycle.

& A DARAM memory block can be read from and written to inside of the same
instruction cycle.

& A program cache, when used, frees the PB of one instruction read. The PB
can then be used for a data read from data/program memory.

& Memory configuration bits (such as the four found in the 'C50) are used to
control the configuration of the PB and DB memory map.

REVIEW QUESTIONS

1. Which of the following modifications to the basic Harvard architecture are used
in some DSPs to increase their number of available memory accesses?

a. An instruction cache and a data/program (dual) addressed memory.
b. An instruction cache and a parallel bus structure.
c. A parallel bus structure and a data/program (dual) addressed memory.
d. None of the above.

2. How many addresses can each of the parallel buses of a 16-bit Harvard
architecture DSP allocate to on-chip memory and to peripherals?

a. 2^15 addresses
b. 2^16 addresses
c. 32768 addresses
d. 32767 addresses

3. Instruction words cannot be read from which of the following types of memory?

a. data memory
b. data/program memory
c. program cache
d. program memory

4. Why can a dually addressed SARAM memory block be of importance to a
DSP?

a. Two values (a data operand and a microcode instruction) can be stored per
memory element.
b. The memory storage capacity of the SARAM memory block is doubled.
c. An additional SARAM memory access is gained if the program bus is not
required to fetch a microcode instruction.
d. None of the above.

5. What is the instruction cache used to store?

a. Program/data memory contents.
b. Instructions
c. Program addresses
d. Operands

Exercise 2-3

Addressing

EXERCISE OBJECTIVES

Upon completion of this exercise, you will understand the function of the address
generation unit within a DSP and the specialized addressing modes that it offers.

DISCUSSION

A processor uses an address to identify specific memory storage spaces (such as
a DARAM memory element or a peripheral register).

An address in fact becomes the name of a certain location. The address is used
any time that the processor is required to write an operand to or read an operand
from the location.

Addressing is the means by which operand locations are specified to the processor
when a read or write is executed.

Many addressing modes exist. Depending on the addressing mode used with an
instruction, an operand could be fetched directly from an internal memory address
or from a register.

The most common types of addressing found in DSPs are:

implied addressing
direct addressing
immediate (short and long) addressing
indirect addressing
circular addressing

Different types of DSPs offer a variety of different addressing modes. The type of
addressing used with an instruction influences program flexibility and performance.

Certain types of addressing were meant only to be used for specific situations and
others are restricted for use by a small processor instruction subset.

Implied addressing is when the operand addresses are implied by the instruction.

An example of a 'C50 instruction that uses implied addressing is:

ADDB

This instruction adds the ACC and ACCB registers together. ACC and ACCB are
thus the implied operands for the instruction.

Direct addressing encodes the operand address within the instruction word or
within a word following the instruction word.

An Example of Direct Addressing with One Program Word

Machine code:                         100111011 1101001
Instruction opcode:                   100111011 (upper 9 bits)
Partially encoded operand address:    1101001 (7 bits wide)

The direct addressing used within the TMS320C50 is known as paged memory-
direct addressing. When the instruction word is executed, the encoded 7-bit
address is concatenated with the upper 9 bits of the address held in a status and
control register.

Direct addressing encodes the operand address within the instruction word or
within a word following the instruction word.

An Example of Direct Addressing with Two Program Words

Machine code, first word (instruction opcode):         1110110010101001
Machine code, second word (encoded operand address):   0001110101101110

Memory-direct and register-direct addressing are other forms of direct addressing.
Both use a second word, following the instruction word, that encodes the
operand address.

Processors using paged memory-direct addressing have data memory divided
up into memory pages. Each memory page corresponds to a section of memory.

A special register (known in the 'C50 as the Data Page pointer, DP) stores the
number of the current memory page.

In the case of the 'C50, the DP points to one of 512 possible data pages, each
containing 128 words.

Direct addressing within the TMS320C50 requires that the 7 lower bits of the data
memory address be encoded within the instruction word.

When the word is executed, the 9 bits from the Data memory Page pointer (DP) are
concatenated with the 7 bits encoded within the instruction word. The operation
forms the full 16-bit data memory address of the operand.
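
The concatenation described above can be sketched in Python. The field widths
(9 page bits, 7 offset bits) come from the text. The function names and the
example values (page 13 h, offset 10 h) are illustrative assumptions, not
values taken from a program listing.

```python
PAGE_BITS = 9    # width of the Data Page pointer (DP)
OFFSET_BITS = 7  # address bits encoded in the instruction word


def paged_direct_address(dp: int, offset: int) -> int:
    """Concatenate the 9-bit DP with the 7-bit offset into a 16-bit address."""
    if not 0 <= dp < (1 << PAGE_BITS):
        raise ValueError("DP must fit in 9 bits")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset must fit in 7 bits")
    return (dp << OFFSET_BITS) | offset


def split_address(address: int) -> tuple:
    """Return the (page, offset) pair that forms a 16-bit data address."""
    return address >> OFFSET_BITS, address & ((1 << OFFSET_BITS) - 1)


# Hypothetical example: data page 13 h, offset 10 h.
address = paged_direct_address(0x13, 0x10)
print(f"{address:04X} h")
print(split_address(address))
```

Because the offset is only 7 bits wide, an instruction can reach the 128 words
of the current page and nothing else. Reaching another page means loading a
new value into DP first.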

Assume that the 9-bit Data memory Page pointer (DP) of a DSP holds the value 126
(7E h) and that the 7 lower bits of the data memory address (dma) encoded
within an instruction word are equal to 43 h. What is the data memory address
that the instruction word refers to?

a. 3F43 h
b. 00C1 h
c. 867E h
d. 7E43 h

Example of Short Immediate Addressing in the TMS320C50 DSP

ADD #05Ah

Machine code:   10111000 01011010
ADD opcode:     10111000
Operand:        01011010 (5A h)

Known as short immediate addressing, the type of addressing shown encodes the
operand into the instruction word.

Example of Long Immediate Addressing in the TMS320C50 DSP

ADD #0B948h

Machine code, first word (ADD opcode):   1011111110010000
Machine code, second word (operand):     1011100101001000 (B948 h)

Known as long immediate addressing, the type of addressing shown encodes the
operand into a second word that follows the instruction word.
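
As a check on the two bit patterns shown, the operands can be pulled out of the
machine code with a few lines of Python. The sketch relies only on the layouts
in the two examples (an 8-bit operand in the low byte of a single word, or a
full 16-bit operand in a second word). The function names are hypothetical.

```python
def decode_short_immediate(word: int) -> tuple:
    """Split one program word into (8-bit opcode, 8-bit operand)."""
    return word >> 8, word & 0xFF


def decode_long_immediate(first_word: int, second_word: int) -> tuple:
    """The first word is the opcode; the second word is the 16-bit operand."""
    return first_word, second_word


# Bit patterns copied from the two examples above.
opcode, operand = decode_short_immediate(0b1011100001011010)
print(f"short: opcode {opcode:08b}, operand {operand:02X} h, 1 program word")

opcode, operand = decode_long_immediate(0b1011111110010000, 0b1011100101001000)
print(f"long:  opcode {opcode:016b}, operand {operand:04X} h, 2 program words")
```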

DSPs use indirect and circular addressing to manage operand address sets.
These are required when performing repetitive calculations on data series. The
series are often stored sequentially in memory.

DSPs include an Address Generation Unit (AGU). The AGU is dedicated to the
calculation of addresses for the different types of addressing modes.

The AGU has its own separate arithmetic unit; AGU arithmetic is independent of the
CALU. All address calculations take place in parallel with instruction execution.

The incorporation of an Address Generation Unit (AGU) within a DSP allows
arithmetic processing to proceed at maximum speed while multiple instruction
operands are specified.

The figure shows the Auxiliary Register Arithmetic Unit (ARAU) of the
TMS320C50 ('C50).

The 'C50 AGU has eight memory-mapped auxiliary registers, identified AR0
through AR7.

The registers can be used for storing addresses or temporary data. Indirect and
circular addressing require the use of some of the Auxiliary Registers (AR0 to AR7).

Within the ST0 register of the TMS320C50 is a 3-bit field identified as ARP, the
Auxiliary Register Pointer.

ARP is a 3-bit wide field that holds a value between 0 and 7. The field specifies the
current auxiliary register (AR0 to AR7) being used for indirect-register addressing.

Once the appropriate registers have been configured, the AGU provides the
necessary operand address required by the processor for the execution of an
instruction.

As stated previously, the address generation unit operations are executed in
parallel with the CALU arithmetic instructions.

Any location in data memory can be read from or written to using an address
contained in an Auxiliary Register (AR0 to AR7).

To select the specific AR used to address data memory, the Auxiliary Register
Pointer (ARP) must be loaded with the value of the AR (0 to 7) to be used. The
ARP points to the current auxiliary register used to address memory.

When an assembler instruction supporting indirect addressing is executed, such
as the TMS320C50 DSP addition (ADD) instruction:

ADD *

the processor fetches the data memory address from the correct
auxiliary register (which is pointed to by ARP).

In this example, ARP is equal to 2 and so the current auxiliary register is AR2.
When an assembler instruction supporting and using indirect addressing is
executed, the address for the operand will be fetched from AR2.
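
The selection of the operand address by ARP can be modeled with a small Python
sketch. It is a simplified model for illustration only: the class and function
names are hypothetical, data memory is represented by a dictionary, and the
address and values used are made up.

```python
class AuxiliaryRegisters:
    """Eight auxiliary registers plus the 3-bit pointer that selects one."""

    def __init__(self):
        self.ar = [0] * 8  # AR0 to AR7
        self.arp = 0       # Auxiliary Register Pointer (0 to 7)

    def current_address(self) -> int:
        """Address held in the auxiliary register that ARP points to."""
        return self.ar[self.arp]


def add_indirect(acc: int, registers: AuxiliaryRegisters, data_memory: dict) -> int:
    """Model of ADD *: add the word addressed by the current AR to ACC."""
    return acc + data_memory[registers.current_address()]


registers = AuxiliaryRegisters()
registers.arp = 2            # as in the example, the current register is AR2
registers.ar[2] = 0x0984     # hypothetical operand address
data_memory = {0x0984: 25}   # hypothetical memory contents

print(add_indirect(10, registers, data_memory))
```

Note that the instruction itself carries no address. Changing AR2, or pointing
ARP at another register, changes the operand without changing the instruction.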

Many DSP applications manage data buffers.

Circular addressing, also known as modulo addressing, is used to manage circular
buffers.

In real-time applications such as the ones executed by DSPs, the programmer must
determine the size of the data buffer and then must set aside a portion of memory
for the buffer.

The data buffers implemented on DSPs generally use a first-in, first-out (FIFO)
protocol. This means that the first values that are written to the buffer will be the first
values read out of the buffer.

For the programmer to manage data into and out of the buffer, two address pointers
must be maintained (in the case of the 'C50, two auxiliary registers are used as the
pointers).

One of the pointers, the read pointer, indicates the current value to be read from the
buffer.

The second pointer, the write pointer, indicates the current location to write a new
value to in the buffer.

After each read or write operation in a FIFO buffer with linear addressing, the
corresponding pointer moves down (increments to the next location in the buffer).
Once the pointers have advanced to the end of the buffer they must be reset to
point back to the beginning of the buffer.

In a FIFO buffer with circular addressing, after the read or the write pointer reaches
the end of the buffer it automatically advances to the start of the buffer. The
automated end-of-buffer verification and advance-to-start-of-buffer operation
(usually automated by the AGU) make the buffer appear circular to the
programmer.
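
The behavior of the two pointers can be sketched in Python. This is a software
model of what the text describes, not 'C50 code: the class name, the buffer
addresses (0100 h to 0103 h), and the stored values are hypothetical, and the
model makes no attempt to detect a full or an empty buffer.

```python
class CircularFifo:
    """FIFO buffer over a block of memory with wrap-around pointers."""

    def __init__(self, memory: dict, start: int, end: int):
        self.memory = memory
        self.start = start       # first address of the buffer
        self.end = end           # last address of the buffer
        self.read_ptr = start
        self.write_ptr = start

    def _advance(self, pointer: int) -> int:
        """End-of-buffer check followed by advance-to-start-of-buffer."""
        return self.start if pointer == self.end else pointer + 1

    def write(self, value: int) -> None:
        self.memory[self.write_ptr] = value
        self.write_ptr = self._advance(self.write_ptr)

    def read(self) -> int:
        value = self.memory[self.read_ptr]
        self.read_ptr = self._advance(self.read_ptr)
        return value


fifo = CircularFifo({}, start=0x0100, end=0x0103)
for sample in (10, 20, 30, 40):
    fifo.write(sample)        # the fourth write wraps the write pointer
print(fifo.read(), fifo.read())
fifo.write(50)                # reuses the first location of the buffer
print(fifo.read(), fifo.read(), fifo.read())
```

With linear addressing, the test performed in _advance would have to be written
by the programmer after every access. With circular addressing, the AGU
performs it.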

Many DSP processors provide a form of circular addressing. However, the ease
of use and the mechanisms used to control it vary from DSP to DSP.

The approach taken to implement circular addressing in the TMS320C50 is to use
start and end address registers. The registers, respectively named CBSR1 and
CBER1, hold the start and end addresses for the circular buffer.

The AGU of the 'C50 takes charge of determining whether the write or read
pointer is at the end of the buffer or at the beginning.

The AGU thus automates the end-of-buffer verification and the
advance-to-start-of-buffer operation.

Circular addressing is indirect addressing with added circular buffer
management.

TMS320C50 Instructions

...

MAR *, AR3

LAR AR3, #0984h

...

ADD *+, 0, AR0

...

Data buffers are often used and always implemented with indirect addressing
(circular and linear). Processors require two additional elements to be specified
within the indirect address field of an instruction when indirect addressing is used:

& The content of the current AR can be incremented, decremented or it can stay
unchanged. The change to be implemented must be specified.

E.g., Increment by 1: AR3 = 0984h, AR3 + 1 = 0985h.

& Whether the Auxiliary Register Pointer ARP should be updated to another AR
or should stay the same must be specified in the indirect address field of the
instruction.

E.g., Update ARP from AR3 to AR0.
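
The two elements of the indirect address field can be modeled in Python as
follows. The sketch mirrors the ADD *+, 0, AR0 example (use the current AR,
increment it by 1, then make AR0 the current register). It is an illustration
under simplifying assumptions: the function name and its parameters are
hypothetical.

```python
def indirect_access(ar: list, arp: int, step: int = 0, next_arp=None) -> tuple:
    """Return (operand address, new ARP) and apply the AR modification.

    ar       -- list of the eight auxiliary register values (modified in place)
    arp      -- index of the current auxiliary register
    step     -- +1 to increment, -1 to decrement, 0 to leave the AR unchanged
    next_arp -- new value for ARP, or None to leave ARP unchanged
    """
    address = ar[arp]                    # the address is used before the update
    ar[arp] = (ar[arp] + step) & 0xFFFF  # 16-bit auxiliary register
    if next_arp is not None:
        arp = next_arp
    return address, arp


ar = [0] * 8
ar[3] = 0x0984   # as loaded by LAR AR3, #0984h
arp = 3          # as selected by MAR *, AR3

address, arp = indirect_access(ar, arp, step=+1, next_arp=0)
print(f"operand address {address:04X} h, AR3 = {ar[3]:04X} h, ARP = {arp}")
```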

If the start address for a circular buffer is 0980h, the end address is 09E4h and the
current address of the read pointer is 09E4h then what is the next address that the
read pointer will have?

a. 09E5 h
b. 0980 h
c. 09E3 h
d. 0981 h

Most addressing modes covered in this section involve attaching a second word to
the instruction word. These addressing modes thus require two program words to
be stored in memory, increasing program size and slowing execution time.

To remedy the effects of so-called long addressing modes (2 program words in
length), many processors offer short versions of some of their addressing modes,
or simply put, short addressing modes.

Short addressing modes use only one program word to specify both the instruction
and the address. However, by so doing, the range of addresses that can be
specified is shortened.

PROCEDURE

Addressing Mode Initialization

In this procedure section, you will initialize the direct and indirect addressing mode
registers of the 'C50 ARAU. As well, you will witness the effect of an uninitialized
circular buffer in a program that requires its use.

* 1. Open the assembler ex2_3.asm file inside of an ASCII text editor.

* 2. Read the description of the program and familiarize yourself with the
Initialization Sequence and Main code.

Points of interest that should be noted are:

& The program stores in DSP memory the last 16 samples received from
the CODEC. It calculates the average value of these samples. The
average is sent to the ANALOG OUTPUT.

& There are three types of addressing used within the program: direct,
indirect, and circular addressing.

& The ARAU register initializations, required to correctly address
instruction operands, have been left out of the code.

E.g., the circular buffer registers (CBSR1, CBER1, CBCR) have not been
initialized.

& Labels (Circular, Indirect, and Direct) given in the source code indicate
where the ARAU register initialization instructions should be located.

Note: Before using the C5x VDE please make certain the circuit
board power source is turned ON, and that the serial connection
is present between the host computer and the DIGITAL SIGNAL
PROCESSOR circuit block labeled SERIAL PORT.

* 3. Open the C5x VDE.

* 4. Using the C5x VDE, load the ex2_3.dsk program into the DSP.

* 5. Open a memory display window with the following display options:

Address: XN0
Type: Data Memory
Display Format: Signed Integer

The ex. 2-3 program dedicates the data memory addresses (dma) labeled XN0 to
XN15 to the storage of the 16 most recent samples received by the DSP. The
samples are transmitted by the CODEC, which converts the signal received at its
ANALOG INPUT.

* 6. Using the C5x VDE, set a breakpoint at the program memory address
labeled INDIRECT. Set another breakpoint at the program memory
address labeled DIRECT.

* 7. Execute the C5x VDE RUN command.

All code located before the labeled instruction line is executed. This
includes the DSP initializations (apart from those of the ARAU). The
instruction execution line (yellow) has stopped at the INDIRECT breakpoint.

See HELP Unit 02 shelp15

The following instructions, located in the main program code, use indirect
addressing:

SACL *+, 0, AR0
ADD *+, 0, AR0

For these instructions to execute properly, certain indirect addressing
registers (AR0 and ARP) must be initialized.

* 8. Using the C5x VDE, edit AR0 register to make it point towards the dma
labeled XN0. AR0 is used for indirect addressing.

* 9. Using the C5x VDE, edit the content of the Auxiliary Register Pointer
(ARP). Make ARP point towards auxiliary register 0 (AR0).

You have now initialized the ARAU for indirect addressing as used by this
program.

* 10. Execute the C5x VDE RUN command. The instruction execution line
(yellow) will stop at the breakpoint labeled DIRECT.

See HELP Unit 02 shelp16

The following instruction, located in the main program code, uses direct
addressing:

SACL 10 h

The Data Page pointer (DP), a direct addressing register used by the above
instruction, must be initialized. In fact, this instruction uses paged
memory-direct addressing.

* 11. Using the C5x VDE, edit the contents of the DP bits. Set the Data Page
pointer (DP) to the data page starting on 0x0980h (change the content of
DP to 0x0980). The data memory address labeled OUTPUT, among others, is
found on this data page.

Note that after making the change to DP, register ST0 is shown as
modified (highlighted in red). ST0 is modified because the Data Page
pointer bits (DP) are held in the ST0 status and control register.

You have now initialized the ARAU for paged memory-direct addressing
with the averaging program.

* 12. Using the STEP OVER command found on the C5x VDE Toolbar, execute
the following instructions:

NOP
CLRC INTM
ZAP

The execution of these last instructions completes the program initialization
section.

Note that we did not make any ARAU circular addressing initializations.

* 13. Using the C5x VDE, set a breakpoint at the AND instruction found within
the RECEIVE subroutine. This is one of the instructions which precedes
the TRANSMIT subroutine label.

ACC = [ sum of XN(i) for i = 0 to 15 ] / (15 + 1)
    = (XN0 + XN1 + ... + XN15) / 16

* 14. Execute the RUN command found on the C5x VDE Toolbar. The RECEIVE
subroutine will be entered and executed once a sample sent by the
CODEC is received by the DSP.

The RECEIVE subroutine, when properly initialized for indirect, direct, and
circular addressing, computes the average of the 16 samples stored in
data memory addresses XN0 to XN15.

Recall that the circular addressing initialization has not yet been made.

* 15. Using the Windows™ calculator or a hand-held calculator, add the
contents of the dma labeled XN0 to XN15 and divide the sum by 16
(averaging the samples).

Is your calculated average value equal to the contents of the accumulator
register?

* Yes * No

The accumulator contents do not equal the average of the contents of the
dma labeled XN0 to XN15. The RECEIVE subroutine begins calculating
the average at XN1. This is because AR0 is used (and auto-incremented
by the SACL instruction) to load XN0 with the most recent value received
from the CODEC.

However, as stated previously, the circular buffer was not initialized. When
the indirect read pointer AR0 was auto-incremented from XN15, it went to
OUTPUT.

In a properly initialized circular buffer, AR0 would have pointed to XN0
after the increment.

Therefore, the averaging process added XN1 to XN15 and the dma labeled
OUTPUT. In data memory, OUTPUT is found immediately after the dma
labeled XN15.

If the circular buffer had been initialized, when the averaging process had
finished adding XN15 to the accumulator the next value to be added would
have been XN0.
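
The effect described above can be reproduced off the board with a short Python
model. The addresses (XN0 at 0980 h, OUTPUT at 0990 h, immediately after XN15)
and the sample values are assumptions chosen for illustration. Only the pointer
behavior follows the text.

```python
XN0 = 0x0980            # assumed address of XN0
XN15 = XN0 + 15         # last element of the 16-sample buffer
OUTPUT = XN15 + 1       # OUTPUT is located immediately after XN15


def sum_sixteen(memory: dict, first: int, circular: bool) -> int:
    """Accumulate 16 consecutive reads starting at address `first`."""
    pointer = first
    acc = 0
    for _ in range(16):
        acc += memory[pointer]
        if circular and pointer == XN15:
            pointer = XN0          # circular buffer: wrap back to the start
        else:
            pointer += 1           # linear increment: runs past XN15
    return acc


# Hypothetical contents: XN0..XN15 hold 1..16, OUTPUT holds 1000.
memory = {XN0 + i: i + 1 for i in range(16)}
memory[OUTPUT] = 1000

# The averaging starts at XN1 because SACL *+ has already incremented AR0.
print("without circular buffer:", sum_sixteen(memory, XN0 + 1, False) / 16)
print("with circular buffer:   ", sum_sixteen(memory, XN0 + 1, True) / 16)
print("expected average:       ", sum(range(1, 17)) / 16)
```

Only the run with the wrap-around includes XN0. The other run replaces it with
whatever value OUTPUT currently holds, which is why the accumulator does not
match the hand calculation.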

Circular Addressing Initialization

In this procedure section, you will initialize the circular buffer so that the averaging
program may be used properly.

* 16. Edit the Program Counter (PC) register to the program memory address
0x0A80 h.

Using the C5x VDE, STEP OVER the Dis-Assembly window instructions
until the instruction execution line (yellow) is over the instruction labeled
CIRCULAR.

SACL *+, 0, AR0
ADD *+, 0, AR0

The above instructions indirectly address data (XN0 to XN15) that must be
held in a circular buffer.

The circular addressing registers (CBCR, CBSR1, CBER1) must be
initialized.

* 17. Edit Circular Buffer 1 Start Register (CBSR1) to the first data memory
address belonging to the circular buffer (XN0).

* 18. Edit Circular Buffer 1 End Register (CBER1) to the last data memory
address belonging to the circular buffer (XN15).

* 19. Edit the Circular buffer 1 Auxiliary Register bits (CAR1) to AR0. This
makes circular buffer 1 use auxiliary register 0 (AR0) as the pointer for the
buffer elements.

Note that after making the change to the CAR1 bits the CBCR register is
also shown as modified (written in red within the C5x VDE). The Circular
buffer 1 Auxiliary Register bits (CAR1) are located within the CBCR
register.

* 20. Enable circular buffer 1 by setting the circular buffer 1 enable bit (CENB1).

Note that after making the change to the CENB1 bit the CBCR register is
also shown as modified (written in red within the C5x VDE). The circular
buffer 1 enable bit (CENB1) is located within the CBCR register.

* 21. Execute the C5x VDE RUN command. The instruction execution line
(yellow) found within the Dis-Assembly window will stop at the breakpoint
labeled INDIRECT.

* 22. Using the C5x VDE, edit AR0 to make it point towards the dma labeled
XN0.

* 23. Using the C5x VDE, edit the content of the Auxiliary Register Pointer
(ARP). Make ARP point towards auxiliary register 0 (AR0).

You have now re-initialized the ARAU for indirect addressing with the
averaging program. The ARAU has already been initialized for direct
addressing.

* 24. Using the C5x VDE STEP OVER command, execute the following
instruction lines:

NOP
CLRC INTM
ZAP

* 25. Execute the RUN command found on the C5x VDE Toolbar. The RECEIVE
subroutine is entered and executed every time a sample sent by the
CODEC is received by the DSP.

ACC = [ sum of XN(i) for i = 0 to 15 ] / (15 + 1)
    = (XN0 + XN1 + ... + XN15) / 16

The RECEIVE subroutine computes the average of the 16 samples stored
in data memory addresses XN0 to XN15.

* 26. Using the Windows™ calculator or a hand-held calculator, add the
contents, once again, of the dma labeled XN0 to XN15 and divide the sum
by 16 (this averages the samples).

Is your calculated average value equal to the contents of the accumulator
register?

* Yes * No

Now that the circular buffer has been initialized, the averaging operation
executes properly.

* 27. End the C5x VDE session.

Running the Averaging Program

In this procedure section, you will make the necessary initialization corrections to
the averaging program source code. Using a function generator and an
oscilloscope, you will verify whether the ex2_3.asm program correctly computes
a 16-point average.

* 28. Locate within the ex2_3 source file, opened inside of an ASCII text editor,
the source statement labeled Circular:.

* 29. Make the NOP instruction a comment (by placing a semi-colon in front of
it) and remove the semi-colons from in front of the three commented SPLK
instructions.

These three added lines of source code initialize the CBSR1, CBER1 and
CBCR registers. The source statements initialize the circular buffer.

* 30. Locate within the ex2_3 source file the source statement labeled Indirect:.

* 31. Make the NOP instruction a comment (by placing a semi-colon in front of
it) and remove the semi-colon from in front of the two commented
instructions (LAR and MAR).

The added lines of source code initialize the AR0 and ARP registers.
These are the source statements that initialize indirect addressing.

* 32. Locate within the ex2_3 source file the source statement labeled Direct:

* 33. Make the NOP instruction a comment (by placing a semi-colon in front of
it) and remove the semi-colon from in front of the LDP instruction.

The added source code initializes the DP bits located within status and
control register ST0. This source statement initializes direct addressing.

* 34. Save the modified source file to your personal student folder as:

ex2_3v2.asm

* 35. Assemble the file, from within your student folder. Execute within the
student folder the following command at a DOS prompt:

c:\lv91027\bin\dsk5a.exe ex2_3v2.asm -l

You have now assembled the corrected averaging DSP file.

* 36. Open the C5x VDE.

* 37. Using the C5x VDE, load the ex2_3v2.dsk file into the DSP.

* 38. Execute the RUN command found on the C5x VDE Toolbar.

* 39. Make the following connections:

Connect the OUTPUT of a function generator to a BNC tee adapter. The
adapter divides the signal in two.

Connect the function generator signal to CHANNEL 1 of an oscilloscope
and to the ANALOG INPUT of the CODEC circuit block located on the
DIGITAL SIGNAL PROCESSOR.

Connect the ANALOG OUTPUT of the CODEC circuit block to the
CHANNEL 2 input of the oscilloscope.

* 40. Make the following settings on the function generator:

Function generated . . . . . . . . . . . . . . . . sinusoidal
Frequency . . . . . . . . . . . . . . . . . . . . . . . . . 500 Hz
Amplitude . . . . . . . . . . . . . . . . . . . . . . . . . 6.2 VPP
Offset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0 VDC

* 41. Adjust the oscilloscope as follows:

Time Base . . . . . . . . . . . . . . . . . . . . . . 0.5 ms/DIV
Channel 1 . . . . . . . . . . . . . . 2 V/DIV (DC coupled)
Channel 2 . . . . . . . . . . . . . . 2 V/DIV (DC coupled)

You should observe the superposition of two sinusoidal waveforms on the
oscilloscope display. One of the waveforms will have a smaller amplitude
than the other.

The smaller-amplitude waveform corresponds to the averaged signal
received from the ANALOG OUTPUT of the CODEC circuit block located
on the DIGITAL SIGNAL PROCESSOR.
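
Why the averaged waveform is smaller can be estimated with the standard
magnitude response of an N-point moving average. The Python sketch below is an
illustration only. This section does not state the CODEC sampling rate, so the
16 kHz used here is a placeholder assumption, and the function name is
hypothetical. The sketch also ignores any gain in the analog circuitry.

```python
import math


def moving_average_gain(freq_hz: float, sample_rate_hz: float, n_points: int = 16) -> float:
    """Gain of an n-point moving average for a sinusoid of the given frequency."""
    w = math.pi * freq_hz / sample_rate_hz
    if math.isclose(math.sin(w), 0.0, abs_tol=1e-12):
        return 1.0  # DC (or an exact multiple of the sampling rate)
    return abs(math.sin(n_points * w) / (n_points * math.sin(w)))


SAMPLE_RATE_HZ = 16000.0  # placeholder assumption, not a value from the manual

for freq in (100.0, 500.0, 900.0):
    gain = moving_average_gain(freq, SAMPLE_RATE_HZ)
    print(f"{freq:6.0f} Hz -> gain {gain:.3f}")
```

The gain is 1 at DC and falls as the frequency rises, which is consistent with
the behavior you can observe in the next step when the generator frequency is
varied.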

* 42. Vary the frequency and the type of function generated by the function
generator. Observe the results.

* 43. End the C5x VDE session.

CONCLUSION

& An address is used any time that the processor is required to write an operand
to or read an operand from a location.

& Addressing is the means by which operand locations are specified to the
processor when a read or write instruction is executed.

& Certain types of addressing are intended for specific situations, and others
are restricted to a small subset of the processor's instructions.

& The type of addressing used with an instruction influences program flexibility
and performance.

& DSPs include an Address Generation Unit (AGU) dedicated to calculating
addresses. All address calculations take place in parallel with instruction
execution.
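As an illustration of how an addressing mode determines which locations an instruction can reach, direct paged addressing can be modeled as follows. This is a minimal sketch that assumes a data memory organized as 512 pages of 128 words; the function names are illustrative only.

```python
PAGE_BITS = 7                      # 128 words per page -> 7-bit offset
WORDS_PER_PAGE = 1 << PAGE_BITS

def direct_address(dp, offset):
    """Combine a data page number and a 7-bit offset into a data memory address."""
    if not 0 <= dp < 512:
        raise ValueError("data page must be 0..511")
    if not 0 <= offset < WORDS_PER_PAGE:
        raise ValueError("offset must be 0..127")
    return (dp << PAGE_BITS) | offset

def page_range(dp):
    """Return the first and last address reachable while this page is current."""
    return direct_address(dp, 0), direct_address(dp, WORDS_PER_PAGE - 1)

first, last = page_range(6)
print(f"DP = 6 reaches {first:04X}h to {last:04X}h")   # DP = 6 reaches 0300h to 037Fh
```

An instruction word only carries the 7-bit offset, so any address outside the printed range requires the data page pointer to be changed first.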

REVIEW QUESTIONS

1. Which of the following choices is not a location that can be addressed using an
addressing mode?

a. external program memory address 2C05h
b. internal data memory address 0983h
c. data bus
d. accumulator register

2. Assume that a certain DSP using direct paged-memory addressing has 512
data pages each containing 128 words. If the current data page is 42d (DP =
42d) then which of the following addresses can be directly addressed by an
instruction word?

a. 0047h
b. 1530h
c. 0095h
d. 0986h


3. Which of the following is true of a short addressing mode?

a. A second word is attached to the instruction word.
b. The range of addresses that can be specified is reduced.
c. When short addressing is used program size is increased.
d. When short addressing is used program execution time is slowed.

4. Where is the data memory address of an indirectly addressed operand stored?

a. It is held in the accumulator.
b. It is stored in data memory.
c. It is stored in a memory-mapped “Auxiliary Register” of the AGU.
d. It is encoded in the instruction word.

5. Which of the following choices does not describe an operation performed by an
AGU managing a circular buffer?

a. Read and write pointer update.
b. Send the current circularly addressed dma to the accumulator.
c. Verify if the read or write pointers are at the end of the buffer.
d. None of the above.

Unit Test

1. Why is it important to incorporate an Address Generation Unit within a DSP?

a. It allows AGU arithmetic to be independent of the CALU.
b. It allows the DSP to process digital signals.
c. It allows all address calculations to proceed in series with instruction
execution.
d. Data buffers can be managed.

2. What is the basic processor memory architecture most often implemented in
digital signal processors?

a. The Von Neumann architecture.
b. The Harvard architecture.
c. The modified Von Neumann architecture.
d. The DSP architecture.

3. Which of the following elements is not part of a typical programmable DSP?

a. CODEC
b. Central Processing Unit (CPU)
c. Bus structure
d. Peripherals

4. In which of the following situations does an instruction require the use of an
addressing mode?

a. When the instruction needs to read an operand from memory.
b. When the instruction needs to write an operand to memory.
c. When the instruction needs to fetch the value held within the accumulator.
d. All of the above.

5. Which among the following choices does not differentiate a DSP from a
general-purpose processor?

a. addressing modes
b. memory architecture (bus structure)
c. execution time of the CALU
d. processor native word width

6. Which of the following statements is true?

a. Most fixed-point DSP processor multipliers produce a result that is the
width of the input operands.
b. CALU refers to the entire arithmetic processing path excluding the
multiplier.
c. The CALU uses post-scalers for address calculations.
d. ALU refers to the combination adder/subtractor/logical function unit.


7. Which of the following DSP characteristics permits the Multiplier and Arithmetic
Logic Unit to keep a constant arithmetic precision during computation?

a. Presence of the accumulator Guard Bits.
b. Option of enabling sign-extension mode.
c. Option of enabling overflow saturation mode.
d. The Multiplier and ALU operate with twice the internal bus bit-width.

8. Why do most DSPs use modified Harvard memory architectures (as opposed
to Harvard memory architectures)?

a. To differentiate operands from instructions.
b. To increase their memory bandwidth.
c. To increase their calculation rate.
d. To increase the number of specialized addressing modes.

9. Which of the following statements is true? The instruction cache, when used:

a. frees a data memory access.
b. enables/disables on-chip program/data memory.
c. frees a program memory access.
d. None of the above.

10. Which of the following choices is true? A dually addressed program/data
memory block can:

a. hold operands only and is accessed by the PB.
b. hold instructions and operands accessed by the PB and DB.
c. hold instructions only and is accessed by the DB.
d. hold operands only and is addressed by the instruction cache alone.

Unit 3

Program Execution

UNIT OBJECTIVES

Upon completion of this unit, you will be familiar with the fundamentals of DSP
program execution.

UNIT FUNDAMENTALS

Program control refers to the rules (mechanisms) used in a processor for
determining the next instruction to be executed.

The instruction set available to a DSP is executed by the Program Controller unit.
The instruction set controls such things as how data is sequenced through the
CALU and how values are read from and written to memory.


An instruction issued by the Program Controller passes through a series of
hardware stages. The execution stages of one instruction proceed in parallel
with different execution stages of other instructions.

Executing instructions in such a fashion is known as pipelining. A pipeline is a
method of executing instructions in an assembly line fashion.

A must for developing efficient application code is detailed knowledge of the DSP
architecture being programmed.

This is often true when it comes to knowing which “dirty tricks” can be used to take
advantage of an architecture's strengths.

A DSP that has an orthogonal instruction set is simpler to program and optimize
than a DSP whose commands work only on specific ALU registers.


DSP architecture constrains the type of operations and instructions that may be
performed by the processor.

A DSP's instruction set has a profound influence on the processor's suitability for
different tasks.

Not all instructions found in one DSP have analogs in other DSPs using different
architectures. The instructions used with a given architecture must be natural and
efficient for that architecture.

Processor architectures also have a profound influence on the suitability of a DSP
for certain tasks. As a result, different types of DSP architectures are used for
different types of tasks.

There are many basic similarities between the DSP architectures available. These
similarities are those that have been covered or that will be covered in this course:

& CALU configured for digital signal processing
& specialized instruction set
& multiple memory banks and buses
& specialized addressing modes
& specialized execution control
& peripherals specialized for digital signal processing

These similarities are present because of the effort to provide high throughput for
key signal processing applications.

Traditionally, high throughput was provided by specialized hardware (designed to
accelerate multiplications) and a dual-bus (Harvard) memory architecture.

Improvements since then have been made through incremental enhancements.

Examples of incremental enhancements that have been made to the basic Harvard
architecture are the addition of an instruction cache and a memory block accessible
by both the program and data buses (program/data memory). Both of these
improvements were made to the TMS320C50 DSP.

The need, in recent years, to make faster and better DSPs has led to the appearance
of new architectures.

Processor performance can be improved by using faster clock speeds; however,
this approach has limits.

To increase performance beyond these limits, newer types of DSP architectures
are being made. These designs focus primarily on increasing the amount of useful
work that gets done every clock cycle.

Improving the instruction execution rate is done by making modifications to the way
the Program Controller unit and the pipeline operate.

VLIW and Superscalar architectures are some of the non-traditional
architectures that have begun to be designed into programmable DSPs. They
provide an 80% to 100% average increase in performance over traditional DSP
architectures.

DSP processors have traditionally used complex, compound instructions that allow a
programmer to encode multiple operations in a single instruction.

A MAC instruction (Multiply and ACcumulate) is an example of a complex
compound instruction.
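The arithmetic performed by a multiply-and-accumulate step can be sketched in a few lines. This is a plain Python model of the arithmetic only; it ignores fixed-point word widths and the operand fetches and pointer updates that a real MAC instruction may also perform in the same cycle. The function names are illustrative.

```python
def mac(acc, coefficient, sample):
    """One multiply-accumulate step: multiply two operands, add to the accumulator."""
    return acc + coefficient * sample

def sum_of_products(coefficients, samples):
    """Sum of products computed as a chain of MAC steps."""
    acc = 0
    for h, x in zip(coefficients, samples):
        acc = mac(acc, h, x)
    return acc

print(sum_of_products([1, 2, 3], [4, 5, 6]))   # 1*4 + 2*5 + 3*6 = 32
```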

Processors using complex compound instructions are limited to issuing and
executing one instruction per instruction cycle.

By issuing a single instruction and using a complex-instruction approach, DSP
processors have achieved very strong signal processing performance without
requiring a large amount of program memory.


Newer designs such as the VLIW and Superscalar architectures use parallelism
to increase the number of instructions executed per instruction cycle.

These types of architectures when implemented within DSPs are made to execute
multiple RISC-like instructions during each clock cycle.

As opposed to a complex instruction, a RISC instruction performs a basic operation
such as a data move.

A RISC instruction set is small: instructions that provide similar operations are
eliminated. The process of elimination is based upon careful quantitative
analyses, leading in the end to higher performance.

VLIW and Superscalar architectures are similar. VLIW instruction parallelism is
established and conducted by the compiler. Instruction scheduling is done at
compile time.


Superscalar architectures schedule operation parallelism at run time, making it
invisible to users. By so doing, the Superscalar design adds an amount of hardware
overhead to the CPU. This is not true of VLIW, where the burden is placed on the
compiler.
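The kind of decision a VLIW compiler makes ahead of time can be sketched as a simple packing problem. The Python model below is a deliberately simplified, in-order greedy packer and not the algorithm of any particular compiler: it only checks that an instruction does not read or overwrite a register written earlier in the same bundle. The instruction names, register names, and issue width are all hypothetical.

```python
def schedule_bundles(instructions, width):
    """Greedy compile-time packing of instructions into bundles.

    instructions: list of (name, destination, sources) in program order.
    width: maximum number of instructions issued per cycle.
    """
    bundles = []
    current, written = [], set()
    for name, dest, sources in instructions:
        conflict = dest in written or any(s in written for s in sources)
        if conflict or len(current) == width:
            bundles.append(current)          # close the bundle, start a new one
            current, written = [], set()
        current.append(name)
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles

program = [
    ("load_a", "r1", []),
    ("load_b", "r2", []),
    ("mul",    "r3", ["r1", "r2"]),
    ("load_c", "r4", []),
    ("add",    "r5", ["r3", "r4"]),
]
for cycle, bundle in enumerate(schedule_bundles(program, width=2)):
    print(cycle, bundle)
```

In this example five instructions are issued in three cycles. A Superscalar processor would reach a similar grouping, but its hardware would discover it while the program runs.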

EQUIPMENT REQUIRED

In order to complete the following exercises, you will need:

& F.A.C.E.T. base unit
& DIGITAL SIGNAL PROCESSOR circuit board
& C5x VDE program
& ex3_1 and ex3_2 assembler and program files
& Oscilloscope
& Function generator

Exercise 3-1

The Program Controller

EXERCISE OBJECTIVES

Upon completion of this exercise, you will be familiar with the function of the
hardware and software features that digital signal processors have evolved to
handle program control.

DISCUSSION

All digital signal processors, and for that matter general-purpose processors, have
a specialized unit dedicated to executing the current instruction and determining the
next instruction to execute.

Within the TMS320C50 DSP ('C50), the unit is known as the Program Controller.

DSP Program Controllers have evolved efficient hardware features to rapidly
execute instructions. Program Controller hardware is said to be low-overhead
(or zero-overhead).

The Program Controller decodes instructions, manages the pipeline, stores the
central processing unit (CPU) status, and decodes conditional operations.


The following software mechanisms are managed by a DSP Program Controller.

branch
subroutine
reset
interrupt
repeat
conditional processing

The software mechanisms listed above, though not unique to digital signal
processors, are used for program control. By using these software mechanisms,
a DSP programmer is in fact using specialized hardware features that belong to
the Program Controller.

The specialized hardware in question can be categorized as follows:

Program Counter register
stack support
repeat counters
program counter-related hardware
status registers

The Program Counter register (often abbreviated PC register) holds the program
memory address of the next instruction to be fetched and executed by the program
controller.

The content of the program counter is updated every instruction cycle. Depending
on the previous instruction executed, the surrounding hardware (the program
counter-related hardware) usually increments the program counter by one.

In certain cases, the Program Counter is loaded with an entirely different program
memory address.

These cases occur when the previous instruction executed was a program control
instruction, such as a call or a return from subroutine, or when an interrupt
sends execution to an interrupt service routine (ISR).

Digital signal processors use stacks to save and restore address and status
information during subroutines and interrupt service routines (ISR).

A stack can consist of any memory device. Most often DSPs provide at least one
of three kinds of stack support:

shadow registers
hardware stack
software stack

Every time that an interrupt is taken, an interrupt context save is initiated. The
contents of key DSP registers are saved to their respective backup registers
(shadow registers).

The values in the copied registers are still available to the interrupt service routine
(ISR) code, but after the context save they are protected while held in the shadow
registers. The shadow registers are copied back to the CPU registers when the
return-from-interrupt instruction is executed.

Context save and restore is automatic and reduces DSP ISR overhead: the
programmer does not need to include the save and restore operations as
instructions in the ISR code.
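The effect of an automatic context save can be modeled in a few lines. The sketch below is a toy Python model, not a description of the 'C50 hardware: the set of registers it shadows (ACC and PREG) is an assumption made for illustration, and the method names are hypothetical.

```python
class Cpu:
    """Toy CPU with one shadow copy of its key registers."""

    def __init__(self):
        self.regs = {"ACC": 0, "PREG": 0}
        self.shadow = {}

    def enter_interrupt(self):
        # Context save: copy every key register to its shadow register.
        self.shadow = dict(self.regs)

    def return_from_interrupt(self):
        # Context restore: copy the shadow registers back.
        self.regs = dict(self.shadow)

cpu = Cpu()
cpu.regs["ACC"] = 0xAAAAAAAA
cpu.enter_interrupt()
cpu.regs["ACC"] = 0xBBBBBBBB      # the ISR is free to change ACC
cpu.return_from_interrupt()
print(f"{cpu.regs['ACC']:08X}")   # AAAAAAAA
```

Because the model holds a single shadow copy, a second interrupt taken inside the ISR would overwrite the saved values. That limitation is a property of this sketch and should not be read as a statement about any particular DSP.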

The hardware stack is used during interrupts and subroutines to save and restore
the content of the Program Counter register. The programmer usually does not
have control over the hardware stack.

The hardware stack is used only implicitly, during subroutine calls, interrupt
service routines, and repeat instructions.

When a subroutine is called, an interrupt occurs, or a repeat is executed, the current
contents of the Program Counter register (the return address) are automatically
saved to the stack (pushed onto the stack).

When a return operation occurs, the return address is retrieved from the stack
(popped from the stack) and loaded into the Program Counter.

The key advantage of a software stack over a hardware stack is that its depth can
be configured by the programmer. This can be done by simply reserving an
appropriately sized section of memory.

Hardware stacks, in contrast, are usually fairly shallow and the programmer must
carefully guard against stack overflow (by avoiding nesting of too many interrupts
or subroutines).
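The push and pop of return addresses, and the risk of overflowing a shallow stack, can be sketched as follows. This is a toy Python model under stated assumptions: every instruction is treated as one word long, the stack depth is passed in as a parameter, and an overflow raises an error so that it is visible. Real hardware usually gives no such warning.

```python
class StackOverflow(Exception):
    pass

class ProgramController:
    """Toy model: a Program Counter plus a fixed-depth hardware stack."""

    def __init__(self, stack_depth):
        self.pc = 0
        self.stack = []
        self.stack_depth = stack_depth

    def step(self):
        # Ordinary instruction: the PC simply advances by one.
        self.pc += 1

    def call(self, target):
        # Push the return address (the instruction after the call), then jump.
        if len(self.stack) == self.stack_depth:
            raise StackOverflow("too many nested calls or interrupts")
        self.stack.append(self.pc + 1)
        self.pc = target

    def ret(self):
        # Pop the return address back into the PC.
        self.pc = self.stack.pop()

ctl = ProgramController(stack_depth=2)
ctl.pc = 0x0100
ctl.call(0x0200)      # return address 0101h is pushed
ctl.step()            # one instruction inside the subroutine
ctl.ret()             # 0101h is popped into the PC
print(f"{ctl.pc:04X}h")
```

With a software stack the same push and pop operations are performed on a reserved block of data memory, so the depth limit becomes whatever the programmer reserves.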

Which of the following types of stacks is used by DSPs during an interrupt context
save?

a. software stack
b. hardware stack
c. interrupt service routine
d. shadow register

DSP algorithms frequently involve the repetitive execution of a small number of
instructions. Such operations are required in FIR and IIR filters, FFT and matrix
multiplication algorithms (these are different types of signal processing operations).

To eliminate looping overhead in DSPs, Program Controllers have been designed
with circuits capable of repetitively executing a small number of instructions. The
operation they perform is often called hardware looping.

The following registers in the 'C50 are used by hardware loops.

The Program Counter acts as the instruction cache that stores the instruction to be
repeated during single-instruction hardware looping.

The RePeaT Counter register holds the number of times the instruction held in the
instruction cache must be repeated during single-instruction hardware looping.

The multiple-instruction hardware looping registers used in the 'C50 (BRCR, PASR,
and PAER) are used for control and status of the hardware loop.

Hardware loops, as opposed to software loops, lose no time incrementing or
decrementing an index.

Example: Difference in the overhead required for a software and a hardware loop.

Software loop                 Hardware loop

      B = 16                        RPT #16
LOOP: MAC H0, X0                    MAC H0, X0
      B = B - 1
      Branch to LOOP

The above examples implement an FIR filter. The one on the left uses a software
loop and the one on the right uses a hardware loop (done using the RPT
instruction).

The hardware loop executes the RPT instruction only once and then automatically
repeats the multiply and accumulate instruction 16 times. Hardware looping
overhead is reduced compared with software looping overhead.
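The difference in overhead can be made concrete by counting how many instructions each version of the example issues. The Python sketch below counts instructions only; it makes no claim about cycle counts, and since a branch typically costs more than one cycle, the real gap is larger than the instruction count suggests.

```python
def software_loop_issues(count):
    """Instructions issued by: B = count; LOOP: MAC; B = B - 1; branch to LOOP."""
    issued = 1                      # B = count
    b = count
    while True:
        issued += 1                 # MAC
        b -= 1
        issued += 1                 # B = B - 1
        issued += 1                 # branch back to LOOP (tested every pass)
        if b == 0:
            break
    return issued

def hardware_loop_issues(count):
    """Instructions issued by: RPT; MAC, with the MAC performed count times."""
    return 1 + count                # one RPT, then the repeated MAC

print(software_loop_issues(16), hardware_loop_issues(16))   # 49 17
```

Of the 49 instructions issued by the software loop, only 16 do useful arithmetic; the other 33 are loop bookkeeping.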


The hardware loop, in this example, executes a single instruction several times. It
is a single-instruction hardware loop.

During a single-instruction hardware loop, after the repeat (RPT) instruction is
executed, the microcode instruction to be repeated is loaded into the instruction
cache (PC register), and a counter (RPTC) is loaded with the number of times the
instruction is to be repeated.

During the loop, the Program Counter, acting as the instruction cache, supplies
the instruction to be executed to the Program Controller.

Many instructions that take two or more cycles to execute will only take one when
executed from within a hardware loop that uses an instruction cache.

All DSPs use instruction caches (that are 1 word deep) to implement
single-instruction hardware loops; however, not all DSPs use multi-word
instruction caches to implement multi-instruction hardware loops.

Multi-instruction hardware loops that don't use an instruction cache must re-read
the instructions being repeated each time the processor (Program Controller)
proceeds through the loop.

Without an instruction cache, the repeated instructions must be re-read and the
program bus cannot be freed. This means that instructions that execute more
rapidly in single-instruction hardware loops do not gain this speed-up in
multi-instruction hardware loops without instruction caches.


Based on your current knowledge of hardware loops which of the following is true?

a. Multi-instruction loops are limited by the number of instructions that can be
repeated.
b. Single-instruction and multi-instruction loops are often limited by the
minimum number of times that a loop can be repeated.
c. Single-instruction and multi-instruction loops are often limited by the
maximum number of times that a loop can be repeated.
d. All of the above.

Hardware loops have certain limitations associated with them that are not
necessarily associated with software loops.

& The number of instructions repeated in multi-instruction loops might be limited
by a maximum value.

& The minimum and the maximum number of times a loop can be repeated for
both single- and multi-instruction loops might also be limited.

The drawbacks of traditional software approaches to repeated instruction execution,
however, are that:

& Branch instructions typically require several instruction cycles to execute. The
processor must usually use a register to maintain the loop index, which is the
count of the number of times the instruction(s) to be repeated must still be
executed.

& The processor data path must then be used to increment or decrement the index
and test to see if the loop condition has been met.

To avoid these problems, DSP processors have evolved special hardware control
constructs that repeat either a single instruction or a group of instructions a number
of times.


As stated, the primary role of the Program Controller is to determine the next
instruction to be executed. Interrupts are used to signal to a processor both external
(a push-button is pressed) and internal (a word is received through the serial port)
events.

All DSPs use interrupts and most use interrupts as their primary means of
communicating with peripherals.

An interrupt is an event that causes the processor to stop executing its
current program and to branch to a special block of code called an interrupt service
routine (ISR).

The ISR code, once called, typically deals with the source of the data that signaled
the interrupt.

E.g., if a word is received through the serial port, an interrupt is signaled. The ISR
will execute the necessary code to process the word.

Once an interrupt is signaled to the Program Controller, a branch instruction is
executed and the Program Counter is loaded with the program memory address
(pma) of a special block of code (often called an interrupt vector).

Interrupts can be disabled. In fact, this occurs during DSP initialization, ISRs, and
single-instruction hardware loops.

It is the Program Controller that disables interrupts for the duration of
single-instruction hardware loop execution. A direct consequence of interrupts
being locked out in this way is that a programmer must carefully consider the
maximum interrupt lockout time that can be accepted.


Most processors, including DSPs, sample the status of the interrupt lines every
instruction cycle. The processor uses status registers to signal interrupts (once
sampled) and other information to the Program Controller.

From the previous discussion and procedure sections in this manual you have
become acquainted with a few of the status and control registers of the
TMS320C50 DSP.

Although not all DSPs have the same number of status and control registers,
all DSPs do contain these types of registers.

Many of the registers used by the 'C50 Program Controller and CPU have
equivalent counterparts in other DSPs.

PROCEDURE

Software Program Control

In this procedure section, you will assemble a program using individual pieces of
pre-programmed source code.

* 1. Open the ex3_1.asm file with an ASCII text editor.

* 2. Briefly familiarize yourself with the contents of the file.

The ex3_1.asm file is an incomplete DSP program source file. It contains
pieces of code (mostly subroutines and ISRs) that must be joined together
with missing source statements.

Once the source file is completed and assembled, the DSP circuit can be
used as an audio effects generator.


The memory initialization directives (.ds and .ps) for the audio effects
generator have not been set.

* 3. Locate within the ex3_1.asm file, the CODE#1 label. Replace the label with
the following assembler directive:

<TAB> <TAB> .mmregs

* 4. Locate within the ex3_1.asm file, the CODE#2 label. Replace the label with
the following statement:

<TAB> <TAB> .ds <TAB> 00300h

* 5. Locate within the ex3_1.asm file, the CODE#3 label. Replace the label with
the following statement:

<TAB> <TAB> .ps <TAB> 080Ah

* 6. Locate within the ex3_1.asm file, the CODE#5 label. Replace the label with
the following statements:

<TAB> <TAB> .ps <TAB> 0FE00h
<TAB> <TAB> .entry

The above memory initialization directives tell the assembler at which
addresses within the DSP the program code and data should be stored.

* 7. Save the source file to your personal student folder as: ex3_1v2.asm

CALL
B
RET
BCND
RPT
RPTB
RETE
IDLE

The missing statements to be added to the source file consist either of ISR
and subroutine labels or of different software program control instructions
(such as the ones shown above).

A program control instruction has the effect, once executed, of modifying
the contents of the Program Counter.


The assembled program when loaded into the DSP will generate 3 audio
effects. This implies that at least three subroutines will be used by the
program.

& An echo effect
& A flanger effect
& A voice effect

The MAIN program loop may be interrupted by one of four interrupts.

& RINT
& XINT
& INT1#
& INT3#

* 8. Locate within ex3_1v2.asm the CODE#7 and CODE#8 labels.

* 9. Replace the labels by the following statements:

CODE#7: --> MAIN:
CODE#8: --> <TAB> B <TAB> MAIN

These statements instruct the Program Controller to continue executing
NOP (No OPeration). This will continue until an interrupt is signaled to the
DSP.

See HELP Unit 03 shelp2

Note that before a branch (B) instruction can be used, a label must exist
to refer to the location of the branch.


As stated, the effects generator program uses 4 interrupts. Thus, the program
Vector Table must contain 4 source statements. Each Vector Table source
statement is a branch instruction that directs the Program Controller to execute the
respective interrupt service routines.

* 10. Locate within ex3_1v2.asm the CODE#4 label.

* 11. Within ex3_1v2.asm, replace the CODE#4 label with the following:

<TAB> .ps <TAB> 00802h
int1: <TAB> B <TAB> SEL_EFFECT
<TAB> .ps <TAB> 00806h
int3: <TAB> B <TAB> READ_DIP
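The two addresses used in this step differ by four words, which is consistent with a Vector Table whose entries are two words apart. The Python sketch below reproduces the two addresses under that reading. The base address (0800h) and the index given to each interrupt are inferred from the values above and are assumptions, not taken from the processor documentation.

```python
VECTOR_BASE = 0x0800     # inferred from the addresses used in this step
WORDS_PER_VECTOR = 2     # assumed: a branch and its target occupy two words

def vector_address(index):
    """Program memory address of the Vector Table entry with this index."""
    return VECTOR_BASE + WORDS_PER_VECTOR * index

# Index values are assumptions chosen to reproduce the addresses above.
for name, index in (("int1", 1), ("int3", 3)):
    print(f"{name}: {vector_address(index):05X}h")
```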

* 12. Locate within ex3_1v2.asm the CODE#9, CODE#14, and CODE#16
labels.

These labels respectively correspond to the RECEIVE, SEL_EFFECT, and
READ_DIP interrupt service routine (ISR) program start addresses.

See HELP Unit 03 shelp13

* 13. Replace each of the ISR CODE labels with its ISR label
(RECEIVE: , SEL_EFFECT: , READ_DIP:).

* 14. Locate within ex3_1v2.asm the CODE#18, CODE#20, CODE#23,
CODE#25, and CODE#27 labels.


These labels respectively correspond to the ECHO, FLANGER, VOICE,
CODECINIT and AIC_2ND subroutine program start addresses.

* 15. Replace each of the CODE labels with its subroutine label
(ECHO: , FLANGER: , VOICE: , CODECINIT: , AIC_2ND:).

E.g.: CODE#23: --> VOICE:

To begin executing an interrupt service routine, an interrupt must be
signaled to the DSP while it is executing the MAIN code. Only then is the
corresponding interrupt vector from the Vector Table taken.

To reach any one of the subroutines, either a CALL or a B (branch)
instruction must be executed.

* 16. Locate within ex3_1v2.asm the CODE#6 label.

The labeled program memory address corresponds to where a
CODECINIT subroutine CALL instruction should be inserted.

* 17. Replace the label with the following source statement:

<TAB> CALL <TAB> CODECINIT

* 18. Locate within ex3_1v2.asm the CODE#10, CODE#11, CODE#12, and
CODE#13 labels.

The labels respectively correspond to the program memory addresses
where the RECEIVE ISR conditionally branches off to one of the effects
subroutines (echo, flanger, or voice).


* 19. Respectively replace the labels with the following source statements:

<TAB> BCND <TAB> ECHO, EQ
<TAB> BCND <TAB> FLANGER, EQ
<TAB> BCND <TAB> VOICE, EQ
<TAB> RETE

The last source statement (RETE) is the return instruction that ends the
RECEIVE ISR. Every ISR and subroutine requires an instruction that returns
to the main code.

* 20. Within ex3_1v2.asm replace the CODE#15, CODE#17, CODE#19,
CODE#22, and CODE#24 labels with RETE instructions.

These labels respectively correspond to the return to main code
instructions for the SEL_EFFECT, READ_DIP, ECHO, FLANGER, and
VOICE code.

* 21. Within ex3_1v2.asm replace the CODE#26 and CODE#32 labels with RET
instructions.

The RET instruction, as opposed to the RETE instruction, does not re-
enable interrupts when executed. Interrupts must be disabled during
initialization.

* 22. Locate within ex3_1v2.asm the CODE#21 label.

* 23. Replace the label with the following instruction:

<TAB> RPT <TAB> #512

This label is located within the FLANGER subroutine. The DMOV
instruction must be executed 513 times (to move the flanger time-delayed
samples). This can be done with a single-instruction hardware loop
implemented with the RPT instruction, which executes the instruction that
follows it one more time than the value of its operand.
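The relationship between the RPT operand and the number of executions (512 gives 513) can be sketched with a small delay line. The Python model below is illustrative only: the FLANGER source is not reproduced here, and the DMOV-like step (copy a word to the next higher address, then move the pointer down by one) is an assumption about how the samples are shifted.

```python
def rpt(operand, instruction):
    """Model of a single-instruction repeat: runs operand + 1 times."""
    for _ in range(operand + 1):
        instruction()

def shift_delay_line(memory, last):
    """Return a DMOV-like step that copies memory[p] to memory[p + 1],
    then moves the pointer p down by one."""
    state = {"p": last}

    def dmov_step():
        p = state["p"]
        memory[p + 1] = memory[p]
        state["p"] = p - 1

    return dmov_step

line = [10, 20, 30, 40, 50, None]          # 5 samples plus one spare word
rpt(4, shift_delay_line(line, last=4))     # operand 4 -> 5 copies
print(line)                                # [10, 10, 20, 30, 40, 50]
```

After the shift, the first location is free to receive the newest sample.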

* 24. Locate within ex3_1v2.asm the CODE#28, CODE#29, CODE#30, and
CODE#31 labels.

* 25. Replace the labels with the following instruction:

<TAB> IDLE


These labels are where the AIC_2ND subroutine must pause processing
(IDLE) to wait for acknowledgment from the serial port (using the XINT
interrupt) that a word has been sent to the CODEC.

* 26. Save the ex3_1v2.asm source file.

* 27. Assemble the file from within your student folder by executing the
following command at a DOS prompt:

c:\lv91027\bin\dsk5a.exe ex3_1v2.asm -l

See HELP Unit 03 shelp3

Effects Generator

In this procedure section, you will load and use the Effects Generator program.

Note: Before using the C5x VDE please make certain the circuit
board power source is turned ON, and that the serial connection
is present between the host computer and the DIGITAL SIGNAL
PROCESSOR circuit block labeled SERIAL PORT.

* 28. Make the circuit board connections shown in the figure. These are the
connections to be made to properly operate the Audio Effects Generator
program.

* 29. Open the C5x VDE.


* 30. Load the ex3_1v2.dsk program into the DSP. This file is located in your
student directory.

* 31. Execute the RUN command found on the C5x VDE Toolbar.

I/O INTERFACE      AUDIO EFFECTS    EFFECT OF PRESSING INT3#
VALUE DISPLAYED    MODE

---0               Echo             Transmits the DIP switch value (millisecond
                                    delay between echoes) to the DSP.

---1               Flanger          Transmits the DIP switch value (degree of
                                    flanging) to the DSP.

---2               Voice            No effect.

* 32. Talk into the microphone and familiarize yourself with the Effects
Generator program.

INT1# push button: Changes between the effects.
INT3# push button: Reads the DIP switch.

* 33. Halt the DSP program and close the C5x VDE session when finished using
the ex3_1v2 program.

Interrupt Context Save

In this procedure section, you will familiarize yourself with the basic uses of a
context save.

* 34. Open the C5x VDE and load ex3_1v2.dsk into the DSP.

* 35. Create a breakpoint at the program memory address labeled MAIN.

* 36. Execute the RUN command found on the C5x VDE Toolbar. This will execute
the DSP initialization code.


To demonstrate an interrupt context save we will force a specific interrupt
to occur. The TRANSMIT ISR is executed every time that the XINT
interrupt is signaled.

XINT is signaled when a sample is transmitted from the DSP serial port
register (DXR) to the CODEC.

* 37. To force the XINT interrupt to occur, enable the XINT interrupt by editing
the IMR register to 0x0027.
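A mask value such as 0x0027 is easier to read when it is broken down bit by bit. The Python sketch below does this using an assumed bit assignment for the IMR (bit 0 = INT1 up to bit 5 = XINT). That assignment is not given in this manual and should be confirmed against the TMS320C5x documentation before relying on it.

```python
# Assumed IMR bit assignment; confirm against the TMS320C5x documentation.
IMR_BITS = {0: "INT1", 1: "INT2", 2: "INT3", 3: "TINT", 4: "RINT", 5: "XINT"}

def enabled_interrupts(imr):
    """List the interrupts whose mask bit is set to 1 (enabled)."""
    return [name for bit, name in sorted(IMR_BITS.items()) if imr & (1 << bit)]

print(enabled_interrupts(0x0027))
```

Under this assumption, the value 0x0027 enables XINT together with the three external interrupts and leaves the others masked.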

* 38. Edit the ACC to AAAA AAAA h.

* 39. Create a breakpoint at the program memory address labeled TRANSMIT.

* 40. Execute the RUN command found on the C5x VDE Toolbar.

The XINT interrupt vector is taken and the TRANSMIT ISR is begun. At this
point, because the ISR has been entered, key DSP registers have been
saved to their shadow registers (this process is called context save).

* 41. Edit the ACC to BBBB BBBB h.

* 42. Using the C5x VDE STEP OVER command, execute the RETE command
to exit the ISR and return to the MAIN program loop.


Note that the content of the ACC register has returned to AAAA AAAAh,
the value it had before entering the TRANSMIT ISR.

The context save process stored the contents of key CPU registers
(including ACC) before entering the ISR and restored them on exit.

* 43. Close the C5x VDE session.

CONCLUSION

& The primary role of a Program Controller is to determine the next instruction to
be executed.

& A Program Controller is made up of a program counter register, program
counter related hardware, stack support, repeat counters and status registers.

& The Program Counter register holds the program memory address of the next
instruction to be fetched by the Program Controller and executed.

& DSPs use stacks to save and restore address and status information during a
subroutine or an interrupt service routine (ISR).

& Interrupts are used to signal to a processor both external and internal events.

& Most DSPs use interrupts as their primary means of communication with
peripherals.


REVIEW QUESTIONS

1. Which of the following software constructs does not require a Program
Controller with stack support?

a. branch
b. subroutine
c. repeat
d. interrupt

2. Which of the following hardware features is not part of a Program Controller?

a. status registers
b. hardware stack
c. repeat counters
d. program bus

3. Which of the following registers holds the program memory address of the next
instruction to be fetched and executed by the Program Controller?

a. Program Counter register (PC)
b. RePeaT Counter register (RPTC)
c. The ACCumulator (ACC)
d. Status register 1 (ST1)

4. Which of the following types of events are signaled using an interrupt?

a. A word is transmitted by the DSP.
b. A word transmitted by the CODEC is received by the DSP.
c. A push-button is pressed.
d. All of the above.

5. What is the name of the process in which DSPs use stacks to save and restore
address and status information during a subroutine or an interrupt service routine?

a. Call and return.
b. Hardware looping.
c. Context save.
d. Hardware nesting.

Exercise 3-2

The Pipeline

EXERCISE OBJECTIVES

Upon completion of this exercise, you will know the advantages of having a deep
pipeline as used in DSPs. You will be familiar with the layout of a pipeline
reservation table and be able to use it to solve pipeline conflicts.

DISCUSSION

In everyday life, whenever a large task must be done rapidly, it is divided into
smaller tasks that are distributed among workers. This process, which originated in
the early 20th century, is now known as the assembly line.

By dividing the task into smaller operations and working on each operation
separately yet at the same time, the overall task to be completed is finished much
faster.

Nearly all digital signal processors on the market use the same process to
execute program instructions. When used within a processor, the process is
called pipelining.

By dividing the sequence of operations into smaller pieces, and by executing the
pieces in a pipeline, processor performance is increased. The number of
instructions executed per unit time is increased without changing the total time
required to execute an instruction.


Pipelining, though meant to improve performance, can complicate programming.
For example:

& On some processors, pipelining causes certain instruction sequences to execute
more slowly.

& On other processors, certain instruction sequences must be avoided for correct
operation of the program.

Pipelining represents a trade-off between efficiency and ease of use.

To illustrate how pipelining increases performance, consider the Texas Instruments
TMS320C50 ('C50) DSP. It uses four separate execution units to process a single
instruction word. The units accomplish the following actions:

& Fetch
& Decode
& Read
& Execute

Because four actions are implemented per instruction word this pipeline is said to
be a 4-level pipeline.

A DSP possesses a 5-level pipeline. How many pipeline execution units are
simultaneously used during an instruction cycle?

a. 5
b. 1
c. 4
d. None of the above.


Most DSPs are pipelined, although pipeline depth varies between types of DSP.

Compared with general-purpose processors, DSPs have on average deeper
pipelines.

A processor with an ideal N-level pipeline (one in which no problems occur during
instruction execution) executes approximately N times as many instructions per
unit time as the same processor without a pipeline.

However, processor performance begins to drop when the pipeline becomes too
deep, because the time required to control the pipeline execution stages becomes
too large.

          1     2     3     4     5     6     7     8
FETCH     I1                      I2
DECODE          I1                      I2
READ                  I1                      I2
EXECUTE                     I1                      I2

The figure illustrates the operation of four processor execution units when used
sequentially. The hardware associated with each phase of instruction execution is
left idle 75% of the time.

Parallel execution of the different hardware execution units is not possible in this
case.

A processor that uses the execution units sequentially does not begin processing
the next instruction until the current instruction has been executed.


If a clock cycle occurs every 50 ns (as it does with the 'C50 found on the DSP
circuit board), then in this case an instruction takes 200 ns to complete.

          1     2     3     4     5     6     7     8
FETCH     I1    I2    I3    I4    I5    I6    I7    I8
DECODE          I1    I2    I3    I4    I5    I6    I7
READ                  I1    I2    I3    I4    I5    I6
EXECUTE                     I1    I2    I3    I4    I5

A pipelined implementation of the processor, as shown in the figure, results in the
overlap of the various stages of execution (pipeline stages). The execution stages
now work in parallel: while one stage is fetching the next instruction, another is
decoding the previous instruction, and so on.

Because these operations (Fetch, Decode, Read, and Execute) are done in
parallel, the instruction cycle times are much shorter than they are when the
operations are executed sequentially.

A subtle point about pipelining is that an instruction may be spread out over multiple
instruction cycles, and yet still appear to the programmer to execute in one
instruction cycle.

In reality, an instruction completes every instruction cycle, although each
individual instruction takes 4 clock cycles to execute.

In our example, an instruction is completed every 50 ns when using the pipelined
approach. That is, instructions complete four times as fast as with a non-pipelined
(sequential) implementation.
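
The timing argument above can be checked with a few lines of Python. This is a minimal sketch for an ideal pipeline with no conflicts; the function name is hypothetical, and the 50 ns clock and four stages are the values quoted in the text.

```python
def total_cycles(n_instructions, n_stages, pipelined):
    """Clock cycles needed to complete n_instructions (ideal case, no conflicts)."""
    if n_instructions == 0:
        return 0
    if pipelined:
        # The first instruction needs n_stages cycles; each later one
        # completes one cycle after its predecessor.
        return n_stages + (n_instructions - 1)
    # Sequential use of the units: every instruction takes n_stages cycles.
    return n_stages * n_instructions


CLOCK_NS = 50   # clock period quoted for the 'C50 on the circuit board
STAGES = 4      # Fetch, Decode, Read, Execute

for n in (1, 8, 1000):
    seq = total_cycles(n, STAGES, pipelined=False) * CLOCK_NS
    pipe = total_cycles(n, STAGES, pipelined=True) * CLOCK_NS
    print(f"{n:5d} instructions: sequential {seq} ns, pipelined {pipe} ns, "
          f"speed-up {seq / pipe:.2f}")
```

A single instruction still takes 200 ns either way. The speed-up approaches four only for long instruction sequences, once the pipeline is full.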

The figure demonstrates the operation of a four-level pipeline used for single-
instruction execution.


When all four stages operate in parallel as shown in the figure (no problem occurs
between the stages), the execution sequence is referred to as a perfect overlap.
There is 100% utilization of the processor Execute stage.
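
A perfectly overlapping reservation table can be generated mechanically, as in the following sketch. The function names are hypothetical and the model assumes that every instruction spends exactly one cycle in each stage.

```python
STAGES = ["FETCH", "DECODE", "READ", "EXECUTE"]


def reservation_table(n_instructions, n_cycles):
    """Return {stage: [cell for each cycle]} for a perfectly overlapping pipeline."""
    table = {stage: [""] * n_cycles for stage in STAGES}
    for i in range(n_instructions):
        for depth, stage in enumerate(STAGES):
            cycle = i + depth   # 0-based cycle in which I(i+1) occupies this stage
            if cycle < n_cycles:
                table[stage][cycle] = f"I{i + 1}"
    return table


def format_table(table, n_cycles):
    """Lay the table out as text, one row per pipeline stage."""
    header = " " * 8 + "".join(f"{c:<5d}" for c in range(1, n_cycles + 1))
    rows = [f"{stage:<8s}" + "".join(f"{cell:<5s}" for cell in cells)
            for stage, cells in table.items()]
    return "\n".join([header] + rows)


table = reservation_table(n_instructions=8, n_cycles=8)
print(format_table(table, 8))

# Fraction of cycles 4 to 8 in which the Execute stage is busy.
busy = sum(1 for cell in table["EXECUTE"][3:] if cell)
print("Execute utilization once the pipeline is full:", busy / 5)
```

Once the first instruction reaches the Execute stage in cycle 4, that stage is occupied in every following cycle.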

          1     2     3     4     5     6     7     8
FETCH     I1    I2    I3    I4    I5    I6    I7    I8
DECODE          I1    I2    I3    I4    I5    I6    I7
READ                  I1    I2    I3    I3/I4 I5    I6
EXECUTE                     I1    I2    I3    I4    I4

Unfortunately, when used in applications, DSP (and general-purpose processor)
pipelines may not provide the same level of performance as a perfectly overlapping
pipeline.

No priority is given between the four stages of a pipeline. Therefore, when more
than one pipeline stage requires processing on the same resource (such as a data
bus, a memory address, or a CPU register) a pipeline conflict occurs.

An important pipeline characteristic is that increasing the depth of a pipeline
increases the chance of resource contention and, by the same token, the
programming complexity needed to avoid pipeline conflicts.

A pipeline conflict is a situation that arises from the natural operation of the
pipeline; when it occurs, processor performance is reduced.
Pipeline conflicts can be avoided by careful programming.

A pipeline conflict can be categorized into one of three different classes:

& Structural conflicts
& Data conflicts
& Control conflicts

A structural conflict occurs during a given instruction cycle because two or more
phases of a pipeline require the same hardware resource, such as data bus use,
register access or memory block access.
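
A structural conflict can be detected by listing which stage asks for which resource in each cycle. The sketch below is hypothetical throughout: the resource names and the example requests are made up for illustration and do not describe actual 'C50 bus usage.

```python
from collections import defaultdict


def find_structural_conflicts(requests):
    """requests: list of (cycle, stage, resource) tuples.

    Returns {(cycle, resource): [stages]} for every resource that more
    than one pipeline stage asks for in the same cycle.
    """
    users = defaultdict(list)
    for cycle, stage, resource in requests:
        users[(cycle, resource)].append(stage)
    return {key: stages for key, stages in users.items() if len(stages) > 1}


# Hypothetical example: in cycle 6 both READ and EXECUTE want the data bus.
requests = [
    (5, "READ", "data bus"),
    (6, "READ", "data bus"),
    (6, "EXECUTE", "data bus"),
    (6, "FETCH", "program bus"),
]
print(find_structural_conflicts(requests))
```

Only the cycle 6 data bus entry is reported, because it is the only resource claimed by two stages at once.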

A data conflict (also called a pipeline hazard) occurs when a dependence between
instructions exists and, because of the pipeline, a data operand is not provided to
an instruction at the appropriate time.

This type of situation can occur because certain processor registers can be
modified only during certain pipeline phases.

The 'C50 modifies the ARAU registers only during the Decode
phase of the pipeline. An instruction using and modifying an ARAU register and
depending on the previous instruction for the contents of the register will cause a
pipeline data conflict.


A control conflict occurs when a conditional software control instruction is executed.
The execution of the program memory addresses sequentially following the
conditional instruction must be suspended until the conditional instruction has been
executed.

Though the previously described situations are given the name of conflict, proper
program execution is not necessarily halted. In fact, to avoid resource contention,
and in the process pipeline conflicts, programmable DSPs can use three
fundamentally different techniques:

& interlocking
& time-stationary coding
& data-stationary coding

A fine line exists between the techniques listed above. Interlocking is a
type of pipeline behavior: the pipeline reacts in a defined manner when confronted
with certain situations, such as pipeline conflicts. Time-stationary and
data-stationary coding, by contrast, are programming models, that is, code formats
used by a programmer.

The TI TMS320C5x family of DSPs uses interlocking, while most members of the
AT&T DSP16/16A family use time-stationary coding. The members of the AT&T
DSP32/32C family use data-stationary coding.

As with most definitions of technical concepts, the boundaries between the
techniques used among the various DSPs for avoiding pipeline conflicts are not
rigid; most DSPs use a flavor of all three techniques.

          1     2     3     4     5     6     7     8
FETCH
DECODE
READ
EXECUTE

Interlocks are not always easy to spot. Pipeline operation and interlocks can be
visualized using a reservation table.

Each column corresponds to one instruction cycle executed by the DSP. The
operations held in a column occur at the same time; progression
through time is from left to right.

The four rows correspond to each of the four pipeline stages. The reservation table
for a 5-level pipeline would have five rows.


          1     2     3     4     5     6     7     8
FETCH     I1    I2    I3    I4    I5    I5    I6    I8
DECODE          I1    I2    I3    I4    I4    I5    I8
READ                  I1    I2    I3    I3    I4    I5
EXECUTE                     I1    I2    I3    NOP   I4

One solution to pipeline conflicts is interlocking. An interlocking pipeline delays the
progression of an instruction through a pipeline stage. This occurs when the
instruction is in resource contention with the pipeline stage of another instruction
(structural conflict).

The figure demonstrates an interlocked pipeline. In this case, resource contention
between instructions 3 and 4 (I3 and I4) would have occurred in the Read stage of
the pipeline.

Consequently, instruction 4 (I4) was delayed (interlocked) allowing time for the I3
instruction to complete its READ stage. A NOP (No OPeration) instruction is
executed where the now delayed I4 instruction was supposed to have executed.

An interlock always incurs a one cycle penalty. Thus, instruction execution time on
a processor with an interlocking pipeline may vary depending on the instructions
found before and after it in the program.
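
The effect of an interlock on the Execute row of a reservation table can be sketched as follows. The function is hypothetical and models only the one-cycle bubble described above, with the first instruction reaching Execute in cycle 4 as in a 4-level pipeline.

```python
def execute_row(instructions, stalled, first_cycle=4):
    """Return {cycle: name} for the Execute stage of an interlocking pipeline.

    `stalled` holds the instructions that are held back one cycle by an
    interlock; a NOP occupies the Execute slot they would have used.
    """
    row = {}
    cycle = first_cycle
    for name in instructions:
        if name in stalled:
            row[cycle] = "NOP"   # bubble inserted by the interlock
            cycle += 1
        row[cycle] = name
        cycle += 1
    return row


print(execute_row(["I1", "I2", "I3", "I4"], stalled={"I4"}))
# prints {4: 'I1', 5: 'I2', 6: 'I3', 7: 'NOP', 8: 'I4'}
```

Every instruction after the stalled one also finishes one cycle later, which is why execution time depends on the neighboring instructions.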

If interlocking is not used during resource contention situations, then erroneous or
unintended results may be produced by the conflict.

An interlocking pipeline implies a precise design philosophy, which can be stated
as follows:

& The programmer should not be bothered with the internal timing or parallelism
of a processor architecture.

The assembly language for a processor with an interlocking pipeline is flexible
enough to allow a programmer to assume that every action specified in an
instruction completes before the next instruction begins.

In essence, a processor may be pipelined, but it should not appear so to the
programmer.

Processors that use interlocking to avoid resource contention have a pipeline that
is essentially invisible to the programmer. However, in certain cases such as the
pipeline data conflict (or pipeline hazard) the pipeline is no longer transparent.

The 'C50 pipeline is essentially invisible to the programmer except in the following
cases: auxiliary register updates, memory-mapped accesses of the CPU registers,
the NORM instruction, and memory configuration commands.


          1     2     3     4     5     6     7     8
FETCH     I4    -     -     -     ISR1  ISR2  ISR3  ISR4
DECODE    I3    INTR  -     -     -     ISR1  ISR2  ISR3
READ      I2    I3    INTR  -     -     -     ISR1  ISR2
EXECUTE   I1    I2    I3    INTR  NOP   NOP   NOP   ISR1

When an interrupt occurs, almost all processors allow instructions at the decode
stage or further in the pipeline to finish executing because these instructions may
be partially executed.

What occurs past this point varies from processor to processor.

The figure illustrates the 'C50 pipeline during an interrupt.

& One cycle after the interrupt is recognized the processor inserts an INTR
instruction into the pipeline.

& The INTR causes a four-instruction delay before the first word of the interrupt
vector (ISR1 in the diagram) is executed.

& The instructions (I1, I2, I3) that were at or past the decode stage in the pipeline
when the interrupt was recognized are allowed to finish their execution.

& Instruction I4 is discarded, but will be refetched on return from interrupt.

The pipeline often increases a processor's interrupt response time (interrupt
latency), much as it slows down branch execution.
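
The delay introduced by INTR can be read directly off the EXECUTE row of the interrupt figure. The sketch below does this in Python; it assumes one instruction cycle per 50 ns clock period, as in the earlier timing example.

```python
CLOCK_NS = 50  # clock period quoted earlier for the 'C50 on the circuit board

# EXECUTE row of the interrupt figure, cycles 1 to 8.
execute = ["I1", "I2", "I3", "INTR", "NOP", "NOP", "NOP", "ISR1"]

intr_cycle = execute.index("INTR") + 1   # cycle in which INTR executes
isr_cycle = execute.index("ISR1") + 1    # cycle in which the first ISR word executes
delay_cycles = isr_cycle - intr_cycle

print(delay_cycles, "cycles =", delay_cycles * CLOCK_NS, "ns")
```

The three NOP slots are the cycles lost while the pipeline refills with ISR instructions.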

PROCEDURE

Pipeline Reservation Table

In this procedure section, you will complete the pipeline reservation tables for two
different instruction sequences.

* 1. Open ex3_2.asm with an ASCII text editor.

See HELP Unit 03 shelp14

* 2. Briefly familiarize yourself with the description and contents of the source
file.

Ex3_2.asm is the same source file as ex1_2.asm (the Sine Wave
Generator) except that specific source statements have been removed
from the code. The source statements were taken out to cause a pipeline
data conflict (pipeline hazard).


* 3. Locate within ex3_2.asm the PIPELINE1 label.

Under the label are the following instructions:

    SAMM  DXR        (store ACC to DXR)
    LT    INDX       (load TREG0 with INDX)
    MPY   #0Ah       (multiply 0Ah with TREG0 and store to PREG)
    LDP   #TEMP      (DP = memory data page belonging to TEMP)
    SPL   TEMP       (store low PREG to TEMP)

Each of these instructions must pass through the DSP pipeline stages.

          1     2     3     4     5     6     7     8
FETCH     SAMM  LT    A1    LDP   SPL
DECODE          SAMM  LT    MPY   LDP   SPL
READ                  A2    LT    MPY   LDP   SPL
EXECUTE                     SAMM  LT    A3    LDP   SPL

* 4. Complete the above reservation table for the instructions that follow the
PIPELINE1 label by answering these questions.

Which of the following choices is the correct content of table cell A1?

a. LDP
b. #0Ah
c. INDX
d. MPY

          1     2     3     4     5     6     7     8
FETCH     SAMM  LT    MPY   LDP   SPL
DECODE          SAMM  LT    MPY   LDP   SPL
READ                  A2    LT    MPY   LDP   SPL
EXECUTE                     SAMM  LT    A3    LDP   SPL


Which of the following choices is the correct content of table cell A2?

a. NOP
b. SAMM
c. LT
d. SPL

          1     2     3     4     5     6     7     8
FETCH     SAMM  LT    MPY   LDP   SPL
DECODE          SAMM  LT    MPY   LDP   SPL
READ                  SAMM  LT    MPY   LDP   SPL
EXECUTE                     SAMM  LT    A3    LDP   SPL

Which of the following choices is the correct content of table cell A3?

a. MPY
b. LT
c. #0Ah
d. #TEMP

          1     2     3     4     5     6     7     8
FETCH     SAMM  LT    MPY   LDP   SPL
DECODE          SAMM  LT    MPY   LDP   SPL
READ                  SAMM  LT    MPY   LDP   SPL
EXECUTE                     SAMM  LT    MPY   LDP   SPL

You have completed the 4-level pipeline reservation table for the one-word
instructions found after the pma labeled PIPELINE1.

* 5. Locate within ex3_2.asm the PIPELINE2 label.

Under the PIPELINE2 label is an ADD instruction:

ADD #8192,0
(ACC = ACC + 8192 h)


The instruction uses long immediate addressing and is thus a two-word
instruction. The first word is the ADD instruction and the second word is
the operand (#8192) to be added to the accumulator. Under the ADD
instruction are the following two instructions:

    RPT   #4         (hardware repeat SFR 5 times)
    SFR              (shift the accumulator one bit right)
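
The arithmetic performed by this ADD, RPT, and SFR sequence can be followed with a small Python model. It is a sketch under stated assumptions: the accumulator is taken to start at 0, the operand 8192 is read as a decimal value, only non-negative values are handled, and the helper names are hypothetical.

```python
def rpt(count, instruction, acc):
    """Model RPT #count: the following instruction runs count + 1 times."""
    for _ in range(count + 1):
        acc = instruction(acc)
    return acc


def sfr(acc):
    """Shift the accumulator one bit to the right (non-negative values only)."""
    return acc >> 1


acc = 0                  # assumed starting value, for illustration only
acc = acc + 8192         # ADD #8192,0 with the operand read as decimal (an assumption)
acc = rpt(4, sfr, acc)   # RPT #4 followed by SFR: five right shifts
print(acc)               # prints 256
```

Five right shifts divide the value by 32, which shows why RPT #4 corresponds to five executions of SFR rather than four.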

          1     2     3     4     5     6     7     8
FETCH     ADD   B1    RPT   SFR   SFR   SFR   SFR   B3
DECODE          ADD   NOP   RPT   SFR   SFR   SFR   SFR
READ                  ADD   NOP   RPT   SFR   SFR   SFR
EXECUTE                     ADD   NOP   RPT   B2    SFR

* 6. Complete the above reservation table for the instructions that follow the
PIPELINE2 label by answering these questions.

Which of the following choices is the correct content of table cell B1?

a. NOP
b. #4
c. #8192
d. ADD

          1     2     3     4     5     6     7     8
FETCH     ADD   #8192 RPT   SFR   SFR   SFR   SFR   B3
DECODE          ADD   NOP   RPT   SFR   SFR   SFR   SFR
READ                  ADD   NOP   RPT   SFR   SFR   SFR
EXECUTE                     ADD   NOP   RPT   B2    SFR

Which of the following choices is the correct content of table cell B2?

a. RPT
b. #4
c. NOP
d. SFR


          1     2     3     4     5     6     7     8
FETCH     ADD   #8192 RPT   SFR   SFR   SFR   SFR   B3
DECODE          ADD   NOP   RPT   SFR   SFR   SFR   SFR
READ                  ADD   NOP   RPT   SFR   SFR   SFR
EXECUTE                     ADD   NOP   RPT   SFR   SFR

See HELP Unit 03 shelp4

Which of the following choices is the correct content of table cell B3?

a. SFR
b. NOP
c. [no instructions are fetched]
d. ADD

          1     2     3     4     5     6     7     8
FETCH     ADD   #8192 RPT   SFR   SFR   SFR   SFR   SFR
DECODE          ADD   NOP   RPT   SFR   SFR   SFR   SFR
READ                  ADD   NOP   RPT   SFR   SFR   SFR
EXECUTE                     ADD   NOP   RPT   SFR   SFR

You have completed the 4-level pipeline reservation table for a two-word
instruction and a repeat loop.

Correcting a Pipeline Conflict

In this procedure section, you will identify and correct the source statements
causing a pipeline conflict.

* 7. Connect the OUTPUT of the DC SOURCE to the ANALOG INPUT of the
CODEC.

* 8. Connect the ANALOG OUTPUT of the CODEC to the INPUT of the AUDIO
AMPLIFIER and to an oscilloscope input channel.

Note: Before using the C5x VDE please make certain the circuit
board power source is turned ON, and that the serial connection
is present between the host computer and the DIGITAL SIGNAL
PROCESSOR circuit block labeled SERIAL PORT.

* 9. Open the C5x VDE, and load the ex3_2.dsk program file into the DSP.


* 10. Execute the RUN command found on the C5x VDE Toolbar.

* 11. Use the AUDIO AMPLIFIER GAIN potentiometer to adjust the volume level
of the generated signal.

* 12. Observe that the sound of the generated sinusoidal signal has an added
noise component.

* 13. Using the potentiometer of the DC SOURCE adjust the frequency of the
generated signal to approximately 500 Hz.

* 14. Adjust the oscilloscope to trigger on the input signal.

* 15. Observe that the DSP generated sinusoidal signal is not a perfect
sinusoidal waveform but contains some higher frequency components.

The incorrectly generated sinusoidal signal is due to a pipeline data
conflict. The source statements that corrected the pipeline problem
were removed from the source code. The pipeline data conflict occurs at
the pma labeled CONFLICT1.

* 16. Halt the ex3_2.dsk program and close the C5x VDE session.


* 17. Locate within ex3_2.asm the CONFLICT1 label.

Under the CONFLICT1 label are the following instructions:

    SAMM  AR1           (Store ACC to AR1)
    LACC  *0+,2,AR1     (Load shifted dma into ACC)

It is these two instructions that cause the pipeline conflict.

          1          2          3          4          5
FETCH     SAMM       LACC       ...        ...        ...
DECODE    ...        SAMM       LACC       ...        ...
                                AR1=AR1+1
READ      ...        ...        SAMM       LACC       ...
EXECUTE   ...        ...        ...        SAMM       LACC
                                           AR1=ACC

* 18. Observe the above pipeline reservation table describing the execution
sequence for the instructions located after the CONFLICT1 label.

Answer the following question and remember that an ARAU update always
occurs during the Decode stage of the pipeline (the AR1 register is part of
the ARAU).

Which of the following choices explains how the pipeline conflict occurred
at the pma labeled CONFLICT1?

a. AR1 was modified by LACC in the wrong pipeline stage.
b. LACC was executed before SAMM.
c. AR1 was loaded with the contents of ACC.
d. AR1 is modified by LACC before it can be used by SAMM.


          1          2          3          4          5
FETCH     SAMM       LACC       ...        ...        ...
DECODE    ...        SAMM       LACC       ...        ...
                                AR1=AR1+1
READ      ...        ...        SAMM       LACC       ...
EXECUTE   ...        ...        ...        SAMM       LACC
                                           AR1=ACC

A pipeline data conflict occurs when the AR1 register is used by the LACC
and SAMM instructions.

AR1 is meant to be used by SAMM before it is updated by LACC.

However, AR1 is updated by LACC before being used by SAMM.

* 19. What solution should be used to correct for the pipeline conflict?

a. Place 1 NOP (No OPeration) instruction after SAMM.
b. Sequentially place the LACC instruction before the SAMM instruction.
c. Place 2 NOP (No OPeration) instructions after SAMM.
d. Modify the entire program so that SAMM does not require AR1 as an
operand.

          1          2          3          4          5
FETCH     SAMM       NOP        NOP        LACC       ...
DECODE    ...        SAMM       NOP        NOP        LACC
                                                      AR1=AR1+1
READ      ...        ...        SAMM       NOP        NOP
EXECUTE   ...        ...        ...        SAMM       NOP
                                           AR1=ACC

* 20. Use the ASCII text editor to make the suggested modification to
ex3_2.asm, and thus correct the pipeline conflict.

That is, add two NOP instructions after the SAMM instruction labeled
CONFLICT1.
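
The reason two NOP instructions are needed can be checked by computing the cycle in which each instruction touches AR1. The following sketch uses hypothetical function names. It assumes a 4-level pipeline in which SAMM writes AR1 in its Execute stage and LACC updates AR1 in its Decode stage, and it treats a same-cycle update as unsafe.

```python
SAMM_FETCH_CYCLE = 1


def ar1_event_cycles(nops_after_samm):
    """Return (cycle in which SAMM writes AR1, cycle in which LACC updates AR1)."""
    samm_write = SAMM_FETCH_CYCLE + 3                     # Execute is 3 stages after Fetch
    lacc_fetch = SAMM_FETCH_CYCLE + 1 + nops_after_samm   # LACC follows SAMM and the NOPs
    lacc_update = lacc_fetch + 1                          # the ARAU update happens in Decode
    return samm_write, lacc_update


def has_conflict(nops_after_samm):
    samm_write, lacc_update = ar1_event_cycles(nops_after_samm)
    # Assumption: an update in the same cycle as the write is also treated as unsafe.
    return lacc_update <= samm_write


for nops in range(3):
    status = "conflict" if has_conflict(nops) else "ok"
    print(nops, "NOP(s):", ar1_event_cycles(nops), status)
```

With no NOP the update happens in cycle 3, before the write in cycle 4. With two NOPs it moves to cycle 5, after the write, which matches the corrected reservation table.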


* 21. Save the file to your personal student folder as:

ex3_2v2.asm

* 22. Assemble the file, from within your student directory. Execute from your
student folder the following command at a DOS prompt:

c:\lv91027\bin\dsk5a.exe ex3_2v2.asm -l

* 23. Make certain the connections shown in the figure are still present on your
DIGITAL SIGNAL PROCESSOR circuit board and between the board and
an oscilloscope.

* 24. Open the C5x VDE, and load the ex3_2v2.dsk program file into the DSP.

* 25. Execute the RUN command found on the C5x VDE Toolbar.

* 26. Observe that this time the sound from the Sine Wave Generator does not
have an added noise component.


* 27. Using the potentiometer of the DC SOURCE, adjust the frequency of the
generated signal to approximately 500 Hz. Adjust the oscilloscope to
trigger on the input signal.

* 28. Observe that the DSP generated sinusoidal signal now has a sinusoidal
waveform.

The signal generated by the corrected code contains no higher frequency
components as compared to the previous signal that was generated by the code
with a pipeline conflict.

* 29. Halt the program and close the C5x VDE session.


* 30. Disconnect all connections made on the circuit board and between the
board and the oscilloscope.

CONCLUSION

& A pipeline divides execution of an instruction among several execution units
that operate simultaneously. Once an instruction has passed through one of the
execution units it is passed on to the next until completely executed.

& No priority is given between the four stages of a pipeline. When more than one
pipeline stage requires processing on the same resource a pipeline conflict
occurs.

& Pipeline conflicts can be categorized into three different types: structural
conflicts, data conflicts, and control conflicts.

& Programmable DSPs have three fundamentally different techniques to avoid
resource contention and, in the process, pipeline conflicts: interlocking,
time-stationary coding, and data-stationary coding.

& Interlocks and pipeline operation can be visualized using a reservation table.

REVIEW QUESTIONS

1. A DSP has a 5-level pipeline. How many instruction cycles does it take to
execute an instruction?

a. 10
b. 4
c. 1
d. 5

2. Which of the following is not a type of pipeline conflict that occurs with pipelined
processors?

a. Interlock conflict
b. Structural conflict
c. Data conflict
d. Control conflict

3. What is the design philosophy behind pipeline interlocking?

a. The processor pipeline must be the least complex possible.
b. The programmer should understand the internal timing of a processor at
all points during the program.
c. The programmer should not be bothered with the internal timing or
parallelism of a processor.
d. The stages of a pipeline must be prioritized.


4. During which of the following situations does an interlock occur?

a. Control conflict or a Data conflict
b. Structural conflict or a Data conflict
c. Structural conflict or a Control conflict
d. None of the above.

5. What is a reservation table used for?

a. Visualization of pipeline operation.
b. Shows how processor resources are used over time.
c. Visualize interlocks.
d. All of the above.

Unit Test

1. The following definition corresponds to which term?

An organization of computational hardware in which different stages of the
execution of an instruction proceed in parallel with different instructions.

a. program control
b. pipeline
c. VLIW architecture
d. Program Controller

2. How are the instructions provided by DSPs with traditional architectures
characterized?

a. Long instruction words containing multiple RISC-like instructions.
b. Complex, compound instructions.
c. Small RISC instructions.
d. None of the above.

3. What does parallelism refer to when discussing DSP architectures?

a. Multiple instruction execution during an instruction cycle.
b. CALU working in parallel with the ARAU.
c. An organization of computational hardware.
d. An instruction word containing multiple RISC-like instructions.

4. Which of the following choices is responsible for decoding instructions,
managing the pipeline, storing CPU status, and decoding conditional
operands?

a. Program counter
b. CALU
c. Pipeline
d. Program Controller unit

5. What hardware feature do DSPs use to save and restore address and status
information during a subroutine or an interrupt service routine (ISR)?

a. context save
b. registers
c. stacks
d. hardware repeat loop


6. The time taken by a processor for operations that do not belong to a user's task
is the definition of what term? The operations usually consist of the allocation
of resources for the execution of the next instruction.

a. overhead.
b. nesting.
c. interlocking.
d. conditional processing.

7. A DSP possesses a 4-level pipeline. How many instructions are fetched per
instruction cycle?

a. 1 instruction
b. 2 instructions
c. 4 instructions
d. 1 instruction is fetched every four instruction cycles.

8. What prevents a pipeline from being perfectly overlapping when used in
applications?

a. Program Controller
b. context saves
c. pipeline conflicts
d. interrupts

9. Interlocks, pipeline operation, and how processor resources are used over time
can be visualized using which of the following?

a. reservation table
b. computer
c. debugger
d. VLIW

10. DSP algorithms frequently involve repetitive execution of a small number of
instructions. What DSP hardware is a design consequence of the above fact?

a. RPT instruction
b. instruction cache
c. zero-overhead hardware looping circuitry
d. program counter-related hardware

Unit 4

Basic I/O

UNIT OBJECTIVES

Upon completion of this unit, you will have a sound understanding of the methods
used by a DSP for off-chip communication.

UNIT FUNDAMENTALS

Most signal processing applications require the DSP to communicate with the
external (analog) world. In these cases, data must usually be input to and/or output
from the DSP.

Filtering, waveform generation, and companding are examples of signal
processing applications that require communication between the DSP and the
outside world.


For any given DSP in existence today, features such as arithmetic performance,
memory bandwidth, addressing modes, execution control, and instruction set
orthogonality were carefully evaluated before a final design was done.

To manage communication with the outside world, as well as ensure real-time
signal processing, today's DSPs have also had to evolve specialized peripherals.

Signal processing applications demand specialized peripherals. The following are
found integrated into most DSPs:

& Synchronous serial ports
& Parallel ports
& Timers
& On-chip Analog to Digital and Digital to Analog converters
& Host ports
& Bit I/O ports
& On-chip DMA controller
& Clock generators

The listed peripherals, when integrated into DSPs, were designed to operate even
when the CPU is in power down mode (idle).


Actual communication between the digital signal processor and off-chip circuitry,
devices, or peripherals is made via its input and output pins.

The pins are located on the exterior surface of the DSP integrated circuit package.
Each pin corresponds to the output or input of a key processor signal.

Most DSP package pins are used for external memory interfaces (serial and
parallel port interfaces); some are dedicated to the input and output of clock
signals; others still (such as the external interrupt lines or the reset pin) allow
external devices to assert processor states.

For most DSPs, transmission and reception of data words is done using serial and
parallel ports. The serial and parallel interfaces are found among the DSP
package input and output pins.


A serial interface transmits and receives data one bit at a time. Parallel interfaces
send or receive entire data words (8, 16, or 32 bits long) one at a time. To do this
a parallel port has a data line for each bit sent.

Parallel ports transmit more bits per second; however, they require more external
interface pins than serial ports.
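
The difference between the two kinds of interface can be illustrated by sending one 16-bit word both ways. This is a sketch only: the function names are hypothetical, and the most-significant-bit-first order is an assumption, since the shift direction is configurable on some DSP serial ports.

```python
def serialize(word, width=16):
    """Yield the bits of `word` one at a time, most significant bit first."""
    for position in range(width - 1, -1, -1):
        yield (word >> position) & 1


def deserialize(bits):
    """Rebuild a word from a sequence of bits received MSB first."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit
    return word


word = 0xA5C3
bits = list(serialize(word))
print(len(bits), "bit times on one data line (serial)")
print("1 transfer on", len(bits), "data lines (parallel)")
print(hex(deserialize(bits)))
```

The serial link needs one data line and sixteen bit times, while the parallel link moves the same word in a single transfer at the cost of sixteen data lines.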

A parallel port is used to transmit and receive data words from external hardware
such as an off-chip memory block, a DIP switch, or a display unit.

A parallel port interface is usually made up of two types of data lines:

& data bit signals.
& a strobe or handshake signal.

A serial port interface is usually made up of three separate data lines:

& a bit clock signal.
& a frame synchronization signal.
& a data signal.

The serial port interface can be used for various applications, such as:

& Transmission and reception of data samples from a codec, an analog to digital
(A/D), or a digital to analog (D/A) converter.

& Communication with other processors (such as other DSPs).

& Communication with other external serial systems.

Once communication has been established with an external peripheral, such as a
codec, an off-chip memory block or another processor, the DSP signal processing
application may be implemented unhindered.


Certain DSPs support multi-processor setups (a form of parallelism).

Parallelism refers not only to the concurrent execution of multiple instructions; it
can also refer to parallel processing.

Parallel processing is when two or more digital signal processor chips are used for
a given application (multi-processing) and are connected through a shared serial
line, allowing inter-processor communication.

One chip is assigned as the master and the others as the slaves. Many applications
have stringent real-time constraints that require multiple DSPs to be used in
concert.

DSPs that support multi-processor setups require special Time-Division
Multiplexing (TDM) serial port optimization.


TDM is used to manage communication between the processors over the shared
serial line.

In a TDM network, time is divided into time slots. Each time slot is associated with
a different processor. During a given time slot the associated processor may
transmit data; the others must receive and may not transmit.

The destination for the transmitted data word can be included in the data word or
it can be sent via a secondary data line in parallel with the data word. Either
approach can be used.
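
The time-slot rule can be sketched in Python as follows. The function names are hypothetical, processors are assumed to be numbered from 0, slots are assumed to be assigned in simple round-robin order, and the destination is carried together with the data word (the first of the two approaches mentioned above).

```python
def tdm_schedule(n_processors, n_slots):
    """Return the processor that owns each time slot (round-robin assignment)."""
    return [slot % n_processors for slot in range(n_slots)]


def run_tdm(outboxes, n_slots):
    """outboxes: {sender: [(destination, word), ...]}. Returns delivered words.

    In each slot only the owning processor may transmit; every other
    processor can only receive. Note that the outbox lists are consumed.
    """
    inboxes = {p: [] for p in outboxes}
    for owner in tdm_schedule(len(outboxes), n_slots):
        if outboxes[owner]:
            destination, word = outboxes[owner].pop(0)
            inboxes[destination].append((owner, word))
    return inboxes


outboxes = {0: [(2, 0x1234)], 1: [], 2: [(0, 0xBEEF), (1, 0x0042)]}
print(run_tdm(outboxes, n_slots=6))
```

Processor 2 has two words queued, so its second word has to wait for its next slot, one full round of the schedule later.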

The DSP peripherals and their interfaces are important to, and answer the needs
of, digital signal processor tasks.

EQUIPMENT REQUIRED

In order to complete the following exercises, you will need:

& F.A.C.E.T. base unit
& DIGITAL SIGNAL PROCESSOR circuit board
& C5x VDE program
& Ex4_1 and ex4_2 assembler and program files
& Oscilloscope
& Function generator

Exercise 4-1

DSP Peripherals

EXERCISE OBJECTIVES

Upon completion of this exercise, you will be familiar with the specialized
peripherals used by DSPs.

DISCUSSION

The peripherals found on the TMS320C50 ('C50) DSP are good examples of the
types used on many existing programmable DSPs.

The 'C50 used by the Lab-Volt Digital Signal Processor FACET board, has many
of its on-chip peripherals communicating with devices on the circuit board.

The 'C50 serial port is used to communicate with the CODEC.

The 'C50 parallel port is used to communicate with the ROM memory chip, the 4
I/O INTERFACE displays, and the DIP switch.

The 'C50 master clock is externally generated by the oscillator and input to the
DSP.

Two of the 'C50 external user interrupt lines are each connected to a push-button
on the surface of the FACET circuit board.


A DSP serial port is usually divided into two sections: a receive section and a
transmit section.

The transmit and receive sections may be independent. In that case there is a:

receive data line
receive frame synchronization line
receive bit clock line
transmit data line
transmit frame synchronization line
transmit bit clock line

In other DSPs, independent receive and transmit data pins exist; however, the
frame synchronization and bit clock lines are shared between the two sections.

The 'C50 serial port operates through three memory-mapped user registers (known
as DRR, DXR, and SPC).

DRR (Data Receive Register) is where words received through the serial port
receive data pin are stored. DRR is memory-mapped to data memory address
(dma) 20h.

DXR (Data Transmit Register) is where words to be transmitted through the serial
transmit data pin are stored. Once a word is stored in DXR the serial port transmit
circuitry takes charge of transmitting the word. DXR is memory-mapped to dma
21h.

SPC (Serial Port Control) is a status and mode control register for the DSP serial
port.
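As an illustration of what "memory-mapped" means here, the following Python
sketch models data memory as a list in which addresses 20h and 21h behave as
DRR and DXR. The class and method names are hypothetical, and the model
ignores SPC, bit clocks, and frame synchronization entirely.

```python
DRR = 0x20  # Data Receive Register address (from the text)
DXR = 0x21  # Data Transmit Register address (from the text)


class SerialPortModel:
    """Toy model: registers are ordinary data memory locations."""

    def __init__(self):
        self.data_memory = [0] * 0x60
        self.transmitted = []  # words handed to the transmit circuitry

    def write(self, dma, value):
        """Store a 16-bit word; a store to DXR also starts a transmission."""
        value &= 0xFFFF
        self.data_memory[dma] = value
        if dma == DXR:
            self.transmitted.append(value)

    def receive(self, word):
        """Model a word arriving on the receive data pin: it lands in DRR."""
        self.data_memory[DRR] = word & 0xFFFF

    def read(self, dma):
        return self.data_memory[dma]


port = SerialPortModel()
port.write(DXR, 0x1234)   # program stores a word in DXR
port.receive(0xBEEF)      # a word arrives from the external device
print(port.transmitted, hex(port.read(DRR)))
```

The point of the sketch is that the program uses the same load and store
operations for the serial port as for any other data memory address.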


Through the memory-mapped Serial Port Control register (SPC) the 'C50 DSP
allows the programmer to specify the serial port transmit and receive
characteristics.

The bit clock line polarity, the shift direction, data word length, and whether the
frame synchronization signals are bit-length or word-length are serial port
characteristics that certain DSP chips may permit the programmer to configure.

A DSP parallel port is usually implemented in one of two ways.

The main processor data bus can be used as the parallel port or the parallel port
can be made separate from the processor external bus interface.

Processors separating the parallel port from the external bus interface simplify
interfacing to external devices.


DSPs which use their data bus as a parallel port typically reserve a special section
of their address space for access to off-chip devices.

In some DSPs, the reserved memory addresses are accessed with specialized
instructions. The 'C50 has two such parallel port instructions (IN and OUT).

The 'C50 IN and OUT instructions are used to read and write a data word from and
to an external I/O port.

I/O port accesses are distinguished from program and data accesses by a
designated strobe or handshake pin. The pin is asserted when the external read or
write is performed.

Clock signals are used to sequence DSP operations.

A clock signal consists of a square wave at some known frequency. The highest
frequency clock signal within a processor is known as the master clock.

The master clock signal is typically generated externally. However, a number of
DSPs now have phase-locked loops (PLL). These DSPs require only a very
low-frequency input signal, from which the PLL generates the master clock
signal.

The master clock signal can be generated in many different ways. The choice of
which way to generate the clock signal can usually be configured by the
programmer.

A large number of programmable DSPs provide timers. A timer is a peripheral that
changes the content of a register at regular intervals in such a manner as to
measure time.

Some DSPs provide a timer output pin. A square wave at the timer frequency can
be output from the pin, providing the programmer with a software-controlled
oscillator.


The 'C50 timer as used on the FACET circuit board outputs a square wave at a
frequency of 10 MHz. The signal is input to the codec as its master clock.

Recall that in Exercise 2 of Unit 2, we used the timer of the 'C50 DSP. The
timer register (TIM) was read at the beginning and at the end of an algorithm. In this
manner the duration of the algorithm was measured.

Measuring the duration of an event is a possible timer application; however, on
DSPs, timers are usually used as a source of periodic interrupts.

A timer in reality consists of a:

& clock source
& prescaler
& counter

The clock source usually consists of the DSP master clock signal.
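How the prescaler and counter divide the clock source can be sketched in
Python. This is a simplified behavioural model, not a cycle-accurate
description of the 'C50 timer: it assumes both stages count down to zero and
then reload, and the function name is hypothetical.

```python
def count_timer_events(master_cycles, tddr, prd):
    """Count timer events produced during a number of master clock cycles.

    Assumption: the prescaler reloads from tddr and the counter reloads
    from prd, so one event occurs every (tddr + 1) * (prd + 1) cycles.
    """
    prescaler = tddr
    counter = prd
    events = 0
    for _ in range(master_cycles):
        if prescaler > 0:          # prescaler still dividing the clock
            prescaler -= 1
            continue
        prescaler = tddr           # prescaler expired: reload it
        if counter > 0:            # and clock the main counter once
            counter -= 1
            continue
        counter = prd              # counter expired: reload and signal
        events += 1
    return events


# With tddr = 0 and prd = 1 an event occurs every 2 master clock cycles.
print(count_timer_events(master_cycles=20, tddr=0, prd=1))
```

Each event could drive an output pin or raise a periodic interrupt, which is
the usual role of a DSP timer.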

Programmable DSPs have a large number of package pins. Some of these pins
belong to additional peripherals that we have not yet covered in this
discussion.

& External user interrupts

& Bit I/O ports

The TMS320C50 DSP found on the FACET circuit board uses external user
interrupts. In fact, most DSPs provide this type of peripheral.

An external interrupt functions exactly the same as an “internal” interrupt;
however, in most cases, external user interrupts are given lower response
priority than other interrupts.


The TMS320C50 DSP found on the FACET circuit board uses bit I/O ports (two
of them, BIO# and XF), also known in certain DSPs as general-purpose I/O pins,
to establish communication between the C5x VDE and the DSP.

Bit I/O ports are software controlled. In this particular application (communication
between the C5x VDE and the DSP), software control of the BIO# and XF I/O ports
is done with a communication program (kernel) held in off-chip ROM.

The software is first run by the DSP when communication between the C5x VDE
debugger program and the DSP is attempted.

CODEC is the abbreviation for CODer-DECoder. It is an electronic circuit that
converts analog signals into digital representations, and decodes digital
signals into analog form.

Though signal processing can be entirely done with digital signals, most often a
conversion from analog-to-digital and back again is required.


A CODEC is usually made up of the following components:

& a programmable input gain
& an anti-aliasing filter
& an Analog-to-Digital converter
& a Digital-to-Analog converter
& a post-filter

As stated, 'C50 communication with the codec is established through the DSP
receive and transmit serial interfaces. The dual serial communication can only be
implemented after both the serial DSP peripherals and the codec are initialized.

PROCEDURE

Timer Initialization

In this procedure section, you will initialize the timer of the TMS320C50 DSP.

Note: Before using the C5x VDE please make certain the circuit board
power source is turned ON, and that the serial connection is present
between the host computer and the DIGITAL SIGNAL PROCESSOR
circuit block labeled SERIAL PORT.

* 1. Open the C5x VDE, and load the ex4_1.dsk program file into the DSP.

* 2. Place a breakpoint at the program memory address labeled CODECINIT.


Do not place the breakpoint at the CALL CODECINIT source statement.

* 3. Place a second breakpoint at the program memory address labeled
SERIALINIT.


* 4. Press the C5x VDE RUN command. This executes the DSP initialization
code preceding the CODEC initialization subroutine (CODECINIT).

The timer initialization code sets the timer to generate a 10 MHz signal that is
output on a DSP package pin (TOUT). Note that the TOUT package pin is part of
the second AUXILIARY I/O header.

* 5. Connect oscilloscope channel 1 to the TOUT pin on the AUXILIARY I/O
circuit block.

See HELP Unit 04 shelp5

* 6. Press the C5x VDE RUN command. This executes the code that
initializes the DSP timer to generate a 10 MHz signal.

In particular, it is the PRD (timer PeRioD register) and TCR (Timer Control
Register) that are initialized. These two registers control the timer of
the 'C50 DSP.


* 7. Adjust the oscilloscope to trigger on the TOUT waveform.

What is the frequency of the TOUT signal?

fTOUT =__________ MHz

It is the TOUT signal that is sent to the codec. The 10 MHz signal is used
by the CODEC as the master clock. The timer can, however, be
reconfigured to generate a signal with a different frequency.

* 8. Using the C5x VDE, open a View Peripheral Registers window.

* 9. Edit the PRD register to 00C7h.

* 10. Adjust the oscilloscope to trigger on the TOUT waveform.

What is the frequency of the TOUT signal?

fTOUT =__________ kHz

The timer frequency is related to the PRD and TCR registers as follows:

$$f_{TOUT} = \frac{1}{T_{MC} \times (TDDR + 1) \times (PRD + 1)}$$

where $f_{TOUT}$ is the frequency of the TOUT signal,
$T_{MC}$ is the period of the DSP master clock
(this value is 1/(20 MHz) = 50 ns),
TDDR is the decimal value of a series of bits within the TCR register, and
PRD is the decimal value of the PRD register.
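The relation can be checked numerically with a small Python sketch. The
function name and the default 20 MHz master clock argument are illustrative
choices; the arithmetic uses the master clock frequency, which is simply the
reciprocal of the 50 ns master clock period.

```python
def tout_frequency_hz(prd, tddr=0, master_clock_hz=20e6):
    """TOUT frequency for given PRD and TDDR values.

    Equivalent to 1 / (T_MC * (TDDR + 1) * (PRD + 1)) with
    T_MC = 1 / master_clock_hz.
    """
    return master_clock_hz / ((tddr + 1) * (prd + 1))


# Initial setting used in this exercise: PRD = 0001h, TDDR = 0.
print(tout_frequency_hz(prd=0x0001) / 1e6, "MHz")
```

Use the function to predict a frequency before you measure it on the
oscilloscope, then compare the two values.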


* 11. Using the C5x VDE, edit the PRD peripheral register back to 0001 h. Verify
that the TOUT signal now has a frequency of 10 MHz.

See HELP Unit 04 shelp6

* 12. Using the C5x VDE, edit the TDDR bits (found in the TCR register) to 3.

What is the frequency of the TOUT signal?

fTOUT =__________ MHz

* 13. Using the C5x VDE, edit the TCR peripheral register back to 0000 h. Verify
that the TOUT signal now has a frequency of 10 MHz.

See HELP Unit 04 shelp6

The PRD register value is related to the TOUT frequency and the TCR
register as follows:

$$PRD = \frac{1}{f_{TOUT} \times T_{MC} \times (TDDR + 1)} - 1$$

* 14. Holding the TDDR bits constant at zero and using the above equation,
what decimal value would the PRD register have to take on for the timer
to have a frequency of 1.25 MHz?

PRD =__________ d

* 15. Using the C5x VDE, edit the PRD register to 000F h (15 d). Verify that
the TOUT signal has a 1.25 MHz frequency.
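The PRD calculation can also be automated. The sketch below is a hypothetical
helper, assuming a 20 MHz master clock and a 16-bit PRD register; it rounds to
the nearest integer because only whole register values can be loaded.

```python
def prd_for_frequency(f_tout_hz, tddr=0, master_clock_hz=20e6):
    """Return the PRD value that gives the requested TOUT frequency.

    Implements PRD = 1 / (f_TOUT * T_MC * (TDDR + 1)) - 1 with
    T_MC = 1 / master_clock_hz.
    """
    divider = master_clock_hz / (f_tout_hz * (tddr + 1))
    prd = round(divider) - 1
    if not 0 <= prd <= 0xFFFF:
        raise ValueError("requested frequency is out of range for PRD")
    return prd


# The 10 MHz codec master clock corresponds to PRD = 0001h when TDDR = 0.
print(hex(prd_for_frequency(10e6)))
```

When the requested frequency does not divide the master clock evenly, the
rounded PRD gives the nearest achievable frequency rather than the exact one.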

* 16. Using the C5x VDE, return the values of the PRD and the TCR registers
to 0001 h and 0000 h respectively. The timer will then have a frequency of
10 MHz. This is the codec master clock frequency required in the next
section.

* 17. Remove the connection between the DSP circuit board and the
oscilloscope.


DSP Serial Interface and CODEC Initialization

In this procedure section, you will initialize the CODEC using the DSP serial
interface.

* 18. Using the C5x VDE, set a breakpoint at program memory address labeled
RESET.

* 19. Press the C5x VDE RUN command. This initializes the DSP serial port
and establishes communication between the DSP and the CODEC.

Using the SPC (Serial Port Control) register, the DSP transmit and receive
serial ports have been enabled and initialized to use 16-bit words. Transmit
frame synchronization has also been enabled.


* 20. Using the C5x VDE, set a breakpoint at the program memory address of
the instruction labeled LACC #6h,9.

* 21. Press the C5x VDE RUN command. This resets the CODEC.

The reset must occur before the control words initializing the codec
sampling frequency, filter cut-off frequency and other codec characteristics
can be transmitted from the DSP to the codec.

* 22. Using the C5x VDE, set a breakpoint at the program memory address
labeled RETURN.

* 23. Press the C5x VDE RUN command. The DSP then transmits the codec
control words (TA, TB, RA, RB, and AIC_CTR).


It is these words that set the CODEC sampling and cut-off frequencies, and
other characteristics.

The Voltmeter Program

In this procedure section, you will operate the voltmeter program.

* 24. Connect the OUTPUT of the DC SOURCE to the ANALOG INPUT of the
CODEC circuit block and to a voltmeter (as shown in the figure).

* 25. Execute the RUN command, found on the C5x VDE Toolbar.

This starts the voltmeter program. Note that the I/O interface displays a
value. This is the voltage reading of the signal sent from the DC SOURCE
OUTPUT.

* 26. Change the voltage level of the DC signal by turning the DC SOURCE
potentiometer.

The signal sent to the codec is sampled and transmitted via the serial port
to the DSP. The DSP then converts the value into its proportional voltage
value (the program can only read values between -3.00 V and 2.99 V) and
outputs it to the I/O INTERFACE via the parallel I/O ports.
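The conversion from a codec sample to a voltage can be sketched in Python.
The sample width (14 bits) and the two's-complement coding are assumptions
made for illustration only; the 3 V full scale is chosen so that the result
spans the -3.00 V to 2.99 V range quoted above. The actual voltmeter program
may scale its samples differently.

```python
def sample_to_volts(sample, bits=14, full_scale_volts=3.0):
    """Convert a raw two's-complement sample into a proportional voltage.

    Assumptions: 'bits'-wide two's-complement coding and a symmetric
    full-scale range of +/- full_scale_volts.
    """
    half_range = 1 << (bits - 1)
    sample &= (1 << bits) - 1
    if sample >= half_range:          # negative values in two's complement
        sample -= 1 << bits
    return sample * full_scale_volts / half_range


print(sample_to_volts(0x2000))   # most negative code
print(sample_to_volts(0x0000))   # mid-scale code
```

Because a two's-complement code has one more negative step than positive
steps, the most positive reading falls just short of the full-scale value,
which is why the upper limit is 2.99 V rather than 3.00 V.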

* 27. Note that the value read off of the voltmeter and the value read by the DSP
do not always correspond.

This is due to imprecisions within the CODEC during voltage signal
sampling.


What type of communication is used between the CODEC and the DSP on
the Digital Signal Processor circuit board?

a. parallel ports
b. bit I/O ports
c. serial ports
d. none of the above

How are words transmitted by the DSP to the I/O INTERFACE displays?

a. serial ports
b. parallel ports
c. bit I/O ports
d. interrupts

* 28. Remove the connection between the DC SOURCE circuit block and the
voltmeter.

CONCLUSION

& A DSP provides many peripherals, which are specialized for signal processing
applications.

& A DSP serial port is usually divided into two sections: a receive section and a
transmit section.

& A large number of programmable DSPs provide timers. A timer is a peripheral
that changes the content of a register at regular intervals in such a manner as
to measure time.

& CODEC is the abbreviation for coder-decoder. It is an electronic circuit that
converts analog signals into digital representations, and decodes digital signals
into analog form.

REVIEW QUESTIONS

1. Which of the following is not a DSP peripheral?

a. parallel port
b. serial port
c. central arithmetic logic unit
d. bit I/O port


2. Which of the following choices is used by a serial port interface?

a. A bit clock line.
b. A data line.
c. A frame synchronization line.
d. All of the above.

3. Which of the following is used in many DSPs as a parallel port?

a. the timer output pin
b. clock generator
c. main processor data bus
d. external user interrupt lines

4. The timer found in most DSPs, may be used as which of the following?

a. a sine wave generator
b. a source of periodic interrupts
c. a voltmeter
d. a register for storing the accumulator

5. Which of the following serial port characteristics may some DSPs permit the
programmer to configure?

a. data word length
b. bit clock line polarity
c. bit clock shift direction
d. bit- or word-length frame synchronization signals.

Exercise 4-2

Digital Signal Processing: The FIR Filter

EXERCISE OBJECTIVES

Upon completion of this exercise, you will be familiar with a common DSP
application, known as filtering.

DISCUSSION

Signals are received and transmitted by both digital and analog processing
systems. The effect of system processing on a signal can be visualized (analyzed)
in two different ways, either using the time domain or the frequency domain.

In the frequency domain a single-frequency signal is represented as a peak.


More complex, and in turn more common, signals such as the square wave can
be represented as a superposition of many harmonically related sinusoids.
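The superposition can be demonstrated with the Fourier series of an ideal
square wave, which contains only odd harmonics whose amplitudes fall off as
1/n. The Python sketch below sums the first few harmonics; the function name
and the 1 kHz fundamental are illustrative choices.

```python
import math


def square_wave_partial_sum(t, fundamental_hz, num_harmonics):
    """Approximate a unit-amplitude square wave by summing odd harmonics.

    Uses the series (4 / pi) * sum( sin(2*pi*n*f*t) / n ) for n = 1, 3, 5, ...
    """
    total = 0.0
    for k in range(num_harmonics):
        n = 2 * k + 1                       # odd harmonic number
        total += math.sin(2 * math.pi * n * fundamental_hz * t) / n
    return 4 / math.pi * total


# Value in the middle of the positive half-cycle of a 1 kHz square wave.
for harmonics in (1, 5, 50):
    print(harmonics, square_wave_partial_sum(0.25e-3, 1000, harmonics))
```

As more harmonics are added, the sum settles closer to the flat top of the
square wave.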

A signal visualized in the frequency domain has a certain spectrum. In our
example, the square wave spectrum shows that there is a large voltage (amplitude)
associated with a central frequency and then decreasing voltages associated with
side frequencies.

Electrical components can be similarly described. Resistors, capacitors, and
inductors can be expressed in both the time and frequency domains.

Certain electrical circuits thus have the effect of attenuating or amplifying the
frequency components of signals.


The fact that electrical circuits produce a gain or loss that depends on signal
frequency is exploited when creating filters.

A filter is a device which transmits signals at frequencies within one or more
frequency bands and attenuates signals at all other frequencies.

An RC circuit, shown above, acts as a filter. DC signals are transmitted
unattenuated through the circuit, while higher frequency components are greatly
attenuated.
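This behaviour can be quantified with the gain of a first-order RC low-pass
circuit. The Python sketch below assumes the output is taken across the
capacitor; the component values (1 kilohm and 100 nF) are hypothetical and do
not come from the FACET board.

```python
import math


def rc_lowpass_gain(freq_hz, r_ohms, c_farads):
    """Voltage gain (output / input amplitude) of a first-order RC low-pass."""
    return 1 / math.sqrt(1 + (2 * math.pi * freq_hz * r_ohms * c_farads) ** 2)


R = 1_000      # ohms (illustrative value)
C = 100e-9     # farads (illustrative value)

for freq in (0, 100, 1_000, 10_000, 100_000):
    print(freq, "Hz ->", round(rc_lowpass_gain(freq, R, C), 4))
```

The gain is exactly 1 at DC and falls steadily as the frequency rises, which
is the low-pass behaviour described above.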

A filter is used to, in effect, shape the spectrum of a signal.

Filtering consists of the suppression or attenuation of unwanted signal
frequencies. The operation can be implemented by analog circuits (using
different types of electrical components) or by using a DSP.


A filter is often defined, depending on the frequencies that it is meant to
pass, as either:

& low-pass
& high-pass
& band-pass

The above graphs are called Bode plots. They are also generally known as filter
frequency responses.

From a frequency response graph (Bode plot) many filter properties and
characteristics are apparent.

The pass band of a filter is defined as the range of frequencies over which signals
pass virtually unattenuated through the filter.

The cut-off frequency is the frequency at which the response drops 3 dB below
its pass-band level.
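The 3 dB figure corresponds to a simple amplitude ratio. Writing the gain in
decibels as a function of the output and input amplitudes (the symbols
$V_{out}$ and $V_{in}$ are introduced here for illustration):

```latex
G_{dB} = 20 \log_{10}\!\left(\frac{V_{out}}{V_{in}}\right)
\qquad\Rightarrow\qquad
G_{dB} = -3\ \text{dB}
\;\Longleftrightarrow\;
\frac{V_{out}}{V_{in}} = 10^{-3/20} \approx 0.708 \approx \frac{1}{\sqrt{2}}
```

At the cut-off frequency the output amplitude is therefore about 71 % of its
pass-band value, which corresponds to half of the pass-band power.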

The transition region is the area between the pass band and the stop band. A
steep change of gain with frequency within this region (that is, a narrow
transition region) usually improves filter performance, depending on the
specific application.

The stop band is defined as the range of frequencies over which signals passing
through the circuit are attenuated. The level of attenuation is dependent on
the filter design specifications.

A filter has many other characteristics; however, describing all of them
would require a more in-depth discussion.

For the moment, it is important to remember that a filter creates a variation of signal
gain with frequency.


As stated, a filter takes advantage of the frequency domain representation of
electronic circuit components (such as the capacitor and inductor). However,
filtering can also be performed by digital signal processors.

The effects, once produced only by analog circuits, can be represented
mathematically and performed efficiently by processors. Until recently, the
most common application for the DSP was the filter (the digital filter).

DSPs are preferred over analog circuits when implementing such things as filters.
This is because digitizing any design ensures that the same results can be
reproduced time and time again.

Digital filters are implemented as a summation of the products of filter
coefficients (Ai) and data samples (Sj).

The coefficients are stored in memory and they represent the filter frequency
response information. Usually, the more coefficients used to represent a filter the
smaller the transition region of the filter.

The data samples represent the input signal.
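The sum of products can be written out directly in Python. This is a generic
direct-form FIR sketch, not the program run on the DSP; the function name and
the two-tap averaging coefficients are illustrative.

```python
def fir_filter(coefficients, samples):
    """Filter 'samples' with an FIR filter defined by 'coefficients'.

    Each output is sum(A[i] * S[n - i]) over all coefficients, where
    S[n - i] is the input sample received i steps earlier.
    """
    history = [0.0] * len(coefficients)   # most recent sample first
    output = []
    for sample in samples:
        history = [sample] + history[:-1]           # shift in the new sample
        output.append(sum(a * s for a, s in zip(coefficients, history)))
    return output


# A two-tap moving average smooths a step input.
print(fir_filter([0.5, 0.5], [1, 1, 1]))

# Feeding a single impulse returns the coefficients themselves.
print(fir_filter([1, 2, 3], [1, 0, 0, 0]))
```

The second call shows why the coefficients are said to hold the filter
response: they are the output produced by a single unit impulse.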

The operation shown above can be executed with the aid of the Multiply and
ACcumulate (MAC) instruction, found on nearly all DSPs. Traditionally, DSPs
have been specifically enhanced to implement filter operations in real-time.

The MAC instruction was and still is the cornerstone of the filter operation.

RPT #79
MACD #C0,*-
APAC

The TMS320C50 DSP requires only three source code instructions to properly
implement the mathematical part of a filter algorithm.


The RPT source statement instructs the DSP to loop the following instruction 80
times (79+1).

The MACD instruction multiplies the corresponding filter coefficient C with an
indirectly addressed data point which has been stored in memory.

The APAC source statement finishes the calculation of the filtered signal sample
by adding the accumulator and product register together.
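The way the three instructions cooperate can be followed in a Python
emulation. It is a behavioural sketch under stated assumptions: the
accumulator and product register are taken to be cleared beforehand, the
coefficient table is assumed to be ordered so that the first coefficient
fetched pairs with the oldest sample, and product shifting is ignored. The
function and variable names are hypothetical.

```python
def macd_fir_pass(coefficients, delay_line):
    """Emulate RPT #(N-1) / MACD / APAC for an N-tap filter.

    delay_line[0] holds the newest sample and delay_line[N-1] the oldest.
    One spare slot at index N receives the oldest sample during data move.
    """
    n = len(coefficients)
    acc = 0                # accumulator (assumed cleared beforehand)
    preg = 0               # product register (assumed cleared beforehand)
    address = n - 1        # auxiliary register points at the oldest sample
    for k in range(n):     # RPT #(n-1): the next instruction runs n times
        acc += preg                                     # add previous product
        preg = coefficients[k] * delay_line[address]    # new product
        delay_line[address + 1] = delay_line[address]   # data move (D in MACD)
        address -= 1                                    # "*-" post-decrement
    acc += preg            # APAC: add the final product
    return acc


samples = [10, 20, 30, 0]            # newest ... oldest, plus one spare slot
print(macd_fir_pass([1, 2, 3], samples))
print(samples)                        # delay line shifted by one position
```

Each pass through the loop adds the product computed in the previous pass,
which is why a final APAC is needed. The data move leaves the delay line
shifted, ready for the next input sample to be written at index 0.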

PROCEDURE

Running the Filter Program

In this procedure section, you will operate the filtering program.

* 1. Make the following connections to the Digital Signal Processor circuit
board.

& Connect a function generator to the CODEC circuit block ANALOG
INPUT and to oscilloscope CHANNEL 1.

& Connect the CODEC circuit block ANALOG OUTPUT to oscilloscope
CHANNEL 2. (Make sure that the connections between the
oscilloscope and CODEC circuit block have a common ground.)

* 2. Adjust the function generator to output a sinusoidal signal at 1500 Hz
with an amplitude of 4.0 V peak-to-peak.

Note: Before using the C5x VDE please make certain the circuit
board power source is turned ON, and that the serial connection
is present between the host computer and the DIGITAL SIGNAL
PROCESSOR circuit block labeled SERIAL PORT.


* 3. Open the C5x VDE and load the ex4_2.dsk filter program into the DSP.

* 4. Execute the RUN command found on the C5x VDE Toolbar.

* 5. Adjust the oscilloscope to display both signals (CH1 and CH2) on the
oscilloscope, and trigger on the generated signal (CH1).

Note that the CH2 signal output from the DSP circuit board has a smaller
amplitude than the signal input into the DSP circuit board (the generated
signal, CH1).

* 6. Slowly vary the generated signal frequency between 200 Hz and 9000 Hz.
Observe the effect of the frequency variation on the amplitude of the DSP
filter output (oscilloscope CH2).

From your observations, what type of filter is the DSP implementing?

a. a high-pass filter
b. a comb filter
c. a band-pass filter
d. a low-pass filter

Band-Pass Frequency Response

In this procedure section, you will measure the frequency response of a band-pass
filter.


* 7. Using graph paper and a pencil, or a spreadsheet program, draw (plot) the
frequency response, between 500 Hz and 2500 Hz, of the filter.

Measure the amplitude (voltage) of the filtered output signal every 100 Hz,
using a voltmeter. Begin at 500 Hz and end at 2500 Hz. Record the data
sets and then plot the data points.

The frequency response that you have drawn is similar to the above plot.
In this plot the logarithmic decibel (dB) amplitude scale is used. The
decibel scale highlights the filter gains and losses.
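To plot your own measurements on a decibel scale, each output amplitude can
be converted relative to the input amplitude. The Python sketch below shows
the conversion; the measurement values in it are made-up placeholders, not
expected results.

```python
import math


def gain_db(v_out, v_in):
    """Gain in decibels from output and input amplitudes (same units)."""
    return 20 * math.log10(v_out / v_in)


V_IN = 4.0  # input amplitude in volts peak-to-peak (function generator setting)

# Placeholder (frequency in Hz, measured output amplitude in volts) pairs.
measurements = [(500, 0.04), (1000, 0.4), (1500, 4.0)]

for freq, v_out in measurements:
    print(freq, "Hz:", round(gain_db(v_out, V_IN), 1), "dB")
```

Every factor of ten in amplitude corresponds to 20 dB, so small outputs that
are hard to distinguish on a linear voltage plot remain clearly visible on
the decibel plot.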

* 8. Execute the HALT command found on the C5x VDE Toolbar.


* 9. Execute the Load Data command found under the C5x VDE File pull-down
menu. Load the low_pass.dat file into the DSP with the options shown
above.

This data file contains new coefficients describing a low-pass filter.

* 10. Press the C5x VDE RUN command. Vary the frequency of the generated
signal. Observe the frequency response of the filter.

* 11. Disconnect the oscilloscope, voltmeter, and function generator from the
DSP circuit board and end the C5x VDE session.

CONCLUSION

& An electrical signal can be represented as a superposition of numerous
different frequencies called harmonics.

& Filters can be divided into three different types: low-pass, band-pass,
and high-pass filters.

& Filtering consists of the attenuation of unwanted signal frequencies. The
operation can be implemented by analog circuits or by using a DSP.

& The Multiply and ACcumulate instruction, MAC, is the cornerstone of the
DSP-implemented filter.


REVIEW QUESTIONS

1. In which of the following domains is the spectrum of a signal visualized?

a. phase domain
b. frequency domain
c. time domain
d. spectrum domain

2. Which of the following types of filters attenuates all signal frequencies above
a certain range and passes DC signals unattenuated?

a. low-pass filter
b. comb filter
c. band-pass filter
d. high-pass filter

3. A filter creates a variation of ... ?

a. signal gain
b. signal gain with frequency
c. signal frequency
d. signal frequency with gain

4. Which of the following DSP instructions is the cornerstone of the
implementation of a real-time digital filter?

a. Multiply a data memory address with a value (MPY)
b. Add Product register to the ACcumulator (APAC)
c. Single-instruction hardware loop (RPT)
d. Multiply and ACcumulate (MAC)

5. Why are DSP implemented filters preferred over analog filtering circuits?

a. Digitizing a filter ensures reproducible results.
b. Analog filter circuits do not exist.
c. DSP filters can be used on all types of signals.
d. All of the above.

Unit Test

1. Which of the following choices is not a specialized peripheral used for digital
signal processing?

a. asynchronous serial port
b. synchronous serial port
c. on-chip A/D, D/A converters
d. timer

2. A DSP serial port can transmit 16-bit words. How many lines are required to
establish serial port communication between the DSP and an off-chip device?

a. 16
b. 17
c. 3
d. 18

3. A DSP parallel port can transmit 4-bit words. How many lines are required to
establish parallel port communication between the DSP and an off-chip device?

a. 17
b. 4
c. 3
d. 5

4. How is actual communication between a digital signal processor and its off-chip
devices made?

a. via its input and output pins
b. via the CALU
c. via the codec
d. via the serial port interface

5. What is an application of a serial port interface optimized for time-division
multiplexing (TDM)?

a. Managing transmission of serial signals.
b. Managing reception of serial signals.
c. Managing communication errors.
d. Managing communication between multiple processors.

6. An electrical signal can be represented by...?

a. a Bode plot.
b. a superposition of numerous different frequency responses.
c. a band pass filter.
d. none of the above.


7. A Bode plot is used to visualize the way that a filter modifies a signal. In what
domain is a Bode plot drawn?

a. frequency domain
b. time domain
c. phase domain
d. voltage domain

8. What is the effect produced by a filter?

a. frequency variation with time of the input signal
b. frequency variation of the input signal
c. amplitude variation of the input signal
d. amplitude variation with frequency of the input signal

9. What is the difference between the peripherals offered by a DSP and those
offered by a general-purpose processor?

a. There is no difference between these peripherals.
b. DSP peripherals are specialized for real-time signal-processing.
c. General-purpose processors do not have peripherals.
d. None of the above.

10. What within a DSP controls the status of DSP peripherals?

a. central arithmetic logic unit
b. Program Controller
c. status and control registers
d. external user interrupts

Appendix A
Help Pages

Help Unit 01 shelp1

Help Unit 01 shelp2


Help Unit 01 shelp3

Help Unit 01 shelp4


Help Unit 02 shelp1

ASSEMBLER
MNEMONIC     DESCRIPTION

ABS          Absolute value of ACC

ADD          Addition: ACC + (data memory value)

ADDB         Addition: ACCB + ACC

AND          AND data memory value with ACCL (ACC low bits)

ANDB         AND ACCB with ACC

NEG          Negate the 2s-complement ACC contents

SBB          Subtract ACCB from ACC

For more information about the different assembler mnemonics that execute ALU
operations, please refer to C:\LV31946\DOC\TMS320C5x_UsersGuide.pdf.

Help Unit 02 shelp2

Within the TMS320C50 the Sign eXtension Mode bit (SXM) is found in status and
control register ST1.

SXM = 0      Sign Extension of ALU arithmetic operations is suppressed.

SXM = 1      Sign Extension is produced on data sent to the accumulator.
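The effect of the SXM bit on a 16-bit value entering the 32-bit accumulator
can be sketched in Python. The function name is hypothetical and the model
covers only an unshifted load.

```python
def load_to_acc(word16, sxm):
    """Return the 32-bit accumulator value after loading a 16-bit word.

    With sxm = 1 the sign bit (bit 15) is copied into the upper 16 bits;
    with sxm = 0 the upper 16 bits are filled with zeros.
    """
    word16 &= 0xFFFF
    if sxm and (word16 & 0x8000):
        return word16 | 0xFFFF0000
    return word16


print(hex(load_to_acc(0x8001, sxm=1)))   # treated as a negative number
print(hex(load_to_acc(0x8001, sxm=0)))   # treated as an unsigned number
```

The same 16-bit pattern therefore represents two different 32-bit values,
depending on how SXM is set when the load takes place.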


Help Unit 02 shelp3

The Data Bus has a native word width of 16 bits; it cannot transmit information
that has more than 16 bits.

The ACCumulator of the 'C50 is 32 bits wide and is split into two 16-bit segments:
the ACCumulator High (ACCH) bits and the ACCumulator Low (ACCL) bits.

The segments are used when the ACCumulator is sent to the next stage of
processing via the 16-bit Data Bus.
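Splitting the accumulator into its two segments is a matter of shifting and
masking, as the following Python sketch shows (the function name is
illustrative).

```python
def split_acc(acc32):
    """Split a 32-bit accumulator value into (ACCH, ACCL) 16-bit segments."""
    acc32 &= 0xFFFFFFFF
    acch = (acc32 >> 16) & 0xFFFF   # upper 16 bits
    accl = acc32 & 0xFFFF           # lower 16 bits
    return acch, accl


acch, accl = split_acc(0x12345678)
print(hex(acch), hex(accl))
```

Each segment fits on the 16-bit Data Bus, so storing the full accumulator
takes two transfers.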

Help Unit 02 shelp4

ASSEMBLER
MNEMONIC     DESCRIPTION

LT           Load data memory value to TREG0

MPY          Multiplication: TREG0 × (data memory value)

MPYU         Unsigned Multiplication: TREG0 × (data memory value)

SQRA         Add PREG to ACC (PREG + ACC = ACC),
             load data memory value to TREG0,
             square TREG0 and store in PREG

ZPR          Zero the contents of the PREG

For more information about the different assembler mnemonics that execute
multiplier operations, please refer to
C:\LV31946\DOC\TMS320C5x_UsersGuide.pdf.


Help Unit 02 shelp5

The 'C50 has an unsigned multiplication instruction, MPYU, that when executed
does not sign extend the result of the multiplication.

Help Unit 02 shelp6

Within the TMS320C50 the Overflow Saturation Mode bits are found in status and
control register ST0.

OV = 0 Overflow did not occur in the ALU.

OV = 1 Overflow did occur in the ALU.

OVM = 0 Overflow saturation mode disabled.

OVM = 1 Overflow saturation mode enabled.
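The difference between the two overflow modes can be sketched in Python for a
32-bit signed addition. The function name is hypothetical, and the overflow
flag returned here describes only the single operation shown.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000


def alu_add(acc, operand, ovm):
    """Add two signed 32-bit values; return (result, overflow flag).

    ovm = 1: on overflow the result saturates to the largest positive or
             negative 32-bit value.
    ovm = 0: on overflow the result wraps around (normal 2s-complement).
    """
    result = acc + operand
    overflow = result > INT32_MAX or result < INT32_MIN
    if overflow:
        if ovm:
            result = INT32_MAX if result > 0 else INT32_MIN
        else:
            result = ((result + 0x80000000) & 0xFFFFFFFF) - 0x80000000
    return result, int(overflow)


print(alu_add(INT32_MAX, 1, ovm=1))   # saturates at the positive limit
print(alu_add(INT32_MAX, 1, ovm=0))   # wraps to the negative limit
```

Saturation keeps an overflowing signal at its limit instead of letting it
jump to the opposite sign, which is usually the less harmful error in signal
processing.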


Help Unit 02 shelp7

The TMS320C50 has a product shifter. The shifter has four product shift modes
(PM) that are controlled by a 2-bit field in status register ST1.

PM = 00 b PREG output is not shifted.

PM = 01 b PREG output is left shifted by 1 bit.

PM = 10 b PREG output is left shifted by 4 bits.

PM = 11 b PREG output is right shifted by 6 bits.
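The four modes can be expressed as a small Python function. It is a sketch
only: the function name is hypothetical, the right shift is assumed to
preserve the sign, and the result is not truncated to 32 bits.

```python
def shift_product(preg, pm):
    """Apply the product shift mode selected by the 2-bit PM field."""
    if pm == 0b00:
        return preg            # no shift
    if pm == 0b01:
        return preg << 1       # left shift by 1 bit
    if pm == 0b10:
        return preg << 4       # left shift by 4 bits
    if pm == 0b11:
        return preg >> 6       # right shift by 6 bits (sign preserved)
    raise ValueError("PM is a 2-bit field")


for mode in (0b00, 0b01, 0b10, 0b11):
    print(format(mode, "02b"), shift_product(640, mode))
```

The shift is applied to the PREG output on its way to the ALU; the content of
the product register itself is not changed.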


Help Unit 02 shelp8

Within the TMS320C50, the carry bit is found in status and control register ST1.

C=0          A carry was not generated by the ALU when adding.
             A borrow was generated by the ALU when subtracting.

C=1          A carry was generated by the ALU when adding.
             A borrow was not generated by the ALU when subtracting.
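The meaning of the carry bit for additions and subtractions can be modelled
on unsigned 32-bit values, as in the following Python sketch (the function
names are illustrative).

```python
MASK32 = 0xFFFFFFFF


def add_with_carry_flag(a, b):
    """Return (32-bit sum, C). C = 1 when the addition produced a carry."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, int(total > MASK32)


def sub_with_carry_flag(a, b):
    """Return (32-bit difference, C). C = 0 when a borrow was needed."""
    a &= MASK32
    b &= MASK32
    return (a - b) & MASK32, int(a >= b)


print(add_with_carry_flag(0xFFFFFFFF, 1))   # carry out of bit 31
print(sub_with_carry_flag(3, 5))            # borrow required
```

Note that the sense of the bit is inverted for subtraction: C = 1 means that
no borrow took place.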

Help Unit 02 shelp9

The status and control register ST0 contains a 3-bit field known as ARP, Auxiliary
Register Pointer. The ARP field assigns the current auxiliary register to the AGU.
The AGU uses the auxiliary register when indirect or circular addressing must be
executed.


Help Unit 02 shelp10

The current auxiliary register is selected by the Auxiliary Register Pointer
(ARP). One of three methods within the 'C50 Assembly language can be used to
load the contents of the ARP field (3 bits wide).

A first option is to use the LST instruction. It is specifically designed to modify the
content of STatus registers ST0 or ST1. The ARP field is located within ST0.

The ARP field can also be loaded by either using the MAR instruction or by
specifying an ARP update within the indirect addressing field of any instruction
supporting indirect addressing.

For more information, refer to the TMS320C50 DSP User’s Guide
(C:\LV31946\TMS320C5x_UsersGuide.pdf).

Help Unit 02 shelp11

The 'C50 assembly language offers only one way of loading an auxiliary register
and that is with the LAR (Load Auxiliary Register) instruction.

An auxiliary register must be loaded with a memory address before being used for
indirect addressing.

Once loaded with a value, the content of the AR can be changed in one of four
ways: by using the ADRK, MAR, or SBRK instructions, or by specifying an
increment or decrement within the indirect addressing field of any instruction
supporting indirect addressing.

For more information, refer to the TMS320C50 DSP User’s Guide
(C:\LV31946\TMS320C5x_UsersGuide.pdf).


Help Unit 02 shelp12

The status and control register ST0 has a 9-bit field known as DP. DP is used as
a Data Page pointer. Data memory is divided up into 512 data pages. The DP
points to the current data page. When direct addressing is used the 9 DP bits serve
as the 9 MSBs of the data address fetched.
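The way a direct address is assembled can be sketched in Python. With 9 bits
taken from DP, the remaining 7 bits of the 16-bit data address come from the
instruction word, so each data page holds 128 words. The function name is
illustrative.

```python
def direct_address(dp, offset7):
    """Form a 16-bit data memory address from DP and a 7-bit offset.

    dp      : 9-bit data page pointer (0 to 511), the 9 MSBs
    offset7 : 7-bit offset supplied by the instruction, the 7 LSBs
    """
    return ((dp & 0x1FF) << 7) | (offset7 & 0x7F)


# Page 0, offset 20h: the address at which DRR is memory-mapped.
print(hex(direct_address(dp=0, offset7=0x20)))

# Page 1 starts 128 words after page 0.
print(hex(direct_address(dp=1, offset7=0x00)))
```

Because the offset alone cannot reach outside the current page, DP must hold
the correct page number before a direct-addressed instruction is executed.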

Help Unit 02 shelp13

CENBX Circular buffer X enable bit

CARX 3 bits which select the auxiliary register that is assigned to
circular buffer X.


Help Unit 02 shelp14

Help Unit 02 shelp15

SACL *+, 0, AR0 ADD *+, 0, AR0

Within the left highlight are indirect addressing symbols indicating to the ARAU to
increment AR0 by one after the instruction is executed.

Within the right highlight are the auxiliary registers used by the instructions
and pointing towards the operand data memory addresses.

Help Unit 02 shelp16


Help Unit 03 shelp2

An assembler source file for the TMS320C50 DSP has a certain format that must
be respected.

All comments (except those to the right of source statements) and labels must be
written at the beginning of a line.

All source statements must have a spacing (a tab) between the instruction and the
beginning of the line.

As well, a spacing must be present between the instruction and the operand.

Help Unit 03 shelp3

If the assembler detects errors in your source file, do the following:

Make certain your assembler file does not have a comment or source statement
that has been badly written. Verify the requirements for correct assembly as listed
below.

Make certain that you have not replaced a CODE#X label by an incorrect source
statement.


An assembler source file for the TMS320C50 DSP has a certain format that must
be respected.

All comments (except those to the right of source statements) and labels must be
written at the beginning of a line.

Every source statement must have white space (a tab) between the beginning of the
line and the instruction.

As well, white space must be present between the instruction and the operand.

Help Unit 03 shelp4

The RPT instruction executes a single-instruction hardware loop. The instruction
following the RPT instruction is then executed a total of N+1 times, where N is the
operand for RPT.
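
The N+1 count can be illustrated with a small behavioral model. This is a
sketch only: the function run_rpt and its countdown variable are hypothetical
stand-ins for the hardware repeat counter, not a description of the actual
'C50 circuitry.

```python
def run_rpt(n, instruction):
    """Model RPT n: run `instruction` n + 1 times and return the count."""
    rptc = n          # repeat counter loaded with the RPT operand
    executions = 0
    while True:
        instruction()             # the instruction following RPT
        executions += 1
        if rptc == 0:             # counter exhausted: leave the loop
            break
        rptc -= 1
    return executions


samples_added = []
count = run_rpt(15, lambda: samples_added.append(1))
print(count)  # prints 16
```

An operand of 15 therefore gives 16 executions, and an operand of 0 still
executes the following instruction once.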

Help Unit 03 shelp5

ASSEMBLER       DESCRIPTION
MNEMONIC

B               Branch unconditionally to pma.

CALL,           Call to subroutine. Save the PC to the hardware stack and
an interrupt    load it with the first pma of the subroutine.
is flagged

RET, RETE       Return from subroutine. Load the PC with the pma saved to
                the stack before the subroutine or interrupt call.

RPT             Repeat next instruction. The PC remains constant until the
                repeat counter register (RPTC, the repeat index) reaches 0.


For more information about the different mnemonics that modify the Program
Counter register, please refer to C:\LV31946\TMS320C5x_UsersGuide.pdf.

Help Unit 03 shelp6

ASSEMBLER       DESCRIPTION
MNEMONIC

RPT             Executes single-instruction hardware looping of the following
                instruction for an operand-specified number of times.

RPTB            Executes multiple-instruction hardware looping of an instruction
                block (must contain a minimum of three instructions) for an
                operand-specified number of times.

RPTZ            Similar to RPT; however, the ACC and PREG are first cleared.

For more information about the different types of 'C50 repeat instructions, please
refer to C:\LV31946\TMS320C5x_UsersGuide.pdf.

Help Unit 03 shelp7

ASSEMBLER       DESCRIPTION
MNEMONIC

BLDD            Block move from data to data memory.

IN              Input data from I/O port to dma.

MAC             Add PREG, with shift specified by PM bits, to ACC; load dma value
                to TREG0; multiply data memory value by program memory value
                and store the result in the PREG; and move data.

Multi-cycle instructions transformed into single-cycle instructions with the repeat
function.

For more information about the different types of 'C50 repeat instructions, please
refer to C:\LV31946\TMS320C5x_UsersGuide.pdf.
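
As an illustration of a repeated MAC, the sketch below follows the sequence
given in the table: the previous product is added to the accumulator, the data
value is loaded, and a new product is formed. It is a simplified model under
stated assumptions: the product shift (PM) is taken as zero, the data move is
omitted, and one last accumulation is performed after the loop to pick up the
final product. The names repeated_mac, coeffs and samples are hypothetical.

```python
def repeated_mac(coeffs, samples):
    """Model a repeated multiply-accumulate over two equal-length lists."""
    acc = 0    # accumulator (ACC)
    preg = 0   # product register (PREG)
    for coeff, sample in zip(coeffs, samples):
        acc += preg            # add the previous product to ACC (PM shift = 0)
        treg0 = sample         # load the data memory value into TREG0
        preg = treg0 * coeff   # data value times program memory value
    acc += preg                # assumed final step: add the last product
    return acc


print(repeated_mac([1, 2, 3], [4, 5, 6]))  # prints 32
```

Each pass through the loop corresponds to one repetition of the instruction,
which is why a sum of products (as used in an FIR filter) maps well onto the
repeat function.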


Help Unit 03 shelp8

Help Unit 03 shelp9


Help Unit 03 shelp10

Help Unit 03 shelp11


Help Unit 03 shelp12

Help Unit 03 shelp13

Use the TRANSMIT ISR as an example. The TRANSMIT ISR already has the
correct label.

Help Unit 03 shelp14

The Sine Wave Generator

Recall that the Sine Wave Generator uses a data table containing 1924 values. The
data table represents one period of a sine wave. When the data table values are
sent to the CODEC at a rate equal to the sampling frequency (Fs = 19380 Hz), a
sine wave signal of approximately 10 Hz ((19380 Hz)/1924) is generated. To increase
the frequency of the generated sine wave, the program sends every second, third,
fourth or nth value of the data table to the CODEC. A data table pointer is used to
easily modify the frequency of the generated signal. The frequency of the generated
signal can only be a multiple of approximately 10 Hz.

This method of generating a sine wave is fast and relatively simple. It is, however,
not very accurate, and it requires a significant amount of DSP memory to
store the wavetable.
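
The table-stepping method can be sketched as follows. This is an illustrative
model rather than the trainer's program: the table is built with floating-point
values for readability, and the names SINE_TABLE, generate and
output_frequency are hypothetical.

```python
import math

TABLE_SIZE = 1924   # number of values in the data table (one sine period)
FS = 19380          # sampling frequency in Hz

# One period of a sine wave, as in the data table described above.
SINE_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]


def generate(step, num_samples):
    """Return num_samples values, taking every step-th entry of the table."""
    pointer = 0     # data table pointer
    output = []
    for _ in range(num_samples):
        output.append(SINE_TABLE[pointer])
        pointer = (pointer + step) % TABLE_SIZE   # wrap at the end of the table
    return output


def output_frequency(step):
    """Frequency in Hz of the generated sine wave for a given step."""
    return step * FS / TABLE_SIZE


print(round(output_frequency(1), 2))    # prints 10.07
print(round(output_frequency(100), 1))  # prints 1007.3
```

Because the step is an integer, only multiples of the base frequency
(19380 Hz / 1924) can be produced, which is the limitation noted above.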


Help Unit 04 shelp1

The 'C50 SPC register permits the programmer to change serial port characteristics
such as the following:

TXM configures whether the transmit frame sync line acts as an input or an output.

FSM specifies whether frame synchronization pulses are required after the initial
frame sync pulse for serial port operation.

FO specifies the word length of the serial port transmitter and receiver.

Help Unit 04 shelp2


The 'C50 has 64K parallel I/O ports. Sixteen of the 64K I/O ports are memory-mapped
in data page 0 (50 h - 5F h). You can access the I/O ports using the IN and OUT
instructions; the memory-mapped ports can also be accessed by any instruction that
reads or writes a word at a location in data memory space.

Help Unit 04 shelp3

CLKMD1   CLKMD2   Mode

0        0        Internal oscillator disabled.
                  External input signal divided by 2.

0        1        Reserved for testing.

1        0        PLL enabled, multiply-by-1 of external
                  input signal.

1        1        Internal oscillator enabled.
                  External input signal divided by 2.

The mode of the clock generator is selected by the logic levels applied to the
CLKMD1 and CLKMD2 external 'C50 package pins.

Help Unit 04 shelp4

Timer operation is controlled via the 'C50 DSP Timer Control Register (TCR).


Help Unit 04 shelp5

Help Unit 04 shelp6

The TDDR bits are located in the TCR (Timer Control Register).

The TDDR bits and the value of the PRD register set the DSP timer frequency.
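
The relationship is not spelled out on this help page. The sketch below assumes
the formula given in the TMS320C5x user's guide, in which the timer rate equals
the CLKOUT1 frequency divided by (TDDR + 1) × (PRD + 1). The function name and
the 20 MHz clock used in the example are hypothetical.

```python
def timer_frequency(clkout1_hz, tddr, prd):
    """Timer rate in Hz, assuming f = CLKOUT1 / ((TDDR + 1) * (PRD + 1))."""
    if not 0 <= tddr <= 0xF:
        raise ValueError("TDDR is assumed to be a 4-bit field (0 to 15)")
    if not 0 <= prd <= 0xFFFF:
        raise ValueError("PRD is assumed to be a 16-bit register (0 to 65535)")
    return clkout1_hz / ((tddr + 1) * (prd + 1))


# Hypothetical 20 MHz CLKOUT1, no prescaling, period register set to 999
print(timer_frequency(20_000_000, 0, 999))  # prints 20000.0
```

Raising either TDDR or PRD lowers the timer frequency; TDDR acts as the
prescaler and PRD as the main period count.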

Appendix B
New Terms and Words
access – To access memory is the action of reading the value held within a certain
memory location or of storing a value to a certain memory location.

ADD *+, 0, AR0 – The 'C50 ADD instruction is used to add the operand to the
accumulator. In this case, the ADD instruction is being repeated, and is indirectly
addressing the 16 most-recently received samples. This in effect adds the samples
together (an operation required by the averaging process).

addressing modes – An addressing mode is one of a set of methods used for
specifying the operand(s) of a machine code instruction. An addressing mode
describes to the processor the method that it will use for storing and retrieving data
from memory.

allocation – To allocate memory is to associate a specific address of the data or
program bus with a memory storage space.

Analog to Digital and Digital to Analog converters – Functional units that
respectively convert data from an analog representation to a digital representation,
and from digital pulses to analog signals.

ANTI-ALIASING FILTER – Low-pass filter designed to remove, from the input
signal, the high-frequency components that degrade the analog-to-digital conversion
of the output signal.

ARAU – ARAU stands for Auxiliary Register Arithmetic Unit. It is the unit within a
DSP that is responsible for addressing.

architecture – Architecture is a term applied to the overall structure and the logical
interrelationships of the components of a processor (or of a computer, a network)
and its software. Processor architecture can be divided into five fundamental
components: input/output, storage, communication, control, and processing.

Arithmetic Logic Unit (ALU) – That part of a processor that performs arithmetic
(addition, subtraction) and logic (AND, OR, ...) operations.

assembler – A program that converts, for execution, symbolic instructions
(mnemonics) into machine code.

Auxiliary Register Arithmetic Unit – An Auxiliary Register Arithmetic Unit (or
ARAU) is the name given by Texas Instruments, the developers of the TMS320C50
DSP, to the AGU (address generation unit) of the DSP. This is common practice;
many DSP developers give different names to the units (like the AGU) inside their
DSPs.

B – The assembler mnemonic for the TMS320C50 DSP branch instruction.

BCND – The assembler mnemonic for the TMS320C50 DSP branch conditional
instruction.


binary point – The character, in binary notation, that separates the integral part of
a numerical expression from its fractional part.

bit clock – The signal that the serial interface uses to determine when data bits are
valid.

bit I/O ports – An I/O port in which each bit can be individually configured as an
input or an output and in which each bit can be independently read or written. A
processor must poll the bit I/O port to determine if input values have changed.

branch – An instruction that results in a change of processor execution-flow to
continue execution at a new address.

breakpoints – Breakpoints are used to correct or debug programs. A breakpoint
is a place in a computer program, usually an instruction, where the execution of the
program is interrupted.

bus – A bus is a transmission path for the signals sent between processor devices.

C language – A general purpose programming language that produces code
independent of the type of microprocessor it is developed for.

CALL – The assembler mnemonic for the TMS320C50 DSP subroutine call
instruction.

circular addressing – An addressing mode in which the contents of a register are
used to cycle through a range of addresses, creating a circular memory buffer.
Circular addressing is also known as modulo addressing.

circular buffers – A section of memory used as a buffer and that appears to wrap
around on itself. Circular buffers are typically implemented in software on
conventional processors and via modulo (circular) addressing on DSPs.
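
A software circular buffer of the kind used on conventional processors can be
sketched as below. The class and attribute names are hypothetical; the index
plays the role that an auxiliary register plays when a DSP performs the
wrap-around in hardware.

```python
class CircularBuffer:
    """Fixed-size buffer whose write position wraps around on itself."""

    def __init__(self, size):
        self.data = [0] * size
        self.index = 0   # next location to write (the "pointer")

    def write(self, value):
        self.data[self.index] = value
        # Modulo addressing: after the last location, return to the first.
        self.index = (self.index + 1) % len(self.data)


buf = CircularBuffer(4)
for sample in range(1, 7):   # write six samples into a four-word buffer
    buf.write(sample)
print(buf.data)   # prints [5, 6, 3, 4]
print(buf.index)  # prints 2
```

The fifth and sixth samples overwrite the two oldest ones, so the buffer always
holds the most recent values without any data being moved.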

clock cycle rates – Synonymous with processor cycle rate, it usually refers to the
rate at which the DSP system performs its most basic unit of work.

clock line polarity – Line polarity is a characteristic of the bit clock of a serial port
interface. Clock line polarity describes which edge of the clock signal controls when
data changes (that is, when data bit N gives way to data bit N+1).

Clock generators – An instrument or device designed to generate pulses that
control the timing of the switching circuits in a microprocessor. Clock frequency is
one determinant (but not the only one) of the data flow or manipulation rate of a
processor.

code – A piece of programming text found in a programming language.

CODEC – The abbreviation for CODer-DECoder. It is an electronic circuit that
converts analog signals into digital representations, and decodes digital signals into
analog form.


comment – The portion of a source statement that documents or improves the
readability of a source file. Comments are not compiled, assembled, or linked; they
have no effect on the object file.

companding – Companding stands for compressing-expanding. The operation is
often implemented such that high amplitude signal values are compressed and
small amplitude signal values are enhanced (expanded). Companding has the
effect of reducing the dynamic range of a signal, permitting a precise sampling
with fewer bits.
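
As an illustration only, the sketch below uses the mu-law curve, one widely
used companding law. The manual does not state which law, if any, the trainer's
CODEC applies, so the choice of mu-law with MU = 255 and the function names are
assumptions. Signal values are normalized to the range -1 to +1.

```python
import math

MU = 255  # compression parameter (assumed; typical for mu-law)


def compress(x):
    """Compress a normalized sample: small amplitudes are boosted."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def expand(y):
    """Inverse of compress(): recover the original normalized sample."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


small = compress(0.01)            # a small amplitude is strongly enhanced
restored = expand(compress(0.3))  # expanding undoes the compression
print(small > 0.01, abs(restored - 0.3) < 1e-9)  # prints True True
```

After compression, low-level and high-level samples occupy a narrower range, so
a converter with fewer bits can represent both with acceptable precision.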

compiler – A program that converts a high-level language into a low-level machine
language.

conditional blocks – A block of code that is only assembled if a certain conditional
statement is true.

conditional processing – A method of processing one block of source code or an
alternate block of source code according to the evaluation of a specified
expression.

context save – A save and/or restore of system status (status registers,
accumulator, product register, temporary registers, hardware stack, auxiliary
registers, etc.) when the device enters and/or exits a subroutine such as an interrupt
service routine.

CPU – The CPU, Central Processing Unit, is that portion of the processor involved
in arithmetic, shifting, and Boolean logic operations, as well as the generation of
data- and program-memory addresses.

data bit signals – The data bit signal lines are used by the parallel port interface
to transmit and receive bits in parallel. The signal line indicates the state of a bit
(either high or low).

data buffers – A data buffer is a section of memory that is used to store data. The
data arrives from an off-chip source (such as a CODEC) or from a previous
computation. It is held in the buffer until the processor is ready to process the data.

data signal – The signal used by a serial interface to transmit the state of a bit
(either high or low).

data-stationary coding – Data-stationary coding is a more natural way of viewing
an algorithm. In data-stationary coding, a single instruction specifies all of the
operations performed on a set of operands from memory. In other words, the
instruction specifies what happens to the data, rather than, in the case of
time-stationary coding, specifying what happens at a particular time in the hardware.

Decode – The second pipeline stage found in many DSPs including the
TMS320C50. The instruction word is decoded (i.e., it is determined what the
instruction is supposed to do) and address generation as well as ARAU updates of
auxiliary registers are performed.


Digital – Pertaining to data represented by numbers. A digital signal is not
continuous: it does not have a numerical value associated with every point in time,
and it has discrete amplitudes.

Direct addressing – A type of addressing that encodes the operand address within
the instruction word or within a word following the instruction word. This addressing
mode is also known as register-direct addressing or paged memory-direct
addressing.

DMA controller – A specialized unit that moves data directly between main storage
and peripheral equipment by taking bus control away from the CPU, so that the
data does not have to pass through the processing unit.

dynamic range – The dynamic range is the ratio between the largest and smallest
values a quantity or parameter can take.

echo – An echo is reflected sound that is loud enough and received late enough
to be heard as distinct from the source. Echoes can be produced by generating
delayed repetitions (sometimes several rapid repetitions) of the original sound or
signal.

emulators – A combination of hardware microprograms and software that enables
one computer system to execute programs written for another type of
microprocessor.

EVMs – Evaluation Modules are low cost development boards that include a target
processor, a limited number of peripherals, and a limited amount of external
memory. EVMs are used to test code in real-time.

Execute – The fourth pipeline stage found in many DSPs including the
TMS320C50. The ALU or MAC portion of the instruction is executed and, if
required, results of a previous operation are written to memory.

External user interrupts – An external user interrupt is signaled by either an
external device requiring attention (such as a signal from a communications device)
or the timer going to zero. The interrupt initiates a user defined interrupt service
routine (ISR).

Fetch – The first pipeline stage found in many DSPs including the TMS320C50. An
instruction word is fetched from memory and the Program Counter is updated. The
fetch stage of the pipeline can sometimes be used to read operands from
program memory, for example, when a 2-word instruction using long immediate
addressing is executed.

FIFO – A First-In, First-Out queue in which the most recent arrival is placed at the
end of the waiting list and the item waiting the longest receives service first. A FIFO
is used as a buffer to connect two devices operating asynchronously at different
speeds. Each device is connected to one end of the FIFO.


Filtering – The suppression or attenuation of unwanted signal frequencies. Filtering
can be implemented by DSPs or analog circuits. A filter is often defined as a
low-pass, high-pass, band-pass, or band-stop (notch) filter depending on the
frequencies it is meant to attenuate.

fixed-point – A system of arithmetic in which all numerical quantities are expressed
by a fixed number of bits. In this system the binary point is implicitly located at some
predetermined position.
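
The idea of an implied binary point can be illustrated with the Q15 format, a
common convention for 16-bit fixed-point values in which the binary point sits
just after the sign bit. The choice of Q15 and the function names are
assumptions made for this sketch; the definition above does not name a
particular format.

```python
Q15_SCALE = 32768   # 2 to the power 15: fifteen bits after the binary point


def to_q15(x):
    """Convert a real number in [-1, 1) to a 16-bit Q15 integer, saturating."""
    value = int(round(x * Q15_SCALE))
    return max(-32768, min(32767, value))


def from_q15(q):
    """Convert a Q15 integer back to the real number it represents."""
    return q / Q15_SCALE


print(to_q15(0.5))      # prints 16384
print(from_q15(-8192))  # prints -0.25
```

The stored integer never contains the point itself; only the agreed position
(here, 15 fractional bits) gives the bits their meaning.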

flanger – An effect that is used to spice up sounds. The effect is difficult to
describe; when it is applied to a vocal signal it may resemble a voice underwater or,
as in some movies, a Martian's voice. Flanging adds together a time-delayed and a
direct signal where the delay time (in the range of 0.50 to 0.35 ms) is constantly
altered (varying between 1.0 and 10 Hz). To further vary the signal the weights given
to each signal before addition can be varied.

floating-point – A system of arithmetic characterized by a notation where real
numbers are represented by a fixed-point value known as the mantissa, and by an
integer known as the exponent. The real number is equal to the mantissa multiplied
by two to the power of the exponent.

frame synchronization – The signal used by a synchronous serial interface
indicating to the receiver the position of the first bit of a data word on the serial data
line.

frequency domain – The frequency domain is the reference frame where signals
are represented as functions of frequency. The frequency domain is a way of
looking at the world where the independent variable (the one that voltage, current,
capacitance varies with) is not time but frequency.

fTOUT – The frequency of the TOUT signal.

general-purpose processors – A processor designed to operate on a wide variety
of computational and logical problems. E.g., the Intel Pentium line of processors.

hand-shaking – The dialogue that takes place between two devices before a
transfer of information begins. Hand-shaking is the exchange of predetermined
signals for purposes of control when a connection is established.

hardware stack – A hardware stack is a special block of on-chip memory, typically
only a few words long, used during interrupts and subroutines to save and restore
the program counter.

harmonically related sinusoids – A harmonic is a signal with a frequency that is
an integer multiple of the frequency of a fundamental sinusoidal signal (the first
harmonic). All other harmonics are multiples of the fundamental. Harmonically
related sinusoids are sinusoidal signals whose frequencies are multiples of each other.


Harvard Architecture – The internal organization of a microprocessor which is
characterized by separate memory spaces for program instructions and data. The
program and data memory spaces are each accessed by one of two parallel buses.
The Harvard architecture allows each memory space to be accessed
simultaneously.

high-level language – A programming language closer to human language; each
program instruction or statement corresponds to one or more machine-executable
instructions.

Host ports – A specialized parallel port on a DSP intended to interface easily to a
host processor. In addition to data transfer, some host interfaces allow the host
processor (usually a general-purpose processor used for control functions in an
embedded system) to force the DSP to execute interrupt service routines, which
can be useful for control.

IDLE – The assembler mnemonic for the TMS320C50 DSP idle instruction which
places the processor into a power down mode that is exited when an interrupt is
signaled.

immediate (short and long) addressing – Immediate addressing encodes the
operand in the instruction word or in a separate word that follows the instruction
word.

Immediate Addressing – Immediate addressing encodes the operand in the
instruction word or in a separate word that follows the instruction word.

Implied addressing – Implied addressing means that the instruction operand
addresses are implied by the instruction. An example of a 'C50 instruction that uses
implied addressing is ADDB (addition of ACC and ACCB registers).

IMR – IMR is a 'C50 memory-mapped register that masks external and internal
interrupts.

indirect addressing – In this type of addressing the operand being addressed
resides in memory and the address of the memory location containing the operand
is stored within a register. It is this register that is specified during indirect
addressing.

indirect – In this type of addressing the operand being addressed resides in
memory and the address of the memory location containing the operand is stored
within a register. It is this register that is specified during indirect addressing.

instruction cache – An instruction cache saves an instruction and is usually used
to repeat that instruction in a program. An instruction cache is also known as a
program cache.

instruction set – The instruction set is the hardware "language" in which the
software tells the processor what to do.


INT1# – INT1# is a 'C50 external user interrupt signaled by either an external
device requiring attention (such as a signal from a communications device) or the
timer going to zero. The interrupt initiates a user defined interrupt service routine
(ISR).

INT3# – INT3# is a 'C50 external user interrupt signaled by either an external
device requiring attention (such as a signal from a communications device) or the
timer going to zero. The interrupt initiates a user defined interrupt service routine
(ISR).

interface – An interface is the path along which information can flow between a
peripheral and the CPU. The term also refers to the device (peripheral) through which
a processor communicates with the outside world.

interlocking – An interlocking pipeline suspends the progression of instructions
occurring immediately after a conflict and until the instruction causing the conflict
has been executed. Although clearly beneficial to the programmer, interlocking has
its costs.

Interlocks – Pipeline interlocks are mechanisms used to ensure that when data is
written to a register, any reference to this register causes a stall until the data is
available.

internal arithmetic unit – That part of a computer which performs arithmetic
operations. E.g., taking two numbers stored in specific places in memory, adding
them together, and storing the result.

interrupt – A signal sent by hardware or software to a processor requesting
attention. An interrupt tells the processor to suspend its current operation, save the
current task status, and branch to a special block of code called an interrupt service
routine (ISR). Interrupts communicate with the Program Controller and prioritize
tasks to be performed. Software constructs such as the interrupt vector and
interrupt service routine are executed after an interrupt occurs. The term also
refers to the suspension of a computer process caused by an external event. Once
the external event handling procedure is completed, the computer process is resumed.

interrupt service routine – An interrupt service routine (ISR) is a subroutine that
is run every time that a specific event occurs. In this case, program ex2_3.asm
executes the ISR when a sample from the CODEC is received by the DSP. In RUN
mode, execution of the ISR is done automatically.

interrupt service routines – An interrupt service routine (ISR) is a subroutine that
is run every time that a specific event occurs and which is signaled by an interrupt.

interrupt vector – Typical interrupt vectors are one to two words long and are
located in low memory. An interrupt vector does not actually contain the ISR; rather,
it contains a branch to the address of the interrupt service routine (ISR).

INTR – INTR is a specialized branch instruction for the TMS320C50 DSP that
causes the processor to begin execution at the appropriate interrupt vector.


ISR – An abbreviation for Interrupt Service Routine, the subroutine executed when
an interrupt occurs.

kernel – The programs that form the “core” or the most essential parts of an
operating system for a computer. Nucleus is a near-synonym for kernel and tends
to be used where the effects are achieved by a mixture of normal programming and
micro coding (such as is done with the assembler language).

label – A symbol that begins in column 1 of an assembler source statement. A label
is the only assembler statement that can begin in column 1.

linker – A program that creates one executable file from one or many object files.

low-level languages – A programming language close to machine language and
in which each mnemonic has a one-to-one equivalence with machine code.

MAC – An abbreviation (mnemonic) for Multiply and ACcumulate, an operation
often executed in DSPs.

machine code – Instruction code recognized and executed by a microprocessor.
The code is expressed in a binary numerical representation.

master clock – The master clock is the primary source of timing signals used to
control the operations that take place within a processor.

memory bandwidth – The memory bandwidth of a processor is proportional to the
number of memory cycles per instruction cycle. A high memory bandwidth occurs
in processors with many data and address buses.

memory block – A portion or section of memory storage. A storage block is
considered a single element for holding a specific or fixed number of words.

memory configuration bits – These bits are status and control bits that select the
memory configuration that is used by the DSP. Within the TMS320C50 DSP the
MP/MC#, RAM, and OVLY bits are found within the Processor Mode Status Register
(PMST) and the CNF bit, another memory configuration bit, is found in Status
Register 1 (ST1).

memory – A device in which information can be inserted and stored and from
which it may be extracted when wanted.

memory space – Memory space is a property of the DSP. Memory space
represents the range of addresses allocated to either internal or external memory
devices by the DSP bus structure. On-chip memory (ROM and RAM) for a specific
processor is said to reside in the processor memory space, as do the processor's
peripherals and memory-mapped registers.

MIPS – A unit of measure proportional to the performance level of a processor. One
MIPS corresponds to the execution of a Million Instructions Per Second (it is
sometimes abbreviated MIP). Often the multiply/accumulate instruction, common
to nearly all DSPs, is used to calculate the MIPS rate.


mnemonic – A symbolic representation made of alphabetic letters and designed
to aid human memory. It is commonly an abbreviation, or shortened form, of the
description of the machine code operation that it performs. The assembler
translates the mnemonic into machine code.

modified Harvard architecture – A modified Harvard architecture is a variation on
the basic structure of the Harvard architecture. The variations are used to increase
the simultaneous memory accesses of the DSP. A modified Harvard architecture
is also known as an extended Harvard architecture or as a Super Harvard
ARChitecture (SHARC).

nesting – The programming practice of making a loop or a subroutine part of
another loop or another subroutine. If a subroutine, interrupt or repeat loop is
respectively part of another subroutine, interrupt or repeat loop then it is said to be
nested.

non-volatile – A characteristic of a memory device not subject to the loss of stored
information when power is removed.

NORM – The NORM assembler instruction is used for the TMS320C50 DSP. The
instruction normalizes the accumulator, that is, a fixed-point number is converted
to a floating-point number.

numerical formats – A programmer's convention where each bit in a word of
information is implied to be weighted by a certain value.

object files – A file that consists of machine code directives that usually represent
a portion of a program.

operands – The part of an instruction that designates where the central processing
unit (CPU) will fetch or store data during instruction execution.

orthogonal instruction set – Basically, a set of instructions where all commands
work on all registers. Separate components (like the arithmetic operations, operand
specifications, addressing modes, and parallel moves) of an instruction are
encoded independently in separate fields of the instruction word. For an instruction
set to be orthogonal, choosing one component must not constrain the other
components.

orthogonality – Defines a set of instructions where all commands work on all
registers. Separate components (like the arithmetic operations, operand
specifications, addressing modes, and parallel moves) of an instruction are
encoded independently in separate fields of the instruction word. For an instruction
set to be orthogonal, choosing one component must not constrain the other
components.

overflow – In an arithmetic operation, a result whose absolute value is too large
to be represented within the range of the numeration system in use.


OVerflow saturation Mode (OVM) – When enabled, any overflow value produced
by the ALU appears as the maximum possible value. For the TMS320C50, the
value appears as 7FFF FFFFh. When enabled, any underflow value produced by
the ALU will appear as 8000 0000h, the minimum possible value.
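
The effect of the overflow mode on a 32-bit accumulator can be sketched as
follows. This is a simplified model with hypothetical names: with the mode
enabled the result is clamped to the two limits quoted above, and with the mode
disabled the result is assumed to wrap around as ordinary 32-bit two's
complement arithmetic.

```python
ACC_MAX = 0x7FFFFFFF    # 7FFF FFFFh, the maximum possible value
ACC_MIN = -0x80000000   # 8000 0000h read as a two's complement number


def add_acc(acc, operand, ovm):
    """Add operand to a 32-bit accumulator with or without saturation."""
    result = acc + operand
    if ovm:
        # Saturation mode: clamp to the largest or smallest value.
        return max(ACC_MIN, min(ACC_MAX, result))
    # Mode disabled: keep the low 32 bits and reinterpret the sign bit.
    result &= 0xFFFFFFFF
    return result - 0x100000000 if result & 0x80000000 else result


print(hex(add_acc(ACC_MAX, 1, ovm=True)))    # prints 0x7fffffff
print(add_acc(ACC_MAX, 1, ovm=False))        # prints -2147483648
```

Without saturation, a slightly too large positive sum turns into a large
negative number, which is usually far more damaging to a signal than clipping.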

overhead – The time a processor uses for operations that do not belong to the user
task. In the case of digital signal processors these operations usually consist of the
allocation of resources for the execution of the next instruction. Overhead is also
known as execution overhead.

paged memory-direct addressing – A type of addressing that encodes the
operand address within the instruction word or within a word following the
instruction word. This addressing mode is a type of direct addressing.

parallelism – The concurrent operation of several parts of a computer system. This
can imply simultaneous processing of multiple program instructions or simultaneous
operation of multiple processors.

peripheral – In a data processing system, any equipment, distinct from the central
processing unit, which may provide the system with outside communication or
additional facilities.

phase-locked loops – A phase-locked loop (or PLL) is a circuit that acts as a
phase detector by comparing the frequency of a known oscillator with an incoming
signal and then feeds back the output of the detector to keep the oscillator in phase
with the incoming frequency.

pipeline – An organization of computational hardware in which different stages of
the execution of an instruction proceed in parallel for different instructions. A
pipeline is a method of executing instructions in an assembly line fashion.

pipeline conflict – Pipeline conflicts occur during a given clock cycle. They prevent
the next instruction in the program from being correctly executed during the
following clock cycle. If a pipeline conflict is not automatically corrected for by the
processor, or foreseen and corrected for by the programmer, then
erroneous program results may occur and processor performance may be reduced.

pipeline hazard – A pipeline hazard is the term given by some to describe what is
called in this manual a pipeline data conflict. A pipeline data conflict and a pipeline
hazard are one and the same.

pma – An abbreviation for Program Memory Address.

POST-FILTER – Low pass filter designed to remove, from the output signal, high
frequency components that are created by the digital-to-analog conversion.

PRD – The decimal value of the PRD register.

prescaler – The prescaler is used to change the frequency of the clock source and
thus have the counter count longer periods of time (changing the frequency of the
timer).


Processor – A device that performs operations on data according to specific rules
given to it by a list of instructions.

program control – Program control refers to the rules (mechanisms) used by a
processor for determining the next instruction to execute.

program/data memory – Program/data memory is a memory that can be accessed
by either one of the two parallel buses inside of a processor with a Harvard
architecture.

RAM – RAM, Random Access Memory. This is usually used to store temporary
program information. RAM is a volatile memory because when power is removed
the stored information is lost.

RC circuit – A circuit made up of a resistor and a capacitor, and which acts as a
first-order filter.

Read – The third pipeline stage found in many DSPs including the TMS320C50.
During this stage a data operand is read from or written to memory.

real-time – A processor operating mode under which a data sample is received,


processed, and returned before the processor's next data sample is received. This
is done so quickly as to allow the user: to respond instantaneously, affect the
functioning of the environment or guide the physical processes which the processor
controls. Most interactive systems operate in a real-time mode.

registers – A storage device having a specified capacity such as a bit, a byte, or


a computer word and usually intended for a special purpose.

repeat – An instruction that repetitively executes a sequence of instructions a finite
number of times.

reservation table – A figure that shows how processor resources are used over
time. A reservation table aids the visualization of pipeline operation.

reset – A means to bring the central processing unit (CPU) to a known state by
setting the registers and control bits to predetermined values and signaling
execution to start at a specified address.

RET – The assembler mnemonic for the TMS320C50 DSP return from subroutine
instruction.

RETE – The assembler mnemonic for the TMS320C50 DSP return from subroutine
and enable interrupts instruction.

RINT – RINT is a 'C50 hardware interrupt that is signaled after a word has been
received through the DSP serial port.

RISC – Abbreviation for Reduced Instruction Set Computer. A RISC architecture
uses simple instructions that are executed very quickly.

ROM – Read-Only Memory. This type of memory is loaded with program code during
the manufacturing process. ROM is a non-volatile memory because it retains its
data after the processor has shut down.

RPT – The assembler mnemonic for the TMS320C50 DSP single-instruction
hardware looping instruction.

RPTB – The assembler mnemonic for the TMS320C50 DSP multiple-instruction
hardware looping instruction.

SACL *+, 0, AR0 – The 'C50 SACL instruction stores the ACCL (ACCumulator
Low) bits in memory. At this point in the program the accumulator contains the most
recent sample received by the DSP from the CODEC. The indirect addressing
operands tell the CPU to store the sample in one of the dma labeled XN0 to XN15.
The dma is pointed to by auxiliary register 0 (AR0).

SACL 10h – The 'C50 SACL instruction, as previously stated, stores the ACCL
(ACCumulator Low) bits in memory. In this particular case, when SACL is executed,
the accumulator holds the average of the 16 most recent samples received by the
DSP and stored in memory. SACL stores this average to the dma labeled OUTPUT.
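The 16-sample averaging described in the two SACL entries can be illustrated away from the DSP. The following is a minimal Python sketch, not the 'C50 program itself: the names `TAPS`, `MovingAverage`, and `push` are hypothetical, and the integer division is an assumption standing in for whatever scaling the actual program applies.

```python
from collections import deque

TAPS = 16  # number of stored samples (XN0 to XN15 in the manual)


class MovingAverage:
    """Keeps the TAPS most recent samples and returns their average."""

    def __init__(self, taps=TAPS):
        # The buffer starts filled with zeros, like cleared data memory.
        self.samples = deque([0] * taps, maxlen=taps)

    def push(self, sample):
        # Storing the newest sample discards the oldest one.
        self.samples.append(sample)
        # Integer division is an assumption made for this sketch.
        return sum(self.samples) // len(self.samples)


avg = MovingAverage()
# Feed a constant input; the output ramps up until the buffer is full.
outputs = [avg.push(160) for _ in range(TAPS)]
```

With a constant input, the output only reaches the input value once all 16 locations hold a real sample, which shows the smoothing (low-pass) behavior of the average.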

shadow register – Shadow registers are dedicated registers that hold the contents
of key CPU registers during interrupt processing (when an interrupt is executed).
A shadow register is a one-level deep stack that belongs to one of the processor’s
shadowed registers (ACC, PREG, ST0, ...).

shift direction – Shift direction is the order in which bits are transmitted by serial
interfaces. Devices may send the LSB as the first bit or the MSB as the first bit.
Some DSP serial interfaces allow selection of the shift direction.
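The effect of shift direction can be sketched in a few lines of Python. This is an illustration only, assuming a 16-bit word; the function names `serialize` and `deserialize` and the `msb_first` parameter are hypothetical and do not correspond to any 'C50 register or instruction.

```python
def serialize(word, bits=16, msb_first=True):
    """Return the bits of `word` in the order they would be transmitted."""
    order = range(bits - 1, -1, -1) if msb_first else range(bits)
    return [(word >> i) & 1 for i in order]


def deserialize(stream, msb_first=True):
    """Rebuild a word from received bits, assuming the given shift direction."""
    ordered = stream if msb_first else list(reversed(stream))
    word = 0
    for bit in ordered:
        word = (word << 1) | bit
    return word


# Transmitter and receiver must agree on the direction.
sent = serialize(0x0001, msb_first=True)
mismatched = deserialize(sent, msb_first=False)  # bit order is reversed
```

When the two devices disagree on the shift direction, the received word is the bit-reversed version of the transmitted word.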

sign-extension – The process of filling the high-order bits of a number with the
sign bit. For example, when loading a 16-bit number into a 32-bit field, the sign bit
of the 16-bit number is extended into bit positions 17 to 32.
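As an illustration of the 16-bit to 32-bit example above, here is a minimal Python sketch. The function name `sign_extend` and its parameters are hypothetical; on the DSP this operation is performed by hardware, not by code like this.

```python
def sign_extend(value, from_bits=16, to_bits=32):
    """Copy the sign bit of a `from_bits` number into the upper bits."""
    low_mask = (1 << from_bits) - 1
    value &= low_mask                   # keep only the original field
    sign_bit = 1 << (from_bits - 1)     # position of the sign bit
    if value & sign_bit:
        # Negative number: fill the high-order bits with ones.
        value |= ((1 << to_bits) - 1) & ~low_mask
    return value


negative = sign_extend(0x8000)  # 16-bit pattern for -32768
positive = sign_extend(0x7FFF)  # 16-bit pattern for +32767
```

A positive number is padded with zeros, while a negative number is padded with ones, so the value represented in two's complement is unchanged.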

Signal – A time-dependent physical quantity (like a current level) by which, for
example, information is transmitted in an electronic system or circuit.

simulators – A simulator is a program that permits a computer system to imitate the
logical operation of another type of microprocessor.

software loops – A software loop is a method of repetitively executing an
instruction or group of instructions using program code.

software stack – A software stack is a conventional stack that is located in
processor main memory and is created, controlled, and used by the programmer. It is
usually configured to store values that might be changed during interrupt
processing or subroutine calls.

spectrum – In this case, spectrum refers to the distribution of the values of a
specific signal quantity (voltage amplitude, power amplitude) with respect to signal
frequency.

status and control registers – The operation of the TMS320C50 DSP CPU is
determined by the information held in four 16-bit Status and Control Registers:
the Circular Buffer Control Register (CBCR), the Processor Mode STatus register
(PMST), STatus Register 0 (ST0), and STatus Register 1 (ST1).

strobe – The strobe signal indicates to the external device sending the DSP data
through the parallel port that the data word has been received. The strobe signal
is also known as a handshake signal.

subroutine – A sequence of computer instructions that perform a specific task and
that are usually used repeatedly by the main program (routine). The subroutine is
called from the main program by means of a standard function call and returns to
the main program by means of a standard function return.

Superscalar – Like VLIW processors, Superscalar processors issue and execute
multiple instructions in parallel. Unlike VLIW processors, in which the programmer
(or code generation tool, assembler) explicitly specifies which instructions will be
executed in parallel, the hardware in Superscalar processors dynamically combines
instructions such that data dependencies are obeyed.

surface mount – A type of technology that allows for a fully automated
manufacturing process for printed circuits. It consists of soldering the components
directly on the surface of a printed circuit board (PCB).

Synchronous serial ports – A type of serial interface where a bit clock signal is
transmitted in addition to the serial data signal. The receiver uses the bit clock
signal to decide when to sample the received data signal.

TDDR – The decimal value of a series of bits within the TCR register.

TDM – TDM is the abbreviation for Time-Division Multiplexing, the process of
transmitting two or more signals over a common path by using successive time
intervals for different signals.

time domain – The time domain is the frame of reference for signals that vary as
a function of time.

Time-Division Multiplexing – Time-Division Multiplexing, abbreviated TDM, is the
process of transmitting two or more signals over a common path by using
successive time intervals for different signals.
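The idea of successive time intervals can be sketched in Python as follows. This is a simplified illustration, assuming that each signal contributes one sample per time slot and that all signals have the same length; the function names `tdm_multiplex` and `tdm_demultiplex` are hypothetical.

```python
def tdm_multiplex(channels):
    """Interleave equal-length sample lists onto one common path."""
    stream = []
    # Each pass of the loop is one frame: one time slot per channel.
    for frame in zip(*channels):
        stream.extend(frame)
    return stream


def tdm_demultiplex(stream, n_channels):
    """Recover each channel by taking every n-th sample from its slot."""
    return [stream[slot::n_channels] for slot in range(n_channels)]


channel_a = [1, 2, 3]
channel_b = [10, 20, 30]
common_path = tdm_multiplex([channel_a, channel_b])
recovered = tdm_demultiplex(common_path, 2)
```

The receiver can separate the signals only if it knows how many time slots make up a frame and which slot belongs to which signal.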

time-stationary coding – Time-stationary coding gives programmers more
explicit control over the pipeline, significantly reduces the complexity of the DSP’s
controller, and thus results in higher performance levels from the processor. A
programmer using time-stationary coding specifies within each instruction the
operations that occur simultaneously during one instruction cycle.

Timers – A peripheral that changes the content of a register at regular intervals in
such a manner as to measure time.

TMC – The period of the DSP master clock (this value is 1/(20 MHz) = 50 ns for the
'C50 DSP).

transition region – The transition region is the area between the pass band and
the stop band. A large gain rate of change with frequency within this region usually
improves filter performance (depending on the specific application).

two's complement – A numerical convention for the representation of values in
fixed-point processors. The left-most bit represents a negative decimal value and
the remaining bits each represent a different positive decimal value.
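The definition above can be checked with a short Python sketch that adds up the weight of each bit. The function name `twos_complement_value` is hypothetical, and the bit pattern is passed as a text string (left-most bit first) purely for readability.

```python
def twos_complement_value(bits):
    """Return the decimal value of a two's complement bit string."""
    n = len(bits)
    total = 0
    for position, bit in enumerate(bits):
        exponent = n - 1 - position
        if position == 0:
            # The left-most bit carries a negative weight.
            weight = -(1 << exponent)
        else:
            # Every other bit carries a positive weight.
            weight = 1 << exponent
        total += weight * int(bit)
    return total


smallest = twos_complement_value("1000000000000000")  # only the negative weight
largest = twos_complement_value("0111111111111111")   # all positive weights
```

For a 16-bit word, the left-most bit has a weight of -32768 and the remaining bits have weights of 16384 down to 1, which gives a range of -32768 to +32767.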

underflow – In an arithmetic operation, a result whose absolute value is too small
to be represented within the range of the numeration system in use.

VLIW – VLIW stands for Very Long Instruction Word. A VLIW machine is a parallel
processor in which several instructions grouped together into a single word are
carried out simultaneously by several functional units within the Program Controller
unit.

voice – An effect created by the assembled ex3_1.asm DSP program. The effect
consists of sampling the microphone input signal and outputting it, via the CODEC
and DSP, to the AUDIO AMPLIFIER.

volatile – A characteristic of a memory device subject to the loss of stored
information when power is removed.

wavetable – A list of values that define one period of a signal. The wavetable is
stored in memory and is used to generate a waveform.
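A wavetable can be sketched in Python as follows. This is a minimal illustration, assuming a 32-entry sine table scaled to the 16-bit positive maximum; the names `build_wavetable` and `generate`, the table length, and the `step` parameter are hypothetical choices and are not taken from the manual's programs.

```python
import math


def build_wavetable(length=32, amplitude=32767):
    """Return one period of a sine wave as a list of integer values."""
    return [round(amplitude * math.sin(2 * math.pi * n / length))
            for n in range(length)]


def generate(table, n_samples, step=1):
    """Read the table repeatedly, wrapping around at the end of the period."""
    return [table[(i * step) % len(table)] for i in range(n_samples)]


table = build_wavetable()
two_periods = generate(table, 2 * len(table))
faster = generate(table, 4, step=8)  # larger step gives a higher frequency
```

Reading the table with a larger step skips entries, so one period is completed in fewer output samples and the generated waveform has a higher frequency.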

weights – The factor by which a digit in a binary number is multiplied to obtain its
additive contribution in the representation of a real number.

XINT – XINT is a 'C50 hardware interrupt that is signaled after a word has been
transmitted through the DSP serial port.
