Printed in Canada
Student Manual
31946-00
FAULT ASSISTED CIRCUITS FOR
ELECTRONICS TRAINING (FACET)
by
the Staff
of
Lab-Volt (Quebec) Ltd
ISBN 2-89289-480-8
Printed in Canada
June 2000
Unit 1
UNIT OBJECTIVES
Upon completion of this unit, you will be able to explain the difference between a
digital signal processor (DSP) and a general-purpose processor. You will be
familiar with the design process for DSP programs.
UNIT FUNDAMENTALS
The internal design of DSPs, whose key element is the multiply-and-add
architecture, often makes them much faster at mathematical calculations than
other microprocessors.
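The multiply-and-add architecture pays off in sums of products, which lie at the core of many DSP algorithms such as digital filters. The minimal Python sketch below spells out that loop for one filter output. The function name `fir_output` and the sample values are illustrative assumptions. A DSP performs each multiply-and-add step of this loop in dedicated hardware rather than as separate multiply and add instructions.

```python
def fir_output(coefficients, samples):
    """Return one filter output as a sum of products (illustrative sketch).

    coefficients: filter coefficients h[0], h[1], ... (hypothetical values)
    samples: the most recent input samples, newest first: x[n], x[n-1], ...
    """
    accumulator = 0.0
    for h, x in zip(coefficients, samples):
        accumulator += h * x  # one multiply-and-add step
    return accumulator


# Hypothetical 4-tap averaging filter applied to four input samples
print(fir_output([0.25, 0.25, 0.25, 0.25], [1.0, 2.0, 3.0, 4.0]))  # 2.5
```

A general-purpose processor spends several instructions per loop iteration. A DSP is designed to complete the multiply and the add together, which is why it is faster on this kind of calculation.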
DSP Trainer Familiarization
In fact, DSPs are commonly found in devices that people do not immediately
associate with them, such as hard disk drive controllers, vehicle suspension
systems, and the signal processing circuits of medical imagers and radar systems.
DSPs began to appear at the end of the 1970s and the beginning of the 1980s with
Bell Labs' DSP1, Intel's 2920, and NEC's µPD7720.
In 1982, Texas Instruments introduced the TMS32010, the first member of what
was to become a popular 16-bit fixed-point DSP family. This DSP had an average
calculation rate of 8 MIPS.
The DSP used with the Lab-Volt DIGITAL SIGNAL PROCESSOR circuit board is
a Texas Instruments TMS320C50. The TMS320C50 is a third-generation DSP with
an internal design based on the first-generation TMS32010.
Also in 1982, the first floating-point DSPs were produced by Hitachi. This numeric
format greatly increased the dynamic calculation range of DSPs.
Two years later, NEC introduced the first 32-bit floating-point DSPs, which had a
calculation speed of 6.6 MIPS.
Generally, real-world signals (e.g., radar and sonar) are better processed by
floating-point DSPs. Constructed signals (e.g., telecom, imaging and control) are
generally better processed using fixed-point DSPs.
The range of DSP applications has grown because:
– They allow for more complex processing than is possible with analog circuitry;
– Digital processing code can be easily modified, which makes design updates and
changes more flexible;
– They usually result in a lower development cost than analog designs with
equivalent performance levels.
A DSP cannot operate without a program giving it its commands.
The program tells the DSP which instructions it must execute to perform certain
functions. This program is stored as machine code inside the DSP.
Which of the following choices is written in machine code format (a format
understood by the processor)?
a. ADD #214,4
b. F9E7h
c. 1011 1110 0001 0110
d. All of the above
Writing a DSP program directly in machine code would be very difficult for a
programmer.
An assembler and a linker are used to translate a program written in assembler
language into DSP machine code.
The assembler translates the program file into object files, which are then linked
together to create the executable file.
A C compiler is used to translate C source code into the appropriate DSP
assembler code.
The last part of programming involves checking your program for mistakes and
making changes until it correctly performs the desired function.
The C5x Visual Development Environment (C5x VDE) is the debugger used with
the Digital Signal Processor.
DSP system developers rarely debug a DSP without the aid of a debugger. They
also often use evaluation modules (EVMs), emulators, and simulators.
The DSP used with the circuit board is part of the TMS320C5x DSK (DSP Starter
Kit) evaluation module.
When using EVMs, emulators, and simulators, developers can change the model of
the DSP being tested during the development process.
Once a program is functional, its final tests are carried out on a DSP system.
The programs included with the Digital Signal Processor are written in assembler
language. The assembler language used is specific to the TMS320C5x EVMs; it
includes additional instructions called DSK directives.
To run, or examine the function of, a Digital Signal Processor program, the
executable file (*.dsk) must be downloaded into the DSP through the C5x VDE, the
Trainer's debugger.
EQUIPMENT REQUIRED
Exercise 1-1
EXERCISE OBJECTIVES
Upon completion of this exercise, you will be familiar with the location and the
function of each of the various components of the DIGITAL SIGNAL PROCESSOR
training system.
DISCUSSION
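The DIGITAL SIGNAL PROCESSOR circuit board has two functional sections: the
circuit board accessories section and the section containing the Digital Signal
Processor and its peripherals.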
Introduction to the DSP Circuit Board
The POWER SUPPLY circuit block delivers a filtered and regulated DC supply to
the entire circuit board.
The circuit board can be powered in two different ways: the input voltage for the
POWER SUPPLY can be received either from a Lab-Volt FACET Base Unit or
through the external ±15 V connections found on the AUXILIARY POWER
INPUT block.
The DC SOURCE can be used as the source of an input reference signal for
programs run on the DSP.
The GAIN potentiometer varies the output level between a low and a high value.
The AUDIO AMPLIFIER makes it possible to hear the signal from the ANALOG
OUTPUT, located on the CODEC block. Either the speaker or the headphones can
be used to listen to the signal.
The second functional section of the circuit board, the DSP and its peripherals,
contains the:
– DSP
– CODEC
– I/O INTERFACE
– INTERRUPTS
– AUXILIARY I/O
– SERIAL PORT
The Digital Signal Processor is found at the heart of a digital signal processing
system.
The DSP block contains a TMS320C50 DSP integrated circuit (IC) in a 132-pin
surface mount package.
There are many kinds of DSPs, and their cycle speeds vary.
However, the speed of any DSP is limited by the internal design constraints of the IC.
Some DSPs use an internal oscillator to set the clock and others use an external
oscillator.
The DSP used on the circuit board is configured to use an external oscillator.
The oscillator located on the circuit board provides it with a 40 MHz reference
signal.
The DSP divides this signal by two to produce a 20 MHz internal clock (the master
clock frequency), which it uses to time its instruction cycles.
a. 20 MHz
b. 40 MHz
c. 20 MIPS
d. 40 MIPS
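The relationship between the oscillator, the master clock and the instruction rate can be worked out with a short calculation. This Python sketch assumes that the DSP completes one instruction per master clock cycle, which is the assumption behind a peak MIPS rating. Instructions that take several cycles would lower the effective rate. The variable names are illustrative.

```python
OSCILLATOR_HZ = 40e6   # external oscillator on the circuit board
DIVIDE_BY = 2          # the DSP divides the oscillator signal by two

master_clock_hz = OSCILLATOR_HZ / DIVIDE_BY
cycle_time_ns = 1e9 / master_clock_hz      # duration of one instruction cycle
# Assumption: one instruction completed per master clock cycle
mips = master_clock_hz / 1e6

print(f"master clock: {master_clock_hz / 1e6:.0f} MHz")
print(f"instruction cycle: {cycle_time_ns:.0f} ns")
print(f"peak rate: {mips:.0f} MIPS")
```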
Some DSP programs are written to internal ROM during the manufacturing
process; most DSPs, however, use external ROM to store their program.
Both types of DSPs access their ROM at boot-up and copy the program to RAM for
execution.
To interact with the outside world, the DSP must have a translator that converts
analog signals to digital ones and then back again.
The 8-position DIP switch enters an 8-bit number into the DSP.
How this number is processed depends on the program being run.
The 7-segment displays are used to show program information to the DSP user.
When one of the push-buttons is pressed, an interrupt is signaled within the DSP
and the program code associated with it is executed.
The AUXILIARY I/O section was added for signal monitoring purposes and for
prototyping of additional DSP exercises done with the circuit board.
The headers of the AUXILIARY I/O block can be used to interface the DSP with an
external circuit.
The external circuit can be powered from the 10-pin right header of the AUXILIARY
I/O block, which provides ±5 Vdc and ±15 Vdc connection points. The circuit board
supplies have a common ground.
The left header provides the 8 least significant bits (labeled D0 to D7) of the
external DSP data bus, as well as 4 pre-decoded addresses (labeled PA0# to
PA3#) that can be used for prototype development.
The DSP on the circuit board is programmed to be the slave of a host computer.
For the DSP Trainer to be used, the circuit board SERIAL PORT must be connected
to one of your computer's serial ports.
Note: If the host computer does not have a second serial port connection
available, then at the appropriate times during the exercise procedures, you
can disconnect the Base Unit serial link and use it to connect the circuit board
SERIAL PORT to the computer.
Once the communication link between your computer and the DSP board is
established, the C5x VDE can be used to download a program into the DSP.
PROCEDURE
In this procedure section, you will familiarize yourself with some of the components
and circuit blocks found on your DIGITAL SIGNAL PROCESSOR circuit board.
* Yes * No
* 2. Turn ON the power supply for the DIGITAL SIGNAL PROCESSOR circuit
board.
* 5. While talking into the microphone, familiarize yourself with the use of the
potentiometers of the MICROPHONE PRE-AMPLIFIER and of the AUDIO
AMPLIFIER.
In this procedure section, the C5x VDE will be used to load and run a program
inside of the DSP.
Note: Before using the C5x VDE please make certain the circuit
board power source is turned ON, and that the serial connection
is present between the host computer and the DIGITAL SIGNAL
PROCESSOR circuit block labeled SERIAL PORT.
* 8. Use the Load Program command in the File menu to load the ex1_1.dsk
program into the DSP.
What two display windows are now open inside of the C5x VDE?
* 9. Make the circuit board connections shown in the figure. These will allow for
proper operation of the ex1_1.dsk program.
* 10. Execute the RUN command found on the C5x VDE Toolbar.
* 11. Observe the read-out displayed within the I/O INTERFACE circuit block.
Adjust the DIP switch (every bit in the 0 position) so that the display reads
0000.
* 12. Press the INT1# push-button, found in the INTERRUPTS circuit block, to
transmit to the DSP the value entered through the DIP switch.
* 13. Using the microphone, input a signal (your voice) into the DSP.
* 14. Note that when you are talking into the microphone, dots on the display of
the I/O INTERFACE circuit block light up.
* 15. Adjust the DIP switch such that the I/O INTERFACE display reads 0015.
* 16. Transmit the DIP switch value into the DSP by pressing the INT1# push-
button.
* 17. Observe the effect of the signal processing modification on the sound of
your voice.
* 18. Repeat steps 15 to 17 for each of the following I/O INTERFACE display
values:
Remember to press the INT1# push-button after setting the DIP switch to
a new value.
a. It is a voice recorder
b. It is a Base Unit operating system
c. It is a function generator
d. It is an echo generator
* 19. Execute the Halt command found on the C5x VDE Toolbar. Close the C5x
Visual Development Environment.
CONCLUSION
– The DIGITAL SIGNAL PROCESSOR has two sections: the circuit board
accessories section and the DSP with peripherals section.
– Before a DSP program can be loaded or used, the DIGITAL SIGNAL
PROCESSOR power supply must be turned ON and a serial connection
between the SERIAL PORT circuit block and the host computer must be made.
– The CODEC, I/O INTERFACE, INTERRUPTS and AUXILIARY I/O circuit blocks
can be used only if the program loaded into the DSP makes use of them.
– The user has access to the auxiliary circuit board headers for prototype
development.
REVIEW QUESTIONS
a. Make certain that the I/O INTERFACE switches are all in the 0 position.
b. Make certain the serial connection is present between the host computer
and the DIGITAL SIGNAL PROCESSOR circuit block labeled SERIAL
PORT.
c. Make certain the circuit board power source is turned ON.
d. Statements b. and c.
2. Over what DC voltage range can the potentiometer of the DC SOURCE be
adjusted?
a. -3.3 V to +3.6 V
b. -3.0 V to +3.0 V
c. -3.5 V to +3.5 V
d. None of the above.
3. Which of the following pins are located on the middle header of the AUXILIARY
I/O circuit block?
a. An anti-aliasing filter
b. An analog-to-digital converter (ADC)
c. A digital-to-analog converter (DAC)
d. All of the above
Exercise 1-2
EXERCISE OBJECTIVES
Upon completion of this exercise, you will understand basic DSP source file syntax.
You will be able to operate the debugger that accompanies the DIGITAL SIGNAL
PROCESSOR.
DISCUSSION
The source file for a DSP program can be written in a text editor; virtually any
ASCII editor can be used.
The Assembler and Debugger
The instruction lines found in the source file and used in the assembler
programming language are called source statements.
The source statements used in the assembler language have a very precise syntax.
There are four fields that make up a statement: the label, the mnemonic, the
operand(s) and the comment.
The source statements themselves must either begin with a label or a blank.
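The field rules can be illustrated with a small parser. This Python sketch is a simplification and does not reproduce the full rules of the dsk5a.exe assembler. It assumes that fields are separated by whitespace, that a comment starts with a semicolon, and that the operands form one comma-separated field with no embedded spaces. The function name `split_statement` is hypothetical.

```python
def split_statement(line):
    """Split a source statement into (label, mnemonic, operands, comment).

    Simplified rules assumed for this sketch:
    - a statement that starts in column 1 with a non-blank character has a label;
    - a statement that starts with a blank has no label;
    - everything after the first ';' is a comment.
    """
    code, _, comment = line.partition(";")
    has_label = bool(code) and not code[0].isspace()
    fields = code.split()
    label = fields.pop(0) if has_label and fields else None
    mnemonic = fields[0] if fields else None
    operands = fields[1].split(",") if len(fields) > 1 else []
    return label, mnemonic, operands, comment.strip() or None


print(split_statement("LOOP   ADD  #214,4  ; add with shift"))
print(split_statement("       NOP"))
```

The first call yields a label, a mnemonic, two operands and a comment. The second yields only a mnemonic, because the statement begins with a blank.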
Directives supply the program with data and control the assembly process.
The source files used with the DIGITAL SIGNAL PROCESSOR contain certain
assembler directives; some of the most common ones are:
What assembler directives declare the initial DSP memory addresses where
program instructions and data variables are stored?
The executable file dsk5a.exe is the assembler program used with the DIGITAL
SIGNAL PROCESSOR.
When a source file (*.asm) is assembled, a dsk file (*.dsk) and a listing file (*.lst)
are created.
The dsk file, also known as the program file, contains a list of machine code
corresponding to assembled source statements. To run a program, the program file
must be loaded into the DSP.
The listing file lists all source statements, line numbers and any errors that occurred
during assembly.
When the program is viewed inside of the debugger, the listing and the dsk files are
used to create a display of the source file statements.
If an MPY (multiply) instruction uses one operand, #031h, and is labeled OMEGA,
which of the following source statements has the correct syntax?
The C5x VDE is the debugger used with the Digital Signal Processor. It has the
following functions:
– Load dsk programs into memory and view the program code,
– run and halt the program and execute single step commands (execution of
single instructions),
– display in a viewing window the CPU registers and peripheral registers,
– display in a viewing window the DSP memory areas,
– graph DSP memory values while the DSP program is running,
– edit CPU registers, DSP program instructions and memory,
– place breakpoints at specific DSP source statements.
The C5x VDE uses the listing file to dis-assemble (the reverse of assembling) the
machine code contained in the dsk file. The dis-assembled code is then displayed.
When a dsk file is loaded into DSP memory the Dis-Assembly window automatically
opens.
The source statement highlighted with a yellow line represents the next instruction
that the DSP will execute.
A toolbar located at the top of the debugger screen has commands that aid in the
control of program execution.
Run and Halt are used to begin and stop program execution.
StepInto: You can single step through the code by clicking the StepInto button
on the Toolbar.
This executes one program instruction for every click of the button.
StepOver: If you do not wish to single step through a subroutine, you can execute
the StepOver command once you reach a CALL instruction.
The entire subroutine is then executed, after which single stepping can resume.
StepOut: The StepOut command executes the remaining instructions of the current
subroutine.
Execution halts once a RET (return from subroutine) assembler instruction
is encountered.
The values of all CPU registers are shown in the C5x Registers window.
You will become familiar with many of the CPU registers as you advance through
the course.
For the moment, it is sufficient to know that these registers contain DSP system
information.
The registers displayed in the window contain values, DSP status and control bits
and instruction pointers.
When a dsk file is loaded inside of the C5x VDE, the following is true for the Dis-
Assembly and Memory display windows:
– All source statement labels, used to declare a variable within the source code,
appear in blue.
– All comments of labeled source statements appear in green.
The Memory display window can be used as a Watch Window. Variables stored in
memory may be watched and edited if necessary.
– Memory addresses and registers appear in red when the values stored within
them are modified during the execution of the previous instruction.
– Memory addresses and registers (except the RAM, XF and INTM registers) can
be edited by simply double-clicking on the desired register or memory address.
The Graph command in the View menu can be used for graphical displays of data
values.
Signals can be viewed in either the time or frequency domain, at any point in your
program.
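The frequency-domain view can be illustrated with a discrete Fourier transform (DFT). The C5x VDE's frequency-domain graph uses an FFT, which computes the same spectrum more efficiently. This Python sketch uses the slower direct formula for clarity. The 64-point table holding 5 cycles of a sine wave is hypothetical test data, not the wavetable of any program supplied with the trainer.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return |X[k]| for k = 0..N-1 using the direct DFT formula."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples)))
        for k in range(n)
    ]


# Hypothetical 64-point table holding exactly 5 cycles of a sine wave
N, CYCLES = 64, 5
table = [math.sin(2 * math.pi * CYCLES * i / N) for i in range(N)]

spectrum = dft_magnitudes(table)
# Search only the first half: for real input the upper half mirrors it
peak_bin = max(range(N // 2), key=lambda k: spectrum[k])
print(peak_bin)  # 5
```

In the time domain the table shows 5 cycles. In the frequency domain the same data shows a single peak at bin 5.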
Breakpoints halt a program so that the debugger user can verify the status of the
loaded program after a specific instruction.
When a breakpoint is reached, any display windows associated with it are
updated. This effectively connects a probe to a specific point in the program.
PROCEDURE
In this procedure section, you will assemble a source file and familiarize yourself
with the assembler source code directives.
* 2. Find, inside of the source code, the directives instructing the assembler
where to store program instructions and data variables.
a. 1280h
b. 080Ah
c. 0A80h
d. 0980h
Note: The source file contains entries that begin with .include.
The .include directive tells the assembler to read source
statements from a different file. The .include directive has been
used to eliminate complicated initialization subroutines from the
main source file.
The .ds assembler directive used before the .include directive specifies the
data memory address (dma) at which the DSP must begin writing the
wavetable values.
C:\lv91027\bin\dsk5a.exe ex1_2.asm -l
* 5. Confirm that a program file (ex1_2.dsk) and a listing file (ex1_2.lst) were
created when the source file was assembled.
* 6. Open the listing file inside of another text editor. Observe the contents of
the file.
a. .word
b. SPLK
c. .entry
d. ADD
* 7. Close the source code and the listing file text editor windows.
Viewing Memory
In this procedure section, you will open a Memory display window inside of the C5x
VDE.
Note: Before using the C5x VDE please make certain the circuit
board power source is turned ON, and that the serial connection
is present between the host computer and the DIGITAL SIGNAL
PROCESSOR circuit block labeled SERIAL PORT.
* 8. Open the C5x VDE and using the Load Program command, found in the
C5x VDE File menu, load the ex1_2.dsk program into the DSP.
* 9. Open a Data Memory window to the first wavetable value held within the
dma labeled C0 (use a capital C). This window can be launched by
executing the Memory command in the View menu.
What do the blue symbols (C0, C9, C19, C29, ...) within the Data Memory
window represent?
Graphing Memory
In this procedure section, you will view DSP memory graphically and use the
Graphic Display window to gather information about the wavetable of the current program.
* 10. Launch the Graphic Display window with the Graph command found in the
View menu. To visualize the wavetable data enter the setup information
that is found in the above figure (use a capital for C0).
* 11. Note that clicking in the Graphic Display window makes a cursor line appear.
The coordinates of the point where the cursor line and the graphed curve
cross are displayed at the bottom of the window.
* 12. Locate the maximum value of the wave signal shown in the Graphic
Display window.
tMAX =__________ ms
* 13. Locate the minimum value of the wave signal shown in the Graphic Display
window.
tMIN =__________ ms
f =__________ Hz
* 14. In the Data Memory window, double-click the data at DSP memory location
C549 and edit it to 0. Then refresh the wavetable Graphic Display window
(using the Windows toolbar menu). Observe the change caused on the wave
signal in the Graphic Display window.
* 15. Change the wavetable value back to the original value of 1F0E. Refresh
the Graphic Display window.
* 16. Connect the OUTPUT of the DC SOURCE to the ANALOG INPUT of the
CODEC.
* 17. Connect the ANALOG OUTPUT of the CODEC to the INPUT of the AUDIO
AMPLIFIER and to the input of an oscilloscope.
* 18. Run the DSP program (the RUN command can be found on the C5x VDE
toolbar). Use the GAIN of the AUDIO AMPLIFIER to adjust the volume
level of the generated signal.
* 19. Observe the generated wave signal on the oscilloscope. Observe that the
frequency of the generated signal is shown on the display of the I/O
INTERFACE circuit block.
* 20. Using the potentiometer of the DC SOURCE vary the frequency of the
generated signal.
* 21. Using the oscilloscope, compare the frequency displayed in the I/O
INTERFACE circuit block with the inferred frequency of the generated
wave signal.
fMIN =__________ Hz
fMAX =__________ Hz
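The frequency asked for in steps 12 and 13 can be inferred from the time between a maximum and the following minimum. For a symmetrical wave such as a sine, that interval is half a period. The Python sketch below assumes such a wave. The function name and the time values are hypothetical placeholders, not the answers expected in the blanks.

```python
def frequency_from_extremes(t_max_ms, t_min_ms):
    """Estimate frequency (Hz) from a maximum and the following minimum.

    Assumes a symmetrical periodic wave (e.g. a sine), for which the
    maximum and the next minimum are half a period apart.
    """
    half_period_s = abs(t_min_ms - t_max_ms) / 1000.0
    return 1.0 / (2.0 * half_period_s)


# Hypothetical readings: maximum at 0.25 ms, minimum at 0.75 ms
print(frequency_from_extremes(0.25, 0.75))  # 1000.0
```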
In this procedure section, you will create a breakpoint within the DSP program. With
the aid of an associated breakpoint, you will view the variation with time of certain
DSP register values.
* 22. Execute the HALT command found on the C5x VDE toolbar. Place a
breakpoint at the program memory address (pma) labeled MARKER1 by
double-clicking the label within the Dis-Assembly window (you might
have to scroll down to find it). Run the DSP program.
* 24. Associate the MARKER1 breakpoint, set in step 22, with the Peripheral
Registers window. To do so, make certain that the Peripheral Registers
window is active (highlighted). Launch the Associate Breakpoint window by
executing the Associate Breakpoints command in the Options menu.
Fill in the window as shown and press OK.
* 25. Execute the ANIMATE command found on the C5x VDE toolbar. Make a
note of the peripheral registers that are continuously updated.
The DXR register holds the values before they are sent through the
CODEC to the ANALOG OUTPUT. It is the stream of these values that
creates the signal seen on the oscilloscope.
* 26. Halt the program. To generate a signal with a low frequency, adjust the
potentiometer of the DC SOURCE to the minimum position.
* 27. Make the Graphic Display the current window within the C5x VDE and
execute the Options command that is located on the Toolbar.
* 28. Change the settings of the Setup for Graphics window to the ones shown
in the figure above.
* 29. Associate the breakpoint, placed at MARKER1 in step 22, with the Graphic
Display window. Select to refresh the window only on the associated
breakpoint.
* 31. While the program is in Animate mode, execute the Graphic Display
Options command again. Change the graph from the Time Domain to the
Frequency Domain: FFT.
* 32. To generate a signal with a high frequency, adjust the potentiometer of the
DC SOURCE to the maximum position. Observe the effect of the frequency
change inside of the Graphic Display window.
In this procedure section, using the C5x VDE you will edit a memory location as
well as a CPU register.
* 33. Halt the animation. Observe that the content of the Program Counter (PC)
register is displayed within the C5x Registers window inside of the C5x
VDE. The PC register stores the address of the next source statement to
be executed.
* 34. Note that the PC value corresponds to the address highlighted in yellow
inside of the Dis-Assembly window.
* 35. Edit the PC register by double-clicking it within the C5x Registers window.
Edit the PC to the pma labeled MAIN.
* 36. Note that the source statement now highlighted in yellow corresponds to
the statement held within the pma labeled MAIN. This is a simple
technique used for moving from one part of code to another.
CONCLUSION
– A source statement has a very precise syntax. It contains a mnemonic and its
operands. It may also contain a label and a comment.
– Assembler directives supply the program with data and control the
assembly process.
– When a source file (*.asm) is assembled, a program file (*.dsk) and a listing file
(*.lst) are created.
– The C5x VDE is the debugger used with the DIGITAL SIGNAL PROCESSOR
circuit board. It gives the programmer the ability to diagnose DSP program
problems and to control program execution.
REVIEW QUESTIONS
1. Out of the following possibilities which is the correct syntax for a source
statement?
3. What step(s) must you perform to execute (Run) a dsk program from within the
C5x VDE?
4. Which among the following list of features of the C5x VDE is false?
a. Run and halt the program and execute single step commands (execution
of single instructions).
b. Edit, build, debug, profile and manage DSP projects (programs).
c. Load dsk programs into memory and view the program code.
d. Place breakpoints at DSP source statements.
5. Which of the following choices is the reason why Animate mode and Run mode
within the C5x VDE are not the same?
a. In Animate mode, the DSP is not used at all. The program is executed by
the C5x VDE.
b. In Run mode, the DSP is not used at all. The program is executed by the
C5x VDE.
c. In Animate mode, the DSP stops communication with the C5x VDE and the
DSP begins independent execution of the program.
d. In Run mode, the DSP stops communication with the C5x VDE and the
DSP begins independent execution of the program.
Exercise 1-3
Processor Arithmetic
EXERCISE OBJECTIVES
Upon completion of this exercise, you will be familiar with the numerical formats
and representations used within DSPs.
DISCUSSION
Digital Signal Processors are categorized by the way that their arithmetic is
performed: a DSP can be either a fixed-point DSP or a floating-point DSP.
The type of DSP chosen for a specific application depends on the suitability of its
arithmetic for the task. The TMS320C50 is a fixed-point DSP.
Floating-point devices are usually more flexible because their arithmetic system has
access to a wider dynamic range and in many cases these systems are more
precise.
A typical 16-bit fixed-point processor stores coefficients and data values with 16-bit
precision.
However, within the internal arithmetic unit of the DSP, intermediate values are
kept at 32 bits of precision.
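The need for wider intermediate values shows up when two 16-bit fractional values are multiplied, because the full product needs about twice as many bits. The Python sketch below multiplies two Q15 numbers (the fractional format described later in this exercise). It keeps the full product as a 32-bit-style intermediate, then rescales the product and saturates it back to 16 bits. The helper names are hypothetical. The TMS320C50's exact rescaling and rounding behavior is not modeled.

```python
Q15_ONE = 1 << 15  # 32768, the Q15 scale factor


def to_q15(x):
    """Convert a fraction in [-1, 1) to a Q15 integer (truncating toward zero)."""
    return int(x * Q15_ONE)


def q15_multiply(a, b):
    """Multiply two Q15 integers and return a Q15 integer.

    The full product is a Q30 value that only fits in a 32-bit word,
    which is why the intermediate result is kept at 32 bits.
    """
    product_q30 = a * b                 # 32-bit intermediate
    result = product_q30 >> 15          # rescale Q30 -> Q15 (truncates)
    return max(-Q15_ONE, min(Q15_ONE - 1, result))  # saturate to 16 bits


a, b = to_q15(0.5), to_q15(-0.25)
print(a, b, q15_multiply(a, b) / Q15_ONE)  # 16384 -8192 -0.125
```

Saturation matters in one corner case: $(-1) \times (-1) = +1$ cannot be represented in Q15, so the result is clamped to the largest positive value.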
When you use your computer or your calculator you can calculate such values as:
(-1*23) or (3.453)
A programmer must use certain numerical formats so that every value to be used
in the DSP has a binary representation associated with it.
This binary value will need at times to represent either a positive or negative,
fractional or integer number.
Integers, both negative and positive, are represented by the two's complement
integer format (2s-format).
Fractional numbers, both negative and positive, are represented by the two's
complement fractional format (Q-format).
These formats differ only by the associated weights that are given to each bit of
information.
The range of an N-bit number in 2s-format is $-2^{N-1}$ to $+(2^{N-1}-1)$.
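The 2s-format range and bit patterns can be checked with a short Python sketch. The helper names are hypothetical. `to_twos` masks a signed integer to N bits. `from_twos` applies the negative weight of the most significant bit, $-2^{N-1}$.

```python
def twos_range(n_bits):
    """Return the (min, max) integers representable in n-bit 2s-format."""
    return -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1


def to_twos(value, n_bits=16):
    """Return the n-bit 2s-format bit pattern of value as a binary string."""
    low, high = twos_range(n_bits)
    if not low <= value <= high:
        raise ValueError("value out of range")
    return format(value & ((1 << n_bits) - 1), f"0{n_bits}b")


def from_twos(bits):
    """Decode a 2s-format binary string: the MSB has weight -2^(N-1)."""
    value = int(bits, 2)
    return value - (1 << len(bits)) if bits[0] == "1" else value


print(twos_range(16))      # (-32768, 32767)
print(to_twos(-20333))     # 1011000010010011 (B093h)
```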
-6 =__________
The two's complement fractional format (or Q-format) also associates a different
weight with each bit.
The binary point separating the fractional weighted values from the integral
weighted values is implied; it is not stored as part of the word.
In Q15-format, the most significant bit is the sign bit and it is given a weight of $-2^{0}$.
This implies that the binary point is located between the MSB (bit 15) and bit 14.
By changing the position of the binary point, the weight given to each bit is also
changed. Consequently, the dynamic range and the precision of the two's
complement fractional format vary with the Q-format being used.
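The trade-off between dynamic range and precision can be made concrete. For a 16-bit word in Qn-format, each bit weight is the 2s-format weight divided by $2^n$. The Python sketch below, using the hypothetical helper name `q_format_limits`, lists the range and the resolution (the weight of the LSB) for a few formats.

```python
def q_format_limits(n_frac, n_bits=16):
    """Return (min, max, resolution) of an n_bits word in Q<n_frac> format."""
    scale = 1 << n_frac                       # 2^n_frac
    min_int = -(1 << (n_bits - 1))            # most negative 2s-format integer
    max_int = (1 << (n_bits - 1)) - 1         # most positive 2s-format integer
    return min_int / scale, max_int / scale, 1 / scale


for n in (15, 14, 13):
    low, high, step = q_format_limits(n)
    print(f"Q{n}: {low} to {high}, resolution {step}")
```

Each step the binary point moves to the right doubles the range and halves the precision.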
Note that moving the binary point all the way to the right, past the LSB, turns the
Q15-format into the 2s-format. This uncovers a handy relationship: the 2s-format and
Q15-format decimal representations of the same binary word are proportional, with a
scaling factor of $2^{15}$.
Which of the following choices represents the proportionality constant between the
2s-format and Q13-format representations of a 16-bit binary number?
a. 13
b. $2^{15}$
c. $-3.0518 \times 10^{-5}$
d. 8192
The 2s- and Q-formats can be used by the fixed-point internal arithmetic units of
any DSP. These formats are numerical conventions used by programmers. The
binary arithmetic done inside of a fixed-point DSP is not affected by the format of
the binary number used.
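The format independence of binary arithmetic can be checked by adding two 16-bit words once and then reading the same result in 2s-format and in Q15-format. This Python sketch models a 16-bit binary adder by masking the sum to 16 bits. The helper names and values are hypothetical.

```python
MASK16 = 0xFFFF


def add16(a, b):
    """Add two 16-bit words the way a 16-bit binary adder would."""
    return (a + b) & MASK16


def as_2s(word):
    """Interpret a 16-bit word in 2s-format (integer)."""
    return word - 0x10000 if word & 0x8000 else word


def as_q15(word):
    """Interpret the same 16-bit word in Q15-format (2s-format value / 2^15)."""
    return as_2s(word) / 2**15


x, y = 0x2000, 0xF000          # the same two bit patterns...
s = add16(x, y)                # ...added once, with no knowledge of the format
print(hex(s))                              # 0x1000
print(as_2s(x), as_2s(y), as_2s(s))        # 8192 -4096 4096
print(as_q15(x), as_q15(y), as_q15(s))     # 0.25 -0.125 0.125
```

The same sum is correct under both readings: 8192 + (-4096) = 4096, and 0.25 + (-0.125) = 0.125.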
Floating-point DSPs generally use a 32-bit format where the 24 left-most bits
represent the mantissa and the 8 remaining bits represent the exponent.
The mantissa is normalized so that its bit weighted by $2^{0}$ is always equal to 1.
Therefore, it is unnecessary to store this bit in memory; during calculations it
becomes an implied bit.
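The implied bit can be seen in the widely used IEEE 754 single-precision format. IEEE 754 is used here only as a familiar example. It arranges its 32 bits as a sign bit, an 8-bit exponent and a 23-bit stored fraction, which differs from the generic layout described above, but it relies on the same implied 1 bit. This Python sketch uses the standard `struct` module to unpack a value and rebuild it from its fields. It handles normalized numbers only.

```python
import struct


def decode_single(x):
    """Split an IEEE 754 single-precision value into its fields and rebuild it.

    Normalized numbers only: the leading 1 of the mantissa is not stored,
    so it is added back here as the implied bit.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF           # biased by 127
    fraction = bits & 0x7FFFFF               # 23 stored bits
    mantissa = 1 + fraction / 2**23          # implied bit weighted by 2^0
    value = (-1) ** sign * mantissa * 2.0 ** (exponent - 127)
    return sign, exponent, fraction, value


print(decode_single(-6.5))  # (1, 129, 5242880, -6.5)
```

In this example, -6.5 is stored as $-1.625 \times 2^{2}$. Only the 0.625 part of the mantissa occupies memory.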
Floating-point processors are usually more precise and have a larger dynamic
range.
This arises because more bits are provided to define the mantissa (24 bits + 1
implied bit) compared to fixed-point DSPs that usually have 16 bits, although 20-
and 24-bit fixed-point DSPs exist.
PROCEDURE
In this procedure section, you will learn how to convert a signed fractional number
to a binary number written in Q-format.
* 3. Make certain that the Scientific option under the View menu is checked.
Checking this option makes the Standard calculator become a Scientific
calculator.
* 5. Use the Calculator conversion functions (Dec to Bin) to change the value
obtained in step 4 to a one-Word (16-bit) binary value (make certain the
Word option is selected).
* 6. Verify that the binary number that you calculated (1100 0001 0000 0000)
is equal to the decimal value -0.984375 it was converted from. Use the
Q14-format bit-weights to calculate the decimal value.
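The conversion in steps 4 to 6 can be reproduced in a few lines of Python. Scale the fraction by $2^{14}$, round to an integer, and write that integer as a 16-bit 2s-format word. Going back applies the Q14 bit weights. The helper names are hypothetical. Rounding to the nearest integer is an assumption; the value used here scales exactly, so rounding has no effect.

```python
def to_q(value, n_frac, n_bits=16):
    """Convert a fraction to an n_bits Q<n_frac> word, as a binary string."""
    scaled = round(value * 2**n_frac)          # step 4: scale by 2^n_frac
    low, high = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if not low <= scaled <= high:
        raise ValueError("value out of range for this Q-format")
    word = scaled & ((1 << n_bits) - 1)        # step 5: 16-bit 2s pattern
    return format(word, f"0{n_bits}b")


def from_q(bits, n_frac):
    """Step 6: rebuild the decimal value from the Q<n_frac> bit weights."""
    value = int(bits, 2)
    if bits[0] == "1":
        value -= 1 << len(bits)                # MSB carries a negative weight
    return value / 2**n_frac


bits = to_q(-0.984375, 14)
print(bits)               # 1100000100000000
print(from_q(bits, 14))   # -0.984375
```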
In this procedure section, you will learn how to convert a binary number to a
decimal value.
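a. $2^{0}\ 2^{-1}\ 2^{-2}\ 2^{-3}\ 2^{-4}\ 2^{-5}\ 2^{-6}\ 2^{-7}\ 2^{-8}\ 2^{-9}\ 2^{-10}\ 2^{-11}\ 2^{-12}\ 2^{-13}\ 2^{-14}\ 2^{-15}$
b. $2^{15}\ 2^{14}\ 2^{13}\ 2^{12}\ 2^{11}\ 2^{10}\ 2^{9}\ 2^{8}\ 2^{7}\ 2^{6}\ 2^{5}\ 2^{4}\ 2^{3}\ 2^{2}\ 2^{1}\ 2^{0}$
c. $-2^{15}\ 2^{14}\ 2^{13}\ 2^{12}\ 2^{11}\ 2^{10}\ 2^{9}\ 2^{8}\ 2^{7}\ 2^{6}\ 2^{5}\ 2^{4}\ 2^{3}\ 2^{2}\ 2^{1}\ 2^{0}$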
* 9. Consider that B093h was written with the above numerical format
(unsigned integer). Use the unsigned integer format bit-weights to calculate
the corresponding decimal value. What is the value of the calculated
decimal number?
=__________
* 10. What is the weight given to each bit of a binary word in 2s-format?
* 11. Consider that B093h was written with the above numerical format (2s-
format). Use the 2s-format bit-weights to calculate the corresponding
decimal value. What is the value of the calculated decimal number?
=__________
* 12. What is the weight given to each bit of a binary word in Q15-format?
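a. $-2^{0}\ 2^{-1}\ 2^{-2}\ 2^{-3}\ 2^{-4}\ 2^{-5}\ 2^{-6}\ 2^{-7}\ 2^{-8}\ 2^{-9}\ 2^{-10}\ 2^{-11}\ 2^{-12}\ 2^{-13}\ 2^{-14}\ 2^{-15}$
b. $-2^{0}\ 2^{1}\ 2^{2}\ 2^{3}\ 2^{4}\ 2^{5}\ 2^{6}\ 2^{7}\ 2^{8}\ 2^{9}\ 2^{10}\ 2^{11}\ 2^{12}\ 2^{13}\ 2^{14}\ 2^{15}$
c. $2^{0}\ 2^{-1}\ 2^{-2}\ 2^{-3}\ 2^{-4}\ 2^{-5}\ 2^{-6}\ 2^{-7}\ 2^{-8}\ 2^{-9}\ 2^{-10}\ 2^{-11}\ 2^{-12}\ 2^{-13}\ 2^{-14}\ 2^{-15}$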
* 13. Consider that B093h was written with the above numerical format (Q15-
format). Use the Q15-format bit-weights to calculate the corresponding
decimal value. What is the value of the calculated decimal number?
=__________
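Steps 9, 11 and 13 all apply the same rule: a numerical format is a set of bit weights, and the decimal value is the sum of the weights of the bits that are set. The Python sketch below illustrates that rule for 16-bit words; the format names ("unsigned", "2s", "Qn") and the helper functions are hypothetical.

```python
def bit_weights(fmt: str, bits: int = 16) -> list[float]:
    """Weights of bits MSB..LSB for the 'unsigned', '2s' or 'Qn' formats."""
    if fmt == "unsigned":
        frac, signed = 0, False
    elif fmt == "2s":
        frac, signed = 0, True
    elif fmt.startswith("Q"):
        frac, signed = int(fmt[1:]), True       # e.g. "Q15" -> 15 fractional bits
    else:
        raise ValueError(f"unknown format: {fmt}")
    weights = [2.0 ** (n - frac) for n in range(bits - 1, -1, -1)]
    if signed:
        weights[0] = -weights[0]                # the MSB carries a negative weight
    return weights


def decode(word: int, fmt: str, bits: int = 16) -> float:
    """Sum the weights of the bits that are set in word."""
    weights = bit_weights(fmt, bits)
    return sum(w for i, w in enumerate(weights) if (word >> (bits - 1 - i)) & 1)


for fmt in ("unsigned", "2s", "Q15"):
    print(fmt, decode(0xB093, fmt))
```

Calling `bit_weights("Q14")` returns the Q14-format weights used in step 6, whose most significant weight is $-2^{1}$.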
In this procedure section, you will make the same numerical conversions as in
the previous section, but this time using the C5x VDE.
Note: Before using the C5x VDE please make certain the circuit
board power source is turned ON, and that the serial connection
is present between the host computer and the DIGITAL SIGNAL
PROCESSOR circuit block labeled SERIAL PORT.
* 14. Open the C5x Visual Development Environment (VDE). Close the C5x
Registers window.
* 15. Open a Memory display window to dma 0x300 by executing the Memory
command found in the View menu. Use the Signed Integer display format.
This window will be used to make numerical conversions.
The value entered into data memory corresponds to the decimal value that
you scaled by $2^{14}$ in step 4 and then converted to binary.
* 17. Open the Options window of the Data Memory display, and change the
display format to binary.
* 18. Open the Options window of the Data Memory display and change the
display format to Hex (hexadecimal).
* 20. Open the Options window of the Data Memory display, and change its
display format to Unsigned Integer.
=__________
In step 9, you converted B093h to the same unsigned value: 45203.
* 21. Open the Options window of the Data Memory display, and change its
display format to Signed Integer.
=__________
In step 11, you converted B093h, written in 2s-format, to the same
decimal value: -20333.
* 22. Open the Options window of the Data Memory display, and change its
display format to Fixed-Point Q15.
=__________
CONCLUSION
– DSPs are categorized by the way their arithmetic is performed: a DSP is
either a fixed-point DSP or a floating-point DSP.
– Integers, both negative and positive, are represented in the two's complement
integer format (the 2s-format).
– Fractional numbers, both negative and positive, are represented in the two's
complement fractional format (the Q-format).
– Numerical formats are a mathematical convention; they differ only in the
weights associated with each bit of a binary word.
REVIEW QUESTIONS
1. What is the range of decimal values that a 2s-format 16-bit binary number can
represent?
a. -65536 to 65535
b. 0 to 65535
c. -32768 to 32767
d. None of the above.
3. Which one of the following choices lists the correct 2s-format bit-weights?
4. What is the decimal value of the above binary number in Q15-, Q14- and Q13-
format?
a. $D_{Q14} = 2^{-15} \times D_{2S}$
b. $D_{Q14} = 2^{-14} \times D_{2S}$
c. $D_{Q14} = 2^{15} \times D_{2S}$
d. $D_{Q14} = 2^{14} \times D_{2S}$
Unit Test
2. Which among the following choices is not one of the four columns of information
displayed within the Dis-Assembly window of the C5x VDE?
a. 010001
b. 011001
c. 011000
d. 011010
5. Which of the following choices about fixed-point and floating-point DSPs is true?
a. Fixed-point processors are more precise and have a larger dynamic range
as compared to floating-point processors.
b. Floating-point DSPs have faster clock cycle rates than fixed-point DSPs.
c. Floating-point DSPs are cheaper than fixed-point DSPs.
d. A floating-point DSP has more external I/O interface pins than a fixed-point
DSP.
8. Which of the following statements about Digital Signal Processors is not true?
a. They allow for more complex processing than is possible with analog
circuitry.
b. Their performance over time is affected by temperature changes and
component aging.
c. They provide repeatable performance and produce a higher signal quality.
d. Digital processing code can be easily modified, and with it design updates
or changes are more flexible.
9. What files are created by the dsk5a.exe assembler used with the Digital Signal
Processor?
a. The source code file (*.asm) and the listing file (*.lst).
b. The program file (*.dsk) and the source code file (*.asm).
c. The program file (*.dsk) and the listing file (*.lst).
d. None of the above.
10. Which of the following circuit blocks can be used to control a DSP program?
a. INTERRUPT
b. CODEC
c. AUXILIARY POWER INPUT
d. AUDIO AMPLIFIER
Unit 2
CPU Architecture
UNIT OBJECTIVES
Upon completion of this unit, you will understand the basic difference between the
architecture of a digital signal processor and that of a general-purpose processor.
You will be familiar with the layout of the internal elements of a DSP CPU.
UNIT FUNDAMENTALS
It was not until the mid-1970s that computers became powerful enough to perform,
in real time, the signal processing of the analog circuits they had been simulating.
Today's DSPs are the result of years of research in a field that is still very
active. Their specialized architecture allows them to implement signal
processing algorithms more effectively than general-purpose processors.
The Von Neumann architecture has a single memory space that is used for both
data and instructions (instructions belonging to the program).
Digital Signal Processors have historically used a slightly different internal structure
known as the Harvard architecture.
The Harvard architecture, as opposed to the Von Neumann, has separate memory
spaces for data and program instructions.
The information is either a data word (an operand for an instruction) or a program
word (the instruction).
Data words are kept in data memory space and are read from and written to
different locations within the processor via the data bus (the DB).
Program words are kept in program memory space and are read from and
written to different locations within the processor via the program bus (the PB).
– Memory
– a Central Processing Unit (CPU)
– Peripherals
– a Bus structure
The Central Processing Unit (CPU) is the part of a processor that contains the
circuits that control the interpretation and execution of instructions.
The peripherals are elements used by the CPU either to time the execution of
instructions (e.g., the timer) or to communicate with devices external to the
processor (e.g., the serial ports).
The bus structures of processors are differentiated by the way that the individual
processor buses are interconnected with the other elements of the processor (CPU,
memory and peripherals).
The CPU elements are found in practically all DSP models, but they may go under
different names. For example, the CALU of the DSP32xx family, designed by Lucent
Technologies, is called the Data Arithmetic Unit (DAU).
The Program Controller is the unit that controls processor instruction execution.
The PC (Program Counter register) and the status and control registers are at the
heart of the Program Controller's operation.
The C50 has 96 memory-mapped registers: 28 core CPU registers, 17 peripheral
registers, 16 I/O port registers and 35 reserved registers.
Since memory-mapped registers are addressed in data memory space, they can
be written to, and read from, in the same way as any other data memory location.
The Auxiliary Register Arithmetic Unit (ARAU) is used to calculate and compare
memory addresses, and so to keep track of the location of information held within
DSP memory.
The C50 has eight Auxiliary Registers (ARs) which are used by the ARAU to store
important memory addresses.
The Central Arithmetic Logic Unit (CALU) is responsible for executing logic and all
arithmetic operations within a DSP.
For example, on the TMS320C50 DSP, the CALU executes these operations with
a 16-bit x 16-bit multiplier, an accumulator, operand registers, binary shifters and
a 32-bit 2s-complement Arithmetic Logic Unit (ALU).
The Parallel Logic Unit (PLU) is a 16-bit logic unit that executes logic operations
without interrupting the CALU (the main CPU arithmetic and logic unit).
EQUIPMENT REQUIRED
Exercise 2-1
EXERCISE OBJECTIVES
Upon completion of this exercise, you will be familiar with the role that the CALU
plays within a DSP.
DISCUSSION
Note: Some 'C50 assembler CALU instructions are briefly covered in this
exercise. It will be left up to you, the student, to cover the rest of the related
material. The material can be found in the following file:
C:\LV91027\DOC\TMS320C5x_UsersGuide.pdf.
The Central Arithmetic Logic Unit (CALU) is where the most important signal
processing manipulations take place.
The CALU, also known as the data path, is the principal arithmetic and logic
processing path of a DSP.
It lies along the data (operand) bus and takes part in the execution of nearly
every instruction.
The Central Arithmetic Logic Unit
– Multiplier(s)
– Accumulator(s)
– Operand registers
– Shifters
– At least one Arithmetic Logic Unit (ALU)
Signal processing algorithms are almost entirely devoted to arithmetic and logic
operations. The CALU is designed to execute these types of operations extremely
rapidly.
Both the Multiplier and the ALU are used simultaneously during a MAC (Multiply
and ACcumulate) instruction; the CALU is then said to be using its entire
computational bandwidth.
For most DSPs, when the entire computational bandwidth of the CALU is
repetitively used, a result is produced every clock cycle.
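A repeated MAC can be modeled in software to show how the Multiplier and the ALU cooperate. The Python sketch below is a simplified model that assumes 16-bit operands and a 32-bit accumulator that wraps on overflow; it ignores the pipelining of real hardware, and the function names are hypothetical.

```python
def wrap32(x: int) -> int:
    """Wrap an integer to signed 32 bits, as a 32-bit accumulator without saturation would."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def mac(samples: list[int], coeffs: list[int]) -> int:
    """Sum of samples[i] * coeffs[i] using 16-bit operands and a 32-bit accumulator."""
    if len(samples) != len(coeffs):
        raise ValueError("samples and coeffs must have the same length")
    acc = 0
    for x, h in zip(samples, coeffs):
        product = x * h                  # Multiplier: 16 x 16 -> 32-bit product (PREG)
        acc = wrap32(acc + product)      # ALU: accumulate the product into ACC
    return acc


# Q15 operands 0.5, -0.25 and 0.125, each multiplied by 0.5; the products are Q30.
acc = mac([16384, -8192, 4096], [16384, 16384, 16384])
print(acc / 2 ** 30)                     # 0.1875
```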
The operand registers temporarily store operands before they are supplied to the
ALU or the Multiplier for arithmetic operations.
The Product REGister (PREG) is a 32-bit operand register which stores the
Multiplier result.
The value held in the PREG can be sent to the ALU for an arithmetic operation, or
it can be passed on to the Data Bus (DB) for another stage of processing.
ACCB (the ACCumulator Buffer) provides a temporary storage place for the value
held in the ACCumulator register (ACC).
The ACC register is designed to hold the last arithmetic result produced by the
ALU.
Some operations commonly executed by the ALU include addition, subtraction,
negation, and the logical AND, OR, XOR and NOT operations.
Most of the ALU instructions that take more than one clock cycle rely on other units
for pre- or post-processing of data.
For example, adding a data value to the ACC and then executing a binary shift
requires 2 clock cycles on the TMS320C50. The binary shift is an example of
the type of processing that takes place after addition.
The ALU executes operations using twice the precision of the native word width of
the processor.
For example, the ALU of the 'C50, a 16-bit fixed-point DSP, inputs, outputs and
executes with a 32-bit word width.
Sign extension prevents a negative number from being mistaken for a positive one.
When the number of bits used to represent a word (e.g., 16 bits) is less than the
number of bits used to represent the same word inside the CALU (32 bits),
sign extension copies the sign bit into the added MSBs.
was loaded using the ALU into a 32-bit Accumulator when sign-extension mode
was enabled what would be the contents of the Accumulator register?
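Sign extension can be expressed as a simple bit operation. The Python sketch below assumes a 16-bit word loaded into a 32-bit register; `sign_extend` and `zero_extend` are hypothetical helpers that model the register contents with sign extension enabled and disabled.

```python
def sign_extend(word: int, from_bits: int = 16, to_bits: int = 32) -> int:
    """Copy the sign bit of a from_bits-wide word into the added MSBs."""
    word &= (1 << from_bits) - 1
    if word & (1 << (from_bits - 1)):                         # negative 2s-format word
        word |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)  # fill the added MSBs with 1s
    return word


def zero_extend(word: int, from_bits: int = 16) -> int:
    """Fill the added MSBs with zeros (sign extension disabled)."""
    return word & ((1 << from_bits) - 1)


print(hex(sign_extend(0xB093)))   # 0xffffb093, read as -20333
print(hex(zero_extend(0xB093)))   # 0xb093, read as +45203
```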
The result of the last arithmetic or logical operation executed by the ALU is
stored in the ACCumulator (ACC).
The result held in the ACC can be stored in the ACC Buffer register (ACCB),
passed back to the ALU, or passed to another stage of processing via the Data Bus (DB).
In the case of the 'C50 DSP, two operands must be input to the ALU to execute
any of its arithmetic or logical operations.
One of three other locations provides the other data operand for an ALU operation:
The Multiplier refers to the circuit within the DSP that executes the multiplication of
binary numbers.
Depending on operand size (8-bit or 16-bit for the C50), nearly all Multiplier
instructions can be executed within one clock cycle.
In the TMS320C50, register TREG0 is always used as one of the operand sources
for the Multiplier.
In certain cases, such as when the square instructions (SQRA and SQRS) are
executed, TREG0 is the only operand used by the Multiplier.
When another multiplication operand is required, it is fetched from one of two other
locations:
The product register is twice as wide as the word width of the multiplication
operands (native data word width of the DSP).
[Figure: MULTIPLIER operand and result examples, marked FALSE]
All Multiplier results are sign-extended before they are stored in the Product
REGister (PREG).
This, combined with the fact that the PREG is twice as wide as the operands,
means that the Multiplier by itself does not introduce any error into computations.
To keep the level of arithmetic precision constant, the number of bits used to
represent multiplication, accumulation and other arithmetic results needs to be
increased.
That is why, in DSPs, the Multiplier's product register (PREG) and the ALU's
ACCumulator (ACC) are twice as wide as the native data word.
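The claim that a double-width product register cannot lose information can be checked directly. The Python sketch below assumes 16-bit 2s-format operands and a 32-bit product register; it only checks the four extreme operand combinations, because the largest and smallest products of two ranges always occur at their end points.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1

# A product of two ranges reaches its extremes at the range end points,
# so the four corner products bound every 16 x 16-bit product.
corners = [a * b for a in (INT16_MIN, INT16_MAX) for b in (INT16_MIN, INT16_MAX)]
largest, smallest = max(corners), min(corners)

print(largest, smallest)                                # 1073741824 -1073709056
print(INT32_MIN <= smallest and largest <= INT32_MAX)   # True: fits in 32 bits
print(largest > INT16_MAX)                              # True: would not fit in 16 bits
print(2 * largest > INT32_MAX)                          # True: two such products overflow
```

The last line also shows why accumulation needs extra headroom: adding just two of the largest products already exceeds the 32-bit range.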
[Figure: OVER-FLOW, 7FFF FFFFh + 7FFF FFFFh gives FFFF FFFEh; UNDER-FLOW, 8000 0000h + 8000 0000h gives 0000 0000h (results marked FALSE)]
Most signal processing applications require the addition of series of data values.
When executed on a fixed-point DSP, these operations can easily lead to
overflow or underflow.
Many processors provide a mode of operation that decreases the error caused
when overflow or underflow occurs. In the TMS320C50 DSP, this mode is called
OVerflow saturation Mode (OVM).
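The difference between wrap-around and saturation can be shown with the overflow case 7FFF FFFFh + 7FFF FFFFh and the underflow case 8000 0000h + 8000 0000h. The Python sketch below models a 32-bit accumulator with OVM disabled (wrap-around) and enabled (saturation); the function names are hypothetical, and the model ignores status bits such as OV.

```python
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def add_wrap(a: int, b: int) -> int:
    """32-bit two's complement addition that wraps around (OVM disabled)."""
    s = (a + b) & 0xFFFFFFFF
    return s - (1 << 32) if s & 0x80000000 else s


def add_saturate(a: int, b: int) -> int:
    """32-bit addition that clamps to the most positive/negative value (OVM enabled)."""
    return max(INT32_MIN, min(INT32_MAX, a + b))


# Overflow: 7FFF FFFFh + 7FFF FFFFh
print(hex(add_wrap(INT32_MAX, INT32_MAX) & 0xFFFFFFFF))       # 0xfffffffe
print(hex(add_saturate(INT32_MAX, INT32_MAX)))                # 0x7fffffff
# Underflow: 8000 0000h + 8000 0000h
print(hex(add_wrap(INT32_MIN, INT32_MIN) & 0xFFFFFFFF))       # 0x0
print(hex(add_saturate(INT32_MIN, INT32_MIN) & 0xFFFFFFFF))   # 0x80000000
```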
Barring overflow or underflow, the precision within the ALU and the Multiplier
is kept at the same level as that of the operands when they entered the
CALU.
Therefore, the programmer must select the product register or accumulator bits
which will be passed on to the next stage of processing (via the data bus).
The selection of which bits to pass on is done with shifters that are located at the
exit of the PREG and of the ACC.
A shifter shifts a binary number to the right or to the left by a specified number of bits.
Pre- and Postscalers are used to scale values before they are input to or output
from the Multiplier and ALU.
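A common use of the postscaler is to keep a 16-bit Q15 result from a Q15 x Q15 multiplication. The Python sketch below assumes a 32-bit product register, a left shift of 1 bit at its output, and storage of only the upper 16 bits; these choices and the helper names are illustrative assumptions rather than a description of a specific 'C50 instruction sequence.

```python
def wrap32(x: int) -> int:
    """Keep the low 32 bits as a signed value, like a 32-bit register."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def q15_multiply(a: int, b: int) -> int:
    """Multiply two Q15 words and return a Q15 word (truncated toward -infinity)."""
    product = a * b                    # Q15 x Q15 -> Q30 in a 32-bit product register
    shifted = wrap32(product << 1)     # shift left 1 bit -> Q31, drops the extra sign bit
    return shifted >> 16               # keep the upper 16 bits -> Q15


half = 16384                           # 0.5 in Q15
print(q15_multiply(half, half))        # 8192 (0.25)
print(q15_multiply(-half, half))       # -8192 (-0.25)
```

The left shift removes the duplicate sign bit of the Q30 product. The one case that does not fit is -1.0 x -1.0 (8000h x 8000h), whose shifted product wraps to a negative value; this sketch returns 8000h (-1.0) for it.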
Ideally, the size of an accumulator register should be larger than the size of the
multiplier product register by several bits.
The extra bits, called guard bits, allow the programmer to accumulate a number of
values without risk of overflowing the accumulator and without the need to scale
intermediate results.
A single-bit field, present in the 'C50, and known as the carry bit (or the C bit), is
associated with the ACC register.
PROCEDURE
The ALU
In this procedure section, you will load the accumulator with a value input from the
DIP switch. Then, using the ALU, you will add three different values to the
accumulator, each one fetched from a different operand source.
Note: Before using the C5x VDE please make certain the circuit board
power source is turned ON, and that the serial connection is present
between the host computer and the DIGITAL SIGNAL PROCESSOR
circuit block labeled SERIAL PORT.
If at any time during the following procedure you realize that you did not correctly
follow a procedure step, use the C5x VDE to edit the PC (Program Counter
register) back to one of the labeled program memory addresses (MAIN,
MARKER1, MARKER2, or MARKER3).
Then, within WinFACET, use the Go to previous page button to return to the
beginning of the Procedure Section associated with the labeled program address
that you returned to, and follow the procedure steps again.
* 1. Open the ex2_1.asm assembler source file within an ASCII text editor. This
is the DSP program source file used for the exercise. You can refer to this
source file at any time during the procedure.
* 2. Open the C5x VDE, and load the ex2_1.dsk DSP program file.
* 3. Using the C5x VDE, open a Data Memory display to 0980h. This is the
data memory address at which the constants and variables used by the
program start to be stored.
* 4. Using the C5x VDE, open a Program Memory display to 0A0Dh. This is the
program memory address where the program machine code is stored.
* 6. Using the C5x VDE, edit the contents of the PREG and ACCB registers.
Input different 16-bit 2s-complement values (0000 XXXXh), of your
choosing, into the registers. They must be 16-bit values.
* 7. Position the eight on-off switches (DIP switch) located in the I/O
INTERFACE circuit block, found on the DIGITAL SIGNAL PROCESSOR
board, to a value of your choosing.
Observe that the binary value that the DIP switch was set to has been
loaded into the accumulator register (ACC).
* 9. Using the C5x VDE, STEP OVER the ADD, APAC and ADDB instructions.
In this procedure section, you will enable sign-extension mode, execute the code
of the previous procedure section, and compare the results generated by the ALU
for both procedure sections.
* 12. Make certain that the PREG, ACCB and the DIP switch have the same
values as you chose before; if not, set them back to those
values.
* 13. Using the C5x VDE, edit the contents of the PC register to the program
address labeled MAIN. The execution line will return to the instruction
labeled MAIN.
* 14. Using the C5x VDE, execute, once again, with the STEP OVER command,
the instruction: CALL A80h,* (and the three add instructions that follow).
* 16. Compare the first accumulator result that you wrote down (this one was
generated with sign-extension mode not enabled) with the second
accumulator result that you just wrote down (this one was generated with
sign-extension mode enabled).
* 17. Observe that the results are not the same. This is because, when the SXM
bit is enabled, the negative 16-bit wide value stored in the data address
labeled VALUE, and added to the accumulator with the ADD ch instruction,
is sign-extended and seen as a negative number by the ALU. When SXM is
not enabled, the same value is not sign-extended and is seen as a positive number.
In this procedure section, you will use the prescaler located at the input of the ALU.
* 18. Using the C5x VDE, make certain that the SXM bit is set (SXM = 1). If it is
not already set then edit the SXM bit to 1.
* 19. Note the ADD instructions that follow the program address labeled
MARKER1. They prescale an operand before adding it to the accumulator.
* 20. Using the C5x VDE, zero the contents of the ACC. STEP OVER the
instruction: ADD #1111h,3
The value 1111h was added to the ACC register that previously contained
zero. By observing the present contents of the ACC, which of the following
choices describes what the prescaler did to the added value?
* 21. Using the C5x VDE, again zero the contents of the ACC register. STEP
OVER the instruction: ADD #8111h,15
The value 8111h (corresponding to the negative value -32495) was added
to the ACC register that previously contained zero. By observing the
present contents of the ACC, which of the following choices describes how
the prescaler and ALU changed the added ACC value?
Both added values were taken from the 16-bit data bus and prescaled. The
bus between the prescaler and the ALU is 32 bits wide.
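The behavior observed in steps 20 and 21 can be modeled in a few lines. The Python sketch below assumes that the prescaler sign-extends the 16-bit operand when SXM = 1, shifts it left with zeros entering the LSBs, and passes a 32-bit result to the ALU; `prescale` and `add_to_acc` are hypothetical helper names. Running it prints the accumulator contents after each ADD, which you can compare with what you observed in the C5x VDE.

```python
def prescale(word16: int, shift: int, sxm: bool) -> int:
    """Model a 16-bit operand passing through a left-shifting prescaler onto a 32-bit path."""
    value = word16 & 0xFFFF
    if sxm and value & 0x8000:
        value -= 1 << 16                  # SXM = 1: sign-extend the negative operand
    return (value << shift) & 0xFFFFFFFF  # shift left, zeros into the LSBs, keep 32 bits


def add_to_acc(acc: int, word16: int, shift: int, sxm: bool) -> int:
    """Add the prescaled operand to a 32-bit accumulator (wrap-around, no OVM)."""
    return (acc + prescale(word16, shift, sxm)) & 0xFFFFFFFF


acc_step20 = add_to_acc(0, 0x1111, 3, sxm=True)    # ADD #1111h,3 with ACC = 0
acc_step21 = add_to_acc(0, 0x8111, 15, sxm=True)   # ADD #8111h,15 with ACC = 0
print(f"{acc_step20:08X}")
print(f"{acc_step21:08X}")
```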
* 22. Using the C5x VDE, slowly STEP OVER (while watching the I/O
INTERFACE display) the instructions:
CALL DISPLAYHIGH,*
CALL DISPLAYLOW,*
* 23. Using the C5x VDE, STEP OVER the ZAP instruction. The ZAP instruction
will zero the ACC and PREG registers.
In this procedure section, you will execute basic multiplication operations, and by
enabling one of the product shift modes, shift the output of the product register.
There are eight data values used in this procedure section. Each value is written
as a Q14-format binary number.
The data values are stored in the dmas labeled X0 to X3 and B0 to B3.
* 24. Using the C5x VDE, make certain that the following is true:
SXM = 1
ACC = 0000 0000h
PREG = 0000 0000h
PM = 0
This loads the content of the data memory address labeled X0 into
TREG0. TREG0 is an operand source for the multiplier.
Are the contents of the dma labeled X0 and of the TREG0 register the
same?
* Yes * No
* 26. Using the C5x VDE, STEP OVER the following MPY instruction.
The contents of the data memory address labeled B0 (the fifth data value)
are multiplied with the contents of TREG0.
Observe, using the C5x VDE, that the PREG holds the product of the
contents of the data memory address labeled B0 with TREG0.
* 27. Using the C5x VDE, STEP OVER the instruction: APAC.
Observe that the contents of the accumulator register (ACC) and of the
product register (PREG) are the same: both are equal to 01B0 7660h. ACC
was equal to 0000 0000h before the APAC instruction was executed.
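Because X0 and B0 are Q14-format numbers, the product held in the PREG is a Q28-format number. The Python sketch below illustrates this with two hypothetical Q14 operands (0.5 and 0.25), then reads the PREG value 01B0 7660h observed in step 27 as a Q28 number; it assumes the product shift mode is off (PM = 0).

```python
Q14_ONE = 1 << 14
Q28_ONE = 1 << 28


def to_q14(x: float) -> int:
    """Round x to the nearest Q14 integer (x assumed to lie in [-2, 2))."""
    return round(x * Q14_ONE)


a, b = to_q14(0.5), to_q14(0.25)       # 2000h and 1000h
preg = a * b                           # Q14 x Q14 -> Q28 product
print(hex(preg), preg / Q28_ONE)       # 0x2000000 0.125

# Reading the PREG value observed in step 27 as a Q28 number:
print(0x01B07660 / Q28_ONE)            # about 0.1056
```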
* 28. Using the C5x VDE, edit the PC register to return the execution line to the
program address labeled MARKER2. Set the PM bits to 1 (this enables the
product shifter: the PREG output is shifted left by 1 bit).
* 29. Using the C5x VDE, once again zero the contents of the ACC register.
STEP OVER the LT, MPY and APAC instructions executed in steps 25 to
27.
Notice the effect that the postscaler has on the contents of the accumulator: the
PREG output was shifted left by 1 bit (multiplied by $2^{1}$).
If the values entered into the multiplier were written in Q14-format and if a product-
shift of 1-bit to the left occurred, then what would be the numerical format of the
value contained in the accumulator register?
a. Q13-format
b. Q14-format
c. Q28-format
d. Q27-format
In this procedure section, you will make the accumulator overflow and then you will
enable OVerflow saturation Mode (OVM) to protect against it occurring again.
* 30. Using the C5x VDE, make certain that the following is true:
* 31. Using the C5x VDE, edit the Program Counter register, PC, to the program
address labeled MARKER2.
* 32. Using the C5x VDE, STEP OVER the instructions located between the
program address labeled MARKER2 and the instruction:
B MARKER2,*
* 33. Execute the B MARKER2,* instruction. This will branch the execution line
back to the program address labeled MARKER2 (this has the same effect
as editing the PC).
* 34. Using the STEP OVER command, continue executing the LT, MPY, APAC
and B MARKER2 instructions until the OV bit is equal to 1.
While executing these instructions, observe that the value held in the ACC
register is becoming larger. ACC overflow occurs when OV = 1.
a. 44FD 8DC8h
b. 89FB 1B90h
c. 7838 9B90h
d. 8BAB 91F0h
* 35. Using the C5x VDE, edit the PC register to the program address labeled
MARKER2. Zero the OV bit and the ACC, PREG, TREG0 registers.
* 37. Once again, STEP OVER the LT, MPY, APAC, and B MARKER2
instructions until the OV bit is equal to 1.
When the accumulator overflow occurred and OVerflow saturation Mode (OVM)
was not enabled, the result contained in the ACC had a relative error of 186%
compared with the correct value.
However, when OVM was enabled and the same overflow occurred, the result
contained in the ACC had a relative error of only 7% compared with the correct
value.
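The relative error is the absolute difference between the measured and correct values, divided by the correct value. For a correct result T just above 7FFF FFFFh, a wrapped result is T - $2^{32}$, so its relative error is $2^{32}/T$, while a saturated result is 7FFF FFFFh, giving (T - 7FFF FFFFh)/T. The Python sketch below evaluates both for a hypothetical correct sum of 2 309 000 000, chosen only because it produces errors close to the percentages quoted above; it is not a value taken from the exercise.

```python
INT32_MAX = (1 << 31) - 1


def relative_error(measured: float, true: float) -> float:
    """|measured - true| / |true|"""
    return abs(measured - true) / abs(true)


true_sum = 2_309_000_000               # hypothetical correct result, just above INT32_MAX
wrapped = true_sum - (1 << 32)         # OVM off: the result wraps to a negative value
saturated = INT32_MAX                  # OVM on: the result is clamped to 7FFF FFFFh

print(f"wrapped:   {relative_error(wrapped, true_sum):.0%}")    # 186%
print(f"saturated: {relative_error(saturated, true_sum):.0%}")  # 7%
```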
Multiplier Postscaling
In this procedure section, you will add 128 very large values together,
consecutively, and prevent an accumulator overflow by setting the Product-shift
Mode (PM) so that the output of the PREG is scaled by $2^{-6}$.
* 38. Using the C5x VDE, edit the PC to the program address labeled
MARKER3. Clear the OV, and OVM bits.
* 39. Using the C5x VDE, STEP OVER the instruction: ZAP
By setting the PM bits to 3, the output of the PREG will be shifted 6 bits to
the right, which is equivalent to dividing it by $2^{6}$.
The maximum positive value that can be represented in the 16-bit 2s-
format, 7FFFh, is used as the operand for the LT and MPY instructions.
These instructions, located between the program address labeled
MARKER4 and the one labeled END_BLOCK, fetch the contents of the
data memory address labeled BIG_VALUE.
The program is halted at the breakpoint. Observe that after executing 128
additions, the accumulator register still has not overflowed (OV is still equal
to 0).
* 43. Edit the PC to the program address labeled MARKER4. STEP OVER, once
again, the LT, MPY, and APAC instructions (the 129th consecutive multiply-
accumulate).
Observe that the accumulator overflows this time. This implies that when the
output of the product register is scaled by $2^{-6}$, up to 128
consecutive additions (of 7FFFh x 7FFFh) can be executed without
causing overflow.
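The 128-addition limit follows from simple arithmetic. The Python sketch below assumes a 32-bit accumulator that starts at zero and the product 7FFFh x 7FFFh = 3FFF 0001h, shifted right by 6 bits as with PM = 3; it counts how many such values fit below 7FFF FFFFh with and without scaling.

```python
INT32_MAX = (1 << 31) - 1

product = 0x7FFF * 0x7FFF              # 3FFF 0001h, the value held in the PREG
scaled = product >> 6                  # PM = 3: PREG output shifted right 6 bits

print(INT32_MAX // product)            # 2 accumulations fit without scaling
print(INT32_MAX // scaled)             # 128 accumulations fit with PM = 3
print(129 * scaled > INT32_MAX)        # True: the 129th addition overflows
```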
CONCLUSION
– Multiplier results are sign-extended before they are stored in the product
register. Sign extension prevents a negative number from being mistaken for
a positive one.
– To keep the level of arithmetic precision constant within fixed-point DSPs, the
product and accumulator registers are at least twice the native word width of
the internal bus.
– In fixed-point DSPs, scaling is used to lower the risk of overflow and underflow
and to select subsets of the CALU output bits.
– Overflow saturation mode is used to decrease the error caused when
overflow or underflow occurs.
REVIEW QUESTIONS
2. What is the difference between using the DSP when sign-extension mode is
enabled and when it is disabled?
a. When enabled, all data values in the DSP are sign extended.
b. When enabled, the accumulator saturates to the most positive or negative
values when overflow or underflow occurs.
c. When enabled, the multiplier output is sign extended.
d. When enabled, the ALU output is sign extended.
3. Why, within the TMS320C50 DSP, are the accumulator and product registers
twice the bit-width (32 bits) of the internal buses (16 bits)?
4. Which of the following elements is not part of the Central Arithmetic Logic Unit
(CALU)?
Exercise 2-2
Memory Space
EXERCISE OBJECTIVES
Upon completion of this exercise, you will be familiar with the basic characteristics
of the modified Harvard architecture, as used by DSPs.
DISCUSSION
The DSP contains on-chip memory and is also able to access off-chip memory
through its external address and data buses. On-chip memory is usually of two
types:
– ROM (Read Only Memory) holds program code that is written into it during the
manufacturing process. ROM is non-volatile memory: it retains its
data after the processor has shut down.
Both categories of memory (ROM and RAM) are found on-chip (inside the DSP).
The allocation of these types of memory in memory space can be configured in
various ways.
– DARAM (Dual-Access RAM) - A DARAM block can be read from and written to
in the same instruction cycle.
Each of the parallel buses of a 16-bit fixed-point DSP can allocate $2^{16}$ addresses
to on-chip memory and peripherals.
If each address of a 16-bit data bus were allocated to on-chip 16-bit/word memory,
how many bits of storage could be used by the DSP?
a. 65536 bits
b. less than 1 million bits
c. more than 1 billion bits
d. more than 1 million bits
Most DSPs today use a modified Harvard architecture to increase their memory
bandwidth. The specific modifications present in the TMS320C5x that have been
added to the traditional Harvard structure are:
In the case of the TMS320C50 DSP, certain SARAM blocks can be configured as
program/data memory. This implies that each memory element within the SARAM
block has been allocated a data bus address and a program bus address.
Program memory is addressed by the program bus. Operands can only be stored
in or read from program memory using the program bus.
Data memory is addressed by the data bus. Operands can only be stored in or read
from data memory using the data bus.
Program/data memory is addressed by both the program bus and the data bus.
Operands can be stored in or read from program/data memory by either using the
program bus or the data bus.
The C50 has four memory configuration bits that select how data and program
bus addresses are allocated among the different on-chip memories, I/O ports,
internal memory-mapped registers and external memory-mapped peripherals.
The memory configuration bits should be initialized (set or cleared) at the beginning
of a DSP program and should not be changed afterward.
By altering the value of one of the bits, memory elements either become mapped
to other addresses (sometimes addresses on a different bus) or become no longer
address mapped at all.
In the case of the 'C50, the register named the Program Counter (PC) acts as an
instruction cache. It can store one instruction word (16 bits wide). Once loaded
into the PC, the instruction can be repeated the number of times specified by the
RePeaT Counter register (RPTC).
During a repeat loop, the program bus does not have to be used to read the next
instruction in the program. The DSP simply fetches the next instruction from the
instruction cache.
When the instruction from the cache is being executed, a Program Bus (PB) access
is freed. The PB is no longer required to fetch the next program instruction.
The freed memory access can be used to read another operand from program/data
memory. Specialized instructions like the MAC (Multiply and ACcumulate) when
repeated, use the freed Program Bus access to fetch a total of two operands
during a single clock cycle.
When programming a DSP, the only memory space initializations that should
concern the programmer are:
– The CNF, MP/MC#, RAM and OVLY bit initializations. These select the proper
memory configuration.
– The use of the DSK directives describing the memory locations where program
and data are stored: .entry, .ps, .text, .word, .byte, .data or .ds, .set.
PROCEDURE
IMPORTANT: At DSP power up, the memory configuration bits for the TMS320C50
DSP are set to default values. The default values for some of the configuration bits
are:
MP/MC# = 0
OVLY = 1
RAM = 1
The RAM bit must not be modified. The program code for the C5x VDE application
executes from internal program memory; the C5x VDE application would not
function if this bit were changed.
In this procedure section, you will familiarize yourself with the possible memory
configurations of the TMS320C50 DSP.
Note: Before using the C5x VDE please make certain the circuit board
power source is turned ON, and that the serial connection is present
between the host computer and the DIGITAL SIGNAL PROCESSOR
circuit block labeled SERIAL PORT.
Address: 0x0000
Type: Program Memory
Display Format: Hex
* 3. Note in the C5x Registers window that the MP/MC# bit is cleared. This is
the default value at DSP power up. The DSP is now operating in
microcomputer mode and program memory addresses 0h to 800h are
allocated to on-chip ROM.
* 4. Set the MP/MC# bit (i.e., make MP/MC# = 1). Highlight the memory display
window and refresh it (toolbar/Window/Refresh).
* 5. What value have the addresses between 0h and 800h been initialized to
after editing the MP/MC# bit?
a. 0x0000
b. 980h
c. B882h
d. 12103d
* 6. Open another memory display window with the following options selected:
Address: 800h
Type: Data Memory
Display Format: Hex
* 7. Change the address for the first Program Memory window to 800h, using
the window Options menu.
Note that the contents of program and data memory for addresses 0800h
to 2BFFh are the same.
* 9. Clear the OVLY bit and note that data addresses 0800h-2BFFh become
zero. They are now allocated for off-chip access.
* 10. Change the address within the Window Options for Program Memory to
FE00h. Change the address within the Window Options for Data Memory
to 0100h. Clear the CNF bit (make CNF = 0).
* 11. Observe that the contents of data memory addresses 0100h to 02FFh are
not equal to zero and that the contents of program memory addresses
FE00h to FFFFh are equal to zero.
* 12. Set the CNF bit (make CNF = 1) and note the changes that take place in
the memory displays.
* 13. What occurred after editing the CNF memory configuration bit?
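The effect of the configuration bits on the regions examined in steps 3 to 13 can be summarized with a small Python model. It is a sketch under stated assumptions, not TI's memory map: the function `region` is hypothetical, only the regions used in this procedure are modelled, and the ranges it assumes (2K-word ROM at 0000h-07FFh, 9K-word SARAM at 0800h-2BFFh, 512-word DARAM B0 at data 0100h-02FFh or program FE00h-FFFFh) should be checked against the TMS320C5x documentation. The power-up value of CNF is not given in the text, so CNF = 0 is assumed in the usage lines.

```python
# Hypothetical helper: which memory a 'C50 address reaches for given
# configuration bits, limited to the regions used in this procedure.
def region(space: str, addr: int, mp_mc: int, ovly: int, ram: int, cnf: int) -> str:
    """space is "prog" or "data"; each bit argument is 0 or 1."""
    if space == "prog":
        if 0x0000 <= addr <= 0x07FF:                 # assumed 2K-word on-chip ROM
            return "on-chip ROM" if mp_mc == 0 else "off-chip"
        if 0x0800 <= addr <= 0x2BFF:                 # assumed 9K-word SARAM
            return "SARAM" if ram == 1 else "off-chip"
        if 0xFE00 <= addr <= 0xFFFF:                 # DARAM B0, program side
            return "DARAM B0" if cnf == 1 else "off-chip"
    elif space == "data":
        if 0x0100 <= addr <= 0x02FF:                 # DARAM B0, data side
            return "DARAM B0" if cnf == 0 else "not B0"
        if 0x0800 <= addr <= 0x2BFF:                 # SARAM overlaid in data space
            return "SARAM" if ovly == 1 else "off-chip"
    return "not modelled"

# Power-up values from the text (MP/MC# = 0, OVLY = 1, RAM = 1); CNF = 0 assumed.
bits = dict(mp_mc=0, ovly=1, ram=1, cnf=0)
print(region("prog", 0x0000, **bits))                 # on-chip ROM
print(region("data", 0x0800, **{**bits, "ovly": 0}))  # off-chip
```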
The Recorder
* 14. Open the ex2_2.asm assembler source file in an ASCII text editor.
This file is the assembler source code for a DSP program that turns the
DIGITAL SIGNAL PROCESSOR circuit board into a Playback/Recorder.
Refer to this source file at any time during the procedure.
Important points:
The signal input level of the microphone (proportional to the number of dots) is
output to the I/O INTERFACE display.
• Connect the OUTPUT of the MIC. PRE-AMP. to the ANALOG INPUT
of the CODEC.
• Connect the ANALOG OUTPUT of the CODEC to the INPUT of the
AUDIO AMPLIFIER.
* 17. Set the DIP switch on the I/O INTERFACE of the circuit board to zero.
* 18. Using the C5x VDE, load the ex2_2.dsk program into the DSP. Press the
Run command.
* 19. Use the INT1# push button to begin recording your voice. Play the
recording back with the INT3# push button when you have finished.
0 16-bit sampling
• Disconnect the ANALOG INPUT of the CODEC circuit block from the
OUTPUT of the MICROPHONE PRE-AMPLIFIER.
* 23. Set the position of the DIP switch to zero and record the generated signal.
* 24. Using the C5x VDE, halt the DSP program. Open a Graphic Display with
the settings shown in the figure.
To maximize recording time, the recorded signal samples are stored in two
parts. One part is stored in DARAM B1.
* 25. Using the C5x VDE, open a second Graphic Display with the settings
shown in the figure.
The second part of the recorded signal samples is stored in SARAM.
* 26. Close the text editor displaying the ex2_2.asm source file. End the C5x
VDE session.
In this procedure section, you will observe the difference in execution time between
a DSP algorithm that uses the instruction cache and one that does not.
* 30. Set a breakpoint at the program memory address labeled SLOW (by
double-clicking on the Dis-Assembly window instruction line).
* 32. Within the Dis-Assembly window of the C5x VDE, set a breakpoint at the
program memory address labeled FAST.
* 33. Execute the SLOW algorithm by issuing the C5x VDE RUN command
once.
When the RUN command was pressed the SLOW algorithm was executed.
The SLOW algorithm is all of the code located between the instruction lines
labeled SLOW and FAST.
* 34. Execute the FAST algorithm by pressing the RUN command again.
When the RUN command was pressed the FAST algorithm was executed.
The FAST algorithm is all of the code located between the instruction line
labeled FAST and the instruction:
B MAIN,*
When the FAST algorithm executed, ten consecutive MAC instructions were
executed within a repeat loop. The program bus was freed because each
MAC instruction (except the first) was fetched directly from the program
cache.
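One way to see why the repeat loop frees the program bus is to count program-bus reads. The Python sketch below is a toy model, not the 'C50's documented cycle timing: the function name `pb_reads` is hypothetical, and it assumes that every MAC reads one coefficient over the program bus and that, outside a repeat loop, every MAC must be fetched separately.

```python
def pb_reads(n_macs: int, cached: bool) -> int:
    """Count program-bus reads for n_macs consecutive MAC instructions.

    Toy model: each MAC reads one coefficient over the program bus.
    Without the cache every MAC is fetched separately; inside a repeat
    loop the MAC is fetched once and then supplied by the cache.
    """
    coefficient_reads = n_macs
    instruction_fetches = 1 if cached else n_macs
    return instruction_fetches + coefficient_reads

print(pb_reads(10, cached=False))  # 20
print(pb_reads(10, cached=True))   # 11
```

In this model the nine fetches saved by the cache are program-bus slots left free for operand reads.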
What relative number of instruction cycles was taken to execute the fast
algorithm (TIMER1-TIMER2)?
* 35. Compare the code of each algorithm and the amount of time it took to
execute each.
In this procedure section, you will observe the difference between SARAM and
DARAM access rates.
* 36. Open the ex2_2b.asm file in an ASCII text editor (such as Notepad).
The SLOW and the FAST algorithms are clearly identified. At the top of the
source file are the data constants that are, or can be, used by the
program. The source code belonging to the initialization sequence is
clearly identified.
* 38. Edit the FAST algorithm in the ex2_2b.asm source code file so that the
multiply and accumulate instruction (MAC) uses two constants located in
different SARAM blocks, as follows:
MAC SARAM2,SARAM1
* 39. Save the modified source file to your personal student folder as:
ex2_2bv2.asm
* 40. Assemble the file by executing the following command at a DOS prompt
from within your student folder:
c:\lv91027\bin\dsk5a.exe ex2_2bv2.asm -l
* 41. Answer the following question. The answer can be found by loading the
ex2_2bv2.dsk file into the DSP and executing the necessary part of code.
Block the code off with breakpoints and press the RUN command inside
of the C5x VDE.
* 42. Close the text editor open with the ex2_2bv2.asm source file. End the C5x
VDE session.
CONCLUSION
• SARAM and DARAM are divided into memory blocks of different sizes.
• An SARAM memory block can be accessed (read or written) only once per
instruction cycle.
• A DARAM memory block can be both read from and written to within the same
instruction cycle.
• A program cache, when used, frees the program bus (PB) of one instruction
fetch, so the PB can instead carry a data read from data/program memory.
• Memory configuration bits (such as the four found in the 'C50) control how
memory is mapped into the program bus (PB) and data bus (DB) address spaces.
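The SARAM and DARAM access rules in the points above can be modelled in a few lines of Python. In this sketch (hypothetical names; it ignores instruction fetches and bus contention), accesses to different blocks proceed in parallel, while accesses to one block are limited to one per cycle for SARAM and two per cycle for DARAM. It illustrates why step 38 places the two MAC operands in different SARAM blocks.

```python
from collections import Counter

# Accesses one block can serve per instruction cycle (from the conclusion).
ACCESSES_PER_CYCLE = {"SARAM": 1, "DARAM": 2}

def cycles_needed(accesses: list[tuple[str, str]]) -> int:
    """Minimum cycles to serve a list of (block_name, block_type) accesses.

    Accesses to different blocks are assumed to proceed in parallel;
    accesses to the same block are serialized beyond its per-cycle limit.
    """
    if not accesses:
        return 0
    per_block = Counter(accesses)
    return max(
        -(-count // ACCESSES_PER_CYCLE[kind])  # ceiling division
        for (_name, kind), count in per_block.items()
    )

# Two MAC operands in different SARAM blocks, then in the same block:
print(cycles_needed([("SARAM1", "SARAM"), ("SARAM2", "SARAM")]))  # 1
print(cycles_needed([("SARAM1", "SARAM"), ("SARAM1", "SARAM")]))  # 2
print(cycles_needed([("B0", "DARAM"), ("B0", "DARAM")]))          # 1
```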
REVIEW QUESTIONS
1. Which of the following modifications to the basic Harvard architecture are used
in some DSPs to increase their number of available memory accesses?
2. How many addresses can each of the parallel buses of a 16-bit Harvard
architecture DSP allocate to on-chip memory and to peripherals?
a. 2^15 addresses
b. 2^16 addresses
c. 32768 addresses
d. 32767 addresses
3. Instruction words cannot be read from which of the following types of memory?
a. data memory
b. data/program memory
c. program cache
d. program memory
a. Two values (a data operand and a microcode instruction) can be stored per
memory element.
b. The memory storage capacity of the SARAM memory block is doubled.
c. An additional SARAM memory access is gained if the program bus is not
required to fetch a microcode instruction.
d. None of the above.
Exercise 2-3
Addressing
EXERCISE OBJECTIVES
Upon completion of this exercise, you will understand the function of the address
generation unit within a DSP and the specialized addressing modes that it offers.
DISCUSSION
An address is, in effect, the name of a memory location. The address is used
any time that the processor is required to write an operand to, or read an operand
from, that location.
Addressing is the means by which operand locations are specified to the processor
when a read or write is executed.
Many addressing modes exist. Depending on the addressing mode used with an
instruction, an operand could be fetched directly from an internal memory address
or from a register.
implied addressing
direct addressing
immediate (short and long) addressing
indirect addressing
circular addressing
Different types of DSPs offer a variety of addressing modes. The type of
addressing used with an instruction influences program flexibility and performance.
Certain types of addressing are meant only for specific situations, and
others are restricted to a small subset of the processor's instructions.
With implied addressing, the operand addresses are implied by the instruction itself. For example:
ADDB
This instruction adds the ACC and ACCB registers together. ACC and ACCB are
thus the implied operands for the instruction.
Direct addressing encodes the operand address within the instruction word or within
a word following the instruction word.
Machine code (instruction opcode followed by the encoded address):
100111011 1101001
The direct addressing used within the TMS320C50 is known as paged memory-
direct addressing. When the instruction word is executed, the encoded 7-bit
address is concatenated with 9 bits held in a status and control register, which
form the upper 9 bits of the address.
When the address is held in a word following the instruction word, the instruction
occupies two program words, for example:
1110110010101001 (instruction word)
0001110101101110 (word following the instruction word)
A special register (known in the 'C50 as the Data Page pointer, DP) stores the
number of the current memory page.
In the case of the 'C50, the DP points to one of 512 possible data pages, each
containing 128 words (512 × 128 = 65,536 words, the full 16-bit data address space).
Direct addressing within the TMS320C50 requires that the 7 lower bits of the data
memory address be encoded within the instruction word.
When the word is executed, the 9 bits from the Data memory Page pointer (DP) are
concatenated with the 7 bits encoded within the instruction word. The operation
forms the full 16-bit data memory address of the operand.
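The concatenation is a 7-bit shift and an OR, as this Python sketch shows (the function names are hypothetical). The example uses DP = 13h (19), the page that starts at 0980h, rather than the values in the question that follows.

```python
def direct_address(dp: int, offset: int) -> int:
    """Concatenate the 9-bit DP with the 7-bit offset from the instruction word."""
    if not 0 <= dp <= 0x1FF:
        raise ValueError("DP is a 9-bit field (0-511)")
    if not 0 <= offset <= 0x7F:
        raise ValueError("only 7 address bits are encoded in the instruction")
    return (dp << 7) | offset

def page_and_offset(address: int) -> tuple[int, int]:
    """Split a 16-bit data memory address into (DP value, 7-bit offset)."""
    return address >> 7, address & 0x7F

print(hex(direct_address(0x13, 0x10)))  # 0x990
print(page_and_offset(0x0980))          # (19, 0)
```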
Assume the 9-bit Data memory Page pointer (DP) of a DSP holds the value 126
(7Eh) and that the 7 lower bits of the data memory address (dma) encoded within an
instruction word are equal to 43h. What data memory address does the instruction
word access?
a. 3F43 h
b. 00C1 h
c. 867E h
d. 7E43 h
ADD #05Ah
Machine code: 10111000 01011010 (ADD opcode, then the operand 5Ah)
This type of addressing, known as short immediate addressing, encodes the
operand in the instruction word itself.
ADD #0B948h
Machine code: 1011111110010000 (ADD opcode word)
              1011100101001000 (operand B948h)
This type of addressing, known as long immediate addressing, encodes the
operand in a second word that follows the instruction word.
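The one-word versus two-word trade-off can be illustrated with a Python sketch that reproduces the two ADD encodings given in this section. It is not an assembler: the function name is hypothetical, it handles only unsigned operands, and it assumes the long form's shift field is zero, as in the ADD #0B948h example.

```python
def encode_add_immediate(k: int) -> list[int]:
    """Return the program words for ADD #k, using the short form when k fits.

    Bit patterns taken from the two ADD examples: short form is
    10111000 followed by the 8-bit k; long form is the word
    1011111110010000 followed by the 16-bit k (shift of 0 assumed).
    """
    if 0 <= k <= 0xFF:
        return [0xB800 | k]        # short immediate: one program word
    if 0 <= k <= 0xFFFF:
        return [0xBF90, k]         # long immediate: two program words
    raise ValueError("operand must be a 16-bit unsigned value")

print([f"{w:016b}" for w in encode_add_immediate(0x5A)])
# ['1011100001011010']
print([f"{w:016b}" for w in encode_add_immediate(0xB948)])
# ['1011111110010000', '1011100101001000']
```

Any operand above FFh costs an extra program word, which is the price of the wider range.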
DSPs use indirect and circular addressing to manage operand address sets.
These are required when performing repetitive calculations on data series. The
series are often stored sequentially in memory.
DSPs include an Address Generation Unit (AGU). The AGU is dedicated to the
calculation of addresses for the different types of addressing modes.
The AGU has its own arithmetic unit, so AGU arithmetic is independent of the
CALU. All address calculations take place in parallel with instruction execution.
The figure shows the Auxiliary Register Arithmetic Unit (ARAU) of the
TMS320C50 ('C50).
The 'C50 AGU has eight memory-mapped auxiliary registers, identified AR0
through AR7.
The registers can be used for storing addresses or temporary data. Indirect and
circular addressing require the use of some of the Auxiliary Registers (AR0 to AR7).
Within the ST0 register of the TMS320C50 is a 3-bit field identified as ARP, the
Auxiliary Register Pointer.
ARP holds a value between 0 and 7, which specifies the current auxiliary register
(AR0 to AR7) being used for indirect-register addressing.
Once the appropriate registers have been configured, the AGU provides the
necessary operand address required by the processor for the execution of an
instruction.
Any location in data memory can be read from or written to using an address
contained in an Auxiliary Register (AR0 to AR7).
To select the specific AR used to address data memory, the Auxiliary Register
Pointer (ARP) must be loaded with the number (0 to 7) of the AR to be used. The
ARP points to the current auxiliary register used to address memory.
When the instruction
ADD *
is executed, the processor fetches the data memory address from the current
auxiliary register (the one pointed to by ARP).
In this example, ARP is equal to 2, so the current auxiliary register is AR2.
When an instruction that uses indirect addressing is executed, the operand
address is fetched from AR2.
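The role of ARP can be mimicked with a minimal Python model (hypothetical class and method names; the data memory contents are arbitrary). It follows the example in the text: ARP = 2 selects AR2, and the '*' operand of ADD * uses whatever address AR2 holds.

```python
class AuxRegisters:
    """Hypothetical model of AR0-AR7 and the 3-bit ARP."""

    def __init__(self) -> None:
        self.ar = [0] * 8   # auxiliary registers AR0-AR7
        self.arp = 0        # auxiliary register pointer

    def set_arp(self, n: int) -> None:
        if not 0 <= n <= 7:
            raise ValueError("ARP is a 3-bit field (0-7)")
        self.arp = n

    def operand_address(self) -> int:
        """Address used by an indirect operand such as the '*' in ADD *."""
        return self.ar[self.arp]

data_memory = {0x0300: 25}          # hypothetical contents
regs = AuxRegisters()
regs.ar[2] = 0x0300
regs.set_arp(2)                      # ARP = 2 selects AR2
acc = 0
acc += data_memory[regs.operand_address()]   # effect of ADD *
print(acc)  # 25
```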
In real-time applications such as the ones executed by DSPs, the programmer must
determine the size of the data buffer and then must set aside a portion of memory
for the buffer.
The data buffers implemented on DSPs generally use a first-in, first-out (FIFO)
protocol. This means that the first values that are written to the buffer will be the first
values read out of the buffer.
For the programmer to manage data into and out of the buffer, two address pointers
must be maintained (in the case of the 'C50, two auxiliary registers are used as the
pointers).
One of the pointers, the read pointer, indicates the current value to be read from the
buffer.
The second pointer, the write pointer, indicates the current location to write a new
value to in the buffer.
After each read or write operation in a FIFO buffer with linear addressing, the
corresponding pointer moves down (increments to the next location in the buffer).
Once the pointers have advanced to the end of the buffer, they must be reset to
point back to the beginning of the buffer.
In a FIFO buffer with circular addressing, after the read or the write pointer reaches
the end of the buffer, it automatically returns to the start of the buffer. The
end-of-buffer check and the return to the start of the buffer (usually performed
automatically by the AGU) make the buffer appear circular to the programmer.
Many DSPs provide some form of circular addressing. However, the ease of use
and the mechanisms used to control it vary from DSP to DSP.
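The difference between linear and circular pointer updates comes down to a single end-of-buffer test, sketched here in Python. The function name and the 4-word buffer at 0300h-0303h are hypothetical; on a DSP the test is performed by the AGU in hardware, not by program code.

```python
def advance(ptr: int, start: int, end: int, circular: bool) -> int:
    """Move a FIFO read or write pointer to the next location.

    start and end are the inclusive buffer limits. In circular mode the
    wrap from end back to start is automatic; in linear mode the pointer
    simply increments and the program must reset it itself.
    """
    if circular and ptr == end:
        return start
    return ptr + 1

print(hex(advance(0x0303, 0x0300, 0x0303, circular=True)))   # 0x300
print(hex(advance(0x0303, 0x0300, 0x0303, circular=False)))  # 0x304
```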
The AGU of the 'C50 determines whether the write or read pointer is at the end
of the buffer or at the beginning.
The AGU thus automates the end-of-buffer check and the return to the start of
the buffer.
TMS320C50 Instructions
...
MAR *, AR3
...
...
Data buffers are frequently used and are implemented with indirect addressing
(circular or linear). When indirect addressing is used, two additional elements must
be specified within the indirect address field of an instruction:
• Whether the content of the current AR is incremented, decremented, or left
unchanged.
• Whether the Auxiliary Register Pointer (ARP) is updated to point to another AR
or stays the same.
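The two elements of the indirect address field can be modelled together in Python. The function name is hypothetical; the "", "+" and "-" modifiers stand in for the *, *+ and *- operand forms, and `next_arp` plays the role of the optional ARn that follows the operand, as in MAR *, AR3.

```python
from typing import Optional

def indirect_operand(ar: list, arp: int, modify: str = "",
                     next_arp: Optional[int] = None) -> tuple:
    """Model one indirect access through the current auxiliary register.

    Returns (operand address, ARP after the access) and updates ar in place.
    modify: ""  -> AR left unchanged     (written * on the 'C50)
            "+" -> AR post-incremented   (written *+)
            "-" -> AR post-decremented   (written *-)
    next_arp: number of the AR to make current afterwards, or None to keep ARP.
    """
    address = ar[arp]
    if modify == "+":
        ar[arp] = (ar[arp] + 1) & 0xFFFF   # 16-bit wrap-around
    elif modify == "-":
        ar[arp] = (ar[arp] - 1) & 0xFFFF
    elif modify != "":
        raise ValueError("modify must be '', '+' or '-'")
    return address, (arp if next_arp is None else next_arp)

ar = [0] * 8
ar[1] = 0x0800
# Like an indirect operand written *+, AR3 while ARP = 1:
address, arp = indirect_operand(ar, 1, "+", next_arp=3)
print(hex(address), hex(ar[1]), arp)  # 0x800 0x801 3
```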
If the start address for a circular buffer is 0980h, the end address is 09E4h, and the
current address of the read pointer is 09E4h, what is the next address that the
read pointer will hold?
a. 09E5 h
b. 0980 h
c. 09E3 h
d. 0981 h
Some addressing modes covered in this section (such as long immediate addressing)
attach a second word to the instruction word. These addressing modes require two
program words to be stored in memory, increasing program size and execution time.
Short addressing modes use only one program word to specify both the instruction
and the address. However, doing so reduces the range of addresses that can be
specified.
PROCEDURE
In this procedure section, you will initialize the direct and indirect addressing mode
registers of the 'C50 ARAU. You will also observe the effect of an uninitialized
circular buffer in a program that requires one.
* 2. Read the description of the program and familiarize yourself with the
Initialization Sequence and Main code.
• The program stores in DSP memory the last 16 samples received from
the CODEC. It calculates the average value of these samples. The
average is sent to the ANALOG OUTPUT.
• There are three types of addressing used within the program: direct,
indirect, and circular addressing.
For example, the circular buffer registers (CBSR1, CBER1, CBCR) have not been
initialized.
• Labels (Circular, Indirect, and Direct) given in the source code indicate
where the ARAU register initialization instructions should be located.
Note: Before using the C5x VDE please make certain the circuit
board power source is turned ON, and that the serial connection
is present between the host computer and the DIGITAL SIGNAL
PROCESSOR circuit block labeled SERIAL PORT.
* 4. Using the C5x VDE, load the ex2_3.dsk program into the DSP.
Address: XN0
Type: Data Memory
Display Format: Signed Integer
The Exercise 2-3 program reserves the data memory addresses labeled XN0 to XN15
for storage of the 16 most recent samples received by the DSP. The samples are
transmitted by the CODEC, which converts the signal received at its ANALOG INPUT.
* 6. Using the C5x VDE, set a breakpoint at the program memory address
labeled INDIRECT. Set another breakpoint at the program memory
address labeled DIRECT.
All code located before the labeled instruction line has been executed; this
includes the DSP initializations (apart from those of the ARAU). The instruction
execution line (yellow) has stopped at the INDIRECT breakpoint.
The following instructions, located in the main program code, use indirect
addressing:
* 8. Using the C5x VDE, edit the AR0 register so that it points to the dma
labeled XN0. AR0 is used for indirect addressing.
* 9. Using the C5x VDE, edit the content of the Auxiliary Register Pointer
(ARP). Make ARP point towards auxiliary register 0 (AR0).
You have now initialized the ARAU for indirect addressing as used by this
program.
* 10. Execute the C5x VDE RUN command. The instruction execution line
(yellow) will stop at the breakpoint labeled DIRECT.
The following instruction, located in the main program code, uses direct
addressing:
SACL 10 h
The direct addressing register used by this instruction, the Data Page pointer
(DP), must be initialized. In fact, this instruction uses paged memory-direct
addressing.
* 11. Using the C5x VDE, edit the contents of the DP bits. Set the Data Page
pointer (DP) to the data page starting at 0980h (change the content of
DP to 0x0980). Among other locations, the data memory address labeled
OUTPUT is found on this data page.
Note that after making the change to DP, register ST0 is shown as
modified (highlighted in red). ST0 is modified because the Data Page
pointer bits (DP) are held in status and control register ST0.
You have now initialized the ARAU for paged memory-direct addressing
with the averaging program.
* 12. Using the STEP OVER command found on the C5x VDE Toolbar, execute
the following instructions:
NOP
CLRC INTM
ZAP
Note that we did not make any ARAU circular addressing initializations.
* 13. Using the C5x VDE, set a breakpoint at the AND instruction found within
the RECEIVE subroutine. This is one of the instructions that precede
the TRANSMIT subroutine label.
$$\mathrm{ACC} = \frac{\sum_{i=0}^{15} \mathrm{XN}_i}{15 + 1} = \frac{\mathrm{XN0} + \mathrm{XN1} + \cdots + \mathrm{XN15}}{16}$$
* 14. Execute the RUN command found on the C5x VDE Toolbar. The RECEIVE
subroutine will be entered and executed once a sample sent by the
CODEC is received by the DSP.
When properly initialized for indirect, direct, and circular addressing, the
RECEIVE subroutine computes the average of the 16 samples stored in
data memory addresses XN0 to XN15.
Recall that the circular addressing initialization has not yet been made.
* Yes * No
The accumulator contents do not equal the average of the contents of the
dma labeled XN0 to XN15. The RECEIVE subroutine begins calculating
the average at XN1. This is because AR0 is used (and auto-incremented
by the SACL instruction) to load XN0 with the most recent value received
from the CODEC.
However, as stated previously, the circular buffer was not initialized. When
the indirect read pointer AR0 was auto-incremented from XN15, it went to
OUTPUT.
Therefore, the averaging process added XN1 to XN15 and the dma labeled
OUTPUT. In data memory, OUTPUT is found immediately after the dma
labeled XN15.
If the circular buffer had been initialized, when the averaging process had
finished adding XN15 to the accumulator the next value to be added would
have been XN0.
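The effect of the missing initialization can be reproduced with a short Python sketch. The memory layout is a hypothetical stand-in for the program's: indices 0 to 15 hold XN0 to XN15 and index 16 holds OUTPUT. As described above, the read pointer starts at XN1. The division by 16 is done in floating point for clarity, whereas the DSP program works in fixed point.

```python
def average_16(mem: list, first: int, start: int, end: int, circular: bool) -> float:
    """Add 16 words beginning at index `first`, then divide by 16.

    When circular is True the read pointer wraps from `end` back to `start`,
    as the AGU does once CBSR1, CBER1 and CBCR are set; otherwise it just
    increments past the end of the buffer.
    """
    total = 0
    ptr = first
    for _ in range(16):
        total += mem[ptr]
        ptr = start if (circular and ptr == end) else ptr + 1
    return total / 16

XN0, XN15 = 0, 15                # hypothetical indices of the buffer limits
memory = [16] * 16 + [1000]      # XN0..XN15 all hold 16; index 16 plays OUTPUT
print(average_16(memory, XN0 + 1, XN0, XN15, circular=False))  # 77.5
print(average_16(memory, XN0 + 1, XN0, XN15, circular=True))   # 16.0
```

Without the wrap, OUTPUT replaces XN0 in the sum, so the result no longer equals the true average of 16.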
In this procedure section, you will initialize the circular buffer so that the averaging
program may be used properly.
* 16. Set the Program Counter (PC) register to program memory address
0A80h.
Using the C5x VDE, STEP OVER the Dis-Assembly window instructions
until the instruction execution line (yellow) is over the instruction labeled
CIRCULAR.
The instructions above indirectly address data (XN0 to XN15) that must be
held in a circular buffer.
* 17. Edit Circular Buffer 1 Start Register (CBSR1) to the first data memory
address belonging to the circular buffer (XN0).
* 18. Edit Circular Buffer 1 End Register (CBER1) to the last data memory
address belonging to the circular buffer (XN15).
* 19. Edit the Circular buffer 1 Auxiliary Register bits (CAR1) to AR0. This
makes circular buffer 1 use auxiliary register 0 (AR0) as the pointer for the
buffer elements.
Note that after making the change to the CAR1 bits the CBCR register is
also shown as modified (written in red within the C5x VDE). The Circular
buffer 1 Auxiliary Register bits (CAR1) are located within the CBCR
register.
* 20. Enable circular buffer 1 by setting the circular buffer 1 enable bit (CENB1).
Note that after making the change to the CENB1 bit the CBCR register is
also shown as modified (written in red within the C5x VDE). The circular
buffer 1 enable bit (CENB1) is located within the CBCR register.
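Steps 19 and 20 together amount to writing a single value into CBCR. The Python sketch below packs that value. The function name is hypothetical, and the bit positions are an assumption based on the usual TMS320C5x register layout (CAR1 in bits 2-0, CENB1 in bit 3, CAR2 in bits 6-4, CENB2 in bit 7); check them against TI's documentation before relying on the result in source code.

```python
def cbcr_value(car1: int, cenb1: bool, car2: int = 0, cenb2: bool = False) -> int:
    """Pack the CBCR fields. ASSUMED layout: CAR1 bits 2-0, CENB1 bit 3,
    CAR2 bits 6-4, CENB2 bit 7 (verify against TI's TMS320C5x documentation)."""
    for car in (car1, car2):
        if not 0 <= car <= 7:
            raise ValueError("CARx selects one of AR0-AR7")
    return (int(cenb2) << 7) | (car2 << 4) | (int(cenb1) << 3) | car1

# Steps 19 and 20: circular buffer 1 uses AR0 and is enabled.
print(hex(cbcr_value(car1=0, cenb1=True)))  # 0x8
```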
* 21. Execute the C5x VDE RUN command. The instruction execution line
(yellow) found within the Dis-Assembly window will stop at the breakpoint
labeled INDIRECT.
* 22. Using the C5x VDE, edit AR0 to make it point towards the dma labeled
XN0.
* 23. Using the C5x VDE, edit the content of the Auxiliary Register Pointer
(ARP). Make ARP point towards auxiliary register 0 (AR0).
You have now re-initialized the ARAU for indirect addressing with the
averaging program. The ARAU for direct addressing has been initialized.
* 24. Using the C5x VDE STEP OVER command, execute the following
instruction lines:
NOP
CLRC INTM
ZAP
* 25. Execute the RUN command found on the C5x VDE Toolbar. The RECEIVE
subroutine is entered and executed every time a sample sent by the
CODEC is received by the DSP.
$$\mathrm{ACC} = \frac{\sum_{i=0}^{15} \mathrm{XN}_i}{15 + 1} = \frac{\mathrm{XN0} + \mathrm{XN1} + \cdots + \mathrm{XN15}}{16}$$
* Yes * No
Having initialized the circular buffer the averaging operation now properly
executes.
In this procedure section, you will make the necessary initialization corrections to
the averaging program source code. Using a function generator and an oscilloscope,
you will verify whether the ex2_3.asm program correctly computes a 16-point average.
* 28. In the ex2_3 source file (open in an ASCII text editor), locate the
source statement labeled Circular:.
* 29. Make the NOP instruction a comment (by placing a semi-colon in front of
it) and remove the semi-colons from in front of the three commented SPLK
instructions.
These three added lines of source code initialize the CBSR1, CBER1 and
CBCR registers. The source statements initialize the circular buffer.
* 30. Locate within the ex2_3 source file the source statement labeled Indirect:.
* 31. Make the NOP instruction a comment (by placing a semi-colon in front of
it) and remove the semi-colon from in front of the two commented
instructions (LAR and MAR).
The added lines of source code initialize the AR0 and ARP registers.
These are the source statements that initialize indirect addressing.
* 32. Locate within the ex2_3 source file the source statement labeled Direct:
* 33. Make the NOP instruction a comment (by placing a semi-colon in front of
it) and remove the semi-colon from in front of the LDP instruction.
The added source code initializes the DP bits located within status and
control register ST0. This source statement initializes direct addressing.
* 34. Save the modified source file to your personal student folder as:
ex2_3v2.asm
* 35. Assemble the file by executing the following command at a DOS prompt
from within your student folder:
c:\lv91027\bin\dsk5a.exe ex2_3v2.asm -l
* 37. Using the C5x VDE, load the ex2_3v2.dsk file into the DSP.
* 38. Execute the RUN command found on the C5x VDE Toolbar.
* 42. Vary the frequency and the type of function generated by the function
generator. Observe the results.
CONCLUSION
• An address is used any time that the processor is required to write an operand
to, or read an operand from, a location.
• Addressing is the means by which operand locations are specified to the
processor when a read or write instruction is executed.
• Certain types of addressing are meant to be used for specific situations, and
others are restricted to a small subset of the processor's instructions.
• The type of addressing used with an instruction influences program flexibility
and performance.
REVIEW QUESTIONS
1. Which of the following choices is not a location that can be addressed using an
addressing mode?
2. Assume that a certain DSP using paged memory-direct addressing has 512
data pages, each containing 128 words. If the current data page is 42d (DP =
42d), which of the following addresses can be directly addressed by an
instruction word?
a. 0047 h
b. 1530 h
c. 0095 h
d. 0986 h
Unit Test
a. CODEC
b. Central Processing Unit (CPU)
c. Bus structure
d. Peripherals
5. Which among the following choices does not differentiate a DSP from a
general-purpose processor?
a. addressing modes
b. memory architecture (bus structure)
c. execution time of the CALU
d. processor native word width
7. Which of the following DSP characteristics permit the Multiplier and Arithmetic
Logic Unit to keep a constant arithmetic precision during computation?
8. Why do most DSPs use modified Harvard memory architectures (as opposed
to Harvard memory architectures)?
9. Which of the following statements is true? The instruction cache when used:
Unit 3
Program Execution
UNIT OBJECTIVES
Upon completion of this unit, you will be familiar with the fundamentals of DSP
program execution.
UNIT FUNDAMENTALS
The instruction set available to a DSP is executed by the Program Controller unit.
The instruction set controls such things as how data is sequenced through the
CALU and how values are read from and written to memory.
Developing efficient application code requires detailed knowledge of the DSP
architecture being programmed.
This is especially true when it comes to knowing which “dirty tricks” can be used
to take advantage of an architecture's strengths.
A DSP that has an orthogonal instruction set (one in which instructions can use
any register or addressing mode) is simpler to program and optimize than a DSP
whose commands work only on specific ALU registers.
DSP architecture constrains the type of operations and instructions that may be
performed by the processor.
A DSP's instruction set has a profound influence on the processor's suitability for
different tasks.
Not all instructions found in one DSP have analogs in other DSPs using different
architectures. The instructions used with a given architecture must be natural and
efficient for that architecture.
There are many basic similarities between the DSP architectures available. These
similarities are those that have been covered or that will be covered in this course.
These similarities exist because of the effort to provide high throughput in
key signal processing applications.
Most improvements since the first DSP generations have been incremental
enhancements.
Examples of incremental enhancements that have been made to the basic Harvard
architecture are the addition of an instruction cache and of a memory block accessible
by both the program and data buses (program/data memory). Both of these
improvements were made to the TMS320C50 DSP.
In recent years, the need for faster and better DSPs has led to new
architectures.
Improving the instruction execution rate is done by making modifications to the way
the Program Controller unit and the pipeline operate.
Newer designs such as the VLIW and Superscalar architectures use parallelism
to increase the number of instructions executed per instruction cycle.
These types of architectures when implemented within DSPs are made to execute
multiple RISC-like instructions during each clock cycle.
A RISC instruction set is small: certain instructions (those that provide similar
operations) are eliminated. The choice of which instructions to eliminate is based
on careful quantitative analysis and ultimately leads to higher performance.
EQUIPMENT REQUIRED
Exercise 3-1
EXERCISE OBJECTIVES
Upon completion of this exercise, you will be familiar with the function of the
hardware and software features that digital signal processors have evolved to
handle program control.
DISCUSSION
All digital signal processors, and for that matter general-purpose processors, have
a specialized unit dedicated to executing the current instruction and determining the
next instruction to execute.
Within the TMS320C50 DSP ('C50), the unit is known as the Program Controller.
The Program Controller decodes instructions, manages the pipeline, stores the
central processing unit (CPU) status, and decodes conditional operations.
The Program Controller
branch
subroutine
reset
interrupt
repeat
conditional processing
The software mechanisms listed above, though not unique to digital signal
processors, are used for program control. By using these software mechanisms,
a DSP programmer is in fact using specialized hardware features that belong to
the Program Controller.
The Program Counter register (often abbreviated PC register) holds the program
memory address of the next instruction to be fetched and executed by the program
controller.
The content of the program counter is updated every instruction cycle. Usually,
the surrounding hardware (the program counter-related hardware) increments the
program counter by one.
In certain cases, depending on the previous instruction executed, the program
counter is loaded with an entirely different program memory address.
These cases occur when the previous instruction executed was a program control
instruction, such as a branch, a call, or a return from a subroutine or from an
interrupt service routine (ISR).
Digital signal processors use stacks to save and restore return-address and status
information during subroutines and interrupt service routines (ISR).
A stack can consist of any memory device. Most often DSPs provide at least one
of three kinds of stack support:
shadow registers
hardware stack
software stack
Every time an interrupt is executed, an interrupt context save is initiated. The
contents of key DSP registers are saved to their respective backup registers
(shadow registers).
The values in the copied registers are still available to the interrupt service routine
(ISR) code, but after the context save they are protected while held in the shadow
registers. The shadow registers are copied back to the CPU registers when the
return instruction that ends the ISR is executed.
Context save and restore is automatic and reduces DSP ISR overhead: the
programmer does not need to include the save and restore operations as
instructions in the ISR code.
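The automatic save/restore sequence can be mimicked in Python. This toy model (hypothetical class and method names) shadows every register it holds; which registers a real DSP shadows, and which instruction restores them, depends on the device. The register names ACC and ST0 are taken from this course, and their values are arbitrary.

```python
class ShadowContext:
    """Toy model of automatic context save/restore through shadow registers."""

    def __init__(self, **registers: int) -> None:
        self.registers = dict(registers)   # working CPU registers
        self.shadow = {}                   # backup (shadow) copies

    def interrupt(self) -> None:
        # Context save: copy only; the working values stay visible to the ISR.
        self.shadow = dict(self.registers)

    def return_from_interrupt(self) -> None:
        # Context restore: shadow copies overwrite whatever the ISR changed.
        self.registers = dict(self.shadow)

cpu = ShadowContext(ACC=5, ST0=0x0600)
cpu.interrupt()
cpu.registers["ACC"] = 99      # the ISR uses the accumulator freely
cpu.return_from_interrupt()
print(cpu.registers["ACC"])    # 5
```

No save or restore instruction appears in the "ISR" code, which is the overhead saving described above.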
The hardware stack is used during interrupts and subroutines to save and restore
the content of the Program Counter register. The programmer usually does not
have control over the hardware stack.
The hardware stack is used only implicitly, during subroutine calls, interrupt
service routines and repeat instructions.
When a return operation occurs, the return address is retrieved from the stack
(popped from the stack) and loaded into the Program Counter.
The key advantage of a software stack over a hardware stack is that its depth can
be configured by the programmer. This can be done by simply reserving an
appropriately sized section of memory.
Hardware stacks, in contrast, are usually fairly shallow and the programmer must
carefully guard against stack overflow (by avoiding nesting of too many interrupts
or subroutines).
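The depth trade-off can be sketched in Python with one fixed-depth return-address stack class used twice. The depths 4 and 64 are hypothetical, chosen only to contrast a shallow hardware stack with a software stack sized by reserving memory. The sketch raises an error on overflow so that the problem is visible; real hardware does not necessarily signal it, and an old return address may simply be lost.

```python
class ReturnStack:
    """Fixed-depth return-address stack (depth chosen by the caller)."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.entries: list = []

    def push(self, return_address: int) -> None:
        if len(self.entries) >= self.depth:
            raise OverflowError("stack overflow: too many nested calls or interrupts")
        self.entries.append(return_address)

    def pop(self) -> int:
        # A return pops the most recent address back into the Program Counter.
        return self.entries.pop()

def nest(stack: ReturnStack, levels: int) -> bool:
    """Try to nest `levels` calls; report whether the stack held them all."""
    try:
        for pc in range(levels):
            stack.push(pc + 1)       # return address = address after the call
    except OverflowError:
        return False
    return True

hardware = ReturnStack(depth=4)      # hypothetical shallow hardware stack
software = ReturnStack(depth=64)     # hypothetical software stack in memory
print(nest(hardware, 5), nest(software, 5))  # False True
```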
Which of the following types of stacks is used by DSPs during an interrupt context
save?
a. software stack
b. hardware stack
c. interrupt service routine
d. shadow register
The Program Counter acts as the instruction cache that stores the instruction to be
repeated during single-instruction hardware looping.
The RePeaT Counter register (RPTC) holds the number of times the instruction
held in the instruction cache must be repeated during single-instruction hardware
looping.
The multiple-instruction hardware looping registers used in the 'C50 (BRCR, PASR,
and PAER) are used for control and status of the hardware loop.
Example: Difference in the overhead required for a software loop and a hardware loop.
Software loop                       Hardware loop
      B = 16                              RPT #15
LOOP: MAC H0, X0                          MAC H0, X0
      B = B - 1
      If B ≠ 0, branch to LOOP
The above examples implement the multiply-and-accumulate loop of a 16-tap FIR
filter. The one on the left uses a software loop and the one on the right uses a
hardware loop (done using the RPT instruction). Because RPT #k executes the
following instruction k + 1 times on the 'C50, RPT #15 gives 16 repetitions.
The hardware loop executes the RPT instruction only once and then automatically
repeats the multiply and accumulate instruction 16 times. Hardware looping
overhead is therefore much lower than software looping overhead, which adds a
decrement and a branch to every pass.
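The overhead difference can be put into numbers with a rough cycle count. This is a sketch: the per-instruction costs below (1 cycle for MAC, decrement and RPT, 3 cycles for a branch) are assumptions chosen for illustration, not 'C50 timings. Only the RPT count rule (RPT #k repeats the next instruction k + 1 times) comes from this manual.

```python
# Assumed cycle costs (illustrative only, not datasheet values).
MAC_CYCLES = 1
DECREMENT_CYCLES = 1
BRANCH_CYCLES = 3   # branches "typically require several instruction cycles"
INIT_CYCLES = 1     # B = 16 in the software version
RPT_CYCLES = 1      # RPT is fetched and executed once


def software_loop_cycles(taps):
    """MAC + decrement + conditional branch on every pass."""
    return INIT_CYCLES + taps * (MAC_CYCLES + DECREMENT_CYCLES + BRANCH_CYCLES)


def hardware_loop_cycles(taps):
    """One RPT, then the MAC is repeated with no per-pass overhead."""
    return RPT_CYCLES + taps * MAC_CYCLES


def rpt_operand(taps):
    """Immediate for 'C50 RPT #k, which executes the next instruction k + 1 times."""
    return taps - 1


for taps in (16, 64):
    print(taps, software_loop_cycles(taps), hardware_loop_cycles(taps), rpt_operand(taps))
# 16 81 17 15
# 64 321 65 63
```

Whatever the exact branch cost, the software loop pays it on every pass, while the hardware loop pays its setup cost once.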
The instruction to be repeated is loaded into the instruction cache (PC
register), and a counter (RPTC) is loaded with the number of times the
instruction is to be repeated.
During the loop, the Program Counter, acting as the instruction cache, supplies
the instruction to be executed to the Program Controller.
Many instructions that take two or more cycles to execute will only take one when
executed from within a hardware loop that uses an instruction cache.
All DSPs use instruction caches (1 word deep) to implement single-instruction
hardware loops; however, not all DSPs use multi-word instruction caches to
implement multi-instruction hardware loops.
Multi-instruction hardware loops that don't use an instruction cache must re-read
the instructions being repeated each time the processor (Program Controller)
proceeds through the loop.
Because the repeated instructions must be re-read, the program bus cannot be
freed. This means that instructions that execute more rapidly in single-instruction
hardware loops do not do so in multi-instruction hardware loops without
instruction caches.
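The effect on the program bus can be sketched by counting instruction fetches. The functions below are a hypothetical model: they assume one program-bus read per instruction word and one word per cycle, and they ignore operand traffic, which is exactly the traffic that benefits when the bus is freed.

```python
def program_bus_fetches(body_words, passes, cached):
    """Instruction words read over the program bus by a repeated loop body.

    With an instruction cache the body is read once and then replayed from the
    cache; without one, every pass re-reads every word of the body.
    """
    return body_words if cached else body_words * passes


def free_bus_cycles(body_words, passes, cached):
    """Program-bus cycles left for other transfers (one word per cycle assumed)."""
    return body_words * passes - program_bus_fetches(body_words, passes, cached)


print(program_bus_fetches(4, 100, cached=True))   # 4
print(program_bus_fetches(4, 100, cached=False))  # 400
print(free_bus_cycles(4, 100, cached=True))       # 396
```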
Based on your current knowledge of hardware loops, which of the following is true?
Hardware loops have certain limitations associated with them that are not
necessarily associated with software loops.
& The minimum and the maximum number of times a loop can be repeated for
both single- and multi-instruction loops might also be limited.
& Branch instructions typically require several instruction cycles to execute. The
processor must usually use a register to maintain the loop index, which is the
count of the number of times the instruction(s) to be repeated must still be
executed.
& The processor data path must then be used to increment or decrement the index
and test to see if the loop condition has been met.
To avoid these problems, DSP processors have evolved special hardware control
constructs that repeat either a single instruction or a group of instructions a number
of times.
As stated, the primary role of the Program Controller is to determine the next
instruction to be executed. Interrupts are used to signal to a processor both external
(a push-button is pressed) and internal (a word is received through the serial port)
events.
All DSPs use interrupts and most use interrupts as their primary means of
communicating with peripherals.
An interrupt is an event, external to the program being executed, that causes the
processor to stop executing its current program and to branch to a special block
of code called an interrupt service routine (ISR).
Once called, the ISR code typically services the source that signaled the
interrupt.
For example, if a word is received through the serial port, an interrupt is
signaled and the ISR executes the necessary code to process the word.
Interrupts can be disabled. In fact, this occurs during DSP initialization, ISRs, and
single-instruction hardware loops.
It is the Program Controller that disables interrupts for the duration of
single-instruction hardware loop execution. As a direct consequence, interrupts
cannot be serviced until the loop ends, so a programmer must carefully consider
the maximum interrupt lockout time that can be accepted.
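The maximum lockout time of a single-instruction hardware loop is simple arithmetic. In this sketch the 50 ns cycle is the 'C50 figure quoted in the pipeline exercise of this unit, and the rule that RPT #k executes k + 1 times comes from the same exercise; the one-cycle instruction and the function name are assumptions.

```python
CYCLE_NS = 50  # 'C50 instruction cycle on the DSP board


def rpt_lockout_ns(k, cycles_per_instruction=1, cycle_ns=CYCLE_NS):
    """Worst-case interrupt lockout for RPT #k followed by one instruction.

    Interrupts stay disabled until all k + 1 repetitions have finished.
    """
    return (k + 1) * cycles_per_instruction * cycle_ns


print(rpt_lockout_ns(15))   # 800   (16 repetitions: 0.8 us)
print(rpt_lockout_ns(255))  # 12800 (256 repetitions: 12.8 us)
```

Comparing the result with the interval between incoming samples tells the programmer whether a pending interrupt, such as a serial-port receive, could be serviced too late.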
Most processors, including DSPs, sample the status of the interrupt lines every
instruction cycle. The processor uses status registers to signal interrupts (once
sampled) and other information to the Program Controller.
From the previous discussion and procedure sections in this manual you have
become acquainted with a few of the status and control registers of the
TMS320C50 DSP.
Although not all DSPs have the same number of status and control registers, all
DSPs do contain these types of registers.
Many of the registers used by the 'C50 Program Controller and CPU have
equivalent counterparts in other DSPs.
PROCEDURE
In this procedure section, you will assemble a program using individual pieces of
pre-programmed source code.
Once the source file is completed and assembled, the DSP circuit can be used as
an audio effects generator.
The memory initialization directives (.ds and .ps) for the audio effects
generator have not been set.
* 3. Locate within the ex3_1.asm file, the CODE#1 label. Replace the label with
the following memory initialization directive:
* 4. Locate within the ex3_1.asm file, the CODE#2 label. Replace the label with
the following statement:
* 5. Locate within the ex3_1.asm file, the CODE#3 label. Replace the label with
the following statement:
* 6. Locate within the ex3_1.asm file, the CODE#5 label. Replace the label with
the following statements:
* 7. Save the source file to your personal student folder as: ex3_1v2.asm
CALL
B
RET
BCND
RPT
RPTB
RETE
IDLE
The missing statements to be added to the source file consist either of ISR
and subroutine labels or of different software program control instructions
(such as the ones shown above).
The assembled program when loaded into the DSP will generate 3 audio
effects. This implies that at least three subroutines will be used by the
program.
& RINT
& XINT
& INT1#
& INT3#
Note that before a branch (B) instruction can be used, a label must exist
to refer to the location of the branch.
As stated, the effects generator program uses 4 interrupts. Thus, the program
Vector Table must contain 4 source statements. Each Vector Table source
statement is a branch instruction that directs the Program Controller to execute the
respective interrupt service routines.
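A vector table is essentially a lookup from interrupt source to service routine. The Python sketch below mirrors that idea with a dictionary. It is an analogy, not 'C50 code: apart from TRANSMIT, which the procedure names as the XINT service routine, the ISR names are hypothetical placeholders.

```python
def receive_isr(log):
    log.append("RECEIVE")    # hypothetical: handle a word from the CODEC


def transmit_isr(log):
    log.append("TRANSMIT")   # XINT service routine named in the procedure


def int1_isr(log):
    log.append("INT1_ISR")   # hypothetical push-button routine


def int3_isr(log):
    log.append("INT3_ISR")   # hypothetical push-button routine


# One entry per interrupt used by the effects generator, like the four
# branch statements of the Vector Table.
VECTOR_TABLE = {
    "RINT": receive_isr,
    "XINT": transmit_isr,
    "INT1": int1_isr,
    "INT3": int3_isr,
}


def dispatch(source, log):
    """Branch to the ISR registered for an interrupt source."""
    VECTOR_TABLE[source](log)


log = []
for source in ("XINT", "RINT", "INT3"):
    dispatch(source, log)
print(log)  # ['TRANSMIT', 'RECEIVE', 'INT3_ISR']
```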
* 11. Within ex3_1v2.asm, replace the CODE#4 label with the following:
* 13. Replace each of the ISR CODE labels with its ISR label
* 15. Replace each of the CODE labels with its subroutine label
* 19. Respectively replace the labels with the following source statements:
* 21. Within ex3_1v2.asm replace the CODE#26 and CODE#32 labels with RET
instructions.
The RET instruction, as opposed to the RETE instruction, does not re-
enable interrupts when executed. Interrupts must be disabled during
initialization.
<TAB> IDLE
These labels are where the AIC_2ND subroutine must pause processing
(IDLE) to wait for acknowledgment from the serial port (using the XINT
interrupt) that a word has been sent to the CODEC.
* 27. Assemble the file from within your student directory by executing the
following command at a DOS prompt:
c:\lv91027\bin\dsk5a.exe ex3_1v2.asm -l
Effects Generator
In this procedure section, you will load and use the Effects Generator program.
Note: Before using the C5x VDE please make certain the circuit
board power source is turned ON, and that the serial connection
is present between the host computer and the DIGITAL SIGNAL
PROCESSOR circuit block labeled SERIAL PORT.
* 28. Make the circuit board connections shown in the figure. These are the
connections to be made to properly operate the Audio Effects Generator
program.
* 30. Load the ex3_1v2.dsk program into the DSP. This file is located in your
student directory.
* 31. Execute the RUN command found on the C5x VDE Toolbar.
* 32. Talk into the microphone and familiarize yourself with the Effects
Generator program.
* 33. Halt the DSP program and close the C5x VDE session when finished using
the ex3_1v2 program.
In this procedure section, you will familiarize yourself with the basic uses of a
context save.
* 34. Open the C5x VDE and load ex3_1v2.dsk into the DSP.
* 36. Press the RUN command found on the C5x VDE Toolbar; this executes the
DSP initialization code.
XINT is signaled when a sample is transmitted from the DSP serial port
register (DXR) to the CODEC.
* 37. To force the XINT interrupt to occur, enable the XINT interrupt by editing
the IMR register to 0x0027.
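Step 37 writes 0x0027 to IMR. Decoding the value bit by bit shows which interrupts it enables. The bit assignment used here (INT1 in bit 0 up to INT4 in bit 8) is our reading of the 'C5x interrupt mask register and should be checked against the TMS320C5x user's guide; the helper function is hypothetical.

```python
# Assumed 'C5x IMR layout, listed from bit 0 upward.
IMR_BITS = ["INT1", "INT2", "INT3", "TINT", "RINT", "XINT", "TRNT", "TXNT", "INT4"]


def enabled_interrupts(imr):
    """Names of the interrupts whose mask bit is set to 1 in `imr`."""
    return [name for bit, name in enumerate(IMR_BITS) if imr & (1 << bit)]


print(enabled_interrupts(0x0027))  # ['INT1', 'INT2', 'INT3', 'XINT']
```

Under this reading, 0x0027 unmasks XINT along with the external interrupt lines INT1 to INT3, while RINT remains masked.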
* 40. Execute the RUN command found on the C5x VDE Toolbar.
The XINT interrupt vector is taken and the TRANSMIT ISR is begun. At this
point, because the interrupt has been taken, key DSP registers have been
saved to their shadow registers (this process is called context save).
* 42. Using the C5x VDE STEP OVER command, execute the RETE instruction
to exit the ISR and return to the MAIN program loop.
Note that the content of the ACC register has returned to AAAA AAAAh,
the value it had before entering the TRANSMIT ISR.
The context save process stored the contents of key CPU registers
(including ACC) before entering the ISR and restored them on exit.
CONCLUSION
& The primary role of a Program Controller is to determine the next instruction to
be executed.
& The Program Counter register holds the program memory address of the next
instruction to be fetched by the Program Controller and executed.
& DSPs use stacks to save and return address and status information during a
subroutine or an interrupt service routine (ISR).
& Interrupts are used to signal to a processor both external and internal events.
& Most DSPs use interrupts as their primary means of communication with
peripherals.
REVIEW QUESTIONS
a. branch
b. subroutine
c. repeat
d. interrupt
a. status registers
b. hardware stack
c. repeat counters
d. program bus
3. Which of the following registers holds the program memory address of the next
instruction to be fetched and executed by the Program Controller?
5. What is the process called, where DSPs use stacks to save and return address
and status information during a subroutine or an interrupt service routine?
Exercise 3-2
The Pipeline
EXERCISE OBJECTIVES
Upon completion of this exercise, you will know the advantages of having a deep
pipeline as used in DSPs. You will be familiar with the layout of a pipeline
reservation table and be able to use it to solve pipeline conflicts.
DISCUSSION
In everyday life, whenever a large task must be done rapidly, it is divided into
smaller tasks that are distributed among workers. This process, which began in
the early 20th century, is now known as the assembly line.
By dividing the task into smaller operations and working on each operation
separately yet at the same time, the overall task to be completed is finished much
faster.
Nearly all digital signal processors on the market use the same process to
execute program instructions. When used within a processor, the process is
called pipelining.
By dividing the sequence of operations into smaller pieces, and by executing the
pieces in a pipeline, processor performance is increased. The number of
instructions executed per unit time is increased without changing the total time
required to execute an instruction.
& On other processors, certain instruction sequences must be avoided for correct
operation of the program.
& Fetch
& Decode
& Read
& Execute
Because four actions are implemented per instruction word this pipeline is said to
be a 4-level pipeline.
A DSP possesses a 5-level pipeline. How many pipeline execution units are
simultaneously used during an instruction cycle?
a. 5
b. 1
c. 4
d. None of the above.
Most DSPs are pipelined, though pipeline depth varies between types of DSPs.
In an ideal N-level pipeline (one in which no problems occur during instruction
execution), the number of instructions the processor completes per unit of time
increases by approximately a factor of N compared with the same processor
without a pipeline.
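The factor-of-N claim follows from counting cycles. With N stages and one cycle per stage, n instructions need N * n cycles when executed sequentially, but only N + n - 1 cycles in an ideal pipeline, because once the first instruction has filled the pipeline, one instruction completes every cycle. The sketch below assumes no conflicts of any kind; the function names are our own.

```python
def sequential_cycles(n_instructions, stages):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * stages


def pipelined_cycles(n_instructions, stages):
    """Ideal pipeline: `stages` cycles to fill, then one completion per cycle."""
    return stages + n_instructions - 1


for n in (2, 8, 1000):
    seq, pipe = sequential_cycles(n, 4), pipelined_cycles(n, 4)
    print(n, seq, pipe, f"{seq / pipe:.2f}")
# 2 8 5 1.60
# 8 32 11 2.91
# 1000 4000 1003 3.99
```

As n grows, the speedup approaches the number of stages (4 here), which is the "approximately N" of the text.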
However, processor performance begins to drop when the pipeline becomes too
deep: the time required to control the pipeline execution stages becomes too large.
Cycle     1     2     3     4     5     6     7     8
FETCH     I1                      I2
DECODE          I1                      I2
READ                  I1                      I2
EXECUTE                     I1                      I2
The figure illustrates the operation of four processor execution units when used
sequentially. The hardware associated with each phase of instruction execution is
left idle 75% of the time.
Parallel execution of the different hardware execution units is not possible in this
case.
A processor that uses the execution units sequentially does not begin processing
the next instruction until the current instruction has been executed.
If a clock cycle occurs every 50 ns (as it does with the 'C50 found on the DSP
circuit board), then in this case an instruction takes 200 ns to complete.
Cycle     1     2     3     4     5     6     7     8
FETCH     I1    I2    I3    I4    I5    I6    I7    I8
DECODE          I1    I2    I3    I4    I5    I6    I7
READ                  I1    I2    I3    I4    I5    I6
EXECUTE                     I1    I2    I3    I4    I5
Because these operations (Fetch, Decode, Read, and Execute) are done in
parallel, the instruction cycle times are much shorter than they are when the
operations are executed sequentially.
A subtle point about pipelining is that an instruction may be spread out over multiple
instruction cycles, and yet still appear to the programmer to execute in one
instruction cycle.
The figure demonstrates the operation of a four-level pipeline used for single-
instruction execution.
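The perfect-overlap reservation table follows a simple rule: instruction Ii enters stage s (counting FETCH as stage 0) in cycle i + s. The sketch below builds the table from that rule; it assumes a conflict-free 4-level pipeline, and the function name is our own.

```python
STAGES = ["FETCH", "DECODE", "READ", "EXECUTE"]


def reservation_table(n_instructions, n_cycles, stages=STAGES):
    """Map each stage to the instruction it holds in cycles 1..n_cycles."""
    table = {}
    for s, stage in enumerate(stages):
        row = []
        for cycle in range(1, n_cycles + 1):
            i = cycle - s  # instruction occupying stage s during this cycle
            row.append(f"I{i}" if 1 <= i <= n_instructions else "")
        table[stage] = row
    return table


for stage, row in reservation_table(8, 8).items():
    print(f"{stage:<8}" + "".join(f"{cell:>4}" for cell in row))
```

The EXECUTE row produced this way is busy in every cycle from cycle 4 on, which is the full utilization of a perfect overlap.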
When all four stages operate in parallel as shown in the figure (no problem occurs
between the stages), the execution sequence is referred to as a perfect overlap.
There is 100% utilization of the processor Execute stage.
Cycle     1     2     3     4     5     6     7     8
FETCH     I1    I2    I3    I4    I5    I6    I7    I8
DECODE          I1    I2    I3    I4    I5    I6    I7
READ                  I1    I2    I3    I3/I4 I5    I6
EXECUTE                     I1    I2    I3    I4    I4
No priority is given between the four stages of a pipeline. Therefore, when more
than one pipeline stage requires processing on the same resource (such as a data
bus, a memory address, or a CPU register) a pipeline conflict occurs.
A structural conflict occurs during a given instruction cycle because two or more
phases of a pipeline require the same hardware resource, such as data bus use,
register access or memory block access.
A data conflict (also called a pipeline hazard) occurs when a dependence between
instructions exists and, because of the pipeline, a data operand is not provided to
an instruction at the appropriate time.
This type of situation can occur because certain processor registers can only be
modified during certain pipeline phases.
The 'C50 can only modify the ARAU registers during the Decode phase of the
pipeline. An instruction that uses and modifies an ARAU register, and that
depends on the previous instruction for the contents of that register, will cause a
pipeline data conflict.
Although the situations described above are called conflicts, they do not
necessarily halt proper program execution. To avoid resource contention, and with
it pipeline conflicts, programmable DSPs can use three fundamentally different
techniques:
& interlocking
& time-stationary coding
& data-stationary coding
A fine line exists between the different techniques listed above. Interlocking is a
type of pipeline behavior: the pipeline reacts in certain ways when confronted
with certain situations, such as pipeline conflicts. Time-stationary and
data-stationary coding, on the other hand, are programming models, that is, code
formats used by a programmer.
The TI TMS320C5x family of DSPs uses interlocking, while most members of the
AT&T DSP16/16A family use time-stationary coding. The members of the AT&T
DSP32/32C family use data-stationary coding.
Cycle     1     2     3     4     5     6     7     8
FETCH
DECODE
READ
EXECUTE
Interlocks are not always easy to spot. Pipeline operation and interlocks can be
visualized using a reservation table.
Each column corresponds to one of the instruction cycles executed by the DSP.
The operations held in a column occur at the same time; progression through time
is from left to right.
The four rows correspond to each of the four pipeline stages. The reservation table
for a 5-level pipeline would have five rows.
Cycle     1     2     3     4     5     6     7     8
FETCH     I1    I2    I3    I4    I5    I5    I6    I7
DECODE          I1    I2    I3    I4    I4    I5    I6
READ                  I1    I2    I3    I3    I4    I5
EXECUTE                     I1    I2    I3    NOP   I4
In this example, instruction I3 needs the READ stage for two cycles.
Consequently, instruction 4 (I4) is delayed (interlocked), allowing time for I3 to
complete its READ stage. A NOP (No OPeration) instruction is executed where the
delayed I4 instruction was supposed to have executed.
Here, the interlock incurs a one-cycle penalty. Thus, instruction execution time on
a processor with an interlocking pipeline may vary depending on the instructions
found before and after it in the program.
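The variable execution time can be captured by adding stall cycles to the ideal pipeline count. This sketch assumes an in-order pipeline with one-cycle stages, in which each interlock simply delays every later instruction by its stall count; the function name is hypothetical.

```python
def completion_cycle(n_instructions, stages, stalls=()):
    """Cycle in which the last instruction occupies the EXECUTE stage.

    `stalls` lists the extra cycles inserted by each interlock.
    """
    return stages + n_instructions - 1 + sum(stalls)


print(completion_cycle(5, 4))       # 8: with perfect overlap, I5 executes in cycle 8
print(completion_cycle(4, 4, [1]))  # 8: I4 executes in cycle 8, after one NOP
print(completion_cycle(5, 4, [1]))  # 9
```

The same instruction (I4 here) finishes one cycle later purely because of the instruction in front of it.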
& The programmer should not be bothered with the internal timing or parallelism
of a processor architecture.
Processors that use interlocking to avoid resource contention have a pipeline that
is essentially invisible to the programmer. However, in certain cases such as the
pipeline data conflict (or pipeline hazard) the pipeline is no longer transparent.
The 'C50 pipeline is essentially invisible to the programmer except in the following
cases: auxiliary register updates, memory-mapped accesses of the CPU registers,
the NORM instruction, and memory configuration commands.
[Figure: reservation table showing an interrupt entering the pipeline (INTR inserted, then ISR1), cycles 1 to 8]
When an interrupt occurs, almost all processors allow instructions at the decode
stage or further in the pipeline to finish executing because these instructions may
be partially executed.
& One cycle after the interrupt is recognized the processor inserts an INTR
instruction into the pipeline.
& The INTR causes a four-instruction delay before the first word of the interrupt
vector (ISR1 in the diagram) is executed.
& The instructions (I1, I2, I3) that were at or past the decode stage in the pipeline
when the interrupt was recognized are allowed to finish their execution.
PROCEDURE
In this procedure section, you will complete the pipeline reservation tables for two
different instruction sets.
* 2. Briefly familiarize yourself with the description and contents of the source
file.
SAMM DXR
(store ACC to DXR)
LT INDX
(load TREG0 with INDX)
MPY #0Ah
(multiply 0Ah with TREG0 and store to PREG)
LDP #TEMP
(DP = memory data page belonging to TEMP)
SPL TEMP
(store low PREG to TEMP)
Each of these instructions must pass through the DSP pipeline stages.
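[Figure: 4-level pipeline reservation table for the instructions following PIPELINE1 (cycles 1 to 8), with cells A1, A2 and A3 to be completed]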
* 4. Complete the above reservation table for the instructions that follow the
PIPELINE1 label by answering these questions.
Which of the following choices is the correct content of table cell A1?
a. LDP
b. #0Ah
c. INDX
d. MPY
Which of the following choices is the correct content of table cell A2?
a. NOP
b. SAMM
c. LT
d. SPL
Which of the following choices is the correct content of table cell A3?
a. MPY
b. LT
c. #0Ah
d. #TEMP
You have completed the 4-level pipeline reservation table for the one-word
instructions found after the pma labeled PIPELINE1.
ADD #8192,0
(ACC = ACC + 8192, that is, 2000h)
RPT #4
(hardware repeat SFR 5 times)
SFR
(shift the accumulator one bit right)
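What the PIPELINE2 instructions do to the accumulator can be checked with a short model. It is a sketch under assumptions: ACC is taken as 32 bits and starts at 0 (a hypothetical value), and SFR is modeled as an arithmetic shift, as with sign extension enabled; the function names are our own.

```python
ACC_MASK = 0xFFFFFFFF  # 32-bit accumulator


def sfr(acc):
    """Shift right by one bit, keeping the sign bit (sign extension assumed)."""
    signed = acc - (1 << 32) if acc & 0x80000000 else acc
    return (signed >> 1) & ACC_MASK


def pipeline2(acc=0):
    acc = (acc + 8192) & ACC_MASK  # ADD #8192,0
    k = 4                          # RPT #4
    for _ in range(k + 1):         # RPT #k repeats the next instruction k + 1 times
        acc = sfr(acc)             # SFR
    return acc


print(pipeline2())  # 256, i.e. 8192 / 2**5
```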
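[Figure: 4-level pipeline reservation table for the instructions following PIPELINE2 (cycles 1 to 8), with cells B1, B2 and B3 to be completed]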
* 6. Complete the above reservation table for the instructions that follow the
PIPELINE2 label by answering these questions.
Which of the following choices is the correct content of table cell B1?
a. NOP
b. #4
c. #8192
d. ADD
Which of the following choices is the correct content of table cell B2?
a. RPT
b. #4
c. NOP
d. SFR
Which of the following choices is the correct content of table cell B3?
a. SFR
b. NOP
c. [no instructions are fetched]
d. ADD
You have completed the 4-level pipeline reservation table for a two-word
instruction and a repeat loop.
In this procedure section, you will identify and correct the source statements
causing a pipeline conflict.
* 8. Connect the ANALOG OUTPUT of the CODEC to the INPUT of the AUDIO
AMPLIFIER and to an oscilloscope input channel.
Note: Before using the C5x VDE please make certain the circuit
board power source is turned ON, and that the serial connection
is present between the host computer and the DIGITAL SIGNAL
PROCESSOR circuit block labeled SERIAL PORT.
* 9. Open the C5x VDE, and load the ex3_2.dsk program file into the DSP.
* 10. Execute the RUN command found on the C5x VDE Toolbar.
* 11. Use the AUDIO AMPLIFIER GAIN potentiometer to adjust the volume level
of the generated signal.
* 12. Observe that the sound of the generated sinusoidal signal has an added
noise component.
* 13. Using the potentiometer of the DC SOURCE adjust the frequency of the
generated signal to approximately 500 Hz.
* 15. Observe that the DSP generated sinusoidal signal is not a perfect
sinusoidal waveform but contains some higher frequency components.
* 16. Halt the ex3_2.dsk program and close the C5x VDE session.
SAMM AR1
(Store ACC to AR1)
LACC *0+,2,AR1
(Load shifted dma into ACC)
Cycle     1                 2                 3                 4                 5
DECODE    ...               SAMM              LACC (AR1=AR1+1)  ...               ...
EXECUTE   ...               ...               ...               SAMM (AR1 = ACC)  LACC
* 18. Observe the above pipeline reservation table describing the execution
sequence for the instructions located after the CONFLICT1 label.
Answer the following question and remember that an ARAU update always
occurs during the Decode stage of the pipeline (the AR1 register is part of
the ARAU).
Which of the following choices explains how the pipeline conflict occurred
at the pma labeled CONFLICT1?
A pipeline data conflict occurs because both instructions use AR1: LACC
updates AR1 in its Decode stage (cycle 3), one cycle before SAMM writes
ACC to AR1 in its Execute stage (cycle 4).
* 19. What solution should be used to correct for the pipeline conflict?
Cycle     1                 2                 3                 4                 5
DECODE    ...               SAMM              NOP               NOP               LACC (AR1=AR1+1)
EXECUTE   ...               ...               ...               SAMM (AR1 = ACC)  NOP
* 20. Use the ASCII text editor to make the suggested modification to
ex3_2.asm, and thus correct the pipeline conflict.
That is, add two NOP instructions after the SAMM instruction labeled
CONFLICT1.
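The number of NOPs follows from the stage in which each instruction touches AR1. If the producer writes the register in stage w and the following instruction uses it in stage u (FETCH = 0, DECODE = 1, READ = 2, EXECUTE = 3), then with k NOPs between them the use happens k + 1 + u - w cycles after the write, and it must come at least one cycle later. This sketch only encodes that rule for a 4-level pipeline with one-cycle stages; the function names are our own.

```python
FETCH, DECODE, READ, EXECUTE = range(4)


def has_conflict(write_stage, use_stage, nops):
    """True if the consumer uses the register before the producer has written it."""
    delay = nops + 1 + use_stage - write_stage  # cycles from write to use
    return delay < 1


def nops_needed(write_stage, use_stage):
    """Smallest number of NOPs that removes the conflict."""
    return max(0, write_stage - use_stage)


# CONFLICT1: SAMM writes AR1 in EXECUTE; LACC updates AR1 (ARAU) in DECODE.
print([has_conflict(EXECUTE, DECODE, k) for k in range(3)])  # [True, True, False]
print(nops_needed(EXECUTE, DECODE))                           # 2
```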
ex3_2v2.asm
* 22. Assemble the file from within your student directory by executing the
following command at a DOS prompt:
c:\lv91027\bin\dsk5a.exe ex3_2v2.asm -l
* 23. Make certain the connections shown in the figure are still present on your
DIGITAL SIGNAL PROCESSOR circuit board and between the board and
an oscilloscope.
* 24. Open the C5x VDE, and load the ex3_2v2.dsk program file into the DSP.
* 25. Execute the RUN command found on the C5x VDE Toolbar.
* 26. Observe that this time the sound from the Sine Wave Generator does not
have an added noise component.
* 27. Using the potentiometer of the DC SOURCE, adjust the frequency of the
generated signal to approximately 500 Hz. Adjust the oscilloscope to
trigger on the input signal.
* 28. Observe that the DSP generated sinusoidal signal now has a sinusoidal
waveform.
* 29. Halt the program and close the C5x VDE session.
* 30. Disconnect all connections made on the circuit board and between the
board and the oscilloscope.
CONCLUSION
& No priority is given between the four stages of a pipeline. When more than one
pipeline stage requires processing on the same resource a pipeline conflict
occurs.
& Pipeline conflicts can be categorized into three different types: structural
conflicts, data conflicts, and control conflicts.
& Interlocks and pipeline operation can be visualized using a reservation table.
REVIEW QUESTIONS
1. A DSP has a 5-level pipeline. How many instruction cycles does it take to
execute an instruction?
a. 10
b. 4
c. 1
d. 5
2. Which of the following is not a type of pipeline conflict that occurs with pipelined
processors?
a. Interlock conflict
b. Structural conflict
c. Data conflict
d. Control conflict
Unit Test
a. program control
b. pipeline
c. VLIW architecture
d. Program Controller
a. Program counter
b. CALU
c. Pipeline
d. Program Controller unit
5. What hardware feature do DSPs use to save and restore address and status
information during a subroutine or an interrupt service routine (ISR)?
a. context save
b. registers
c. stacks
d. hardware repeat loop
6. The time taken by a processor for operations that do not belong to a user's task
is the definition of what term? The operations usually consist of the allocation
of resources for the execution of the next instruction.
a. overhead.
b. nesting.
c. interlocking.
d. conditional processing.
7. A DSP possesses a 4-level pipeline. How many instructions are fetched per
instruction cycle?
a. 1 instruction
b. 2 instructions
c. 4 instructions
d. 1 instruction is fetched every four instruction cycles.
a. Program Controller
b. context saves
c. pipeline conflicts
d. interrupts
9. Interlocks, pipeline operation, and how processor resources are used over time
can be visualized using which of the following?
a. reservation table
b. computer
c. debugger
d. VLIW
a. RPT instruction
b. instruction cache
c. zero-overhead hardware looping circuitry
d. program counter-related hardware
Unit 4
Basic I/O
UNIT OBJECTIVES
Upon completion of this unit, you will have a sound understanding of the methods
used by a DSP for off-chip communication.
UNIT FUNDAMENTALS
Most signal processing applications require the DSP to communicate with the
external (analog) world. In these cases, data must usually be input to and/or output
from the DSP.
For any given DSP in existence today, features such as arithmetic performance,
memory bandwidth, addressing modes, execution control, and instruction set
orthogonality were carefully evaluated before a final design was done.
The listed peripherals, when integrated into DSPs, were designed to operate even
when the CPU is in power down mode (idle).
Actual communication between the digital signal processor and off-chip circuitry,
devices, or peripherals is made via its input and output pins.
The pins are located on the exterior surface of the DSP integrated circuit package.
Each pin corresponds to the output or input of a key processor signal.
Most DSP package pins are used for external interfaces (memory, serial port and
parallel port interfaces), some are dedicated to the output and input of clock
signals, and still others (such as the external interrupt lines or the reset pin) allow
external devices to assert processor states.
For most DSPs, transmission and reception of data words is done using serial and
parallel ports. The serial and parallel interfaces are found among the DSP
package input and output pins.
A serial interface transmits and receives data one bit at a time. Parallel interfaces
send or receive entire data words (8, 16, or 32 bits long) one at a time. To do this
a parallel port has a data line for each bit sent.
Parallel ports transmit more bits per second; however, they require more external
interface pins than serial ports.
A parallel port is used to transmit and receive data words from external hardware
such as an off-chip memory block, a DIP switch, or a display unit.
The serial port interface can be used for various applications, such as:
& Transmission and reception of data samples from a codec, an analog to digital
(A/D), or a digital to analog (D/A) converter.
Parallelism refers not only to the concurrent execution of multiple instructions; it
can also refer to parallel processing.
Parallel processing is when two or more digital signal processor chips are used for
a given application (multi-processing) and are connected through a shared serial
line, allowing inter-processor communication.
One chip is assigned as the master and the others as the slaves. Many applications
have stringent real-time constraints that require multiple DSPs to be used in
concert.
TDM is used to manage communication between the processors over the shared
serial line.
In a TDM network, time is divided into time slots. Each time slot is associated with
a different processor. During a given time slot, the associated processor may
transmit data; the others must receive and may not transmit.
The destination for the transmitted data word can be included in the data word or
it can be sent via a secondary data line in parallel with the data word. Either
approach can be used.
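A TDM frame can be sketched as a round-robin assignment of slots to processors, with the destination carried inside the data word (the first of the two options above). Both the round-robin order and the 4-bit destination / 12-bit payload split of a 16-bit word are hypothetical choices for illustration, as are the function names.

```python
DEST_BITS = 4       # hypothetical: upper 4 bits carry the destination ID
PAYLOAD_BITS = 12   # hypothetical: lower 12 bits carry the data
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1


def slot_owner(slot, n_processors):
    """Processor allowed to transmit during a given time slot (round-robin)."""
    return slot % n_processors


def pack(dest, payload):
    """Build a 16-bit word with the destination embedded in the upper bits."""
    return ((dest & ((1 << DEST_BITS) - 1)) << PAYLOAD_BITS) | (payload & PAYLOAD_MASK)


def receive(word, my_id):
    """Each receiving processor reads the word and keeps it only if addressed to it."""
    dest, payload = word >> PAYLOAD_BITS, word & PAYLOAD_MASK
    return payload if dest == my_id else None


print([slot_owner(s, 3) for s in range(6)])  # [0, 1, 2, 0, 1, 2]
word = pack(dest=2, payload=0xABC)
print(hex(word), receive(word, 2), receive(word, 1))  # 0x2abc 2748 None
```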
The DSP peripherals and their interfaces are important to, and answer the needs
of, digital signal processor tasks.
EQUIPMENT REQUIRED
Exercise 4-1
DSP Peripherals
EXERCISE OBJECTIVES
Upon completion of this exercise, you will be familiar with the specialized
peripherals used by DSPs.
DISCUSSION
The peripherals found on the TMS320C50 ('C50) DSP are good examples of the
types used on many existing programmable DSPs.
The 'C50 used on the Lab-Volt DIGITAL SIGNAL PROCESSOR FACET board has many
of its on-chip peripherals communicating with devices on the circuit board.
The 'C50 parallel port is used to communicate with the ROM memory chip, the 4
I/O INTERFACE displays, and the DIP switch.
The 'C50 master clock is externally generated by the oscillator and input to the
DSP.
Two of the 'C50 external user interrupt lines are each connected to a push-button
on the surface of the FACET circuit board.
A DSP serial port is usually divided into two sections: a receive section and a
transmit section.
In other DSPs, independent receive and transmit data pins exist; however, the
frame synchronization and bit clock lines are shared between the two sections.
The 'C50 serial port operates through three memory-mapped user registers (known
as DRR, DXR, and SPC).
DRR (Data Receive Register) is where words received through the serial port
receive data pin are stored. DRR is memory-mapped to data memory address
(dma) 20h.
DXR (Data Transmit Register) is where words to be transmitted through the serial
port transmit data pin are stored. Once a word is stored in DXR, the serial port
transmit circuitry takes charge of transmitting the word. DXR is memory-mapped
to dma 21h.
SPC (Serial Port Control) is a status and mode control register for the DSP serial
port.
Through the memory-mapped Serial Port Control register (SPC) the 'C50 DSP
allows the programmer to specify the serial port transmit and receive
characteristics.
The bit clock line polarity, the shift direction, data word length, and whether the
frame synchronization signals are bit-length or word-length are serial port
characteristics that certain DSP chips may permit the programmer to configure.
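Two of those configurable characteristics, word length and shift direction, determine the bit sequence on the serial data pin. The sketch below shows the idea; it models only the data bits (one per bit-clock period) and ignores frame synchronization and clock polarity, and the function names are our own.

```python
def serialize(word, length=16, msb_first=True):
    """Bits placed on the serial data pin, one per bit-clock period."""
    bits = [(word >> i) & 1 for i in range(length)]  # LSB first
    return bits[::-1] if msb_first else bits


def deserialize(bits, msb_first=True):
    """Rebuild the word from bits received in the same order."""
    ordered = bits if msb_first else bits[::-1]
    word = 0
    for bit in ordered:
        word = (word << 1) | bit  # shift each new bit in from the right
    return word


print(serialize(0x0C, length=8))                   # [0, 0, 0, 0, 1, 1, 0, 0]
print(serialize(0x0C, length=8, msb_first=False))  # [0, 0, 1, 1, 0, 0, 0, 0]
print(hex(deserialize(serialize(0x1234))))         # 0x1234
```

Transmitter and receiver must agree on both settings; a mismatch in shift direction would deliver a bit-reversed word.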
The main processor data bus can be used as the parallel port or the parallel port
can be made separate from the processor external bus interface.
Processors separating the parallel port from the external bus interface simplify
interfacing to external devices.
DSPs which use their data bus as a parallel port typically reserve a special section
of their address space for access to off-chip devices.
In some DSPs, the reserved memory addresses are accessed with specialized
instructions. The 'C50 has two such parallel port instructions (IN and OUT).
The 'C50 IN and OUT instructions are used to read and write a data word from and
to an external I/O port.
I/O port accesses are distinguished from program and data accesses by a
designated strobe or handshake pin. The pin is asserted when the external read or
write is performed.
A clock signal consists of a square wave at some known frequency. The highest
frequency clock signal within a processor is known as the master clock.
The master clock signal can be generated in many different ways. The choice of
which way to generate the clock signal can usually be configured by the
programmer.
Some DSPs provide a timer output pin. A square wave at the timer frequency can be output on this pin, giving the programmer a software-controlled oscillator.
The 'C50 timer as used on the FACET circuit board outputs a square wave at a
frequency of 10 MHz. The signal is input to the codec as its master clock.
Recall that in exercise 2 of Unit 2 we used the timer of the 'C50 DSP. The timer register (TIM) was read at the beginning and at the end of an algorithm, and the difference between the two readings gave the duration of the algorithm.
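The Python sketch below shows the arithmetic behind such a measurement. It rests on several assumptions, based on our understanding of the 'C50 timer:

- TIM counts down from PRD to 0 and then reloads PRD.
- Each count lasts TDDR + 1 clock cycles.
- The counter wraps at most once between the two readings.

The function names and readings are hypothetical.

```python
def elapsed_counts(tim_start, tim_end, prd):
    """Timer counts between two TIM readings of a down-counting timer.

    Assumption: the counter runs prd, prd - 1, ..., 0 and then reloads
    prd, so it cycles through prd + 1 states; at most one reload occurs.
    """
    return (tim_start - tim_end) % (prd + 1)


def elapsed_cycles(tim_start, tim_end, prd, tddr=0):
    """Convert counts to clock cycles; each count lasts tddr + 1 cycles."""
    return elapsed_counts(tim_start, tim_end, prd) * (tddr + 1)


# Hypothetical readings taken before and after an algorithm.
print(elapsed_cycles(0xFFF0, 0xFE00, prd=0xFFFF))   # 496
```

The modulo corrects for a single reload only. A longer algorithm would need a larger prescale (TDDR) or a count of timer interrupts.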
The timer clock source is usually the DSP master clock signal.
The TMS320C50 DSP found on the FACET circuit board uses external user
interrupts. In fact, most DSPs provide this type of peripheral.
The TMS320C50 DSP found on the FACET circuit board uses bit I/O ports (two
of them, BIO# and XF), also known in certain DSPs as general-purpose I/O pins,
to establish communication between the C5x VDE and the DSP.
Bit I/O ports are software controlled. In this particular application (communication
between the C5x VDE and the DSP), software control of the BIO# and XF I/O ports
is done with a communication program (kernel) held in off-chip ROM.
The DSP runs this kernel as soon as the C5x VDE debugger program attempts to communicate with the DSP.
Although signal processing can be done entirely with digital signals, a conversion from analog to digital and back again is most often required.
As stated, 'C50 communication with the codec is established through the DSP
receive and transmit serial interfaces. The dual serial communication can only be
implemented after both the serial DSP peripherals and the codec are initialized.
PROCEDURE
Timer Initialization
In this procedure section, you will initialize the timer of the TMS320C50 DSP.
Note: Before using the C5x VDE please make certain the circuit board
power source is turned ON, and that the serial connection is present
between the host computer and the DIGITAL SIGNAL PROCESSOR
circuit block labeled SERIAL PORT.
* 1. Open the C5x VDE, and load the ex4_1.dsk program file into the DSP.
* 4. Press the C5x VDE RUN command. This executes the DSP initialization
code that precedes the CODEC initialization subroutine (CODECINIT).
The timer initialization code sets the timer to generate a 10 MHz signal that is
output on a DSP package pin (TOUT). Note that the TOUT package pin is part of
the second AUXILIARY I/O header.
* 6. Press the C5x VDE RUN command. This executes the code that
initializes the DSP timer to generate a 10 MHz signal.
In particular, the PRD (timer PeRioD register) and TCR (Timer Control
Register) are initialized. As previously stated, these two registers
control the timer of the 'C50 DSP.
It is the TOUT signal that is sent to the codec. The 10 MHz signal is used
by the CODEC as the master clock. The timer can, however, be
reconfigured to generate a signal with a different frequency.
The timer frequency is related to the PRD and TCR registers as follows:
$$f_{TOUT} = \frac{1}{T_{MC} \times (TDDR + 1) \times (PRD + 1)}$$
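The relationship can be checked numerically. This section does not state the value of T_MC, so the sketch assumes 50 ns (a 20 MHz clock). That value is chosen because it makes PRD = 0001 h with TDDR = 0 give the 10 MHz used in this exercise. The function name is hypothetical.

```python
T_MC = 50e-9   # assumed timer clock period (20 MHz), inferred from the exercise values


def tout_frequency(prd, tddr=0, t_mc=T_MC):
    """TOUT frequency in Hz for a PRD value and the TDDR bits of TCR."""
    return 1.0 / (t_mc * (tddr + 1) * (prd + 1))


for prd, tddr in [(0x0001, 0), (0x000F, 0), (0x0001, 3)]:
    mhz = tout_frequency(prd, tddr) / 1e6
    print(f"PRD={prd:04X}h TDDR={tddr}: {mhz:.2f} MHz")
```

Under this assumption, the formula predicts 10.00, 1.25 and 2.50 MHz for the three settings. The last setting corresponds to TDDR = 3 with PRD = 0001 h.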
* 11. Using the C5x VDE, edit the PRD peripheral register back to 0001 h. Verify
that the TOUT signal now has a frequency of 10 MHz.
* 12. Using the C5x VDE, edit the TDDR bits (found in the TCR register) to 3.
* 13. Using the C5x VDE, edit the TCR peripheral register back to 0000 h. Verify
that the TOUT signal now has a frequency of 10 MHz.
The PRD register value is related to the TOUT frequency and the TCR
register as follows:
$$PRD = \frac{1}{f_{TOUT} \times T_{MC} \times (TDDR + 1)} - 1$$
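The same equation can be solved in code. The sketch below makes two choices:

- It assumes T_MC = 50 ns. This value is inferred from the 10 MHz setting and is not stated here.
- It rounds to the nearest integer, because PRD can only hold whole numbers.

The function name is hypothetical.

```python
T_MC = 50e-9   # assumed timer clock period (20 MHz), inferred from the exercise values


def prd_for_frequency(f_tout, tddr=0, t_mc=T_MC):
    """Nearest integer PRD value giving a TOUT frequency of f_tout Hz."""
    prd = round(1.0 / (f_tout * t_mc * (tddr + 1)) - 1)
    if not 0 <= prd <= 0xFFFF:            # PRD is a 16-bit register
        raise ValueError("frequency out of range for this TDDR setting")
    return prd


prd = prd_for_frequency(1.25e6)
print(prd, f"{prd:04X} h")   # 15 000F h
```

Frequencies that do not divide the timer clock evenly are only approximated, because of the rounding step.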
* 14. Holding the TDDR bits constant at zero and using the above equation,
what decimal value would the PRD register have to take on for the timer
to have a frequency of 1.25 MHz?
PRD =__________ d
* 15. Using the C5x VDE, edit the PRD register to 000F h (15 d). Verify that the
TOUT signal has a frequency of 1.25 MHz.
* 16. Using the C5x VDE, return the PRD and TCR registers to 0001 h and
0000 h respectively. The timer will then have a frequency of 10 MHz, which
is the codec master clock frequency required in the next section.
* 17. Remove the connection between the DSP circuit board and the
oscilloscope.
In this procedure section, you will initialize the CODEC using the DSP serial
interface.
* 18. Using the C5x VDE, set a breakpoint at the program memory address labeled
RESET.
* 19. Press the C5x VDE RUN command. This initializes the DSP serial port
and establishes communication between the DSP and the CODEC.
Using the SPC (Serial Port Control) register, the DSP transmit and receive
serial ports have been enabled and initialized to use 16-bit words. Transmit
frame synchronization has also been enabled.
* 20. Using the C5x VDE, set a breakpoint at the program memory address of
the instruction labeled LACC #6h,9.
* 21. Press the C5x VDE RUN command. This resets the CODEC.
The reset must occur before the control words initializing the codec
sampling frequency, filter cut-off frequency and other codec characteristics
can be transmitted from the DSP to the codec.
* 22. Using the C5x VDE, set a breakpoint at the program memory address
labeled RETURN.
* 23. Press the C5x VDE RUN command. The DSP then transmits the codec
control words (TA, TB, RA, RB, and AIC_CTR).
It is these words that set the CODEC sampling and cut-off frequencies, and
other characteristics.
* 24. Connect the OUTPUT of the DC SOURCE to the ANALOG INPUT of the
CODEC circuit block and to a voltmeter (as shown in the figure).
* 25. Execute the RUN command, found on the C5x VDE Toolbar.
This starts the voltmeter program. Note that the I/O interface displays a
value. This is the voltage reading of the signal sent from the DC SOURCE
OUTPUT.
* 26. Change the voltage level of the DC signal by turning the DC SOURCE
potentiometer.
The signal sent to the codec is sampled and transmitted via the serial port
to the DSP. The DSP then converts the value into its proportional voltage
value (the program can only read values between -3.00 V and 2.99 V) and
outputs it to the I/O INTERFACE via the parallel I/O ports.
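The voltmeter program's conversion routine is not listed in this section, so the sketch below is a hypothetical version. It assumes two things:

- The codec sample reaches the DSP as a 16-bit two's-complement word, and -32768 corresponds to -3.00 V.
- The result is truncated to hundredths of a volt.

Under these assumptions the output range is exactly -3.00 V to 2.99 V. Truncation alone can also make the displayed value differ slightly from the voltmeter reading. The function names are illustrative.

```python
def sample_to_hundredths(sample):
    """Codec sample -> displayed voltage in hundredths of a volt.

    Hypothetical assumptions: 16-bit two's-complement sample, and
    -32768 corresponds to -3.00 V (full scale).
    """
    sample &= 0xFFFF                # keep the raw 16-bit pattern
    if sample & 0x8000:             # negative in two's complement
        sample -= 0x10000
    return (sample * 300) // 32768  # floor division truncates to 0.01 V steps


def display(sample):
    """Format the converted value the way a 3-digit display might show it."""
    hundredths = sample_to_hundredths(sample)
    sign = "-" if hundredths < 0 else ""
    return f"{sign}{abs(hundredths) // 100}.{abs(hundredths) % 100:02d} V"


print(display(0x8000), display(0x7FFF))   # -3.00 V 2.99 V
```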
* 27. Note that the value read on the voltmeter and the value read by the DSP
do not always correspond.
What type of communication is used between the CODEC and the DSP on
the Digital Signal Processor circuit board?
a. parallel ports
b. bit I/O ports
c. serial ports
d. none of the above
How are words transmitted by the DSP to the I/O INTERFACE displays?
a. serial ports
b. parallel ports
c. bit I/O ports
d. interrupts
* 28. Remove the connection between the DC SOURCE circuit block and the
voltmeter.
CONCLUSION
& A DSP provides many peripherals, which are specialized for signal processing
applications.
& A DSP serial port is usually divided into two sections: a receive section and a
transmit section.
REVIEW QUESTIONS
a. parallel port
b. serial port
c. central arithmetic logic unit
d. bit I/O port
4. The timer found in most DSPs may be used as which of the following?
5. Which of the following serial port characteristics may some DSPs permit the
programmer to configure?
Exercise 4-2
EXERCISE OBJECTIVES
Upon completion of this exercise, you will be familiar with a common DSP
application, known as filtering.
DISCUSSION
Signals are received and transmitted by both digital and analog processing
systems. The effect of system processing on a signal can be visualized (analyzed)
in two different ways: in the time domain or in the frequency domain.
Digital Signal Processing: The FIR Filter
More complex (and more common) signals, such as the square wave, can be represented as a superposition of many harmonically related sinusoids.
Certain electrical circuits thus have the effect of attenuating or amplifying the
frequency components of signals.
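The square wave's Fourier series demonstrates this superposition. The series contains only odd harmonics, with amplitudes falling as 1/n. The Python sketch below sums the first few harmonics; the function name and parameters are illustrative.

```python
import math


def square_wave_partial_sum(t, freq, n_terms, amplitude=1.0):
    """Sum the first n_terms odd harmonics of a square wave.

    Fourier series: x(t) = (4A/pi) * sum of sin(2*pi*n*f*t) / n, n = 1, 3, 5, ...
    """
    total = 0.0
    for k in range(n_terms):
        n = 2 * k + 1                              # odd harmonics only
        total += math.sin(2 * math.pi * n * freq * t) / n
    return 4 * amplitude / math.pi * total


# At a quarter period the ideal square wave equals +A; more terms get closer.
f = 1000.0
for terms in (1, 3, 10, 100):
    print(terms, round(square_wave_partial_sum(0.25 / f, f, terms), 4))
```

A circuit that attenuates the higher harmonics more than the fundamental therefore rounds off the square wave's edges. This is the frequency-dependent gain described next.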
The fact that electrical circuits produce a gain or loss that depends on signal frequency is exploited when creating filters.
& low-pass
& high-pass
& band-pass
The graphs above are called Bode plots; they are also generally known as filter frequency responses.
From a frequency response graph (Bode plot) many filter properties and
characteristics are apparent.
The pass band of a filter is defined as the range of frequencies over which signals
pass virtually unattenuated through the filter.
The cut-off frequency is the frequency at which the response has dropped 3 dB below its pass-band level.
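A 3 dB drop corresponds to an output amplitude of about 70.7 % (1/√2) of the pass-band amplitude. A short Python sketch of the conversion in both directions, with hypothetical function names:

```python
import math


def gain_db(v_out, v_in):
    """Voltage gain in decibels: 20 * log10(v_out / v_in)."""
    return 20 * math.log10(v_out / v_in)


def ratio_from_db(db):
    """Amplitude ratio v_out / v_in for a gain given in decibels."""
    return 10 ** (db / 20)


print(round(gain_db(0.707, 1.0), 2))   # -3.01
print(round(ratio_from_db(-3.0), 3))   # 0.708
```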
The transition region is the area between the pass band and the stop band. A steep change of gain with frequency in this region usually improves filter performance (depending on the specific application).
The stop band is the range of frequencies over which signals passing through the circuit are attenuated. The level of attenuation depends on the filter design specifications.
A filter has many other characteristics; however, describing all of them would require a more in-depth discussion.
For the moment, it is important to remember that a filter creates a variation of signal
gain with frequency.
These effects, once produced only by analog circuits, can be represented mathematically and performed efficiently by processors. Until recently, the most common application for the DSP was the filter (the digital filter).
DSPs are preferred over analog circuits for implementing functions such as filters. A digital design reproduces the same results time and time again, whereas an analog circuit depends on component values that vary with tolerance, temperature, and age.
Digital filters are implemented as a summation of the products of filter coefficients (Ai) and data samples (Sj).
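In code, the summation is a loop of multiply-accumulate steps over a delay line of past samples. The minimal Python sketch below uses ordinary integer or floating-point values rather than the 'C50's fixed-point format. The function names are hypothetical.

```python
def fir_output(coeffs, history):
    """One FIR output sample: the sum of coefficient x sample products.

    history[i] is the input sample delayed by i sampling periods
    (history[0] is the newest); coeffs[i] is the matching coefficient.
    """
    if len(coeffs) != len(history):
        raise ValueError("need one stored sample per coefficient")
    acc = 0
    for a, s in zip(coeffs, history):
        acc += a * s            # one multiply-accumulate step per coefficient
    return acc


def fir_filter(coeffs, signal):
    """Filter a whole signal, keeping the last len(coeffs) samples."""
    history = [0] * len(coeffs)
    out = []
    for x in signal:
        history = [x] + history[:-1]   # newest sample in, oldest sample out
        out.append(fir_output(coeffs, history))
    return out


print(fir_filter([1, 2, 3], [1, 0, 0, 0]))   # [1, 2, 3, 0]: the impulse response
```

Feeding in a single impulse returns the coefficients themselves. This is why the coefficient table is said to hold the filter's response information.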
The coefficients are stored in memory and represent the filter frequency response information. Usually, the more coefficients used to represent a filter, the narrower the transition region of the filter.
The operation shown above can be executed with the aid of the Multiply and ACcumulate (MAC) instruction, found on nearly all DSPs. Traditionally, DSPs have been specifically enhanced to implement filter operations in real time.
The MAC instruction was and still is the cornerstone of the filter operation.
RPT #79
MACD #C0,*-
APAC
The TMS320C50 DSP requires only three source code instructions to properly
implement the mathematical part of a filter algorithm.
The RPT source statement instructs the DSP to repeat the following instruction 80 times (79 + 1).
The APAC source statement finishes the calculation of the filtered signal sample
by adding the accumulator and product register together.
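The simplified Python model below shows why APAC is needed after the repeated MACD. It is a sketch under stated assumptions, not a cycle-accurate model:

- PM = 0 (no product shift) and no overflow occurs.
- ACC and PREG are cleared beforehand, as RPTZ would do.
- The auxiliary register initially points at the oldest sample.
- #C0 stands for a coefficient table in program memory.

Each MACD pass adds the previous product to ACC, multiplies a new pair, and moves the data word up one address. When the loop ends, the final product is still in PREG. The function name is hypothetical.

```python
def rpt_macd_apac(coef_table, delay_line):
    """Model of  RPT #(N-1) / MACD coef_table,*- / APAC  for an N-tap filter.

    delay_line[0] is the newest sample x(n) and delay_line[N-1] the oldest.
    Assumptions: PM = 0, no overflow, ACC = PREG = 0 beforehand, and AR
    starts at the oldest sample. Returns (acc, data); data is the delay
    line after the data moves, plus one extra word above it.
    """
    n = len(coef_table)
    if len(delay_line) != n:
        raise ValueError("one delay-line word per coefficient")
    data = list(delay_line) + [0]      # extra word receives the oldest sample
    acc = preg = 0
    ar = n - 1                         # AR -> oldest sample
    for k in range(n):                 # RPT makes MACD execute N times
        acc += preg                    # add the previous product to ACC
        treg0 = data[ar]               # load the data word into TREG0
        preg = treg0 * coef_table[k]   # multiply by the program-memory word
        data[ar + 1] = data[ar]        # data move: copy the word one address up
        ar -= 1                        # *- : post-decrement AR
    acc += preg                        # APAC: add the final product
    return acc, data


acc, data = rpt_macd_apac([1, 2, 3], [10, 20, 30])
print(acc)    # 100  (1*30 + 2*20 + 3*10)
print(data)   # [10, 10, 20, 30]
```

After the loop, the samples have all moved up one address. The lowest word is then free for the next input sample.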
PROCEDURE
Note: Before using the C5x VDE please make certain the circuit
board power source is turned ON, and that the serial connection
is present between the host computer and the DIGITAL SIGNAL
PROCESSOR circuit block labeled SERIAL PORT.
* 3. Open the C5x VDE and load the ex4_2.dsk filter program into the DSP.
* 5. Adjust the oscilloscope to display both signals (CH1 and CH2), and
trigger on the generated signal (CH1).
Note that the CH2 signal output from the DSP circuit board has a smaller
amplitude than the signal input into the DSP circuit board (the generated
signal, CH1).
* 6. Slowly vary the generated signal frequency between 200 Hz and 9000 Hz.
Observe the effect of the frequency variation on the amplitude of the DSP
filter output (oscilloscope CH2).
a. a high-pass filter
b. a comb filter
c. a band-pass filter
d. a low-pass filter
In this procedure section, you will measure the frequency response of a band-pass
filter.
* 7. Using graph paper and a pencil, or a spreadsheet program, draw (plot) the
frequency response, between 500 Hz and 2500 Hz, of the filter.
Measure the amplitude (voltage) of the filtered output signal every 100 Hz
using a voltmeter. Begin at 500 Hz and end at 2500 Hz. Record the data
sets and then plot the data points.
The frequency response that you have drawn should be similar to the plot above.
In this plot, amplitude is shown on a logarithmic decibel (dB) scale, which
highlights the filter gains and losses.
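Once the voltages are recorded, the -3 dB band edges can be estimated from the data. The sketch below first finds the peak of the response. It then interpolates linearly to find where the response crosses 1/√2 of the peak on each side. The function name and the example values are made up for illustration; they are not measurements.

```python
import math


def band_edges(freqs, amps):
    """Estimate the -3 dB band edges of a measured frequency response.

    freqs must be in ascending order and amps holds the output amplitude
    measured at each frequency. Returns (low_edge, high_edge); an edge is
    None if the response never falls 3 dB below its peak on that side.
    """
    peak = max(range(len(amps)), key=lambda i: amps[i])
    level = amps[peak] / math.sqrt(2)          # -3 dB below the peak

    def crossing(i, j):
        # straight-line interpolation between measured points i and j
        return freqs[i] + (level - amps[i]) * (freqs[j] - freqs[i]) / (amps[j] - amps[i])

    low = None
    for i in range(peak, 0, -1):               # walk down in frequency
        if amps[i - 1] < level <= amps[i]:
            low = crossing(i - 1, i)
            break
    high = None
    for i in range(peak, len(amps) - 1):       # walk up in frequency
        if amps[i + 1] < level <= amps[i]:
            high = crossing(i, i + 1)
            break
    return low, high


# Made-up amplitudes (volts), for illustration only:
low, high = band_edges([500, 600, 700, 800, 900], [0.2, 0.6, 1.0, 0.6, 0.2])
print(round(low, 1), round(high, 1))   # 626.8 773.2
```

With 100 Hz measurement steps, linear interpolation gives only an estimate. Taking extra readings near the edges improves it.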
* 9. Execute the Load Data command found under the C5x VDE File pull-down
menu. Load the low_pass.dat file into the DSP with the options shown
above.
* 10. Press the C5x VDE RUN command. Vary the frequency of the generated
signal and observe the frequency response of the filter.
* 11. Disconnect the oscilloscope, voltmeter, and function generator from the
DSP circuit board and end the C5x VDE session.
CONCLUSION
& Filters can be divided into three different types: low-pass, band-pass,
and high-pass filters.
& The Multiply and ACcumulate instruction, MAC, is the cornerstone of the DSP
implemented filter.
REVIEW QUESTIONS
a. phase domain
b. frequency domain
c. time domain
d. spectrum domain
2. Which of the following types of filters attenuates all signal frequencies above
a certain frequency and lets DC signals pass unattenuated?
a. low-pass filter
b. comb filter
c. band-pass filter
d. high-pass filter
a. signal gain
b. signal gain with frequency
c. signal frequency
d. signal frequency with gain
5. Why are DSP implemented filters preferred over analog filtering circuits?
Unit Test
1. Which of the following choices is not a specialized peripheral used for digital
signal processing?
2. A DSP serial port can transmit 16-bit words. How many lines are required to
establish serial port communication between the DSP and an off-chip device?
a. 16
b. 17
c. 3
d. 18
3. A DSP parallel port can transmit 4-bit words. How many lines are required to
establish parallel port communication between the DSP and an off-chip device?
a. 17
b. 4
c. 3
d. 5
4. How is actual communication between a digital signal processor and its off-chip
circuitry devices made?
a. a bode plot.
b. a superposition of numerous different frequency responses.
c. a band pass filter.
d. none of the above.
7. A Bode plot is used to visualize the way that a filter modifies a signal. In what
domain is a Bode plot drawn?
a. frequency domain
b. time domain
c. phase domain
d. voltage domain
9. What is the difference between the peripherals offered by a DSP and those
offered by a general-purpose processor?
Appendix A
Help Pages
For more information about the different assembler mnemonics that execute ALU operations, please refer to C:\LV31946\DOC\TMS320C5x_UsersGuide.pdf.
Within the TMS320C50 the Sign eXtension Mode bit (SXM) is found in status and
control register ST1.
The Data Bus has a native word width of 16 bits; it cannot transfer more than 16 bits of information at a time.
The ACCumulator of the 'C50 is 32 bits wide and is split into two 16-bit segments:
the ACCumulator High (ACCH) bits and the ACCumulator Low (ACCL) bits.
The two segments are used separately when the ACCumulator contents are sent to the next stage of processing via the 16-bit Data Bus.
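In Python, splitting and rejoining the accumulator can be sketched as bit masking and shifting. The function names are hypothetical. On the 'C50, the halves are stored with instructions such as SACH and SACL.

```python
def split_acc(acc):
    """Split a signed 32-bit accumulator value into (ACCH, ACCL) halves."""
    acc &= 0xFFFFFFFF                    # 32-bit two's-complement pattern
    return (acc >> 16) & 0xFFFF, acc & 0xFFFF


def join_acc(acch, accl):
    """Rebuild the signed 32-bit value from its two 16-bit halves."""
    value = ((acch & 0xFFFF) << 16) | (accl & 0xFFFF)
    return value - (1 << 32) if value & 0x80000000 else value


acch, accl = split_acc(0x12345678)
print(hex(acch), hex(accl))        # 0x1234 0x5678
print(join_acc(*split_acc(-2)))    # -2
```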
For more information about the different assembler mnemonics that execute multiplier operations, please refer to C:\LV31946\DOC\TMS320C5x_UsersGuide.pdf.
The 'C50 has an unsigned multiplication instruction, MPYU, which treats both operands as unsigned numbers, so no sign extension takes place during the multiplication.
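A signed and an unsigned 16 x 16 multiplication give different results as soon as an operand has its most significant bit set. The Python sketch below models only the arithmetic, not TREG0 or PREG. The function names are hypothetical.

```python
def to_signed16(x):
    """Interpret a 16-bit pattern as a two's-complement number."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def multiply_signed(a, b):
    """Signed 16 x 16 product, returned as a 32-bit pattern."""
    return (to_signed16(a) * to_signed16(b)) & 0xFFFFFFFF


def multiply_unsigned(a, b):
    """Unsigned 16 x 16 product: no sign extension of either operand."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) & 0xFFFFFFFF


print(hex(multiply_signed(0xFFFF, 0x0002)))     # 0xfffffffe  (-1 x 2 = -2)
print(hex(multiply_unsigned(0xFFFF, 0x0002)))   # 0x1fffe     (65535 x 2)
```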
Within the TMS320C50 the Overflow Saturation Mode bits are found in status and
control register ST0.
The TMS320C50 has a product shifter. The shifter has four product shift modes
(PM) that are controlled by a 2-bit field in status register ST1.
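The four modes can be sketched as a lookup table of shift amounts:

- PM = 00: no shift.
- PM = 01: left shift by 1.
- PM = 10: left shift by 4.
- PM = 11: right shift by 6, with sign extension.

These values reflect our understanding of the 'C5x product shift modes and should be checked against the User's Guide. The function name is hypothetical.

```python
# PM field value -> product shift (positive: left, negative: right)
PM_SHIFT = {0b00: 0, 0b01: 1, 0b10: 4, 0b11: -6}


def shifted_product(preg, pm):
    """Value passed from PREG to the ALU for a 2-bit PM setting.

    preg is the signed product; the result is returned as a 32-bit pattern.
    """
    shift = PM_SHIFT[pm & 0b11]
    if shift >= 0:
        value = preg << shift
    else:
        value = preg >> -shift     # Python's >> on ints is arithmetic (sign-extending)
    return value & 0xFFFFFFFF


# Q15 example: 0.5 x 0.5 leaves 0x10000000 in PREG; PM = 01 gives 0.25 in Q31.
product = 0x4000 * 0x4000
print(hex(shifted_product(product, 0b01)))   # 0x20000000
```

The left shift by 1 removes the redundant sign bit produced when two Q15 numbers are multiplied.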
Within the TMS320C50, the carry bit is found in status and control register ST1.
The status and control register ST0 contains a 3-bit field known as ARP, Auxiliary
Register Pointer. The ARP field assigns the current auxiliary register to the AGU.
The AGU uses the auxiliary register when indirect or circular addressing must be
executed.
The current auxiliary register is the one selected by the Auxiliary Register Pointer (ARP). The 'C50 assembly language offers three methods for loading the contents of the 3-bit ARP field.
The first option is to use the LST instruction, which is specifically designed to modify the content of STatus registers ST0 or ST1. The ARP field is located within ST0.
The ARP field can also be loaded by either using the MAR instruction or by
specifying an ARP update within the indirect addressing field of any instruction
supporting indirect addressing.
For more information, refer to the TMS320C50 DSP User’s Guide (C:\LV31946\TMS320C5x_UsersGuide.pdf).
The 'C50 assembly language offers only one way of loading an auxiliary register
and that is with the LAR (Load Auxiliary Register) instruction.
An auxiliary register must be loaded with a memory address before being used for
indirect addressing.
Once loaded with a value, the content of the AR can be changed in one of four ways: with the ADRK, MAR, or SBRK instructions, or by specifying an increment or decrement within the indirect addressing field of any instruction supporting indirect addressing.
For more information, refer to the TMS320C50 DSP User’s Guide (C:\LV31946\TMS320C5x_UsersGuide.pdf).
The status and control register ST0 has a 9-bit field known as DP, which is used as a Data Page pointer. Data memory is divided into 512 data pages of 128 words each, and DP points to the current data page. When direct addressing is used, the 9 DP bits serve as the 9 MSBs of the 16-bit data address.
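Direct addressing therefore concatenates the 9 DP bits with a 7-bit in-page offset, which comes from the instruction word. This gives 512 pages x 128 words = 64K words. A Python sketch with hypothetical function names:

```python
def direct_address(dp, offset):
    """16-bit data memory address used by direct addressing.

    dp is the 9-bit data page pointer (0-511); offset is the 7-bit
    address field taken from the instruction word (0-127).
    """
    if not (0 <= dp < 512 and 0 <= offset < 128):
        raise ValueError("DP is 9 bits and the in-page offset is 7 bits")
    return (dp << 7) | offset


def page_and_offset(address):
    """DP value and in-page offset that reach a given 16-bit address."""
    return address >> 7, address & 0x7F


print(hex(direct_address(0, 0x20)))   # 0x20 (DRR, on data page 0)
print(page_and_offset(0x0300))        # (6, 0)
```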
Within the left highlight are indirect addressing symbols instructing the ARAU to increment AR0 by one after the instruction is executed.
Within the right highlight are the auxiliary registers used by the instructions, which point to the operand data memory addresses.
An assembler source file for the TMS320C50 DSP has a certain format that must
be respected.
All comments (except those to the right of source statements) and labels must be
written at the beginning of a line.
All source statements must be separated from the beginning of the line by whitespace (a tab).
As well, whitespace must be present between the instruction and the operand.
Make certain your assembler file does not have a comment or source statement
that has been badly written. Verify the requirements for correct assembly as listed
below.
Make certain that you have not replaced a CODE#X label by an incorrect source
statement.
MNEMONIC    ASSEMBLER DESCRIPTION
RET, RETE   Return from subroutine. Load the PC with the pma saved to the
            stack before the subroutine or interrupt call.
For more information about the different mnemonics that modify the Program Counter register, please refer to C:\LV31946\TMS320C5x_UsersGuide.pdf.
MNEMONIC    ASSEMBLER DESCRIPTION
RPTZ        Similar to RPT; however, the ACC and PREG are cleared first.
For more information about the different types of 'C50 repeat instructions, please refer to C:\LV31946\TMS320C5x_UsersGuide.pdf.
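MNEMONIC    ASSEMBLER DESCRIPTION
MAC         Add PREG, shifted as specified by the PM bits, to ACC; load the
            dma value into TREG0; multiply the data memory value by the
            program memory value and store the result in PREG. MACD
            performs the same operations and also moves the data (copies
            the data memory value to the next higher address).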
For more information about the different types of 'C50 repeat instructions, please refer to C:\LV31946\TMS320C5x_UsersGuide.pdf.
Use the TRANSMIT ISR as an example. The TRANSMIT ISR already has the
correct label.
Recall that the Sine Wave Generator uses a data table containing 1924 values. The data table represents one period of a sine wave. When the data table values are sent to the CODEC at a rate equal to the sampling frequency (Fs = 19380 Hz), a sine wave of approximately 10 Hz (19380 Hz / 1924 ≈ 10.07 Hz) is generated. To increase the frequency of the generated sine wave, the program sends every second, third, fourth, or nth value of the data table to the CODEC. A data table pointer is used to easily modify the frequency of the generated signal. The generated signal can therefore only have a frequency that is a multiple of this base frequency (about 10 Hz).
This method of generating a sine wave is fast and relatively simple. It is, however, not very accurate, and it requires a significant amount of DSP memory to store the wavetable.
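The pointer arithmetic can be sketched in Python with the values given above: 1924 table values and Fs = 19380 Hz. For clarity, the table here holds floating-point values; the DSP table would hold fixed-point words. The function names are hypothetical.

```python
import math

FS = 19380          # sampling frequency in Hz (value from the text)
TABLE_SIZE = 1924   # values in one sine period (value from the text)

# One period of a sine wave (floating point here; fixed-point words on the DSP)
SINE_TABLE = [math.sin(2 * math.pi * k / TABLE_SIZE) for k in range(TABLE_SIZE)]


def generated_frequency(step, fs=FS, size=TABLE_SIZE):
    """Frequency produced when every `step`-th table value is sent."""
    return fs * step / size


def next_samples(step, count, pointer=0):
    """Values sent to the codec; the pointer advances by `step` and wraps."""
    out = []
    for _ in range(count):
        out.append(SINE_TABLE[pointer])
        pointer = (pointer + step) % TABLE_SIZE
    return out, pointer


print(round(generated_frequency(1), 2))    # 10.07
print(round(generated_frequency(50), 1))   # 503.6
```

Only integer steps are possible, so the output frequency is limited to multiples of the base frequency. This is the accuracy limitation mentioned above.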
The 'C50 SPC register permits some of the following serial port characteristics to
be changed by the programmer:
TXM configures whether the transmit frame sync line acts as an input or an output.
FSM specifies whether frame synchronization pulses are required after the initial
frame sync pulse for serial port operation.
FO specifies the word length of the serial port transmitter and receiver.
The 'C50 has 64K parallel I/O ports. Sixteen of the 64K I/O ports are memory-mapped in data page 0 (50 h - 5F h). All I/O ports can be accessed with the IN and OUT instructions; the sixteen memory-mapped ports can also be accessed with any instruction that reads or writes a word in data memory space.
The mode of the clock generator is selected by the logic levels on the CLKMD1 and CLKMD2 external 'C50 package pins.
Timer operation is controlled via the 'C50 DSP Timer Control Register (TCR).
The TDDR bits are located in the TCR (Timer Control Register).
The TDDR bits and the value of the PRD register set the DSP timer frequency.
Appendix B
New Terms and Words
access – To access memory is the action of reading the value held within a certain
memory location or of storing a value to a certain memory location.
ADD *+, 0, AR0 – The 'C50 ADD instruction is used to add the operand to the
accumulator. In this case, the ADD instruction is being repeated, and is indirectly
addressing the 16 most-recently received samples. This in effect adds the samples
together (an operation required by the averaging process).
ANTI-ALIASING FILTER – Low-pass filter designed to remove from the input signal the high-frequency components that would degrade (by aliasing) the analog-to-digital conversion.
ARAU – ARAU stands for Auxiliary Register Arithmetic Unit; it is the unit within a DSP that is responsible for addressing.
architecture – Architecture is a term applied to the overall structure and the logical
interrelationships of the components of a processor (or of a computer, a network)
and its software. Processor architecture can be divided into five fundamental
components: input/output, storage, communication, control, and processing.
Arithmetic Logic Unit (ALU) – That part of a processor that performs arithmetic
(addition, subtraction) and logic (AND, OR, ...) operations.
BCND – The assembler mnemonic for the TMS320C50 DSP branch conditional
instruction.
binary point – The character, in binary notation, that separates the integral part of
a numerical expression from its fractional part.
bit clock – The signal that the serial interface uses to determine when data bits are
valid.
bit I/O ports – An I/O port in which each bit can be individually configured as an input or an output and in which each bit can be independently read or written. A processor must poll the bit I/O port to determine whether input values have changed.
bus – A bus is a transmission path for the signals sent between processor devices.
CALL – The assembler mnemonic for the TMS320C50 DSP subroutine call
instruction.
circular buffers – A section of memory used as a buffer that appears to wrap around on itself. Circular buffers are typically implemented in software on conventional processors and with modulo (circular) addressing on DSPs.
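On a conventional processor, the wrap-around is done in software with a modulo operation, as in this Python sketch. The class and method names are hypothetical.

```python
class CircularBuffer:
    """Fixed-size buffer whose write index wraps around (modulo addressing)."""

    def __init__(self, size):
        self.data = [0] * size
        self.index = 0                       # next location to be written

    def put(self, value):
        self.data[self.index] = value        # overwrite the oldest entry
        self.index = (self.index + 1) % len(self.data)

    def latest(self, n):
        """The n most recent values, newest first."""
        size = len(self.data)
        if not 0 <= n <= size:
            raise ValueError("n must be between 0 and the buffer size")
        return [self.data[(self.index - 1 - k) % size] for k in range(n)]


buf = CircularBuffer(4)
for sample in [1, 2, 3, 4, 5, 6]:
    buf.put(sample)
print(buf.latest(4))   # [6, 5, 4, 3]
```

On a DSP with circular addressing, the modulo step is done by the address-generation hardware. No extra instructions are spent on it.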
clock cycle rates – Synonymous with processor cycle rate, it usually refers to the
rate at which the DSP system performs its most basic unit of work.
clock line polarity – Line polarity is a characteristic of the bit clock of a serial port interface. Clock line polarity describes which edge of the clock signal controls when data changes (that is, when data bit N changes to data bit N+1).
CPU – The CPU, Central Processing Unit, is that portion of the processor involved
in arithmetic, shifting, and Boolean logic operations, as well as the generation of
data- and program-memory addresses.
data bit signals – The data bit signal lines are used by the parallel port interface
to transmit and receive bits in parallel. The signal line indicates the state of a bit
(either high or low).
data buffers – A data buffer is a section of memory that is used to store data. The
data arrives from an off-chip source (such as a CODEC) or from a previous
computation. It is held in the buffer until the processor is ready to process the data.
data signal – The signal used by a serial interface to transmit the state of a bit
(either high or low).
Decode – The second pipeline stage found in many DSPs including the
TMS320C50. The instruction word is decoded (i.e., it is determined what the
instruction is supposed to do) and address generation as well as ARAU updates of
auxiliary registers are performed.
Direct addressing – A type of addressing that encodes the operand address within
the instruction word or within a word following the instruction word. This addressing
mode is also known as register-direct addressing or paged memory-direct
addressing.
DMA controller – A specialized unit that moves data directly between main storage
and peripheral equipment by taking bus control away from the CPU and thus does
not require processing the data with the processing unit.
dynamic range – The dynamic range is the ratio between the largest and smallest
value a quantity or parameter can take.
echo – An echo is reflected sound that is loud enough and received late enough
to be heard as distinct from the source. Echoes can be produced by generating
delayed repetitions (sometimes several rapid repetitions) of the original sound or
signal.
EVMs – Evaluation Modules are low-cost development boards that include a target processor, a limited set of peripherals, and a limited amount of external memory. EVMs are used to test code in real time.
Execute – The fourth pipeline stage found in many DSPs including the
TMS320C50. The ALU or MAC portion of the instruction is executed and, if
required, results of a previous operation are written to memory.
Fetch – The first pipeline stage found in many DSPs including the TMS320C50. An instruction word is fetched from memory and the Program Counter is updated. The fetch stage of the pipeline can sometimes be used to read operands from program memory, for example when a 2-word instruction using long immediate addressing is executed.
FIFO – A First-In, First-Out queue in which the most recent arrival is placed at the
end of the waiting list and the item waiting the longest receives service first. A FIFO
is used as a buffer to connect two devices operating asynchronously at different
speeds. Each device is connected to one end of the FIFO.
flanger – An effect that is used to spice up sounds. It is difficult to describe: when the effect is applied to a vocal signal, it may resemble a voice underwater or, as in some movies, a Martian's voice. Flanging adds together a time-delayed and a direct
signal where the delay time (in the range of 0.50 to 0.35 ms) is constantly altered
(varying between 1.0 and 10 Hz). To further vary the signal the weights given to
each signal before addition can be varied.
frequency domain – The frequency domain is the reference frame where signals
are represented as functions of frequency. The frequency domain is a way of
looking at the world where the independent variable (the one that voltage, current,
capacitance varies with) is not time but frequency.
hand-shaking – The dialogue that takes place between two devices before a
transfer of information begins. Hand-shaking is the exchange of predetermined
signals for purposes of control when a connection is established.
IDLE – The assembler mnemonic for the TMS320C50 DSP idle instruction which
places the processor into a power down mode that is exited when an interrupt is
signaled.
IMR – IMR is a 'C50 memory-mapped register that masks external and internal
interrupts.
instruction set – The instruction set is the hardware "language" in which the
software tells the processor what to do.
interface – An interface is the path along which information can flow between a peripheral and the CPU. The term also refers to the device (peripheral) through which a processor communicates with the outside world.
Interlocks – Pipeline interlocks are mechanisms used to ensure that when data is
written to a register, any reference to this register causes a stall until the data is
available.
interrupt vector – Typical interrupt vectors are one to two words long and are
located in low memory. An interrupt vector does not actually contain the ISR; rather,
it contains a branch to the address of the interrupt service routine (ISR).
INTR – INTR is a specialized branch instruction for the TMS320C50 DSP that
causes the processor to begin execution at the appropriate interrupt vector.
ISR – An abbreviation for Interrupt Service Routine, the subroutine executed when
an interrupt occurs.
kernel – The programs that form the “core” or the most essential parts of an
operating system for a computer. Nucleus is a near-synonym for kernel and tends
to be used where the effects are achieved by a mixture of normal programming and
micro coding (such as is done with the assembler language).
linker – A program that creates one executable file from one or many object files.
master clock – The master clock is the primary source of timing signals used to
control the operations that take place within a processor.
memory configuration bits – These bits are status and control bits that select the
memory configuration that is used by the DSP. Within the TMS320C50 DSP the
MP/MC#, RAM, OVLY bits are found within the Processor Mode Status Register
(PMST) and the CNF bit, another memory configuration bit, is found in Status
Register 1 (ST1).
memory – A device in which information can be inserted and stored and from
which it may be extracted when wanted.
NORM – A TMS320C50 DSP assembler instruction that normalizes the accumulator; that is, it converts a fixed-point number to a floating-point number.
object files – File which consists of machine code directives that usually represent
a portion of a program.
operands – The part of an instruction that designates where the central processing
unit (CPU) will fetch or store data during instruction execution.
OVerflow saturation Mode (OVM) – When enabled, any overflow value produced by
the ALU is replaced by the largest positive value; for the TMS320C50, this value is
7FFF FFFFh. Likewise, any underflow (negative overflow) value produced by the ALU
is replaced by 8000 0000h, the most negative value.
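Saturation can be sketched in C by computing the exact sum in 64 bits and clamping it to the 32-bit accumulator range. add_ovm is a hypothetical name, and the sketch covers addition only.

```c
#include <stdint.h>

/* 32-bit addition as performed with overflow saturation mode enabled. */
int32_t add_ovm(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;   /* exact sum, cannot overflow */

    if (sum > INT32_MAX)
        return INT32_MAX;                    /* 7FFF FFFFh */
    if (sum < INT32_MIN)
        return INT32_MIN;                    /* 8000 0000h */
    return (int32_t)sum;
}
```

Without saturation, 7FFF FFFFh + 1 would wrap around to 8000 0000h, turning the largest positive value into the most negative one; with OVM enabled the result stays at 7FFF FFFFh.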
overhead – The time a processor uses for operations that do not belong to the user
task. In the case of digital signal processors these operations usually consist of the
allocation of resources for the execution of the next instruction. Overhead is also
known as execution overhead.
peripheral – In a data processing system, any equipment, distinct from the central
processing unit, which may provide the system with outside communication or
additional facilities.
pipeline conflict – A pipeline conflict arises during a given clock cycle and
prevents the next instruction in the program from being executed correctly during
the following clock cycle. If a pipeline conflict is not corrected automatically by
the processor, or anticipated and corrected by the programmer, the program may
produce erroneous results and processor performance may be reduced.
pipeline hazard – A pipeline hazard is the term given by some to describe what is
called in this manual a pipeline data conflict. A pipeline data conflict and a pipeline
hazard are one and the same.
POST-FILTER – A low-pass filter that removes from the output signal the
high-frequency components created by the digital-to-analog conversion.
prescaler – The prescaler divides the frequency of the clock source so that the
counter can count longer periods of time (it changes the frequency at which the
timer counts).
RAM – Random Access Memory. This type of memory is usually used to store
temporary program information. RAM is volatile memory because the stored
information is lost when power is removed.
Read – The third pipeline stage found in many DSPs including the TMS320C50.
During this stage a data operand is read from or written to memory.
reservation table – A figure that shows how processor resources are used over
time. A reservation table aids the visualization of pipeline operation.
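A reservation table for an ideal pipeline, in which one instruction enters per clock cycle and no pipeline conflicts occur, can be generated with a short C sketch. The four stage names are an assumption built around the Read entry (Read being the third stage), and instr_in_stage and print_reservation_table are hypothetical names.

```c
#include <stdio.h>

#define NUM_STAGES 4

/* Assumed stage names; Read is the third stage. */
static const char *const stage_names[NUM_STAGES] = {
    "Fetch", "Decode", "Read", "Execute"
};

/* Index of the instruction occupying `stage` during `cycle` (both counted
   from 0), or -1 if the stage is empty. Without conflicts, instruction i
   enters stage s at cycle i + s. */
int instr_in_stage(int cycle, int stage, int num_instr)
{
    int i = cycle - stage;
    return (i >= 0 && i < num_instr) ? i : -1;
}

/* Print one row per stage and one column per clock cycle. */
void print_reservation_table(int num_instr)
{
    int cycles = num_instr + NUM_STAGES - 1;

    for (int s = 0; s < NUM_STAGES; s++) {
        printf("%-8s", stage_names[s]);
        for (int c = 0; c < cycles; c++) {
            int i = instr_in_stage(c, s, num_instr);
            if (i >= 0)
                printf(" I%-2d", i + 1);   /* instructions numbered from 1 */
            else
                printf(" ---");            /* stage idle this cycle */
        }
        printf("\n");
    }
}
```

Each instruction moves down one row per clock cycle, which produces the diagonal pattern typical of reservation tables; a pipeline conflict would show up as a gap or a repeated entry in that pattern.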
reset – A means to bring the central processing unit (CPU) to a known state by
setting the registers and control bits to predetermined values and signaling
execution to start at a specified address.
RET – The assembler mnemonic for the TMS320C50 DSP return from subroutine
instruction.
RETE – The assembler mnemonic for the TMS320C50 DSP return from subroutine
and enable interrupts instruction.
RINT – RINT is a 'C50 hardware interrupt that is signaled after a word has been
received through the DSP serial port.
ROM – Read Only Memory. This type of memory stores program code that is written
into it during the manufacturing process. ROM is non-volatile memory because it
retains its data after the processor has shut down.
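SACL *+, 0, AR0 – The 'C50 SACL instruction stores the ACCL (ACCumulator Low)
bits in memory. At this point in the program, the accumulator contains the most
recent sample received by the DSP from the CODEC. The indirect addressing
operands tell the CPU to store the sample in one of the data memory addresses
(dma) labeled XN0 to XN15, the one pointed to by auxiliary register 0 (AR0). The *+
operand increments the auxiliary register after the store, the 0 specifies no left
shift of the stored value, and AR0 selects auxiliary register 0 for the next indirect
instruction.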
SACL 10h – The 'C50 SACL instruction, as previously stated, stores the ACCL
(ACCumulator Low) bits in memory. In this particular case, when SACL is executed,
the accumulator holds the average of the 16 most recent samples received by the
DSP and stored in memory. SACL stores this average in the dma labeled OUTPUT.
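The two SACL entries describe a data flow: each new sample is stored in XN0 to XN15, and the average of the 16 most recent samples is stored in OUTPUT. A C sketch of that flow follows; the array xn stands in for XN0 to XN15 and ar0 for the pointer held in AR0. The wrap-around from XN15 back to XN0 is an assumption, since the entries do not say how the original program handles the end of the buffer.

```c
#include <stdint.h>

#define N_SAMPLES 16                  /* XN0 to XN15 */

static int16_t xn[N_SAMPLES];         /* models the dma locations XN0..XN15 */
static unsigned ar0 = 0;              /* models the pointer held in AR0 */

/* Store the newest sample where ar0 points, then advance ar0 (like the *+
   operand), wrapping from XN15 to XN0 (assumed). Returns the average of the
   16 stored samples, the value the program stores in OUTPUT. */
int16_t store_and_average(int16_t sample)
{
    int32_t sum = 0;

    xn[ar0] = sample;
    ar0 = (ar0 + 1u) % N_SAMPLES;

    for (unsigned i = 0; i < N_SAMPLES; i++)
        sum += xn[i];

    /* Integer division truncates toward zero; a DSP program might instead
       shift right by 4 bits, which rounds negative sums differently. */
    return (int16_t)(sum / N_SAMPLES);
}
```

Until 16 samples have arrived, the empty (zero) entries pull the average down, so the output ramps up before it settles.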
shadow register – Shadow registers are dedicated registers that hold the contents
of key CPU registers during interrupt processing (when an interrupt is executed).
A shadow register is a one-level deep stack that belongs to one of the processor’s
shadowed registers (ACC, PREG, ST0, ...).
shift direction – Shift direction is the order in which bits are transmitted by serial
interfaces. Devices may send the LSB as the first bit or the MSB as the first bit.
Some DSP serial interfaces allow selection of the shift direction.
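The effect of the shift direction can be shown with two small C helpers, assuming 16-bit words: serial_bit returns the bit sent in a given time slot, and reverse16 shows that a receiver set to the opposite shift direction assembles the word bit-reversed. Both function names are hypothetical.

```c
#include <stdint.h>

/* Bit of a 16-bit word sent in time slot `slot` (0 = first bit on the line). */
unsigned serial_bit(uint16_t word, unsigned slot, int msb_first)
{
    unsigned pos = msb_first ? 15u - slot : slot;   /* bit position sent now */
    return (word >> pos) & 1u;
}

/* Word assembled by a receiver whose shift direction is opposite to the
   transmitter's: the bit order is reversed. */
uint16_t reverse16(uint16_t word)
{
    uint16_t out = 0;

    for (unsigned i = 0; i < 16; i++)
        if (word & (1u << i))
            out |= (uint16_t)(1u << (15u - i));
    return out;
}
```

This is why the transmitter and receiver must agree on the shift direction: a mismatch does not lose bits, but it turns 0001h into 8000h.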
sign-extension – The process of filling the high-order bits of a number with copies
of its sign bit. For example, when a 16-bit number is loaded into a 32-bit field, the
sign bit of the 16-bit number (bit 15) is copied into bits 16 to 31 (bit positions 17
to 32 when counting from 1).
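The sign-extension rule can be written out explicitly in C. sign_extend16 is a hypothetical helper; on two's-complement targets a cast such as (int32_t)(int16_t)x gives the same bit pattern.

```c
#include <stdint.h>

/* Sign-extend the 16-bit value held in the low half of a 32-bit word:
   copy bit 15 (the sign bit) into bits 16 to 31. */
uint32_t sign_extend16(uint32_t low16)
{
    low16 &= 0xFFFFu;                 /* keep only the 16-bit number */
    if (low16 & 0x8000u)
        return low16 | 0xFFFF0000u;   /* negative: fill upper bits with 1s */
    return low16;                     /* positive: upper bits stay 0 */
}
```

Without sign extension, the 16-bit value FFFFh (-1) would become 0000 FFFFh (65535) in the 32-bit field.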
status and control registers – The operation of the TMS320C50 DSP CPU is
determined by the contents of four 16-bit status and control registers: the Circular
Buffer Control Register (CBCR), the Processor Mode STatus register (PMST),
STatus Register 0 (ST0) and STatus Register 1 (ST1).
strobe – The strobe signal indicates to the external device sending data to the DSP
through the parallel port that the data word has been received. The strobe signal
is also known as a handshake signal.
Synchronous serial ports – A type of serial interface where a bit clock signal is
transmitted in addition to the serial data signal. The receiver uses the bit clock
signal to decide when to sample the received data signal.
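TDDR – Timer Divide-Down Register: the value of a group of bits within the Timer
Control Register (TCR) that is loaded into the timer prescaler. The prescaler divides
the timer clock by TDDR + 1.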
time domain – The time domain is the frame of reference for signals that vary as
a function of time.
TMC – The period of the DSP master clock. For the 'C50 DSP, this value is
1/(20 MHz) = 50 ns.
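The prescaler, TDDR and TMC entries combine into the timer period. Assuming the usual TMS320C5x relationship, in which the prescaler divides the clock by TDDR + 1 and the timer counter divides the result by PRD + 1 (PRD being the timer period register, which is not defined in this part of the glossary), the time between timer interrupts is TMC × (TDDR + 1) × (PRD + 1). timer_period_ns is a hypothetical helper, and the field widths it checks are assumptions.

```c
/* Time between timer interrupts, in ns, assuming
   period = TMC x (TDDR + 1) x (PRD + 1).
   Returns -1.0 if a field is out of range. */
double timer_period_ns(double tmc_ns, unsigned tddr, unsigned prd)
{
    if (tddr > 15u || prd > 0xFFFFu)   /* assumed: TDDR 4 bits, PRD 16 bits */
        return -1.0;
    return tmc_ns * (double)(tddr + 1u) * (double)(prd + 1u);
}
```

With TMC = 50 ns, TDDR = 9 and PRD = 99, the period is 50 000 ns (50 µs), which corresponds to a 20 kHz interrupt rate.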
transition region – The transition region is the band of frequencies between the
pass band and the stop band. A steep transition, that is, a large rate of change of
gain with frequency within this region, usually improves filter performance
(depending on the specific application).
VLIW – VLIW stands for Very Long Instruction Word. A VLIW machine is a parallel
processor in which several instructions grouped together into a single word are
carried out simultaneously by several functional units within the Program Controller
unit.
voice – An effect created by the assembled ex3_1.asm DSP program. The effect
consists of sampling the microphone input signal through the CODEC and
outputting it, via the DSP and the CODEC, to the AUDIO AMPLIFIER.
wavetable – A list of values that define one period of a signal. The wavetable is
stored in memory and is used to generate a waveform.
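Generating a waveform from a wavetable amounts to reading the stored values in sequence, one per output sample. The C sketch uses a hypothetical eight-entry sine table in 16-bit signed format (real tables are usually much longer); stepping through it by step entries per sample gives an output frequency of fs × step / 8 for a sample rate fs. The names next_sample and wavetable_freq are hypothetical.

```c
#include <stdint.h>

#define TABLE_SIZE 8

/* One period of a sine wave: sin(k * 45 degrees) * 32767, rounded. */
static const int16_t wavetable[TABLE_SIZE] = {
    0, 23170, 32767, 23170, 0, -23170, -32767, -23170
};

static unsigned phase = 0;   /* index of the next table entry to output */

/* Return the next output sample and advance through the table by `step`
   entries, wrapping at the end of the period. */
int16_t next_sample(unsigned step)
{
    int16_t s = wavetable[phase];
    phase = (phase + step) % TABLE_SIZE;
    return s;
}

/* Output frequency for sample rate fs: one period takes TABLE_SIZE / step samples. */
double wavetable_freq(double fs, unsigned step)
{
    return fs * step / TABLE_SIZE;
}
```

Doubling the step doubles the output frequency without changing the table, which is why a single stored period can generate a range of tones.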
weights – The factor by which a digit in a binary number is multiplied to obtain its
additive contribution in the representation of a real number.
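The weights of a 16-bit two's-complement word can be applied directly: each set bit contributes its weight, 2^i for bits 0 to 14 and -2^15 for the sign bit (bit 15). value_from_weights is a hypothetical helper written for illustration.

```c
#include <stdint.h>

/* Value of a 16-bit two's-complement word computed from its bit weights. */
int32_t value_from_weights(uint16_t word)
{
    int32_t value = 0;

    for (int bit = 0; bit < 15; bit++)
        if (word & (1u << bit))
            value += (int32_t)1 << bit;   /* positive weight 2^bit */
    if (word & 0x8000u)
        value -= (int32_t)1 << 15;        /* sign bit has weight -2^15 */
    return value;
}
```

For example, FFFFh gives 32767 from bits 0 to 14 plus -32768 from the sign bit, that is, -1.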
XINT – XINT is a 'C50 hardware interrupt that is signaled after a word has been
transmitted through the DSP serial port.