Laboratory Manual
School of Computer Science
The University of Manchester
September 2011
Table of Contents

Chapter 1 Introduction
    Laboratory Aims
    Learning Outcomes
Chapter 2 Laboratory Organisation
    Schedule
    The Project
    Preparation
    Deadlines
    Marks
Chapter 3 The Design Process
    3.1 Introduction
    3.2 Specification
    3.3 Top-level Behavioural Description
    3.4 Chip Architecture Design
    3.5 Register-Transfer Level (RTL) Design
    3.6 Logic Level Design
    3.7 Field Programmable Gate Arrays (FPGAs)
    3.8 Layout
    3.9 Post-layout
    3.10 Packaging and Test
Chapter 4 Design Tasks
    Exercise 1 Operation of the STUMP Assembler
    Exercise 2 Top Level Model of the STUMP in Verilog
    Exercise 3 Behavioural Simulation of the STUMP
    Exercise 4 Signal Usage Charts for the Control
    Exercise 5 Verilog Specification of the Control Block
    Exercise 6 Test of the RTL Design
Chapter 5 The Processor Specification
    5.1 Architecture
    5.2 Instruction Set
    5.3 Instruction Formats
    5.4 Shift Operations
    5.5 Conditional Branch Instructions
    5.6 Condition Codes
    5.7 Processor Interface
Chapter 6 Programming the STUMP Processor
    6.1 Introduction
    6.2 Using the SASM Assembler
    6.3 Extending the Instruction Set
Appendices
    Appendix A Verilog Top Level Behavioural Model of the STUMP Processor
    Appendix B Verilog Control Description
    Appendix C STUMP Processor: RTL Design
        C1. RTL Datapath Design
        C2. Bus Interface Component
        C3. Control Block Signals
    Appendix D Cadence End User Agreement
    Appendix E Exercise Answer Sheets
        Exercise 1: STUMP Assembler
        Copy of Signal Usage Charts
        Exercise 4: Signal Usage Charts
Chapter 1 Introduction

This lab manual accompanies the 2nd year course unit COMP22111: VLSI System Design. The material presented reinforces the material given in lectures and should be considered examinable. The practical work outlined here will enable you to design and implement a fully functioning (albeit simplified) processor.
Laboratory Aims

- To learn about the process of designing on silicon by doing.
- To design a simple RISC processor, comprising the datapath and control, at (all) levels of the design hierarchy ranging from a high level specification down to the Register Transfer Level (RTL).
- To have exposure to industry standard CAD tools.
- To use assembler test programs which can be used to test all levels of the design.
- To simulate the complete processor at the different levels of the design hierarchy.
- To support the lectures with a practical example.
Learning Outcomes

After completing the laboratory a student will:

- be able to specify functionality in the Verilog hardware description language,
- have gained experience of the different stages of the VLSI design process down to the RTL level,
- have gained experience of composing and running test programs, and checking their results,
- be able to use the Cadence CAD tool to find hardware/software errors,
- have gained experience of the appropriate CAD tools available for use at the different stages of the design process.
VLSI systems design requires inspiration and imagination as well as a sound technical background. Most of the technical background can be imparted by means of lectures, but when it comes to design there is no substitute for experience. We believe that, in the words of Albert Camus, "You cannot create experience - you must undergo it."

The COMP22111 course has therefore been structured so that the technical background is covered in a taught course consisting of lectures, while the design process itself is taught by means of a design project in laboratory classes. In the laboratory course students design a small ASIC (Application Specific Integrated Circuit) which, if completed successfully, could be implemented on silicon. This experience should give some feeling for the trials, tribulations and satisfactions of designing systems on silicon.

The objective of the project is to learn some of the methodology of VLSI design by carrying out the design of a small RISC processor. It is also intended to help students appreciate and understand the operation and architecture of a RISC processor. The STUMP processor has been fully specified by D. A. Edwards and parts of it have been designed. Thus, students should regard themselves as members of a design team whose job it is to do a significant part of the design; you will complete a partially done design and simulate the design as a whole.

In contrast to design work carried out in previous School of Computer Science laboratories the work will not consist of designing gate-level circuits - the emphasis in this project is rather on systems on silicon. The work starts with the high level behavioural modelling of the chip and proceeds to Register Transfer Level (RTL), where again behavioural modelling is used. Tools are available to automatically translate an RTL description into logic and then to generate layout of the chip onto silicon; this will be described in lectures but not performed in the lab.

The methodology described in lectures and used in the lab is typical of that adopted by many designers in industry for ASICs, since only a small proportion of the ASICs produced nowadays are designed using full-custom methods.

This manual includes basic information about the laboratory organisation (Chapter 2), a description of the design process (Chapter 3) and stage-by-stage details of the design tasks (Chapter 4). Chapter 3 is intended to give a background description of the design process while Chapter 4 describes how the process is applied to the specific design project being carried out in the laboratory. The specification of the STUMP processor is in Chapter 5. Chapter 6 gives information on how to program the processor.

The appendices contain information that you will need in undertaking the design tasks. Appendix A contains a copy of the top level (algorithmic) Verilog behavioural description of the processor which is to be completed as part of the second exercise. Appendix B contains a (very) incomplete Verilog description of the control for the STUMP at the RTL design level. Appendix C shows the RTL design of the STUMP datapath which has been done for you. Answer sheets which students fill in and hand in for laboratory exercises 1 and 4 are to be found in Appendix E. Copies of answer sheets can also be found in the laboratory.

The emphasis of the manual is on how to do it and it does not attempt to give a comprehensive account of the many different facets of chip design. A fuller picture should emerge when the design work is taken together with the taught course material.

References

Information on the top-down design approach can be found in Chapter 3 of this manual. An introduction to Verilog can be found in the Cadence manuals:

/home/cadtools5/cds_2008_2009/ldv_2009/doc/pdf/vlogref.pdf
Lab | Lectures
Nov 8 (Exercise 2.2) | Nov 10 (Lecture 10)
Nov 15 (Exercise 2.2) | Nov 17 (Lecture 11)
Nov 22 (Exercise 3) | Nov 24 (Lecture 12)
Nov 29 (Exercise 4.1) | Dec 1 (Lecture 13)
Dec 6 (Exercise 4.2) | Dec 8 (Lecture 14)
Dec 13 (written work/code hand-in deadline 12:00) | Dec 13 (Lectures 15 & 16), Dec 15 (Lecture 17)
Dec 14 (demo deadline 15:00) | -

Table 2.1: Schedule for Lectures and Lab in 2011
The final deadline for handing in written work or code for marking is 12:00 on Tuesday December 13th, week 12. The final deadline for demonstrating work is on the afternoon of Wednesday December 14th. Students wishing to demonstrate work must put their names on a list between 14:00 and 15:00. Names will be taken randomly from the list and students given one opportunity to demonstrate their work. Note: no work will be accepted or demonstrated after the deadlines unless the student concerned has a lab mark of less than 40%.
The Project

The lab work consists of designing and testing a simple 16-bit RISC processor down to the Register Transfer Level.
Preparation

Preparation outside the timetabled laboratory classes is necessary and expected. Students who wish to make good progress in the laboratory time when help is available should not only read the relevant material for each week before coming to the lab but should also do further work on stages of the design outside this. Remember, you are expected to spend the same amount of time on preparation as you spend in the scheduled lab time. In addition, the lab work and lectures are closely integrated, and important and useful information about lab exercises is given in lectures, so attendance at lectures is closely linked to good progress in the lab!
Deadlines

The exercise is divided into a number of stages with deadlines as indicated in Table 2.2. The details of the deliverables for each stage are given in Chapter 4 of this manual. Due to the incremental nature of the laboratory, an extension system is not operated and you do not need to request an extension. However, to complete the project work, you should adhere to, or be ahead of, the deadlines given.
Marks

This course has more labs and fewer lectures than other courses and the overall lab and exam mark is weighted accordingly. Students are expected to work individually and independently. Hence, where work results from collaborative efforts, the mark awarded for the work will be split equally amongst the contributors. As the COMP22111 lab forms a significant contribution to the overall course mark, it is in your interests to invest the time in obtaining a good lab mark!
Design Level | Ex. | Design Stage | No. of Sessions | Semester Week(s) | Exercise hand-in Week | Max Mark
Specification | - | Read Lab Manual, especially Chapter 5 | - | - | - | -
 | 1 | STUMP assembler | 1 | Week 3, Oct 11 | Week 3, Oct 11 | 20
 | 2.1 | Top level model in Verilog and entry | 2 | Weeks 4 & 5, Oct 18 & 25 | |
 | 2.2 | Simulation of top level model | 2 | Weeks 7 & 8, Nov 8 & 15 | Week 8, Nov 15 | 30
 | 3 | Signal usage charts | 1 | Week 9, Nov 22 | Week 9, Nov 22 | 10
RTL | 4.1 | Verilog specification of control | 1 | Week 10, Nov 29 | |
 | 4.2 | Testing RTL design | 1 | Week 11, Dec 6 | Week 11, Dec 6 | 40
 | | Deadline for written work/code hand-in | - | Week 12, Dec 13 | Week 12, 12:00 Dec 15 | -
 | | Deadline to sign up for demo | - | Week 12, Dec 14 | Week 12, 15:00 Dec 14 | -
 | | | | | Total | 100

Table 2.2: COMP22111 Schedule
The Three Design Representation Domains (Behavioural, Structural, Physical) and the Tests at each Design Level

Design Level: Top Level (components: whole chip)
    Behavioural: Written specification; executable behavioural description
    Structural: Schematic shows core logic connected to input and output pads
    Physical: Chip architecture shown as pads, core logic outline and power distribution
    Tests: Input to behavioural model should reflect all possible system conditions; whole chip test patterns

Design Level: Chip Architecture
    Structural: Block diagram schematic of chip shows interconnectivity of functional blocks
    Physical: Floorplan shows size and shape of rectangular blocks with routing channels
    Tests: Test for each functional block + test for whole chip

Design Level: Register Transfer Level (RTL)
    Structural: Schematic diagrams of each functional block show interconnectivity of RTL components
    Physical: Components represented as areas of standard cells or as blocks of special cells, e.g. RAM, PLA, datapath
    Tests: Test for each RTL block + each functional block + whole chip

Design Level: Logic (components: logic gates)
    Behavioural: Behaviour of gates as simulation models provided by the silicon vendor
    Structural: Each RTL component is shown as a schematic of interconnected gates
    Physical: Outline for each cell + interconnection tracks

Design Level: Transistor
    Behavioural: Electrical model, e.g. SPICE models, of transistors used by the silicon vendor
    Structural: Circuit diagrams show transistors connected to form gates
    Physical: Polygons represent mask shapes used for fabrication of transistors and interconnect

Design Level: Production
    Physical: Masks or reticules of pattern for each layer of fabrication
3.2 Specification

A chip design starts with a set of requirements from which a specification is drawn. The specification defines precisely what the chip does - its function - not how to do it. It is the user's view of what the chip does. In the real world the specification is needed to make sure that the designer and customer agree on the function of the chip, and to define the interaction, or interface, of the chip with the external system of which it forms a part. Cost and performance criteria are also a part of the requirements.

In an educational exercise there are no customer requirements to determine design constraints. The main constraint in the class context is for a design which can be completed within a limited time (approx. 16 hrs).

When deciding what to put in a specification and how to write it, it is useful to consider what information will be needed in the data sheet of the completed device because the two are very similar. The main contents of a specification can be summarised as:

- a summary description of what the chip does
- a list of the chip's input and output pins
- required performance (clock rate) and power dissipation
- a list of the major modes in which the chip operates
- for each mode:
    - signals which control the mode
    - function executed in that mode
    - performance constraints on execution such as minimum and maximum times between inputs and outputs
Programming languages have been developed especially for the behavioural and structural modelling of integrated circuits; they are known as Hardware Description Languages, or HDLs. Two examples are VHDL and Verilog. Verilog is a widely used standard and is used in preference to VHDL in most CAD tools. Thus, Verilog will be used in this course. General purpose languages, such as C, Java or C++, could also be used for the top-level behavioural modelling of a chip.

When Verilog is being used to model a whole chip, a common procedure is to connect the chip model to a test bench as shown in Figure 3.1. The module representing the chip contains a model of the chip, and the Tester module (the test bench) emulates the external environment of the chip. During a simulation the Tester module reads some form of input from an external file and extracts data to be applied as inputs to the chip model. It also captures the chip output data and writes it to an external file. The form of the external test file will depend on the type of chip being modelled. For example, if the chip is a processor the test file could be in the form of the binary representation of a program to be executed by the processor.

The form of the chip model will depend on the stage reached in the design. A purely behavioural model, which describes the function of the chip but not its internal structure, is used for the Top Level design stage. At lower levels of the design the model contains internal information, usually in the form of a behavioural model describing the internal data flow and operations, but it can also contain a structural (gate) description. The same test bench and test program should be used at all levels of the design to ensure that each level of the design decomposition carries out exactly the same function as the top level description.
On each active clock-edge data is clocked from the D inputs of the flip-flops (FFs) to the Q outputs which form the inputs to the following block(s) of logic; see Figure 3.2. After a short delay the outputs of each CL block change as a result of the change to the block inputs.
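The clocking behaviour described above can be sketched as follows. This is an illustrative Python model, not the lab's Verilog: the register names `a` and `b`, the 4-bit width and the `LOGIC` table are assumptions chosen only to show that every combinational block is evaluated from the current Q outputs and that all flip-flops then load together on the clock edge.

```python
def step(registers, logic):
    """Model one active clock edge for a set of registers.

    `registers` maps register name -> current Q value.
    `logic` maps register name -> function computing that register's D input
    from the current Q values (the combinational logic block feeding it).
    """
    # The combinational blocks settle using the CURRENT Q outputs only.
    d_inputs = {name: fn(registers) for name, fn in logic.items()}
    # On the edge, every flip-flop copies its D input to Q at the same time.
    return d_inputs


# Hypothetical two-register example: a takes b's value, b takes a + 1 (4 bits).
LOGIC = {
    "a": lambda q: q["b"],
    "b": lambda q: (q["a"] + 1) & 0xF,
}

state = {"a": 0, "b": 5}
state = step(state, LOGIC)  # a <- old b, b <- old a + 1
```

Building the complete set of D inputs before updating any register is what keeps the model faithful to hardware: `b` sees the old value of `a`, not the value `a` is about to take.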
The elements in the RTL design are usually represented as boxes, or blocks, in a block diagram which shows the interconnections between the blocks. The internal logic structure of the combinational logic is not defined at this stage but the function, or behaviour, is described as a model that can be used in a simulation of the RTL design. Thus, it can be seen that a register-transfer design gives a complete specification of what the chip will do on every clock cycle.

Students may already have come across most typical combinational RTL elements in earlier courses: adders, multiplexers, comparators, ALUs etc. In addition to these there will be designer elements, i.e. blocks of random logic designed to carry out arbitrary combinations of functions not included in standard libraries; the combinational logic block of an FSM (Finite State Machine) will be of this type.

Sequential elements consist of either straightforward storage registers - a set of D-type flip-flops for example - or more complex assemblages such as counters or state machines. Counters and FSMs contain combinational logic in addition to memory elements. Thus, the separate combinational logic and register blocks of an RTL block diagram will not always be obvious because some of the RT structure is hidden within these more complex blocks. However, each block in the diagram should only contain one register.
Experienced designers will spend time and effort optimising their designs for silicon area, performance etc. but a first-time designer will be happy with a completed design which works! It will probably be helpful to think of the design as made up of three parts:

1. memory storage - registers, RAM etc.
2. datapath functions - e.g. logical and arithmetic functions
3. control - a block which includes an FSM to control the state sequence of the circuit.
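The three-part view can be illustrated with a small sketch. It is written in Python purely for illustration (the lab uses Verilog), and the design is a made-up one, not the STUMP: a circuit that sums the integers from n down to 1. The register names, the states `LOAD`, `ADD` and `DONE`, and the control signals `add_en` and `dec_en` are all hypothetical.

```python
def run_design(n):
    """Simulate a toy design split into memory, datapath and control."""
    # 1. Memory storage: two 16-bit data registers plus the FSM state register.
    regs = {"count": n & 0xFFFF, "total": 0}
    state = "LOAD"
    cycles = 0

    while state != "DONE":
        # 3. Control: an FSM that looks at status from the datapath and
        #    produces the control signals and the next state.
        if state == "LOAD":
            ctrl = {"add_en": False, "dec_en": False}
            next_state = "ADD" if regs["count"] != 0 else "DONE"
        else:  # state == "ADD"
            ctrl = {"add_en": True, "dec_en": True}
            next_state = "ADD" if regs["count"] != 1 else "DONE"

        # 2. Datapath functions: an adder and a decrementer, gated by control.
        new_total = regs["total"]
        new_count = regs["count"]
        if ctrl["add_en"]:
            new_total = (regs["total"] + regs["count"]) & 0xFFFF
        if ctrl["dec_en"]:
            new_count = (regs["count"] - 1) & 0xFFFF

        # Clock edge: all registers (including the state register) update together.
        regs["total"], regs["count"], state = new_total, new_count, next_state
        cycles += 1

    return regs["total"], cycles
```

For example, `run_design(4)` spends one cycle in `LOAD` and four in `ADD`, leaving 10 in `total`. Keeping the control decisions separate from the datapath arithmetic mirrors the way the control block is specified separately from the datapath in an RTL design.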
Stage 1 Preliminary Design

1. Draw an ASM diagram of the design.
2. Draw an outline block diagram including memory storage registers and datapath functions but omitting the control block, as follows:
    - from the ASM diagram identify all the registers and memory needed
    - select combinational functions to carry out the data operations
    - draw in the connections (wires) needed to transfer data between blocks and add multiplexers where necessary
    - check the block diagram against the ASM diagram.
3. List all the control signals which will be needed to control the operations of the blocks and orchestrate the clocking of registers. Also identify the signals needed as inputs to the control block.
4. Define the functions of the control block and extract a state transition graph for the FSM from the ASM diagram.
5. Complete the block diagram by adding the control block and the control signals and write out a detailed specification of the control functions.

Stage 2 Refining the Design

The design should now be checked, critically examined and revised:

1. Work through the design, comparing it with your top level model and ASM diagram to check for the correct sequence, the correct production of control signals and correct data operations.
2. Modify the design if necessary.
3. Examine the design to see if there are any obvious simplifications that can be made. It will often be found that step (2) will have led to additions and modifications which are rather clumsy. A re-examination may show that a design revision will give a simpler solution.
4. Repeat steps (1) to (3) until satisfied with the design.

Stage 3 Verifying the Design

Before the design can be verified by simulation it must be entered into the CAD system.

1. The structure can be entered as a schematic block diagram. In this case, great care should be taken to avoid errors and inconsistencies in the labelling of pins and bus signals. Careless labelling can make nonsense of simulations. Alternatively, the entire design can be entered as an HDL description.
2. If the design is entered as a block diagram then the functional descriptions of the blocks must be entered using an HDL, i.e. Verilog in the present exercise. Models will already exist for library blocks.
3. The behavioural/functional models of each of the RTL blocks are tested for correct functionality by simulation. A set of test patterns will be needed (see below).
4. When the functional models of all the blocks have been verified the whole design is simulated using the same chip test that was used for the top level behavioural simulation.
5. Corrections are made if needed and the design is re-simulated until correct outputs are obtained.
The methods which are commonly used for the design of standard cell ASICs are summarised below.

1. Use of library components: Many widely used RTL elements, e.g. registers, multiplexers, adders, can be pre-designed and stored as library components.
2. Logic synthesis: Automatic synthesis tools can be used to create gate-level logic designs from internal behavioural descriptions. Tools for synthesising combinational logic and FSMs are well established and widely available. Tools for synthesising whole RTL designs are also available and are now highly sophisticated so as to be able to optimise performance, power or area. However, this sophistication requires user interaction and usually design iteration.
3. Logic block compilers: Compilers are used to generate blocks which have some form of regular geometry. Most ASIC vendors supply compilers for ROM, RAM, PLAs and datapaths.

In COMP22111, library components have been used to define the datapath and a logic synthesis tool is used for the control block. It is sensible to arrange that the behavioural descriptions of RTL logic elements which have already been written for the RTL design can also be used as inputs to the synthesis tools. In the processor design the Verilog program describing the control block is used as the input for the synthesis software. The logic-level design is seen to be an almost automatic decomposition from the RTL design.

After decomposition to logic, the whole design is then re-simulated using the same chip test that was used for the Top Level and the RTL simulations. There may be a few problems at this stage because:

1. Synthesis tools do not always do the sensible thing and may misinterpret a description which was adequate for RTL simulation, but not sufficiently specified for the unambiguous decomposition to gate level.
2. Simulations at higher levels do not take any account of gate delays. The gate level simulation models do include information about the delay characteristics of the gates and the simulation results show gate delays; the logic simulator can also make worst case predictions of the effect of the wiring between gates (but the actual wiring delays cannot be calculated/known until after layout). The simulator results may show that some delays are unacceptable or that the active clock edges occur too close to data transitions.

Tests of individual blocks may be needed in addition to the whole chip simulation in order to sort out problems. RTL block test patterns are needed for these logic level simulations.
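The kind of check implied by "active clock edges occur too close to data transitions" can be worked through numerically. The sketch below is an illustration in Python with entirely hypothetical numbers and a hypothetical function name `slack_ns`; it assumes a single register-to-register path and the usual setup-time condition, and is not taken from any particular simulator.

```python
def slack_ns(clock_period_ns, clk_to_q_ns, gate_delays_ns, wire_delay_ns, setup_ns):
    """Timing slack for one register-to-register path, in nanoseconds.

    Data leaves the source flip-flop clk_to_q_ns after the clock edge, passes
    through the gates and wiring, and must arrive setup_ns before the next
    edge. A negative result means the data changes too close to the clock edge.
    """
    arrival = clk_to_q_ns + sum(gate_delays_ns) + wire_delay_ns
    required = clock_period_ns - setup_ns
    return required - arrival


# Hypothetical path: three gates, 10 ns clock, 0.5 ns setup time.
GATES = [1.0, 1.5, 2.0]

# With a small assumed wiring delay the path meets timing comfortably.
estimated = slack_ns(10.0, 0.5, GATES, wire_delay_ns=0.5, setup_ns=0.5)

# With a pessimistic worst-case wiring estimate the same path fails.
worst_case = slack_ns(10.0, 0.5, GATES, wire_delay_ns=5.0, setup_ns=0.5)
```

Here `estimated` is 4.0 ns and `worst_case` is -0.5 ns, which shows why the wiring estimate matters before the real wiring delays are known from layout.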
blocks of the particular FPGA and these are then placed and routed. This can all be done automatically by CAD tools. The design can then be downloaded onto the FPGA, again using appropriate available tools. To check the operation of the downloaded design, a test program is run. This should be the same as that used in the top level behavioural simulation. Unlike semi-custom design, design errors are not fatal. They only require that the design is amended and the design process repeated from the highest level, followed by downloading and testing of the updated design!
3.8 Layout

For a semi-custom design, there are still a number of design stages following the logic design which need to be performed. These are described in the remaining sections of this chapter.

Layout is the process of placing geometrical representations of gates on the surface representing the chip and interconnecting them with tracks (wires). When a semi-custom chip is being designed, layout is carried out using automatic Place and Route CAD tools. Each gate is represented as a rectangular shape of a standard height on a chip which uses the standard cell architecture. The internal representation of each gate as a set of polygons is added at a later stage by the manufacturer before making the masks for fabrication. The cells are butted together in rows with channels between the rows for routing the interconnections (Figure 3.4).
Figure 3.4: Some rows of a standard cell layout

The layout of a small chip will consist of a single rectangle containing a number of rows and channels of the same length but a floorplan is needed for a large chip. A floorplan subdivides the total surface of the chip into separate areas for the placement of the different functional blocks and for the routing of signals between the blocks. Although CAD tools can be used to assist in the creation of a floorplan, it is a difficult process to automate. The objective will usually be to obtain a layout with as small an area as possible that maintains the signal integrity.

The Place and Route procedure for a standard cell chip consists of carrying out a sequence of separate steps. First, I/O pads must be added to the top-level schematic. The circuit description will usually be held in the CAD database in a hierarchical format but the layout tools need a flattened description containing every instance to be used in the layout. The next step is therefore to flatten the netlist. Further steps define floorplan areas, assign cells to rows and carry out local channel routing and global routing.
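Flattening a hierarchical description can be sketched as a simple recursive walk. The example below is an illustration in Python only: the hierarchy `LIB`, the module and cell names, and the `flatten` function are hypothetical, and for brevity the sketch lists instances only and ignores the connections between them, which a real netlist would also carry.

```python
def flatten(module, library, prefix=""):
    """Return every leaf-cell instance under `module` with a hierarchical name.

    `library` maps a module name to a list of (instance name, type) pairs.
    A type that appears as a key in `library` is a sub-module and is expanded;
    any other type is treated as a leaf (standard) cell.
    """
    flat = []
    for inst_name, cell_type in library[module]:
        path = prefix + inst_name
        if cell_type in library:
            # Sub-module: descend, extending the hierarchical instance path.
            flat.extend(flatten(cell_type, library, path + "/"))
        else:
            # Leaf cell: this instance will be placed in the layout.
            flat.append((path, cell_type))
    return flat


# Hypothetical hierarchy: a chip containing a control block and a 2-bit adder.
LIB = {
    "chip": [("u_ctrl", "control"), ("u_add", "adder2")],
    "control": [("g1", "NAND2"), ("ff1", "DFF")],
    "adder2": [("fa0", "full_adder"), ("fa1", "full_adder")],
    "full_adder": [("x1", "XOR2"), ("x2", "XOR2"), ("n1", "NAND2")],
}

flat_instances = flatten("chip", LIB)
```

The `full_adder` module is described once in the hierarchy but yields two separate sets of leaf cells after flattening, which is why the layout tools need the flattened form: every physical cell must be placed individually.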
3.9 Post-layout

The layout stage is not the end of the story for the ASIC designer. Having obtained a layout, a Design Rule Check (DRC) is carried out to ensure that none of the fabrication process design rules are broken. If the software is well designed and bug-free there should be no errors at this stage - regrettably it is sometimes necessary to make a few edits to the layout by hand.

When the DRC passes, a Layout versus Schematic (LVS) check is performed. This checks that every feature extracted from the layout appears on the schematics generated from the logic. Any mismatches need to be investigated and fixed until the components in the logic correspond exactly with the layout features.

When this check passes, the next step is to use a program to calculate (extract) the parasitic capacitances of all the interconnection tracks. The whole chip is then re-simulated and the effects of the extra track capacitances are included in the delay calculations in order to get a fairly accurate estimation of performance.

Further testing is normally done to ensure that the design functions correctly despite the maximum allowable variations in transistor characteristics and in environmental parameters (temperature, voltage etc.). This is referred to as testing in the corners.

When the designer is satisfied that the design functions correctly under all conditions and meets its performance/power/area specification under typical conditions, a final Design Rule Check (DRC) and Layout versus Schematic (LVS) check are undertaken. Once these have been done and passed, the chip design files can be shipped to the manufacturer for the fabrication of the chip.
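Testing in the corners amounts to repeating the chip test for every combination of extreme parameter values. The following Python sketch illustrates the enumeration only; the parameter names, the numeric values in `CORNERS`, and the pass/fail rule `toy_passes` are hypothetical placeholders, standing in for limits that would come from the silicon vendor and for a real post-layout simulation run.

```python
from itertools import product

# Hypothetical parameter limits; real values would come from the silicon vendor.
CORNERS = {
    "process": ["slow", "typical", "fast"],
    "voltage": [1.62, 1.80, 1.98],
    "temperature_c": [-40, 25, 125],
}


def all_corners(params):
    """List every combination of the parameter values as a dictionary."""
    names = list(params)
    value_lists = [params[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


def failing_corners(params, passes):
    """Return the corners for which the supplied check reports a failure."""
    return [corner for corner in all_corners(params) if not passes(corner)]


def toy_passes(corner):
    """Stand-in for a post-layout simulation; fails only at one slow corner."""
    is_worst = (
        corner["process"] == "slow"
        and corner["voltage"] < 1.7
        and corner["temperature_c"] > 100
    )
    return not is_worst


failures = failing_corners(CORNERS, toy_passes)
```

With three values for each of three parameters there are 27 combinations to check, and in this made-up case only the slow, low-voltage, hot corner fails. In practice `toy_passes` would be replaced by a run of the whole chip test with the delays appropriate to that corner.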
Chapter 4 Design Tasks

This chapter outlines the exercises in the COMP22111 lab. Chapter 5 details the STUMP processor and Chapter 6 contains information on how to program the STUMP. It is important that you read these chapters before you proceed with these exercises.

There are six exercises for you to complete during the course of the laboratory. Please make sure you read each exercise carefully, along with the extra material provided in this manual. The duration of each exercise, hand-in dates etc. are given in Table 2.2 (repeated below). Each exercise will also provide further information.
Ex. | Design Level  | Design Stage                            | No. of Sessions | Semester Week(s)            | Exercise hand-in Week | Max Mark
-   |               | Read Lab Manual, especially Chapter 5   | -               | -                           | -                     | -
1   | Specification | STUMP assembler                         | 1               | Week 3 (Oct 11)             | Week 3 (Oct 11)       | 20
2   | Specification | Top level model in Verilog and entry    | 2               | Weeks 4 & 5 (Oct 18 & 25)   |                       |
3   | Specification | Simulation of top level model           | 2               | Weeks 7 & 8 (Nov 8 & 15)    | Week 8 (Nov 15)       | 30
4   | Specification | Signal usage charts                     | 1               | Week 9 (Nov 22)             | Week 9 (Nov 22)       | 10
5   | RTL           | Verilog specification of control        | 1               | Week 10 (Nov 29)            |                       |
6   | RTL           | Testing RTL design                      | 1               | Week 11 (Dec 6)             | Week 11 (Dec 6)       | 40
-   |               | Deadline for written work/code hand-in  | -               | Week 12 (Dec 13)            | Week 12, 12:00 Dec 15 | -
-   |               | Deadline to sign up for demo            | -               | Week 12 (Dec 14)            | Week 12, 15:00 Dec 14 | -
    |               | Total                                   |                 |                             |                       | 100
Table 2.2: COMP22111 Schedule
Lab
etiquette
Each
exercise
details
the
work
that
you
are
required
to
do
and
hand
in
for
each
exercise.
It
is
important
that
you
manage
your
time
wisely
in
the
labs
and
prepare
thoroughly
beforehand.
Some
exercises
require
answer
sheets
to
be
completed,
some
require
code
to
be
handed
in
and
some
require
work
to
be
demonstrated
to
a
lab
demonstrator
and
the
exercise
signed
off.
Answer
sheets
for
each
of
the
six
exercises
can
be
found
in
the
lab.
Please
make
sure
you
submit
the
information
required
for
each
exercise
along
with
the
appropriate
cover/answer
sheet.
Remember
the
lab
is
worth
45%
of
your
marks
for
this
course
unit.
Achieving
a
good
mark
in
the
laboratory
will
put
you
in
a
very
good
position
for
passing
the
module.
Instructions
You
will
be
given
a
sheet
with
a
few
consecutive
lines
of
assembler
code
together
with
the
initial
state
of
the
Register
Bank
and
Condition
Code
register
prior
to
executing
the
code.
For
each
instruction
fill
in
the
sheets
to
specify
the
register
state
after
executing
the
instruction.
Hand
in
your
sheets
for
marking
on
completion.
To help you, note the following. The
Program
Counter
is
incremented
directly
after
fetching
an
instruction
and
before
the
instruction
is
executed.
The
Result
Register
holds
the
result
computed
by
the
ALU.
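As a worked illustration of the first note (the address and offset are arbitrary values chosen for this example, not taken from the exercise sheets): an instruction fetched from address 10 executes with the Program Counter already holding 11, so a taken branch with an offset of -3 would continue at address 8, because the offset is added to the incremented value.

$$
PC_{\text{execute}} = PC_{\text{fetch}} + 1 = 11, \qquad PC_{\text{target}} = PC_{\text{execute}} + \text{offset} = 11 + (-3) = 8
$$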
Instructions
You
should
have
met
the
hardware
description
language
Verilog
in
the
first
year
COMP12111
course
unit.
You can refer to your old notes to help with this lab; however, the examples and templates provided should be a sufficient guide to what is needed to complete the exercises.
Demonstrators
will
be
able
to
give
help
and
advice.
The
amount
of
Verilog
code
to
be
written
is
not
very
long
-
about
one
page
without
comment
lines.
In
order
to
write
this
code
and
to
attempt
the
other
exercises
well,
you
need
to
have
a
thorough
understanding
of
the
processor
specification
(Chapter
5).
You
should
remember
that
the
high
level
behavioural
model
of
a
chip
is
in
effect
an
executable specification. It is the user's view of the chip.
It
is
very
important
to
get
it
right
because
it
is
what
will
finally
be
made
-
all
the
lower
levels
of
design
are
tested
against
this
specification.
Your
first
task
is
to
complete
the
Verilog
behavioural
model
of
the
STUMP
processor
chip.
Most
of
the
model
is
provided
-
a
listing
of
the
code
is
given
in
Appendix
A
of
this
manual.
It
includes
all
the
functions
and
tasks
that
you
will
need,
and
the
main
program
includes
reset,
instruction
fetch
and
instruction
decoding.
The
part
left
for
you
to
do
is
the
execution
part
of
the
instruction
i.e.
the
reading
of
the
Register
Bank,
the
setting
up
of
operands
to
the
ALU,
the
execution
of
the
instructions
in
the
ALU
and
the
setting
of
condition
codes
in
the
Condition
Code
Register.
Read
the
(incomplete) Verilog listing, identify the signals to the ALU, and look at the instruction type descriptions in Chapter 5. As an example, in a type 3 instruction (Branch) the offset (bits 7:0 in the instruction) is added to the program counter (register 7 in the register bank) to calculate the memory location of the next instruction if the branch is to be taken. It is important to remember that the offset is a signed, 2's complement number and that the inputs to the ALU (ALUA and ALUB) are 16-bit values, so the offset taken from the instruction needs to be sign-extended to 16 bits. You must add the code to do this. The writeback phase following execution has been done for you. You can complete the program using CASE and IF constructs together with variable assignments and task and function calls. You will find examples of all the syntax needed in other parts of the program. Those parts of the Cadence CAD system needed to complete the exercise are described below:
Cadence
Cadence
is
an
industry-standard
tool
that
the
University
has
access
to
via
the
Europractice
framework.
In
order
to
use
the
software
the
University
must
adhere
to
an
end
user
agreement
(EUA)
that
states
(amongst
other
things)
that
the
tools
must
be
kept
confidential,
must
not
be
copied,
and
must
not
be
used
for
commercial
purposes.
A
copy
of
this
end
user
agreement
can
be
found
in
Appendix
D.
When
you
run
the
Cadence
tools
for
the
first
time
you
will
be
asked
to
confirm
that
you
agree
to
the
conditions
set
out
by
the
end
user
agreement;
failure
to
do
so
will
result
in
you
not
being
allowed
access
to
the
tools.
Accessing
and
Modifying
the
Top
Level
Model
The
Cadence
CAD
system
is
used
in
this
laboratory
for
the
design
work.
Create the COMP22111 Cadence directory structure by typing
    mk_cadence 22111 <return>
This should only be done once. Thereafter, start a Cadence session by typing
    start_cadence 22111 <return>
Eventually,
an
icds
window
opens.
Choose
File->Open.
This
brings
up
an
Open
File
window.
In
the
Open
File
window
set
Library
Name
-
comp22111,
Cell
Name
-
processor,
View
Name
-
algorithmic
and
then
click
OK.
This
brings
up
a
verilog.v
window
containing
the
incomplete
Verilog
code
describing
the
top
level
of
the
STUMP.
Type
in
this
window
to
add
your
code
and
then
save
it
using
File->Save
from
the
window's
toolbar.
The
edit
operation
on
the
file
causes
all
the
Verilog
code
to
be
checked
for
syntax
errors
on
Exit.
If you have errors, an HDL Parser Error/Warnings window comes up telling you that parsing of the Verilog file failed. A failed design check indicates syntax errors; by clicking Yes in the HDL Parser Error/Warnings window you can inspect the error report to gain some indication of where the error is, and the verilog.v window will reopen. You can correct any errors in the top level description if you feel confident to do so, then save it and exit to re-check for syntax errors. Repeat this until the code correctly passes the checks. Remember: any syntactical errors will be largely ignored when marked.
Printing
Print
out
your
code
from
the
verilog.v
window
toolbar
using
File->Print.
This
brings
up
a
Printer
window.
Enter
lpr
-Pugpr3
and
click
on
Print.
Dismiss
the
verilog.v
file
using
File->Exit.
If
the
HDL
Parser
Errors/Warnings
window
comes
up,
click
No.
Exit
To
exit
from
Cadence,
click
on
File
in
the
icds
window
and
then
select
Exit.
In
the
Exit
icds?
window
this
brings
up,
click
yes.
Instructions
Parsing
(syntactic
analysis)
It is first necessary to parse the design to confirm that it satisfies the Verilog syntax.
Start
Cadence
(start_cadence
22111).
This
brings
up
the
icds
window.
Choose
File->Open.
This
brings
up
an
Open
File
window.
In
the
Open
File
window
set
Library
Name
-
comp22111,
Cell
Name
-
processor,
View
Name
-
algorithmic
and
then
click
OK.
This
brings
up
a
verilog.v
window
containing
the
Verilog
code.
You
need
to
perform
at
least
one
edit
on
it
and
then
save
it.
Now,
in
the
verilog.v
window,
select
File->Exit.
Test
Files
In
the
COMP22111
directory,
you
will
find
4
test
files
(test1.s
to
test4.s)
written
in
the
STUMP
assembly
code.
These
four
tests
provide
a
fairly
good
test
of
most
of
the
STUMP
and
are
used
to
test
the
STUMP
at
all
stages
of
the
design
from
top
level
to
layout.
The tests would also be used to test the fabricated design.
The
tests
are
incremental
i.e.
test
2
assumes
that
test
1
works,
test
3
relies
on
tests
1
and
2
working,
etc.
The
tests
start
at
line
0
and
all
write
results
back
to
memory,
starting
at
line
0
and
thus
overwrite
the
program!
Test
1
is
a
basic
test
which
checks
that
the
internal
buses
are
connected
correctly,
that
the
Register
Bank
can
be
correctly
addressed,
that
instructions
can
be
fetched,
and
that
data
can
be
written
back
to
memory
(for
checking).
If
test
1
does
not
work,
something
fundamental
is
wrong
and
this
should
be
fixed
before
running
any
other
tests.
Test 2 checks that the ALU operates correctly for various data combinations and different logical and arithmetic operations. It only checks the ALU and does not use the shifter. It aims to identify any signals in the fabricated ALU which are unable to change state (because they are stuck at 1 or 0) and pinpoint any adjacent signals (bits i and i+1) which are shorted together. Test 3 is a rudimentary program which checks the different shifting operations, and test 4 checks the branch operations.

As these are the test programs used throughout this laboratory, you are advised to study them carefully. You will be using them to debug your design, so familiarity with them will be necessary if the test programs indicate any errors in your design.

The assembler is fully described in Chapter 6 and instructions to convert the assembler programs into a format suitable for the processor memory are given in section 6.2.1. They are repeated here for convenience. In a shell window, change the directory to COMP22111 using
    cd $HOME/Cadence/COMP22111
Then type
    sasm <filename.s>
to create 3 files. Binary is in <filename.bin> while hex versions are in <filename.hex> and <filename.mem>. The file for the processor memory is called xc4000mem.ram and is created by typing
    loadmem.sh <filename.mem>
in the terminal window; this creates the file in the $HOME/Cadence/COMP22111/test_bench directory.

Waveform Viewer
In the icds window, select Tools->Verilog Integration->NC-Verilog. This brings up a Virtuoso Verilog Environment etc. window. Fill in its fields with Run Directory - test_bench, Library - comp22111 (in Top Level Design), Cell - processor_test_bench (in Top Level Design), View - schematic (in Top Level Design). Click on the top left icon (of the running man) to initialise the simulator. When the icds window shows that initialisation is complete, click on the second icon down on the left (of three separate ticks) to generate a netlist. When the icds window indicates this is complete, click on the Simulate icon, which is the third icon down in the Virtuoso Verilog Environment window. This launches SimVision and (eventually) brings up two windows: a Design Browser 1-SimVision window containing the processor and a Console-SimVision window, which is a command window.

You will probably want to view signals on the Waveform Viewer. To do so, select the fifth icon from the right (showing waveforms) in the Design Browser 1-SimVision window to bring up a Waveform 1-SimVision window with the signals to be displayed listed down the left. To monitor signals/buses within the test bench, expand test (press its + button) to reveal top in the Design Browser 1-SimVision window and then click on the top symbol to list the signals to and from the processor. Select the signals you want to monitor (the address, data and clock lines are particularly recommended) and send them to the Waveform Viewer (fifth icon from right). Continue this process of signal expansion and selection until all the signals you require have been sent to the Waveform Viewer. A good starting point for monitored signals would be the input/output signals to the processor.
Command Files
The simulation can be run by using the menus to issue commands to the simulator, with the commands given appearing in the Console-SimVision window. However, this is tedious and prone to errors, so normally these commands are placed in a file and the simulator is instructed to take this file as its input. Command files have a .sv extension, and the command files for the first three tests, test1.sv to test3.sv, are in your COMP22111 directory. test1.sv is shown below:
    force test.RESET = 1
    run 10 ns
    deposit test.RESET = { 1'b0} -after 2 us -absolute -release
    stop -create -condition {#test.top.A = 45 & #test.top.OE = 0 }
    stop -create -time 10 ms -absolute
    run
    stop -delete *
    deposit test.DUMP = { 1'b1} -after 1 ns -relative
    stop -create -time 10 ns -relative
    run
    stop -delete *
The first three lines define the RESET signal, holding it at 1 for an initial 10 ns run and releasing it by taking it to 0 at 2 µs. The next two lines define breakpoints where the simulation is to be stopped. The first is when the address lines are 45 with the Memory Output Enable signal at 0. Remembering that the test program starts at line 0 and that instructions are placed in consecutive lines, this statement recognises the request to memory to fetch the program's final instruction (bal Fin). If your processor description is correct, the simulation always stops at this point. The second stop command is only required if your processor is stuck in an infinite loop, in which case the simulation terminates at 10 ms. The simulation is then instructed to run up to the time a breakpoint is encountered. There is a stop -delete * after each run statement; this line is necessary to remove the breakpoints, enabling the simulator to run past this time. The DUMP signal is activated 1 ns after the current time, when the current values in the memory will be dumped to the file $HOME/Cadence/COMP22111/test_bench/xc4000mem.dump.
The final three statements in the command file continue the simulation for 10 ns beyond the last breakpoint before stopping. Using the three provided command files as templates, you need to create a command file test4.sv for the branch test.

Running Simulations from Command Files
To run the simulation from a command file, reset the simulation using Simulation->Reset to Start in the Design Browser 1-SimVision window. Run the simulation from the command file by selecting File->Source Command Script from the Design Browser 1-SimVision window to bring up a Source Command Script window. Here, select the browse button (...), which opens a Source Script File window in which <name of file.sv> in the COMP22111 directory is selected, followed by Open; this closes the Source Script File window. This filename now appears in the Source Command Script window. Select simulator Console (NC-Sim) for the Send commands to: field and press OK to cause the simulator to run (as can be observed by the change in the icon displayed by the Play button in the Design Browser 1-SimVision). You can now inspect the file dumped from memory and you should compare the created outputs with those expected from the test program. If the contents are not correct, you need to pinpoint the cause of the fault; in this case, you will find the Waveform Viewer a valuable tool in debugging your design. Correcting any faults will involve modifying the Verilog code for the processor. After modifying the Verilog code, you need to repeat the syntax checks as previously described.

When your top level design description simulates correctly for all the supplied test programs, you should show the results of running the assembler code for each test to a demonstrator, who will mark them off. You should complete and hand in the top level code and demonstrate the simulation of the processor operation before moving on to the next exercises. Make sure you demonstrate your tests working and have a sheet signed off by a demonstrator.

Stopping the Simulation
If for any reason the simulation fails to complete within a few seconds and the time is galloping on uncontrollably, you need to stop the simulation! In the Design Browser 1-SimVision window the button next to the play button (having two parallel vertical lines) is a stop button and will halt the simulation. Alternatively, use Simulation->Stop.

Exiting the Simulation
To exit from the Waveform Viewer, select File->Exit SimVision in the Waveform 1-SimVision window. To exit from the simulator, in the Design Browser 1-SimVision window select File->Exit SimVision. This removes both the Design Browser 1-SimVision window and the Console-SimVision window. To exit from the simulator environment, in the Virtuoso Schematic Composer etc. window select Commands->Close. Note that the created netlist does not need to be recreated if further simulations are performed unless there are changes to the top level Verilog code. To exit from Cadence, click on File in the icds window and then select Exit.
In the Exit icds? window this brings up, click yes.
Instructions
The
processor
design
can
be
thought
of
as
comprising
a
16-bit
datapath
(which
does
the
actual
computation)
plus
a
control
block.
The
RTL
level
design
of
the
datapath
has
been
done
for
you
and
is
shown
in
Appendix
C.
The
control
logic
is
all
about
ensuring
that
the
right
things
happen
at
the
correct
time
by
activating
control
signals
at
the
appropriate
time
in
the
instruction
cycle.
Each
instruction
takes
three
clock
cycles
to
complete;
instruction
fetch
is
performed
on
the
first
cycle,
execute
which
reads
the
operands
and
performs
the
arithmetic
is
done
in
the
second
cycle,
and
a
writeback
phase
which
operates
on
the
contents
of
the
Result
Register
occupies
the
third
cycle.
In
general,
a
control
signal
is
required
by
each
multiplexer
(to
select
the
desired
input
at
the
appropriate
time)
and
by
each
register
(to
enable
data
to
be
loaded
into
a
particular
register)
in
the
datapath.
In
addition,
there
are
control
signals
to
access
the
desired
read
and
write
locations
in
the
Register
Bank
and
signals
to
control
the
access
of
information
in
the
memory.
At
the
RTL
level,
the
STUMP
processor
requires
12 groups of signals
to
control
the
datapath
and
the
memory.
These
are:
BR - indicates a branch instruction. It is used by the Sign Extender element to select the least significant 8 bits for extension; if BR is low (for a Type 2 instruction), the 5-bit immediate value is extended instead.
FETCH - indicates that an instruction fetch is occurring.
EXE - indicates the Execute phase of operation, i.e. data is read from the Register Bank, operated upon and the result placed in the Result Register.
SRPA[2:0] - 3-bit address specifying the register (Reg0 to Reg7) to be read out onto port A of the Register Bank.
SRPB[2:0] - 3-bit address specifying the register (Reg0 to Reg7) to be read out onto port B of the Register Bank.
SWP[2:0] - 3-bit address specifying the register in the Register Bank (Reg0 to Reg7) to be written to.
IMMED - selects the extended immediate operand rather than port B of the Register Bank as the B-input to the ALU.
LDCC - enables the loading of the 4-bit Condition Code Register.
LDREG - enables writing to a register in the Register Bank.
SALUD - (Select ALU Data) selects data from the Result Register for writing into the Register Bank. If SALUD is low, data from the memory (MDIN[15:0]) is selected instead.
MEMWR - active-low signal to the Bus Interface when Memory is to be written to.
MEMOE - active-low signal to the Bus Interface when Memory is to be read from.

The Control forms these signals based on the contents of the Instruction Register (INSTR[15:0]) and on a State Register which is internal to the Control. The State Register is updated on each clock and, when running instructions, has 3 states: FETCH, EXECUTE and WRITEBACK. FETCH indicates the Instruction Fetch phase, EXE (i.e. EXECUTE) indicates the Execute phase and LDREG indicates Writeback when the Register Bank is written to; naturally, only one of FETCH, EXE and LDREG can be a 1 at any time, and all three may be 0 during the writeback phase of a store instruction. The action required in each phase (which takes one clock period) is summarised in Table 4.1.

Phase 1 | Fetch          | The next instruction is fetched from memory and +1 is added to the Program Counter.
Phase 2 | Decode/Execute | The instruction is decoded and executed. This phase finishes when the result from the ALU is clocked into the Result Register. The Condition Code Register may also be updated in this phase.
Phase 3 | Writeback      | ALU operations are written back into the register bank. Load/Store operations access memory. A branch instruction that is taken will update the Program Counter.
Table 4.1: Instruction Phases
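The two multiplexer selections described for IMMED and SALUD can be sketched in Verilog as follows. This is only an illustrative sketch, not the datapath supplied in Appendix C: the module name and the signal names RD_B, EXT_IMMED, RESULT and WDATA are assumptions made for the example, while IMMED, SALUD, MDIN and ALUB are names used in this manual.

```verilog
// Illustrative sketch only; not the supplied datapath.
// RD_B, EXT_IMMED, RESULT and WDATA are hypothetical names.
module mux_sketch (
    input  wire        IMMED,      // 1: extended immediate is the ALU B-input
    input  wire        SALUD,      // 1: write back Result Register, 0: memory data
    input  wire [15:0] RD_B,       // port B of the Register Bank (assumed name)
    input  wire [15:0] EXT_IMMED,  // sign-extended immediate (assumed name)
    input  wire [15:0] RESULT,     // Result Register output (assumed name)
    input  wire [15:0] MDIN,       // data read from memory
    output wire [15:0] ALUB,       // B-input to the ALU
    output wire [15:0] WDATA       // data written into the Register Bank (assumed name)
);
    // IMMED high selects the immediate operand instead of register port B.
    assign ALUB  = IMMED ? EXT_IMMED : RD_B;

    // SALUD high selects the ALU result; SALUD low selects memory data.
    assign WDATA = SALUD ? RESULT : MDIN;
endmodule
```

Each control signal drives exactly one selection here, which is the pattern to look for when filling in the Signal Usage Charts.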
In specifying the control signals, it is useful to make Signal Usage Charts which show the state of the control signals at any time. Signal usage chart sheets can be found in Appendix E (copies are also available in the lab). Your task is to determine how each control signal is formed in each phase of the instruction for each instruction type. A control signal may be 0, 1, don't care, or formed from bits in the Instruction Register INSTR[15:0] and/or the phase signals (FETCH, EXE and LDREG) and/or other signals. Hand in your completed sheets (make a copy of what you hand in, as this will help you in writing the Verilog code required in the next exercise).
Instructions
Introduction
The
processor
comprises
the
Bus
Interface,
which
forms
the
memory
to
processor
interface,
the
Datapath
elements,
which
perform
the
computational
part
of
the
STUMP,
and
the
Control,
which
generates
the
signals
at
the
correct
time
required
to
perform
instructions.
In
this
exercise
you
are
going
to
complete
the
specification
of
the
Control
block
in
Verilog.
The
interconnection
of
the
Bus
Interface,
Control
and
the
components of the Datapath
is
shown
in
Appendix
C.
The
Datapath
and
Bus
Interface
have
been
designed
for
you
(down
to
the
gate
level).
RTL
Design
of
the
Control
Advice
on
how
to
proceed
with
this
stage
of
the
design
is
given
in
section
3.5
-
but
the
most
important
thing
is
to
know
precisely
what
you
want
your
design
to
do.
In
determining
the
signals
that
have
to
be
asserted
for
each
of
the
three
phases
by
Control,
the
Signal
Usage
Charts
you
generated
in
the
last
exercise
(if
correct)
should
provide
you
with
a
precise
specification
of
the
control
for
the
Fetch,
Execute
and
Writeback
phases.
These
need
to
be
translated
into
a
Verilog
control
block
design
specification
and
entered
into
the
functional
view
of
the
control
cell.
This
is
provided
in
the
form
of
a
template
which
is
listed
in
Appendix
B.
Accessing
and
Modifying
the
Control
From
the
icds
window,
choose
File->Open.
This
brings
up
an
Open
File
window.
In
the
Open
File
window
set
Library
Name
-
comp22111,
Cell
Name
-
control,
View
Name
-
functional
and then click OK. This brings up a verilog.v window containing the incomplete Verilog code describing the control for the STUMP datapath. The code contains an always code block, written for you, which defines the processor state and advances the state on each positive clock edge. State 0 is the Reset state, State 1 is Fetch, State 2 is Execute, and State 3 is the Writeback state. If the state is none of these, a Default state is entered which sets all signals to don't care. It is strongly recommended that the signals listed in the Default state are explicitly set in each of the four states you need to write. At the end of the code, a function Testbranch, written for you, takes as its input parameters bits 11 to 8 of the Instruction Register and the bits in the Condition Code Register, returning a 1 if the branch is to be taken (and 0 if a branch is not taken). Use the Edit facilities of the window to add your control code and then save it using File->Save.
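The state advance described above might look like the following sketch. It is not the template listed in Appendix B: the module name, the port names CLK and RESET, the active-high reset polarity and the return from Writeback to Fetch are all assumptions made for illustration. Only the state numbering is taken from the text.

```verilog
// Illustrative sketch only; the supplied template may differ.
// CLK, RESET and STATE are hypothetical names.
module state_sketch (
    input  wire       CLK,    // processor clock (assumed name)
    input  wire       RESET,  // active-high reset (assumed name and polarity)
    output reg  [1:0] STATE   // 0 Reset, 1 Fetch, 2 Execute, 3 Writeback
);
    // The state advances on each positive clock edge.
    always @(posedge CLK) begin
        if (RESET)
            STATE <= 2'd0;
        else
            case (STATE)
                2'd0:    STATE <= 2'd1;  // Reset     -> Fetch
                2'd1:    STATE <= 2'd2;  // Fetch     -> Execute
                2'd2:    STATE <= 2'd3;  // Execute   -> Writeback
                2'd3:    STATE <= 2'd1;  // Writeback -> Fetch (assumed)
                default: STATE <= 2'd0;  // unknown state: return to Reset
            endcase
    end
endmodule
```

The control outputs would then be formed in a separate combinational block that examines STATE and the Instruction Register, which is the part left for you to write.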
Instructions
In
this
exercise,
you
will
run
the
test
programs
we
have
given
you
on
the
RTL
description
of
the
processor.
This
will
allow
you
to
identify
and
debug
faults
in
your
Verilog
code
of
the
Control.
You
can
assume
that
the
test
programs
we
provide
are
correct
and
that
the
RTL
datapath
we
provide
is
correct.
Therefore
any
errors
in
operation
are
due
to
faults
in
your
Verilog
control
description!
When
you
are
satisfied
that
your
design
is
working
correctly,
show
your
simulation
results
for
each
test
to
a
demonstrator
and
have
your
sheet
signed
off.
Hand
the
sheet
in.
The
Cadence
procedures
needed
for
this
exercise
are
similar
to
those
in
Ex.
3
and
are
briefly
summarised
below:
Test
Files
As
before
the
test
files
are
in
test1.s
to
test4.s
and
you
need
to
create
a
file
for
the
processor
memory
called
xc4000mem.ram
for
use
in
the
simulation.
See
Ex.
3
or
Ch.
6
for
instructions
on
this.
Simulation
In
the
icds
window,
select
Tools->Verilog
Integration->NC-Verilog.
This
brings
up
a
Virtuoso
Verilog
Environment
etc
window.
Fill
in
its
fields
with
Run
Directory
-
test_bench
and
in
the
Top
Level
Design
section
enter
Library
-
comp22111
cell
-
processor_test_bench
View
-
schematic.
Click
on
the
top
left
icon
(of
the
running
man)
to
initialise
the
simulator.
When
the
icds
window
shows
that
initialisation
is
complete,
click
on
Setup->Netlist
in
the
Virtuoso
Verilog
Environment
etc
window
along
its
top
toolbar.
This
brings
up
a
Netlist
Setup
window.
In the Netlist These Views line,
remove
algorithmic
at
the
beginning
of
the
line
(to
enable
the
netlister
to
pick
up
the
Verilog
code
for
control).
Click
OK
along
the
top
toolbar
of
the
Netlist
Setup
window.
Back
in
the
Virtuoso
Verilog
Environment
etc
window,
click
on
the
second
icon
down
on
the
left
(of
three
separate
ticks)
to
generate
a
netlist.
When
the
icds
indicates
this
has
completed
correctly,
click
on
the
Simulate
icon
which
is
the
third
icon
down
in
the
Virtuoso
Verilog
Environment
etc
window.
This
brings
up
the
Design
Browser
1-SimVision
and
Console-SimVision
windows.
As
described
in
Ex.
3,
place
any
signals
you
wish
to
observe
on
the
Waveform
Viewer,
then
run
the
simulation
from
a
command
file
(reset
simulation,
use
File->Source
Command
Script
to
select
an
input
command
file
<filename.sv>,
send
commands
to
the
Simulator
Console
(NC-Sim)
and
press
OK
to
run
the
simulation).
When
your
processor
simulates
correctly
using
your
Verilog
description
of
Control
for
all
the
supplied
test
programs,
you
should
show
the
results
of
running
the
assembler
code
to
a
demonstrator
who
will
sign
off
your
sheet.
5.1
Architecture
The
processor
is
a
16-bit
machine
with
a
RISC
style
architecture.
Operands
for
ALU
operations
come
from
registers
inside
the
processor
and
the
result
is
returned
to
a
register.
Separate
instructions
are
provided
to
move
data
between
the
registers
and
external
memory.
There
are
8
registers,
R0-R7.
R0
is
always
zero
and
can
be
used
as
a
source
operand,
allowing
move
instructions
to
be
synthesised
from
an
add
instruction.
R0
may
be
written
to,
but
the
result
is
always
discarded,
allowing
compare
instructions
to
be
synthesised
from
subtract
instructions.
Register
R7
is
the
program
counter
and,
from
a
programmer's
view
of
the
machine,
has
equal
status
with
the
other
registers
allowing
PC-relative
addressing
to
be
supported.
CC 3 | N - Sign Flag
CC 2 | Z - Zero Flag
CC 1 | V - Overflow Flag
CC 0 | C - Carry Out Flag
Table 5.1: Condition Code Bits

The processor has a 4-bit condition code register, shown in Table 5.1. It holds status information relating to the ALU output. The four status bits indicate if the ALU result is negative (N bit is 1 if the ALU result is -ve; N is 0 for +ve or zero), zero (Z bit is 1 if the ALU result is all 0s; Z is 0 if non-zero), overflows (V bit is 1 if the result is out of range, i.e. if adding two +ve numbers yields a -ve result, or if adding two -ve numbers yields a +ve result; V is 0 if the number is within range), or has a carry out (C bit is 1 if there is a carry out of bit 15 of the ALU result; C is 0 if bit 15 of the ALU result has no carry out).

Each arithmetic and logical instruction has the option of updating or not updating the condition code register. If an arithmetic/logical instruction does not update the condition code register then its state remains as is. Load, Store and Branch instructions never update the condition code register and so do not change the existing state of this register.

There are 3 instruction formats in a fixed-length 16-bit instruction. The machine operates on 16-bit words only. Byte addressing is not supported.
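The four flags can be illustrated for a 16-bit addition with the sketch below. It is a hedged example rather than the STUMP ALU itself: the module and port names are assumptions, and overflow is computed as the carry into bit 15 differing from the carry out of bit 15, which is equivalent to the sign-based description given above.

```verilog
// Illustrative sketch only; module and port names are hypothetical.
module add_flags_sketch (
    input  wire [15:0] A,
    input  wire [15:0] B,
    input  wire        CIN,   // carry-in to bit 0
    output wire [15:0] S,     // 16-bit sum
    output wire [3:0]  CC     // {N, Z, V, C} as in Table 5.1
);
    wire c14;  // carry out of bit 14, i.e. carry into bit 15
    wire c15;  // carry out of bit 15

    // Add the low 15 bits first so the carry into bit 15 is visible.
    assign {c14, S[14:0]} = A[14:0] + B[14:0] + CIN;
    // Then add the top bits together with that carry.
    assign {c15, S[15]}   = A[15] + B[15] + c14;

    assign CC[3] = S[15];             // N: result is negative
    assign CC[2] = (S == 16'h0000);   // Z: result is all 0s
    assign CC[1] = c14 ^ c15;         // V: signed result out of range
    assign CC[0] = c15;               // C: carry out of bit 15
endmodule
```

For example, adding 16'h7FFF and 16'h0001 with CIN at 0 gives S = 16'h8000, so N and V are set while Z and C are clear: two positive numbers have produced a negative result.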
arithmetic is performed by an adder since subtraction of A-B can be performed as $A + \overline{B} + 1$. Some other instructions such as cmp, nop and mov can be expressed directly in terms of the basic instructions and are supported by the assembler. Other instructions may be synthesised from combinations of the basic instruction set as shown in Chapter 6.

Instruction Code | Instruction | Explanation
000              | ADD         | 2's complement add
001              | ADC         | 2's complement add with carry-in
010              | SUB         | 2's complement subtract
011              | SBC         | 2's complement subtract with borrow
100              | AND         | Bitwise AND of two 16-bit words
101              | OR          | Bitwise OR of two 16-bit words
110              | LD/ST       | Load register from memory or Store register to memory
111              | Bcc         | Branch if condition cc is satisfied.
Table 5.2: Basic Instruction Set

Shift instructions are somewhat special. Shift-left instructions can be derived from the basic instruction set. Shift-right instructions have been added as a rather ugly kludge and are dealt with in the next section.
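[Figure: instruction formats (Types 1 to 3), bit positions 15 to 0. Recoverable field labels: SHIFT, Immediate, Offset. Bits 15 to 12 are shown as 1 in the format carrying the Offset field.]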
The processor is a 3-address machine specifying two source operands and a destination operand. In the case of arithmetic and logical instructions (instruction codes 0 to 5), the two source operands are either two registers (Type-1 instructions) or a register and a 5-bit signed immediate value (Type-2 instructions). The result of the operation is returned to the destination register (DST) and the condition codes are updated depending on the state of bit 11 (if LDCC is 1, update the condition codes; if LDCC is 0, do not update the condition codes).

Branch (code 7, Bcc) and load/store (LD/ST) instructions (code 6) do not update the condition-code register. In the case of a LD/ST instruction, bit 11 is used to determine the direction of the data transfer: if LDCC is 1, the operation is store to memory; if LDCC is 0, the operation is load from memory. The memory address is constructed from the sum of the two source operands, i.e. the two registers specified by SRC A and SRC B for Type-1 instructions, or the register specified by SRC A and the 5-bit signed immediate for Type-2 instructions. The register specified by DST is the register to be written into memory for a ST operation or the register to be loaded from memory for a LD operation.

Type 3 instructions are branch instructions (code 7). Here, the 8-bit signed offset is added to the Program Counter to compute the address of the instruction to be jumped to if the branch is taken. This is written into the Program Counter if the branch is taken but is ignored otherwise. In branch instructions, bits 8 to 11 specify the conditions under which a branch is taken. These usually involve bit(s) in the condition code register. The branch conditions are described in section 5.5.
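Because the 5-bit immediate and the 8-bit offset are both signed, each must be sign-extended to 16 bits before it reaches the ALU. A minimal sketch of that extension follows. The module name and the EXT output are assumptions, as is the placement of the immediate in instruction bits 4:0; the offset position (bits 7:0) and the BR signal are taken from this manual.

```verilog
// Illustrative sketch only; not the supplied Sign Extender.
// The immediate is assumed to occupy INSTR[4:0].
module sign_extend_sketch (
    input  wire [15:0] INSTR,  // instruction word
    input  wire        BR,     // 1: branch (8-bit offset), 0: 5-bit immediate
    output wire [15:0] EXT     // sign-extended operand (assumed name)
);
    // Replicate the sign bit of the field into all the upper bits.
    assign EXT = BR ? {{8{INSTR[7]}},  INSTR[7:0]}
                    : {{11{INSTR[4]}}, INSTR[4:0]};
endmodule
```

With BR high and INSTR[7:0] equal to 8'hFD, EXT is 16'hFFFD, i.e. -3 in 2's complement, so adding it to the Program Counter moves three locations backwards.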
Operation | Instr Bit 1 | Instr Bit 0 | Shifter Output, bit 15 = | Shifter Carry-out, (CSH) =
No Shift  | 0           | 0           | A15                      | 0
ASR       | 0           | 1           | A15                      | A0
ROR       | 1           | 0           | A0                       | A0
RRC       | 1           | 1           | CC0                      | A0
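The table above can be read as a one-bit right shifter whose top bit and carry-out depend on the two shift bits. The sketch below is one way to express that, assuming the remaining output bits are simply A[15:1] moved down by one place (the table only lists bit 15 and the carry-out). The module and port names are hypothetical.

```verilog
// Illustrative sketch only; module and port names are hypothetical.
module shifter_sketch (
    input  wire [15:0] A,      // operand to be shifted
    input  wire [1:0]  SHIFT,  // instruction bits 1:0
    input  wire        CC0,    // current carry flag
    output reg  [15:0] OUT,    // shifter output
    output reg         CSH     // shifter carry-out
);
    always @(*) begin
        case (SHIFT)
            2'b01:   begin OUT = {A[15], A[15:1]}; CSH = A[0]; end  // ASR: keep the sign bit
            2'b10:   begin OUT = {A[0],  A[15:1]}; CSH = A[0]; end  // ROR: bit 0 wraps to bit 15
            2'b11:   begin OUT = {CC0,   A[15:1]}; CSH = A[0]; end  // RRC: carry flag enters bit 15
            default: begin OUT = A;                CSH = 1'b0; end  // no shift
        endcase
    end
endmodule
```

In all three shifting cases the bit shifted out (A0) becomes the carry-out, which matches the CSH column.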
                 | ADD        | ADC        | SUB              | SBC              | AND          | OR           | LD/ST | Bcc
Cin              | 0          | CC0        | 1                | $\overline{CC0}$ | 0            | 0            | 0     | 0
CC 3 (Sign)      | S15        | S15        | S15              | S15              | S15          | S15          |       |
CC 2 (Zero)      | S=0        | S=0        | S=0              | S=0              | S=0          | S=0          |       |
CC 1 (Overflow)  | C14 != C15 | C14 != C15 | C14 != C15       | C14 != C15       | 0            | 0            |       |
CC 0 (Carry)     | C15        | C15        | $\overline{C15}$ | $\overline{C15}$ | CSH if shift | CSH if shift |       |
Update codes if LDCC is set | yes | yes    | yes              | yes              | yes          | yes          | no    | no
Table 5.5: Condition code settings

where:
S is the 16-bit result of an arithmetic or logic operation, i.e. the ALU result.
C14 and C15 are the carry bits from bits 14 and 15 respectively of an arithmetic operation.
CSH is the shifter carry-out (see Table 5.3) and is only used to update the condition code register for a logical operation which performs an ASR, ROR or RRC shift.
shift is TRUE for type 1 instructions when bits 0 and 1 are NOT equal to 00.

SUB and SBC are done as an addition with CC0 and Cin settings as shown in the above table. CC0 is stored as a borrow and is $\overline{C15}$, since a borrow $= \overline{\text{carry}}$. Thus $A - B - \text{borrow} = A + \overline{B} + \overline{\text{borrow}}$. For SUB, there is no borrow, so $A - B = A + \overline{B} + 1$, while for SBC $A - B - \text{borrow} = A + \overline{B} + \overline{\text{borrow}} = A + \overline{B} + \overline{CC0}$.
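Subtraction on an adder can be sketched as below, under the assumption stated in the text that CC0 holds a borrow, i.e. the inverse of the adder's carry out. The module name and the SBC and BORROW port names are hypothetical, and the sketch is not the supplied ALU.

```verilog
// Illustrative sketch only; assumes CC0 stores a borrow (inverted carry).
module sub_sketch (
    input  wire [15:0] A,
    input  wire [15:0] B,
    input  wire        SBC,     // 1: subtract with borrow, 0: plain subtract (assumed control)
    input  wire        CC0,     // borrow stored by a previous operation
    output wire [15:0] S,       // A - B (- borrow)
    output wire        BORROW   // value to store in CC0
);
    // Invert B at 16 bits first, so the wider addition below does not
    // invert any zero-extension bits.
    wire [15:0] nb  = ~B;
    // SUB uses a carry-in of 1; SBC uses the inverted borrow.
    wire        cin = SBC ? ~CC0 : 1'b1;
    wire        c15;    // carry out of bit 15

    assign {c15, S} = A + nb + cin;
    assign BORROW   = ~c15;
endmodule
```

With A = 3, B = 5 and SBC low, the adder computes 3 + 16'hFFFA + 1 = 16'hFFFE with no carry out, so S is -2 and BORROW is 1.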
The input file is parsed on a line-by-line basis. Each line should contain a single instruction, assembler directive or full-line comment. Three output files are produced:
<filename.mem> contains code suitable for loading into the processor memory model using the loadmem.sh command as detailed below.
<filename.hex> contains code suitable for downloading into the memory on the Xilinx board.
<filename.bin> contains the binary of the assembled code.
A Verilog memory model is used; it can only read from a file named xc4000mem.ram and dump to a file named xc4000mem.dump. To create the xc4000mem.ram file from the <filename.mem> created by sasm, use
    loadmem.sh <filename.mem>
Thus, for example,
    sasm test1.asm
    loadmem.sh test1.mem
will create a memory file xc4000mem.ram containing the assembled test program 1.
1 The assembler was written by Andrew Bardsley, who also devised the original STUMP architecture.
<string>
align
[<label>[:]] align .w
[<label>[:]] align .s
[<label>[:]] align .l
[<label>[:]] align <expr>
The program counter is aligned to the nearest word (.w and .s), long word (.l) or <expr> words. The last form is useful for reserving a block of memory. Word alignment is only useful between data statements which are byte packed. Using a label with a data statement has the side-effect of word-aligning the first data element.

Comments
[<instruction or directive>] ;<comment>
Any line can be appended with a comment. However, only comments that start in column 1 are echoed to the listing file; other comments are discarded.

Constants
C-style constants are allowed, with the addition of binary constants, which are introduced by 0b.
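As an illustration of the constant forms just described, the sketch below shows how such literals could be converted to integers. It is a Python illustration and does not describe how sasm itself is implemented; the function name `parse_constant` is hypothetical. It assumes that "C form" covers decimal, hexadecimal with a 0x prefix and octal with a leading 0, and it ignores negative numbers and character constants.

```python
def parse_constant(text):
    """Convert a C-style literal, or a 0b binary literal, to an integer."""
    t = text.strip().lower()
    if t.startswith("0b"):                   # binary extension, e.g. 0b1010
        return int(t[2:], 2)
    if t.startswith("0x"):                   # C hexadecimal, e.g. 0x1F
        return int(t[2:], 16)
    if len(t) > 1 and t.startswith("0"):     # C octal, e.g. 017
        return int(t[1:], 8)
    return int(t, 10)                        # plain decimal
```

With these assumptions, `0b1010`, `0xA`, `012` and `10` all denote the same value.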
Appendices
// Verilog HDL for STUMP processor processor_v
module processor (BUS_A, RAM_CS, BUS_RD, BUS_WR, BUS_D, BUSCLK, GSR);

//processor to memory signals :
output [15:0] BUS_A;    // address bus
output        RAM_CS;   // memory chip select
output        BUS_RD;   // memory read
output        BUS_WR;   // memory write
inout  [15:0] BUS_D;    // data bus

// processor signals
input BUSCLK;           // processor clock
input GSR;              // reset signal to processor

reg [15:0] D_OUT;
reg [15:0] BUS_A;
reg        RAM_CS, BUS_RD, BUS_WR;
reg [15:0] INSTR;
reg [15:0] REG_BANK [7:0];
reg [15:0] RD_A, ALUA, ALUB, S;
reg [3:0]  CC;
reg [15:14] C;
reg        CSH;

wire [15:0] PC;

assign BUS_D = D_OUT;
assign PC = REG_BANK[7];   // PC is an alias for REG_BANK[7]
                           // Used for debug only - do not
                           // use PC in your code

always
begin
  if (GSR == 0)
  begin
    // Fetch State
    Memory_Read(REG_BANK[7], INSTR);   // Get instr pointed to by PC
    REG_BANK[7] = REG_BANK[7] + 1;     // add +1 to PC as soon as instr
                                       // fetched

    // Execute State
    if (INSTR[15:13] == 3'b111)
    begin   // branch instruction
      //
      // put your code here to form ALU inputs ALUA and ALUB
      // note that the ALUA input is the output from the shifter
      //
    end
    else if (INSTR[12] == 1'b1)
    begin   // type 2 instruction
      //
      // put your code here to form ALU inputs ALUA and
      // ALUB for type 2 instrs
      //
    end
    else
    begin   // type 1 instruction
      //
      // put your code here to form ALU inputs ALUA and
      // ALUB
      //
    end

    // op decode
    case (INSTR[15:13])
      0 : Add(ALUA, ALUB, 1'b0, S, C);   // add instr done for you
      //
      // put your code here to form ALU result S and carry
      // bits C14 and C15 if needed
      //
    endcase

    //
    // put your code here to update the condition code
    // register.
    //

    // Write state
    case (INSTR[15:13])
      3'b111  : if (Testbranch(INSTR[11:8], CC) == 1)
                  REG_BANK[7] = S;
      3'b110  : if (INSTR[11] == 1)
                  Memory_Write(S, REG_BANK[INSTR[10:8]]);
                else
                begin
                  Memory_Read(S, REG_BANK[INSTR[10:8]]);
                  REG_BANK[0] = 0;
                end
      default : begin
                  REG_BANK[INSTR[10:8]] = S;
                  REG_BANK[0] = 0;
                end
    endcase
  end
  else   // reset state
  begin
    RAM_CS = 1;
    wait (GSR == 0)
    begin
      REG_BANK[7] = 0;
      REG_BANK[0] = 0;
      CC = 0;
    end
  end
end   // end of always
/////////////////////////////////////////

task Memory_Write;
// writes data on DMW to memory address AMW
input [15:0] AMW, DMW;
begin
  RAM_CS = 0;
  BUS_RD = 1;
  BUS_WR = 1;
  BUS_A = AMW;
  D_OUT = DMW;
  #25 BUS_WR = 0;
  #50 BUS_WR = 1;
  #25 RAM_CS = 1;
end
endtask

task Memory_Read;
// reads memory address AMR and places data on DMR
input  [15:0] AMR;
output [15:0] DMR;
begin
  RAM_CS = 0;
  BUS_RD = 1;
  BUS_WR = 1;
  BUS_A = AMR;
  D_OUT = 16'hzzzz;
  #25 BUS_RD = 0;
  #50 DMR = BUS_D;
  #25 BUS_RD = 1;
  RAM_CS = 1;
end
endtask

task Add;
// adds a 1-bit carry in Cin to two 16-bit quantities A and B
// produces 16-bit sum S and carry out C from addition in bits 14 and 15
input  [15:0] A, B;
input         CIN;
output [15:0] S;
output [15:14] C;
reg    [16:0] RESULT;
begin
  RESULT = A[14:0] + B[14:0] + CIN;
  C[14] = RESULT[15];
  RESULT[16:15] = A[15] + B[15] + C[14];
  S = RESULT[15:0];
  C[15] = RESULT[16];
end
endtask
task Shift;
// shifts input A and a carry in Cin according to shift type INSTR[1:0]
// produces shifter output ASH and shifter carry out CSH
input  [15:0] A;
input  [1:0]  INSTR;
input         CIN;
output [15:0] ASH;
output        CSH;
begin
  case (INSTR)
    0 : begin CSH = 0;    ASH = A;                       end
    1 : begin CSH = A[0]; ASH = A >> 1; ASH[15] = A[15]; end
    2 : begin CSH = A[0]; ASH = A >> 1; ASH[15] = A[0];  end
    3 : begin CSH = A[0]; ASH = A >> 1; ASH[15] = CIN;   end
  endcase
end
endtask
function Testbranch;
// compares branch condition INSTR[11:8] with cond code reg
// returns 1 if branch to be taken, returns 0 if jump not taken
input [11:8] BRANCH_INSTR;
input [3:0]  CC;
reg N, Z, V, C;
begin
  {N,Z,V,C} = CC;
  case (BRANCH_INSTR)
    0  : Testbranch = 1;
    1  : Testbranch = 0;
    2  : Testbranch = ~(C|Z);
    3  : Testbranch = C|Z;
    4  : Testbranch = ~C;
    5  : Testbranch = C;
    6  : Testbranch = ~Z;
    7  : Testbranch = Z;
    8  : Testbranch = ~V;
    9  : Testbranch = V;
    10 : Testbranch = ~N;
    11 : Testbranch = N;
    12 : Testbranch = V~^N;
    13 : Testbranch = V^N;
    14 : Testbranch = ~((V^N)|Z);
    15 : Testbranch = ((V^N)|Z);
  endcase
end
endfunction

endmodule
// Control of finite state machine
always @ (posedge CLK)
begin
  if (RESET == 1)
    state = 0;
  else if (state == 3)
    state = 1;
  else
    state = state + 1;
end

// Control of state driven combinatorial logic
always @ (state, INSTR, CC)
begin
  case (state)
    0: // Reset cycle
      begin
        // put your code here for signals set in reset phase
        // put your code here for don't care signals in the reset state
      end

    1: // Fetch cycle
       // Fetches instruction from memory, loads into
       // instruction register & increments program counter
      begin
        // put your code here for signals that need to be set in the fetch phase
        // put your code here for don't care signals in the fetch state
      end

    2: // Execute cycle
       // Instruction is decoded and executed
       // Instruction may be: load/store (Type I or Type II)
       //                 or a Branch
       //                 or an ALU operation (Type I or Type II)
      begin
        // you need to decode the instruction and then set signals
        // appropriately to a value or don't care for that instruction
      end

    3: // Write back cycle
      begin
        // Instruction is decoded to determine whether
        // the output of the ALU should be written back
        // to memory or register bank or discarded and signals are set to
        // a value or don't care appropriately
      end

    default: // All signals set to don't care
      begin
        MEMWR = 1'bx;
        MEMOE = 1'bx;
        FETCH = 1'bx;
        EXE   = 1'bx;
        LDREG = 1'bx;
        LDCC  = 1'bx;
        IMMED = 1'bx;
        BR    = 1'bx;
        SRPA  = 3'bxxx;
        SRPB  = 3'bxxx;
      end
  endcase
end

function Testbranch;
// returns 1 if branch taken, 0 otherwise
input [11:8] BRANCH_INSTR;
input [3:0]  CC;
reg N, Z, V, C;
begin
  {N,Z,V,C} = CC;
  case (BRANCH_INSTR)
    0  : Testbranch = 1;            // BAL
    1  : Testbranch = 0;            // BNV
    2  : Testbranch = ~(C|Z);       // BHI
    3  : Testbranch = C|Z;          // BLS
    4  : Testbranch = ~C;           // BCC
    5  : Testbranch = C;            // BCS
    6  : Testbranch = ~Z;           // BNE
    7  : Testbranch = Z;            // BEQ
    8  : Testbranch = ~V;           // BVC
    9  : Testbranch = V;            // BVS
    10 : Testbranch = ~N;           // BPL
    11 : Testbranch = N;            // BMI
    12 : Testbranch = V~^N;         // BGE
    13 : Testbranch = V^N;          // BLT
    14 : Testbranch = ~((V^N)|Z);   // BGT
    15 : Testbranch = ((V^N)|Z);    // BLE
  endcase
end
endfunction

endmodule
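The state sequencing of the control block above (reset, then fetch, execute and write back repeated) can be traced with a few lines of software. The sketch below is a Python illustration of the clocked always block only; the names `next_state` and `run` are assumptions, and the combinatorial outputs are not modelled.

```python
def next_state(state, reset):
    """Next value of the state register at a rising clock edge."""
    if reset:
        return 0                             # state 0: reset cycle
    return 1 if state == 3 else state + 1    # 1 fetch, 2 execute, 3 write back


def run(cycles, reset_cycles=1):
    """Return the state after each clock edge, with RESET high for the first edges."""
    state, trace = 0, []
    for i in range(cycles):
        state = next_state(state, reset=i < reset_cycles)
        trace.append(state)
    return trace
```

Calling `run(7)` shows that state 0 is only entered while RESET is high, after which the machine cycles through states 1, 2 and 3.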
Notes:
1. The clock is applied to all registers at all times, but the clock is ignored unless the register is enabled.
2. The control signals to the memory are all active low.
3. Signal names have been defined for the Control Block. Please also use these names in your Verilog code.
4. Although Verilog is case sensitive with regard to signal names, other tools in the flow are not. Hence, you should be consistent in your use of upper and lower case in names, and signal names should be unique.
CONTROL BLOCK SIGNALS TO BUS INTERFACE
MEMWR   active-low signal to the Bus Interface when memory is to be written
MEMOE   active-low signal to the Bus Interface when memory is to be read

BUS INTERFACE TO DATAPATH & CONTROL BLOCK
CLK     the clock signal, which goes to all flip-flops in the processor
RESET   reset signal, high when active
SALUD (Select ALU Data) selects data from the Result Register for writing into the Register Bank. If SALUD is low, data from the memory (MDIN[15:0]) is selected instead.
Assembler Instruction:

                                 binary      hex      decimal
Memory Address of Instruction                0x
Instruction Register                         0x
R0                                           0x
R1                                           0x
R2                                           0x
R3                                           0x
R4                                           0x
R5                                           0x
R6                                           0x
R7=PC                                        0x
CC                                           0x
Result Register                              0x

For a Store Instruction, Data Written (decimal):
Control signal tables for Phase 2: Decode/Execute and Phase 3: Writeback

Columns (instruction types):
  ALU Operation   Reg op Reg -> Reg
                  Reg op Immed -> Reg
  Branch          PC + Immed -> PC
  Load            Reg+Reg -> Addr; [Addr] -> Reg
                  Reg+Immed -> Addr; [Addr] -> Reg
  Store           Reg+Reg -> Addr; [Addr] <- Reg
                  Reg+Immed -> Addr; [Addr] <- Reg

Rows (control signals):
  FETCH, EXE, LDREG, SRPA[2:0], SRPB[2:0], SRP[2:0], BR, IMMED, LDCC, SALUD, MEMOE, MEMWR