
COMP22111

Laboratory Manual

The STUMP Processor Chip Project


School of Computer Science The University of Manchester September 2011

Table of Contents
Chapter 1 Introduction
    Laboratory Aims
    Learning Outcomes
Chapter 2 Laboratory Organisation
    Schedule
    The Project
    Preparation
    Deadlines
    Marks
Chapter 3 The Design Process
    3.1 Introduction
    3.2 Specification
    3.3 Top-level Behavioural Description
    3.4 Chip Architecture Design
    3.5 Register-Transfer Level (RTL) Design
    3.6 Logic Level Design
    3.7 Field Programmable Gate Arrays (FPGAs)
    3.8 Layout
    3.9 Post-layout
    3.10 Packaging and Test
Chapter 4 Design Tasks
    Exercise 1 Operation of the STUMP Assembler
    Exercise 2 Top Level Model of the STUMP in Verilog
    Exercise 3 Behavioural Simulation of the STUMP
    Exercise 4 Signal Usage Charts for the Control
    Exercise 5 Verilog Specification of the Control Block
    Exercise 6 Test of the RTL Design
Chapter 5 The Processor Specification
    5.1 Architecture
    5.2 Instruction Set
    5.3 Instruction Formats
    5.4 Shift Operations
    5.5 Conditional Branch Instructions
    5.6 Condition Codes
    5.7 Processor Interface
Chapter 6 Programming the STUMP Processor
    6.1 Introduction
    6.2 Using the SASM Assembler
    6.3 Extending the Instruction Set
Appendices
    Appendix A Verilog Top Level Behavioural Model of the STUMP Processor
    Appendix B Verilog Control Description
    Appendix C STUMP Processor: RTL Design
        C1. RTL Datapath Design
        C2. Bus Interface Component
        C3. Control Block Signals
    Appendix D Cadence End User Agreement
    Appendix E Exercise Answer Sheets
        Exercise 1: STUMP Assembler
        Copy of Signal Usage Charts
        Exercise 4: Signal Usage Charts

Chapter 1 Introduction
This lab manual accompanies the second-year course unit COMP22111: VLSI System Design. The material presented here reinforces the material given in lectures and should be considered examinable. The practical work outlined here will enable you to design and implement a fully functioning (albeit simplified) processor.

Laboratory Aims
- To learn about the process of designing on silicon by doing.
- To design a simple RISC processor, comprising the datapath and control, at (all) levels of the design hierarchy ranging from a high level specification down to the Register Transfer Level (RTL).
- To have exposure to industry standard CAD tools.
- To use assembler test programs which can be used to test all levels of the design.
- To simulate the complete processor at the different levels of the design hierarchy.
- To support the lectures with a practical example.

Learning Outcomes
After completing the laboratory a student will:
- be able to specify functionality in the Verilog hardware description language,
- have gained experience of the different stages of the VLSI design process down to the RTL level,
- have gained experience of composing and running test programs and checking their results,
- be able to use the Cadence CAD tool to find hardware/software errors,
- have gained experience of the CAD tools appropriate to the different stages of the design process.

VLSI systems design requires inspiration and imagination as well as a sound technical background. Most of the technical background can be imparted by means of lectures, but when it comes to design there is no substitute for experience. We believe that, in the words of Albert Camus, "You cannot create experience - you must undergo it." The COMP22111 course has therefore been structured so that the technical background is covered in a taught course consisting of lectures, while the design process itself is taught by means of a design project in laboratory classes.

In the laboratory course students design a small ASIC (Application Specific Integrated Circuit) which, if completed successfully, could be implemented on silicon. This experience should give some feeling for the trials, tribulations and satisfactions of designing systems on silicon.

The objective of the project is to learn some of the methodology of VLSI design by carrying out the design of a small RISC processor. It is also intended to help students appreciate and understand the operation and architecture of a RISC processor. The STUMP processor has been fully specified by D. A. Edwards and parts of it have been designed. Thus, students should regard themselves as members of a design team whose job it is to do a significant part of the design; you will complete a partially done design and simulate the design as a whole.

In contrast to design work carried out in previous School of Computer Science laboratories, the work will not consist of designing gate-level circuits; the emphasis in this project is rather on systems on silicon. The work starts with the high level behavioural modelling of the chip and proceeds to Register Transfer Level (RTL), where again behavioural modelling is used. Tools are available to automatically translate an RTL description into logic and then to generate layout of the chip onto silicon; this will be described in lectures but not performed in the lab. The methodology described in lectures and used in the lab is typical of that adopted by many designers in industry for ASICs, since only a small proportion of the ASICs produced nowadays are designed using full-custom methods.

This manual includes basic information about the laboratory organisation (Chapter 2), a description of the design process (Chapter 3) and stage-by-stage details of the design tasks (Chapter 4). Chapter 3 is intended to give a background description of the design process, while Chapter 4 describes how the process is applied to the specific design project being carried out in the laboratory. The specification of the STUMP processor is in Chapter 5. Chapter 6 gives information on how to program the processor.

The appendices contain information that you will need in undertaking the design tasks. Appendix A contains a copy of the top level (algorithmic) Verilog behavioural description of the processor, which is to be completed as part of the second exercise. Appendix B contains a (very) incomplete Verilog description of the control for the STUMP at the RTL design level. Appendix C shows the RTL design of the STUMP datapath, which has been done for you. Answer sheets which students fill in and hand in for laboratory exercises 1 and 4 are to be found in Appendix E. Copies of answer sheets can also be found in the laboratory.

The emphasis of the manual is on "how to do it" and it does not attempt to give a comprehensive account of the many different facets of chip design. A fuller picture should emerge when the design work is taken together with the taught course material.

References
Information on the top-down design approach can be found in Chapter 3 of this manual. An introduction to Verilog can be found in the Cadence manuals: /home/cadtools5/cds_2008_2009/ldv_2009/doc/pdf/vlogref.pdf

Chapter 2 Laboratory Organisation


Schedule
The COMP22111 lab comprises eight 2-hour sessions in weeks 3 to 11, excluding reading week (week 6). There is no lab in week 12, which is reserved for demonstrating and submitting exercise 6. There will be 3 lectures a week in weeks 1, 2 and 12, when there are no formal labs scheduled. The schedule of lectures and labs in 2011 is shown in Table 2.1.

| Semester Week | Lab in Tootill 1 (Tues 10:00-12:00) | Lectures (Tues 10:00-12:00) | Lectures (Thurs 14:00-15:00) |
|---|---|---|---|
| 1 | - | Sept 27 (Lectures 1 & 2) | Sept 29 (Lecture 3) |
| 2 | - | Oct 4 (Lectures 4 & 5) | Oct 6 (Lecture 6) |
| 3 | Oct 11 (Exercise 1) | - | Oct 13 (Lecture 7) |
| 4 | Oct 18 (Exercise 2.1) | - | Oct 20 (Lecture 8) |
| 5 | Oct 25 (Exercise 2.1) | - | Oct 27 (Lecture 9) |
| 6 | Reading Week: no lectures/labs | - | - |
| 7 | Nov 8 (Exercise 2.2) | - | Nov 10 (Lecture 10) |
| 8 | Nov 15 (Exercise 2.2) | - | Nov 17 (Lecture 11) |
| 9 | Nov 22 (Exercise 3) | - | Nov 24 (Lecture 12) |
| 10 | Nov 29 (Exercise 4.1) | - | Dec 1 (Lecture 13) |
| 11 | Dec 6 (Exercise 4.2) | - | Dec 8 (Lecture 14) |
| 12 | Dec 13 (written work/code hand-in deadline 12:00); Dec 14 (demo deadline 15:00) | Dec 13 (Lectures 15 & 16) | Dec 15 (Lecture 17) |

Table 2.1: Schedule for Lectures and Lab in 2011

The final deadline for handing in written work or code for marking is 12:00 on Tuesday December 13th, week 12. The final deadline for demonstrating work is on the afternoon of Wednesday December 14th. Students wishing to demonstrate work must put their names on a list between 14:00 and 15:00. Names will be taken randomly from the list and students given one opportunity to demonstrate their work. Note: no work will be accepted or demonstrated after the deadlines unless the student concerned has a lab mark of less than 40%.

The Project
The lab work consists of designing and testing a simple 16-bit RISC processor down to the Register Transfer Level.

Preparation
Preparation outside the timetabled laboratory classes is necessary and expected. Students who wish to make good progress in the laboratory time when help is available should not only read the relevant material for each week before coming to the lab but should also do further work on stages of the design outside this. Remember, you are expected to spend the same amount of time on preparation as you spend in the scheduled lab time. In addition, the lab work and lectures are closely integrated: important and useful information about lab exercises is given in lectures, so attendance at lectures is closely linked to good progress in the lab!

Deadlines
The exercise is divided into a number of stages with deadlines as indicated in Table 2.2. The details of the deliverables for each stage are given in Chapter 4 of this manual. Due to the incremental nature of the laboratory, an extension system is not operated and you do not need to request an extension. However, to complete the project work, you should adhere to, or be ahead of, the deadlines given.

Marks
This course has more labs and fewer lectures than other courses and the overall lab and exam mark is weighted accordingly. Students are expected to work individually and independently. Hence, work resulting from collaborative efforts will result in the mark awarded for the work being equally split amongst the contributors. As the COMP22111 lab forms a significant contribution to the overall course mark, it is in your interests to invest the time in obtaining a good lab mark!

| Design Level | Ex. | Design Stage | No. of Sessions | Semester Week(s) | Exercise Hand-in Week | Max Mark |
|---|---|---|---|---|---|---|
| Specification | - | Read Lab Manual, especially Chapter 5 | - | - | - | - |
| Programmer | 1 | STUMP assembler | 1 | Week 3, Oct 11 | Week 3, Oct 11 | 20 |
| Top Level | 2.1 | Top level model in Verilog and entry | 2 | Weeks 4 & 5, Oct 18 & 25 | | |
| Top Level | 2.2 | Simulation of top level model | 2 | Weeks 7 & 8, Nov 8 & 15 | Week 8, Nov 15 | 30 |
| RTL | 3 | Signal usage charts | 1 | Week 9, Nov 22 | Week 9, Nov 22 | 10 |
| RTL | 4.1 | Verilog specification of control RTL | 1 | Week 10, Nov 29 | | |
| RTL | 4.2 | Testing RTL design | 1 | Week 11, Dec 6 | Week 11, Dec 6 | 40 |
| - | - | Deadline for written work/code hand-in | - | Week 12, Dec 13 | Week 12, 12:00 Dec 13 | - |
| - | - | Deadline to sign up for demo | - | Week 12, Dec 14 | Week 12, 15:00 Dec 14 | - |
| | | Total | | | | 100 |

Table 2.2: COMP22111 Schedule

Chapter 3 The Design Process


3.1 Introduction
This chapter describes the sequence of steps and abstractions (levels of detail) that are used in transforming a circuit requirement into a silicon layout when a semi-custom integrated circuit is being designed. A semi-custom design is one in which libraries of pre-defined gates and logic components are provided by the circuit manufacturer; the silicon layout is then carried out using automated CAD tools.

Table 3.1 shows the large number of different representations involved in the design of a semi-custom ASIC. It will be seen that the representations are divided into three domains: behavioural, structural and physical, and into six levels: top level down to production. The design of very large chips may include an extra level, for example there could be a subsystem level between the top level and the chip architecture level. The table indicates that the structural representation consists of a series of schematic diagrams. Although most engineers prefer to work with schematic diagrams, the structure could also be described using a hardware description language (HDL).

It is important to understand that this classification into different representations and levels of detail is not the same as circuit hierarchy. A circuit hierarchy represents a circuit decomposition into successive levels of detail while Table 3.1 shows design decomposition into different levels of abstraction. The design of a small circuit with no hierarchy would still involve several levels of design abstraction.

The chip design process consists of creating a sequence of different abstractions at successively lower levels. It starts with a chip requirement and specification and proceeds until a representation of the masks required for silicon fabrication is obtained. The solid arrows in Table 3.1 indicate the main design steps which will be described in this chapter; this sequence of steps constitutes a design methodology. The particular methodology shown is suitable for standard cell ASICs.

Verification of each stage of the design is carried out by means of simulation. The test patterns needed as simulation stimuli are summarised in the rightmost column. Note how the same test patterns are used in successive levels to ensure the correct decomposition of one level to the next. The dashed arrows show where the test patterns and the behavioural models for each design stage come from.

The use of a pure top-down design methodology requires considerable experience if the effects of high-level decisions on performance and on lower level implementation are to be anticipated. In practice it is common to carry out low-level feasibility studies before finalising high level specifications and descriptions. The COMP22111 design exercise will, nonetheless, be undertaken in a straightforward top-down manner.



The Three Design Representation Domains: Behavioural, Structural, Physical

| Design Level | Components | Behavioural | Structural | Physical | Tests |
|---|---|---|---|---|---|
| Top Level | Whole chip | Written specification; executable behavioural description | Schematic shows core logic connected to input and output pads | Chip architecture shown as pads, core logic outline and power distribution | Input to behavioural model should reflect all possible system conditions; whole chip test patterns |
| Chip Architecture | Major functional blocks | Behaviour of functional blocks described in a HDL | Block diagram schematic of chip shows interconnectivity of functional blocks | Floorplan shows size and shape of rectangular blocks with routing channels | Test for each functional block + test for whole chip |
| Register Transfer (RTL) | Register, ALU, FSM, MUX, adder, etc. | Behaviour of RTL components described in a HDL | Schematic diagrams of each functional block show interconnectivity of RTL components | Components represented as areas of standard cells or as blocks of special cells, e.g. RAM, PLA, datapath | Test for each RTL block + each functional block + whole chip |
| Logic | Logic gates | Behaviour of gates as simulation models provided by the silicon vendor | Each RTL component is shown as a schematic of interconnected gates | Outline for each cell + interconnection tracks | Same test as for RTL |
| Transistor | Transistors, e.g. CMOS | Electrical model, e.g. SPICE models, or transistors used by the silicon vendor | Circuit diagrams show transistors connected to form gates | Polygons represent mask shapes used for fabrication of transistors and interconnect | Tests in the form of analogue waveforms |
| Production | | | | Masks or reticules of pattern for each layer of fabrication | Test patterns designed to find structural production faults |

Table 3.1: Design representations involved in the design of a semi-custom ASIC


3.2 Specification
A chip design starts with a set of requirements from which a specification is drawn. The specification defines precisely what the chip does (its function), not how to do it. It is the user's view of what the chip does.

In the real world the specification is needed to make sure that the designer and customer agree on the function of the chip, and to define the interaction, or interface, of the chip with the external system of which it forms a part. Cost and performance criteria are also a part of the requirements.

In an educational exercise there are no customer requirements to determine design constraints. The main constraint in the class context is for a design which can be completed within a limited time (approximately 16 hours).

When deciding what to put in a specification and how to write it, it is useful to consider what information will be needed in the data sheet of the completed device because the two are very similar. The main contents of a specification can be summarised as:
- a summary description of what the chip does
- a list of the chip's input and output pins
- required performance (clock rate) and power dissipation
- a list of the major modes in which the chip operates
- for each mode:
    - signals which control the mode
    - function executed in that mode
    - performance constraints on execution, such as minimum and maximum times between inputs and outputs

3.3 Top-level Behavioural Description


The top-level behavioural description is written as an executable program using a suitable programming language. It provides a means of simulating the function of the chip and is much more precise than the specification which is written in English. As it describes the operation of the system, it is somewhat incomplete, in that there is no timing information provided; consequently it cannot be synthesized into hardware directly.

The program should accept inputs and compute the appropriate outputs, then wait for the next set of inputs. The way the program works does not describe how the chip itself works; all that matters is that it captures the intended function of the device and that running it checks that what is specified is what is wanted.

This high level simulation is an extremely useful step. It clarifies the specification, brings to light potential difficulties and hidden assumptions, and helps identify the major internal states of the chip. It is not surprising that the functional simulation often leads to a revised specification.
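As a minimal sketch of the "accept inputs, compute outputs, wait" style described above, the Verilog module below models a small arithmetic device purely by its function. The module, its ports and the `go` handshake are hypothetical names chosen for illustration; they are not part of the STUMP design.

```verilog
// Hypothetical untimed behavioural model (not part of the STUMP design).
// It captures only the function: no clock, no internal structure.
module addsub_behav (
  input  wire        go,      // assumed handshake: rises when new inputs are ready
  input  wire        sub,     // 0: add, 1: subtract
  input  wire [15:0] a,
  input  wire [15:0] b,
  output reg  [15:0] result
);
  // Wait for the next set of inputs, compute the output, then wait again.
  always @(posedge go) begin
    if (sub) result = a - b;
    else     result = a + b;
  end
endmodule
```

Nothing in this description says how many clock cycles the operation takes or which registers hold the operands; those decisions belong to the lower design levels.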


Programming languages have been developed especially for the behavioural and structural modelling of integrated circuits; they are known as Hardware Description Languages, or HDLs. Two examples are VHDL and Verilog. Verilog is a widely used standard and is used in preference to VHDL in most CAD tools. Thus, Verilog will be used in this course. General purpose languages, such as C, Java or C++, could also be used for the top-level behavioural modelling of a chip.

When Verilog is being used to model a whole chip, a common procedure is to connect it to a test bench as shown in Figure 3.1. The module representing the chip contains a model of the chip, while the Tester module (the test bench) emulates the external environment of the chip. During a simulation the Tester module reads some form of input from an external file and extracts data to be applied as inputs to the chip model. It also captures the chip output data and writes it to an external file.

The form of the external test file will depend on the type of chip being modelled. For example, if the chip is a processor the test file could be in the form of the binary representation of a program to be executed by the processor.

Figure 3.1: A Verilog Modelling System

The form of the chip model will depend on the stage reached in the design. A purely behavioural model, which describes the function of the chip but not its internal structure, is used for the Top Level design stage. At lower levels of the design the model contains internal information usually in the form of a behavioural model describing the internal data flow and operations but it can also contain a structural (gate) description. The same test bench and test program should be used at all levels of the design to ensure that each level of the design decomposition carries out exactly the same function as the top level description.
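The chip-plus-test-bench arrangement can be sketched in Verilog as follows. This is an illustration under stated assumptions, not the test bench supplied for the lab: the `chip` module is a trivial stand-in, and the file names `test_in.hex` and `test_out.txt` and the four-word stimulus size are hypothetical.

```verilog
// Hypothetical chip model: doubles its input (a stand-in for a real design).
module chip (input wire [15:0] din, output wire [15:0] dout);
  assign dout = din << 1;
endmodule

// Test bench: emulates the external environment of the chip.
module tester;
  reg  [15:0] stimulus [0:3];   // test data read from an external file
  reg  [15:0] din;
  wire [15:0] dout;
  integer     i, out_file;

  chip dut (.din(din), .dout(dout));

  initial begin
    // Assumed input file: four hexadecimal words, one per line.
    $readmemh("test_in.hex", stimulus);
    out_file = $fopen("test_out.txt", "w");
    for (i = 0; i < 4; i = i + 1) begin
      din = stimulus[i];        // apply the next input to the chip model
      #10;                      // allow the outputs to settle
      $fdisplay(out_file, "%h %h", din, dout);  // capture the chip output
    end
    $fclose(out_file);
    $finish;
  end
endmodule
```

Because the test bench only touches the chip through its ports, the `chip` module can later be replaced by a more detailed model and the output files compared.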


3.4 Chip Architecture Design


Having decided on the overall chip function, the design is partitioned into major functional blocks. For example, a processor chip might be partitioned into input and output interface blocks, a datapath, RAM, control etc. The means by which data is transferred between blocks must also be decided. This will usually be in the form of a bus structure.

The structure of the architecture level of design can be captured as a schematic diagram or described using an HDL. The design is then simulated using behavioural models of each of the functional blocks written using an HDL. In a small design this level of design may well be omitted, or a simple block diagram might be produced as an intermediate stage but no simulation carried out.
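To illustrate what an HDL description at the architecture level might look like, the sketch below joins two behavioural blocks with a wire and a 16-bit bus. All module and signal names are hypothetical and the blocks are deliberately trivial; this is not the partitioning used for the STUMP.

```verilog
// Hypothetical functional blocks, each given as a behavioural model.
module control_blk (input wire clk, output reg en);
  initial en = 1'b0;
  always @(posedge clk) en <= ~en;          // enables the datapath every other cycle
endmodule

module datapath_blk (input wire clk, input wire en,
                     input wire [15:0] bus_in, output reg [15:0] bus_out);
  always @(posedge clk) if (en) bus_out <= bus_in;
endmodule

// Architecture level: structure only, blocks joined by wires and buses.
module chip_arch (input wire clk, input wire [15:0] data_in,
                  output wire [15:0] data_out);
  wire en;                                   // control signal between the blocks
  control_blk  u_ctrl (.clk(clk), .en(en));
  datapath_blk u_dp   (.clk(clk), .en(en),
                       .bus_in(data_in), .bus_out(data_out));
endmodule
```

The top module contains no behaviour of its own, which mirrors a block diagram schematic: it records only which blocks exist and how they are connected.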

3.5 Register-Transfer Level (RTL) Design


3.5.1 What is Register-Transfer?
A register-transfer system is specified as a set of memory elements (e.g. registers) and combinational logic functions between the memory elements as shown in Figure 3.2. The basic memory elements used in student designs are usually D-type edge triggered flip-flops. All operations in an RTL system take place between clocked registers.

Figure 3.2: A register-transfer system

On each active clock-edge data is clocked from the D inputs of the flip-flops (FFs) to the Q outputs which form the inputs to the following block(s) of logic; see Figure 3.2. After a short delay the outputs of each CL block change as a result of the change to the block inputs.
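A register-transfer system of this kind can be sketched in Verilog as two registers with a combinational logic (CL) block between them. The module name, the 8-bit width and the choice of an incrementer as the CL function are arbitrary assumptions made for illustration.

```verilog
// Hypothetical register-transfer stage: register -> CL block -> register.
module rt_stage (
  input  wire       clk,
  input  wire [7:0] d_in,
  output reg  [7:0] q_out
);
  reg  [7:0] q_a;       // first register (D-type flip-flops)
  wire [7:0] cl_out;    // output of the combinational logic block

  // CL block: changes a short delay after q_a changes.
  assign cl_out = q_a + 8'd1;

  // On each active clock edge, data moves from the D inputs to the Q outputs.
  always @(posedge clk) begin
    q_a   <= d_in;
    q_out <= cl_out;    // captures the result computed from the previous q_a
  end
endmodule
```

The non-blocking assignments (`<=`) make both registers sample their inputs on the same edge, so `q_out` always lags `q_a` by one clock cycle.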


The elements in the RTL design are usually represented as boxes, or blocks, in a block diagram which shows the interconnections between the blocks. The internal logic structure of the combinational logic is not defined at this stage but the function, or behaviour, is described as a model that can be used in a simulation of the RTL design. Thus, it can be seen that a register-transfer design gives a complete specification of what the chip will do on every clock cycle.

Students may already have come across most typical combinational RTL elements in earlier courses: adders, multiplexers, comparators, ALUs etc. In addition to these there will be designer elements, i.e. blocks of random logic designed to carry out arbitrary combinations of functions not included in standard libraries; the combinational logic block of an FSM (Finite State Machine) will be of this type.

Sequential elements consist of either straightforward storage registers (a set of D-type flip-flops, for example) or more complex assemblages such as counters or state machines. Counters and FSMs contain combinational logic in addition to memory elements. Thus, the separate combinational logic and register blocks of an RTL block diagram will not always be obvious because some of the RT structure is hidden within these more complex blocks. However, each block in the diagram should only contain one register.

3.5.2 Starting an RTL Design: ASM Diagrams


At Register Transfer Level the operation of the circuit is described as operations between clocked registers, where each clocking of registers corresponds to moving from one state to another. In the COMP22111 exercise, the RTL design of the datapath has been done for you and is shown in Appendix C. If you need to do an RTL datapath design then one starting point is to summarise the design in a diagram which shows both the major states of the circuit and the operations. The ASM (Algorithmic State Machine) diagram is such a diagram. It is a form of flow chart in which states are represented by rectangular boxes and decisions by angled boxes as shown in Figure 3.3. Note the two-way decision boxes. The operations to be carried out within a state are written inside the state box.
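One common way to carry an ASM diagram into Verilog is to give each state box a `case` branch and each decision box an `if`. The three-state machine below is a hypothetical example written for illustration; its states and signals are assumptions and do not correspond to Figure 3.3 or to the STUMP control block.

```verilog
// Hypothetical FSM derived from a three-state ASM diagram.
module asm_fsm (
  input  wire clk,
  input  wire reset,
  input  wire start,    // tested by a decision box in IDLE
  input  wire done,     // tested by a decision box in RUN
  output reg  load,     // operation written inside the LOAD state box
  output reg  busy      // operation written inside the RUN state box
);
  localparam IDLE = 2'd0, LOAD = 2'd1, RUN = 2'd2;
  reg [1:0] state, next_state;

  // State register: one state transition per clock.
  always @(posedge clk)
    if (reset) state <= IDLE;
    else       state <= next_state;

  // Combinational logic: state boxes become case branches,
  // decision boxes become if statements.
  always @(*) begin
    load       = 1'b0;
    busy       = 1'b0;
    next_state = state;
    case (state)
      IDLE:    if (start) next_state = LOAD;
      LOAD:    begin load = 1'b1; next_state = RUN; end
      RUN:     begin busy = 1'b1; if (done) next_state = IDLE; end
      default: next_state = IDLE;
    endcase
  end
endmodule
```

Assigning default values to every output at the top of the combinational block avoids unintended latches when a branch leaves an output unmentioned.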

3.5.3 How to Carry Out an RTL Design


Most first-time designers find it difficult to know where to start in decomposing a behavioural description of a chip into an RTL design and, having made a start, go through many cycles of trial and error before arriving at a satisfactory design. This is because, even in a small design, there are many different ways in which events can be scheduled and functions allocated to different blocks. The following three-stage (and many-step) procedure can be used by first-time designers. Stages 1 and 2 are carried out on paper; the design is then transferred to the CAD system in Stage 3.


Figure 3.3: An ASM Example

Experienced designers will spend time and effort optimising their designs for silicon area, performance etc. but a first-time designer will be happy with a completed design which works! It will probably be helpful to think of the design as made up of three parts:
1. memory storage - registers, RAM etc.
2. datapath functions - e.g. logical and arithmetic functions
3. control - a block which includes an FSM to control the state sequence of the circuit.


Stage 1 Preliminary Design
1. Draw an ASM diagram of the design.
2. Draw an outline block diagram including memory storage registers and datapath functions but omitting the control block, as follows:
    - from the ASM diagram identify all the registers and memory needed
    - select combinational functions to carry out the data operations
    - draw in the connections (wires) needed to transfer data between blocks and add multiplexers where necessary
    - check the block diagram against the ASM diagram.
3. List all the control signals which will be needed to control the operations of the blocks and orchestrate the clocking of registers. Also identify the signals needed as inputs to the control block.
4. Define the functions of the control block and extract a state transition graph for the FSM from the ASM diagram.
5. Complete the block diagram by adding the control block and the control signals and write out a detailed specification of the control functions.

Stage 2 Refining the Design
The design should now be checked, critically examined and revised:
1. Work through the design, comparing it with your top level model and ASM diagram to check for the correct sequence, the correct production of control signals and correct data operations.
2. Modify the design if necessary.
3. Examine the design to see if there are any obvious simplifications that can be made. It will often be found that step (2) will have led to additions and modifications which are rather clumsy. A re-examination may show that a design revision will give a simpler solution.
4. Repeat steps (1) to (3) until satisfied with the design.

Stage 3 Verifying the Design
Before the design can be verified by simulation it must be entered into the CAD system.
1. The structure can be entered as a schematic block diagram. In this case, great care should be taken to avoid errors and inconsistencies in the labelling of pins and bus signals. Careless labelling can make nonsense of simulations. Alternatively, the entire design can be entered as an HDL description.
2. If the design is entered as a block diagram then the functional descriptions of the blocks must be entered using an HDL, i.e. Verilog in the present exercise. Models will already exist for library blocks.
3. The behavioural/functional models of each of the RTL blocks are tested for correct functionality by simulation. A set of test patterns will be needed (see below).
4. When the functional models of all the blocks have been verified the whole design is simulated using the same chip test that was used for the top level behavioural simulation.
5. Corrections are made if needed and the design is re-simulated until correct outputs are obtained.


3.5.4 Simulating the RTL Design


Whole Chip Simulation
The same test program and test bench that were used for the Top Level behavioural simulation are used for the RTL simulation, but with one important difference - a clock signal will now be used by the chip model. There was no need for a clock signal at the Top Level because the Register Transfer structure had not then been defined. Note that the chip model now consists of a structural description, derived from the schematic block diagram, and behavioural models of each of the individual blocks in the schematic diagram. The output file which is produced by the RTL simulation should be compared with, and agree exactly with, the output file which was produced by the Top Level simulation.

Testing Individual Blocks within the RTL Design
In a large design, it is important to test the behavioural models that have been written for each of the functional blocks before testing the whole design. A test bench is now needed to supply the test input stimuli which emulate the signals that the block will see when embedded in the whole circuit. The test input stimuli consist of a set of test vectors representing the input signals to the block under test for each clock cycle. The test vectors also include the expected output signals for each clock cycle. In a simple design, the individual testing of blocks can usually be omitted. Thus in the COMP22111 design exercise, it should only be necessary to test the complete design.
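The idea of test vectors that carry both the stimuli and the expected outputs for each clock cycle can be illustrated with a minimal self-checking test bench. This is only a sketch: the block under test (reg16, a 16-bit register with a load enable), the vector layout and all signal names are assumptions made for illustration and are not taken from the STUMP datapath.

```verilog
// Hypothetical block under test: a 16-bit register with a load enable.
module reg16 (
  input  wire        clk,
  input  wire        en,
  input  wire [15:0] d,
  output reg  [15:0] q
);
  always @(posedge clk)
    if (en) q <= d;
endmodule

// Self-checking test bench: one test vector is applied per clock cycle.
module reg16_tb;
  reg         clk = 1'b0;
  reg         en;
  reg  [15:0] d;
  wire [15:0] q;
  integer     i, errors;

  // Assumed vector layout: {en, d, expected q after the clock edge} = 1 + 16 + 16 bits.
  reg [32:0] vectors [0:3];

  reg16 dut (.clk(clk), .en(en), .d(d), .q(q));

  always #5 clk = ~clk;              // free-running clock, period 10 time units

  initial begin
    vectors[0] = {1'b1, 16'h1234, 16'h1234};  // load a value
    vectors[1] = {1'b0, 16'hFFFF, 16'h1234};  // enable low: value must be held
    vectors[2] = {1'b1, 16'hABCD, 16'hABCD};  // load a new value
    vectors[3] = {1'b0, 16'h0000, 16'hABCD};  // hold again
    errors = 0;
    for (i = 0; i < 4; i = i + 1) begin
      {en, d} = vectors[i][32:16];   // apply the stimuli for this cycle
      @(posedge clk);
      #1;                            // allow the registered output to update
      if (q !== vectors[i][15:0]) begin
        errors = errors + 1;
        $display("vector %0d: expected %h, got %h", i, vectors[i][15:0], q);
      end
    end
    $display("%0d mismatching vector(s)", errors);
    $finish;
  end
endmodule
```

Keeping the expected outputs alongside the stimuli means the comparison is done by the test bench itself rather than by inspecting waveforms by hand.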

3.5.5 Design for Testability


It is important during the design of a chip to consider the ease with which it can be tested after fabrication to find manufacturing faults. A common method of ensuring testability is the use of scan paths. A scan path is made by using registers and flip-flops which can be re-configured in test mode to act in a serial-in serial-out mode. They can then be connected in long chains into which a test pattern can be shifted from a test pin or pins (refer to course notes for details). Scan path testability can be incorporated at the RTL stage by selecting library registers, counters etc. which are configurable as scan registers and by using flip-flops (FFs) with multiplexed inputs. The inclusion of a scan path adds complexity to the design and is omitted in the STUMP design. Thus, the elements used in the COMP22111 project are not configured for scan path operation. However, students who do final year chip design projects are expected to include scan paths.
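As an illustration of a flip-flop with a multiplexed input, the following sketch shows a scan flip-flop and a 4-bit register whose bits are chained serially in test mode. The module and port names (scan_ff, scan_reg4, scan_en, scan_in, scan_out) are hypothetical and do not correspond to any cell in the COMP22111 library.

```verilog
// Scan flip-flop: a D flip-flop whose input is selected by a multiplexer.
// scan_en = 0: normal operation, q follows d.
// scan_en = 1: test mode, q follows scan_in (the previous stage in the chain).
module scan_ff (
  input  wire clk,
  input  wire scan_en,
  input  wire d,
  input  wire scan_in,
  output reg  q
);
  always @(posedge clk)
    q <= scan_en ? scan_in : d;
endmodule

// 4-bit register built from scan flip-flops. In test mode the four bits
// form a shift register from scan_in to scan_out.
module scan_reg4 (
  input  wire       clk,
  input  wire       scan_en,
  input  wire [3:0] d,
  input  wire       scan_in,
  output wire [3:0] q,
  output wire       scan_out
);
  scan_ff ff0 (.clk(clk), .scan_en(scan_en), .d(d[0]), .scan_in(scan_in), .q(q[0]));
  scan_ff ff1 (.clk(clk), .scan_en(scan_en), .d(d[1]), .scan_in(q[0]),    .q(q[1]));
  scan_ff ff2 (.clk(clk), .scan_en(scan_en), .d(d[2]), .scan_in(q[1]),    .q(q[2]));
  scan_ff ff3 (.clk(clk), .scan_en(scan_en), .d(d[3]), .scan_in(q[2]),    .q(q[3]));

  assign scan_out = q[3];  // feeds the scan_in of the next register in the chain
endmodule
```

Connecting the scan_out of one register to the scan_in of the next gives the long chain described above, at the cost of one multiplexer per flip-flop.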

3.6 Logic Level Design


In the first year students learn how to design logic circuits using basic gates and flip-flops. For large designs, of tens or hundreds of thousands of gates, this approach is too slow. Nowadays engineers use a number of different CAD tools to create gate level circuits automatically from RTL designs so that whole ASICs can be designed without any gate level design by hand.


The methods which are commonly used for the design of standard cell ASICs are summarised below.
1. Use of library components: Many widely used RTL elements, e.g. registers, multiplexers, adders, can be pre-designed and stored as library components.
2. Logic synthesis: Automatic synthesis tools can be used to create gate-level logic designs from internal behavioural descriptions. Tools for synthesising combinational logic and FSMs are well established and widely available. Tools for synthesising whole RTL designs are also available and are now highly sophisticated so as to be able to optimise performance, power or area. However, this sophistication requires user interaction and usually design iteration.
3. Logic block compilers: Compilers are used to generate blocks which have some form of regular geometry. Most ASIC vendors supply compilers for ROM, RAM, PLAs and datapaths.

In COMP22111, library components have been used to define the datapath and a logic synthesis tool is used for the control block. It is sensible to arrange that the behavioural descriptions of RTL logic elements which have already been written for the RTL design can also be used as inputs to the synthesis tools. In the processor design the Verilog program describing the control block is used as the input for the synthesis software. The logic-level design is seen to be an almost automatic decomposition from the RTL design.

After decomposition to logic, the whole design is then re-simulated using the same chip test that was used for the Top Level and the RTL simulations. There may be a few problems at this stage because:
1. Synthesis tools do not always do the sensible thing and may misinterpret a description which was adequate for RTL simulation, but not sufficiently specified for the unambiguous decomposition to gate level.
2. Simulations at higher levels do not take any account of gate delays. The gate level simulation models do include information about the delay characteristics of the gates and the simulation results show gate delays; the logic simulator can also make worst case predictions of the effect of the wiring between gates (but the actual wiring delays cannot be known until after layout). The simulator results may show that some delays are unacceptable or that the active clock edges occur too close to data transitions.

Tests of individual blocks may be needed in addition to the whole chip simulation in order to sort out problems. RTL block test patterns are needed for these logic level simulations.
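One commonly encountered example of a description that simulates at RTL but is not fully specified for synthesis is a combinational case statement that does not cover every input value. It is offered here as an illustration of the general point, not as the specific problem the manual has in mind; the module names are hypothetical.

```verilog
// Under-specified: y is not assigned when sel == 2'b11, so y has to keep
// its previous value. A synthesis tool will typically infer a latch here,
// which is rarely what was intended for combinational logic.
module mux3_latch (
  input  wire [1:0] sel,
  input  wire       a, b, c,
  output reg        y
);
  always @(*)
    case (sel)
      2'b00: y = a;
      2'b01: y = b;
      2'b10: y = c;
    endcase
endmodule

// Fully specified: the default branch assigns y for every remaining value
// of sel, so the block describes purely combinational logic.
module mux3_comb (
  input  wire [1:0] sel,
  input  wire       a, b, c,
  output reg        y
);
  always @(*)
    case (sel)
      2'b00:   y = a;
      2'b01:   y = b;
      2'b10:   y = c;
      default: y = 1'b0;
    endcase
endmodule
```

If the test program never drives sel to 2'b11, both versions give the same RTL simulation results, which is why this kind of problem tends to surface only after synthesis.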

3.7 Field Programmable Gate Arrays (FPGAs)


Although the design is normally targeted at semi-custom chip design, it can also be aimed at an FPGA at this point in the design. This is because the design process down to the end of the logic design stage is independent of the medium it is implemented on. An FPGA consists of preformed silicon comprising functional logic blocks and programmable interconnections. Here, the logic needs to be mapped onto the logic blocks of the particular FPGA and these are then placed and routed. This can all be done automatically by CAD tools. The design can then be downloaded onto the FPGA, again using appropriate available tools. To check the operation of the downloaded design, a test program is run. This should be the same as that used in the top level behavioural simulation. Unlike semi-custom design, design errors are not fatal. They only require that the design is amended, the design process is repeated from the highest level, and the updated design is downloaded and tested!

3.8 Layout
For a semi-custom design, there are still a number of design stages following the logic design which need to be performed. These are described in the remaining sections of this chapter. Layout is the process of placing geometrical representations of gates on the surface representing the chip and interconnecting them with tracks (wires). When a semi-custom chip is being designed, layout is carried out using automatic Place and Route CAD tools. Each gate is represented as a rectangular shape of a standard height on a chip which uses the standard cell architecture. The internal representation of each gate as a set of polygons is added at a later stage by the manufacturer before making the masks for fabrication. The cells are butted together in rows with channels between the rows for routing the interconnections (Figure 3.4).

Figure 3.4: Some rows of a standard cell layout

The layout of a small chip will consist of a single rectangle containing a number of rows and channels of the same length but a floorplan is needed for a large chip. A floorplan subdivides the total surface of the chip into separate areas for the placement of the different functional blocks and for the routing of signals between the blocks. Although CAD tools can be used to assist in the creation of a floorplan, it is a difficult process to automate. The objective will usually be to obtain a layout with as small an area as possible that maintains the signal integrity.

The Place and Route procedure for a standard cell chip consists of carrying out a sequence of separate steps. First, I/O pads must be added to the top-level schematic. The circuit description will usually be held in the CAD database in a hierarchical format but the layout tools need a flattened description containing every instance to be used in the layout. The next step is therefore to flatten the netlist. Further steps define floorplan areas, assign cells to rows and carry out local channel routing and global routing.

3.9 Post-layout
The layout stage is not the end of the story for the ASIC designer. Having obtained a layout, a Design Rule Check (DRC) is carried out to ensure that none of the fabrication process design rules are broken. If the software is well designed and bug-free there should be no errors at this stage - regrettably it is sometimes necessary to make a few edits to the layout by hand. When the DRC passes, a Layout versus Schematic (LVS) check is performed. This checks that every feature extracted from the layout appears on the schematics generated from the logic. Any mismatches need to be investigated and fixed until the components in the logic correspond exactly with the layout features.

When this check passes, the next step is to use a program to calculate (extract) the parasitic capacitances of all the interconnection tracks. The whole chip is then re-simulated and the effects of the extra track capacitances are included in the delay calculations in order to get a fairly accurate estimation of performance. Further testing is normally done to ensure that the design functions correctly despite the maximum allowable variations in transistor characteristics and in environmental parameters (temperature, voltage etc.). This is referred to as testing in the corners.

When the designer is satisfied that the design functions correctly under all conditions and meets its performance/power/area specification under typical conditions, a final Design Rule Check (DRC) and Layout versus Schematic (LVS) check are undertaken. Once these have been done and passed, the chip design files can be shipped to the manufacturer for the fabrication of the chip.

3.10 Packaging and Test


Although the main design task is now complete, the designer has more work to do:
1. A bonding diagram showing how the chip is to be packaged must be sent to the manufacturer. The bonding diagram shows the connections between the bond pads on the chip and the lead frame pins of the package.
2. A set of test vectors for the testing of the chip after fabrication must be provided. This should be the same test program as used in the top level behavioural simulation.


Chapter 4 Design Tasks

This chapter outlines the exercises in the COMP22111 lab. Chapter 5 details the STUMP processor and Chapter 6 contains information on how to program the STUMP. It is important that you read these chapters before you proceed with these exercises.
There are six exercises for you to complete during the course of the laboratory. Please make sure you read each exercise carefully, along with the extra material provided in this manual. Details of how long each exercise will take, hand-in dates etc are detailed in Table 2.2 (repeated below). Each exercise will also provide further information.

Design Level  | Ex. | Design Stage                           | No. of Sessions | Semester Week(s)            | Exercise hand-in Week | Max Mark
Specification | -   | Read Lab Manual, especially Chapter 5  | -               |                             |                       | -
Programmer    | 1.  | STUMP assembler                        | 1               | Week 3, Oct 11              | Week 3, Oct 11        | 20
Top Level     | 2.1 | Top level model in Verilog and entry   | 2               | Weeks 4 & 5, Oct 18 & 25    |                       |
Top Level     | 2.2 | Simulation of top level model          | 2               | Weeks 7 & 8, Nov 8 & 15     | Week 8, Nov 15        | 30
RTL           | 3.  | Signal usage charts                    | 1               | Week 9, Nov 22              | Week 9, Nov 22        | 10
RTL           | 4.1 | Verilog specification of control RTL   | 1               | Week 10, Nov 29             |                       |
RTL           | 4.2 | Testing RTL design                     | 1               | Week 11, Dec 6              | Week 11, Dec 6        | 40
-             | -   | Deadline for written work/code hand-in | -               | Week 12, Dec 13             | Week 12, 12:00 Dec 15 | -
-             | -   | Deadline to sign up for demo           | -               | Week 12, Dec 14             | Week 12, 15:00 Dec 14 | -
Total         |     |                                        |                 |                             |                       | 100

Table 2.2: COMP22111 Schedule


Lab etiquette
Each exercise details the work that you are required to do and hand in. It is important that you manage your time wisely in the labs and prepare thoroughly beforehand. Some exercises require answer sheets to be completed, some require code to be handed in and some require work to be demonstrated to a lab demonstrator and the exercise signed off. Answer sheets for each of the six exercises can be found in the lab. Please make sure you submit the information required for each exercise along with the appropriate cover/answer sheet. Remember the lab is worth 45% of your marks for this course unit. Achieving a good mark in the laboratory will put you in a very good position for passing the module.


Exercise 1 Operation of the STUMP Assembler


Aim: To familiarise you with the STUMP assembler and to gain an understanding of the programmer's top-level view of the STUMP operation.
Hand in: Completed register status charts for exercise 1, to be found at the back of the manual (copies are available in the lab).
Read: Chapters 5 and 6.
Sessions: 1
Assessment: 20 marks (out of 100)
Learning Outcomes: Understanding of operation of assembler code and practice in relating it to machine behaviour, practice at handling binary, hex and decimal quantities.

Instructions
You will be given a sheet with a few consecutive lines of assembler code together with the initial state of the Register Bank and Condition Code register prior to executing the code. For each instruction fill in the sheets to specify the register state after executing the instruction. Hand in your sheets for marking on completion. To help you, note that:
- The Program Counter is incremented directly after fetching an instruction and before the instruction is executed.
- The Result Register holds the result computed by the ALU.


Exercise 2.1 Top Level Model of the STUMP in Verilog


Aim: To complete the Verilog model of the algorithmic view of the processor chip and to enter it into the Cadence CAD system.
Hand in: A listing of the Verilog code for the top level model, along with a completed cover sheet (copies can be found in the lab).
Read: Chapter 3, section 3.3 and Chapter 5.
References: Cadence Verilog manual (see Chapter 1 for location). M. D. Ciletti, Advanced Digital Design with Verilog HDL, Pearson 2002. J. Bhasker, A Verilog HDL Primer, 2nd ed., Star Galaxy Press, 1999.
Sessions: 2
Assessment: Marks are awarded on the basis of testing your design in Exercise 2.2.
Learning Outcomes: familiarity with Verilog description of a specification, experience of writing Verilog.

Instructions
You should have met the hardware description language Verilog in the first year COMP12111 course unit. You can refer to your old notes to help with this lab; however, you should find that the examples and templates provided are a sufficient guide as to what is needed to complete the exercises. Demonstrators will be able to give help and advice. The amount of Verilog code to be written is not very long - about one page without comment lines. In order to write this code and to attempt the other exercises well, you need to have a thorough understanding of the processor specification (Chapter 5).

You should remember that the high level behavioural model of a chip is in effect an executable specification. It is the user's view of the chip. It is very important to get it right because it is what will finally be made - all the lower levels of design are tested against this specification.

Your first task is to complete the Verilog behavioural model of the STUMP processor chip. Most of the model is provided - a listing of the code is given in Appendix A of this manual. It includes all the functions and tasks that you will need, and the main program includes reset, instruction fetch and instruction decoding. The part left for you to do is the execution part of the instruction, i.e. the reading of the Register Bank, the setting up of operands to the ALU, the execution of the instructions in the ALU and the setting of condition codes in the Condition Code Register. Read the (incomplete) Verilog listing, identify the signals to the ALU, and look at the instruction type descriptions in Chapter 5.

As an example, in a type 3 instruction (Branch) the offset (bits 7:0 in the instruction) is added to the program counter (register 7 in the register bank) to calculate the memory location of the next instruction if the branch is to be taken. It is important to remember that the offset is a signed, 2's complement number and that the inputs to the ALU (ALUA and ALUB) are 16-bit values, so the offset taken from the instruction needs to be extended to 16 bits appropriately. You must add the code to do this. The writeback phase following execution has been done for you.

You can complete the program using CASE and IF constructs together with variable assignments and task and function calls. You will find examples of all the syntax needed in other parts of the program. The parts of the Cadence CAD system needed to complete the exercise are described in the Cadence section that follows.
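The general technique of extending a signed 8-bit, 2's complement value to 16 bits is to replicate its sign bit into the upper byte. The sketch below shows the idiom in isolation; the names instr and offset16 and the example instruction word are assumptions chosen for illustration, and the code is not taken from the Appendix A listing.

```verilog
// Stand-alone illustration of sign extension from 8 bits to 16 bits.
module sign_extend_demo;
  reg [15:0] instr;     // hypothetical instruction word
  reg [15:0] offset16;  // the 8-bit offset extended to 16 bits

  initial begin
    instr    = 16'hF0FE;                       // low byte 8'hFE represents -2
    // Replicate bit 7 (the sign bit) eight times and place it above the offset.
    offset16 = {{8{instr[7]}}, instr[7:0]};    // offset16 is now 16'hFFFE, i.e. -2
    $display("extended offset = %h", offset16);

    instr    = 16'hF005;                       // low byte 8'h05 represents +5
    offset16 = {{8{instr[7]}}, instr[7:0]};    // offset16 is now 16'h0005, i.e. +5
    $display("extended offset = %h", offset16);
    $finish;
  end
endmodule
```

Padding the upper byte with zeros instead would turn a negative offset such as -2 into +254, so a backward branch would jump forwards.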

Cadence
Cadence is an industry-standard tool that the University has access to via the Europractice framework. In order to use the software the University must adhere to an end user agreement (EUA) that states (amongst other things) that the tools must be kept confidential, must not be copied, and must not be used for commercial purposes. A copy of this end user agreement can be found in Appendix D. When you run the Cadence tools for the first time you will be asked to confirm that you agree to the conditions set out by the end user agreement; failure to do so will result in you not being allowed access to the tools.

Accessing and Modifying the Top Level Model
The Cadence CAD system is used in this laboratory for the design work. Create the COMP22111 Cadence directory structure by typing

mk_cadence 22111 <return>

This should only be done once. Thereafter start a Cadence session by typing

start_cadence 22111 <return>

Eventually, an icds window opens. Choose File->Open. This brings up an Open File window. In the Open File window set Library Name - comp22111, Cell Name - processor, View Name - algorithmic and then click OK. This brings up a verilog.v window containing the incomplete Verilog code describing the top level of the STUMP. Type in this window to add your code and then save it using File->Save from the window's toolbar. The edit operation on the file causes all the Verilog code to be checked for syntax errors on Exit. If you have errors, an HDL Parser Error/Warnings window comes up telling you that parsing of the Verilog file failed. A failed design check indicates syntax errors and by clicking Yes in the HDL Parser Error/Warnings window you can inspect the error report to gain some indication of where the error is, and the verilog.v window will reopen. You can correct any errors in the top level description if you feel confident to do so, save it and then exit to re-check for syntax errors. Repeat this until the code correctly passes the checks. Remember: any syntactical errors will be largely ignored when marked.

Printing
Print out your code from the verilog.v window toolbar using File->Print. This brings up a Printer window. Enter lpr -Pugpr3 and click on Print. Dismiss the verilog.v file using File->Exit. If the HDL Parser Errors/Warnings window comes up, click No.

Exit
To exit from Cadence, click on File in the icds window and then select Exit. In the Exit icds? window this brings up, click yes.


Exercise 2.2 Behavioural Simulation of the STUMP


Aim: To use the test programs provided to simulate and debug your top level behavioural model of the STUMP.
Demonstrate: Demonstrate the test programs work correctly using your Verilog model and have a sheet signed off by a demonstrator (copies are available in the lab).
Read: Chapter 6
Sessions: 2
Assessment: This exercise is assessed in the lab and is worth 30 marks.
Learning Outcomes: How a test strategy evolves for complex hardware, how this translates to test programs, experience of using CAD tools to control and simulate a design using a test bench and experience in debugging hardware specifications.

Instructions
Parsing (syntactic analysis)
It is first necessary to parse the design to check that it satisfies the Verilog syntax. Start Cadence (start_cadence 22111). This brings up the icds window. Choose File->Open. This brings up an Open File window. In the Open File window set Library Name - comp22111, Cell Name - processor, View Name - algorithmic and then click OK. This brings up a verilog.v window containing the Verilog code. You need to perform at least one edit on it and then save it. Now, in the verilog.v window, select File->Exit.

Test Files
In the COMP22111 directory, you will find 4 test files (test1.s to test4.s) written in the STUMP assembly code. These four tests provide a fairly good test of most of the STUMP and are used to test the STUMP at all stages of the design from top level to layout. The tests would also be used to test the fabricated design. The tests are incremental, i.e. test 2 assumes that test 1 works, test 3 relies on tests 1 and 2 working, etc. The tests start at line 0 and all write results back to memory, starting at line 0, and thus overwrite the program!

Test 1 is a basic test which checks that the internal buses are connected correctly, that the Register Bank can be correctly addressed, that instructions can be fetched, and that data can be written back to memory (for checking). If test 1 does not work, something fundamental is wrong and this should be fixed before running any other tests.


Test 2 checks that the ALU operates correctly for various data combinations and different logical and arithmetic operations. It only checks the ALU and does not use the shifter. It aims to identify any signals in the fabricated ALU which are unable to change state (because they are stuck at 1 or 0) and pinpoint any adjacent signals (bits i and i+1) which are shorted together.

Test 3 is a rudimentary program which checks the different shifting operations, and test 4 checks the branch operations.

As these are the test programs used throughout this laboratory, you are advised to peruse them carefully. Furthermore, you will be using them to debug your design so familiarity will certainly be necessary if the test programs indicate any errors in your design.

The assembler is fully described in Chapter 6 and instructions to convert the assembler programs into a format suitable for the processor memory are given in section 6.2.1. They are repeated here for convenience: in a shell window, change the directory to COMP22111 using cd $HOME/Cadence/COMP22111. Then type sasm <filename.s> to create 3 files. Binary is in <filename.bin> while hex versions are in <filename.hex> and <filename.mem>. The file for the processor memory is called xc4000mem.ram and is created by typing loadmem.sh <filename.mem> in the terminal window; this creates the file in the $HOME/Cadence/COMP22111/test_bench directory.

Waveform Viewer
In the icds window, select Tools->Verilog Integration->NC-Verilog. This brings up a Virtuoso Verilog Environment etc. window. Fill in its fields with Run Directory - test_bench, Library - comp22111 (in Top Level Design), Cell - processor_test_bench (in Top Level Design), View - schematic (in Top Level Design). Click on the top left icon (of the running man) to initialise the simulator. When the icds window shows that initialisation is complete, click on the second icon down on the left (of three separate ticks) to generate a netlist. When the icds window indicates this is complete, click on the Simulate icon which is the third icon down in the Virtuoso Verilog Environment window. This launches SimVision and (eventually) brings up two windows: a Design Browser 1-SimVision window containing the processor and a Console-SimVision window which is a command window.

You will probably want to view signals on the Waveform Viewer. Select the fifth icon from the right (showing waveforms) in the Design Browser 1-SimVision window to bring up a Waveform 1-SimVision window with the signals to be displayed listed down the left. To monitor signals/buses within the test bench, expand test (press its + button) to reveal top in the Design Browser 1-SimVision window and then click on the top symbol to list the signals to and from the processor. Select the signals you want to monitor (the address, data and clock lines are particularly recommended) and send them to the Waveform Viewer (fifth icon from right). Continue this process of signal expansion and selection until all the signals you require have been sent to the Waveform Viewer. A good starting point for monitored signals would be the input/output signals to the processor.


Command Files
The simulation can be run by using the menus to issue commands to the simulator, with the commands given appearing in the Console-SimVision window. However, this is tedious and prone to errors, so normally these commands are placed in a file and the simulator is instructed to take this file as its input. Command files have a .sv extension and the command files for the first three tests, test1.sv to test3.sv, are in your COMP22111 directory. test1.sv is shown below:

force test.RESET = 1
run 10 ns
deposit test.RESET = { 1'b0} -after 2 us -absolute -release
stop -create -condition {#test.top.A = 45 & #test.top.OE = 0 }
stop -create -time 10 ms -absolute
run
stop -delete *
deposit test.DUMP = { 1'b1} -after 1 ns -relative
stop -create -time 10 ns -relative
run
stop -delete *

The first three lines define the RESET signal, setting it to 1, running for 10 ns, and removing it by taking it to 0 at 2 µs. The next two lines define breakpoints where the simulation is to be stopped. The first is when the address lines are 45 with the Memory Output Enable signal at 0. Remembering that the test program starts at line 0 and that instructions are placed in consecutive lines, this statement recognises the request to memory to fetch the program's final instruction (bal Fin). If your processor description is correct, the simulation always stops at this point. The second stop command is only required if your processor is stuck in an infinite loop, in which case the simulation terminates at 10 ms. The simulation is then instructed to run up to the time a breakpoint is encountered. There is a stop -delete * after each run statement; this line is necessary to remove the breakpoint, enabling the simulator to run past this time. The DUMP signal is activated 1 ns after the current time, when the current values in the memory will be dumped to the file $HOME/Cadence/COMP22111/test_bench/xc4000mem.dump. The final three statements in the command file continue the simulation for 10 ns beyond the last breakpoint before stopping.

Using the three provided command files as a template, you need to create a command file test4.sv for the branch test.

Running Simulations from Command Files
To run the simulation from a command file, reset the simulation using Simulation->Reset to Start in the Design Browser 1-SimVision window. Run the simulation from the command file by selecting File->Source Command Script from the Design Browser 1-SimVision window to bring up a Source Command Script window. Here, select the browse button (...) which opens a Source Script File window in which <name of file.sv> in the COMP22111 directory is selected followed by Open; this closes the Source Script File window. This filename now appears in the Source Command Script window. Select simulator Console (NC-Sim) for the Send commands to: field and press OK to cause the simulator to run (as can be observed by the change in the icon displayed by the Play button in the Design Browser 1-SimVision).

You can now inspect the file dumped from memory and you should compare the created outputs with those expected from the test program. If the contents are not correct, you need to pinpoint the cause of the fault; in this case, you will find the Waveform Viewer a valuable tool in debugging your design. Correcting any faults will involve modifying the Verilog code for the processor. After modifying the Verilog code, you need to repeat the syntax checks as previously described.

When your top level design description simulates correctly for all the supplied test programs, you should show the results of running the assembler code for each test to a demonstrator who will mark them off. You should complete and hand in the top level code and demonstrate the simulation of the processor operation before moving on to the next exercises. Make sure you demonstrate your tests working and have a sheet signed off by a demonstrator.

Stopping the Simulation
If for any reason the simulation fails to complete within a few seconds and the time is galloping on uncontrollably, you need to stop the simulation! In the Design Browser 1-SimVision window the button next to the play button (having two parallel vertical lines) is a stop button and will halt the simulation. Alternatively, use Simulation->Stop.

Exiting the Simulation
To exit from the Waveform Viewer, select File->Exit SimVision in the Waveform 1-SimVision window. To exit from the simulator, in the Design Browser 1-SimVision window select File->Exit SimVision. This removes both the Design Browser 1-SimVision window and the Console-SimVision window. To exit from the simulator environment, in the Virtuoso Schematic Composer etc. window select Commands->Close. Note that the created netlist does not need to be recreated if further simulations are performed unless there are changes to the top level Verilog code. To exit from Cadence, click on File in the icds window and then select Exit. In the Exit icds? window this brings up, click yes.


Exercise 3 Signal Usage Charts for the Control


Aim: To produce Signal Usage Charts to aid with the composing of the Verilog description of the STUMP control at the RTL level.
Hand in: Completed signal usage charts for exercise 4, which can be found at the back of the manual (copies are available in the lab).
Read: Chapters 5, 6 and Appendix C.
Sessions: 1
Assessment: Worth 10 marks (out of 100). Marks will be awarded based on the correct operation of the STUMP with 2 marks for Fetch signals, 4 for Execute and 4 for Writeback.
Learning Outcomes: understanding how to specify control in a formal way.

Instructions
The processor design can be thought of as comprising a 16-bit datapath (which does the actual computation) plus a control block. The RTL level design of the datapath has been done for you and is shown in Appendix C. The control logic is all about ensuring that the right things happen at the correct time by activating control signals at the appropriate time in the instruction cycle.

Each instruction takes three clock cycles to complete: instruction fetch is performed in the first cycle; execute, which reads the operands and performs the arithmetic, is done in the second cycle; and a writeback phase, which operates on the contents of the Result Register, occupies the third cycle. In general, a control signal is required by each multiplexer (to select the desired input at the appropriate time) and by each register (to enable data to be loaded into a particular register) in the datapath. In addition, there are control signals to access the desired read and write locations in the Register Bank and signals to control the access of information in the memory.

At the RTL level, the STUMP processor requires 12 different sets of signals to control the datapath and the memory. These are:
BR - indicates a branch instruction. It is used by the Sign Extender element to extend the least significant 8 bits, while the 5-bit immediate value is extended if BR is low (for a Type 2 instruction).
FETCH - indicates an instruction fetch is occurring.
EXE - indicates the Execute phase of operation, i.e. data is read from the Register Bank, operated upon and the result placed in the Result Register.
SRPA[2:0] - 3-bit address specifying the register (Reg0 to Reg7) to be read out onto port A of the Register Bank.

34

SRPB[2:0] 3-bit address specifying the register (Reg0 to Reg7) to be read out onto port B of the Register Bank SWP[2:0] 3-bit address specifying the register in the Register Bank (Reg0 to Reg7) to be written to IMMED selects the extended immediate operand rather than port B of the Register Bank as the B-input to the ALU LDCC enables the loading of the 4-bit Condition Code Register LDREG enables writing to a register in the Register Bank SALUD (Select ALU Data) selects data from the Result Register for writing into the Register Bank. If SALUD is low, data from the memory (MDIN[15:0]) is selected instead. MEMWR active-low signal to Bus Interface when Memory is to be written to MEMOE active-low signal to Bus Interface when Memory is to be read from The Control forms these signals based on the contents of the Instruction Register (INSTR[15:0]) and on a State Register which is internal to the Control. The State Reg- ister is updated on each clock and when running instructions has 3 states: FETCH, EXECUTE and WRITEBACK. FETCH indicates the Instruction Fetch phase, EXE (i.e. EXECUTE) indicates the Execute phase and LDREG indicates Writeback when the Register Bank is written to; naturally, only one of FETCH, EXE and LDREG can be a 1 at any time and all three may be 0 during the writeback phase of a store instruction. The action required in each phase (which takes one clock period) is summarised in Table 4.1. Phase 1 Fetch The next instruction is fetched from memory and +1 is added to the Program Counter. The instruction is decoded and executed. This phase finishes when the result from the ALU is clocked into the Result Register. The Condition Code Register may also be updated in this phase. ALU operations are written back into the register bank. Load/Store operations access memory. A branch instruction that is taken will update the Program Counter. Table 4.1: Instruction Phases

Phase 2

Decode/ Execute

Phase 3

Writeback
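To illustrate how a single control signal steers a datapath element, the following is a minimal sketch of a sign extender selected by BR, as described in the signal list above. The module and port names (sign_extend_sketch, instr_low, extended) are assumptions made for this example only and need not match the datapath supplied in Appendix C.

```verilog
// Sketch only: names are illustrative, not those of the supplied datapath.
module sign_extend_sketch (
    input  wire        BR,         // 1 = branch (8-bit offset), 0 = Type 2 (5-bit immediate)
    input  wire [7:0]  instr_low,  // INSTR[7:0]
    output wire [15:0] extended    // 16-bit sign-extended operand
);
    // Replicate the sign bit of the selected field up to bit 15.
    assign extended = BR ? {{8{instr_low[7]}},  instr_low[7:0]}
                         : {{11{instr_low[4]}}, instr_low[4:0]};
endmodule
```

With BR low, only INSTR[4:0] contributes to the result; bits 7 to 5 (part of SRC A in a Type 2 instruction) are ignored.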


In specifying the control signals, it is useful to make Signal Usage Charts which show the state of the control signals at any time. Signal usage chart sheets can be found in Appendix E (copies are also available in the lab). Your task is to determine how each control signal is formed in each phase of the instruction for each instruction type. A control signal may be 0, 1, don't care, or formed from bits in the Instruction Register INSTR[15:0] and/or the phase signals (FETCH, EXE and LDREG) and/or other signals. Hand in your completed sheets (make a copy of what you hand in, as this will help you in writing the Verilog code required in the next exercise).


Exercise 4.1 Verilog Specification of the Control Block


Aim: To produce a Verilog specification of the control for your processor and enter it into Cadence.
Hand in: Verilog listing of the Control along with a completed cover sheet (copies can be found in the lab).
Read: Section 3.5 and Appendices B and C of this manual.
Sessions: 1
Assessment: Marks are awarded on the basis of testing your design in Exercise 4.2.
Learning Outcomes: A complete understanding of the operation of the STUMP RISC processor, and more practice in writing a substantial hardware description in Verilog, but this time at the RTL level.

Instructions
Introduction
The processor comprises the Bus Interface, which forms the memory to processor interface, the Datapath elements, which perform the computational part of the STUMP, and the Control, which generates the signals required to perform instructions at the correct time. In this exercise you are going to complete the specification of the Control block in Verilog. The interconnection of the Bus Interface, Control and the components of the Datapath is shown in Appendix C. The Datapath and Bus Interface have been designed for you (down to the gate level).

RTL Design of the Control
Advice on how to proceed with this stage of the design is given in section 3.5, but the most important thing is to know precisely what you want your design to do. In determining the signals that have to be asserted by the Control in each of the three phases, the Signal Usage Charts you generated in the last exercise (if correct) should provide you with a precise specification of the control for the Fetch, Execute and Writeback phases. These need to be translated into a Verilog control block design specification and entered into the functional view of the control cell. This is provided in the form of a template which is listed in Appendix B.

Accessing and Modifying the Control
From the icds window, choose File->Open. This brings up an Open File window. In the Open File window set
Library Name - comp22111
Cell Name - control
View Name - functional
and then click OK. This brings up a verilog.v window containing the incomplete Verilog code describing the control for the STUMP datapath. The code contains an always block, written for you, which defines the processor state and advances the state on each positive clock edge. State 0 is the Reset state, State 1 is Fetch, State 2 is Execute, and State 3 is the Writeback state. If the state is none of these, a Default state is entered which sets all signals to don't care. It is strongly recommended that the signals listed in the Default state are explicitly set in each of the four states you need to write.

At the end of the code, a function Testbranch, written for you, takes as its input parameters bits 11 to 8 of the Instruction Register and the bits in the Condition Code Register, returning 1 if the branch is to be taken (and 0 if it is not). Use the Edit facilities of the window to add your control code and then save it using File->Save.
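The recommendation to set every signal explicitly in every state can be illustrated with a cut-down example. The sketch below deliberately uses two hypothetical outputs (sig_a and sig_b) and a hypothetical input (flag) rather than the real control signals, so it is not a solution to the exercise. Because every branch of the case statement assigns both outputs, the block describes purely combinational logic and no unintended latches are implied.

```verilog
// Sketch only: sig_a, sig_b and flag are hypothetical names.
module state_outputs_sketch (
    input  wire [1:0] state,
    input  wire       flag,
    output reg        sig_a,
    output reg        sig_b
);
    always @ (state or flag)
    begin
        case (state)
            2'd0:    begin sig_a = 1'b0; sig_b = 1'b0; end  // reset
            2'd1:    begin sig_a = 1'b1; sig_b = 1'b0; end  // first phase
            2'd2:    begin sig_a = 1'b0; sig_b = flag; end  // depends on an input
            2'd3:    begin sig_a = 1'b0; sig_b = 1'b1; end  // last phase
            default: begin sig_a = 1'bx; sig_b = 1'bx; end  // don't care
        endcase
    end
endmodule
```

If one of the states omitted the assignment to sig_b, the output would have to hold its previous value in that state, which is the latch behaviour the recommendation is intended to avoid.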


Exercise 4.2 Test of the RTL Design


Aim: To simulate and debug the whole RTL design.
Demonstrate: Demonstrate the simulation of the whole processor design with the test programs and have your sheet signed off by a demonstrator.
Sessions: 1
Assessment: Assessed in the lab. 40 marks allocated on the basis of demonstrating that your RTL description is correct and passes the tests; 8 marks are awarded for passing the Register Bank test (Test 1), 14 marks for the ALU test (Test 2), 8 marks for the shifter test (Test 3) and 10 marks for the branch test (Test 4).
Learning Outcomes: practice in testing and fault finding in a large digital system at the RTL level using the Cadence CAD tools, experience of the design iteration process.

Instructions
In this exercise, you will run the test programs we have given you on the RTL description of the processor. This will allow you to identify and debug faults in your Verilog code of the Control. You can assume that the test programs we provide are correct and that the RTL datapath we provide is correct. Therefore any errors in operation are due to faults in your Verilog control description! When you are satisfied that your design is working correctly, show your simulation results for each test to a demonstrator and have your sheet signed off. Hand the sheet in. The Cadence procedures needed for this exercise are similar to those in Ex. 3 and are briefly summarised below:

Parsing (syntactic analysis)


It is first necessary to parse the design to check that it satisfies the Verilog syntax. Start Cadence (start_cadence 22111). This brings up the icds window. Choose File->Open. This brings up an Open File window. In the Open File window fill in the fields:
Library Name - comp22111
Cell Name - control
View Name - functional
and then click OK. This brings up a verilog.v window containing the Verilog code. Make at least one edit, then save. Now, in the verilog.v window, select File->Exit to check the Verilog code for syntax errors. Correct any errors and repeat the parsing process until the code parses correctly.


Test Files
As before, the test files are in test1.s to test4.s and you need to create a file for the processor memory called xc4000mem.ram for use in the simulation. See Ex. 3 or Ch. 6 for instructions on this.

Simulation
In the icds window, select Tools->Verilog Integration->NC-Verilog. This brings up a Virtuoso Verilog Environment etc window. Fill in its fields with Run Directory - test_bench and, in the Top Level Design section, enter Library - comp22111, Cell - processor_test_bench, View - schematic. Click on the top left icon (of the running man) to initialise the simulator.

When the icds window shows that initialisation is complete, click on Setup->Netlist along the top toolbar of the Virtuoso Verilog Environment etc window. This brings up a Netlist Setup window. In the Netlist These Views line, remove algorithmic from the beginning of the line (to enable the netlister to pick up the Verilog code for control). Click OK along the top toolbar of the Netlist Setup window.

Back in the Virtuoso Verilog Environment etc window, click on the second icon down on the left (of three separate ticks) to generate a netlist. When the icds window indicates that this has completed correctly, click on the Simulate icon, which is the third icon down in the Virtuoso Verilog Environment etc window. This brings up the Design Browser 1-SimVision and Console-SimVision windows.

As described in Ex. 3, place any signals you wish to observe on the Waveform Viewer, then run the simulation from a command file (reset simulation, use File->Source Command Script to select an input command file <filename.sv>, send commands to the Simulator Console (NC-Sim) and press OK to run the simulation). When your processor simulates correctly using your Verilog description of Control for all the supplied test programs, you should show the results of running the assembler code to a demonstrator who will sign off your sheet.


Chapter 5 The Processor Specification


This chapter details the specification of the 16-bit STUMP processor.

5.1 Architecture
The processor is a 16-bit machine with a RISC style architecture. Operands for ALU operations come from registers inside the processor and the result is returned to a register. Separate instructions are provided to move data between the registers and external memory. There are 8 registers, R0-R7. R0 is always zero and can be used as a source operand, allowing move instructions to be synthesised from an add instruction. R0 may be written to, but the result is always discarded, allowing compare instructions to be synthesised from subtract instructions. Register R7 is the program counter and, from a programmer's view of the machine, has equal status with the other registers, allowing PC-relative addressing to be supported.

CC 3   N - Sign Flag
CC 2   Z - Zero Flag
CC 1   V - Overflow Flag
CC 0   C - Carry Out Flag

Table 5.1: Condition Code Bits

The processor has a 4-bit condition code register, shown in Table 5.1. It holds status information relating to the ALU output. The four status bits indicate whether the ALU result:
is negative (N bit is 1 if the ALU result is -ve; N is 0 for +ve or zero);
is zero (Z bit is 1 if the ALU result is all 0s; Z is 0 if non-zero);
overflows (V bit is 1 if the result is out of range, i.e. if adding two +ve numbers yields a -ve result, or if adding two -ve numbers yields a +ve result; V is 0 if the number is within range);
has a carry out (C bit is 1 if there is a carry out of bit 15 of the ALU result; C is 0 if there is no carry out of bit 15).

Each arithmetic and logical instruction has the option of updating or not updating the condition code register. If an arithmetic/logical instruction does not update the condition code register then its state remains as it is. Load, Store and Branch instructions never update the condition code register and so do not change the existing state of this register.

There are 3 instruction formats in a fixed-length 16-bit instruction. The machine operates on 16-bit words only. Byte addressing is not supported.
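The four condition code bits can be made concrete with a short sketch of a 16-bit addition that also produces N, Z, V and C. This is only an illustration of the definitions above: the module and port names are assumptions, and the overflow test (carry into bit 15 differs from carry out of bit 15) follows the C14/C15 convention used later in section 5.6.

```verilog
// Sketch only: not part of the supplied datapath.
module cc_flags_sketch (
    input  wire [15:0] a,
    input  wire [15:0] b,
    input  wire        cin,
    output wire [15:0] s,
    output wire [3:0]  cc      // {N, Z, V, C} = {CC3, CC2, CC1, CC0}
);
    wire c14;                  // carry out of bit 14 (into bit 15)
    wire c15;                  // carry out of bit 15

    // Add the low 15 bits first so that the carry into bit 15 is visible.
    assign {c14, s[14:0]} = a[14:0] + b[14:0] + cin;
    assign {c15, s[15]}   = a[15] + b[15] + c14;

    assign cc = {s[15], (s == 16'h0000), (c14 ^ c15), c15};
endmodule
```

For example, adding 0x7FFF and 0x0001 with cin = 0 gives s = 0x8000 with N = 1, Z = 0, V = 1 and C = 0: two positive numbers have produced a negative result, so the overflow bit is set although there is no carry out.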

5.2 Instruction Set


There are 8 basic instructions, shown in Table 5.2, enabling arithmetic, logical, load, store and branch operations to be performed. As is common in processors, all the arithmetic is performed by an adder, since the subtraction $A - B$ can be performed as $A + \overline{B} + 1$. Some other instructions such as cmp, nop and mov can be expressed directly in terms of the basic instructions and are supported by the assembler. Other instructions may be synthesised from combinations of the basic instruction set as shown in Chapter 6.

Instruction Code   Instruction   Explanation
000                ADD           2s complement add
001                ADC           2s complement add with carry-in
010                SUB           2s complement subtract
011                SBC           2s complement subtract with borrow
100                AND           Bitwise AND of two 16-bit words
101                OR            Bitwise OR of two 16-bit words
110                LD/ST         Load register from memory or Store register to memory
111                Bcc           Branch if condition cc is satisfied

Table 5.2: Basic Instruction Set

Shift instructions are somewhat special. Shift-left instructions can be derived from the basic instruction set. Shift-right instructions have been added as a rather ugly kludge and are dealt with in section 5.4.

5.3 Instruction Formats


There are just 3 instruction formats, which are shown below:

Type 1: 2 source registers
Bits    15:13   12   11     10:8   7:5     4:2     1:0
Field   INSTR   0    LDCC   DST    SRC A   SRC B   SHIFT

Type 2: 1 source register, 1 immediate value
Bits    15:13   12   11     10:8   7:5     4:0
Field   INSTR   1    LDCC   DST    SRC A   Immediate

Type 3: Conditional branch
Bits    15:12   11:8        7:0
Field   1111    Condition   Offset

The processor is a 3-address machine specifying two source operands and a destination operand. In the case of arithmetic and logical instructions (instruction codes 0 to 5), the two source operands are either two registers (Type-1 instructions) or a register and a 5-bit signed immediate value (Type-2 instructions). The result of the operation is returned to the destination register (DST) and the condition codes are updated depending on the state of bit 11 (if LDCC is 1, update the condition codes; if LDCC is 0, do not update them).

Branch (code 7, Bcc) and load/store (code 6, LD/ST) instructions do not update the condition-code register. In the case of a LD/ST instruction, bit 11 is used to determine the direction of the data transfer: if bit 11 is 1, the operation is a store to memory; if bit 11 is 0, the operation is a load from memory. The memory address is constructed from the sum of the two source operands, i.e. the two registers specified by SRC A and SRC B for Type-1 instructions, or the register specified by SRC A and the 5-bit signed immediate for Type-2 instructions. The register specified by DST is the register to be written into memory for a ST operation or the register to be loaded from memory for a LD operation.

Type 3 instructions are branch instructions (code 7). Here, the 8-bit signed offset is added to the Program Counter to compute the address of the instruction to be jumped to. This is written into the Program Counter if the branch is taken but is ignored otherwise. In branch instructions, bits 8 to 11 specify the conditions under which a branch is taken. These usually involve bit(s) in the condition code register. The branch conditions are described in section 5.5.
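The field boundaries of the three formats can be summarised as a set of continuous assignments. The following is a sketch for reference only; the module name and all output names are assumptions chosen for this example. The branch test uses bits 15:13 only, as the behavioural model in Appendix A does.

```verilog
// Sketch only: splits an instruction word into the fields of section 5.3.
module instr_fields_sketch (
    input  wire [15:0] INSTR,
    output wire [2:0]  opcode,     // instruction code
    output wire        is_branch,  // Type 3
    output wire        is_type2,   // register + immediate
    output wire        is_type1,   // two registers
    output wire        bit11,      // LDCC, or load/store direction for code 6
    output wire [2:0]  dst,
    output wire [2:0]  src_a,
    output wire [2:0]  src_b,      // Type 1 only
    output wire [1:0]  shift,      // Type 1 only
    output wire [4:0]  immediate,  // Type 2 only, signed
    output wire [3:0]  condition,  // Type 3 only
    output wire [7:0]  offset      // Type 3 only, signed
);
    assign opcode    = INSTR[15:13];
    assign is_branch = (opcode == 3'b111);
    assign is_type2  = ~is_branch &  INSTR[12];
    assign is_type1  = ~is_branch & ~INSTR[12];
    assign bit11     = INSTR[11];
    assign dst       = INSTR[10:8];
    assign src_a     = INSTR[7:5];
    assign src_b     = INSTR[4:2];
    assign shift     = INSTR[1:0];
    assign immediate = INSTR[4:0];
    assign condition = INSTR[11:8];
    assign offset    = INSTR[7:0];
endmodule
```

Several fields overlap (for example dst and condition both include bits 10:8), so which outputs are meaningful depends on the instruction type flags.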

5.4 Shift Operations


Bits 1 and 0 in Type-1 instructions are used to control various shift-right operations. If a shift is specified, a one-bit right shift of operand A from the Register Bank is performed before it reaches the ALU. The shifts that can be specified are an arithmetic shift right (ASR) with the sign bit copied to bit 15, a clockwise circular shift (ROR) with bit 0 moving to bit 15, and a clockwise circular shift through the carry (RRC) with the C-bit in the condition code register moving to bit 15. Refer to section 5.6 for information on how the shifter carry-out is used. Assuming that the data input to the Shifter is A<15:0>, the effect of the shift operations is summarised in Table 5.3 and Figure 5.1.


Operation   Instr Bit 1   Instr Bit 0   Shifter Output, bit 15 =   Shifter Carry-out (CSH) =
No Shift    0             0             A15                        0
ASR         0             1             A15                        A0
ROR         1             0             A0                         A0
RRC         1             1             CC0                        A0

Table 5.3: Shift Operations

Figure 5.1: Shift Instructions
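Table 5.3 can also be read as a small block of combinational logic. The sketch below is one possible way of writing it, with module and port names that are assumptions for this example; the behavioural model in Appendix A expresses the same table as the task Shift.

```verilog
// Sketch only: one-bit right shifter following Table 5.3.
module shifter_sketch (
    input  wire [15:0] a,       // operand A from the Register Bank
    input  wire [1:0]  shift,   // INSTR[1:0]
    input  wire        cc0,     // carry bit of the Condition Code Register
    output reg  [15:0] ash,     // shifter output, passed to the ALU
    output reg         csh      // shifter carry-out
);
    always @ (a or shift or cc0)
    begin
        case (shift)
            2'b00: begin ash = a;                 csh = 1'b0; end  // no shift
            2'b01: begin ash = {a[15], a[15:1]};  csh = a[0]; end  // ASR
            2'b10: begin ash = {a[0],  a[15:1]};  csh = a[0]; end  // ROR
            2'b11: begin ash = {cc0,   a[15:1]};  csh = a[0]; end  // RRC
        endcase
    end
endmodule
```

In all three shifting cases bits 14 to 0 of the output are bits 15 to 1 of the input; only the source of the new bit 15 differs.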


5.5 Conditional Branch Instructions


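Type-3 instructions implement 16 conditional branch instructions, shown in Table 5.4. The branch target address is PC + 1 plus the 8-bit signed offset. In the branch conditions, + denotes logical OR, . denotes logical AND and an overbar denotes the complement.

Mnemonic   Bits 11:8   Branch Condition                                               Use
BAL        0000        Always
BNV        0001        Never
BHI        0010        $C + Z = 0$                                                    comparison: unsigned arithmetic
BLS        0011        $C + Z = 1$                                                    comparison: unsigned arithmetic
BCC        0100        $C = 0$                                                        overflow test: unsigned arithmetic
BCS        0101        $C = 1$                                                        overflow test: unsigned arithmetic
BNE        0110        $Z = 0$                                                        zero test
BEQ        0111        $Z = 1$                                                        zero test
BVC        1000        $V = 0$                                                        overflow test: signed arithmetic
BVS        1001        $V = 1$                                                        overflow test: signed arithmetic
BPL        1010        $N = 0$
BMI        1011        $N = 1$
BGE        1100        $\overline{N}.V + N.\overline{V} = 0$                          comparison: signed arithmetic
BLT        1101        $\overline{N}.V + N.\overline{V} = 1$                          comparison: signed arithmetic
BGT        1110        $(\overline{N}.V + N.\overline{V}) + Z = 0$                    comparison: signed arithmetic
BLE        1111        $(\overline{N}.V + N.\overline{V}) + Z = 1$                    comparison: signed arithmetic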

Table 5.4: Conditional Branch Instructions
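The branch target calculation stated above can be sketched in a few lines. The module and port names here are assumptions for illustration; in the real design the addition is performed by the ALU in the datapath. The decision on whether the branch is taken is made separately, for example by the Testbranch function listed in Appendix A.

```verilog
// Sketch only: computes the address a taken branch would jump to.
module branch_target_sketch (
    input  wire [15:0] pc_after_fetch,  // PC after the +1 of the Fetch phase
    input  wire [7:0]  offset,          // INSTR[7:0], signed
    output wire [15:0] target           // written to the PC only if the branch is taken
);
    // Sign-extend the offset to 16 bits, then add it to the incremented PC.
    assign target = pc_after_fetch + {{8{offset[7]}}, offset};
endmodule
```

For a branch instruction stored at address 0x0010 with an offset of -3 (0xFD), the PC holds 0x0011 after the fetch and the target is 0x0011 + 0xFFFD = 0x000E, the carry out of bit 15 being discarded.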

5.6 Condition Codes


The following table summarises the conditions under which the various condition bits are set. The column labelled Cin shows where the carry-in comes from for the ADC and SBC instructions. Cin is the carry into the least significant bit of the adder. The column labelled CC0 shows how the carry bit in the Condition Code register is derived if the register is updated.


Instr   Cin                CC 3 Sign   CC 2 Zero   CC 1 Overflow   CC 0 Carry            Update codes if LDCC is set
ADD     0                  S15         S=0         C14 != C15      C15                   yes
ADC     CC0                S15         S=0         C14 != C15      C15                   yes
SUB     1                  S15         S=0         C14 != C15      $\overline{C15}$      yes
SBC     $\overline{CC0}$   S15         S=0         C14 != C15      $\overline{C15}$      yes
AND     0                  S15         S=0         0               CSH if shift else 0   yes
OR      0                  S15         S=0         0               CSH if shift else 0   yes
LD/ST   0                  -           -           -               -                     no
BR      0                  -           -           -               -                     no

Table 5.5: Condition code settings

where:
S is the 16-bit result of an arithmetic or logic operation, i.e. the ALU result.
C14 and C15 are the carry bits from bits 14 and 15 respectively of an arithmetic operation.
CSH is the shifter carry-out (see Table 5.3) and is only used to update the condition code register for a logical operation which performs an ASR, ROR or RRC shift.
shift is TRUE for Type 1 instructions when bits 0 and 1 are NOT equal to 00.

SUB and SBC are done as an addition with the CC0 and Cin settings shown in the above table. CC0 is stored as a borrow and is $\overline{C15}$, since a borrow is the complement of a carry. Thus

$A - B - \text{borrow} = A + \overline{B} + \overline{\text{borrow}}$.

For SUB there is no borrow, so $A - B = A + \overline{B} + 1$, while for SBC

$A - B - \text{borrow} = A + \overline{B} + \overline{\text{borrow}} = A + \overline{B} + \overline{CC0}$.
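The way subtraction is mapped onto the adder can be checked with a short sketch. The module below is illustrative only: its name, its ports and the is_sbc select are assumptions, and it models just the Cin and CC0 columns for SUB and SBC, with the carry flag stored as a borrow.

```verilog
// Sketch only: SUB/SBC performed as an addition with B inverted.
module sub_borrow_sketch (
    input  wire [15:0] a,
    input  wire [15:0] b,
    input  wire        is_sbc,   // hypothetical select: 0 = SUB, 1 = SBC
    input  wire        cc0_in,   // current carry flag, holding a borrow
    output wire [15:0] s,        // a - b (- borrow for SBC)
    output wire        cc0_out   // new carry flag, stored as a borrow
);
    wire cin = is_sbc ? ~cc0_in : 1'b1;   // SUB: 1, SBC: complement of the borrow
    wire c15;                              // carry out of bit 15

    // Each operand is widened inside a concatenation so that ~b is
    // evaluated as a 16-bit value before the 17-bit addition.
    assign {c15, s} = {1'b0, a} + {1'b0, ~b} + cin;
    assign cc0_out  = ~c15;                // borrow = complement of carry
endmodule
```

For SUB with a = 5 and b = 3 the adder computes 0x0005 + 0xFFFC + 1, giving s = 0x0002 with a carry out, so no borrow is recorded; with a = 3 and b = 5 the result is 0xFFFE without a carry out, so the borrow is set.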


5.7 Processor Interface


The processor is shown in Figure 5.2 in a system with a clock, reset and memory. Communication with memory is controlled by the processor signals Ram_Cs (memory select), Bus_Rd (enable memory output) and Bus_Wr (enable memory write). These are all active-low signals, i.e. their normal inactive state is high and they go low to activate the memory. The data is transferred on a 16-bit bidirectional bus, Bus_D[15:0], and the address at which reading or writing takes place is specified by the processor on the 16-bit address bus, Bus_A[15:0].

If reading, the address to be read is placed on the address lines and the chip is enabled (using Ram_Cs) together with the output enable signal. After the access time of the chip has elapsed, the data from that address appears on the data bus lines. If writing, the address lines are driven with the address of the location to be written to and the chip is enabled together with the write signal. The data on the bidirectional data bus is then written into this location. You can assume that data can be read from or written to memory within one clock cycle (if the clock period = 400 ns).

Figure 5.2: The processor system


Chapter 6 Programming the STUMP Processor


6.1 Introduction
The STUMP processor design will be tested by supplying a program in binary form for loading into the memory model. An assembler, the SASM assembler [1], has been produced which can be used to create a binary program from an assembly language program. The SASM assembler is described in the following section. Section 6.3 shows how the eight basic instructions described in Chapter 5 can be used to synthesise a further fifteen instructions.

6.2 Using the SASM Assembler


6.2.1 Usage
sasm <filename.s>

The input file is parsed on a line-by-line basis. Each line should contain a single instruction, assembler directive or full-line comment. Three output files are produced:
<filename.mem> contains code suitable for loading into the processor memory model using the loadmem.sh command as detailed below.
<filename.hex> contains code suitable for downloading into the memory on the Xilinx board.
<filename.bin> contains the binary of the assembled code.
A Verilog memory model is used; this can only read from a file named xc4000mem.ram and dump to a file named xc4000mem.dump. To create the xc4000mem.ram file from the <filename.mem> created by sasm, use
loadmem.sh <filename.mem>
Thus, for example,
sasm test1.s
loadmem.sh test1.mem
will create a memory file xc4000mem.ram containing the assembled code of test program 1.

[1] The assembler was written by Andrew Bardsley, who also devised the original STUMP architecture.

6.2.2 Assembler Instruction Format


The format of instruction lines is:
[<label>[:]] <instruction name> <operands>
The label must begin in column 1 of the line and can optionally be terminated by a colon. It is valid to omit the label and also to place the first character of the instruction name in column 1. Labels consist of one of the characters [a-z A-Z _] followed by any number of the characters [a-z A-Z _ 0-9]. Labels may not be any of the following reserved words (in either upper or lower case, although a mixed case version (e.g. Nop) of any of these keywords is a valid label):
adc add adcs adds align and ands asr bal bcc bcs beq bge bgt bhi bhs ble blo bls blt bne bnv bmi bpl bvc bvs cmp data equ idem include ld mov movs nop or org ors pc r0 r1 r2 r3 r4 r5 r6 r7 ror rrc sbc sbcs st sub subs

Register Names
Valid register names are (pc is an alias for r7):
r0 r1 r2 r3 r4 r5 r6 r7 pc

Mnemonics
Instruction mnemonics are listed below by instruction type. Instruction names that end in an s have the effect of setting the condition code bits based on the result of the instruction.
<shift> is one of ror, asr, rrc and indicates that the value of <src_reg1> or <offset_reg> is modified by the specified shift operation before the specified operation is carried out.
<expr> is a value in the range -16 to +15. An error is reported if the value is out of range.

Expressions
Expressions are similar to expressions in C. Supported operators are:
+ - * / % & | ^ << >> - ~ ()

6.2.3 Diadic Arithmetic/Logical Instructions


Valid Instructions
adc, adcs, add, adds, and, ands, or, ors, sbc, sbcs, sub, subs

Instruction Formats
<instruction> <dst_reg>, <src_reg1>, <src_reg2>
<instruction> <dst_reg>, <src_reg1>, <src_reg2>, <shift>
<instruction> <dst_reg>, <src_reg1>, #<expr>

6.2.4 Load/Store Instructions


Valid Instructions
ld st

Instruction Formats
<instruction> <src/dest_reg>, [<base_reg>, <offset_reg>]
<instruction> <src/dest_reg>, [<base_reg>, <offset_reg>, <shift>]
<instruction> <src/dest_reg>, [<base_reg>, #<expr>]
<instruction> <src/dest_reg>, [<base_reg>, <label>]

In the last form of the instruction, the offset is calculated by subtracting the address of the label <label> from the current instruction. It is used with <base_reg> = r7 to allow pc-offset index addressing. For examples, see the test program code.
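To relate the assembler syntax to the instruction formats of section 5.3, the sketch below packs the fields of a Type 2 instruction into a 16-bit word. The function encode_type2 and the enclosing module are hypothetical helpers written for this example; they are not part of SASM, and the word shown is simply what the Type 2 format implies for the given fields.

```verilog
// Sketch only: builds a Type 2 instruction word from its fields.
module encode_type2_sketch;

    function [15:0] encode_type2;
        input [2:0] opcode;   // INSTR[15:13]
        input       bit11;    // LDCC, or direction for LD/ST (1 = store)
        input [2:0] dst;      // INSTR[10:8]
        input [2:0] src_a;    // INSTR[7:5]
        input [4:0] imm;      // INSTR[4:0], signed
        begin
            // Bit 12 is 1 for a Type 2 instruction.
            encode_type2 = {opcode, 1'b1, bit11, dst, src_a, imm};
        end
    endfunction

    initial
    begin
        // ld r1, [r2, #3] : LD/ST = 110, bit 11 = 0 for a load
        $display("%b", encode_type2(3'b110, 1'b0, 3'd1, 3'd2, 5'd3));
        // prints 1101000101000011
    end
endmodule
```

Reading the printed word from the left gives 110 (LD/ST), 1 (Type 2), 0 (load), 001 (r1), 010 (r2) and 00011 (the immediate 3).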

6.2.5 Branch Instructions


Valid Instructions
bal bnv bhi bls bcc bhs bcs blo bne beq bvc bvs bpl bmi bge blt bgt ble

Instruction Format
<instruction> <label>

<label> is translated into an offset from the address of the next instruction and must be in the range of -127 to 128 from the current address.

6.2.6 Instruction Aliases


Some common instructions, not visible in the basic instruction set, are available as aliases:
nop
cmp <src1_reg>, <src2_reg>
cmp <src1_reg>, <src2_reg>, <shift>
cmp <src1_reg>, #<expr>
mov <dst_reg>, <src_reg>
mov <dst_reg>, <src_reg>, <shift>
mov <dst_reg>, #<expr>
Similar movs instructions are also allowed.


6.2.7 Assembler Directives


org
[<label>[:]] org <expr>
Set the current program address to the value of <expr>. An error is reported if this expression evaluates to less than zero or greater than 65535. The optional <label> is assigned to the new address. The value of <expr> must be resolvable in the 1st pass of the assembler.

equ
<label>[:] equ <expr>
Bind the value from evaluating <expr> to the identifier <label>. The value of <expr> can take any 32-bit value but must be resolvable in the 1st pass of the assembler.

data
[<label>[:]] data <list of data_items>
Inserts constants at the current program address onwards. <list of data_items> is a comma-separated list of the elements:
<expr>     Any expression. An unadorned expression is truncated to a 16-bit value and occupies a single word. Expressions with a suffix .b, .w or .s, .l represent a byte, a word, and a long word respectively. Long words are stored as two words in a little-endian format. Byte expressions are packed two to a word, least-significant byte first.
<string>   Any sequence of characters except stored as signed 6-bit values unless the string is suffixed with .b, in which case the characters are byte packed as above.

align
[<label>[:]] align .w
[<label>[:]] align .s
[<label>[:]] align .l
[<label>[:]] align <expr>
The program counter is aligned to the nearest word (.w and .s) or long word (.l) or <expr> words. The last form is useful for reserving a block of memory. Word alignment is only useful between data statements which are byte packed. Using a label with a data statement has the side-effect of word-aligning the first data element.

Comments
[<instruction or directive>] ;<comment>
Any line can be appended with a comment. However, only comments that start in column 1 are echoed to the listing file. Other comments are discarded.

Constants
The C forms of constants are allowed, with the addition of binary constants, which are introduced by 0b.

6.3 Extending the Instruction Set


There are 8 basic instructions plus the modifiers that affect operand-A shifting and the conditional updating of condition codes. Other instructions can be synthesised from the basic instruction set. Note that only NOP, MOV and CMP are recognized by the assembler.

Instruction   Synthesised from         Description
NOP           ADD r0, r0, r0           No-op: do nothing
MOV ra, rb    ADD ra, rb, r0           Move register rb to ra
CMP ra, rb    SUBS r0, ra, rb          Compare registers ra and rb
ASL ra        ADDS ra, ra, ra          Arithmetic Shift Left
LSL ra        ADDS ra, ra, ra          Logical Shift Left
RLC ra        ADCS ra, ra, ra          Rotate Left through Carry
ROL ra        ADDS ra, ra, ra          Rotate Left
              AND ra, ra, r0
LSR ra        ANDS r0, r0, r0          Logical Shift Right
              ADD ra, ra, r0, rrc
CCF           ANDS r0, r0, r0          Clear Carry Flag
SCF           ADD r1, r0, #1           Set Carry Flag
              ANDS r1, r1, r0, asr
NEG ra        SUB ra, r0, ra           2s complement of ra
CPL ra        AND r0, r0, r0           1s complement of ra
              SBC ra, r0, ra
BL            ADD r5, pc, #1           Branch and Link for leaf procedures
              BAL <label>              r5 is link register = return address
RL            ADD pc, r5, #0           Return from link
RET           ADD r6, r6, #-1          General return
              LD r5, [r6, #0]
              ADD pc, r5, #1

Table 6.1: Extending the Instruction Set


Appendices


Appendix A Verilog Top Level Behavioural Model of the STUMP Processor


Introduction
The Verilog listing of the processor module, processor, at the top level is given below. It is a behavioural, or programmer's, view of its operation. The algorithmic code describes a fetch instruction phase, an execute phase where the instruction is decoded and the arithmetic/logical operation specified is performed, and a writeback phase where the result computed during the execute phase is used. The ALU result is either written back to the Register Bank or used as a memory address; in the latter case, data is either stored to memory from the Register Bank or loaded from memory into the Register Bank, depending on the instruction.

The code given is incomplete and the task in the second exercise is to complete the high level code for the processor so that the model runs the given test programs successfully. The instruction fetch and the code for the writeback are complete and should not be altered. However, the code for the Execute phase is missing and this is the code which needs to be completed.

The code for the high level model calls functions and tasks. These are listed after the high level model. Functions and tasks are passed parameters and perform some operation. Functions return a result. Tasks are similar to procedures in that they operate on parameters and can modify them during task execution. Both functions and tasks may declare local variables to assist with their operation.

N.B. If the code stored in the files differs from that shown here, it should be assumed that the stored code is correct (i.e. it has been updated).


// Verilog HDL for STUMP processor processor_v
module processor (BUS_A, RAM_CS, BUS_RD, BUS_WR, BUS_D, BUSCLK, GSR);

// processor to memory signals :
output [15:0] BUS_A;     // address bus
output        RAM_CS;    // memory chip select
output        BUS_RD;    // memory read
output        BUS_WR;    // memory write
inout  [15:0] BUS_D;     // data bus

// processor signals
input         BUSCLK;    // processor clock
input         GSR;       // reset signal to processor

reg [15:0]  D_OUT;
reg [15:0]  BUS_A;
reg         RAM_CS, BUS_RD, BUS_WR;
reg [15:0]  INSTR;
reg [15:0]  REG_BANK [7:0];
reg [15:0]  RD_A, ALUA, ALUB, S;
reg [3:0]   CC;
reg [15:14] C;
reg         CSH;

wire [15:0] PC;

assign BUS_D = D_OUT;
assign PC = REG_BANK[7];   // PC is an alias for REG_BANK[7]
                           // Used for debug only - do not
                           // use PC in your code

always
begin
  if (GSR == 0)
  begin
    // Fetch State
    Memory_Read(REG_BANK[7], INSTR);   // Get instr pointed to by PC
    REG_BANK[7] = REG_BANK[7] + 1;     // add +1 to PC as soon as instr
                                       // fetched

    // Execute State
    if (INSTR[15:13] == 3'b111)
    begin                              // branch instruction
      //
      // put your code here to form ALU inputs ALUA and ALUB
      // note that the ALUA input is the output from the shifter
      //
    end
    else if (INSTR[12] == 1'b1)
    begin                              // type 2 instruction
      //
      // put your code here to form ALU inputs ALUA and
      // ALUB for type 2 instrs
      //
    end
    else
    begin                              // type 1 instruction
      //
      // put your code here to form ALU inputs ALUA and
      // ALUB
      //
    end

    // op decode
    case (INSTR[15:13])
      0 : Add(ALUA, ALUB, 1'b0, S, C);   // add instr done for you
      //
      // put your code here to form ALU result S and carry
      // bits C14 and C15 if needed
      //
    endcase

    //
    // put your code here to update the condition code
    // register.
    //

    // Write state
    case (INSTR[15:13])
      3'b111  : if (Testbranch(INSTR[11:8], CC) == 1)
                  REG_BANK[7] = S;
      3'b110  : if (INSTR[11] == 1)
                  Memory_Write(S, REG_BANK[INSTR[10:8]]);
                else
                begin
                  Memory_Read(S, REG_BANK[INSTR[10:8]]);
                  REG_BANK[0] = 0;
                end
      default : begin
                  REG_BANK[INSTR[10:8]] = S;
                  REG_BANK[0] = 0;
                end
    endcase
  end
  else   // reset state
  begin
    RAM_CS = 1;
    wait (GSR == 0)
    begin
      REG_BANK[7] = 0;
      REG_BANK[0] = 0;
      CC = 0;
    end
  end
end // end of always


/////////////////////////////////////////
// start of tasks and functions

task Memory_Write;
// writes data on DMW to memory address AMW
input [15:0] AMW, DMW;
begin
  RAM_CS = 0;
  BUS_RD = 1;
  BUS_WR = 1;
  BUS_A = AMW;
  D_OUT = DMW;
  #25 BUS_WR = 0;
  #50 BUS_WR = 1;
  #25 RAM_CS = 1;
end
endtask

task Memory_Read;
// reads memory address AMR and places data on DMR
input  [15:0] AMR;
output [15:0] DMR;
begin
  RAM_CS = 0;
  BUS_RD = 1;
  BUS_WR = 1;
  BUS_A = AMR;
  D_OUT = 16'hzzzz;
  #25 BUS_RD = 0;
  #50 DMR = BUS_D;
  #25 BUS_RD = 1;
  RAM_CS = 1;
end
endtask

task Add;
// adds a 1-bit carry in Cin to two 16-bit quantities A and B
// produces 16-bit sum S and carry out C from addition in bits 14 and 15
input  [15:0]  A, B;
input          CIN;
output [15:0]  S;
output [15:14] C;
reg    [16:0]  RESULT;
begin
  RESULT = A[14:0] + B[14:0] + CIN;
  C[14] = RESULT[15];
  RESULT[16:15] = A[15] + B[15] + C[14];
  S = RESULT[15:0];
  C[15] = RESULT[16];
end
endtask


task Shift;
// shifts input A and a carry in Cin according to shift type INSTR[1:0]
// produces shifter output ASH and shifter carry out CSH
input  [15:0] A;
input  [1:0]  INSTR;
input         CIN;
output [15:0] ASH;
output        CSH;
begin
  case (INSTR)
    0 : begin CSH = 0;    ASH = A;                         end
    1 : begin CSH = A[0]; ASH = A >> 1; ASH[15] = A[15];   end
    2 : begin CSH = A[0]; ASH = A >> 1; ASH[15] = A[0];    end
    3 : begin CSH = A[0]; ASH = A >> 1; ASH[15] = CIN;     end
  endcase
end
endtask

function Testbranch;
// compares branch condition INSTR[11:8] with cond code reg
// returns 1 if branch to be taken, returns 0 if jump not taken
input [11:8] BRANCH_INSTR;
input [3:0] CC;
reg N, Z, V, C;
begin
  {N,Z,V,C} = CC;
  case (BRANCH_INSTR)
    0 : Testbranch = 1;
    1 : Testbranch = 0;
    2 : Testbranch = ~(C|Z);
    3 : Testbranch = C|Z;
    4 : Testbranch = ~C;
    5 : Testbranch = C;
    6 : Testbranch = ~Z;
    7 : Testbranch = Z;
    8 : Testbranch = ~V;
    9 : Testbranch = V;
    10 : Testbranch = ~N;
    11 : Testbranch = N;
    12 : Testbranch = V~^N;
    13 : Testbranch = V^N;
    14 : Testbranch = ~((V^N)|Z) ;
    15 : Testbranch = ((V^N)|Z) ;
  endcase
end
endfunction

endmodule
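The Add task returns the carry into bit 15 (C[14]) and the carry out of bit 15 (C[15]). In two's complement addition, signed overflow occurs exactly when these two carries differ. The testbench below is a minimal sketch of that relationship; the module name add_carry_demo and the signals X, Y, SUM, CARRY and V are illustrative only and are not part of the supplied model, and the sketch does not show how the condition code register should be updated.

```verilog
// Illustrative testbench. add_carry_demo, X, Y, SUM, CARRY and V are
// hypothetical names chosen for this sketch.
module add_carry_demo;

  reg [15:0]  X, Y, SUM;
  reg [15:14] CARRY;
  reg         V;

  // Copy of the Add task from the listing above.
  task Add;
    input  [15:0]  A, B;
    input          CIN;
    output [15:0]  S;
    output [15:14] C;
    reg    [16:0]  RESULT;
    begin
      RESULT = A[14:0] + B[14:0] + CIN;
      C[14] = RESULT[15];
      RESULT[16:15] = A[15] + B[15] + C[14];
      S = RESULT[15:0];
      C[15] = RESULT[16];
    end
  endtask

  initial begin
    // Largest positive value plus one: the sum wraps to a negative value.
    X = 16'h7FFF; Y = 16'h0001;
    Add(X, Y, 1'b0, SUM, CARRY);
    V = CARRY[15] ^ CARRY[14];
    $display("SUM=%h C15=%b C14=%b V=%b", SUM, CARRY[15], CARRY[14], V);
    // prints SUM=8000 C15=0 C14=1 V=1

    // -1 plus 1: carry out of bit 15, but no signed overflow.
    X = 16'hFFFF; Y = 16'h0001;
    Add(X, Y, 1'b0, SUM, CARRY);
    V = CARRY[15] ^ CARRY[14];
    $display("SUM=%h C15=%b C14=%b V=%b", SUM, CARRY[15], CARRY[14], V);
    // prints SUM=0000 C15=1 C14=1 V=0

    $finish;
  end

endmodule
```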


Appendix B Verilog Control Description


The following is very incomplete Verilog code for the RTL description of a non-pipelined control unit for the STUMP. It is your task in exercise 5 to complete the code required.
// Stump Control unit
// Original:ADP 9/5/06
// Last modified:ADP 11/5/06
module control ( LDCC, LDREG, EXE, BR, FETCH, IMMED, MEMOE, MEMWR, SALUD,
                 SRPA, SRPB, SWP, CC, CLK, INSTR, RESET );

//--------Input ports--------
input CLK, RESET;
input [15:2] INSTR;
input [3:0] CC;

//-------Output ports--------
output BR, FETCH, EXE, IMMED, LDCC, LDREG, MEMOE, MEMWR, SALUD;
reg BR, FETCH, EXE, IMMED, LDCC, LDREG, MEMOE, MEMWR, SALUD;
output [2:0] SRPA, SRPB, SWP;
reg [2:0] SRPA, SRPB, SWP;

//--------Internals---------
reg [1:0] state;


// Control of finite state machine
always @ (posedge CLK)
begin
  if (RESET == 1) state = 0;
  else if (state == 3) state = 1;
  else state = state + 1;
end

// Control of state driven combinatorial logic
always @ (state, INSTR, CC)
begin
  case (state)
    0: // Reset cycle
      begin
        // put your code here for signals set in reset phase
        // put your code here for don't care signals in the reset state
      end

    1: // Fetch cycle
       // Fetches instruction from memory, loads into
       // instruction register & increments program counter
      begin
        // put your code here for signals that need to be set in the fetch
        // phase
        // put your code here for don't care signals in the fetch state
      end

    2: // Execute cycle
       // Instruction is decoded and executed
       // Instruction may be: load/store (Type I or Type II)
       // or a Branch
       // or an ALU operation (Type I or Type II)
      begin
        // you need to decode the instruction and then set signals
        // appropriately to a value or don't care for that instruction
      end

    3: // Write back cycle
      begin
        // Instruction is decoded to determine whether
        // the output of the ALU should be written back
        // to memory or register bank or discarded and signals are set to
        // a value or don't care appropriately
      end

    default: // All signals set to don't care
      begin
        MEMWR = 'bx; MEMOE = 'bx; FETCH = 'bx; EXE = 'bx; LDREG = 'bx;
        LDCC = 'bx; IMMED = 'bx; BR = 'bx; SALUD = 'bx;
        SRPA = 3'bxxx; SRPB = 3'bxxx; SWP = 3'bxxx;
      end
  endcase
end

function Testbranch;
// returns 1 if branch taken, 0 otherwise
input [11:8] BRANCH_INSTR;
input [3:0] CC;
reg N, Z, V, C;
begin
  {N,Z,V,C} = CC;
  case (BRANCH_INSTR)
    0 : Testbranch = 1;              // BAL
    1 : Testbranch = 0;              // BNV
    2 : Testbranch = ~(C|Z);         // BHI
    3 : Testbranch = C|Z;            // BLS
    4 : Testbranch = ~C;             // BCC
    5 : Testbranch = C;              // BCS
    6 : Testbranch = ~Z;             // BNE
    7 : Testbranch = Z;              // BEQ
    8 : Testbranch = ~V;             // BVC
    9 : Testbranch = V;              // BVS
    10 : Testbranch = ~N;            // BPL
    11 : Testbranch = N;             // BMI
    12 : Testbranch = V~^N;          // BGE
    13 : Testbranch = V^N;           // BLT
    14 : Testbranch = ~((V^N)|Z) ;   // BGT
    15 : Testbranch = ((V^N)|Z) ;    // BLE
  endcase
end
endfunction

endmodule
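The state register in the listing above visits state 0 only while RESET is high, then cycles through fetch (1), execute (2) and write back (3) indefinitely. The testbench below is a small sketch for confirming that sequence in simulation before any control outputs are written. The module name state_seq_demo, the 10-unit clock period and the reset timing are assumptions made for this illustration.

```verilog
// Illustrative testbench. state_seq_demo and the timing values are
// hypothetical; only the state register logic is copied from the listing.
module state_seq_demo;

  reg       CLK, RESET;
  reg [1:0] state;

  // Same state register as in the control module above.
  always @ (posedge CLK)
  begin
    if (RESET == 1) state = 0;
    else if (state == 3) state = 1;
    else state = state + 1;
  end

  // Free-running clock with a period of 10 time units.
  always #5 CLK = ~CLK;

  // Print the state shortly after each rising clock edge.
  always @ (posedge CLK)
    #1 $display("t=%0t RESET=%b state=%0d", $time, RESET, state);

  initial begin
    CLK = 0; RESET = 1;
    #12 RESET = 0;      // release reset after the first rising edge
    #70 $finish;
  end

endmodule
```

With these timings the printed state values should run 0, 1, 2, 3, 1, 2, 3, 1: state 0 appears once under reset and is never revisited.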


Appendix C STUMP Processor: RTL Design


This appendix describes the RTL (Register Transfer Level) design of the STUMP processor. It contains an RTL schematic of the processor showing its component parts, where the Bus Interface and datapath components have already been designed for you. Your task in exercise 5 is to complete the Verilog specification of the Control Block. The entire processor design will then be complete and can be simulated with the same test program as was used for the top level design.

C1. RTL Datapath Design


Figure C1 shows an RTL schematic of the STUMP processor and Figure C2 shows the control block.

The address for the instruction fetch is kept in the Program Counter (Reg7). This is sent to memory when a fetch is performed. Instructions from memory are clocked into the Instruction Register (INSTR) at the end of the clock phase (on the positive clock edge); the Program Counter is also incremented at this time.

Instruction execution is split into two phases. In the execute phase, registers (R0 to R7) are read on ports A and B of the Register Bank. The port A data is optionally shifted to form the ALU A operand. The ALU B operand is supplied either by port B of the Register Bank or by the immediate data in the Instruction Register, which is sign extended to 16 bits. The ALU operation specified in the instruction is performed and its result is clocked into the Result Register (ALUR) at the end of the execute phase (on the positive clock edge). The condition bits are also clocked into the 4-bit Condition Code Register (CC) at this time if it is enabled.

The writeback phase completes the instruction execution. Usually the ALU result is written back to the Register Bank. Here, the register to be written is specified as a 3-bit address (R0 to R7) and this register is enabled by the Write signal (LDREG). Since R0 contains zero, writing to R0 has no effect. The write occurs at the end of the phase (on the positive clock edge). A branch instruction which is not taken can write to R0 instead of R7 (or can make the write enable to the Register Bank inactive).

Load and store instructions operate differently in their writeback phase. Here, the ALU result is used as the memory address, MA[15:0]. For a load instruction, the memory is read (MEMOE = 0, i.e. active) and the memory output, MDIN[15:0], is placed in the specified register in the Register Bank. For a store instruction, the destination register specified in the instruction is read onto port A and written to memory (MEMWR = 0, i.e. active) at the address given by the ALU Result Register.
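The Register Bank behaviour described above (two read ports, one write port enabled by LDREG, a write on the positive clock edge, and R0 fixed at zero) can be sketched as follows. This is not the supplied datapath component: the module name regbank_sketch and the data port names DW, DA and DB are hypothetical, LDREG is assumed to be active high, and the Program Counter increment on R7 is left out.

```verilog
// Hypothetical sketch of the Register Bank behaviour; not the supplied
// component. DW, DA and DB are assumed port names; LDREG is assumed
// to be active high. The PC increment on R7 is not modelled.
module regbank_sketch (CLK, LDREG, SRPA, SRPB, SWP, DW, DA, DB);

  input         CLK, LDREG;
  input  [2:0]  SRPA, SRPB, SWP;
  input  [15:0] DW;            // data to be written
  output [15:0] DA, DB;        // port A and port B read data

  reg [15:0] REG_BANK [1:7];   // R1 to R7; R0 is not stored

  // Reads are combinational; R0 always reads as zero.
  assign DA = (SRPA == 0) ? 16'h0000 : REG_BANK[SRPA];
  assign DB = (SRPB == 0) ? 16'h0000 : REG_BANK[SRPB];

  // The clock is always applied, but a write only takes place when
  // LDREG is high. A write addressed to R0 is simply dropped.
  always @ (posedge CLK)
    if (LDREG == 1 && SWP != 0)
      REG_BANK[SWP] <= DW;

endmodule
```

Dropping writes to R0 is what lets an untaken branch select R0 as its write register instead of disabling LDREG.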


Figure C1: Register Transfer Level Design of the STUMP Processor


Figure C2: STUMP Processor control block

Figure C3: Bus Interface Signals


Notes:
1. The clock is applied to all registers at all times but the clock is ignored unless the register is enabled.
2. The control signals to the memory are all active low.
3. Signal names have been defined for the Control Block. Please also use these names in your Verilog code.
4. Although Verilog is case sensitive with regard to signal names, other tools in the flow are not. Hence, you should be consistent with the use of upper/lower case in your names, and signal names should be unique.

C2. Bus Interface Component


All processor signals to/from memory or the outside world pass through the Bus Interface Component. The Bus Interface signals to/from memory or other external devices are shown in Figure C3 and are described below:

BUS INTERFACE SIGNALS TO/FROM MEMORY AND OUTSIDE WORLD
BUS_D[15:0]  bidirectional bus between the Bus Interface and the memory
BUS_A[15:0]  16-bit address to memory
BUS_WR       active low signal which writes to memory
BUS_RD       active low signal which reads from memory
RAM_CS       active low signal which enables the memory. It is tied low so the memory is enabled all the time.
BUSCLK       clock generated by a Clock Module
GSR          global reset

Apart from the clock, the signals above are generated either by the datapath elements or the Control Block. They are listed below:

DATAPATH SIGNALS TO BUS INTERFACE
MA[15:0]     16-bit address to Bus Interface
MDIN[15:0]   16 bits of data from Bus Interface
A_RD[15:0]   16 bits of data to Bus Interface


CONTROL BLOCK SIGNALS TO BUS INTERFACE
MEMWR        active-low signal to Bus Interface when memory is to be written
MEMOE        active-low signal to Bus Interface when memory is to be read

BUS INTERFACE TO DATAPATH & CONTROL BLOCK
CLK          the clock signal which goes to all flip-flops in the processor
RESET        reset signal, high when active
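Because BUS_D is bidirectional, it may only be driven while a write is in progress and must be released (high impedance) otherwise, in the same way that the top-level model sets D_OUT to 16'hzzzz during a read. The fragment below sketches that idea only. The supplied Bus Interface is already designed and may well differ: the module name busif_sketch is hypothetical, A_RD is taken to be 16 bits wide to match its description, and passing MEMWR and MEMOE straight through to BUS_WR and BUS_RD is an assumption.

```verilog
// Hypothetical sketch of driving a bidirectional data bus; this is not
// the supplied Bus Interface component.
module busif_sketch (BUS_D, BUS_WR, BUS_RD, MDIN, A_RD, MEMWR, MEMOE);

  inout  [15:0] BUS_D;
  output        BUS_WR, BUS_RD;
  output [15:0] MDIN;
  input  [15:0] A_RD;          // assumed to be 16 bits wide
  input         MEMWR, MEMOE;  // active low

  // Assumption: the active-low strobes are passed straight through.
  assign BUS_WR = MEMWR;
  assign BUS_RD = MEMOE;

  // Drive the bus only while writing (MEMWR low); otherwise release it
  // so that the memory can drive it during a read.
  assign BUS_D = (MEMWR == 0) ? A_RD : 16'hzzzz;

  // Data seen on the bus is passed to the datapath.
  assign MDIN = BUS_D;

endmodule
```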

C3. Control Block Signals


Apart from the RESET signal, which is a global signal applied to both the datapath and the Control Block prior to operating the processor, the remaining signals to/from the Control Block are internal to the processor. They can be partitioned into signals to and from the datapath, and they are summarised below:

DATAPATH SIGNALS TO CONTROL BLOCK
INSTR[15:2]  the most significant 14 bits of the Instruction Register
CC[3:0]      the 4-bit Condition Code Register

CONTROL BLOCK SIGNALS TO DATAPATH
BR           indicates a branch instruction. When BR is high, the Sign Extender element sign-extends the least significant 8 bits of the instruction; when BR is low, it sign-extends the least significant 5 bits.
FETCH        indicates an instruction fetch is occurring
EXE          indicates the Execute phase of operation, i.e. data is read from the Register Bank, operated upon and the result placed in the Result Register
SRPA[2:0]    3-bit address specifying the register (R0 to R7) to be read out onto port A of the Register Bank
SRPB[2:0]    3-bit address specifying the register (R0 to R7) to be read out onto port B of the Register Bank
SWP[2:0]     3-bit address specifying the register in the Register Bank (R0 to R7) to be written to
IMMED        selects the sign-extended immediate operand rather than the Register Bank port B operand as the B-input to the ALU for the Type 2 & 3 operations
LDCC         enables the loading of the 4-bit Condition Code Register
LDREG        enables writing to a register in the Register Bank


SALUD (Select ALU Data) selects data from the Result Register for writing into the Register Bank. If SALUD is low, data from the memory (MDIN[15:0]) is selected instead.
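The effect of BR, IMMED and SALUD on the datapath can be sketched as three multiplexers. The modules below are illustrative and are not the supplied datapath: the names operand_select_sketch, operand_select_tb, IR_LOW, PORTB, EXT, ALUB and WDATA are hypothetical, and IMMED is assumed to select the immediate when high.

```verilog
// Hypothetical sketch of the selections controlled by BR, IMMED and
// SALUD. IMMED is assumed to be active high.
module operand_select_sketch (BR, IMMED, SALUD, IR_LOW, PORTB, ALUR, MDIN,
                              ALUB, WDATA);

  input         BR, IMMED, SALUD;
  input  [7:0]  IR_LOW;        // low 8 bits of the Instruction Register
  input  [15:0] PORTB;         // Register Bank port B data
  input  [15:0] ALUR;          // Result Register
  input  [15:0] MDIN;          // data from memory
  output [15:0] ALUB;          // ALU B operand
  output [15:0] WDATA;         // data written to the Register Bank

  wire [15:0] EXT;

  // BR high: sign-extend 8 bits. BR low: sign-extend 5 bits.
  assign EXT = BR ? {{8{IR_LOW[7]}}, IR_LOW[7:0]}
                  : {{11{IR_LOW[4]}}, IR_LOW[4:0]};

  // IMMED chooses between the immediate and port B.
  assign ALUB = IMMED ? EXT : PORTB;

  // SALUD high: Result Register. SALUD low: memory data.
  assign WDATA = SALUD ? ALUR : MDIN;

endmodule

// Small testbench for the sketch above.
module operand_select_tb;

  reg         BR, IMMED, SALUD;
  reg  [7:0]  IR_LOW;
  wire [15:0] ALUB, WDATA;

  operand_select_sketch dut (BR, IMMED, SALUD, IR_LOW,
                             16'h1234, 16'hAAAA, 16'h5555, ALUB, WDATA);

  initial begin
    IR_LOW = 8'h8F; IMMED = 1; SALUD = 1;
    BR = 1;    #1 $display("BR=1 ALUB=%h", ALUB);        // prints FF8F
    BR = 0;    #1 $display("BR=0 ALUB=%h", ALUB);        // prints 000F
    SALUD = 0; #1 $display("SALUD=0 WDATA=%h", WDATA);   // prints 5555
    $finish;
  end

endmodule
```

The same low byte gives a negative offset when treated as an 8-bit branch immediate and a positive value when treated as a 5-bit immediate, which is why the Sign Extender needs BR.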


Appendix D Cadence End User Agreement


Appendix E Exercise Answer Sheets


Exercise 1: STUMP Assembler


Name:
Assembler code sequence:

Initial register states:
R0[15:0] = 0000000000000000 = 0x0000 = 0
R1[15:0] =
R2[15:0] =
R3[15:0] =
R4[15:0] =
R5[15:0] =
R6[15:0] =
R7[15:0] =
CC[3:0] =
Memory address of first instruction in sequence =


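Assembler Instruction:
(complete one table for each instruction in the sequence)

                               | binary | hex | decimal
Memory Address of Instruction  |        | 0x  |
Instruction Register           |        | 0x  |
R0                             |        | 0x  |
R1                             |        | 0x  |
R2                             |        | 0x  |
R3                             |        | 0x  |
R4                             |        | 0x  |
R5                             |        | 0x  |
R6                             |        | 0x  |
R7=PC                          |        | 0x  |
CC                             |        | 0x  |
Result Register                |        | 0x  |

For a Store Instruction:
Data Written (decimal):
Memory Address (decimal):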


Exercise 4: Signal usage charts for control


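Name:

Phase 1: Instruction Fetch

Signal     | ALU Operation | Branch | Load | Store
FETCH      |               |        |      |
EXE        |               |        |      |
LDREG      |               |        |      |
SRPA[2:0]  |               |        |      |
SRPB[2:0]  |               |        |      |
SWP[2:0]   |               |        |      |
BR         |               |        |      |
IMMED      |               |        |      |
LDCC       |               |        |      |
SALUD      |               |        |      |
MEMOE      |               |        |      |
MEMWR      |               |        |      |

Phase 2: Decode/Execute

Columns:
ALU Operation (a) Reg op Reg → Reg
ALU Operation (b) Reg op Immed → Reg
Branch            PC + Immed → PC
Load (a)          Reg+Reg → Addr; [Addr] → Reg
Load (b)          Reg+Immed → Addr; [Addr] → Reg
Store (a)         Reg+Reg → Addr; [Addr] ← Reg
Store (b)         Reg+Immed → Addr; [Addr] ← Reg

Signal     | ALU (a) | ALU (b) | Branch | Load (a) | Load (b) | Store (a) | Store (b)
FETCH      |         |         |        |          |          |           |
EXE        |         |         |        |          |          |           |
LDREG      |         |         |        |          |          |           |
SRPA[2:0]  |         |         |        |          |          |           |
SRPB[2:0]  |         |         |        |          |          |           |
SWP[2:0]   |         |         |        |          |          |           |
BR         |         |         |        |          |          |           |
IMMED      |         |         |        |          |          |           |
LDCC       |         |         |        |          |          |           |
SALUD      |         |         |        |          |          |           |
MEMOE      |         |         |        |          |          |           |
MEMWR      |         |         |        |          |          |           |

Phase 3: Writeback

Columns as for Phase 2.

Signal     | ALU (a) | ALU (b) | Branch | Load (a) | Load (b) | Store (a) | Store (b)
FETCH      |         |         |        |          |          |           |
EXE        |         |         |        |          |          |           |
LDREG      |         |         |        |          |          |           |
SRPA[2:0]  |         |         |        |          |          |           |
SRPB[2:0]  |         |         |        |          |          |           |
SWP[2:0]   |         |         |        |          |          |           |
BR         |         |         |        |          |          |           |
IMMED      |         |         |        |          |          |           |
LDCC       |         |         |        |          |          |           |
SALUD      |         |         |        |          |          |           |
MEMOE      |         |         |        |          |          |           |
MEMWR      |         |         |        |          |          |           |

Note: in Signal Usage Charts, use for don't care.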


Copy of Signal Usage Charts (make a copy and keep for use in Exercise 5)

This sheet repeats the three Exercise 4 charts (Phase 1: Instruction Fetch, Phase 2: Decode/Execute, Phase 3: Writeback) with the same control signals and the same instruction columns (ALU Operation, Branch, Load, Store).
