Design of a 32-bit RISC Microprocessor with Floating Point Unit
Design of a Floating Point Unit
Author: Adam Parsons S/N: 100653270
ELEC 4907
Supervisor: M. Shams
April 5, 2010
Microprocessor Design
April 5, 2010
Abstract
This fourth-year project presents and examines the design of a 32-bit RISC microprocessor with a floating point unit. The design presented includes contributions from Zain Zia, Chaiya See-toh, and Adam Parsons. This report covers professional engineering practices and project management techniques, but it centers mainly on the microprocessor and its design. It provides background information on the microprocessor and its importance to today's society. The more technical portion of the report focuses heavily on the Floating Point Unit, which can be viewed as a coprocessor to the microprocessor that was designed. It begins with how a microprocessor operates, followed by a more in-depth study of how a floating point unit is designed and operated. Finally, the results of the successful digital design testing are presented and explained, with suggestions for improvements and further optimization techniques.
Acknowledgements
My immediate thanks go to Maitham Shams (project supervisor), for his constant guidance. Under his instruction for this project, I have gained valuable skills that can be applied in the workplace.
I would also like to thank my group members Chaiya See-toh and Zain Zia, without whom this project would not have been completed. Their patience and dedication to hard work made this project a success, and they were a true pleasure to work with.
I would also like to thank all those I have met in my abundance of years at Carleton University. You have all kept me on the right track, as you constantly remind me of things I had often forgotten. I would also like to thank the creators of ASICWORLD.com, as well as AJDESIGNER.com, for without their guidance, I would be lost in the language of Verilog and floating point calculations.
Most of all I would like to thank my parents, who patiently stood by me in all my years of studies, although they don't always understand what I am supposed to be learning.
April 2010
Adam Parsons
This is for those who are patient. We're here for the long haul.
Table of Contents
Abstract
Acknowledgements
Table of Figures
Table of Equations
Table of Tables
List of Abbreviations
1.0 Introduction
  1.1 Purpose
    1.1.1 Motivation
    1.1.2 Applications
  1.2 Report Overview
2.0 Health and Safety
  2.1 Engineering Professionalism
  2.2 Project Management
3.0 Project Overview
  3.1 Design Specifications
  3.2 Design Methodology
4.0 Background of Floating Point Representation
  4.1 Floating Point Unit
  4.2 Addition and Subtraction
    4.2.1 Addition
    4.2.2 Subtraction
  4.3 Multiplication and Division
    4.3.1 Multiplier
    4.3.2 Division
  4.4 Float to Integer
  4.5 Integer to Float
  4.6 Power Approximation
  4.7 Square-Root
  4.8 Floating Point Control Unit
5.0 Digital Testing
  5.1 Structural Analysis
  5.2 Timing Analysis
  5.3 Implementation
6.0 Concluding Remarks
  6.1 Summary of Project Accomplishments
  6.2 Considerations for Future Work
References
Appendix A: Verilog Design Code
  Addition Module
  Subtraction Module
    Normalization Module
    24-bit Addition Module
  Multiplication Module
  Division Module
  Floating Point to Integer Conversion Module
  Integer to Floating Point Conversion Module
  Power Module
  Square Root Module
  Control Module
Appendix B: Digital Testing Results
  Standard Case Waveforms
  Corner Case Tables
Table of Figures
FIGURE 1: PROJECT SCHEDULE
FIGURE 2: PROCESSOR OVERVIEW
FIGURE 3: WORKLOAD PARTITIONING CHART
FIGURE 4: FLOATING POINT BINARY
FIGURE 5: FLOATING POINT BLOCK DIAGRAM
FIGURE 6: ADDITION/SUBTRACTION MODULE
FIGURE 7: CARRY LOOK-AHEAD ADDER
FIGURE 8: TWO'S COMPLEMENT
FIGURE 9: MULTIPLIER AND DIVIDER MODULE
FIGURE 10: MULTIPLICATION ALGORITHM
FIGURE 11: MULTIPLICATION BLOCK DIAGRAM
FIGURE 12: DIVISION BLOCK DIAGRAM
FIGURE 13: DIVISION ALGORITHM
FIGURE 14: FLOAT TO INTEGER BLOCK
FIGURE 15: INTEGER TO FLOAT DIAGRAM
FIGURE 16: LOG2 VS IEEE ESTIMATE
FIGURE 17: POWER UNIT
FIGURE 18: SQUAREROOT UNIT
FIGURE 19: FLOATING POINT CONTROL UNIT
FIGURE 20: ALTERA DE2 IMPLEMENTATION
Table of Equations
EQUATION 1
EQUATION 2
EQUATION 3
EQUATION 4
EQUATION 5
EQUATION 6
EQUATION 7
EQUATION 8
EQUATION 9
EQUATION 10
Table of Tables
TABLE 1: IEEE-754 SPECIAL REPRESENTATIONS
TABLE 2: LOG ESTIMATE ERROR
TABLE 3: LOG ESTIMATE ERROR CORRECTION
TABLE 4: STANDARD TEST CASE
TABLE 5: SPECIAL TEST CASES
TABLE 6: FAST TIMING ANALYSIS
TABLE 7: SLOW TIMING ANALYSIS
List of Abbreviations
CPU      Central Processing Unit
RISC     Reduced Instruction Set Computer
FPGA     Field Programmable Gate Array
OPCODE   Operational Code
ALU      Arithmetic Logic Unit
FPU      Floating Point Unit
MIPS     Microprocessor without Interlocked Pipeline Stages
NaN      Not a Number
INF      Infinity
FMAX     Maximum Frequency
TCO      Clock Output Time
TH       Hold Time
TSU      Clock Setup Time
Chapter 1
1.0 Introduction
The purpose of this report is to present and examine the design of a microprocessor. The project is to design a 32-bit RISC microprocessor with a floating point unit. The design presented includes contributions from Zain Zia, Chaiya See-toh, and Adam Parsons.
1.1 Purpose
Microprocessors are extremely small electrical devices built on an integrated circuit. They are the cornerstone upon which today's automated systems are built. Most notably, the microprocessor is used in the common computer, be it a PC or a Mac. There are many more applications in the modern world, and there is often a microprocessor designed specifically for each task. Their uses range from simple household devices such as washing machines and mobile phones to the automatic check-in booths in airports.
1.1.1 Motivation
As the microprocessor becomes more integrated into every aspect of daily life, it becomes more important to understand the design and implementation of the device. This allows for improvements and optimizations in order to maintain a competitive
applications of microprocessors require them to be fast, precise, and designed with minimal hardware.
1.1.2 Applications
The 32-bit RISC microprocessor with floating point unit is a more specialized device, but it still maintains a wide range of possible implementations. It can store and manipulate large data sets, and handle real number calculations that may be necessary in the field. These applications tend to be directed toward math-intensive operations, such as data processing, where its more specialized functionality provides faster and more accurate outputs than a general microprocessor. Because of this specialization, it is often encouraged to implement the processor as part of a multi-core processing set. This particular processor can be implemented within web controllers, graphics processors, and mobile GPS devices.
requirements are met. It also addresses the engineering professionalism pertaining to the project, through project management, workload partitioning, and workplace synergy. Chapter 3 begins to present the more technical aspects of the microprocessor and its design. This chapter gives an overview of the project, providing background information regarding the microprocessor, the design specifications, and the partitioning of the microprocessor components among the project members. The specialized main topic of the project is presented in Chapter 4. For this specific report, it provides in-depth technical details regarding the floating point unit. The individual modules of the device are explained, along with the algorithms and optimizations that were used to produce a high-performing floating point unit. In Chapter 5 the results from the digital design testing are displayed and analyzed. This chapter also contains explanations of the performance analysis and performance restrictions of the floating point unit. Chapter 6 concludes the report by summarizing the project's work and accomplishments, and possible applications for the 32-bit RISC microprocessor with floating point unit, or even simply the floating point coprocessor. This chapter also proposes future improvements to the processor.
Chapter 2
2.0 Health and Safety
Microprocessors are relatively safe devices to operate, but within the computer design lab it is still important to follow and respect general health and safety principles as regulated by the Carleton University Health-And-Safety document. Some of the relevant health and safety principles from the document include: using personal protective equipment at all times, using the equipment only for its designed purpose, keeping the lab supervisor informed of any unsafe condition, keeping track of the location and correct use of safety equipment, and determining potential hazards and appropriate safety precautions before beginning new operations.
As the microprocessor was implemented and tested on the ALTERA DE2 Development Board, extra precautions had to be considered to ensure a safe work environment. The following measures ensure that the board operates within its normal operating conditions while maintaining the health and safety of all project members.
Automatic testing was incorporated to check the integrity of the following units before the first execution: the system's Memory Units (RAM and ROM), Input and Output signal processing circuitry, the Arithmetic Logic Unit (ALU), Control Unit, and Registers.
Software was developed that monitors electrical parameters such as Current or Voltage in the Circuit at predetermined time intervals. When a fault is sensed, it sends a signal to the board which halts further execution and terminates the program. This circuitry continually tests for proper supply voltage to the microprocessor.
Overcurrent is an abnormal current greater than the full load value of the circuit. This can occur due to short circuits or overload currents in any unit.
Overload is an overcurrent which persists long enough to cause dangerous overheating. This can occur during a long start time, during multiple restarts in a short interval, and if the normal duty cycle of the processor is exceeded.
An Alarm Signal is generated by the board and the program execution is halted if an overload occurs.
The board was implemented in such a fashion that failure to execute the program disconnects the Voltage Source to prevent any false leakage of Current.
An asynchronous Reset Signal for the Microprocessor was designed as a manual override to reset all units in case of a danger of overload.
The Microprocessor is designed so that the algorithm cannot be altered by anyone except the designers themselves.
through emails so as to keep each other up to date with status reports and questions regarding project difficulties or confusion. Although there are reduced chances for unprofessional behavior during the development of a microprocessor, none occurred that truly impeded the quality of work or the professional decisions that had to be made for the completion of the project. Each group member's professional responsibilities aided in meeting each member's individually designated goals. They also enabled the achievement of the group's goal, which was to successfully design a microprocessor.
member arrived at a difficult design decision or had any other difficulty, either of the other group members was able to assist. The ability to perform the project is not something that truly falls under the project management of the group. This ability rests heavily upon the individual group member, as the software required to complete the project is available in several laboratories within the Department of Electronics at Carleton University; a free web version of the program was also available for use at home. The performance expectations were clearly laid out in the initial project proposal, as shown in Figure 1 below.
The partitioning of the workload relating to the project was decided during one of the initial group meetings that were supervised by M. Shams. Each portion was selected by, or agreed upon through compromise among, the individual group members so as to encourage each individual to work in the field that sparked the most personal interest, which would in turn increase productivity.
Chapter 3
3.0 Project Overview
Before discussing the more technical side of the design of a 32-bit RISC microprocessor with floating point unit, it is important to have a clear overview of the components of a microprocessor. A simple microprocessor is built from five basic integrated blocks, as shown in Figure 2. These are: Inputs/Outputs, Memory, Datapath, Control Unit, and Arithmetic Logic Unit.
Figure 2 shows the organization of the microprocessor, which is consistent across all types of processors. Every processor performs the same basic functions of fetching, decoding, and executing, which require all five blocks. The processor receives instructions from the Memory, which is responsible for storing the instruction sets as well as data sets. The flow of data between the Memory and the processor follows the implementation of the Datapath. The Datapath interprets the instruction signals between the Control Unit, the Memory, and the Input/Output devices. This interpretation of data is regulated by the Control Unit's output signals, which then branch to the Input/Output devices. The input and output devices usually consist of hardware such as a keyboard or a graphics display.
R-type Instruction: Arithmetic Instructions (Addition, Subtraction, Multiplication and Division of two operands) and Logical Instructions (a comparison of two operands).
Branch Instruction: Makes a jump to the provided Memory address by comparing two operands. Operands are compared for equality and if they are equal the branch is executed.
Load Instruction: Loads a data word from Memory into one of the specified registers in the processor.
Store Instruction: Stores a data word from a specified register into the specified Memory address.
The design follows the Von Neumann Architecture, which follows the standard FETCH, DECODE, EXECUTE pattern of microprocessors. This architecture allows the instructions and data to be stored within the same memory. It was chosen for its highly optimized instruction set, high-performance implementations, programmability (programs are easy to express), and reduction in the required hardware. It achieves this reduction by sharing the functional units while also implementing pipelining, and as a result a chip with a smaller silicon area and a lower operating power can be fabricated.
Chapter 4
4.0 Background of Floating Point Representation
Many basic microprocessors are unable to handle real number arithmetic and support only integer manipulations. Real number manipulation allows processors to handle rational, and possibly irrational, numbers. This is very important for data analysis and manipulation of various signals within Digital Signal Processing (DSP) devices. An important part of handling real numbers is scientific notation, which is a way of expressing real numbers that may be too large to be conveniently written in decimal notation. This notation is presented as
$[\text{fraction}] \times 10^{[\text{exponent}]}$
$[\text{real}] \times 10^{[\text{integer}]}$
More often than not, scientific notation is expressed in its normalized format, in which only the most significant digit of the real number sits to the left of the decimal point. This allows for easy comparison of the magnitudes of two numbers, as the magnitudes are expressed solely within the exponent of the notation.
$5.73_{ten} \times 10^{-4}$
$235.9722 \times 10^{8}$
The floating point representation of real binary values allows microprocessors to manipulate real numbers. This notation handles the fractional parts of real numbers through the placement of binary points¹ as well as scientific notation.
Examples of real binary numbers:
$110111.11_{two} = 55.75_{ten}$
$1011_{two} \times 2^{3}$ (scientific notation)
$1.0001_{two} \times 2^{-7}$ (normalized scientific notation)
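The examples above can be checked with a short Python sketch. It is only an illustration of how a binary point assigns negative powers of two to the bits on its right; the function name `binary_to_decimal` is hypothetical and is not part of the project's Verilog code.

```python
def binary_to_decimal(bits: str) -> float:
    """Convert a binary string with an optional binary point to its decimal value."""
    whole, _, frac = bits.partition(".")
    value = float(int(whole, 2)) if whole else 0.0
    # Each bit to the right of the binary point weighs 2^-1, 2^-2, 2^-3, ...
    for position, bit in enumerate(frac, start=1):
        if bit == "1":
            value += 2.0 ** -position
    return value


plain = binary_to_decimal("110111.11")                 # real binary number
scientific = binary_to_decimal("1011") * 2 ** 3        # scientific notation
normalized = binary_to_decimal("1.0001") * 2.0 ** -7   # normalized scientific notation
```

Here `plain` evaluates to 55.75, matching the first example, while the other two values are simply the listed significands scaled by their powers of two.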
There are different formats for handling floating point binary, such as the MIPS and IEEE-754 standards. In the design of a floating point unit, both require specific sizes for the exponent and the fraction. The sizes of the exponent and fraction (commonly referred to as the mantissa) are determined by the size of the fixed word. A large exponent would be ideal for a large range of numbers, while a larger fraction allows for a more precise representation of the numbers within the reduced range. For a 32-bit word neither is much of a problem, as there is a relatively large range with capabilities of significant precision.
¹ A binary point is the binary term for a decimal point, as we are now working in binary notation instead of decimal notation.
MIPS floating point representation was designed by MIPS Technologies:
$(-1)^{\text{sign}} \times [\text{fraction}] \times 2^{[\text{exponent}]}$
SIGN [1 bit] | EXPONENT [8 bits] | FRACTION [23 bits]
This format allows for 23 bits to express the fraction, with 8 bits expressing the exponent. The exponent holds a bias of 127, which allows the exponent to range from +127 to -127.
MIPS may not have many limitations, but it is not the best representation of floating point numbers for binary computing. A more commonly used standard is the IEEE-754 representation of floating point binary:
$(-1)^{\text{sign}} \times [1 + \text{fraction}] \times 2^{[\text{exponent}]}$
It still uses the same 32-bit format as MIPS, but the format assumes that the fraction is always normalized, which enables the most significant bit to be implied. This hidden bit allows the fraction to effectively be 24 bits long instead of 23 bits.
This format is preferred over the MIPS format mainly because it allows for special representations of certain values such as Inf, and NaN to prevent interrupts.
These special representations do not cover overflow and underflow exceptions. Overflow occurs when a positive exponent is too large to be represented, while underflow occurs when a negative exponent is too large in magnitude to be represented.
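As a rough illustration of the 32-bit layout and the special values discussed above, the following Python sketch splits a single-precision word into its sign, exponent, and fraction fields and labels the result. The field widths and the reserved exponent patterns are those of IEEE-754 single precision; the function names and the label strings are assumptions made for this example and do not come from the project's design.

```python
import struct


def decode_single(value: float):
    """Return (sign, biased exponent, fraction) of value stored as a 32-bit float."""
    (word,) = struct.unpack(">I", struct.pack(">f", value))
    sign = word >> 31                 # 1 bit
    exponent = (word >> 23) & 0xFF    # 8 bits, stored with a bias of 127
    fraction = word & 0x7FFFFF        # 23 bits, hidden leading 1 is not stored
    return sign, exponent, fraction


def classify(sign: int, exponent: int, fraction: int) -> str:
    """Label a decoded word (label names are illustrative only)."""
    if exponent == 0xFF:              # all-ones exponent is reserved
        if fraction:
            return "NaN"
        return "-Inf" if sign else "+Inf"
    if exponent == 0:                 # all-zeros exponent is reserved
        return "zero" if fraction == 0 else "denormalized"
    return "normalized"
```

For example, `decode_single(1.0)` yields a sign of 0, a biased exponent of 127 (true exponent 0), and a fraction of 0, since the leading 1 is implied rather than stored.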
Many of the algorithms that were utilized throughout the design of the floating point unit were created through basic arithmetic that can be done by hand.
Step 1: Compare the exponents of the two numbers and shift the smaller number to the right until the exponents match
The shift allows the two numbers to have the same exponent, which enables the numbers to be easily added together with a basic arithmetic adder/subtractor that could be designed from an ALU.
Step 2: Add or subtract the significands
The specific addition/subtraction function module is called according to the instruction implemented.
Step 3: Normalize the sum by shifting right or left
Normalization of the sum adjusts for overflow or underflow. This must be done because each floating point number is normalized so as to maintain consistency of the arithmetic algorithms.
Rounding the significand can be done to increase accuracy, but it was decided that it would slow the operational speed of the device for little gain over the relatively high accuracy already provided by a 23-bit mantissa. Truncation was performed instead, so as to maintain the high speeds at which the unit can operate.
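The alignment, addition, normalization, and truncation steps above can be sketched behaviorally in Python. This is a simplified model under stated assumptions, not the project's Verilog: it handles only two positive operands, takes unbiased exponents, and represents each significand as a 24-bit integer with the hidden bit already set. The name `fp_add_positive` is hypothetical.

```python
def fp_add_positive(exp_a: int, sig_a: int, exp_b: int, sig_b: int):
    """Add two positive values given as (unbiased exponent, 24-bit significand).

    Each operand represents sig * 2**(exp - 23), with bit 23 of sig set.
    """
    # Step 1: make A the operand with the larger exponent, then shift the
    # smaller operand right until the exponents match (lost bits are truncated).
    if exp_a < exp_b:
        exp_a, sig_a, exp_b, sig_b = exp_b, sig_b, exp_a, sig_a
    sig_b >>= exp_a - exp_b

    # Step 2: add the aligned significands.
    total = sig_a + sig_b
    exponent = exp_a

    # Step 3: normalize. A carry out of bit 23 means the sum is >= 2.0,
    # so shift right once and bump the exponent (the dropped bit is truncated).
    if total >> 24:
        total >>= 1
        exponent += 1
    return exponent, total
```

With these conventions, 1.0 + 0.5 is `fp_add_positive(0, 0x800000, -1, 0x800000)` and returns `(0, 0xC00000)`, that is 1.5. Because only same-sign addition is modeled, a left-shift normalization (needed after subtraction) never arises here.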
4.2.1 Addition
For the sake of simplicity, the addition of the significands can be done with a basic Carry-Save Adder (CSA). However, a Carry-Look Ahead Adder (CLA) produces results faster, as it calculates both the propagate and the generate signals for each group to avoid waiting for the ripple to determine the first group's generated carry. The group generate signal indicates that the group generates a carry, and is formed by passing the two input signals through an AND gate. It is computed in parallel with the group propagate signal, which determines whether an incoming carry will pass along. This signal is created by passing the group inputs through an OR gate.
In this project a 24-bit CLA was used to increase the speed of the function.
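The generate and propagate signals described above can be modeled in a few lines of Python. This is a behavioral sketch only: the 4-bit group size and the function name `cla_add` are assumptions, and carries are still handed from one group to the next here, whereas a hardware design may add a second look-ahead level across groups.

```python
def cla_add(a: int, b: int, width: int = 24, carry_in: int = 0):
    """Add two width-bit integers using per-bit generate/propagate signals.

    Returns (sum truncated to width bits, carry out). width must be a multiple of 4.
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # generate: a AND b
    p = [((a >> i) | (b >> i)) & 1 for i in range(width)]  # propagate: a OR b

    carries = [carry_in]
    for start in range(0, width, 4):
        c0 = carries[start]
        g0, g1, g2, g3 = g[start:start + 4]
        p0, p1, p2, p3 = p[start:start + 4]
        # Every carry in the group is written directly in terms of g, p and the
        # group carry-in, so no carry waits on the previous bit's carry.
        c1 = g0 | (p0 & c0)
        c2 = g1 | (p1 & g0) | (p1 & p0 & c0)
        c3 = g2 | (p2 & g1) | (p2 & p1 & g0) | (p2 & p1 & p0 & c0)
        c4 = (g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)
              | (p3 & p2 & p1 & p0 & c0))
        carries += [c1, c2, c3, c4]

    total = 0
    for i in range(width):
        sum_bit = ((a >> i) ^ (b >> i) ^ carries[i]) & 1
        total |= sum_bit << i
    return total, carries[width]
```

Calling `cla_add(0xFFFFFF, 1)` returns `(0, 1)`: every bit position propagates the incoming carry, so the 24-bit sum wraps to zero with a carry out.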
4.2.2 Subtraction
The subtraction of the significands uses the CLA from the previous module. As the difference between addition and subtraction is minimal, it was straightforward to change the addition module into a subtraction module.
The only technical change from addition to subtraction was that the mantissa of the subtrahend was converted into a negative value through two's complement manipulation.
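The two's complement step can be illustrated with a minimal Python sketch. It assumes a 24-bit significand width, and the names `twos_complement` and `subtract_significands` are hypothetical; in the actual design the addition would go through the CLA rather than Python's `+` operator.

```python
SIG_WIDTH = 24
SIG_MASK = (1 << SIG_WIDTH) - 1


def twos_complement(value: int) -> int:
    """Negate a 24-bit value: invert every bit, add one, keep 24 bits."""
    return (~value + 1) & SIG_MASK


def subtract_significands(minuend: int, subtrahend: int) -> int:
    """Compute minuend - subtrahend by adding the negated subtrahend."""
    return (minuend + twos_complement(subtrahend)) & SIG_MASK
```

For instance, `subtract_significands(0xC00000, 0x400000)` gives `0x800000`, which corresponds to 1.5 minus 0.5 when both significands share the same exponent.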
As with the Addition/Subtraction modules, the Multiplication and Division modules follow similar premises when dealing with floating point notation.
Step 1: Addition/Subtraction of exponents without bias. The exponents are added (for multiplication) or subtracted (for division), just as if this were done by hand.
Step 2: Manipulation of Significands. Multiplication or division of the significands is done at this stage, where a separate module is called to perform the specified operation.
Step 3: Check if Normalized and for Overflow. As binary multiplication produces an output whose width is the sum of the widths of the inputs, it is important to check whether the product/quotient is normalized, and to check the exponent for overflow.
Step 4: Rounding or Truncation. Due to the large size of the mantissa, as well as for the sake of speed, truncation was chosen; rounding was deemed unnecessary for a floating point number that already holds such precision.
Step 5: Set the Sign. The sign is set by passing the two sign bits through an XOR gate to produce the appropriate value.
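The five steps can be traced in a short behavioural model for the multiplication case. This is a sketch in Python, not the project's Verilog; the helper names are hypothetical, and it assumes normalized, non-zero inputs (zeros, infinities and NaNs are left to separate special-case handling).

```python
import struct


def f2b(x):
    """IEEE-754 single-precision bit pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def b2f(bits):
    """Inverse of f2b."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fmul_trunc(a_bits, b_bits):
    """Multiply two normalized floats with truncation. Returns (bits, overflow)."""
    ea, eb = (a_bits >> 23) & 0xFF, (b_bits >> 23) & 0xFF
    ma = (a_bits & 0x7FFFFF) | 0x800000       # restore the hidden leading one
    mb = (b_bits & 0x7FFFFF) | 0x800000
    exp = (ea - 127) + (eb - 127) + 127       # step 1: add exponents without bias
    prod = ma * mb                            # step 2: 48-bit significand product
    if prod >> 47:                            # step 3: product in [2, 4), renormalize
        prod >>= 1
        exp += 1
    overflow = exp >= 255 or exp <= 0
    frac = (prod >> 23) & 0x7FFFFF            # step 4: truncate to 23 fraction bits
    sign = (a_bits >> 31) ^ (b_bits >> 31)    # step 5: XOR of the sign bits
    return (sign << 31) | ((exp & 0xFF) << 23) | frac, overflow


bits, overflow = fmul_trunc(f2b(5.0), f2b(0.75))
print(b2f(bits), overflow)
```

Division follows the same outline with the exponents subtracted in step 1 and the significands divided in step 2.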
4.3.1 Multiplier
There are several algorithms for multiplication, but the rolled-out (unrolled) binary multiplier was used because, like the addition/subtraction modules, it was the easiest to understand and explain. A simple binary multiplier performs a shift and a summation for each bit of the multiplier. This can be implemented within a loop to conserve space within the chip design, which produces a synchronous circuit that needs 24 clock edges to complete. The rolled-out version implements the same basic algorithm, but instead of the synchronous loop each stage is laid out in hardware, producing the product in far fewer than 24 clock edges. This format also allows for easier implementation of pipelining circuitry to support multiple function calls simultaneously.
Step 1: Check the multiplier bit [n].
Step 2: If the multiplier bit [n] holds a value of 1, then the product is summed with the multiplicand and placed within the product register.
Step 3: Shift the multiplicand left by 1 bit.
Step 4: Shift the multiplier right by 1 bit.
Step 5: Check if the loop has stepped through each multiplier bit; if not, then step to the next bit (n+1) and repeat.
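The loop described by these steps can be modelled in a few lines. The sketch below is Python rather than Verilog, the function name is hypothetical, and each loop iteration stands in for one clock edge of the looped version (or one laid-out stage of the rolled-out version).

```python
def shift_add_multiply(multiplicand, multiplier, width=24):
    """Shift-and-add multiplication over `width` multiplier bits."""
    product = 0
    for _ in range(width):
        if multiplier & 1:           # steps 1-2: add when the current bit is 1
            product += multiplicand
        multiplicand <<= 1           # step 3: shift the multiplicand left
        multiplier >>= 1             # step 4: shift the multiplier right
    return product                   # step 5: loop ends after `width` bits


print(hex(shift_add_multiply(0xA00000, 0xC00000)))
```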
4.3.2 Division
The division algorithm is analogous to the multiplication algorithm and can be implemented in a very similar manner. The divider differs from the multiplier as implemented in that it was kept as an iterative loop.
Step 2a: If the remainder is greater than zero, the quotient is shifted by 1 bit, and the new LSB is set to a value of one.
Step 2b: If the remainder is less than zero, the quotient is shifted by 1 bit, the new LSB is set to a value of zero, and the remainder is restored.
Step 5: Check if the loop has stepped through each remainder bit, if not then
The loop was retained because, with the rolled-out multiplier already built, the looped divider provides a useful point of comparison during simulations and timing analysis.
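A looped restoring divider of this kind can be sketched as follows. This is a Python model under stated assumptions, not the project's Verilog: the function name is hypothetical, both operands are assumed to be normalized 24-bit significands, the remainder is shifted left instead of shifting the divisor right (the two are equivalent), and a trial difference of exactly zero is treated as non-negative.

```python
def restoring_divide(dividend, divisor, steps=24):
    """Restoring division of two significands, one quotient bit per iteration."""
    remainder = dividend
    quotient = 0
    for _ in range(steps):
        trial = remainder - divisor
        if trial >= 0:
            quotient = (quotient << 1) | 1   # subtraction succeeded: new LSB is 1
            remainder = trial
        else:
            quotient = quotient << 1         # new LSB is 0, remainder stays restored
        remainder <<= 1                      # move on to the next bit position
    return quotient


# 1.25 / 1.5: the first quotient bit has weight 2^0, the last 2^-23
print(hex(restoring_divide(0xA00000, 0xC00000)))
```

Because each quotient bit costs one iteration, a 24-bit quotient needs 24 passes through the loop, which is the behaviour the timing comparison with the rolled-out multiplier relies on.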
It does this by placing the fraction into a shift register that is twice as large as an integer register (2 x 32 bits), so as to maximize the size of the integers that can be produced. In order to produce an integer the exponent must be zero; therefore the large register is shifted left or right according to the value of the exponent to bring the exponent to zero. If the exponent is too large for the shift register to accommodate, the register is shifted as far right or left as possible and the exponent is adjusted accordingly.
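The numerator and denominator are formed by stepping through the bottom segment (32 bits) of the shift register while accumulating the value of each bit. As the bits are counted they follow the equations

$F = \sum_{n} b_n \left(\frac{1}{2}\right)^{n}$
Equation 1

$\frac{N_n}{D_n} = \frac{N_{n-1}}{D_{n-1}} + \frac{1}{2^{n}} = \frac{2^{n} N_{n-1} + D_{n-1}}{2^{n} D_{n-1}}, \quad N_0 = 0, \; D_0 = 1$
Equation 2

where $b_n$ is the n-th bit after the binary point, and the update in Equation 2 is applied only for bits that are set.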
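The cross-multiplication used to build the numerator and denominator can be illustrated with a short model. This is a Python sketch with a hypothetical function name; it assumes the fraction bits are supplied most significant first, with weights 1/2, 1/4, 1/8 and so on.

```python
from fractions import Fraction


def fraction_bits_to_ratio(bits):
    """Accumulate set fraction bits into an (unreduced) numerator/denominator."""
    num, den = 0, 1
    weight = 1
    for b in bits:
        weight *= 2                  # 2, 4, 8, ... for successive bit positions
        if b:
            # N/D + 1/weight = (weight*N + D) / (weight*D)
            num, den = weight * num + den, weight * den
    return num, den


num, den = fraction_bits_to_ratio([1, 1, 1])   # 1/2 + 1/4 + 1/8
print(num, den, Fraction(num, den))
```

The pair is not reduced as it is built (56/64 rather than 7/8 here), so the operands grow with every set bit. That growth is one plausible reason for the long delay observed through this converter in the timing analysis.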
The Integer to Float Module accepts its input in signed binary integer format and normalizes the integer, which provides it with an exponent value of its own. The importance of normalization was previously discussed in Section 4.0.
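Normalizing an integer into floating point format can be sketched as below. This is a Python model with a hypothetical function name; truncating the bits that do not fit in the 23-bit fraction is an assumption made to stay consistent with the truncation used elsewhere in the design.

```python
def int_to_float_bits(n):
    """Convert a signed integer to an IEEE-754 single bit pattern by normalization."""
    if n == 0:
        return 0
    sign = 1 if n < 0 else 0
    mag = abs(n)
    exp = mag.bit_length() - 1                # position of the leading one
    if exp > 23:
        frac = (mag >> (exp - 23)) & 0x7FFFFF  # too wide: drop the low bits
    else:
        frac = (mag << (23 - exp)) & 0x7FFFFF  # shift the leading one to bit 23
    return (sign << 31) | ((exp + 127) << 23) | frac


print(hex(int_to_float_bits(5)))    # 5 = 1.25 x 2^2
```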
The first issue was addressed by changing the Power Module into a Power Approximation Module. The Power Approximation Module uses the IEEE-754 binary representation of a 32-bit floating point number in its estimation of LOG2(X).
However, a problem occurs when the logarithmic value is manipulated further: a great deal of precision is lost in comparison with the actual value.
Quantity                              Real      Estimate      Lossy Estimate
X (Xinteger for the estimates)        5         1084227584    1084227584
Y = Log2(X) = Xinteger/2^23 - 127     2.3219    2.25          2
Z = 2*Y                               4.6439    4.5           4
Xinteger = (Z + 127) x 2^23           -         1103101952    1065353220
XFloat = 2^Z                          25        16            1
Table 2: Log Estimate Error
This issue can be resolved by shifting the value of Xinteger to the left a few binary points before passing it through the logarithmic estimate function. In this implementation of the algorithm the Xinteger was shifted by two places and the results can be seen in the table below.
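[Table: log estimate with the shifted Xinteger. Real values: Y = 2.3219, Z = 4.6439, 2^Z = 25. Estimate formulas: Y = (Xinteger x 100)/2^23 - 127 x 100 and Xinteger = ((Z + 127 x 100) x 2^23)/100. The estimate values are not legible in this copy.]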
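The effect of the scaling on the logarithmic estimate can be reproduced with a few lines of arithmetic. The sketch below is Python, not the project's Verilog; the function names are hypothetical, and the factor of 100 is taken from the `log` assignment in the Power module listing.

```python
import struct


def float_bits(x):
    """IEEE-754 single-precision bit pattern of x as an integer (Xinteger)."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def log2_estimate(x):
    """Estimate with full fractional precision: Xinteger / 2^23 - 127."""
    return float_bits(x) / 2**23 - 127


def log2_estimate_lossy(x):
    """Same estimate in integer arithmetic: the fraction is discarded."""
    return float_bits(x) // 2**23 - 127


def log2_estimate_x100(x):
    """Integer arithmetic on a value scaled by 100, keeping two decimal places."""
    return float_bits(x) * 100 // 2**23 - 127 * 100


print(float_bits(5.0))            # 1084227584
print(log2_estimate(5.0))         # 2.25
print(log2_estimate_lossy(5.0))   # 2
print(log2_estimate_x100(5.0))    # 225, i.e. 2.25 scaled by 100
```

For X = 5 the true value is log2(5) = 2.3219, so the scaled estimate stays within about 0.07 of it while the unscaled integer estimate is off by more than 0.3.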
The accuracy of the power module's estimate has greatly increased with this change. It can be improved further by shifting the initial Xinteger by several more places. The second issue was resolved by utilizing the Float to Integer Converter Module. This module converts the binary real exponent into an integer format that is easier to manipulate.
+ 10
Equation 4
With the logarithmic estimate provided, the manipulation into a power module becomes as simple as multiplication and division of an integer. Example:
[Equations 5 to 8: worked example of the power calculation through log2. The equation bodies are not legible in this copy.]
These calculations are laid out in the block diagram in Figure 17, which shows the flow of the individual steps that make up the power approximation module.
The power block is incapable of handling exponents outside the range of -4.2950e+009 to +4.2950e+009, as these numbers are too large for the algorithm to operate properly.
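The power approximation can be modelled with integer arithmetic only. The sketch below is Python, not the project's Verilog: the function names are hypothetical, the exponent B is assumed to arrive already split into an integer part and a numerator/denominator pair, and combining the two parts in the log domain is an assumption about the intent rather than a transcription of the Power module listing. It applies to bases of at least 1, where the log estimate is not negative.

```python
import struct


def bits_to_float(bits):
    """Interpret a 32-bit integer as an IEEE-754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def power_estimate_bits(a_bits, integer, num, den):
    """Estimate A^B for B = integer + num/den; returns the result bit pattern."""
    log100 = a_bits * 100 // 2**23 - 127 * 100       # 100 * log2(A), estimated
    z100 = log100 * integer + log100 * num // den    # 100 * B * log2(A)
    return (z100 + 127 * 100) * 2**23 // 100         # back from the log domain


FIVE = 0x40A00000  # bit pattern of 5.0

print(bits_to_float(power_estimate_bits(FIVE, 5, 0, 1)))   # 5^5, estimated
print(bits_to_float(power_estimate_bits(FIVE, 0, 3, 4)))   # 5^0.75, estimated
```

Under these assumptions the model gives 2560 for 5^5 (real value 3125) and roughly 3.36 for 5^0.75 (real value 3.3437), which shows how the estimation error grows with the size of the exponent.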
4.7 Square-Root
There are several iterative methods (e.g. Newton's Method) for developing the square-root estimate of a binary real number. The issue, once again, is that these methods take several iterations. For this reason, the Square-Root Module utilizes the same method of logarithmic approximation as the Power Module. It is much faster than the Power Module, as it does not rely upon the Float to Integer Converter. It simply follows the formulas:

$\mathrm{log\_est} = \frac{1}{2}\left(\log_2 X\right)$
Equation 9

$\sqrt{X} = 2^{\mathrm{log\_est}}$
Equation 10
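The two formulas reduce to a halving of the estimated logarithm. The sketch below is a Python model with hypothetical function names; it assumes the same scaling by 100 that the Power module listing applies to its log estimate, and inputs of at least 1 so that the estimate is not negative.

```python
import struct


def bits_to_float(bits):
    """Interpret a 32-bit integer as an IEEE-754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def sqrt_estimate_bits(x_bits):
    """Square-root estimate through the scaled log2 approximation."""
    log100 = x_bits * 100 // 2**23 - 127 * 100   # 100 * log2(X), estimated
    half = log100 // 2                           # Equation 9: halve the logarithm
    return (half + 127 * 100) * 2**23 // 100     # Equation 10: back to a float


bits = sqrt_estimate_bits(0x40A00000)            # X = 5.0
print(format(bits, "032b"), bits_to_float(bits))
```

For X = 5 this yields approximately 2.24 against a real value of 2.2361, with no iteration required.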
for special cases. Although IEEE-754 floating point representation was designed to handle certain special cases, it was deemed better to err on the side of caution.
The exceptions the control unit is designed to catch are cases where the inputs or outputs would clearly be Zeros, NaNs or INFs.
For example:
Input x Zero = Zero
Input + Inf = Inf
Input / Zero = NaN
After the control unit checks for special cases, it calls the individual module that corresponds to the opcode received.
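The kind of check performed before a module is called can be sketched as follows. This is a Python illustration with hypothetical names; it encodes only the three example rules given above (which are the report's rules and do not match IEEE-754 in every case) and returns nothing when the operation should be passed on to an arithmetic module.

```python
def classify(bits):
    """Classify a 32-bit float pattern as 'zero', 'inf', 'nan' or 'other'."""
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0 and frac == 0:
        return "zero"                      # sign bit is ignored: +0 and -0
    if exp == 0xFF:
        return "inf" if frac == 0 else "nan"
    return "other"


def special_case(op, a_bits, b_bits):
    """Return 'zero', 'inf' or 'nan' when no module call is needed, else None."""
    a, b = classify(a_bits), classify(b_bits)
    if "nan" in (a, b):
        return "nan"
    if op == "mul" and "zero" in (a, b):
        return "zero"                      # Input x Zero = Zero
    if op == "add" and "inf" in (a, b):
        return "inf"                       # Input + Inf = Inf
    if op == "div" and b == "zero":
        return "nan"                       # Input / Zero = NaN
    return None                            # hand over to the arithmetic module


FIVE, ZERO, INF = 0x40A00000, 0x00000000, 0x7F800000
print(special_case("mul", FIVE, ZERO), special_case("add", FIVE, INF),
      special_case("div", FIVE, ZERO), special_case("add", FIVE, FIVE))
```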
Chapter 5
5.0 Digital Testing
After the complete coprocessor was designed, the overall digital testing began. Two types of digital design testing were done on the design: structural analysis and timing analysis.
The first test case shown in the table below is a standard test case, which is comfortably within the operational range of the floating point unit's parameters. This test case shows that the floating point unit is operating properly under reasonable conditions.
Standard case: (Input1 = 5, Input2 = 0.75)
        Real Value    Floating Point Value
A       5             0_10000001_01000000000000000000000
B       0.75          0_01111110_10000000000000000000000
Add     5.75          0_10000001_01110000000000000000000
Sub     4.25          0_10000001_00010000000000000000000
Mul     3.75          0_10000000_11100000000000000000000
Div     6.6667        0_10000001_10101010101010101010100
Pow     3.3437        0_10000000_10101110000101000111101
SQRT    2.2361        0_10000000_00011110101110000101000
Table 4: Standard Test Case
More specific cases were also used to test the corners of the design. A few results of the specific cases that were used are shown in the table below:
              A    B      ADD    SUB     MUL    DIV    POWER
Real Value    5    5      10     0       25     1      3125
FPU Value     -    -      10     0       25     1      2560
Real Value    5    -5     0      10      -25    -1     3.1605e-018
FPU Value     -    -      0      10      -25    -1     NaN
Real Value    5    0      5      5       0      NaN    1
FPU Value     -    -      5      5       0      NaN    1
Real Value    5    Inf    Inf    -Inf    Inf    0      Inf
FPU Value     -    -      Inf    -Inf    Inf    0      Inf
Table 5: Special Test Cases
Several additional cases were tested, with the results posted in Appendix B. These cases test the corners of the design, which range from the smallest numbers the FPU should be able to handle all the way to the largest.
Fast Model
Type                      Time
Worst-case tsu            4.702 ns
Worst-case tco            11.824 ns
Worst-case tpd            10.560 ns
Worst-case th             4.808 ns
Worst-case Minimum tco    4.231 ns
Worst-case Minimum tpd    4.286 ns
Clock Setup: 'clk'        4.88 MHz ( period = 204.804 ns )
The maximum operating frequency in the Fast Timing Model is a slow 4.88 MHz, while in the Slow Timing Model the maximum operating frequency is an even slower 2.21 MHz. Table 6 clearly shows that the Float to Integer Module used within the Power Module is by far the slowest module, and it greatly limits the highest operating frequency of the device.
Slow Model
Type                  Time                                From                                             To
Worst-case tsu        9.168 ns                            opcode[0]                                        subB[30]
Worst-case tco        24.432 ns                           mulB[24]                                         valueout[30]
Worst-case tpd        20.806 ns                           B[31]                                            valueout[29]
Worst-case th         9.778 ns                            A[0]                                             mulA[0]
Clock Setup: 'clk'    2.21 MHz ( period = 453.352 ns )    Power:power|float2int:float2pow|numerator[18]    Power:power|normFr[0]
The timing analysis was then repeated without the Power Module's Float to Integer Converter, which produced a maximum frequency of 88.75 MHz in the Slow Model and an fmax of 199.80 MHz in the Fast Timing Analysis. The slowest clock setup time was then due to the Subtractor Module needing to convert to two's complement before the value passes through the binary adder. Although removing the two's complement would turn the subtractor into another floating point adder, curiosity took over, and the change resulted in impressive improvements in speed: the Slow Model analysis produced an fmax of 144.7 MHz, while the Fast Model analysis produced more than double that speed, with an fmax of 320.82 MHz.
5.3 Implementation
The coprocessor was implemented onto the ALTERA DE2 development board as shown in Figure 20. Due to the lack of inputs provided by the board it was unreasonable to create a complex form of setting the input values for the device for live-testing. Instead a set of preset inputs were assigned for purpose of presentation.
The switches determined the opcode, and thereby set the operation to be performed by the device. The push buttons were set as the reset input for the device, for use when a new opcode was to be entered on the board. The outputs were displayed on both the small LCD and the 18 LEDs located above the switches. Because the size of the LCD did not allow the floating point unit to display large real numbers, the 18 LEDs displayed the output in floating point binary format. The eight green LEDs showed the exponent value, while the rest displayed a truncated version of the mantissa.
Chapter 6
6.0 Concluding Remarks
This concluding chapter gives a brief review of the project and emphasizes a few key points that developed during the course of the year.
The addition and subtraction modules utilized the fastest basic binary addition algorithm. The multiplication module is optimized for the ability to be pipelined, while the divider utilized a slow looping algorithm. The Power module used IEEE logarithmic estimation to improve performance, but was slowed down considerably by the Float to Integer Converter that it required to fully operate.
The digital design was put under test and analyzed to optimize its performance characteristics. There were a few small bugs along the way, but the floating point unit successfully passed the digital device testing, although it is perceptibly slow compared to commercial FPUs, which operate at speeds around 250 MHz.
The multiplier is ready to be pipelined, and several tests are required to see how well the coprocessor would combine with the regular 32-bit RISC microprocessor.
References
[1] Carleton University, Laboratory Health and Safety Manual. [Online]. Available at: http://www.doe.carleton.ca/undergrads/health-and-safety.pdf [Accessed: March 28 2010].
[2] D. A. Patterson and J. L. Hennessy, Computer Organization and Design, 3rd Ed. San Francisco: Morgan Kaufmann Publishers.
[3] Carleton University, Microprocessor Systems, ELEC 4601. [Online]. Available at: http://www.doe.carleton.ca/~shams/ELEC4601/Course_Notes.pdf [Accessed: Oct 17 2009].
[4] Carleton University, Digital Design Flow, ELEC 4706. [Online]. Available at: http://www.doe.carleton.ca/courses/ELEC4706/protected/class%20material/08-0910%20LECTURES [Accessed: Oct 13 2009].
[5] Carleton University, Binary Manipulation, SYSC 3006. [Online]. Available at: http://www.sce.carleton.ca/courses/sysc-3006/f09/Part3-BinaryManipulations.pdf [Accessed: Oct 12 2009].
[6] ASIC WORLD, Verilog Tutorials, Deepak Kumar Tala. [Online]. Available at: http://www.asic-world.com/verilog/veritut.html [Accessed: Sept 25 2009].
[7] Goldberg, David. 1991. What Every Computer Scientist Should Know About Floating-Point Arithmetic. [Online]. Available at: http://delivery.acm.org/10.1145/110000/103163/p5-goldberg.pdf [Accessed: Oct 5 2009].
                shift<=fractionA;
                noshift<=fractionB;
                expLarge<=expA;
                diffreg<=8'b0;
                sign<=signA;
                snorm<=1'b1;
            end else if(expA>expB)begin
                shift<=fractionB;
                noshift<=fractionA;
                expLarge<=expA;
                diffreg<=expA-expB;
                sign<=signA;
                snorm<=1'b1;
            end else if(expB>expA)begin
                shift<=fractionA;
                noshift<=fractionB;
                expLarge<=expB;
                diffreg<=expB-expA;
                sign<=signB;
                snorm<=1'b1;
            end else begin
                shift<=shift; noshift<=noshift; expLarge<=expLarge;
                diffreg<=diffreg; sign<=sign; snorm<=1'b0;
            end
        end else begin
            shift<=shift; noshift<=noshift; expLarge<=expLarge;
            diffreg<=diffreg; sign<=sign; snorm<=1'b0;
        end
    end

    normalizer addnorm(expLarge,addoutput,expNorm,normLarge,clk,rst,expoverflow,snorm,fnorm);
    // check for overflow?
    assign overflow=expoverflow;
    // output exponent
    assign OUT[30:23]=expNorm;//expNorm;
    // output truncated mantissa
    assign OUT[22:0]=normLarge[22:0];
    // output sign
    assign OUT[31]=sign;
    assign finish=fnorm;
endmodule
Subtraction Module
module subtractor (A,B,OUT,clk,rst,overflow,start,finish);
    input clk,rst,start;
    input [31:0] A,B;
    output [31:0] OUT;
    output overflow,finish;
    reg [7:0] expLarge,diffreg;
    reg [23:0] shift,noshift,out;
    reg sign,snorm;
    wire signA,signB,expoverflow,fnorm;
    wire [24:0] suboutput,normLarge;
    wire [7:0] expA,expB,diff,expNorm;
    wire [23:0] fractionA,fractionB,normout,shiftout,shiftout1;
    // significands with the hidden leading one restored
    assign fractionA[22:0]=A[22:0];
    assign fractionA[23]=1;
    assign fractionB[22:0]=B[22:0];
    assign fractionB[23]=1;
    assign expA=A[30:23];
    assign expB=B[30:23];
    assign diff=diffreg;
    assign signA=A[31];
    assign signB=B[31];

    //1.0 ALU Difference and shift
    SHIFTR8 SHIFT8sub(shift[23:0],shiftout[23:0],diff);
    always@(posedge clk or posedge rst) begin
        if(rst) begin
            shift<=24'b0;
            noshift<=24'b0;
            expLarge<=8'b0;
            diffreg<=8'b0;
            sign<=1'b0;
            snorm<=1'b0;
        end else if(start)begin
            if(expA==expB)begin
                shift<=fractionA;
                noshift<=fractionB;
                expLarge<=expA;
                diffreg<=8'b0;
                sign<=1'b0;
                snorm<=1'b1;
            end else if(expA>expB)begin
                shift<=fractionB;
                noshift<=fractionA;
                expLarge<=expA;
                diffreg<=expA-expB;
                sign<=1'b0;
                snorm<=1'b1;
            end else if(expB>expA)begin
                shift<=fractionA;
                noshift<=fractionB;
                expLarge<=expB;
                diffreg<=expB-expA;
                sign<=1'b1;
                snorm<=1'b1;
            end else begin
                shift<=shift; noshift<=noshift; expLarge<=expLarge;
                diffreg<=diffreg; sign<=sign; snorm<=1'b0;
            end
        end else begin
            shift<=shift; noshift<=noshift; expLarge<=expLarge;
            diffreg<=diffreg; sign<=sign; snorm<=1'b0;
        end
    end

    //2.0 Add Significands
    // this is the slowest part by 100MHz i blame the INV
    // two's complement of the shifted significand, taken over the full
    // 24 bits so that the +1 can carry across every bit position
    wire [23:0] negtemp;
    assign negtemp=~shiftout+1'b1;
    bitadder sub(noshift,negtemp,1'b0,suboutput);

    // Normalize
    normalizer addnorm(expLarge,suboutput,expNorm,normLarge,clk,rst,expoverflow,snorm,fnorm);
    // check for overflow?
    assign overflow=expoverflow;
    // output exponent
    assign OUT[30:23]=expNorm;//expNorm;
    // output truncated mantissa
    assign OUT[22:0]=normLarge[22:0];
    // output sign
    assign OUT[31]=sign;
    assign finish=fnorm;
endmodule
module normalizer(expin,in,expout,out,clk,rst,overflow,start,finish);
    input clk,rst,start;
    input [7:0]expin;
    input [24:0]in;
    output [23:0]out;
    output [7:0] expout;
    output finish,overflow;
    reg active,first;
    reg [24:0] regF,fregF;
    reg [8:0] regE,fregE;
    always@(posedge clk or posedge rst)begin
        if(rst)begin
            regF<=25'b0;
            regE<=9'b0; // clear all nine bits, including the overflow bit
            fregF<=25'b0;
            fregE<=9'b0;
            active<=1'b0;
            first<=1'b0;
        end else if(start)begin
            if(!first)begin
                // load the inputs on the first cycle
                fregF<=fregF;
                fregE<=fregE;
                regF<=in[24:0];
                regE<={1'b0,expin[7:0]};
                active<=1'b1;
                first<=1'b1;
            end else if(regF[24]==1'b1)begin
                // carry out of the adder: shift right
                regF<=regF>>1'b1;
                regE<=regE+1'b1; // Increment Exponent
                active<=1'b1;
                first<=1'b1;
            end else if(regF[23]==1'b0 && regF[24]==1'b0)begin
                //shift left
                regF<=regF<<1'b1;
                regE<=regE-1'b1; // Decrement Exponent
                active<=1'b1;
                first<=1'b1;
            end else begin
                // normalized: latch the result
                regE<=regE;
                regF<=regF;
                fregE<=regE;
                fregF<=regF;
                active<=1'b0;
                first<=1'b1;
            end
        end else begin
            regE<=regE;
            regF<=regF;
            fregE<=fregE;
            fregF<=fregF;
            active<=1'b0;
            first<=1'b0;
        end
    end
    assign out=fregF[23:0];
    assign expout=fregE[7:0];
    // the ninth exponent bit flags an exponent overflow
    assign overflow=fregE[8];
    assign finish=~active;
endmodule
module bitadder(addinA,addinB,carryin,sum);
    input[23:0] addinA,addinB;
    input carryin;
    output [24:0]sum;
    wire carryout1,carryout2,carryout3,carryout4,carryout5,carryout6;
    wire [3:0] sum1,sum2,sum3,sum4,sum5,sum6;
    // six 4-bit lookahead blocks chained through their carries
    fourbitadder adder1(addinA[3:0],addinB[3:0],carryin,sum1,carryout1);
    fourbitadder adder2(addinA[7:4],addinB[7:4],carryout1,sum2,carryout2);
    fourbitadder adder3(addinA[11:8],addinB[11:8],carryout2,sum3,carryout3);
    fourbitadder adder4(addinA[15:12],addinB[15:12],carryout3,sum4,carryout4);
    fourbitadder adder5(addinA[19:16],addinB[19:16],carryout4,sum5,carryout5);
    fourbitadder adder6(addinA[23:20],addinB[23:20],carryout5,sum6,carryout6);
    assign sum[24] = carryout6;
    assign sum[23:20] = sum6;
    assign sum[19:16] = sum5;
    assign sum[15:12] = sum4;
    assign sum[11:8] = sum3;
    assign sum[7:4] = sum2;
    assign sum[3:0] = sum1;
    // assign test=addinA+addinB; // debug check; test is not declared
endmodule
module fourbitadder(addinA,addinB,carryin,sum,carryout);
    input[3:0] addinA,addinB;
    input carryin;
    output [3:0]sum;
    output carryout;
    wire[3:0] generation,propagation;
    wire [2:0] carrybit;
    assign sum[0] = propagation[0]^carryin;
    assign generation = addinA&addinB;
    assign propagation = addinA^addinB;
    assign carrybit[0] = generation[0]|(propagation[0]&carryin);
    assign carrybit[1] = generation[1]|(generation[0]&propagation[1])|(propagation[0]&propagation[1]&carryin);
    assign carrybit[2] = generation[2]|(generation[1]&propagation[2])|(generation[0]&propagation[1]&propagation[2])
                         |(propagation[0]&propagation[1]&propagation[2]&carryin);
    assign sum[3:1] = propagation[3:1]^carrybit[2:0];
    // carry out of the block; the listing declares carryout but never drives it
    assign carryout = generation[3]|(propagation[3]&carrybit[2]);
endmodule
Multiplication Module
module floatmul(A,B,OUT,clk,rst,overflow,start,finish);
    input clk,rst,start;
    input [31:0] A,B;
    output [31:0] OUT;
    output finish,overflow;
    reg active;
    reg [47:0] Mplier,Mcand,product,d,e;
    reg [7:0]counter;
    wire [23:0] fractionA,fractionB;
    wire [7:0] expA,expB;
    wire [8:0] expsum;
    assign expA=A[30:23]-127;
    assign expB=B[30:23]-127;
    assign fractionA={1'b1,A[22:0]};
    assign fractionB={1'b1,B[22:0]};
    // adding exponents without bias
    assign expsum = ((A[30:23]-127)+(B[30:23]-127))+127;
    // check for overflow
    assign overflow = expsum[8];
    // multiplying significands
    always@(posedge clk)begin
        if(rst)begin
            d=0;
            e=0;
            active=1'b0;
        end else if(start) begin
            active=1'b1;
            d={({32{fractionA[1]}}&fractionB)&({32{fractionA[0]}}&fractionB),({32{fractionA[1]}}&fractionB)^({32{fractionA[0]}}&fractionB)};
            e[0]=d[0];
            d={({32{fractionA[2]}}&fractionB)&d[32:1],({32{fractionA[2]}}&fractionB)^d[32:1]}; e[1]=d[0];
            d={({32{fractionA[3]}}&fractionB)&d[32:1],({32{fractionA[3]}}&fractionB)^d[32:1]}; e[2]=d[0];
            d={({32{fractionA[4]}}&fractionB)&d[32:1],({32{fractionA[4]}}&fractionB)^d[32:1]}; e[3]=d[0];
            d={({32{fractionA[5]}}&fractionB)&d[32:1],({32{fractionA[5]}}&fractionB)^d[32:1]}; e[4]=d[0];
            d={({32{fractionA[6]}}&fractionB)&d[32:1],({32{fractionA[6]}}&fractionB)^d[32:1]}; e[5]=d[0];
            d={({32{fractionA[7]}}&fractionB)&d[32:1],({32{fractionA[7]}}&fractionB)^d[32:1]}; e[6]=d[0];
            d={({32{fractionA[8]}}&fractionB)&d[32:1],({32{fractionA[8]}}&fractionB)^d[32:1]}; e[7]=d[0];
            d={({32{fractionA[9]}}&fractionB)&d[32:1],({32{fractionA[9]}}&fractionB)^d[32:1]}; e[8]=d[0];
            d={({32{fractionA[10]}}&fractionB)&d[32:1],({32{fractionA[10]}}&fractionB)^d[32:1]}; e[9]=d[0];
            //-----------10-----------
            d={({32{fractionA[11]}}&fractionB)&d[32:1],({32{fractionA[11]}}&fractionB)^d[32:1]}; e[10]=d[0];
            d={({32{fractionA[12]}}&fractionB)&d[32:1],({32{fractionA[12]}}&fractionB)^d[32:1]}; e[11]=d[0];
            d={({32{fractionA[13]}}&fractionB)&d[32:1],({32{fractionA[13]}}&fractionB)^d[32:1]}; e[12]=d[0];
            d={({32{fractionA[14]}}&fractionB)&d[32:1],({32{fractionA[14]}}&fractionB)^d[32:1]}; e[13]=d[0];
            d={({32{fractionA[15]}}&fractionB)&d[32:1],({32{fractionA[15]}}&fractionB)^d[32:1]}; e[14]=d[0];
            d={({32{fractionA[16]}}&fractionB)&d[32:1],({32{fractionA[16]}}&fractionB)^d[32:1]}; e[15]=d[0];
            d={({32{fractionA[17]}}&fractionB)&d[32:1],({32{fractionA[17]}}&fractionB)^d[32:1]}; e[16]=d[0];
            d={({32{fractionA[18]}}&fractionB)&d[32:1],({32{fractionA[18]}}&fractionB)^d[32:1]}; e[17]=d[0];
            d={({32{fractionA[19]}}&fractionB)&d[32:1],({32{fractionA[19]}}&fractionB)^d[32:1]}; e[18]=d[0];
            //---------20-----------
            d={({32{fractionA[20]}}&fractionB)&d[32:1],({32{fractionA[20]}}&fractionB)^d[32:1]}; e[19]=d[0];
            d={({32{fractionA[21]}}&fractionB)&d[32:1],({32{fractionA[21]}}&fractionB)^d[32:1]}; e[20]=d[0];
            d={({32{fractionA[22]}}&fractionB)&d[32:1],({32{fractionA[22]}}&fractionB)^d[32:1]}; e[21]=d[0];
            d={({32{fractionA[23]}}&fractionB)&d[32:1],({32{fractionA[23]}}&fractionB)^d[32:1]}; e[22]=d[0];
            //---again!!! for N+1 iterations or good luck
            d={({32{fractionA[23]}}&fractionB)&d[32:1],({32{fractionA[23]}}&fractionB)^d[32:1]}; e[22]=d[0];
            //---------
            e[47:23]=d;
            active=1'b0;
        end else begin
            d=0;
            e=0;
            active=1'b0;
        end
    end
    // truncation
    // output the mantissa
    assign OUT[22:0]=e[45:22];//e[45:23];//46:22
    // output exponent
    assign OUT[30:23]=expsum[7:0];
    // set the sign
    // xor the signs together
    assign OUT[31]={A[31] ^ B[31]};
    assign finish=~active;
endmodule
Division Module
module floatdiv(A,B,OUT,clk,rst,overflow,start,finish);//floatdiv
    input clk,rst,start;
    input[31:0] A,B;
    output[31:0] OUT;
    output overflow,finish;
    wire [7:0] expA,expB;
    wire [8:0] expsub;
    assign expA=A[30:23]-127;
    assign expB=B[30:23]-127;
    reg active;
    reg [46:0] remainder,divisorreg;//46:0
    reg [23:0] quotientreg,outreg;
    reg [7:0] counter;
    //adding exponents without bias
    assign expsub =((A[30:23]-127)-(B[30:23]-127))+127;
    // check for overflow
    assign overflow = expsub[8];
    //the divider starts here
    always@(posedge clk or posedge rst) begin
        if(rst)begin
            remainder<={22'b0,1'b1,A[22:0]};
            quotientreg<=24'b0;
            divisorreg<={1'b1,B[22:0],23'b0};
            counter<=7'b0;
            active<='b0;
            outreg<=24'b0;
        end else if(start)begin
            if(counter<25)begin//25
                remainder<=remainder-divisorreg;
                if(remainder[46])begin
                    // shift quotient to the left
                    quotientreg<={quotientreg[22:0],1'b0};
                end else begin// restore if less than zero
                    remainder<=remainder+divisorreg;
                    // shift quotient to the left
                    quotientreg<={quotientreg[22:0],1'b1};
                end
                // shift divisor to the right
                divisorreg<={1'b0,divisorreg[46:1]};
                counter<=counter+1'b1;
                active<=1'b1;
                outreg<=outreg;
            end else begin
                quotientreg<=quotientreg;
                divisorreg<=divisorreg;
                remainder<=remainder;
                counter<=counter;
                active<=1'b0;
                outreg<=quotientreg;
            end
        end else begin
            quotientreg<=quotientreg;
            outreg<=outreg;
            divisorreg<=divisorreg;
            remainder<=remainder;
            counter<=counter;
            active<=1'b0;
        end
    end
    assign OUT[30:23]=expsub[7:0];
    assign finish=~active;
    assign OUT[22:0]=outreg[22:0];
    // set the sign
    // xor the signs together
    assign OUT[31]={A[31] ^ B[31]};
endmodule
    //the positive/negative
    //shift A into integer and fraction
    always@(posedge rst or posedge clk)begin
        if (rst) begin
            fractionshift<=64'b0;
            intexp<=8'b0;
        end else if(expIN<= 159 && expIN>= 95)begin
            if(expIN<127)begin
                fractionshift<=fraction>>(-diff);
                intexp<=expIN+(-diff);
            end else begin
                fractionshift<=fraction<<diff;
                intexp<=expIN-diff;
            end
        end else if(expIN>159)begin // for a large integer
            fractionshift<=fraction<<31;
            intexp<=expIN-5'b11111; // decrement exponent
        end else if(expIN<95)begin // for a small fraction
            fractionshift<=fraction>>31;
            intexp<=expIN+5'b11111; // increment exponent
        end else begin
            fractionshift<=fraction;
            intexp<=intexp;
        end
    end
    // find the numerator and denominator integers of the floating point
    // by adding the fractions 1/2+1/4+1/8..etc = 0.875=7/8
    always@(posedge clk or posedge rst)begin
        if(rst)begin
            counter<=32;//0
            bincount<=1;
            numerator<=1'b0;
            denominator<=1'b1;
            active<=1'b0;
        end else if(start)begin
            if(counter>0)begin
                counter<=counter-1'b1;
                bincount<=bincount*2'b10;
                active<=1'b1;
                if(fractionIN[counter])begin //cross multiplying
                    denominator<=bincount*denominator;
                    numerator<=bincount*numerator+denominator;
                end else begin
                    numerator<=numerator;
                    denominator<=denominator;
                end
            end else begin
                counter<=counter;
                bincount<=bincount;
                numerator<=numerator;
                denominator<=denominator;
                active<=1'b0;
            end
        else begin
            shiftreg<=shiftreg; shiftexp<=shiftexp;
            fshiftreg<=shiftreg; fshiftexp<=shiftexp;
            active<=1'b0; first<=1'b0; sign<=sign;
        end
        shiftreg<=shiftreg; shiftexp<=shiftexp;
        fshiftreg<=fshiftreg; fshiftexp<=fshiftexp;
        active<=active; first<=first; sign<=sign;
    end
Power Module
module Power(A,B,OUT,clk,rst,start,finish);
    input clk,rst,start;
    input [31:0] A,B;
    output finish;
    output [31:0] OUT;
    wire [63:0] log,mullog,mullog2;
    wire [31:0] integerOUT,numerator,denominator,OUTmul;
    wire [7:0] expPow;
    wire ffloat;
    reg [63:0] normInt,normFr;
    reg [31:0] check,checkout;
    reg active;
    // split the exponent B into an integer part and a numerator/denominator pair
    float2int float2pow(B,clk,rst,integerOUT,numerator,denominator,sign,expPow,start,ffloat);
    assign log=(A*100)/8388608-127*100; // convert to log A/(2^(23))-127;
    assign mullog=(numerator*log/denominator); //+log apply to the power of B
    assign mullog2=log*integerOUT; //seperately include the integer
    //check for invalids
    always@(posedge clk or posedge rst)begin
        if(rst)begin
            normInt<=63'b0;
            normFr<=63'b0;
            active<=1'b1;
        end else if(ffloat)begin
            if(A<=10'd1065353216)begin // if negative or zero
                normInt<=32'b1111111100000000000000000000000;// NaN
                normFr<=32'b0;
                active<=1'b0;
            end
            else if(numerator==0)begin
                normFr<=32'b0;
                normInt<=((mullog2+127*100)*8388608)/100;
                active<=1'b0;
            end
            else if(integerOUT==0)begin
                normInt<=32'b0;
                normFr<=((mullog+127*100)*8388608)/100;
                active<=1'b0;
            end
            else begin //because it can't do log2(0)
                normInt<=((mullog2+127*100)*8388608)/100; // convert from log (A+127)*(2^(23));
                normFr<=((mullog+127*100)*8388608)/100;
                active<=1'b0;
            end
        end else begin
            // hold both registers until the converter has finished
            normInt<=normInt;
            normFr<=normFr;
            active<=1'b1;
        end
    end
    always@(posedge clk or posedge rst)begin
        if(rst)begin
            check<=32'b0;
            checkout<=32'b0;
        end else if(!active)begin
            if(B[31]==1'b1)begin
                check<=normFr+normInt;
                checkout<={check[31],(~check[30:23]+1'b1),check[22:0]};
            end else begin
                check<=check;
                checkout<=normFr+normInt;
            end
        end else begin
            check<=check;
            checkout<=checkout;
        end
    end
    assign OUT=checkout;
    assign finish=~active;
endmodule
Control Module
module Control(opcode,A,B,clk,rst,valueout);
    input clk,rst;
    input[2:0] opcode;
    input[31:0] A,B;
    output[31:0] valueout;
    reg [31:0]OUT;
    reg [31:0] addA,addB,subA,subB,divA,divB,mulA,mulB,powA,powB,sqrtA;
    reg sdiv,spow,sadd,ssub,smul,ssqrt,finish;
    wire [31:0] addOUT,subOUT,OUTdiv,OUTmul,OUTpow,root;
    // declare constants (IEEE-754 single precision bit patterns)
    wire[31:0] Inf,NaN,Zero,One;
    wire /*fpow,fdiv,fadd,fsub,fmul,fsqrt,*/addof,subof,mulof,divof;
    assign Inf=32'h7F800000;  // exponent all ones, fraction zero
    assign NaN=32'h7FC00000;  // exponent all ones, fraction non-zero
    assign One=32'h3F800000;  // exponent 127, fraction zero
    assign Zero=32'h00000000;
    adder addition(addA,addB,addOUT,clk,rst,addof,sadd,fadd); //A+B
    subtractor subtraction(subA,subB,subOUT,clk,rst,subof,ssub,fsub); //A-B
    floatmul floatmulA(mulA,mulB,OUTmul,clk,rst,mulof,smul,fmul);// A*B
    floatdiv floatdivA(divA,divB,OUTdiv,clk,rst,divof,sdiv,fdiv);// A/B
    Power power(powA,powB,OUTpow,clk,rst,spow,fpow);//A^B
    SQRT squareroot(sqrtA,root,clk,rst);
    // check for Zeros NaN & INFs inputs
    // check for Special Case Statements
    always@(posedge clk or posedge opcode)begin
        // opcode case statements
        case(opcode)
        0: begin
            sdiv<=1'b0;spow<=1'b0;sadd<=1'b0;ssub<=1'b0;smul<=1'b0;ssqrt<=1'b0;
            addA<=1'b0;addB<=1'b0;subA<=1'b0;subB<=1'b0;divA<=1'b0;divB<=1'b0;
            mulA<=1'b0;mulB<=1'b0;powA<=1'b0;powB<=1'b0;sqrtA<=1'b0;
            OUT<=NaN;
        end
        //For the Adder ===============================================================
        1: begin
            if(A[30:0]==Zero[30:0])
                OUT<=B;
            else if(B[30:0]==Zero[30:0])
                OUT<=A;
            else if(A==Inf || B==Inf)
                OUT<=Inf[30:0];
            else if(A[30:0]==B[30:0] && A[31]!=B[31]) //A+(-A) or (-A)+A
                OUT<=Zero;
            else if(A[31]==1'b1 && B[31]==1'b0)begin //-A+B = B-A
                subB<={1'b0,A[30:0]};
                subA<=B;
                ssub<=1'b1; // start the subtractor, as in the A+-B branch
                OUT<=subOUT;
            end
            else if (A[31]==1'b0 && B[31]==1'b1)begin //A+-B = A-B
                subB<={1'b0,B[30:0]};
                subA<=A;
                ssub<=1'b1;
                OUT<=subOUT;
            end
            else if (A[31]==1'b1 && B[31]==1'b1)begin //-A + -B = -(A+B)
                addB<={1'b0,B[30:0]};
                addA<={1'b0,A[30:0]};
                OUT<={1'b1,addOUT[30:0]};
                sadd<=1'b1;
            end
            else begin
                addA<=A;
                addB<=B;
                sadd<=1'b1;
                OUT<=addOUT;
            end
        end
        //For the Subtractor =============================================================
        2: begin
            if(A==B)
                OUT<=Zero;// just make it zero
            else if(A[30:0]==Zero[30:0])
                OUT<={~B[31],B[30:0]};
            else if(B[30:0]==Zero[30:0])
                OUT<=A;
            else if(A[31]==1'b1 && B[31]==1'b0)begin //-A - B = -(B+A)
                addA<={1'b0,A[30:0]};
                addB<=B;
                sadd<=1'b1;
                OUT<={1'b1,addOUT[30:0]};
            end
            else if (A[31]==1'b0 && B[31]==1'b1)begin //A - -B = A+B
                addB<={1'b0,B[30:0]};
                addA<=A;
                sadd<=1'b1;
                OUT<={1'b0,addOUT[30:0]};
            end
            else if (A[31]==1'b1 && B[31]==1'b1)begin //- A - -B = B-A
                subA<={1'b0,B[30:0]};
                subB<={1'b0,A[30:0]};
                ssub<=1'b1;
                OUT<=subOUT;
            end
            else begin
                subA<=A;
                subB<=B;
                ssub<=1'b1;
                OUT<=subOUT;
            end
        end
        // For the Mulitplier =============================================================
        3: begin
            if (A[30:0]==Zero[30:0]|| B[30:0]==Zero[30:0]) //if(A*Zero)
                OUT<={{A[31]^B[31]},Zero[30:0]}; // Zero
            else if(A[30:0]==Inf[30:0]||B[30:0]==Inf[30:0]) //if(A*Inf)
                OUT<={{A[31]^B[31]},Inf[30:0]}; // Inf
            else begin
                mulA<=A;
                mulB<=B;
                smul<=1'b1;
                OUT<=OUTmul;
            end
        end
        // For the Divider ============================================================
        4: begin
            // varieties of Zero or NaN or Inf
            if(A[30:0]==Zero[30:0])
                OUT<=Zero; // Zero
            else if(B[30:0]==Zero[30:0])
                OUT<={{A[31]^B[31]},Inf[30:0]}; // Inf
            else if(B[30:0]==Inf[30:0]) //if(A/Inf)
                OUT<=Zero; // Zero
            else if(A[30:0]==B[30:0]) // 1
                OUT[31:0]<={{A[31]^B[31]},One[30:0]};//One
            else begin
                divA<=A;
                divB<=B;
                sdiv<=1'b1;
                OUT<=OUTdiv;
            end
        end
        // For the Power ==============================================================
        5: begin
            if(A[31])
                OUT<=NaN;
            else if (A[30:0]==Zero[30:0]) // +/- Zero
                OUT<=Zero;
            if(B[30:0]==Zero[30:0])
                OUT<=One;
            else begin
                powA<=A;
                powB<=B;
                spow<=1'b1;
                if(fpow)
                    OUT<=OUTpow;
            end
        end
        // For the SquareRoot ============================================================
        6: begin
            if(A[31])
                OUT<=NaN;
            else if (A[30:0]==Zero[30:0]) // +/- Zero
                OUT<=Zero;
            else begin
                sqrtA<=A;
                OUT<=root;
            end
        end
        // Default Case ===============================================================
        default: begin
            sdiv<=1'b0;spow<=1'b0;sadd<=1'b0;ssub<=1'b0;smul<=1'b0;ssqrt<=1'b0;
            addA<=1'b0;addB<=1'b0;subA<=1'b0;subB<=1'b0;divA<=1'b0;divB<=1'b0;
            mulA<=1'b0;mulB<=1'b0;powA<=1'b0;powB<=1'b0;sqrtA<=1'b0;
            OUT<=NaN;
        end
        endcase
    end
    // output the output value
    assign valueout=OUT;
endmodule
Subtraction
Multiplication
Division
Power
Square-root
* note: the corner cases are too large for the power unit algorithm to handle