Arvind, Joel Emer, Li-Shiuan Peh Abhinav Agarwal, Myron King Arvind, Joel Emer, Sang Woo Jun, Murali Vijayaraghavan Asif Khan, Elliott Fleming Prof Jihong Kim & students at Seoul Nation University Prof Derek Chiou, University of Texas at Austin
Volunteers
External
CS220, CO
Instruction Execution
Execution of an instruction involves:
1. Instruction fetch
2. Decode
3. Register fetch
4. ALU operation
5. Memory operation (optional)
6. Write back
SMIPS ISA
Implementing an ISA
Instruction fetch: requires an instruction memory and a PC
Decode: requires understanding the instruction format
Register fetch: requires interaction with a register file with a specific number of read/write ports
ALU: must have the ability to carry out the specified ops
Memory operations: requires a data memory
Write-back: requires interaction with the register file
Update the PC: requires arithmetic ops to calculate the PC and the condition
A single-cycle implementation
[Figure: block diagram. Labels: PC, CPU, Register File, iMem, dMem]
Register (n-bit)
[Figure: timing diagram for an n-bit register. Labels: Clk, en, data values D1 to D5, Undef]
[Figure: n-bit registers. Labels: D, Q1, wr0, rd0, wr1, rd1]
Register File
[Figure: register file. Labels: R0, R1, R2, R3, D, Q, WA, RA]
Multiple Buses
[Figure: register file with multiple buses. Labels: RD, RA, RB, D, Q0, Q1, WE, Clk, wr0, RdA0, RdB0]
Instruction formats
Field widths in bits are given in parentheses.
R-type: opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | func (6)
I-type: opcode (6) | rs (5) | rt (5) | immediate (16)
J-type: opcode (6) | target (26)
Only three formats, but the fields are used differently by different types of instructions.
Instruction formats
Computational Instructions
R-type: 0 (6) | rs (5) | rt (5) | rd (5) | 0 (5) | func (6)        rd <- (rs) func (rt)
I-type: opcode (6) | rs (5) | rt (5) | immediate (16)             rt <- (rs) op immediate
Load/Store Instructions
opcode [31:26] (6) | rs [25:21] (5) | rt [20:16] (5) | displacement [15:0] (16)
Addressing mode: (rs) + displacement
rs is the base register; rt is the destination of a Load or the source for a Store.
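The addressing mode above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and treating the 16-bit displacement as a signed two's-complement value (as in MIPS) is an assumption the slide does not state.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number.

    Assumption: the displacement is signed, as in MIPS.
    """
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def effective_address(base: int, displacement: int) -> int:
    """Return (rs) + displacement, wrapped to a 32-bit address."""
    return (base + sign_extend16(displacement)) & MASK32
```

For example, with the base register holding 0x1000, a displacement of 32 addresses 0x1020, and a displacement field of 0xFFFC (that is, -4) addresses 0x0FFC.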
Control Instructions
Conditional (on GPR) PC-relative branch: BEQZ, BNEZ
opcode (6) | rs (5) | (5) | offset (16)
JR, JALR
J, JAL
opcode | target
target address = {PC<31:28>, target*4}
range: 256 MB
jump-&-link stores PC+4 into the link register (R31)
Examples of Instructions
000000 (6) | 10001 (5) | 10010 (5) | 01000 (5) | 00000 (5) | 100000 (6)
op = 000000 => R-type instruction
func = 100000 = 32 = function code for ADD
rs = 17, rt = 18, rd = 8
add $8, $17, $18
add $t0, $s1, $s2
Examples of Instructions
001000 (6) | 11101 (5) | 11101 (5) | 0000 0000 0000 0100 (16)
op = 001000 => I-type instruction
001000 = 8 = opcode for ADDI
rs = 29, rt = 29, immediate = 4
addi $29, $29, 4
addi $sp, $sp, 4
$sp = $sp + 4
Examples of Instructions
100011 (6) | 10011 (5) | 01000 (5) | 0000 0000 0010 0000 (16)
op = 100011 => I-type instruction
100011 = 35 = opcode for LW
rs = 19, rt = 8, immediate = 32
lw $8, 32($19)
lw $t0, 32($s3)
Examples of Instructions
000010 (6) | 80000 (value) (26)
op = 000010 => J-type instruction
000010 = 2 = opcode for J
Address = 80000
j 80000
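The field extraction used in these examples can be checked with a small Python sketch. The function name and the dictionary layout are hypothetical. As a simplification, only opcode 0 (R-type) and opcode 2 (J) are recognized specially; every other opcode is treated as I-type, which is an assumption rather than the full decode table.

```python
def decode_fields(word: int) -> dict:
    """Split a 32-bit instruction word into its format-specific fields."""
    opcode = (word >> 26) & 0x3F
    if opcode == 0:
        # R-type: opcode | rs | rt | rd | shamt | func
        return {
            "format": "R",
            "rs": (word >> 21) & 0x1F,
            "rt": (word >> 16) & 0x1F,
            "rd": (word >> 11) & 0x1F,
            "shamt": (word >> 6) & 0x1F,
            "func": word & 0x3F,
        }
    if opcode == 2:
        # J-type: opcode | 26-bit target (only J is handled in this sketch)
        return {"format": "J", "opcode": opcode, "target": word & 0x3FFFFFF}
    # Everything else is treated as I-type: opcode | rs | rt | immediate
    return {
        "format": "I",
        "opcode": opcode,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "immediate": word & 0xFFFF,
    }
```

The add example packs to the word 0x02324020, the addi example to 0x23BD0004, and the lw example to 0x8E680020; passing each to decode_fields returns the rs, rt, rd and immediate values listed in the slides.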
A single-cycle implementation
[Figure: block diagram. Labels: PC, CPU (fetch & execute), RF, iMem, dMem]
A register file with 2 read ports and a write port.
An instruction memory, separate from the data memory, so that we can fetch an instruction as well as perform a data operation (Load/Store) on the memory.
Single-Cycle SMIPS
[Figure: single-cycle datapath. Labels: PC, +4, Inst Memory, Decode, Register File, Execute, Data Memory]
The datapath is shown only for convenience; it will be derived automatically from the high-level textual description.
Fetched one at a time
Function code (aka opcode)
Addresses of source registers
Address of the destination register
Write enable
Let's say all our instructions are of fixed size (32 bits, or 4 bytes wide). Thus the PC needs to be incremented by four after each instruction fetch. If the address space is 32 bits, the width of the PC is 32 bits. In reality, the last two bits of the PC can always be 0, so we need just a 30-bit register.
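A minimal Python sketch of the 30-bit PC idea follows. The names are hypothetical; the only facts taken from the slide are that instructions are 4 bytes wide and that the two low address bits are always 0.

```python
PC_BITS = 30
PC_MASK = (1 << PC_BITS) - 1

def byte_address(pc: int) -> int:
    """Form the 32-bit byte address {PC, 2'b00} from the 30-bit PC."""
    return (pc & PC_MASK) << 2

def next_pc(pc: int) -> int:
    """Advance to the next instruction: +1 on the 30-bit PC is +4 in bytes."""
    return (pc + 1) & PC_MASK
```

With this representation, incrementing the stored PC by 1 moves the byte address from 0x100 to 0x104, so the +4 adder becomes a +1 adder on a narrower register.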
Instruction Fetch
At each instruction cycle, an instruction is read from memory at the address held in the PC, and the PC is advanced by the size of the instruction.
The instruction read is stored in the instruction register (IR).
The current instruction in the IR is executed.
Instructions
Arithmetic & Logic: ADD, SUB, AND, OR, XOR
Shift and rotate: SLL, SRL, SRA, ROL, ROR
Comparison: SLT, SGT, SLE, SGE, UGT, ULT, ULE, UGE
Immediate variants: ADDI, SUBI, ORI, ANDI, XORI, SLLI, SRLI, SRAI, SLTI, SGTI, SLEI, SGEI, ULTI, UGTI, ULEI, UGEI, LHI (to load half of a register)
Load/store: LDSB, LDSW, LDUB, LDUW, LDL, STB, STW, STL (load signed byte, word; unsigned byte, word; long; store byte, word, long)
Instructions
BNEZ, BEQZ (branch if rs <> 0 or rs = 0)
If (cond) then PC = PC + offset (remember: the PC is a 30-bit register)
JR, JALR (jump based on register)
Target: (rs & 0xFFFFFFFC) + 4*offset, where offset is a signed number
PC = Target >> 2 (shift right by 2)
JALR also saves {PC, 2'b00} in R31
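The next-PC rules for branches and register jumps can be sketched in Python as follows. The function names are hypothetical, the offset is assumed to be a 16-bit signed field, and the pc argument is assumed to be the 30-bit PC value at the time the branch executes.

```python
PC_MASK = (1 << 30) - 1
MASK32 = 0xFFFFFFFF

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a signed number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def branch_pc(pc: int, rs_value: int, offset16: int, branch_if_zero: bool) -> int:
    """BEQZ (branch_if_zero=True) or BNEZ (False): PC = PC + offset when taken."""
    taken = (rs_value == 0) if branch_if_zero else (rs_value != 0)
    if taken:
        return (pc + sign_extend(offset16, 16)) & PC_MASK
    return pc

def jump_register_pc(rs_value: int, offset16: int) -> int:
    """JR/JALR: Target = (rs & 0xFFFFFFFC) + 4*offset, then PC = Target >> 2."""
    target = ((rs_value & 0xFFFFFFFC) + 4 * sign_extend(offset16, 16)) & MASK32
    return target >> 2

def link_value(pc: int) -> int:
    """Value JALR saves in R31: {PC, 2'b00}."""
    return (pc & PC_MASK) << 2
```

Because the PC is kept as a 30-bit word index, the branch offset is added without scaling, while the register-based target is a byte address that has to be shifted right by 2 before it is written to the PC.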
Instructions
J type instructions
J, JAL (jump and link): jump to a new target
Target = {PC[29:26], offset}
PC = Target
The JAL instruction additionally saves {PC, 2'b00} in R31
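The concatenation Target = {PC[29:26], offset} can be written out as bit operations. This is a sketch with hypothetical names, assuming a 30-bit PC and a 26-bit offset field as described above.

```python
PC_MASK = (1 << 30) - 1

def jump_target(pc: int, offset26: int) -> int:
    """Return {PC[29:26], offset}: keep the top 4 PC bits, replace the low 26."""
    upper = (pc >> 26) & 0xF
    return (upper << 26) | (offset26 & 0x3FFFFFF)

def jal_link_value(pc: int) -> int:
    """Value JAL saves in R31: {PC, 2'b00}."""
    return (pc & PC_MASK) << 2
```

The top four bits of the PC are preserved, so a J or JAL can only reach targets inside the current region of the address space selected by those bits.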
ALU functions
ADD, SUB
Left shift (LS), Arithmetic right shift (RSA), Logical right shift (RSL) by a count provided by the second operand (only five bits are used)
(Bitwise) AND, OR, XOR
Comparison:
Signed greater than (SGT), Signed less than (SLT), Signed less than or equal to (SLE), Signed greater than or equal to (SGE)
Unsigned greater than (UGT), Unsigned less than (ULT), Unsigned less than or equal to (ULE), Unsigned greater than or equal to (UGE)
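The ALU functions listed above can be modeled behaviorally in Python. This is a sketch, not the hardware description: the function name is hypothetical, operands are assumed to be 32 bits wide, and comparisons are assumed to produce 1 for true and 0 for false.

```python
MASK32 = 0xFFFFFFFF

def to_signed(value: int) -> int:
    """Interpret a 32-bit pattern as a signed two's-complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def alu(func: str, a: int, b: int) -> int:
    """Behavioral model of the ALU; returns a 32-bit result."""
    a &= MASK32
    b &= MASK32
    count = b & 0x1F                  # only five bits of the shift count are used
    sa, sb = to_signed(a), to_signed(b)
    results = {
        "ADD": a + b,
        "SUB": a - b,
        "LS": a << count,             # left shift
        "RSA": sa >> count,           # arithmetic right shift keeps the sign
        "RSL": a >> count,            # logical right shift fills with zeros
        "AND": a & b,
        "OR": a | b,
        "XOR": a ^ b,
        "SGT": int(sa > sb),
        "SLT": int(sa < sb),
        "SLE": int(sa <= sb),
        "SGE": int(sa >= sb),
        "UGT": int(a > b),
        "ULT": int(a < b),
        "ULE": int(a <= b),
        "UGE": int(a >= b),
    }
    return results[func] & MASK32
```

The signed and unsigned comparisons differ only in how the same bit pattern is interpreted: 0xFFFFFFFF is less than 1 as a signed value (-1) but greater than 1 as an unsigned value.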
Memory Interface
Address: address of the location to read/write
Size: number of bytes to read/write
Control: Read or Write
DataIn: data to memory (for writing)
DataOut: data from memory (for reading)
For an instruction fetch:
Address: {PC, 2'b00}
Size: 4 bytes
Control: Read
DataIn: don't care
DataOut: provided by the memory; gets latched in the IR
Decoding Instructions:
[Figure: decode block. Labels: instruction (Bit#(32)), fields 31:26, 5:0, 20:16, 15:0, 25:0, brComp (BrFunc), ext]
Pure combinational logic: derived automatically from the high-level description
Executing Instructions
[Figure: execute block. Labels: iType, dInst, rVal1, rVal2, pc, ALU, dst, data]
Pure combinational logic
Instruction Fetch
[Figure: instruction fetch datapath. Labels: PC (30 bit), Add 1, Convert to 32 bits, Read Control, Memory, Clock, IR, Instruction]
Datapath for R and R-I type triadic instructions (non load store)
[Figure: datapath. Labels: Register file (32 regs) with RS1, RS2, RD, Din, D1out, D2out, WE, Clock; RDSel with IR[20:16] and IR[15:11]; Sign Extension to 32 bits with IR[15:0]; Oprnd2Sel; ALU Control (combinatorial circuit) with IR[5:0] and IR[31:26]]
Adding Load-Store
[Figure: datapath with load-store. Labels: Register file (32 regs) with RS1, RS2, RD, Din, D1out, D2out, WE, Clock; RDSel with IR[20:16] and IR[15:11]; ALU Control (combinatorial circuit) with IR[5:0] and IR[31:26]; IR[15:0]; Memory with Address, Data, WE; DCnvt]
[Figure: datapath with instruction memory access. Labels: Oprnd1Sel, Oprnd2Sel, DinSel2, ALU Control, D1out, D2out, RS1, RS2, Sign Extension to 32 bits, ExtnCntl, IR[25:0], Data to the Memory, Read Control, WE, Add 00 at end, Convert to 32 bits, Memory, Instruction]
Control Signals
RDSel: 2 bits. Selector for RD. Possible values: RD31, RD20-16, RD15-11
DinSel1: values DinALU, DinMEM
DinSel2: values DinPC, DinS1
WE: values YES, NO
Control Signals
Inst       RDSel    DinSel1  DinSel2  WE   NextPC   Oprnd1Sel  OprndSel2  ExtnCntl  ALU
R Triadic  IR15-11  DinALU   DinS1    YES  PCPlus1  RegFile    RegFile    X         IR[4:0]
ADDI       IR20-16  DinALU   DinS1    YES  PCPlus1  RegFile    IR         EXT16     ADD
LHI        IR20-16  DinALU   DinS1    YES  PCPlus1  RegFile    IR         EXT16     DM
LDSB       IR20-16  DinMEM   DinS1    YES  PCPlus1  RegFile    IR         EXT16     ADD
LDUB       IR20-16  DinMEM   DinS1    YES  PCPlus1  RegFile    IR         EXT16     ADD
LDUW       IR20-16  DinMEM   DinS1    YES  PCPlus1  RegFile    IR         EXT16     ADD
LDL        IR20-16  DinMEM   DinS1    YES  PCPlus1  RegFile    IR         EXT16     ADD
STB        X        X        X        NO   PCPlus1  RegFile    IR         EXT16     ADD
BNEZ       X        X        X        NO   ****     PC         IR         EXT16     ADD4
JR         X        X        X        NO   PCALU    RegFile    IR         EXT16     ADD4
JALR       RD31     X        DinPC    YES  PCALU    RegFile    IR         EXT16     ADD4
J          X        X        X        NO   PCALU    PC         IR         EXT32     ADD4
JAL        RD31     X        DinPC    YES  PCALU    PC         IR         EXT32     ADD4
Control Signals
For Load instructions, data from memory is to be converted. The DCnvt control is used for this.
LDSB: DCnvt = ByteSExtend
LDUB: DCnvt = Byte0Pad
LDSW: DCnvt = WordSExtend
LDUW: DCnvt = Word0Pad
LDL: DCnvt = None
Other instructions: DCnvt is don't care.
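The DCnvt conversions can be illustrated with a short Python sketch. The function name is hypothetical. Two assumptions are made that the slides do not state: a "word" is 16 bits wide (with "long" being 32 bits), and the loaded byte or word arrives in the low-order bits of the memory data.

```python
MASK32 = 0xFFFFFFFF

def dcnvt(mode: str, data: int) -> int:
    """Convert data read from memory into the 32-bit value written to a register.

    Assumptions: a word is 16 bits, and the loaded item sits in the low bits.
    """
    data &= MASK32
    if mode == "ByteSExtend":
        byte = data & 0xFF
        return (byte - 0x100) & MASK32 if byte & 0x80 else byte
    if mode == "Byte0Pad":
        return data & 0xFF
    if mode == "WordSExtend":
        word = data & 0xFFFF
        return (word - 0x10000) & MASK32 if word & 0x8000 else word
    if mode == "Word0Pad":
        return data & 0xFFFF
    if mode == "None":
        return data
    raise ValueError(f"unknown DCnvt mode: {mode}")
```

Under these assumptions, loading the byte 0x80 yields 0xFFFFFF80 with LDSB and 0x00000080 with LDUB.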
Harvard architecture
What we just finished is a Harvard architecture for the SMIPS processor.
How do we load code? Code must be loaded into program memory, but it is not writable.
Only one operation can take place at a time: instruction fetch or data read/write.
Memory Interface
It must be possible to delay an instruction fetch if there is a data read/write request. In order to do this:
Load the IR with a NOP-like instruction such as SLL $0, $0, 0, and do not change the PC.
The SLL function code is 0, so loading the IR with this NOP is the same as clearing the register to 0s.
Branch instructions
When a branch instruction (BNEZ, BEQZ, J, JR, JAL, JALR) is being executed, the PC contains the address of the instruction next to the branch.
One solution: set WE = NO and make no update to the PC register.
Other solution: let the instruction execute. Delayed branch semantics.
Solution 1
At cycle i, if the instruction executed in cycle i-1 resulted in a taken branch, the instruction at cycle i must not be executed.
Solution 1
Program:
[100] SUB 0, R5, R5
[104] JAL X
[108] ADD ...
      ...
      SUB R30, 4, R30
      ...
changes in the[...]
IR and PC per cycle:
IR   PC
SUB  104
JAL  108
ADD  224
STL  228
IR and PC per cycle with the IR cleared:
IR   PC
SUB  104
JAL  108
NOP  224   (PC updated, IR cleared)
STL  228   (IR updated)
[Figure: datapath with NOP feeding. Labels: Oprnd1Sel, Oprnd2Sel, DinSel2, ALU Control, D1out, D2out, RS1, RS2, Sign Extension to 32 bits, ExtnCntl, IR[25:0], Data to the Memory, Read Control, WE, Add 00 at end, Convert to 32 bits, Memory, NOPFeed, Instruction]
NOP Feeding
A NOP is fed if the current instruction is:
LDxx or STxx
J, JR, JAL, JALR
BNEZ (if RS <> 0)
BEQZ (if RS = 0)
On average, every 6th or 7th instruction in a program is a branch-type instruction.
During the execution of LD and ST instructions, no instruction is read from memory.
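The NOP-feed decision can be sketched as a small Python predicate. The names are hypothetical. The sketch assumes that a conditional branch feeds a NOP exactly when it is taken, which follows from BNEZ branching on rs <> 0 and BEQZ branching on rs = 0.

```python
LOAD_STORE = {"LDSB", "LDSW", "LDUB", "LDUW", "LDL", "STB", "STW", "STL"}
JUMPS = {"J", "JR", "JAL", "JALR"}
NOP_WORD = 0  # SLL $0, $0, 0 encodes as all zeros, so clearing the IR feeds a NOP

def feed_nop(mnemonic: str, rs_value: int = 0) -> bool:
    """Return True if the IR should be cleared instead of loading the next instruction."""
    if mnemonic in LOAD_STORE or mnemonic in JUMPS:
        return True
    if mnemonic == "BNEZ":
        return rs_value != 0   # assumption: NOP is fed only when the branch is taken
    if mnemonic == "BEQZ":
        return rs_value == 0
    return False
```

For instance, feed_nop("BNEZ", rs_value=0) is False because the branch falls through and the instruction already fetched is the correct one to execute next.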
But for branch instructions, an instruction has already been read from memory. Executing it can possibly improve performance: delayed branch semantics.
Put a NOP after each branch instruction, or put a meaningful instruction after the branch.
Smart approaches
Original code:
SUB 0, R5, R5
JAL X
ADD ...
...
SUB R30, 4, R30
...
With an instruction from before the branch moved after it:
JAL X
SUB 0, R5, R5
ADD ...
...
SUB R30, 4, R30
...
X:
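With an instruction from the target code moved after the branch:
SUB 0, R5, R5
JAL X
STL R31, 0(R30)
...
SUB R30, 4, R30
...
Moving an instruction from before the branch is possible only if a branch-independent instruction can be found in the code before the branch.
Moving an instruction from the target is possible only if a branch-dependent instruction can be found in the target code that is always executed.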