
Introduction to Computer Organization

SMIPS ISA and Instruction Decoding


Amey Karkare karkare@cse.iitk.ac.in Department of CSE, IIT Kanpur
Thanks to Prof Arvind, MIT for Slides.
SMIPS ISA CS220, CO 1

Contributors to the course material


Staff and students in 6.S078 Spring 2012
  Arvind, Joel Emer, Li-Shiuan Peh
  Abhinav Agarwal, Myron King

Staff and students in 6.S195 Fall 2012
  Arvind, Joel Emer
  Sang Woo Jun, Murali Vijayaraghavan

Volunteers
  Asif Khan, Elliott Fleming

External
  Prof Jihong Kim & students at Seoul National University
  Prof Derek Chiou, University of Texas at Austin
  Bluespec Inc: R. Nikhil

Instruction Execution
Execution of an instruction involves
  1. Instruction fetch
  2. Decode
  3. Register fetch
  4. ALU operation
  5. Memory operation (optional)
  6. Write back

and the computation of the next instruction address

Implementing an ISA
Instruction fetch
  requires an instruction memory, PC
Decode
  requires understanding the instruction format
Register Fetch
  requires interaction with a register file with a specific number of read/write ports
ALU
  must have the ability to carry out the specified ops
Memory operations
  requires a data memory
Write-back
  requires interaction with the register file
Update the PC
  requires arithmetic ops to calculate pc and condition

A single-cycle implementation
[Figure: the CPU, containing the PC, the Register File and the fetch & execute logic, connected to an instruction memory (iMem) and a data memory (dMem)]

Building Register File: Register

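[Figure: an n-bit register with clock (clk) and enable (en) inputs, and a timing diagram against Clk relating the input values D1 to D5 to the stored values D1, D2, D3 and Undef]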

Building Register File: Bussing


[Figure: two n-bit registers sharing a D input bus and a Q output bus; register 0 has controls wr0 and rd0 and output Q0, register 1 has controls wr1 and rd1 and output Q1]

Register File
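[Figure: registers R0 to R3 on shared D and Q buses; a decoder driven by the write address WA and write enable WE generates wr0 to wr3, and a second decoder driven by the read address RA generates rd0 to rd3]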

Multiple Buses
[Figure: register file with multiple buses. Ports: RD, RA, RB, D, WE, Clk, and read bus A. Each n-bit register i has a write select wr_i and two read selects RdA_i and RdB_i (shown for registers 0 and 1, outputs Q0 and Q1)]

Adding ALU to Register File


[Figure: register file with ports RD, RA, RB, D, WE and Clk, connected to an ALU that is controlled by a Function Code]

Instruction formats
  R-type:  opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | func (6)
  I-type:  opcode (6) | rs (5) | rt (5) | immediate (16)
  J-type:  opcode (6) | target (26)

Only three formats but the fields are used differently by different types of instructions
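The three formats above can be sliced out of a 32-bit word with shifts and masks. The sketch below is only an illustration: the helper names `bits` and `split_fields` are hypothetical, and the bit positions assume the fields are packed from bit 31 downwards in the order shown.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Return bits word[hi:lo] (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def split_fields(word: int) -> dict:
    """Slice one word under all three formats.

    Which slices are meaningful depends on the opcode (and on func for
    R-type instructions); this helper does not decide that.
    """
    return {
        "opcode": bits(word, 31, 26),    # all formats
        "rs": bits(word, 25, 21),        # R-type and I-type
        "rt": bits(word, 20, 16),        # R-type and I-type
        "rd": bits(word, 15, 11),        # R-type only
        "shamt": bits(word, 10, 6),      # R-type only
        "func": bits(word, 5, 0),        # R-type only
        "immediate": bits(word, 15, 0),  # I-type only
        "target": bits(word, 25, 0),     # J-type only
    }


fields = split_fields(0x02324020)
print(fields["opcode"], fields["rs"], fields["rt"], fields["rd"], fields["func"])
```

The same word yields every field at once, which is the point of the last remark on the slide: the hardware can extract all fields in parallel and let the opcode decide which ones to use.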

Instruction formats (cont.)
Computational Instructions

  0 (6) | rs (5) | rt (5) | rd (5) | 0 (5) | func (6)        rd ← (rs) func (rt)
  opcode (6) | rs (5) | rt (5) | immediate (16)               rt ← (rs) op immediate

Load/Store Instructions

  opcode (6) [31:26] | rs (5) [25:21] | rt (5) [20:16] | displacement (16) [15:0]

  addressing mode: (rs) + displacement

rs is the base register
rt is the destination of a Load or the source for a Store
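The load/store addressing mode, (rs) + displacement, can be sketched as follows. This is a minimal illustration that assumes the 16-bit displacement is sign-extended to 32 bits before the addition and that addresses wrap at 32 bits; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(rs_value: int, displacement: int) -> int:
    """(rs) + displacement, wrapped to a 32-bit address."""
    return (rs_value + sign_extend16(displacement)) & MASK32


print(hex(effective_address(0x1000, 32)))      # base + 32
print(hex(effective_address(0x1000, 0xFFFC)))  # base - 4 (negative displacement)
```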

Control Instructions
Conditional (on GPR) PC-relative branch

  opcode (6) | rs (5) | (5) | offset (16)        BEQZ, BNEZ

  target address = (offset in words) × 4 + (PC+4)
  range: 128 KB

Unconditional register-indirect jumps

  opcode (6) | rs (5) | (5) | (16)               JR, JALR

Unconditional absolute jumps

  opcode (6) | target (26)                       J, JAL

  target address = {PC<31:28>, target × 4}
  range: 256 MB
  jump-&-link stores PC+4 into the link register (R31)
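The two target-address formulas on this slide can be checked with a few lines of arithmetic. The sketch below uses a byte-addressed 32-bit PC, as on this slide, and assumes the branch offset is a signed 16-bit word count; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc: int, offset16: int) -> int:
    """(offset in words) * 4 + (PC + 4), wrapped to 32 bits."""
    return (sign_extend16(offset16) * 4 + pc + 4) & MASK32


def jump_target(pc: int, target26: int) -> int:
    """{PC<31:28>, target * 4}: keep the top 4 PC bits, replace the low 28."""
    return (pc & 0xF0000000) | ((target26 & 0x3FFFFFF) << 2)


print(hex(branch_target(0x100, 3)))         # 3 words past PC + 4
print(hex(jump_target(0x10000008, 20000)))  # low 28 bits become 20000 * 4
```

A 26-bit target scaled by 4 covers 2^28 bytes, which is where the 256 MB range comes from; a signed 16-bit word offset reaches 2^15 words, or 128 KB, in either direction.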

Examples of Instructions
  opcode   rs      rt      rd      shamt   func
  6        5       5       5       5       6
  000000   10001   10010   01000   00000   100000

op = 000000 => R type instruction
func = 100000 = 32 = function code for ADD
rs = 17, rt = 18, rd = 8

add $8, $17, $18
add $t0, $s1, $s2        $t0 = $s1 + $s2
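As a cross-check of the ADD example, the six R-type fields can be packed back into a single word. This is a small sketch; `encode_r` is a hypothetical helper, not part of any toolchain.

```python
def encode_r(rs: int, rt: int, rd: int, shamt: int, func: int, opcode: int = 0) -> int:
    """Pack R-type fields: opcode(6) rs(5) rt(5) rd(5) shamt(5) func(6)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | func


# add $t0, $s1, $s2  ->  rd = 8, rs = 17, rt = 18, func = 32
word = encode_r(rs=17, rt=18, rd=8, shamt=0, func=0b100000)
print(f"{word:032b}")
print(f"0x{word:08X}")
```

Reading the printed bit string in groups of 6, 5, 5, 5, 5 and 6 bits gives back the field values listed in the example.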

Examples of Instructions
  opcode   rs      rt      immediate
  6        5       5       16
  001000   11101   11101   0000 0000 0000 0100

op = 001000 => I type instruction
001000 = 8 = opcode for ADDI
rs = 29, rt = 29, immediate = 4

addi $29, $29, 4
addi $sp, $sp, 4         $sp = $sp + 4

Examples of Instructions
  opcode   rs      rt      immediate
  6        5       5       16
  100011   10011   01000   0000 0000 0010 0000

op = 100011 => I type instruction
100011 = 35 = opcode for LW
rs = 19, rt = 8, immediate = 32

lw $8, 32($19)
lw $t0, 32($s3)          $t0 = $s3[8]

$s3 contains the base of some array; a byte offset of 32 selects word 8 (32 / 4)

Examples of Instructions
  opcode   target
  6        26
  000010   80000 (value)

op = 000010 => J type instruction
000010 = 2 = opcode for J
Address = 80000

j 80000

Loop: <some code>
      j Loop

A single-cycle implementation
[Figure: the CPU, containing the PC, the register file (RF) and the fetch & execute logic, connected to iMem and dMem]

A single-cycle MIPS implementation requires:
  A register file with 2 read ports and a write port
  An instruction memory, separate from data memory, so that we can fetch an instruction as well as perform a data operation (Load/store) on the memory

Single-Cycle SMIPS
[Figure: single-cycle SMIPS datapath with PC, +4 incrementer, Inst Memory, Decode, Register File (2 read & 1 write ports), Execute and Data Memory; separate Instruction & Data memories]

Datapath is shown only for convenience; it will be derived automatically from the high-level textual description

SMIPS ISA Design


Instructions are stored in the memory.
  Fetched one at a time

Instructions (simple ones) provide
  Function Code (aka opcode)
  Addresses of Source Registers
  Address of the destination register
  Write Enable

SMIPS ISA Design

Register R0 is always 0 (Hardwired!).
  Any writing into R0 is ignored.
  Read always provides 0.

Let's say all our instructions are fixed size (32 bits, or 4 bytes wide).
  Thus the PC needs to be incremented by four after each instruction fetch.
  If the address space is 32 bits, the width of the PC is 32 bits.
  In reality, the last two bits of the PC are always 0 (we need just a 30-bit register).
  The 30-bit PC register is incremented by 1 after each instruction fetch.
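The two design decisions on this slide, a hardwired R0 and a 30-bit PC that counts instructions rather than bytes, can be modelled in a few lines. This is a behavioural sketch only; the class and function names are hypothetical, and the 32-register size is taken from the datapath figures.

```python
class RegisterFile:
    """32 registers of 32 bits; R0 always reads as 0 and ignores writes."""

    def __init__(self, count: int = 32) -> None:
        self._regs = [0] * count

    def read(self, index: int) -> int:
        return 0 if index == 0 else self._regs[index]

    def write(self, index: int, value: int) -> None:
        if index != 0:  # any write into R0 is ignored
            self._regs[index] = value & 0xFFFFFFFF


def next_pc(pc30: int) -> int:
    """Adding 1 to the 30-bit PC steps over one 4-byte instruction."""
    return (pc30 + 1) & 0x3FFFFFFF


def byte_address(pc30: int) -> int:
    """{PC, 2'b00}: append two zero bits to form the 32-bit memory address."""
    return (pc30 << 2) & 0xFFFFFFFF


rf = RegisterFile()
rf.write(0, 5)
rf.write(8, 7)
print(rf.read(0), rf.read(8))
print(byte_address(25), byte_address(next_pc(25)))
```

Incrementing the 30-bit value by 1 and then appending 00 gives the same address as adding 4 to a 32-bit PC, which is why the two low bits never need to be stored.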

Instruction Fetch
At each instruction cycle,
  Instruction is read from memory.
  Memory address is provided by a register
    Program Counter (PC)
  After a fetch, PC is incremented by the size of the instruction.
  Instruction read is stored in the instruction register (IR)
  Current instruction in the IR is executed.

Instructions

R type triadic instructions
  Arithmetic & Logic: ADD, SUB, AND, OR, XOR
  Shift and rotate: SLL, SRL, SRA, ROL, ROR
  Comparison: SLT, SGT, SLE, SGE, UGT, ULT, ULE, UGE

R-I type triadic instructions
  ADDI, SUBI, ORI, ANDI, XORI, SLLI, SRLI, SRAI, SLTI, SGTI, SLEI, SGEI, ULTI, UGTI, ULEI, UGEI, LHI (to load half of a register)
  LDSB, LDSW, LDUB, LDUW, LDL, STB, STW, STL instructions
    Load signed byte, word; unsigned byte, word; long; store byte, word, long

Instructions

R type dyadic instructions
  BNEZ, BEQZ (Branch if rs <> 0 or rs = 0)
    If (cond) then PC = PC + offset (Remember: PC is a 30-bit register)
  JR, JALR (Jump based on register)
    Target: (rs & 0xFFFFFFFC) + 4*offset
    Offset is a signed number.
    PC = Target >> 2 (shift right by 2)
    JALR also saves {PC, 2'b00} in R31
      Before updating the PC
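The branch and register-jump rules above, written against the 30-bit PC, can be sketched as below. The sketch assumes the `pc30` argument is the PC value at the time the instruction executes and that the 16-bit offset is sign-extended; the function names are hypothetical.

```python
PC_MASK = 0x3FFFFFFF  # 30-bit PC
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def bnez(pc30: int, rs_value: int, offset16: int) -> int:
    """Branch if rs <> 0: PC = PC + offset, otherwise PC is unchanged."""
    if rs_value != 0:
        return (pc30 + sign_extend16(offset16)) & PC_MASK
    return pc30


def beqz(pc30: int, rs_value: int, offset16: int) -> int:
    """Branch if rs = 0: PC = PC + offset, otherwise PC is unchanged."""
    if rs_value == 0:
        return (pc30 + sign_extend16(offset16)) & PC_MASK
    return pc30


def jalr(pc30: int, rs_value: int, offset16: int) -> tuple:
    """Return (new 30-bit PC, value saved in R31)."""
    link = (pc30 << 2) & MASK32  # {PC, 2'b00}, captured before the PC update
    target = ((rs_value & 0xFFFFFFFC) + 4 * sign_extend16(offset16)) & MASK32
    return target >> 2, link


print(bnez(27, 1, 0xFFFE), bnez(27, 0, 0xFFFE))
print(jalr(27, 0x203, 1))
```

JR follows the same target computation as `jalr` and simply discards the link value.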

Instructions

J type instructions
  J, JAL (jump and link): Jump to new target
    Target = {PC[29:26], offset}
    PC = Target
  JAL instruction additionally saves {PC, 2'b00} in R31
    Before updating the PC

ALU functions

ADD, SUB
Left shift (LS), Arithmetic right shift (RSA), Logical right shift (RSL) by a count provided by the second operand (only five bits are used)
(Bit wise) AND, OR, XOR
Comparison:
  Signed greater than (SGT), Signed less than (SLT), Signed less than or equal to (SLE), Signed greater than or equal to (SGE)
  Unsigned greater than (UGT), Unsigned less than (ULT), Unsigned less than or equal to (ULE), Unsigned greater than or equal to (UGE)
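A behavioural model of the ALU functions listed above might look like the sketch below. Two points are assumptions rather than something the slide states: comparisons are taken to produce 1 for true and 0 for false, and results are truncated to 32 bits. The function-name strings mirror the abbreviations on the slide.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value: int) -> int:
    """Reinterpret a 32-bit pattern as a signed two's-complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def alu(func: str, a: int, b: int) -> int:
    """Apply one ALU function to two 32-bit operands."""
    a &= MASK32
    b &= MASK32
    count = b & 0x1F  # only five bits of the second operand are used
    sa, sb = to_signed(a), to_signed(b)
    operations = {
        "ADD": lambda: a + b,
        "SUB": lambda: a - b,
        "LS": lambda: a << count,
        "RSA": lambda: sa >> count,  # arithmetic: sign bit is replicated
        "RSL": lambda: a >> count,   # logical: zeros are shifted in
        "AND": lambda: a & b,
        "OR": lambda: a | b,
        "XOR": lambda: a ^ b,
        "SGT": lambda: int(sa > sb),
        "SLT": lambda: int(sa < sb),
        "SLE": lambda: int(sa <= sb),
        "SGE": lambda: int(sa >= sb),
        "UGT": lambda: int(a > b),
        "ULT": lambda: int(a < b),
        "ULE": lambda: int(a <= b),
        "UGE": lambda: int(a >= b),
    }
    return operations[func]() & MASK32


print(hex(alu("RSA", 0x80000000, 4)), hex(alu("RSL", 0x80000000, 4)))
print(alu("SLT", 0xFFFFFFFF, 1), alu("ULT", 0xFFFFFFFF, 1))
```

The last line shows why signed and unsigned comparisons are separate functions: the pattern 0xFFFFFFFF is -1 when signed but the largest value when unsigned.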

Memory Interface

Memory interface includes the following
  Address: Address of the location to read/write
  Size: Number of bytes to read/write
  Control: Read or Write
  DataIn: Data to memory (for writing)
  DataOut: Data from memory (for reading)

Instruction fetch does not write anything in the memory.
  Address: {PC, 2'b00}
  Size: 4 Bytes
  Control: Read
  DataIn: Don't care
  DataOut: Provided by the memory. Gets latched in the IR

Decoding Instructions:
decode: extract fields needed for execution

  instruction: Bit#(32)  ->  decode  ->  DecodedInst
  pure combinational logic: derived automatically from the high-level description

Type DecodedInst
  Field     Type               Instruction bits used
  iType     IType              31:26, 5:0
  aluFunc   AluFunc            31:26, 5:0
  brComp    BrFunc             31:26
  rDst      Maybe#(RIndx)      20:16, 15:11
  rSrc1     Maybe#(RIndx)      25:21
  rSrc2     Maybe#(RIndx)      20:16
  imm       Maybe#(Bit#(32))   15:0, 25:0 (through ext)

Executing Instructions
[Figure: execute, pure combinational logic. Inputs: dInst, rVal1, rVal2, pc. Internal blocks: ALU, ALUBr, Branch Address. Outputs: iType, dst, data (either for rf write or St), addr (either for memory reference or branch target), brTaken]

Instruction Fetch
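[Figure: instruction fetch datapath. The PC (30 bit) is converted to 32 bits and addresses Memory under Read Control; the Instruction read is latched in the IR; an Add 1 block updates the PC on each Clock]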

Datapath for R type triadic instructions


[Figure: Register file (32 regs) with address inputs RS1, RS2 and RD, data input Din, outputs D1out and D2out, Clock and WE, connected to an ALU driven by ALU Control]

Datapath for R and R-I type triadic instructions (non load store)
[Figure: the previous datapath extended with RDSel (RD taken from IR[20:16] or IR[15:11]), Oprnd2Sel (second operand taken from D2out or from IR[15:0] after Sign Extension to 32 bits), and a Combinatorial circuit fed by IR[31:26] and IR[5:0] that produces ALU Control]

Adding Load-Store
[Figure: the previous datapath extended with a data Memory (Address, Data, Data to the Memory), a DCnvt block on the data read from memory, and DinSel choosing the source of Din]

Adding JR, BEQZ, BNEZ


[Figure: the previous datapath extended with the PC path: PC (30 bit), Add 1, Add 00 at end, Remove 2 bits at end, NextPC, Oprnd1Sel, and an =0? test producing RS1is0 for the Combinatorial circuit; instruction Memory with Read Control and Convert to 32 bits feeds the IR]

Adding J, JAL and JALR


[Figure: the complete datapath. Additions over the previous figure: RDSel can also select register 31, Din is selected through DinSel1 and DinSel2, and ExtnCntl controls the extension of IR[25:0]. Other blocks as before: Register file (32 regs), ALU with ALU Control, Oprnd1Sel, Oprnd2Sel, =0? producing RS1is0, data Memory with DCnvt, PC (30 bit) with Add 1, Add 00 at end, Remove 2 bits at end, NextPC, instruction Memory with Read Control, IR]

Control Signals
RDSel: 2 bits. Selector for RD.
  Possible Values: RD31, RD20-16, RD15-11

DinSel1: 1 Bit Selector for Din Source
  Values: DinALU, DinMEM

DinSel2: 1 Bit Selector for Din Source
  Values: DinPC, DinS1

WE: Write Enable
  Values: YES, NO

NextPC: Source for NextPC
  Values: PCALU, PCPlus1

Oprnd1Sel: RegFile or PC
Oprnd2Sel: RegFile or IR
ExtnCntl: EXT16 or EXT26
ALU Control

Control Signals
Inst       RDSel    DinSel1  DinSel2  WE   NextPC   Oprnd1Sel  Oprnd2Sel  ExtnCntl  ALU
R Triadic  IR15-11  DinALU   DinS1    YES  PCPlus1  RegFile    RegFile    X         IR[4:0]
ADDI       IR20-16  DinALU   DinS1    YES  PCPlus1  RegFile    IR         EXT16     ADD
LHI        IR20-16  DinALU   DinS1    YES  PCPlus1  RegFile    IR         EXT16     DM
LDSB       IR20-16  DinMEM   DinS1    YES  PCPlus1  RegFile    IR         EXT16     ADD
LDUB       IR20-16  DinMEM   DinS1    YES  PCPlus1  RegFile    IR         EXT16     ADD
LDUW       IR20-16  DinMEM   DinS1    YES  PCPlus1  RegFile    IR         EXT16     ADD
LDL        IR20-16  DinMEM   DinS1    YES  PCPlus1  RegFile    IR         EXT16     ADD
STB        X        X        X        NO   PCPlus1  RegFile    IR         EXT16     ADD
BNEZ       X        X        X        NO   ****     PC         IR         EXT16     ADD4
JR         X        X        X        NO   PCALU    RegFile    IR         EXT16     ADD4
JALR       RD31     X        DinPC    YES  PCALU    RegFile    IR         EXT16     ADD4
J          X        X        X        NO   PCALU    PC         IR         EXT26     ADD4
JAL        RD31     X        DinPC    YES  PCALU    PC         IR         EXT26     ADD4

Control Signals
For Load instructions, data from memory is to be converted. DCnvt control to be used.
  Values: ByteSExtend, WordSExtend, Byte0Pad, Word0Pad, None

  LDSB: DCnvt = ByteSExtend
  LDUB: DCnvt = Byte0Pad
  LDSW: DCnvt = WordSExtend
  LDUW: DCnvt = Word0Pad
  LDL: DCnvt = None
  Other instructions: DCnvt is Don't Care.
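The DCnvt conversions can be sketched as a small function. The sketch makes two assumptions that the slide does not state: a "word" here is 16 bits (since LDL, the long load, needs no conversion), and the loaded byte or word arrives in the low-order bits of the 32-bit memory data. The function and table names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# DCnvt setting per load instruction, as listed on the slide.
DCNVT_FOR = {
    "LDSB": "ByteSExtend",
    "LDUB": "Byte0Pad",
    "LDSW": "WordSExtend",
    "LDUW": "Word0Pad",
    "LDL": "None",
}


def dcnvt(mode: str, data: int) -> int:
    """Convert memory data to the 32-bit value written to the register file."""
    data &= MASK32
    if mode == "ByteSExtend":
        byte = data & 0xFF
        return (byte - 0x100 if byte & 0x80 else byte) & MASK32
    if mode == "Byte0Pad":
        return data & 0xFF
    if mode == "WordSExtend":  # assumption: word = 16 bits
        word = data & 0xFFFF
        return (word - 0x10000 if word & 0x8000 else word) & MASK32
    if mode == "Word0Pad":
        return data & 0xFFFF
    if mode == "None":
        return data
    raise ValueError(f"unknown DCnvt mode: {mode}")


print(hex(dcnvt(DCNVT_FOR["LDSB"], 0x80)), hex(dcnvt(DCNVT_FOR["LDUB"], 0x80)))
```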

Harvard vs. Princeton architectures


Princeton architecture
  Single memory for program and data

Harvard architecture
  Separate memories for program and data

What we just finished is a Harvard architecture for the SMIPS processor.

How do we load code?
  Code must be loaded in program memory but it is not writable.

If we have a single memory
  Only one operation can take place at a time.
  Instruction fetch or data read/write.

Memory Interface
Must be possible to delay an instruction fetch if there is a data read/write request
In order to do this,
  Load IR with a NOP-like instruction such as SLL $0, $0, 0.
  And do not change the PC.

For aesthetic reasons, the NOP instruction is all 0s.
  SLL function code is 0.
  Loading IR with NOP is the same as clearing the register to 0s.
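The stall rule on this slide can be sketched as one step of the fetch logic: when the memory is busy with a data access, the IR receives the all-zero NOP and the PC is held. The function name and its arguments are hypothetical, and the PC is the 30-bit instruction counter used earlier.

```python
NOP = 0x00000000  # all zeros: opcode 0, func 0 (SLL), every register field R0


def fetch_step(pc30: int, memory_word: int, data_access: bool) -> tuple:
    """Return (next IR, next PC) for one cycle.

    data_access is True when the instruction being executed is a load or
    store, so the single memory cannot also deliver an instruction.
    """
    if data_access:
        return NOP, pc30  # clear the IR, do not change the PC
    return memory_word, (pc30 + 1) & 0x3FFFFFFF


def nop_fields(word: int) -> tuple:
    """Return (opcode, rd, func) of an R-type word."""
    return (word >> 26) & 0x3F, (word >> 11) & 0x1F, word & 0x3F


print(fetch_step(25, 0x02324020, data_access=True))
print(fetch_step(25, 0x02324020, data_access=False))
print(nop_fields(NOP))
```

Because the destination of the all-zero word is R0, and writes to R0 are ignored, executing it changes no architectural state.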

Branch instructions
When a branch instruction is being executed
  PC contains the address of the instruction next to the branch (BNEZ, BEQZ, J, JR, JAL, JALR)
  What do we do with the instruction just fetched?

One solution:
  Set WE = NO and do not update the PC register.

Other solution:
  Let the instruction execute.
  Delayed branch semantics
  Compilers can generate a NOP instruction after each branch

Solution 1
At cycle i,
  If the instruction executed in cycle i-1 resulted in a branch taken
  Instruction at cycle i must not be executed.
    Clear the IR register so that a NOP is loaded.

Solution 1
Program:
  [100]     SUB 0, R5, R5
  [104]     JAL X
  [108]     ADD ...
  [...]     ...
  [224] X:  STL R31, 0(R30)
  [228]     SUB R30, 4, R30
  [...]     ...

Without any changes in the architecture
  IR   SUB   JAL   ADD   STL
  PC   104   108   224   228

NOP feeding change
  IR   SUB   JAL   NOP   STL
  PC   104   108   224   228
  NOP / 224: PC Updated, IR Cleared
  STL / 228: IR Updated
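The two IR/PC traces above can be reproduced with a tiny simulation. This is a sketch under simplifying assumptions: the program is reduced to mnemonics at the addresses shown, only JAL is treated as a taken jump, and the names `PROGRAM`, `TAKEN_JUMPS` and `trace` are hypothetical.

```python
# Mnemonics at the byte addresses used in the example.
PROGRAM = {100: "SUB", 104: "JAL", 108: "ADD", 224: "STL", 228: "SUB"}
TAKEN_JUMPS = {"JAL": 224}  # JAL X, with X at address 224


def trace(nop_feed: bool, cycles: int = 4) -> list:
    """Return the (IR, PC) pair seen in each cycle."""
    ir, pc = "SUB", 104  # state after the first fetch from address 100
    rows = []
    for _ in range(cycles):
        rows.append((ir, pc))
        target = TAKEN_JUMPS.get(ir)
        if target is None:
            ir, pc = PROGRAM[pc], pc + 4  # normal fetch, PC moves on
        elif nop_feed:
            ir, pc = "NOP", target        # PC updated, IR cleared
        else:
            ir, pc = PROGRAM[pc], target  # fetched instruction slips through
    return rows


print(trace(nop_feed=False))
print(trace(nop_feed=True))
```

Without NOP feeding, the ADD at address 108 reaches the IR even though the jump was taken; with NOP feeding it is replaced by a NOP at the cost of one cycle.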

SMIPS Data Path + NOP feeding


[Figure: the complete SMIPS datapath (Register file (32 regs), ALU with ALU Control, Oprnd1Sel, Oprnd2Sel, RDSel, DinSel1, DinSel2, ExtnCntl, DCnvt, data and instruction Memory, PC (30 bit) with Add 1 and NextPC) with a NOPFeed control added at the IR]

NOP Feeding
NOP fed if the current instruction is
  LDxx or STxx
  J, JR, JAL, JALR
  BNEZ (if RS <> 0)
  BEQZ (if RS = 0)

On average, every 6th or 7th instruction in a program is a branch kind of instruction.
During the execution of LD and ST instructions, no instruction is read from memory.
But in Branch instructions, an instruction is already read from memory.
  Execute it possibly to improve performance
  Delayed Branch semantics
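The NOP-feeding condition can be written as a single predicate. The sketch follows Solution 1, where a NOP is loaded when a branch is taken, so it uses the branch conditions from the instruction definitions: BNEZ is taken when rs <> 0 and BEQZ when rs = 0. Mnemonics are matched as strings and the function name is hypothetical.

```python
LOAD_STORE = {"LDSB", "LDSW", "LDUB", "LDUW", "LDL", "STB", "STW", "STL"}
JUMPS = {"J", "JR", "JAL", "JALR"}


def nop_feed(mnemonic: str, rs_value: int = 0) -> bool:
    """True when the IR must be cleared instead of loaded this cycle."""
    if mnemonic in LOAD_STORE:
        return True  # memory is busy with the data access
    if mnemonic in JUMPS:
        return True  # unconditional: always taken
    if mnemonic == "BNEZ":
        return rs_value != 0  # taken branch
    if mnemonic == "BEQZ":
        return rs_value == 0  # taken branch
    return False


for name, rs in [("ADD", 0), ("LDL", 0), ("JAL", 0), ("BNEZ", 0), ("BNEZ", 3), ("BEQZ", 0)]:
    print(name, rs, nop_feed(name, rs))
```

A branch that is not taken needs no NOP, since the instruction already fetched is the correct next instruction.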

Solution2: Delayed Branch


An instruction stored in memory right after a branch is executed.
Solution can be implemented by a compiler.

Naive approach
  Put a NOP after each branch instruction.

Smart approaches
  Put a meaningful instruction after the branch.

Example of delay slot filling


Program: (Desired semantics)
     SUB 0, R5, R5
     JAL X
     ADD ...
     ...
  X: STL R31, 0(R30)
     SUB R30, 4, R30
     ...

Program: (Code in memory)
     JAL X
     SUB 0, R5, R5
     ADD ...
     ...
  X: STL R31, 0(R30)
     SUB R30, 4, R30
     ...
  Possible only if a branch independent instruction can be found in the code before branch

Program: (alternate)
     SUB 0, R5, R5
     JAL X
     STL R31, 0(R30)
     ...
  X: SUB R30, 4, R30
     ...
  Possible only if a branch dependent instruction can be found in the target code that is always executed
