Professional Documents
Culture Documents
Architecture
Lec 3 Performance
+ Pipeline Review
David Patterson
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~pattrsn
http://www-inst.eecs.berkeley.edu/~cs252
http://inst.eecs.berkeley.edu/~cs252/archives.html
CS252: Administrivia
Instructor: Prof David Patterson
Office: 635 Soda Hall, pattrsn@cs
Office Hours: Tue 11 - noon or by appt.
(Contact Cecilia Pracher; cpracher@eecs)
T. A:
Class:
Text:
Computer Architecture: A Quantitative Approach, 4th
Edition (Oct, 2006), Beta, distributed for free provided report errors
Web page: http://www.cs/~pattrsn/courses/cs252-S06/
Lectures available online <9:00 AM day of lecture
Wiki page: ??
Reading assignment: Memory Hierarchy Basics Appendix C
(handout) for Mon 1/30
Wed 2/1: Great ISA debate (3 papers) + Prerequisite Quiz
08/29/16
Outline
Review
Quantify and summarize performance
Ratios, Geometric Mean, Multiplicative Standard Deviation
08/29/16
08/29/16
26 25
21 20
Rd<-Rs1 op Rs2
16 15
Op
Rs
Rt
Register-Immediate
(I-type)
Load/Store
31
26 25
Op
21 20
Rs
Rd
11 10
6 5
Opx
Rd<-Rs1 op Imm
16 15
26 25
Op
; Add R1,R2,#3
L R1,60(R2)
immediate
Rt
Branch (B-type)
31
; Add R1,R2,R3
BNE R1,R2,name
21 20
Rs
16 15
Rt/Opx
immediate
J name
26 25
Op
08/29/16
target
Approaching an ISA
Instruction Set Architecture
Defines set of operations, instruction format, hardware supported data
types, named storage, addressing modes, sequencing
Instr. Decode
Reg. Fetch
Execute
Addr. Calc
Next SEQ PC
Adder
Zero?
RS1
L
M
D
MUX
Data
Memory
ALU
Imm
MUX MUX
RD
Reg File
Inst
Memory
Address
IR <= mem[PC];
RS2
Write
Back
MUX
Next PC
Memory
Access
Sign
Extend
PC <= PC + 4
Reg[IRrd] <= Reg[IRrs] opIRop Reg[IRrt]
08/29/16
WB Data
08/29/16
JSR
jmp
br
opFetch-DCD
A <= Reg[IRrs];
JR
ST
B <= Reg[IRrt]
RI
RR
r <= ID/EX.PC
+Irim, cond <=
ID/EX.A==0
PC <= IRjaddr
r <= A opIRop B
WB <= r
Reg[IRrd] <= WB
08/29/16
Ifetch
LD
r <= A + IRim
WB <= r
WB <= Mem[r]
Reg[IRrt] <= WB
Reg[IRrt] <= WB
08/29/16
10
08/29/16
11
08/29/16
12
08/29/16
13
08/29/16
14
08/29/16
15
08/29/16
16
08/29/16
17
08/29/16
18
Visualizing Pipelining
Figure A.2, Page A-8
08/29/16
Ifetch
DMem
Reg
Ifetch
Reg
Ifetch
Reg
DMem
Reg
Reg
DMem
ALU
Reg
ALU
O
r
d
e
r
Ifetch
ALU
I
n
s
t
r.
ALU
Reg
DMem
Reg
19
08/29/16
20
Instr 4
08/29/16
Ifetch
Reg
Ifetch
DMem
ALU
Reg
Reg
Reg
Ifetch
Reg
DMem
Reg
DMem
Reg
ALU
Instr 3
Ifetch
DMem
ALU
O
r
d
e
r
Instr 2
Reg
ALU
I Load Ifetch
n
s
t Instr 1
r.
ALU
Reg
Reg
DMem
21
Stall
Instr 3
How
08/29/16
Reg
Ifetch
Reg
Bubble
Reg
DMem
ALU
Ifetch
DMem
Reg
DMem
Bubble Bubble
Ifetch
do you bubbleCS252-s06,
the pipe?
Lec 02-intro
Reg
Reg
Bubble
ALU
O
r
d
e
r
Instr 2
Reg
ALU
I Load Ifetch
n
s
t Instr 1
r.
ALU
Bubble
Reg
DMem
22
CPIpipelined
Cycle Time pipelined
Cycle Timeunpipelined
Ideal CPI Pipeline depth
Speedup
23
24
Data Hazard on R1
Figure A.6, Page A-17
and r6,r1,r7
or
r8,r1,r9
Ifetch
DMem
Reg
DMem
Ifetch
Reg
DMem
Reg
DMem
Reg
ALU
sub r4,r1,r3
Reg
ALU
Ifetch
ALU
O
r
d
e
r
add r1,r2,r3
Ifetch
Reg
Ifetch
xor r10,r1,r11
08/29/16
WB
ALU
I
n
s
t
r.
MEM
ALU
IF ID/RF EX
Reg
Reg
Reg
DMem
25
Reg
08/29/16
26
I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7
27
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
28
r8,r1,r9
DMem
Ifetch
Reg
DMem
Reg
DMem
Reg
ALU
or
Reg
ALU
and r6,r1,r7
Ifetch
DMem
ALU
sub r4,r1,r3
Reg
ALU
O
r
d
e
r
Ifetch
Reg
Ifetch
xor r10,r1,r11
08/29/16
Yes
by Forwarding
(bypassing)
ALU
I
n
s
t
r.
Reg
Reg
Reg
DMem
29
Reg
NextPC
mux
MEM/WR
EX/MEM
ALU
mux
ID/EX
Registers
Data
Memory
mux
Immediate
30
08/29/16
31
or
r8,r6,r9
Reg
DMem
Ifetch
Reg
DMem
Reg
DMem
Reg
ALU
sw r4,12(r1)
Ifetch
DMem
ALU
lw r4, 0(r1)
Reg
ALU
O
r
d
e
r
ALU
I
n
s
t
r.
ALU
Ifetch
Ifetch
xor r10,r9,r11
ZASTOJ ?
08/29/16
Reg
Reg
Reg
Reg
DMem
Reg
32
and r6,r1,r7
or
08/29/16
r8,r1,r9
DMem
Ifetch
Reg
DMem
Reg
Ifetch
Ifetch
Reg
Reg
Reg
DMem
ALU
sub r4,r1,r6
Reg
ALU
O
r
d
e
r
ALU
I
n
s
t
r.
ALU
Reg
DMem
33
Reg
and r6,r1,r7
or r8,r1,r9
08/29/16
Reg
DMem
Ifetch
Reg
Bubble
Ifetch
Bubble
Reg
Bubble
Ifetch
Reg
DMem
Reg
Reg
Reg
DMem
ALU
sub r4,r1,r6
Ifetch
ALU
O
r
d
e
r
lw r1, 0(r2)
ALU
I
n
s
t
r.
ALU
DMem
34
08/29/16
35
Rb,b
Rc,c
Ra,Rb,Rc
a,Ra
Re,e
Rf,f
Rd,Re,Rf
d,Rd
Fast code:
LW Rb,b
LW Rc,c
LW Re,e
ADD Ra,Rb,Rc
LW Rf,f
SW a,Ra
SUB Rd,Re,Rf
SW d,Rd
36
Outline
Review
Quantify and summarize performance
Ratios, Geometric Mean, Multiplicative Standard Deviation
08/29/16
37
Reg
DMem
Ifetch
Reg
Ifetch
Reg
Reg
DMem
Reg
DMem
Ifetch
Reg
ALU
r6,r1,r7
DMem
ALU
Ifetch
Reg
ALU
Ifetch
ALU
ALU
Reg
38
Reg
DMem
08/29/16
39
Memory
Access
Write
Back
Adder
Adder
MUX
Next
SEQ PC
Next PC
Zero?
RS1
MUX
MEM/WB
Data
Memory
EX/MEM
ALU
MUX
ID/EX
Imm
Reg File
IF/ID
Memory
Address
RS2
WB Data
Execute
Addr. Calc
Instr. Decode
Reg. Fetch
Sign
Extend
RD
RD
RD
40
08/29/16
41
42
becomes
if R2=0 then
add R1,R2,R3
add R1,R2,R3
if R1=0 then
sub R4,R5,R6
or R7,R8,R9
sub R4,R5,R6
A is the best choice, fills delay slot & reduces instruction count (IC)
In B, the sub instruction may need to be copied, increasing IC
In B (C), must be okay to execute sub (or) when branch fails (occurs)
08/29/16
43
Delayed Branch
Compiler effectiveness for single branch delay slot:
Fills about 60% of branch delay slots
About 80% of instructions executed in branch delay slots useful in
computation
About 50% (60% x 80%) of slots usefully filled
08/29/16
44
Pipelinedepth
1+BranchfrequencyBranchpenalty
08/29/16
45
46
Cycle Timeunpipelined
Pipeline depth
Speedup
08/29/16
48