Introduction to
Code Optimization
Instruction Scheduling
Outline
Modern architectures
Introduction to instruction scheduling
List scheduling
Resource constraints
Scheduling across basic blocks
Trace scheduling
Inst 1   IF  DE  EXE MEM WB
Inst 2       IF  DE  EXE MEM WB
Inst 3           IF  DE  EXE MEM WB
Inst 4               IF  DE  EXE MEM WB
Inst 5                   IF  DE  EXE MEM WB
Saman Amarasinghe 5 6.035 MIT Fall 1998
to a Real Machine Model
Many pipeline stages
  Pentium              5
  Pentium Pro          10
  Pentium IV (130nm)   20
  Pentium IV (90nm)    31
  Core 2 Duo           14
Pointer alias analysis: p1->foo ?= p2->foo
dependencies
1: r2 = *(r1 + 4)
2: r3 = *(r1 + 8)
3: r4 = r2 + r3
4: r5 = r2 - 1
[Figure: dependence DAG over instructions 1 to 4, with each edge labeled by its latency]
Edge is labeled with latency:
v(i -> j) = delay required between the initiation times of i and j, minus the execution time required by i
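As a rough illustration, the sketch below builds such a labeled graph for the four-instruction block above. It records true (read-after-write) dependences only. The tuple encoding, the name `build_dag`, and the latencies (2 cycles for a load, 1 for an ALU operation) are assumptions made for this example and are not taken from the slides.

```python
# Hypothetical encoding: (number, destination register, source registers, latency).
# Latencies are assumed values: 2 cycles for loads, 1 cycle for ALU operations.
BLOCK = [
    (1, "r2", ["r1"], 2),        # r2 = *(r1 + 4)
    (2, "r3", ["r1"], 2),        # r3 = *(r1 + 8)
    (3, "r4", ["r2", "r3"], 1),  # r4 = r2 + r3
    (4, "r5", ["r2"], 1),        # r5 = r2 - 1
]


def build_dag(block):
    """Return {(i, j): latency} for true (read-after-write) dependences.

    Anti and output dependences, and dependences through memory, are
    ignored to keep the sketch short.
    """
    edges = {}
    last_writer = {}  # register -> (instruction number, latency of that writer)
    for num, dest, sources, latency in block:
        for reg in sources:
            if reg in last_writer:
                producer, producer_latency = last_writer[reg]
                # The consumer must wait for the producer's result.
                edges[(producer, num)] = producer_latency
        last_writer[dest] = (num, latency)
    return edges


if __name__ == "__main__":
    for (i, j), latency in sorted(build_dag(BLOCK).items()):
        print(f"{i} -> {j}  latency {latency}")
```

A production scheduler would also need edges for anti and output dependences and for possibly aliasing memory accesses.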
Example
1: r2 = *(r1 + 4)
2: r3 = *(r2 + 4)
3: r4 = r2 + r3
4: r5 = r2 - 1
[Figure: dependence DAG over instructions 1 to 4, with each edge labeled by its latency]
1 2 3 4 st st 5 6 st st st 7      (st = stall)
READY = nodes with no predecessors
Loop until READY is empty
  Schedule each node in READY when no stalling
  Update READY
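The loop above can be sketched as follows for a single-issue machine. This is a minimal illustration, not the course's implementation: the name `list_schedule`, the `{(i, j): latency}` edge encoding, the program-order tie-break, and the simplification that j may issue no earlier than `latency` cycles after i issues are all assumptions.

```python
def list_schedule(nodes, edges):
    """Single-issue list scheduling over a dependence DAG.

    nodes: instruction ids in original program order.
    edges: {(i, j): latency}; j may issue no earlier than `latency`
           cycles after i issues (a simplifying assumption).
    Returns one entry per cycle: an instruction id, or "st" for a stall.
    """
    preds = {n: [] for n in nodes}
    for (i, j), latency in edges.items():
        preds[j].append((i, latency))

    issued = {}                                  # node -> cycle it was issued
    ready = [n for n in nodes if not preds[n]]   # nodes with no predecessors
    schedule = []
    cycle = 0
    while ready:
        # Nodes in READY whose operands are available in this cycle.
        can_issue = [
            n for n in ready
            if all(issued[p] + lat <= cycle for p, lat in preds[n])
        ]
        if can_issue:
            chosen = min(can_issue, key=nodes.index)  # assumed tie-break
            issued[chosen] = cycle
            ready.remove(chosen)
            schedule.append(chosen)
            # Update READY: add nodes whose predecessors are all scheduled.
            for n in nodes:
                if (n not in issued and n not in ready
                        and all(p in issued for p, _ in preds[n])):
                    ready.append(n)
        else:
            schedule.append("st")                # nothing can issue: stall
        cycle += 1
    return schedule


if __name__ == "__main__":
    # Hypothetical DAG: 2 waits 3 cycles for 1, and 4 waits 1 cycle for 3.
    print(list_schedule([1, 2, 3, 4], {(1, 2): 3, (3, 4): 1}))
```

On this made-up graph the scheduler fills the latency of instruction 1 with instructions 3 and 4, where issuing in program order would stall for two cycles before instruction 2.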
reverse breadth-first visitation order
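The d and f labels in the example figure are not defined in the surviving text. One common reading, assumed here, is that d is the longest latency-weighted path from a node to a leaf and f is the number of immediate successors. Under that assumption both can be computed by visiting every node after all of its successors, which is what a reverse visitation order provides. The sketch uses `graphlib` from the Python standard library (3.9 or later); the function name `priorities` and the three-node graph are made up for illustration.

```python
from graphlib import TopologicalSorter


def priorities(nodes, edges):
    """Compute (d, f) per node under the assumed definitions.

    d[n]: longest latency-weighted path from n to any leaf (0 for a leaf).
    f[n]: number of immediate successors of n.
    edges: {(i, j): latency}.
    """
    succs = {n: [] for n in nodes}
    for (i, j), latency in edges.items():
        succs[i].append((j, latency))

    # TopologicalSorter emits a node only after the nodes listed for it.
    # Listing successors there yields a successors-first (reverse) order.
    order = TopologicalSorter(
        {n: [j for j, _ in succs[n]] for n in nodes}
    ).static_order()

    d = {}
    for n in order:
        d[n] = max((d[j] + lat for j, lat in succs[n]), default=0)
    f = {n: len(succs[n]) for n in nodes}
    return d, f


if __name__ == "__main__":
    # Hypothetical fragment: node 7 feeds nodes 8 and 9.
    d, f = priorities([7, 8, 9], {(7, 8): 3, (7, 9): 1})
    print(d, f)
```

A scheduler could then prefer the READY node with the largest d and break ties on f, though the slides' exact tie-breaking rule does not survive in this text.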
[Figure: dependence DAG for the example, with d and f annotations on the nodes and latencies on the edges. Surviving labels include d=3, d=0, "7 f=2", "8 f=0" and "9 f=0". Successive slides extend the schedule one step at a time.]
Schedule after each step:
6
6 1
6 1 2
6 1 2 4
6 1 2 4 7
6 1 2 4 7 3
6 1 2 4 7 3 5 8
6 1 2 4 7 3 5 8 9
Example
Results In
1: lea  var_a, %rax        1 cycle
2: add  $4, %rax           1 cycle
3: inc  %r11               1 cycle
4: mov  4(%rsp), %r10      3 cycles
5: add  %r10, 8(%rsp)
6: and  16(%rsp), %rbx     4 cycles
7: imul %rax, %rbx         3 cycles
8: mov  %rbx, 16(%rsp)
9: lea  var_b, %rax

1 2 3 4 st st 5 6 st st st 7 8 9      14 cycles
vs
6 1 2 4 7 3 5 8 9                     9 cycles
cycle
Resource Constraints
Represent the superscalar architecture as multiple pipelines
Each pipeline represents some resource
Example
One single-cycle reg-to-reg ALU unit
9: mov %rbx, 16(%rsp)
READY = { } 8 9
ALUop 1 6 3 7 8
MEM 1 4 2 5 9
MEM 2 4 2
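A reservation table like the one above, with one row per pipeline, can be sketched as follows. This is a simplified greedy placement rather than the full cycle-driven algorithm. The function name, the unit classes `ALU` and `MEM`, the assumption that every instruction occupies its pipeline for exactly one cycle, and the four-instruction input are all hypothetical.

```python
def schedule_with_resources(nodes, edges, unit_of, units):
    """Place each instruction in the earliest cycle that satisfies both
    its dependence latencies and the per-cycle pipeline limits.

    nodes:   ids in priority order (assumed to be a topological order).
    edges:   {(i, j): latency}.
    unit_of: node -> unit class, e.g. "ALU" or "MEM".
    units:   unit class -> number of pipelines of that class.
    Returns node -> (cycle, pipeline name).
    """
    preds = {n: [] for n in nodes}
    for (i, j), latency in edges.items():
        preds[j].append((i, latency))

    placed = {}
    busy = {}  # (cycle, unit class) -> pipelines already reserved
    for n in nodes:
        # Earliest cycle allowed by the dependence latencies.
        cycle = max((placed[p][0] + lat for p, lat in preds[n]), default=0)
        cls = unit_of[n]
        # Slide forward until a pipeline of the right class is free.
        while busy.get((cycle, cls), 0) >= units[cls]:
            cycle += 1
        slot = busy.get((cycle, cls), 0)
        busy[(cycle, cls)] = slot + 1
        placed[n] = (cycle, f"{cls} {slot + 1}")
    return placed


if __name__ == "__main__":
    # Hypothetical input: three memory operations, one dependent ALU operation.
    result = schedule_with_resources(
        nodes=[1, 2, 3, 4],
        edges={(1, 4): 2},
        unit_of={1: "MEM", 2: "MEM", 3: "MEM", 4: "ALU"},
        units={"ALU": 1, "MEM": 2},
    )
    for n, (cycle, pipe) in result.items():
        print(f"instruction {n}: cycle {cycle}, {pipe}")
```

With two memory pipelines, the third memory operation in this made-up input is pushed to the next cycle even though it has no dependences.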
B C
if ( . . . )
    a = b op c

if ( c != 0 )
    a = b / c
NO!!!

if ( . . . )
    d = *(a1)

if ( valid address? )
    d = *(a1)
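The cases above appear to ask when an instruction may be moved above the test that guards it. A conservative check could look like the sketch below. The opcode strings, the `MAY_FAULT` set, and the function name are hypothetical. The liveness condition is a standard requirement for this kind of code motion, but it is not stated in the surviving slide text.

```python
# Assumed classification: operations that can raise a hardware exception.
MAY_FAULT = {"div", "mod", "load"}


def can_hoist_above_branch(op, dest, live_on_other_path):
    """Conservative test for moving an instruction above its guarding branch.

    op:   opcode name (hypothetical string encoding).
    dest: register written by the instruction.
    live_on_other_path: registers whose old values are still needed on
        the path that did not originally execute the instruction.
    """
    if op in MAY_FAULT:
        return False   # e.g. a = b / c guarded by c != 0
    if dest in live_on_other_path:
        return False   # hoisting would clobber a value still in use
    return True


if __name__ == "__main__":
    print(can_hoist_above_branch("add", "a", set()))    # plain ALU operation
    print(can_hoist_above_branch("div", "a", set()))    # may divide by zero
    print(can_hoist_above_branch("load", "d", set()))   # address may be invalid
```

Some architectures provide non-faulting or speculative load instructions, in which case the load case could be relaxed. That is outside what this sketch models.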
Trace Scheduling
[Figure: control-flow graph; surviving block labels: B, C, F, G, H]
Large Basic Blocks via Code Duplication
Creating large extended basic blocks by duplication
Schedule the larger blocks
A
B C
E
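[Figure: the control-flow graph before duplication (A; B, C; D; E) and after duplication (A; B, C; two copies of D; two copies of E)]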
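One way to picture the duplication step is the sketch below, which copies every block that is reached along more than one path so that each block other than the entry ends up with a single predecessor. It assumes an acyclic control-flow graph given as a successor dictionary. The function name and the ".2" naming of copies are made up for illustration, and the slides' actual algorithm may differ.

```python
def duplicate_tails(cfg, entry):
    """Return a CFG where every block except the entry has one predecessor.

    cfg:   {block name: [successor names]}, assumed acyclic.
    entry: name of the entry block.
    Later copies of a block are named "D.2", "D.3", and so on.
    """
    new_cfg = {}
    copies = {}  # original name -> number of copies created so far

    def clone(block):
        copies[block] = copies.get(block, 0) + 1
        count = copies[block]
        name = block if count == 1 else f"{block}.{count}"
        # Each path into a block gets its own private copy of the tail.
        new_cfg[name] = [clone(succ) for succ in cfg[block]]
        return name

    clone(entry)
    return new_cfg


if __name__ == "__main__":
    cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
    for block, succs in duplicate_tails(cfg, "A").items():
        print(block, "->", succs)
```

Each root-to-leaf path in the result is straight-line code after the branch in A, so it can be scheduled as one larger block. The cost is code growth, which can be exponential when many join points are nested.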
Trace Scheduling
[Figure: control-flow graph with duplicated blocks; surviving labels by row: B C / D D / E E / F G F G / H H H]
Next
Scheduling for loops
Loop unrolling
Software pipelining
Interaction with register allocation
Hardware vs. Compiler
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.