© R.Lauwereins, Imec 2001
Course contents

• Digital design
• Combinatorial circuits: without state
• Sequential circuits: with state
  - FSMD design: hardwired processors
• Language-based HW design: VHDL

FSMD design

• FSMDs
• Models
• Synthesis techniques

FSMD

• FSMD: Finite State Machine with Datapath
• FSMD = hardcoded processor
  - Consists of a datapath that performs the computations
  - and a controller that tells the datapath which operations have to be carried out on which data
  - The controller always executes the same algorithm: hardcoded
• A traditional ASIC consists of multiple interconnected FSMDs

FSMD

[Figure: FSMD block diagram. The datapath receives the data inputs and produces the data outputs; the controller receives the control inputs and produces the control outputs. The controller drives the datapath with control signals; the datapath returns status signals to the controller.]

FSMD design

• FSMDs
  - Datapath design
  - Controller design
• Models
• Synthesis techniques

Datapath design

• Datapath
  - Temporary storage: registers, register files, FIFOs, …
  - Functional units: arithmetic and logic units, shifters
  - Connections: buses, multiplexers, tri-state bus drivers

Datapath design

Task: $sum = \sum_{i=1}^{2} x_i$

Algorithm (processing: the assignments; control: the FOR loop):
    sum = 0
    FOR i = 1 TO 2
        sum = sum + x_i
    ENDFOR
    y = sum

Datapath construction rules:
• each variable and constant corresponds to a register
• each operator corresponds to a functional unit
• connect the outputs of registers to the inputs of functional units; when multiple outputs connect to the same input: MUX or bus with tri-state drivers
• connect the outputs of functional units to the inputs of registers; when multiple outputs connect to the same input: MUX or bus with tri-state drivers

Datapath design

Variables: sum
Operators: add
Connections
Control word output order (bits 2 1 0): Reset, Load, Out

[Figure: state diagram and datapath of the sum example. States with their control words (Reset, Load, Out): Wait (100), which moves to Add1 when Start=1; Add1 (010); Add2 (010); Output (001). Datapath: register SUM with Reset and Load inputs and clock Clk; an adder (Add) adds x_i to SUM and feeds the result back into the register; in state Output, SUM drives the output y.]

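As a rough illustration of how the control words drive the datapath, the sketch below simulates the sum FSMD cycle by cycle. It is a minimal model under stated assumptions, not taken from the slides: Reset and Load act at the active clock edge, Start is held at 1, x1 is present during Add1 and x2 during Add2, and Output returns to Wait. The names `CONTROL`, `next_state` and `run_fsmd` are hypothetical.

```python
# Control word (Reset, Load, Out) per state, from the state diagram.
CONTROL = {
    "Wait": (1, 0, 0),
    "Add1": (0, 1, 0),
    "Add2": (0, 1, 0),
    "Output": (0, 0, 1),
}


def next_state(state: str, start: int) -> str:
    """Controller next-state function; Output -> Wait is an assumption."""
    if state == "Wait":
        return "Add1" if start else "Wait"
    return {"Add1": "Add2", "Add2": "Output", "Output": "Wait"}[state]


def run_fsmd(x1: int, x2: int) -> tuple[int, list[str]]:
    """Simulate until the Output state; return (y, visited states)."""
    x_in = {"Add1": x1, "Add2": x2}   # assumed: x_i is valid during Add<i>
    state, sum_reg = "Wait", None     # SUM content unknown before Reset
    visited = []
    while True:
        visited.append(state)
        reset, load, out = CONTROL[state]
        if out:
            return sum_reg, visited   # y = sum is driven in state Output
        # Active clock edge: SUM register and state register update together.
        if reset:
            sum_reg = 0
        elif load:
            sum_reg = sum_reg + x_in[state]   # adder output loaded into SUM
        state = next_state(state, start=1)    # Start held at 1
```

For example, `run_fsmd(3, 4)` returns `(7, ['Wait', 'Add1', 'Add2', 'Output'])`: one clock cycle per state, with the adder reused in both Add states.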
Datapath design

Task: count the number of '1's in a word

Algorithm:
    Data = Inport || OCnt = 0 || Mask = 1
    WHILE Data <> 0 DO
        Temp = Data AND Mask
        OCnt = OCnt + Temp || Data = Data >> 1
    ENDWHILE
    Outport = OCnt

• All instructions on a single line (separated by ||) are executed concurrently: maximum speed, but highest cost
• Trading off speed for area is explained in the section on 'Synthesis techniques'
• All hardware components work in parallel. Implementing hardware is hence not a matter of writing a sequential software program and mapping it directly onto hardware: the algorithm above is a 'concurrent' description!

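Python's tuple assignment evaluates every right-hand side before assigning any target, which makes it a convenient stand-in for the `||` notation. The sketch below is a software model of the concurrent description only (the name `count_ones` is illustrative); it says nothing about how many clock cycles the hardware needs.

```python
def count_ones(inport: int) -> int:
    """Software model of the concurrent one-counting algorithm (inport >= 0)."""
    if inport < 0:
        raise ValueError("the model assumes a non-negative word")
    # Data = Inport || OCnt = 0 || Mask = 1
    data, ocnt, mask = inport, 0, 1
    while data != 0:
        temp = data & mask
        # OCnt = OCnt + Temp || Data = Data >> 1: both right-hand sides are
        # evaluated with the old values before either variable is updated.
        ocnt, data = ocnt + temp, data >> 1
    return ocnt  # Outport = OCnt
```

Writing the two concurrent assignments as separate sequential statements would still work here, because they touch different variables; the tuple form simply makes the "all at once" semantics explicit.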
Datapath design

    Data = Inport; OCnt = 0; Mask = 1
    WHILE Data <> 0 DO
        Temp = Data AND Mask
        OCnt = OCnt + Temp; Data = Data >> 1
    ENDWHILE
    Outport = OCnt

[Figure: controller state diagram and datapath of the one-counting example. Control word output order: bits 5 4 3 2 1 0. States and control words: Wait (x01x00, left when s=1), Load (111x00), Comp (x00000, branching on the zero flag z: z=0 to Temp, z=1 to Out), Temp (x00010), Update (010100) and Out (x00001). Datapath: Inport feeding a multiplexer; registers Data, OCnt, Mask and Temp; functional units <>0 (producing z), AND, Add and >>1; Outport.]

Datapath design

• Possible optimisations:
  - When the lifetimes of 2 variables do not overlap, both can be stored in the same register: register sharing
  - When two operations are not executed concurrently, they can be assigned to the same functional unit: functional-unit sharing
  - When two connections are not used concurrently, they can be shared: connection sharing
  - When two registers are not read from, respectively written to, concurrently, they can be combined into a single register file: register port sharing
  - Operations that could be executed concurrently may also be executed sequentially, which facilitates the four previous optimisations

Datapath design

• Generic structure of the datapath:

[Figure: generic datapath structure, consisting of (top to bottom) external input, temporary storage, operand switching network, functional units, result switching network and external output.]

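Datapath design

• Typical datapath:

[Figure: typical datapath. Inport feeds, through a multiplexer with select S, the write port of a register file (WA, WE). The register file has two read ports (RA1, RE1, RFOE1 and RA2, RE2, RFOE2). A counter (R, L, C, COE) and a single register (R, L, ROE) also drive the operand buses. Functional units: comparator (outputs >, =, <), ALU (function F, output enable AOE) and barrel shifter (controls Sh and D, output enable SOE). Outport with output enable OOE.]
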
Datapath design

• In the datapath of the previous slide a few design decisions have been taken:
  - Only 1 result bus instead of 2: the ALU and the barrel shifter cannot be used concurrently
  - Only 2 operand buses instead of 4: e.g. the comparator and the ALU work on the same set of data
  - 9 registers with only 2 write ports and 3 read ports
  - Inport can only feed the register file

Datapath design

Instruction format (32-bit instruction word; within each group the signals are listed from the most to the least significant bit)

  Bits   Control signals             Controls
  31-28  R, L, C, COE                Counter
  27-23  S, WA2, WA1, WA0, WE        Register file write port
  22-18  RA2, RA1, RA0, RE1, RFOE1   Register file read port 1
  17-13  RA2, RA1, RA0, RE2, RFOE2   Register file read port 2
  12-10  R, L, ROE                   Register
  9-6    F2, F1, F0, AOE             ALU
  5-1    SH2, SH1, SH0, D, SOE       Barrel shifter
  0      OOE                         Outport

For reasons of simplicity, clarity and correctness, a mnemonic (e.g. ADD) can be assigned to a certain bit pattern: an assembly instruction.

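To make the field layout concrete, the sketch below packs and unpacks the 32-bit control word using the bit positions of the instruction-format table. Because the slide uses RA2..RA0 for both read ports and R/L for both the counter and the register, the keys `RA_P1`, `RA_P2`, `CNT_*` and `REG_*` are hypothetical labels introduced here; `encode` and `decode` are illustrative helpers, not part of any tool.

```python
# (msb, lsb) of each field, from the instruction-format table.
FIELDS = {
    "CNT_R": (31, 31), "CNT_L": (30, 30), "CNT_C": (29, 29), "COE": (28, 28),
    "S": (27, 27), "WA": (26, 24), "WE": (23, 23),
    "RA_P1": (22, 20), "RE1": (19, 19), "RFOE1": (18, 18),
    "RA_P2": (17, 15), "RE2": (14, 14), "RFOE2": (13, 13),
    "REG_R": (12, 12), "REG_L": (11, 11), "ROE": (10, 10),
    "F": (9, 7), "AOE": (6, 6),
    "SH": (5, 3), "D": (2, 2), "SOE": (1, 1),
    "OOE": (0, 0),
}


def encode(**fields: int) -> int:
    """Pack named control fields into a 32-bit word; omitted fields are 0."""
    word = 0
    for name, value in fields.items():
        msb, lsb = FIELDS[name]
        width = msb - lsb + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bit(s)")
        word |= value << lsb
    return word


def decode(word: int) -> dict[str, int]:
    """Unpack a 32-bit control word into its fields."""
    return {name: (word >> lsb) & ((1 << (msb - lsb + 1)) - 1)
            for name, (msb, lsb) in FIELDS.items()}
```

For instance, `encode(RA_P2=2, RE2=1, RFOE2=1)` would put register 2 on operand bus 1 via read port 2. A mnemonic such as ADD would simply name one such field combination; the ALU function codes for F are not specified on the slide.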
Datapath design

• The size of the instruction word can be reduced, since several operations cannot be executed concurrently:
  - Either register file read port 2 or the register read port connects to the 1st operand bus (-1 bit)
  - Either register file read port 1 or the counter read port connects to the 2nd operand bus (-1 bit)
  - ALU and shift operations cannot occur concurrently: 1 bit is needed to select the operator and 4 bits to control it (-2 bits)
  - When the ALU is active, its output can immediately be placed on the result bus; likewise for the barrel shifter (-2 bits)
  - For the counter, the 'Count' and 'Load' operations are mutually exclusive (-1 bit)
  - In total this saves 7 bits: a 25-bit instead of a 32-bit instruction word
• Additional limitations to concurrency may be introduced, at the cost of increased execution time

Datapath design

• Design freedom (the options also differ in speed, cost and design time):

  Type          Algorithm  Fixed     To be designed
  custom proc.  fixed      -         custom DP, custom Ctrl
  soft IP       fixed      DP        DP extension, custom Ctrl
  ASIP          class      DP, Ctrl  DP extension, Ctrl extension
  Proc          any        DP, Ctrl  -

A compiler performs the same tasks as synthesis tools (e.g. assigning variables with non-overlapping lifetimes to the same register), but with fewer degrees of freedom, since the hardware is fixed.

FSMD design

• FSMDs
  - Datapath design
  - Controller design
• Models
• Synthesis techniques

Controller design

• Up to now, the controller has each time been designed using the FSM design method discussed before
• For a large number of states this is a tedious job
• The next slides present alternative design methods that lead to a faster design process in several cases

Controller design

Standard FSM

[Figure: standard FSM with a state register of D flip-flops (clock Clk), combinational next-state logic computing $S^{*} = F(S, I)$ and combinational output logic computing $O = H(S, I)$.]

Controller design

Redrawn

[Figure: the FSM redrawn with FSMD signal names. The next-state logic computes the next state from the control inputs (CI), the status signals (SS) coming from the datapath and the current state held in the state register. The output logic computes the control signals (CS) for the datapath and the control outputs (CO) from the current state, CI and SS.]

Size of the state register:
  - $\lceil \log_2 n \rceil$ bits for n states with straightforward and minimum-bit-change encoding
  - n bits for n states with one-hot encoding

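A small helper makes the size rule concrete. The Gray code below is shown as one common way to obtain a minimum-bit-change encoding for a linear sequence of states; the slide does not prescribe it, and the function names are illustrative.

```python
def state_register_bits(n_states: int, encoding: str) -> int:
    """Number of flip-flops in the state register for n_states."""
    if n_states < 1:
        raise ValueError("need at least one state")
    if encoding in ("straightforward", "minimum-bit-change"):
        # ceil(log2 n) computed exactly with integers; at least 1 flip-flop
        return max(1, (n_states - 1).bit_length())
    if encoding == "one-hot":
        return n_states
    raise ValueError(f"unknown encoding {encoding!r}")


def gray_codes(n_states: int) -> list[int]:
    """Reflected Gray code: consecutive states differ in exactly one bit."""
    return [i ^ (i >> 1) for i in range(n_states)]
```

With 8 states, straightforward encoding needs 3 flip-flops and one-hot needs 8; with 9 states the binary encodings jump to 4 flip-flops.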
Controller design

Critical path delay:
Find the longest combinational path from clock edge to clock edge. Here it runs from the controller's state register through the datapath:
ClkOutStateReg + OutputLogic + AddressToOutRegFile + BusDriver + BarrelShifter + BusDriver + Mux + SetupInPortRegFile
i.e. clock-to-output delay of the state register, output logic, register-file address-to-output delay, bus driver, barrel shifter, bus driver, multiplexer and setup time of the register-file input port.

[Figure: the controller (next-state logic, state register, output logic; signals CI, SS, CS, CO) connected to the typical datapath (Inport, register file, counter, register, comparator, ALU, barrel shifter, Outport).]

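Finding "the longest combinational path" amounts to a longest-path search in the directed acyclic graph of combinational blocks. The sketch below does this with memoised recursion over a deliberately simplified graph; the block names follow the slide, but the graph structure and all delay values (in ns) are hypothetical, chosen only to illustrate the method.

```python
from functools import lru_cache

# Hypothetical delays in ns (illustrative values, not from the slides).
DELAY = {
    "ClkOutStateReg": 0.5, "OutputLogic": 1.0, "AddressToOutRegFile": 2.0,
    "BusDriver1": 0.3, "ALU": 2.5, "BarrelShifter": 3.0, "Comparator": 1.5,
    "BusDriver2": 0.3, "Mux": 0.4, "SetupInPortRegFile": 0.6,
    "NextStateLogic": 1.2, "SetupStateReg": 0.2,
}
# Simplified connectivity between combinational blocks (assumption).
EDGES = {
    "ClkOutStateReg": ["OutputLogic"],
    "OutputLogic": ["AddressToOutRegFile"],
    "AddressToOutRegFile": ["BusDriver1"],
    "BusDriver1": ["ALU", "BarrelShifter", "Comparator"],
    "ALU": ["BusDriver2"],
    "BarrelShifter": ["BusDriver2"],
    "BusDriver2": ["Mux"],
    "Mux": ["SetupInPortRegFile"],
    "Comparator": ["NextStateLogic"],
    "NextStateLogic": ["SetupStateReg"],
}


def critical_path(start: str) -> tuple[float, tuple[str, ...]]:
    """Longest (delay, path) from start to any sink of the DAG."""
    @lru_cache(maxsize=None)
    def longest(node: str) -> tuple[float, tuple[str, ...]]:
        best = (0.0, ())
        for succ in EDGES.get(node, []):
            candidate = longest(succ)
            if candidate[0] > best[0]:
                best = candidate
        return DELAY[node] + best[0], (node,) + best[1]
    return longest(start)


delay, path = critical_path("ClkOutStateReg")
f_max_mhz = 1000.0 / delay   # maximum clock frequency for this path
```

With these made-up numbers the barrel-shifter path is critical (8.1 ns, roughly 123 MHz), matching the path named on the slide; with real library delays another branch could dominate.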
Controller design

Modification 1

[Figure: a state register of $\lceil \log_2 n \rceil$ bits followed by a decoder (dec.) that produces the n one-hot current-state lines for the next-state logic and the output logic (inputs CI and SS, outputs CS and CO).]

Properties:
  - simple design and small next-state and output logic, as with one-hot encoding
  - small number of flip-flops, as with straightforward and minimum-bit-change encoding

Controller design

• Modification 2
  - Often the state diagram shows an unconditional sequence of states, with only a few exceptions
  - E.g.:

[Figure: the state diagram of the sum example: Wait (100), which moves to Add1 (010) when Start=1, followed by Add2 (010) and Output (001).]

Controller design

Modification 2

[Figure: an incrementer (INC) and a multiplexer (MUX) are placed in front of the state register. The MUX selects either the current state + 1 from INC or a next state computed by the next-state logic from CI, SS and the current state. The output logic produces CS and CO.]

Controller design

• Advantage of modification 2:
  - The next-state logic becomes very simple: for an unconditional next state it selects the INC output; only for a conditional next state does the hardware have to generate the next state
• Implementation of the INC:
  - ripple-carry chain of half adders
  - INC and State Reg together form a synchronous counter

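The incrementer can be modelled bit by bit: the constant +1 enters a chain of half adders at the least significant bit and the carry ripples upward. The sketch below (helper names are illustrative) also shows that feeding the INC output back into the state register at every clock edge gives a synchronous counter; dropping the final carry-out, so the value wraps around, is an assumption of this model.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b


def increment(bits: list[int]) -> list[int]:
    """Add 1 to a little-endian bit list (bits[0] = LSB) with a ripple chain
    of half adders; the final carry-out is dropped, so the value wraps."""
    carry = 1              # the constant +1 enters at the LSB half adder
    result = []
    for bit in bits:
        s, carry = half_adder(bit, carry)
        result.append(s)
    return result


def to_bits(value: int, width: int) -> list[int]:
    return [(value >> i) & 1 for i in range(width)]


def to_int(bits: list[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


# INC + state register = synchronous counter: one increment per clock edge.
state = to_bits(0, 3)
sequence = []
for _ in range(9):
    sequence.append(to_int(state))
    state = increment(state)   # loaded into the state register at the edge
```

The 3-bit counter steps through 0 to 7 and then wraps to 0; the delay of the INC grows linearly with the number of state bits because of the ripple chain.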
Controller design

• Modification 3
  - Often the state diagram contains a part that is repeated several times: a subroutine

[Figure: a state diagram with 7 states (s0 to s6) in which a sequence of states occurs twice, and the equivalent diagram with only 5 states (s0 to s4) that uses that sequence as a subroutine.]

  - Only at run time is it known which state follows the end of a subroutine: a stack is needed

Controller design

Modification 3

[Figure: the controller is extended with a stack controlled by a Push/Pop' signal from the next-state logic. The stack stores the return state; a multiplexer (MUX) selects the next state for the state register either from the next-state logic or from the stack (return state). The output logic produces CS and CO.]

Controller design

Combination

[Figure: combination of modifications 1 to 3. The MUX in front of the state register selects between the next-state logic, the INC output and the stack (Push/Pop'); the $\lceil \log_2 n \rceil$-bit state register is decoded (Dec) into n one-hot lines for the next-state logic and the output logic, which produce CS and CO.]

Assumption: return state = jump state + 1
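The sketch below models the next-state selection of the combined controller: each state picks the INC output, a jump target from the next-state logic, a call (push the return state, then jump) or a return (pop). Following the stated assumption, the pushed return state is the calling state + 1. The action names, `PROGRAM` and `run_sequencer` are hypothetical; the example uses 5 state codes but visits 7 states, in the spirit of the 5-state versus 7-state comparison above.

```python
from typing import Optional

# state -> (action, target). Actions model the MUX choice in front of the
# state register: INC output, jump target, or top of the stack.
PROGRAM = {
    0: ("call", 10),    # first use of the subroutine
    1: ("call", 10),    # second use of the same subroutine
    2: ("halt", None),
    10: ("inc", None),  # subroutine body: states 10 and 11
    11: ("ret", None),
}


def run_sequencer(program: dict, start: int = 0, max_steps: int = 100) -> list[int]:
    """Return the sequence of visited states."""
    stack: list[int] = []
    state = start
    trace = [state]
    for _ in range(max_steps):
        action, target = program[state]
        if action == "halt":
            break
        if action == "inc":
            state = state + 1
        elif action == "jump":
            state = target
        elif action == "call":
            stack.append(state + 1)   # push: return state = jump state + 1
            state = target
        elif action == "ret":
            state = stack.pop()       # pop: next state only known at run time
        else:
            raise ValueError(f"unknown action {action!r}")
        trace.append(state)
    return trace


trace: list[Optional[int]] = run_sequencer(PROGRAM)
```

Here `trace` is `[0, 10, 11, 1, 10, 11, 2]`: the subroutine states are reused, and the stack supplies the state that follows each return.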


Controller design

• Implementation of the next-state logic and the output logic:
  - Either construct a minimal AND-OR implementation via Karnaugh maps
  - Or put the truth table in a ROM table (this method is called microprogrammed control)

Controller design

ROM table

[Figure: the combined controller in which the next-state logic and the output logic are implemented as one ROM table. The ROM is addressed by the current state, CI and SS, and delivers the next state (passed through the MUX together with the INC output and the stack with Push/Pop'), CS and CO.]

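As an illustration of microprogrammed control, the sketch below tabulates the truth table of the sum controller into a ROM addressed by (current state, Start); each ROM word holds the next state and the control word (Reset, Load, Out). The state codes and the Output-to-Wait transition are assumptions, and `build_rom` and `step` are hypothetical names.

```python
# Hypothetical state codes for the sum example.
WAIT, ADD1, ADD2, OUTPUT = 0, 1, 2, 3
STATE_BITS = 2


def build_rom() -> list[tuple[int, tuple[int, int, int]]]:
    """Tabulate (next state, (Reset, Load, Out)) for every address
    (state << 1) | Start: the truth table stored in the ROM."""
    rom = []
    for address in range(1 << (STATE_BITS + 1)):
        state, start = address >> 1, address & 1
        if state == WAIT:
            rom.append((ADD1 if start else WAIT, (1, 0, 0)))
        elif state in (ADD1, ADD2):
            rom.append((state + 1, (0, 1, 0)))
        else:  # OUTPUT; returning to WAIT is an assumption
            rom.append((WAIT, (0, 0, 1)))
    return rom


ROM = build_rom()


def step(state: int, start: int) -> tuple[int, tuple[int, int, int]]:
    """One clock cycle of the controller: a single ROM lookup."""
    return ROM[(state << 1) | start]
```

The ROM has 8 words of 5 bits (2 next-state bits + 3 control bits). Changing the algorithm only means reprogramming the table, which is why this approach speeds up controller design for many states.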
Controller design

Be careful about timing!

Example:
    ReadFromExternal(A) || sum := 0;
    WHILE A <> 1
        sum := sum + A || ReadFromExternal(A);

Each iteration of the WHILE loop (body, test and decision) should be executed in just one clock cycle!

[Figure: datapath with register A (load LA), register sum (load LS, reset RS), a comparator (Comp) producing C = 1 when A <> 1, and an adder (Add) computing sum + A.]

No tri-state drivers: each bus has only one source.

Controller design

Can the controller be state-based?

Example: animate the sequence A = 5, 2, 1, which should give sum = 7. Reset is asynchronous.
    ReadFromExternal(A) || sum := 0;
    WHILE A <> 1
        sum := sum + A || ReadFromExternal(A);

[Figure: state-based (Moore) controller: s0 with LA=1, RS=1, LS=0, followed by s1 with LA=1, RS=0, LS=1, in which the controller stays while C=1.]

  A:        5   2   1   ?
  sum:      0   5   7   8
  sum + A:  5   7   8   ?
  C:        1   1   0   ?

One count too much: sum = 8 instead of 7. Because the outputs depend only on the state, LS is still 1 in the cycle in which C becomes 0.

Controller design

Can the controller be input-based?

Example: animate the sequence A = 5, 2, 1, which should give sum = 7. Reset is asynchronous.
    ReadFromExternal(A) || sum := 0;
    WHILE A <> 1
        sum := sum + A || ReadFromExternal(A);

[Figure: input-based (Mealy) controller: s0 with LA=1, RS=1, LS=0; in s1 (RS=0), LA=1 and LS=1 when C=1, and LA=0 and LS=0 when C=0.]

  A:        5   2   1
  sum:      0   5   7
  sum + A:  5   7   8   (8 is not loaded: LS=0 when C=0)
  C:        1   1   0

The result is correct. Always check timing!

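The one-count-too-much effect can be reproduced with a small cycle-based model of both controllers. Assumptions of this sketch: the sum register is cleared as soon as RS=1 (asynchronous reset), all registers load simultaneously at the clock edge from the old values, the value sequence contains the terminating 1, and the controller stops (state "done") once C=0. The function name `simulate` is illustrative.

```python
def simulate(values: list[int], mealy: bool) -> int:
    """Cycle-based model of the sum loop; values feed ReadFromExternal(A)."""
    if 1 not in values:
        raise ValueError("the input sequence must contain the terminating 1")
    stream = iter(values)
    state, a, total = "s0", None, None        # register contents unknown
    while state != "done":
        if state == "s0":
            la, ls, rs, nxt = 1, 0, 1, "s1"
        else:  # s1
            c = 1 if a != 1 else 0            # comparator: C = 1 when A <> 1
            if mealy:
                la = ls = c                   # outputs depend on state and C
            else:
                la = ls = 1                   # outputs depend on state only
            rs, nxt = 0, ("s1" if c else "done")
        if rs:
            total = 0                         # asynchronous reset of sum
        # Active clock edge: all registers load together from the old values.
        new_total = total + a if ls else total
        new_a = next(stream, None) if la else a
        total, a, state = new_total, new_a, nxt
    return total
```

`simulate([5, 2, 1], mealy=False)` returns 8 (one count too much), while `simulate([5, 2, 1], mealy=True)` returns 7, because the Mealy controller drops LS in the same cycle in which C becomes 0.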
FSMD design

• FSMDs
• Models
  - State-action table
  - Algorithmic-state-machine chart
• Synthesis techniques

State-action table

• The specification of an FSMD could be done using the traditional next-state and output table
• However, for large designs this becomes impractical
• The next slide shows the next-state and output table for the one-counting application:

    Data = Inport; OCnt = 0; Mask = 1
    WHILE Data <> 0 DO
        Temp = Data AND Mask
        OCnt = OCnt + Temp; Data = Data >> 1
    ENDWHILE
    Outport = OCnt

4/38
©
R.Lauwereins
Imec 2001
State-action table

Digital
• Next state and output table
design

Present Next state Data Data path variables


Combina-
torial state (Start, Status) path
circuits output
00 01 10 11 Outport Data OCount Temp Mask
Sequential
circuits S0 S0 S0 S1 S1 Z X X X X
S1 S2 S2 S2 S2 Z Inport X X X
FSMD S2 S3 S3 S3 S3 Z Data 0 X X
design
S3 S4 S4 S4 S4 Z Data OCount X 1
VHDL S4 S5 S5 S5 S5 Z Data OCount Data Mask
AND
Mask
S5 S6 S6 S6 S6 Z Data OCount X Mask
+Temp
S6 S4 S7 S4 S7 Z Data >> OCount X Mask
1
S7 S0 S0 S0 S0 Ocount Data Ocount X X

4/39
State-action table

• The next-state and output table does not offer a good overview:
  - often the next state depends on only a few of the inputs
  - often the data path variables do not change
• Hence, the same information as in the next-state and output table is presented in a more condensed form: the state-action table (see next slide)

State-action table

  Present  Next state            Control and data path actions
  state    Condition  State      Condition  Actions
  S0       Start = 0  S0                    Output = Z
           Start = 1  S1
  S1                  S2                    Data = Inport
  S2                  S3                    OCount = 0
  S3                  S4                    Mask = 1
  S4                  S5                    Temp = Data AND Mask
  S5                  S6                    OCount = OCount + Temp
  S6       Data <> 0  S4                    Data = Data >> 1
           Data = 0   S7
  S7                  S0                    Output = OCount

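A state-action table maps naturally onto a table-driven interpreter. In the sketch below each row holds an action function (returning register updates) and a next-state function. Following the usual FSMD semantics, both see the register values at the start of the cycle and all updates take effect together at the clock edge; as a consequence S6 tests Data before that cycle's shift, which costs one extra loop iteration that adds 0. `TABLE` and `run_table` are illustrative names, and Start is assumed to be held at 1.

```python
# State-action table of the one counter: state -> (actions, next state).
TABLE = {
    "S0": (lambda r, inp: {}, lambda r, start: "S1" if start else "S0"),
    "S1": (lambda r, inp: {"Data": inp}, lambda r, start: "S2"),
    "S2": (lambda r, inp: {"OCount": 0}, lambda r, start: "S3"),
    "S3": (lambda r, inp: {"Mask": 1}, lambda r, start: "S4"),
    "S4": (lambda r, inp: {"Temp": r["Data"] & r["Mask"]}, lambda r, start: "S5"),
    "S5": (lambda r, inp: {"OCount": r["OCount"] + r["Temp"]}, lambda r, start: "S6"),
    "S6": (lambda r, inp: {"Data": r["Data"] >> 1},
           lambda r, start: "S4" if r["Data"] != 0 else "S7"),
    "S7": (lambda r, inp: {}, lambda r, start: "S0"),
}


def run_table(inport: int, start: int = 1, max_cycles: int = 10_000) -> tuple[int, int]:
    """Interpret the table until S7 (Output = OCount); return (OCount, cycles)."""
    regs = {"Data": 0, "OCount": 0, "Mask": 0, "Temp": 0}
    state = "S0"
    for cycle in range(max_cycles):
        if state == "S7":
            return regs["OCount"], cycle
        actions, next_state = TABLE[state]
        updates = actions(regs, inport)     # reads the old register values
        state = next_state(regs, start)     # condition also uses old values
        regs.update(updates)                # clock edge
    raise RuntimeError("S7 not reached")
```

For Inport = 1011 (binary), `run_table(0b1011)` returns `(3, 19)`: 4 set-up cycles plus 5 passes of 3 cycles through S4-S5-S6.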
FSMD design

• FSMDs
• Models
  - State-action table
  - Algorithmic-state-machine chart
• Synthesis techniques

Algorithmic-state-machine chart

• An algorithmic-state-machine chart (ASM chart) is an alternative visualization method for the state-action table
• It shows loops, conditions and next states in a way that is easier for a human to understand
• Each row in the state-action table translates to an ASM block
• ASM blocks are constructed out of three types of elements: state boxes, decision boxes and condition boxes

Algorithmic-state-machine chart

[Figure: the three ASM elements. State box: labelled with the state name and the state encoding, contains unconditional variable assignments. Decision box: contains a condition and has a 1-exit and a 0-exit. Condition box: contains conditional variable assignments.]

Algorithmic-state-machine chart

Example of an ASM block

[Figure: state box s0 with the unconditional assignment Done = 0, followed by a decision box with the condition Start = 0 (exits 0 and 1) and a condition box with the conditional assignment Data = Inport.]

Algorithmic-state-machine chart

• An ASM block has to obey the following rule:
  - each input combination should lead to exactly one next state
• Example 1 of an invalid ASM block:

[Figure: state s0 feeds two decision boxes, Cond1 and Cond2, side by side, whose exits lead to the next states s1 and s2. When Cond2 = 1 there are two next states.]

Algorithmic-state-machine chart

• Example 2 of an invalid ASM block:

[Figure: state s0 followed by a decision box on Cond1 and a decision box on Cond2, leading to the next states s1 and s2. When Cond1 = 0 and Cond2 = 0 there is no next state.]

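The validity rule can be checked mechanically by enumerating every input combination and counting how many exit paths of the block it activates. In the sketch below an ASM block is described by its exit paths, each a guard (required condition values) plus a next state; this representation, the function name and the three example blocks are hypothetical, written in the spirit of the slides rather than copied from them.

```python
from itertools import product


def check_asm_block(paths, conditions):
    """paths: list of (guard, next_state), where guard maps a condition to
    the value (0/1) required along that exit path. Returns every input
    combination that does not lead to exactly one next state."""
    violations = []
    for values in product((0, 1), repeat=len(conditions)):
        inputs = dict(zip(conditions, values))
        targets = [nxt for guard, nxt in paths
                   if all(inputs[c] == v for c, v in guard.items())]
        if len(targets) != 1:
            violations.append((inputs, targets))
    return violations


conds = ["Cond1", "Cond2"]
valid = [({"Cond1": 1}, "s1"),
         ({"Cond1": 0, "Cond2": 1}, "s2"),
         ({"Cond1": 0, "Cond2": 0}, "s0")]
missing_exit = [({"Cond1": 1}, "s1"),
                ({"Cond1": 0, "Cond2": 1}, "s2")]        # no path for 0/0
parallel = [({"Cond1": 1}, "s1"), ({"Cond2": 1}, "s2")]  # boxes side by side
```

Nested decision boxes whose exits all end in a next state pass the check; a missing exit shows up as a combination with no target, and parallel decision boxes show up as combinations with several targets.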
Algorithmic-state-machine chart

• An ASM chart representing a state-based (Moore-type) FSMD has no condition boxes, since all outputs depend only on the state; all assignments to variables are done in state boxes
• An ASM chart representing an input-based (Mealy-type) FSMD has state boxes as well as condition boxes: variable assignments that depend only on the state are done in state boxes; variable assignments that depend on input conditions are done in condition boxes

Algorithmic-state-machine chart

State-based (Moore)

[Figure: ASM chart of the one counter with 6 states:
  s0: decision Start = 1; 0 -> stay in s0, 1 -> s1
  s1: Data = Inport, OCount = 0
  s2: decision on DataLSB; 1 -> s3, 0 -> s4
  s3: OCount = OCount + 1, then s4
  s4: Data = Data >> 1; decision Data <> 0; 1 -> s2, 0 -> s5
  s5: Output = OCount]

Algorithmic-state-machine chart

Input-based (Mealy)

[Figure: ASM chart of the one counter with only 4 states instead of the 6 of the state-based approach:
  s0: decision Start = 1; 0 -> stay in s0, 1 -> s1
  s1: Data = Inport, OCount = 0
  s2: decision on DataLSB with the condition box OCount = OCount + 1 on its 1-exit, followed by the decision Data <> 0 with the condition box Data = Data >> 1 on the path back to s2; the other exit leads to s3
  s3: Output = OCount]

FSMD design

• FSMDs
• Models
• Synthesis techniques
  - Basic principles
  - Merging
    - Register sharing (variable merging)
    - Functional-unit sharing (operator merging)
    - Bus sharing (connection merging)
    - Register port sharing (register merging)

Basic synthesis principles

• An FSMD represented by a state-action table or an ASM chart could be implemented using the methodology we have used so far:
  - every variable corresponds to a register
  - every operation corresponds to a functional unit
  - every reading of a variable corresponds to a connection from a register to a functional unit
  - every writing of a variable corresponds to a connection from a functional unit to a register
  - every row of the state-action table, or every ASM block of the ASM chart, corresponds to a state of the controller
• This method, however, leads to expensive realisations

Basic synthesis principles

• Minimization requires two steps:
  - First, the controller can be minimized by
    - minimizing the number of states by combining equivalent states
    - choosing the best state encoding scheme
    - selecting the appropriate flip-flop type
    - minimizing the next-state and output logic
  - Second, the data path should be minimized according to the principles already mentioned:
    - When the lifetimes of 2 variables do not overlap, both can be stored in the same register: register sharing
    - When two operations are not executed concurrently, they can be assigned to the same functional unit: functional-unit sharing
    - When two connections are not used concurrently, they can be shared: connection sharing
    - When two registers are not read from, respectively written to, concurrently, they can be combined into a single register file: register port sharing

Basic synthesis principles

• We are going to show the data path minimizations using an approximation of a square root calculation (SRA: Square Root Approximation):

  $\sqrt{a^2 + b^2} \approx \max(0.875x + 0.5y,\ x)$ with $x = \max(|a|, |b|)$ and $y = \min(|a|, |b|)$

• This approximation could for example be used to compute the power level on a QAM-based communication line, as used for CATV communication (cf. Telenet), in order to detect the start of a packet; a is then the real part and b the imaginary part of the signal

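A quick floating-point check shows how close the SRA formula comes to the exact magnitude. The sketch below (the function name `sra` is illustrative) compares it with `math.hypot`; working out the extremes of the formula, the relative error stays between roughly -3% (when y is about x/4) and +0.8% (when y is about 0.57x).

```python
import math


def sra(a: float, b: float) -> float:
    """Square Root Approximation of sqrt(a^2 + b^2)."""
    x, y = max(abs(a), abs(b)), min(abs(a), abs(b))
    return max(0.875 * x + 0.5 * y, x)


# Relative error against the exact magnitude on a small grid.
errors = [(sra(a, b) - math.hypot(a, b)) / math.hypot(a, b)
          for a in range(-20, 21) for b in range(-20, 21) if (a, b) != (0, 0)]
worst_low, worst_high = min(errors), max(errors)
```

The approximation avoids both multiplication of the inputs and the square root, which is what makes it attractive for a cheap hardware power detector.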
Basic synthesis principles

$\sqrt{a^2 + b^2} \approx \max(0.875x + 0.5y,\ x)$ with $x = \max(|a|, |b|)$ and $y = \min(|a|, |b|)$

[Figure: ASM chart of the SRA. A wait state assigns a = In1 and b = In2, tests Start (Out = t7 is also assigned there) and loops while Start = 0. When Start = 1 the chart continues with the states
  t1 = |a|, t2 = |b|
  x = max(t1, t2), y = min(t1, t2)
  t3 = x >> 3, t4 = y >> 1
  t5 = x - t3
  t6 = t4 + t5
  t7 = max(t6, x)
and then returns to the wait state.]

Here t3 = 0.125x, t4 = 0.5y and t5 = 0.875x.

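The integer version below follows the ASM chart step by step, using only abs, min, max, shifts, a subtraction and an addition; the comments number the steps S1 to S6 in chart order after Start. Note that the shifts truncate, so t3 and t4 are only approximately 0.125x and 0.5y for values that are not multiples of 8 or 2. The function name is illustrative.

```python
def sra_shift_add(a: int, b: int) -> int:
    """Integer SRA using only the operations of the ASM chart."""
    t1, t2 = abs(a), abs(b)           # S1
    x, y = max(t1, t2), min(t1, t2)   # S2
    t3, t4 = x >> 3, y >> 1           # S3: t3 ~ 0.125x, t4 ~ 0.5y (truncating)
    t5 = x - t3                       # S4: t5 ~ 0.875x
    t6 = t4 + t5                      # S5
    t7 = max(t6, x)                   # S6
    return t7                         # Out = t7 in the wait state
```

For example, `sra_shift_add(-300, 400)` gives 500, the exact magnitude, because 400 and 300 are exact multiples of 8 and 2.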
Basic synthesis principles

Liveness of variables: a variable is alive in the first state following the active clock edge that assigns its new value, and in all states between this first state and the last state that uses it.

States S1 to S7 of the SRA ASM chart are, in order: t1/t2, x/y, t3/t4, t5, t6, t7 and the wait state (a, b, Out).

        S1  S2  S3  S4  S5  S6  S7
  a     X
  b     X
  t1        X
  t2        X
  x             X   X   X   X
  y             X
  t3                X
  t4                X   X
  t5                    X
  t6                        X
  t7                            X
  #     2   2   2   3   3   2   1

Basic synthesis principles

From the lifetime table of the previous slide:
• At most 3 variables are alive at the same time
• We should hence try to map all variables onto three registers in such a way that variables sharing a register have non-overlapping lifetimes
• A later section presents the algorithm to accomplish this: register/memory sharing

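Counting the live variables per state is a simple sweep over the lifetime intervals. The (first, last) intervals below are our reading of the lifetime table, with states numbered S1 to S7; every lifetime here is a contiguous interval, so lifetimes that wrap around the loop are not handled by this sketch. `live_counts` is an illustrative name.

```python
# Lifetimes (first state, last state) from the SRA lifetime table.
LIFETIMES = {
    "a": (1, 1), "b": (1, 1), "t1": (2, 2), "t2": (2, 2),
    "x": (3, 6), "y": (3, 3), "t3": (4, 4), "t4": (4, 5),
    "t5": (5, 5), "t6": (6, 6), "t7": (7, 7),
}


def live_counts(lifetimes: dict[str, tuple[int, int]], n_states: int) -> list[int]:
    """Number of variables alive in each state S1..Sn."""
    return [sum(1 for first, last in lifetimes.values() if first <= s <= last)
            for s in range(1, n_states + 1)]


counts = live_counts(LIFETIMES, 7)
min_registers = max(counts)   # lower bound on the number of registers
```

This reproduces the # row (2 2 2 3 3 2 1), so no assignment can use fewer than 3 registers.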
Basic synthesis principles

Operation usage (number of operations of each type per state of the SRA ASM chart):

        S1  S2  S3  S4  S5  S6  S7  #
  abs   2                           2
  min       1                       1
  max       1               1       2
  >>            2                   2
  -                 1               1
  +                     1           1
  #     2   2   2   1   1   1

Basic synthesis principles

From the operation usage table above:
• The straightforward approach would allocate 2 abs, 1 min, 2 max, 2 shift, 1 subtractor and 1 adder components, i.e. 9 components
• However, at most 2 are active at the same time
• We should hence try to merge multiple functions into one component, e.g. the subtractor and the adder
• A later section presents the algorithm to accomplish this: functional-unit sharing

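The counts quoted above follow directly from the operation usage table. The sketch below encodes that table (S7, the wait state, uses no operations) and also computes how many units remain if only operations of the same type are shared; getting down towards the peak of 2 requires merging different types, e.g. into an adder-subtractor, which this simple count does not capture.

```python
# Operations per state of the SRA, from the operation usage table.
USAGE = {
    1: {"abs": 2},
    2: {"min": 1, "max": 1},
    3: {">>": 2},
    4: {"-": 1},
    5: {"+": 1},
    6: {"max": 1},
}

straightforward = sum(n for ops in USAGE.values() for n in ops.values())
peak_concurrent = max(sum(ops.values()) for ops in USAGE.values())

# Sharing only between operations of the same type: per type, the largest
# number of simultaneous uses in any single state.
per_type = {}
for ops in USAGE.values():
    for op, n in ops.items():
        per_type[op] = max(per_type.get(op, 0), n)
same_type_units = sum(per_type.values())
```

This gives 9 units for the straightforward approach, 8 when only the two max operations are merged, and a peak of 2 concurrent operations.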
Basic synthesis principles

Connectivity table of the SRA (I = input of the functional unit, O = output):
  - abs1: input a; output t1
  - abs2: input b; output t2
  - min: inputs t1, t2; output y
  - max: inputs t1, t2, t6 and x; outputs x and t7 (x is both an output and an input)
  - >>3: input x; output t3
  - >>1: input y; output t4
  - -: inputs x, t3; output t5
  - +: inputs t4, t5; output t6

Basic synthesis principles

From the operation usage and connectivity tables above:
• The straightforward approach would allocate 20 connections (11 register outputs and 9 FU outputs)
• In state S2, the largest number of connections is needed: 4 inputs and 2 outputs
• We should hence try to merge multiple connections into one bus
• A later section presents the algorithm to accomplish this: connection merging

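The connection counts can be derived from the register reads and writes of each state. The per-state lists below are our reading of the SRA ASM chart (states S1 to S6; in S2 both min and max read t1 and t2, so those appear twice); the names are illustrative.

```python
# Register reads (FU inputs) and writes (FU outputs) per state of the SRA.
READS = {1: ["a", "b"], 2: ["t1", "t2", "t1", "t2"], 3: ["x", "y"],
         4: ["x", "t3"], 5: ["t4", "t5"], 6: ["t6", "x"]}
WRITES = {1: ["t1", "t2"], 2: ["x", "y"], 3: ["t3", "t4"],
          4: ["t5"], 5: ["t6"], 6: ["t7"]}

# (inputs, outputs) per state, and the state needing the most connections.
per_state = {s: (len(READS[s]), len(WRITES[s])) for s in READS}
busiest = max(per_state, key=lambda s: sum(per_state[s]))

# Straightforward allocation: one connection per register output and per FU output.
n_variables, n_fus = 11, 9
straightforward_connections = n_variables + n_fus
```

State S2 needs 4 inputs and 2 outputs, far fewer than the 20 connections of the straightforward allocation, which is the opportunity that connection merging exploits.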
FSMD design

• FSMDs
• Models
• Synthesis techniques
  - Basic principles
  - Merging
    - Register sharing (variable merging)
    - Functional-unit sharing (operator merging)
    - Bus sharing (connection merging)
    - Register port sharing (register merging)

Register sharing

• Definition of the lifetime of a variable:
  - the set of states in which the variable is alive,
  - starting at the state following the state in which it is assigned a new value (write state),
  - ending at every state in which its value is used (read state),
  - and including all states on each path between the write state and a read state.
  - Note that a variable may be written more than once (multiple assignments),
  - and that a single written value may be read multiple times.
• After determining the lifetimes of the variables, we have to group variables with non-overlapping lifetimes and assign each group to a single register. We should hence find the smallest number of groups.

Register sharing

Left-edge algorithm:
  1. Determine variable lifetimes
  2. Sort by write state and life length
  3. Allocate a new register
  4. Assign to this register all non-overlapping variables, top down
  5. Remove all assigned variables from the list
  6. Empty? If no, go back to step 3; if yes, stop

Register sharing

Determine variable lifetimes: for the SRA these are the lifetimes of the lifetime table shown earlier (a and b in S1; t1 and t2 in S2; x in S3-S6; y in S3; t3 in S4; t4 in S4-S5; t5 in S5; t6 in S6; t7 in S7).

Register sharing

Sort variables by write state and lifetime

        S1  S2  S3  S4  S5  S6  S7
  a     X
  b     X
  t1        X
  t2        X
  x             X   X   X   X
  y             X
  t4                X   X
  t3                X
  t5                    X
  t6                        X
  t7                            X

t4 has a longer lifetime than t3, so it is placed first.

Register sharing

Allocate a new register and assign non-overlapping variables, working top down through the sorted list of the previous slide:

R1: a, t1, x, t7
R2: b, t2, y, t4, t6
R3: t3, t5

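The left-edge steps translate almost line by line into code. The sketch below assumes lifetimes are contiguous intervals (first state, last state), as in the SRA table; it sorts by write state with longer lifetimes first (Python's `sorted` is stable, so remaining ties keep the input order) and reproduces the allocation R1, R2, R3 listed above. The function name is illustrative.

```python
def left_edge(lifetimes: dict[str, tuple[int, int]]) -> list[list[str]]:
    """Left-edge register allocation for interval lifetimes (first, last)."""
    # Sort by write state; for equal write states, longer lifetime first.
    order = sorted(lifetimes, key=lambda v: (lifetimes[v][0],
                                             -(lifetimes[v][1] - lifetimes[v][0])))
    registers = []
    while order:                       # until the list is empty
        reg, last_state, remaining = [], None, []
        for var in order:              # scan the list top down
            first, last = lifetimes[var]
            if last_state is None or first > last_state:
                reg.append(var)        # no overlap: share this register
                last_state = last
            else:
                remaining.append(var)
        registers.append(reg)          # one newly allocated register
        order = remaining              # remove the assigned variables
    return registers


LIFETIMES = {
    "a": (1, 1), "b": (1, 1), "t1": (2, 2), "t2": (2, 2),
    "x": (3, 6), "y": (3, 3), "t3": (4, 4), "t4": (4, 5),
    "t5": (5, 5), "t6": (6, 6), "t7": (7, 7),
}
```

`left_edge(LIFETIMES)` returns `[['a', 't1', 'x', 't7'], ['b', 't2', 'y', 't4', 't6'], ['t3', 't5']]`, i.e. 3 registers, equal to the maximum number of simultaneously live variables.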
Register sharing

[Figure: datapath after register sharing. In1, In2 and the functional-unit outputs reach the three registers through multiplexers: R1 (a, t1, x, t7), R2 (b, t2, y, t4, t6) and R3 (t3, t5). The registers feed the functional units |a|, |b|, min, max, max, +, -, >>1 and >>3; the result is available at Out.]

Register sharing

• The left-edge algorithm finds an assignment with the smallest number of registers
• However, multiple variable-to-register assignments with the smallest number of registers exist
• We can hence use a second cost criterion to find the best assignment:
  - First criterion: smallest number of registers
  - Second criterion: minimize the number of ports of the MUX and DEMUX circuits
    - preferably map to the same register two variables that are the same (e.g. left) input of the same functional unit
    - preferably map to the same register two variables that are the same output of the same FU

Register sharing

• Why does this register sharing reduce the cost of the MUX and DEMUX?

[Figure: left, t1 and t2 stored in separate registers R1 and R2 need a MUX in front of the FU input, and t3 and t4 stored in separate registers R3 and R4 need a DEMUX at the FU output; right, with t1 and t2 sharing R1 and t3 and t4 sharing R2, the FU connects directly to R1 and R2.]

Register sharing

• We should hence determine which variables are the same input of the same functional unit and which variables are the same output of the same FU
• However, at this stage of the design, before operator merging, each operator is implemented in a different FU, so that no variables share the same input or output

Register sharing

• Does this mean that we should do operator merging before register sharing?
  - Register sharing: (1) minimize the number of registers and (2) minimize the size of the MUX/DEMUX
    - The latter is only known after operator merging
  - Operator merging: merge operators when the combined cost of MUX/DEMUX and combined FU is smaller than the cost of the two separate FUs
    - The cost of the MUX/DEMUX is only known after register merging
  - This deadlock is typical for all optimization steps in hardware synthesis (and software compilation)!! Solution:
    - First optimize what gives the largest cost improvement; use quick-and-dirty estimates for the next optimization steps
    - Next optimize the things with less cost influence
    - Iterate until satisfied with the outcome

Register sharing

• What has the biggest cost influence: register sharing or operator merging?
  - In most cases, register sharing has a higher cost impact:
    - there are more variables than FUs
    - merging two registers into one does not increase the cost of the register; merging two different FUs into one makes this single FU more expensive than each of the original FUs separately
    - it is easier to quickly estimate which operators will be merged than to see which variables will be merged
  - We hence mostly do register sharing first
    - For some applications (e.g. when they contain only one type of FU) and some target platforms (e.g. where the cost of a register is negligible compared to the cost of an FU), we do operator merging first
    - In an FPGA, a register at the FU output is free!

Register sharing

• We choose to do register sharing first
• We hence have to estimate operator merging, based on the operation usage table shown earlier:
  - We assume that the 2 max operators, used in different states, will be combined into one max operator
  - We assume that the subtraction and the addition, used in different states, will be combined into one adder-subtractor

©
R.Lauwereins
Imec 2001
Register sharing
• Method for register sharing, combined with MUX/DEMUX cost reduction:
  - Build a compatibility graph.
  - Perform a max-cut graph partitioning.

Register sharing
• Build a compatibility graph
  - Nodes are variables.
    Hint: arrange the nodes graphically according to the result of the left-edge algorithm, since this already separates incompatible variables with overlapping lifetimes.
  - Incompatibility edges are drawn between two variables with overlapping lifetimes: they cannot be merged.
  - Priority edges are drawn between two variables that are the same input of the same FU or the same output of the same FU. The weight on such an edge is the number of times the two variables drive the same input of the same FU plus the number of times they are the same output of the same FU.

Register sharing
Compatibility graph: nodes are variables, arranged per register of the left-edge result:
  a   t1  x   t7
  b   t2  y   t4  t6
  t3  t5
Result of the left-edge algorithm:
  R1: a, t1, x, t7
  R2: b, t2, y, t4, t6
  R3: t3, t5

Register sharing
Incompatibility edges: variables with overlapping lifetimes.
Variable lifetimes (states in which each variable must be kept):
  a: S1        b: S1
  t1: S2       t2: S2
  x: S3-S6     y: S3
  t3: S4       t4: S4-S5
  t5: S5       t6: S6
  t7: S7

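The left-edge result can be reproduced with a short Python sketch. The lifetimes are read from the lifetime table as (first state, last state). The tie-break order for variables that start in the same state is an assumption, chosen to follow the order of the table rows (t4 before t3).

```python
# Lifetimes (name, first state, last state) read from the lifetime table.
# The list order is the tie-break order for equal start states (assumption).
LIFETIMES = [
    ("a", 1, 1), ("b", 1, 1),
    ("t1", 2, 2), ("t2", 2, 2),
    ("x", 3, 6), ("y", 3, 3),
    ("t4", 4, 5), ("t3", 4, 4),
    ("t5", 5, 5), ("t6", 6, 6),
    ("t7", 7, 7),
]


def left_edge(lifetimes):
    """Left-edge register allocation.

    Fill one register at a time: scan the unassigned variables in order of
    their start state and take every variable that starts after the last
    variable already placed in this register has ended.
    """
    pending = sorted(lifetimes, key=lambda v: v[1])  # stable sort keeps tie order
    registers = []
    while pending:
        reg, last_end, rest = [], 0, []
        for name, start, end in pending:
            if start > last_end:
                reg.append(name)
                last_end = end
            else:
                rest.append((name, start, end))
        registers.append(reg)
        pending = rest
    return registers


for i, reg in enumerate(left_edge(LIFETIMES), start=1):
    print(f"R{i}: {', '.join(reg)}")
```

This prints the three registers R1: a, t1, x, t7; R2: b, t2, y, t4, t6; R3: t3, t5. Note that left-edge only minimizes the number of registers; it ignores the priority edges.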
Register sharing
Priority edges: variables that are the same input to an FU or the same output from an FU
(with the two MAX operations sharing one FU and SUB/ADD sharing one adder-subtractor):
  FU      Input 1   Input 2   Output
  abs1    a                   t1
  abs2    b                   t2
  min     t1        t2        y
  max     t1, x     t2, t6    x, t7
  >>3     x                   t3
  >>1     y                   t4
  -/+     x, t4     t3, t5    t5, t6
Priority edges (weight 1 each): t1-x, t2-t6, x-t7, t3-t5, t5-t6.
x and t4, however, have overlapping lifetimes: no priority edge.

Register sharing
• Perform a max-cut graph partitioning
  - Divide the graph into the minimum number of clusters of compatible nodes, such that the total weight is maximized.
  - The total weight is the sum of the weights of all priority edges within a cluster (a priority edge crossing a cluster boundary is not counted).
• We are going to do this optimization visually.
• See the course on optimization techniques for a max-cut graph partitioning algorithm.

Register sharing
x, t3 and t4 are mutually incompatible: each must be assigned to a different register.

Register sharing
Cut = 2: t1 and t7 may be assigned to the same register as x, since they are compatible with x and connected to it by priority edges with the highest weight in the graph, i.e. 1.

Register sharing
Cut = 5: t2, t5 and t6 may be assigned to the same register as t3, since they are compatible and connected by priority edges with the highest weight in the graph, i.e. 1.

Register sharing
Cut = 5: the three remaining variables (a, b and y) have no priority edges and can be assigned to any register, as long as they are compatible with all other variables assigned to that register.
  Result of max-cut algorithm:      Result of left-edge algorithm:
  R1: a, t1, x, t7                  R1: a, t1, x, t7
  R2: b, t2, t3, t5, t6             R2: b, t2, y, t4, t6
  R3: y, t4                         R3: t3, t5

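The max-cut objective can be checked with a minimal Python sketch. It assumes the weight-1 priority edges t1-x, x-t7, t2-t6, t3-t5 and t5-t6 read from the graph, and the lifetimes from the lifetime table. It evaluates a given partition (compatibility of each cluster, sum of priority weights inside clusters). It is not a partitioning algorithm.

```python
from itertools import combinations

# Lifetimes as (first state, last state), read from the lifetime table.
LIFETIMES = {
    "a": (1, 1), "b": (1, 1), "t1": (2, 2), "t2": (2, 2), "x": (3, 6),
    "y": (3, 3), "t3": (4, 4), "t4": (4, 5), "t5": (5, 5), "t6": (6, 6),
    "t7": (7, 7),
}

# Priority edges read from the graph (all weight 1); assumes the two MAX
# operations share one FU and SUB/ADD share one adder-subtractor.
PRIORITY = {
    frozenset(("t1", "x")): 1, frozenset(("x", "t7")): 1,
    frozenset(("t2", "t6")): 1, frozenset(("t3", "t5")): 1,
    frozenset(("t5", "t6")): 1,
}


def overlap(u, v):
    """True if the lifetimes of variables u and v share at least one state."""
    (s1, e1), (s2, e2) = LIFETIMES[u], LIFETIMES[v]
    return s1 <= e2 and s2 <= e1


def is_compatible(cluster):
    """A register can hold only variables with pairwise disjoint lifetimes."""
    return not any(overlap(u, v) for u, v in combinations(cluster, 2))


def total_weight(partition):
    """Sum of priority weights inside clusters; edges between clusters do not count."""
    return sum(PRIORITY.get(frozenset(pair), 0)
               for cluster in partition
               for pair in combinations(cluster, 2))


left_edge_result = [["a", "t1", "x", "t7"], ["b", "t2", "y", "t4", "t6"], ["t3", "t5"]]
max_cut_result = [["a", "t1", "x", "t7"], ["b", "t2", "t3", "t5", "t6"], ["y", "t4"]]

for name, part in (("left-edge", left_edge_result), ("max-cut", max_cut_result)):
    print(name, all(is_compatible(c) for c in part), total_weight(part))
```

Both assignments use 3 compatible registers. The left-edge result keeps a total weight of 4, while the max-cut result reaches 5, which is why it leads to smaller register MUXes.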
Register sharing
[Figure: datapath after register sharing: inputs In1 and In2; one input MUX per register; registers R1 (a, t1, x, t7), R2 (b, t2, t3, t5, t6) and R3 (y, t4); FUs |a|, |b|, min, max, max, +, -, >>1 and >>3; output Out.]

Register sharing
• Register cost computation
  - Cost of a 1-bit register with CE and asynchronous preset or clear: 1/2 CLB, 7 gates, 34 TOR
  - Cost of a 1-bit 2-to-1 MUX: 1/2 CLB, 3 gates, 14 TOR
  - Cost of a 1-bit 4-to-1 MUX: 1 CLB, 5 gates, 36 TOR
  - In an FPGA, the register and the MUX share a CLB.
  - A 5-to-1 MUX is a 4-to-1 MUX followed by a 2-to-1 MUX.

Register sharing
• Register cost computation for the original FSMD implementation (32-bit datapath):
  - 11 registers of 32 bits
    11 reg * 32 bit/reg * 1/2 CLB/bit = 176 CLB
    11 reg * 32 bit/reg * 7 gates/bit = 2464 gates
    11 reg * 32 bit/reg * 34 TOR/bit = 11968 TOR

Register sharing
• Register cost computation for the current FSMD implementation:
  - 1 register of 32 bits with a 4-to-1 MUX
    1 CLB/MUXREGbit * 32 bit = 32 CLB
    (5 gates/MUXbit + 7 gates/REGbit) * 32 bit = 384 gates
    (36 TOR/MUXbit + 34 TOR/REGbit) * 32 bit = 2240 TOR
  - 1 register of 32 bits with a 5-to-1 MUX
    (1 CLB/4MUXbit + 1/2 CLB/2MUXREGbit) * 32 bit = 48 CLB
    (5 gates/4MUXbit + 3 gates/2MUXbit + 7 gates/REGbit) * 32 bit = 480 gates
    (36 TOR/4MUXbit + 14 TOR/2MUXbit + 34 TOR/REGbit) * 32 bit = 2688 TOR

Register sharing
• Register cost computation for the current FSMD implementation:
  - 1 register of 32 bits with a 2-to-1 MUX
    1/2 CLB/MUXREGbit * 32 bit = 16 CLB
    (3 gates/MUXbit + 7 gates/REGbit) * 32 bit = 320 gates
    (14 TOR/MUXbit + 34 TOR/REGbit) * 32 bit = 1536 TOR

Register sharing
                  CLB              gates                  TOR           Conn
              Reg   FU  Tot     Reg    FU   Tot      Reg     FU    Tot
Original      176              2464                11968                  20
Reg share      96              1184                 6464                  12
FU share
Bus share
Port share
Note that register sharing also reduced the number of connections: all 4 minimization steps influence each other.
We could have estimated this reduction of connections and used it to guide register sharing.

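The register cost arithmetic above can be checked with a small Python sketch. The per-bit costs are those of the register cost slide. The rule that a register fits in the CLB of its (last) input MUX, so that only MUX CLBs are counted when a MUX is present, is our reading of the calculations above.

```python
WIDTH = 32  # datapath width in bits

# Per-bit costs as (CLB, gates, TOR), from the register cost slide.
REG = (0.5, 7, 34)   # 1-bit register with CE and asynchronous preset/clear
MUX2 = (0.5, 3, 14)  # 1-bit 2-to-1 MUX
MUX4 = (1.0, 5, 36)  # 1-bit 4-to-1 MUX


def register_cost(*muxes):
    """Cost of one WIDTH-bit register fed by the given chain of MUXes.

    Assumption: in an FPGA the register shares the CLB of its last MUX,
    so only the MUX CLBs are counted when a MUX is present.
    Gates and transistors simply add up.
    """
    clb = sum(m[0] for m in muxes) if muxes else REG[0]
    gates = sum(m[1] for m in muxes) + REG[1]
    tor = sum(m[2] for m in muxes) + REG[2]
    return (clb * WIDTH, gates * WIDTH, tor * WIDTH)


def total(costs):
    """Add up a list of (CLB, gates, TOR) tuples."""
    return tuple(sum(c[i] for c in costs) for i in range(3))


# Original FSMD: 11 registers, no MUX cost counted.
original = total([register_cost()] * 11)
# After register sharing: 4-to-1, 5-to-1 (= 4-to-1 + 2-to-1) and 2-to-1 MUX.
reg_share = total([register_cost(MUX4), register_cost(MUX4, MUX2), register_cost(MUX2)])

print(original)   # (CLB, gates, TOR)
print(reg_share)
```

The two printed tuples are the "Reg" entries of the Original and Reg share rows in the table above.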
FSMD design
• FSMDs
• Models
• Synthesis techniques
  - Basic principles
  - Merging
    - Register sharing (variable merging)
    - Functional-unit sharing (operator merging)
    - Bus sharing (connection merging)
    - Register port sharing (register merging)

Functional-unit sharing
• Basic principle:
  - Replace two FUs that are not used at the same time by a single FU with the combined functionality, plus a MUX at each input and a DEMUX at each output.
  - Do this only when MUX/combined FU/DEMUX is cheaper than the two FUs.
  [Figure: FU1 (inputs a, b; output x) and FU2 (inputs c, d; output y) replaced by one FU1&2, with a MUX selecting a or c and a MUX selecting b or d at its inputs, and a DEMUX routing its output to x or y.]

Functional-unit sharing
• When register sharing guessed the FU sharing correctly, the cost of the extra MUXes and DEMUXes is small, since the input and output variables of both FUs will often be assigned to the same register.
• Which units can be shared:
  - identical units (e.g. the 2 MAX units)
  - different units (e.g. ADD and SUBTRACT)

Functional-unit sharing
• Build a compatibility graph
  - Nodes are operators.
  - Incompatibility edges are drawn between two operators that are used in the same state: they cannot be merged.
  - Priority edges are drawn between two (or a group of n) operators that can be merged into the same FU. The weight on such an edge indicates how large the cost saving is when the two (or n) operators are merged.

Functional-unit sharing
Compatibility graph: nodes are operators
  ABS   MIN   SUB   >>3
  ABS   MAX   MAX   ADD   >>1

Functional-unit sharing
Incompatibility edge: two operators needed in the same state
  Operators per state:
    S1: abs, abs    S2: max, min    S3: >>, >>
    S4: -           S5: +           S6: max
  Total per operator: abs 2, min 1, max 2, >> 2, - 1, + 1
Incompatibility edges: ABS-ABS (S1), MAX-MIN (S2), >>3 and >>1 (S3)

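Deriving these incompatibility edges is mechanical: any two operators executed in the same state conflict. A Python sketch, where the operator labels (abs1, max2, ...) are hypothetical names for the individual operator instances:

```python
from itertools import combinations

# State in which each operator instance executes, from the operator table.
# The instance labels are hypothetical names used only in this sketch.
OP_STATE = {
    "abs1": 1, "abs2": 1,
    "max1": 2, "min": 2,
    ">>3": 3, ">>1": 3,
    "sub": 4,
    "add": 5,
    "max2": 6,
}


def incompatibility_edges(op_state):
    """Operators executed in the same state cannot be merged into one FU."""
    return {(u, v) for u, v in combinations(op_state, 2)
            if op_state[u] == op_state[v]}


print(sorted(incompatibility_edges(OP_STATE)))
```

Every pair not returned here (e.g. max1 and max2, or sub and add) is a candidate for sharing; whether sharing pays off is decided by the priority-edge weights.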
Functional-unit sharing
Priority edge: the weight indicates the saving obtained by sharing.

Functional-unit sharing
• Cost model for the MAX
  [Figure: max(a,b) computed by subtracting b from a; the sign of the difference drives a MUX that selects a or b.]
  - Subtractor: only the carry logic is needed, except for the MSB where the sum logic is also needed: 1/2 CLB/bit, 5 gates/bit, 20 TOR/bit
  - MUX: 1/2 CLB/bit, 3 gates/bit, 14 TOR/bit
  - Cost per bit: 1 CLB, 8 gates, 34 TOR

Functional-unit sharing
• Cost model for one FU (MAX&MAX)
  - Separate FUs: R1=MAX(R1,R2); R1=MAX(R1,R2)
    Cost: 2 CLB, 16 gates, 68 TOR
  - Shared FU: R1=MAX(R1,R2)
    Cost: 1 CLB, 8 gates, 34 TOR
    Savings: 1 CLB, 8 gates, 34 TOR
  - Note that this was only possible by mapping corresponding operands and results to the same register.

Functional-unit sharing
• Cost model for the ABS
  [Figure: |a| is a MUX, controlled by the sign bit a(n-1), that selects a or its negation; the negator is a chain of half adders (HA) with carry-in 1, followed by one MUX per bit.]
  - Half adder: 2 gates (AND & XOR), 18 TOR (6 + 12)
  - Cost per bit: 1/2 CLB (using the carry chain), 6 gates, 34 TOR

Functional-unit sharing
• Structure of an ABS&MAX unit
  [Figure: full-adder chain (carry-in 1) computing S from R1, gated by an AND with MAX/ABS', and the inverted R2; an output MUX with select M1M0 gives F = R1 (00), S (01) or R2 (1x).]
  MAX/ABS'  R2(n-1)  S(n-1)  F    M1M0
  0         0        0       R2   1x
  0         0        1       R2   1x
  0         1        0       S    01
  0         1        1       S    01
  1         0        0       R1   00
  1         0        1       R2   1x
  1         1        0       R1   00
  1         1        1       R2   1x
  - R2 appears most often in the table, so giving it the MUX input with the most don't-cares (1x) is best.
  - Cost per bit:
    1/2 CLB (FA&INV) + 1/2 CLB (AND) + 1 (MUX) = 2 CLB
    5 gates (FA) + 1 (AND) + 1 (INV) + 4 (MUX) = 11 gates
    36 TOR (FA) + 6 (AND) + 2 (INV) + 22 (MUX) = 66 TOR

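A behavioral Python sketch of this ABS&MAX unit helps check the truth table. It assumes 32-bit two's-complement operands and that R1 - R2 does not overflow, since the table only inspects sign bits. The AND gate zeroes R1 in ABS mode, so the adder computes R1 + NOT R2 + 1, i.e. R1 - R2 for MAX and -R2 for ABS.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def to_signed(v):
    """Interpret a WIDTH-bit pattern as a two's-complement integer."""
    return v - (1 << WIDTH) if v >> (WIDTH - 1) else v


def abs_max_unit(r1, r2, max_not_abs):
    """Behavioral sketch of the shared ABS&MAX unit.

    max_not_abs = 1 selects MAX(R1, R2), 0 selects ABS(R2).
    Assumes R1 - R2 does not overflow (only sign bits are inspected).
    """
    r1 &= MASK
    r2 &= MASK
    gated_r1 = r1 if max_not_abs else 0             # AND gate with MAX/ABS'
    s = (gated_r1 + (~r2 & MASK) + 1) & MASK        # FA chain, carry-in 1
    r2_sign = r2 >> (WIDTH - 1)
    s_sign = s >> (WIDTH - 1)
    if max_not_abs:
        f = r2 if s_sign else r1                    # R1 - R2 < 0: R2 is larger
    else:
        f = s if r2_sign else r2                    # negative R2: take -R2
    return to_signed(f)


print(abs_max_unit(5, -9, 1), abs_max_unit(0, -9, 0), abs_max_unit(-3, 7, 1))
```

This prints 5 9 7. In ABS mode the value of R1 does not matter, because the AND gate removes it from the adder input.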
Functional-unit sharing
• Cost model for one FU (ABS&MAX&MAX)
  - Separate FUs: R2=ABS(R2); R1=MAX(R1,R2); R1=MAX(R1,R2)
    Cost: 2.5 CLB, 22 gates, 102 TOR
  - Shared FU: R2=ABS(R2), R1=MAX(R1,R2)
    Cost: 2 CLB, 11 gates, 66 TOR
    Savings: 0.5 CLB, 11 gates, 36 TOR

Functional-unit sharing
• Cost model for the MIN
  [Figure: min(a,b) computed by subtracting b from a; the sign of the difference drives a MUX that selects a or b.]
  - Subtractor: only the carry logic is needed, except for the MSB where the sum logic is also needed: 1/2 CLB/bit, 5 gates/bit, 20 TOR/bit
  - MUX: 1/2 CLB/bit, 3 gates/bit, 14 TOR/bit
  - Cost per bit: 1 CLB, 8 gates, 34 TOR

Functional-unit sharing
• Structure of an ABS&MIN unit
  [Figure: full-adder chain (carry-in 1) computing S, with an AND gate and an extra MUX with inverter, controlled by MIN/ABS', at the adder inputs; an output MUX with select M1M0 gives F = R1 (1x), S (01) or R2 (00).]
  MIN/ABS'  R1(n-1)  S(n-1)  F    M1M0
  0         0        0       R1   1x
  0         0        1       R1   1x
  0         1        0       S    01
  0         1        1       S    01
  1         0        0       R2   00
  1         0        1       R1   1x
  1         1        0       R2   00
  1         1        1       R1   1x
  - Cost per bit:
    1/2 CLB (FA) + 1/2 CLB (AND) + 1/2 CLB (MUX&INV) + 1 (MUX) = 2.5 CLB
    5 gates (FA) + 1 (AND) + 3 (MUX&INV) + 4 (MUX) = 13 gates
    36 TOR (FA) + 6 (AND) + 16 (MUX&INV) + 22 (MUX) = 80 TOR

Functional-unit sharing
• Cost model for one FU (ABS&MIN)
  - Separate FUs: R1=ABS(R1); R3=MIN(R1,R2)
    Cost: 1.5 CLB, 14 gates, 68 TOR
  - Shared FU: R1=ABS(R1), R3=MIN(R1,R2)
    Cost: 2.5 CLB, 13 gates, 80 TOR
    Savings: -1 CLB, 1 gate, -12 TOR
  - It does not seem to be a good idea to share ABS and MIN.

Functional-unit sharing
• Cost model for the ADD
  [Figure: full adder with inputs xi, yi, ci and outputs si, ci+1.]
  - Cost per bit: 1/2 CLB, 5 gates, 36 TOR

Functional-unit sharing
• Cost model for the SUB
  [Figure: 4-bit ripple-carry subtractor built from full adders (FA) with carry-in 1; inputs a3..a0 and b3..b0, carries c1..c4, outputs f3..f0.]
  - Cost per bit: 1/2 CLB, 6 gates, 38 TOR

Functional-unit sharing
• Structure of an ADD&SUB unit
  [Figure: a MUX controlled by A/S' selects R1 or R3 as the first operand; a full adder-subtractor (FAS), controlled by A'/S, combines it with R2 into S.]
  - It is not clear whether the MUX fits in the same CLB.
  - Cost per bit:
    1/2 CLB (FAS&MUX)
    6 gates (FAS) + 3 (MUX) = 9 gates
    48 TOR (FAS) + 14 (MUX) = 62 TOR

Functional-unit sharing
• Cost model for one FU (ADD&SUB)
  - Separate FUs: R2=ADD(R3,R2); R2=SUB(R1,R2)
    Cost: 1 CLB, 11 gates, 74 TOR
  - Shared FU: R2=ADD(R3,R2), R2=SUB(R1,R2)
    Cost: 1/2 CLB, 9 gates, 62 TOR
    Savings: 0.5 CLB, 2 gates, 12 TOR

Functional-unit sharing
• Structure of an ADD&MAX unit
  [Figure: a MUX controlled by A/M' selects R1 or R3 as the first adder operand; a full-adder chain computes S; an output MUX with select M1M0 gives F = R1 (00), R2 (01) or S (1x).]
  ADD/MAX'  S(n-1)  F    M1M0
  0         0       R1   00
  0         1       R2   01
  1         0       S    1x
  1         1       S    1x
  M1 = ADD/MAX', M0 = S(n-1)
  - It is not clear whether the MUX fits in the same CLB.
  - Cost per bit:
    1/2 CLB (FAS&MUX) + 1 (MUX) = 1.5 CLB
    6 gates (FAS) + 3 (MUX) + 4 (MUX) = 13 gates
    48 TOR (FAS) + 12 (MUX) + 22 (MUX) = 82 TOR

Functional-unit sharing
• Cost model for one FU (MAX&MAX&ADD)
  - Separate FUs: R1=MAX(R1,R2); R1=MAX(R1,R2); R2=ADD(R3,R2)
    Cost: 2.5 CLB, 21 gates, 104 TOR
  - Shared FU: R1=MAX(R1,R2), R2=ADD(R3,R2)
    Cost: 1.5 CLB, 13 gates, 82 TOR
    Savings: 1 CLB, 8 gates, 22 TOR

Functional-unit sharing
• Structure of an ABS&MAX&ADD unit
  [Figure: an input MUX (controls A/M' and Else/ABS') selects R1, R3 or 0 as the first adder operand; an adder-subtractor controlled by ADD/MAX' combines it with R2 into S; an output MUX with select M1M0 gives F = R1 (00), S (1x) or R2 (01).]
  - Cost per bit:
    1/2 CLB (FAS) + 1/2 CLB (MUX) + 1 (MUX) = 2 CLB
    6 gates (FAS) + 3 (MUX) + 4 (MUX) = 13 gates
    48 TOR (FAS) + 16 (MUX) + 22 (MUX) = 86 TOR

Functional-unit sharing
• Cost model for one FU (ABS&MAX&MAX&ADD)
  - Separate FUs: R2=ABS(R2); R1=MAX(R1,R2); R1=MAX(R1,R2); R2=ADD(R3,R2)
    Cost: 3 CLB, 27 gates, 138 TOR
  - Shared FU: R2=ABS(R2), R1=MAX(R1,R2), R2=ADD(R3,R2)
    Cost: 2 CLB, 13 gates, 86 TOR
    Savings: 1 CLB, 14 gates, 52 TOR

Functional-unit sharing
• Structure of an ABS&MAX&ADD&SUB unit
  [Figure: same datapath as the ABS&MAX&ADD unit: an input MUX selects R1, R3 or 0, a full adder-subtractor combines it with R2 into S, and an output MUX with select M1M0 gives F = R1 (00), S (1x) or R2 (01).]
  - Cost per bit:
    1/2 CLB (FAS) + 1/2 CLB (MUX) + 1 (MUX) = 2 CLB
    6 gates (FAS) + 3 (MUX) + 4 (MUX) = 13 gates
    48 TOR (FAS) + 16 (MUX) + 22 (MUX) = 86 TOR

Functional-unit sharing
• Cost model for one FU (ABS&MAX&MAX&ADD&SUB)
  - Separate FUs: R2=ABS(R2); R1=MAX(R1,R2); R1=MAX(R1,R2); R2=ADD(R3,R2); R2=SUB(R1,R2)
    Cost: 3.5 CLB, 33 gates, 176 TOR
  - Shared FU: R2=ABS(R2), R1=MAX(R1,R2), R2=ADD(R3,R2), R2=SUB(R1,R2)
    Cost: 2 CLB, 13 gates, 86 TOR
    Savings: 1.5 CLB, 20 gates, 90 TOR

Functional-unit sharing
• Structure of a MIN&SUB unit
  [Figure: full-adder chain (carry-in 1) computing S from R1 and the inverted R2; an output MUX with select M1M0 gives F = R1 (00), S (01) or R2 (1x).]
  - Cost per bit:
    1/2 CLB (FA&INV) + 1 (MUX) = 1.5 CLB
    5 gates (FA) + 1 (INV) + 4 (MUX) = 10 gates
    36 TOR (FA) + 2 (INV) + 22 (MUX) = 60 TOR

Functional-unit sharing
• Cost model for one FU (MIN&SUB)
  - Separate FUs: R3=MIN(R1,R2); R2=SUB(R1,R2)
    Cost: 1.5 CLB, 14 gates, 72 TOR
  - Shared FU: R3=MIN(R1,R2), R2=SUB(R1,R2)
    Cost: 1.5 CLB, 10 gates, 60 TOR
    Savings: 0 CLB, 4 gates, 12 TOR

Functional-unit sharing
• Structure of an ABS&MIN&SUB unit
  [Figure: full-adder chain (carry-in 1) with an AND gate and an extra MUX with inverter at the adder inputs; an output MUX with select M1M0 gives F = R1 (00), S (01) or R2 (1x).]
  - Cost per bit:
    1/2 CLB (FA) + 1/2 (AND) + 1/2 (MUX&INV) + 1 (MUX) = 2.5 CLB
    5 gates (FA) + 1 (AND) + 3 (MUX&INV) + 4 (MUX) = 13 gates
    36 TOR (FA) + 6 (AND) + 16 (MUX&INV) + 22 (MUX) = 80 TOR

Functional-unit sharing
• Cost model for one FU (ABS&MIN&SUB)
  - Separate FUs: R1=ABS(R1); R3=MIN(R1,R2); R2=SUB(R1,R2)
    Cost: 2 CLB, 20 gates, 106 TOR
  - Shared FU: R1=ABS(R1), R3=MIN(R1,R2), R2=SUB(R1,R2)
    Cost: 2.5 CLB, 13 gates, 80 TOR
    Savings: -0.5 CLB, 7 gates, 26 TOR

Functional-unit sharing
Is it useful to share the SHIFTs with other FUs?

Functional-unit sharing
• Cost models for the FUs: SHIFT (>>1, >>3)
  - Cost per bit: 0 CLB, 0 gates, 0 TOR
  - Since the SHIFTs do not cost anything, the cost can only increase by combining them with other operators.

Functional-unit sharing
Compatibility graph, priority-edge weights as savings per bit (CLB/gates/TOR):
  MAX-MAX                   1/8/34
  ABS-MAX-MAX               0.5/11/36
  ABS-MIN                   -1/1/-12
  ADD-SUB                   0.5/2/12
  MAX-MAX-ADD               1/8/22
  ABS-MAX-MAX-ADD           1/14/52
  ABS-MAX-MAX-ADD-SUB       1.5/20/90
  MIN-SUB                   0/4/12
  ABS-MIN-SUB               -0.5/7/26
• This is our compatibility graph; although other sharings are still possible, I assume they won't yield a better cost.
• Note that max-cut graph partitioning is not well suited when the saving of sharing 3 nodes is not the sum of the savings of the 3 pairs of nodes.

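Each priority-edge weight is the cost of the separate FUs minus the cost of the shared FU. A Python sketch using the per-bit costs from the cost-model and structure slides (with 9 gates for ADD&SUB, as in its cost model) reproduces every weight in the list above:

```python
# Per-bit cost (CLB, gates, TOR) of each separate FU, from the cost-model slides.
FU_COST = {
    "ABS": (0.5, 6, 34), "MIN": (1, 8, 34), "MAX": (1, 8, 34),
    "ADD": (0.5, 5, 36), "SUB": (0.5, 6, 38),
}

# Per-bit cost of each shared FU, from the "Structure of ..." slides.
SHARED_COST = {
    ("MAX", "MAX"): (1, 8, 34),
    ("ABS", "MAX", "MAX"): (2, 11, 66),
    ("ABS", "MIN"): (2.5, 13, 80),
    ("ADD", "SUB"): (0.5, 9, 62),
    ("MAX", "MAX", "ADD"): (1.5, 13, 82),
    ("ABS", "MAX", "MAX", "ADD"): (2, 13, 86),
    ("ABS", "MAX", "MAX", "ADD", "SUB"): (2, 13, 86),
    ("MIN", "SUB"): (1.5, 10, 60),
    ("ABS", "MIN", "SUB"): (2.5, 13, 80),
}


def saving(ops):
    """Priority-edge weight: separate-FU cost minus shared-FU cost, per metric."""
    separate = [sum(FU_COST[op][i] for op in ops) for i in range(3)]
    return tuple(s - c for s, c in zip(separate, SHARED_COST[ops]))


for ops in SHARED_COST:
    print("&".join(ops), saving(ops))
```

A negative entry (e.g. -1 CLB for ABS&MIN) means the shared FU is more expensive than the two separate FUs for that metric.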
Functional-unit sharing
Cost minimization for FPGA (edge weights: CLB saved per bit)
  - Possibility 1: (ABS), (MIN), (ABS&MAX&MAX&ADD&SUB), (>>3), (>>1): saves 1.5 CLB, costs 3.5 CLB
  - Possibility 2: (ABS), (MIN&SUB&ADD), (ABS), (MAX&MAX), (>>3), (>>1): saves 1.5 CLB, costs 3.5 CLB
  - Possibility 2 requires one FU more (hence more connections).

Functional-unit sharing
Cost minimization for gate arrays (edge weights: gates saved per bit)
  - Possibility 1: (ABS&MIN), (ABS&MAX&MAX&ADD&SUB), (>>3), (>>1): saves 21 gates, costs 26 gates
  - Possibility 2: (ABS&MIN&SUB), (ABS&MAX&MAX&ADD), (>>3), (>>1): saves 21 gates, costs 26 gates

Functional-unit sharing
Cost minimization for CMOS ASICs (edge weights: TOR saved per bit)
  - Possibility 1: (ABS), (MIN), (ABS&MAX&MAX&ADD&SUB), (>>3), (>>1): saves 90 TOR, costs 154 TOR

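The savings and costs quoted for these possibilities can be re-derived with a Python sketch. It adds up the per-bit cost of the FUs a partition uses and subtracts the result from the unshared datapath (2 ABS, MIN, 2 MAX, SUB, ADD and two free shifts). The coverage check, which makes sure every operator is mapped exactly once, is an illustrative addition.

```python
# Per-bit cost (CLB, gates, TOR) of the single-function FUs ...
SINGLE = {
    "ABS": (0.5, 6, 34), "MIN": (1, 8, 34), "MAX": (1, 8, 34),
    "ADD": (0.5, 5, 36), "SUB": (0.5, 6, 38), ">>3": (0, 0, 0), ">>1": (0, 0, 0),
}
# ... and of the shared FUs, from the "Structure of ..." slides.
SHARED = {
    "ABS&MIN": (2.5, 13, 80),
    "ABS&MIN&SUB": (2.5, 13, 80),
    "ABS&MAX&MAX&ADD": (2, 13, 86),
    "ABS&MAX&MAX&ADD&SUB": (2, 13, 86),
}
COST = {**SINGLE, **SHARED}

# All operators of the algorithm.
OPERATORS = ["ABS", "ABS", "MIN", "MAX", "MAX", "SUB", "ADD", ">>3", ">>1"]


def add(costs):
    """Add up a list of (CLB, gates, TOR) tuples."""
    return tuple(sum(c[i] for c in costs) for i in range(3))


UNSHARED = add([SINGLE[op] for op in OPERATORS])  # (5.0, 47, 244)


def evaluate(partition):
    """Return (cost, saving) of a partition given as a list of FU names."""
    covered = sorted(op for fu in partition for op in fu.split("&"))
    assert covered == sorted(OPERATORS), "every operator must be mapped exactly once"
    cost = add([COST[fu] for fu in partition])
    return cost, tuple(u - c for u, c in zip(UNSHARED, cost))


fpga_1 = ["ABS", "MIN", "ABS&MAX&MAX&ADD&SUB", ">>3", ">>1"]
gates_1 = ["ABS&MIN", "ABS&MAX&MAX&ADD&SUB", ">>3", ">>1"]
gates_2 = ["ABS&MIN&SUB", "ABS&MAX&MAX&ADD", ">>3", ">>1"]
for p in (fpga_1, gates_1, gates_2):
    print(p, evaluate(p))
```

The FPGA/CMOS possibility 1 costs 3.5 CLB and 154 TOR per bit, and both gate-array possibilities cost 26 gates per bit, matching the figures above.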
Functional-unit sharing
We select solution 1 for the FPGA:
  FU1: ABS (1/2 CLB/bit)
  FU2: MIN (1 CLB/bit)
  FU3: ABS, MAX, MAX, ADD, SUB (2 CLB/bit)
  FU4: >>3 (0 CLB/bit)
  FU5: >>1 (0 CLB/bit)

Functional-unit sharing
[Figure: datapath after FU sharing: inputs In1 and In2; register input MUXes; registers R1 (a, t1, x, t7), R2 (b, t2, t3, t5, t6) and R3 (y, t4); FU1 to FU5; output Out.]

Functional-unit sharing
• Note that functional-unit sharing reduced the number of ports of the register MUXes; we already guided register sharing with this in mind.
• We should hence recalculate the register cost:
  - Cost of a 1-bit 3-to-1 MUX: 1 CLB, 4 gates, 28 TOR
  - Cost of a 1-bit 2-to-1 MUX: 1/2 CLB, 3 gates, 14 TOR
  - Cost of a 1-bit register: 1/2 CLB, 7 gates, 34 TOR

Functional-unit sharing
• Register cost computation for the current FSMD implementation:
  - 2 registers of 32 bits with a 3-to-1 MUX; each register costs:
    1 CLB/MUXREGbit * 32 bit = 32 CLB
    (4 gates/MUXbit + 7 gates/REGbit) * 32 bit = 352 gates
    (28 TOR/MUXbit + 34 TOR/REGbit) * 32 bit = 1984 TOR
  - 1 register of 32 bits with a 2-to-1 MUX
    0.5 CLB/MUXREGbit * 32 bit = 16 CLB
    (3 gates/MUXbit + 7 gates/REGbit) * 32 bit = 320 gates
    (14 TOR/MUXbit + 34 TOR/REGbit) * 32 bit = 1536 TOR

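A short Python sketch of this recalculation, using the per-bit MUX and register costs listed above and assuming, as in the calculation above, that each register shares the CLB of its input MUX:

```python
WIDTH = 32

# Per-bit costs (CLB, gates, TOR) from the recalculation slide.
REG = (0.5, 7, 34)     # 1-bit register
MUX2 = (0.5, 3, 14)    # 1-bit 2-to-1 MUX
MUX3 = (1.0, 4, 28)    # 1-bit 3-to-1 MUX


def mux_register(mux):
    """WIDTH-bit register with one input MUX.

    Assumes the register shares the CLB of its MUX (FPGA), while the gates
    and transistors of MUX and register add up.
    """
    clb, gates, tor = mux
    return (clb * WIDTH, (gates + REG[1]) * WIDTH, (tor + REG[2]) * WIDTH)


registers = [mux_register(MUX3), mux_register(MUX3), mux_register(MUX2)]
total = tuple(sum(r[i] for r in registers) for i in range(3))
print(total)  # (CLB, gates, TOR)
```

The total (80 CLB, 1024 gates, 5504 TOR) is the "Reg" entry of the FU share row in the cost table.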
Functional-unit sharing
                  CLB              gates                  TOR           Conn
              Reg   FU  Tot     Reg    FU   Tot      Reg     FU    Tot
Original      176  160  336    2464  1408  3872    11968   7616  19584    20
Reg share      96  160  256    1184  1408  2592     6464   7616  14080    12
FU share       80  112  192    1024   832  1856     5504   4864  10368     8
Bus share
Port share
Note that functional-unit sharing also reduced the register cost (through smaller register MUXes) as well as the number of connections: all 4 minimization steps influence each other.
We could have estimated the reduction of connections and used it to guide FU sharing.

FSMD design
• FSMDs
• Models
• Synthesis techniques
  - Basic principles
  - Merging
    - Register sharing (variable merging)
    - Functional-unit sharing (operator merging)
    - Bus sharing (connection merging)
    - Register port sharing (register merging)

Bus sharing
• Basic principle:
  - Replace two connections that are not used at the same time by a single connection.
  - This reduces wiring, which has become the dominant cost in today's circuits,
  - at the cost of requiring tri-state drivers each time two different sources drive the same bus,
  - but it also saves MUXes each time two different connections driving the same destination are replaced by a single bus.
  [Figure: R1 and R2 driving FU1 through a MUX, replaced by R1 and R2 driving FU1 over a single bus.]

Bus sharing
• Since the wiring cost of buses is so high, we search for the absolute minimum number of buses, without looking at the increased cost of the drivers.
• When several solutions lead to the same number of buses, we choose the combination with the minimum number of tri-state drivers at the sources and MUXes at the destinations.

Bus sharing
• Build a compatibility graph for the connections from registers to functional units, and a second compatibility graph for the connections from functional units to registers.
  - Nodes are connections.
  - Incompatibility edges are drawn between two connections that are used in the same state and have different sources.
  - Priority edges are drawn between two connections that have the same source (saves on tri-state drivers) or the same destination (saves on input MUXes).

Bus sharing
Name all input connections of the FUs.
[Figure: datapath after FU sharing (registers R1: a, t1, x, t7; R2: b, t2, t3, t5, t6; R3: y, t4; FU1 to FU5), with the connections from the registers to the FUs and to Out labeled A to I.]

Bus sharing
• Build the compatibility graph: nodes are the connections A, B, C, D, E, F, G, H and I.

Bus sharing
• In which state is each connection used? From which source to which destination does it go?
  A  R1 -> Out
  B  R1 -> FU1
  C  R1 -> FU2 (input 1)
  D  R2 -> FU2 (input 2)
  E  R1 -> FU3 (input 1)
  F  R3 -> FU3 (input 1)
  G  R2 -> FU3 (input 2)
  H  R1 -> FU4
  I  R3 -> FU5

Bus sharing
Rewrite the state sequence taking register and FU sharing into account
(R1: a, t1, x, t7; R2: b, t2, t3, t5, t6; R3: y, t4;
 FU1: ABS; FU2: MIN; FU3: ABS, MAX, MAX, ADD, SUB; FU4: >>3; FU5: >>1):
  State 0: R1=In1, R2=In2, Out=R1  (a=In1, b=In2, Out=t7); waits for Start
  State 1: R1=F1(R1)  (t1=|a|);  R2=F3(R2)  (t2=|b|)
  State 2: R1=F3(R1,R2)  (x=max(t1,t2));  R3=F2(R1,R2)  (y=min(t1,t2))
  State 3: R2=F4(R1)  (t3=x>>3);  R3=F5(R3)  (t4=y>>1)
  State 4: R2=F3(R1,R2)  (t5=x-t3)
  State 5: R2=F3(R3,R2)  (t6=t4+t5)
  State 6: R1=F3(R1,R2)  (t7=max(t6,x))

Bus sharing
  Connection  Source -> destination  Used in states
  A           R1 -> Out              S0
  B           R1 -> FU1              S1
  C           R1 -> FU2 (input 1)    S2
  D           R2 -> FU2 (input 2)    S2
  E           R1 -> FU3 (input 1)    S2, S4, S6
  F           R3 -> FU3 (input 1)    S5
  G           R2 -> FU3 (input 2)    S1, S2, S4, S5, S6
  H           R1 -> FU4              S3
  I           R3 -> FU5              S3
Incompatible connections are those that are used in the same state and come from a different register:
  B-G, C-D, C-G, D-E, E-G, H-I, F-G

Bus sharing
Incompatibility edges (nodes: connections A to I): B-G, C-D, C-G, D-E, E-G, H-I, F-G

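The incompatibility edges follow mechanically from the connection usage table. A Python sketch, assuming the state numbering S0 (input load) to S6 used in the table above; the same function applies to the register-input connections if the sources are the inputs and FUs instead of the registers:

```python
from itertools import combinations

# Register-to-FU connections: name -> (source register, states where used).
# States are numbered S0 (input load) to S6, as in the usage table.
CONNECTIONS = {
    "A": ("R1", {0}),              # R1 -> Out
    "B": ("R1", {1}),              # R1 -> FU1
    "C": ("R1", {2}),              # R1 -> FU2, input 1
    "D": ("R2", {2}),              # R2 -> FU2, input 2
    "E": ("R1", {2, 4, 6}),        # R1 -> FU3, input 1
    "F": ("R3", {5}),              # R3 -> FU3, input 1
    "G": ("R2", {1, 2, 4, 5, 6}),  # R2 -> FU3, input 2
    "H": ("R1", {3}),              # R1 -> FU4
    "I": ("R3", {3}),              # R3 -> FU5
}


def incompatibility_edges(conns):
    """Connections used in a common state but driven by different sources."""
    edges = set()
    for u, v in combinations(sorted(conns), 2):
        (src_u, states_u), (src_v, states_v) = conns[u], conns[v]
        if src_u != src_v and states_u & states_v:
            edges.add(f"{u}-{v}")
    return edges


print(sorted(incompatibility_edges(CONNECTIONS)))
```

Connections with the same source never conflict: C and E, for instance, are both used in S2 but carry the same R1 value, so they can share a bus.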
Bus sharing
Priority edges: connections with the same source or the same destination
  - Same source R1: A, B, C, E, H
  - Same source R2: D, G
  - Same source R3: F, I
  - Same destination (FU3 input 1): E, F

Bus sharing
Bus 1: A, B, C, E, F, H
Bus 2: D, G, I

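Finding the minimum number of buses is a graph-colouring problem on the incompatibility graph. A brute-force Python sketch confirms that two buses are the minimum here and that the chosen assignment is valid. Exhaustive search is only practical for small graphs like this one.

```python
from itertools import product

# Incompatibility edges of the register-to-FU connection graph.
NODES = "ABCDEFGHI"
EDGES = [("B", "G"), ("C", "D"), ("C", "G"), ("D", "E"),
         ("E", "G"), ("H", "I"), ("F", "G")]


def valid(assignment, edges):
    """A bus assignment is valid if no incompatible pair shares a bus."""
    return all(assignment[u] != assignment[v] for u, v in edges)


def min_buses(nodes, edges):
    """Smallest number of buses that can carry all connections (brute force)."""
    for k in range(1, len(nodes) + 1):
        for buses in product(range(k), repeat=len(nodes)):
            if valid(dict(zip(nodes, buses)), edges):
                return k
    return len(nodes)


# Assignment from the slide: bus 1 = A, B, C, E, F, H; bus 2 = D, G, I.
chosen = {**dict.fromkeys("ABCEFH", 1), **dict.fromkeys("DGI", 2)}
print(min_buses(NODES, EDGES), valid(chosen, EDGES))
```

This prints 2 True. Several 2-bus assignments exist. The priority edges are what steer the choice towards the one with the fewest tri-state drivers and input MUXes.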
Bus sharing

(Figure: datapath with inputs In1 and In2, three input MUXes, registers R1: a, t1, x, t7; R2: b, t2, t3, t5, t6; R3: y, t4, functional units FU1 to FU5, and output Out; the register input connections are labelled A to H.)

Name all input connections for the registers.

Bus sharing

• Build the compatibility graph: nodes are connections

(Figure: one node per register input connection, A to H.)

Bus sharing

• In which state is each connection used? From which source and to which destination does it go?

Conn  Connection
A     In1 → R1
B     FU1 → R1
C     FU3 → R1
D     In2 → R2
E     FU3 → R2
F     FU4 → R2
G     FU2 → R3
H     FU5 → R3
(State columns S0 to S7 left to be filled in.)

Bus sharing

Conn  Connection  Used in states
A     In1 → R1    S0
B     FU1 → R1    S1
C     FU3 → R1    S2, S6
D     In2 → R2    S0
E     FU3 → R2    S1, S4, S5
F     FU4 → R2    S3
G     FU2 → R3    S2
H     FU5 → R3    S3

Incompatible pairs: A-D, B-E, C-G, F-H

(State diagram in register-transfer form, as before.)

Incompatible connections are those that are used in the same state and come from a different functional unit (or, for A and D, a different input port).
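
The FU-to-register table and its incompatibilities can also be derived straight from the register transfers. A Python sketch, assuming each register write is encoded as a (destination register, source) pair per state, where the source is the input port or FU producing the value; the encoding and function names are illustrative. S7 is omitted because it writes no register.

```python
from itertools import combinations

# Register writes per state: (destination register, source).
# S7 only drives Out, so it writes no register and is omitted.
WRITES = {
    0: [("R1", "In1"), ("R2", "In2")],
    1: [("R1", "FU1"), ("R2", "FU3")],
    2: [("R1", "FU3"), ("R3", "FU2")],
    3: [("R2", "FU4"), ("R3", "FU5")],
    4: [("R2", "FU3")],
    5: [("R2", "FU3")],
    6: [("R1", "FU3")],
}


def connection_usage(writes):
    """Map each (source, destination) connection to the states that use it."""
    usage = {}
    for state, moves in writes.items():
        for dest, src in moves:
            usage.setdefault((src, dest), set()).add(state)
    return usage


def incompatible_pairs(usage):
    """Connections used in the same state that come from different sources."""
    return {
        (c1, c2)
        for (c1, s1), (c2, s2) in combinations(sorted(usage.items()), 2)
        if c1[0] != c2[0] and s1 & s2
    }


usage = connection_usage(WRITES)
for c1, c2 in sorted(incompatible_pairs(usage)):
    print(f"{c1[0]}->{c1[1]}  x  {c2[0]}->{c2[1]}")
```

The four pairs printed correspond to B-E, C-G, F-H and A-D in the slide's lettering.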

Bus sharing

• Incompatibility edges: A-D, B-E, C-G, F-H

(Figure: graph of connections A to H with these incompatibility edges.)

Bus sharing

• Priority edges:

(Figure: graph of connections A to H, each labelled with its source and destination: A In1 → R1, B FU1 → R1, C FU3 → R1, D In2 → R2, E FU3 → R2, F FU4 → R2, G FU2 → R3, H FU5 → R3.)

Bus sharing

Bus 1: A, B, C, H
Bus 2: D, E, F, G

(Figure: graph of connections A to H grouped into these two buses.)
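
Any proposed bus assignment can be checked against the incompatibility edges. A small Python sketch (the function name `check_buses` is illustrative) confirms that the two buses above are legal and that, for instance, moving F onto Bus 1 next to H would not be:

```python
from collections import Counter

NODES = "ABCDEFGH"
INCOMPATIBLE = [("A", "D"), ("B", "E"), ("C", "G"), ("F", "H")]


def check_buses(buses, incompatible, nodes):
    """True if every connection is on exactly one bus and no bus
    carries both ends of an incompatibility edge."""
    counts = Counter(n for members in buses for n in members)
    each_once = set(counts) == set(nodes) and all(c == 1 for c in counts.values())
    conflict_free = all(
        not {a, b} <= set(members) for members in buses for a, b in incompatible
    )
    return each_once and conflict_free


print(check_buses(["ABCH", "DEFG"], INCOMPATIBLE, NODES))  # True: the slide's buses
print(check_buses(["ABCFH", "DEG"], INCOMPATIBLE, NODES))  # False: F and H together
```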

Bus sharing

(Figure: datapath after bus sharing, with inputs In1 and In2, three MUXes, registers R1: a, t1, x, t7; R2: b, t2, t3, t5, t6; R3: y, t4, functional units FU1 to FU5, and output Out.)

Bus sharing

• Cost calculation
 Register cost
  Before bus sharing: two 3-to-1 MUXes and one 2-to-1 MUX
  After bus sharing: three 2-to-1 MUXes and 4 tri-state drivers
 Functional Unit cost
  Before bus sharing: one 2-to-1 MUX
  After bus sharing: 6 tri-state drivers
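
The "before bus sharing" MUX counts follow from how many distinct sources feed each destination. A Python check, assuming the source sets are read off the Reg-to-FU and FU-to-Reg connection tables of this section (the dictionary names and the `FU3.1` port notation are illustrative):

```python
# Distinct sources feeding each destination before bus sharing,
# read off the two connection tables ("FU3.1" = FU3, input 1).
REGISTER_INPUTS = {
    "R1": {"In1", "FU1", "FU3"},
    "R2": {"In2", "FU3", "FU4"},
    "R3": {"FU2", "FU5"},
}
FU_INPUTS = {
    "FU1": {"R1"},
    "FU2.1": {"R1"},
    "FU2.2": {"R2"},
    "FU3.1": {"R1", "R3"},
    "FU3.2": {"R2"},
    "FU4": {"R1"},
    "FU5": {"R3"},
}


def mux_sizes(inputs):
    """A destination fed by k > 1 distinct sources needs a k-to-1 MUX."""
    return sorted(len(srcs) for srcs in inputs.values() if len(srcs) > 1)


print("register MUXes:", mux_sizes(REGISTER_INPUTS))  # [2, 3, 3]
print("FU MUXes:", mux_sizes(FU_INPUTS))              # [2]
```

That is two 3-to-1 MUXes and one 2-to-1 MUX in front of the registers, and a single 2-to-1 MUX in front of FU3 input 1.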

Bus sharing

• Cost of a tri-state driver
 FPGA
  each CLB has a tri-state driver to a horizontal long line
  its cost is hence included in the CLB
  long lines are scarce, so the highest priority is reducing the number of connections

Bus sharing

• Cost of a tri-state driver
 Gate array & CMOS
  (Figure: CMOS tri-state driver between Vcc and Vss, with input I, enable E and output F.)
  F is driven high when E=1 and I=1
  F is driven low when E=1 and I=0
  4 gates, 12 TOR
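
The slide lists the two driven cases. When E=0 the driver is disabled and F floats (high impedance), which is what lets several drivers share one bus line as long as at most one of them is enabled in any state; the incompatibility edges guarantee exactly that. A behavioural Python sketch, modelling high impedance as `None` (a modelling choice, not from the slides):

```python
from typing import Iterable, Optional, Tuple


def tristate(enable: int, data: int) -> Optional[int]:
    """1-bit tri-state driver: passes `data` when enabled, else high impedance (None)."""
    return data if enable else None


def resolve_bus(drivers: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Value on a bus line driven by several (enable, data) tri-state drivers."""
    driven = [tristate(e, d) for e, d in drivers if e]
    if len(driven) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    return driven[0] if driven else None  # None: nobody drives the line


print(tristate(1, 1), tristate(1, 0), tristate(0, 1))  # 1 0 None
print(resolve_bus([(0, 1), (1, 0), (0, 1)]))            # 0
```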

Bus sharing

• Recalculation of register cost
 Cost of a 1-bit tri-state driver: 0 CLB, 4 gates, 12 TOR
 Cost of a 1-bit 2-to-1 MUX: 1/2 CLB, 3 gates, 14 TOR
 Cost of a 1-bit register: 1/2 CLB, 7 gates, 34 TOR
• Recalculation of functional unit cost
 One 2-to-1 MUX fewer
 6 tri-state drivers more
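
Applying these two changes to the FU cost after functional-unit sharing (the "FU share" row of the summary table: 112 CLB, 832 gates, 4864 TOR) reproduces the "Bus share" FU entries. A Python sketch, assuming 32-bit datapath elements and the per-bit costs above:

```python
WIDTH = 32
# Per-bit costs (CLB, gates, transistors) from the slide.
MUX_BIT = (0.5, 3, 14)       # 1-bit 2-to-1 MUX
TRISTATE_BIT = (0, 4, 12)    # 1-bit tri-state driver


def scaled(cost, bits):
    """Cost of `bits` one-bit cells."""
    return tuple(c * bits for c in cost)


fu_share = (112, 832, 4864)               # FU cost after FU sharing
removed = scaled(MUX_BIT, WIDTH)          # one 32-bit 2-to-1 MUX fewer
added = scaled(TRISTATE_BIT, 6 * WIDTH)   # six 32-bit tri-state drivers more
bus_share = tuple(f - r + a for f, r, a in zip(fu_share, removed, added))
print(bus_share)  # (96.0, 1504, 6720)
```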

Bus sharing

• Register cost computation for the current FSMD implementation:
 3 registers of 32 bits, each with a 2-to-1 MUX; each register costs:
  0.5 CLB/MUXREGbit * 32 bit = 16 CLB
  (3 gates/MUXbit + 7 gates/REGbit) * 32 bit = 320 gates
  (14 TOR/MUXbit + 34 TOR/REGbit) * 32 bit = 1536 TOR
 4 tri-state drivers of 32 bits; each tri-state driver costs:
  0 CLB/TRIStatebit * 32 bit = 0 CLB
  4 gates/TRIStatebit * 32 bit = 128 gates
  12 TOR/TRIStatebit * 32 bit = 384 TOR
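
The same arithmetic in Python, totalling the three MUX-plus-register slices and the four tri-state drivers; it gives the "Bus share" register entries of the summary table (48 CLB, 1472 gates, 6144 TOR). A sketch assuming the per-bit costs above:

```python
WIDTH = 32
# Per-bit costs (CLB, gates, transistors) from the slides.
# A 2-to-1 MUX and a register bit together occupy half a CLB.
MUX_REG_BIT = (0.5, 3 + 7, 14 + 34)
TRISTATE_BIT = (0, 4, 12)


def cost(per_bit, count):
    """Cost of `count` 32-bit instances of a one-bit cell."""
    return tuple(c * WIDTH * count for c in per_bit)


registers = cost(MUX_REG_BIT, 3)   # 3 registers with their input MUX
drivers = cost(TRISTATE_BIT, 4)    # 4 tri-state drivers
total = tuple(r + d for r, d in zip(registers, drivers))
print(registers, drivers, total)
```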

Bus sharing

                CLB                  gates                TOR
                Reg     FU    Tot    Reg     FU    Tot    Reg     FU    Tot   Conn
Original        176    160    336   2464   1408   3872  11968   7616  19584     20
Reg share        96    160    256   1184   1408   2592   6464   7616  14080     12
FU share         80    112    192   1024    832   1856   5504   4864  10368      8
Bus share        48     96    144   1472   1504   2976   6144   6720  12864      4
Port share

Note that bus sharing also changed the cost of the registers as well as of the FUs: all 4 minimization steps influence each other. We could have estimated this influence and used the estimates to guide register and FU sharing.

FSMD design

• FSMDs
• Models
• Synthesis techniques
 Basic principles
 Merging
  Register sharing (variable merging)
  Functional-unit sharing (operator merging)
  Bus sharing (connection merging)
  Register port sharing (register merging)

Register port sharing

• Basic principle:
 Combine several registers into one register file to reduce the number of read ports (fewer input MUXes) and the number of write ports (fewer tri-state drivers)
• Methodology: build the Register Access Table, indicating the reads and writes to each register in each state
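
Because every register transfer names its destination and its operands, the Register Access Table can be built mechanically. A Python sketch, assuming the transfers of the state diagram are encoded as (destination, operands) pairs per state; the encoding and the function name are illustrative:

```python
# Register transfers per state from the state diagram: (destination, operands).
TRANSFERS = {
    "S0": [("R1", ["In1"]), ("R2", ["In2"])],
    "S1": [("R1", ["R1"]), ("R2", ["R2"])],
    "S2": [("R1", ["R1", "R2"]), ("R3", ["R1", "R2"])],
    "S3": [("R2", ["R1"]), ("R3", ["R3"])],
    "S4": [("R2", ["R1", "R2"])],
    "S5": [("R2", ["R3", "R2"])],
    "S6": [("R1", ["R1", "R2"])],
    "S7": [("Out", ["R1"])],
}
REGISTERS = ("R1", "R2", "R3")


def access_table(transfers):
    """register -> state -> set of accesses ('R' = read, 'W' = write)."""
    table = {reg: {state: set() for state in transfers} for reg in REGISTERS}
    for state, moves in transfers.items():
        for dest, operands in moves:
            if dest in table:          # skip Out: it is a port, not a register
                table[dest][state].add("W")
            for op in operands:
                if op in table:        # skip In1/In2: ports, not registers
                    table[op][state].add("R")
    return table


for reg, row in access_table(TRANSFERS).items():
    print(reg, " ".join("".join(sorted(acc)) or "-" for acc in row.values()))
```

Each printed row (RW meaning read and written in the same state, "-" meaning not accessed) should agree with the Register Access Table on the following slides.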

Register port sharing

The Reg→FU table used for connection merging is reused:
Conn  Connection         Used in states
A     R1 → Out           S7
B     R1 → FU1           S1
C     R1 → FU2, input 1  S2
D     R2 → FU2, input 2  S2
E     R1 → FU3, input 1  S2, S4, S6
F     R3 → FU3, input 1  S5
G     R2 → FU3, input 2  S1, S2, S4, S5, S6
H     R1 → FU4           S3
I     R3 → FU5           S3

Reads per register (R = read):
      S0  S1  S2  S3  S4  S5  S6  S7
R1        R   R   R   R       R   R
R2        R   R       R   R   R
R3                R       R

Register port sharing

The FU→Reg table used for connection merging is reused:
Conn  Connection  Used in states
A     In1 → R1    S0
B     FU1 → R1    S1
C     FU3 → R1    S2, S6
D     In2 → R2    S0
E     FU3 → R2    S1, S4, S5
F     FU4 → R2    S3
G     FU2 → R3    S2
H     FU5 → R3    S3

Register Access Table (R = read, W = write):
      S0   S1   S2   S3   S4   S5   S6   S7
R1    W    R W  R W  R    R         R W  R
R2    W    R W  R    W    R W  R W  R
R3              W    R W       R

Register port sharing

(Register Access Table as on the previous slide.)

• When implemented as three registers, we need 3 write ports and 3 read ports
• In the next slides, we do an exhaustive search (i.e. we enumerate all possibilities and compute their cost) for merging 2 or more registers into 1 register file
• For large designs, we would need an optimization technique
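
The exhaustive search can be sketched in Python by enumerating every way of grouping the registers into register files. The sketch assumes that a file needs as many read (write) ports as the largest number of its registers read (written) in one state, the rule used on the following slides, and that the total port count is the cost to minimize; `READS` and `WRITES` are transcribed from the Register Access Table.

```python
# States (by number) in which each register is read or written,
# transcribed from the Register Access Table.
READS = {"R1": {1, 2, 3, 4, 6, 7}, "R2": {1, 2, 4, 5, 6}, "R3": {3, 5}}
WRITES = {"R1": {0, 1, 2, 6}, "R2": {0, 1, 3, 4, 5}, "R3": {2, 3}}


def ports(group, access):
    """Ports a register file holding `group` needs: the largest number of
    its registers accessed in any single state."""
    states = set().union(*(access[r] for r in group))
    return max((sum(s in access[r] for r in group) for s in states), default=0)


def partitions(items):
    """Yield every way to split `items` into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [[first]] + part                      # `first` in a file of its own
        for i in range(len(part)):                  # or added to an existing file
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def total_ports(partition):
    """Read ports plus write ports over all register files."""
    return sum(ports(g, READS) + ports(g, WRITES) for g in partition)


for p in partitions(["R1", "R2", "R3"]):
    print(p, total_ports(p))
```

Only the single register file holding R1, R2 and R3 brings the total from 6 down to 4 ports; every grouping with a two-register file still costs 6.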

Register port sharing

(Register Access Table as on the previous slide.)

• How many ports are needed for a register file sharing 2 registers? (The states listed are those in which both registers are read, or both are written, at the same time.)
 Combine R1 and R2
  2 read ports (S1, S2, S4, S6)
  2 write ports (S0, S1)
 Combine R1 and R3
  2 read ports (S3)
  2 write ports (S2)
 Combine R2 and R3
  2 read ports (S5)
  2 write ports (S3)
 No saving is obtained: every pair still needs 2 read and 2 write ports, exactly what two separate registers provide
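
The state lists in parentheses are simply the states in which both registers are accessed at once. A Python sketch that recomputes them from the Register Access Table (`READS` and `WRITES` transcribed as state numbers; the names are illustrative):

```python
# States (by number) in which each register is read or written.
READS = {"R1": {1, 2, 3, 4, 6, 7}, "R2": {1, 2, 4, 5, 6}, "R3": {3, 5}}
WRITES = {"R1": {0, 1, 2, 6}, "R2": {0, 1, 3, 4, 5}, "R3": {2, 3}}


def shared_states(a, b, access):
    """States in which registers a and b are both accessed; if there is at
    least one, a file holding both needs two ports of that kind."""
    return sorted(access[a] & access[b])


def fmt(states):
    return ", ".join(f"S{s}" for s in states)


for a, b in [("R1", "R2"), ("R1", "R3"), ("R2", "R3")]:
    reads, writes = shared_states(a, b, READS), shared_states(a, b, WRITES)
    print(f"{a}+{b}: {2 if reads else 1} read ports ({fmt(reads)}), "
          f"{2 if writes else 1} write ports ({fmt(writes)})")
```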

Register port sharing

(Register Access Table as on the previous slide.)

• How many ports are needed for a register file sharing 3 registers?
 Combine R1, R2 and R3
  2 read ports (S1, S2, S3, S4, S5, S6)
  2 write ports (S0, S1, S2, S3)
 We save 2 ports: 4 ports instead of the 6 (3 read, 3 write) needed by three separate registers

Register port sharing

(Figure: datapath after register port sharing, with inputs In1 and In2, registers R1: a, t1, x, t7; R2: b, t2, t3, t5, t6; R3: y, t4 grouped in one register file, functional units FU1 to FU5, and output Out.)

Register port sharing

• Recalculation of register cost
 Before register port sharing: three 2-to-1 MUXes and 4 tri-state drivers
 After register port sharing: 4 tri-state drivers
 Saving:
  0 CLB (the small MUXes fitted in the same CLB as the register bits)
  3 gates/MUXbit * 32 bit = 96 gates
  14 TOR/MUXbit * 32 bit = 448 TOR

Register port sharing

                CLB                  gates                TOR
                Reg     FU    Tot    Reg     FU    Tot    Reg     FU    Tot   Conn
Original        176    160    336   2464   1408   3872  11968   7616  19584     20
Reg share        96    160    256   1184   1408   2592   6464   7616  14080     12
FU share         80    112    192   1024    832   1856   5504   4864  10368      8
Bus share        48     96    144   1472   1504   2976   6144   6720  12864      4
Port share       48     96    144   1376   1504   2880   5696   6720  12416      4