
CHAPTER 2

Custom single-purpose
processors
Outline
Introduction

Combinational logic

Sequential logic

Custom single-purpose processor design

RT-level custom single-purpose processor design
Introduction
• Processor
– Digital circuit that performs computation tasks

– Controller and data path.

– General-purpose: variety of computation tasks.

– Single-purpose: one particular computation task.

– Custom single-purpose: non-standard task.

• A custom single-purpose processor may be

– Fast, small, low power

– But, high NRE, longer time-to-market, less flexible.


CMOS transistor on silicon
• Transistor
– The basic electrical component in digital systems
– Acts as an on/off switch
– Voltage at "gate" controls whether current flows from source to drain
– Don't confuse this "gate" with a logic gate.
[Figure: transistor symbol (source, gate, drain; conducts if gate=1) and a cross-section of an IC showing gate, oxide, source, channel, and drain on a silicon substrate]
CMOS transistor implementations
• Complementary Metal Oxide Semiconductor
• We refer to logic levels
– Typically 0 is 0V, 1 is 5V
• Two basic CMOS types
– nMOS conducts if gate=1
– pMOS conducts if gate=0
– Hence "complementary"
• Basic gates
– Inverter, NAND, NOR
[Figure: nMOS and pMOS transistor symbols, and CMOS circuits for the inverter (F = x'), the NAND gate (F = (xy)'), and the NOR gate (F = (x+y)')]
Basic logic gates

One-input gates:
x | Driver (F = x) | Inverter (F = x')
0 |       0        |        1
1 |       1        |        0

Two-input gates:
x y | AND (F = xy) | OR (F = x+y) | XOR (F = x ⊕ y) | NAND (F = (xy)') | NOR (F = (x+y)') | XNOR (F = (x ⊕ y)')
0 0 |      0       |      0       |        0        |        1         |        1         |          1
0 1 |      0       |      1       |        1        |        1         |        0         |          0
1 0 |      0       |      1       |        1        |        1         |        0         |          0
1 1 |      1       |      1       |        0        |        0         |        0         |          1
Combinational logic design

A) Problem description
y is 1 if a is 1, or b and c are 1. z is 1 if b or c is 1, but not both, or if all are 1.

B) Truth table
Inputs | Outputs
a b c  | y z
0 0 0  | 0 0
0 0 1  | 0 1
0 1 0  | 0 1
0 1 1  | 1 0
1 0 0  | 1 0
1 0 1  | 1 1
1 1 0  | 1 1
1 1 1  | 1 1

C) Output equations
y = a'bc + ab'c' + ab'c + abc' + abc
z = a'b'c + a'bc' + ab'c + abc' + abc

D) Minimized output equations
K-map for y:
       bc
a    00 01 11 10
0     0  0  1  0
1     1  1  1  1
y = a + bc

K-map for z:
       bc
a    00 01 11 10
0     0  1  0  1
1     0  1  1  1
z = ab + b'c + bc'

E) Logic gates
[Figure: gate-level circuit with inputs a, b, c and outputs y, z]
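The minimization step can be checked exhaustively, since three inputs give only eight rows. The following is a minimal sketch in C, not part of the original slides; the function names are hypothetical. It compares the sum-of-minterms equations with the minimized ones on every input combination.

```c
#include <assert.h>
#include <stdbool.h>

/* Sum-of-minterms forms read directly from the truth table. */
bool y_canonical(bool a, bool b, bool c) {
    return (!a && b && c) || (a && !b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

bool z_canonical(bool a, bool b, bool c) {
    return (!a && !b && c) || (!a && b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

/* Minimized forms taken from the K-maps. */
bool y_minimized(bool a, bool b, bool c) { return a || (b && c); }
bool z_minimized(bool a, bool b, bool c) { return (a && b) || (!b && c) || (b && !c); }

/* Aborts via assert if any of the 8 rows disagrees. */
void check_minimization(void) {
    for (int row = 0; row < 8; row++) {
        bool a = (row >> 2) & 1, b = (row >> 1) & 1, c = row & 1;
        assert(y_canonical(a, b, c) == y_minimized(a, b, c));
        assert(z_canonical(a, b, c) == z_minimized(a, b, c));
    }
}
```

Calling check_minimization() from a test harness confirms that both pairs of equations describe the same function.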
RT-Level Combinational Components

• A multiplexor, sometimes called a selector, allows only one of its data inputs Im to pass through to the output O.
• A decoder converts its binary input I into a one-hot output O. A common feature on a decoder is an extra input called enable. When enable is 0, all outputs are 0. When enable is 1, the decoder functions as before.
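As a behavioral illustration of these two components, here is a small C sketch. It assumes a 4x1 multiplexor and a 2x4 decoder; the sizes and the names mux4 and decoder2x4 are chosen for the example and do not come from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* 4x1 multiplexor: the 2-bit select chooses which data input reaches the output. */
uint8_t mux4(const uint8_t in[4], unsigned sel) {
    assert(sel < 4);
    return in[sel];
}

/* 2x4 decoder with enable: output bit i is 1 only when the input equals i.
 * When enable is 0, all outputs are 0. */
uint8_t decoder2x4(unsigned in, unsigned enable) {
    assert(in < 4);
    if (!enable) {
        return 0;
    }
    return (uint8_t)(1u << in);
}
```

For example, decoder2x4(3, 1) sets only output bit 3, while decoder2x4(3, 0) returns all zeros.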
RT-Level Combinational Components

• An adder adds two n-bit binary inputs A and B, generating an n-bit output sum along with an output carry.
• A comparator compares two n-bit binary inputs A and B, generating outputs that indicate whether A is less than, equal to, or greater than B.
• An ALU (arithmetic-logic unit) can perform a variety of arithmetic and logic functions on its n-bit inputs A and B.
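The adder and comparator can be modeled behaviorally as follows. This is a sketch under assumptions: the struct and function names are hypothetical, and the adder width n is limited to 31 bits so the intermediate sum fits in a 32-bit integer.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t sum; unsigned carry; } AdderOut;
typedef struct { unsigned lt, eq, gt; } CompareOut;

/* n-bit adder: returns the n-bit sum and the carry out of bit n. */
AdderOut adder(uint32_t a, uint32_t b, unsigned n) {
    assert(n >= 1 && n <= 31);
    uint32_t mask = (1u << n) - 1u;
    uint32_t full = (a & mask) + (b & mask);
    AdderOut out = { full & mask, (full >> n) & 1u };
    return out;
}

/* Comparator: exactly one of lt, eq, gt is 1. */
CompareOut comparator(uint32_t a, uint32_t b) {
    CompareOut out = { a < b, a == b, a > b };
    return out;
}
```

With n = 4, adding 15 and 1 wraps the sum to 0 and raises the carry output.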
Sequential logic design
• A sequential circuit is a digital circuit whose outputs are a function of the current as well as previous input values.
• One of the most basic sequential circuits is the flip-flop.
• The simplest type of flip-flop is the D flip-flop. It has two inputs: D and clock.
– When clock is 1, the value of D is stored in the flip-flop, and that value appears at an output Q.


Sequential logic design
• The SR flip-flop has three inputs: S, R, and clock.
• When clock is 0, the previously stored bit is maintained and appears at output Q.
• When clock is 1, the inputs S and R are examined. If S is 1, a 1 is stored. If R is 1, a 0 is stored.
• If both are 0, there's no change. If both are 1, behavior is undefined. Thus, S stands for set, and R for reset.
Sequential logic design
• The JK flip-flop is the same as an SR flip-flop except that when both J and K are 1, the stored bit toggles from 1 to 0 or 0 to 1.
• To prevent unexpected behavior from signal glitches, flip-flops are typically designed to be edge-triggered.
• They only pay attention to their non-clock inputs when the clock is rising from 0 to 1, or alternatively when the clock is falling from 1 to 0.
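The flip-flop behavior described above can be sketched in C. The model below assumes rising-edge triggering, and the type and function names are hypothetical. The D flip-flop remembers the previous clock level so it can detect a 0-to-1 transition, and jk_next gives the bit a JK flip-flop stores on an active edge.

```c
#include <assert.h>

typedef struct { unsigned q; unsigned prev_clk; } DFlipFlop;

/* Rising-edge-triggered D flip-flop: D is captured only when clock goes 0 -> 1. */
void dff_step(DFlipFlop *ff, unsigned clk, unsigned d) {
    assert(ff != 0);
    if (!ff->prev_clk && clk) {
        ff->q = d ? 1u : 0u;
    }
    ff->prev_clk = clk;
}

/* Next stored bit of a JK flip-flop on an active clock edge. */
unsigned jk_next(unsigned q, unsigned j, unsigned k) {
    if (j && k) return q ? 0u : 1u;  /* both 1: toggle */
    if (j) return 1u;                /* set */
    if (k) return 0u;                /* reset */
    return q ? 1u : 0u;              /* both 0: no change */
}
```

Holding the clock at 1 while D changes has no effect in this model, which is the glitch protection that edge triggering is meant to provide.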
RT-Level Sequential Components
• A register stores n bits from its n-bit data input I, with those stored bits appearing at its output O.
• A register usually has at least two control inputs, clock and load.
• For a rising-edge-triggered register, the inputs I are only stored when load is 1 and clock is rising from 0 to 1.
• The clock input is usually drawn as a small triangle, as shown in the figure. Another common register control input is clear, which resets all bits to 0, regardless of the value of I.


RT-Level Sequential Components
• Because all n bits of the register can be stored in parallel, we often refer to this type of register as a parallel-load register.
• A shift register stores n bits, but these bits cannot be stored in parallel. Instead, they must be shifted into the register serially, meaning one bit per clock edge.
• A shift register has a one-bit data input I, and at least two control inputs, clock and shift.


RT-Level Sequential Components

• When clock is rising and shift is 1, the value of I is stored in the n'th bit, while the n'th bit is stored in the (n-1)'th bit, and likewise, until the second bit is stored in the first bit.
• The first bit is typically shifted out, meaning it appears at an output Q.
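A behavioral sketch of this shift register in C follows. It rests on assumptions: the names are hypothetical, the first bit is modeled as the least significant bit of an integer, and Q is taken to show the current value of that first bit.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t bits; unsigned n; } ShiftReg;

/* Output Q: the current value of the first (lowest) bit. */
unsigned shift_q(const ShiftReg *r) {
    return r->bits & 1u;
}

/* Rising clock edge. If shift is 1, every bit moves one position toward
 * the first bit and the data input i enters at the n'th bit. */
void shift_clock_edge(ShiftReg *r, unsigned shift, unsigned i) {
    assert(r->n >= 1 && r->n <= 32);
    if (!shift) {
        return;  /* contents are held */
    }
    r->bits = (r->bits >> 1) | ((uint32_t)(i & 1u) << (r->n - 1));
}
```

Shifting the serial sequence 1, 0, 1, 1 into a 4-bit register takes four clock edges, after which the first value shifted in appears at Q.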


RT-Level Sequential Components
• A counter is a register that can also increment (add binary 1 to) its stored binary value.
• A counter has a clear input, which resets all stored bits to 0, and a count input, which enables incrementing on the clock edge.
• A counter often also has a parallel-load data input and associated control signal.


RT-Level Sequential Components

• A common counter feature is both up and down counting (incrementing and decrementing), requiring an additional control input to indicate the count direction.
• The control inputs discussed above can be either synchronous or asynchronous. A synchronous input's value only has an effect during a clock edge.
• An asynchronous input's value affects the circuit independent of the clock. Typically, clear control lines are asynchronous.
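The difference between synchronous and asynchronous control inputs can be sketched in C by giving them separate entry points. This is an illustrative model with hypothetical names. Two choices here are assumptions, not statements from the slides: load takes priority over count, and the value wraps around at n bits.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t value; unsigned n; } Counter;

/* Asynchronous clear: takes effect immediately, independent of the clock. */
void counter_clear(Counter *c) {
    c->value = 0;
}

/* Rising clock edge: the synchronous inputs load, count, and up are sampled here only. */
void counter_clock_edge(Counter *c, unsigned load, uint32_t data,
                        unsigned count, unsigned up) {
    assert(c->n >= 1 && c->n <= 31);
    uint32_t mask = (1u << c->n) - 1u;
    if (load) {
        c->value = data & mask;                        /* parallel load */
    } else if (count) {
        uint32_t next = up ? c->value + 1u : c->value - 1u;
        c->value = next & mask;                        /* wrap to n bits */
    }
}
```

A 4-bit counter at 0 that counts down wraps to 15, and counter_clear returns it to 0 without waiting for a clock edge.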


RT-Level Sequential Components
Sequential logic design
A) Problem description
You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles.

B) State diagram
States 0, 1, 2, 3. x=0 in states 0, 1, and 2; x=1 in state 3. In each state, a=0 stays in the same state and a=1 advances to the next state (0 → 1 → 2 → 3 → 0).

C) Implementation model
[Figure: combinational logic with inputs a, Q1, Q0 and outputs x, I1, I0; a state register holds Q1 Q0 and is loaded from I1 I0]

D) State table (Moore-type)
Inputs   | Outputs
Q1 Q0 a  | I1 I0 x
0  0  0  | 0  0  0
0  0  1  | 0  1  0
0  1  0  | 0  1  0
0  1  1  | 1  0  0
1  0  0  | 1  0  0
1  0  1  | 1  1  0
1  1  0  | 1  1  1
1  1  1  | 0  0  1

• Given this implementation model
– Sequential logic design quickly reduces to combinational logic design

Sequential logic design
E) Minimized output equations
K-map for I1:
       Q1Q0
a    00 01 11 10
0     0  0  1  1
1     0  1  0  1
I1 = Q1'Q0a + Q1a' + Q1Q0'

K-map for I0:
       Q1Q0
a    00 01 11 10
0     0  1  1  0
1     1  0  0  1
I0 = Q0a' + Q0'a

K-map for x:
       Q1Q0
a    00 01 11 10
0     0  0  1  0
1     0  0  1  0
x = Q1Q0

F) Combinational logic
[Figure: gate-level circuit with inputs a, Q1, Q0 producing x, I1, I0]
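To see the implementation model at work, the minimized equations can be simulated one clock edge at a time. The C sketch below uses hypothetical names. It evaluates I1 and I0 as the combinational logic and then loads them into the state register, with x as the Moore output of the current state.

```c
#include <assert.h>

typedef struct { unsigned q1, q0; } DividerState;

/* Moore output: x = Q1Q0, so x is 1 only in state 3. */
unsigned divider_output(const DividerState *s) {
    return s->q1 & s->q0;
}

/* One rising clock edge: compute I1 and I0 from the minimized equations,
 * then load them into the state register. */
void divider_clock_edge(DividerState *s, unsigned a) {
    unsigned q1 = s->q1, q0 = s->q0;
    assert(q1 <= 1 && q0 <= 1 && a <= 1);
    unsigned nq1 = q1 ^ 1u, nq0 = q0 ^ 1u, na = a ^ 1u;  /* complements */
    unsigned i1 = (nq1 & q0 & a) | (q1 & na) | (q1 & nq0);
    unsigned i0 = (q0 & na) | (nq0 & a);
    s->q1 = i1;
    s->q0 = i0;
}
```

With a held at 1, the state cycles 0, 1, 2, 3 and x is 1 in one clock cycle out of every four. With a at 0, the state is held.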
Custom single-purpose processor
design
• We can apply the above combinational and sequential logic design techniques to build data path components and controllers.
• To build a custom single-purpose processor for a given program, we need to build both, since a processor consists of a controller and a data path.
• The data path stores and manipulates a system's data.
– It contains register units, functional units, and connection units like wires and multiplexors.
• A controller sets the data path control inputs, like the register load and multiplexor select signals of the register units, functional units, and connection units, to obtain the desired configuration at a particular time.
– It monitors external control inputs as well as data path control outputs, known as status signals, coming from functional units, and it sets external control outputs.
[Figure, left: controller and datapath. The controller receives external control inputs and datapath control outputs, and drives external control outputs and datapath control inputs. The datapath receives external data inputs and drives external data outputs.]
[Figure, right: a view inside the controller and datapath. The controller contains next-state and control logic plus a state register; the datapath contains registers and functional units.]


Example: greatest common divisor
• First create algorithm
• Convert algorithm to "complex" state machine
– Known as FSMD: finite-state machine with data path.
– Can use templates to perform such conversion.

(a) black-box view
[Figure: GCD block with inputs go_i, x_i, y_i and output d_o]

(b) desired functionality
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }

(c) state diagram
1:    on 1 go to 2; on !1 exit
2:    on !go_i go to 2-J; on !(!go_i) go to 3
2-J:  return to 2
3:    x = x_i; go to 4
4:    y = y_i; go to 5
5:    on x!=y go to 6; on !(x!=y) go to 9
6:    on x<y go to 7; on !(x<y) go to 8
7:    y = y - x; go to 6-J
8:    x = x - y; go to 6-J
6-J:  go to 5-J
5-J:  return to 5
9:    d_o = x; go to 1-J
1-J:  return to 1
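State diagram templates

Assignment statement
  a = b
  next statement
Template: a state with action a = b, followed by the state for the next statement.

Loop statement
  while (cond) {
    loop-body-statements
  }
  next statement
Template: a condition state C. On cond, go to the loop-body-statements, then to a join state J, which returns to C. On !cond, go to the next statement.

Branch statement
  if (c1)
    c1 stmts
  else if c2
    c2 stmts
  else
    other stmts
  next statement
Template: a condition state C with three transitions: c1 to c1 stmts, !c1*c2 to c2 stmts, and !c1*!c2 to others. All branches meet at a join state J, which leads to the next statement.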
Creating the data path
• Create a register for any declared variable.
• Create a functional unit for each arithmetic operation.
• Connect the ports, registers and functional units.
– Based on reads and writes
– Use multiplexors for multiple sources
• Create a unique identifier
– for each data path component control input and output
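[Figure: the GCD FSMD (states 1 through 1-J) beside its datapath. Datapath: inputs x_i and y_i feed two n-bit 2x1 multiplexors (select inputs x_sel and y_sel), which drive registers x and y (load inputs x_ld and y_ld). A != comparator (5: x!=y) outputs x_neq_y, a < comparator (6: x<y) outputs x_lt_y, and two subtractors compute x-y (8:) and y-x (7:), feeding back to the multiplexors. Register d (9:, load input d_ld) drives output d_o.]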
Creating the controller's FSM
• Same structure as FSMD
• Replace complex actions/conditions with data path configurations.

Controller FSM (input go_i; status inputs x_neq_y, x_lt_y):
Encoding  State  Control outputs        Next state
0000      1:     (none)                 2
0001      2:     (none)                 2-J if !go_i, 3 if !(!go_i)
0010      2-J:   (none)                 2
0011      3:     x_sel = 0, x_ld = 1    4
0100      4:     y_sel = 0, y_ld = 1    5
0101      5:     (none)                 6 if x_neq_y, 9 if !x_neq_y
0110      6:     (none)                 7 if x_lt_y, 8 if !x_lt_y
0111      7:     y_sel = 1, y_ld = 1    6-J
1000      8:     x_sel = 1, x_ld = 1    6-J
1001      6-J:   (none)                 5-J
1010      5-J:   (none)                 5
1011      9:     d_ld = 1               1-J
1100      1-J:   (none)                 1

[Figure: the FSMD, the controller FSM, and the datapath side by side]
Splitting into a controller and data path
[Figure (a), controller implementation model: combinational logic with inputs go_i, x_neq_y, x_lt_y and current state Q3 Q2 Q1 Q0; outputs x_sel, y_sel, x_ld, y_ld, d_ld and next state I3 I2 I1 I0, which loads the state register. The controller FSM runs from state 0000 (1:) to state 1100 (1-J:), with transitions labelled x_neq_y=0, x_neq_y=1, x_lt_y=1, and x_lt_y=0.]
[Figure (b), datapath: registers x, y, and d with their multiplexors, the != and < comparators, and the two subtractors; control inputs x_sel, y_sel, x_ld, y_ld, d_ld; status outputs x_neq_y, x_lt_y; data inputs x_i, y_i; data output d_o.]
Completing the GCD custom single-purpose processor design
• We finished the data path
• We have a state table for the next state and control logic
– All that's left is combinational logic design
• This is not an optimized design, but we see the basic steps
[Figure: a view inside the controller and datapath. Controller: next-state and control logic, state register. Datapath: registers, functional units.]
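The controller and datapath split can be exercised in software before any gate-level design. The C sketch below is a behavioral model under assumptions, not the slides' implementation: all names are hypothetical, go_i is held at 1, and one loop iteration stands for one clock cycle. The controller sees only go_i and the status signals x_neq_y and x_lt_y, and the datapath sees only the control signals.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t x, y, d; } Datapath;
typedef struct { unsigned x_sel, y_sel, x_ld, y_ld, d_ld; } Control;

/* State encodings follow the controller FSM: 0000 (1:) through 1100 (1-J:). */
enum State { S1 = 0x0, S2 = 0x1, S2J = 0x2, S3 = 0x3, S4 = 0x4, S5 = 0x5, S6 = 0x6,
             S7 = 0x7, S8 = 0x8, S6J = 0x9, S5J = 0xA, S9 = 0xB, S1J = 0xC };

/* Controller output logic: control signals depend on the current state only. */
static Control control_outputs(enum State s) {
    Control c = {0, 0, 0, 0, 0};
    switch (s) {
    case S3: c.x_sel = 0; c.x_ld = 1; break;
    case S4: c.y_sel = 0; c.y_ld = 1; break;
    case S7: c.y_sel = 1; c.y_ld = 1; break;
    case S8: c.x_sel = 1; c.x_ld = 1; break;
    case S9: c.d_ld = 1; break;
    default: break;
    }
    return c;
}

/* Controller next-state logic. */
static enum State next_state(enum State s, unsigned go_i,
                             unsigned x_neq_y, unsigned x_lt_y) {
    switch (s) {
    case S1:  return S2;
    case S2:  return go_i ? S3 : S2J;
    case S2J: return S2;
    case S3:  return S4;
    case S4:  return S5;
    case S5:  return x_neq_y ? S6 : S9;
    case S6:  return x_lt_y ? S7 : S8;
    case S7:
    case S8:  return S6J;
    case S6J: return S5J;
    case S5J: return S5;
    case S9:  return S1J;
    case S1J: return S1;
    }
    return S1;
}

/* Datapath clock edge: multiplexors choose each register's source, load signals gate the write. */
static void datapath_clock_edge(Datapath *dp, Control c, uint32_t x_i, uint32_t y_i) {
    uint32_t x_next = c.x_sel ? dp->x - dp->y : x_i;
    uint32_t y_next = c.y_sel ? dp->y - dp->x : y_i;
    if (c.d_ld) dp->d = dp->x;
    if (c.x_ld) dp->x = x_next;
    if (c.y_ld) dp->y = y_next;
}

/* Runs one GCD computation and reports the number of clock cycles through state 9. */
uint32_t gcd_processor(uint32_t x_i, uint32_t y_i, unsigned *cycles) {
    assert(x_i > 0 && y_i > 0);
    Datapath dp = {0, 0, 0};
    enum State s = S1;
    unsigned n = 0;
    for (;;) {
        Control c = control_outputs(s);
        unsigned x_neq_y = dp.x != dp.y;  /* status signals before the edge */
        unsigned x_lt_y = dp.x < dp.y;
        enum State next = next_state(s, 1u, x_neq_y, x_lt_y);
        datapath_clock_edge(&dp, c, x_i, y_i);
        n++;
        if (s == S9) break;
        s = next;
    }
    if (cycles) *cycles = n;
    return dp.d;
}
```

The cycle count returned through cycles makes it easy to compare this unoptimized state sequence against alternative designs.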
Controller state table for the GCD Example
RT-level custom single-purpose processor design
• We often start with a state machine
– Rather than algorithm
– Cycle timing often too central to functionality
• Example
– Bus bridge that converts 4-bit bus to 8-bit bus
– Start with FSMD
– Known as register-transfer (RT) level
– Exercise: complete the design

Problem Specification
[Figure: Sender, Bridge, and Receiver blocks; the Bridge has inputs rdy_in, data_in(4), and clock, and outputs rdy_out and data_out(8)]
Bridge: A single-purpose processor that converts two 4-bit inputs, arriving one at a time over data_in along with a rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse.

FSMD
Inputs: rdy_in: bit; data_in: bit[4];
Outputs: rdy_out: bit; data_out: bit[8]
Variables: data_lo, data_hi: bit[4];

State            Action                                   Transitions
WaitFirst4       (none)                                   rdy_in=0: stay; rdy_in=1: RecFirst4Start
RecFirst4Start   data_lo=data_in                          RecFirst4End
RecFirst4End     (none)                                   rdy_in=1: stay; rdy_in=0: WaitSecond4
WaitSecond4      (none)                                   rdy_in=0: stay; rdy_in=1: RecSecond4Start
RecSecond4Start  data_hi=data_in                          RecSecond4End
RecSecond4End    (none)                                   rdy_in=1: stay; rdy_in=0: Send8Start
Send8Start       data_out=data_hi & data_lo, rdy_out=1    Send8End
Send8End         rdy_out=0                                WaitFirst4
RT-level custom single-purpose processor design (cont')

(a) Controller
Same states and transitions as the Bridge FSMD, with the actions replaced by data path control signals:
RecFirst4Start:   data_lo_ld=1
RecSecond4Start:  data_hi_ld=1
Send8Start:       data_out_ld=1, rdy_out=1
Send8End:         rdy_out=0
Controller input: rdy_in. Controller output: rdy_out. clk goes to all registers.

(b) Datapath
[Figure: data_in(4) feeds registers data_hi and data_lo (load inputs data_hi_ld and data_lo_ld); their outputs together feed register data_out (load input data_out_ld), which drives data_out]
Optimizing single-purpose
processors

• Optimization is the task of making design metric values the best possible
• Optimization opportunities
– original program
– FSMD
– Data path
– FSM

Optimizing the original program

• Analyze program attributes and look for areas of possible improvement
– number of computations
– size of variable
– time and space complexity
– operations used: multiplication and division are very expensive

Optimizing the original program

original program
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }

optimized program
int x, y, r;
while (1) {
  while (!go_i);
  // x must be the larger number
  if (x_i >= y_i) {
    x = x_i;
    y = y_i;
  }
  else {
    x = y_i;
    y = x_i;
  }
  while (y != 0) {
    r = x % y;
    x = y;
    y = r;
  }
  d_o = x;
}

Replace the subtraction operation(s) with the modulo operation in order to speed up the program.

Original program, GCD(42, 8): 9 iterations to complete the loop. x and y values evaluated as follows: (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2).
Optimized program, GCD(42, 8): 3 iterations to complete the loop. x and y values evaluated as follows: (42, 8), (8, 2), (2, 0).
Optimizing the FSMD
• Areas of possible improvements
– merge states
  – states with constants on transitions can be eliminated, since the transition taken is already known
  – states with independent operations can be merged
– separate states
  – states which require complex operations (a*b*c*d) can be broken into smaller states to reduce hardware size
– scheduling
Optimizing the FSMD (cont.)

original FSMD (int x, y;)
States 1, 2, 2-J, 3 (x = x_i), 4 (y = y_i), 5, 6, 7 (y = y - x), 8 (x = x - y), 6-J, 5-J, 9 (d_o = x), 1-J.

Optimizations
• eliminate state 1 – transitions have constant values
• merge state 2 and state 2J – no loop operation in between them
• merge state 3 and state 4 – assignment operations are independent of one another
• merge state 5 and state 6 – transitions from state 6 can be done in state 5
• eliminate state 5J and 6J – transitions from each state can be done from state 7 and state 8, respectively
• eliminate state 1-J – transition from state 1-J can be done directly from state 9

optimized FSMD (int x, y;)
2:  on !go_i stay in 2; on go_i go to 3
3:  x = x_i, y = y_i; go to 5
5:  on x<y go to 7; on x>y go to 8; otherwise (x equals y) go to 9
7:  y = y - x; return to 5
8:  x = x - y; return to 5
9:  d_o = x; return to 2
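The effect of these state optimizations can be estimated by stepping through the six-state FSMD and counting states visited. The C sketch below is illustrative only. The names are hypothetical, go_i is assumed to be 1 on the first visit to state 2, and one loop iteration stands for one clock cycle.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ST2, ST3, ST5, ST7, ST8, ST9 } OptState;

/* Steps through the optimized FSMD; reports states visited through state 9. */
uint32_t gcd_optimized_fsmd(uint32_t x_i, uint32_t y_i, unsigned *cycles) {
    assert(x_i > 0 && y_i > 0);
    uint32_t x = 0, y = 0;
    unsigned n = 0;
    OptState s = ST2;
    for (;;) {
        n++;
        switch (s) {
        case ST2: s = ST3; break;                      /* go_i assumed 1 */
        case ST3: x = x_i; y = y_i; s = ST5; break;    /* merged states 3 and 4 */
        case ST5: s = (x < y) ? ST7 : (x > y) ? ST8 : ST9; break;
        case ST7: y = y - x; s = ST5; break;
        case ST8: x = x - y; s = ST5; break;
        case ST9:
            if (cycles) *cycles = n;
            return x;                                  /* d_o = x */
        }
    }
}
```

Each pass through the inner loop now takes two states (5, then 7 or 8) instead of the five states (5, 6, 7 or 8, 6-J, 5-J) of the original FSMD.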

Optimizing the data path

• Sharing of functional units
– one-to-one mapping, as done previously, is not necessary
– if the same operation occurs in different states, they can share a single functional unit
• Multi-functional units
– ALUs support a variety of operations, so one ALU can be shared among operations occurring in different states

Summary

Custom single-purpose processors


Straightforward design techniques

Can be built to execute algorithms

Typically start with FSMD

CAD tools can be of great assistance
