Lecture 04 Control Units

Control Unit: Hardwired vs. Microprogrammed Approach
Dr Shankar Balachandran
Indian Institute of Technology Madras
shankar@cse.iitm.ernet.in
14 October 2006
Two Major Blocks in a CPU
Datapath
Adders, multipliers, dividers
Shifters, registers
Anything that changes or stores data
Control Unit
Controls the data
How is data stored?
Where is it stored?
When should data be available?
Control Unit
Correct sequencing of control signals
Much like the human brain controlling various parts of the body
Sequence and timing are the key
Any aberration will result in wrong operation
A Simplified Control Unit

[Figure: the control unit sequences four stages, each handled by its own block: Fetch (Fetch Unit), Decode (Decode Unit), Execute (Execution Unit) and Write Back (Write Back Unit)]
A Possible Implementation
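[Figure: a counter labelled "Mod-3" (counting 0 to 3), clocked by CLK, drives a 2-to-4 decoder whose four outputs are the Fetch, Decode, Execute and Write Back signals]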
Timing Diagram
[Figure: CLK together with the Fetch, Decode, Execute and Write Back signals, each high for one clock cycle in turn]
Let's Sample the Signals
Cycle  Fetch  Decode  Execute  Write Back
1      1      0       0        0
2      0      1       0        0
3      0      0       1        0
4      0      0       0        1
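The counter-and-decoder circuit can be modelled in a few lines. This is a minimal behavioural sketch, not a gate-level design: it assumes the counter cycles through four states (0 to 3) and that the decoder outputs are ordered Fetch, Decode, Execute, Write Back; `decoder_2to4` and `sample` are hypothetical names.

```python
STAGES = ["Fetch", "Decode", "Execute", "Write Back"]

def decoder_2to4(count: int) -> list[int]:
    """One-hot output of a 2-to-4 decoder: exactly one line is high."""
    return [1 if line == count else 0 for line in range(4)]

def sample(cycles: int) -> list[list[int]]:
    """Sample the decoder outputs once per clock cycle."""
    count, rows = 0, []
    for _ in range(cycles):
        rows.append(decoder_2to4(count))
        count = (count + 1) % 4  # the counter wraps back to Fetch after Write Back
    return rows

if __name__ == "__main__":
    print(" | ".join(STAGES))
    for row in sample(4):
        print(" ".join(str(bit) for bit in row))
```

Each sampled row has exactly one bit set, which is the pattern in the table above.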
Another Way to Generate Signals
[Figure: a memory holding the words 1000, 0100, 0010 and 0001, one word per step]
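Reading the same patterns out of a memory, instead of decoding a counter, is the seed of the microprogrammed approach. A minimal sketch, assuming the memory address is simply the cycle count modulo the number of stored words; `CONTROL_MEMORY` and `control_word` are illustrative names.

```python
CONTROL_MEMORY = [0b1000, 0b0100, 0b0010, 0b0001]  # Fetch, Decode, Execute, Write Back

def control_word(cycle: int) -> str:
    """Return the 4-bit word read from memory on the given clock cycle."""
    address = cycle % len(CONTROL_MEMORY)  # a counter supplies the address
    return format(CONTROL_MEMORY[address], "04b")

if __name__ == "__main__":
    print([control_word(c) for c in range(5)])
```

Changing the sequence now means rewriting memory contents rather than rewiring gates.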
Hardwired vs Microprogrammed
Hardwired
Use gates to generate the signals
Squeeze out the juice for performance
Different logic styles possible
Microprogrammed
Store the control signals in sequence
Just read from the memory every clock cycle
A Model Computer
(Richard Eckert, SIGCSE Bulletin, Vol. 20, No. 3, September 1988)
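[Figure: datapath of the model computer. PC, MAR, RAM, MDR, IR, Accumulator, Register B and ALU share a common bus. Most paths are 12 bits wide; the PC and MAR paths are 8 bits, and the IR sends a 4-bit opcode to the Control block. Control drives the lines IP, LP, EP, LM, R, W, LD, ED, LI, EI, LA, EA, A, S, EU and LB.]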
More Details
L = Load
E = Copy to bus
A,S = Add and Subtract
Sign bit to control unit
IP = Increment PC
Mnemonic  Action                        Register transfers        Active controls
LDA       Load accumulator: A ← (Mem)   1. MAR ← IR               EI, LM
                                        2. MDR ← M(MAR)           R
                                        3. A ← MDR                ED, LA
STA       Store accumulator: (Mem) ← A  1. MAR ← IR               EI, LM
                                        2. MDR ← A                EA, LD
                                        3. M(MAR) ← MDR           W
ADD       A ← A + B                     1. A ← ALU(Add)           A, EU, LA
SUB       A ← A - B                     1. A ← ALU(Sub)           S, EU, LA
MBA       B ← A                         1. B ← A                  EA, LB
JMP       PC ← Mem                      1. PC ← IR                EI, LP
JN        PC ← Mem if NF is set         1. PC ← IR if NF is set   NF: EI, LP
HLT       Stop clock (opcodes 8-15)
Fetch     IR ← next instruction         1. MAR ← PC               EP, LM
                                        2. MDR ← M(MAR)           R
                                        3. IR ← MDR               ED, LI, IP
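To make the register transfers concrete, here is a behavioural sketch of the model computer. Several simplifications are assumptions, not part of the original design: the program is a separate list of already-decoded `(mnemonic, address)` pairs (so no opcode numbering is needed), data memory is a dictionary, data words are 12 bits, and NF is taken to be the sign bit (bit 11) of the accumulator. `run` is a hypothetical name.

```python
MASK = 0xFFF  # assumption: 12-bit data words

def run(program, memory, max_steps=100):
    """Execute (mnemonic, address) pairs and return (registers, memory)."""
    r = {"PC": 0, "A": 0, "B": 0, "MAR": 0, "MDR": 0}
    mem = dict(memory)
    for _ in range(max_steps):
        # Fetch (simplified): IR <- next instruction, then IP increments the PC
        op, addr = program[r["PC"]]
        r["PC"] += 1
        nf = (r["A"] >> 11) & 1  # assumption: NF is the accumulator's sign bit
        if op == "LDA":
            r["MAR"] = addr                    # 1. MAR <- IR
            r["MDR"] = mem.get(r["MAR"], 0)    # 2. MDR <- M(MAR)
            r["A"] = r["MDR"]                  # 3. A <- MDR
        elif op == "STA":
            r["MAR"] = addr                    # 1. MAR <- IR
            r["MDR"] = r["A"]                  # 2. MDR <- A
            mem[r["MAR"]] = r["MDR"]           # 3. M(MAR) <- MDR
        elif op == "ADD":
            r["A"] = (r["A"] + r["B"]) & MASK  # A <- ALU(Add)
        elif op == "SUB":
            r["A"] = (r["A"] - r["B"]) & MASK  # A <- ALU(Sub), two's complement wrap
        elif op == "MBA":
            r["B"] = r["A"]                    # B <- A
        elif op == "JMP":
            r["PC"] = addr                     # PC <- IR
        elif op == "JN":
            if nf:
                r["PC"] = addr                 # PC <- IR only if NF is set
        elif op == "HLT":
            break                              # stop the clock
        else:
            raise ValueError(f"unknown mnemonic: {op}")
    return r, mem

if __name__ == "__main__":
    # Compute 5 - 7, store it at address 12, then branch on the negative result.
    prog = [("LDA", 10), ("MBA", 0), ("LDA", 11), ("SUB", 0),
            ("STA", 12), ("JN", 8), ("LDA", 10), ("HLT", 0), ("HLT", 0)]
    regs, mem = run(prog, {10: 7, 11: 5})
    print(regs, mem[12])
```

Keeping program and data apart is only a convenience here; in the model computer both live in the same RAM.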
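Hardwired Unit
[Figure: CLK drives a ring counter producing timing states T0 to T5; the opcode from the IR feeds a decoder with outputs LDA, STA, ADD, SUB, MBA, JMP, JN and Halt; the ring-counter states, the decoder outputs and NF enter the Control Matrix, which produces the control signals]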
Table with Sequencing (active control signals in each ring-counter state)
Fetch  T0: EP, LM     T1: R        T2: ED, LI, IP
LDA    T3: EI, LM     T4: R        T5: ED, LA
STA    T3: EI, LM     T4: EA, LD   T5: W
MBA    T3: EA, LB
ADD    T3: A, EU, LA
SUB    T3: S, EU, LA
JMP    T3: EI, LP
JN     T3: EI, LP (only if NF is set)

IP = T2;
LP = T3*JMP + T3*JN*NF;
EP = T0;
LM = T0 + T3*LDA + T3*STA;
R = T1 + T4*LDA;
W = T5*STA;
LD = T4*STA;
ED = T2 + T5*LDA;
LI = T2;
A = T3*ADD;
S = T3*SUB;
...
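The control matrix is a set of sum-of-products equations over the ring-counter states, the decoded opcode and NF. The sketch below evaluates them directly. The first eleven entries are the equations listed above; the entries for EI, LA, EA, EU and LB are not on the slide and are derived here from the register-transfer table, so treat them as an assumption. `control_signals` is a hypothetical name.

```python
OPCODES = ("LDA", "STA", "ADD", "SUB", "MBA", "JMP", "JN")

def control_signals(t: int, op: str, nf: int) -> set[str]:
    """Evaluate the control-matrix equations for ring-counter state Tt."""
    T0, T1, T2, T3, T4, T5 = (t == i for i in range(6))            # ring counter outputs
    LDA, STA, ADD, SUB, MBA, JMP, JN = (op == m for m in OPCODES)  # decoder outputs
    NF = bool(nf)
    eq = {
        # Equations as listed on the slide ('*' -> and, '+' -> or)
        "IP": T2,
        "LP": (T3 and JMP) or (T3 and JN and NF),
        "EP": T0,
        "LM": T0 or (T3 and LDA) or (T3 and STA),
        "R": T1 or (T4 and LDA),
        "W": T5 and STA,
        "LD": T4 and STA,
        "ED": T2 or (T5 and LDA),
        "LI": T2,
        "A": T3 and ADD,
        "S": T3 and SUB,
        # Assumption: derived from the register-transfer table, not shown on the slide
        "EI": T3 and (LDA or STA or JMP or (JN and NF)),
        "LA": (T5 and LDA) or (T3 and (ADD or SUB)),
        "EA": (T4 and STA) or (T3 and MBA),
        "EU": T3 and (ADD or SUB),
        "LB": T3 and MBA,
    }
    return {name for name, active in eq.items() if active}

if __name__ == "__main__":
    for t in range(6):
        print(f"T{t}", sorted(control_signals(t, "LDA", nf=0)))
```

In hardware each dictionary entry becomes one OR of AND terms, which is exactly what a PLA implements.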
Control Matrix
Implement using discrete gates
Usually done using PLAs
Large control matrices are implemented hierarchically, for speed
A well-known process; design flows are widespread
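An Alternate Implementation
[Figure: the 4-bit opcode from the IR feeds the MAP (starting address generator). The CD field, together with NF, selects what is loaded into the uPC on each CLK: the mapped address, the jump address from the current microinstruction, or uPC + 1. The uPC addresses a 32 x 24 control ROM (the control store), whose output goes to the microinstruction register; that register supplies the control signals, HLT and the jump address.]
CD   Meaning
00   Map: next address from the IR
01   Unconditional branch within the microprogram
1*   NF = 0 => increment; NF = 1 => conditional branch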
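The CD table amounts to a small next-address multiplexer. A minimal sketch, assuming 5-bit micro-addresses (a 32-word control store) and treating the map output and the jump address as plain integers; `next_address` and `ADDRESS_MASK` are hypothetical names.

```python
ADDRESS_MASK = 0x1F  # assumption: 32-word control store -> 5-bit micro-addresses

def next_address(cd: int, nf: int, upc: int, jump: int, mapped: int) -> int:
    """Choose the next uPC value from the CD field and the negative flag."""
    if cd == 0b00:
        return mapped                          # 00: map from the IR opcode
    if cd == 0b01:
        return jump                            # 01: unconditional branch
    if cd in (0b10, 0b11):                     # 1*: the low bit is a don't-care
        return jump if nf else (upc + 1) & ADDRESS_MASK
    raise ValueError(f"CD must be a 2-bit value, got {cd}")
```

For example, at the JN microinstruction (uPC = 0x0D, jump = 0x0F) the result is 0x0F when NF is set and 0x0E when it is clear.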
Control Store
Instruction  uInstr. address  Control signals    Addr. of next
Fetch        00               0011000000000000   01
             01               0000100000000000   02
             02               1000000110000000   XX
LDA          03               0001000001000000   04
             04               0000100000000000   05
             05               0000000100100000   00
STA          06               0001000001000000   07
             07               0000001000010000   08
             08               0000010000000000   00
ADD          09               0000000000101010   00
SUB          0A               0000000000100110   00
MBA          0B               0000000000010001   00
JMP          0C               0100000001000000   00
JN           0D               0000000000000000   0F
             0E               0000000000000000   00
             0F               0100000001000000   00
Expansion: 8-E, 10-1E
Control word bit order (left to right): IP LP EP LM R W LD ED LI EI LA EA A S EU LB
Example 1: MBA followed by ADD
[Figure: the control store table, highlighting the mapped branches out of 02: first to 0B for MBA, then to 09 for ADD]
Sequence for MBA, ADD
Bits (left to right): IP LP EP LM R W LD ED LI EI LA EA A S EU LB

MBA (MOV B,A)
Fetch  1. MAR ← PC       0011000000000000
       2. MDR ← M(MAR)   0000100000000000
       3. IR ← MDR       1000000110000000
       B ← A             0000000000010001
ADD
Fetch  1. MAR ← PC       0011000000000000
       2. MDR ← M(MAR)   0000100000000000
       3. IR ← MDR       1000000110000000
       A ← ALU(Add)      0000000000101010
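A quick way to check these control words against the register transfers is to decode them. The sketch assumes the left-to-right bit order IP, LP, EP, LM, R, W, LD, ED, LI, EI, LA, EA, A, S, EU, LB, which is consistent with every word in the control store; `decode` and `MBA_ADD` are illustrative names.

```python
SIGNALS = ["IP", "LP", "EP", "LM", "R", "W", "LD", "ED",
           "LI", "EI", "LA", "EA", "A", "S", "EU", "LB"]  # left-to-right bit order

def decode(word: str) -> list[str]:
    """Return the names of the control signals that are 1 in a 16-bit word."""
    if len(word) != len(SIGNALS) or set(word) - {"0", "1"}:
        raise ValueError(f"expected 16 binary digits, got {word!r}")
    return [name for name, bit in zip(SIGNALS, word) if bit == "1"]

MBA_ADD = ["0011000000000000", "0000100000000000", "1000000110000000",  # fetch MBA
           "0000000000010001",                                          # B <- A
           "0011000000000000", "0000100000000000", "1000000110000000",  # fetch ADD
           "0000000000101010"]                                          # A <- ALU(Add)

if __name__ == "__main__":
    for word in MBA_ADD:
        print(word, ", ".join(decode(word)))
```

Each decoded set matches the "Active controls" column of the instruction table: EP, LM; R; ED, LI, IP for fetch, then EA, LB for MBA and A, EU, LA for ADD.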
Example 2: JN with Flag Set
[Figure: the control store table, highlighting the path taken from the JN microinstruction at 0D]
If the negative flag (NF) is set, jump to a new location by skipping to the uInstruction at 0F, which loads the PC from the IR (EI, LP); control then returns to fetch at 00.
Example 3: JN with Flag Not Set
[Figure: the control store table, highlighting the path taken from the JN microinstruction at 0D]
If NF is clear, the uPC simply increments from 0D to 0E, a uInstruction with no active control signals, and control returns to fetch at 00 without loading the PC.
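The three examples can be replayed with a small microsequencer. The control words and next addresses come from the control store table; the CD code for each row is an assumption read off the next-address column (rows with an explicit next address use 01, row 02 with next address XX uses 00, and the JN test at 0D uses 1*). NF is held constant for the whole run and HLT is not modelled; `STORE`, `MAP` and `trace` are hypothetical names.

```python
# address: (control word, CD, next-address field)
STORE = {
    0x00: ("0011000000000000", 0b01, 0x01),  # Fetch: MAR <- PC
    0x01: ("0000100000000000", 0b01, 0x02),  #        MDR <- M(MAR)
    0x02: ("1000000110000000", 0b00, None),  #        IR <- MDR, then map
    0x03: ("0001000001000000", 0b01, 0x04),  # LDA
    0x04: ("0000100000000000", 0b01, 0x05),
    0x05: ("0000000100100000", 0b01, 0x00),
    0x06: ("0001000001000000", 0b01, 0x07),  # STA
    0x07: ("0000001000010000", 0b01, 0x08),
    0x08: ("0000010000000000", 0b01, 0x00),
    0x09: ("0000000000101010", 0b01, 0x00),  # ADD
    0x0A: ("0000000000100110", 0b01, 0x00),  # SUB
    0x0B: ("0000000000010001", 0b01, 0x00),  # MBA
    0x0C: ("0100000001000000", 0b01, 0x00),  # JMP
    0x0D: ("0000000000000000", 0b10, 0x0F),  # JN: test NF
    0x0E: ("0000000000000000", 0b01, 0x00),  # JN, NF clear: do nothing
    0x0F: ("0100000001000000", 0b01, 0x00),  # JN, NF set: PC <- IR
}
MAP = {"LDA": 0x03, "STA": 0x06, "ADD": 0x09, "SUB": 0x0A,
       "MBA": 0x0B, "JMP": 0x0C, "JN": 0x0D}  # starting address generator

def trace(program, nf=0):
    """Micro-addresses visited while running each instruction once, in order."""
    upc, visited = 0x00, []
    for op in program:
        while True:
            visited.append(upc)
            _word, cd, nxt = STORE[upc]
            if cd == 0b00:
                upc = MAP[op]                 # 00: map from the IR opcode
            elif cd == 0b01:
                upc = nxt                     # 01: unconditional branch
            else:
                upc = nxt if nf else upc + 1  # 1*: branch if NF, else increment
            if upc == 0x00:                   # back to fetch: instruction finished
                break
    return visited

if __name__ == "__main__":
    print([f"{a:02X}" for a in trace(["MBA", "ADD"])])
    print([f"{a:02X}" for a in trace(["JN"], nf=1)])
    print([f"{a:02X}" for a in trace(["JN"], nf=0)])
```

The taken JN path is 00, 01, 02, 0D, 0F and the not-taken path is 00, 01, 02, 0D, 0E; both spend the same number of microcycles, but the not-taken case burns one on an empty word.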
Let's Review the Microprogramming Model
Store the microprogram in the control store
Fetch the instruction
Get the set of control signals from the control word
Move the microinstruction address
Lather, rinse, repeat
What is Microcode?
Michael Slater, "Microprocessor Based Design" (p. 42):
"Microcode tells the processor every detailed step required to execute each machine language instruction. Microcode is thus at an even more detailed level than machine language, and in fact defines the machine language. In a standard microprocessor, the microcode is stored in a ROM or a programmable logic array (PLA) that is part of the microprocessor chip and cannot be modified by the user."
Thought Experiment
Why is the design a little clumsy?
What can we do about it?
Reason for Clumsiness
JN's conditional flag check
Without any condition check, the whole process is very smooth
Solution: avoid all conditional checks
Real Life
A little American football story
Theory vs. Practice
In theory, there is no difference between theory and practice
In practice, theory and practice are two different things altogether
Live with condition checks
Keep designs as clean as possible
A General Approach
[Figure: the IR, external inputs and condition codes feed a starting and branch address generator, which loads the uPC; the uPC addresses the control store, which outputs the control word]
Format of Microinstructions
Pick yours
Your choice is as good as your neighbor's
What we did: one bit position per control signal
Order of the bits? Doesn't matter
Can result in long microinstructions
Not the number of microinstructions, but the width
A Note About Density
Observe that only a few bits are set to 1
Poor use of bit space
This scheme is called a horizontal microprogram
Alternate version: encode the bits (vertical microprogram)
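The density claim can be checked directly against the control store above. This short count uses the sixteen 16-bit control words from the table (addresses 00 to 0F); `CONTROL_STORE` is an illustrative name.

```python
CONTROL_STORE = [
    "0011000000000000", "0000100000000000", "1000000110000000",  # Fetch 00-02
    "0001000001000000", "0000100000000000", "0000000100100000",  # LDA 03-05
    "0001000001000000", "0000001000010000", "0000010000000000",  # STA 06-08
    "0000000000101010", "0000000000100110", "0000000000010001",  # ADD, SUB, MBA
    "0100000001000000", "0000000000000000", "0000000000000000",  # JMP, JN 0D-0E
    "0100000001000000",                                          # JN 0F
]

ones = sum(word.count("1") for word in CONTROL_STORE)
bits = sum(len(word) for word in CONTROL_STORE)
print(f"{ones} of {bits} bits set ({ones / bits:.1%})")
print("most bits set in one word:", max(word.count("1") for word in CONTROL_STORE))
```

Only 28 of the 256 bits are 1 (about 11%), and no word has more than three signals active, which is what makes encoding attractive.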
Vertical Microprogram
Encode the bits by grouping similar elements together
General idea:
Group similar resources together: there can be only one source or destination register
Some operations are mutually exclusive: read vs. write of memory
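A vertical encoding of the model computer's signals might look like the sketch below. The grouping is a hypothetical example following the two rules above: bus sources share one field, load destinations share another, R/W share a field, and A/S share a field, with IP kept as its own bit. The names `SOURCES`, `DESTS`, `encode` and `decode` are illustrative.

```python
# Hypothetical grouping: index 0 in every group means "no signal in this group"
SOURCES = [None, "EP", "ED", "EI", "EA", "EU"]      # at most one register drives the bus
DESTS = [None, "LP", "LM", "LD", "LI", "LA", "LB"]  # at most one register loads from it
MEMORY = [None, "R", "W"]                           # read and write are mutually exclusive
ALU = [None, "A", "S"]                              # add and subtract are mutually exclusive
GROUPS = [SOURCES, DESTS, MEMORY, ALU]

def field(group, active):
    """Encode the (at most one) active member of a group as its index."""
    hits = [name for name in group[1:] if name in active]
    if len(hits) > 1:
        raise ValueError(f"{hits} cannot share one encoded field")
    return group.index(hits[0]) if hits else 0

def encode(active):
    """Pack a set of signal names into (IP, source, dest, memory, alu) fields."""
    known = {"IP"}.union(*(set(g[1:]) for g in GROUPS))
    unknown = set(active) - known
    if unknown:
        raise ValueError(f"unknown signals: {unknown}")
    return (int("IP" in active),) + tuple(field(g, active) for g in GROUPS)

def decode(fields):
    """Per-field decoders recover the original set of signals."""
    ip, *codes = fields
    names = {"IP"} if ip else set()
    for group, code in zip(GROUPS, codes):
        if code:
            names.add(group[code])
    return names

# IP keeps its own bit; each group needs enough bits for its members plus "none"
WIDTH = 1 + sum((len(g) - 1).bit_length() for g in GROUPS)
```

With this grouping every word in the control store fits in 11 bits instead of 16, at the cost of four small decoders; a combination such as EP and ED together is rejected because two registers cannot drive the bus at once.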
Design Issues
Encoding reduces the bit-space
But requires decoders
Cost of decoders vs. bit-space
Usually the decoder cost is very low
Another Idea
Group concurrently active signals
Every meaningful combination gets a code
Needs a complex decoder to interpret every code
Vertical vs Horizontal
Horizontal
Faster
More area
More common currently (cheap transistors)
Vertical
Slower
More microinstructions
Microsequencing
Other ways to save on hardware
Every instruction had its own microprogram sequence
Also, instructions have several addressing modes
Only the first few microinstructions differ
Can we share microcode?
A Powerful Technique in Sharing: Bit-ORing
Example: two instructions share some microcode
Eventually, they must branch
The default branch (one instruction's) is at X0
The other branch is stored at X1
Change (OR in) the least significant bit(s) to get the new address
Compare that with:
Having two conditional branches
Storing two fields, one for each branch
Both very unclean
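Bit-ORing can be sketched as a tiny address function: the shared microcode ends with a base address whose low bit(s) are 0, and a condition (or an instruction bit) is ORed in to select X0 or X1. `or_branch` and the example addresses are hypothetical.

```python
def or_branch(base: int, condition: int, bits: int = 1) -> int:
    """Form the next micro-address by ORing condition bits into the base address."""
    mask = (1 << bits) - 1
    if base & mask:
        raise ValueError("the low bits of the base address must be 0")
    return base | (condition & mask)

if __name__ == "__main__":
    print(hex(or_branch(0x12, 0)), hex(or_branch(0x12, 1)))  # X0 or X1
    print(hex(or_branch(0x14, 3, bits=2)))                    # 4-way: X0..X3 with 2 bits
```

No second branch field and no extra conditional test are needed; one OR gate per address bit does the work, provided the targets are placed at aligned addresses.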
Thought Experiment:
What if we provided explicit branch microinstructions instead of storing a next-address field in our microprogram?
A typical instruction set will need a lot of branches
A lot of time will be wasted on branching
A Pat on Our Back
We provided an explicit field for the address
The branch location is now data
It is already saved
Caution: microinstructions can get very wide
Solution: there is no free lunch
Can We Pipeline Microfetch?
A neat idea: why wait till the current micro-op is over?
The branch field gives the next operation
Get the next op early
Caveat: external inputs and status flags may change the order
What about interrupts? They are going to follow you everywhere
Should have a mechanism that can invalidate the microcode prefetch
Similar to a pipeline flush for instructions
Commonly used
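The cost of a wrong prefetch can be illustrated with a deliberately simple, hypothetical cost model: each microinstruction takes one cycle, the word named by the current branch field is prefetched, and when the actual next address differs (a flag test or a map step) the prefetched word is invalidated at a cost of one extra cycle. `microfetch_cycles` and `BRANCH_FIELD` are illustrative names; the addresses are taken from the control store above.

```python
def microfetch_cycles(path, branch_field):
    """Cycles to run a micro-address path when the next word is prefetched.

    path: the micro-addresses actually executed, in order.
    branch_field: address -> next address stored in that microinstruction
    (absent when the next address is not known in advance, e.g. a map step).
    """
    cycles = len(path)  # one cycle per microinstruction
    for current, actual_next in zip(path, path[1:]):
        if branch_field.get(current) != actual_next:
            cycles += 1  # invalidate the prefetched word, like a pipeline flush
    return cycles

# Branch fields for fetch and JN; 02 is a map step, so nothing is prefetched there
BRANCH_FIELD = {0x00: 0x01, 0x01: 0x02, 0x0D: 0x0F, 0x0E: 0x00, 0x0F: 0x00}

if __name__ == "__main__":
    print(microfetch_cycles([0x00, 0x01, 0x02, 0x0D, 0x0F, 0x00], BRANCH_FIELD))
    print(microfetch_cycles([0x00, 0x01, 0x02, 0x0D, 0x0E, 0x00], BRANCH_FIELD))
```

Under this model the taken JN path pays only for the map step, while the not-taken path pays again when NF overrides the prefetched 0F.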
Historical Perspectives
Hardwired logic
Popular before the 60s
Popular now
Speed benefits
Microprogram
Popular in the 70s
The only way people did it
Memory was slower than the CPU
No on-chip cache
Best way is to store the microcode
Now: depends on who you ask
Shades of gray: extremes of the spectrum are harder to find nowadays
Tools for Design
Hardwired
Any state machine optimizer
Assigning states, minimizing transitions, races, hazards, ...
Microcoding
Small ones can be written in binary
Large ones: use a microassembler
Very useful debug tool
Can use the microassembler simultaneously with actual hardware development
Hardwired vs Microcoding
Hardwired units are faster and smaller
Emulation is easy with microcoding
Hardwired design is complex if large
Bugs in hardwired design cannot be fixed in the field
Hardwired control is not suited for loops
Looping with microcode can be made as fast
Hardwired vs Microcode vs RISC
RISC
Simpler instruction set
Hardwired implementation
RISC instructions are like microcode
Instructions come from the I-cache instead of the control store
Difference: contents are not fixed
Advantage: only load what you want into the I-cache
Keeps size smaller compared to control stores
Microprogram vs Software
Imagine floating-point division
Solution 1: write it in software
Long process
Error prone
Many repeated fetches from memory for the given sequence of operations
Solution 2: microcode
Long process too, but done by designers, not programmers
Relatively error-free: a more thorough design
Requires many cycles, but the microinstructions are fetched and used locally
Emulation
A very common use of microcoding
IBM System/360
32-bit architecture
16 general-purpose 32-bit registers
Secret: many implementations (e.g., the low-end Model 30) used narrow 8-bit internal datapaths
Keeps cost low
Heavy microcoding
Programmers oblivious
In 1992, International Meta Systems (IMS) announced the 3250
Designed to emulate the x86, 68K, and 6502 architectures
Uses customizable microcode, among other techniques
Went bust; never released
Another Interesting Note
Writable Control Store
What if you, a programmer, could write your own control store?
Not a mad scientist's thought
Implemented in
VAX 8800
PDP-11/60
IBM System/370
Current Trends
Microcode Update
Linux utility: microcode_ctl
Companion to the IA32 microcode driver
It decodes and sends new microcode to the kernel driver to be uploaded to Intel IA32 processors
Update is volatile: lost on reboots
Microcode updates are also typically rolled into BIOS updates
Ready even before an OS is loaded
Intel said:
The Pentium(R) Pro processor and Pentium(R) II processor may
contain design defects or errors known as errata that may cause the
product to deviate from published specifications. Many times, the
effects of the errata can be avoided by implementing hardware or
software work-arounds, which are documented in the Pentium Pro
Processor Specification Update and the Pentium II Processor
Specification Update. Pentium Pro and Pentium II processors include a
feature called "reprogrammable microcode", which allows certain types
of errata to be worked around via microcode updates. The microcode
updates reside in the system BIOS and are loaded into the processor
by the system BIOS during the Power-On Self Test, or POST.
Current Trends
Hyperthreading in P4
A second logical CPU
Each logical CPU keeps a complete copy of the architectural state
Microcoding in P4
Two pointers control the microcode flow independently
Both logical processors share the ROM entries
Access alternates between the two logical CPUs
Thank You
