Figure 4: DISC Global Controller Layout. (Cell-level layout drawn across the width of the FPGA; visible row labels: 18, 19, 20.)
Figure 2: Simplified Custom Instruction Module.

Relocatable circuit modules communicate as established by the global protocol and thus operate properly at any vertical location. In a run-time environment, these circuit modules can be relocated as needed to optimize the available hardware space.

5 DISC Architecture

The DISC architecture implements relocatable hardware with the linear hardware model on a single National Semiconductor CLAy31 FPGA coupled to an external RAM. The CLAy31 provides a 56 x 56 array of fine-grain logic cells, allowing 56 complete rows in the linear hardware space. A complete processor is made by coupling a global controller to a library of custom-instruction circuit modules (see Figure 3).

Figure 3: DISC Linear Hardware Space. (Diagram labels: Processor; Global Control; Instruction Module B; Instruction Module Library; Add; Subtract; Multiply; AND; a+b-c^d; Edge Detection; FFT.)

5.1 Global Controller

The global controller provides the circuitry for operating and monitoring global resources such as the external RAM, I/O, the internal communication network [...]

The architecture of the global controller is shown in Figure 5 and comprises the following sub-modules:

- Data Register (DR): stores intermediate results, provides inter-module communication buffering and assists in complex address generation (8 bits),
- Address Register (AR): provides standard addressing modes for memory access (16 bits),
- Program Counter (PC): provides the sequencing capability of the processor (16 bits),
- Status Register (SR): stores internal state of the processor (4 bits),
- Instruction Register (IR): stores the opcode of the current instruction (8 bits),
- Global Control Unit (GCU): contains the circuitry necessary to preserve communication protocol, sequence through processor states, and interface with I/O.

Figure 5: DISC Global Controller Architecture. (Diagram labels: Global Control Unit; Status Register; Instruction R.; Program Counter; Address Register; Memory Control; Memory Address; Status; Opcode; To External Memory; To Custom Instructions.)

The global controller provides a consistent communication interface and standard protocol for all custom-instructions at every vertical location. The global signals available to the custom-instructions include the following:

- Data Register Value: accesses contents of Data Register (8 bits),
IEEE Workshop on FPGAs for Custom Computing Machines, Napa, CA, April 19-21, 1995. 5
- Data Register Feedback: provides new values for Data Register (8 bits),
- Memory Address: allows address generation control by custom-instructions (16 bits),
- Memory Data: allows bi-directional access of memory data by custom-instructions (8 bits),
- Status Signals: provides control capability for custom-instructions (4 bits),
- Instruction Register: provides opcode of current instruction (8 bits).
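As a rough illustration of the global signal interface listed above, the signals can be modelled as a record whose fields are truncated to the stated bit widths. This is only a sketch: the field names (`dr_value`, `mem_address`, and so on) and the `drive` helper are hypothetical and do not come from the DISC design; only the widths are taken from the text.

```python
from dataclasses import dataclass

# Signal widths in bits, as listed in the text. Field names are hypothetical.
SIGNAL_WIDTHS = {
    "dr_value": 8,       # Data Register Value
    "dr_feedback": 8,    # Data Register Feedback
    "mem_address": 16,   # Memory Address
    "mem_data": 8,       # Memory Data
    "status": 4,         # Status Signals
    "instruction": 8,    # Instruction Register
}


def mask(name: str, value: int) -> int:
    """Truncate value to the width of the named signal."""
    return value & ((1 << SIGNAL_WIDTHS[name]) - 1)


@dataclass
class GlobalSignals:
    """Software stand-in for the signals visible to every custom instruction."""
    dr_value: int = 0
    dr_feedback: int = 0
    mem_address: int = 0
    mem_data: int = 0
    status: int = 0
    instruction: int = 0

    def drive(self, name: str, value: int) -> None:
        """Set a signal, keeping only the bits that fit its width."""
        setattr(self, name, mask(name, value))
```

Summing the widths gives 52 signal bits in total, which is the interface every module has to match regardless of the row at which it is placed.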
The global controller is also responsible for sequencing through the instruction cycles for the custom-instruction modules. The following instruction cycles are implemented by the global controller:

- Instruction Fetch (IF),
- Operand Fetch (OF),
- Halt Processor (HP),
- Custom Cycle (CC),
- Instruction Execution (EX).

The IF cycle stores the current program memory byte into the instruction register and increments the program counter. The OF cycle stores the current program byte into the address register and also increments the program counter. The HP cycle causes all processor resources to remain idle and is used during configuration. The CC cycle is used by complex custom-instruction modules to add cycles and has no effect on global resources. The EX cycle loads the data register with the contents of the data register feedback path.

Each instruction in the library operates in one of two possible instruction cycle sequences: standard and custom. The standard instruction sequence follows a simple three-cycle execution: IF, OF, and EX. Any instruction that completes its computation or function in a single clock cycle, such as basic arithmetic and logic operations, will operate with this sequence.

The custom-instruction sequence offers additional cycles for complex custom-instructions. The custom sequence begins with the following two cycles: IF followed by OF. The sequence then varies by inserting as many CC cycles as necessary to complete a complex application-specific operation. The custom-instruction sequence completes with the EX instruction cycle. The custom-instruction module has complete control over the number of CC cycles needed for a particular function. Some instructions add as few as one cycle, while others require thousands of cycles for a single operation. Figure 6 displays the two instruction sequences.

Figure 6: DISC Instruction Sequences. (Standard instruction sequence: IF, OF, EX. Custom instruction sequence: IF, OF, CC ... CC, EX.)

The global control unit contains a number of default instructions necessary for controlling global resources. These instructions are used for sequencing, status control, and memory transfer and include the following:

- set carry: sets carry bit in status register,
- clear carry: clears carry bit in status register,
- store data register: store data register in memory,
- load data register: load data register from memory,
- conditional jump: jump with carry not set.

Each of these instructions follows the standard instruction sequence of three cycles. These instructions, coupled with the custom-instruction library designed for a particular application, provide the complete instruction set of the processor. An application can implement an instruction set of any size by paging instruction modules in a demand-driven manner from the instruction library.

5.2 Custom-instruction Modules

Custom-instruction modules vary in size and complexity, but each is designed to fit within the global context described above. Specifically, each module contains a decode and a data-path unit. Complex modules contain additional control structures.

The decode unit assigns a specific op-code to the custom instruction and is responsible for acknowledging its presence to the global controller. The decode unit compares the contents of the IR for a match against its own opcode during the OF cycle. On a positive match the module signals the global controller that the hardware is present and instruction sequencing continues.

The data-path is responsible for providing the proper connections to the global communication network and adhering to the established communication protocol. Instruction modules not executing refrain from sending any signals on the communication channel to prevent the corruption of other operating instructions. The data-path unit provides a new value for the data register during the EX stage. Most instructions perform their function by modifying the DR.

Several custom-instruction modules of varying size have been implemented on DISC. These vary from a simple single row shifter to a complex edge-detection module of 34 rows. Table 1 shows the current instructions available for DISC. The circuit layout for the Adder/Subtracter module is seen in Figure 7.

6 System Operation

The DISC processor was implemented on a PC-ISA custom board made exclusively for the study. The board includes static bus interface circuitry, two CLAy31 FPGAs, and memory. A configuration controller is implemented on the first FPGA to monitor [...]
Module                 Rows   Gates
Shifter                   1      50
Comparator                3     155
Add/Subtract              3     153
Addressing Modes          4     447
Masking Operations        5     193
Logical Operators         9     232
Bit-Level Operations      9     296
Mean Filter              31    2156
Edge Detector            33    2221

Table 1: Sample Custom Instruction Modules.

Figure 7: [...] Layout. (Circuit layout of the Adder/Subtracter module.)

[...]cent memory (see Figure 8). The board operates under [...]

(Figure 8 diagram labels: DISC Processor, CLAy 31; Configuration Controller, CLAy 31; RAM; Host Bus Interface; ISA Bus.)

Upon receiving a request for an instruction module, the host evaluates the current state of the DISC FPGA hardware and chooses a physical location for the requested module. The physical location is chosen based on available FPGA resources and the existence of idle instruction modules. If possible, the instruction module is loaded in an FPGA location not currently occupied by any other instruction module. If no empty hardware locations are available, a simple least-recently-used (LRU) algorithm is used to remove idle hardware. The host modifies the bit-stream of the requested hardware module to reflect the placement changes. The hardware module is then configured on the DISC platform by sending the new configuration to the system. Figure 9 provides a simplified flow chart of DISC instruction execution.

(Figure 9 flow-chart labels: Instruction Present?; YES; [...] Available?; Compute New Location; Configure Instruction Module; Execute Instruction; PC.)
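The placement policy described above (use an empty location if one exists, otherwise remove idle hardware in LRU order) can be sketched in software as follows. This is a minimal model under stated assumptions, not the host program used for DISC: the first-fit gap search, the `reserved` parameter for rows held by the global controller, and all class and method names are hypothetical, and the bit-stream relocation step is not modelled. Only the 56-row device height and the LRU rule come from the text.

```python
from dataclasses import dataclass, field

FPGA_ROWS = 56  # rows in the CLAy31 linear hardware space (from the text)


@dataclass
class Placed:
    name: str
    base: int       # first row occupied
    rows: int       # height in rows
    last_used: int  # logical time of the most recent request


@dataclass
class LinearSpace:
    total_rows: int = FPGA_ROWS
    reserved: int = 0  # rows held by the global controller (hypothetical parameter)
    modules: dict = field(default_factory=dict)
    clock: int = 0

    def _find_gap(self, rows):
        """First-fit search (an assumption) for `rows` contiguous free rows."""
        cursor = self.reserved
        for m in sorted(self.modules.values(), key=lambda m: m.base):
            if m.base - cursor >= rows:
                return cursor
            cursor = m.base + m.rows
        return cursor if self.total_rows - cursor >= rows else None

    def request(self, name, rows):
        """Return the base row of `name`, loading it and evicting others if needed."""
        self.clock += 1
        if name in self.modules:  # instruction already present: just use it
            self.modules[name].last_used = self.clock
            return self.modules[name].base
        if rows > self.total_rows - self.reserved:
            raise ValueError("module does not fit on the device")
        base = self._find_gap(rows)
        while base is None:  # no free location: remove the least recently used module
            victim = min(self.modules.values(), key=lambda m: m.last_used)
            del self.modules[victim.name]
            base = self._find_gap(rows)
        self.modules[name] = Placed(name, base, rows, self.clock)
        return base
```

With the row counts of Table 1, the Mean Filter (31 rows) and the Edge Detector (33 rows) cannot be resident together on a 56-row device, so a program that alternates between them would trigger an eviction on every switch in this model.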
$$g(x, y) = \frac{1}{8} \sum_{m=-1}^{1} \sum_{n=-1}^{1} g(x + m, y + n).$$

A coefficient of 1/8 was used to simplify the design. The 128 x 64 grey scale image in Figure 10 was used as the test image for both cases.

(Diagram labels: 3 x 3 pixel window numbered 1 to 9; pixel order 3 2 1, 6 5 4, 9 8 7; eight adders; Shift.)
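A software sketch of the filter formula above is given here for illustration. It sums the 3 x 3 neighbourhood and divides by 8 with a 3-bit right shift, which is presumably why the 1/8 coefficient simplifies the design. The border handling (border pixels copied unchanged), the clamp to 255, and the use of a separate output image are assumptions of this sketch and are not stated in the text.

```python
def mean_filter_3x3(image):
    """Return a filtered copy of `image`, a list of rows of 8-bit pixel values."""
    height = len(image)
    width = len(image[0])
    out = [row[:] for row in image]  # border pixels are copied unchanged (assumption)
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            total = sum(
                image[y + n][x + m]
                for n in (-1, 0, 1)
                for m in (-1, 0, 1)
            )
            # Divide by 8 with a shift; nine pixels over eight can exceed 8 bits,
            # so the result is clamped to 255 (assumption).
            out[y][x] = min(total >> 3, 255)
    return out
```

Because nine pixels are summed but the divisor is eight, a flat region comes out slightly brighter than it went in: a neighbourhood of nine pixels with value 8 yields 72 >> 3 = 9.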