
2.10.2010
Lecture 5
Structure and Properties of FPGAs
FPGA Architecture
Current technology
Best flexibility
Propagation delays vary;
they are not fixed from
pin to pin
Requires special
place & route
algorithms
Basic Unit of FPGA Configurable
Logic
Note:
Synchronous
set/reset
The unit is called
CLB - Xilinx
LE - Altera
(Example:
Xilinx XC2000)
FPGA interconnects
FPGAs have hierarchical interconnect levels
Local interconnections (LI)
May have several levels
Row interconnects, column interconnects
Global interconnections (GI) (#GI << #LI)
The different levels are connected with switches
Local interconnects are shorter than GI
A shorter wire implies better speed, but GI links are made broader and have
more repeaters => faster than LI over a line of a given length
LIs are many, GIs are scarce
Local interconnects are preferred in local communication
Does not have to pass many switches
[Figure: delay versus wire length for GI and LI]
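The trade-off between LI and GI can be sketched with a toy delay model. It assumes that an unrepeated local wire behaves like a distributed RC line (delay roughly quadratic in length) and that a repeated global wire has a fixed access cost plus a delay that grows linearly with length. The function names and all constants are hypothetical, chosen only to show the crossover, and are not taken from any device.

```python
def local_delay_ns(length_mm, k_quadratic=0.5):
    """Unrepeated wire: RC delay grows roughly with the square of the length."""
    return k_quadratic * length_mm ** 2


def global_delay_ns(length_mm, access_ns=1.0, k_linear=0.2):
    """Repeated, wide wire: fixed cost to reach the line, then linear growth."""
    return access_ns + k_linear * length_mm


def faster_interconnect(length_mm):
    """Return "LI" or "GI", whichever this toy model predicts is faster."""
    if local_delay_ns(length_mm) <= global_delay_ns(length_mm):
        return "LI"
    return "GI"


# Short connections favour LI, long ones favour GI (made-up lengths in mm).
for length in (0.5, 1.0, 2.0, 4.0):
    print(length, faster_interconnect(length))
```

With these assumed constants the crossover falls between 1 mm and 2 mm, which is the shape of the delay versus wire length plot: LI wins for local communication, GI for long distances.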
FPGA Routing - to connect
configurable blocks
Programmable Logic Trends

Device | Year | Max. capacity [LE] | Process [µm] | Max. MHz | Max. memory [bit] | Embedded functions | SDRAM controller [LE] | SDRAM controller [MHz] | Relative size of FPGA
FLEX8000 | 1993 | 1 296 | 0.42 | 125 | - | - | - | - | -
FLEX10K | 1995 | 4 992 | 0.42 | 204 | 20 480 | - | 865 | 19 | 1.0
FLEX10KA | 1996 | 12 160 | 0.3 | 204 | 40 960 | - | 865 | 38 | 2.4
APEX20K | 1998 | 16 640 | 0.22 | - | - | - | - | - | -
APEX20KE | 1999 | 51 840 | 0.18 | - | 442 368 | 4 PLLs | 795 | 47 | 11.3
APEX20KC | 2000 | 38 400 | 0.15 | - | 327 680 | 4 PLLs | 795 | 66 | 8.4
APEX II | 2001 | 67 200 | 0.13 | - | 1 146 880 | 8 PLL, I/O interfaces | 795 | 59 | 14.6
Stratix | 2003 | 79 040 | 0.13 | 420 | 7 427 520 | 22 DSP, 12 PLLs, I/O interfaces | 491 | 125 | 27.9
Stratix II | 2004 | 179 400 | 0.09 | 567 | 9 383 040 | 96 DSP, 12 PLL, I/O interfaces | 553 | 179 | 56.2
Stratix III | 2006 | 338 000 | 0.065 | - | 20 497 491 | 112 DSP, 12 PLL, I/O interfaces | 514 | 196 | 113.9
Example Design on EP2C35
33,216 LEs
Picture from Altera Chip
Planner tool
Darker LEs are in use.
Let's zoom in
16 LEs
Zoom in
One LE
Look-up table
(combinational logic)
D flip-flop
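A behavioural sketch of one LE may help here. It assumes a 4-input look-up table whose output is captured by a D flip-flop on each clock edge; the class name, the bit ordering of the mask and the method names are illustrative choices, not a vendor model.

```python
class LogicElement:
    """Toy model of one LE: a 4-input look-up table feeding a D flip-flop."""

    def __init__(self, lut_mask):
        # 16 configuration bits; bit i is the output for input pattern i.
        self.lut_mask = lut_mask & 0xFFFF
        self.q = 0  # registered output, cleared at start-up

    def comb_out(self, a, b, c, d):
        """Combinational output: read the bit addressed by the four inputs."""
        index = (d << 3) | (c << 2) | (b << 1) | a
        return (self.lut_mask >> index) & 1

    def clock(self, a, b, c, d):
        """Rising clock edge: the flip-flop captures the LUT output."""
        self.q = self.comb_out(a, b, c, d)
        return self.q


# Configure the LUT as a 4-input AND: only pattern 1111 (index 15) gives 1.
and4 = LogicElement(lut_mask=1 << 15)
```

Any 4-input Boolean function fits in the same LE by changing only the 16-bit mask, which is what the configuration bitstream does.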
Specific to FPGA
A lot of registers: use them
Aggressive pipelining
Objective is to hide the routing delays as much as possible
Simple logic stages between registers
Hard macros
Use whenever appropriate
Higher performance than building the same function from the FPGA's native
resources
They are there anyway
Embedded multipliers are common
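The benefit of aggressive pipelining mentioned above can be estimated with a short calculation. This sketch assumes that the logic and routing delay splits evenly between the register stages and that each stage adds a fixed register overhead; the 0.5 ns overhead and the 9.5 ns path in the usage lines are made-up numbers.

```python
def fmax_mhz(path_delay_ns, stages, reg_overhead_ns=0.5):
    """Clock limit when path_delay_ns is split evenly over `stages` stages."""
    period_ns = path_delay_ns / stages + reg_overhead_ns
    return 1000.0 / period_ns


# One long stage versus the same logic cut into five simple stages.
unpipelined = fmax_mhz(9.5, stages=1)
pipelined = fmax_mhz(9.5, stages=5)
print(unpipelined, pipelined)
```

The result takes five clock cycles to appear instead of one, but a new input can be accepted every cycle at the higher clock rate.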
Dedicated Blocks and Interconnections
Integrated hard macros
The devices have increasing number of integrated hard macros
Included in each device regardless of usage, so everyone pays for them.
Includes the most common functions
E.g.
Memories
High-speed multipliers with accumulate (MAC)
Integrated microprocessors
High speed I/O link controllers
PLL/DLLs for clock manipulation
Hard macros: DSP blocks
Properties
High-performance, power-optimized, fully registered and pipelined
multiplication operations
Supported 9-bit, 12-bit, 18-bit, 36-bit word lengths
Supported 18-bit complex multiplications
Efficiently supported floating-point arithmetic formats
Signed and unsigned input support
Built-in addition, subtraction and accumulation units to combine
multiplication results efficiently
Rich and flexible arithmetic rounding and saturation units
Efficient barrel shifter support
DSP blocks
A DSP block is divided into four
blocks
Interface with four LAB rows
on the left and right
Can be cascaded
Cascading 18-bit input bus to
form tap-delay line for filtering
applications
Cascading 44-bit output bus to
propagate output results from
one block to the next block
without external logic support
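The tap-delay line formed by cascading the input bus is the structure of a direct-form FIR filter. The following is a software sketch of that idea only: it ignores bit widths, rounding and the actual DSP block wiring, and the function name is hypothetical.

```python
def fir_filter(samples, coefficients):
    """Direct-form FIR: shift samples through a tap-delay line, sum products."""
    taps = [0] * len(coefficients)  # the tap-delay line, newest sample first
    outputs = []
    for sample in samples:
        taps = [sample] + taps[:-1]  # shift the delay line by one position
        acc = 0
        for tap, coeff in zip(taps, coefficients):
            acc += tap * coeff  # one multiply-accumulate per tap
        outputs.append(acc)
    return outputs


# An impulse at the input reproduces the coefficients at the output.
print(fir_filter([1, 0, 0, 0], [1, 2, 3]))
```

In hardware each tap would use one multiplier, with the delayed sample passed along the cascaded input bus and the running sum passed along the cascaded output bus.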
Hard macros: TriMatrix Memory
Configurable, fast on-chip SRAM memories
Various bit widths supported, can be grouped together to form
different sized memories
TriMatrix Memories
Packed mode: pack two single-
port memories into one physical
dual-port memory
Simple dual port: simultaneous
read and write
True dual-port: any combination
of two simultaneous read and write
operations is supported,
e.g. read-read
Clock networks in FPGAs
FPGAs are designed for synchronous logic
Some exotic devices exist, but this is the case with 99% of FPGAs
FPGAs include clock networks and support different clock domains
within the device
The mystical clk signal is generated in one of the following ways:
Input (crystal) oscillator
Input to the FPGA device (dedicated pins) which buffers the
signal
This can be directly used
PLL circuitry that multiplies/divides the input clock
Locks to required frequency, may be used to e.g. create a
stable 200 MHz clock from 50 MHz input clock
Internal feedback-loop clocks or clock dividers
The most hazardous way
Doable, but don't use this
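The PLL example above (200 MHz from a 50 MHz input) follows from the usual multiply/divide relation. This is a minimal sketch of an idealized PLL; the function name and the multiply/divide parameter names are hypothetical, and real devices restrict the allowed factors and frequency ranges.

```python
from fractions import Fraction


def pll_output_mhz(f_in_mhz, multiply, divide):
    """Output frequency of an idealized PLL: f_out = f_in * multiply / divide."""
    return Fraction(f_in_mhz) * multiply / divide


# 50 MHz input, multiplied by 4 and divided by 1, gives 200 MHz.
print(pll_output_mhz(50, multiply=4, divide=1))
```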
Clock networks
Clock networks are hierarchical
Global clocks (Gclk)
Regional clocks (Rclk) (may have several)
Number of Rclks >> Gclks (e.g. 16 Gclks, 100 Rclks)
Regional clocks can only be used
in one device quadrant
Global clocks can be used to drive
logic and other blocks throughout the
device
Supported I/O standards and
applications
From the Device Manual
FPGA CONFIGURATION
FPGA device technologies: Overview
Most common
SRAM-based FPGA Configuration
The device needs a programming file (also called a bit file or
bitstream)
Includes the programming info for each cell of the FPGA
Usually proprietary format
Again, a lot of variation across manufacturers and devices
Each cell needs to be configured at start-up
Example:
FLASH: stores the bitstream
Configuration device: reads the bitstream from the flash and writes it to
the FPGA
SRAM-based FPGA Configuration
FPGAs may also support a master mode in which the FPGA connects
directly to a memory to obtain its configuration bits
Does not need any external configuration device
Parallel programmers are also used in order to increase the
programming speed
Byte-wide ports are common
A JTAG port is also supported
Standard connection originally for testing
Has room for special commands
Widely used in the prototyping phase, as the FPGA may be programmed
directly over JTAG instead of programming the flash and then
resetting the device
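The speed-up from a byte-wide configuration port can be estimated as follows. The sketch assumes one port-width word is transferred per configuration clock cycle with no protocol overhead; the 8 Mbit bitstream and the 10 MHz clock are hypothetical values, not figures for any particular device.

```python
def configuration_time_ms(bitstream_bits, clock_mhz, port_width_bits):
    """Time to load a bitstream at one port-width word per clock cycle."""
    cycles = -(-bitstream_bits // port_width_bits)  # ceiling division
    return cycles / (clock_mhz * 1000.0)  # cycles / (cycles per ms)


# Hypothetical 8 Mbit bitstream at a 10 MHz configuration clock.
serial = configuration_time_ms(8_000_000, clock_mhz=10, port_width_bits=1)
byte_wide = configuration_time_ms(8_000_000, clock_mhz=10, port_width_bits=8)
print(serial, byte_wide)
```

Under these assumptions the byte-wide port cuts the start-up configuration time by a factor of eight.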
FPGA SELECTION CRITERIA
Separation of targets
Strong tendency towards high-end and low-end FPGA devices
Low-end
Low cost, lower logic capacity, less memory, fewer integrated hard macros
Target is the traditional cost-sensitive consumer products and glue-logic
domain, possibly with fancy features such as a simple soft processor
Price is some tens of euros, cheaper in high quantities
High-end devices are large and highly optimized, usually for speed
Price is thousands of euros per device, up to the 10k range for the best
(depends again on the volume)
Target is the traditional ASIC domain
When high performance is required but not enough products are
manufactured to compensate for the ASIC's higher NRE costs
An emerging trend is also to offer a structured-ASIC version of the design
The FPGA design is burned into a structured ASIC that cannot
be re-programmed
Saves power and area, increases speed due to removal of the
programming resources
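The volume argument for FPGA versus ASIC can be made concrete with a break-even calculation. This sketch assumes a simple cost model (one-off NRE plus a constant unit price); all prices in the usage lines are invented for illustration.

```python
def total_cost(nre, unit_price, units):
    """Total cost of a production run: one-off NRE plus per-unit cost."""
    return nre + unit_price * units


def break_even_units(asic_nre, asic_unit, fpga_unit, fpga_nre=0.0):
    """Volume above which the ASIC becomes cheaper overall than the FPGA."""
    if fpga_unit <= asic_unit:
        raise ValueError("the FPGA unit price must exceed the ASIC unit price")
    return (asic_nre - fpga_nre) / (fpga_unit - asic_unit)


# Hypothetical: 1 000 000 euro ASIC NRE, 10 euro ASIC, 110 euro FPGA.
print(break_even_units(asic_nre=1_000_000, asic_unit=10, fpga_unit=110))
```

Below the break-even volume the FPGA is the cheaper choice even though each device costs more.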
FPGA device selection criteria (1)
Circuit capacity
Number of logic elements and registers, logic element size
Amount of RAM, types of RAM
Required hard macros
Routing resources
I/O signal routing (How the location of an I/O pin affects the routing)
Circuit speed
Basic cell speed, routing speed
Routing delay predictability
Global signals
Signals that go to every cell (clk, reset)
Clock networks, clock generation inside the chip, dedicated clock I/O
pins
Dedicated global reset pin
Number of I/O signals and supported standards
Packaging, power consumption
Temperature range, radiation-hard
FPGA device selection criteria (2)
Voltage levels, inside the chip and I/O
Programming
Re-programming, flexibility
Security
External components required, price
Pricing
Unit price in volume production
Development cost
Varies a lot depending on the volume, specific device and package
(and the client)
Prices are subject to rapid changes; long-term contracts should be
carefully considered
Development environment
CAD tools, usability, support
Future
Availability of the chips in volume
Structured ASICs
Compatible pin/package mapping between different flavors of the device
Availability and Life Span
The digital CMOS technology develops rapidly
New devices are introduced faster and faster
The end of a device's life span is dictated by the demand for it
Widely used devices are more certain to stick around for years
Very widely used devices may live quite long (even 10 years, e.g. Xilinx
XC3000, Altera Flex 10k)
The old device may be convertible to a new device without
modifications
Package, pins, operating voltage, configuration
Operating voltage tends to change between technology generations, so
that causes most of the problems with compatibility
The manufacturer may give some guarantees on life span
Choosing between different vendors may be complicated. The
experience with a certain manufacturer's devices may be the dominant
factor.
FPGA SYNTHESIS
Quartus FPGA Design Software
Accepts an HDL description of a system (VHDL, Verilog)
Quartus flow phases
1. Setup
2. Perform RTL synthesis
3. Map basic gates into FPGA logic
4. Place the logic at specific locations on the chip
5. Route (connect) the logic elements together
6. Provide statistics and analysis results
7. Create a programming file and upload it into the FPGA
Quartus II design flow for a
simulated and verified design
Generic gate-level representation
Places and routes the logic into a device
Converts the post-fit netlist into an FPGA programming file
Analyzes and validates the timing performance of all logic in a design
Run on FPGA
Quartus - i. RTL Synthesis step
Synthesis
Creates basic gates and DFFs
from HDL
Result is a so-called netlist
A full description of logical
ports and their connections
Synthesis results can be viewed
with RTL Viewer tool
[Figure: user_logic.vhd is synthesized into a netlist (shown in RTL Viewer)]
Quartus - ii. Technology mapping
Transforms the basic gates into
technology specific components
I.e. into logic elements (LE) with
Altera FPGAs
An LE contains a look-up table and a
DFF
[Figure: a gate-level netlist is mapped to the same function implemented with the FPGA's logic elements (LE)]
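Technology mapping into a look-up table amounts to tabulating the gate-level function. The sketch below uses a hypothetical three-input gate network (not the one in the slide figure) and an assumed bit ordering in which x0 is the least significant address bit; it packs the truth table into a LUT mask and reads it back.

```python
from itertools import product


def gate_netlist(x0, x1, x2):
    """Hypothetical gate-level function: z = (x0 AND x1) OR (NOT x2)."""
    and_out = x0 & x1
    not_out = 1 - x2
    return and_out | not_out


def lut_mask_for(function, num_inputs):
    """Evaluate the function on every input pattern and pack the results."""
    mask = 0
    for bits in product((0, 1), repeat=num_inputs):
        index = sum(bit << i for i, bit in enumerate(bits))  # bits[0] is LSB
        mask |= function(*bits) << index
    return mask


def lut_read(mask, *bits):
    """Read the LUT output for one input pattern."""
    index = sum(bit << i for i, bit in enumerate(bits))
    return (mask >> index) & 1


mask = lut_mask_for(gate_netlist, 3)
print(hex(mask))
```

The two gates and the inverter collapse into a single LUT whose contents are the mask; the mapped LE computes the same function as the original netlist for every input pattern.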
Quartus - iii. Placement
Selects which physical LEs
implement the user function
Selected LEs should be close
to each other
to the physical I/O pins that they
use
Other logic may complicate
placement
[Figure: the logic elements (LE) of the user function are placed on the FPGA among other functions]
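The goal of keeping connected LEs and pins close together is often scored with a wirelength estimate. The sketch below uses half-perimeter wirelength (HPWL), a common placement metric; whether Quartus uses this exact cost is not stated in the slides, and the cell names and coordinates are invented.

```python
def hpwl(net, placement):
    """Half-perimeter wirelength of one net: width plus height of its box."""
    xs = [placement[cell][0] for cell in net]
    ys = [placement[cell][1] for cell in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(nets, placement):
    """Sum of the HPWL estimates over all nets."""
    return sum(hpwl(net, placement) for net in nets)


# Two nets connecting three LEs and one output pin (hypothetical names).
nets = [("le_a", "le_b"), ("le_b", "le_c", "pin_out")]
spread = {"le_a": (0, 0), "le_b": (7, 5), "le_c": (1, 6), "pin_out": (9, 0)}
compact = {"le_a": (7, 1), "le_b": (8, 1), "le_c": (8, 0), "pin_out": (9, 0)}
print(total_wirelength(nets, spread), total_wirelength(nets, compact))
```

A placer would prefer the compact placement, since its estimated wirelength is far lower and the LEs sit next to the pin they drive.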
Quartus - iv. Routing
Connects used LEs
together
to input/output pins
FPGAs have programmable
routing switches
Sometimes detours are needed
Algorithmically complex operation
May take a while with bigger
systems
Phases 4 and 5 together are also
called the Fitter phase
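Why detours are sometimes needed can be shown with a classic breadth-first (Lee-style) maze router on a small grid. This is a textbook technique used here for illustration, not a description of the Quartus router; the grids and the function name are hypothetical.

```python
from collections import deque


def route(grid, start, goal):
    """Breadth-first search on a grid; '#' cells are already occupied.

    Returns the number of steps on a shortest free path, or None.
    """
    rows, cols = len(grid), len(grid[0])
    distance = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return distance[(r, c)]
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in distance:
                distance[(nr, nc)] = distance[(r, c)] + 1
                queue.append((nr, nc))
    return None


free = ["....", "....", "...."]
blocked = ["..#.", "..#.", "...."]  # routing used by other logic
print(route(free, (0, 0), (0, 3)), route(blocked, (0, 0), (0, 3)))
```

With the occupied cells in the way, the same connection needs a longer path, and if every track is taken no route exists at all.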
Quartus - v. Analysis
Timing analyzer
Approximates LE and routing delays according to the fitter results.
Finds critical paths and calculates maximum clock speed for them.
Runs automatically after fitter
RTL viewer
Chip Planner
Area and power reports
[Figure legend: used logic, unused logic]
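The timing analyzer's job of finding the critical path and the maximum clock speed can be sketched as a longest-path search over a delay graph. The graph, the node names and the delays below are hypothetical, and a real analyzer also accounts for setup time, clock skew and much more.

```python
def critical_path_ns(delays, edges, order):
    """Longest accumulated delay through a combinational DAG.

    delays: node -> delay in ns; edges: node -> list of successors;
    order: all nodes in topological order (sources first).
    """
    arrival = {node: delays[node] for node in order}
    for node in order:
        for succ in edges.get(node, []):
            arrival[succ] = max(arrival[succ], arrival[node] + delays[succ])
    return max(arrival.values())


def max_clock_mhz(critical_ns):
    """Maximum clock frequency for a given critical path delay."""
    return 1000.0 / critical_ns


# LE and routing delays between two registers (made-up values in ns).
delays = {"lut1": 1.0, "route1": 2.0, "route2": 0.5, "lut2": 1.0, "lut3": 1.0}
edges = {"lut1": ["route1", "route2"], "route1": ["lut2"],
         "route2": ["lut3"], "lut2": ["lut3"]}
order = ["lut1", "route1", "route2", "lut2", "lut3"]
worst = critical_path_ns(delays, edges, order)
print(worst, max_clock_mhz(worst))
```

Here the slow route through lut2 sets the clock limit; the shorter route through route2 has slack.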
Summary
FPGAs are slower and bigger than (SC) ASICs, but more convenient for small
volumes
FPGA architecture: a regular matrix of interconnections and
programmable logic cells
FPGA interconnections: global and local buses
FPGA configuration: SRAM-based
FPGA hard macros
FPGA selection: several criteria
FPGA synthesis: Quartus Design Flow