S. Mancini
Outline: Introduction
ASICs, FPGAs, cost models
The problem
FPGA or ASIC?
On which criteria should the choice be based? What do the two design methods have in common, and where do they differ?
1- ASIC vs FPGA
Families
ASICs (Application-Specific Integrated Circuits) fall into several families:
Full custom
The transistor masks are drawn by hand.
Standard cells
The circuit is an assembly of placed and routed cells.
Gate array
A sea of gates is routed.
[Figure: technology node versus year (2005, 2007, 2009), comparing the 1999, 2000 and 2001 ITRS (International Technology Roadmap for Semiconductors) roadmaps with the leading foundry, from 180 nm through 130 nm, 100 nm and 90 nm down to 65 nm. Source: UK Design Forum.]
90 nm technology
430 Kgates/mm²
SRAM: 1.6 to 1.2 mm² per Mbit
DRAM: 0.5 mm² per Mbit
6 to 9 metal layers
Principle
Offer generic circuits that can be reconfigured at will. They consist of arrays of reconfigurable cells and an interconnect network.
Main vendors:
Actel, Altera, Atmel, Cypress, Lattice, Minc, QuickLogic, Xilinx
The technologies differ in:
The configuration storage technology (SRAM, Flash, anti-fuse)
The type of elementary cell
ProASICPLUS Flash Family FPGAs
ProASICPLUS Architecture
The ProASICPLUS device core consists of a Sea-of-Tiles (Figure 1). Each tile can be configured as a 3-input logic function (e.g., NAND gate, D-Flip-Flop, etc.) by programming the appropriate Flash switch interconnections (Figure 2 on page 6 and Figure 3 on page 6). Tiles and larger functions are connected with any of the four levels of routing hierarchy. Flash switches are distributed throughout the device to provide nonvolatile, reconfigurable interconnect programming. Flash switches are programmed to connect signal lines to the appropriate logic cell inputs and outputs. Dedicated high-performance lines are connected as needed for fast, low-skew global signal distribution throughout the core. Maximum core utilization is possible for virtually any design. ProASICPLUS devices also contain embedded two-port SRAM blocks with built-in FIFO/RAM control logic. Programming options include synchronous or asynchronous operation, two-port RAM configurations, user defined depth and width, and parity generation or checking. Please see
Unlike SRAM FPGAs, ProASICPLUS uses a live on power-up ISP Flash switch as its programming element. In the ProASICPLUS Flash switch, two transistors share the floating gate, which stores the programming information. One is the sensing transistor, which is only used for writing and verification of the floating gate voltage. The other is the switching transistor. It can be used in the architecture to connect/separate routing nets or to configure logic. It is also used to erase the floating gate (Figure 2 on page 6).
[Figure 2: Flash Switch (sensing transistor, switching transistor, floating gate, switch in)]
The logic tile cell (Figure 3 on page 6) has three inputs (any or all of which can be inverted) and one output (which can connect to both ultra-fast local and efficient long-line routing resources). Any three-input, one-output logic function (except a three-input XOR) can be configured as one tile. The tile can be configured as a latch with clear or set or as a flip-flop with clear or set. Thus, the tiles can flexibly map logic and sequential gates of a design.
Actel (ProAsic)
Actel (Axcelerator)
Axcelerator Family FPGAs
[Figure: Axcelerator SuperCluster (C-cells, R-cell, TX and RX routing buffers, buffer module B) and AX chip layout (arrays of SC, HD, RD and RAMC blocks surrounded by I/Os)]
[Figure: ProASIC logic tile (inputs In 1 to In 3 (Reset), local routing, efficient long-line routing)]
The routing structure of ProASIC devices is designed to provide high performance through a flexible four-level hierarchy of routing resources: ultra-fast local resources, efficient long-line resources, high speed very long-line resources, and high performance global networks. The ultra-fast local resources are dedicated lines that allow the output of each tile to connect directly to every input of the eight surrounding tiles (Figure 4 on page 7). The efficient long-line resources provide routing for longer distances and higher fanout connections. These resources vary in length (spanning 1, 2, or 4 tiles), run both vertically and horizontally, and cover the entire ProASICPLUS device (Figure 5 on page 7). Each tile can drive signals onto the efficient long-line resources, which can in turn, access every input of every tile. Active buffers are inserted automatically by routing software to limit the loading effects due to distance and fanout.
The high-speed very long-line resources, which span the entire device with minimal delay, are used to route very long or very high fanout nets (Figure 6 on page 8). The high-performance global networks are low skew, high fanout nets that are accessible from external pins or from internal logic (Figure 7 on page 9). These nets are typically used to distribute clocks, resets, and other high fanout nets requiring a minimum skew. The global networks are implemented as clock trees, and signals can be introduced at any junction. These can be employed hierarchically with signals accessing every input on all tiles.
Logic Modules
Actel's Axcelerator family provides two types of logic modules, the register cell (R-cell) and the combinatorial cell (C-cell). The AX C-cell can implement more than 4,000 combinatorial functions of up to 5 inputs (Figure 3 on page 5). The C-cell contains carry logic for even more efficient implementation of arithmetic functions. With its small size, the C-cell structure is extremely synthesis-friendly, simplifying the overall design as well as reducing design time.
The R-cell contains a flip-flop featuring asynchronous clear, asynchronous preset, and active-low enable control signals (Figure 3 on page 5). The R-cell registers feature programmable clock polarity selectable on a register-by-register basis. This provides additional flexibility (e.g., easy mapping of dual-data-rate functions into the FPGA) while conserving valuable clock resources. The clock source for the R-cell can be chosen from the hard-wired clocks, the routed clocks, or the internal logic.
Two C-cells, a single R-cell, and two Transmit (TX) and two Receive (RX) routing buffers form a Cluster, and two Clusters comprise a SuperCluster (Figure 4 on page 5). Each SuperCluster contains an independent Buffer module, which supports automatic buffer insertion on high-fanout nets by the place-and-route tool, minimizing system delays while improving logic utilization.

Figure 6: AX Device Architecture (AX1000 shown)

Table 1: Number of Core Tiles per Device
Device    Number of Core Tiles
AX125     1 regular tile
AX250     4 smaller tiles
AX500     4 regular tiles
AX1000    9 regular tiles
AX2000    16 regular tiles

Embedded Memory
As mentioned earlier, each core tile has either three (in a smaller tile) or four (in the regular tile) embedded SRAM blocks along the west side, and each variable-aspect-ratio SRAM block is 4,608 bits in size. Available memory configurations are: 128x36, 256x18, 512x9, 1kx4, 2kx2 or 4kx1 bits. The individual blocks have separate read and write ports that can be configured with different bit widths on each port. For example, data can be written in by 8 and read out by 1. The embedded SRAM blocks can be initialized at power up via the device JTAG port (ROM emulation mode).
In addition, every SRAM block has an embedded FIFO control unit. The control unit allows the SRAM block to be configured as a synchronous FIFO without using core logic modules. The FIFO width and depth are programmable. The FIFO also features programmable ALMOST-EMPTY (AEMPTY) and ALMOST-FULL (AFULL) flags in addition to the normal EMPTY and FULL flags. The embedded FIFO control unit also contains the counters necessary for the generation of the read and write address pointers as well as control circuitry to prevent metastability and erroneous operation. The embedded SRAM/FIFO blocks can be cascaded to create larger configurations.
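As a rough illustration of how the variable-aspect-ratio blocks described in the Axcelerator excerpt can be cascaded, the sketch below counts how many 4,608-bit blocks a given memory would need. The function name `blocks_needed` and the simple tiling model (blocks tiled in depth and width, all in the same configuration) are assumptions made for this example, not part of the Actel documentation.

```python
import math

# (depth, width) configurations listed in the Axcelerator excerpt.
CONFIGS = {
    "128x36": (128, 36),
    "256x18": (256, 18),
    "512x9": (512, 9),
    "1kx4": (1024, 4),
    "2kx2": (2048, 2),
    "4kx1": (4096, 1),
}

def bits_used(depth, width):
    """Number of storage bits addressed by one block in a given configuration."""
    return depth * width

def blocks_needed(depth, width):
    """Smallest block count over all configurations, tiling blocks in depth and width."""
    best = None
    for d, w in CONFIGS.values():
        count = math.ceil(depth / d) * math.ceil(width / w)
        if best is None or count < best:
            best = count
    return best

# Bits addressed per configuration: the x36, x18 and x9 modes reach all 4,608 bits,
# the x4, x2 and x1 modes reach 4,096 bits.
usage = {name: bits_used(d, w) for name, (d, w) in CONFIGS.items()}
```

Under this model a 1024x32 memory needs 8 blocks, which a single regular core tile (four blocks) cannot provide on its own.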
Circuit   System Gates   Tiles (Registers)   RAM        PLL   Clocks
APA100    1 000 000      56 320              198 kBit   2     88
AX2000    2 000 000      10 752 / 21 504     338 kBit   8     4

The logic modules within the SuperCluster are arranged so that two combinatorial modules are side by side, giving a C-C-R, C-C-R pattern to the SuperCluster. This C-C-R pattern enables efficient implementation (minimum delay) of 2-bit carry logic for improved arithmetic performance (Figure 5 on page 5). The AX architecture is fully fracturable, meaning that if one or more of the logic modules in a SuperCluster are used by a particular signal path, the other logic modules are still available for use by other paths.
At the chip level, SuperClusters are organized into core tiles, which are arrayed to build up the full chip. Each core tile consists of an array of 336 SuperClusters and four SRAM blocks (176 SuperClusters and 3 SRAM blocks for the AX250). The SRAM blocks are arranged in a column on the west side of the tile (Figure 6 on page 6). For example, the AX1000 is composed of a 3x3 array of 9 core tiles. Surrounding the array of core tiles are blocks of I/O Clusters and the I/O bank ring (Table 1 on page 6).
Block RAM columns   Block RAM (Kb)   Block RAM (bits)
4                   216              221,184
4                   504              516,096
6                   792              811,008
8                   1,584            1,622,016
8                   2,448            2,506,752
10                  3,456            3,538,944
12                  4,176            4,276,224
14                  5,904            6,045,696
16                  7,992            8,183,808
18                  10,008           10,248,192
Configurable Logic Blocks (CLBs)
The Virtex-II Pro configurable logic blocks (CLB) are organized in an array and are used to build combinatorial and synchronous logic designs. Each CLB element is tied to a switch matrix to access the general routing matrix, as shown in Figure 23.
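[Figure 23: Virtex-II Pro CLB element: four slices (X0Y0, X0Y1, X1Y0, X1Y1) tied to a switch matrix, with CIN/COUT carry chains, a shift chain, two TBUFs and fast connections to neighbors]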
Figure 43 shows the layout of the block RAM columns in the XC2VP4 device.
A CLB element comprises 4 similar slices, with fast local feedback within the CLB. The four slices are split in two columns of two slices with two independent carry logic chains and one common shift chain.
Slice Description
Each slice includes two 4-input function generators, carry logic, arithmetic logic gates, wide function multiplexers and two storage elements. As shown in Figure 24, each 4-input function generator is programmable as a 4-input LUT, 16 bits of distributed SelectRAM+ memory, or a 16-bit variable-tap shift register element.
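The statement that a 4-input function generator is "programmable as a 4-input LUT" or as 16 bits of memory can be made concrete with a small model: the LUT is a 16-bit configuration word addressed by the four inputs. The sketch below is illustrative only; the names `make_lut4` and `lut4_read` and the bit ordering (input `a` as address bit 0) are assumptions, not the Xilinx primitive interface.

```python
def make_lut4(func):
    """Build the 16-bit configuration word of a 4-input LUT from a Boolean function."""
    init = 0
    for addr in range(16):
        # Decompose the address into the four LUT inputs (a is the least significant bit).
        a = (addr >> 0) & 1
        b = (addr >> 1) & 1
        c = (addr >> 2) & 1
        d = (addr >> 3) & 1
        if func(a, b, c, d):
            init |= 1 << addr
    return init

def lut4_read(init, a, b, c, d):
    """Evaluate the LUT: the inputs form an address into the 16-bit word."""
    addr = a | (b << 1) | (c << 2) | (d << 3)
    return (init >> addr) & 1

# Two example configurations: a 4-input AND and a 4-input XOR (parity).
AND4 = make_lut4(lambda a, b, c, d: a & b & c & d)
XOR4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
```

The same 16-bit word read with a run-time address is a 16x1 RAM, which is why one cell can serve as logic or as distributed memory.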
Altera (Apex/Stratix)
Figure 43: XC2VP4 Block RAM Column Layout

Circuit       Spartan 3   VirtexII
Logic Cells   74 880      125 136
Slices        33 080      55 616
RAM           2.5 MBit    11 MBit

A Virtex-II Pro multiplier block is an 18-bit by 18-bit 2's complement signed multiplier. Virtex-II Pro devices incorporate many embedded multiplier blocks. These multipliers can be associated with an 18 Kb block SelectRAM+ resource or can be used independently. They are optimized for high-speed operations and have a lower power consumption compared to an 18-bit x 18-bit multiplier in slices.
Spartan 3 104 4 0
24
www.xilinx.com 1-800-255-7778
Added to this:
Design cost
Engineers and CAD tools: $500,000 per year.
Unit cost
Unit manufacturing cost: $0.2 per mm². One 300 mm wafer (90,000 mm²) = $300
Comparison
FPGA/ASIC Cost vs Units (250 KGates)
http://www.altera.com/products/devices/cost/cst-cost_step1.jsp

FPGA: $3,200 each, FPGA NRE: $ -
ASIC: $30 each, ASIC NRE: $350,000

Device Only Cost (ASIC includes NRE)
Units   FPGA Cost    ASIC Cost
5       $ 16,000     $ 350,150
10      $ 32,000     $ 350,300
50      $ 160,000    $ 351,500
100     $ 320,000    $ 353,000
150     $ 480,000    $ 354,500

... and the CAD tools
Device + EDA Tools Estimate (ASIC includes NRE)
FPGA EDA: $82,000 (Simulation + Synthesis + FPGA Place&Route)
ASIC EDA: $343,000 (Simulation + Synthesis + Timing + ATPG)
Units   FPGA Cost    ASIC Cost
10      $ 114,000    $ 693,300
50      $ 242,000    $ 694,500
100     $ 402,000    $ 696,000
150     $ 562,000    $ 697,500
200     $ 722,000    $ 699,000
250     $ 882,000    $ 700,500

[Figure: total cost (US$) versus number of units for FPGA and ASIC, for both tables]

Multi-project circuits
Several projects/circuits are fabricated on the same wafer to share the NRE costs.
Europractice: AMI Semiconductor 0.35 µm CMOS, 680 Euro/mm²
CMP: STMicroelectronics 0.18 µm CMOS HCMOS8D, 990 Euro/mm²
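The break-even volume implied by the Altera figures above ($3,200 per FPGA, $30 per ASIC plus $350,000 NRE, EDA tools at $82,000 versus $343,000) can be worked out directly. The sketch below assumes a purely linear cost model (fixed costs plus unit price times volume); the function names `total_cost` and `break_even_units` are chosen for this example and do not come from the source.

```python
def total_cost(units, unit_price, nre=0, eda=0):
    """Linear cost model: fixed costs (NRE + EDA tools) plus per-unit price."""
    return nre + eda + units * unit_price

def break_even_units(fpga_unit, asic_unit, asic_nre, fpga_eda=0, asic_eda=0):
    """Smallest volume at which the ASIC is strictly cheaper than the FPGA.

    Assumes the FPGA has no NRE and a higher unit price than the ASIC.
    """
    fixed_gap = (asic_nre + asic_eda) - fpga_eda   # extra fixed cost of the ASIC
    unit_gap = fpga_unit - asic_unit               # saving per unit with the ASIC
    return fixed_gap // unit_gap + 1

# Figures quoted in the comparison (250 Kgates design).
device_only = break_even_units(3200, 30, 350_000)
with_eda = break_even_units(3200, 30, 350_000, fpga_eda=82_000, asic_eda=343_000)
```

With these numbers the ASIC becomes cheaper from 111 units when only devices are counted, and from 193 units once the EDA tools are included, which is consistent with the crossover between 150 and 200 units in the second table.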
Outline
Common methods
ASIC specifics
FPGA specifics
Prototyping: from FPGA to ASIC
Example of a multi-platform project: LEON
Design flow
[Figure: design flow: synthesis, simulation, validation (yes/no), FPGA programming, validation (yes/no)]
Direct synthesis
Descriptions of the functional blocks at a "high" level of abstraction are transformed into standard cells.
[Figure: VHDL entity (inputs e1, e2, e3, output s), then synthesis, netlist, place and route, layout]
"Pre-characterized" components (IP)
Complex circuits are offered as macro-blocks.
Foundries provide simulation models and masks (abstract view). Synthesis is done by black-box instantiation.
[Figure: VHDL entity (inputs e1, e2, e3, output s), then netlist with a RAM macro-block, then layout]
PNX8500 (Philips)
chiplet timing, clock matching, and I/O timing analysis. To achieve timing closure, we made engineering change orders to the netlist after routing. Following each manipulation step, formal verification ensured that the modified netlist was functionally equivalent to the one after test insertion. We aligned all clock domains having synchronous chiplet crossings. For example, if the memory interface clock in one chiplet was synchronously connected to the same clock in another chiplet, we phase-aligned these clocks and analyzed the signal paths to meet timing constraints. We achieved clock alignment by tweaking the clock insertion delays, using aligners in the clock module. Similarly, we made the clock trees as structurally identical as possible. As part of the physical design process, we met design completion and manufacturability goals by implementing techniques such as design rule checks, antenna fixes, track filling, and doubling of vias wherever possible. Figure 4 shows the layout plot for the Viper design's initial version. Table 3 summarizes the major design parameters.

Table 3. Design statistics.
Parameter            Value
Process technology   TSMC 0.18 µm, six metal layers
Transistors          About 35 million
Instances            1.2 million instances, or 8 million gates
Memories             243 instances, 750-Kbit memory
CPUs                 2 (TriMedia TM32 and MIPS PR3940)

WE HAVE LEARNED much from the Viper design experience and trust it will guide us in the future, particularly since the next-generation SOC designs are significantly more complex, calling for still higher levels of integration. Some of our current activities, in addition to regular chip-development tasks, are investigating more efficient on-chip bus architectures and better design-reuse methodologies.
We thank the Viper management and design teams for their hard work, particularly chief architects Gert Slavenburg and Lane Albanese, without whose foresight and leadership the project never would have been successful.
Santanu Dutta is a design engineering manager at Philips Semiconductors in Sunnyvale, California. His research interests include design of high-performance
Design entry
FPGA vendors provide proprietary tools for using their FPGAs:
Schematic capture
Vendor-specific description languages: AHDL (Altera), ABEL (Xilinx)
Synthesis can be performed by third-party tools (Leonardo, Synplicity, Synopsys, etc.).
Place and route
Place and route is performed by proprietary tools. These tools make it possible to:
allocate the functional blocks
extract a timing analysis
[Figure: synthesis, wrapper, place and route]
Principle
FPGAs are used to validate the design of an ASIC. Generic emulation platforms of high complexity exist (Aptix, Quickturn, ...).
Benefit: increased simulation speed.
Example: Aptix
System Explorer MP3CF: Solutions for Wireless Communications and Image Processing
The System Explorer MP3CF is optimized for prototyping DSP-based pipelined designs with moderate requirements for hardware interconnect between prototyping components. The MP3CF architecture provides maximum performance for prototypes incorporating fixed-pin prototyping components such as CPUs, DSPs, memory cards, etc. Use the MP3CF for building high-speed prototypes of wireless communication and digital-imaging applications.
User-controlled power supply voltage selection and monitoring to support advanced prototyping components today and tomorrow.
I/O cable connectors (20) with interleaved grounds provide flexible connection to target systems.
[Figure: MP3CF interconnect architecture: user component holes and FPGAs grouped into regions #1 to #3, FPIC #1 to #3 joined by global interconnect lines (labelled 140), board-edge I/O. All component pins in a given region connect through one FPIC device.]
"Nokia made a commitment to create real-time prototypes of all its new mobile phone designs. Prototypes are the only way to validate our algorithms by testing actual voice transmission quality. We adopted the Aptix solution because it provides a productive debug environment while maintaining our objective of real-time verification."
Stelios Podimatis, Member of Technical Staff, ASIC Engineering, Nokia (San Diego, CA)
LEON architecture (LEON-2 User's Manual)
[Figure: LEON processor block diagram: integer unit, FPU, co-processor (CP), I-cache, D-cache, local RAM, debug support unit, AHB controller, PCI, Ethernet, and external PROM, I/O, SRAM and SDRAM]

Target technologies
Technology                     RAM            Pads
Behavioral model               inferred       inferred
Xilinx VIRTEX/2 FPGA           instantiated   inferred
Atmel ATC18/25/35              instantiated   instantiated
UMC FS90A/B                    instantiated   instantiated
UMC 0.18 um CMOS               instantiated   instantiated
TSMC 0.25 um w. Artisan rams   instantiated   instantiated
Actel Proasic FPGA             instantiated   inferred
Actel AX anti-fuse FPGA        instantiated   inferred
The LEON integer unit implements the full SPARC V8 standard, including all multiply and divide instructions. The number of register windows is configurable within the limit of the SPARC standard (2 - 32), with a default setting of 8. To aid software debugging, up to four watchpoint registers can be configured. Each register can cause a trap on an arbitrary instruction or data address range. If the debug support unit is enabled, the watchpoints can be used to enter debug mode.
1.4.2 Floating-point unit and co-processor The LEON model does not include an FPU, but provides a direct interface to the Meiko FPU
Project organization
[Figure: instantiation hierarchy: cache, syncram, virtex2_syncram, RAMB16_S36]
Code example
cachemem.vhd
    entity cachemem is
    ...
    dtags0 : syncram port map (...
    ...
tech_map.vhd
    entity syncram is
    ...
    inf : if INFER_RAM generate
      u0 : generic_syncram generic map ( ...
    hb : if (not INFER_RAM) generate
      atc1 : if TARGET_TECH = atc18 generate
        u0 : atc18_dpram generic map (...
    ...
tech_act18.vhd
    -- pragma translate_off
    entity hdss2_512x32cm4sw0 is
    ...
    architecture behavioral of hdss2_512x32cm4sw0 is
    ...
    -- pragma translate_on
    entity atc18_syncram is
    ...
    id0 : hdss1_128x32cm4sw port map (...
    ...
Instantiated memories are both:
Black boxes for synthesis: the entities are treated as library cells.
Behavioral descriptions for simulation: these can be supplied by the RAM vendor.

Radiation hardening
Single Event Upset (SEU)
A particle can flip the state of storage elements (latches, registers, SRAM, ...).
Single Event Transient (SET)
Combinational circuitry can be disturbed:
An error present at the sampling instant can be stored.
The clock tree can generate spurious edges.
Latchup
[Figure: CMOS cross-sections between vdd and gnd showing the parasitic P/N junction stack: N well (P+, N+ diffusions) in a P substrate, and P well in an N substrate]
Hardening ASICs
Standard techniques
TMR (triple modular redundancy)
Error-correcting codes
Self-test
Memories
Standard
Error-correcting codes protect the stored data. Extra bits are required.
Specific
The bits of a word are spatially separated. Area increases.
(S)DRAM
SEUs accelerate the discharge of the memory cells. The refresh rate can be increased.
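To illustrate how an error-correcting code protects a stored word at the price of extra bits, here is a minimal sketch using a Hamming(7,4) code, which adds 3 check bits to 4 data bits and corrects any single flipped bit. The choice of Hamming(7,4) is an assumption made for the example; the slides do not say which code is used in practice, and real memories typically use wider codes.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] as the codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(codeword):
    """Return (data bits, syndrome); a non-zero syndrome is the 1-based position corrected."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1   # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome

def flip(codeword, position):
    """Model an SEU: return a copy of the codeword with one bit (0-based) inverted."""
    upset = list(codeword)
    upset[position] ^= 1
    return upset
```

For example, `hamming74_decode(flip(hamming74_encode([1, 0, 1, 1]), 4))` recovers `[1, 0, 1, 1]` and reports position 5 as the corrected bit.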
[Figure: TMR register: three flip-flops clocked by CLK feed a voter]
The registers must be placed far apart so that they do not suffer the same fault. They must be updated with the corrected value.
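The voting and write-back behaviour of a triplicated register can be sketched as a small behavioural model. This is only an illustration under simplifying assumptions: the class name `TmrRegister` and its methods are invented for the example, and an SEU is modelled as a single bit flip in one of the three copies.

```python
def vote(a, b, c):
    """Bitwise majority of three redundant copies."""
    return (a & b) | (a & c) | (b & c)

class TmrRegister:
    """Behavioural model of a register protected by triple modular redundancy."""

    def __init__(self, value=0):
        self.copies = [value, value, value]

    def load(self, value):
        """Write the same value into the three copies."""
        self.copies = [value, value, value]

    def read(self):
        """Return the voted value, masking a fault in any single copy."""
        return vote(*self.copies)

    def scrub(self):
        """Rewrite all copies with the voted (corrected) value."""
        self.load(self.read())

    def upset(self, copy, bit):
        """Model an SEU: flip one bit of one copy."""
        self.copies[copy] ^= 1 << bit
```

A single upset is masked by `read()` and removed by `scrub()`. Two upsets on the same bit of two different copies defeat the vote, which is why the copies must be physically separated and refreshed with the corrected value.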
TMR: conventional synthesis is followed by a modification of the netlist. This can be done with scripts in the synthesis tools or by modifying the output files.
Hardening FPGAs
Origin of malfunctions
The FPGA elements that can cause malfunctions are:
Cell registers
Embedded RAM
The configuration, which is sensitive to SEUs:
SRAM can be corrupted (XC2VP125: 43 Mbits of configuration)
Anti-fuses can blow
EEPROMs can change state
External configuration devices (for SRAM-based FPGAs) must also be protected.
Remedies
FPGAs are harder to harden:
Registers and RAM
The same methods as for ASICs apply.
Configuration
Adopt technologies that are less sensitive to SEUs: anti-fuses are less sensitive than SRAM/EEPROM.
Verify the configuration: use the partial configuration capability of FPGAs to check the cells automatically.
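Checking the configuration can be pictured as a scrubbing loop: each configuration frame is read back, compared against a reference checksum, and rewritten from a protected golden copy when it differs. The sketch below is a software model only. Frames are plain byte strings, the function names are hypothetical, and CRC-32 stands in for whatever check a given device family actually provides; real devices expose readback and partial reconfiguration through vendor-specific interfaces not shown here.

```python
import zlib

def frame_crcs(frames):
    """Reference CRC-32 of every configuration frame of the golden bitstream."""
    return [zlib.crc32(frame) for frame in frames]

def scrub(config, golden, golden_crcs):
    """Check each frame and rewrite corrupted ones; return the repaired frame indices."""
    repaired = []
    for index, frame in enumerate(config):
        if zlib.crc32(frame) != golden_crcs[index]:
            config[index] = golden[index]   # models a partial reconfiguration of one frame
            repaired.append(index)
    return repaired

def upset(config, index, byte, bit):
    """Model an SEU in the configuration memory: flip one bit of one frame."""
    damaged = bytearray(config[index])
    damaged[byte] ^= 1 << bit
    config[index] = bytes(damaged)
```

Because only the faulty frame is rewritten, the rest of the design can in principle keep running while the configuration is repaired.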
Hardening methodology
TMR can be implemented transparently. For Actel FPGAs, Synplify can directly implement:
flip-flops built from combinational cells
TMR
flip-flops built from combinational cells, with TMR
In VHDL, this is done with attributes:
    architecture top of top is
      attribute syn_radhardlevel of top : architecture is "tmr_cc";
      ...
      attribute syn_radhardlevel of counter_q : signal is "tmr";
      ...
Specific components
Actel offers radiation-resistant circuits:
Programmed through resistant anti-fuses
Without registers
With hardened registers
RT54SX-S RadTolerant FPGAs for Space Applications
To achieve the SEU requirements, the D flip-flop in the RT54SX-S R-cell is enhanced (Figure 3). Both the master and slave latches are actually implemented with three latches. The feedback path of each of the three latches is voted with the outputs of the other two latches. If one of the three latches is struck by an ion and starts to change state, the voting with the other two latches prevents the change from feeding back and permanently latching. Care was taken in the layout to ensure that a single ion strike could not affect more than one latch.
Figure 4 is a simplified schematic of the test circuitry that has been added to test the functionality of all the components of the flip-flop. The inputs to each of the three latches are independently controllable so the voting circuitry in the feedback paths can be exhaustively tested. This testing is performed on an unprogrammed array during wafer sort, final test and post burn-in test. This test circuitry cannot be used to test the flip-flops once the device has been programmed.
The latches are physically separated so that they are not exposed to the same radiation.
Figure 3 RT54SX-S R-Cell Implementation of D Flip-Flop Using Voter Gate Logic
SoC components
Current technologies make it possible to put on a single chip:
ASIC logic
Processors
Memory (SRAM and DRAM)
System buses
Analog
SoC = System on Chip.
Programmable circuits allow the same kind of design: SoPCs (System on Programmable Chip).
Processors
ASIC
Synthesizable
High-level models are available for synthesis. Some parts must be adapted to the technology.
Parameterizable
The processors adapt to the needs of the application:
Cache size and type
System mechanisms (TLB, virtual addressing, ...)
Co-processors
FPGA
Generic models (e.g. LEON) or processors supplied by FPGA vendors (e.g. NIOS (Altera), MicroBlaze (Xilinx)). Resources used: dual-port RAM, CAM. Performance: 50 MHz.
Limited resources impose simple processors.
Integrated in the FPGA
Examples: Excalibur ARM (Altera), Virtex II Pro (Xilinx). Performance: 300 MHz.
[Figure: masters and slaves connected either by a tri-state bus or by a multiplexer-based bus]
Tri-state buses and multiplexer-based buses can coexist in the same circuit.
Buses
Avalon Bus Specification
The Avalon bus module is generated automatically by the SOPC Builder, so that the system designer is spared the task of connecting the bus and peripherals together. The Avalon bus module is very rarely used as a discrete unit, because the SOPC Builder will almost always be used to automate the integration of processors and other Avalon bus peripherals into a system module. The designers view of the Avalon bus module usually is limited to the specific ports that relate to the connection of custom Avalon peripherals.
FPGA
The technology is dictated by the available resources. Tri-state buses are not recommended (and are often impossible).
Note that the Avalon bus module (an Avalon bus) is a unit of active logic that takes the place of passive, metal bus lines on a physical PCB. (See Example 2). In this context, the ports of the Avalon bus module could be thought of as the pin connections for all peripheral devices connected to a passive bus. The Avalon Bus Specification Reference Manual defines only the ports, logical behavior and signal sequencing that comprise the interface to the Avalon bus module. It does not specify any electrical or physical characteristics of a physical bus.
To save logic, arbitration can be done at each slave: the interconnect wires are then numerous.
Embedded CPUs impose their system buses.
Avalon bus (Altera Corporation)
Figure 2. Avalon Bus Module Block Diagram - an example system
The Avalon bus module provides the following services to Avalon peripherals connected to the bus:

Memory
ASIC
Memories are available as pre-characterized blocks. ROM and RAM are generated as needed. Current technologies allow several types of memory to coexist (SRAM, SDRAM, associative, ...). ROMs are custom-made.
UMC provides SRAM libraries and generators.
http://www.umc.com/english/design/b_1.asp
FPGA
FPGAs provide elementary memory blocks (4 KBytes). They can be assembled to form large memories. ROMs are synthesized as combinational circuits. No SDRAM.
Xilinx XC2VP125 (Virtex II Pro) (0.13 µm): 556 SRAM blocks of 18 Kbits = 10,008 Kbits
Configurations
Timings
Multiple clocks
ASIC
ASICs allow architectures with complex clock domains. Suitable asynchronous FIFOs handle the domain crossings: metastability is resolved. Each clock domain has its own clock tree.
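One common reason asynchronous FIFOs can cross clock domains safely is that their read and write pointers are exchanged in Gray code, so that only one bit changes per increment and a pointer sampled during a transition is off by at most one position. The slides do not detail the FIFO implementation, so the Gray-code approach below is an assumption, shown as a minimal sketch of the encoding only.

```python
def bin_to_gray(n):
    """Convert a binary counter value to its Gray-code representation."""
    return n ^ (n >> 1)

def gray_to_bin(g):
    """Convert a Gray-code value back to binary."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def bits_changed(a, b):
    """Number of bit positions that differ between two values."""
    return bin(a ^ b).count("1")
```

For a 4-bit pointer, the wrap from 15 back to 0 also changes a single bit in Gray code (`bin_to_gray(15)` is 8), whereas the binary counter would change all four bits at once.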
Application-Specific SOC Multiprocessors
CAB MPEG MBS + VIP1 + VIP2 ICP1 + ICP2 + MMI 1394 T-PI Conditional access (MSP1 + MSP2)
ASIC
ASIC
FPGA
FPGA
SelectRAM CLB
MSP3
chronously connected to the s another chiplet, we phase-aligne and analyzed the signal paths to constraints. We achieved clock tweaking the clock insertion dela ers in the clock module. Similarly clock trees as structurally identic As part of the physical design p design completion and manufact by implementing techniques such checks, antenna xes, track lling of vias wherever possible. Figure 4 out plot for the Viper designs init Table 3 summarizes the m parameters.
Table 3. Design statistics.
Parameter            Value
Process technology
Transistors          About 35 million
Instances            1.2 million instances, or 8 million gates
Memories             243 instances, 750-Kbit memory
CPUs                 2 (TriMedia TM32 and MIPS PR3940)
Peripherals          50
Clock domains        82
Clock speed          TM32: 200 MHz; PR3940: 150 MHz; SDRAM: 143 MHz
Power                4.5 W
Supply voltage       1.8-V core and 3.3-V I/O
Package              BGA456
Multiple clocks
Functional Description: FPGA
Each global clock multiplexer buffer can be driven either by the clock pad to distribute a clock directly to the device, or by the Digital Clock Manager (DCM), discussed in Digital Clock Manager (DCM), page 40. Each global clock multiplexer buffer can also be driven by local interconnects. The DCM has clock output(s) that can be connected to global clock multiplexer buffer inputs, as shown in Figure 47.
[Figure 47 labels: Clock Pad, DCM, Local Interconnect, CLKOUT]
Analog
Most digital technologies are compatible with analog design. Analog blocks are designed separately and integrated at assembly. Digital and analog areas are kept apart to reduce clock noise.
ASIC
ASIC
Global clock buffers are used to distribute the clock to some or all synchronous logic elements (such as registers in CLBs and IOBs, and SelectRAM+ blocks). Eight global clocks can be used in each quadrant of the Virtex-II Pro device. Designers should consider the clock distribution detail of the device prior to pin-locking and floorplanning. (See the Virtex-II Pro Platform FPGA User Guide.)
FPGA
Asynchronous FIFOs are built from FPGA cells: their performance is limited.
Figure 48 shows clock distribution in Virtex-II Pro devices. In each quadrant, up to eight clocks are organized in clock rows. A clock row supports up to 16 CLB rows (eight up and eight down). To reduce power consumption, any unused clock branches remain static.
[Figure 48: Virtex-II Pro clock distribution. Labels: quadrants NW, NE, SW, SE; 8 BUFGMUX; 16 clocks; 8 max per quadrant. Drawing reference DS083-2_45_122001]
FPGA
Analog
No integrated analog. Programmable analog circuits exist, but their performance is poor.
ASIC
FPGA
Performance comparison
Performance and complexity of the LEON microprocessor implementation for different target technologies:
Technology                     Complexity                      Frequency
ASIC
Atmel 0.18 CMOS std-cell       35K gates + RAM                 165 MHz (pre-layout)
Atmel 0.25 CMOS std-cell       33K gates + RAM                 140 MHz (pre-layout)
UMC 0.25 CMOS std-cell         35K gates + RAM                 130 MHz (pre-layout)
Atmel 0.35 CMOS std-cell       2 mm² + RAM                     65 MHz (pre-layout)
FPGA
Xilinx XC2V500-6 (0.15 µm)     4,800 LUT + 14/32 block RAM     65 MHz (post-layout)
Altera 20K200C-7 (0.15 µm)     5,700 LCELLs + EAB RAM (52%)    49 MHz (post-layout)
Actel AX1000-3 (0.15 µm)       7,600 cells + 14/36 RAM         48 MHz (post-layout)
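To put the LEON figures in perspective, the sketch below computes the ratio between the fastest ASIC and the fastest FPGA implementation. The pairing of each frequency with its technology is an assumption based on the row order of the slide (ASIC targets pre-layout, FPGA targets post-layout), and pre-layout and post-layout numbers are not strictly comparable, so the ratio is indicative only.

```python
# Clock frequencies of LEON per target, in MHz.
# Assumption: frequencies are matched to technologies by row order on the slide.
asic_mhz = {
    "Atmel 0.18 CMOS std-cell": 165,  # pre-layout
    "Atmel 0.25 CMOS std-cell": 140,  # pre-layout
    "UMC 0.25 CMOS std-cell": 130,    # pre-layout
    "Atmel 0.35 CMOS std-cell": 65,   # pre-layout
}
fpga_mhz = {
    "Xilinx XC2V500-6": 65,   # post-layout
    "Altera 20K200C-7": 49,   # post-layout
    "Actel AX1000-3": 48,     # post-layout
}

best_asic = max(asic_mhz, key=asic_mhz.get)
best_fpga = max(fpga_mhz, key=fpga_mhz.get)
ratio = asic_mhz[best_asic] / fpga_mhz[best_fpga]

print(f"Fastest ASIC: {best_asic} at {asic_mhz[best_asic]} MHz")
print(f"Fastest FPGA: {best_fpga} at {fpga_mhz[best_fpga]} MHz")
print(f"ASIC/FPGA frequency ratio: {ratio:.2f}")
```

Under these assumptions the ratio comes out at roughly 2.5 in favor of the ASIC target.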
Summary (ASIC)
Full control over the project
Control over radiation hardness
Reduced costs at large volumes
High integration density
Maximum performance
Errors are expensive
In-depth knowledge of the technology required
NRE (non-recurring engineering) costs
FPGA
http://www.gaisler.com/
Summary (FPGA)
Reduced development time
Radiation-resistant families
Reduced investment
Architectural constraints
Limited knowledge of internal details and characteristics
Reduced vigilance
Increased risk of failures
High unit costs
Limited complexity
Limited performance
Technology
Flexibility
Computing power
Reusability
FPGA
References
Detailed outline
Introduction: Problem statement
ASICs: Families; Technology evolution
FPGA: Principle; Programming technologies; Actel (ProAsic); Actel (Axcelerator); Xilinx (Spartan 3/Virtex II); Altera (Apex/Stratix)
Cost models: FPGA costs; ASIC costs; Comparison; Multi-project circuits
Design methodology: Common methods; Design flow; ASIC specifics; Direct synthesis
"Pre-characterized" components (IP); Back-end; FPGA specifics; Design-entry models; Place and route; Resource utilization; Prototyping: FPGA to ASIC; Principle; Example: Aptix; Example of a multi-platform project: LEON; LEON architecture; Target technologies; Project organization; Code example
Radiation hardening: Single Event Upset (SEU); Single Event Transient (SET); Latchup; Hardening of ASICs; Main methods; Registers; Memories
SoC: SoC overview; SoC components; An SoPC: Excalibur (Altera); Comparative study; Microprocessors; Buses; Memory; Multiple clocks; Analog
Summary: Performance comparison; Summary
Conclusion
References
Hardening methodologies; Hardening of FPGAs; Origin of malfunctions; Remedies; Hardening methodology; Specific components; Effectiveness of hardening