
Training Course on

Advance FPGA based Digital System Design


by Fahad Al Ghazali

*Organized by Skill Development Council Islamabad


(Ministry of Professional & Technical Training Govt. of Pakistan)

Xilinx Xtreme DSP Architecture

DSP Implementation
Digital signal processing can be implemented in both hardware and software.
The software-based approach runs on a general-purpose processor.
The processor is programmed for the tasks of a particular application.
FPGA based Digital Design using Verilog HDL (fpgacourse@yahoo.com)

DSP Implementation (2)

The second approach is to use a special-purpose, hard-wired, high-performance, customized processor whose architecture supports special signal processing tasks in the form of libraries.
e.g. Texas Instruments, TigerSHARC, DaVinci, etc.


DSP Implementation (3)

Application-specific integrated circuits (ASICs) can be fabricated for a unique application.
Feasible only if a large number of units is required.


DSP Implementation (4)

Benefits of DSP on FPGAs
Reduced chip count when the design already requires programmable logic
Useful when there is a large number of channels
Flexibility
Debugging


DSP Design Challenges

High throughput
Multiple concurrent operations
Multiple ALUs
Requirement of memories and MACs


DSP Typical Operations

DSP systems operate on fixed-word-length data that arrive at regular intervals of time
Multiplication and addition, commonly known as the MAC operation
MAC functional units must be implemented efficiently and must give high performance
Floating-point/fixed-point arithmetic
Memory read/write
Number of channels
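
As an illustration of the MAC operation named above, here is a minimal Python sketch that accumulates sample-times-coefficient products. The function name `mac` and the integer inputs are assumptions made for this example only; in hardware each loop iteration would correspond to one multiply-accumulate step.

```python
def mac(samples, coeffs):
    """Multiply each sample by its coefficient and accumulate the products."""
    acc = 0
    for x, c in zip(samples, coeffs):
        acc += x * c  # one multiply-accumulate (MAC) step
    return acc


if __name__ == "__main__":
    print(mac([1, 2, 3], [4, 5, 6]))
```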

Timing
The operations can be distributed spatially over different blocks or performed in one block.
This depends upon how many clock cycles we have before the next sample.
In case the whole binary word is processed at the same time, the hardware resources ensure in-time delivery of the results.
The operations can be distributed over the latency, i.e. the time between the first input and the first valid output.
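
The trade-off between spatial distribution and available clock cycles can be sketched numerically. The Python helper names and the clock and sample rates below are hypothetical; the sketch only assumes that each MAC unit completes one operation per clock cycle.

```python
import math


def cycles_per_sample(clock_hz, sample_rate_hz):
    """Whole clock cycles available between two consecutive input samples."""
    return clock_hz // sample_rate_hz


def mac_units_needed(macs_per_sample, clock_hz, sample_rate_hz):
    """MAC units required if each unit performs one MAC per clock cycle."""
    cycles = cycles_per_sample(clock_hz, sample_rate_hz)
    if cycles < 1:
        raise ValueError("sample rate exceeds clock rate")
    return math.ceil(macs_per_sample / cycles)


if __name__ == "__main__":
    # Hypothetical numbers: 96 MACs per sample on a 100 MHz clock.
    print(mac_units_needed(96, 100_000_000, 1_000_000))   # slow input: one block suffices
    print(mac_units_needed(96, 100_000_000, 25_000_000))  # fast input: work spread over blocks
```

With many cycles per sample, one block can be time-shared; with few cycles per sample, the same work has to be spread across several blocks.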

High Speed processing requirement in DSP algorithms

*Source: Jan Rabaey, BWRC



DSP Architecture Support in Xilinx FPGAs


Today's FPGA architectures address DSP implementation issues and offer specialized architectures. Reasons:
The market is moving towards reduced chip count solutions to decrease the size of devices
To capture market share of the devices used in the booming communication industry
To exploit the parallel architecture offered by FPGAs


DSP options in FPGAs

The options are:
Introduction of the DSP48 slice in the architecture
Built-in cores for DSP functions, so that the user does not have to start the design from scratch
On-chip soft/hard processor IPs with a floating-point unit and support for C


XtremeDSP: Design Considerations

The DSP48 slice is a new element in the Xilinx development model referred to as the Application Specific Modular Block (ASMBL) architecture.
ASMBL delivers off-the-shelf programmable devices with the best mix of logic, memory, I/O, processors, clock management, and digital signal processing.
Each XtremeDSP tile contains two DSP48 slices to form the basis of a versatile coarse-grain DSP architecture.
The slices support independent functions, including multiplier, multiplier-accumulator (MACC), multiplier followed by an adder, three-input adder, and barrel shifter.

DSP48 Architecture

The DSP48 slice is an 18 x 18 bit two's complement multiplier followed by a 48-bit sign-extended adder/subtracter/accumulator, a function that is widely used in digital signal processing (DSP).
Its predecessor, which came in Spartan-3/3E, was named MULT18X18.
The inherently pipelined architecture enhances throughput.
The internal 48-bit bus offers high aggregation.
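
The arithmetic described above can be modelled in a few lines of Python. This is a behavioural sketch only, not the hardware structure: the helper names are hypothetical, and it assumes the accumulator wraps around on 48-bit overflow.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def dsp48_macc(acc, a, b):
    """One multiply-accumulate step: 18 x 18 signed multiply, 48-bit accumulate."""
    product = to_signed(a, 18) * to_signed(b, 18)  # result fits in 36 bits
    return to_signed(acc + product, 48)            # wrap to the 48-bit range


if __name__ == "__main__":
    print(dsp48_macc(0, 0x3FFFF, 2))  # 0x3FFFF is -1 as an 18-bit operand
```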


Features

Xilinx XtremeDSP
Starting with the Virtex-4 family, Xilinx introduced the DSP48 block for high-speed DSP on FPGAs
Essentially a multiply-accumulate core with many other features
Spartan-3A and Virtex-5 now also have DSP blocks


Xtreme DSP Interconnect in Virtex

DSP48 and Block RAM have dedicated interconnect to prevent interconnect bandwidth issues

Features
1. The 18-bit A bus and B bus are concatenated, with the A bus being the most significant.
2. The X, Y, and Z multiplexers are 48-bit designs. Selecting any of the 36-bit inputs provides a 48-bit sign-extended output.
3. The multiplier outputs two 36-bit partial products, sign extended to 48 bits. The partial products feed the X and Y multiplexers. When OPMODE selects the multiplier, both X and Y multiplexers are utilized and the adder/subtracter combines the partial products into a valid multiplier result.
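
Items 1 and 2 can be illustrated with a short Python sketch of the A:B concatenation and the 36-bit to 48-bit sign extension. The function names are hypothetical and the model works on raw bit patterns held in Python integers.

```python
def concat_ab(a, b):
    """Form the 36-bit A:B word with the 18-bit A bus as the most significant half."""
    return ((a & 0x3FFFF) << 18) | (b & 0x3FFFF)


def sign_extend(value, from_bits, to_bits):
    """Sign extend a from_bits-wide bit pattern to to_bits bits."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):  # sign bit set: fill the upper bits with ones
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


if __name__ == "__main__":
    word = concat_ab(0x20000, 0)  # A has its sign bit set
    print(hex(sign_extend(word, 36, 48)))
```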


Features
4. The multiply-accumulate path for P is through the Z multiplexer. The P feedback through the X multiplexer enables accumulation of the P cascade when the multiplier is not used.
5. The Right Wire Shift by 17 bits path truncates the lower 17 bits and sign extends the upper 17 bits.
6. The gray-colored multiplexers are programmed at configuration time.
7. The shared C register supports multiply-add, wide addition, or rounding.
8. Enabling SUBTRACT implements Z - (X + Y + CIN) at the output of the adder/subtracter.
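
The 17-bit shift in item 5 behaves like an arithmetic right shift of a 48-bit word. The Python sketch below models that behaviour on raw bit patterns; the function name `shift17` is hypothetical and the model says nothing about how the wiring is implemented in silicon.

```python
MASK48 = (1 << 48) - 1


def shift17(p):
    """Arithmetic right shift of a 48-bit two's complement word by 17 bits."""
    p &= MASK48
    signed = p - (1 << 48) if p >> 47 else p  # recover the signed value
    return (signed >> 17) & MASK48            # low 17 bits dropped, sign bits fill the top


if __name__ == "__main__":
    print(hex(shift17(1 << 17)))
    print(hex(shift17(MASK48)))  # -1 stays -1 after sign extension
```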


Simplified DSP Slice Model

A Input Logic

B Input Logic

C Input Logic

P Output Logic

DSP48 Slice: Virtex-4

DSP48 Tile

DSP48E Slice: Virtex-5

DSP48 Functionality

Full-speed operation is 500 MHz when using the pipeline registers.
Equation 1-1 summarizes the combination of X, Y, Z, and CIN by the adder/subtracter.
The CIN, X multiplexer output, and Y multiplexer output are always added together. This combined result can be selectively added to or subtracted from the Z multiplexer output.
Adder Out = (Z ± (X + Y + CIN))    (Equation 1-1)
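
The adder/subtracter equation can be checked with a small Python model. The function name and the `subtract` flag are assumptions made for this sketch; results are wrapped to 48 bits as raw bit patterns.

```python
MASK48 = (1 << 48) - 1


def adder_out(z, x, y, cin, subtract=False):
    """Model of Adder Out = Z +/- (X + Y + CIN), wrapped to 48 bits."""
    inner = x + y + cin  # X, Y and CIN are always added together
    result = z - inner if subtract else z + inner
    return result & MASK48


if __name__ == "__main__":
    print(adder_out(10, 3, 4, 1))
    print(adder_out(10, 3, 4, 1, subtract=True))
```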


DSP48 Functionality

A and B are multiplied and the result is added to or subtracted from the C register. Selecting the multiplier function consumes both X and Y multiplexer outputs to feed the adder.
The two 36-bit partial products from the multiplier are sign extended to 48 bits before being sent to the adder/subtracter.
Adder Out = C ± (A × B + CIN)


Simplified Form of DSP48

Mathematical Functions
The DSP48 can perform mathematical functions such as:
Add/Subtract
Accumulate
Multiply
Multiply-Accumulate
Multiplexer
Barrel Shifter
Counter
Divide (multi-cycle)
Square Root (multi-cycle)

It can also be used to create filters such as:

Serial FIR filter (Xilinx calls this a MACC filter)
Parallel FIR filter
Semi-parallel FIR filter
Multi-rate FIR filters


MACC Filter

Xilinx implementation of a serial FIR filter, called a MACC (multiply-accumulate) filter
This example has 96 coefficients
Max input sample rate = clock speed / number of taps
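
The serial behaviour and the sample-rate formula can be sketched in Python. This is a software model, not the Xilinx implementation: the function names are hypothetical, and the 480 MHz clock in the usage lines is an assumed figure chosen only to make the arithmetic easy.

```python
from collections import deque


def max_sample_rate(clock_hz, num_taps):
    """Max input sample rate = clock speed / number of taps."""
    return clock_hz / num_taps


def macc_fir(samples, coeffs):
    """Serial FIR model: one MAC per tap for every input sample."""
    delay_line = deque([0] * len(coeffs), maxlen=len(coeffs))
    outputs = []
    for x in samples:
        delay_line.appendleft(x)  # newest sample first, oldest falls off the end
        acc = 0
        for tap, h in zip(delay_line, coeffs):  # len(coeffs) clock cycles in hardware
            acc += tap * h
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    print(max_sample_rate(480_000_000, 96))   # assumed 480 MHz clock, 96 taps
    print(macc_fir([1, 0, 0, 0], [1, 2, 3]))  # impulse response equals the coefficients
```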

DSP48E1 in Virtex-6

Enhancements to the DSP48E1 slice provide improved flexibility and utilization, improved efficiency of applications, reduced overall power consumption, and increased maximum frequency.
The high performance allows designers to implement multiple slower operations in a single DSP48E1 slice using time-multiplexing methods.


Features of DSP48E1
The DSP48E1 slice supports many independent functions. These functions include:
Multiply
Multiply accumulate (MACC)
Multiply add
Three-input add
Barrel shift
Wide-bus multiplexing
Magnitude comparator
Bit-wise logic functions, pattern detect, and wide counter


Virtex-6 DSP48E1 Slice

Enhanced Features
The Virtex-6 FPGA DSP48E1 slice includes all Virtex-5 FPGA DSP48E features plus a variety of enhancements.


Enhanced Features (Contd)

The enhanced features in the Virtex-6 FPGA DSP48E1 slice are:
25-bit pre-adder with D register to enhance the capabilities of the A path
INMODE control supports balanced pipelining when dynamically switching between multiply (A*B) and add operations (A:B)
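
One common use of a pre-adder, offered here as an assumption rather than something the slide states, is a symmetric FIR filter: two samples that share a coefficient are added first, so only one multiply is needed per coefficient pair. The Python names below are hypothetical and bit widths are not modelled.

```python
def preadd_multiply(a, d, b):
    """Pre-adder followed by the multiplier: (A + D) * B."""
    return (a + d) * b


def symmetric_fir_output(window, half_coeffs):
    """One output of a symmetric FIR; window holds 2 * len(half_coeffs) samples."""
    n = len(window)
    acc = 0
    for i, h in enumerate(half_coeffs):
        # window[i] and window[n - 1 - i] share coefficient h
        acc += preadd_multiply(window[i], window[n - 1 - i], h)
    return acc


if __name__ == "__main__":
    # Equivalent to coefficients [10, 20, 20, 10] applied to [1, 2, 3, 4]
    print(symmetric_fir_output([1, 2, 3, 4], [10, 20]))
```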


DSP48E1 Tile and Interconnect


Two DSP48E1 slices and dedicated interconnect form a DSP48E1 tile.
The DSP48E1 tiles stack vertically in a DSP48E1 column.
The height of a DSP48E1 tile is the same as five configurable logic blocks (CLBs) and also matches the height of one block RAM.
The block RAM in Virtex-6 devices can be split into two 18K block RAMs.
Each DSP48E1 slice aligns horizontally with an 18K block RAM.
Virtex-6 family members have 1, 2, 6, or 10 DSP48E1 columns.


No. of DSP48E1 Slices offered in Virtex-6 Family


DSP48E1 Slice Primitive

Arithmetic
Floating point/fixed point
Double/single precision
Square root, multiply, divide (float/fixed)
Match with MATLAB results in the next session


Fixed Point Representation

Qn.m format: n bits are in the integer part and m bits are in the fractional (mantissa) part
10/15 = 0000000.1010101
Weights of the fractional bits: 2^-1, 2^-2, 2^-3, 2^-4, 2^-5, 2^-6, 2^-7
0.5 + 0.125 + 0.03125 + 0.0078125 = 0.6640625 (10/15 ≈ 0.6667, truncated to 7 fractional bits)
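
The 10/15 example can be reproduced with a short Python sketch. The helper names are hypothetical, and the conversion assumes truncation of a non-negative value to 7 fractional bits.

```python
def to_fixed(value, frac_bits):
    """Truncate a non-negative value to an integer with frac_bits fractional bits."""
    return int(value * (1 << frac_bits))


def from_fixed(raw, frac_bits):
    """Convert the stored integer back to its real value."""
    return raw / (1 << frac_bits)


if __name__ == "__main__":
    raw = to_fixed(10 / 15, 7)
    print(format(raw, "07b"))   # fractional bit pattern
    print(from_fixed(raw, 7))   # value actually represented
```

The difference between 10/15 and the stored value is the quantization error, which is what a comparison against MATLAB results would expose.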

Fixed Point Divider via Coregen

Optimize Your Design for Xilinx Architecture: CORE Generator System

What are Cores?


A core is a ready-made function that you can instantiate into your design as a "black box"
Cores can range in complexity
Simple arithmetic operators, such as adders, accumulators, and multipliers
System-level building blocks, including filters, transforms, and memories
Specialized functions, such as bus interfaces, controllers, and microprocessors
Some cores can be customized


Benefits of Using Cores


Save design time
Cores are created by expert designers who have in-depth knowledge of Xilinx FPGA architecture
Guaranteed functionality saves time during simulation
Increase design performance
Cores that contain mapping and placement information have predictable performance that is constant over device size and utilization
The data sheet for each core provides performance expectations
Use timing constraints to achieve maximum performance


What is the CORE Generator System?


Graphical User Interface (GUI) that allows central access to the cores themselves, plus:
Data sheets
Customizable parameters (available for some cores)
Interfaces with design entry tools
Creates graphical symbols for schematic-based designs
Creates instantiation templates for HDL-based designs
Web access from the Help menu
The IP Center contains new cores to download and install
You always have access to the latest cores
Direct access to http://support.xilinx.com

Invoking the CORE Generator System


Select Project > New Source
Select IP (CoreGen & Architecture Wizard) and enter a filename
Click Next, then select the type of core


Core Generator GUI

Xilinx CORE Generator System GUI


Core Customize Window


CORE Data Sheets


Schematic Design Flow


Generate a core
Use Edit > Project Options to select a schematic symbol instead of HDL templates
Creates an EDIF file and schematic symbol
Instantiate the symbol onto your schematic
Treated as a black box: no underlying schematic
Proceed with the normal schematic flow

HDL Design Flow: Compile Simulation Library


Before your first behavioral simulation, you must run compxlib.exe to compile the XilinxCoreLib simulation library
Located in $XILINX\bin\<platform>
Supports ModelSim, Cadence NC-Verilog, VCS, Speedwave, and Scirocco
If you download new or updated cores, additional simulation models will be automatically extracted during installation


HDL Design Flow: Core Generation and Integration


Generate or purchase a core
Netlist file (EDN)
Instantiation template files (VHO or VEO)
Behavioral simulation wrapper files (VHD or V)
Instantiate the core into your HDL source
Cut and paste from the templates provided in the VEO or VHO file
Design is ready for synthesis and implementation
Use the wrapper files for behavioral simulation
ISE automatically uses wrapper files when cores are present in the design
VHDL: Analyze the wrapper file for each core before analyzing the file that instantiates the core


DSP48 macro in Xilinx ISE

The DSP48 macro provides an easy-to-use interface that abstracts the XtremeDSP slice and simplifies its dynamic operation by enabling the specification of multiple operations via a set of user-defined arithmetic expressions
Support for up to 64 instructions
Configurable latency
Choice between XtremeDSP slice or fabric implementation
Support for signed, two's complement input data

DSP48 macro in Xilinx ISE (Contd)

The user specifies 1 to 64 instructions in the core GUI that are translated into the various control signals for the XtremeDSP slice of the target device
The instructions are stored in a ROM from which the user selects the appropriate instruction using the SEL port
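
The instruction ROM and SEL port can be pictured with a small Python model. The three instructions below are hypothetical examples written as arithmetic expressions, not the core's actual instruction syntax, and pipeline latency is ignored.

```python
# Hypothetical instruction ROM: SEL value -> arithmetic expression on A, B, C, P
INSTRUCTION_ROM = {
    0: lambda a, b, c, p: a * b,      # multiply
    1: lambda a, b, c, p: a * b + c,  # multiply-add
    2: lambda a, b, c, p: p + a * b,  # multiply-accumulate
}


def execute(sel, a, b, c, p):
    """Apply the instruction selected by SEL and return the new P value."""
    return INSTRUCTION_ROM[sel](a, b, c, p)


if __name__ == "__main__":
    print(execute(1, 3, 4, 5, 100))
```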

Basic Core I/Os

Name     Direction  Optional  Description
CLK      Input      No        Clock, active on the rising edge
SCLR     Input      Yes       Synchronous clear: synchronous reset (active High). Asserting SCLR synchronously with CLK resets all registers
A        Input      Yes       A port: operand input to the XtremeDSP slice
ACIN     Input      Yes       Cascaded A port, driven by ACOUT
B        Input      Yes       B port: operand input to the XtremeDSP slice
CONCAT   Input      Yes       Concatenation of the A and B ports
C        Input      Yes       C port: input to the XtremeDSP slice add/sub
CARRYIN  Input      Yes       Carry-in value from fabric
SEL      Input      Yes       SEL port: selects the instruction; width as per the number of instructions
P        Output     No        P port: output from the XtremeDSP slice add/sub; provides the selected instruction's result. Max: 48 bits

Basic Core I/Os

Name      Direction  Optional  Description
P         Output     No        P port: output from the XtremeDSP slice add/sub; provides the selected instruction's result. Max: 48 bits
CARRYOUT  Output     No        Carry out of the sub/add operation

Core Schematic Symbol

Configuration of Core
A graphical user interface appears when the DSP48 macro is selected to be generated via CoreGen
First, the component name is provided by the user
A number of instructions copied from the available instructions can be pasted into the user-defined instructions
There are 64 instructions

Pipeline Options
There are 3 options:
Automatic : Fully automated (as per ISE)
Tier1     : Configurable up to one tier (one axis)
Expert    : Fully configurable

Checkboxes appear to select whether a pipeline register is to be inferred or not at a certain point of the hardware

Implementation

DSP48 through CoreGen

DSP48 Consumption

Instantiation Template
dsp481 YourInstanceName (
    .clk(clk),
    .sel(sel),          // Bus [2 : 0]
    .carryin(carryin),
    .a(a),              // Bus [17 : 0]
    .b(b),              // Bus [17 : 0]
    .c(c),              // Bus [47 : 0]
    .p(p));             // Bus [47 : 0]

High frequency synthesis

Timing Summary:
---------------
Speed Grade: -12
Minimum period: 1.244ns (Maximum Frequency: 804.001MHz)
Minimum input arrival time before clock: 2.514ns
Maximum output required time after clock: 4.152ns


Device Utilization Summary of the design with 3 instructions


Device utilization summary:
---------------------------
Selected Device : 4vfx12ff668-12
Number of Slices:             61 out of  5472    1%
Number of Slice Flip Flops:  112 out of 10944    1%
Number of 4 input LUTs:       53 out of 10944    0%
Number of IOs:                54
Number of bonded IOBs:        53 out of   320   16%
Number of GCLKs:               1 out of    32    3%
Number of DSP48s:              1 out of    32    3%


Thanks
