Comparative Study of Various FFT Algorithm Implementation On FPGA


Aniket Shukla¹, Mayuresh Deshmukh²

Mumbai University, B.E. Electronics, Terna Engineering College, Nerul, Navi Mumbai, India

¹ shuklaaniket@ymail.com
² mayuresh07deshmukh@gmail.com

ABSTRACT
Increased demand and advances in product design in
communication, multimedia, security and safety equipment,
and other industrial and scientific products have created the
need for high-volume, low-cost, multifunction, DSP-based
frequency analyzers that use the Fast Fourier Transform
(FFT) for signal processing or data manipulation. This paper
deals with the implementation, on FPGAs (Spartan-6), of FFT
algorithms that can compute the Fourier transform of varied
signals in real time for frequency analysis. Because many
applications demand a high dynamic range, for which
fixed-point implementations become increasingly expensive,
a floating-point implementation is used. The inherent massive
parallelism of FPGAs allows these solutions to be competitive
with equivalent software implementations.

Keywords: FFT, FPGA, Signal Processing

1. INTRODUCTION
This paper discusses and compares various FFT algorithms
and the techniques for implementing them on an FPGA
(Spartan-6), with the aim of finding an optimized algorithm
for a 1024-point FFT. The Fast Fourier Transform (FFT) is
widely used for frequency analysis of signals generated by
vibration sensors, communication systems, spectrum
analyzers, image processing and filters. FPGA-based signal
analyzers are commonly used for online, real-time FFT
analysis of signals for predictive fault detection. Although
various DSPs and embedded solutions are available for
signal processing, the parallel processing and field
programmability of FPGAs significantly reduce the
computation time for real-time signals. Real-time systems
require data acquisition, computation and output in real time;
that is, data is processed as soon as it arrives at the input, and
the output is available within a few microseconds.
This paper presents a comparative study of various FFT
algorithms and implementation techniques, addresses the
challenges associated with implementation, and discusses
probable solutions.
2. FAST FOURIER TRANSFORMS
Fast Fourier transforms (FFTs) are a group of algorithms that
significantly speed up the computation of the DFT. An FFT
produces the same result as the DFT while reducing the
number of multiplications and additions needed for a given
DFT length. Because fewer arithmetic operations are
performed, quantization (round-off) noise is also generally
reduced.

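The DFT that an FFT computes is defined by [1],[5]:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad W_N = e^{-j2\pi/N}, \qquad k = 0, 1, \ldots, N-1$$

where W_N is the phase (twiddle) factor.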
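As a reference point, the definition can be evaluated directly. The following Python sketch (the names `dft` and `complex_mults` are illustrative, not from the paper) computes the sum term by term and compares the N² complex multiplications it needs with the (N/2)·log2 N of a radix-2 FFT, counting every twiddle factor, including trivial ones such as W^0 = 1.

```python
import cmath
import math


def dft(x):
    """Evaluate X(k) = sum_n x(n) * W_N^(nk) directly, with W_N = exp(-j*2*pi/N)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def complex_mults(n_pts):
    """Complex multiplications: direct DFT (N^2) vs. radix-2 FFT ((N/2) * log2 N).

    n_pts must be a power of two; the FFT count includes trivial twiddles.
    """
    stages = n_pts.bit_length() - 1  # log2(N) for a power of two
    return n_pts * n_pts, (n_pts // 2) * stages


X = dft([1, 2, 3, 4])                   # approximately [10, -2+2j, -2, -2-2j]
direct, fft_mults = complex_mults(1024)  # 1048576 vs. 5120
```

For N = 1024 the direct form needs roughly 200 times more complex multiplications than a radix-2 FFT, which is the motivation for the algorithms below.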

The various FFT algorithms in use are classified in Figure 1.

Figure 1: Types of FFT

2.1 Cooley-Tukey:
The Cooley-Tukey algorithm [1],[5] has been the most
widely used FFT algorithm since it was published in 1965.
Its basic idea is to divide an N-point DFT into M DFTs of
N/M points each. With M = 2, the DFT is divided into two
N/2-point DFTs; this is called radix-2. Similarly, there are
radix-4, radix-8, radix-16 and so on.
International Journal of Emerging Trends in Signal Processing, Volume 1, Issue 1, November 2012

Although the basic idea is recursive, most traditional
implementations rearrange the algorithm to avoid explicit
recursion. Also, because the Cooley-Tukey algorithm breaks
the DFT into smaller DFTs, it can be combined arbitrarily
with any other algorithm for the DFT.
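The Cooley-Tukey split of an N-point DFT into M DFTs of N/M points can be written as a short recursive Python sketch. It is a teaching version, not an optimized implementation: the name `cooley_tukey` and the `radix` parameter are illustrative, N is assumed to be a power of the radix, and the recombination step evaluates X(k) = Σ_r W_N^{rk} F_r(k mod N/M) directly instead of using optimized butterflies.

```python
import cmath
import math


def cooley_tukey(x, radix=2):
    """Recursive Cooley-Tukey DFT: split an N-point DFT into `radix` DFTs of N/radix points.

    Requires len(x) to be a power of `radix`. Sample n = radix*m + r goes to sub-DFT r.
    """
    n_pts = len(x)
    if n_pts == 1:
        return [complex(x[0])]
    if n_pts % radix:
        raise ValueError("length must be a power of the radix")
    sub_len = n_pts // radix
    # One (N/radix)-point DFT per residue class r of the input index.
    subs = [cooley_tukey(x[r::radix], radix) for r in range(radix)]
    # Recombine: X(k) = sum_r W_N^(r*k) * F_r(k mod N/radix)
    return [
        sum(cmath.exp(-2j * math.pi * r * k / n_pts) * subs[r][k % sub_len] for r in range(radix))
        for k in range(n_pts)
    ]


def dft(x):
    """Direct O(N^2) DFT, used only to check the result."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


signal = [complex(i % 5, (3 * i) % 7) for i in range(64)]  # 64 = 2^6 = 4^3 = 8^2
reference = dft(signal)
errors = {radix: max(abs(a - b) for a, b in zip(cooley_tukey(signal, radix), reference))
          for radix in (2, 4, 8)}
```

The same recursion gives radix-2, radix-4 or radix-8 just by changing M, which is the sense in which these are all members of the Cooley-Tukey family.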

2.2 Winograd Algorithm:
Winograd's algorithm [10-13] factorizes z^N − 1 into
cyclotomic polynomials, which often have coefficients of 1, 0
or −1 and therefore require few (if any) multiplications.
Winograd's method can therefore be used to obtain
minimal-multiplication FFTs and is often used to find
efficient algorithms for small factors. Winograd [1],[2]
showed that the DFT can be computed with only O(N)
irrational multiplications, which reduces the number of
multiplications considerably, but at the cost of extra hardware
for many more additions. Modern hardware architectures
include dedicated multiplier blocks, so minimizing
multiplications is less critical than it once was. Winograd's
algorithm is often combined with Rader's algorithm (for
prime-length DFTs).
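The factorization step that Winograd's approach starts from can be checked with a few lines of Python. This is only an illustration of that step, not a full Winograd FFT, and the helper names are illustrative: multiplying the cyclotomic factors of z^8 − 1 and z^6 − 1 back together recovers the original polynomials, and every factor has only 0 and ±1 coefficients, so reducing a polynomial modulo these factors costs additions and subtractions rather than multiplications.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def expand(factors):
    """Multiply a list of polynomial factors together."""
    product = [1]
    for f in factors:
        product = poly_mul(product, f)
    return product


# Cyclotomic factors, lowest power first; every coefficient is 0, 1 or -1.
Z8_FACTORS = [[-1, 1], [1, 1], [1, 0, 1], [1, 0, 0, 0, 1]]  # (z-1)(z+1)(z^2+1)(z^4+1)
Z6_FACTORS = [[-1, 1], [1, 1], [1, 1, 1], [1, -1, 1]]       # (z-1)(z+1)(z^2+z+1)(z^2-z+1)

z8 = expand(Z8_FACTORS)  # [-1, 0, 0, 0, 0, 0, 0, 0, 1], i.e. z^8 - 1
z6 = expand(Z6_FACTORS)  # [-1, 0, 0, 0, 0, 0, 1], i.e. z^6 - 1
```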

2.3 Rader- Brenner Algorithm:
In the Rader-Brenner algorithm [1],[2],[10-13], general
complex twiddle-factor multiplications are replaced by
multiplications of a complex number by a purely real or
purely imaginary number. It applies to N-point DFTs with
N = 2^t.
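One way to see the idea in a radix-2 decimation-in-time setting: let A(k) and B(k) be the N/2-point DFTs of the even and odd samples, and replace the odd samples by the cyclic difference c(m) = x(2m+1) − x(2m−1). Its DFT is C(k) = B(k)(1 − W_N^{2k}), so the twiddled odd term becomes W_N^k B(k) = C(k) / (2j sin(2πk/N)), a multiplication by a purely imaginary constant. The Python sketch below is our reconstruction from that identity (not code from [10]); the name `rader_brenner_fft` is illustrative, and k = 0 is handled with the plain sum of the odd samples.

```python
import cmath
import math


def rader_brenner_fft(x):
    """Radix-2 DIT FFT with Rader-Brenner style twiddles; len(x) must be a power of two."""
    n_pts = len(x)
    if n_pts == 1:
        return [complex(x[0])]
    half = n_pts // 2
    odd = x[1::2]
    even_dft = rader_brenner_fft(x[0::2])  # A(k)
    # Cyclic difference c(m) = x(2m+1) - x(2m-1); its DFT is C(k) = B(k) * (1 - W_N^(2k)).
    diff_dft = rader_brenner_fft([odd[m] - odd[m - 1] for m in range(half)])
    odd_sum = sum(odd)  # B(0), the k = 0 term of the odd-sample DFT
    out = [0j] * n_pts
    out[0] = even_dft[0] + odd_sum
    out[half] = even_dft[0] - odd_sum
    for k in range(1, half):
        # W_N^k * B(k) = C(k) / (2j sin(2*pi*k/N)): a purely imaginary multiplier
        t = diff_dft[k] * (-0.5j / math.sin(2 * math.pi * k / n_pts))
        out[k] = even_dft[k] + t
        out[k + half] = even_dft[k] - t
    return out


def dft(x):
    """Direct O(N^2) DFT, used only as a check."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


signal = [complex(n % 5, -(n % 3)) for n in range(64)]
max_error = max(abs(a - b) for a, b in zip(rader_brenner_fft(signal), dft(signal)))
```

Dividing by sin(2πk/N), which is small when k is near 0 or N/2, makes this variant more sensitive to rounding than a plain radix-2 FFT; that matters more in fixed point than in this floating-point sketch.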

2.4 Bruun's Algorithm:
Bruun's algorithm[1],[2],[10-13] is a fast Fourier transform
(FFT) algorithm based on an unusual recursive polynomial-
factorization approach, proposed for powers of two by G.
Bruun in 1978. Because its operations involve only real
coefficients until the last computation stage, it was initially
proposed as a way to efficiently compute the discrete Fourier
transform (DFT) of real data.
The major advantage of this algorithm is its use of
real-valued modulo polynomials throughout the computation.
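Bruun's recursion first splits z^(2^n) − 1 = (z^(2^(n−1)) − 1)(z^(2^(n−1)) + 1) and then repeatedly uses the real-coefficient identity z^(4M) + a·z^(2M) + 1 = (z^(2M) + c·z^M + 1)(z^(2M) − c·z^M + 1) with c = √(2 − a). The Python sketch below (helper names are illustrative) builds the two factors for a given a and M and multiplies them back to confirm the identity; all coefficients stay real.

```python
import math


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def bruun_split(a, m):
    """Split z^(4m) + a*z^(2m) + 1 (a <= 2) into two real-coefficient factors:
    (z^(2m) + c*z^m + 1) and (z^(2m) - c*z^m + 1), with c = sqrt(2 - a)."""
    c = math.sqrt(2.0 - a)
    upper = [0.0] * (2 * m + 1)
    lower = [0.0] * (2 * m + 1)
    upper[0] = lower[0] = upper[2 * m] = lower[2 * m] = 1.0
    upper[m], lower[m] = c, -c
    return upper, lower


# z^4 + 1 (a = 0, m = 1) -> (z^2 + sqrt(2) z + 1)(z^2 - sqrt(2) z + 1)
f1, f2 = bruun_split(0.0, 1)
check = poly_mul(f1, f2)  # approximately [1, 0, 0, 0, 1]
```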

TABLE 1: COMPUTATIONAL COMPARISONS

Algorithm        Size (N)   Multiplications   Additions
Radix-2          256        1800              5896
Radix-2          1024       10248             30728
Radix-4          256        1392              5488
Radix-4          1024       7856              28336
Rader-Brenner    256        1284              6464
Rader-Brenner    1024       7172              34048

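Weighing these parameters, radix-4 is generally selected for
computing a 1024-point DFT: it needs about 23% fewer
multiplications than radix-2 (7856 vs. 10248) and the fewest
additions of the three (28336). Rader-Brenner saves a further
9% of multiplications (7172) but needs the most additions
(34048). Although the decimation-in-time (DIT) form may
require more multiplications than the radix-2 FFT, the use of
sparse matrices reduces the number of multiplications
considerably.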
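Part of radix-4's advantage is that its 4-point butterfly needs no general multiplications: the only non-trivial constant is −j, which swaps the real and imaginary parts and flips a sign. The Python sketch below is a recursive radix-4 decimation-in-time FFT under the assumption that N is a power of 4 (1024 = 4^5); the names `butterfly4` and `radix4_fft` are illustrative.

```python
import cmath
import math


def butterfly4(y0, y1, y2, y3):
    """4-point DFT. Multiplying by -1j only swaps real/imaginary parts and flips a sign,
    so in hardware this butterfly needs adders only."""
    a, b = y0 + y2, y0 - y2
    c, d = y1 + y3, (y1 - y3) * -1j
    return a + c, b + d, a - c, b - d


def radix4_fft(x):
    """Recursive radix-4 DIT FFT; len(x) must be a power of 4."""
    n_pts = len(x)
    if n_pts == 1:
        return [complex(x[0])]
    if n_pts % 4:
        raise ValueError("length must be a power of 4")
    quarter = n_pts // 4
    subs = [radix4_fft(x[r::4]) for r in range(4)]  # four N/4-point DFTs
    out = [0j] * n_pts
    for k in range(quarter):
        # r = 0 has twiddle 1; r = 1..3 are general complex multiplications (trivial when k = 0).
        y = [subs[r][k] * cmath.exp(-2j * math.pi * r * k / n_pts) for r in range(4)]
        for q, value in enumerate(butterfly4(*y)):
            out[k + q * quarter] = value
    return out


spectrum = radix4_fft([1, 2, 3, 4])  # approximately [10, -2+2j, -2, -2-2j]
```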
3. FPGA IMPLEMENTATION
FPGAs are used to implement the FFT for online system
signal analysis [6], partly because they can be programmed in
the field to suit the user's needs.
The Spartan-6 architecture [8] provides logic-optimized
lookup tables (LUTs), DSP48A1 slices and block RAM, which
together offer well-suited resources for FFT implementation.
The higher-performance Virtex and Kintex families can also
be selected. The choice depends on the number of I/O blocks
the user requires, the computational latency, the sampling
frequency of the device, the speed, and the number of DSP
slices and amount of block RAM.

FPGA selection parameters:
1. LUTs
2. Block RAM
3. Transceiver sampling rate
4. Supply voltages
5. Speed
6. DSP slices
7. I/O ports
8. Converters (ADC or DAC)

The implementation techniques can be broadly classified as:
1. VHDL coding
2. EDA tools

3.1 VHDL coding:
This is done using the Xilinx ISE Design Suite.
VHDL was chosen over Verilog. The original Verilog-1995
standard lacked signed arithmetic, which was only added in
Verilog-2001, whereas VHDL provides signed types through
the numeric_std package.
VHDL code for the FFT can target either the general-purpose
fabric or the DSP blocks, and it may be written in a structural
or a behavioral architecture.
Structural:
Each butterfly of the radix-4 FFT is treated as a component,
and the components are port-mapped according to the signal
flow graph.
Behavioral:
Equations and matrices are used to describe the function of
each stage of the FFT entity.
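Before writing a structural VHDL description, the signal flow graph can be modelled in software to generate the port map. The Python sketch below assumes an iterative, in-place radix-4 decimation-in-frequency (DIF) formulation; the helper `stage_wiring` is hypothetical, not Xilinx tooling. Each tuple it yields corresponds to one butterfly instance and the four signals port-mapped into it, and `radix4_dif` uses that wiring to compute the FFT, reading the output in base-4 digit-reversed order.

```python
import cmath
import math


def butterfly4(y0, y1, y2, y3):
    """4-point DFT using additions and multiplication by -1j (a swap/negate in hardware)."""
    a, b = y0 + y2, y0 - y2
    c, d = y1 + y3, (y1 - y3) * -1j
    return a + c, b + d, a - c, b - d


def stage_wiring(n_pts):
    """Yield (span, ports) per stage; each port entry is (n, idx0, idx1, idx2, idx3)."""
    span = n_pts // 4
    while span >= 1:
        block = 4 * span
        ports = [(n, *(g + n + r * span for r in range(4)))
                 for g in range(0, n_pts, block) for n in range(span)]
        yield span, ports
        span //= 4


def digit_reverse(i, digits):
    """Reverse the base-4 digits of i (the output order of an in-place radix-4 DIF FFT)."""
    rev = 0
    for _ in range(digits):
        rev, i = rev * 4 + i % 4, i // 4
    return rev


def radix4_dif(x):
    """Iterative in-place radix-4 DIF FFT; len(x) must be a power of 4."""
    n_pts = len(x)
    digits = 0
    while 4 ** digits < n_pts:
        digits += 1
    if 4 ** digits != n_pts:
        raise ValueError("length must be a power of 4")
    a = [complex(v) for v in x]
    for span, ports in stage_wiring(n_pts):
        for n, *idx in ports:
            outs = butterfly4(*(a[i] for i in idx))
            for q, i in enumerate(idx):
                # Twiddle applied after the butterfly: W_(4*span)^(n*q)
                a[i] = outs[q] * cmath.exp(-2j * math.pi * n * q / (4 * span))
    return [a[digit_reverse(k, digits)] for k in range(n_pts)]


wiring_counts = [len(ports) for _, ports in stage_wiring(1024)]  # [256, 256, 256, 256, 256]
```

For N = 1024 this gives 5 stages of 256 butterflies (1280 component instances), each wired to 4 of the 1024 signals at that stage, which illustrates why hand-written port mapping at this size is laborious.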

3.2 EDA tools:

Xilinx ISE Design Suite:

Schematics:
In a schematic design, the adders, multipliers, multiplexers
and flip-flops of each stage are taken from the component
library and routed appropriately.

IP core generator:
The Xilinx LogiCORE IP Fast Fourier Transform (FFT) [7]
implements the Cooley-Tukey FFT algorithm, a
computationally efficient method for calculating the Discrete
Fourier Transform (DFT). The FFT core computes an
N-point forward DFT or inverse DFT (IDFT), where
N = 2^m and m = 3 to 16.
For fixed-point inputs, the input data is a vector of N
complex values, each represented as a pair of b_x-bit
two's-complement numbers (b_x bits for each of the real and
imaginary components of the data sample), where b_x ranges
from 8 to 34 bits inclusive. Similarly, the phase factors can be
b_w = 8 to 34 bits wide.
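The effect of the phase-factor width b_w can be estimated in software. The Python sketch below assumes a simple signed fraction format with one sign bit and b_w − 1 fractional bits (values in [−1, 1)); the exact format used by the core is defined in its data sheet [7], and this sketch does not claim to reproduce it. Because +1.0 is not representable in this format, the worst-case twiddle error is one LSB, 2^−(b_w−1), coming from W^0 = 1; all other twiddles are within half an LSB.

```python
import math


def quantize(value, bits):
    """Round a real value in [-1, 1] to a `bits`-bit two's-complement fraction (1 sign bit).

    Returns the integer code; the represented value is code / 2**(bits - 1).
    +1.0 is not representable, so it saturates to the largest positive code.
    """
    scale = 1 << (bits - 1)
    return max(-scale, min(scale - 1, round(value * scale)))


def max_twiddle_error(n_pts, bits):
    """Worst-case error over the real and imaginary parts of W_N^k = exp(-j*2*pi*k/N)."""
    scale = 1 << (bits - 1)
    worst = 0.0
    for k in range(n_pts):
        angle = 2 * math.pi * k / n_pts
        for part in (math.cos(angle), -math.sin(angle)):
            worst = max(worst, abs(quantize(part, bits) / scale - part))
    return worst


errors = {bw: max_twiddle_error(1024, bw) for bw in (8, 16, 24)}
```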
For single-precision floating-point inputs, the input data is a
vector of N complex values represented as dual 32-bit
floating-point numbers with the phase factors represented as
24- or 25-bit fixed-point numbers.
All memory is on-chip, using either block RAM or
distributed RAM. The N-element output vector uses b_y bits
for each of the real and imaginary components of the output
data. Input data is presented in natural order, and output data
can be in either natural or bit/digit-reversed order. The
complex nature of the input and output data is intrinsic to the
FFT algorithm, not to the implementation.
Three arithmetic options are available for computing the
FFT:
1. Full-precision unscaled arithmetic
2. Scaled fixed-point, where the user provides the scaling
3. Block floating-point (run-time adjusted scaling)
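The difference between the first two options can be sketched in Python. This is a floating-point model for clarity, with an illustrative name `fft_radix2`; a real fixed-point design would also round or truncate after each shift. With unscaled arithmetic the DC bin of an all-ones 1024-point input grows to 1024 (log2 1024 = 10 extra bits), while halving every radix-2 butterfly output returns DFT/N, whose magnitude cannot exceed the input's because |a ± W·b|/2 ≤ max(|a|, |b|).

```python
import cmath
import math


def fft_radix2(x, scaled=False):
    """Recursive radix-2 DIT FFT. With scaled=True every butterfly output is halved,
    mimicking a fixed-point design that shifts right by one bit per stage."""
    n_pts = len(x)
    if n_pts == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2], scaled)
    odd = fft_radix2(x[1::2], scaled)
    factor = 0.5 if scaled else 1.0
    out = [0j] * n_pts
    for k in range(n_pts // 2):
        t = cmath.exp(-2j * math.pi * k / n_pts) * odd[k]
        out[k] = (even[k] + t) * factor
        out[k + n_pts // 2] = (even[k] - t) * factor
    return out


ones = [1.0] * 1024
full_out = fft_radix2(ones)                 # X(0) = 1024: grew by 10 bits
scaled_out = fft_radix2(ones, scaled=True)  # X(0) = 1: the result is DFT / N
```

Halving at every stage never overflows but discards a bit of precision per stage; block floating-point, option 3, scales only when the data actually grows.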

MATLAB simulation:

MATLAB HDL Coder, the Xilinx Blockset and the Signal
Processing Blockset can be used directly to design an FFT
algorithm for an FPGA.

The FFT block computes the fast Fourier transform (FFT) of
each row of a sample-based 1-by-P input vector u, or across
the first dimension (P) of an N-D input array u, where M is
the FFT length:
y = fft(u,M)                        % P <= M (u is zero-padded to length M)
y(:,l) = fft(datawrap(u(:,l),M))    % P > M; l = 1,...,N
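In the P > M case the input is first wrapped modulo M: sample n is added into position n mod M before an M-point FFT. The Python sketch below is our reading of that step, with `data_wrap` as a hypothetical stand-in for MATLAB's `datawrap`. It checks that the result equals the spectrum of the full P-sample sequence sampled at the M frequencies 2πk/M, which holds because e^(−j2πkn/M) repeats with period M in n.

```python
import cmath
import math


def data_wrap(u, m):
    """Hypothetical Python counterpart of wrapping a length-P sequence to length M:
    sample n is accumulated into position n mod M."""
    out = [0j] * m
    for n, value in enumerate(u):
        out[n % m] += value
    return out


def dft(x):
    """Direct M-point DFT."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def sampled_spectrum(u, m):
    """Spectrum of the whole P-sample sequence, sampled at frequencies 2*pi*k/M."""
    return [sum(v * cmath.exp(-2j * math.pi * n * k / m) for n, v in enumerate(u))
            for k in range(m)]


u = [complex(n % 7, -(n % 3)) for n in range(20)]  # P = 20 > M = 8
wrapped = dft(data_wrap(u, 8))
reference = sampled_spectrum(u, 8)
```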

The System Generator token [9] serves as a control panel for
system and simulation parameters, and it is also used to
invoke the code generator for netlisting. Every Simulink
model containing any element from the Xilinx Blockset must
contain at least one System Generator token. Once a System
Generator token is added to a model, it is possible to specify
how code generation and simulation should be handled.

4. CHALLENGES FACED DURING IMPLEMENTATION
The various problems faced during implementation are:
1. Complex variables are not defined in VHDL.
2. Resources used by EDA tools are large.
3. Port mapping and schematics are complicated due to
large number of points.
4. Floating point number implementation.
5. Scaling and quantization errors.

VHDL has no synthesizable complex data type. User-defined
libraries are therefore required, in which the real and
imaginary parts are defined separately as integers or
fixed-point values and the computation is carried out
separately for each part. Although this uses more area, it is
one possible solution.
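With the real and imaginary parts held separately as integers, a complex multiplication becomes ordinary integer arithmetic on the two parts. The Python sketch below (function names are illustrative) shows the direct form, which needs four multipliers, and a common rearrangement that needs only three multipliers at the cost of extra adders. Which form suits a given design depends on how scarce DSP slices are compared with fabric adders.

```python
def cmul4(a_re, a_im, b_re, b_im):
    """(a_re + j*a_im) * (b_re + j*b_im) on separate integer parts: 4 multipliers, 2 adders."""
    return a_re * b_re - a_im * b_im, a_re * b_im + a_im * b_re


def cmul3(a_re, a_im, b_re, b_im):
    """The same product with 3 multipliers and 5 adders/subtractors."""
    k1 = b_re * (a_re + a_im)
    k2 = a_re * (b_im - b_re)
    k3 = a_im * (b_re + b_im)
    return k1 - k3, k1 + k2


# (3 + 4j) * (5 - 2j) = 23 + 14j
product4 = cmul4(3, 4, 5, -2)  # (23, 14)
product3 = cmul3(3, 4, 5, -2)  # (23, 14)
```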
Because the number of points is large, port mapping, even
with generate statements, is tedious: 1024 signals must be
routed at each stage. Floating-point numbers are also not
directly synthesizable, so fixed-point arithmetic with the
CORDIC algorithm is used instead. The CORDIC core
implements a generalized coordinate rotational digital
computer (CORDIC) algorithm, initially developed by
Volder to iteratively solve trigonometric equations and later
generalized by Walther to solve a broader range of equations,
including hyperbolic and square-root equations. The CORDIC
algorithm introduces a scale factor into the amplitude of the
result, and the CORDIC core provides the option of
automatically compensating for this scale factor.

Since the FFT is implemented with finite-precision
arithmetic, the results are affected by round-off noise
incurred during the butterfly calculations, by the scaling of
the data, and by the approximation of the coefficients. The
magnitude of the signal tends to increase at each stage: in a
radix-2 stage it can grow by at most a factor of two (one bit),
so a scaling procedure is needed to avoid overflow. An
especially efficient procedure is to compute all stages without
scaling, provided the word length is allowed to grow to hold
the extra bits, and then to scale the entire result once at the
end. Resource utilization increases when EDA tools are used,
so optimized full-custom coding is recommended.
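The CORDIC rotation mentioned above can replace the general twiddle multiplication, since multiplying by W_N^k is a rotation by −2πk/N. The Python sketch below is a floating-point model for readability, not the Xilinx CORDIC core. It assumes 16 iterations, an arbitrary illustrative choice, and the names `cordic_rotate`, `rotate` and `twiddle_multiply` are hypothetical. Each micro-rotation scales only by 2^−i, a right shift in fixed-point hardware, and the accumulated gain (about 1.6468) is divided out at the end, mirroring the scale-factor compensation described above.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate (x, y) by `angle` radians (|angle| <= ~1.74) using CORDIC micro-rotations.

    Each step multiplies only by 2**-i (a right shift in fixed-point hardware) and adds.
    """
    gain = 1.0
    z = angle
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))  # length growth of this micro-rotation
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)  # hardware would read atan(2^-i) from a small ROM
    return x / gain, y / gain  # compensate for the CORDIC scale factor


def rotate(x, y, angle):
    """Rotate by any angle: reduce to [-pi, pi], then pre-rotate by +/-90 degrees (swap and negate)."""
    angle = math.remainder(angle, 2 * math.pi)
    if angle > math.pi / 2:
        x, y, angle = -y, x, angle - math.pi / 2
    elif angle < -math.pi / 2:
        x, y, angle = y, -x, angle + math.pi / 2
    return cordic_rotate(x, y, angle)


def twiddle_multiply(value, k, n_pts):
    """value * W_N^k, computed as a rotation by -2*pi*k/N."""
    re, im = rotate(value.real, value.imag, -2 * math.pi * k / n_pts)
    return complex(re, im)


approx = twiddle_multiply(3 + 4j, 3, 16)  # close to (3 + 4j) * exp(-j*2*pi*3/16)
```

After n iterations the residual angle is at most about atan(2^−(n−1)), so 16 iterations leave an angular error of roughly 3·10^−5 rad; a fixed-point version would also have to size its registers for the gain.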

Figure 2: Resource utilization comparison
Figure 2 compares the resource utilization of MATLAB
programs for a 64-point FFT using the built-in fft function
and custom-designed code.
The custom-designed code requires 64 bytes of memory for
its result (y), whereas the built-in function requires 798 bytes
(ans).

[Figure: FFT System Generator block]

CONCLUSION
This study concludes that, for a 1024-point FFT implemented
on an FPGA, a full-custom radix-4 design using CORDIC and
scaled arithmetic is the most suitable. Computation time and
resource optimization must both be considered when
designing the FFT. The built-in DSP slices can be used to
reduce use of the FPGA fabric significantly.
Such a design can be used for signal analysis, for designing
filters, and for predictive fault detection of devices using
vibration signals.

REFERENCES
[1] W. W. Smith and J. M. Smith, Handbook of Real Time
FFT, IEEE Press, 1995.
[2] H. J. Nussbaumer, Fast Fourier Transforms and
Convolution Algorithms, Springer, Berlin, 1981.
[3] R. W. Ramirez, FFT Fundamentals and Concepts,
Prentice Hall.
[4] P. J. Ashenden, The Designer's Guide to VHDL,
Morgan Kaufmann Publishers.
[5] J. G. Proakis, Digital Signal Processing: Principles,
Algorithms, and Applications, Prentice-Hall International,
1996.
[6] H. Kaptan, A. Tangel and S. Sahin, "FPGA
Implementation of FFT Algorithms Using Floating-Point
Numbers", IEEE paper.
[7] http://www.xilinx.com/support/documentation/ip_documentation/xfft_ds260.pdf
[8] http://www.xilinx.com/support/documentation/spartan-6_data_sheets.htm
[9] http://www.mathworks.in/products/signal/
[10] N. Brenner and C. Rader, "A New Principle for Fast
Fourier Transformation", IEEE Acoustics, Speech & Signal
Processing, 1976.
[11] E. O. Brigham, The Fast Fourier Transform, Prentice-Hall,
New York, 2002.
[12] T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein,
Introduction to Algorithms, 2nd ed., MIT Press and
McGraw-Hill, 2001, especially Chapter 30, "Polynomials and
the FFT".
[13] http://en.wikipedia.org/wiki/Fast_Fourier_transform