International Journal of Research in Electronics & Communication Technology
Volume 1, Issue 1, July-September, 2013, pp. 72-77, IASTER 2013

www.iaster.com, ISSN Online: 2347-6109, Print: 2348-0017
FPGA Implementation of 32 point Radix-2 Pipelined FFT

Aruna Arya, Prof. Augusta Sophy
SENSE Department, VIT University, Chennai, Tamil Nadu, India
ABSTRACT
This paper presents a method of implementing the Fast Fourier Transform (FFT) using
pipelining. The FFT efficiently calculates the DFT by reducing the number of addition and
multiplication operations required. In this work, pipeline registers are placed between the
butterfly stages to store the outputs of the preceding stage. On this basis, a 32-point FFT
based on the decimation-in-time (DIT) radix-2 algorithm has been developed.
Pipelining increases speed by shortening the clock period and thereby improves the
throughput of the FFT processor.
Keywords: Fast Fourier Transform, Decimation In Time, Pipelining, Radix-2, Registers.

I. INTRODUCTION
The Discrete Fourier Transform (DFT) is used for the analysis of discrete-time signals. It is the
most widely used technique for converting samples from the time domain to the frequency domain.
This conversion is termed frequency analysis of non-periodic signals [1]. The same result can be
obtained far more efficiently with the Fast Fourier Transform (FFT), which computes the DFT with
less overhead in the calculation of complex terms and is therefore known as an efficient
technique for calculating the DFT of signals.
Many types of transforms are used to convert a function from one domain to another without
any loss of information. For example, the Fourier transform converts a signal from the time
domain to the frequency domain. For a discrete-time signal x(n) it can be calculated by:

$$X(\omega) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n} \qquad (1)$$
The DFT provides samples of the Fourier transform at equally spaced frequencies [2]. For a
complex input x(n), N complex multiplications and (N-1) complex additions are required to
compute each value of the DFT according to the formula:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \qquad (2)$$

where $W_N = e^{-j2\pi/N}$ is the twiddle factor for the N-point DFT.
Therefore, a total of $N^2$ complex multiplications and $N(N-1)$ complex additions are required to
compute all N values. Using the DIT radix-2 FFT algorithm, the number of complex multiplications
is reduced to $(N/2)\log_2 N$ and the number of complex additions to $N\log_2 N$.
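For the 32-point transform considered in this paper, these counts can be evaluated directly. The short Python sketch below only illustrates the formulas above; the function names are hypothetical and do not come from the paper.

```python
def direct_dft_ops(n: int) -> tuple[int, int]:
    """Complex (multiplications, additions) for a direct N-point DFT."""
    return n * n, n * (n - 1)


def radix2_fft_ops(n: int) -> tuple[int, int]:
    """Complex (multiplications, additions) for a radix-2 FFT; n must be a power of two."""
    stages = n.bit_length() - 1          # log2(n) for a power of two
    if n < 2 or 1 << stages != n:
        raise ValueError("n must be a power of two")
    return (n // 2) * stages, n * stages


print(direct_dft_ops(32))   # (1024, 992)
print(radix2_fft_ops(32))   # (80, 160)
```

For N = 32 the multiplication count therefore drops from 1024 to 80 and the addition count from 992 to 160.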
II. RADIX-2 DIT ALGORITHM

Radix-2 algorithms are the simplest FFT algorithms. Their main strategy is divide and
conquer, which decomposes an N-point DFT into smaller DFTs. The decimation-in-time
radix-2 algorithm divides the input into even- and odd-indexed sequences, i.e. it
rearranges the DFT into two parts: a summation over the even indices n = 0, 2, 4, ..., N-2 and
a summation over the odd indices n = 1, 3, 5, ..., N-1. The outputs of these shorter DFTs are
reused to compute many outputs, which greatly reduces the total computational cost. Because the
time samples are rearranged into groups, the method is called decimation in time [1]. Because
they are split alternately into two groups, even and odd, it is called a radix-2 algorithm.
Let x(n) be a sequence of length $N = 2^v$. If it is decimated into two subsequences of
length N/2 [3], the subsequences can be written as:

$$g(n) = x(2n), \qquad n = 0, 1, 2, \ldots, \tfrac{N}{2} - 1 \qquad (3)$$

and

$$h(n) = x(2n+1), \qquad n = 0, 1, 2, \ldots, \tfrac{N}{2} - 1 \qquad (4)$$

i.e. the even- and odd-indexed terms respectively.

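The N-point DFT of x(n) given by Eq. (2) can be rewritten as:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk} = \sum_{n=0}^{N/2-1} g(n)\, W_N^{2nk} + \sum_{n=0}^{N/2-1} h(n)\, W_N^{(2n+1)k}$$

Since $W_N^{2nk} = W_{N/2}^{nk}$:

$$X(k) = \sum_{n=0}^{N/2-1} g(n)\, W_{N/2}^{nk} + W_N^{k} \sum_{n=0}^{N/2-1} h(n)\, W_{N/2}^{nk} = G(k) + W_N^{k} H(k), \qquad k = 0, 1, \ldots, N-1 \qquad (5)$$

The terms G(k) and H(k) in Eq. (5) denote the N/2-point DFTs of g(n) and h(n), i.e. the DFTs of
the even- and odd-indexed terms after decimation, and $W_N^{k}$ is the twiddle factor [3].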
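Applying Eq. (5) recursively gives the complete radix-2 DIT algorithm. The Python sketch below is a software illustration only, not the hardware design of this paper. It uses two standard facts: G(k) and H(k) are periodic with period N/2, and $W_N^{k+N/2} = -W_N^{k}$, so each pair of outputs X(k) and X(k+N/2) shares one twiddle multiplication (one butterfly). The function names `dft` and `fft_dit` are hypothetical.

```python
import cmath


def dft(x):
    """Direct N-point DFT as in Eq. (2), used here only as a reference."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points) for n in range(n_points))
        for k in range(n_points)
    ]


def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n_points = len(x)
    if n_points == 1:
        return list(x)
    if n_points % 2:
        raise ValueError("length must be a power of two")
    g = fft_dit(x[0::2])          # G(k): DFT of the even-indexed samples
    h = fft_dit(x[1::2])          # H(k): DFT of the odd-indexed samples
    half = n_points // 2
    out = [0j] * n_points
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n_points) * h[k]   # W_N^k * H(k)
        out[k] = g[k] + t          # X(k)       = G(k) + W_N^k H(k)
        out[k + half] = g[k] - t   # X(k + N/2) = G(k) - W_N^k H(k)
    return out


x = [complex(n, 0) for n in range(32)]
err = max(abs(a - b) for a, b in zip(fft_dit(x), dft(x)))
print(f"max deviation from direct DFT: {err:.2e}")
```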
Fig. 1. Radix-2 decimation-in-time FFT for 32 inputs, each 32 bits wide [1].
In Fig. 1 and Fig. 2, after decimating the signals into even and odd sequences, the inputs
are applied in the following order:
Even: x(0), x(16), x(8), x(24), x(4), x(20), x(12), x(28), x(2), x(18), x(10), x(26), x(6), x(22), x(14),
x(30).
Odd: x(1), x(17), x(9), x(25), x(5), x(21), x(13), x(29), x(3), x(19), x(15), x(31), x(7), x(23),
x(11), x(27).
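Repeated even/odd decimation leads to the bit-reversed input ordering. The Python sketch below (the function name is hypothetical) generates the standard bit-reversed order for N = 32. Its first half matches the even list above; the odd list as printed differs from strict bit reversal only in the positions of the pairs x(11), x(27) and x(15), x(31).

```python
def bit_reversed_order(n_points: int) -> list[int]:
    """Return the indices 0..n_points-1 in bit-reversed order (n_points a power of two)."""
    bits = n_points.bit_length() - 1
    if n_points < 1 or 1 << bits != n_points:
        raise ValueError("n_points must be a power of two")
    order = []
    for i in range(n_points):
        rev = 0
        for b in range(bits):
            if i & (1 << b):                 # bit b of i becomes bit (bits-1-b) of rev
                rev |= 1 << (bits - 1 - b)
        order.append(rev)
    return order


order = bit_reversed_order(32)
even_half, odd_half = order[:16], order[16:]
print("Even:", even_half)
print("Odd: ", odd_half)
```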
III. RADIX-2 FFT USING PIPELINING
Pipelining has several applications in DSP systems as well as in microprocessors. A pipeline is a set of
processing elements connected so that the output of one element is the input of the next.
These elements hold data for processing at every clock pulse.
Pipelining allows different functions to execute simultaneously [2] and thereby increases the
throughput of the system. It cannot reduce the processing time of a single operation, but it
does decrease the processing time for a stream of tasks. The disadvantage of a pipelined
system is that it requires more hardware than the equivalent system without pipelining.
Fig. 2. Radix-2 decimation-in-time FFT for 32 inputs, each 32 bits wide, with pipelining.
Pipelining is applied here between the stages of the FFT design. Because the even and odd
input signals are computed separately, a shuffle unit is needed [4], and pipelining is used to
implement the data shuffle at the intermediate stages. The shuffle unit can be any storage
device; here registers are used. The outputs of each stage are fed to a register [2], and the
outputs of each register are fed to the next butterfly stage. Pipelining is done to
achieve higher throughput [4]. Two-parallel pipelining is achieved by partitioning the inputs
into even and odd groups, which are taken simultaneously from each butterfly stage and passed to
the next stage through storage registers that act as RAM.
The purpose of the RAM is simply to shuffle its inputs to its output side. Shuffling takes
place at every positive clock edge: at one positive edge the inputs are loaded into the
register, and at the next positive edge they are passed to the output side of the register.
Since each register is followed by a butterfly stage, the register outputs are taken as inputs
by the intermediate stages at the positive clock edge. Each input is 32 bits wide. The complex
additions and multiplications are carried out according to the DIT radix-2 algorithm during
each butterfly stage, and the results are passed to the next stage. Fig. 2
shows the proposed architecture for the 32-point radix-2 FFT.
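The register behaviour described above can be sketched in software. The following Python model is a minimal illustration, not the authors' RTL. It assumes one register bank between each pair of adjacent butterfly stages, a complete 32-point frame entering on every clock, and floating-point instead of 32-bit fixed-point arithmetic. The names `PipelinedFFT`, `butterfly_stage` and `bit_reverse` are hypothetical.

```python
import cmath


def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i."""
    rev = 0
    for _ in range(bits):
        rev = (rev << 1) | (i & 1)
        i >>= 1
    return rev


def butterfly_stage(data, stage):
    """Apply radix-2 DIT butterfly stage `stage` (0-based) to a frame in bit-reversed order."""
    span = 1 << stage                 # distance between the two butterfly inputs
    size = 2 * span                   # DFT size completed by this stage
    out = list(data)
    for start in range(0, len(data), size):
        for k in range(span):
            a = data[start + k]
            b = cmath.exp(-2j * cmath.pi * k / size) * data[start + k + span]
            out[start + k] = a + b
            out[start + k + span] = a - b
    return out


class PipelinedFFT:
    """Butterfly stages separated by register banks that load on each clock edge."""

    def __init__(self, n_points: int = 32):
        self.bits = n_points.bit_length() - 1
        if self.bits < 2 or 1 << self.bits != n_points:
            raise ValueError("n_points must be a power of two and at least 4")
        self.regs = [None] * (self.bits - 1)   # one bank between adjacent stages (assumption)

    def clock(self, frame=None):
        """Model one positive clock edge and return the last stage's output after it."""
        first = None if frame is None else [
            frame[bit_reverse(i, self.bits)] for i in range(len(frame))
        ]
        # Every bank loads simultaneously from the stage in front of it.
        sources = [first] + self.regs[:-1]
        self.regs = [
            None if d is None else butterfly_stage(d, s) for s, d in enumerate(sources)
        ]
        last = self.regs[-1]
        return None if last is None else butterfly_stage(last, self.bits - 1)


frames = [[complex(n + f, f - n) for n in range(32)] for f in range(3)]
fft = PipelinedFFT(32)
outputs = [fft.clock(fr) for fr in frames + [None] * 3]
print([o is not None for o in outputs])
```

In this model the first result appears after four clock edges, and from then on one transformed frame leaves the pipeline on every clock. The latency of a single frame is not reduced, but the rate at which frames are processed is.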
Fig. 3. Internal RTL view of 32-point radix-2 FFT using pipelining.
IV. RESULTS
1. Results obtained using Cadence:

a) For twiddle factor:

| Parameter | Value |
|---|---|
| Timing slack obtained | 4 ps |
| Leakage power | 156.649 nW |
| Dynamic power | 1813406.009 nW |
| Total power | 1813562.658 nW |
| Area | 4956 mm² |
| Gates | 202 |

b) For signed 32-bit adder:

| Parameter | Value |
|---|---|
| Timing slack obtained | 541 ps |
| Total power | 947548.448 nW |
| Area | 2129 mm² |
| Gates | 64 |

c) For signed 32-bit subtractor:

| Parameter | Value |
|---|---|
| Timing slack obtained | 4 ps |
| Total power | 864835.019 nW |
| Area | 5542 mm² |
| Gates | 313 |

d) For signed 32-bit multiplier:

| Parameter | Value |
|---|---|
| Timing slack obtained | 1 ps |
| Total power | 6678047.389 nW |
| Area | 36444 mm² |
| Gates | 1337 |

2. Results obtained using Xilinx ISE 12.1 suite:

a) For the simple architecture of Fig. 1 without pipelining:

| Component | Count |
|---|---|
| 16x64-bit ROM | 64 |
| 32x32-bit multipliers | 256 |
| 32-bit adders | 224 |
| 32-bit subtractors | 224 |
| 32-bit registers | 448 |

b) For the architecture of Fig. 2 using pipelining:

| Component | Count |
|---|---|
| 16x64-bit ROM | 64 |
| 32x32-bit multipliers | 256 |
| 32-bit adders | 224 |
| 32-bit subtractors | 224 |
| 32-bit registers | 704 |

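The two component lists differ only in the number of 32-bit registers. The short Python check below works out that difference. Its interpretation is an assumption, not a statement from the paper: if each pipeline register bank stores 32 complex values as separate real and imaginary 32-bit words, the additional registers correspond to one bank at each of the four boundaries between the five butterfly stages.

```python
POINTS = 32
STAGES = 5                 # log2(32) butterfly stages
REGS_PLAIN = 448           # 32-bit registers without pipelining (Xilinx result a)
REGS_PIPELINED = 704       # 32-bit registers with pipelining (Xilinx result b)

extra = REGS_PIPELINED - REGS_PLAIN     # registers added by pipelining
per_bank = POINTS * 2                   # assumed: real + imaginary word per point
banks = extra // per_bank

print(extra, per_bank, banks)           # 256 64 4
```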
V. CONCLUSION
The paper proposes a pipelined architecture for a 32-point FFT. Pipelining is used to enhance
the throughput of the architecture, which involves complex multiplications and additions.
Pipelining increases the area and hardware overhead, but it is efficient in terms of
increased speed, as given in the following tables:

1. Table showing the timing summary of the clock without pipelining [1]

| Parameter | Value |
|---|---|
| Minimum delay | 30.783 ns (17.053 ns logic, 13.730 ns route) (55.4% logic, 44.6% route) |
| Total REAL time to Xst completion | 1611.00 secs |
| Total CPU time to Xst completion | 1611.39 secs |
| Total memory usage | 1112720 kB |

2. Table showing the timing summary of the clock with pipelining

| Parameter | Value |
|---|---|
| Minimum period | 19.919 ns (maximum frequency: 50.202 MHz) (15.635 ns logic, 4.284 ns route) (78.5% logic, 21.5% route) |
| Minimum input arrival time before clock | 5.166 ns |
| Maximum output required time after clock | 4.040 ns |
| Total memory usage | 3520400 kB |
| Total CPU time | 10560.45 secs |

Comparing Tables 1 and 2, we can see that the clock period is reduced by pipelining.
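The size of this reduction follows from the two reported figures. The Python sketch below takes them at face value. Table 1 reports a minimum delay and Table 2 a minimum clock period, so the comparison is approximate; the variable names are illustrative only.

```python
delay_plain_ns = 30.783    # Table 1: minimum delay without pipelining
period_pipe_ns = 19.919    # Table 2: minimum period with pipelining

speedup = delay_plain_ns / period_pipe_ns
reduction_pct = 100 * (delay_plain_ns - period_pipe_ns) / delay_plain_ns
fmax_mhz = 1e3 / period_pipe_ns        # 1 / period, with the period in ns

print(f"speed-up: {speedup:.2f}x")
print(f"period reduction: {reduction_pct:.1f}%")
print(f"maximum frequency: {fmax_mhz:.1f} MHz")
```

On these figures the critical path is shortened by roughly a third, and the computed maximum frequency agrees with the 50.202 MHz reported by the tool to within rounding.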
3. Table showing the utilization of resources without pipelining [1].

Device Utilization Summary (estimated values)

| Logic Utilization | Used | Available | Utilization | Remark |
|---|---|---|---|---|
| Number of slice | 73794 | 46560 | 158% | Resource overuse |
| Number of fully used LUT-FF | 0 | 73794 | 0% | Not used |
| Number of bonded IOBs | 4096 | 240 | 1706% | Resource overuse |
| Number of DSP48E1s | 144 | 288 | 50% | Resource overuse |

4. Table showing the utilization of resources with pipelining.

Device Utilization Summary (estimated values)

| Logic Utilization | Used | Available | Utilization |
|---|---|---|---|
| Number of slices | 53187 | 960 | 5540% |
| Number of slice FFs | 19117 | 1920 | 995% |
| Number of 4 input LUTs | 102131 | 1920 | 5319% |
| Number of bonded IOBs | 4098 | 66 | 6209% |
| Number of MULT18X18SIOs | 3 | 4 | 75% |
| Number of GCLKs | 1 | 24 | 4% |

Comparing Tables 3 and 4, we can conclude that pipelining increases the total area.
REFERENCES
[1]. Asmita Haveliya, Design and Simulation of 32-Point FFT Using Radix-2 Algorithm for FPGA
Implementation, Advanced Computing & Communication Technologies, 2012 Second
International Conference on, On page(s): 167-171.
[2]. Mateus Beck Fonseca, Martins J.B.S., da Costa E.A.C., Design of Pipelined Butterflies from
Radix-2 FFT with Decimation in Time Algorithm Using Efficient Adder Compressors, Circuits
and Systems (LASCAS), 2011 IEEE Second Latin American Symposium on, On page(s): 1-4.
[3]. Monson H. Hayes, Digital Signal Processing, Second Edition (New Delhi, Tata McGraw Hill
Education Pvt. Ltd.).
[4]. Wei Han, T. Arslan, A.T. Erdogan and M. Hasan, Multiplier-less based Parallel-pipelined FFT
Architectures for Wireless Communication Applications, Acoustics, Speech, and Signal
Processing (ICASSP '05), IEEE International Conference on (Volume 5), On page(s): v/45-v/48.
[5]. Ahmed Saeed, M. Elbably, G. Aldelfadeel, and M.I. Eladawy, Efficient FPGA Implementation
of FFT/IFFT Processor, International Journal of Circuits, Systems and Signal Processing,
Issue 3, Volume 3, 2009.
[6]. K. Maharatna, E. Grass, and U. Jagdhold, A 64-point Fourier Transform Chip for High-speed
Wireless LAN Application using OFDM, IEEE Journal of Solid-State Circuits, vol. 39, issue
no. 3, On page(s): 484-493, March 2004.
[7]. Wei Han, A.T. Erdogan, T. Arslan, and M. Hasan, A Novel Low Power Pipelined FFT based on
Subexpression Sharing for Wireless LAN Applications, IEEE Signal Processing Systems
Workshop, 2004 (SIPS 2004), On page(s): 83-88.
[8]. M. Hasan and T. Arslan, A Triple Port RAM based Low Power Commutator Architecture for a
Pipelined FFT Processor, Circuits and Systems, 2003, ISCAS '03, Proceedings of the 2003
International Symposium, May 2003, vol. 5, pp. 353-356.
[9]. A. Wenzler and E. Luder, New Structures for Complex Multipliers and their Noise Analysis,
Circuits and Systems 1995, IEEE International Symposium on, ISCAS '95, vol. 2, pp. 1432-1435.