
ISSN (Print) : 0974-6846

ISSN (Online) : 0974-5645

Indian Journal of Science and Technology, Vol 9(4), DOI: 10.17485/ijst/2016/v9i4/83322, January 2016

Low Power 64 Point FFT Processor


V. Sarada1* and T. Vigneswaran2
1Department of ECE, SRM University, Chennai - 603203, Tamil Nadu, India; saradasaran@gmail.com
2School of Electronics Engineering, VIT University, Chennai - 632014, Tamil Nadu, India; vigneshvlsi@gmail.com
*Author for correspondence

Abstract
Objectives: This paper proposes the design of a low-power FFT (Fast Fourier Transform) processor for OFDM (Orthogonal Frequency Division Multiplexing) applications, motivated by the demand for low-power design in portable communication devices.
Methods: The FFT processor is based on the SDF (Single-path Delay Feedback) pipelined architecture. A digit-slicing multiplierless architecture is used to realize the complex multiplication. To reduce dynamic power dissipation, the proposed architecture uses a clock-gated buffer, and the control circuit is implemented with a Gray code sequence instead of a binary code sequence. The design is implemented in Verilog HDL and synthesized with a Cadence tool.
Findings: The number of complex multiplications is reduced by using the radix-2^5 algorithm. The results show a reduction in power consumption of up to 25%.
Improvements: The design presented here is a 64-point FFT; it can be extended to higher N-point FFT designs.

Keywords: Clock Gating, FFT, Multiplierless Multiplier, Radix-2^5, SDF

1. Introduction
1.1 Overview

The FFT is a very important technique in modern DSP and telecommunication, especially in OFDM systems1. The first FFT algorithm, proposed by Cooley and Tukey2, reduced the computational complexity from O(N^2) for the DFT to O(N log2 N), where N denotes the FFT size. Various FFT processor architectures have been proposed for hardware implementation; they fall mainly into memory-based3,4 and pipelined5 styles. A memory-based FFT processor, also known as the processing-element approach, consists of a single processing element and a memory unit; its hardware cost is low, but it has long latency and low throughput. Pipelined architectures overcome this drawback. The most important pipeline types are SDF and MDC (Multipath Delay Commutator). Both types have the same multiplication complexity, but they differ in memory size and hardware utilization rate: the SDF5-8 pipeline architecture requires less memory than MDC. Higher-radix algorithms5 reduce the computational complexity. In this work, the complex multiplier is realized with a digit-slicing multiplierless architecture. To improve power efficiency, the buffer is designed with clock gating, and a logic encoding technique is used for the counter design in the control unit.

1.2 Organization of the Paper


A brief review of the radix-2^5 FFT algorithm is given in Section 2, and the proposed FFT architecture is presented in Section 3. Section 4 describes the implementation and comparison, and Section 5 summarizes the conclusions.

2. Radix-2^5 FFT Algorithm


A Discrete Fourier Transform (DFT) of length N is expressed as follows:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \qquad (1)$$

where $W_N = e^{-j2\pi/N}$ denotes the twiddle factor, and k and n denote the frequency index and the time index, respectively.
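As a reference point for the FFT derivation, Eq. (1) can be evaluated directly. The Python sketch below is illustrative only and is not part of the proposed hardware; the function name `dft` is our own. It performs N^2 complex multiply-accumulates, which is the cost the FFT reduces to O(N log2 N).

```python
import cmath

def dft(x):
    """Directly evaluate Eq. (1): X(k) = sum_{n=0}^{N-1} x(n) * W_N^(n*k)."""
    N = len(x)
    X = []
    for k in range(N):
        acc = 0j
        for n in range(N):
            # W_N^(nk) = exp(-j*2*pi*n*k/N); n*k is reduced mod N to keep the angle small
            acc += x[n] * cmath.exp(-2j * cmath.pi * ((n * k) % N) / N)
        X.append(acc)
    return X

# Usage: the DFT of a unit impulse is flat.
print([round(abs(v), 6) for v in dft([1, 0, 0, 0])])  # prints [1.0, 1.0, 1.0, 1.0]
```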
The radix-2^k algorithm5 has the same butterfly structure as the radix-2 algorithm; the only difference is the number of twiddle factors in each stage. The 64-point FFT computation with the radix-2^5 algorithm consists of 6 stages. The algorithm is formulated using a 6-dimensional linear index mapping. The radix-2^5 algorithm expression9 is given below.
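$$n = \frac{N}{2}n_1 + \frac{N}{4}n_2 + \frac{N}{8}n_3 + \frac{N}{16}n_4 + \frac{N}{32}n_5 + n_6, \qquad n_1, n_2, n_3, n_4, n_5 = 0, 1; \quad n_6 = 0, \ldots, \frac{N}{32} - 1$$

$$k = k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5 + 32k_6, \qquad k_1, k_2, k_3, k_4, k_5 = 0, 1; \quad k_6 = 0, \ldots, \frac{N}{32} - 1$$

$$X(k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5 + 32k_6) = \sum_{n_6=0}^{N/32-1}\sum_{n_5=0}^{1}\sum_{n_4=0}^{1}\sum_{n_3=0}^{1}\sum_{n_2=0}^{1}\sum_{n_1=0}^{1} x\!\left(\frac{N}{2}n_1 + \frac{N}{4}n_2 + \frac{N}{8}n_3 + \frac{N}{16}n_4 + \frac{N}{32}n_5 + n_6\right) W_N^{nk} \qquad (2)$$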
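The index mapping of Eq. (2) can be checked numerically. In this minimal sketch (the function name `radix2_5_maps` is hypothetical), each digit tuple (d1, ..., d6) is used both as (n1, ..., n6) and as (k1, ..., k6). The check confirms that both maps cover 0 to N-1 exactly once and, for N = 64, that n is the 6-bit reversal of k when the digits coincide, since n1 is the most significant digit of n while k1 is the least significant digit of k.

```python
import itertools

def radix2_5_maps(N=64):
    """List the time indices n and frequency indices k of Eq. (2).

    The same digit tuple (d1..d6) is used as (n1..n6) and as (k1..k6).
    """
    if N % 32 != 0:
        raise ValueError("N must be a multiple of 32")
    n_vals, k_vals = [], []
    for d1, d2, d3, d4, d5 in itertools.product((0, 1), repeat=5):
        for d6 in range(N // 32):  # n6, k6 = 0, ..., N/32 - 1
            n_vals.append((N // 2) * d1 + (N // 4) * d2 + (N // 8) * d3
                          + (N // 16) * d4 + (N // 32) * d5 + d6)
            k_vals.append(d1 + 2 * d2 + 4 * d3 + 8 * d4 + 16 * d5 + 32 * d6)
    return n_vals, k_vals

n_vals, k_vals = radix2_5_maps(64)
# Both maps are bijections onto 0..63.
print(sorted(n_vals) == list(range(64)), sorted(k_vals) == list(range(64)))  # prints True True
# With identical digits, n is the 6-bit reversal of k.
print(all(n == int(format(k, "06b")[::-1], 2) for n, k in zip(n_vals, k_vals)))  # prints True
```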

Figure 1. 64-point signal flow graph using radix-2^5.



The twiddle factor is expressed as follows:

$$\begin{aligned}
W_N^{nk} &= W_N^{\left(\frac{N}{2}n_1 + \frac{N}{4}n_2 + \frac{N}{8}n_3 + \frac{N}{16}n_4 + \frac{N}{32}n_5 + n_6\right)\left(k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5 + 32k_6\right)} \\
&= \underbrace{(-1)^{n_1k_1}}_{\text{Stage 1 BU}}\;\underbrace{(-j)^{n_2k_1}}_{\text{Stage 1 TF}}\;\underbrace{(-1)^{n_2k_2}}_{\text{Stage 2 BU}}\;\underbrace{W_8^{n_3(k_1+2k_2)}}_{\text{Stage 2 TF}}\;\underbrace{(-1)^{n_3k_3}}_{\text{Stage 3 BU}}\;\underbrace{W_{32}^{(2n_4+n_5)(k_1+2k_2+4k_3)}}_{\text{Stage 3 TF}} \\
&\quad \times \underbrace{(-1)^{n_4k_4}}_{\text{Stage 4 BU}}\;\underbrace{(-j)^{n_5k_4}}_{\text{Stage 4 TF}}\;\underbrace{(-1)^{n_5k_5}}_{\text{Stage 5 BU}}\;\underbrace{W_N^{n_6(k_1+2k_2+4k_3+8k_4+16k_5)}}_{\text{Stage 5 TF}}\;W_{N/32}^{n_6k_6}
\end{aligned} \qquad (3)$$

where BU and TF label the butterfly-unit and twiddle-factor operations of each stage.
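The factorization in Eq. (3) can be verified by brute force. The sketch below (helper names `W` and `eq3_max_error` are hypothetical) compares W_N^{nk} with the factored product for every index combination, using W_32 for the stage-3 twiddle factor as written in Eq. (3). It checks only the algebra; it says nothing about the fixed-point behavior of the hardware.

```python
import cmath
import itertools

def W(m, e):
    """Twiddle factor W_m^e = exp(-j*2*pi*e/m)."""
    return cmath.exp(-2j * cmath.pi * e / m)

def eq3_max_error(N=64):
    """Largest |W_N^{nk} - factored product of Eq. (3)| over all index combinations."""
    worst = 0.0
    for n1, n2, n3, n4, n5, k1, k2, k3, k4, k5 in itertools.product((0, 1), repeat=10):
        for n6 in range(N // 32):
            for k6 in range(N // 32):
                n = (N // 2) * n1 + (N // 4) * n2 + (N // 8) * n3 + (N // 16) * n4 + (N // 32) * n5 + n6
                k = k1 + 2 * k2 + 4 * k3 + 8 * k4 + 16 * k5 + 32 * k6
                factored = ((-1) ** (n1 * k1) * (-1j) ** (n2 * k1)               # stage 1 BU, TF
                            * (-1) ** (n2 * k2) * W(8, n3 * (k1 + 2 * k2))         # stage 2 BU, TF
                            * (-1) ** (n3 * k3)                                    # stage 3 BU
                            * W(32, (2 * n4 + n5) * (k1 + 2 * k2 + 4 * k3))        # stage 3 TF
                            * (-1) ** (n4 * k4) * (-1j) ** (n5 * k4)               # stage 4 BU, TF
                            * (-1) ** (n5 * k5)                                    # stage 5 BU
                            * W(N, n6 * (k1 + 2 * k2 + 4 * k3 + 8 * k4 + 16 * k5))  # stage 5 TF
                            * W(N // 32, n6 * k6))                                 # remaining N/32-point DFT
                worst = max(worst, abs(W(N, n * k) - factored))
    return worst

print(eq3_max_error(64))  # expected to be on the order of floating-point round-off
```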
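The signal flow graph9 for the 64-point FFT using the radix-2^5 algorithm is shown in Figure 1.

3. The Proposed FFT Architecture

The architectures most commonly used in FFT processor design are pipelined and memory-based architectures. Pipelined architectures are the most popular because they offer higher performance and a regular data path. Pipelines are classified by the structure of their buffer (memory) as SDF or MDC. The SDF architecture requires less hardware and achieves a higher utilization rate than MDC, so we propose an SDF pipeline architecture for the 64-point FFT. Its block diagram is shown in Figure 2.

Figure 2. Block diagram of the 64-point SDF FFT.

The modules shown in Figure 2 are buffers of various sizes implemented as First In First Out (FIFO) memories for time multiplexing, a complex multiplier, radix-2 butterfly units and a control unit. Each FIFO functions as a shift register: it receives data from the butterfly module and feeds it back to the butterfly unit. The radix-2 butterfly operation is shown in Figure 3.

Figure 3. Radix-2 butterfly.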
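The FIFO feedback described above can be illustrated with a behavioral model. The sketch below is a plain radix-2 decimation-in-frequency SDF pipeline that uses the standard radix-2 twiddle factor W_{2D}^m in every stage; it does not reproduce the radix-2^5 twiddle grouping of Eq. (3), the word lengths, or the control timing of the proposed hardware, and the function names `sdf_stage` and `sdf_fft` are hypothetical. For N = 64 the cascaded FIFO depths are 32, 16, 8, 4, 2 and 1.

```python
import cmath
from collections import deque

def sdf_stage(stream, D):
    """One radix-2 DIF SDF stage with a feedback FIFO of depth D (behavioral model).

    In each block of 2D samples, the first D samples are stored in the FIFO.
    During the next D cycles the butterfly combines the stored sample a = x(m)
    with the incoming sample b = x(m + D): a + b goes to the output and
    (a - b) * W_{2D}^m is fed back into the FIFO, to be output while the next
    block's first half is being stored. The result is delayed by D cycles.
    """
    fifo = deque([0j] * D)                      # shift register of depth D
    out = []
    for t, b in enumerate(stream + [0j] * D):  # D extra cycles flush the FIFO
        a = fifo.popleft()
        if (t // D) % 2 == 0:                   # fill phase: store input, emit stored difference
            fifo.append(b)
            out.append(a)
        else:                                   # butterfly phase
            m = t % D
            fifo.append((a - b) * cmath.exp(-2j * cmath.pi * m / (2 * D)))
            out.append(a + b)
    return out[D:]                              # drop the D-cycle pipeline latency

def sdf_fft(x):
    """N-point FFT (N a power of two) through log2(N) cascaded SDF stages."""
    N = len(x)
    bits = N.bit_length() - 1
    stream = [complex(v) for v in x]
    D = N // 2
    while D >= 1:                               # FIFO depths N/2, N/4, ..., 1
        stream = sdf_stage(stream, D)
        D //= 2
    # A DIF pipeline emits X in bit-reversed order; undo that here.
    X = [0j] * N
    for i, v in enumerate(stream):
        X[int(format(i, f"0{bits}b")[::-1], 2)] = v
    return X

# Usage: the DC bin equals the sum of the inputs.
x = [complex(n % 7 - 3, 0) for n in range(64)]
print(abs(sdf_fft(x)[0] - sum(x)))
```

Each stage needs only one butterfly and one FIFO because the FIFO both delays the first half of a block and holds the scaled differences until the output path is free, which is why SDF uses less memory than MDC.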

3.1 Complex Multiplier

In the FFT, complex multiplication is one of the operations considered in performance analysis. A complex multiplication using three real multipliers is given by Expression (4) and shown in Figure 4. Various complex multipliers have been proposed earlier10.

$$(a_r + j a_i)(b_r + j b_i) = \{b_r(a_r - a_i) + a_i(b_r - b_i)\} + j\{b_i(a_r + a_i) + a_i(b_r - b_i)\} \qquad (4)$$

Because the term a_i(b_r - b_i) appears in both the real and the imaginary part, only three real multiplications are needed instead of four.

Figure 4. Complex multiplication.
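Expression (4) can be checked with a few lines of code. This is a minimal sketch (the function name `cmul3` is hypothetical) that computes the shared term once and reuses it, so exactly three real multiplications are performed.

```python
def cmul3(ar, ai, br, bi):
    """(ar + j*ai) * (br + j*bi) with three real multiplications, as in Eq. (4)."""
    shared = ai * (br - bi)            # a_i(b_r - b_i) appears in both parts
    real = br * (ar - ai) + shared     # = ar*br - ai*bi
    imag = bi * (ar + ai) + shared     # = ar*bi + ai*br
    return real, imag

print(cmul3(1, 2, 3, 4))  # (1 + 2j)(3 + 4j) = -5 + 10j, prints (-5, 10)
```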

Vol 9 (4) | January 2016 | www.indjst.org

3.1.1 Digit Slicing Multiplier


The digit-slicing based multiplier helps reduce the computational complexity. A binary number can be sliced into binary numbers of shorter length, and this concept is applied in designing the digit-slicing multiplier. The basic idea is represented by Expressions (5) and (6)11.

$$F = F_R + jF_I = \sum_{k=0}^{b-1} 2^{pk} F_{Rk} + j\sum_{k=0}^{b-1} 2^{pk} F_{Ik} \qquad (5)$$

where $F_{Rk} = \sum_{j=0}^{p-1} 2^{j} F_{Rk,j}$ and $F_{Ik} = \sum_{j=0}^{p-1} 2^{j} F_{Ik,j}$. In this equation, $F_{Rk,j}$ and $F_{Ik,j}$ take values of either zero or one. Any value whose absolute value is less than one can be represented in two's complement as

$$x = 2^{-(pb-1)} \sum_{k=0}^{b-1} 2^{pk} X_k, \qquad X_k = \sum_{j=0}^{p-1} 2^{j} X_{k,j} \qquad (6)$$

Here x is any number with an absolute value less than one, and x is sliced into b blocks, each block being p bits wide.

For example, consider X = A·B. In this multiplication, one operand (A) is divided into four parts, as shown in Figure 5:

part1 = A3A2A1A0
part2 = A7A6A5A4
part3 = A11A10A9A8
part4 = A15A14A13A12

Figure 5. Digit slice.

Each part is multiplied by B:

K0 = (A3A2A1A0)·B
K1 = (A7A6A5A4)·B
K2 = (A11A10A9A8)·B
K3 = (A15A14A13A12)·B

and the partial products are combined as

$$X = AB = K_0 + 2^4 K_1 + 2^8 K_2 + 2^{12} K_3$$

There are four different cases for the multiplication between the four-bit slices and the twiddle factors. Figure 6 shows the block diagram of the digit-slicing multiplierless design using the shift-and-add technique. Shift-and-add multiplication is similar to multiplication performed with paper and pencil.

Figure 6. Complex multiplication using digit slicing.
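The 16-bit example above can be sketched in software. This minimal sketch assumes an unsigned integer operand A and an integer B (sign handling of Eq. (6) and the twiddle-factor cases are omitted), and the names `shift_add_mul` and `digit_slice_mul` are hypothetical. Each 4-bit slice of A is multiplied by B with shift-and-add, and the partial products K0 to K3 are weighted by 2^0, 2^4, 2^8 and 2^12.

```python
def shift_add_mul(a, b, width):
    """Shift-and-add product of a `width`-bit unsigned a and an integer b."""
    acc = 0
    for i in range(width):
        if (a >> i) & 1:          # for every 1 bit of a ...
            acc += b << i         # ... add b shifted left by the bit position
    return acc

def digit_slice_mul(A, B, p=4, blocks=4):
    """X = A*B with an unsigned (p*blocks)-bit A split into p-bit slices.

    With p = 4 and blocks = 4 this follows the example in the text:
    K_k = (A_{4k+3} ... A_{4k}) * B and X = K0 + 2^4 K1 + 2^8 K2 + 2^12 K3.
    """
    if not 0 <= A < (1 << (p * blocks)):
        raise ValueError("A must be an unsigned integer of p*blocks bits")
    X = 0
    for k in range(blocks):
        part = (A >> (p * k)) & ((1 << p) - 1)  # k-th p-bit slice of A
        K = shift_add_mul(part, B, p)            # partial product K_k
        X += K << (p * k)                        # weight 2^(p*k)
    return X

print(digit_slice_mul(0xBEEF, 1234) == 0xBEEF * 1234)  # prints True
```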
3.2 Clock Gating

There are two components of power dissipation in a digital circuit: dynamic power and static power.

$$P_{\text{dynamic}} = \alpha\, f_{CLK}\, C\, V^2 \qquad (7)$$

where α represents the switching activity of the circuit, f_CLK is the clock frequency, V is the supply voltage and C is the load capacitance. The static power represents the transistor leakage power while the device is powered. The clock signal is a notorious source of power dissipation because of its high frequency: it does not perform useful computation but only serves the purpose of synchronization. Clock gating is the most popular method for reducing this power. It saves power by suppressing unnecessary clock activity inside the gated module, which reduces dynamic power dissipation. In the FFT, the buffer is involved in more switching activity. Figure 7 shows the buffer with clock gating.

Figure 7. Clock gating buffer.
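Eq. (7) also shows why gating the buffer clock helps: the clock net of an ungated buffer switches every cycle, whereas a gated one switches only when the buffer is enabled. The sketch below is an idealized illustration, not the authors' power model: it ignores the gating cell's own overhead, and the names `dynamic_power` and `buffer_clock_power` as well as the 2 pF capacitance, 1.8 V supply and 50% enable fraction are hypothetical values chosen only for illustration (the 166.6 MHz clock is taken from Table 2).

```python
def dynamic_power(alpha, f_clk, c, v):
    """Eq. (7): P_dyn = alpha * f_clk * C * V^2 (W, for Hz, F and V inputs)."""
    return alpha * f_clk * c * v ** 2

def buffer_clock_power(enable_fraction, f_clk, c_clk, v, gated):
    """Clock-net power of a buffer, idealized (gating-cell overhead ignored).

    Without gating the buffer's clock net switches every cycle (alpha = 1);
    with clock gating it switches only in the fraction of cycles the buffer is enabled.
    """
    alpha = enable_fraction if gated else 1.0
    return dynamic_power(alpha, f_clk, c_clk, v)

# Hypothetical numbers, for illustration only.
p_free = buffer_clock_power(0.5, 166.6e6, 2e-12, 1.8, gated=False)
p_gated = buffer_clock_power(0.5, 166.6e6, 2e-12, 1.8, gated=True)
print(f"ungated {p_free * 1e3:.3f} mW, gated {p_gated * 1e3:.3f} mW")
```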

3.3 Gray Counter Design

Logic-level power optimization techniques aim to reduce switching activity. The total number of bit transitions12 of an n-bit binary counter over one full counting cycle is

$$B_n = 2(2^n - 1) \qquad (8)$$

The total number of transitions for an n-bit Gray code counter is

$$G_n = 2^n \qquad (9)$$

Since power dissipation depends on switching activity, i.e., on the number of transitions, a Gray code counter is more efficient than a binary counter for designing a control circuit. The proposed 64-point FFT SDF architecture requires a 5-bit counter for its control circuit, which is designed using a Gray code sequence counter. Table 1 shows the number of transitions based on Equations (8) and (9).

Table 1. Number of transitions comparison

Number of bits | Transitions (binary counter) | Transitions (Gray counter)
5 | 62 | 32
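Equations (8) and (9) and the values in Table 1 can be reproduced by counting bit flips directly. This sketch (the names `to_gray` and `cycle_transitions` are hypothetical) uses the standard binary-reflected Gray code and includes the wrap-around from the last state back to zero.

```python
def to_gray(i):
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)

def cycle_transitions(code, n):
    """Total bit flips of an n-bit counter over one full cycle, wrap-around included."""
    total = 0
    for i in range(2 ** n):
        cur, nxt = code(i), code((i + 1) % 2 ** n)
        total += bin(cur ^ nxt).count("1")  # Hamming distance between consecutive states
    return total

n = 5
print(cycle_transitions(lambda i: i, n), 2 * (2 ** n - 1))  # binary, Eq. (8): prints 62 62
print(cycle_transitions(to_gray, n), 2 ** n)                # Gray, Eq. (9): prints 32 32
```

A Gray counter changes exactly one bit per count, whereas the binary counter's low-order bits toggle repeatedly, which is where the roughly 2:1 difference comes from.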

4. Results and Comparisons

The architecture of the proposed FFT processor was designed in Verilog and simulated to verify its functionality. Simulation and synthesis were performed using the Cadence design tool with a 180 nm CMOS technology. Table 2 shows the performance comparison between the proposed 64-point FFT processor and a normal FFT processor. The proposed design is based on the radix-2^5 algorithm, a digit-slice based multiplierless multiplier for multiplication, a clock-gated buffer and a Gray counter sequence for the control circuit.

The results show that the power consumption of the proposed FFT processor is 29.3 mW at 166 MHz. This is around 25% lower than the power consumption of the normal FFT processor design, but with a 2% increase in area.
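The 25% figure can be checked from the total power values listed in Table 2. The helper name `percent_reduction` is hypothetical; the arithmetic is simply the relative difference with respect to the baseline design.

```python
def percent_reduction(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline

# Total power from Table 2 (mW), both at 166.6 MHz.
baseline_mw, proposed_mw = 39.329, 29.348
print(f"{percent_reduction(baseline_mw, proposed_mw):.1f} %")  # prints 25.4 %
```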

Generated by: Encounter(R) RTL Compiler v11.20-s017_1
Generated on: Sep 24 2015 04:55:45 pm
Module: top_ver1_net_count_single1_cg
Technology library: tsmc18 1.0
Operating conditions: slow (balanced_tree)
Wireload mode: enclosed
Area mode: timing library

Instance | Cells | Leakage Power (nW) | Dynamic Power (nW) | Total Power (nW)
top_ver1_net_count_single1_cg | 13350 | 15111.048 | 29333223.933 | 29348334.981

Generated by: Encounter(R) RTL Compiler v11.20-s017_1
Generated on: Sep 16 2015 03:33:30 pm
Module: top_ver1_net_gray_single1
Technology library: tsmc18 1.0
Operating conditions: slow (balanced_tree)
Wireload mode: enclosed
Area mode: timing library

Instance | Cells | Leakage Power (nW) | Dynamic Power (nW) | Total Power (nW)
top_ver1_net_gray_single1 | 13512 | 15504.540 | 39314365.425 | 39329869.965
Table 2. Comparison of FFT processors

Design | Word length | Power (mW) | Frequency (MHz) | Area (No. of slices used)
FFT (without clock gating buffer and Gray sequence counter) | 16 | 39.329 | 166.6 | 13512
Proposed | 16 | 29.348 | 166.6 | 13350

5. Conclusion

In this paper, the radix-2^5 algorithm, a digit-slice based multiplier for complex multiplication, a clock-gated buffer and a Gray counter sequence for the control circuit are used to design a low-power 64-point FFT processor. The results show that the design with the clock-gated buffer and the Gray counter sequence for the control circuit lowers power consumption by 25% compared with the design without the clock-gated buffer and with a normal binary counter sequence. The proposed FFT processor design can be used in reconfigurable FFT processors for various OFDM-based applications requiring low power consumption.

6. Acknowledgement
The authors would like to thank the SRM University Research Lab for supporting this work.

7. References
1. Kalaivani D, Karthikeyen S. VLSI implementation of area-efficient and low power OFDM transmitter and receiver. Indian Journal of Science and Technology. 2015; 8(18):1-6.
2. Cooley JW, Tukey JW. An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation. 1965; 19(90):297-301.
3. Magar S, Shen S, Luikuo G, Fleming M, Aguilar R. An application specific DSP chip set for 100 MHz data rates. Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP-88; New York. 1988 Apr. p. 1989-92.
4. Baas BM. A low-power, high-performance, 1024-point FFT processor. IEEE Journal of Solid-State Circuits. 1999; 34(3):380-7.
5. Sarada V, Vigneswaran T. Reconfigurable FFT processor: A broader perspective survey. International Journal of Engineering and Technology. 2013; 5(2):949-56.
6. He S, Torkelson M. Designing pipeline FFT processor for OFDM (de)modulation. 1998 URSI International Symposium on Signals, Systems, and Electronics, ISSSE 98; Pisa. 1998. p. 257-62.
7. Groginsky HL, Works GA. A pipeline fast Fourier transform. IEEE Transactions on Computers. 1970; C-19(11):1015-9.
8. Maharatna K, Grass E, Jagdhold U. A 64-point Fourier transform chip for high-speed wireless LAN application using OFDM. IEEE Journal of Solid-State Circuits. 2004; 39(3):484-93.
9. Cho T, Lee H. A high-speed low-complexity modified radix-2^5 FFT processor for high rate WPAN applications. 2011 IEEE International Symposium on Circuits and Systems (ISCAS); Rio de Janeiro. 2011. p. 1259-62.
10. Yu C, Yen MH, Hsiung PA, Chen SJ. A low-power 64-point pipeline FFT/IFFT processor for OFDM applications. IEEE Transactions on Consumer Electronics. 2011; 57(1):40.
11. Sarada V, Vigneswaran T. Low power complex multiplier based FFT processor. International Journal of Engineering and Technology (IJET). 2015; 7(4):1323-8.
12. Yeap GK. Practical low power digital VLSI design. Norwell, MA, USA: Kluwer Academic Publishers; 1998.
