Indian Journal of Science and Technology, Vol 9(4), DOI: 10.17485/ijst/2016/v9i4/83322, January 2016
Abstract
Objectives: This paper proposes a low-power FFT (Fast Fourier Transform) processor design for OFDM (Orthogonal Frequency Division Multiplexing) applications, motivated by the demand for low-power portable communication devices.
Methods: The FFT processor is based on the SDF (Single-path Delay Feedback) pipelined architecture. A digit-slicing multiplier-less architecture realizes the complex multiplication. To reduce dynamic power dissipation, the proposed architecture applies a clock-gated buffer. The control circuit is implemented using a Gray code sequence instead of a binary code sequence. The design is implemented in Verilog HDL and synthesized with the Cadence tool.
Findings: The number of complex multiplications is also reduced by using the radix-2^5 algorithm. The results show power consumption reduced by up to 25%.
Improvements: This paper presents a 64-point FFT design; it can also be extended to higher N-point FFT designs.
Keywords: Clock Gating, FFT, Multiplier-less Multiplier, Radix-2^5, SDF
1. Introduction
1.1 Overview
$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \dots, N-1 \tag{1}$$
where $W_N$ denotes the twiddle factor, and $k$ and $n$ denote the frequency index and the time index, respectively.
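Equation (1) can be evaluated directly as a reference against which faster structures are checked. The following is a minimal sketch, assuming the usual convention $W_N = e^{-j2\pi/N}$ (the text does not spell it out); the function names `twiddle` and `dft` are illustrative only.

```python
import cmath


def twiddle(N, exponent):
    """Return W_N^exponent, assuming W_N = exp(-j*2*pi/N)."""
    return cmath.exp(-2j * cmath.pi * exponent / N)


def dft(x):
    """Direct O(N^2) evaluation of Equation (1)."""
    N = len(x)
    return [
        sum(x[n] * twiddle(N, n * k) for n in range(N))
        for k in range(N)
    ]
```

For N = 64 this costs 64 x 64 complex multiplications, which is the count that the FFT decompositions discussed in this paper aim to reduce.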
The radix-2^k algorithm⁵ has the same butterfly structure as radix-2; the only difference is in the number of twiddle factors for each stage. The 64-point FFT computation with the radix-2^5 algorithm uses the index decomposition

$$n = \frac{N}{2}n_1 + \frac{N}{4}n_2 + \frac{N}{8}n_3 + \frac{N}{16}n_4 + \frac{N}{32}n_5 + n_6, \qquad n_1, n_2, n_3, n_4, n_5 = 0, 1, \quad n_6 = 0, \dots, \frac{N}{32} - 1$$

$$k = k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5 + 32k_6, \qquad k_1, k_2, k_3, k_4, k_5 = 0, 1, \quad k_6 = 0, \dots, \frac{N}{32} - 1$$

so that

$$X(k) = \sum_{n_6=0}^{\frac{N}{32}-1} \sum_{n_5=0}^{1} \sum_{n_4=0}^{1} \sum_{n_3=0}^{1} \sum_{n_2=0}^{1} \sum_{n_1=0}^{1} x\!\left(\frac{N}{2}n_1 + \frac{N}{4}n_2 + \frac{N}{8}n_3 + \frac{N}{16}n_4 + \frac{N}{32}n_5 + n_6\right) W_N^{nk} \tag{2}$$

The twiddle factor is expressed as follows:

$$\begin{aligned}
W_N^{\left(\frac{N}{2}n_1 + \frac{N}{4}n_2 + \frac{N}{8}n_3 + \frac{N}{16}n_4 + \frac{N}{32}n_5 + n_6\right)\left(k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5 + 32k_6\right)}
={}& (-1)^{n_1k_1}\,(-j)^{n_2k_1}\,(-1)^{n_2k_2}\; W_8^{n_3(k_1 + 2k_2)} \\
&\times (-1)^{n_3k_3}\; W_{32}^{(2n_4 + n_5)(k_1 + 2k_2 + 4k_3)}\,(-1)^{n_4k_4}\,(-j)^{n_5k_4}\,(-1)^{n_5k_5} \\
&\times W_N^{n_6(k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5)}\; W_{N/32}^{n_6k_6}
\end{aligned} \tag{3}$$

The signal flow graph for the 64-point FFT using the radix-2^5 algorithm is shown in Figure 1⁹.

3. The Proposed FFT Architecture

The types of architecture mostly used in FFT processor design are pipeline and memory-based architectures. Pipeline-based architectures are the most popular because they are designed for increasing the performance and regularity of the data path. Pipelines are classified by the structure of the buffer (memory), known as SDF and MDC. The SDF architecture has a lower hardware requirement and a higher utilization rate than MDC. We propose an SDF architecture for the 64-point FFT. The block diagram is shown in Figure 2.

Figure 2. Block diagram of 64-point FFT SDF.

The modules shown in Figure 2 are buffers of various sizes implemented as First In First Out (FIFO) memories for time multiplexing, a complex multiplier, a radix-2 butterfly unit and a control unit. The FIFO functions as a shift register: it receives data from the butterfly module and feeds it back to the butterfly unit. The radix-2 butterfly operation is shown in Figure 3.
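The interplay between the feedback FIFO and the radix-2 butterfly can be sketched as a behavioural model. The sketch below is an illustration under stated assumptions, not the authors' Verilog design: it uses plain radix-2 decimation-in-frequency twiddle placement rather than the radix-2^5 placement of Equation (3), assumes FIFO depths of 32, 16, 8, 4, 2 and 1 for the six stages, assumes $W_N = e^{-j2\pi/N}$, and the names `sdf_stage` and `sdf_fft` are hypothetical.

```python
import cmath
from collections import deque


def sdf_stage(stream, depth):
    """One radix-2 SDF stage with a feedback FIFO holding `depth` samples."""
    fifo = deque([0j] * depth)
    out = []
    # Append `depth` zeros so the differences still held in the FIFO drain out.
    for t, x in enumerate(list(stream) + [0j] * depth):
        old = fifo.popleft()
        if (t // depth) % 2 == 0:
            # First half of a block: store the input, emit what the FIFO held.
            fifo.append(x)
            out.append(old)
        else:
            # Second half: butterfly between the FIFO output and the input.
            w = cmath.exp(-2j * cmath.pi * (t % depth) / (2 * depth))
            out.append(old + x)         # sum goes to the next stage
            fifo.append((old - x) * w)  # twiddled difference is fed back
    return out[depth:]  # discard the initial FIFO latency


def sdf_fft(x):
    """Chain the stages; len(x) must be a power of two."""
    n = len(x)
    stream = list(x)
    depth = n // 2
    while depth >= 1:
        stream = sdf_stage(stream, depth)
        depth //= 2
    # The pipeline emits results in bit-reversed order; restore natural order.
    bits = n.bit_length() - 1
    result = [0j] * n
    for i, value in enumerate(stream):
        result[int(format(i, f"0{bits}b")[::-1], 2)] = value
    return result
```

Each stage needs only one butterfly and one FIFO, which is the reason the SDF structure is described above as having a low hardware requirement. Calling `sdf_fft` on a 64-sample list should agree with a direct evaluation of Equation (1) up to rounding error.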
Complex Multiplier

Figure 4. Complex multiplication.

Digit Slicing Multiplier

In digit slicing, a number can be sliced into binary numbers of shorter length. This concept is applied to the design of the multiplier. The basic representation is given by Expressions¹¹ (5) and (6). Any value $x$ whose absolute value is less than one can be represented in two's complement as

$$x = 2^{-(pb-1)} \left[ \sum_{k=0}^{b-1} 2^{pk} X_k \right] \tag{5}$$

$$X_k = \sum_{j=0}^{p-1} 2^{j} X_{k,j} \tag{6}$$

Here $x$ is sliced into $b$ blocks, each block being $p$ bits wide, and the $X_{k,j}$ take values which are either zero or one. A complex value $F = F_R + jF_I$ is sliced in the same way into blocks $F_{Rk}$ and $F_{Ik}$, built from the bits $F_{Rk,j}$ and $F_{Ik,j}$.

For example, consider $X = AB$. In this multiplication one operand ($A$) is divided into four parts, as shown in Figure 5:

part 1 = $A_3A_2A_1A_0$
part 2 = $A_7A_6A_5A_4$
part 3 = $A_{11}A_{10}A_9A_8$
part 4 = $A_{15}A_{14}A_{13}A_{12}$

Figure 5. Digit slice.

Each part is multiplied by $B$:

$K_0 = (A_3A_2A_1A_0) \times B$
$K_1 = (A_7A_6A_5A_4) \times B$
$K_2 = (A_{11}A_{10}A_9A_8) \times B$
$K_3 = (A_{15}A_{14}A_{13}A_{12}) \times B$

$$X = A \times B = K_0 + 2^{4} K_1 + 2^{8} K_2 + 2^{12} K_3$$

There are four different cases for the multiplication between the four bits and the twiddle factors. Figure 6 shows the block diagram of the digit-slicing multiplier-less multiplier using the shift-and-add technique. Shift-and-add multiplication is similar to multiplication performed with paper and pencil.

Figure 6. Complex multiplication using digit slicing.
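The four-slice product X = K0 + 2^4 K1 + 2^8 K2 + 2^12 K3 can be illustrated with integers. This is a minimal sketch that assumes an unsigned 16-bit operand A and does not model the two's complement sign handling or the fractional scaling of the digit-slicing representation; the function name `digit_slice_multiply` and its parameters are hypothetical.

```python
def digit_slice_multiply(a, b, slice_bits=4, num_slices=4):
    """Multiply a (unsigned, slice_bits * num_slices wide) by b using
    digit slicing and shift-and-add only, with no multiplication operator."""
    mask = (1 << slice_bits) - 1
    total = 0
    for i in range(num_slices):
        # Extract slice i of A: bits [slice_bits*i, slice_bits*(i+1)).
        part = (a >> (slice_bits * i)) & mask
        # K_i = part * b, formed by adding a shifted copy of b per set bit.
        k_i = 0
        for bit in range(slice_bits):
            if (part >> bit) & 1:
                k_i += b << bit
        # Weight K_i by 2^(slice_bits * i) and accumulate.
        total += k_i << (slice_bits * i)
    return total
```

With the default arguments the loop forms K0 to K3 from the four 4-bit parts of A, so `digit_slice_multiply(0xBEEF, 0x1234)` equals `0xBEEF * 0x1234`. In hardware the four slice products can be formed in parallel, each needing at most four shifted additions.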
There are two components of power in a digital circuit: dynamic power and static power.

$$\text{Dynamic Power} = f_{CLK}\, C\, V^{2} \tag{7}$$

Static power is dissipated whenever the device is powered. The clock signal has been a notorious source of power dissipation because of its high frequency. It does not perform useful computation but serves the purpose of synchronization. Clock gating is the most popular method for power reduction. It saves power by reducing unnecessary clock activity inside the gated module, which reduces the dynamic power dissipation. In the FFT, the buffer is involved in more switching activity. The diagram shows the buffer with clock gating.

A logic-level power optimization technique is the reduction of switching activity. The total number of transitions¹² of an n-bit binary counter is

$$B_n = 2(2^{n} - 1) \tag{8}$$

The total number for a Gray code counter is

$$G_n = 2^{n} \tag{9}$$

Power dissipation is based on switching activity, i.e., the number of transitions. A Gray code counter is therefore more efficient than a binary counter for designing a control circuit. The proposed 64-point FFT SDF architecture requires a 5-bit counter for the control circuit, which is designed using a Gray code sequence counter. Table 1 shows the number of transitions based on Equations (8) and (9).

Table 1. Number of transition comparison

Number of bits    Number of transitions (binary)    Number of transitions (Gray)
5                 62                                32

4. Results and Comparisons

The architecture of the proposed FFT processor was designed in Verilog and simulated to verify its functionality. The simulation and synthesis were performed using the Cadence design tool with 180 nm CMOS technology. Table 2 shows the performance comparison between the proposed 64-point FFT and a normal FFT processor. The proposed FFT processor design is based on the radix-2^5 algorithm, a digit-slice-based multiplier-less multiplier for multiplication, a clock-gated buffer and a Gray counter sequence for the control circuit.

Synthesis power reports (for top_ver1_net_gray_single1 the report lists technology library tsmc18 1.0, operating conditions slow (balanced_tree), area mode timing library):

Instance                         Cells    Leakage Power (nW)    Dynamic Power (nW)    Total Power (nW)
top_ver1_net_count_single1_cg    13350    15111.048             29333223.933          29348334.981
top_ver1_net_gray_single1        13512    15504.540             39314365.425          39329869.965

Table 2. Comparisons of FFT processor

Design                                                                Word length    Power (mW)    Frequency (MHz)    Area (No. of slices used)
FFT (without clock gating buffer, with binary sequence counter)      16             39.329        166.6              13512
Proposed                                                              16             29.348        166.6              13350

5. Conclusion

In this paper, the radix-2^5 algorithm, a digit-slice-based multiplier for complex multiplication, a clock-gated buffer and a Gray counter sequence for the control circuit are used to design a low-power 64-point FFT processor. The results show that the design with the clock-gated buffer and the Gray counter sequence for the control circuit lowers power consumption by 25% compared with the design without the clock-gated buffer and with a normal binary counter sequence. Our proposed FFT ...
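The transition counts of Table 1 and the reported power reduction can be re-derived from Equations (8) and (9) and the Table 2 power figures. The following is a small check, assuming that transitions are counted over one full counter cycle including the wrap-around back to zero; the helper names are hypothetical.

```python
def binary_transitions(n):
    """Bit flips over one full cycle of an n-bit binary counter."""
    size = 1 << n
    return sum(bin(k ^ ((k + 1) % size)).count("1") for k in range(size))


def gray(k):
    """Reflected binary Gray code of k."""
    return k ^ (k >> 1)


def gray_transitions(n):
    """Bit flips over one full cycle of an n-bit Gray code counter."""
    size = 1 << n
    return sum(
        bin(gray(k) ^ gray((k + 1) % size)).count("1") for k in range(size)
    )


def power_reduction(baseline_mw, proposed_mw):
    """Fractional power saving of the proposed design over the baseline."""
    return (baseline_mw - proposed_mw) / baseline_mw


# Values taken from Table 1 (n = 5) and Table 2 (power in mW).
counts = (binary_transitions(5), gray_transitions(5))
saving = power_reduction(39.329, 29.348)
```

For n = 5 the counts match the 62 and 32 listed in Table 1, and the saving computed from the Table 2 powers is a little over 25%, in line with the figure quoted in the text.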
6. Acknowledgement
The author would like to thank SRM University Research
lab for supporting this work.
7. References
1. Kalaivani D, Karthikeyen S. VLSI implementation of area-efficient and low power OFDM transmitter and receiver. Indian Journal of Science and Technology. 2015; 8(18):1-6.
2. Cooley JW, Tukey JW. An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation. 1965; 19(90):297-301.
3. Magar S, Shen S, Luikuo G, Fleming M, Aguilar R. An application specific DSP chip set for 100 MHz data rates. Proceedings of International Conference on Acoustics, Speech, and Signal Processing, ICASSP-88; New York. 1988 Apr. p. 1989-92.
4. Baas BM. A low-power high-performance, 1024-point FFT processor. IEEE Journal of Solid-State Circuits. 1999; 34(3):380-7.