Professional Documents
Culture Documents
Consider the rst subdivision step of the radix-2 DIF FFT, which is depicted by
the Gentleman-Sande buttery in Figure 4.2. Recall that by dening y = x + x+ N
2
and z = x x+ N N
the two half-size subproblems to be solved are
2
1
N
2
Yk = X2k =
(4.1)
y k
N ,
=0
k = 0, 1, . . . , N/2 1,
and
1
N
2
Zk = X2k+1 =
(4.2)
=0
z k
N ,
2
k = 0, 1, . . . , N/2 1.
4.1
Recall that each subdivision step is dened recursively by the Gentleman-Sande buttery. Since there is no combination step, it is straightforward to derive an iterative
algorithm by simply subdividing each resulting subproblem iteratively until the problem size is one. This iterative subdividing process is presented in Algorithm 4.1.
Algorithm 4.1 The skeleton of an iterative radix-2 DIF FFT algorithm.
begin
NumOfProblems := 1
Initially: One problem of size N
ProblemSize := N
while ProblemSize > 1 do
Halve each problem
for K := 1 to NumOfProblems do
Compute the Gentleman-Sande buttery to divide
the Kth problem into two halves.
end for
NumOfProblems := NumOfProblems 2
ProblemSize := ProblemSize/2
end while
end
As discussed in Section 2.1, the twiddle factors are assumed to have been pre
computed and stored in an array w, with w[] = N
, 0 < N/2. To halve the
Kth subproblem, the pseudo-code shown in Figure 4.3 can now be inserted inside the
for-loop in Algorithm 4.1 The result is Algorithm 4.2.
Figure 4.3 The pseudo-code implementing the Gentleman-Sande buttery.
4.2
To have a concrete example for use throughout this book, the iterative DIF FFT
Algorithm 4.2 is applied to an input sequence of size N = 32 = 25 . Figure 4.4 displays
the initial contents of a, the ve stages of buttery computation, and the resulting
scrambled sequence (which will be deciphered in Section 4.4.)
To help identify the pairs of subproblems resulting from every stage of buttery
computation, the subproblems involving a[0], which initially contains the input element
x0 , are highlighted in Figure 4.4.
Note that, in general, for an input sequence of N = 2n elements, there are exactly
N/2 butteries in each of the log2 N stages of buttery computations.
Figure 4.4 The DIF FFT with naturally ordered input and bit-reversed output.
Adapted from E. Chu and A. George [28], Linear Algebra and its Applications, 284:95124, 1998.
With permission.
4.3
The actual performance of an algorithm often depends on whether the algorithm accesses data in the computer memory eciently. This is particularly true on todays
high-performance machines, where the oating-point arithmetic is so much faster than
the time required to transfer data in and out of memory. Using the example in Figure 4.4, it is a simple task to verify that after the rst stage of Algorithm 4.2 is com
pleted, only a subset of the N/2 pre-computed N
elements are accessed in each of the
subsequent stages. This subset becomes smaller and smaller as depicted in Figure 4.5
below for N/2 = 16.
Figure 4.5 Accessing the short w vector by the DIF FFT.
Figure 4.5 shows that the individual factors are accessed in array locations apart
by a stride equal to a power of two. This causes the so-called power-of-2 problem on
machines with a hierarchy of memory banks. To avoid this problem, it is common to
store the individual twiddle factors required in each stage in consecutive locations of a
long w vector of size N 1 as shown in Figure 4.6. It is a simple task to modify the
pseudo-code program of the FFT Algorithm 4.2 to employ the long w vector, and it is
left as an exercise.
4.4
To provide background, and to make this section self-contained, a brief review of the
binary number system is presented.
4.4.1
Binary i2 i1 i0
Verication: i2 22 + i1 21 + i0 20 = L
000
0 22 + 0 21 + 0 20 = 0
001
0 22 + 0 21 + 1 20 = 1
010
0 22 + 1 21 + 0 20 = 2
011
0 22 + 1 21 + 1 20 = 3
100
1 22 + 0 21 + 0 20 = 4
101
1 22 + 0 21 + 1 20 = 5
110
1 22 + 1 21 + 0 20 = 6
111
1 22 + 1 21 + 1 20 = 7
4.4.2
By making use of the binary number properties, the scrambled FFT output can now
be easily deciphered if a simple binary address based notation is used to assist in the
specication of Algorithm 4.2. That is, instead of referring to the decimal value of J
in a[J], one uses the binary representation of J and does all the thinking in terms of
binary numbers.
Observe that in Algorithm 4.2, the initial ProblemSize = N = 2n , and the variable
HalfSize takes on the values 2n1 , 2n2 , , 2, 1. Since the distance between a[J] and
a[J + HalfSize] is a power of 2, the binary number property 5 applies here. That is, the
addresses a[J] and a[J + HalfSize], denoted as binary numbers in1 i1 i0 , dier only
in bit ik when HalfSize = 2k and the algorithm can be expressed in terms of binary
addresses as shown below.
Algorithm 4.3 The radix-2 DIF FFT algorithm in terms of binary addresses:
begin
k := n 1
Initial problem size N = 2n
while k 0 do
Halve each problem
Apply Gentleman-Sande buttery computation
to all pairs of array elements whose
binary addresses dier in bit ik
k := k 1
end while
end
For N = 8 = 23 , the DIF FFT algorithm as described above consists of three stages
involving the sequence k = 2, 1, 0, with butteries being applied to all pairs of array
elements whose binary addresses dier in bit ik . In Table 4.2, the binary addresses of
all four pairs of a[J] and a[J + HalfSize] are given for all three stages of the buttery
computation.
Table 4.2 Decimal and binary addresses of a[J] and a[J + HalfSize] for N = 8, k =
2, 1, 0.
Buttery
Current
Decimal
Computation
HalfSize
Stage 1
N/2 = 22 = 4
(k = 2)
Stage 2
N/2 = 2 = 2
(k = 1)
2
N/2 = 2 = 2
3
Binary
Decimal
Binary
a[J]
a[J + HalfSize]
J + HalfSize
a[0]
000
a[4]
100
a[1]
001
a[5]
101
a[2]
010
a[6]
110
a[3]
011
a[7]
111
a[0]
000
a[2]
010
a[1]
001
a[3]
011
a[4]
100
a[6]
110
a[5]
101
a[7]
111
Stage 3
N/2 = 2 = 1
a[0]
000
a[1]
001
(k = 0)
N/23 = 20 = 1
a[2]
010
a[3]
011
N/2 = 2 = 1
a[4]
100
a[5]
101
N/23 = 20 = 1
a[6]
110
a[7]
111
follows the notation xi2 i1 i0 will mean xr , where i2 i1 i0 is the binary representation of
r. Similarly, a[i2 i1 i0 ] refers to the element a[m], where the binary representation of m
is i2 i1 i0 .
Then, the entire Table 4.2 can be replaced by a one-line shorthand notation, which
describes the three-stage DIF FFT process being applied to a[m] as the sequence
i2 i1 i0
2 i1 i0
2 1 i0
where ik indicates that the buttery computation involving the pairs in a dierent in
bit ik is being performed in the current stage, and k indicates that the corresponding buttery operations were performed in a previous stage. The transformation is
completed when butteries in all stages have been performed.
Which element of output X will be found in a[i2 i1 i0 ]? If equations (4.1) and (4.2)
are interpreted as implied by the buttery in Figure 4.2, the rst half of a will contain
the even numbered Xs after all stages are completed, and the second half will contain
the odd-numbered Xs. This in turn means that after the rst buttery step identied
The conclusion can now be immediately extended to the example for N = 32. The
entire Figure 4.4 can be replaced by the sequence
i4 i3 i2 i1 i0
4 i3 i2 i1 i0
4 3 i2 i1 i0
4 3 2 i1 i0
4 3 2 1 i0
with the understanding that on input, a[i4 i3 i2 i1 i0 ] contains xi4 i3 i2 i1 i0 , and on output,
a[i4 i3 i2 i1 i0 ] contains the bit-reversed Xi0 i1 i2 i3 i4 . Refer to Figure 4.4 for the decimal
subscripts of all 32 bit-reversed output elements Xi0 i1 i2 i3 i4 .
4.5
The twiddle factors corresponding to the three-stage DIF FFT may be specied by the
binary representation of the address Jtwiddle in Algorithm 4.2 as shown in Table 4.3.
Binary
w [Jtwiddle]
Jtwiddle w.r.t.
modified
v
i2il
io
NumberOfProblems = 1
w[o] = w::
(Halfsize = 4)
w [ l ]= w h
xi2il io
Shorthand
not at ion w.r.t .
modified x i 2 i l i o
w [ 2 ]= w:
w [ 3 ]= wR
v
72
il io
v
7 271i o
NumberOfProblems = 2
w[o]= w;
(Halfsize = 2 )
w [ 2 ]= w;
ioo = 00
ioo = 10
NumberOfProblems = 4
w [ o ]= w;
00
io0
WN
w; = 1
(Halfsize = 1 )
i1 i0
Observe that during the i2 i1 i0 stage, the twiddle factor w[i1 i0 ] = N
is used to
update a[1i1 i0 ] and during the 2 i1 i0 stage that follows, the updating of a[2 1i0 ] involves
i0 0
w[i0 0] = N
. Finally, the updating of a[2 1 1] during the last 2 1 i0 stage all involves
0
w[00] = N = 1.
For N = 32, the twiddle factors corresponding to the ve stages of buttery computations are