
FFT Circuit Design

Outline
Applications of FFT in Communications
Fundamental FFT Algorithms
FFT Circuit Design Architectures
Conclusions

DAB Receiver
Tuner OFDM Demodulator
256/512/1024/2048-point FFT

Channel Decoder

Mpeg2 Audio Decoder

Packet Demux Controller

Control Panel

WLAN OFDM System


TRANSMITTER

FEC Coder

S/P

IFFT 64-pt

Guard Interval Insertion

D/A LPF

Up Converter

MAC Layer
6Mbps ~ 54Mbps

RECEIVER

FEC Decoder

P/S

FFT 64-pt

Guard Interval Removal

A/D LPF

Down Converter

ADSL (Discrete Multi-tone) System


TRANSMITTER Data In
S/P

QAM encoders

IFFT 512-pt

add cyclic prefix

P/S

D/A + transmit filter

RECEIVER Data Out P/S


QAM decoders FEQ FFT 512-pt S/P

channel

remove cyclic prefix

TEQ

receive filter + A/D

Applications of FFT in Communications


Comm. System   Technique   FFT Size
WLAN           OFDM        64
DAB            OFDM        256/512/1024/2048
DVB            OFDM        2048/8192
ADSL           DMT         512
VDSL           DMT         512/1024/2048/4096

Outline
Applications of FFT in Communications
Fundamental FFT Algorithms
FFT Circuit Design Architectures
Conclusions

Fundamental FFT Algorithms


Discrete Fourier Transform Pair
Radix-2 FFT ($N = 2^m$)
Decimation-in-time (DIT)
Decimation-in-frequency (DIF)

FFT for composite N ($N = N_1 N_2$)
Cooley-Tukey Algorithms
Radix-r FFT

Discrete Fourier Transform Pair


Let

$$x[n] \overset{\mathrm{DFT}}{\longleftrightarrow} X[k]$$

denote a DFT pair. We have

$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{kn}, \quad k = 0, 1, \ldots, N-1,$$

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, W_N^{-kn}, \quad n = 0, 1, \ldots, N-1,$$

where $W_N = e^{-j(2\pi/N)}$.
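The DFT pair can be evaluated directly from its definition. The sketch below is a minimal, unoptimized illustration that uses only the Python standard library; the function names `dft` and `idft` are chosen here for illustration and do not come from the slides.

```python
import cmath

def dft(x):
    """Direct DFT: X[k] = sum_n x[n] * W_N^(k*n), with W_N = exp(-j*2*pi/N)."""
    N = len(x)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(x[n] * W ** (k * n) for n in range(N)) for k in range(N)]

def idft(X):
    """Direct inverse DFT: x[n] = (1/N) * sum_k X[k] * W_N^(-k*n)."""
    N = len(X)
    W = cmath.exp(-2j * cmath.pi / N)
    return [sum(X[k] * W ** (-k * n) for k in range(N)) / N for n in range(N)]

x = [1, 2, 3, 4]
X = dft(x)          # approximately [10, -2+2j, -2, -2-2j]
x_back = idft(X)    # recovers x up to rounding error
```

Both functions perform on the order of $N^2$ complex multiplications, which is the cost the FFT algorithms are designed to avoid.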

Observations
$W_N^k$ is N-periodic: $W_N^{k+N} = W_N^k$.
$W_N^k$ is conjugate symmetric: $W_N^{-k} = (W_N^k)^*$.
Both x[n] and X[k] are N-periodic.
If x[n] is real, then X[k] is conjugate symmetric, and vice versa.

Observations
A direct calculation requires approximately $N^2$ complex multiplications and additions.
FFT algorithms reduce the computational complexity to the order of $N \log N$.
Algorithms developed for the FFT also work for the IFFT with only minor modifications.

Example: Zero-Padding (WLAN)


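WLAN has 52 sub-carriers: use a 64-point FFT and set the unused inputs to null.

[Figure: subcarrier-to-IFFT-input mapping. Inputs 1 to 26 carry subcarriers #1 to #26, inputs 38 to 63 carry subcarriers #-26 to #-1, and inputs 0 and 27 to 37 are null. The IFFT produces time-domain outputs 0 to 63.]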
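The mapping of 52 subcarriers onto 64 IFFT inputs can be sketched as follows. This is an illustrative example only: the names `map_subcarriers` and `idft` and the all-ones test symbols are assumptions made here, and a direct inverse DFT stands in for a real IFFT block.

```python
import cmath

N_FFT = 64

def map_subcarriers(symbols):
    """Place symbols keyed by subcarrier index (-26..-1, 1..26) into 64 IFFT inputs."""
    bins = [0j] * N_FFT              # inputs 0 and 27..37 stay null
    for k, s in symbols.items():
        if k == 0 or abs(k) > 26:
            raise ValueError(f"subcarrier {k} is not a used subcarrier")
        bins[k % N_FFT] = s          # negative k wraps to inputs 38..63
    return bins

def idft(X):
    """Direct inverse DFT, used here in place of a hardware IFFT."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

# Hypothetical payload: the same symbol on every used subcarrier.
symbols = {k: complex(1, 0) for k in range(-26, 27) if k != 0}
bins = map_subcarriers(symbols)
time_samples = idft(bins)            # 64 time-domain outputs
```

The modulo operation implements the wrap-around shown in the figure: subcarrier #-26 lands on input 38 and subcarrier #-1 on input 63.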

Decimation-in-Time Radix-2 FFT


Assume N is an even number.

$$\begin{aligned}
X[k] &= \sum_{n=0}^{N-1} x[n]\, W_N^{kn} \\
     &= \sum_{r=0}^{N/2-1} x[2r]\, W_{N/2}^{kr} + W_N^{k} \sum_{r=0}^{N/2-1} x[2r+1]\, W_{N/2}^{kr} \\
     &= G[k] + W_N^{k} H[k], \quad k = 0, \ldots, N-1
\end{aligned}$$

Observations
G[k] is the DFT of the even samples of x[n].
H[k] is the DFT of the odd samples of x[n].
G[k] and H[k] are N/2-periodic.
$W_N^{k+N/2} = -W_N^{k}$.

DIT Radix-2 FFT


$$X[r] = G[r] + W_N^{r} H[r],$$

$$X[r + N/2] = G[r] + W_N^{r+N/2} H[r] = G[r] - W_N^{r} H[r], \quad 0 \le r < N/2.$$

[Figure: butterfly with inputs G[r] and H[r], branch weights $W_N^r$ and $-W_N^r$, and outputs X[r] and X[r+N/2].]
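The two butterfly equations lead directly to a recursive algorithm. The following is a minimal sketch, assuming N is a power of two; the name `fft_dit` is illustrative and the code favors clarity over speed.

```python
import cmath

def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT (len(x) must be a power of two)."""
    N = len(x)
    if N == 1:
        return list(x)
    G = fft_dit(x[0::2])                 # DFT of the even samples
    H = fft_dit(x[1::2])                 # DFT of the odd samples
    X = [0j] * N
    for r in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * r / N) * H[r]   # W_N^r * H[r]
        X[r] = G[r] + t
        X[r + N // 2] = G[r] - t         # uses W_N^(r+N/2) = -W_N^r
    return X

X = fft_dit([0, 1, 2, 3, 4, 5, 6, 7])
```

Each butterfly needs only one complex multiplication because the product $W_N^r H[r]$ is shared by both outputs.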

Decimation-in-Time Radix-2 FFT


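[Figure: butterfly for the radix-2 DIT FFT between the (M-1)th and Mth stages, drawn in two equivalent forms: one with branch multipliers $W_N^r$ and $-W_N^r$, and one with a single multiplier $W_N^r$ followed by a -1 branch. Both forms allow in-place computation.]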

Decimation-in-Time Radix-2 FFT


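First layer decimation

[Figure: 8-point example. The even samples x[0], x[2], x[4], x[6] feed an N/2-point DFT that produces G[0] to G[3]; the odd samples x[1], x[3], x[5], x[7] feed an N/2-point DFT that produces H[0] to H[3]. Butterflies with twiddle factors $W_N^0$ to $W_N^3$ combine them into X[0] to X[7].]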

Decimation-in-Time Radix-2 FFT


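[Figure: complete 8-point radix-2 DIT signal flow graph. Inputs are in bit-reversed order x[0], x[4], x[2], x[6], x[1], x[5], x[3], x[7]; outputs X[0] to X[7] are in natural order. The three butterfly stages use the twiddle factors $W_N^0$; then $W_N^0$, $W_N^2$; then $W_N^0$, $W_N^1$, $W_N^2$, $W_N^3$.]
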
Bit Reversal
[Figure: bit-reversal tree for N = 8. Splitting on n0, then n1, then n2 orders the inputs x[n2 n1 n0] as x[000], x[100], x[010], x[110], x[001], x[101], x[011], x[111].]
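The bit-reversed input order can be generated by reversing the binary digits of each index. A small sketch (the helper name `bit_reverse` is an assumption made for this example):

```python
def bit_reverse(i, bits):
    """Return the integer whose `bits`-bit binary representation is that of i, reversed."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)   # shift the lowest bit of i into r
        i >>= 1
    return r

# Input order for an 8-point DIT FFT (3 index bits).
order = [bit_reverse(i, 3) for i in range(8)]
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```

The result matches the input ordering x[0], x[4], x[2], x[6], x[1], x[5], x[3], x[7] of the 8-point DIT signal flow graph.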

Decimation-in-frequency Radix-2 FFT


Assume N is an even number.

$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{kn}, \quad k = 0, \ldots, N-1$$

$$X[2r] = \sum_{n=0}^{N/2-1} \left( x[n] + x[n + N/2] \right) W_{N/2}^{rn}$$

$$X[2r+1] = \sum_{n=0}^{N/2-1} \left( x[n] - x[n + N/2] \right) W_N^{n}\, W_{N/2}^{rn}$$

$$r = 0, \ldots, N/2 - 1$$

Decimation-in-frequency Radix-2 FFT


$$X[2r] = \sum_{n=0}^{N/2-1} g[n]\, W_{N/2}^{rn}$$

$$X[2r+1] = \sum_{n=0}^{N/2-1} h[n]\, W_N^{n}\, W_{N/2}^{rn},$$

where

$$g[n] = x[n] + x[n + N/2], \quad h[n] = x[n] - x[n + N/2], \quad r = 0, \ldots, N/2 - 1.$$
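The decimation-in-frequency split can be written as a recursion on g[n] and h[n]. This is a minimal sketch under the assumption that N is a power of two; `fft_dif` is an illustrative name.

```python
import cmath

def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency FFT (len(x) must be a power of two)."""
    N = len(x)
    if N == 1:
        return list(x)
    half = N // 2
    # g[n] = x[n] + x[n+N/2];  h[n] = (x[n] - x[n+N/2]) * W_N^n
    g = [x[n] + x[n + half] for n in range(half)]
    h = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / N)
         for n in range(half)]
    X = [0j] * N
    X[0::2] = fft_dif(g)   # even-indexed outputs X[2r]
    X[1::2] = fft_dif(h)   # odd-indexed outputs X[2r+1]
    return X

X = fft_dif([0, 1, 2, 3, 4, 5, 6, 7])
```

Compared with the DIT form, the twiddle multiplication happens after the add/subtract step rather than before it.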

Decimation-in-frequency Radix-2 FFT


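[Figure: butterfly for the radix-2 DIF FFT between the (M-1)th and Mth stages. The upper output is the sum of the two inputs; the lower output is their difference (the -1 branch) multiplied by $W_N^n$. The computation is in-place.]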

Decimation-in-frequency Radix-2 FFT


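First layer decimation

[Figure: 8-point example. Butterflies on x[0] to x[7] produce the sums g[0] to g[3] and the differences h[0] to h[3]; the differences are multiplied by $W_N^0$ to $W_N^3$. One N/2-point DFT of g[n] yields X[0], X[2], X[4], X[6]; another N/2-point DFT of the weighted h[n] yields X[1], X[3], X[5], X[7].]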

Decimation-in-frequency Radix-2 FFT


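[Figure: complete 8-point radix-2 DIF signal flow graph. Inputs x[0] to x[7] are in natural order; outputs appear in bit-reversed order X[0], X[4], X[2], X[6], X[1], X[5], X[3], X[7]. The three butterfly stages use the twiddle factors $W_N^0$, $W_N^1$, $W_N^2$, $W_N^3$; then $W_N^0$, $W_N^2$; then $W_N^0$.]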

Butterfly Comparison
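[Figure: side-by-side comparison of the two butterflies between the (M-1)th and Mth stages. Decimation-in-frequency: add/subtract first, then multiply the difference by $W_N^n$. Decimation-in-time: multiply the lower input by $W_N^r$ first, then add/subtract.]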

Cooley-Tukey Algorithm
$$N = N_1 N_2$$

$$n = N_2 n_1 + n_2, \quad 0 \le n_1 \le N_1 - 1, \quad 0 \le n_2 \le N_2 - 1,$$

$$k = k_1 + N_1 k_2, \quad 0 \le k_1 \le N_1 - 1, \quad 0 \le k_2 \le N_2 - 1.$$

2D point re-arrangement: $x[n] = x[N_2 n_1 + n_2]$, $X[k] = X[k_1 + N_1 k_2]$.

Cooley-Tukey Algorithms
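$$X[k] = \sum_{n_2=0}^{N_2-1} \left[ \left( \sum_{n_1=0}^{N_1-1} x[N_2 n_1 + n_2]\, W_{N_1}^{k_1 n_1} \right) W_N^{k_1 n_2} \right] W_{N_2}^{k_2 n_2},$$

where the inner sum is $G[n_2, k_1]$, the factor $W_N^{k_1 n_2}$ is the twiddle factor, and their product is $\tilde{G}[n_2, k_1] = G[n_2, k_1]\, W_N^{k_1 n_2}$.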
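The three steps of the Cooley-Tukey decomposition (inner $N_1$-point DFTs, twiddle multiplication, outer $N_2$-point DFTs) can be sketched as below. The helper names `dft` and `cooley_tukey` are illustrative, and the small DFTs are evaluated directly rather than recursively to keep the example short.

```python
import cmath

def dft(x):
    """Direct DFT used for the short inner and outer transforms."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def cooley_tukey(x, N1, N2):
    """N-point DFT with N = N1*N2, index maps n = N2*n1 + n2 and k = k1 + N1*k2."""
    N = N1 * N2
    if len(x) != N:
        raise ValueError("len(x) must equal N1 * N2")
    # Step 1: G[n2][k1] = N1-point DFT over n1 of x[N2*n1 + n2]
    G = [dft([x[N2 * n1 + n2] for n1 in range(N1)]) for n2 in range(N2)]
    # Step 2: multiply by the twiddle factor W_N^(k1*n2)
    Gt = [[G[n2][k1] * cmath.exp(-2j * cmath.pi * k1 * n2 / N) for k1 in range(N1)]
          for n2 in range(N2)]
    # Step 3: N2-point DFT over n2 for each k1, stored at k = k1 + N1*k2
    X = [0j] * N
    for k1 in range(N1):
        column = dft([Gt[n2][k1] for n2 in range(N2)])
        for k2 in range(N2):
            X[k1 + N1 * k2] = column[k2]
    return X

X = cooley_tukey(list(range(12)), 3, 4)   # 12-point DFT as 3 x 4
```

The factors need not be powers of two; any factorization $N = N_1 N_2$ works with the same index maps.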

Observations
$N_1 = 2$, $N_2 = N/2$: first stage of the decimation-in-frequency radix-2 FFT.
$N_1 = N/2$, $N_2 = 2$: first stage of the decimation-in-time radix-2 FFT.
In general, $N = N_1 N_2 \cdots N_n$. If $N = r^n$, the result is a radix-r FFT.

Radix-3 FFT (DIF)


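Assume N is a multiple of 3.

$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{kn}$$

$$X[3r] = \sum_{n=0}^{N/3-1} \left( x[n] + x[n + N/3] + x[n + 2N/3] \right) W_{N/3}^{rn}$$

$$X[3r+1] = \sum_{n=0}^{N/3-1} \left( x[n] + x[n + N/3]\, e^{-j2\pi/3} + x[n + 2N/3]\, e^{-j4\pi/3} \right) W_N^{n}\, W_{N/3}^{rn}$$

$$X[3r+2] = \sum_{n=0}^{N/3-1} \left( x[n] + x[n + N/3]\, e^{-j4\pi/3} + x[n + 2N/3]\, e^{-j2\pi/3} \right) W_N^{2n}\, W_{N/3}^{rn}$$

$$r = 0, \ldots, N/3 - 1$$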

Radix-3 FFT (DIF)


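[Figure: butterfly for the radix-3 DIF FFT between the (M-1)th and Mth stages, with complex-exponential branch weights inside the butterfly and output twiddle factors $W_N^n$ and $W_N^{2n}$.]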

Radix-4 FFT (DIF)


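Assume N is a multiple of 4.

$$X[4r] = \sum_{n=0}^{N/4-1} \left( x[n] + x[n + N/4] + x[n + 2N/4] + x[n + 3N/4] \right) W_{N/4}^{rn}$$

$$X[4r+1] = \sum_{n=0}^{N/4-1} \left( x[n] - j\,x[n + N/4] - x[n + 2N/4] + j\,x[n + 3N/4] \right) W_N^{n}\, W_{N/4}^{rn}$$

$$X[4r+2] = \sum_{n=0}^{N/4-1} \left( x[n] - x[n + N/4] + x[n + 2N/4] - x[n + 3N/4] \right) W_N^{2n}\, W_{N/4}^{rn}$$

$$X[4r+3] = \sum_{n=0}^{N/4-1} \left( x[n] + j\,x[n + N/4] - x[n + 2N/4] - j\,x[n + 3N/4] \right) W_N^{3n}\, W_{N/4}^{rn}$$

$$r = 0, \ldots, N/4 - 1$$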
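The four radix-4 sums translate into a recursion with four branches. The sketch below assumes N is a power of four; `fft_radix4_dif` is an illustrative name. Note that the butterfly itself needs only additions and multiplications by ±1 and ±j.

```python
import cmath

def fft_radix4_dif(x):
    """Recursive radix-4 decimation-in-frequency FFT (len(x) must be a power of four)."""
    N = len(x)
    if N == 1:
        return list(x)
    q = N // 4
    branches = [[], [], [], []]
    for n in range(q):
        a, b, c, d = x[n], x[n + q], x[n + 2 * q], x[n + 3 * q]
        w = cmath.exp(-2j * cmath.pi * n / N)                     # W_N^n
        branches[0].append(a + b + c + d)
        branches[1].append((a - 1j * b - c + 1j * d) * w)         # times W_N^n
        branches[2].append((a - b + c - d) * w ** 2)              # times W_N^(2n)
        branches[3].append((a + 1j * b - c - 1j * d) * w ** 3)    # times W_N^(3n)
    X = [0j] * N
    for m in range(4):
        X[m::4] = fft_radix4_dif(branches[m])                     # outputs X[4r+m]
    return X

X = fft_radix4_dif(list(range(16)))
```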

Radix-4 FFT (DIF)


[Figure: butterfly for the radix-4 DIF FFT between the (M-1)th and Mth stages.]

Split Radix FFT


Mixes the radix-2 and radix-4 architectures.
Computes the even transform coefficients with the radix-2 strategy and the odd coefficients with the radix-4 strategy.
Can perform the FFT for $N = 2^m$.
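One way to realize this mix is a decimation-in-frequency recursion that applies the radix-2 split to X[2r] and the radix-4 split to X[4r+1] and X[4r+3]. The following is a sketch under that assumption, with N a power of two; the slides do not give the equations, so the formulas used here are the standard DIF split-radix ones, and `fft_split_radix` is an illustrative name.

```python
import cmath

def fft_split_radix(x):
    """Recursive split-radix DIF FFT (len(x) must be a power of two)."""
    N = len(x)
    if N == 1:
        return list(x)
    if N == 2:
        return [x[0] + x[1], x[0] - x[1]]
    half, q = N // 2, N // 4
    # Radix-2 part: even outputs from the N/2-point sequence x[n] + x[n+N/2].
    even = [x[n] + x[n + half] for n in range(half)]
    # Radix-4 part: outputs 4r+1 and 4r+3 from two N/4-point sequences.
    odd1, odd3 = [], []
    for n in range(q):
        u = x[n] - x[n + half]
        v = x[n + q] - x[n + 3 * q]
        w = cmath.exp(-2j * cmath.pi * n / N)      # W_N^n
        odd1.append((u - 1j * v) * w)              # feeds X[4r+1]
        odd3.append((u + 1j * v) * w ** 3)         # feeds X[4r+3]
    X = [0j] * N
    X[0::2] = fft_split_radix(even)
    X[1::4] = fft_split_radix(odd1)
    X[3::4] = fft_split_radix(odd3)
    return X

X = fft_split_radix(list(range(8)))
```

Unlike the pure radix-4 recursion, this works for every power of two, including sizes such as 8 or 32 that are not powers of four.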

Simplify Butterfly Representations


[Figure: simplified butterfly symbols for radix-2 and radix-4.]

Split-Radix FFT


Computational Complexity
Method     # of Complex Multiplications   # of Complex Additions
DFT        N^2                            N(N-1)
Radix-2    (N/2) log2 N                   N log2 N
Radix-4    (3N/8) log2 N                  (3N/2) log2 N

The above numbers do not tell the whole story! Architecture is the key to trading off performance, cost, hardware complexity, etc.
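To get a feel for the numbers, the formulas in the table can be evaluated for a concrete size. The sketch below simply codes the table entries; `op_counts` is an illustrative name, and the radix-4 row is only meaningful when N is a power of four.

```python
import math

def op_counts(N):
    """Return {method: (complex multiplications, complex additions)} from the table."""
    lg = math.log2(N)
    return {
        "DFT": (N * N, N * (N - 1)),
        "Radix-2": (N / 2 * lg, N * lg),
        "Radix-4": (3 * N / 8 * lg, 3 * N / 2 * lg),
    }

for method, (mults, adds) in op_counts(64).items():
    print(f"{method:8s} multiplications={mults:6.0f} additions={adds:6.0f}")
```

For the 64-point WLAN case this gives 4096 multiplications for the direct DFT against 192 for radix-2 and 144 for radix-4.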

Outline
Applications of FFT in Communications
Fundamental FFT Algorithms
FFT Circuit Design Architectures
Conclusions

FFT Architecture Design Considerations


Trade-off among accuracy, speed, hardware complexity, and power consumption: the best-fit architecture is application dependent.
Main architecture differences:
Degree of parallelism: number and complexity of processing elements.
Control scheme: hardware utilization and data flow control.

Degree of Parallelism
One simple processing unit or multiple simple processing units
[Figure: 8-point radix-2 DIT signal flow graph with bit-reversed inputs x[0], x[4], x[2], x[6], x[1], x[5], x[3], x[7] and natural-order outputs X[0] to X[7], used to illustrate how many butterflies are mapped onto each processing unit.]


Degree of Parallelism
Simple processing units versus complex processing units

Memory-based FFT architecture


Single butterfly or processing element.
Required memory size = N.
A control unit ensures that the right data flows to the butterfly to compute the FFT.
Firmware-like.
Low complexity.
Low speed.
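In software terms, a memory-based architecture behaves like an in-place FFT: a single butterfly is reused for every operation, the data lives in one memory of N words, and the loop indices play the role of the control unit. The sketch below is an analogy under those assumptions rather than a hardware description; all names are illustrative.

```python
import cmath

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def butterfly(a, b, w):
    """The single radix-2 DIT processing element."""
    t = w * b
    return a + t, a - t

def fft_memory_based(mem):
    """In-place radix-2 DIT FFT on a memory of N words (N a power of two)."""
    N = len(mem)
    bits = N.bit_length() - 1
    if N != 1 << bits:
        raise ValueError("memory size must be a power of two")
    # Control step 1: reorder the memory into bit-reversed order.
    for i in range(N):
        j = bit_reverse(i, bits)
        if i < j:
            mem[i], mem[j] = mem[j], mem[i]
    # Control step 2: log2(N) passes, each issuing N/2 butterfly operations.
    span = 1
    while span < N:
        for start in range(0, N, 2 * span):
            for r in range(span):
                w = cmath.exp(-2j * cmath.pi * r / (2 * span))  # coefficient lookup
                lo, hi = start + r, start + r + span
                mem[lo], mem[hi] = butterfly(mem[lo], mem[hi], w)
        span *= 2
    return mem

mem = [complex(v) for v in range(8)]
fft_memory_based(mem)   # mem now holds X[0..7]
```

The butterflies execute one after another, which is why this style is low in complexity but also low in speed.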

Memory-based FFT Block Diagram

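[Block diagram: Data In enters an Input Buffer; a RAM, a Butterfly or Processing Element, and a Coefficient ROM or Generator form the datapath that produces Data Out; a Control Unit drives the blocks through Control signals.]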

Pipeline Architectures
FFT Signal Flow Graph
Multiple path delay commutator
Single path delay commutator
Single path delay feedback

Radix-2 Signal Flow Graph (DIT)


[Figure: 8-point radix-2 DIT signal flow graph (bit-reversed inputs, natural-order outputs X[0] to X[7]) mapped onto a pipeline of three BF2 butterfly units, each with a twiddle-factor ROM, with buffers between the stages.]

Radix-2 Signal Flow Graph (DIF)


[Figure: 8-point radix-2 DIF signal flow graph (natural-order inputs x[0] to x[7], bit-reversed outputs X[0], X[4], X[2], X[6], X[1], X[5], X[3], X[7]) mapped onto a pipeline of three BF2 butterfly units, each with a twiddle-factor ROM, with buffers between the stages.]

Multi-Path Delay Commutator


[Figure: one multi-path delay commutator stage: delay elements on the parallel paths, a commutator (switch), further delay elements, and a butterfly.]

Radix-2 Multi-Path Delay Commutator


[Figure: 8-point radix-2 DIF signal flow graph mapped onto a multi-path delay commutator pipeline built from delays, switches, and butterflies. The serial input 7 6 5 4 3 2 1 0 is split into two paths. The sample indices on the two paths are (3 2 1 0, 4 5 6 7) at the first butterfly, (5 4 1 0, 7 6 3 2) at the second, and (6 4 2 0, 7 5 3 1) at the third. Outputs appear in the order X[0], X[4], X[2], X[6], X[1], X[5], X[3], X[7].]

Radix-2 Multi-Path Delay Commutator


[Figure: R2MDC pipeline for N = 16: four BF2 butterflies and four commutators (C2), with delay elements of lengths 8, 4, 2, and 1 on the paths.]

Radix-4 Multi-Path Delay Commutator


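[Figure: R4MDC pipeline for N = 256: four BF4 butterflies and commutators (C4). Delay lengths are 192, 128, 64 at the input, then 16, 32, 48 and 48, 32, 16 around the next commutator, then 4, 8, 12 and 12, 8, 4, then 1, 2, 3 and 3, 2, 1.]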

Single Path Delay Commutator

[Figure: one single-path delay commutator stage: a delay commutator followed by a butterfly.]

Radix-2 Single Path Delay Commutator

[Figure: R2SDC pipeline for N = 16: four stages, each a delay commutator (DC2) followed by a butterfly (BF2).]

Radix-4 Single Path Delay Commutator

[Figure: R4SDC pipeline for N = 256: four stages, each a delay commutator (DC4) followed by a butterfly (BF4).]

Single Path Delay Feedback


[Figure: one single-path delay feedback stage: a butterfly with a delay element in its feedback loop.]

Radix-2 Single Path Delay Feedback


[Figure: R2SDF pipeline for N = 16: four BF2 stages with feedback delay lengths 8, 4, 2, and 1.]

Radix-4 Single Path Delay Feedback


[Figure: R4SDF pipeline for N = 256: four BF4 stages with feedback delay lengths 64x3, 16x3, 4x3, and 1x3.]

R22SDF
[Figure: R22SDF pipeline for N = 256: eight stages alternating between BF2 I and BF2 II butterflies, with feedback delay lengths 128, 64, 32, 16, 8, 4, 2, and 1.]

Hardware Comparison
Architecture   Multiplier #     Adder #     Memory Size   Control
R2MDC          2(log4 N - 1)    4 log4 N    3N/2 - 2      simple
R2SDF          2(log4 N - 1)    4 log4 N    N - 1         simple
R4MDC          3(log4 N - 1)    8 log4 N    5N/2 - 4      simple
R4SDF          log4 N - 1       8 log4 N    N - 1         medium
R4SDC          log4 N - 1       3 log4 N    2N - 2        complex
R22SDF         log4 N - 1       4 log4 N    N - 1         simple
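The comparison is easier to read with a concrete size plugged in. The sketch below evaluates the table formulas; `hardware_cost` is an illustrative name, and N is assumed to be a power of four so that log4 N is an integer.

```python
import math

def hardware_cost(N):
    """Return {architecture: (multipliers, adders, memory words)} from the table."""
    stages = math.log2(N) / 2          # log4(N)
    if stages != int(stages):
        raise ValueError("N must be a power of four")
    s = int(stages)
    return {
        "R2MDC":  (2 * (s - 1), 4 * s, 3 * N // 2 - 2),
        "R2SDF":  (2 * (s - 1), 4 * s, N - 1),
        "R4MDC":  (3 * (s - 1), 8 * s, 5 * N // 2 - 4),
        "R4SDF":  (s - 1,       8 * s, N - 1),
        "R4SDC":  (s - 1,       3 * s, 2 * N - 2),
        "R22SDF": (s - 1,       4 * s, N - 1),
    }

for arch, (mults, adds, mem) in hardware_cost(256).items():
    print(f"{arch:7s} multipliers={mults:2d} adders={adds:2d} memory={mem:3d}")
```

For N = 256 the formulas give R22SDF the multiplier count of R4SDF (3) together with the adder count of R2SDF (16), at the minimum memory size of N - 1 = 255.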

Conclusions
Efficient FFT computation is essential to many communication applications that use OFDM or DMT techniques.
A pipelined FFT architecture is applied where high real-time performance is required.
A memory-based FFT architecture can be adopted when cost matters more than speed.
The best-fit FFT architecture depends on application-specific requirements and trades off accuracy, speed, chip size, power consumption, etc.
