
Digital Signal Processors ET1304

Projects

Benny Sallberg, Ph.D.


Department of Electrical Engineering
Blekinge Institute of Technology
SE-371 75 Karlskrona, Sweden
E-mail: bsa@bth.se
November 3, 2010
Abstract
The projects in the course Digital Signal Processors ET1304 focus this year on various topics in speech signal processing. Example projects include noise reduction in speech signals, dynamic range compression, and automatic gain control. Four main project groups are used as umbrella projects. All projects within a main project group use the same underlying platform, but each project implements a unique speech signal processing method. It is recommended that all projects within the same main project group collaborate on the development of the common platform, while each group is required to implement and report its unique method individually.
Contents
1 Introduction
1.1 Filter banks
1.2 Analysis-synthesis
1.3 Decimation
1.4 Modulation
1.5 Perfect reconstruction
1.6 Subband processing
2 Filter Bank Topologies
2.1 FIR filter bank
2.2 IIR filter bank
2.3 WOLA filter bank
2.4 Uniform FFT modulated filter bank
3 Speech Processing Methods
3.1 Adaptive Gain Equalizer Noise Suppressor
3.2 Spectral Subtraction Noise Suppressor
3.3 Wiener Filter Noise Suppressor
3.4 Adaptive Gain Equalizer Speech Signal Suppressor
3.5 Spectral Subtraction Speech Signal Suppressor
3.6 Wiener Filter Speech Signal Suppressor
3.7 Automatic Gain Control
3.8 Calibrated Parametric Equalizer
4 Project groups
5 Project report and presentation
5.1 Method Evaluation
5.2 Project presentation
5.3 Project report
A WOLA filter bank reference implementation
A.1 Example initialization
A.2 Analysis
A.3 Synthesis
B Uniform FFT modulated filter bank reference implementation
B.1 Example initialization
B.2 Analysis
B.3 Synthesis
C Voice activity detection
C.1 Example initialization
C.2 VAD function
D Implementation Issues and Hints
D.1 Use MATLAB test vectors
D.1.1 Example of test vectors
D.2 Analysis filter bank verification
D.3 Analysis-synthesis filter bank verification
D.4 Estimating DSP processing load
D.5 Recursive Averages
D.5.1 Example 1: Power and magnitude spectrum estimation
D.5.2 Example 2: Background noise spectrum estimation

Document version 1.1, November 3, 2010
1 Introduction
Modern speech processing methods are usually implemented in the time-frequency domain. In the time-frequency domain a signal is represented not only as a function of time, which is the normal representation, but also as a function of frequency. The time-frequency representation can be achieved by filtering the input time signal through a bank of bandpass filters, where the bandpass filters have very little mutual overlap in frequency; see the example in fig. 1. Methods that transform a signal from the time domain to the time-frequency domain are usually referred to as filter banks (see, e.g., [1]). The transformed filter bank signals are denoted subband signals, since each of them describes a sub-band of the original signal. Signal processing methods are generally more efficient when they are implemented using filter banks, since the processing can be carried out in parallel for every subband. Filter bank processing can be considered a divide-and-conquer approach within signal processing, since larger problems are sub-divided into many smaller problems. Since filter banks are essential components of speech processing, and of signal processing in general, they are the focus of this year's course projects.
Figure 1: A bank of eight bandpass filters h_k[n], with Fourier transforms H_k[f] = F{h_k[n]}, comprise a filter bank. (The plot shows 10 log|H_k[f]|^2, in dB, versus normalized frequency f for the bands H_0[f] through H_7[f].)
1.1 Filter banks
Assume a sampled (and quantized) input signal x[n], such as the signal from the Analog-to-Digital Converter (ADC) of a sound codec connected to a DSP. The signal is uniformly sampled at the frequency F_s Hz, and we shall assume, for clarity, that the signal is represented using floating point arithmetic and that it takes values between -1 and 1. Now, a filter bank uses a set of K bandpass filters, where the impulse response of each bandpass filter is h_k[n], for k = 0 ... K-1, where index k = 0 corresponds to the DC frequency and index k = K/2 corresponds to the Nyquist frequency. The filter bank output signals are the convolution of the input signal with each of the bandpass filters, i.e., x_k[n] = (h_k * x)[n], where * denotes the linear convolution operator. Each filter bank output signal x_k[n] is referred to as a subband signal, since it describes one sub-band of the signal. It shall be clear that the signal set {x_0[n], ..., x_{K-1}[n]} is a time-frequency signal, since the index n represents time and the index k represents the sub-band index, which corresponds to a unique frequency band.
Figure 2: Analysis-synthesis filter bank, where the subband signals x_k[n], from the analysis, are processed by subband processors g_k to yield subband output signals y_k[n] = g_k{x_k[n]}, to be reconstructed in the synthesis. Note that if the subband processors are filters, the subband output signals are computed using the ordinary convolution y_k[n] = (g_k * x_k)[n].
1.2 Analysis-synthesis
The part of the filter bank that transforms a time signal into a corresponding time-frequency representation is referred to as an analysis filter bank. There exists a corresponding inverse transform that transforms a time-frequency signal, e.g., x_k[n], into a time signal, e.g., x[n]. The part of the filter bank that implements this inverse transform (which is also known as reconstruction) is referred to as a synthesis filter bank. An analysis-synthesis filter bank is illustrated in fig. 2.
1.3 Decimation
It may be noted that, in the definition above, we have a factor-K over-representation of the signal, since the subband signals have the same sampling frequency while each of them represents only 1/K of the frequencies of the original signal. Hence, it is common practice to decimate the sub-band signals by a certain factor D, i.e., the analysis filter bank output is decimated as x_k[nD] = (h_k * x)[nD]. The factor D is referred to as the decimation rate. The decimation rate is related to the number of subbands K by the over-sampling ratio O = K/D. The over-sampling ratio describes the redundancy in the sub-band data in relation to the original input time-signal data. If the over-sampling ratio is equal to one, which means that K = D, we say the filter bank is critically sampled or critically decimated. Normally, we allow a certain redundancy in the filter bank, e.g., by a factor of O = 2, which we refer to as a factor-two over-sampled filter bank. The decimation in the analysis filter bank is naturally compensated by a corresponding interpolation in the synthesis filter bank. Otherwise, we would change the sample rate from the input of the analysis filter bank to the output of the synthesis filter bank. Note that after the interpolation, the interpolated signal must be filtered in order to suppress aliasing terms due to the interpolation process.
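As a minimal sketch (pure Python; the values of K, D and the signal below are hypothetical, chosen only for illustration), decimation by D simply keeps every D-th sample of a subband signal, and the over-sampling ratio follows as O = K/D:

```python
def decimate(x, D):
    # Keep every D-th sample: the decimated signal is x[nD].
    return x[::D]

K, D = 8, 4          # 8 subbands, decimation rate 4
O = K / D            # over-sampling ratio O = K/D = 2: factor-two over-sampled
x0 = [float(n) for n in range(12)]   # a hypothetical subband signal
x0_dec = decimate(x0, D)             # keeps x0[0], x0[4], x0[8]
```

With O = 2 every subband carries twice the minimum amount of data, which relaxes the filter design requirements at the price of some redundancy.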
1.4 Modulation
Some filter bank topologies construct the bandpass filters h_k[n] by modulating (shifting in frequency) a prototype lowpass filter p[n] through multiplication by a complex sinusoid, h_k[n] = e^{j2πkn/K} p[n]; note that h_0[n] = p[n]. The modulation of the prototype filter into the filter bank bandpass filters leads, after some mathematical manipulation, to a highly compact and efficient structure, suitable for DSP implementation. Filter banks that use modulated bandpass filters are referred to as modulated filter banks, but they are merely specialized cases of the more general filter bank definition above.
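The modulation step can be sketched in a few lines of Python (the 3-tap prototype p[n] below is a hypothetical toy filter): band k is obtained by multiplying the prototype with the complex sinusoid e^{j2πkn/K}.

```python
import cmath

def modulate(p, k, K):
    # h_k[n] = e^{j 2 pi k n / K} * p[n]: shift the lowpass prototype to band k.
    return [cmath.exp(2j * cmath.pi * k * n / K) * p[n] for n in range(len(p))]

p = [0.25, 0.5, 0.25]    # hypothetical lowpass prototype
K = 4
h0 = modulate(p, 0, K)   # band 0: the prototype itself, h_0[n] = p[n]
h1 = modulate(p, 1, K)   # band 1: prototype shifted up by one band spacing
```

Note that the modulated filters are complex valued for k other than 0 and K/2, which is why modulated filter banks produce complex subband signals.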
1.5 Perfect reconstruction
Denote by H_A{·} the analysis filter bank operation and by H_S{·} the synthesis filter bank operation. Then, if a certain filter bank yields x[n] = H_S{H_A{x[n]}}, we say the filter bank provides perfect reconstruction. Perfect reconstruction is normally required when you compress sound and speech, but it may not be required when you implement adaptive or time-variant methods, such as the methods in the projects of this course. Hence, we shall allow non-perfect reconstruction in our filter banks, i.e., we require only that they provide an approximate reconstruction x[n] ≈ H_S{H_A{x[n]}}.
1.6 Subband processing
In the sequel, we shall assume a system model according to fig. 2, where the filter bank analysis stage produces a set of subband signals x_k[n] from a time signal x[n]. The subband signals are subject to subband processors, denoted g_k, which compute subband output signals y_k[n] = g_k{x_k[n]}. The filter bank synthesis stage transforms the output subband signals y_k[n] to a time representation y[n]. The general work flow when using filter banks is:
1. Filter bank analysis of x[n] to yield x_k[n],
2. subband processing of x_k[n] to yield subband output signals y_k[n] = g_k{x_k[n]}, where g_k are subband processors,
3. filter bank synthesis of y_k[n] to yield y[n],
4. update of the filter bank states.
To this point, we have not discussed the details of the subband processors g_k; we have only considered that they provide a mapping from x_k[n] to y_k[n]. The exact definition of g_k shall become clear in section 3, where the signal processing methods are defined.
2 Filter Bank Topologies
Filter banks exist in numerous configurations today (see, e.g., [1]). We shall focus on four filter bank topologies, namely:
1. FIR filter bank (2.1),
2. IIR filter bank (2.2),
3. WOLA filter bank (2.3),
4. FFT modulated filter bank (2.4).
2.1 FIR filter bank
An FIR filter bank uses a bank of equal-length FIR bandpass filters in the analysis stage. Let h_k[n] denote the impulse response of each bandpass filter, where k = 0 ... K-1 refers to the subband index and the filters are defined for n = 0 ... N-1, where N is the filter length. Then each subband signal x_k[n] is computed using the convolution sum as

    x_k[n] = (x * h_k)[n] = sum_{l=0}^{N-1} x[n-l] h_k[l].    (1)

The convolution sum above represents the analysis stage of an FIR filter bank. The K bandpass filters can be designed in MATLAB using the fir1 function, and the coefficients of the designed filters are stored in the DSP memory. After the subband processor outputs are computed, i.e., y_k[n] = g_k{x_k[n]}, the synthesis stage of an FIR filter bank simply sums the subband output signals y_k[n] to yield the output signal y[n] according to

    y[n] = sum_{k=0}^{K-1} y_k[n].    (2)
The structure of an analysis-synthesis FIR filter bank is illustrated in figure 3. Note that there is no decimation in this filter bank structure.

Figure 3: Analysis-synthesis FIR filter bank structure; note that the subband selective analysis bandpass filters h_k[n] are FIR filters.
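Equations (1) and (2) can be sketched in pure Python (the two-band bank below is a hypothetical toy example, not a bank designed with fir1). With a complementary low/high pair that sums to a unit impulse, and identity subband processors, the synthesis reproduces the input:

```python
def fir(x, h):
    # Convolution sum (1): x_k[n] = sum_l x[n-l] h[l].
    return [sum(h[l] * x[n - l] for l in range(len(h)) if 0 <= n - l < len(x))
            for n in range(len(x) + len(h) - 1)]

def fir_filter_bank(x, bank):
    subbands = [fir(x, h) for h in bank]        # analysis stage, eq. (1)
    y = [sum(col) for col in zip(*subbands)]    # synthesis stage, eq. (2)
    return subbands, y

bank = [[0.5, 0.5], [0.5, -0.5]]   # hypothetical complementary low/high pair
x = [1.0, 0.0, -1.0, 0.0]
subbands, y = fir_filter_bank(x, bank)
```

Since the two filters sum to the unit impulse [1, 0], the output y equals the input x (zero-padded by the filter transient).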
2.2 IIR filter bank
An IIR filter bank is very similar to an FIR filter bank in that it also uses a bank of bandpass filters in the analysis stage. However, the bandpass filters are now IIR filters described by the rational transfer function

    H_k[z] = B_k[z] / A_k[z] = (b_0 + b_1 z^{-1} + ... + b_M z^{-M}) / (1 + a_1 z^{-1} + ... + a_P z^{-P}),    (3)

where b_m are FIR (feed-forward) coefficients and a_p are IIR (feedback) coefficients. One example of an IIR filter bank using eight second-order sections is illustrated in figure 4. Each subband signal x_k[n] is computed using a difference equation

    x_k[n] = -sum_{p=1}^{P} a_p x_k[n-p] + sum_{m=0}^{M} b_m x[n-m].    (4)
The design of the K IIR bandpass filters can be done in MATLAB, and the a_p and b_m polynomial coefficients of the designed filters are stored in the DSP memory. After the subband processor outputs are computed, i.e., y_k[n] = g_k{x_k[n]}, the synthesis stage of an IIR filter bank simply sums the subband output signals y_k[n] to yield the output signal y[n] according to

    y[n] = sum_{k=0}^{K-1} y_k[n].    (5)

The structure of an analysis-synthesis IIR filter bank is illustrated in figure 5. Note that there is no decimation in this filter bank structure.
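The difference equation (4) can be sketched directly in Python; the second-order section below (with hypothetical values r = 0.9, theta = pi/4) follows the form used in figure 4:

```python
import math

def iir(x, b, a):
    # Difference equation (4): y[n] = -sum_p a[p] y[n-p] + sum_m b[m] x[n-m],
    # with a = [a_1, ..., a_P] (a_0 = 1 implicit).
    y = []
    for n in range(len(x)):
        acc = sum(b[m] * x[n - m] for m in range(len(b)) if n - m >= 0)
        acc -= sum(a[p - 1] * y[n - p] for p in range(1, len(a) + 1) if n - p >= 0)
        y.append(acc)
    return y

# Hypothetical second-order section: H[z] = (1 - z^-2)/(1 - 2 r cos(theta) z^-1 + r^2 z^-2).
r, theta = 0.9, math.pi / 4
b = [1.0, 0.0, -1.0]                      # numerator 1 - z^-2
a = [-2 * r * math.cos(theta), r * r]     # denominator coefficients a_1, a_2
x = [1.0] + [0.0] * 4                     # unit impulse
h = iir(x, b, a)                          # impulse response of one band
```

The feedback terms enter with a minus sign exactly as in (4); r controls the bandwidth of the resonance and theta its center frequency.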
Figure 4: Example of an IIR filter bank using eight bands. The IIR bandpass filters are implemented, in this example, using second-order sections on the form H_k[z] = (1 - z^{-2}) / (1 - 2r cos(θ_k) z^{-1} + r^2 z^{-2}), where r and θ_k determine the -3 dB bandwidth and the center frequency of each bandpass filter, respectively.
2.3 WOLA filter bank
The Weighted OverLap-Add (WOLA) filter bank is an efficient filter bank technique frequently used in DSPs, see, e.g., [2, 3]. Four variables, together with an analysis window function w[n], define the WOLA filter bank, namely: L, the length of the analysis window; D, the decimation rate (block rate); K, the number of subbands; and D_F, the synthesis window decimation rate.

The analysis stage, see figure 6, accepts a block of D new input data samples. Each new block is fed into an input FIFO buffer u[n], of length L samples. The data in the input FIFO is element-wise weighted by the analysis window function and stored into a temporary buffer t_1[n] = u[n] w[n], of length L samples. The temporary buffer is time-folded into another temporary vector t_2[n], of length K samples. The time-folding means that the elements of t_1[n] are modulo-K-added into t_2[n], according to

    t_2[n] = sum_{m=0}^{L/K-1} t_1[n + mK].    (6)

The temporary buffer t_2[n] is circularly shifted by K/2 samples in order to produce a zero-phase signal for the FFT. This means that the upper half of t_2[n] swaps place with the lower half of t_2[n]. The circularly shifted buffer t_2[n] is then fed into a K-sized FFT to compute the subband signals x_k[n].

The synthesis stage of the WOLA filter bank implements the actual weighted overlap-add procedure, i.e., the WOLA procedure. The synthesis stage, see figure 7, starts by applying a size-K IFFT to the processed subband signals y_k[n]. The IFFT output is circularly shifted K/2 samples, to counteract the circular shift used in the analysis stage, and the circularly shifted data is stored in a temporary buffer t_3[n], of size K samples. The buffer t_3[n] is then stacked, by repetition, into the buffer t_4[n] of length L/D_F, where L is the analysis window length and D_F is the synthesis window decimation factor. The buffer t_4[n] is weighted by a synthesis window function z[n] of size L/D_F, defined as z[n] = w[nD_F], i.e., a factor-D_F decimated analysis window function. The weighted data is summed with the data in the output FIFO t_5[n], of length L/D_F, and the output FIFO data is over-written with the summation result, i.e., t_5[n] ← t_5[n] + z[n] t_4[n]. The output FIFO is then shifted left by D samples, i.e., D zeros are filled in from the FIFO's rear, and the shifted-out data is the actual output data block, y[nD].
Figure 5: Analysis-synthesis IIR filter bank structure. Note that the subband selective analysis bandpass filters H_k[z] are IIR filters.
The WOLA filter bank is popular due to its simplicity and because many DSP technologies have built-in support for circular buffers and FIFOs, which of course helps when implementing the WOLA filter bank. Also, DSPs usually have direct support for the FFT/IFFT, which enables even more efficient implementations.

Due to the complexity of this filter bank structure, a reference MATLAB implementation of the WOLA filter bank is provided in appendix A.
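The analysis steps above (FIFO update, windowing, time-folding (6), circular shift, FFT) can be sketched in pure Python. The tiny configuration L = 8, K = 4, D = 2 and the rectangular window are hypothetical choices for illustration only, and a naive DFT stands in for the size-K FFT:

```python
import cmath

def dft(t):
    # Naive DFT standing in for the size-K FFT.
    K = len(t)
    return [sum(t[n] * cmath.exp(-2j * cmath.pi * k * n / K) for n in range(K))
            for k in range(K)]

def wola_analysis_step(fifo, block, w, K):
    # One WOLA analysis step: shift D new samples into the length-L input FIFO,
    # window, time-fold modulo K (eq. 6), circular-shift by K/2, then transform.
    D = len(block)
    fifo[:] = fifo[D:] + block                     # update input FIFO u[n]
    t1 = [u * wn for u, wn in zip(fifo, w)]        # t1[n] = u[n] w[n]
    t2 = [sum(t1[n + m * K] for m in range(len(w) // K)) for n in range(K)]
    t2 = t2[K // 2:] + t2[:K // 2]                 # circular shift by K/2
    return dft(t2)                                 # subband samples x_k

L, K, D = 8, 4, 2          # hypothetical tiny configuration
w = [1.0] * L              # rectangular analysis window (for illustration)
fifo = [0.0] * L
xk = wola_analysis_step(fifo, [1.0, 0.0], w, K)
```

A real implementation would of course use a proper analysis window and an FFT routine; the sketch only shows the order of the buffer operations.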
2.4 Uniform FFT modulated filter bank
We use a set of K bandpass filters, H_k[z] for k = 0 ... K-1, with impulse response functions h_k[n], each of length N taps. These bandpass filters are created by modulating (frequency shifting) a lowpass prototype filter (which is equal to the first bandpass filter at DC frequency, i.e., h_0[n]), according to

    h_k[n] = W_K^{-kn} h_0[n], for n = 0 ... N-1,    (7)

where W_K = e^{-j2π/K}. The Z-transform of each modulated bandpass filter is

    H_k[z] = sum_n h_k[n] z^{-n} = sum_n W_K^{-kn} h_0[n] z^{-n} = sum_n h_0[n] (W_K^k z)^{-n} = H_0[W_K^k z].    (8)

So, it is clear that each bandpass filter H_k[z] = H_0[W_K^k z] is a frequency shifted (modulated) version of the prototype filter H_0[z]. The term uniform in the section title comes from the fact that the filters are uniformly distributed on the frequency axis by the modulation process. There are also non-uniform modulation techniques, which are not described here.

We shall now assume that the bandpass signals, i.e., the input signal filtered by each modulated bandpass filter, are subject to decimation by a factor D, where D = K/O and O denotes the over-sampling ratio. A filter with a following decimator can be implemented using a polyphase decomposition, see, e.g., [1]. The corresponding polyphase implementation is achieved by dividing the prototype filter H_0[z] into D polyphase components, denoted P_d[z] with impulse response functions p_d[n] for d = 0 ... D-1, according to

    H_0[z] = sum_{d=0}^{D-1} z^{-d} P_d[z^D],    (9)
    p_d[n] = h_0[d + nD],    (10)
    P_d[z] = sum_n p_d[n] z^{-n} = sum_n h_0[d + nD] z^{-n}.    (11)
Figure 6: Analysis stage of a WOLA filter bank structure (reprinted from [3]).
Figure 7: Synthesis stage of a WOLA filter bank structure (reprinted from [3]).

Hence, inserting (9) and (11) into (8) yields a polyphase representation of each modulated bandpass filter H_k[z] as follows:

    H_k[z] = H_0[W_K^k z] = sum_{d=0}^{D-1} W_K^{-dk} z^{-d} P_d[(W_K^k z)^D]
           = sum_{d=0}^{D-1} W_K^{-dk} z^{-d} sum_n h_0[d + nD] W_K^{-nkD} z^{-nD}
           = sum_{d=0}^{D-1} W_K^{-dk} z^{-d} sum_n h_0[d + nD] W_O^{-nk} z^{-nD},    (12)

where we used that W_K^{-nkD} = W_O^{-nk}. The factor W_O^{-nk} is used to introduce O groups of subband polyphase components, where each group is described by the index g = 0 ... O-1. Each group g contains the subbands having indices k = g + lO for l = 0 ... D-1. Considering only the indices k belonging to group g, i.e., k = g + lO, yields W_O^{-nk} = W_O^{-n(g+lO)} = (W_O^{-g})^n. Each subband component of group g is expressed as

    H_{g+lO}[z] = sum_{d=0}^{D-1} W_K^{-d(g+lO)} z^{-d} sum_n h_0[d + nD] W_O^{-n(g+lO)} z^{-nD}
                = sum_{d=0}^{D-1} W_D^{-dl} z^{-d} sum_n q_{g,d}[n] z^{-nD}, with q_{g,d}[n] = W_K^{-dg} h_0[d + nD] (W_O^{-g})^n,
                = sum_{d=0}^{D-1} W_D^{-dl} z^{-d} Q_{g,d}[z^D]
                = ( W_D^{-0l} ... W_D^{-(D-1)l} ) ( z^0 Q_{g,0}[z^D] ; ... ; z^{-(D-1)} Q_{g,D-1}[z^D] ),    (13)
where we used the fact that W_K^{-dlO} = W_D^{-dl}. We denote the filters q_{g,d}[n] = W_K^{-dg} (W_O^{-g})^n h_0[d + nD] as the d-th modified polyphase component of group g, and they have the Z-transforms Q_{g,d}[z] = sum_n q_{g,d}[n] z^{-n}. We now stack all subbands related to the g-th group, i.e., the subbands with indices k = g + lO for l = 0 ... D-1, in a vector H_g[z] according to

    H_g[z] = ( H_g[z] ; H_{g+O}[z] ; ... ; H_{g+(D-1)O}[z] )
           = [ W_D^{-dl} ]_{l,d=0}^{D-1} ( z^0 Q_{g,0}[z^D] ; ... ; z^{-(D-1)} Q_{g,D-1}[z^D] )
           = D W_D^{-1} ( z^0 Q_{g,0}[z^D] ; ... ; z^{-(D-1)} Q_{g,D-1}[z^D] ),    (14)
where W_D is identified as the D-th order discrete Fourier transform (DFT) matrix, i.e., W_D^{-1} is the D-th order inverse discrete Fourier transform (IDFT) matrix. Note that the subband indices within each group are reordered and thus need to be rearranged in increasing order after each analysis step. Note also that the size of the DFT/IDFT operations is D, the decimation rate. This filter bank structure requires that a block of D new data samples is input to the filter bank for each update of the filter bank states. Hence, the sampling rate of the subband signals is F_S/D, where F_S is the sampling frequency of the fullband signal x[n]. If D is a power of two, i.e., D = 2^p for some integer p > 0, then the DFT/IDFT operation is efficiently implemented using the fast Fourier transform (FFT) and its inverse, the IFFT. Many DSPs have built-in support for the FFT/IFFT, which may be exploited to yield an efficient implementation.

The analysis stage of this filter bank is illustrated in figure 8.
Figure 8: Uniform FFT modulated analysis filter bank.
So far, only the analysis filter bank has been discussed. The synthesis is derived in a similar fashion as the analysis, but its derivation is left as an exercise for the interested student. The synthesis of this filter bank is illustrated in figure 9. It may, however, be noted that the major difference between analysis and synthesis lies in the filters q_{g,d}[n], of the analysis, which are replaced by r_{g,d}[n], in the synthesis. These filters r_{g,d}[n] are referred to as the d-th modified polyphase components of group g, but they are based on a different prototype filter.

Figure 9: Uniform FFT modulated synthesis filter bank.

It is also noted that the subband signals of the FFT modulated filter bank are complex valued, which may require special attention in some implementations.

Due to the complexity of this filter bank structure, a reference MATLAB implementation of the uniform FFT modulated filter bank is provided in appendix B.
3 Speech Processing Methods
Each filter bank structure, see section 2, will be used as a foundation to implement eight different speech processing methods, described next. Hence, in total there are 4 × 8 = 32 unique projects in this course.

We have three groups of methods: noise suppression (sections 3.1, 3.2, 3.3), speech suppression (sections 3.4, 3.5, 3.6), and speech utility functions (sections 3.7 and 3.8). Noise suppression methods enhance speech and suppress noise; they can be used, for instance, to enhance speech in human communication systems, e.g., VoIP or cellular telephony. Speech suppression methods, on the other hand, suppress speech signals in favor of noise; they can be used, for instance, in public address or karaoke systems, or as a support function to generate better noise references for other noise suppression systems. Speech utility functions are used to enhance the part of a signal related to speech.

In the following we shall assume that the received subband signals comprise a mixture of clean speech plus degrading noise, i.e.,

    x_k[n] = s_k[n] + v_k[n],    (15)

where s_k[n] denotes clean subband speech and v_k[n] denotes subband noise.
3.1 Adaptive Gain Equalizer Noise Suppressor
The Adaptive Gain Equalizer (AGE) is a simple but effective method to reduce noise in speech signals [4]. The AGE uses the fact that speech can be considered non-stationary compared to the noise. Two energy, or magnitude, measures track the stationary background noise level and the varying speech-plus-noise level. The quotient between these two measures forms a gain function that amplifies non-stationary speech signals, hence the method's name, Adaptive Gain Equalizer. We shall add a minor adjustment to the original method described in [4], where the gain function is now

    g_k[n] = min{ 1, ( A_{x;k}[n] / ( L_k Ā_{x;k}[n] ) )^{p_k} },    (16)

where L_k, A_{x;k}[n], Ā_{x;k}[n] and p_k are defined in [4]. Since g_k[n] has only one filter tap, the subband output signal is computed using a multiplication, y_k[n] = g_k[n] x_k[n].

The measures A_{x;k}[n] and Ā_{x;k}[n] can be implemented using recursive averages, see section D.5.

Evaluate the method's noise suppression performance using sound files from [5, 6].
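A sketch of one gain update (16) in Python. The smoothing constants, L_k, p_k and the initial values are hypothetical tuning choices; the two recursive averages (section D.5) play the roles of the fast speech-plus-noise tracker A_{x;k}[n] and the slow noise-floor tracker Ā_{x;k}[n]:

```python
def age_step(x_abs, A_fast, A_slow, L_k=2.0, p_k=1.0, mu_fast=0.1, mu_slow=0.001):
    # Recursive magnitude averages (section D.5): the fast tracker follows
    # speech bursts, the slow tracker follows the stationary noise floor.
    A_fast = (1 - mu_fast) * A_fast + mu_fast * x_abs
    A_slow = (1 - mu_slow) * A_slow + mu_slow * x_abs
    # Gain function (16), clamped so the AGE never amplifies above unity.
    g = min(1.0, (A_fast / (L_k * A_slow)) ** p_k)
    return g, A_fast, A_slow

# During a loud (speech-like) burst the fast average rises and g approaches 1;
# in stationary noise the two averages agree and g settles near 1 / L_k.
g_burst, _, _ = age_step(1.0, A_fast=0.1, A_slow=0.1)
g_noise, _, _ = age_step(0.1, A_fast=0.1, A_slow=0.1)
```

The states A_fast and A_slow are kept per subband, so the gain adapts independently in every band.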
3.2 Spectral Subtraction Noise Suppressor
The spectral subtraction method is frequently used as a speech enhancement method in industry and research today [7]. The spectral subtraction method assumes that the spectral properties of the noise do not change much from just before speech is active until the speech is active. Hence, if there is no speech present, the input signal is x[n] = v[n], and a spectrum of the background noise level, denoted P̂_{v;k}[n], is estimated. The spectrum of the input signal, with or without speech, is

    P_{x;k}[n] = P_{s;k}[n] + P_{v;k}[n],    (17)

since we assume that noise and speech are uncorrelated, i.e., that P_{sv;k}[n] = 0. We may rewrite this relationship as

    P_{s;k}[n] = P_{x;k}[n] - P_{v;k}[n].    (18)

Hence, the spectral subtraction method continuously subtracts the noise spectrum estimate from the input signal spectrum to acquire an approximate speech spectrum, i.e.,

    P̂_{s;k}[n] = max( P̂_{x;k}[n] - P̂_{v;k}[n], 0 ),    (19)

where the max-function ensures a non-negative spectrum estimate. The enhanced speech signal is

    y_k[n] = ( P̂_{s;k}[n] )^{1/p} e^{j∠x_k[n]},    (20)

where ∠x_k[n] is the phase of the noisy speech and p determines whether the magnitude (p = 1) or the power spectrum (p = 2) is estimated.

The spectral subtraction method requires a Voice Activity Detector (VAD) that detects whether speech is active in a frame. The VAD decision is used to trigger the background noise spectrum estimation, which should be conducted during speech-silence periods, i.e., when the VAD output is zero. One example implementation of a VAD is given in appendix C.

The power spectra can be estimated using recursive averages, see section D.5.

Evaluate the method's noise suppression performance using sound files from [5, 6].
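Equations (19) and (20) for one subband sample can be sketched in Python (the sample and spectrum estimates below are hypothetical numbers; in the real method P̂_{x;k} and P̂_{v;k} come from the recursive averages of section D.5):

```python
import cmath

def spectral_subtraction(x_k, Px, Pv, p=2):
    # Eq. (19): subtract the noise spectrum estimate, clamped at zero.
    Ps = max(Px - Pv, 0.0)
    # Eq. (20): new magnitude from the subtracted spectrum, noisy phase kept.
    return Ps ** (1.0 / p) * cmath.exp(1j * cmath.phase(x_k))

y_k = spectral_subtraction(3 + 4j, Px=25.0, Pv=9.0)   # |x_k| = 5, clean power 16
```

Note that only the magnitude is modified; the phase of the noisy subband sample is reused, which is the standard choice in spectral subtraction.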
3.3 Wiener Filter Noise Suppressor
The Wiener filter provides an optimal estimate of a signal embedded in additive noise by minimizing the Mean Square Error (MSE),

    E{|e_k[n]|^2} = E{|s_k[n] - g_k[n] x_k[n]|^2},    (21)

where E{·} is the expectation operator. In other words, the Wiener filter gives the optimal estimate of speech given a linear combination (filtering) of speech plus noise. The MSE is expanded as

    E{|e_k[n]|^2} = E{|s_k[n]|^2} - 2 g_k[n] E{|s_k[n]|^2} + g_k[n]^2 E{|x_k[n]|^2},    (22)

since E{s_k[n] x*_k[n]} = E{|s_k[n]|^2} + E{s_k[n] v*_k[n]} = E{|s_k[n]|^2}. The minimum MSE is found where ∂E{|e_k[n]|^2}/∂g_k[n] = 0, which renders

    ∂E{|e_k[n]|^2}/∂g_k[n] = -2 E{|s_k[n]|^2} + 2 g_k[n] E{|x_k[n]|^2} = 0,    (23)

    g_k[n] = E{|s_k[n]|^2} / E{|x_k[n]|^2}.    (24)

Replacing the true expectations with their estimates yields the Wiener solution

    g_k[n] = P̂_{s;k}[n] / P̂_{x;k}[n],    (25)

where we may use the trick from spectral subtraction in order to obtain a clean speech spectrum estimate, i.e., equation (19).

Note that we need a VAD to detect whether speech is active in a frame. The VAD decision is used to trigger the background noise spectrum estimation, which should be conducted during speech-silence periods, i.e., when the VAD output is zero. One example implementation of a VAD is given in appendix C.

The power spectra P̂_{v;k}[n] and P̂_{x;k}[n] can be estimated using recursive averages, see section D.5.

Evaluate the method's noise suppression performance using sound files from [5, 6].
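The Wiener gain (25), combined with the clamped speech spectrum estimate (19), reduces to a few lines (Python sketch; the spectrum values are hypothetical):

```python
def wiener_gain(Px, Pv):
    # Eq. (19) + (25): g = max(Px - Pv, 0) / Px, so 0 <= g <= 1.
    Ps = max(Px - Pv, 0.0)
    return Ps / Px if Px > 0.0 else 0.0

g_speech = wiener_gain(Px=10.0, Pv=4.0)   # mostly speech: gain close to 1
g_noise = wiener_gain(Px=3.0, Pv=5.0)     # noise over-estimate: gain clamped to 0
```

Equivalently g = max(1 - P̂_{v;k}/P̂_{x;k}, 0), which shows that the Wiener gain is a per-subband attenuation between zero and one.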
3.4 Adaptive Gain Equalizer Speech Signal Suppressor
Modify the adaptive gain equalizer method in section 3.1 so that, instead of suppressing noise, speech is suppressed.

Evaluate the speech suppression performance using sound files from [5, 6].

3.5 Spectral Subtraction Speech Signal Suppressor
Modify the spectral subtraction method in section 3.2 so that, instead of suppressing noise, speech is suppressed.

Evaluate the speech suppression performance using sound files from [5, 6].

3.6 Wiener Filter Speech Signal Suppressor
Modify the Wiener filter in section 3.3 so that, instead of suppressing noise, speech is suppressed.

Evaluate the speech suppression performance using sound files from [5, 6].
3.7 Automatic Gain Control
Automatic Gain Control (AGC) is sometimes referred to as Dynamic Range Compression (DRC). The objective of an AGC function is to expand weak signals and compress loud signals toward a predetermined signal level. This is useful in practice since we then know that an AGC-processed signal has a well-defined magnitude. Also, in some applications, such as hearing protectors, AGC functions are used to hear weak sounds better and to protect against harmfully loud sounds.
The AGC is a recursive method that controls a signal gain g_k[n] so that the power of the AGC output signal, i.e., E{|y_k[n]|²} where y_k[n] = g_k[n] x_k[n], meets a pre-defined subband target power P_k. Define an input signal power estimate as p_k[n] ≈ E{|x_k[n]|²}. The output signal power is E{|y_k[n]|²} = g_k[n]² E{|x_k[n]|²} ≈ g_k[n]² p_k[n]. The gain is increased if the output power is less than the target power and decreased if the output power is above the target power. The gain is usually bounded so that G_min,k ≤ g_k[n] ≤ G_max,k, where G_min,k < 1 and G_max,k > 1 are system constants. The amount of decrease and increase in gain value is controlled by the attack and release times T_A,k and T_R,k, respectively (see appendix D.5 for a definition of how these attack and release times relate to recursive averages). A VAD (see appendix C) is usually used to have the AGC operate only on speech signals. The VAD gives a speech activity decision d[n] that can be zero or one: zero if there is no speech and one if there is speech present. The gain for the next input sample is updated as follows:
g_k[n+1] = { (1 − α_A,k) g_k[n] + α_A,k G_min,k   if g_k[n]² p_k[n] > P_k,
           { (1 − α_R,k) g_k[n] + α_R,k G_max,k   else if g_k[n]² p_k[n] ≤ P_k and d[n] = 1,
           { g_k[n]                               else,
(26)
where α_A,k and α_R,k are the learning parameters corresponding to T_A,k and T_R,k.
Evaluate the AGC performance using sound files from [5, 6].
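As an illustration outside the required MATLAB/DSP implementation, the per-sample gain recursion in (26) can be sketched in Python; the target power, learning parameters and gain bounds below are placeholder values, not prescribed settings.

```python
def agc_gain_update(g, p_in, d, P_target=1.0,
                    alpha_A=0.01, alpha_R=0.001,
                    G_min=0.1, G_max=10.0):
    """One step of the subband AGC gain recursion (26).

    g      -- current gain g_k[n]
    p_in   -- input power estimate p_k[n]
    d      -- VAD decision d[n] (0 = no speech, 1 = speech)
    All parameter values are illustrative placeholders.
    """
    if g * g * p_in > P_target:          # output power above target: attack
        g = (1.0 - alpha_A) * g + alpha_A * G_min
    elif d == 1:                         # below target, speech present: release
        g = (1.0 - alpha_R) * g + alpha_R * G_max
    # otherwise the gain is held
    return min(max(g, G_min), G_max)     # enforce G_min <= g <= G_max

# With constant input power 4 and target 1, the gain settles near
# sqrt(P_target / p_in) = 0.5, oscillating slightly between updates.
g = 1.0
for _ in range(20000):
    g = agc_gain_update(g, p_in=4.0, d=1)
```

With speech present, the recursion hovers around the equilibrium gain sqrt(P_k / p_k[n]), alternating between small attack and release steps.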
3.8 Calibrated Parametric Equalizer
A parametric equalizer is a structure that allows subband gains g_k to be tuned according to a predefined function. The output of a parametric equalizer is y_k[n] = g_k x_k[n]; note that g_k[n] ≡ g_k since we expect the gains to be fixed during normal operation. The parametric equalizer can, for instance, provide a large gain for low subbands to boost the bass (low-frequency sounds).
We shall, in this project, have a calibrated parametric equalizer structure where the user can modify the gain settings of the equalizer in an implicit manner. By sending in a known sound file, and simultaneously pressing one DSP button, the equalizer estimates the power level of the input sound file. When the user releases the button, the equalizer recomputes the equalizer gain values so that the sound from the sound file is equalized. The calibrated parametric equalizer has two operating modes:
Mode 1: Equalize This is the default operating mode where the input signal is simply scaled by the gain values, i.e., y_k[n] = g_k x_k[n].
Mode 2: Calibrate During the calibration, a sound file is input to the equalizer and the power spectral density of the input signal is estimated, i.e., P̂_x,k[n] ≈ E{|x_k[n]|²}. When the calibration ends, the gain values are updated according to g_k = sqrt( P_T,k / P̂_x,k[n] ), where P_T,k are nominal power spectral density values defined based on system settings (i.e., P_T,k needs to be calibrated as well). The gain is bounded as G_min,k ≤ g_k ≤ G_max,k, where G_min,k < 1 and G_max,k > 1. When the calibration ends, and the gains are updated, the processing falls back to the default equalize mode.
Note that the user should be able to switch between these two operating modes, and the active mode should be indicated by an LED. On startup, before the user has commenced a calibration, the gains should be set to 1.
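The Mode 2 gain update can be sketched as follows (Python, for illustration only; the nominal levels and bounds are placeholder values):

```python
import numpy as np

def calibrate_gains(P_x_hat, P_target, G_min=0.1, G_max=10.0):
    """Recompute equalizer gains g_k = sqrt(P_T,k / P_x,k), bounded to
    [G_min, G_max].  P_target (the nominal PSD values P_T,k) and the
    bounds are illustrative placeholders."""
    g = np.sqrt(np.asarray(P_target) / np.asarray(P_x_hat))
    return np.clip(g, G_min, G_max)

# A subband measured 4x too strong gets gain 0.5, one 4x too weak gets
# gain 2.0, and a subband already at the nominal level keeps gain 1.
g = calibrate_gains(P_x_hat=[4.0, 0.25, 1.0], P_target=[1.0, 1.0, 1.0])
```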
4 Project groups
The projects in the course focus this year on various topics in speech signal processing. Four main project groups are used as umbrella projects. All projects within each main project group use the same underlying platform, but each project implements a unique speech signal processing method. It is recommended that all projects within the same main project group collaborate regarding development of the common platform, and it is required that each group implement and report their unique method individually.
Main project group 1
Group Filter bank Application
1 FIR (2.1) Adaptive Gain Equalizer Noise Suppressor (3.1)
2 FIR (2.1) Spectral Subtraction Noise Suppressor (3.2)
3 FIR (2.1) Wiener Filter Noise Suppressor (3.3)
4 FIR (2.1) Adaptive Gain Equalizer Speech Signal Suppressor (3.4)
5 FIR (2.1) Spectral Subtraction Speech Signal Suppressor (3.5)
6 FIR (2.1) Wiener Filter Speech Signal Suppressor (3.6)
7 FIR (2.1) Automatic Gain Control (3.7)
8 FIR (2.1) Calibrated Parametric Equalizer (3.8)
Main project group 2
Group Filter bank Application
9 IIR (2.2) Adaptive Gain Equalizer Noise Suppressor (3.1)
10 IIR (2.2) Spectral Subtraction Noise Suppressor (3.2)
11 IIR (2.2) Wiener Filter Noise Suppressor (3.3)
12 IIR (2.2) Adaptive Gain Equalizer Speech Signal Suppressor (3.4)
13 IIR (2.2) Spectral Subtraction Speech Signal Suppressor (3.5)
14 IIR (2.2) Wiener Filter Speech Signal Suppressor (3.6)
15 IIR (2.2) Automatic Gain Control (3.7)
16 IIR (2.2) Calibrated Parametric Equalizer (3.8)
Main project group 3
Group Filter bank Application
17 WOLA (2.3) Adaptive Gain Equalizer Noise Suppressor (3.1)
18 WOLA (2.3) Spectral Subtraction Noise Suppressor (3.2)
19 WOLA (2.3) Wiener Filter Noise Suppressor (3.3)
20 WOLA (2.3) Adaptive Gain Equalizer Speech Signal Suppressor (3.4)
21 WOLA (2.3) Spectral Subtraction Speech Signal Suppressor (3.5)
22 WOLA (2.3) Wiener Filter Speech Signal Suppressor (3.6)
23 WOLA (2.3) Automatic Gain Control (3.7)
24 WOLA (2.3) Calibrated Parametric Equalizer (3.8)
Main project group 4
Group Filter bank Application
25 Modulated (2.4) Adaptive Gain Equalizer Noise Suppressor (3.1)
26 Modulated (2.4) Spectral Subtraction Noise Suppressor (3.2)
27 Modulated (2.4) Wiener Filter Noise Suppressor (3.3)
28 Modulated (2.4) Adaptive Gain Equalizer Speech Signal Suppressor (3.4)
29 Modulated (2.4) Spectral Subtraction Speech Signal Suppressor (3.5)
30 Modulated (2.4) Wiener Filter Speech Signal Suppressor (3.6)
31 Modulated (2.4) Automatic Gain Control (3.7)
32 Modulated (2.4) Calibrated Parametric Equalizer (3.8)
5 Project report and presentation
5.1 Method Evaluation
It is expected that each group undertakes a thorough evaluation of their implemented structure, both in MATLAB and by performing measurements on their DSP implementation. Each method description explains how the respective implementation can be evaluated.
It is furthermore expected that the evaluation analyzes relevant system parameters, and that the evaluation tries to find values of each system parameter that provide optimal performance in the respective application. System parameters include filter bank parameters, and parameters from each respective speech processing method.
5.2 Project presentation
1. Each group should prepare to orally present, demonstrate and defend their respective implementation.
2. Each group should find a way to demonstrate their solution using a setup that clearly shows the implementation's performance.
3. The total time for oral presentation and demonstration, combined, is restricted to 15 min per group.
Observe that each group needs to pass the presentation as well as the report, so be well prepared for the
presentation.
5.3 Project report
1. Each group will write a project report of maximum 5 pages.
2. All MATLAB and DSP C or assembly code should be submitted together with the report in a compressed zip file.
3. The report disposition and formatting should follow:
The report should be formatted using the IEEE manuscript template for conference proceedings, see
http://www.ieee.org/conferences_events/conferences/publishing/templates.html.
The front page of the report should show the title of the group project, the full name and personal-id number of all project members, the course code and the report submission date.
The report should briefly (max 2 pages) describe the group's implementation including the filter banks.
The remaining pages should be used for describing results from the evaluation and discussions regarding the implementation, or improvements made to each respective project, conclusions, and for references.
Note that the report should discuss the MATLAB reference implementation as well as the DSP implementation.
Relevant figures, tables and equations should have proper labels and they must be referred to from the text, i.e., floating figures that are not referred to from the text are disallowed.
The amount of source code listing within the report should be kept to a minimum!
4. The report must be submitted electronically by email to the course teacher (bsa@bth.se) before
the project presentation.
Observe that reports longer than 5 pages, or reports that do not conform to the manuscript template or the instructions above, will be rejected and the project group will fail.
If major errors are found in the report, or if the attached MATLAB or DSP source code does not execute, the student group will be given two extra weeks to correct the errors. The two extra weeks start on the day the errors are pointed out by the course teacher. If the student group fails to submit a correction to the report, or the source code, within these two weeks, the group fails.
References
[1] P. P. Vaidyanathan. Multirate Systems and Filter Banks. Prentice Hall, 1993.
[2] R. E. Crochiere. A weighted overlap-add method of short-time Fourier analysis/synthesis. IEEE Trans. Acoust. Speech and Sig. Proc., ASSP-28:99–102, February 1980.
[3] R. Brennan and T. Schneider. An ultra low-power DSP system with a flexible filterbank. IEEE 35th Asilomar Conference on Signals, Systems and Computers, 1:809–813, February 2001.
[4] N. Westerlund, M. Dahl, and I. Claesson. Speech enhancement for personal communication using an adaptive gain equalizer. Elsevier Signal Processing, 85(6):1089–1101, 2005.
[5] Y. Hu and P. Loizou. Subjective evaluation and comparison of speech enhancement algorithms. Elsevier Speech Communication, 49:588–601, 2007.
[6] Y. Hu and P. Loizou. NOIZEUS: A noisy speech corpus for evaluation of speech enhancement algorithms. Internet: http://www.utdallas.edu/~loizou/speech/noizeus/, 1 Nov. 2010.
[7] S. F. Boll. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust. Speech and Sig. Proc., ASSP-27:113–120, April 1979.
Appendices
A WOLA filter bank reference implementation
The following lists a MATLAB reference implementation for the WOLA filter bank. It assumes an input block, denoted xblock, of D input data samples and it produces an output block, denoted yblock, of D output data samples. The initialization should be run once at program startup; then analysis is executed before synthesis for each new block of input samples, and we assume that subband processors transform the subband data before synthesis.
A.1 Example initialization
% prototype filter length
L = 128;
% number of subbands
K = 32;
% oversampling ratio
OS = 2;
% prototype filter decimation ratio
DF = 1;
% output prototype filter length
LOUT = L / DF;
% data decimation rate
D = K/OS;
% analysis prototype filter
w = 0.85*K*fir1(L-1,1/K);
% synthesis prototype filter
if LOUT < L
wOUT = w(1:DF:L);
else
wOUT = w;
end;
% input FIFO
xmem = zeros(L,1);
% output FIFO
ymem = zeros(LOUT ,1);
A.2 Analysis
% update input FIFO
xmem = [xmem(D+1:end); xblock ];
% weight input FIFO with analysis prototype window
temp = xmem .* w;
% time -folding
xn = temp(1:K);
for k = K+1:K:L-K+1
xn = xn + temp(k:k+K -1);
end;
% circular shift
xn = [xn(K/2+1: end); xn(1:K/2)];
% fft
X = fft(xn, K);
% since we know input data is real valued, its Fourier transform is
% complex conjugate symmetric, and we may omit analysis of the upper
% frequency half, since it can anyway be reconstructed from the
% lower frequency half.
X = X(1:K/2+1);
A.3 Synthesis
% reconstruct the upper frequency half , since we can assume complex
% conjugate symmetry of the data.
Y = [Y; conj(Y(end -1: -1:2))];
% compute ifft , certify that the result is real valued.
temp = real(ifft(Y));
% revert circular shift
temp = [temp(K/2+1: end); temp(1:K/2)];
% repeat the ifft -data , weight with synthesis prototype window, and
% add with output FIFO.
ymem = repmat(temp ,LOUT/K,1) .* wOUT + ymem;
% shift output FIFO
yblock = ymem(1:D);
ymem = [ymem(D+1:end); zeros(D,1)];
B Uniform FFT modulated filter bank reference implementation
The following lists a MATLAB reference implementation for the uniform FFT modulated filter bank. It assumes an input block, denoted xblock, of D input data samples and it produces an output block, denoted yblock, of D output data samples. The initialization should be run once at program startup; then analysis is executed before synthesis for each new block of input samples, and we assume that subband processors transform the subband data before synthesis.
B.1 Example initialization
% number of subbands
K = 32;
% oversampling ratio
O = 4;
% decimation rate
D = K/O;
% number of taps per polyphase component
P = 3;
% total prototype filter length is L = D*P;
% prototype filter
PROTFLT = sqrt(2/O)*fir1(D*P - 1, 1/K);
% compute analysis and synthesis modified polyphase components for
% each group , g
POLYPHS_A = zeros(D, P, O );
POLYPHS_S = zeros(D, P, O );
IDXK = 0 : P-1;
for g = 0:O-1
for d = 0:D-1
POLYPHS_A(d+1, :, g+1) = fliplr(D * exp( j*g*2*pi*d/K ) ...
* PROTFLT( IDXK*D + d + 1 ) .* exp( j*2*pi*g*IDXK/O ));
POLYPHS_S(d+1, :, g+1) = flipdim(conj( POLYPHS_A(d+1, :, g+1)), 2);
end;
end;
% input and output FIFOs
SMPMEM_A = zeros(D, P);
SMPMEM_S = zeros(D, P, O);
% temporary and intermediate variables
X = zeros(K/2+1, 1);
TempA = zeros(D, 1);
IDXI = 1:(K/2)/O;
IDX_A = zeros ((K/2)/O, O);
IDX_S = zeros(D, O);
for g = 1 : O
IDX_A(:, g) = g : O : K/2;
IDX_S(:,g) = g : O : K;
end;
B.2 Analysis
% update analysis FIFO
SMPMEM_A = [xblock , SMPMEM_A(:, 1:end -1)];
% go through all groups
for g = 1:O
% filter with respective modified polyphase component , and compute the
% IFFT
TempA = ifft( sum( SMPMEM_A .* POLYPHS_A(:,:,g), 2) );
% since the subband indices for each group are in an irregular order ,
% we use IDX_A to reorder the subband indices in an increasing order.
X(IDX_A(:,g)) = TempA(IDXI , :);
if g==1
X(end) = TempA(IDXI(end)+1, :);
end;
% note that X contains only the first frequency half since we assume
% that xblock is real valued, which implies that X is complex conjugate
% symmetric.
end;
B.3 Synthesis
% reconstruct the upper frequency half
Temp = [Y; conj(Y(end -1: -1:2))];
% go through each synthesis group and update the synthesis FIFOs and
% compute the FFT , here , IDX_S rearranges the subband indices from
% a linear ordering into an order according to each group g.
for g = 1 : O
SMPMEM_S(:, 2:end , g) = SMPMEM_S(:, 1:end -1, g);
SMPMEM_S(:, 1, g) = fft( Temp(IDX_S(:, g), :));
end;
% compute the output by filtering synthesis FIFO with synthesis modified
% polyphase components. real is used to ensure real output data.
yblock = real(sum(sum(SMPMEM_S .* POLYPHS_S , 3), 2));
C Voice activity detection
The following lists a MATLAB reference implementation for a Voice Activity Detector (VAD). It assumes one input sample, denoted xsample, and it produces one VAD decision, denoted VADout. If VADout=0 there is no speech and if VADout=1 there is speech. The VAD decision can be used, for instance, to estimate background noise during speech pauses, i.e., when VADout=0. The initialization should be run once at program startup; then the VAD function is executed for each new input sample.
C.1 Example initialization
% Sampling frequency in Hz
fs = 8000;
% Time constants of the VAD
tA = 1/(fs *0.025); % 0.025 s
tB = 1/(fs *0.1); % 0.1 s
tBF = 1/(fs*5); % 5 s
% SNR detection threshold
SNR_THS = 1;
% Number of subbands in VAD
K = 4;
% VAD filter bank parameters
alpha = 0.99;
T = 3;
q = linspace(0,1,K+T).^2;
q = 0.5*(q(1:end -T) + q(2:end -T+1));
R = 1-pi/2*sqrt(q)/K;
theta = pi*q;
% VAD filter bank memory
ZMEM = zeros(K,2);
% Energy measures
E = zeros(K,1);
Eflr = zeros(K,1);
% temporary vector
xk = zeros(K ,1);
C.2 VAD function
% Compute VAD analysis filter bank
for k = 1:K
Ak = [1,-2*R(k)*cos(theta(k)),R(k)^2];
Bk = [1,-alpha];
[xk(k), ZMEM(k ,:)] = filter(Bk , Ak, xsample , ZMEM(k,:));
end;
% Compute subband energy
E = (1-tA)*E + tA*xk.^2;
% Initialize the noise floor during the first 3/tA samples
% (n is the running input-sample index, assumed maintained by the caller)
if n < 3/tA
Eflr = E;
end;
% Estimate subband SNR
SNREstimate = (E - Eflr)./Eflr;
% Update background noise estimate
id = find(SNREstimate <= SNR_THS);
jd = find(SNREstimate > SNR_THS);
if length(id) > 0
Eflr(id) = min((1-tB)*Eflr(id) + tB*E(id), E(id));
end;
if length(jd) > 0
Eflr(jd) = min((1-tBF)*Eflr(jd) + tBF*E(jd), E(jd));
end;
% Compute VAD detection output
VADout = double(mean(SNREstimate) > SNR_THS);
D Implementation Issues and Hints
D.1 Use MATLAB test vectors
It is suggested that you first implement all parts of your method in MATLAB: first, to see that you have understood the concept and, second, to have a correct reference implementation. Then, to verify parts of your DSP implementation, generate random test input vectors in MATLAB and compute test output vectors. Then, in your DSP implementation, input the same test input vectors and compute (in the DSP) the corresponding output vectors. If the MATLAB output vector and DSP output vector are the same, your implementation is correct for those test vectors.
D.1.1 Example of test vectors
Assume I want to verify that my function mymul is correctly implemented. This function implements the following
mymul(a, b) = a · b, (27)
for two real variables a and b. In MATLAB, my code is
function [out] = mymul(a,b);
out = a*b;
And my corresponding C-implementation is
float mymul(float a, float b)
{
return a*b;
}
To test my C-implementation, I generate random test vectors in MATLAB, e.g.,
NUM_TESTS = 3;
TEST_VEC_a = round (10000*randn(NUM_TESTS ,1))/10;
TEST_VEC_b = round (10000*randn(size(TEST_VEC_a)))/10;
for k = 1:NUM_TESTS
fprintf(1, 'mymul(%.1f,%.1f)=%.2f\n', TEST_VEC_a(k), TEST_VEC_b(k), ...
mymul(TEST_VEC_a(k), TEST_VEC_b(k)));
end;
Example MATLAB console output:
mymul(-354.1,-801.2)=283704.92
mymul(-1372.4,23.0)=-31565.20
mymul(877.1,285.1)=250061.21
When I use the same input data a, b to my C-function I see that I get an identical output, which leads
me to conclude that my C-implementation works as expected (for these test vectors).
D.2 Analysis filter bank verification
A simple way to verify and test an analysis filter bank implementation is to input a sinusoidal signal, whose frequency exactly matches the center frequency of one subband, and then analyze the subband signals x_k[n]. We know that the amplitude of adjacent subbands should be very low in the middle of each subband, which means we have little aliasing. Hence, if the analysis stage is correctly implemented we shall see most of the signal energy only in the subband where the sinusoidal signal is centered.
In this test, the subband processors are g_k = 1 for k = 0 ... K−1. An example is illustrated in figure 10 where a sinusoidal signal is centered in band k = 2. A corresponding figure showing the input subband signals x_k[n] after the analysis filter bank is given in figure 11.
[Figure: 10 log|H_k[f]|² (dB) versus normalized frequency f, showing the eight band responses H_0[f], ..., H_7[f].]
Figure 10: An eight band analysis filter bank (shaded) with a sinusoidal signal centered in the middle of band k = 2 (black). Notice that adjacent bands have a very small magnitude (< −60 dB) where the sinusoidal signal is located, which means we have little aliasing.
D.3 Analysis-synthesis filter bank verification
It is straightforward to verify a DSP implementation of an analysis-synthesis filter bank. What you need is a white noise generator (e.g., the randn command in MATLAB), subband processors that are constant scalars, and a power spectrum estimator (e.g., the psd command in MATLAB). The subband processors are now scalar weights that can take on values g_k = 0 or g_k = 1, see the filter bank configuration in figure 12.
We note that a linear system is defined by the relationship y[n] = (h ∗ x)[n], which has the corresponding Fourier transform relationship Y[f] = H[f] X[f]. The squared modulus of this relationship is |Y[f]|² = |H[f]|² |X[f]|². Taking the expectation yields E{|Y[f]|²} = |H[f]|² E{|X[f]|²}, which is equal to P_yy[f] = |H[f]|² P_xx[f], where P_xx[f] and P_yy[f] are the power spectral densities (PSD) of the input signal x[n] and the output signal y[n], respectively. Hence, the transfer magnitude function is computed as |H[f]| = sqrt( P_yy[f] / P_xx[f] ). We will use this last relationship to estimate the filter bank transfer function and thereby verify the filter bank.
The protocol we follow to verify a filter bank implementation is:
1. Generate a noise sequence x[n] by, for instance, using the randn command in MATLAB. Store this sequence as a wav-file.
[Figure: subband signals x_0[n], ..., x_7[n] versus sample index (0–500).]
Figure 11: Input subband signals from an eight band analysis filter bank with a sinusoidal signal centered in the middle of band k = 2. Notice that most of the signal magnitude is found in subband 2. A small initial transient can be seen for all bands around 70 samples; this is the step response, caused by the sinusoidal signal being switched on, seen across the subbands.
2. Input the generated noise sequence x[n] to a DSP project that simply copies the ADC input to the DAC output, i.e., without any filter bank processing. Estimate the PSD of the DAC output and store the PSD as P̂_xx[f]. This step is illustrated in figure 13.
3. According to figure 14, repeat the following for each subband k = 0 ... K−1:
(a) Set the subband processor weights to zero except for at index k, where the weight is one, i.e.,
g_m = 1 if m = k, and g_m = 0 if m ≠ k, for m = 0 ... K−1. (28)
(b) Input the generated noise sequence x[n] to the ADC input.
(c) Perform the analysis-synthesis filter bank operation.
(d) Output the synthesis filter bank output signal y[n] to the DAC output channel.
(e) Estimate the PSD of the DAC output and store the PSD as P̂_yy|k[f].
(f) Estimate the transfer magnitude function as |Ĥ_k[f]| = sqrt( P̂_yy|k[f] / P̂_xx[f] ).
(g) Compare the estimated transfer magnitude function |Ĥ_k[f]| to the ideal transfer magnitude function |H_k[f]|.
(h) Repeat steps 3.a to 3.g for each subband k = 0 ... K−1.
One example of how the results from the verification procedure may look for the third band, i.e., h_2[n], of an eight band filter bank is illustrated in figure 15. We see that the estimated transfer magnitude function |Ĥ_2[f]| is very similar to the true transfer magnitude function |H_2[f]|, which leads us to conclude that the filter bank implementation of subband k = 2 is correct.
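The PSD-ratio estimate in step 3.f can be rehearsed offline before moving to the DSP. A NumPy sketch, with a short test filter standing in for one subband path (the filter taps and FFT size are arbitrary choices, and averaged periodograms stand in for the DAC measurement):

```python
import numpy as np

rng = np.random.default_rng(0)
h = np.array([0.25, 0.5, 0.25])        # stand-in filter (arbitrary choice)
x = rng.standard_normal(2**16)          # white noise input
y = np.convolve(x, h, mode="same")      # "measured" system output

def avg_periodogram(s, nfft=256):
    """Crude PSD estimate: average |FFT|^2 over non-overlapping frames."""
    frames = s[: len(s) // nfft * nfft].reshape(-1, nfft)
    return np.mean(np.abs(np.fft.rfft(frames, axis=1)) ** 2, axis=0)

Pxx = avg_periodogram(x)
Pyy = avg_periodogram(y)
H_est = np.sqrt(Pyy / Pxx)              # |H[f]| = sqrt(P_yy[f] / P_xx[f])
H_true = np.abs(np.fft.rfft(h, 256))    # ideal response for comparison
```

For this filter, H_est closely tracks H_true across the band; on the DSP, the two PSDs are instead measured at the DAC output as described in the protocol above.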
[Figure: block diagram of an analysis filter bank producing x_0[n], ..., x_{K−1}[n], subband gains g_0, ..., g_{K−1}, and a synthesis filter bank producing y[n] from x[n].]
Figure 12: Analysis-synthesis filter bank configuration used to verify the filter bank implementation. The subband signals x_k[n] are scaled by subband scaling constants g_k to yield subband output signals y_k[n] = g_k x_k[n].
[Figure: white noise generator, ADC, DSP board (copy input to output), DAC, power spectrum estimator producing P_xx[f].]
Figure 13: Pass a generated noise file through the DSP system without doing any processing of the signal, and estimate the PSD of the output signal.
D.4 Estimating DSP processing load
It is important to ensure that the implemented method does not require more than the available DSP cycles for its computation. Otherwise, we risk that the DSP-processed sound contains undesirable artifacts due to, e.g., clipping. The DSP processing load can be estimated using a General Purpose IO (GPIO) pin. Connect any available GPIO pin to, e.g., an oscilloscope. In your DSP software, define the same GPIO as an output pin. Now, when you start processing a new sample, or block of samples, in your DSP code, set the GPIO high, and when your software has finished processing, set the GPIO low. This means that the GPIO signal will be toggling between low and high; it is low when you are not processing samples and high when you are processing samples. By measuring the duty cycle on the GPIO pin we can estimate the processing load. If, for instance, the GPIO signal's duty cycle is 50 %, the time the GPIO is low equals the time it is high; hence, we utilize 50 % of the available resources. Note that you should have less than 100 % resource utilization, and you should reserve a small processing margin (say 5 %) for other overhead tasks performed by the DSP to avoid sound artifacts.
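The duty-cycle-to-load conversion is simple enough to capture in a few lines; a Python sketch with made-up timing values:

```python
def processing_load(t_high, t_low):
    """Estimate DSP processing load from measured GPIO high/low times.
    A 50 % duty cycle means half of the available cycles are used."""
    return t_high / (t_high + t_low)

def meets_margin(load, margin=0.05):
    """Check that at least `margin` (here 5 %) headroom remains."""
    return load <= 1.0 - margin

# Hypothetical oscilloscope readings: 62.5 us high, 62.5 us low
# within a 125 us block period, i.e. a 50 % load.
load = processing_load(t_high=62.5e-6, t_low=62.5e-6)
```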
D.5 Recursive Averages
Recursive averages are highly useful in practical implementations due to their low computational cost. In effect, they implement an averaging memory whose memory length can, at least approximately, be defined arbitrarily.
Let us consider a one-to-one mapping (i.e., a scalar function of a scalar variable) f_k of a random variable, e.g., x_k[n], and then consider its expectation, i.e., E{f_k(x_k[n])}. Auto-regressive averages are used to continuously approximate this expectation, where the approximation is updated for every new input sample. Denote the average that approximates E{f_k(x_k[n])} as a_k[n]; it is defined as
a_k[n] = (1 − α_k) a_k[n−1] + α_k f_k(x_k[n]), (29)
[Figure: white noise generator, ADC, filter bank on the DSP board, DAC, power spectrum estimator producing P_yy[f].]
Figure 14: Pass a generated noise file through the DSP system and apply an analysis-synthesis filter bank where the subband processors are zero except for at one subband index where it is one, and then estimate the PSD of the output signal.
[Figure: 10 log|H_2[f]|² (dB) versus normalized frequency f.]
Figure 15: Verification of band k = 2 of an eight band filter bank using the proposed verification method, where the estimated transfer magnitude function |Ĥ_2[f]| (black) and the true transfer magnitude function |H_2[f]| (gray) are overlaid.
where α_k is referred to as the learning parameter. The learning parameter is defined as α_k = 1/(F_S T_α,k), where F_S is the sampling frequency in Hz and T_α,k is the average's memory length in seconds. Note that if the data x_k[n] is decimated by a factor D, for instance by a filter bank, then the corresponding definition of the learning parameter is α_k = D/(F_S T_α,k). We see that a_k[n] is a low-pass filtering of f_k(x_k[n]). Hence, if α_k ≪ 1, we will have a_k[n] ≈ E{f_k(x_k[n])}, asymptotically.
D.5.1 Example 1: Power and magnitude spectrum estimation
Let the factor p decide if we estimate the power or magnitude spectrum of a signal x_k[n]. If p = 2, a power spectrum is estimated, and if p = 1, a magnitude spectrum is estimated. The spectrum of x_k[n] is E{|x_k[n]|^p}. The spectrum is estimated using a recursive average P̂_x,k[n], according to
P̂_x,k[n] = (1 − α_k) P̂_x,k[n−1] + α_k |x_k[n]|^p. (30)
D.5.2 Example 2: Background noise spectrum estimation
Let the factor p decide if we estimate the power or magnitude spectrum of the background noise of a signal x_k[n]. If p = 2, a power spectrum is estimated, and if p = 1, a magnitude spectrum is estimated. Let a VAD (see section C) signal whether there is speech in a signal frame or not. The VAD detector value is d[n] = 0 if there is no speech in the frame and d[n] = 1 if there is speech in the frame. Since we want to estimate the background noise, we should only update the estimate when d[n] = 0, because x_k[n] = v_k[n] when there is no speech present. The spectrum of v_k[n] is E{|v_k[n]|^p}. The spectrum is estimated, during speech pauses, using a recursive average P̂_v,k[n], according to
P̂_v,k[n] = { (1 − α_k) P̂_v,k[n−1] + α_k |x_k[n]|^p   if d[n] = 0,
            { P̂_v,k[n−1]                              else.
(31)
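A minimal Python sketch of the VAD-gated update (31); the learning parameter and signal values are illustrative:

```python
def noise_spectrum_update(P_v, x_k, d, alpha=0.01, p=2):
    """One step of the VAD-gated background-noise estimate (31).
    The estimate is only updated during speech pauses (d == 0);
    alpha and p are illustrative values (p = 2 gives a power spectrum)."""
    if d == 0:
        return (1.0 - alpha) * P_v + alpha * abs(x_k) ** p
    return P_v

P_v = 0.0
for _ in range(2000):                                 # noise-only frames:
    P_v = noise_spectrum_update(P_v, x_k=3.0, d=0)    # converges to |3|^2 = 9
P_before = P_v
for _ in range(2000):                                 # speech frames:
    P_v = noise_spectrum_update(P_v, x_k=30.0, d=1)   # estimate is frozen
```

Freezing the estimate during speech (d[n] = 1) is what keeps the strong speech energy from leaking into the noise-floor estimate.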