
SPEECH ENCRYPTION USING

WAVELET PACKETS

A Thesis

Submitted for the Degree of

Master of Science

in the Faculty of Engineering

BY
AJIT S. BOPARDIKAR

Department of Electrical Communication Engineering


INDIAN INSTITUTE OF SCIENCE

BANGALORE-560 012
FEBRUARY

Acknowledgements
It is a pleasure to thank all the people involved with my work at IISc, directly or indirectly.

I am grateful to my thesis supervisor Prof. V. U. Reddy for his encouragement and guidance. He set an example with his hard work, meticulousness and single-minded dedication to his work.

I would also like to thank Dr. B. S. Adiga for the interaction, help and friendship. Dr. Ajay Divakaran deserves special thanks for patiently going through so many drafts of the thesis and helping improve it in its form and content. I would like to thank Dr. N. S. Jayant, Dr. R. V. Cox, Dr. S. Kadamte and Dr. D. Sinha of AT&T Bell Labs for the interaction I had with them. They helped me get a better understanding of speech scrambling and the issues involved. Dr. T. V. Sreenivas and the students of the speech lab were kind enough to give me speech signals for my simulations. I would like to thank them for this and for numerous discussions.

It was fun working in the ASP lab because of my labmates, George Mathew, K. Anand, M. K. Sridharan, P. Ravindra and P. S. Vanisree. The gentle George Mathew seems to have a solution to any problem. Friendly 'tiffs' and interesting discussions with Anand, Ravindra and Sridharan helped to keep up my spirits.

I would like to thank my former lab-mates Sebastian Gracias for insightful discussions
on wavelets and Ganesh Iyer for all the 'chaos' he created.

I thank my colleagues, Yogish and L. Srinivas, and the visiting students from Stanford, Babak, Tibor, Park and Bijit, who proved to be great friends. I thank my friends Rajesh, Subbu, Malay, Banginwar, Darshak, Srikala, Preeta, Gsri, Yatish, Rajesh, Kavita, Lal and Sanjeev, who made my stay at IISc a very fruitful and memorable one. I would also like to thank my friends Santosh, Ravi and Jay for helping me out in various ways.

I thank the Karumbiahs for providing me with a home away from home. Finally, I thank my family, who stood by me at all times, for their love and support.

Abstract
The aim of speech scrambling algorithms is to transform clear speech into an unintelligible signal so that it is difficult to decrypt it in the absence of the key. Most of the existing speech scrambling algorithms tend to retain considerable residual intelligibility in the scrambled speech and are easy to break. Typically, a speech scrambling algorithm involves permutation of speech segments in the time, frequency or time-frequency domain, or permutation of the transform coefficients of each speech block. The time-frequency algorithms have given very low residual intelligibility and have attracted much attention. We first study the uniform filter bank based time-frequency scrambling algorithm with respect to the block length and the number of channels. We use objective distance measures to estimate the departure of the scrambled speech from the clear speech. Simulations indicate that the distance measures increase as we increase the block length and the number of channels. This algorithm derives its security only from the time-frequency segment permutation, and it has been estimated that the effective number of permutations which give a low residual intelligibility is much less than the total number of possible permutations.

In order to increase the effective number of permutations, we propose a time-frequency scrambling algorithm based on wavelet packets. By using different wavelet packet filter banks at the analysis and synthesis ends, we add an extra level of security, since the eavesdropper has to choose the correct analysis filter bank, correctly rearrange the time-frequency segments, and choose the correct synthesis bank to get back the original speech signal. Simulations performed with this algorithm give distance measures comparable to those obtained for the uniform filter bank based algorithm. Finally, we introduce the 2-channel perfect reconstruction circular convolution filter bank and give a simple method for its design. The filters designed using this method satisfy the paraunitary properties on a discrete equispaced set of points in the frequency domain.

Contents
Acknowledgements
Abstract
Contents
Notation

1 Motivation and Background
1.1 Earlier Scrambling Algorithms
1.1.1 Time Domain Scrambling Algorithms
1.1.2 Frequency Domain Scrambling Algorithms
1.2 Current Scrambling Algorithms
1.3 Wavelets and Wavelet Packets
1.4 Contribution of the Thesis

2 Filter Banks and Wavelets
2.1 Prerequisites
2.2 Introduction
2.3 Perfect Reconstruction for M = 2
2.4 The Logarithmic Tree
2.5 The Wavelet Transform
2.5.1 Introduction
2.5.2 The Wavelet Transform - A Brief Introduction
2.5.3 Multiresolution Analysis
2.6 Generalization to M-channel PR Systems
2.7 Further Generalization: Wavelet Packets
2.8 Conclusion

3 The Time-Frequency Scrambling Algorithm
3.1 Introduction
3.2 The Algorithm
3.3 Performance Evaluation
3.3.1 Block length
3.3.2 Number of channels in filter bank
3.4 Conclusion

4 A Wavelet Packet Based Speech Scrambling Algorithm
4.1 Introduction
4.2 A Wavelet Packet Based Time-Frequency Scrambling Algorithm
4.3 Conclusion

5 2-Channel Perfect Reconstruction Circular Convolution Filter Banks
5.1 Introduction
5.2 The Basic Procedure
5.3 Design of Circular Convolution Filter Banks
5.3.1 Downsampling
5.3.2 Upsampling
5.3.3 Conditions for Perfect Reconstruction
5.3.4 Filter Design
5.4 Conclusion

6 Conclusions
6.1 Suggestions For Further Work

A Fundamentals of Multirate Systems
A.1 The Downsampler
A.2 Upsampler
A.3 Noble Identities
A.4 Polyphase Representation

B Objective Distance Measures
B.1 LPC Distance
B.2 Cepstral Distance

C Orthonormality Relation for Perfect Reconstruction Circular Convolution Filter Banks

References

Notation
bold faced capital : matrices
bold faced small : vectors
$\mathbb{R}$ : Set of real numbers
$\mathbb{R}^+$ : Set of positive real numbers
$\mathbb{Z}$ : Set of integers
$\mathbb{Z}^+$ : Set of positive integers
$L^2(\mathbb{R})$ : Space of square integrable signals
$l^2(\mathbb{Z})$ : Space of square summable sequences
$\langle u, v \rangle$ : Inner product of u and v

Chapter 1 Motivation and Background


In many applications, speech can only be transmitted over insecure channels such as telephone links. However, most of these applications require secrecy. In such cases, it becomes imperative to encrypt the speech signal. The speech signal can be encrypted in two ways: digital and analog. Digital encryption first digitizes the input speech signal. The digitized signal is then compressed to produce a bit stream at a suitable bit rate. This bit stream is encrypted and transmitted over the channel using a modem [1,2].

On the other hand, in analog speech encryption, also called speech scrambling, we act on the speech samples themselves. Any analog speech scrambling algorithm needs to satisfy the following requirements [1,3]:
1. The scrambled speech should be unintelligible.

2. The scrambled speech should occupy the same bandwidth as does the original speech
signal. That is, the scrambling process should be a bandwidth preserving operation.

3. It should be difficult to decrypt if the decryption key is not available. In other words, it should be cryptanalytically strong or secure.

4. The communication delay caused by the scrambling process must be as small as possible.
5. The recovered speech at the receiver end should be of good quality and should preserve both the intelligibility of the speech and the characteristics of the speaker.

Speech is highly redundant. For example, it remains intelligible even when subjected to severe amplitude distortion such as infinite peak clipping [4]. Filtering out all frequency components above or below 1.8 kHz still allows 67% of all syllables to be recognized correctly [4]. This redundancy makes speech all the more difficult to scramble, and these issues have motivated the design of increasingly sophisticated speech encryption algorithms.

Digital encryption algorithms are cryptanalytically stronger than their analog counterparts. Moreover, they retain a lower residual intelligibility. However, analog scramblers do have some distinct advantages [5]. Firstly, no modem or speech compression is required for transmission. Secondly, the quality of the recovered speech is independent of the language. Further, these scrambling schemes can easily be interfaced with existing analog channels such as telephone, satellite or mobile communication links. In this thesis, we shall concentrate on analog speech encryption algorithms.

In the next section, we shall briefly review some of the analog scrambling algorithms reported in the literature.

1.1 Earlier Scrambling Algorithms

The earliest speech scrambling algorithms used the simplest "one dimensional" methods. They involved manipulation of speech elements in either the time or the frequency domain.

1.1.1 Time Domain Scrambling Algorithms

The earliest time domain scrambling algorithms involved permutation of the time samples. In this algorithm, the input speech was first divided into blocks and the samples in each block were permuted [1,6]. This resulted in a pronounced loss of intelligibility. However, it also resulted in a drastic increase in the bandwidth. Further, passing the scrambled speech through a "real" communication channel and descrambling it resulted in recovered speech of poor quality. Consequently, generalizations of this algorithm were sought. It was realized that dividing each block into smaller segments of more than one sample, normally around 10 to 30 ms long (which corresponds to 80 to 240 samples at an 8 kHz sampling rate), and permuting these segments was a bandwidth preserving operation [7]. The permutations could be done either blockwise or sequentially [1,7]. A block contained 8 to 10 segments. These algorithms were found to be robust in the case of a bad channel [1]. However, time domain scrambling algorithms were not cryptanalytically strong: the key was easy to break and considerable residual intelligibility was retained in the scrambled speech [7].
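The blockwise segment permutation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the exact scheme of [1,7]: the key schedule (an integer key seeding Python's `random.Random`) and all function names are hypothetical, and 160-sample segments (20 ms at 8 kHz) with 8 segments per block are simply one choice from the ranges quoted above.

```python
import random

def key_permutation(num_segments, key):
    """Hypothetical key schedule: a key-seeded shuffle of the segment indices."""
    perm = list(range(num_segments))
    random.Random(key).shuffle(perm)
    return perm

def split_segments(block, seg_len):
    if len(block) % seg_len:
        raise ValueError("block length must be a multiple of the segment length")
    return [block[i:i + seg_len] for i in range(0, len(block), seg_len)]

def scramble_block(block, seg_len, key):
    segs = split_segments(block, seg_len)
    perm = key_permutation(len(segs), key)
    # Output segment i is input segment perm[i]; samples inside a segment keep their order.
    return [s for i in range(len(segs)) for s in segs[perm[i]]]

def descramble_block(block, seg_len, key):
    segs = split_segments(block, seg_len)
    perm = key_permutation(len(segs), key)
    restored = [None] * len(segs)
    for i, p in enumerate(perm):
        restored[p] = segs[i]  # put received segment i back in slot perm[i]
    return [s for seg in restored for s in seg]

# One block of 8 segments, each 160 samples (20 ms at 8 kHz).
block = [float(n % 37) for n in range(8 * 160)]
scrambled = scramble_block(block, 160, key=2024)
recovered = descramble_block(scrambled, 160, key=2024)
```

Because each segment keeps its internal sample order, its short-time spectrum is unchanged, which is why segment permutation, unlike sample permutation, largely preserves the bandwidth.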

1.1.2 Frequency Domain Scrambling Algorithms

The earliest frequency domain speech scramblers were either of the frequency inversion type or the bandsplitter type, or a combination of both [1].

Frequency inversion involves flipping the sign of alternate speech samples, which mirrors the spectrum about one quarter of the sampling frequency. As is evident, this algorithm does not afford any cryptanalytic strength; it is essentially a "one key" system. Moreover, experienced eavesdroppers have been known to recognize parts of speech which have been frequency inverted [1,3]. Hence, this method can only be used as a protection against casual eavesdropping.
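A quick numerical check of the two claims about frequency inversion: multiplying by $(-1)^n$ moves frequency $\omega$ to $\pi - \omega$, so the magnitude spectrum is mirrored about one quarter of the sampling frequency, and applying the operation twice returns the original signal. The test signal and the direct $O(N^2)$ DFT are illustrative choices only.

```python
import cmath
import math

def frequency_invert(x):
    """Multiply x(n) by (-1)^n; this maps frequency w to pi - w (mod 2*pi)."""
    return [v if n % 2 == 0 else -v for n, v in enumerate(x)]

def dft(x):
    """Direct DFT, adequate for short illustrative blocks."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

x = [0.3, -1.2, 2.0, 0.7, -0.4, 1.1, 0.0, -2.5]
N = len(x)
X, Y = dft(x), dft(frequency_invert(x))
# |Y[k]| = |X[N/2 - k]|: bin k of the inverted signal holds the energy of bin N/2 - k.
mirrored = all(abs(abs(Y[k]) - abs(X[(N // 2 - k) % N])) < 1e-9 for k in range(N))
# Inverting twice restores the input, so there is effectively only one key.
self_inverse = frequency_invert(frequency_invert(x)) == x
```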
Bandsplitters split the speech block into M frequency bands. These bands are shuffled among themselves and re-synthesized to give the scrambled speech. However, this has been proved to be cryptanalytically weak. As speech contains a high amount of energy in the lower frequencies, a compromising amount of intelligibility can be recovered by merely placing one or two lower frequency bands correctly or by "concentrating" on a single subband corresponding to one of the lower frequency bands [1]. Later algorithms employed a two-pronged strategy of combining time segment scrambling or bandsplitting with frequency inversion. This merger gave improved scrambler performance [6]. However, the real improvements came with recent advances in signal processing theory, such as block transforms like the discrete cosine transform (DCT), and filter banks. These, along with the proliferation of fast dedicated DSP chips, made it possible to devise and implement stronger and more complex speech scrambling algorithms.

1.2 Current Scrambling Algorithms


The later scrambling algorithms include transform based and time-frequency based scrambling. The basic idea behind the transform based algorithms is to take a block of the speech signal and transform it. The transform coefficients are permuted and an inverse transform is taken to get the scrambled speech. The following transforms have been used for the purpose of scrambling: the discrete Fourier transform (DFT), the DCT, the discrete prolate spheroidal transform, and the Walsh-Hadamard transform [8,9,5].

In the case of the DFT, care has to be taken to ensure that the Fourier coefficients are scrambled in such a way that the inverse DFT yields a real scrambled signal; that is, for a block of length N, the permuted coefficients must still satisfy the conjugate symmetry $X(N-k) = X^*(k)$. The main advantage of the transform based scramblers is very low residual intelligibility, particularly for those transforms which tend to remove speech redundancy [5].
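The conjugate-symmetry constraint for DFT scramblers can be illustrated with a minimal sketch, assuming an even block length N: only the positive-frequency bins $1, \ldots, N/2-1$ are permuted (the DC and $N/2$ bins are left in place), and every moved coefficient takes its conjugate mirror $X(N-k)$ along with it, so the inverse DFT stays real. The key schedule and function names are hypothetical.

```python
import cmath
import math
import random

def dft(x, sign=-1):
    N = len(x)
    return [sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    return [v / len(X) for v in dft(X, sign=+1)]

def bin_permutation(N, key):
    """Hypothetical key schedule over the positive-frequency bins 1 .. N/2 - 1."""
    perm = list(range(1, N // 2))
    random.Random(key).shuffle(perm)
    return perm

def dft_scramble(x, key):
    N = len(x)
    X = dft(x)
    Y = list(X)                                   # DC (bin 0) and bin N/2 stay in place
    for i, p in enumerate(bin_permutation(N, key)):
        Y[i + 1] = X[p]
        Y[N - (i + 1)] = X[p].conjugate()         # keep Y[N - k] = conj(Y[k])
    return idft(Y)

def dft_descramble(y, key):
    N = len(y)
    Y = dft(y)
    Z = list(Y)
    for i, p in enumerate(bin_permutation(N, key)):
        Z[p] = Y[i + 1]                           # undo the move of bin p to bin i + 1
        Z[N - p] = Y[i + 1].conjugate()
    return [v.real for v in idft(Z)]

x = [math.sin(0.4 * n) + 0.5 * math.cos(1.3 * n) for n in range(16)]
y_complex = dft_scramble(x, key=99)
y = [v.real for v in y_complex]                   # imaginary parts are round-off only
x_rec = dft_descramble(y, key=99)
```

Since conjugate pairs are moved together, the permutation also preserves the signal energy (Parseval), so the scrambled block has the same power as the clear block.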

Time-frequency scrambling [3,7] is a hybrid of time segment scramblers and bandsplitters. Here, each speech block is first split into M uniform frequency bands and the output from each subband is divided into smaller segments. These time-frequency segments are then permuted in a sequential [7] or blockwise [3] manner, rearranged into M bands and resynthesized to give the scrambled speech segment. Cox et al. [3] worked with segments consisting of a single sample. This corresponds to time-frequency sample scrambling. This algorithm gives very low residual intelligibility. Its cryptanalytic strength is derived from the scrambling of time-frequency segments. If we have N time-frequency segments, the total number of possible permutations is N!. However, it has been estimated that the effective number of permutations which give low residual intelligibility is much less than N! [10].
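To make the counting argument concrete, here is a toy time-frequency sample scrambler in the spirit of [3], with one substitution flagged up front: instead of a uniform M-channel filter bank it uses a 2-channel Haar split (M = 2), so that each block can be processed independently. The key schedule and function names are hypothetical. The helper `key_bits` evaluates $\log_2 N!$, the size of the key space in bits; for N = 256 time-frequency samples it is roughly 1684 bits, although, as noted above, only a small fraction of these permutations yields low residual intelligibility.

```python
import math
import random

S = 1 / math.sqrt(2)

def haar_analysis(x):
    """2-channel Haar split: lowpass and highpass samples, each decimated by 2."""
    low = [S * (x[2 * m] + x[2 * m + 1]) for m in range(len(x) // 2)]
    high = [S * (x[2 * m] - x[2 * m + 1]) for m in range(len(x) // 2)]
    return low, high

def haar_synthesis(low, high):
    out = []
    for a, d in zip(low, high):
        out += [S * (a + d), S * (a - d)]
    return out

def tf_permutation(n, key):
    """Hypothetical key schedule: key-seeded shuffle of the n time-frequency samples."""
    perm = list(range(n))
    random.Random(key).shuffle(perm)
    return perm

def tf_scramble(block, key):
    low, high = haar_analysis(block)
    tf = low + high                              # the N time-frequency samples
    perm = tf_permutation(len(tf), key)
    tf = [tf[p] for p in perm]                   # blockwise permutation across bands and time
    return haar_synthesis(tf[:len(low)], tf[len(low):])

def tf_descramble(block, key):
    low, high = haar_analysis(block)             # recovers the permuted samples exactly
    tf = low + high
    perm = tf_permutation(len(tf), key)
    restored = [0.0] * len(tf)
    for i, p in enumerate(perm):
        restored[p] = tf[i]
    return haar_synthesis(restored[:len(low)], restored[len(low):])

def key_bits(n):
    """log2(n!), the size of the permutation key space in bits."""
    return math.lgamma(n + 1) / math.log(2)

block = [math.sin(0.05 * n) + 0.3 * math.sin(1.7 * n) for n in range(64)]
recovered = tf_descramble(tf_scramble(block, key=11), key=11)
```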

1.3 Wavelets and Wavelet Packets

Wavelets have found practical applications in such diverse fields as speech and image compression and the detection of asteroid families [11,12,13]. The discovery of orthonormal wavelets [14,15,16] and their close connection to perfect reconstruction filter banks has led to further advancements in both wavelet and filter bank theory [17,13,18]. These include M-band regular wavelets [13], biorthogonal wavelets [19,20], and wavelet packets [21]. Wavelet packets are a generalization of the wavelet decomposition. They allow us to split the signal into non-uniform frequency bands, chosen by selecting which subbands of a dyadic tree are split further. It is precisely this property that we exploit to propose a new time-frequency scrambling algorithm.
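The property of wavelet packets used in this thesis, that the choice of tree fixes the band partition, can be illustrated with a short sketch. It assumes Haar filters (chosen only because their non-overlapping blocks need no boundary handling) and encodes each tree node as a string of branch choices, '0' for lowpass and '1' for highpass; these encodings and function names are hypothetical. Every tree yields a critically sampled, perfectly reconstructing decomposition, but into different subbands: a leaf at depth d covers a band of width $\pi/2^d$.

```python
import math

S = 1 / math.sqrt(2)

def haar_split(x):
    low = [S * (x[2 * m] + x[2 * m + 1]) for m in range(len(x) // 2)]
    high = [S * (x[2 * m] - x[2 * m + 1]) for m in range(len(x) // 2)]
    return low, high

def haar_merge(low, high):
    out = []
    for a, d in zip(low, high):
        out += [S * (a + d), S * (a - d)]
    return out

def wp_analysis(x, leaves, node=""):
    """Split x recursively until every branch reaches a leaf node of the tree."""
    if node in leaves:
        return {node: x}
    if len(x) < 2 or len(x) % 2:
        raise ValueError("cannot split node '" + node + "' further")
    low, high = haar_split(x)
    bands = wp_analysis(low, leaves, node + "0")
    bands.update(wp_analysis(high, leaves, node + "1"))
    return bands

def wp_synthesis(bands, node=""):
    """Merge the leaf subbands back up the tree."""
    if node in bands:
        return bands[node]
    return haar_merge(wp_synthesis(bands, node + "0"), wp_synthesis(bands, node + "1"))

x = [math.sin(0.3 * n) + (n % 5) for n in range(64)]
trees = {
    "logarithmic (wavelet) tree": {"1", "01", "001", "000"},
    "uniform 4-band split": {"00", "01", "10", "11"},
    "non-uniform packet tree": {"0", "10", "110", "111"},
}
results = {name: wp_analysis(x, leaves) for name, leaves in trees.items()}
```

Note that the node labels record the filtering path, not the frequency order of the bands: after highpass filtering and decimation the spectrum is reversed, so the two children of a highpass node appear in swapped frequency order.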

In this thesis, we explore the use of uniform filter banks and wavelet packets for speech scrambling, keeping the following objectives in view.

1. To study how the distance measure between the scrambled and the unscrambled speech (which gives an estimate of the residual intelligibility in the scrambled speech) is affected by different scrambling schemes.

2. To arrive at schemes which yield a greater effective number of permutations.

1.4 Contribution of the Thesis


In this thesis, we study two different scrambling approaches, one based on uniform filter banks and the other based on wavelet packets. We also introduce 2-channel perfect reconstruction circular convolution filter banks and give a simple method to design these filters. The thesis is organised as follows. Chapter 2 gives a brief review of filter bank, wavelet and wavelet packet theory.

In Chapter 3, we discuss the performance of the uniform filter bank based time-frequency scrambling algorithm with respect to block length and number of channels. In Chapter 4, we propose time-frequency scrambling based on wavelet packet filter banks. Chapter 5 describes 2-channel perfect reconstruction circular convolution filter banks. Finally, in Chapter 6, we summarize the results of the thesis and suggest directions for further work.

Chapter 2 Filter Banks and Wavelets


2.1 Prerequisites
The prerequisites for this and the subsequent chapters are the fundamentals of multirate systems. These are described in Appendix A. A more detailed exposition is given in [18].

2.2 Introduction

In signal processing, signals are represented using a variety of bases such as discrete Fourier
bases, discrete cosine bases or wavelets. The choice of a particular basis to represent a signal depends on the application or the type of information one is trying to extract from the signal.

In this chapter, we will be concerned with the generation of basis functions for the spaces $l^2(\mathbb{Z})$ and $L^2(\mathbb{R})$ using ideas based on multirate filter banks.


An M-channel multirate filter bank is a set of M bandpass filters which cover the spectrum $[0, 2\pi)$. A signal can be split into M subbands by passing it through an analysis filter bank. This produces M times as many samples. Since each subband is now narrow, we can downsample the output of each filter by a factor of M, to its Nyquist rate. Thus, the output of each filter in the filter bank is critically sampled (i.e., maximally decimated) and the total number of samples is preserved. The outputs of the analysis filter bank can be recombined using expanders (also called interpolators or upsamplers) and the synthesis filter bank. The expanders increase the number of samples in each band to the original number by introducing an appropriate number of zeros (M - 1 for an M-fold expander) between two consecutive samples. A typical maximally decimated uniform M-channel analysis-synthesis filter bank system is shown in Fig. 2.1.

If the output $\hat{x}(n)$ in Fig. 2.1 is an exact replica of the input signal $x(n)$ (up to a delay and scale factor), then the system is known as a perfect reconstruction (PR) system.

Figure 2.1: M-channel maximally decimated analysis-synthesis filter bank system.

First, consider the outputs of the analysis filter bank. These are given by

$$y_i(n) = \sum_{k} h_i(k)\, x(n-k), \quad i = 0, 1, \ldots, M-1,$$

and the outputs of the decimators, $x_i(m)$, $i = 0, 1, \ldots, M-1$, are

$$x_i(m) = y_i(Mm) = \sum_{n} h_i(Mm - n)\, x(n), \quad i = 0, 1, \ldots, M-1, \quad m \in \mathbb{Z}.$$

Thus, the $x_i(m)$, $i = 0, 1, \ldots, M-1$, are the inner products of the signal $x(n)$ with $a_{im}(n) = h_i(Mm - n)$, $i = 0, 1, \ldots, M-1$, $m \in \mathbb{Z}$.
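The statement that the decimated analysis outputs are inner products with $a_{im}(n) = h_i(Mm-n)$ can be checked directly. The sketch below computes the subband samples once by filtering and keeping every M-th output, and once as inner products; the 3-channel filters and the input are arbitrary, hypothetical values, and all signals are assumed to start at n = 0.

```python
def convolve(h, x):
    """Linear convolution of two finite sequences starting at index 0."""
    y = [0.0] * (len(h) + len(x) - 1)
    for k, hk in enumerate(h):
        for n, xn in enumerate(x):
            y[k + n] += hk * xn
    return y

def analysis_filter_then_decimate(x, filters, M):
    """y_i(n) = (h_i * x)(n), then x_i(m) = y_i(Mm)."""
    return [convolve(h, x)[::M] for h in filters]

def analysis_inner_products(x, filters, M):
    """x_i(m) = <x(n), a_im(n)> with a_im(n) = h_i(Mm - n) (real filters)."""
    out = []
    for h in filters:
        num_outputs = (len(h) + len(x) - 1 + M - 1) // M
        out.append([sum(h[M * m - n] * x[n]
                        for n in range(len(x)) if 0 <= M * m - n < len(h))
                    for m in range(num_outputs)])
    return out

# Hypothetical 3-channel example with arbitrary FIR filters.
filters = [[0.5, 0.4, 0.1], [0.3, -0.6, 0.3], [0.2, -0.1, -0.4, 0.3]]
x = [1.0, -2.0, 0.5, 3.0, 0.0, -1.5, 2.5, 1.0, -0.5]
A = analysis_filter_then_decimate(x, filters, M=3)
B = analysis_inner_products(x, filters, M=3)
```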

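Now, consider the output of the synthesis bank,

$$\hat{x}(n) = \sum_{i=0}^{M-1} \sum_{m \in \mathbb{Z}} x_i(m)\, f_i(n - Mm).$$

Using (2.1) in (2.2), we can write

$$\hat{x}(n) = \sum_{i=0}^{M-1} \sum_{m \in \mathbb{Z}} \langle x(n), a_{im}(n) \rangle\, b_{im}(n),$$

where

$$b_{im}(n) = f_i(n - Mm), \quad i = 0, 1, \ldots, M-1, \quad m \in \mathbb{Z}. \qquad (2.5)$$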

Now the questions we wish to ask are the following. Is perfect reconstruction possible, i.e., is $\hat{x}(n) = x(n)$? If so, under what conditions, and what is the relationship between the analysis and the synthesis filter banks? In the next section, we answer these questions for the case M = 2, and in a later section, we show that these results extend to any M.

2.3 Perfect Reconstruction for M = 2

Figure 2.2 shows the maximally decimated 2-channel filter bank. We assume that the impulse responses of the filters we are dealing with are real and of finite duration (i.e., the filters are FIR). This system was first introduced by Esteban and Galand [22].

Figure 2.2: 2-channel maximally decimated analysis-synthesis filter bank system.

The filters $H_0(z)$ and $H_1(z)$ are lowpass and highpass, respectively. A typical analysis filter bank response is shown in Fig. 2.3. The overlap between $H_0(z)$ and $H_1(z)$ means that their outputs, $y_0(n)$ and $y_1(n)$, are not sufficiently bandlimited, and this results in aliasing on downsampling. Using the multirate identities given in Appendix A, we get an expression for $\hat{X}(z)$, the z-transform of $\hat{x}(n)$:

$$\hat{X}(z) = \frac{1}{2}\left[H_0(z)F_0(z) + H_1(z)F_1(z)\right]X(z) + \frac{1}{2}\left[H_0(-z)F_0(z) + H_1(-z)F_1(z)\right]X(-z) \qquad (2.6)$$

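The second term corresponds to the alias component. Esteban and Galand [22] showed that with a proper choice of $F_0(z)$ and $F_1(z)$, this term could be completely cancelled. For example, the alias term vanishes if we choose [22]

$$F_0(z) = H_1(-z) \qquad (2.7)$$

$$F_1(z) = -H_0(-z) \qquad (2.8)$$

Substituting (2.7) and (2.8) in (2.6), we get

$$\hat{X}(z) = T(z)X(z), \qquad (2.9)$$

$$T(z) = \frac{1}{2}\left[H_0(z)H_1(-z) - H_1(z)H_0(-z)\right]. \qquad (2.10)$$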

In general, the reconstructed signal $\hat{X}(z)$ suffers from amplitude and phase distortion because of $T(z)$, i.e., $|T(e^{j\omega})|$ is not constant for all $\omega$ and $T(z)$ is not of linear phase. For perfect reconstruction, it is necessary that

$$T(z) = c\, z^{-n_0}, \qquad (2.11)$$

where c is some real constant. Thus, $T(z)$ should introduce only scaling and a delay. Let us first consider the design of $H_0(z)$ and $H_1(z)$ with the condition that they possess linear phase. The earliest solutions [18] proposed to this effect used the relation

$$H_1(z) = H_0(-z), \qquad (2.12)$$

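which can also be written as

$$h_1(n) = (-1)^n h_0(n). \qquad (2.13)$$

This means that $H_1(z)$ is a frequency shifted version of $H_0(z)$: $H_1(e^{j\omega}) = H_0(e^{j(\omega - \pi)})$. Substituting this relation into (2.10) gives

$$T(z) = \frac{1}{2}\left[H_0^2(z) - H_0^2(-z)\right] \qquad (2.14)$$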

Now, the only way for the right-hand side of (2.14) to satisfy the condition (2.11) is to choose the following trivial solution [18]

Note that these solutions correspond to 2-tap filters. Recall that short filters (filters with fewer taps) give poor frequency selectivity. However, filters with more than two taps can be designed if we do not insist on zero amplitude distortion and zero phase distortion of $T(z)$.
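For the linear-phase QMF choice $H_1(z) = H_0(-z)$, the distortion function reduces to $T(z) = \frac{1}{2}[H_0^2(z) - H_0^2(-z)]$, written here with the factor 1/2 that comes from the decimator-expander pair. The sketch below evaluates its coefficients. With the 2-tap Haar filter $h_0 = [1/\sqrt{2}, 1/\sqrt{2}]$ it gives a pure delay $z^{-1}$; with a longer linear-phase filter (the 4-tap coefficients are hypothetical) it does not, which is the trade-off described above.

```python
import math

def poly_mul(a, b):
    """Multiply two polynomials in z^{-1} given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def modulate(h):
    """Coefficients of H(-z): h(n) -> (-1)^n h(n)."""
    return [c if n % 2 == 0 else -c for n, c in enumerate(h)]

def qmf_T(h0):
    """Coefficients of T(z) = (1/2)[H0(z)^2 - H0(-z)^2] for the choice H1(z) = H0(-z)."""
    a = poly_mul(h0, h0)
    b = poly_mul(modulate(h0), modulate(h0))
    return [0.5 * (p - q) for p, q in zip(a, b)]

haar = [1 / math.sqrt(2), 1 / math.sqrt(2)]
longer = [0.1, 0.6, 0.6, 0.1]        # hypothetical 4-tap linear-phase lowpass filter
T_haar = qmf_T(haar)                 # only the z^{-1} coefficient is nonzero
T_longer = qmf_T(longer)             # three nonzero odd-power coefficients
```

For the 4-tap filter, $T(z)$ comes out as approximately $0.12z^{-1} + 0.74z^{-3} + 0.12z^{-5}$: symmetric, hence linear phase, but not a single delay, so some amplitude distortion remains.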

Now consider the design with the linear phase constraint but with non-zero amplitude distortion. Zero phase distortion can be achieved if $H_0(z)$ is chosen to have linear phase, since this makes $T(z)$ linear phase as well. The amplitude distortion can be minimized by forcing the following relation to hold as closely as possible:

In other words, the left-hand side of (2.17) should remain approximately constant over the whole frequency range. The equality condition in (2.17) implies symmetry with respect to the quadrature frequency $\pi/2$ [18]. This is often referred to as the power complementarity property. Substituting (2.12) into (2.17) gives

This can be expressed as

where $\tilde{H}_0(z) = H_0(z^{-1})$. Let

$$H(z) = H_0(z)\tilde{H}_0(z). \qquad (2.20)$$

Then, it follows from (2.19) that $H(z)$, the z-transform of the autocorrelation of $h_0(n)$, is a half-band filter. Relations (2.18) and (2.19) imply that the autocorrelation of $h_0(n)$ is equal to zero for all nonzero even lags [19]. That is,

$$\langle h_0(n), h_0(n+2k) \rangle = \delta_{k0} \qquad (2.21)$$

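In other words, $h_0(n)$ is orthonormal to its even translates. Equation (2.21) also implies that the order of $H_0(z)$ is odd: if the order N were even, the autocorrelation at the even lag N would be $h_0(0)h_0(N) \neq 0$.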
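Condition (2.21) and the half-band property of $H(z)$ can be checked numerically for a concrete filter. The sketch assumes the Daubechies 4-tap lowpass filter as an example of a valid $h_0(n)$ (it is not a filter designed in this thesis). It computes the autocorrelation at lags -3 to 3 and checks that $|H_0(e^{j\omega})|^2 + |H_0(e^{j(\omega+\pi)})|^2 = 2$, the constant being $2r(0)$ in the normalization of (2.21).

```python
import cmath
import math

r3, r2 = math.sqrt(3), math.sqrt(2)
# Example filter (an assumption): the Daubechies 4-tap lowpass filter, of odd order 3.
h0 = [(1 + r3) / (4 * r2), (3 + r3) / (4 * r2), (3 - r3) / (4 * r2), (1 - r3) / (4 * r2)]

def autocorr(h, k):
    """r(k) = sum_n h(n) h(n + k) for a finite real filter."""
    return sum(h[n] * h[n + k] for n in range(len(h)) if 0 <= n + k < len(h))

def power(h, w):
    """|H0(e^{jw})|^2, i.e. the half-band product H0(z)H0(z^{-1}) on the unit circle."""
    return abs(sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))) ** 2

lags = {k: autocorr(h0, k) for k in range(-3, 4)}
# Even lags: r(0) = 1 and r(+-2) = 0, as (2.21) requires; the odd lags are nonzero.
```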
It was independently shown by Smith and Barnwell [23] and Mintzer [24] that both amplitude and phase distortion can be completely eliminated, and perfect reconstruction achieved, by relaxing the linear phase constraint on $H_0(z)$ while maintaining adherence to (2.18). The relationship between $H_0(z)$ and $H_1(z)$ in this case is given by

$$H_1(z) = -z^{-L}\tilde{H}_0(-z), \qquad (2.22)$$

where L is odd. From (2.7), (2.8) and (2.22), the impulse responses of $H_1(z)$, $F_0(z)$ and $F_1(z)$ are given by

$$h_1(n) = (-1)^n h_0(L - n) \qquad (2.23)$$

$$f_0(n) = h_0(L - n) \qquad (2.24)$$

$$f_1(n) = h_1(L - n) \qquad (2.25)$$
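Relations (2.23) to (2.25) can be verified by building the complete 2-channel system of Fig. 2.2. The sketch again assumes the Daubechies 4-tap filter for $h_0(n)$, so the order and the delay are both L = 3; the helper names are hypothetical. With causal FIR filters and zero initial conditions, the output should equal the input delayed by L samples, and the even-lag cross-correlation between $h_0(n)$ and $h_1(n)$ should vanish.

```python
import math

r3, r2 = math.sqrt(3), math.sqrt(2)
# Example prototype (an assumption): Daubechies 4-tap lowpass filter, order 3, so L = 3 is odd.
h0 = [(1 + r3) / (4 * r2), (3 + r3) / (4 * r2), (3 - r3) / (4 * r2), (1 - r3) / (4 * r2)]
L = len(h0) - 1
h1 = [(-1) ** n * h0[L - n] for n in range(L + 1)]   # (2.23)
f0 = [h0[L - n] for n in range(L + 1)]               # (2.24)
f1 = [h1[L - n] for n in range(L + 1)]               # (2.25)

def convolve(a, b):
    y = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            y[i + j] += ai * bj
    return y

def down2(y):
    return y[::2]

def up2(v):
    out = [0.0] * (2 * len(v))
    out[::2] = v
    return out

def two_channel_bank(x):
    """Analysis (filter, decimate by 2) followed by synthesis (expand, filter, add)."""
    v0, v1 = down2(convolve(h0, x)), down2(convolve(h1, x))
    y0, y1 = convolve(f0, up2(v0)), convolve(f1, up2(v1))
    return [a + b for a, b in zip(y0, y1)]

x = [math.cos(0.7 * n) + 0.1 * n for n in range(20)]
x_hat = two_channel_bank(x)   # expected: x_hat(n) = x(n - L)
```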

If we assume that $h_0(n)$ is causal, then the remaining filters are also causal provided the delay $L \ge$ the order of $H_0(z)$. The frequency response of a typical pair $H_0(z)$ and $H_1(z)$ is shown in Fig. 2.3. We now make the following observations.

Figure 2.3: Typical frequency response of the analysis bank filters of a 2-channel perfect reconstruction system (horizontal axis: frequency in radians).

1. Because of the relationship between $h_0(n)$ and $h_1(n)$ and (2.19), we have

Also, since $H_1(z) = -z^{-L}\tilde{H}_0(-z)$, the z-transform of the cross-correlation of $h_0(n)$ and $h_1(n)$ is given by

$$\tilde{H}_0(z)H_1(z) = -z^{-L}\tilde{H}_0(z)\tilde{H}_0(-z) \qquad (2.27)$$

The product $\tilde{H}_0(z)\tilde{H}_0(-z)$ contains only even powers of z and L is odd, so the above equation implies that the cross-correlation is equal to zero for all even lags. That is,

$$\langle h_0(n), h_1(n+2k) \rangle = 0 \qquad (2.28)$$

Thus, $h_0(n)$ is orthogonal to the even translates of $h_1(n)$ and vice versa.

2. Let

$\mathbf{H}(z)$ is known as the alias component (AC) matrix [18]. Then, using (2.22), the conditions given in (2.19), (2.21), (2.26) and (2.28) can be expressed in matrix form as

where $\tilde{\mathbf{H}}(z) = \mathbf{H}^T(z^{-1})$ (the superscript T denotes transpose). This, as we will see later, is the paraunitary condition [18]. Let the order of the filters be $L_1$. Form the matrices

The kth row of $\mathbf{H}_i$ contains the elements of the sequence $h_i(2k - n)$. Pre-multiplying the infinite signal vector $\mathbf{x}$ with $\mathbf{H}_i$ gives the inner products $\langle x(n), h_i(2k - n) \rangle$, $i = 0, 1$, which correspond to the decimated outputs of the filters $H_i(z)$, $i = 0, 1$. Similarly, pre-multiplying these decimated outputs with $\mathbf{H}_i^T$ represents the process of upsampling and passing through the synthesis filters (assuming the relations (2.24) and (2.25)). Also, equations (2.21), (2.26) and (2.28) imply that

This means that $(\mathbf{H}_i^T\mathbf{H}_i)^2 = \mathbf{H}_i^T\mathbf{H}_i$, $i = 0, 1$. Also, $(\mathbf{H}_i^T\mathbf{H}_i)^T = \mathbf{H}_i^T\mathbf{H}_i$. Hence, $\mathbf{H}_0^T\mathbf{H}_0$ represents the orthogonal projection onto the row space of $\mathbf{H}_0$. We denote this space by $V_0$. Similarly, $\mathbf{H}_1^T\mathbf{H}_1$ represents the orthogonal projection onto the row space of $\mathbf{H}_1$, which we denote by $W_0$. Also, (2.32) implies that $V_0 \perp W_0$. Since the filters $h_0(n)$ and $h_1(n)$ satisfy (2.30), the system shown in Fig. 2.2 (with (2.24) and (2.25)) achieves perfect reconstruction. Thus, we have

$$\mathbf{H}_0^T\mathbf{H}_0 + \mathbf{H}_1^T\mathbf{H}_1 = \mathbf{I} \qquad (2.33)$$

Hence, any sequence $x(n)$ in the space $l^2(\mathbb{Z}) = V_{-1}$ can be reconstructed from its projections onto two orthogonal complement subspaces spanned by the rows of $\mathbf{H}_0$ and $\mathbf{H}_1$. That is,

$$V_{-1} = V_0 \oplus W_0 \qquad (2.34)$$
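Relation (2.33) can be checked numerically on finite matrices. Since the infinite matrices $\mathbf{H}_0$ and $\mathbf{H}_1$ cannot be stored, the sketch assumes a periodized (circular) version of size $N/2 \times N$ with N = 8, in which row k holds $h_i((2k-n) \bmod N)$; this wrap-around is an assumption of the example, not part of the derivation above. With the Daubechies 4-tap filter as $h_0(n)$, the two projections are idempotent, mutually orthogonal and sum to the identity.

```python
import math

r3, r2 = math.sqrt(3), math.sqrt(2)
# Example filters (an assumption): Daubechies 4-tap h0, and h1 from (2.23) with L = 3.
h0 = [(1 + r3) / (4 * r2), (3 + r3) / (4 * r2), (3 - r3) / (4 * r2), (1 - r3) / (4 * r2)]
h1 = [(-1) ** n * h0[3 - n] for n in range(4)]

def analysis_matrix(h, N):
    """(N/2) x N matrix whose row k holds h((2k - n) mod N): a periodized
    stand-in for the infinite matrix H_i (requires N even and N >= len(h))."""
    return [[h[(2 * k - n) % N] if (2 * k - n) % N < len(h) else 0.0
             for n in range(N)] for k in range(N // 2)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

N = 8
H0, H1 = analysis_matrix(h0, N), analysis_matrix(h1, N)
P0 = matmul(transpose(H0), H0)   # orthogonal projection onto V0 (row space of H0)
P1 = matmul(transpose(H1), H1)   # orthogonal projection onto W0 (row space of H1)
```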

Thus, $h_0(n)$, $h_1(n)$ and their even translates form an orthonormal basis for $l^2(\mathbb{Z})$. To design a 2-channel perfect reconstruction filter bank, it is necessary to design the prototype lowpass filter $H_0(z)$. To design $H_0(z)$, we need to design the half-band filter $H(z)$. $H(z)$ can be designed to the given passband ripple and transition width specifications using the Parks-McClellan or any other standard algorithm [18]. Since $H(z)$ is a half-band filter, it satisfies the following properties:

1. $H(z)$ is equiripple, with the same ripple in the passband and the stopband.

2. If $\omega_p$ and $\omega_s$ represent the passband and stopband edges, then $\omega_p + \omega_s = \pi$.

3. The order of $H(z)$ is even.

Care must be taken to ensure that $H(e^{j\omega}) \ge 0$ for all $\omega$. Then, $H_0(z)$ is a spectral factor of $H(z)$. Note that $H_0(z)$ is of odd order.

2.4 The Logarithmic Tree

In the previous section, we saw how a sequence $x(n) \in l^2(\mathbb{Z}) = V_{-1}$ could be projected onto two orthogonal complement subspaces of $V_{-1}$, namely $V_0$ and $W_0$, using the analysis side of a 2-channel perfect reconstruction filter bank system. Similarly, a sequence belonging to $V_0$ could be projected onto two orthogonal complement subspaces of $V_0$, namely $V_1$ and $W_1$, such that $V_0 = V_1 \oplus W_1$. Thus, we have $V_{-1} = W_0 \oplus W_1 \oplus V_1$. This process, when repeated iteratively, gives rise to a logarithmic dyadic tree. The signal $x(n)$ is thus split into unequal frequency bands, as shown in Fig. 2.4(b), by recursively splitting the lower band using a 2-channel analysis filter bank (see Fig. 2.4(a)).

Figure 2.4: (a) Two-channel analysis filter banks cascaded on the lowpass side to form a three-level logarithmic tree-structured filter bank. (b) The frequency splitting effected by the system shown in (a).

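In general, for a k-level decomposition, we can write

$$V_{-1} = W_0 \oplus W_1 \oplus \cdots \oplus W_{k-1} \oplus V_{k-1}.$$

This gives a ladder of subspaces

$$V_{k-1} \subset V_{k-2} \subset \cdots \subset V_0 \subset V_{-1}.$$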

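Using the noble identities (see Appendix A), the logarithmic tree shown in Fig. 2.4(a) can be simplified to the filter bank shown in Fig. 2.5. Here, with $H_0'(z)$ and $H_1'(z)$ denoting the lowpass and highpass filters of the 2-channel analysis bank,

$$H_0(z) = H_0'(z^4)\,H_0'(z^2)\,H_0'(z)$$

$$H_1(z) = H_1'(z^4)\,H_0'(z^2)\,H_0'(z)$$

$$H_2(z) = H_1'(z^2)\,H_0'(z)$$

$$H_3(z) = H_1'(z)$$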

Figure 2.5: Filter bank equivalent to the three level decomposition shown in Fig. 2.4(a).
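The equivalence between the tree of Fig. 2.4(a) and the filter bank of Fig. 2.5 can be checked sample by sample. The sketch assumes the Daubechies 4-tap pair as $H_0'(z)$ and $H_1'(z)$ (hypothetical names `g0`, `g1`); each branch is computed once through the tree (filter, decimate by 2, repeated) and once with the single equivalent filter $\prod_s H'(z^{2^s})$ followed by decimation by $2^{\text{stages}}$.

```python
import math

r3, r2 = math.sqrt(3), math.sqrt(2)
# Example 2-channel pair (an assumption): Daubechies 4-tap lowpass and its highpass partner.
g0 = [(1 + r3) / (4 * r2), (3 + r3) / (4 * r2), (3 - r3) / (4 * r2), (1 - r3) / (4 * r2)]
g1 = [(-1) ** n * g0[3 - n] for n in range(4)]

def convolve(a, b):
    y = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            y[i + j] += ai * bj
    return y

def expand_filter(h, M):
    """Coefficients of H(z^M): M - 1 zeros between consecutive taps."""
    out = [0.0] * ((len(h) - 1) * M + 1)
    out[::M] = h
    return out

def tree_branch(x, stages):
    """One path through the tree: filter, then decimate by 2, stage after stage."""
    for h in stages:
        x = convolve(h, x)[::2]
    return x

def equivalent_branch(x, stages):
    """Noble identities: one filter prod_s H_s(z^(2^s)), then decimate by 2^len(stages)."""
    g = [1.0]
    for s, h in enumerate(stages):
        g = convolve(g, expand_filter(h, 2 ** s))
    return convolve(g, x)[::2 ** len(stages)]

# The four branches of Fig. 2.4(a), first stage listed first.
branches = {"H0": [g0, g0, g0], "H1": [g0, g0, g1], "H2": [g0, g1], "H3": [g1]}
x = [math.sin(0.2 * n) + (n % 3) for n in range(32)]
pairs = {name: (tree_branch(x, st), equivalent_branch(x, st)) for name, st in branches.items()}
```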

The synthesis tree is the mirror image of the analysis tree. Figs. 2.6 and 2.7 show the synthesis tree and its equivalent filter bank (obtained using the noble identities). Here

Note that perfect reconstruction is possible for the logarithmic tree-structured system if the analysis-synthesis filter bank pair at each level forms a perfect reconstruction system [14,18].

Figure 2.6: Dyadic tree-structured synthesis filter bank corresponding to the analysis filter bank shown in Fig. 2.4(a).

Figure 2.7: Filter bank equivalent to the three-level synthesis system shown in Fig. 2.6.

As we shall see in the next section, the outputs of the logarithmic tree-structured analysis filter bank constitute precisely the discrete-time wavelet representation of the input sequence.
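A minimal sketch of the logarithmic tree as a discrete wavelet transform, assuming Haar filters (chosen only because non-overlapping blocks need no boundary handling; any 2-channel PR pair would do) and an input length divisible by $2^{\text{levels}}$. The highpass outputs play the role of the projections onto $W_0, W_1, \ldots$ and the final lowpass output the projection onto the coarsest V space; since every stage is orthonormal, perfect reconstruction and energy preservation follow.

```python
import math

S = 1 / math.sqrt(2)

def haar_split(x):
    return ([S * (x[2 * m] + x[2 * m + 1]) for m in range(len(x) // 2)],
            [S * (x[2 * m] - x[2 * m + 1]) for m in range(len(x) // 2)])

def haar_merge(low, high):
    out = []
    for a, d in zip(low, high):
        out += [S * (a + d), S * (a - d)]
    return out

def dwt(x, levels):
    """Logarithmic tree: split only the lowpass branch at each level.
    Returns [d_0, ..., d_(levels-1), a_(levels-1)] (coefficients for W_0, W_1, ..., V)."""
    if len(x) % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    details, approx = [], x
    for _ in range(levels):
        approx, d = haar_split(approx)
        details.append(d)
    return details + [approx]

def idwt(coeffs):
    """Mirror-image synthesis tree: merge from the coarsest level upwards."""
    approx = coeffs[-1]
    for d in reversed(coeffs[:-1]):
        approx = haar_merge(approx, d)
    return approx

x = [math.sin(0.3 * n) + (n % 4) for n in range(32)]
coeffs = dwt(x, 3)
```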

2.5 The Wavelet Transform

2.5.1 Introduction

The wavelet transform is a representation of a square integrable signal in terms of translations and dilations of a single function called the mother wavelet. It can also be viewed as a refinement of the short time Fourier transform (STFT). The Fourier transform and its inverse for a sequence $x(n)$ are given by

$$X(e^{j\omega}) = \langle x(n), e^{j\omega n} \rangle = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n} \qquad (2.38)$$

and

$$x(n) = \frac{1}{2\pi} \int_{0}^{2\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega. \qquad (2.39)$$

Equation (2.39) can be interpreted as expressing the sequence $x(n)$ in terms of the orthonormal basis sequences $\{e^{j\omega n}\}_{\omega \in [0, 2\pi)}$. These bases are extremely well localized in frequency but have poor time localization (because of their infinite duration). Consequently, the Fourier transform spreads any feature that is localized in time over the entire spectrum. This drawback is overcome by the STFT, where localized frequency information is obtained by windowing the signal and taking its Fourier transform at various window positions. The STFT of a signal is given by

Thus, the STFT maps a one-dimensional signal to a two-dimensional time-frequency plane. The squared magnitude of this two-dimensional representation is called the spectrogram. The STFT can be written in the following ways, which make its interpretation easier:

where $\circledast$ denotes the convolution operation. Hence, we can say that the STFT is the output of a bank of bandpass filters $V(e^{j(\omega - \omega_i)})$. We can also write (2.41) as

This is the inner product of the signal with the window, modulated to different frequencies and shifted to different positions in time. Based on (2.42) and (2.43), we see that the STFT has some drawbacks [25,18]. The time and frequency resolutions depend on the duration of the window, and they remain fixed once the window is selected. By the uncertainty principle [18], the longer the window, the better the frequency resolution and the worse the time resolution, and vice versa. The STFT is also redundant. It is therefore computed on a uniform discrete grid, as shown in Fig. 2.8. The Balian-Low theorem [26,17] shows that a non-redundant representation is not possible for windows which have good localization in both the time and frequency domains.

Figure 2.8: The STFT sampling grid (this is known as a tiling diagram).

The drawbacks of the STFT are overcome by the wavelet transform, which we describe next.
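The filter-bank interpretation of the STFT can be checked numerically. Since (2.40) to (2.43) are not reproduced here, the sketch assumes the convention $X(\omega, m) = \sum_n x(n)\,v(n-m)\,e^{-j\omega n}$ with a Hann window $v(n)$; under this convention $X(\omega, m) = e^{-j\omega m}\,(x \circledast g)(m)$ with $g(k) = v(-k)e^{j\omega k}$, a time-reversed window modulated to $\omega$ (a bandpass filter around $\omega$ when $v$ is a lowpass window). Other sign conventions change only the modulation factors.

```python
import cmath
import math

def window_at(v, n):
    """Window v(n), zero outside 0 <= n < len(v)."""
    return v[n] if 0 <= n < len(v) else 0.0

def stft_direct(x, v, w, m):
    """Assumed convention: X(w, m) = sum_n x(n) v(n - m) e^{-jwn}."""
    return sum(x[n] * window_at(v, n - m) * cmath.exp(-1j * w * n) for n in range(len(x)))

def stft_filter_bank(x, v, w, m):
    """Same value via filtering: X(w, m) = e^{-jwm} (x * g)(m), g(k) = v(-k) e^{jwk}."""
    Lw = len(v)
    g = [v[-k] * cmath.exp(1j * w * k) for k in range(-(Lw - 1), 1)]   # k = -(Lw-1) .. 0
    conv = [0j] * (len(x) + Lw - 1)
    for i, gi in enumerate(g):
        for n, xn in enumerate(x):
            conv[i + n] += gi * xn
    # conv[t] holds (x * g)(t - (Lw - 1)) because g starts at k = -(Lw - 1).
    return cmath.exp(-1j * w * m) * conv[m + Lw - 1]

Lw = 8
v = [0.5 - 0.5 * math.cos(2 * math.pi * n / (Lw - 1)) for n in range(Lw)]   # Hann window
x = [math.cos(0.9 * n) + 0.5 * math.cos(2.2 * n) for n in range(24)]
```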

2.5.2 The Wavelet Transform - A Brief Introduction

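Unlike the STFT, which uses translations and modulations of a window, the wavelet transform represents a signal using translations and dilations of a single function [14,18]. Thus, if $\psi(t)$ is the so-called mother wavelet, then the basis functions are the translates and dilates of this function, namely,

$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right), \quad a \in \mathbb{R}^+, \; b \in \mathbb{R}.$$

Thus, the continuous wavelet transform (CWT) of a signal $x(t)$ is given by

$$\mathrm{CWT}_x(a, b) = \int_{-\infty}^{\infty} x(t)\, \psi_{a,b}^{*}(t)\, dt,$$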

where the superscript $*$ denotes complex conjugation. For a function to be a wavelet, it must be admissible. That is, if $\Psi(\omega)$ represents the Fourier transform of $\psi(t)$, then it must satisfy the following condition:

$$\int_{-\infty}^{\infty} \frac{|\Psi(\omega)|^2}{|\omega|}\, d\omega < \infty.$$

This implies

$$\Psi(0) = \int_{-\infty}^{\infty} \psi(t)\, dt = 0.$$
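The admissibility condition and its consequence $\Psi(0) = 0$ can be checked numerically. The sketch below assumes the Mexican-hat function $\psi(t) = (1 - t^2)e^{-t^2/2}$ as an example wavelet (it is not used elsewhere in this thesis). Its Fourier transform is $\sqrt{2\pi}\,\omega^2 e^{-\omega^2/2}$, so the admissibility integral equals $2\pi$. Riemann sums on a fine grid stand in for the integrals.

```python
import numpy as np

def mexican_hat(t):
    """Assumed example wavelet psi(t) = (1 - t^2) exp(-t^2 / 2), unnormalized."""
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

t = np.linspace(-10.0, 10.0, 2001)             # time grid, dt = 0.01
dt = t[1] - t[0]
psi = mexican_hat(t)

# Psi(0) is the integral of psi(t); admissibility forces it to vanish.
mean_value = psi.sum() * dt

# Admissibility integral over positive frequencies, doubled because |Psi|^2 is even.
w = np.linspace(0.01, 10.0, 1000)              # omega = 0 excluded, d(omega) = 0.01
dw = w[1] - w[0]
Psi = (psi[None, :] * np.exp(-1j * w[:, None] * t[None, :])).sum(axis=1) * dt
C_psi = 2.0 * np.sum(np.abs(Psi) ** 2 / w) * dw

print("integral of psi:", mean_value)
print("admissibility integral:", C_psi, "closed form 2*pi:", 2 * np.pi)
```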
The wavelet transform represents the signal as the outputs of bandpass filters whose bandwidths are proportional to their center frequencies ("constant-Q" filters). Thus, it represents the signal on a time-scale plane with varying time-frequency resolution. This time-scale representation is called the scalogram. The CWT is redundant, and hence one samples the scalogram. This is done on a non-uniform grid by discretizing a and b as follows.

This means larger time shifts for wavelets with larger duration and smaller time shifts for wavelets with smaller duration. Thus, we use the wavelets

as the basis functions. Daubechies, Mallat and Meyer [14,15,16] proved that for the case $a_0 = 2$ and $b_0 = 1$, optimal (non-redundant) sampling of the scalogram is possible and orthonormal wavelet bases can be constructed. This construction is based on multiresolution analysis.
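For $a_0 = 2$ and $b_0 = 1$, the discretized wavelets are $\psi_{jk}(t) = 2^{-j/2}\psi(2^{-j}t - k)$, so at scale $2^j$ the shifts are spaced $2^j$ apart. As a sketch, assuming the Haar function as the mother wavelet (the simplest orthonormal case, chosen here only for illustration), the Gram matrix of all $\psi_{jk}$ supported in $[0, 4)$ for $j = -2, \ldots, 1$ can be checked to be the identity. Midpoint sampling makes the inner products exact for these piecewise-constant functions.

```python
import numpy as np

def haar(t):
    """Assumed mother wavelet (Haar): +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    return np.where((t >= 0) & (t < 0.5), 1.0, 0.0) - np.where((t >= 0.5) & (t < 1.0), 1.0, 0.0)

def psi_jk(t, j, k, a0=2.0, b0=1.0):
    """Discretized wavelet a0^(-j/2) psi(a0^(-j) t - k b0)."""
    return a0 ** (-j / 2.0) * haar(a0 ** (-j) * t - k * b0)

dt = 1.0 / 64
t = (np.arange(256) + 0.5) * dt                   # cell midpoints covering [0, 4)
# Scale 2^j uses shifts k * 2^j; keep those whose support lies inside [0, 4).
pairs = [(j, k) for j in range(-2, 2) for k in range(int(4 / 2.0 ** j))]
basis = np.array([psi_jk(t, j, k) for j, k in pairs])
gram = basis @ basis.T * dt                       # inner products <psi_jk, psi_j'k'>

print(len(pairs), "wavelets; orthonormal:", np.allclose(gram, np.eye(len(pairs))))
```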

2.5.3

Multiresolution Analysis

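The aim of multiresolution analysis is to represent a signal $f(t) \in L^2(\mathbb{R})$ in terms of successive approximations, each of which is a smoothed version of $f(t)$ [14,15,16]. Specifically, we project $f(t)$ onto a ladder of vector spaces $\{V_j\}_{j \in \mathbb{Z}}$ which satisfy the following conditions:

1. $\cdots \subset V_1 \subset V_0 \subset V_{-1} \subset \cdots$.
2. $\overline{\bigcup_{j \in \mathbb{Z}} V_j} = L^2(\mathbb{R})$.
3. $\bigcap_{j \in \mathbb{Z}} V_j = \{0\}$, the zero function.
4. $f(t) \in V_j \iff f(2t) \in V_{j-1}$.
5. $f(t) \in V_0 \implies f(t-n) \in V_0$ for all $n \in \mathbb{Z}$.
6. There exists $\phi(t) \in V_0$ such that $\{\phi(t-n), n \in \mathbb{Z}\}$ constitutes an orthonormal basis for $V_0$. $\phi(t)$ is called the scaling function.

It follows from conditions (4) and (6) that the set $\{\phi_{jk}(t) = 2^{-j/2}\phi(2^{-j}t - k), k \in \mathbb{Z}\}$ is an orthonormal basis for $V_j$.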

Let $P_j f(t)$ represent the projection of a signal $f(t)$ onto the vector space $V_j$. Note from conditions (1), (2) and (3) that $\lim_{j \to -\infty} P_j f(t) = f(t)$. Also, let $W_j$ be the orthogonal complement of $V_j$ with respect to $V_{j-1}$. Then, the projection of a signal $f(t)$ onto $W_j$, denoted by $Q_j f(t)$, is the detail signal such that

$$P_{j-1} f(t) = P_j f(t) + Q_j f(t).$$

This means

$$V_{j-1} = V_j \oplus W_j. \qquad (2.49)$$

The orthonormal basis for $W_j$, $\{\psi_{jk}(t) = 2^{-j/2}\psi(2^{-j}t - k), k \in \mathbb{Z}\}$, is called the wavelet basis. Using the multiresolution conditions and (2.49) recursively, we have [15,16]

$$L^2(\mathbb{R}) = \bigoplus_{j \in \mathbb{Z}} W_j.$$

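Thus, $\{\psi_{jk}(t), j, k \in \mathbb{Z}\}$ forms an orthonormal basis for $L^2(\mathbb{R})$. Now, since $V_0 \subset V_{-1}$ and $W_0 \subset V_{-1}$, for some constants $\{h_n\}$ and $\{g_n\}$ we have the following relations:

$$\phi(t) = 2^{1/2} \sum_{n} h_n\, \phi(2t - n), \qquad (2.51)$$

$$\psi(t) = 2^{1/2} \sum_{n} g_n\, \phi(2t - n). \qquad (2.52)$$

Here $2^{1/2}$ plays the role of a normalizing constant. Based on the above discussion, we have the following results: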
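As a simple check of (2.51) and (2.52), consider the Haar case, assumed here for illustration: $\phi(t)$ is the indicator of $[0, 1)$, $h_0 = h_1 = 1/\sqrt{2}$ and $g_0 = -g_1 = 1/\sqrt{2}$. The sketch evaluates both sides of the two-scale difference equations on a grid.

```python
import numpy as np

def phi(t):
    """Assumed scaling function (Haar): indicator of [0, 1)."""
    return ((t >= 0) & (t < 1)).astype(float)

def haar_psi(t):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    return np.where((t >= 0) & (t < 0.5), 1.0, 0.0) - np.where((t >= 0.5) & (t < 1.0), 1.0, 0.0)

h = np.array([1.0, 1.0]) / np.sqrt(2.0)    # Haar {h_n}
g = np.array([1.0, -1.0]) / np.sqrt(2.0)   # Haar {g_n}

def two_scale(coeffs, t):
    """Right-hand side 2^{1/2} * sum_n c_n phi(2t - n)."""
    return np.sqrt(2.0) * sum(c * phi(2 * t - n) for n, c in enumerate(coeffs))

t = np.linspace(-0.5, 1.5, 801)
print("scaling equation error:", np.max(np.abs(two_scale(h, t) - phi(t))))
print("wavelet equation error:", np.max(np.abs(two_scale(g, t) - haar_psi(t))))
```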

1. Orthonormality of the basis functions implies the following:

From (2.51) to (2.55), we obtain the following conditions on the coefficients $\{h_n\}$ and $\{g_n\}$:

The conditions on $h_n$ and $g_n$ are identical to those specified for the 2-channel perfect reconstruction filters $h_0(n)$ and $h_1(n)$ discussed in the previous section (Eqs. (2.21), (2.26) and (2.28)).

2. Using the relation (2.49) recursively up to K levels, we get

$$V_0 = V_K \oplus W_K \oplus W_{K-1} \oplus \cdots \oplus W_1.$$

Thus, we obtain a decomposition similar to that obtained by the logarithmic tree in Fig. 2.4(a). Consequently, the wavelet transform samples the scalogram in a non-uniform, dyadic fashion, as shown in Fig. 2.9. This corresponds to a non-uniform tiling of the time-frequency plane. One way of looking at multiresolution analysis is in terms of the logarithmic splitting of the frequency domain, as shown in Fig. 2.4(b).

3. In the case of a discrete-time signal $x(n) \in \ell^2(\mathbb{Z})$, it is shown in [14,18,21] that $\{h_n\}$ and $\{g_n\}$, the coefficients of the two-scale difference equations (2.51) and (2.52), can be used to calculate the discrete-time wavelet transform coefficients at each level of the logarithmic tree. Note from equations (2.56), (2.57) and (2.58) that $\{h_n\}$ and $\{g_n\}$ satisfy the same conditions as the filter coefficients $h_0(n)$ and $h_1(n)$ of the 2-channel perfect reconstruction filter bank. Thus, $h_0(n)$ and $h_1(n)$ are equivalent to the sequences $\{h_n\}$ and $\{g_n\}$. Hence, the sequences at the output of a 2-channel analysis filter bank of a PR system constitute the discrete-time wavelet transform coefficients of the input sequence.

Figure 2.9: The wavelet transform sampling grid or tiling diagram for $a_0 = 2$ and $b_0 = 1$ ($T = 2\pi/\omega_0$).

4. Not every 2-channel analysis filter bank of a PR system can be used to construct a continuous-time wavelet basis. The lowpass filter $h_0(n)$ needs to satisfy an additional regularity condition [17,18]. The wavelet function $\psi(t)$ and the scaling function $\phi(t)$ are then determined by using the two-scale difference equations (2.51) and (2.52) recursively [17,18,21].
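Since (2.56) to (2.58) are not reproduced in this section, the sketch below assumes they take the standard form $\sum_n h_n h_{n+2k} = \delta(k)$, $\sum_n g_n g_{n+2k} = \delta(k)$ and $\sum_n h_n g_{n+2k} = 0$. It checks these conditions for the 4-tap Daubechies coefficients, taking $g_n = (-1)^n h_{L-1-n}$ (the usual alternating-flip choice, also an assumption here). The function names are hypothetical.

```python
import numpy as np

def daubechies4():
    """4-tap Daubechies scaling coefficients h_n and wavelet coefficients g_n."""
    s3 = np.sqrt(3.0)
    h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
    # Assumption: alternating flip g_n = (-1)^n h_{L-1-n}.
    g = (-1.0) ** np.arange(len(h)) * h[::-1]
    return h, g

def even_shift_corr(a, b, k):
    """sum_n a_n b_{n+2k}, with coefficients outside the support taken as zero."""
    total = 0.0
    for n in range(len(a)):
        m = n + 2 * k
        if 0 <= m < len(b):
            total += a[n] * b[m]
    return total

h, g = daubechies4()
for k in range(-2, 3):
    print(k, even_shift_corr(h, h, k), even_shift_corr(g, g, k), even_shift_corr(h, g, k))
print("sum of h_n:", h.sum(), "  sqrt(2):", np.sqrt(2.0))
```

The last line shows that the coefficients also satisfy $\sum_n h_n = \sqrt{2}$, the normalization consistent with the factor $2^{1/2}$ in (2.51).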

2.6

Generalization to M-channel PR Systems

Fig. 2.1 shows an M-channel filter bank system. The M-channel system is a generalization of the 2-channel system. In order to see the properties of the M-channel perfect reconstruction system, we need to define the following.

1. Alias Component Matrix [18]: This is a generalization of the matrix given for the case M = 2 (see (2.29)), and is given by

2. Polyphase Representation: From Appendix A, we can write the type 1 polyphase expansion of an analysis filter Hk(z) as

The polyphase matrix E(z) of the analysis bank is given by

Similarly, one can express the polyphase matrix R(z) of the synthesis bank using the type 2 polyphase expansion of filters Fk(z), i.e.,

The polyphase matrix of the synthesis bank is given as

The alias component matrix H(z) and the polyphase matrix E(z) are related as [18]

where $D(z) = \mathrm{diag}\,[1 \;\; z^{-1} \;\; \cdots \;\; z^{-(M-1)}]$, $W$ is the $M \times M$ DFT matrix, and the superscript $\dagger$ denotes Hermitian transpose.


The M-channel perfect reconstruction systems have the following properties [18]:

That is, the alias component matrix of a perfect reconstruction system is paraunitary [18]. The above condition can be written in two ways.

for $i, k = 0, 1, \ldots, M-1$. That is, for each filter, the power spectrum and its versions shifted uniformly by multiples of $2\pi/M$ add up to a constant.

for $i, k = 0, 1, \ldots, M-1$. This is the power complementarity property, as can be seen by putting $i = k = 0$. Note that it reduces to (2.17) for M = 2.

2. From relation (2.65), we see that the paraunitarity of the alias component matrix implies the paraunitarity of the polyphase matrix.

3. The synthesis filters $F_k(z)$ are related to the analysis filters by the relation [18]

for $k = 0, 1, \ldots, M-1$, where $ML$ is the length of the analysis filters. In view of (2.69) (or (2.70)), the synthesis bank is paraunitary if the analysis bank is paraunitary.

4. Paraunitarity implies orthonormality of $h_i(n)$, $i = 0, 1, \ldots, M-1$ (and hence of $f_i(n)$), and of their shifts by multiples of $M$, i.e.,

$$\sum_n h_i(n)\, h_j(n + Mk) = \delta(i-j)\,\delta(k).$$
n

The design of an M-channel perfect reconstruction filter bank is more complex than that of a 2-channel PR system. Design methods are given in [18]. In our work, we have used cosine-modulated M-channel filter banks designed using the procedure given in [18].
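Property 4 can be checked numerically for a simple cosine-modulated paraunitary bank. The sketch below uses Malvar's modulated lapped transform (MLT), whose M filters have length 2M and a sine window. It is an assumed stand-in chosen for brevity, not the cosine-modulated filters designed with the procedure of [18] and used in this thesis.

```python
import numpy as np

def mlt_filters(M):
    """Assumed example bank: Malvar's MLT, M cosine-modulated filters of length 2M."""
    n = np.arange(2 * M)
    k = np.arange(M)[:, None]
    window = np.sin((n + 0.5) * np.pi / (2 * M))
    return np.sqrt(2.0 / M) * window * np.cos((n + (M + 1) / 2.0) * (k + 0.5) * np.pi / M)

def shifted_inner(hi, hj, shift):
    """sum_n hi(n) hj(n + shift) for equal-length FIR filters (zero outside support)."""
    if shift < 0:
        return shifted_inner(hj, hi, -shift)
    L = len(hi)
    if shift >= L:
        return 0.0
    return float(np.dot(hi[: L - shift], hj[shift:]))

M = 8
H = mlt_filters(M)
# Filters have length 2M, so only shifts of -M, 0 and +M can overlap.
worst = max(
    abs(shifted_inner(H[i], H[j], M * k) - (1.0 if i == j and k == 0 else 0.0))
    for i in range(M) for j in range(M) for k in (-1, 0, 1)
)
print("largest deviation from delta(i-j) delta(k):", worst)
```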

2.7

Further Generalization: Wavelet Packets

The wavelet and uniform filter bank representations explained so far may not be suitable for all applications. For signals such as speech, where the energy is localized around certain frequencies, a non-uniform frequency band representation may be more efficient. This can be achieved by using a generalized tree-structured, or wavelet packet, filter bank. The wavelet packet transform and its inverse are given by [25]

with $p_k \in \mathbb{Z}^{+}$, $k = 0, 1, \ldots, M-1$, such that

Here, $\{h_k(n)\}$ and $\{f_k(n)\}$, $k = 0, 1, \ldots, M-1$, are the impulse responses of the filters in the equivalent analysis and synthesis filter banks corresponding to the wavelet packet system. These impulse responses can be obtained using the noble identities. Figs. 2.10 and 2.11 give an example of a tree-structured wavelet packet analysis filter bank and its equivalent non-uniform filter bank. It can be shown [27] that perfect reconstruction is guaranteed as long as the filter banks used at each level in the tree are paraunitary. In this case, $\{h_k(n)\}$ and $\{f_k(n)\}$, $k = 0, 1, \ldots, M-1$, are orthonormal [27]. That is,

where $g_{kl}$ is the gcd of the decimation (interpolation) factors associated with the respective analysis (synthesis) filters. Thus, the expansion of the signal in terms of $f_k(n - p_k m)$, $k = 0, 1, \ldots, M-1$, as given in (2.73), is the wavelet packet representation of the signal. The logarithmic tree-structured filter bank and the M-channel uniform filter bank are special cases of the wavelet packet system. Thus, depending on the filter banks used at each level, wavelet packets allow greater flexibility in splitting a signal into non-uniform frequency bands.

Figure 2.10: An example of a 2-level tree-structured wavelet packet analysis filter bank.

Figure 2.11: A non-uniform filter bank equivalent to the tree-structured system shown in Fig. 2.10. Here, for example, $H_1(z) = H'_1(z)G_1(z^2)$ and $H_2(z) = H'_1(z)G_2(z^2)$.

We can illustrate the wavelet packet analysis and synthesis banks by using "tree diagrams". These tree diagrams are a convenient way of representing the structure of a wavelet packet filter bank. For example, the tree diagram corresponding to the wavelet packet analysis filter bank shown in Fig. 2.10 is given in Fig. 2.12.
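The equivalent filters of a tree can be obtained with the noble identities: a filter $G(z)$ placed after a decimator by 2 is equivalent to $G(z^2)$ placed before it. The sketch below assumes Haar filters purely for illustration. It checks that filtering by $H(z)$, decimating by 2, filtering by $G(z)$ and decimating by 2 again gives the same output as filtering once by $H(z)G(z^2)$ and decimating by 4. It also checks, using the second-level split 2, 3, 2, 3 of Chapter 4 as an example, that the decimation factors $p_k$ of a maximally decimated tree satisfy $\sum_k 1/p_k = 1$. All function names are hypothetical.

```python
from fractions import Fraction

import numpy as np

def upsample_filter(g, factor):
    """Impulse response of G(z^factor): insert factor-1 zeros between taps."""
    out = np.zeros(factor * (len(g) - 1) + 1)
    out[::factor] = g
    return out

def cascade_branch(x, h, g):
    """Filter by H(z), keep every 2nd sample, filter by G(z), keep every 2nd sample."""
    y = np.convolve(x, h)[::2]
    return np.convolve(y, g)[::2]

def equivalent_branch(x, h, g):
    """Single filter H(z) G(z^2) followed by decimation by 4 (noble identity)."""
    h_eq = np.convolve(h, upsample_filter(g, 2))
    return np.convolve(x, h_eq)[::4]

def decimation_factors(first_level, second_level):
    """Overall decimation factor p_k of every output channel of a 2-level tree."""
    return [first_level * split for split in second_level for _ in range(split)]

h0 = np.array([1.0, 1.0]) / np.sqrt(2.0)   # assumption: Haar lowpass
h1 = np.array([1.0, -1.0]) / np.sqrt(2.0)  # assumption: Haar highpass
x = np.random.default_rng(0).standard_normal(64)

a = cascade_branch(x, h0, h1)
b = equivalent_branch(x, h0, h1)
n = min(len(a), len(b))
print("max difference:", np.max(np.abs(a[:n] - b[:n])))

p = decimation_factors(4, [2, 3, 2, 3])
print(p, "sum of 1/p_k =", sum(Fraction(1, pk) for pk in p))
```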

2.8

Conclusion

In this chapter, we presented a review of filter banks and wavelet theory. Starting with the
simplest case of 2-channel filter banks, the conditions under which perfect reconstruction is obtained were stated. Next, we described how a logarithmic splitting of the spectrum could be obtained by cascading 2-channel filter banks. Having explained the wavelet transform, we used multiresolution analysis to bring out the concept of orthonormal wavelets and their close connection to perfect reconstruction filter banks. The idea of perfect reconstruction was then generalized to M-channel filter banks. We further generalized this idea to arbitrary tree-structured, or wavelet packet, filter banks. In the subsequent chapters, we apply uniform filter banks and wavelet packets to speech scrambling.

Figure 2.12: Tree diagram corresponding to the wavelet packet analysis filter bank shown in Fig. 2.10.

Chapter 3

The Time-Frequency Scrambling Algorithm


Introduction
In Chapter 1, we explained the idea of analog speech encryption or speech scrambling. We
also stated the requirements that need to be satisfied by the speech scrambling algorithms.
For the purpose of continuity, we restate them here:
1. The scrambled speech should be unintelligible.
2. The scrambling process should be bandwidth preserving.
3. The scrambling algorithm should be cryptanalytically strong.
4. The scrambling process should involve the minimum possible communication delay.
5. The decrypted speech should be of good quality, with both its intelligibility and the speaker characteristics preserved.

In this chapter, we use objective distance measures to study the performance of a time-frequency scrambling algorithm similar to that proposed by Cox et al. in [3]. The objective distance measures we use are:
1. LPC distance
2. Cepstral distance
The definitions and a brief description of these measures are given in Appendix B. These measures give an indication of the residual intelligibility of the scrambled speech: the larger the measure, the lower the residual intelligibility of the scrambled speech. Thus, we use these objective distance measures as estimates of how "far" the scrambled speech is from the corresponding unencrypted speech (which we shall hereafter refer to as clear speech).
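Appendix B gives the exact definitions, which are not reproduced in this section. As an illustration only, the sketch below computes 10th order LPC coefficients with the autocorrelation method and the Levinson-Durbin recursion, converts them to LPC-derived cepstral coefficients with the standard recursion, and takes the Euclidean distance between the cepstral vectors of two blocks. The Euclidean form and all function names are assumptions; the thesis's LPC and cepstral distances may be defined or normalized differently.

```python
import numpy as np

def autocorr(frame, p):
    """Autocorrelation r(0..p) of one block (autocorrelation method, unnormalized)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    return np.array([np.dot(frame[: n - k], frame[k:]) for k in range(p + 1)])

def levinson(r, p):
    """Levinson-Durbin recursion; returns a(0..p), a(0) = 1, with A(z) = sum_k a(k) z^-k."""
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, p + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpc_cepstrum(a, n_ceps):
    """Cepstrum c(1..n_ceps) of the all-pole model 1/A(z) by the standard recursion."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]

def cepstral_distance(clear_block, scrambled_block, p=10):
    """Assumed form: Euclidean distance between order-p LPC-derived cepstral vectors."""
    cx = lpc_cepstrum(levinson(autocorr(clear_block, p), p), p)
    cy = lpc_cepstrum(levinson(autocorr(scrambled_block, p), p), p)
    return float(np.sqrt(np.sum((cx - cy) ** 2)))

rng = np.random.default_rng(0)
clear = np.convolve(rng.standard_normal(240), [1.0, 0.9, 0.5], mode="same")  # colored-noise stand-in
white = rng.standard_normal(240)                                            # stand-in for a scrambled block
print(cepstral_distance(clear, white))
```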

In the subsequent sections, we first describe the time-frequency scrambling algorithm in detail and then show the effect of
1. block length
2. number of channels
on the distance measures.

3.2

The Algorithm

Fig. 3.1 is an illustration of the scrambling algorithm. It starts from the point where the incoming speech signal has been divided into blocks of N samples and shows the various operations performed on each block. Each clear speech block is given as input to a uniform M-channel analysis filter bank, which we refer to as AFB1. As an illustration, we have used M = 5 in Fig. 3.1. The output of each channel of the filter bank is further divided into segments of equal length, say p samples. Each segment may contain one or more samples ($p \geq 1$). In Fig. 3.1, the output of each channel has been divided into 5 segments. This gives us a total of 25 segments, numbered as shown. They are interleaved and given to a permuter. The permuter permutes the segments using a random number generator; depending on the output of the generator, the segments are shuffled, for example, as shown in Fig. 3.1. Thus, the permuter shuffles the time-frequency segments in a random order that depends on the seed provided to the random number generator. These segments are then "de-interleaved" into M channels (5 channels in Fig. 3.1) and given as input to an M-channel (5-channel in Fig. 3.1) synthesis filter bank, which we refer to as SFB2. The output of the synthesis bank is the scrambled speech.

At the receiver, each block of the scrambled speech is first split into M channels using an M-channel analysis filter bank AFB2, which together with SFB2 at the transmitter end constitutes a perfect reconstruction system. The output of each channel of the filter bank is then divided into segments of p samples. This yields the permuted time-frequency segments, the same as those at the input of SFB2. These segments are first interleaved and then de-permuted. The de-permuter uses the same seed as the permuter on the scrambling side, so that the segments are put back in the correct order. The segments are then de-interleaved to form M channels and given as the input to the M-channel synthesis filter bank SFB1. The analysis-synthesis filter bank pair AFB1 and SFB1 forms a perfect reconstruction (PR) system. Similarly, SFB2 and AFB2 are chosen to form a PR pair. In our experiments, SFB2 is taken to be the same as SFB1 (and hence AFB2 is the same as AFB1) for convenience. Thus, the original speech block is obtained at the output of the synthesis filter bank. We assume here that the received scrambled speech blocks are error-free; that is, if the channel introduces intersymbol interference, it is compensated for at the receiver before the unscrambling process begins. We also point out that one-sample segments (i.e., p = 1) are assumed throughout this thesis.

Figure 3.1: Block schematic representation of the time-frequency scrambling algorithm. The figure shows the various operations performed on each block of speech at the transmitter end.
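The transmitter and receiver operations described above can be sketched in a few lines. To keep the example self-contained, the sketch assumes an orthonormal DCT-II block transform as the uniform M-channel PR filter bank (a paraunitary bank with length-M filters) instead of the 40-tap cosine-modulated filters used in the experiments. It uses one-sample segments (p = 1), replaces the interleaving of Fig. 3.1 with simple row-by-row flattening, and lets NumPy's seeded generator stand in for the random number generator, so the seed acts as the key.

```python
import numpy as np

def dct_matrix(M):
    """Assumed PR bank: orthonormal DCT-II matrix, row k = k-th length-M filter."""
    n = np.arange(M)
    C = np.sqrt(2.0 / M) * np.cos(np.pi * (n[None, :] + 0.5) * n[:, None] / M)
    C[0, :] /= np.sqrt(2.0)
    return C

def analysis(block, C):
    """AFB: M subband signals of N/M samples each (row k = channel k)."""
    M = C.shape[0]
    return C @ block.reshape(-1, M).T

def synthesis(subbands, C):
    """SFB: inverse of `analysis`, since C is orthogonal."""
    return (C.T @ subbands).T.reshape(-1)

def scramble(block, C, seed):
    tf = analysis(block, C)                              # time-frequency samples
    perm = np.random.default_rng(seed).permutation(tf.size)
    return synthesis(tf.reshape(-1)[perm].reshape(tf.shape), C)

def descramble(scrambled, C, seed):
    tf = analysis(scrambled, C)                          # recovers the permuted samples
    perm = np.random.default_rng(seed).permutation(tf.size)
    restored = np.empty(tf.size)
    restored[perm] = tf.reshape(-1)                      # undo the permutation
    return synthesis(restored.reshape(tf.shape), C)

block = np.random.default_rng(7).standard_normal(240)   # stand-in for one speech block
C = dct_matrix(5)
scrambled = scramble(block, C, seed=1234)
print(np.allclose(descramble(scrambled, C, seed=1234), block))
```

Because the bank is orthonormal, the scrambled block has the same number of samples as the clear block, and the receiver only needs the seed to undo the permutation.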

3.3

Performance Evaluation

In this section, we evaluate the performance of the time-frequency scrambler described in Section 3.2 with respect to
1. block length
2. number of channels (M) in the filter banks
using the LPC and cepstral distance measures. This gives an idea of how the residual intelligibility of the scrambled speech changes as we change the above parameters. We use 10th order LPC and cepstral coefficients to compute the distance measure for each block. The segment size has been kept constant at 1 for all the following simulations. To implement the analysis and synthesis filter banks, we used the circular extension approach proposed in [28].
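The circular extension approach of [28] is not described in this section. As a hedged illustration of the general idea, the sketch below treats each block as one period of a periodic signal, so that filtering becomes circular convolution and a 2-channel orthonormal filter bank maps N samples to exactly N/2 + N/2 subband samples with perfect reconstruction. The 4-tap Daubechies filters and the matrix formulation are assumptions made for brevity.

```python
import numpy as np

def daubechies4():
    """Assumed filters: 4-tap Daubechies lowpass h and alternating-flip highpass g."""
    s3 = np.sqrt(3.0)
    h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
    g = (-1.0) ** np.arange(4) * h[::-1]
    return h, g

def circular_analysis_matrix(h, g, N):
    """Rows 0..N/2-1: h shifted circularly by 2k; rows N/2..N-1: g shifted by 2k."""
    if N % 2:
        raise ValueError("block length must be even")
    A = np.zeros((N, N))
    for k in range(N // 2):
        for i in range(len(h)):
            A[k, (2 * k + i) % N] += h[i]            # indices wrap around the block
            A[N // 2 + k, (2 * k + i) % N] += g[i]
    return A

h, g = daubechies4()
A = circular_analysis_matrix(h, g, 240)
x = np.random.default_rng(1).standard_normal(240)   # stand-in for one block
v = A @ x                  # v[:120]: lowpass channel, v[120:]: highpass channel
x_hat = A.T @ v            # synthesis is the transpose, because A is orthogonal
print(np.allclose(x_hat, x))
```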

3.3.1

Block length

To determine the effect of block length on the distance measures, a given speech signal was scrambled using the time-frequency scrambling algorithm for different block lengths (N). The LPC and cepstral distances between each clear block and the corresponding encrypted block were calculated, and the mean LPC and cepstral distances were determined over all the blocks. This procedure was repeated for different values of N. The block lengths used were N = 60, 120, 180, 240 and 300 samples. The filter banks used at the analysis and synthesis ends consisted of 40-tap, 5-channel cosine-modulated filters (see Fig. 3.2).

Figure 3.2: Magnitude responses of the analysis filters in the 5-channel PR system used to study the effect of block length on the distance measures. Each filter has 40 taps.

Table 3.1: Effect of block length on the distance measures (sentence used: "beg that guard for one gallon of gas", sampling rate = 8 kHz, and 10th order LPC and cepstral coefficients were used to compute the distance measures).

block length | number of blocks | mean LPC distance | mean cepstral distance

This experiment was performed for different speech signals. Although variations in the distance measures were observed, the general trend was an increase in the mean distances as the block length was increased. Table 3.1 shows the mean distances for a typical speech sentence. Note from the results that the distance measures increase with block length.

A possible explanation for this is as follows. As the block length increases, the number of positions in which a time-frequency sample can be placed increases, making it possible for its permuted position to be farther from its original position. Further, a greater number of time-frequency samples are now available for permutation.

3.3.2

Number of channels in filter bank

To determine the effect of M, the number of channels in the filter bank, a given speech signal was time-frequency scrambled using an M-channel analysis-synthesis pair. The mean LPC and cepstral distances were determined by averaging the respective measures over all the blocks. This experiment was conducted for different values of M, keeping the block length fixed at 240 samples. M was varied from 2 to 10 in steps of 2. The filter responses are shown in Figs. 3.3 to 3.5.

The experiment was performed for several spoken sentences, and the mean LPC and cepstral distances were determined for each. Although variations were observed in the mean distances, in general they increased with increasing M. A possible explanation for this increase is as follows. For a given speech signal, some regions of the spectrum contribute more to intelligibility than others [4]. By increasing the number of channels, we split the spectrum into finer bands, thereby breaking these regions up across more than one band; this may reduce the residual intelligibility in each individual band. We should point out, however, that the increase in the distances with M was not very significant. Table 3.2 shows the mean distance measures obtained for a typical speech sentence.

Conclusion
In this chapter, we have described the uniform time-frequency scrambling algorithm. For greater cryptanalytic strength or security, the random number generator used by the permuter can use a different seed for each block. The eavesdropper then has to renew the search for the right combination for each block. Note, however, that using a different seed for each block adds an overhead to the key, since information about the seeds has to be transmitted to the receiver.

Figure 3.3: Magnitude responses of the filters in the analysis bank for different values of M. (a) 2-channel PR system (each filter has 20 taps). (b) 4-channel PR system (each filter has 40 taps).

Figure 3.4: Magnitude responses of the filters in the analysis bank for different values of M. (a) 6-channel PR system (each filter has 60 taps). (b) 8-channel PR system (each filter has 80 taps).

Figure 3.5: Magnitude responses of the filters in the analysis bank for M = 10 (each filter has 100 taps).

Table 3.2: The mean distances for different values of M, the number of channels in the filter bank (sentence used: "beg that guard for one gallon of gas", block length = 240, number of blocks = 155, sampling rate = 8 kHz, and 10th order LPC and cepstral coefficients were used to compute the distance measures).

mean LPC distance | mean cepstral distance

Chapter 4

A Wavelet Packet Based Speech Scrambling Algorithm


4.1 Introduction
In the previous chapter, we evaluated the performance of the time-frequency scrambling algorithm based on uniform decomposition of the signal as a function of the block length and the number of channels. This algorithm derived its cryptanalytic strength or security from the permutation of the time-frequency samples. Jayant [10] has estimated that the effective number of permutations for such an algorithm, i.e., the number of permutations that give low residual intelligibility, is much less than the total number of permutations, which reduces the security of the algorithm. In this chapter, our aim is to devise a more secure time-frequency scrambling algorithm using wavelet packets.

4.2

A Wavelet Packet Based Time-Frequency Scrambling Algorithm

In this section, we propose a time-frequency scrambling algorithm which uses wavelet packets at both the analysis and synthesis ends of the scrambler. We use the same notation as in Chapter 3 to denote the analysis and synthesis filter banks. AFB1 represents the analysis filter bank on the scrambling side, and SFB1 denotes the synthesis filter bank at the receiver end. Further, AFB1 and SFB1 form a PR system. Similarly, the synthesis filter bank we use on the scrambling side will be denoted by SFB2 and the analysis filter bank at the receiver by AFB2, and these two form a PR pair.

First, each speech block of length N is passed through a 2-level wavelet packet analysis filter bank AFB1. The first level is an M-channel filter bank. Each of the M branches is decomposed into either P channels or Q channels at the second level. Thus, the tree structure of the wavelet packet analysis bank can be chosen from $2^M$ possible combinations. We have chosen M = 4, P = 3 and Q = 2. This gives 16 possible tree structures. Figure 4.1 shows a wavelet packet analysis filter bank. The output channels of this filter bank are divided into segments of one sample each. Thus, we get N segments. In Fig. 4.1, N = 24. Clearly, not all the channels can contain the same number of segments. In the example considered here, the ten output channels of the analysis filter bank contain 3, 3, 2, 2, 2, 3, 3, 2, 2 and 2 segments, respectively. These time-frequency segments are ordered as shown in Fig. 4.1, interleaved, and given as input to a permuter. The permuter shuffles these segments using a random number generator, and the permutation order depends on the seed provided to this generator. These permuted time-frequency segments are then de-interleaved and fed to a synthesis filter bank SFB2. Care must be taken to ensure that each channel in the synthesis bank gets the appropriate number of segments. For example, in Fig. 4.1, the ten channels have 3, 3, 2, 2, 2, 3, 3, 2, 2 and 2 segments. The output of the synthesis filter bank is the scrambled speech block. For our simulations, SFB2 was chosen to form a PR system with AFB1. The magnitude responses of the 4-channel, 2-channel and 3-channel filter banks used are shown in Figs. 3.3(b), 3.3(a) and 4.2, respectively. The block length was taken as 240 (i.e., N = 240). The LPC and cepstral distance measures are shown in Table 4.1 for six different analysis tree structures. As in the earlier case, we have used 10th order LPC and cepstral coefficients. The sentence for which the results are shown is the same as the one used in Chapter 3. We have used the circular extension approach proposed in [28] to implement the filter banks.

Figure 4.1: Time-frequency scrambling using matched analysis and synthesis wavelet packet tree structures. (a) Time-frequency samples at the output of the analysis filter bank. (b) Time-frequency segments. (c) Time-frequency segments scrambled and de-interleaved. (d) Time-frequency samples at the input of the synthesis filter bank.

As there are $2^M$ possible analysis tree structures, the eavesdropper has to search these $2^M$ combinations to get the correct tree and then get the corresponding permuted time-frequency samples. After finding the correct analysis filter bank AFB2 (which should form a PR pair with SFB2), the next task is to find the correct random number sequence for de-permuting. Finally, the synthesis bank corresponding to AFB1 is to be determined. The security of the algorithm can be increased by using a tree structure at the second level of SFB2 that differs from the corresponding structure of AFB1. In Fig. 4.3, we have illustrated this scheme with 3, 2, 2 and 2 channels for SFB2 and 2, 3, 2 and 3 channels for AFB1.
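The segment bookkeeping above can be sketched as follows. Assuming each first-level branch carries N/M samples and is split into P or Q equal channels at the second level, a hypothetical helper `segments_per_channel` lists the number of one-sample segments in each output channel, and `itertools.product` enumerates the $2^M$ second-level tree choices. The channel counts quoted for Fig. 4.1 correspond to the second-level split 2, 3, 2, 3, and those for the synthesis tree of Fig. 4.3 to the split 3, 2, 2, 2.

```python
from itertools import product

def segments_per_channel(N, M, second_level):
    """Segments in each output channel of a 2-level wavelet packet tree.

    second_level[i] is the number of channels (P or Q) that split branch i.
    Assumption: every split is into equal-length channels.
    """
    if len(second_level) != M or N % M:
        raise ValueError("need one split per branch and N divisible by M")
    branch_len = N // M
    counts = []
    for split in second_level:
        if branch_len % split:
            raise ValueError("each branch length must be divisible by its split")
        counts.extend([branch_len // split] * split)
    return counts

M, P, Q = 4, 3, 2
trees = list(product((P, Q), repeat=M))            # all 2^M second-level choices
print(len(trees), "possible trees")
print(segments_per_channel(24, M, [2, 3, 2, 3]))   # counts quoted for Fig. 4.1
print(segments_per_channel(24, M, [3, 2, 2, 2]))   # synthesis tree of Fig. 4.3
```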

Figure 4.2: Magnitude responses of the analysis filters for the 3-channel filter bank (each filter has 30 taps).

Table 4.1: Mean distance measures for six different analysis tree structures in the wavelet packet based time-frequency scrambling scheme shown in Fig. 4.1 (sentence used: "beg that guard for one gallon of gas", block length = 240, number of blocks = 155, sampling rate = 8 kHz, and 10th order LPC and cepstral coefficients were used to compute the distance measures).

2nd level analysis filter banks | mean LPC distance | mean cepstral distance
2,3,2,3 | 2.9362 | 3.3072

Figure 4.3: Time-frequency scrambling using different analysis and synthesis tree structures. (a) Time-frequency samples at the output of the analysis bank. (b) Time-frequency segments. (c) Time-frequency segments scrambled and de-interleaved so that each channel of the synthesis bank contains the appropriate number of samples. (d) Time-frequency samples at the input of the synthesis bank (note the arrangement of the time-frequency samples at the input of the synthesis bank).

Note that, depending on the choice of the synthesis tree structure, the segments should be de-interleaved such that each channel gets the appropriate number of segments. For example, in Fig. 4.3, we de-interleave so as to have 2 segments each for the first 3 channels and 3 segments each for the last 6 channels. Thus, the output of the synthesis filter bank SFB2 is a block of N samples, which is the scrambled speech.
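The mismatched-tree scheme can be sketched end to end. For self-containment, the sketch assumes orthonormal DCT-II block transforms at both tree levels (so every stage is paraunitary and exactly invertible), uses simple concatenation in place of the interleaving of Figs. 4.1 and 4.3, and treats the two trees and the seed as the key. The functions are hypothetical helpers, not the implementation used for Tables 4.1 and 4.2.

```python
import numpy as np

def dct_matrix(M):
    """Assumed stage filters: orthonormal DCT-II matrix (length-M paraunitary bank)."""
    n = np.arange(M)
    C = np.sqrt(2.0 / M) * np.cos(np.pi * (n[None, :] + 0.5) * n[:, None] / M)
    C[0, :] /= np.sqrt(2.0)
    return C

def split(x, M):
    """One M-channel analysis stage: M subbands of len(x)/M samples."""
    return list(dct_matrix(M) @ x.reshape(-1, M).T)

def merge(subbands, M):
    """Inverse of split (synthesis stage)."""
    return (dct_matrix(M).T @ np.vstack(subbands)).T.reshape(-1)

def wp_analysis(x, M, tree):
    """2-level tree: M channels, then branch i split into tree[i] channels."""
    channels = []
    for branch, P in zip(split(x, M), tree):
        channels.extend(split(branch, P))
    return channels

def wp_synthesis(channels, M, tree):
    branches, pos = [], 0
    for P in tree:
        branches.append(merge(channels[pos:pos + P], P))
        pos += P
    return merge(branches, M)

def regroup(flat, N, M, tree):
    """De-interleave a flat segment list into channels sized for `tree`."""
    sizes = [N // M // P for P in tree for _ in range(P)]
    return np.split(flat, np.cumsum(sizes)[:-1])

def wp_scramble(x, M, tree_a, tree_s, seed):
    flat = np.concatenate(wp_analysis(x, M, tree_a))                        # AFB1
    perm = np.random.default_rng(seed).permutation(flat.size)
    return wp_synthesis(regroup(flat[perm], x.size, M, tree_s), M, tree_s)  # SFB2

def wp_descramble(y, M, tree_a, tree_s, seed):
    flat = np.concatenate(wp_analysis(y, M, tree_s))                        # AFB2 matches SFB2
    perm = np.random.default_rng(seed).permutation(flat.size)
    restored = np.empty_like(flat)
    restored[perm] = flat                                                   # de-permute
    return wp_synthesis(regroup(restored, y.size, M, tree_a), M, tree_a)    # SFB1

x = np.random.default_rng(3).standard_normal(240)      # stand-in for one speech block
y = wp_scramble(x, 4, [2, 3, 2, 3], [3, 2, 2, 2], seed=99)
print(np.allclose(wp_descramble(y, 4, [2, 3, 2, 3], [3, 2, 2, 2], seed=99), x))
```

An eavesdropper who guesses the wrong synthesis tree recovers the wrong time-frequency samples even with the right seed, which is the source of the extra security described in the text.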


For our simulations, the block length was fixed at 240. Table 4.2 shows the mean distance measures for nine different analysis-synthesis tree combinations. The sentence for which the results are shown is the same as that used in Chapter 3. Also, as before, we have used the circular extension approach to implement the filter banks.

Table 4.2: Mean distance measures obtained for nine different analysis-synthesis tree combinations in the wavelet packet based time-frequency scrambling scheme shown in Fig. 4.3 (sentence used: "beg that guard for one gallon of gas", block length = 240, number of blocks = 155, sampling rate = 8 kHz, and 10th order LPC and cepstral coefficients were used to compute the distance measures).

2nd level filter banks (analysis, synthesis): 2,3,2,3; 2,2,3,3; 2,2,2,3; 2,3,2,2; 3,2,2,2; 2,3,3,2; 3,2,2,3; 2,2,3,2; 3,3,3,2; 3,2,3,2; 3,3,2,2; 3,3,3,2; 3,2,3,3; 3,2,2,3; 2,3,3,3; 2,3,2,3; 3,2,2,2; 2,3,3,3
mean LPC distance: 2.9460, 2.9829, 2.9443, 2.9312, 2.8727, 2.8247, 2.8373, 2.8849, 2.9187
mean cepstral distance: 3.3572, 3.3363, 3.2768, 3.2102, 3.2422, 3.1688, 3.2590, 3.2685, 3.2386

In this scheme, for a given block, the eavesdropper has to choose the correct analysis bank (which should form a PR system with the synthesis bank SFB2), de-permute the time-frequency segments, and finally choose the correct synthesis bank SFB1 (which should form a PR system with the analysis bank AFB1). Since the analysis and synthesis tree structures can each be chosen from $2^M$ possible combinations, there are a total of $2^{2M}$ analysis-synthesis tree pairs from which the eavesdropper has to pick the correct pair. This adds an extra level of security. For example, in Fig. 4.3, we get a total of $2^8 = 256$ analysis-synthesis tree pairs.

From Tables 4.1 and 4.2, we observe that the wavelet packet based scrambling algorithm gives nearly the same (or slightly higher) distance measures compared to the uniform time-frequency scrambling algorithm of Chapter 3. The advantage of this algorithm, however, is its increased security, which can be explained as follows. The total number of permutations per block in this case is $2^{2M} N!$, where M is the number of channels at the first level of the tree. This is an increase by a factor of $2^{2M}$ compared to the total number of possible permutations (N!) in the time-frequency algorithm described in Chapter 3. We can therefore expect higher security in the wavelet packet based algorithm.
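The counting argument can be made concrete with a short calculation. The sketch computes $\log_{10}(2^{2M} N!)$ with `math.lgamma`, which avoids forming the huge integer N!, and compares it with $\log_{10} N!$ for the uniform scheme, using the values N = 240 and M = 4 from the simulations. These are raw counts; as noted in Section 4.1, the number of permutations that actually yield low residual intelligibility is smaller [10]. The function names are hypothetical.

```python
import math

def log10_permutations(N):
    """log10(N!) via the log-gamma function: lgamma(N + 1) = ln(N!)."""
    return math.lgamma(N + 1) / math.log(10)

def log10_wavelet_packet_keys(N, M):
    """log10(2^(2M) * N!): analysis/synthesis tree pairs times segment permutations."""
    return 2 * M * math.log10(2) + log10_permutations(N)

N, M = 240, 4
print(f"uniform scheme:        10^{log10_permutations(N):.1f}")
print(f"wavelet packet scheme: 10^{log10_wavelet_packet_keys(N, M):.1f}")
print(f"gain factor 2^(2M) = {2 ** (2 * M)}")
```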

4.3

Conclusion

In this chapter, we proposed a time-frequency scrambling algorithm using wavelet packets.


We have also pointed out how this scheme gives rise to increased security. In the next chapter, we introduce 2-channel perfect reconstruction circular convolution filter banks and describe a method for the design of these filters.

Chapter 5

2-Channel Perfect Reconstruction Circular Convolution Filter Banks

5.1

Introduction

The theory of perfect reconstruction filter banks was reviewed in Chapter 2. In this chapter, we consider the problem of realizing a two-channel perfect reconstruction circular convolution filter bank. Since filtering in the discrete Fourier transform (DFT) domain is equivalent to circular convolution of the input with the impulse response of the filter, we use the terms DFT filter bank and circular convolution filter bank interchangeably. We begin by describing what this filtering means. Next, we derive constraints on the design of the filters in the filter bank system. Throughout this chapter, we assume that the input to the filter bank system, x(n), has an even number of samples.

5.2

The Basic Procedure

The basic procedure can be understood by referring to Fig. 5.1. Here, we first take the DFT of the input signal x(n). We use the notations X(k) and $X(e^{j2\pi k/N})$ interchangeably to denote the DFT of x(n). We multiply X(k) point by point with $H_0(k)$, the DFT of $h_0(n)$, where $h_0(n)$ is the impulse response of the lowpass section of the filter bank, to obtain $X_0(k)$. This amounts to circularly convolving $h_0(n)$ and x(n). We then take the inverse discrete Fourier transform (IDFT) of $X_0(k)$ and downsample it by 2. This sequence is denoted by $v_0(n)$. This procedure is repeated for the lower branch with $H_1(k)$, the DFT of $h_1(n)$ (the impulse response of the highpass section of the filter bank), to give $v_1(n)$ as the output. In this way, we have decomposed the input signal into two sequences $v_0(n)$ and $v_1(n)$ of N/2 samples each.
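The analysis branch of Fig. 5.1 can be sketched with NumPy's FFT routines. The sketch assumes real-valued x(n) and $h_0(n)$, with the filter no longer than the block and an illustrative 3-tap lowpass impulse response; it checks that DFT-domain multiplication followed by the IDFT and decimation by 2 equals circular convolution followed by decimation. The function names are hypothetical.

```python
import numpy as np

def dft_analysis_branch(x, h):
    """DFT, point-by-point multiplication by H(k), IDFT, then downsampling by 2."""
    N = len(x)
    X = np.fft.fft(x)
    H = np.fft.fft(h, N)                  # N-point DFT of the zero-padded impulse response
    return np.fft.ifft(X * H).real[::2]   # real input and filter give a real result

def circular_convolution(x, h):
    """Direct evaluation of sum_m h(m) x((n - m) mod N)."""
    N = len(x)
    return np.array([sum(h[m] * x[(n - m) % N] for m in range(len(h))) for n in range(N)])

rng = np.random.default_rng(5)
x = rng.standard_normal(16)               # even number of samples, as assumed in the text
h0 = np.array([0.25, 0.5, 0.25])          # assumption: illustrative lowpass impulse response
v0 = dft_analysis_branch(x, h0)
print(len(v0), np.allclose(v0, circular_convolution(x, h0)[::2]))
```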

Figure 5.1: 2-channel circular convolution filter bank implementation. (a) Analysis bank. (b) Synthesis bank.

To recombine the two sequences, we upsample $v_0(n)$ and $v_1(n)$ by 2. We denote these new sequences by $u_0(n)$ and $u_1(n)$. The DFTs of these sequences are multiplied point by point with $F_0(k)$ and $F_1(k)$, respectively, and added. $F_0(k)$ and $F_1(k)$ are the DFTs of $f_0(n)$ and $f_1(n)$, respectively, where $f_0(n)$ and $f_1(n)$ denote the impulse responses of the synthesis filters corresponding to the analysis filters $h_0(n)$ and $h_1(n)$. Thus, $\hat{X}(k) = F_0(k)U_0(k) + F_1(k)U_1(k)$. Let $\hat{x}(n) = \mathrm{IDFT}(\hat{X}(k))$. For convenience, we represent the entire operation (analysis and synthesis) by the equivalent diagram shown in Fig. 5.2. This analysis-synthesis bank is similar to the 2-channel filter bank system described in Chapter 2 (Fig. 2.2). However, the crucial difference is in the way $H_0(k)$, $H_1(k)$, $F_0(k)$ and $F_1(k)$ are designed and implemented. These filters are defined and designed in the DFT domain, i.e., they are defined at a discrete set of frequencies that are multiples of $2\pi/N$.

Figure 5.2: Analysis-synthesis filter bank system equivalent to Fig. 5.1.

In the next section, we propose a simple way of designing these filters for perfect reconstruction.

5.3 Design of Circular Convolution Filter Banks

First, we obtain the DFT-domain relations for downsampling (decimation) and upsampling (interpolation). Next, using these relations, we derive conditions on $H_0(k)$, $H_1(k)$, $F_0(k)$ and $F_1(k)$ for perfect reconstruction.

5.3.1 Downsampling

Figure 5.3: The decimator or the downsampler

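Consider Fig. 5.3, which shows an $M$-fold downsampler. The input to the downsampler is a sequence $x(n)$ of $N$ samples. The output of the downsampler, denoted by $y(n)$, is defined as

$$y(n) = x(Mn), \qquad n = 0,1,\ldots,N/M-1 \tag{5.1}$$

where $M$ is an integer. The sequence $y(n)$ contains $N/M$ samples. It is assumed that $N$ is an integer multiple of $M$.

To obtain an expression for the DFT of the output sequence $y(n)$, we first define an $N$-length intermediate sequence $x_1(n)$ as follows [18]:

$$x_1(n) = \begin{cases} x(n), & n = \text{multiple of } M \\ 0, & \text{otherwise} \end{cases} \tag{5.2}$$

Clearly,

$$y(n) = x(Mn) = x_1(Mn) \tag{5.3}$$

Now, the DFT of $y(n)$ is given by

$$Y(e^{j2\pi kM/N}) = \sum_{n=0}^{N/M-1} y(n)\,e^{-j2\pi kMn/N}, \qquad k = 0,1,\ldots,N/M-1 \tag{5.4}$$

Substituting (5.3) in (5.4), we get

$$Y(e^{j2\pi kM/N}) = \sum_{n=0}^{N/M-1} x_1(Mn)\,e^{-j2\pi kMn/N} \tag{5.5}$$

Since $x_1(n)$ is zero at all $n$ except when $n$ is a multiple of $M$, the above equation is equivalent to

$$Y(e^{j2\pi kM/N}) = \sum_{n=0}^{N-1} x_1(n)\,e^{-j2\pi kn/N} \tag{5.6}$$

Now, to represent $x_1(n)$ in terms of $x(n)$, we define a comb sequence as follows:

$$c(n) = \begin{cases} 1, & n = \text{multiple of } M \\ 0, & \text{otherwise} \end{cases} \tag{5.7}$$

This is periodic with a period $M$ and can be written as

$$c(n) = \frac{1}{M}\sum_{l=0}^{M-1} e^{j2\pi ln/M} \tag{5.8}$$

Using (5.7) and (5.8), we can write

$$x_1(n) = c(n)\,x(n) = \frac{1}{M}\sum_{l=0}^{M-1} x(n)\,e^{j2\pi ln/M} \tag{5.9}$$

Substituting this expression into (5.6) gives

$$Y(e^{j2\pi kM/N}) = \frac{1}{M}\sum_{l=0}^{M-1}\sum_{n=0}^{N-1} x(n)\,e^{j2\pi ln/M}\,e^{-j2\pi kn/N} \tag{5.10}$$

$$Y(e^{j2\pi kM/N}) = \frac{1}{M}\sum_{l=0}^{M-1}\sum_{n=0}^{N-1} x(n)\,e^{-j2\pi (k - lN/M)n/N} \tag{5.11}$$

which can be simplified as

$$Y(e^{j2\pi kM/N}) = \frac{1}{M}\sum_{l=0}^{M-1} X\big(e^{j2\pi k/N}\,e^{-j2\pi l/M}\big), \qquad k = 0,1,\ldots,N/M-1 \tag{5.12}$$

Note from (5.12) that each DFT coefficient of the decimated sequence $y(n)$ is $1/M$ times the sum of $M$ DFT coefficients of the decimator input $x(n)$, namely those at indices $k, k - N/M, \ldots, k - (M-1)N/M$ (modulo $N$).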
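Relation (5.12) can be checked numerically. The following is a minimal Python/NumPy sketch with a hypothetical function name. It compares the $(N/M)$-point DFT of $x(Mn)$ with $1/M$ times the sum of the $M$ DFT samples of $x(n)$ at indices $(k - lN/M) \bmod N$. It uses NumPy's FFT convention $X(k) = \sum_n x(n)e^{-j2\pi kn/N}$, which matches the DFT used here.

```python
import numpy as np

def decimation_dft_check(x, M):
    """Return the (N/M)-point DFT of y(n) = x(Mn) and the right-hand side of
    (5.12), where X(e^{j2pi k/N} e^{-j2pi l/M}) is the DFT sample of x(n)
    at index (k - l*N/M) mod N."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    if N % M:
        raise ValueError("N must be an integer multiple of M")
    X = np.fft.fft(x)
    Y = np.fft.fft(x[::M])                  # DFT of the decimated sequence
    k = np.arange(N // M)
    rhs = sum(X[(k - l * (N // M)) % N] for l in range(M)) / M
    return Y, rhs

x = np.random.default_rng(1).standard_normal(24)
for M in (2, 3, 4):
    Y, rhs = decimation_dft_check(x, M)
    print(M, np.allclose(Y, rhs))
```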


5.3.2 Upsampling

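Figure 5.4: The expander or the upsampler

The upsampling or expansion process is represented in Fig. 5.4. An $L$-fold upsampler, where $L$ is an integer, takes an input sequence $x(n)$ of $N$ samples and produces an output sequence defined as follows:

$$y(n) = \begin{cases} x(n/L), & n = \text{multiple of } L \\ 0, & \text{otherwise} \end{cases} \qquad n = 0,1,\ldots,NL-1 \tag{5.13}$$

The output sequence $y(n)$ has $NL$ samples. The DFT of $y(n)$ is given by

$$Y(e^{j2\pi k/(NL)}) = \sum_{n=0}^{NL-1} y(n)\,e^{-j2\pi kn/(NL)}, \qquad k = 0,1,\ldots,NL-1 \tag{5.14}$$

which can be written as

$$Y(e^{j2\pi k/(NL)}) = \sum_{\substack{n=0 \\ n = \text{multiple of } L}}^{NL-1} y(n)\,e^{-j2\pi kn/(NL)} + \sum_{\substack{n=0 \\ n \ne \text{multiple of } L}}^{NL-1} y(n)\,e^{-j2\pi kn/(NL)} \tag{5.15}$$

In view of (5.13), the second term in the above equation is equal to zero. Now, consider the first term. Making a change of variable $n = Lm$ and noting the relation (5.13), we can write

$$Y(e^{j2\pi k/(NL)}) = \sum_{m=0}^{N-1} x(m)\,e^{-j2\pi kLm/(NL)} \tag{5.16}$$

which can be simplified as

$$Y(e^{j2\pi k/(NL)}) = \sum_{m=0}^{N-1} x(m)\,e^{-j2\pi km/N} = X(e^{j2\pi k/N}), \qquad k = 0,1,\ldots,NL-1 \tag{5.17}$$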

Note from (5.17) that the $NL$-point DFT $Y(k)$ is simply $L$ concatenated copies of the $N$-point DFT of $x(n)$, since $X(e^{j2\pi k/N})$ is periodic in $k$ with period $N$.
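A similar numerical check of (5.17) follows. It is a minimal NumPy sketch, and the helper `upsample` is a hypothetical name.

```python
import numpy as np

def upsample(x, L):
    """L-fold expander of (5.13): y(n) = x(n/L) for n a multiple of L, else 0."""
    x = np.asarray(x)
    y = np.zeros(len(x) * L, dtype=np.result_type(x.dtype, float))
    y[::L] = x
    return y

x = np.array([1.0, 2.0, -1.0, 0.5])
L = 3
Y = np.fft.fft(upsample(x, L))        # NL-point DFT of the expander output
X = np.fft.fft(x)                     # N-point DFT of the input
print(np.allclose(Y, np.tile(X, L)))  # True: L concatenated copies of X
```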
5.3.3 Conditions for Perfect Reconstruction


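Since the input signal $x(n)$ has $N$ samples, the filters also have a support of $N$. We assume that all the sequences and filters are real valued. To understand how the filters are designed, we express the two-channel perfect reconstruction equations in terms of the DFTs. Referring to Fig. 5.2, the inputs to the two decimators are

$$X_0(e^{j2\pi k/N}) = H_0(e^{j2\pi k/N})\,X(e^{j2\pi k/N}) \tag{5.18}$$

$$X_1(e^{j2\pi k/N}) = H_1(e^{j2\pi k/N})\,X(e^{j2\pi k/N}) \tag{5.19}$$

In view of (5.12), we get the following expressions for the outputs of the decimators:

$$V_0(e^{j4\pi k/N}) = \frac{1}{2}\Big[H_0(e^{j2\pi k/N})X(e^{j2\pi k/N}) + H_0(-e^{j2\pi k/N})X(-e^{j2\pi k/N})\Big] \tag{5.20}$$

$$V_1(e^{j4\pi k/N}) = \frac{1}{2}\Big[H_1(e^{j2\pi k/N})X(e^{j2\pi k/N}) + H_1(-e^{j2\pi k/N})X(-e^{j2\pi k/N})\Big] \tag{5.21}$$

for $k = 0,1,\ldots,N/2-1$. Since both sides of (5.20) and (5.21) are periodic in $k$ with period $N/2$, they also hold for $k = N/2,\ldots,N-1$. Similarly, using (5.17), we get

$$U_0(e^{j2\pi k/N}) = V_0(e^{j4\pi k/N}) \tag{5.22}$$

$$U_1(e^{j2\pi k/N}) = V_1(e^{j4\pi k/N}) \tag{5.23}$$

for $k = 0,1,\ldots,N-1$. Note that $U_0(e^{j2\pi k/N})$ and $U_1(e^{j2\pi k/N})$ are concatenations of 2 DFTs of $v_0(n)$ and $v_1(n)$, respectively. Now, $\hat{X}(e^{j2\pi k/N})$ is given by

$$\hat{X}(e^{j2\pi k/N}) = F_0(e^{j2\pi k/N})\,U_0(e^{j2\pi k/N}) + F_1(e^{j2\pi k/N})\,U_1(e^{j2\pi k/N}) \tag{5.24}$$

$$\hat{X}(e^{j2\pi k/N}) = \frac{1}{2}\Big[F_0(e^{j2\pi k/N})H_0(e^{j2\pi k/N}) + F_1(e^{j2\pi k/N})H_1(e^{j2\pi k/N})\Big]X(e^{j2\pi k/N}) + \frac{1}{2}\Big[F_0(e^{j2\pi k/N})H_0(-e^{j2\pi k/N}) + F_1(e^{j2\pi k/N})H_1(-e^{j2\pi k/N})\Big]X(-e^{j2\pi k/N}) \tag{5.25}$$

for $k = 0,1,\ldots,N-1$. The second term in the above equation is the alias term. To cancel this term, we choose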

Substituting (5.26) and (5.27) into (5.25) gives

Thus, for perfect reconstruction, we require

where $c$ is a real constant and $e^{-j2\pi L_0 k/N}$ is a circular shift by $L_0$ samples. Following the conditions described in Chapter 2, we require

This is the so-called power complementarity property. Equation (5.30) can be written as

The above equation implies that the cyclic autocorrelation of $h_0(n)$ is 0 for all even lags except for the zeroth (see Appendix C for details). That is, $h_0(n)$ is orthogonal to its nonzero even circular shifts:

$$\sum_{n=0}^{N-1} h_0(n)\,h_0\big((n+2k) \bmod N\big) = \frac{1}{2}\,\delta\big((2k) \bmod N\big), \qquad k \in \mathbb{Z} \tag{5.32}$$

Now, for perfect reconstruction, we choose

where N - 1 is odd. This corresponds to the condition

Combining (5.33), (5.28) and (5.31), we get

$$\hat{X}(e^{j2\pi k/N}) = e^{-j2\pi (N-1)k/N}\,X(e^{j2\pi k/N}), \qquad k = 0,1,\ldots,N-1 \tag{5.35}$$

where $e^{-j2\pi (N-1)k/N}$ corresponds to a circular shift by $N-1$ samples. Thus, under the above conditions, the output sequence from the synthesis bank is a circularly shifted version of the input sequence. Using the relations (5.26), (5.27) and (5.34), we can show the following:

Similarly, from (5.31) and (5.33), we can show

and

$$\sum_{n=0}^{N-1} h_0(n)\,h_1\big((n+2k) \bmod N\big) = 0, \qquad k \in \mathbb{Z} \tag{5.39}$$

Note that (5.32), (5.38) and (5.39) are cyclic counterparts of (2.21), (2.26) and (2.28). Now, let

This is the alias component matrix. As in Chapter 2, we have in this case

where the superscript $H$ denotes the Hermitian transpose.


5.3.4 Filter Design


To obtain the filters $H_1(e^{j2\pi k/N})$, $F_0(e^{j2\pi k/N})$ and $F_1(e^{j2\pi k/N})$, we first need to design the prototype $H_0(e^{j2\pi k/N})$. For this, we require the half-band filter $H(e^{j2\pi k/N})$, where

$$H(e^{j2\pi k/N}) = |H_0(e^{j2\pi k/N})|^2 \tag{5.42}$$

Note that $H(e^{j2\pi k/N})$ is a zero-phase filter. Furthermore, from (5.31), it is clear that $H(e^{j2\pi k/N})$ has a symmetry with respect to $N/4$. This means that the peak passband and stopband ripples are equal and that the band edges are equally far from $N/4$. Thus, the design of $H(e^{j2\pi k/N})$ is equivalent to assigning a value to each DFT coefficient as follows (we use the notation $H(k) = H(e^{j2\pi k/N})$). Recall that the impulse responses of the filters are real. We also assume that $0 \le H(k) \le 1$.

Conditions for the half-band filter

For some given $H(k)$,

$$H\big((N-k) \bmod N\big) = H(k), \qquad H\big((k + N/2) \bmod N\big) = 1 - H(k), \qquad k = 0,1,\ldots,N-1.$$

We now give some examples to illustrate the above.

Examples of half-band filters


Example 1. N = 8

1. Let H(0) = 1. Then H(4) = 0.

2. Let H(1) = 1. Then H(7) = 1 and H(3) = H(5) = 0.

3. The remaining values of H(k), for k = 2 and 6, are constrained as follows: H(2) = H(6) and H(6) = 1 - H(2). Hence, H(2) = H(6) = 0.5.

The filter H(k) is thus given as H(k) = {1, 1, 0.5, 0, 0, 0, 0.5, 1}, and hence,

h(n) = {0.5, 0.301777, 0, -0.051777, 0, -0.051777, 0, 0.301777}

Example 2. N = 8
1. Let H(0) = 0.75. Then H(4) = 0.25.

2. Let H(1) = 0.37. Then H(7) = 0.37 and H(3) = H(5) = 0.63.

3. The remaining values of H(k) are fixed as follows: H(2) = H(6) and H(6) = 1 - H(2). Hence, H(2) = H(6) = 0.5.

Thus

H(k) = {0.75, 0.37, 0.5, 0.63, 0.25, 0.63, 0.5, 0.37}, and hence, h(n) = {0.5, 0.016538, 0, 0.108462, 0, 0.108462, 0, 0.016538}

Example 3. N = 10

1. Let H(0) = 1. Then H(5) = 0.

2. Let H(1) = 1. Then H(9) = 1 and H(4) = H(6) = 0.

3. Let H(2) = 1. Then H(8) = 1 and H(3) = H(7) = 0.

Thus

H(k) = {1, 1, 1, 0, 0, 0, 0, 0, 1, 1}, and hence,
h(n) = {0.5, 0.323607, 0, -0.123607, 0, 0.1, 0, -0.123607, 0, 0.323607}

Note that in all the above cases, h(n) = 0 for even n, except for n = 0, where h(0) = 0.5.
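The design rules above are easy to automate. The following Python/NumPy sketch uses a hypothetical function name. It checks the two half-band conditions and the range $0 \le H(k) \le 1$ on a candidate set of DFT samples $H(k)$, and then returns $h(n)$ by an IDFT. Applied to Examples 1 to 3, the rounded output agrees with the $h(n)$ values listed above.

```python
import numpy as np

def halfband_impulse_response(H):
    """Check the half-band conditions on the DFT samples H(k) and return h(n)."""
    H = np.asarray(H, dtype=float)
    N = len(H)
    if N % 2:
        raise ValueError("N must be even")
    k = np.arange(N)
    if not np.allclose(H, H[(N - k) % N]):
        raise ValueError("H(k) must equal H(N-k) (real, zero-phase h(n))")
    if not np.allclose(H + H[(k + N // 2) % N], 1.0):
        raise ValueError("H(k) + H(k + N/2) must equal 1")
    if np.any(H < 0) or np.any(H > 1):
        raise ValueError("0 <= H(k) <= 1 is needed so that H0(k) = sqrt(H(k))")
    return np.fft.ifft(H).real  # imaginary part is zero for symmetric real H(k)

examples = {
    "Example 1": [1, 1, 0.5, 0, 0, 0, 0.5, 1],
    "Example 2": [0.75, 0.37, 0.5, 0.63, 0.25, 0.63, 0.5, 0.37],
    "Example 3": [1, 1, 1, 0, 0, 0, 0, 0, 1, 1],
}
for name, H in examples.items():
    print(name, np.round(halfband_impulse_response(H), 6))
```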

Design of prototype filter

Having designed $H(e^{j2\pi k/N})$, we proceed to obtain $H_0(e^{j2\pi k/N})$ as follows. $H_0(e^{j2\pi k/N})$ has the following form:

$$H_0(e^{j2\pi k/N}) = \sqrt{H(e^{j2\pi k/N})}\; e^{j\phi(k)} \tag{5.43}$$

It also satisfies

$$|H_0(e^{j2\pi k/N})|^2 + |H_0(-e^{j2\pi k/N})|^2 = 1$$

Since the filters are assumed real, the phase response $\phi(k)$ is antisymmetric with respect to $N/2$, i.e., $\phi(N-k) = -\phi(k)$, with $\phi(0) = 0$. We now give a couple of examples of $H_0(e^{j2\pi k/N})$ and the corresponding impulse responses.

Examples of prototype filters

Example 1. Consider H(k) = {1, 1, 0.5, 0, 0, 0, 0.5, 1}. Then,

$|H_0(k)|$ = {1, 1, 0.707107, 0, 0, 0, 0.707107, 1}.

Let us choose $\phi(k)$ as $\phi(k)$ = {0, 0, 0, 0, 0, 0, 0, 0}. Taking the IDFT of $H_0(k)$, we get $h_0(n)$ = {0.551777, 0.301777, -0.051777, -0.051777, 0.051777, -0.051777, -0.051777, 0.301777}.

Example 2. Consider H(k) = {0.75, 0.37, 0.5, 0.63, 0.25, 0.63, 0.5, 0.37}. In this case, $|H_0(k)|$ = {0.866025, 0.608276, 0.707107, 0.793725, 0.5, 0.793725, 0.707107, 0.608276}. Let us choose $\phi(k)$ as $\phi(k)$ = {0, 1.11689, 0.130226, -2.67465, 0, 2.67465, -0.130226, -1.11689}. Taking the IDFT of $H_0(k)$, we get $h_0(n)$ = {0.235523, 0.274628, 0.22147, -0.116165, 0.456543, -0.13721, -0.230523, 0.16176}. Once we have $h_0(n)$, we can find $h_1(n)$, $f_0(n)$ and $f_1(n)$ using (5.34), (5.36) and (5.37).
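The prototype design (5.43) and the orthogonality property (5.32) can be sketched together. In the NumPy code below, `prototype` and `cyclic_autocorrelation` are hypothetical names. The code forms $H_0(k) = \sqrt{H(k)}\,e^{j\phi(k)}$, inverts it with an IDFT, and evaluates the cyclic autocorrelation $r(k)$ at even lags. Because $r(k)$ depends only on $|H_0(k)|^2 = H(k)$, the even-lag values should be $1/2, 0, 0, \ldots$ for any antisymmetric phase. Reversing the sign convention of $\phi(k)$ only reverses $h_0(n)$ circularly in time, which leaves (5.32) unchanged.

```python
import numpy as np

def prototype(H, phi):
    """Form H0(k) = sqrt(H(k)) * exp(j*phi(k)) as in (5.43) and return the
    real impulse response h0(n) = IDFT{H0(k)}."""
    H0 = np.sqrt(np.asarray(H, dtype=float)) * np.exp(1j * np.asarray(phi, dtype=float))
    h0 = np.fft.ifft(H0)
    # An antisymmetric phi(k) with even-symmetric |H0(k)| gives a real h0(n).
    assert np.allclose(h0.imag, 0.0, atol=1e-9)
    return h0.real

def cyclic_autocorrelation(h):
    """r(k) = sum_n h(n) h((n + k) mod N) for k = 0, ..., N-1."""
    h = np.asarray(h, dtype=float)
    return np.array([np.dot(h, np.roll(h, -k)) for k in range(len(h))])

# Example 1 (zero phase) and Example 2 of the prototype filters
h0_a = prototype([1, 1, 0.5, 0, 0, 0, 0.5, 1], np.zeros(8))
h0_b = prototype([0.75, 0.37, 0.5, 0.63, 0.25, 0.63, 0.5, 0.37],
                 [0, 1.11689, 0.130226, -2.67465, 0, 2.67465, -0.130226, -1.11689])
for h0 in (h0_a, h0_b):
    r = cyclic_autocorrelation(h0)
    # r(2k) should be 1/2 at k = 0 and 0 at the other even lags
    print(np.round(h0, 6), np.round(r[::2], 12))
```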

5.4 Conclusion

In this chapter, we have described the design of circular convolution filter banks. The advantages of this method are its ease of design and its flexibility.

Chapter 6

Conclusions
In this thesis, we first gave a brief overview of the existing scrambling algorithms. Then, we reviewed the 2-channel perfect reconstruction filter banks and showed the close connection between filter banks and wavelets. The properties of 2-channel filter banks were then extended to M channels and generalized tree-structured (or wavelet packet) filter banks. Next, the performance of the uniform filter bank based time-frequency scrambling algorithm was studied with respect to the block length and the number of channels. We used the LPC and cepstral distances as objective distance measures in the performance study. These distance measures showed an increase with increasing block length and number of channels. This algorithm derives its security from the permutation of time-frequency segments, and the effective number of permutations for this algorithm has been estimated to be much less than the total number of permutations. To increase this effective number, we proposed a wavelet packet based time-frequency scrambling algorithm. Here, the eavesdropper has to search for the correct analysis tree, then rearrange the time-frequency segments correctly, and finally search for the correct synthesis tree in order to recover each speech block. As a result, this algorithm gives a greater number of permutations and is thereby expected to give a greater effective number of permutations than the uniform filter bank based scrambling algorithm. Finally, we introduced the 2-channel perfect reconstruction circular convolution filter banks. After developing the conditions for perfect reconstruction in terms of the DFT and circular convolution, we gave a simple method to design these filter banks.

6.1 Suggestions for Further Work

We have studied the uniform filter bank and the wavelet packet based scrambling algorithms using objective distance measures. These distance measures are only estimates of the residual intelligibility in the scrambled speech. Thus, it is necessary to conduct listening tests, also called subjective tests, to evaluate the performance of the algorithms. The subjective tests suggested are the digit intelligibility tests proposed by Jayant [6]. Also, for the wavelet packet based algorithm, the effective number of permutations needs to be estimated.

In our study of the scrambling algorithms, we did not consider certain issues which would be of considerable significance in a practical system. We shall briefly discuss each of these issues. Bandwidth expansion is a consequence of the time-frequency permutations. It assumes considerable importance when the scrambled signal has to be transmitted over a bandpass channel. An approach to counter bandwidth expansion for the uniform filter bank based algorithm was given in [3]. To explain this approach, we use the notation of Chapter 3 for the analysis and synthesis filter banks. The incoming speech is first lowpass filtered so that its bandwidth is the same as that of the channel. The filtered speech is fed to the M-channel analysis filter bank, AFB1. At the output of AFB1, the first K subbands, with a total bandwidth not exceeding that of the channel, are retained, while the remaining bands are rejected. The time-frequency segments are now permuted and de-interleaved to form K bands. These bands are assigned to an appropriate set of K consecutive filters of the M-channel synthesis bank SFB2, and zeros are given to the remaining filters, so that the scrambled speech signal lies in the passband of the channel. Fig. 6.1 shows an example of this scheme as used in [3]. Here, the channel is assumed to be a telephone channel having a passband from 200 Hz to 3200 Hz. The input speech is sampled at 7760 Hz, so the bandwidth of each subband is 485 Hz. Each speech block is decomposed into 8 subbands, out of which the lower 5 subbands, occupying a total bandwidth of 2425 Hz, are retained. The permuted time-frequency segments are de-interleaved into 5 bands and are fed to F1(z), F2(z), ..., F5(z) of SFB2, while zeros are given to the rest of the filters, namely, F0(z), F6(z) and F7(z). As a result, the signal at the output of SFB2 occupies the frequency band 485 Hz to 2910 Hz and thus lies in the passband of the channel.

Figure 6.1: A method to counter bandwidth expansion in the case of uniform filter bank based scrambling algorithm.

An extension of this scheme could be used for the wavelet packet based time-frequency scrambling algorithm. The block diagram of the suggested scheme is shown in Fig. 6.2. The output of SFB2 is the encrypted speech, which lies in the passband of the channel.

The performance of the algorithms with the bandwidth preservation schemes needs to be evaluated. Further, the effect of distortions introduced by the channel on the performance of the scrambling algorithms needs to be studied. In particular, equalization needs to be carried out to compensate for the amplitude and phase distortions introduced by the channel before unscrambling the signal. The unscrambled speech should be compared with the original speech so as to evaluate the degradation introduced by the channel. It is also required to repeat the above experiment for different SNRs to evaluate the performance of the scrambler in the presence of noise. Further, the effect of loss of synchronization between the transmitter and the receiver on the intelligibility of the unscrambled speech needs to be examined. Finally, an efficient key distribution scheme is required for the uniform filter bank and wavelet packet based scrambling algorithms. We are not aware of any key distribution schemes for the time-frequency scrambling algorithms, and this problem is still open.

We have only introduced the properties and the design of 2-channel perfect reconstruction circular convolution filter banks. It would be interesting to study the trade-offs between the design parameters and their performance in applications such as speech and image compression. This would enable us to develop useful design criteria using the flexibility of this method.

Figure 6.2: Extension of the method of Fig. 6.1 to the case of wavelet packet based scrambling algorithm.
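The subband arithmetic of the bandwidth-preservation example from [3] can be laid out explicitly. The short Python sketch below assumes uniform subbands of width $f_s/(2M)$. The function name `subband_plan` is hypothetical. The sketch recomputes the 485 Hz subband width and the 485-2910 Hz output band from the sampling rate, the number of subbands and the filters used, and checks the result against the 200-3200 Hz telephone passband.

```python
def subband_plan(fs, M, K, first_filter):
    """Uniform M-band split of [0, fs/2]: return the subband width, the band
    occupied by the K retained (lowest) subbands, and the band occupied after
    feeding them to synthesis filters F_first, ..., F_(first+K-1)."""
    width = fs / 2 / M
    retained = (0.0, K * width)
    output = (first_filter * width, (first_filter + K) * width)
    return width, retained, output

width, retained, output = subband_plan(fs=7760.0, M=8, K=5, first_filter=1)
channel = (200.0, 3200.0)  # telephone passband assumed in [3]
print(width, retained, output)                              # 485.0 (0.0, 2425.0) (485.0, 2910.0)
print(channel[0] <= output[0] and output[1] <= channel[1])  # True
```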

Appendix A

Fundamentals of Multirate Systems


In this appendix, we give a brief introduction to the fundamental building blocks of multirate systems. A detailed exposition and mathematical derivations are given in [18].

A.1 The Downsampler

The downsampler is also called the decimator, sampling rate compressor and subsampler. Fig. A.1 shows the block diagram of an $M$-fold downsampler. It takes an input sequence $x(n)$ and produces an output sequence

$$y(n) = x(Mn) \tag{A.1}$$

where $M$ is an integer. In other words, a downsampler by $M$ retains only those samples of $x(n)$ which occur at times equal to multiples of $M$. Fig. A.2 shows how the downsampler acts on a sequence for the case $M = 2$. In the transform domain, $Y(z)$, the z-transform of the output of the downsampler, can be written as

$$Y(z) = \frac{1}{M}\sum_{k=0}^{M-1} X\big(z^{1/M}\,W^{k}\big) \tag{A.2}$$

where $W = e^{-j2\pi/M}$. This can be written in terms of the Fourier transform as

$$Y(e^{j\omega}) = \frac{1}{M}\sum_{k=0}^{M-1} X\big(e^{j(\omega - 2\pi k)/M}\big) \tag{A.3}$$

First, $X(e^{j\omega})$ is stretched by a factor $M$ to obtain $X(e^{j\omega/M})$. The stretched version is then shifted uniformly by successive multiples of $2\pi$ and added to give a $2\pi$-periodic spectrum. Downsampling gives rise to aliasing unless the signal is bandlimited to $|\omega| < \pi/M$. Fig. A.5(b) shows the effect of decimation for $M = 2$.

Figure A.1: An M-fold downsampler or decimator.

Figure A.2: Demonstration of a downsampler for M = 2. The samples with solid dots are retained.

A.2 Upsampler

Figure A.3: An L-fold upsampler or expander.

Figure A.4: Demonstration of upsampling for L = 2.

The upsampler is shown in Fig. A.3. An upsampler is also called an interpolator, sampling rate expander or expander. It takes an input sequence $x(n)$ and produces an output sequence

$$y(n) = \begin{cases} x(n/L), & \text{if } n \text{ is an integer multiple of } L \\ 0, & \text{otherwise} \end{cases} \tag{A.4}$$

The process of upsampling is demonstrated in Fig. A.4 for $L = 2$. In the transform domain, the relation between $x(n)$ and $y(n)$ is given by

$$Y(z) = X(z^{L}) \tag{A.5}$$

In terms of the Fourier transform, the above equation can be written as

$$Y(e^{j\omega}) = X(e^{j\omega L}) \tag{A.6}$$

Thus, $Y(e^{j\omega})$ is an $L$-fold compressed version of $X(e^{j\omega})$, and it contains $L$ compressed copies of $X(e^{j\omega})$ in each interval of length $2\pi$. The copies other than the one centred at $\omega = 0$ are called images. Fig. A.5(c) demonstrates this for $L = 3$.
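Relations (A.3) and (A.6) can be spot-checked by evaluating the DTFT of a finite random sequence at a single frequency. This is a minimal NumPy sketch with a hypothetical helper `dtft`. The frequency $\omega = 0.7$ and the factors $M = L = 3$ are arbitrary choices.

```python
import numpy as np

def dtft(x, w):
    """X(e^{jw}) = sum_n x(n) e^{-jwn} for a finite sequence starting at n = 0."""
    n = np.arange(len(x))
    return np.sum(np.asarray(x) * np.exp(-1j * w * n))

rng = np.random.default_rng(0)
x = rng.standard_normal(17)
M, L, w = 3, 3, 0.7

# (A.3): Y(e^{jw}) = (1/M) sum_k X(e^{j(w - 2 pi k)/M}) for y(n) = x(Mn)
lhs = dtft(x[::M], w)
rhs = sum(dtft(x, (w - 2 * np.pi * k) / M) for k in range(M)) / M
print(np.isclose(lhs, rhs))

# (A.6): Y(e^{jw}) = X(e^{jwL}) for the L-fold expander
y_up = np.zeros(L * (len(x) - 1) + 1)
y_up[::L] = x
print(np.isclose(dtft(y_up, w), dtft(x, w * L)))
```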

Figure A.5: Effect of downsampling and upsampling on the spectrum of the original signal. (a) The original signal spectrum. (b) The spectrum of the signal after downsampling by 2 (note the overlap of the dilated and shifted versions of the spectrum with the original dilated spectrum). (c) The spectrum of the signal after upsampling by 3.

A.3 Noble Identities

The noble identities arise when analyzing polyphase structures (to be explained in the next section) and in forming equivalent filter banks corresponding to tree-structured systems, as explained in Chapter 2. Let $G(z)$ be a rational function. Then Figs. A.6(a) and A.6(b) are equivalent, and Figs. A.6(c) and A.6(d) are equivalent. We now prove these two equivalences. For Fig. A.6(a), the output of the downsampler by $M$ is given by

$$\frac{1}{M}\sum_{k=0}^{M-1} X\big(z^{1/M}\,W^{k}\big).$$

Thus, the output of $G(z)$ is

$$Y_1(z) = \frac{G(z)}{M}\sum_{k=0}^{M-1} X\big(z^{1/M}\,W^{k}\big).$$

Now for Fig. A.6(b), the input to the downsampler is $X(z)G(z^M)$. Then, using (A.2), the output of the downsampler $Y_2(z)$ is

$$Y_2(z) = \frac{1}{M}\sum_{k=0}^{M-1} X\big(z^{1/M}\,W^{k}\big)\,G\big(z\,W^{kM}\big) = \frac{G(z)}{M}\sum_{k=0}^{M-1} X\big(z^{1/M}\,W^{k}\big),$$

since $W^{M} = 1$. This is the same as $Y_1(z)$. To prove the second equivalence, we see from (A.5) that the output $Y_3(z)$ of the system shown in Fig. A.6(c) is $Y_3(z) = G(z^L)X(z^L)$. Now consider the output of the upsampler in Fig. A.6(d). This is $X(z^L)$. Hence, the output of this system is $Y_4(z) = G(z^L)X(z^L)$, which is the same as $Y_3(z)$.

Figure A.6: The noble identities
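The two noble identities can also be checked for an FIR $G(z)$, which is a special case of a rational function, with a short NumPy sketch. The helper `upsample`, which forms the coefficients of $G(z^L)$ by inserting zeros, is hypothetical. With full linear convolution, both sides of each identity produce sequences of the same length, so they can be compared sample by sample.

```python
import numpy as np

def upsample(v, L):
    """Insert L-1 zeros between samples: coefficients of V(z) -> V(z^L)."""
    out = np.zeros(L * (len(v) - 1) + 1)
    out[::L] = v
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal(25)
g = rng.standard_normal(4)   # FIR G(z)
M = L = 3

# First identity: (downsample by M, then G(z)) == (G(z^M), then downsample by M)
y1 = np.convolve(x[::M], g)
y2 = np.convolve(x, upsample(g, M))[::M]
print(np.allclose(y1, y2))

# Second identity: (G(z), then upsample by L) == (upsample by L, then G(z^L))
y3 = upsample(np.convolve(x, g), L)
y4 = np.convolve(upsample(x, L), upsample(g, L))
print(np.allclose(y3, y4))
```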

A.4 Polyphase Representation

Consider a filter $H(z) = \sum_{n} h(n)z^{-n}$. Given an integer $M$, we can decompose $H(z)$ as

$$H(z) = \sum_{n} h(nM)\,z^{-nM} + z^{-1}\sum_{n} h(nM+1)\,z^{-nM} + \cdots + z^{-(M-1)}\sum_{n} h(nM+M-1)\,z^{-nM} \tag{A.7}$$

This can be compactly written as

$$H(z) = \sum_{l=0}^{M-1} z^{-l}\,E_l(z^{M}) \tag{A.8}$$

where

$$E_l(z) = \sum_{n} h(nM + l)\,z^{-n} \tag{A.9}$$

and $l = 0,1,\ldots,M-1$. Equation (A.8) is called the type 1 polyphase representation of $H(z)$, and $E_l(z)$, $l = 0,1,\ldots,M-1$, are the polyphase components of $H(z)$. Clearly, the polyphase components $E_l(z)$ depend on the choice of $M$. The type 2 polyphase representation of $H(z)$ is given by

$$H(z) = \sum_{l=0}^{M-1} z^{-(M-1-l)}\,R_l(z^{M})$$

where

$$R_l(z) = E_{M-1-l}(z) \tag{A.10}$$

Type 1 and type 2 polyphase representations of $H(z)$ are shown in Figs. A.7 and A.8, respectively, for arbitrary $M$ and $L$. The equivalent representations, shown in the same figures, are obtained using the noble identities. Polyphase representations are useful for the analysis of multirate systems. They are also useful in the efficient implementation of decimation and interpolation filters [18]. Assume the filter has order $N$ and decimates by a factor $M$. A direct implementation then performs $N+1$ multiplications and $N$ additions in a single unit of time. After this, the filter is idle for the next $M-1$ units of time. In other words, the computational resources are poorly utilized, which places a high requirement on the computational speed. Using the polyphase representation, the multipliers and the adders in $E_l(z)$, $l = 0,1,\ldots,M-1$, have $M$ units of time to complete their work. This results in a more efficient utilization of computational resources. For an upsampler of factor $L$ followed by a filter, a direct structure performs $N+1$ multiplications per output sample, although only every $L$th sample at the filter input is nonzero. Using the polyphase structure, we eliminate these unnecessary calculations.

Figure A.7: (a) The original system. (b) The type 1 polyphase representation of the system. (c) Equivalent representation obtained using the noble identities.

Figure A.8: (a) The original system. (b) The type 2 polyphase representation of the system. (c) Equivalent representation obtained using the noble identities.
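To make the efficiency argument concrete, the sketch below computes a decimated filter output in two ways. It uses NumPy, and the function names are hypothetical. The direct method filters at the high rate and then discards $M-1$ of every $M$ outputs. The polyphase method uses the type 1 components $E_l(z)$ of (A.9), and each branch filter runs at the low rate on the branch input $x_l(n) = x(nM - l)$. The two outputs agree, while the polyphase form never computes a sample that is thrown away.

```python
import numpy as np

def decimate_direct(x, h, M):
    """Filter at the high rate, then keep every Mth output sample."""
    return np.convolve(x, h)[::M]

def decimate_polyphase(x, h, M):
    """Type 1 polyphase decimator: y(n) = sum_l (E_l * x_l)(n), with
    E_l(m) = h(mM + l) as in (A.9) and x_l(n) = x(nM - l)."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    n_out = -(-(len(x) + len(h) - 1) // M)   # ceil: length of the direct output
    y = np.zeros(n_out + len(h))             # headroom, trimmed at the end
    for l in range(M):
        e_l = h[l::M]
        if e_l.size == 0:                    # more branches than filter taps
            continue
        # x_l(n) = x(nM - l); for l > 0 the first sample x(-l) is zero
        x_l = x[::M] if l == 0 else np.concatenate(([0.0], x[M - l::M]))
        branch = np.convolve(e_l, x_l)       # computed entirely at the low rate
        y[:branch.size] += branch
    return y[:n_out]

rng = np.random.default_rng(2)
x, h = rng.standard_normal(30), rng.standard_normal(9)
for M in (2, 3, 4):
    print(M, np.allclose(decimate_direct(x, h, M), decimate_polyphase(x, h, M)))
```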

Appendix B Objective Distance Measures


In this appendix, we give definitions and brief descriptions of the objective distance measures used in this thesis. These are the LPC and the cepstral distance measures.

B.1 LPC Distance

Linear Prediction [29] attempts to model the characteristics of the vocal tract using an all-pole model given by

$$H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}$$

where $p$ is the order of the model and $a_1, a_2, \ldots, a_p$ are the linear prediction coefficients. Let

$$\mathbf{a} = [1, -a_1, -a_2, \ldots, -a_p]^{T}.$$

The LPC distance, also known as the Itakura distance or the log-likelihood ratio [30,5], is based on the dissimilarity between the all-pole models of the clear and the encrypted (or scrambled) speech blocks. To calculate this measure between a block of clear speech and a block of scrambled speech, the LPC coefficients of both are calculated. The LPC distance between them is given by

$$d_{LPC}(c,e) = \log\left(\frac{\mathbf{a}_e^{T}\,\mathbf{R}_c\,\mathbf{a}_e}{\mathbf{a}_c^{T}\,\mathbf{R}_c\,\mathbf{a}_c}\right)$$

where $\mathbf{R}_c$ is the $(p+1)\times(p+1)$ autocorrelation matrix of the clear speech block, vector $\mathbf{a}_c$ contains the LPC coefficients for the clear speech block, and vector $\mathbf{a}_e$ contains the LPC coefficients for the scrambled speech block.

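Note that the LPC distance is always greater than or equal to zero, i.e., $d_{LPC}(c,e) \ge 0$, because $\mathbf{a}_c$ minimizes the prediction error $\mathbf{a}^{T}\mathbf{R}_c\mathbf{a}$ over all vectors $\mathbf{a}$ whose first element is 1. It is also nonsymmetric; that is, in general $d_{LPC}(c,e) \ne d_{LPC}(e,c)$. In our study, we have chosen $p$, the order of the LPC model, as 10.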
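The following is a minimal sketch of the LPC distance. It assumes the autocorrelation method with no windowing, because the windowing and the exact estimator are not specified here; both are assumptions. The names `autocorr`, `toeplitz_from`, `lpc_vector` and `lpc_distance` are hypothetical, and NumPy is assumed. The scrambled block is only a stand-in (coloured noise), not real scrambled speech.

```python
import numpy as np

def autocorr(s, p):
    """Autocorrelation r(0), ..., r(p) of one block (no window applied)."""
    s = np.asarray(s, dtype=float)
    return np.array([np.dot(s[: len(s) - i], s[i:]) for i in range(p + 1)])

def toeplitz_from(r, size):
    """Symmetric Toeplitz matrix with entries r(|i - j|)."""
    idx = np.abs(np.subtract.outer(np.arange(size), np.arange(size)))
    return r[idx]

def lpc_vector(s, p=10):
    """Solve the normal equations for a_1..a_p; return [1, -a_1, ..., -a_p]."""
    r = autocorr(s, p)
    alpha = np.linalg.solve(toeplitz_from(r, p), r[1:])
    return np.concatenate(([1.0], -alpha))

def lpc_distance(clear, scrambled, p=10):
    """d_LPC(c, e) = log(a_e^T R_c a_e / a_c^T R_c a_c)."""
    Rc = toeplitz_from(autocorr(clear, p), p + 1)
    a_c, a_e = lpc_vector(clear, p), lpc_vector(scrambled, p)
    return float(np.log((a_e @ Rc @ a_e) / (a_c @ Rc @ a_c)))

rng = np.random.default_rng(3)
clear = rng.standard_normal(240)
scrambled = np.convolve(rng.standard_normal(240), [1.0, 0.9])[:240]
print(lpc_distance(clear, clear), lpc_distance(clear, scrambled))
```

Because $\mathbf{a}_c$ is obtained from the same $\mathbf{R}_c$ that appears in the denominator, the ratio can never fall below 1 (up to rounding), which is the non-negativity property noted above.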

B.2 Cepstral Distance

The cepstral distance measure [5] is defined as

where $c_c(i)$ and $c_e(i)$ are the cepstral coefficients of the clear and the scrambled speech blocks, respectively. The cepstral coefficients are calculated from the LPC coefficients using the formula given by [29]:

$$c(1) = a_1, \qquad c(n) = a_n + \sum_{k=1}^{n-1}\frac{k}{n}\,c(k)\,a_{n-k}, \quad 1 < n \le p$$

where $(1, -a_1, -a_2, \ldots, -a_p)$ are the LPC coefficients of the block under consideration. The cepstrum represents the smoothed spectrum of a signal. Thus, the cepstral distance is a measure of the spectral difference between the clear and the corresponding scrambled speech. Since we are using a 10th-order linear predictor in the LPC distance calculation, we also use 10 cepstral coefficients to calculate the cepstral distance.
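The cepstral recursion can be sketched as follows. NumPy is assumed, and `lpc_cepstrum` is a hypothetical name. The gain term $c(0)$ is omitted. For $n > p$, the sketch continues the recursion with $a_n = 0$; this extension is not needed when, as here, 10 coefficients are taken from a 10th-order model. As a sanity check, for a first-order model $A(z) = 1 - a z^{-1}$ the recursion gives $c(n) = a^n/n$, which is the known series of $-\log(1 - a z^{-1})$.

```python
import numpy as np

def lpc_cepstrum(alpha, n_ceps=10):
    """Cepstral coefficients c(1..n_ceps) of 1/A(z), A(z) = 1 - sum_k alpha_k z^-k,
    via c(n) = alpha_n + sum_{k=1}^{n-1} (k/n) c(k) alpha_{n-k}, alpha_n = 0 for n > p."""
    alpha = np.asarray(alpha, dtype=float)
    p = len(alpha)
    a = np.concatenate(([0.0], alpha))   # a[k] = alpha_k, 1-based indexing
    c = np.zeros(n_ceps + 1)             # c[0] (gain term) is left at 0
    for n in range(1, n_ceps + 1):
        acc = a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]

print(np.round(lpc_cepstrum([0.5], 4), 6))  # a^n / n for a = 0.5
```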

Appendix C

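Orthonormality Relation for Perfect Reconstruction Circular Convolution Filter Banks

In this Appendix, we prove the result (5.32) given (5.31). Let us denote the cyclic autocorrelation of $h_0(n)$ by $r(k)$. Then

$$r(k) = \sum_{n=0}^{N-1} h_0(n)\,h_0\big((n+k) \bmod N\big)$$

Since $h_0(n)$ is real, the DFT of $r(k)$ is $R(e^{j2\pi k/N}) = |H_0(e^{j2\pi k/N})|^2$. From (5.1) and (5.12), downsampling of $r(k)$ by 2 gives

$$\mathrm{DFT}\{r(2k)\} = \frac{1}{2}\Big[R(e^{j2\pi k/N}) + R(-e^{j2\pi k/N})\Big]$$

But, from (5.31),

$$R(e^{j2\pi k/N}) + R(-e^{j2\pi k/N}) = |H_0(e^{j2\pi k/N})|^2 + |H_0(-e^{j2\pi k/N})|^2 = 1$$

Thus,

$$\mathrm{DFT}\{r(2k)\} = \frac{1}{2}, \qquad k = 0,1,\ldots,N/2-1$$

or, taking the $(N/2)$-point IDFT,

$$\sum_{n=0}^{N-1} h_0(n)\,h_0\big((n+2k) \bmod N\big) = \frac{1}{2}\,\delta\big((2k) \bmod N\big), \qquad k \in \mathbb{Z}$$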

References
[1] H. C. Beker and F. C. Piper, Secure Speech Communications. Academic Press, 1985.

[2] W. Diffie and M. Hellman, "New directions in cryptography," IEEE Transactions on Information Theory, vol. 22, pp. 644-654, Nov. 1976.

[3] R. V. Cox, D. E. Bock, K. B. Bauer, J. D. Johnston, and J. H. Snyder, "The analog voice privacy system," AT&T Tech. Journal, vol. 66, pp. 119-131, Jan./Feb. 1987.

[4] "Tutorial on speech technology," Speech and Vision Laboratory, Dept. of Computer Science and Engg., Indian Institute of Technology, Madras, Dec. 1992.

[5] S. Sridharan, E. Dawson, and B. Goldburg, "Fast Fourier transform based speech encryption system," IEE Proc.-I, vol. 138, pp. 215-223, June 1991.

[6] N. S. Jayant, B. J. McDermott, S. W. Christensen, and A. M. S. Quinn, "A comparison of four methods for analog speech scrambling," IEEE Transactions on Communications, vol. 29, pp. 18-23, Jan. 1981.

[7] N. S. Jayant, R. V. Cox, B. J. McDermott, and A. M. S. Quinn, "Analog scramblers based on sequential permutations in time and frequency," BSTJ, vol. 62, pp. 25-46, Jan. 1983.

[8] B. Goldburg, S. Sridharan, and E. Dawson, "Design and cryptanalysis of transform based analog speech scramblers," IEEE J. Sel. Areas in Comm., vol. 11, pp. 735-744, June 1993.

[9] K. Sakurai, T. Koga, and T. Muratani, "An analog speech scrambling scheme using FFT," IEEE J. Sel. Areas in Comm., vol. 2, pp. 434-442, March 1984.

[10] N. S. Jayant, "Effective number of keys in a voice privacy system based on permutation scrambling," AT&T Tech. Journal, vol. 66, pp. 132-136, Jan./Feb. 1987.
[11] P. Bendjoya, E. Slezak, and C. Froeschlé, "The wavelet transform: a new tool for asteroid family determination," Astronomy and Astrophysics, vol. 251, pp. 312-330, 1991.

[12] I. Daubechies, S. G. Mallat, and A. Willsky, "Special issue on wavelets and multiresolution analysis," IEEE Transactions on Information Theory, vol. 38, March 1992.

[13] P. Duhamel, P. Flandrin, T. Nishitani, A. H. Tewfik, and M. Vetterli, "Special issue on wavelets and filter banks," IEEE Transactions on Signal Processing, vol. 41, Dec. 1993.

[14] I. Daubechies, "Orthonormal bases of compactly supported wavelets," Comm. on Pure and Applied Math., vol. 41, pp. 909-996, Nov. 1988.

[15] S. G. Mallat, "Multiresolution approximations and wavelet orthonormal bases of L^2(R)," Transactions of the American Mathematical Society, vol. 315, pp. 69-87, Sept. 1989.

[16] S. G. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, pp. 674-693, July 1989.

[17] I. Daubechies, Ten Lectures on Wavelets. SIAM, CBMS series, 1992.

[18] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, New Jersey: Prentice-Hall, 1993.

[19] C. Herley, Wavelets and Filter Banks. PhD thesis, Columbia University, 1993.

[20] M. Vetterli and C. Herley, "Wavelets and filter banks: theory and design," IEEE Transactions on Signal Processing, vol. 40, pp. 1414-1429, Sept. 1992.

[21] M. V. Wickerhauser, "INRIA lectures on wavelet packet algorithms," Tech. Rep., Dept. of Math., Yale Univ., March 1991.

[22] D. Esteban and C. Galand, "Applications of quadrature mirror filters to subband voice coding schemes," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Proc., pp. 191-194, May 1977.

[23] M. J. T. Smith and T. P. Barnwell, "Exact reconstruction techniques for tree-structured subband coders," IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 34, pp. 434-441, June 1986.

[24] F. Mintzer, "Filters for distortion-free two-band multirate filter banks," IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 33, pp. 628-630, June 1985.

[25] D. Gracias, Channel Equalization using Wave Packets. Master's thesis, Indian Institute of Science, Bangalore, Jan. 1994.

[26] I. Daubechies, S. Jaffard, and J. Journé, "A simple Wilson orthonormal basis with exponential decay," SIAM J. of Math. Anal., vol. 22, pp. 554-572, March 1991.

[27] A. K. Soman and P. P. Vaidyanathan, "On orthonormal wavelets and filter banks," IEEE Transactions on Signal Processing, vol. 41, pp. 1170-1183, March 1993.

[28] G. Karlsson and M. Vetterli, "Extension of finite length signals for sub-band coding," Signal Processing, vol. 17, pp. 161-168, Feb. 1989.

[29] J. Makhoul, "Linear prediction: A tutorial review," Proc. IEEE, vol. 63, pp. 561-580, 1975.

[30] S. R. Quackenbush, T. P. Barnwell, and M. A. Clements, Objective Measures of Speech Quality. Englewood Cliffs, New Jersey: Prentice-Hall, 1988.
