14ec3029 Speech and Audio Signal Processing

Credits: 3:0:0
Prerequisites: Basic digital signal processing; good knowledge of MATLAB and Simulink.
D.Sugumar
Asst.Prof (AGP 8000)/ECE
Karunya University
Coimbatore
Course objective
To study the analysis of various M-band filter banks for audio coding.
To learn various transform coders for audio coding.
To study speech processing methods in the time and frequency domains.
To study the basic concepts of speech and audio.
Course outcome
On successful completion of this course, you should be able to:
1. Express the speech signal in terms of its time-domain and frequency-domain representations and the different ways in which it can be modelled;
2. Derive expressions for the simple features used in speech and audio applications;
3. Understand the operation of algorithms and the effects of varying their parameter values;
4. Synthesise block diagrams for speech applications, know the purpose of the various blocks, and detail algorithms that could be used to implement them;
5. Implement components of speech processing systems, including speech recognition and speaker recognition, in MATLAB;
6. Understand the behaviour of previously unseen speech processing systems and hypothesise about their merits.
Course Contents
Mechanics of speech and audio
Nature of the speech signal - Discrete-time modelling of speech production - Classification of speech sounds.
Absolute threshold of hearing - Critical bands - Masking - Perceptual entropy - The perceptual audio quality measure (PAQM) - Cognitive effects in judging audio quality.
Time-frequency analysis - filter banks and transforms.

Audio coding and transform coders.
Time- and frequency-domain methods for speech processing; homomorphic speech analysis.
Linear predictive (LPC) analysis of speech; applications of LPC parameters; formant analysis.
Audio Engineering-related Professions
Musician
Audio/electronics technician
Recording Engineer
Architectural Acoustician
Psychoacoustician
Electroacoustic device designer
Electronics designer
Computer programmer
Audio engineer: One who devises creative solutions
to difficult problems in the field of audio.
Audio-related Organizations,
Conferences and Publications
Acoustical Society of America (ASA). Journal: JASA (since 1929).
Meets twice yearly in various locations.
Most members work at universities or scientific labs.
http://asa.aip.org
Audio Engineering Society (AES) Journal: JAES
(since 1953) meets several times a year in various
locations. Most members work in the audio industry.
http://www.aes.org
IEEE Signal Processing Society
Conferences: International Conference on Acoustics, Speech and Signal Processing (ICASSP) and Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
http://www.ieee.org/organizations/society/sp/
International Computer Music Association (since 1980)
Publication: Computer Music Journal (quarterly since 1977)
Yearly International Computer Music Conference (ICMC) at American, European,
and Asian locations (since 1974).
http://computermusic.org
Society for Music Perception and Cognition (SMPC)
Yearly conferences alternate between U.S./Canada and Europe/Asia.
http://www.musicperception.org/
International Conference on Digital Audio Effects (DAFx)
Yearly international (since 1997).
http://dafx.labri.fr
International Conference on Music Information Retrieval (ISMIR)
Yearly international (since 2000).
http://ismir2007.ismir.net/
International Symposium on Musical Acoustics (ISMA)
Roughly biennial international.
http://iwk.mdw.ac.at/ma/
Reference Books
1. Udo Zölzer, Digital Audio Signal Processing, 2nd ed., John Wiley & Sons, 2008.
This book ... covers noise shaping, gives you formulas for peaking and shelving filters used in mixing consoles, tells you how to implement a state-of-the-art reverb or dynamic compression algorithm, and explains how audio compression using psychoacoustic effects works. The mathematics are not that complicated, but you should already know what FFT or IIR stands for and how they work to be able to use the book.
2. Mark Kahrs and Karlheinz Brandenburg, Applications of Digital Signal Processing to Audio and Acoustics, Kluwer Academic Publishers, New York/Boston/Dordrecht/London/Moscow, 2002.
Good one.
3. L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978.
Perfect choice for this course.
4. Ben Gold and Nelson Morgan, Speech and Audio Signal Processing.
This is a book much needed in the speech and audio community because of its unique perspective on these topics. By their very nature, speech, music and other audio signals are only fully understood if one takes into account their perception, production, and the context within which they exist (language, symphony). To appreciate what to process about such signals, the scientist must have a broad appreciation of linguistics, hearing, vocal tract models, and the brain in general, in addition to the standard engineering tools and approaches. This is why this book is valuable. It indeed attempts to reach out to all these fields with just enough detail to inspire the reader, and to provide links to more detailed existing literature. The book is well written, full of excellent illustrations, and it was the perfect choice for this class.
5. T. F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, Prentice Hall, 2002.
1. Express the speech signal in terms of its time
domain and frequency domain representations
and the different ways in which it can be
modelled
2. Derive expressions for simple features used in
speech classification applications;
3. Explain the operation of example algorithms
covered in lectures, and discuss the effects of
varying parameter values within these;
4. Synthesise block diagrams for speech
applications, explain the purpose of the
various blocks, and describe in detail algorithms
that could be used to implement them;
5. Implement components of speech processing
systems, including speech recognition and
speaker recognition, in MATLAB.
6. Deduce the behaviour of previously unseen
speech processing systems and hypothesise
about their merits.
Study - Learn - Practice
I hear... and I forget.
I see... and I remember.
I do... and I understand.
(Chinese proverb)
Hardware & Software for QA
My collection @ Google Drive:
E-books
Videos: NPTEL (for DSP & solutions); YouTube (for MATLAB, LabVIEW and Scilab)
Websites: JDSP, MathWorks center, DSPGuru, etc.
A Brief History of Audio (Analog)

Electromagnetic microphone (Ernst Siemens 1874)
Telephone (Alexander Graham Bell 1876)
Phonograph (wax cylinders) (Thomas Edison 1877)
Gramophone 78 record (Emile Berliner 1888)
Telegraphone magnetic wire recorder (Valdemar Poulsen 1898)
Telharmonium first electrical synthesizer (Thaddeus Cahill 1900)
AM radio (Reginald Fessenden 1905)
First radio broadcast (Lee DeForest (Met Opera) 1910)
Vacuum tube amplifier (Edwin Armstrong 1912)
Electrostatic microphone (E. Wente, Bell Labs 1916)
Electromagnetic loudspeaker (Chester Rice & Edward Kellogg 1924)
FM radio (Edwin Armstrong 1933; common use began in 1950s)
Wire recorder (consumer heyday 1947-1952)
Magnetic tape (reel-to-reel Ampex 1948; cassette Philips 1962)
33 rpm (LP) record (Columbia Records 1948)
multitrack recording (Ampex 1954)
stereo LP (Westrex 1958) and FM (GE/Zenith 1961, based on Armstrong)
A Brief History of Audio (Digital)

12-bit digital recording on computer tape (Bell Labs, 1957)
13-bit digital recording on computer tape (Illiac II, UIUC, 1964)
12-bit digital recording on computer disk (DEC/Stanford U., 1965)
16-bit 2/4 channel digital recorder (Soundstream, 1976/77)
Sony PCM (16-bit stereo recorded on video tape 1978)
compact disc (CD) (16-bit stereo format) (Philips/Sony 1983)
stereo digital audio tape (DAT) (Philips/Sony 1986)
recordable CD: CD-R (Philips/Sony 1988)
sound compression (MP3 standard 1989)
audio record/playback from home computer (NeXT 1989)
8-track digital audio tape (ADAT) (Alesis 1991)
8-track digital audio tape (DA-88) (Tascam 1992)
MiniDisc MD (Sony 1992)
Introduction to Speech Signal Processing
Speech: the fundamental and effortless mode of communication among humans.
Speech communication: Talker, listener and channel
Speech Production Process: Message formulation, language
coding, neuro-muscular commands, movement of speech
production organs, acoustic pressure variations
Speech Perception Process: acoustic pressure variations,
movement of speech perception organs, neuro-muscular
commands, message comprehension
Applications (Project Areas)

1. Speech Modification
Time-scale manipulations:
Fitting the speech waveform of radio and TV commercials into an allocated time slot, and synchronizing audio and video presentations.
Speeding up speech
Message playback
Voice mail
Reading machines and books for the blind
Slowing down speech
Learning a foreign language
Voice transformations using pitch and spectral changes of the speech signal:
Voice disguise
Entertainment
Speech synthesis
Spectral changes by frequency compression and expansion:
May be useful in transforming speech as an aid to the partially deaf.
Many methods can be applied to music and special effects.
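The time-scale manipulations listed above (speeding up or slowing down speech while keeping its pitch roughly unchanged) can be illustrated with a minimal overlap-add (OLA) sketch. Python/NumPy is used here only for illustration (the course labs use MATLAB), and the function name `ola_time_scale`, the 512-sample Hann frame and the 50% synthesis overlap are hypothetical choices, not values taken from the course. Frames are read from the input every `rate` x `hop` samples but written to the output every `hop` samples, so the output lasts roughly 1/`rate` as long as the input.

```python
import numpy as np


def ola_time_scale(x, rate, frame_len=512):
    """Naive overlap-add (OLA) time-scale modification (illustrative sketch).

    rate > 1 speeds speech up (shorter output); rate < 1 slows it down.
    Each frame keeps its local waveform, so pitch is roughly preserved.
    """
    x = np.asarray(x, dtype=float)
    hop_syn = frame_len // 2                       # output hop: 50% overlap
    hop_ana = max(1, int(round(hop_syn * rate)))   # input hop scaled by rate
    window = np.hanning(frame_len)
    n_frames = max(1, 1 + (len(x) - frame_len) // hop_ana)
    out_len = (n_frames - 1) * hop_syn + frame_len
    y = np.zeros(out_len)
    norm = np.zeros(out_len)                       # running sum of windows
    for m in range(n_frames):
        start = m * hop_ana
        frame = x[start:start + frame_len]
        if len(frame) < frame_len:                 # only for very short input
            frame = np.pad(frame, (0, frame_len - len(frame)))
        out = slice(m * hop_syn, m * hop_syn + frame_len)
        y[out] += window * frame
        norm[out] += window
    norm[norm < 1e-8] = 1.0                        # avoid 0/0 at the window edges
    return y / norm
```

For example, `ola_time_scale(x, 2.0)` returns roughly half as many samples as `x`, and `ola_time_scale(x, 0.5)` roughly twice as many. Plain OLA ignores waveform alignment between successive frames, so it produces audible discontinuities on voiced speech; SOLA/WSOLA-type methods refine it by searching for the best-aligned frame before adding it.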

2. Speech Coding
The goal is to reduce the information rate, measured in bits per second, while maintaining the quality of the original waveform.
Waveform coders:
Represent the speech waveform directly and do not rely on a speech production model.
Operate in the high bit-rate range of 16-64 kbps.
Vocoders:
Are largely speech-model-based and rely on a small set of model parameters.
Operate in the low bit-rate range of 1.2-4.8 kbps.
Lower quality than waveform coders.
Hybrid coders:
Partly waveform-based and partly speech-model-based.
Operate in the 4.8-16 kbps range.
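The bit-rate ranges above can be put in perspective with simple arithmetic. This sketch assumes narrowband telephone speech sampled at 8 kHz with 8 bits per sample (as in 64 kbps log-PCM telephony), which gives the 64 kbps upper end of the waveform-coder range; the helper names `pcm_bit_rate` and `compression_ratio` are hypothetical. A 2.4 kbps vocoder, for instance, uses about 64 / 2.4 ≈ 27 times fewer bits per second than that reference.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate (bits/s) of uncompressed PCM: fs x bits x channels."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(reference_bps, coded_bps):
    """How many times fewer bits/s the coder uses than the reference."""
    return reference_bps / coded_bps


# Assumed reference: 8 kHz sampling, 8 bits/sample, mono
reference = pcm_bit_rate(8000, 8)
print(f"reference PCM: {reference / 1000:.0f} kbps")

# Representative rates taken from the ranges quoted above
for label, rate_bps in [("waveform coder", 16_000),
                        ("hybrid coder", 4_800),
                        ("vocoder", 2_400),
                        ("vocoder", 1_200)]:
    ratio = compression_ratio(reference, rate_bps)
    print(f"{label:15s} {rate_bps / 1000:5.1f} kbps -> {ratio:5.1f}:1")
```

The same helper gives 1411.2 kbps for 16-bit stereo audio sampled at 44.1 kHz (the CD format), a useful reference point for the audio-coding part of the course.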

Applications of speech coders include:
Digital telephony over constrained bandwidth channels
Cellular
Satellite
Voice over IP (Internet)
Video phones
Storage of voice messages for computer voice-mail applications.

3. Speech Enhancement
The goal is to improve the quality of degraded speech.
Preprocessing speech before it is degraded:
Increasing the broadcast range of transmitters constrained by peak-power transmission limits (e.g., AM radio and TV transmissions).
Enhancing the speech waveform after it is degraded:
Reduction of additive noise in
(Digital) telephony
Vehicle and aircraft communications
Reduction of interfering backgrounds and speakers for the hearing impaired
Removal of unwanted convolutional channel distortion and reverberation
Restoration of old phonograph recordings degraded by:
Acoustic horns
Impulse-like scratches from age and wear
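For the additive-noise case, one classic enhancement technique is magnitude spectral subtraction: estimate the average noise magnitude spectrum from a speech-free segment, subtract it from each short-time spectrum of the noisy signal, keep the noisy phase, and resynthesise by overlap-add. The sketch below is a minimal NumPy illustration under stated assumptions (a noise-only segment is available, the noise is roughly stationary, 256-sample frames with 50% overlap); the function name `spectral_subtraction` and the `floor` parameter are hypothetical, and this is only one of many enhancement methods, not one prescribed by the course.

```python
import numpy as np


def spectral_subtraction(noisy, noise_only, frame_len=256, floor=0.02):
    """Magnitude spectral subtraction with overlap-add resynthesis (sketch).

    noisy      : noisy speech samples (at least frame_len long)
    noise_only : speech-free segment (at least frame_len long) used to
                 estimate the noise spectrum
    floor      : minimum kept fraction of each noisy magnitude, limiting the
                 'musical noise' caused by over-subtraction
    """
    noisy = np.asarray(noisy, dtype=float)
    noise_only = np.asarray(noise_only, dtype=float)
    hop = frame_len // 2
    # Periodic Hann window: shifted copies at 50% overlap sum exactly to 1,
    # so overlap-adding the analysis frames rebuilds the signal.
    window = np.hanning(frame_len + 1)[:-1]

    def frames(sig):
        n = 1 + (len(sig) - frame_len) // hop
        return np.stack([sig[i * hop:i * hop + frame_len] * window
                         for i in range(n)])

    # Average noise magnitude spectrum over all noise-only frames
    noise_mag = np.abs(np.fft.rfft(frames(noise_only), axis=1)).mean(axis=0)

    spec = np.fft.rfft(frames(noisy), axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)   # subtract, then floor
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)

    # Overlap-add; the first/last half-frame keep the window ramp, and any
    # tail past the last full frame is left at zero.
    out = np.zeros(len(noisy))
    for i, f in enumerate(clean):
        out[i * hop:i * hop + frame_len] += f
    return out
```

With an all-zero noise estimate the function passes the interior of the signal through unchanged, which is a quick sanity check; with real noise the `floor` value trades residual noise against musical-noise artefacts.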

4. Speaker Recognition
Speech signal processing exploits the variability of speech model parameters across speakers.
Verifying a person's identity (biometrics)
Voice identification in forensic investigation
Understanding the speech-model features that cue a person's identity is also important in speech modification, where model parameters can be transformed to study specific voice characteristics:
Speech modification and speaker recognition can be developed synergistically.
Office
Room No 206 (2nd Floor of ECE)
E-Mail: sugumar@karunya.edu