DSP
by
Andreas Spanias, Ph.D.
Speech Processing
spanias@asu.edu
Topics
5. Algorithm Examples
6. Remarks
[Figure: Formant Structure. Panels show a speech segment as amplitude versus time (0 to 32 ms) and its spectrum as magnitude (dB) versus frequency (0 to 4 kHz), with the formants f1 to f5 marked.]
Speech Analysis/Synthesis
Speech analysis-synthesis: speech is analyzed (represented)
in terms of a compact parametric set, which is then used for
speech synthesis. Speech coding at medium rates and below
is achieved using an analysis-synthesis process.
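The analysis half of this process can be sketched as follows. This is a minimal illustration, assuming the compact parametric set is a set of linear prediction coefficients estimated with the autocorrelation method and the Levinson-Durbin recursion; the function names and the predictor sign convention (s(n) predicted as the sum of a[i] s(n-i)) are assumptions made for the example, not taken from the slides.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag] for the frame x (biased, unnormalised)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for a[1..order].

    Assumed convention: s(n) is predicted as sum_i a[i] * s(n - i).
    Returns (a, err); a[0] is unused and kept at 0.0, err is the final
    prediction-error energy.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m.
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= (1.0 - k * k)
    return a, err


if __name__ == "__main__":
    # Autocorrelation of an ideal first-order process with pole at 0.9.
    coeffs, energy = levinson_durbin([1.0, 0.9, 0.81], order=2)
    print(coeffs[1:], energy)
```

In a coder, the frame would normally be windowed before `autocorrelation` is called, and the resulting coefficients would be quantized and sent to the synthesis side.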
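[Figure: Source-filter model. An excitation X(z), voiced or unvoiced (V/UV) and scaled by a gain, drives the vocal tract filter 1/A(z) to produce the synthetic speech S(z). The product in the frequency domain corresponds to a convolution in the time domain.]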
$$H(z) = \frac{b_0}{1 - \sum_{i=1}^{M} a_i z^{-i}}$$
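The synthesis side of the model can be sketched in a few lines. The example assumes the all-pole filter is written as H(z) = b0 / (1 - sum of a_i z^-i), so that each output sample is the scaled excitation plus a weighted sum of past outputs. The function names, the default pitch period, and the impulse-train/noise choice for the voiced/unvoiced excitation are illustrative assumptions.

```python
import random


def make_excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Impulse train for voiced frames, white noise for unvoiced frames."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0
                for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def synthesize(excitation, a, b0=1.0):
    """Filter the excitation through H(z) = b0 / (1 - sum_i a[i-1] z^-i)."""
    out = []
    for n, x in enumerate(excitation):
        y = b0 * x
        # Recursive part: weighted sum of previously synthesized samples.
        for i, ai in enumerate(a, start=1):
            if n - i >= 0:
                y += ai * out[n - i]
        out.append(y)
    return out


if __name__ == "__main__":
    frame = synthesize(make_excitation(160, voiced=True), a=[0.5], b0=1.0)
    print(frame[:4])
```

If the opposite sign convention is used for the denominator, the only change is the sign of each `ai` term in the loop.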
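[Figure: Analysis-by-synthesis linear prediction. An excitation is selected or formed, scaled by a gain, and passed through the LTP filter A_L(z) and the LP filter A(z) to produce the synthetic speech ŝ(n). The difference between s(n) and ŝ(n) is weighted by W(z) and minimized in the MSE sense.]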
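The weighting block W(z) in the loop can be illustrated with a short sketch. The slides do not give its form here, so the example assumes the commonly used W(z) = A(z) / A(z/γ) with A(z) = 1 - sum of a_i z^-i; the function names and the default γ = 0.9 are assumptions for illustration only.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): each a_i is scaled by gamma**i."""
    return [ai * gamma ** i for i, ai in enumerate(a, start=1)]


def weighting_filter(x, a, gamma=0.9):
    """Apply W(z) = A(z) / A(z/gamma), with A(z) = 1 - sum_i a_i z^-i."""
    ag = bandwidth_expand(a, gamma)
    y = []
    for n in range(len(x)):
        # Numerator A(z): FIR part acting on the input samples.
        v = x[n] - sum(ai * x[n - i]
                       for i, ai in enumerate(a, start=1) if n - i >= 0)
        # Denominator A(z/gamma): recursive part acting on past outputs.
        v += sum(gi * y[n - i]
                 for i, gi in enumerate(ag, start=1) if n - i >= 0)
        y.append(v)
    return y


if __name__ == "__main__":
    print(weighting_filter([1.0, 0.0, 0.0, 0.0], a=[0.5, -0.2], gamma=0.9))
```

Under this assumed form, γ = 1 makes W(z) = 1 (no weighting) and γ = 0 makes W(z) = A(z); intermediate values de-emphasize the error near the formant peaks.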
Network or toll
Toll or Network quality refers to quality comparable
to the classical analog speech (200-3200 Hz)
Communications
Communications quality implies somewhat degraded
speech quality but adequate for cellular communications.
Synthetic
Synthetic speech is usually intelligible but can be
unnatural and associated with a loss of speaker recognizability.
Employ, for the most part, full searches of the codebooks and LTPs
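[Figure: CELP codebook search. Each codebook vector x(n), addressed by the VQ index, is scaled by the gain g and filtered to give the weighted synthetic speech ŝ(n); the weighted error e(n) is passed to the error minimization block, which selects the VQ index.]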
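$$\mathbf{e}_c(k) = \mathbf{s}_w - \hat{\mathbf{s}}_w^{0} - g_k\,\hat{\mathbf{s}}_w(k)$$

$$g_k = \frac{\bar{\mathbf{s}}_w^{T}\,\hat{\mathbf{s}}_w(k)}{\hat{\mathbf{s}}_w^{T}(k)\,\hat{\mathbf{s}}_w(k)}, \qquad \bar{\mathbf{s}}_w = \mathbf{s}_w - \hat{\mathbf{s}}_w^{0}$$

$$\epsilon_c(k) = \bar{\mathbf{s}}_w^{T}\,\bar{\mathbf{s}}_w - \frac{\left(\bar{\mathbf{s}}_w^{T}\,\hat{\mathbf{s}}_w(k)\right)^{2}}{\hat{\mathbf{s}}_w^{T}(k)\,\hat{\mathbf{s}}_w(k)}$$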
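The gain and error expressions above translate directly into an exhaustive codebook search. The sketch below assumes the caller has already removed the zero-input response from the weighted speech (giving the target) and has already passed every codevector through the weighted synthesis filter; the function and argument names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def search_codebook(target, filtered_codebook):
    """Return (best_index, best_gain, best_error) or None.

    target            -- weighted speech minus the zero-input response
    filtered_codebook -- candidate vectors after the weighted synthesis
                         filter, one per codebook index k
    """
    energy_t = dot(target, target)
    best = None
    for k, cand in enumerate(filtered_codebook):
        energy_c = dot(cand, cand)
        if energy_c == 0.0:
            continue  # an all-zero candidate cannot reduce the error
        corr = dot(target, cand)
        gain = corr / energy_c                    # optimal gain for index k
        err = energy_t - corr * corr / energy_c   # error energy at that gain
        if best is None or err < best[2]:
            best = (k, gain, err)
    return best


if __name__ == "__main__":
    book = [[1.0, 0.0, 0.0], [2.0, 4.0, 6.0], [0.0, 1.0, 0.0]]
    print(search_codebook([1.0, 2.0, 3.0], book))
```

Because the target energy is the same for every index, minimizing the error is equivalent to maximizing the ratio of squared correlation to candidate energy, which is how practical searches are usually organized.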
M.R. Schroeder and B. Atal, "Code-Excited Linear Prediction (CELP): High Quality Speech at
Very Low Bit Rates," Proc. ICASSP-85, p. 937, Tampa, Apr. 1985.
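[Figure: VSELP decoder. The long-term filter state (gain ga) and the outputs of Codebook 1 (VQ-1 index, gain g1) and Codebook 2 (VQ-2 index, gain g2) are summed, passed through the A(z) filter block, and postfiltered to give the output speech.]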
- developed by Motorola - part of IS-54 and GSM cellular standards
- speech sampled at 8 kHz - segmented in 20ms frames - sub-frames of 5 ms
- complexity estimated at 30 MIPS - MOS 3.45
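The framing figures above imply a fixed sample and bit budget per frame, which the following arithmetic sketch makes explicit. The 8 kbits/s value is the nominal rate from the title of the VSELP paper cited in these slides; the function and key names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000        # speech sampled at 8 kHz
FRAME_MS = 20                # frame length
SUBFRAME_MS = 5              # sub-frame length
NOMINAL_BIT_RATE_BPS = 8000  # nominal 8 kbits/s (assumed from the cited paper title)


def frame_layout(sample_rate_hz, frame_ms, subframe_ms, bit_rate_bps):
    """Derive per-frame sample and bit budgets from the framing parameters."""
    return {
        "samples_per_frame": sample_rate_hz * frame_ms // 1000,
        "samples_per_subframe": sample_rate_hz * subframe_ms // 1000,
        "subframes_per_frame": frame_ms // subframe_ms,
        "frames_per_second": 1000 // frame_ms,
        "bits_per_frame": bit_rate_bps * frame_ms // 1000,
    }


if __name__ == "__main__":
    layout = frame_layout(SAMPLE_RATE_HZ, FRAME_MS, SUBFRAME_MS,
                          NOMINAL_BIT_RATE_BPS)
    for name, value in layout.items():
        print(name, value)
```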
I. Gerson and M. Jasiuk, "Vector Sum Excited Linear Prediction (VSELP) Speech Coding at 8 kbits/s,"
Proc. ICASSP-90, pp. 461-464, New Mexico, Apr. 1990.
A. Spanias, M. Deisher, P. Loizou and G. Lim, "Fixed-Point Implementation of the VSELP Algorithm,"
ASU-TRC Technical Report, TRC-SP-ASP-9201, July 1992.
Employ, for the most part, partial searches of the LTPs; usually an open-loop
estimate is refined by a closed-loop search in the neighborhood of the estimate
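A two-stage lag search of this kind can be sketched as follows. This is a simplified illustration: the open-loop stage maximizes a normalized correlation of the signal with its own past, and the closed-loop stage tests only a few lags around that estimate by matching past excitation to the target with an optimal gain. A real coder would filter each candidate through the weighted synthesis filter first; that step, the function names, and the ±3 search window are all assumptions or omissions made for brevity.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def open_loop_lag(signal, min_lag, max_lag):
    """Coarse lag: maximize correlation with the past, normalized by energy."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        cur, past = signal[lag:], signal[:-lag]
        energy = dot(past, past)
        if energy == 0.0:
            continue
        score = dot(cur, past) / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def closed_loop_lag(history, target, coarse_lag, delta=3):
    """Refine the lag over coarse_lag +/- delta; returns (lag, gain, error).

    Only lags at least as long as the target are tried, so every candidate
    lies entirely inside the past excitation `history`.
    """
    n = len(target)
    energy_t = dot(target, target)
    best = None
    for lag in range(max(n, coarse_lag - delta), coarse_lag + delta + 1):
        start = len(history) - lag
        if start < 0:
            break
        cand = history[start:start + n]
        energy_c = dot(cand, cand)
        if energy_c == 0.0:
            continue
        corr = dot(target, cand)
        err = energy_t - corr * corr / energy_c
        if best is None or err < best[2]:
            best = (lag, corr / energy_c, err)
    return best


if __name__ == "__main__":
    sig = [((n % 50) - 25) / 25.0 for n in range(400)]
    coarse = open_loop_lag(sig, 20, 120)
    print(coarse, closed_loop_lag(sig[:300], sig[300:340], coarse))
```

The saving comes from the second stage evaluating 2·delta + 1 lags instead of the full lag range.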
GSM: Enhanced Full Rate Speech Transcoder, ETSI GSM 6.60, Nov. 1996
- ETSI TS 126 090 V.3.1.0 2000-01 - AMR SPEECH CODEC TRANSCODING FUNCTIONS 3G-TS 26.090 Technical Specification
- E. Ekudden, R. Hagen, I. Johansson, and J. Svedberg, "The Adaptive Multi-Rate Speech Coder," Proc. IEEE Workshop on
Speech Coding, pp. 117-119, 1999
• Algorithm to provide higher quality, flexibility, and capacity over existing IS-96C,
IS-127 EVRC, and IS-733 (which replaced IS-96C but works at a higher average rate)
• The Conexant SMV algorithm became the core technology for 3G CDMA (core SMV
algorithm to be refined in the interim by participating companies according to the
publication below)
• Based on 4 codecs: full rate at 8.5 kbps, half rate at 4 kbps, quarter rate at 2 kbps, and
eighth rate at 800 bps
• Pre-processing includes noise suppression similar to IS 127 EVRC
• Full rate and half rate based on Conexant's eXtended CELP (eX-CELP), a core
technology also used in Conexant's submission for the ITU 4 kbps (G.4k) standard
• Performed better than IS-733 and IS-127 in tests with and without background noise
• Scored as high as 4.1 MOS at full rate with clean speech. Performed very well with
background noise
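Because SMV switches among four rates, the quantity of interest for capacity is the average rate over a call. The sketch below computes it from the four rates listed above; the usage fractions in the example are hypothetical and are not measurements or values from the standard.

```python
# Rates from the list above, in kbps.
SMV_RATES_KBPS = {"full": 8.5, "half": 4.0, "quarter": 2.0, "eighth": 0.8}


def average_rate_kbps(mix):
    """Average bit rate for a mix of rates.

    mix maps a rate name to the fraction of frames coded at that rate;
    the fractions must sum to 1.
    """
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("rate fractions must sum to 1")
    return sum(SMV_RATES_KBPS[name] * frac for name, frac in mix.items())


if __name__ == "__main__":
    # Hypothetical mix: active speech mostly at full/half rate,
    # pauses at eighth rate.
    example_mix = {"full": 0.3, "half": 0.2, "quarter": 0.1, "eighth": 0.4}
    print(average_rate_kbps(example_mix))
```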
REFERENCES:
[1] Y. Gao, E. Shlomot, A. Benyassine, J. Thyssen, H. Su, and C. Murgia, “The SMV algorithm selected by TIA and 3GPP2 for CDMA applications,”
conference paper by Conexant Systems (portions published at ICASSP-2001)
STANDARDS AT A GLANCE
• ITU Telephony
– G.711 PCM (64 kbps) late 60’s
– G.726 ADPCM (32/40/24/16 kbps) 1988
– G.728 LD-CELP coding (16 kbps) 1992
– G.723.1 True Speech (5.3/6.3 kbps) 1995
– G.729 CS-ACELP (8/11.8/6.4 kbps) 1996 and Annexes in 1998
– G.4kbps Toll quality at 4 kbps (ongoing)
• Non-ITU
– MPEG1/Audio (includes MP3), 1991
– MPEG2/Audio: 64 kbps (1992)
– MPEG4/Audio: audio/speech coding at bit rates between 64 and 2 kbps (1998)
– MPEG7/Audio: audio/speech/MIDI coding (ongoing)
• ETSI (GSM):
– 13 kbps RPE-LTP (Full rate GSM, 1988)
– 5.6 kbps VSELP (Half-rate GSM, 1993)
– 12.2 kbps EFR (Enhanced full-rate GSM, 1996)
– 12.2 - 4.75 kbps AMR (Adaptive Multi Rate, 1999)
• ARIB Japan
– Full-rate PDC (Personal Digital Communication) 6.7 kbps VSELP
– Half-rate PDC 3.45 kbps Multimode CELP
[Figure: Vocoder/Waveform/Hybrid. MOS (1 to 5) versus bit rate (1, 2, 4, 8, 16, 32, 64 kbps) for vocoders (LPC10e, MELP), hybrid coders (CELP, SMV), and waveform coders (ADPCM, PCM).]