Audio Coding: Ketan Mayer-Patel

Audio Coding
Ketan Mayer-Patel
CS 294-9 :: Fall 2003

Overview of Today
PCM
Linear Sampling Techniques
μ-law
DPCM
Generic Coding Techniques
ADPCM
MPEG-1 Psychoacoustic Coding
Vocoding: Speech-Specific Techniques

Audio Signals
Analog audio is basically voltage as a continuous function
of time.
Unlike video, which is 3D (two spatial dimensions plus time), audio is a 1D signal,
so it can be captured without having to discretize any higher dimensions.
Audio sampling basically boils down to quantizing signal
level to a set of values.
Digital audio parameters:
bits per sample
sampling rate
number of channels.

Sampling
Pulse Amplitude Modulation (PAM)

Each sample's amplitude is represented by one analog value
Sampling theory (Nyquist)
If input signal has maximum frequency (bandwidth) f,
sampling frequency must be at least 2f
With a low-pass filter to interpolate between samples, the
input signal can be fully reconstructed
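A minimal sketch of what the Nyquist condition above implies. The function names are hypothetical, and the folding formula assumes an ideal sampler with no anti-aliasing filter in front of it.

```python
def min_sampling_rate(max_freq_hz):
    """Lowest sampling rate that satisfies the 2f rule from the slide."""
    return 2 * max_freq_hz


def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after sampling.

    Tones below sample_rate/2 are unchanged; tones above it fold back
    into the range 0 .. sample_rate/2.
    """
    nearest_multiple = round(freq_hz / sample_rate_hz)
    return abs(freq_hz - nearest_multiple * sample_rate_hz)


if __name__ == "__main__":
    print(min_sampling_rate(20_000))        # 40000
    print(alias_frequency(1_000, 44_100))   # 1000 (below Nyquist, unchanged)
    print(alias_frequency(30_000, 44_100))  # 14100 (folded back)
```

A 30 kHz tone sampled at 44.1 kHz is indistinguishable from a 14.1 kHz tone, which is why the input must be band-limited before sampling.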

PCM
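[Figure: sampled waveform drawn against quantization levels labelled 0100 down to 0000 and 1001 to 1100; the gap between a sample and its nearest level is the quantization error (noise)]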
Pulse Code Modulation (PCM)

Each sample's amplitude is represented by an integer code-word
Each bit of resolution adds 6 dB of dynamic range
Number of bits required depends on the amount of noise that is tolerated
$n = \frac{\mathrm{SNR} - 4.77}{6.02}$ (SNR in dB)
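A small sketch of the bits-versus-noise relationship above. It assumes the slide's formula reads n = (SNR - 4.77) / 6.02, with the minus sign reconstructed from the usual form SNR = 6.02 n + 4.77 dB; the function names are hypothetical.

```python
import math


def pcm_snr_db(bits):
    """Signal-to-quantization-noise ratio in dB for n-bit PCM (slide's constants)."""
    return 6.02 * bits + 4.77


def bits_for_snr(snr_db):
    """Smallest whole number of bits that reaches the requested SNR."""
    return math.ceil((snr_db - 4.77) / 6.02)


if __name__ == "__main__":
    for target in (50, 72, 96):
        print(target, "dB needs", bits_for_snr(target), "bits")
```

Each extra bit buys about 6 dB, so halving the word length from 16 to 8 bits costs roughly 48 dB of dynamic range.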
Linear PCM
Uses evenly spaced quantization levels.
Typically 16 bits per sample.
Provides a large dynamic range.
Difficult for humans to perceive
quantization noise.
Compact Discs
16-bit linear sampling
44.1 kHz sampling rate
2 channels
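The three digital audio parameters multiply into a raw bit rate. A quick sketch using the Compact Disc figures above (the function name is hypothetical):

```python
def pcm_bit_rate(bits_per_sample, sample_rate_hz, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return bits_per_sample * sample_rate_hz * channels


if __name__ == "__main__":
    cd_rate = pcm_bit_rate(16, 44_100, 2)
    print(cd_rate)              # 1411200 bits/s
    print(cd_rate / 1_000_000)  # about 1.41 Mb/s
```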
Non-linear Sampling
If we try to use 8 bits per sample, dynamic
range is reduced significantly and
quantization noise can be heard.
In particular, we end up with not enough
levels for the lower amplitudes.
Solution is to sample more densely in the
lower amplitudes and less densely for the
higher amplitudes.
Sort of like a log scale.
Non-linear Sampling Illustrated
[Figure: companding curve, output level plotted against input level]

μ-law and A-law
Non-linear sampling is called companding.
8 bits companded provide dynamic range
equivalent to 12 bits.
μ-law and A-law are companding standards
defined in G.711.
The difference is in the exact shape of the piece-wise
linear companding function.

μ-law companding
Provides 14-bit quality (dynamic range) with an 8-bit encoding
Used in North American & Japanese ISDN voice service
Simple to compute encoding:
$f(x) = 127 \cdot \operatorname{sign}(x) \cdot \frac{\ln(1 + \mu |x|)}{\ln(1 + \mu)}$ (x normalized to [-1, 1])
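A minimal sketch of the continuous companding formula above. The slide does not give a value for μ; μ = 255 is assumed here because it is the value commonly used with 8-bit μ-law. The function names and the inverse (expansion) are illustrative additions.

```python
import math

MU = 255  # assumption: the slide leaves mu unspecified


def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to an integer in [-127, 127] using the slide's formula."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("x must be normalized to [-1, 1]")
    sign = math.copysign(1, x)
    return round(127 * sign * math.log1p(mu * abs(x)) / math.log1p(mu))


def mu_law_expand(code, mu=MU):
    """Approximate inverse: map a code in [-127, 127] back to [-1, 1]."""
    sign = math.copysign(1, code)
    return sign * ((1 + mu) ** (abs(code) / 127) - 1) / mu


if __name__ == "__main__":
    for x in (0.0, 0.01, 0.1, 0.5, 1.0):
        code = mu_law_compress(x)
        print(x, code, round(mu_law_expand(code), 4))
```

An input at 1% of full scale already lands on code 29 of 127, which is the "denser levels at low amplitudes" behaviour described earlier.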
μ-law Encoding
[Diagram: Sender: high-resolution PCM encoding (12, 14, 16 bits) -> table lookup -> 8-bit μ-law encoding. Receiver: inverse table lookup -> 14-bit decoding.]
Input Amplitude   Step Size   Segment   Quantization   Code Value
0-1               1           000       0000           0
1-3               2           000       0001           1
...               2           000       ...            ...
29-31             2           000       1111           15
31-35             4           001       0000           16
...               4           001       ...            ...
91-95             4           001       1111           31
95-103            8           010       0000           32
...               8           010       ...            ...
215-223           8           010       1111           47
223-239           16          011       0000           48
...               16          011       ...            ...
463-479           16          011       1111           63
μ-law Decoding
[Diagram: Sender: high-resolution PCM encoding (12, 14, 16 bits) -> table lookup -> 8-bit μ-law encoding. Receiver: inverse table lookup -> 14-bit decoding.]
μ-law Encoding   Multiplier   Decode Amplitude
0000000          1            0
0000001          2            2
...              2            ...
0001111          2            30
0010000          4            33
...              4            ...
0011111          4            93
0100000          8            99
...              8            ...
0101111          8            219
0110000          16           231
...              16           ...
0111111          16           471
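The two lookup tables can be reproduced arithmetically. The sketch below covers only the four segments shown on the slides (magnitudes 0 to 478). The +33 bias and the half-open interval convention (lower edge included, upper edge excluded) are inferred from the table rows, not stated on the slides, and the function names are hypothetical.

```python
def encode_magnitude(x):
    """Map a magnitude in 0..478 to the 7-bit code value (segment * 16 + step)."""
    if not 0 <= x < 479:
        raise ValueError("only the four segments shown on the slide are covered")
    segment = 0
    # Segment s ends just below 64 * 2**s - 33 (31, 95, 223, 479).
    while x >= 64 * 2 ** segment - 33:
        segment += 1
    step_size = 2 * 2 ** segment           # 2, 4, 8, 16
    quantization = (x + 33 - 32 * 2 ** segment) // step_size
    return segment * 16 + quantization


def decode_magnitude(code):
    """Map a code value in 0..63 back to the decode amplitude (interval midpoint)."""
    segment, quantization = divmod(code, 16)
    return (2 * quantization + 33) * 2 ** segment - 33


if __name__ == "__main__":
    for x in (0, 2, 30, 31, 100, 230, 478):
        code = encode_magnitude(x)
        print(x, format(code, "07b"), decode_magnitude(code))
```

A real codec would use precomputed tables, as the sender/receiver diagram suggests; the arithmetic form just makes the doubling of the step size per segment explicit.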
Difference Encoding
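[Figure: sampled waveform against quantization levels 0100 down to 0000 and 1001 to 1100, highlighting the small differences between successive samples]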
Differential-PCM (DPCM)
Exploit temporal redundancy in samples
Difference between 2 x-bit samples can be represented
with significantly fewer than x-bits
Transmit the difference (rather than the sample)
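A minimal DPCM sketch of the idea above, assuming the predictor starts at zero so the first value carries the full sample. It is lossless here because the differences are not yet limited to a fixed number of bits; the function names are hypothetical.

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the previous sample."""
    previous = 0
    differences = []
    for sample in samples:
        differences.append(sample - previous)
        previous = sample
    return differences


def dpcm_decode(differences):
    """Rebuild the samples by accumulating the differences."""
    previous = 0
    samples = []
    for difference in differences:
        previous += difference
        samples.append(previous)
    return samples


if __name__ == "__main__":
    pcm = [100, 102, 105, 104, 101]
    diffs = dpcm_encode(pcm)
    print(diffs)               # [100, 2, 3, -1, -3]
    print(dpcm_decode(diffs))  # [100, 102, 105, 104, 101]
```

After the first value, every difference fits in 3 bits even though the samples themselves need 7.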
Slope Overload Problem
[Figure: a steeply rising waveform against quantization levels 0100 down to 0000 and 1001 to 1100; the region where the coded signal cannot keep up is marked "Slope Overload"]
Differences in high-frequency signals near the Nyquist frequency cannot be represented with a
smaller number of bits!
The error introduced leads to severe distortion in the higher
frequencies
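Slope overload can be seen by limiting each DPCM difference to a fixed number of bits. This is an illustrative sketch, not a standard codec: the clamping range is assumed to be two's complement, and the decoder's reconstruction is tracked inside the encoder so errors do not accumulate silently.

```python
def dpcm_track(samples, difference_bits):
    """Return the signal a decoder would rebuild from clamped differences."""
    low = -(1 << (difference_bits - 1))
    high = (1 << (difference_bits - 1)) - 1
    reconstructed = 0
    output = []
    for sample in samples:
        difference = sample - reconstructed
        clamped = max(low, min(high, difference))  # overload happens here
        reconstructed += clamped
        output.append(reconstructed)
    return output


if __name__ == "__main__":
    signal = [0, 2, 4, 40, 40, 40]
    print(dpcm_track(signal, 4))  # [0, 2, 4, 11, 18, 25]
```

With 4-bit differences (range -8 to 7) the reconstruction climbs by at most 7 per sample, so it lags far behind the jump to 40.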
Adaptive DPCM (ADPCM)
Use a larger step-size to encode differences between high-frequency samples
& a smaller step-size for differences between low-frequency samples
Use previous sample values to estimate changes in
the signal in the near future
ADPCM
To ensure differences are always small...
Adaptively change the step-size (quanta)
(Adaptively) attempt to predict next sample
value
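[Block diagram: the predicted PCM sample is subtracted from y-bit PCM sample n+1; the difference passes through the quantizer and leaves as an x-bit ADPCM difference. A step-size adjuster controls the quantizer, and a dequantizer feeding the predictor closes the loop.]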

IMA's proposed ADPCM
[Block diagram: PCM sample n-1, held in a register, is subtracted from the 16-bit PCM sample; the difference passes through the quantizer and leaves as a 4-bit ADPCM difference. A step-size adjuster controls the quantizer, and a dequantizer feeding the register closes the loop.]
Predictor is not adaptive and simply uses the last sample value
Quantization step-size increases logarithmically
with signal frequency
IMA Difference Quantization
[Block diagram: IMA ADPCM encoder; the quantizer expresses the difference between the 16-bit PCM sample and PCM sample n-1 as a 4-bit ADPCM difference, in step-size units]
Quantizer difference                                  Quantization Output   Step-Size Multiple
difference < 1/4 step_size                            000                   0.0
1/4 step_size ≤ difference < 1/2 step_size            001                   0.25
1/2 step_size ≤ difference < 3/4 step_size            010                   0.5
3/4 step_size ≤ difference < step_size                011                   0.75
step_size ≤ difference < 5/4 step_size                100                   1.0
5/4 step_size ≤ difference < 3/2 step_size            101                   1.25
3/2 step_size ≤ difference < 7/4 step_size            110                   1.5
7/4 step_size ≤ difference                            111                   1.75
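The quantization table amounts to counting how many quarter steps fit into the difference. A sketch under the assumption that the sign is carried in a separate fourth bit and that the lower edge of each range is inclusive; the function names are hypothetical, and deployed IMA decoders may differ in rounding details.

```python
def quantize_difference(difference, step_size):
    """Return (sign_bit, 3-bit code, step-size multiple) for one difference."""
    sign_bit = 1 if difference < 0 else 0
    code = min(7, (4 * abs(difference)) // step_size)  # quarter steps, capped at 111
    return sign_bit, code, code / 4


def dequantize(sign_bit, code, step_size):
    """Rebuild the (approximate) difference from the code."""
    delta = code * step_size / 4
    return -delta if sign_bit else delta


if __name__ == "__main__":
    print(quantize_difference(5, 7))     # (0, 2, 0.5)   -> output 010
    print(quantize_difference(13, 7))    # (0, 7, 1.75)  -> output 111
    print(quantize_difference(-49, 55))  # (1, 3, 0.75)  -> output 011, negative
    print(dequantize(1, 3, 55))          # -41.25
```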
IMA Step-size Table
Index Size   Index Size   Index Size   Index Size   Index Size   (Size = step size)
0 7 18 41 36 230 54 1282 72 7132
1 8 19 45 37 253 55 1411 73 7845
2 9 20 50 38 279 56 1552 74 8630
3 10 21 55 39 307 57 1707 75 9493
4 11 22 60 40 337 58 1878 76 10442
5 12 23 66 41 371 59 2066 77 11487
6 13 24 73 42 408 60 2272 78 12635
7 14 25 80 43 449 61 2499 79 13899
8 16 26 88 44 494 62 2749 80 15289
9 17 27 97 45 544 63 3024 81 16818
10 19 28 107 46 598 64 3327 82 18500
11 21 29 118 47 658 65 3660 83 20350
12 23 30 130 48 724 66 4026 84 22385
13 25 31 143 49 796 67 4428 85 24623
14 28 32 157 50 876 68 4871 86 27086
15 31 33 173 51 963 69 5358 87 29794
16 34 34 190 52 1060 70 5894 88 32767
17 37 35 209 53 1166 71 6484
Adaptive Step-size Selection
[Block diagram: the quantizer output drives a step-size table index adjustment lookup; the adjustment is added to the previous index (held in a register), range-limited to 0 to 88, and used as the index into the step-size table lookup that yields the new step size]
Quantizer difference                                  Quantization Output   Table Index Adjustment   Step-Size Adjustment
difference < 1/4 step_size                            000                   -1                       x 0.91
1/4 step_size ≤ difference < 1/2 step_size            001                   -1                       x 0.91
1/2 step_size ≤ difference < 3/4 step_size            010                   -1                       x 0.91
3/4 step_size ≤ difference < step_size                011                   -1                       x 0.91
step_size ≤ difference < 5/4 step_size                100                   2                        x 1.21
5/4 step_size ≤ difference < 3/2 step_size            101                   4                        x 1.46
3/2 step_size ≤ difference < 7/4 step_size            110                   6                        x 1.77
7/4 step_size ≤ difference                            111                   8                        x 2.14
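A sketch of the index update above. The adjustment values and the 0 to 88 range limit are taken from the slide; the ratio of 1.1 between neighbouring step sizes is an inference that happens to reproduce the "Step-Size Adjustment" column, not something the slide states. Function names are hypothetical.

```python
INDEX_ADJUSTMENT = (-1, -1, -1, -1, 2, 4, 6, 8)  # indexed by 3-bit quantizer output


def next_index(index, code):
    """Apply the table index adjustment and range-limit the result to 0..88."""
    return max(0, min(88, index + INDEX_ADJUSTMENT[code & 7]))


def approximate_step_factor(code, ratio=1.1):
    """Approximate step-size change, assuming each table entry grows by `ratio`."""
    return ratio ** INDEX_ADJUSTMENT[code & 7]


if __name__ == "__main__":
    for code in range(8):
        print(format(code, "03b"), INDEX_ADJUSTMENT[code],
              round(approximate_step_factor(code), 2))
```

Small differences shrink the step slowly (one index at a time), while large differences grow it quickly (up to eight indices), so the coder reacts fast to onsets and settles gradually.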
IMA ADPCM Example
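[Block diagram repeated beside the table: Xn minus Xn-1 (from the register) -> difference -> quantizer, with step-size adjuster and dequantizer]
X = input sample, Diff = X minus previous Decode, Step = current step size, Q = quantizer output, Adj = index adjustment, I = new table index, M = step-size multiple
X     Diff   Step   Q     Adj   I    M      M x Step   Decode
150          7                  0                      150
155   5      7      010   -1    0    0.5    3.5        154
167   13     7      111   8     8    1.75   12         166
170   4      16     001   -1    7    0.25   4          170
250   80     14     111   8     15   1.75   24.5       195
250   55     31     111   8     23   1.75   54         249
250   1      66     000   -1    22   0.0    0          249
250   1      60     000   -1    21   0.0    0          249
200   -49    55     011   -1    20   0.75   -41        208
200
200
200
200
200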
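The worked example can be traced in code. This sketch follows the slide's simplified scheme (decode = previous decode + multiple x step size, rounded half up) rather than any production IMA implementation. It assumes the first sample is sent verbatim with index 0, as in the first table row. `ima_trace` is a hypothetical name, and the table entry at index 84 uses 22385, the value in the published IMA table.

```python
import math

STEP_SIZES = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]
INDEX_ADJUSTMENT = (-1, -1, -1, -1, 2, 4, 6, 8)


def ima_trace(samples):
    """Return (3-bit codes, decoded samples) for the slide's simplified IMA scheme."""
    decoded_value, index = samples[0], 0
    codes, decoded = [], [decoded_value]
    for x in samples[1:]:
        step = STEP_SIZES[index]
        difference = x - decoded_value
        code = min(7, 4 * abs(difference) // step)      # quarter steps, capped
        delta = code * step / 4
        if difference < 0:
            delta = -delta
        decoded_value = math.floor(decoded_value + delta + 0.5)  # round half up
        index = max(0, min(88, index + INDEX_ADJUSTMENT[code]))
        codes.append(code)
        decoded.append(decoded_value)
    return codes, decoded


if __name__ == "__main__":
    codes, decoded = ima_trace([150, 155, 167, 170, 250, 250, 250, 250, 200])
    print([format(c, "03b") for c in codes])
    print(decoded)
```

Running it on the first nine inputs gives the Q and Decode columns of the table; note how the jump from 170 to 250 takes two samples to track while the step size ramps up.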
Networking Considerations
The IMA codec is reasonably robust to errors
An interval with a low-level signal will correct any step-size error
[Decoder diagram: ADPCM difference -> dequantizer -> adder with register (PCM sample n-1) -> PCM sample; the step-size adjuster drives the dequantizer]
Quantizer difference                                  Quantization Output   Table Index Adjustment
difference < 1/4 step_size                            000                   -1
1/4 step_size ≤ difference < 1/2 step_size            001                   -1
1/2 step_size ≤ difference < 3/4 step_size            010                   -1
3/4 step_size ≤ difference < step_size                011                   -1
step_size ≤ difference < 5/4 step_size                100                   2
5/4 step_size ≤ difference < 3/2 step_size            101                   4
3/2 step_size ≤ difference < 7/4 step_size            110                   6
7/4 step_size ≤ difference                            111                   8
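Why a quiet interval repairs a step-size error: every small code moves the index down by one, and the index is range-limited at 0, so a sender and a receiver whose indices have drifted apart meet at the floor. The sketch below models only the index, with hypothetical starting values chosen for illustration.

```python
INDEX_ADJUSTMENT = (-1, -1, -1, -1, 2, 4, 6, 8)


def run_index(start_index, codes):
    """Apply a sequence of 3-bit quantizer outputs to a step-size table index."""
    index = start_index
    for code in codes:
        index = max(0, min(88, index + INDEX_ADJUSTMENT[code & 7]))
    return index


if __name__ == "__main__":
    quiet_interval = [0] * 40              # 40 low-level samples, all coded 000
    sender_index = run_index(10, quiet_interval)
    receiver_index = run_index(40, quiet_interval)  # receiver was thrown off by an error
    print(sender_index, receiver_index)    # 0 0
```

The same clamping at 88 resynchronizes the indices during a sustained loud passage, though the decoded sample value itself can stay offset until the error decays.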
Psychoacoustic Properties
[Figure: threshold of audibility. Sound level (dB, 0 to 100) versus frequency (0.02 to 20 kHz); the region above the curve is audible, the region below it inaudible.]
Human perception of sound is a function of frequency and signal strength
(MPEG exploits this relationship.)
Auditory Masking
[Figure: sound level (dB, 0 to 100) versus frequency (0.02 to 20 kHz) with the audible and inaudible regions; a masking tone raises the threshold around its frequency so that a nearby, quieter masked tone falls in the inaudible region]
The presence of tones at certain frequencies makes us unable to perceive tones at other nearby
frequencies
Humans cannot distinguish between tones within 100
Hz at low frequencies and 4 kHz at high frequencies
MPEG Encoder Block Diagram
[Block diagram: PCM audio samples (32, 44.1, 48 kHz) -> Mapping -> Quantizer -> Coding -> Frame Packing -> Encoded Bitstream. The Psychoacoustic Model is a side branch, and Ancillary Data enters at Frame Packing.]

Subband Filter
Transforms signal from time domain to
frequency domain.
32 PCM samples yield 32 subband samples.
Each subband corresponds to a frequency band; the bands are evenly
spaced from 0 to the Nyquist frequency.
Filter actually works on a window of 512
samples that is shifted over 32 samples at a time.
Subband coefficients are analyzed with
psychoacoustic model, quantized, and coded.
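Because the 32 subbands split the range from 0 to the Nyquist frequency evenly, their width follows directly from the sampling rate. A small sketch (the function name is hypothetical):

```python
def subband_edges(sample_rate_hz, num_subbands=32):
    """Return (low, high) frequency edges in Hz for each equal-width subband."""
    width = (sample_rate_hz / 2) / num_subbands
    return [(i * width, (i + 1) * width) for i in range(num_subbands)]


if __name__ == "__main__":
    for rate in (32_000, 44_100, 48_000):
        low, high = subband_edges(rate)[0]
        print(rate, "Hz sampling ->", high - low, "Hz per subband")
```

At 48 kHz each subband is 750 Hz wide, which is much coarser than the ear's resolution at low frequencies (about 100 Hz, per the masking slide).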

Layer 1
384 samples per frame.
Iterative bit allocation process:
For each subband, determine the mask-to-noise ratio (MNR).
Increase number of quantization bits for
subband with smallest MNR.
Iterate until all bits used.
Fixed allocation of bits among subbands for
a particular frame.
Up to 448 kb/s
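The iterative bit allocation can be sketched as a greedy loop. This is a simplification under stated assumptions, not the standard's procedure: MNR is modelled as SNR minus the signal-to-mask ratio (SMR) supplied by the psychoacoustic model, SNR is approximated as 6.02 dB per bit, the budget is counted in bits per subband sample, and scale-factor overhead is ignored. The SMR values in the usage example are made up.

```python
def allocate_bits(smr_db, bit_budget, max_bits=15):
    """Greedily give one more bit to the subband with the smallest MNR."""
    bits = [0] * len(smr_db)

    def mnr(band):
        # Assumed model: MNR = SNR - SMR, with SNR about 6.02 dB per bit.
        return 6.02 * bits[band] - smr_db[band]

    while bit_budget > 0:
        candidates = [b for b in range(len(bits)) if bits[b] < max_bits]
        if not candidates:
            break
        worst = min(candidates, key=mnr)
        bits[worst] += 1
        bit_budget -= 1
    return bits


if __name__ == "__main__":
    # Hypothetical SMRs: band 0 is far above its mask, band 2 is already masked.
    print(allocate_bits([30, 10, -5], bit_budget=6))  # [5, 1, 0]
```

The masked subband receives no bits at all, which is where the compression comes from.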
Layer 2
1152 samples per frame.
Iterative bit allocation.
Subband allocation is dynamic.
Up to 384 kb/s

Layer 3
1152 samples per frame.
Up to 320 kb/s
Each subband further analyzed using MDCT
to create 576 frequency lines.
4 different windowing schemes depending on
whether samples contain attack of new
frequencies.
Lots of bit allocation options for quantizing
frequency coefficients.
Quantized coefficients Huffman coded.
Vocoding
Concept: Develop a mathematical
model of the vocal cords & throat
Derive/compute model parameters for
a short interval and transmit to the
decoder
Use the parameters to synthesize
speech at the decoder
So what is a good model?

A buzzer in a tube!
The buzzer is characterized by its
intensity & pitch
The tube is characterized by its
formants
Vocoding - Basic Concepts
[Figure: spectrum of a speech signal, amplitude (15 to 75) versus frequency (kHz)]
Formants are the frequency maxima & minima in the spectrum of the speech signal
Vocoders group and code portions of the
signal by amplitude
Buzzer and Tube Model
Vocoding principles:
voice = formants + buzz pitch & intensity
voice − estimated formants = residue
Linear Predictive Coding (LPC)
A sample is represented as a linear combination of p
previous samples:
$y(n) = \sum_{k=1}^{p} a_k \, y(n-k) + G \cdot x(n)$
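The LPC equation is a recursive filter: the excitation x(n) plays the buzzer and the coefficients a_k play the tube. A minimal synthesis sketch, assuming samples before n = 0 are zero; the function name and the coefficient values in the usage example are illustrative only.

```python
def lpc_synthesize(coefficients, excitation, gain=1.0):
    """Compute y(n) = sum_{k=1..p} a_k * y(n-k) + G * x(n) for each n."""
    output = []
    for n, x in enumerate(excitation):
        prediction = sum(
            a_k * output[n - k]
            for k, a_k in enumerate(coefficients, start=1)
            if n - k >= 0                      # samples before n = 0 count as zero
        )
        output.append(prediction + gain * x)
    return output


if __name__ == "__main__":
    # One pulse through a single-coefficient filter decays geometrically.
    print(lpc_synthesize([0.5], [1, 0, 0, 0]))  # [1.0, 0.5, 0.25, 0.125]
```

A decoder only needs the p coefficients, the gain, and a description of the excitation for each short interval, which is why vocoders reach such low bit rates.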

LPC
Decoder artificially generates speech via formant synthesis
A mathematical simulation of the vocal tract as a series of bandpass
filters
Encoder codes & transmits filter coefficients, pitch period, gain
factor, & nature of excitation
Standards:
Regular Pulse Excited Linear Predictive Coder (RPE-LPC)
Digital cellular standard GSM 06.10 (13 kbps)
Code Excited Linear Predictive Coder (CELP)
US Federal Standard 1016 (4.8 kbps)
Linear Predictive Coder (LPC)
US Federal Standard 1015 (2.4 kbps)

Networking Concerns
Audio bandwidth is actually quite small.
But human sensitivity to loss and noise is
quite high.
Networking concerns:
Loss concealment
Jitter control
Especially for telephony applications.