Part 2: Digital Audio Compression

9/14/2006

Nguyen Chan Hung - Faculty of Electronics & Telecommunications - HUT

Digital Audio Compression


MPEG Audio: Specifications

MPEG-1 (ISO/IEC 11172-3) provides:


Single-channel ('mono') and two-channel ('stereo' or 'dual mono') coding of digitized sound waves at sampling rates of 32, 44.1, and 48 kHz.
The predefined bit-rates range from 32 to 448 kbit/s for Layer I, from 32 to 384 kbit/s for Layer II, and from 32 to 320 kbit/s for Layer III.
MPEG-2 BC (ISO/IEC 13818-3) provides:
A backwards-compatible (BC) multichannel extension to MPEG-1: up to 5 main channels plus a 'low-frequency enhancement' (LFE) channel can be coded, and the bit-rate range is extended up to about 1 Mbit/s;
An extension of MPEG-1 towards the lower sampling rates of 16, 22.05, and 24 kHz, for bitrates from 32 to 256 kbit/s (Layer I) and from 8 to 160 kbit/s (Layer II & Layer III).

MPEG Audio: Specifications (2)

MPEG-2 AAC (ISO/IEC 13818-7) provides:
A very high-quality audio coding standard for 1 to 48 channels at sampling rates of 8 to 96 kHz, with multichannel, multilingual, and multiprogram capabilities.
AAC works at bitrates from 8 kbit/s for a monophonic speech signal up to more than 160 kbit/s per channel for very-high-quality coding that permits multiple encode/decode cycles.
Three profiles of AAC provide varying levels of complexity and scalability.

MPEG-4 (ISO/IEC 14496-3) provides:
Coding and composition of natural and synthetic audio objects,
Scalability of the bitrate of an audio bitstream,
Scalability of encoder or decoder complexity,
Structured Audio: a universal language for score-driven sound synthesis,
TTSI: an interface for text-to-speech conversion systems.

MPEG-7 (ISO/IEC 15938) will provide:
Standardized descriptions and description schemes of audio structures and sound content,
A language to specify such descriptions and description schemes.

Related specifications

MUSICAM
Masking pattern adapted Universal Sub-band Integrated Coding And Multiplexing
Designed to be suitable for DAB (Digital Audio Broadcasting)

ASPEC
Adaptive Spectral Perceptual Entropy Coding
Designed for high degrees of compression to allow audio transmission on ISDN

NICAM 728
Used for European PAL television audio

Dolby AC-3
Designed for ATSC digital TV


Background of audio compression

Audio compression takes advantage of two facts.

First, in typical audio signals, not all frequencies are simultaneously present.
Second, because of the phenomenon of masking, human hearing cannot perceive every detail of an audio signal.

Audio compression splits the audio spectrum into bands by filtering or transforms, and includes less data when describing bands in which the level is low.
Where masking prevents or reduces audibility of a particular band, even less data needs to be sent.


Background of audio compression (2)

Audio compression is harder than video compression because of the acuity of hearing.
1. Masking:
Masking only works properly when the masking and the masked sounds coincide spatially.
Spatial coincidence is always the case in mono recordings but not in stereo recordings, where low-level signals can still be heard if they are in a different part of the soundstage.
Consequently, in stereo and surround sound systems, a lower compression factor must be accepted for a given quality.
2. Loudspeaker quality:
Delayed resonances in poor loudspeakers actually mask compression artifacts.
Testing a compressor with poor loudspeakers therefore gives a false result: signals that seem satisfactory may be disappointing when heard on good equipment.


The characteristics of human hearing

The top figure shows that the threshold of hearing is a function of frequency.
Naturally, the greatest sensitivity is in the speech range.
The bottom figure shows the hearing threshold in the presence of a single tone.

Note that the tone raises the threshold for tones at higher frequencies and, to some extent, at lower frequencies: this is the masking effect.

A complex input spectrum, such as music, raises the threshold at nearly all frequencies.
As a result, the hiss from an analog audio cassette is only audible during quiet passages in music.
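To make the frequency dependence of the threshold concrete, the sketch below evaluates Terhardt's well-known closed-form approximation of the threshold in quiet. The formula is an assumption brought in for illustration (it is not given on these slides), and `absolute_threshold_db` is a hypothetical helper name. It reproduces the shape described above: the curve is lowest, i.e. hearing is most sensitive, around 3 to 4 kHz, and it rises steeply at both ends of the spectrum.

```python
import math

def absolute_threshold_db(freq_hz: float) -> float:
    """Threshold in quiet (dB SPL), Terhardt's approximation.

    Assumption: this textbook curve is used for illustration only;
    it is not taken from these slides.
    """
    f = freq_hz / 1000.0  # the formula is written in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

for hz in (100, 1000, 3300, 10000, 15000):
    print(f"{hz:>6} Hz: {absolute_threshold_db(hz):6.1f} dB SPL")
```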


The characteristics of human hearing (2)

A sound must be present for at least about 1 millisecond before it becomes audible.

Because of this slow response, masking can still take place even when the two signals involved are not simultaneous.
Forward and backward masking occur when the masking sound continues to mask lower-level sounds after (forward) and before (backward) its actual duration.


Masking

Masking raises the threshold of hearing. Compressors take advantage of this effect by raising the noise floor, which allows the audio waveform to be expressed with fewer bits.
The noise floor can only be raised at frequencies at which there is effective masking.
To maximize effective masking, it is necessary to split the audio spectrum into different frequency bands, so that different amounts of companding and noise can be introduced in each band.
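The link between a raised noise floor and fewer bits is the familiar rule of about 6 dB per bit. A short derivation, assuming a uniform quantizer with N-bit words, step size Δ over a full-scale range 2A, and quantization error uniformly distributed over one step:

```latex
\sigma_q^2 = \frac{\Delta^2}{12}, \qquad
\Delta_N = \frac{2A}{2^N}
\quad\Longrightarrow\quad
\frac{\sigma_q^2\big|_{N-1}}{\sigma_q^2\big|_{N}}
  = \left(\frac{\Delta_{N-1}}{\Delta_N}\right)^2 = 4,
\qquad
10\log_{10}4 = 20\log_{10}2 \approx 6.02\ \text{dB}.
```

So each bit removed from the wordlength raises the noise floor by about 6 dB; wherever masking lets the noise floor rise by k × 6 dB, roughly k bits per sample can be saved in that band.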


MPEG Audio: General encoder model


[Figure: Input → Sub-band Filter → Bit Allocation → Bit-stream Generation → Output, with a Compute Masking block driving the bit allocation]


MPEG Audio encoding algorithm

Use sub-band filters to divide the audio signal into 32 frequency sub-bands that approximate the critical bands of hearing.
Use the psychoacoustic model to determine the amount of masking each band receives from nearby bands.
If the power in a band is below the masking threshold, ignore it. Otherwise, determine the number of bits needed to represent the coefficients such that the noise introduced by quantization is below the masking threshold (1 bit ≈ 6 dB).
Generate the bitstream.
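The per-band decision in the last two steps can be sketched in a few lines of Python. This is a minimal illustration under the slide's 1 bit ≈ 6 dB rule (taken here as 20·log10(2) ≈ 6.02 dB), not the bit allocation procedure of the standard; `bits_for_band` is a hypothetical helper and all levels are in dB.

```python
import math

# Rule of thumb from the slide: each bit of wordlength buys about 6 dB
# (more precisely 20*log10(2) ≈ 6.02 dB) of signal-to-noise ratio.
DB_PER_BIT = 20 * math.log10(2)

def bits_for_band(level_db: float, mask_db: float) -> int:
    """Bits per sample so that quantization noise stays under the mask.

    Returns 0 when the band is at or below its masking threshold,
    i.e. the band is ignored (not transmitted).
    """
    smr_db = level_db - mask_db  # signal-to-mask ratio
    if smr_db <= 0:
        return 0
    # Each bit lowers the noise by ~6 dB; we need at least SMR dB of it.
    return math.ceil(smr_db / DB_PER_BIT)

print(bits_for_band(10, 12))  # band below its mask -> 0
print(bits_for_band(35, 15))  # 20 dB above the mask -> 4
```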


MPEG Audio: Coding example


After analysis, the levels of the first 16 of the 32 bands are (values not recoverable from this copy of the table are marked …):

Band         1   2   3   4   5   6   7   8   9   10  11  12-16
Level (dB)   0   8   12  10  …   …   10  60  35  20  15  …

The level of the 8th band is 60 dB. Suppose it produces a masking threshold of 12 dB in the 7th band and 15 dB in the 9th. Then:
Level in 7th band is 10 dB (< 12 dB): ignore it.
Level in 9th band is 35 dB (> 15 dB): encode it.
Because the 15 dB masking threshold hides quantization noise up to that level, band 9 can be encoded with up to 2 bits (= 12 dB) of quantization error, i.e. with 2 fewer bits than it would otherwise need.
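One way to express the arithmetic of this example in code, assuming, as a simplification, that a band's full-resolution wordlength is its level divided by 6 dB per bit (the slide's rule of thumb). `code_band` is a hypothetical helper, not part of any MPEG API.

```python
import math

DB_PER_BIT = 6.0  # the slide's "1 bit ≈ 6 dB" rule of thumb

def code_band(level_db: float, mask_db: float):
    """Return None if the band is masked, else (full_bits, droppable_bits).

    Assumption: full_bits = level / 6 dB, i.e. the wordlength needed if no
    masking were present; droppable_bits = whole bits of quantization noise
    that still fit under the masking threshold.
    """
    if level_db < mask_db:
        return None  # inaudible under the mask: skip the band
    full_bits = math.ceil(level_db / DB_PER_BIT)
    droppable = math.floor(mask_db / DB_PER_BIT)
    return full_bits, droppable

# Thresholds produced by the 60 dB component in band 8 (from the example).
print(code_band(10, 12))  # band 7 -> None
print(code_band(35, 15))  # band 9 -> (6, 2)
```

Under these assumptions band 9 needs about 6 bits at full resolution, of which 2 can be dropped because their quantization noise (about 12 dB) stays under the 15 dB mask.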


Sub-band coding (SBC) - Companding

The figure shows a band-splitting compander.

The band-splitting filter is a set of narrow-band, linear-phase filters that overlap and all have the same bandwidth.
The output in each band consists of samples representing a waveform.
In each frequency band, the audio input is amplified up to maximum level prior to transmission.
Afterwards, each level is returned to its correct value.
Noise picked up in the transmission is reduced in each band.
If the noise reduction is compared with the threshold of hearing, it can be seen that greater noise can be tolerated in some bands because of masking.
Consequently, in each band after companding, it is possible to reduce the wordlength of the samples.
This technique achieves compression because the noise introduced by the loss of resolution is masked.
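A minimal sketch of one band of the compander described above: the block is scaled by its peak (the gain), requantized to a shorter wordlength, and scaled back at the receiver. The symmetric mid-tread quantizer and the function name `compand_block` are illustrative assumptions; MPEG's actual quantizers and scale-factor tables differ in detail.

```python
def compand_block(samples, bits):
    """Compand one band's block: normalize by the peak, requantize, restore.

    Returns (peak, codes, restored): `peak` plays the role of the
    transmitted scale factor, `codes` are the short words sent, and
    `restored` is what the receiver reconstructs.
    """
    if bits < 2:
        raise ValueError("need at least 2 bits for a symmetric quantizer")
    peak = max(abs(s) for s in samples) or 1.0  # avoid dividing by zero on silence
    levels = 2 ** (bits - 1) - 1                # e.g. 4 bits -> codes -7..7
    codes = [round(s / peak * levels) for s in samples]  # amplified, then requantized
    restored = [c / levels * peak for c in codes]        # gain undone at the receiver
    return peak, codes, restored

block = [0.1, -0.05, 0.02, 0.08]
peak, codes, restored = compand_block(block, bits=4)
print(peak, codes)
print(max(abs(r - s) for r, s in zip(restored, block)))  # at most peak / (2 * levels)
```

The reconstruction error is bounded by half a quantizer step, which scales with the band's own peak level; that is why a masked band can afford a short wordlength.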


MPEG Audio Layer I

The figure shows a simplified band-splitting coder as used in MPEG Layer I.

The digital audio input is fed to a band-splitting filter that divides the spectrum of the signal into 32 bands.
The time axis is divided into blocks of equal length.
In MPEG Layer I, a block is 384 input samples, so at the output of the filter there are 12 samples in each of the 32 bands.
Within each band, the level is amplified by multiplication to bring it up to maximum.
The gain required is constant for the duration of a block.
A single scale factor is transmitted with each block for each band in order to allow the process to be reversed at the decoder.
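A sketch of the Layer I block structure and scale-factor choice. Assumptions, all labeled in the code: the filter bank output arrives as 12 time slots of 32 subband values (one value per band for every 32 input samples), and the scale factor is picked from a table assumed to have the form 2.0 · 2^(−i/3), i = 0 to 62, as in the Layer I/II scale-factor table. `group_by_band` and `scale_factor_index` are hypothetical names.

```python
# Assumed layout: the analysis filter bank emits one value per subband for
# every 32 input samples, so a 384-sample Layer I block gives 12 time slots
# of 32 subband values each.
NUM_BANDS = 32
SLOTS_PER_BLOCK = 12

# Assumed scale-factor table: 2.0 * 2**(-i/3), i = 0..62 (about 2 dB steps).
SCALE_FACTORS = [2.0 * 2 ** (-i / 3) for i in range(63)]

def group_by_band(slots):
    """Turn 12 slots x 32 bands into 32 bands x 12 samples."""
    if len(slots) != SLOTS_PER_BLOCK or any(len(s) != NUM_BANDS for s in slots):
        raise ValueError("expected 12 slots of 32 subband values")
    return [[slot[b] for slot in slots] for b in range(NUM_BANDS)]

def scale_factor_index(band_samples):
    """Index of the smallest table entry that is still >= the band's peak.

    Returns 0 if even the largest entry is below the peak (clipping case).
    """
    peak = max(abs(x) for x in band_samples)
    best = 0
    for i, sf in enumerate(SCALE_FACTORS):
        if sf >= peak:
            best = i  # table is decreasing: keep going while it still fits
        else:
            break
    return best

slots = [[(s + 1) * 0.01 * (b + 1) for b in range(NUM_BANDS)] for s in range(SLOTS_PER_BLOCK)]
bands = group_by_band(slots)
print(len(bands), len(bands[0]))  # 32 bands of 12 samples
print(scale_factor_index(bands[0]))
```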


MPEG Audio Layer I (cont.)

The filter bank output is also analyzed to determine the spectrum of the input signal.
This analysis drives a masking model that determines the degree of masking that can be expected in each band.
The more masking available, the less accurate the samples in each band can be.
The sample accuracy is reduced by requantizing to reduce wordlength.
This reduction is also constant for every word in a band, but different bands can use different wordlengths.
The wordlength needs to be transmitted as a bit allocation code for each band to allow the decoder to deserialize the bit stream properly.
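The role of the bit allocation code can be sketched as a lookup from code to wordlength. The mapping used here (0 = band not sent, 1 to 14 = code + 1 bits, 15 = invalid) is stated as an assumption about Layer I; `wordlengths_from_allocation` and `audio_data_bits` are hypothetical helpers.

```python
def wordlengths_from_allocation(codes):
    """Map 4-bit allocation codes to bits per sample.

    Assumed Layer I mapping: 0 -> band not transmitted,
    1..14 -> code + 1 bits (2..15 bits), 15 -> invalid.
    """
    lengths = []
    for c in codes:
        if not 0 <= c <= 14:
            raise ValueError(f"invalid allocation code {c}")
        lengths.append(0 if c == 0 else c + 1)
    return lengths

def audio_data_bits(codes, samples_per_band=12):
    """Bits taken by the sample data of one block (one channel)."""
    return sum(samples_per_band * n for n in wordlengths_from_allocation(codes))

codes = [3, 0, 14] + [0] * 29  # 32 bands, only two of them transmitted
print(wordlengths_from_allocation(codes)[:3])  # [4, 0, 15]
print(audio_data_bits(codes))                  # 12 * (4 + 15) = 228
```

Because the decoder reads these codes before the samples, it knows exactly how many bits each of the following words occupies.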


MPEG Layer I audio bit stream

The top figure shows an MPEG Layer I audio bit stream, which includes:
Synchronizing pattern and the header;
32 bit allocation codes of four bits each: these codes describe the wordlength of samples in each subband;
32 scale factors used in the companding of each band: these scale factors determine the gain needed in the decoder to return the audio to the correct level;
Audio data in each band.
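The synchronizing pattern and header at the start of each frame can be decoded with a little bit manipulation. The sketch assumes the commonly documented MPEG-1 header layout: a 12-bit all-ones sync word followed by the ID, layer, protection, bitrate index, sampling-rate index, padding, private and mode fields (32 bits in total, with a few flag bits not decoded here). The bitrate tables reproduce the per-layer ranges listed in the specifications at the start of this part. `parse_mpeg1_header` is a hypothetical helper; MPEG-2 low-sampling-rate headers (ID bit 0) are rejected rather than handled.

```python
# Bitrate index -> kbit/s per layer (index 0 = free format, 15 = invalid).
BITRATES_KBPS = {
    1: (None, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448),
    2: (None, 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384),
    3: (None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320),
}
SAMPLE_RATES_HZ = (44100, 48000, 32000)    # MPEG-1 sampling-rate index 0..2
LAYER_FROM_BITS = {0b11: 1, 0b10: 2, 0b01: 3}
MODES = ("stereo", "joint stereo", "dual channel", "single channel")

def parse_mpeg1_header(header: bytes) -> dict:
    """Decode the first 4 bytes of an MPEG-1 audio frame."""
    if len(header) < 4:
        raise ValueError("need 4 header bytes")
    word = int.from_bytes(header[:4], "big")
    if word >> 20 != 0xFFF:                  # 12-bit synchronizing pattern
        raise ValueError("no sync pattern")
    if (word >> 19) & 1 != 1:
        raise ValueError("not an MPEG-1 header (ID bit is 0)")
    layer = LAYER_FROM_BITS.get((word >> 17) & 0b11)
    if layer is None:
        raise ValueError("reserved layer")
    bitrate_index = (word >> 12) & 0xF
    sr_index = (word >> 10) & 0b11
    if bitrate_index == 0xF or sr_index == 0b11:
        raise ValueError("invalid bitrate or sampling-rate index")
    return {
        "layer": layer,
        "crc_protected": ((word >> 16) & 1) == 0,            # protection_bit 0 => CRC follows
        "bitrate_kbps": BITRATES_KBPS[layer][bitrate_index],  # None = free format
        "sample_rate_hz": SAMPLE_RATES_HZ[sr_index],
        "padding": (word >> 9) & 1,
        "mode": MODES[(word >> 6) & 0b11],
    }

print(parse_mpeg1_header(bytes.fromhex("FFFB9064")))
```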


MPEG Layer I decoder

The synchronization pattern is detected by the timing generator, which deserializes the bit allocation and scale factor data.
The bit allocation data then allows deserialization of the variable-length samples.
The requantizing is reversed, and the compression is reversed by the scale factor data to put each band back to the correct level.
These 32 separate bands are then combined in a combiner filter, which produces the audio output.
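The decoder's deserialization step can be sketched as follows: the bit allocation tells the reader how many bits to take for each sample, and the scale factor restores each band's level. The sample ordering (all subbands for one time slot, then the next slot) and the midpoint dequantizer are simplifying assumptions, not the exact formulas of ISO/IEC 11172-3, and the combiner (synthesis) filter is not shown. `BitReader` and `decode_block` are hypothetical names.

```python
class BitReader:
    """Reads unsigned integers of any width from a byte string, MSB first."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


def decode_block(reader, wordlengths, scale_factors, samples_per_band=12):
    """Deserialize one block of variable-length samples and restore levels.

    Assumptions: samples are stored slot by slot (all subbands for time
    slot 0, then slot 1, ...); an n-bit code c is dequantized to the
    midpoint of its interval in (-1, 1) and then multiplied by the band's
    scale factor. Bands with wordlength 0 were not transmitted.
    """
    bands = [[0.0] * samples_per_band for _ in wordlengths]
    for s in range(samples_per_band):
        for b, (nbits, sf) in enumerate(zip(wordlengths, scale_factors)):
            if nbits == 0:
                continue  # silent band: nothing to read
            code = reader.read(nbits)
            normalized = (code + 0.5) / 2 ** (nbits - 1) - 1.0  # requantizing reversed
            bands[b][s] = normalized * sf                        # companding reversed
    return bands


# Two bands, two samples each: band 0 uses 2-bit words, band 1 was not sent.
print(decode_block(BitReader(bytes([0b11000000])), [2, 0], [1.0, 1.0], samples_per_band=2))
# [[0.75, -0.75], [0.0, 0.0]]
```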


MPEG Audio: The concept of Layers


Compression ratios (the original bitrate of CD-quality audio is 1.4 Mbit/s):
1:4 by Layer 1 (corresponds to 384 kbps for a stereo signal),
1:6...1:8 by Layer 2 (corresponds to 256..192 kbps for a stereo signal),
1:10...1:12 by Layer 3 (corresponds to 128..112 kbps for a stereo signal).

Three layers in MPEG audio: Layer I, II, III
The basic model is similar.
CODEC complexity increases with each layer.
A decoder of a higher layer can decode the stream of a lower layer (e.g. a Layer III decoder can decode a Layer II stream, etc.).
A psychoacoustic model is used to determine the bit allocation for each subband.
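The ratios above follow from simple arithmetic: 16-bit stereo PCM at 44.1 kHz is 44,100 × 16 × 2 = 1,411,200 bit/s, the 1.4 Mbit/s quoted. A quick check, with `compression_ratio` as a hypothetical helper:

```python
CD_BITRATE_BPS = 44_100 * 16 * 2  # samples/s x bits/sample x channels = 1,411,200 bit/s

def compression_ratio(coded_kbps: float) -> float:
    """How many times smaller the coded stream is than CD-quality PCM."""
    return CD_BITRATE_BPS / (coded_kbps * 1000)

for layer, rates in (("Layer 1", (384,)), ("Layer 2", (256, 192)), ("Layer 3", (128, 112))):
    for kbps in rates:
        print(f"{layer}: {kbps} kbps -> about 1:{compression_ratio(kbps):.1f}")
```

Layer 1 at 384 kbps comes out at about 1:3.7 and Layer 3 at 112 kbps at about 1:12.6, so the ratios on the slide are rounded values.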


MPEG Audio: Filter type

Layer I: DCT-type filter with one frame and equal frequency spread per band.
Psychoacoustic model only uses frequency masking.

Layer II: Uses three frames in the filter (1152 samples in total).
Psychoacoustic model includes a little temporal masking.

Layer III: A better critical-band filter is used (non-equal frequency bands).
Psychoacoustic model includes temporal masking effects.
Takes stereo redundancy into account.
Uses a Huffman coder.


MPEG-1 Audio Encoder (Layer I & II)


[Figure: PCM input → Analysis filter bank (32 subbands, 0 to 31) → Scaler → Quantizer → Quantized sample encoder → Multiplexer → Output. The PCM input also feeds the Psychoacoustic model, whose SMRn drives the Bit-rate allocation; the allocation Rn controls the Quantizer and is coded by the Bit-rate allocation encoder, while the scale factors SFn are coded by the Scale factor encoder. Both encoders feed the Multiplexer.]

SF = Scale factor, R = Rate, SMR = Signal-to-Mask Ratio


MPEG-1 Audio Encoder (cont.)

The input audio stream passes through a filter bank that divides
the input into multiple subbands of frequency.
The input audio stream simultaneously passes through a
psychoacoustic model that determines the ratio of the signal
energy to the masking threshold for each subband.
The bit- or noise allocation block uses the Signal-to-Mask
Ratios to decide how to apportion the total number of code
bits available for the quantization of the subband signals to
minimize the audibility of the quantization noise.
Finally, the multiplexer takes the representation of the quantized
subband samples and formats this data and side information into
a coded bitstream.
Ancillary data not necessarily related to the audio stream can
be inserted within the coded bitstream.
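The bit allocation block can be pictured as a greedy loop: keep giving one more bit to the subband whose quantization noise is currently the most audible, i.e. whose mask-to-noise ratio MNR = SNR − SMR is lowest, until the budget runs out. This is a minimal sketch under stated assumptions (SNR ≈ 6.02 dB per bit, a fixed cost of 12 budget bits for each extra bit in a band, no side-information cost); the standard's informative encoder is commonly described as working along similar lines but with real quantizer SNR tables. `allocate_bits` is a hypothetical name.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # ≈ 6.02 dB of SNR per bit (assumption)

def allocate_bits(smr_db, budget_bits, samples_per_band=12, max_bits=15):
    """Greedy sketch: repeatedly give one more bit to the band whose
    quantization noise is currently the most audible (lowest MNR).

    Assumptions: SNR(bits) ≈ 6.02 * bits, every extra bit costs
    `samples_per_band` bits of the budget, side information is ignored.
    """
    bits = [0] * len(smr_db)
    used = 0
    while used + samples_per_band <= budget_bits:
        candidates = [b for b in range(len(bits)) if bits[b] < max_bits]
        if not candidates:
            break
        # Mask-to-noise ratio = SNR - SMR; the lowest value is the worst band.
        worst = min(candidates, key=lambda b: bits[b] * DB_PER_BIT - smr_db[b])
        bits[worst] += 1
        used += samples_per_band
    return bits

print(allocate_bits([20.0, 5.0, -3.0], budget_bits=96))  # [5, 2, 1]
```

The band with the highest SMR receives the most bits, yet even a band below its mask (negative SMR) gets a bit once the noise in the other bands has been pushed further under their masks.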



MPEG Audio: Subband sample grouping


[Figure: audio samples in → Subband filters 1 to 32; the output of each filter is grouped into blocks of 12 samples. A Layer I frame takes one 12-sample block from each subband; a Layer II or III frame takes three consecutive blocks from each subband.]

Layer I: 12 × 32 = 384 samples
Layer II, III: 12 × 3 × 32 = 1152 samples
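The sample counts translate directly into frame durations. A small sketch, assuming the MPEG-1 sampling rates; the function names are hypothetical:

```python
SUBBANDS = 32
SAMPLES_PER_BLOCK = 12  # samples per subband in one block

def samples_per_frame(layer: int) -> int:
    """Layer I frames hold one block per subband; Layers II and III hold three."""
    blocks = 1 if layer == 1 else 3
    return SUBBANDS * SAMPLES_PER_BLOCK * blocks

def frame_duration_ms(layer: int, sample_rate_hz: int) -> float:
    return 1000.0 * samples_per_frame(layer) / sample_rate_hz

for layer in (1, 2, 3):
    for fs in (32000, 44100, 48000):
        print(f"Layer {layer} @ {fs} Hz: {samples_per_frame(layer)} samples, "
              f"{frame_duration_ms(layer, fs):.2f} ms")
```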


Psychoacoustic model: Layer I & II


[Figure: PCM input → Fast Fourier Transform (FFT, 512 or 1024 frequencies) → Compute signal power (Sn) and a Tonal/nontonal separator. The tonal and non-tonal components feed "Compute tonal masking threshold function" and "Compute nontonal masking threshold function"; combined with "Compute quiet threshold", they give the Masking threshold function, from which "Calculate Minimum" yields Mn for each subband. Sn and Mn give SMRn.]

The separator identifies and separates the tonal and noise-like (non-tonal) components of the audio signal, because the masking abilities of the two types of signal differ.
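The final stage of the diagram, forming SMRn from the subband signal level Sn and the minimum masking threshold Mn, is simple enough to sketch. The per-line inputs and the even split of FFT lines across subbands are assumptions made for illustration; the FFT, the tonal/non-tonal separation and the threshold functions themselves are not reproduced. `smr_per_subband` is a hypothetical name.

```python
def smr_per_subband(power_db, threshold_db, num_subbands=32):
    """Sketch of the last step: SMRn = Sn - Mn for each subband.

    power_db / threshold_db: per-frequency-line signal power and global
    masking threshold in dB. Assumption: the lines split evenly across
    the subbands.
    """
    if len(power_db) != len(threshold_db) or len(power_db) % num_subbands:
        raise ValueError("lines must split evenly across subbands")
    width = len(power_db) // num_subbands
    smr = []
    for n in range(num_subbands):
        lo, hi = n * width, (n + 1) * width
        s_n = max(power_db[lo:hi])      # signal level in subband n
        m_n = min(threshold_db[lo:hi])  # minimum masking threshold in subband n
        smr.append(s_n - m_n)
    return smr

power = [60, 50, 40, 30, 20, 25, 10, 5]
threshold = [30, 35, 20, 28, 22, 18, 19, 30]
print(smr_per_subband(power, threshold, num_subbands=2))  # [40, 7]
```

Taking the minimum threshold within a subband is the conservative choice: the quantization noise, which is spread over the whole subband, must stay below the lowest point of the mask.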


MPEG-1 Audio Layer III Encoder (mp3)


[Figure: PCM input → Analysis filter bank (32 subbands, 0 to 31) → MDCT (sub-subbands) → Scaler (scale factors) → Quantizer → Quantized sample Huffman encoder → Multiplexer → Output, via a Buffer whose fullness is fed back. A Psychoacoustic model supplies SMRn to a block that calculates window sizes, scale factor bands, bit rate allocation and quantization, taking buffer fullness into account. A Scale factor encoder and a Side information encoder also feed the Multiplexer.]


Frame formats of 3 layers


Layer I:   Header (32) | CRC (0,16) | Bit allocation (128,256) | Scale factors (0-384) | Samples | Ancillary data
Layer II:  Header (32) | CRC (0,16) | Bit allocation (128,256) | SCFSI (0-60) | Scale factors (0-384) | Samples | Ancillary data
Layer III: Header (32) | CRC (0,16) | Side information (136,256) | Main data (may belong to other frames) | Ancillary data
(Field sizes in bits.)

SCFSI = Scale Factor Selection Information
Side information of an mp3 frame = 17 bytes (136 bits) in single-channel mode and 32 bytes (256 bits) in two-channel modes.
CRC is optional.
While a Layer I frame contains only 384 samples, Layer II and Layer III frames contain 1152 samples.
The main data of an mp3 frame may contain data of neighboring frames (see the following slides).
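Frame sizes follow from the samples per frame and the bitrate. The sketch uses the commonly quoted MPEG-1 formulas (Layer I: (12 · bitrate / fs + padding) · 4 bytes; Layers II and III: 144 · bitrate / fs + padding bytes, where 144 = 1152 / 8) together with the side-information sizes given above. The helpers are hypothetical; for Layer III at the MPEG-2 low sampling rates the constant differs.

```python
def frame_length_bytes(layer, bitrate_kbps, sample_rate_hz, padding=0):
    """MPEG-1 frame length in bytes (integer division, as in the usual formula)."""
    bitrate = bitrate_kbps * 1000
    if layer == 1:
        # 384 samples; Layer I lengths are counted in 4-byte slots
        return (12 * bitrate // sample_rate_hz + padding) * 4
    # 1152 samples: 1152 / 8 bits per byte = 144
    return 144 * bitrate // sample_rate_hz + padding

def layer3_main_data_bytes(bitrate_kbps, sample_rate_hz, padding=0, mono=False, crc=False):
    """Bytes left for main data after header, optional CRC and side information."""
    side_info = 17 if mono else 32
    return (frame_length_bytes(3, bitrate_kbps, sample_rate_hz, padding)
            - 4 - (2 if crc else 0) - side_info)

print(frame_length_bytes(3, 128, 44100))      # 417 (418 with padding)
print(layer3_main_data_bytes(128, 44100))     # 381
```

At 44.1 kHz the frame length is not a whole number of bytes, which is why the padding bit is set on some frames to keep the average bitrate exact.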


MP3 frame

The main data section contains the coded scale factor values and the Huffman-coded frequency lines.
Its length depends on the bitrate and the length of the ancillary data.
The length of the scale factor part depends on whether scale factors are reused, and also on the window length (short or long).
The scale factors are used in the requantization of the samples.
The demand for Huffman code bits varies with time during the coding process.
A variable-bitrate format can be used to handle this, but a fixed bitrate is often required for applications such as broadcasting.
Therefore, there is also a bit reservoir technique that allows unused main data storage in one frame to be used by up to two consecutive frames.


MP3 frame: Bit Reservoir

The design of the Layer III bitstream better fits the encoder's time-varying demand for code bits.
As with Layer II, Layer III processes the audio data in frames of 1,152 samples.
Unlike Layer II, the coded data representing these samples do not necessarily fit into a fixed-length frame in the coded bitstream.
The encoder can donate bits to a reservoir when it needs fewer than the average number of bits to code a frame.


MP3 frame: Bit Reservoir (2)

Later, when the encoder needs more than the average number of bits to code a frame, it can borrow bits from the reservoir.
The encoder can only borrow bits donated from past frames; it cannot borrow from future frames.
Each frame's side information includes a 9-bit pointer, "main_data_begin", that points back to the location of the starting byte of the audio data for that frame.
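A toy simulation of the reservoir under a fixed bitrate. Simplifying assumptions: every frame contributes the same number of main-data bytes, the reservoir is capped at 511 bytes (the largest value a 9-bit `main_data_begin` can express), and a frame that needs more than it can borrow simply gets less (a real encoder would quantize more coarsely instead). `simulate_reservoir` is a hypothetical name.

```python
MAX_RESERVOIR = 511  # largest byte offset a 9-bit main_data_begin can hold

def simulate_reservoir(demands, mean_bytes):
    """Return (main_data_begin, bytes_used, shortfall) for each frame.

    demands: main-data bytes each frame would like to use.
    mean_bytes: main-data bytes each frame contributes at the fixed bitrate.
    """
    reservoir = 0
    log = []
    for need in demands:
        main_data_begin = reservoir             # this frame's data starts this far back
        available = reservoir + mean_bytes      # past donations + this frame's share
        used = min(need, available)             # no borrowing from future frames
        shortfall = need - used                 # would force coarser quantization
        reservoir = min(available - used, MAX_RESERVOIR)
        log.append((main_data_begin, used, shortfall))
    return log

for entry in simulate_reservoir([60, 60, 150, 100], mean_bytes=100):
    print(entry)
```

The third frame needs 150 bytes, half again its share, and gets them only because the two quieter frames before it donated 80 bytes to the reservoir.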


MP3: Hybrid frequency analysis

Purpose
Increase the frequency resolution in subbands for better perceptual coding.
Allow for some cancellation of aliasing caused by the polyphase analysis subband filters.

MDCT (Modified Discrete Cosine Transform)
50% overlapped transform
Short-window MDCT: 6 sub-subbands (12-point MDCT) in each subband. Better time resolution.
Long-window MDCT: 18 sub-subbands (36-point MDCT) in each subband. Better frequency resolution.
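The MDCT itself is short to write down. The sketch below applies a sine window, computes the transform directly from its definition (O(N²), no fast algorithm), and overlap-adds the inverse transforms of 50%-overlapped blocks, which reconstructs the signal exactly in the region covered by two blocks (time-domain aliasing cancellation). N = 18 gives the long window (36 inputs, 18 coefficients) and N = 6 the short window (12 inputs, 6 coefficients). The sine window and the 2/N inverse scaling are assumptions chosen so that reconstruction is exact; the real Layer III hybrid filter bank applies the MDCT to each polyphase subband output and uses block-switching window shapes not shown here.

```python
import math

def mdct(x):
    """Sine-windowed MDCT: 2N input samples -> N coefficients."""
    two_n = len(x)
    n = two_n // 2
    window = [math.sin(math.pi * (i + 0.5) / two_n) for i in range(two_n)]
    return [
        sum(window[i] * x[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(two_n))
        for k in range(n)
    ]

def imdct(coeffs):
    """Inverse MDCT with synthesis windowing: N coefficients -> 2N samples."""
    n = len(coeffs)
    two_n = 2 * n
    out = []
    for i in range(two_n):
        acc = sum(c * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                  for k, c in enumerate(coeffs))
        out.append(math.sin(math.pi * (i + 0.5) / two_n) * acc * 2.0 / n)
    return out

def overlap_add(signal, n):
    """Analyze and resynthesize with 50%-overlapped blocks of 2n samples (hop n)."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        for i, v in enumerate(imdct(mdct(signal[start:start + 2 * n]))):
            out[start + i] += v
    return out

x = [math.sin(0.3 * i) + 0.5 * math.cos(1.7 * i) for i in range(90)]
y = overlap_add(x, 18)  # long windows: 36 samples in, 18 coefficients out
# Samples 18..71 are covered by two blocks, so the aliasing cancels there.
print(max(abs(y[i] - x[i]) for i in range(18, 72)))
```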


MP3 Decoder

[Figure: MP3 decoder block diagram]


MP3 Performance

Sound quality    | Bandwidth | Mode   | Bitrate        | Reduction ratio
Telephone sound  | 2.5 kHz   | mono   | 8 kbps *       | 96:1
Short wave       | 4.5 kHz   | mono   | 16 kbps        | 48:1
AM radio         | 7.5 kHz   | mono   | 32 kbps        | 24:1
FM radio         | 11 kHz    | stereo | 56...64 kbps   | 26...24:1
Near-CD          | 15 kHz    | stereo | 96 kbps        | 16:1
CD               | >15 kHz   | stereo | 112..128 kbps  | 14..12:1


MPEG-2 Audio

Differences between MPEG-1 and MPEG-2 audio for two-channel stereo:
Input PCM sampling rates are extended to include 16, 22.05, and 24 kHz.
The predefined bitrates are extended down to 8 kbit/s.
Better quantization tables are provided for the lower rates.
The coding efficiency of scale_factor and intensity_mode stereo coding in Layer III is improved.
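The extended sampling rates are signalled in every frame header: a 2-bit version ID selects MPEG-1 or MPEG-2, and a 2-bit sampling-frequency index is looked up in a per-version table (the MPEG-2 rates are half the MPEG-1 ones). The sketch below decodes those fields from a 32-bit header. It deliberately rejects the non-ISO "MPEG-2.5" version code and the reserved codes, ignores the bitrate field, and uses a hypothetical function name.

```python
SAMPLE_RATES_HZ = {
    "MPEG-1": (44_100, 48_000, 32_000),
    "MPEG-2": (22_050, 24_000, 16_000),  # lower sampling frequencies added by MPEG-2
}
VERSION_BITS = {0b11: "MPEG-1", 0b10: "MPEG-2"}
LAYER_BITS = {0b11: "Layer I", 0b10: "Layer II", 0b01: "Layer III"}


def parse_rate(header: int):
    """Return (version, layer, sample_rate_hz) from a 32-bit MPEG audio frame header."""
    if (header >> 21) & 0x7FF != 0x7FF:
        raise ValueError("no frame sync (11 set bits expected)")
    version = VERSION_BITS.get((header >> 19) & 0b11)
    if version is None:
        raise ValueError("reserved or non-ISO (MPEG-2.5) version ID")
    layer = LAYER_BITS.get((header >> 17) & 0b11)
    if layer is None:
        raise ValueError("reserved layer code")
    index = (header >> 10) & 0b11
    if index == 0b11:
        raise ValueError("reserved sampling-frequency index")
    return version, layer, SAMPLE_RATES_HZ[version][index]


print(parse_rate(0xFFFB9064))  # ('MPEG-1', 'Layer III', 44100)
print(parse_rate(0xFFF39064))  # ('MPEG-2', 'Layer III', 22050)
```

The two example headers differ only in the version bits, so the same index selects 44.1 kHz under MPEG-1 and 22.05 kHz under MPEG-2.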

MPEG-2 Audio: Backward Compatible (BC)

Defines five-channel surround sound:
  Front left (L), front right (R), front center (C), side/rear left (LS), side/rear right (RS), and (optional) low-frequency enhancement (LFE)
  3/2 stereo: L, R, C, LS, RS
  5.1-channel stereo: L, R, C, LS, RS, LFE
An MPEG-1 decoder can decode the L and R signals.
Coding method:
  L and R channels are coded as in MPEG-1.
  Additional channels are coded as ancillary data in the MPEG-1 audio stream.
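In practice, MPEG-2 BC typically places a matrixed stereo downmix (often written Lo/Ro) in the MPEG-1 part, so that an MPEG-1 decoder plays a stereo version of the whole programme, while an MPEG-2 decoder subtracts the extra channels again after decoding them from the extension. The sketch below assumes one such matrix with -3 dB centre and surround coefficients and an overall gain that prevents clipping; the coefficient values and function names are illustrative assumptions, not the normative tables, and LFE is omitted.

```python
import math

# Assumed coefficients: centre and surrounds mixed in at -3 dB (1/sqrt(2)).
X_CENTRE = 1 / math.sqrt(2)
Y_SURROUND = 1 / math.sqrt(2)
GAIN = 1 / (1 + X_CENTRE + Y_SURROUND)  # keeps |Lo|, |Ro| <= 1 for full-scale inputs


def compatible_downmix(l, r, c, ls, rs):
    """Stereo pair carried in the MPEG-1 part (what an MPEG-1 decoder reproduces)."""
    lo = GAIN * (l + X_CENTRE * c + Y_SURROUND * ls)
    ro = GAIN * (r + X_CENTRE * c + Y_SURROUND * rs)
    return lo, ro


def dematrix(lo, ro, c, ls, rs):
    """MPEG-2 decoder side: recover L and R once C, LS, RS come from the extension."""
    l = lo / GAIN - X_CENTRE * c - Y_SURROUND * ls
    r = ro / GAIN - X_CENTRE * c - Y_SURROUND * rs
    return l, r


if __name__ == "__main__":
    l, r, c, ls, rs = 0.5, -0.25, 0.8, 0.1, -0.6
    lo, ro = compatible_downmix(l, r, c, ls, rs)
    print(lo, ro, dematrix(lo, ro, c, ls, rs))
```

Because the extension channels are needed to undo the matrix, the dematrixing step only works in a multi-channel decoder; an MPEG-1 decoder simply outputs Lo and Ro.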


MPEG-2 Audio frame


[Figure: MPEG-2 audio frame. MPEG-1 part: Header, CRC, Bit Allocation, SCFSI, Scale factor, Samples, followed by the Multi-Channel (MC) audio data information: MC Header, MC CRC, MC Bit Allocation, MC SCFSI, MC Predictor, MC Samples, Multi-lingual Commentary. The figure also labels "Ancillary data 1" and "Ancillary data 2".]

As the figure shows, the MPEG-2 audio frame is an extension of the MPEG-1 frame that adds multi-channel and multi-lingual support.
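Backward compatibility follows from where the new fields live: everything multi-channel sits in the area an MPEG-1 decoder treats as ancillary data. The sketch below models only that split, as an ordered list of the field labels from the figure; the class and function names are hypothetical, the field order is an approximation of the figure, the "Ancillary data 1/2" fields are left out, and no bit-level syntax is modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameField:
    name: str
    in_mc_extension: bool  # True: carried inside the MPEG-1 ancillary data area


# Approximate order of the figure's labels; bit-level syntax is not modelled.
MPEG2_BC_FRAME = (
    FrameField("Header", False),
    FrameField("CRC", False),
    FrameField("Bit Allocation", False),
    FrameField("SCFSI", False),
    FrameField("Scale factor", False),
    FrameField("Samples", False),
    FrameField("MC Header", True),
    FrameField("MC CRC", True),
    FrameField("MC Bit Allocation", True),
    FrameField("MC SCFSI", True),
    FrameField("MC Predictor", True),
    FrameField("MC Samples", True),
    FrameField("Multi-lingual Commentary", True),
)


def fields_interpreted(frame, decoder):
    """Names of the fields a given decoder actually interprets."""
    if decoder == "MPEG-1":
        # An MPEG-1 decoder skips the whole ancillary area as opaque data.
        return [f.name for f in frame if not f.in_mc_extension]
    if decoder == "MPEG-2 BC":
        return [f.name for f in frame]
    raise ValueError(f"unknown decoder: {decoder}")


print(fields_interpreted(MPEG2_BC_FRAME, "MPEG-1"))
```

An MPEG-1 decoder sees only the first six fields, which is exactly a complete MPEG-1 frame.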


MPEG-2 BC and MPEG-1 compatibility


[Figure: MPEG-2 BC as an extension of MPEG-1]
MPEG-1: mono & stereo; 32, 44.1, 48 kHz; Layer I, Layer II, Layer III
MPEG-2 low sampling frequency extension: mono & stereo; 16, 22.05, 24 kHz; Layer I, Layer II, Layer III
MPEG-2 multi-channel extension: 5 channels; 32, 44.1, 48 kHz; Layer I, Layer II, Layer III


MPEG-2 Audio: Advanced Audio Coding (AAC)

Aims to further improve the quality of compressed audio using state-of-the-art techniques.
Also called MPEG-2 NBC (Non-Backward Compatible).
Input PCM sampling rates: 8 kHz to 96 kHz.
Supports from mono up to 48 audio channels.
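AAC signals its sampling rate with a 4-bit index into a fixed table (carried, for example, in the ADTS header); the twelve entries used by MPEG-2 AAC span exactly the 8 to 96 kHz range quoted above. A minimal lookup, with a hypothetical function name, that rejects the indices MPEG-2 AAC does not use:

```python
# Sampling-frequency index table for MPEG-2 AAC (indices 0-11).
# Higher indices are reserved or used only by later MPEG-4 extensions.
AAC_SAMPLE_RATES_HZ = (96_000, 88_200, 64_000, 48_000, 44_100, 32_000,
                       24_000, 22_050, 16_000, 12_000, 11_025, 8_000)


def aac_sample_rate(index: int) -> int:
    """Map a sampling_frequency_index to a rate in Hz."""
    if not 0 <= index < len(AAC_SAMPLE_RATES_HZ):
        raise ValueError(f"unsupported sampling_frequency_index {index}")
    return AAC_SAMPLE_RATES_HZ[index]


print(aac_sample_rate(4))  # 44100
```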


Key Points

MPEG Audio specifications
MPEG Audio mechanism
  Human hearing & audio masking
  Sub-band coding (SBC) mechanism
  Psychoacoustic model
  The concept of layers
MPEG-1 Audio encoding/decoding
  Layer I, Layer II, Layer III
  Differences
MPEG-2 Audio BC
MPEG-2 AAC (NBC)
