Overview
Digital Sound
Psychoacoustics
Time to Frequency Domain Transformation
MPEG/Audio Basic Algorithm
Related Work
Web references
Format to use?
.au, .aiff, .wav, and of course .mp3
Psychoacoustics
Principles of the human perception of sound
MPEG compression algorithm uses a model of human hearing to remove data (perceptual coding algorithm)
Frequency range is about 20 Hz to 20 kHz, most sensitive at 2 to 4 kHz
Dynamic range (quietest to loudest) is about 96 dB
Normal voice range is about 500 Hz to 2 kHz
Low frequencies -> vowels, bass; high frequencies -> consonants
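The sensitivity peak at 2 to 4 kHz can be illustrated with a threshold-in-quiet curve. The sketch below uses Terhardt's widely quoted approximation of the absolute hearing threshold; that formula is an assumption brought in for illustration and does not come from these slides. The 100 Hz scan step is likewise arbitrary.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Terhardt's approximation of the absolute hearing threshold (dB SPL).

    Assumption: this formula is not taken from the slides; it is a common
    textbook fit, used here only to show where hearing is most sensitive.
    """
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


# Scan the audible range in 100 Hz steps and find the lowest threshold,
# i.e. the frequency at which the quietest tone is still audible.
freqs = range(100, 20001, 100)
most_sensitive_hz = min(freqs, key=threshold_in_quiet_db)
print(most_sensitive_hz)
```

Under this approximation the minimum falls inside the 2 to 4 kHz region quoted above, and the threshold rises steeply toward both ends of the audible range.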
CPSC 538a MPEG Audio Tutorial January 12, 2004 5 of 17
Frequency Masking
Temporal Masking
If we hear a loud sound, then it stops, it takes a little while until we can hear a soft tone nearby.
Experiment:
Play 1 kHz masking tone at 60 dB, plus a test tone at 1.1 kHz at 40 dB. Test tone can't be heard (it's masked).
Stop masking tone, then stop test tone after a short delay.
Adjust delay time to the shortest time when test tone can be heard (e.g., 5 ms).
Repeat with different levels of the test tone and plot:
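A minimal sketch of how the stimulus for this experiment could be synthesised. The 48 kHz sample rate, the 200 ms masker duration, and the function names are assumptions made for illustration; only the tone frequencies, the 60 dB and 40 dB levels, and the 5 ms delay come from the slide.

```python
import math

SAMPLE_RATE = 48_000  # Hz, assumed for this sketch


def db_to_amplitude(level_db, reference_db):
    """Linear amplitude of a tone at level_db, relative to reference_db."""
    return 10 ** ((level_db - reference_db) / 20)


def tone(freq_hz, amplitude, n_samples):
    """Sine tone as a list of samples."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


def masking_stimulus(delay_ms, masker_ms=200, masker_db=60, test_db=40):
    """Masker plus test tone; the test tone outlasts the masker by delay_ms."""
    n_masker = round(SAMPLE_RATE * masker_ms / 1000)
    n_delay = round(SAMPLE_RATE * delay_ms / 1000)
    # The masker is the 0 dB reference (amplitude 1.0), then silence.
    masker = tone(1000, db_to_amplitude(masker_db, masker_db), n_masker)
    masker += [0.0] * n_delay
    # The test tone is 20 dB below the masker, i.e. one tenth the amplitude.
    test = tone(1100, db_to_amplitude(test_db, masker_db), n_masker + n_delay)
    return [m + t for m, t in zip(masker, test)]


stimulus = masking_stimulus(delay_ms=5)
print(len(stimulus))
```

The summed signal peaks slightly above 1.0, so it would need scaling before playback. Repeating the call with different `delay_ms` and `test_db` values gives the series of trials the experiment describes.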
Combination
Transforming time/level input signals to frequency/power
FFT (used here) is the most popular: fast, easy, and covered in most numerical methods texts. Used by the psychoacoustic model.
DCT is often used for spatial frequency since it represents linear signals better. Something similar is used by the filter bank.
Wavelets use non-sine/cosine functions for better performance on data with sharp discontinuities.
Demo: http://www.jhu.edu/~signals/fourier2/index.html
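To make the time-to-frequency step concrete, here is a small sketch of a recursive radix-2 FFT followed by a power spectrum. It is a textbook Cooley-Tukey variant written for readability, not the transform used inside any MPEG encoder; the 8 kHz sample rate, 64-sample window, and 1 kHz test tone are hypothetical values chosen so the tone falls exactly on one frequency bin.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_spectrum(samples):
    """Power per bin for the non-negative frequencies (bins 0 .. n/2)."""
    n = len(samples)
    return [abs(c) ** 2 / n ** 2 for c in fft(samples)[: n // 2 + 1]]


SAMPLE_RATE = 8000  # Hz, hypothetical
N = 64              # window length, hypothetical
TONE_HZ = 1000      # lands on bin TONE_HZ * N / SAMPLE_RATE

signal = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE) for t in range(N)]
power = power_spectrum(signal)
peak_bin = max(range(len(power)), key=lambda k: power[k])
peak_hz = peak_bin * SAMPLE_RATE / N
print(peak_bin, peak_hz)
```

Each bin k corresponds to the frequency k * SAMPLE_RATE / N, which is how time/level samples become frequency/power values.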
MPEG Basics
[Figure: MPEG audio encoder block diagram. Surviving labels: psychoacoustic model, bitstream formatting, encoded bitstream]
Algorithm overview
1. Use convolution filters to divide the audio signal (e.g., 48 kHz sound) into 32 frequency subbands (subband filtering). A 512-sample FIFO buffer is used.
2. Determine the amount of masking for each band caused by nearby bands using the psychoacoustic model shown above.
3. If the power in a band is below the masking threshold, don't encode it.
4. Otherwise, determine the number of bits needed to represent the coefficient such that the noise introduced by quantization is below the masking effect. (Recall that one fewer bit of quantization introduces about 6 dB of noise.)
5. Format bitstream.
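The "about 6 dB per bit" rule used in the bit-allocation step can be checked numerically. The sketch below quantizes a sine wave with a simple uniform quantizer at 8 and at 7 bits and compares the resulting noise power. The quantizer, the test signal, and the bit depths are assumptions chosen for illustration; they are not the quantizer defined by the MPEG standard.

```python
import math


def quantize(x, bits):
    """Uniform quantizer for x in [-1, 1], using 2**bits steps over that range."""
    step = 2.0 / (2 ** bits)
    return round(x / step) * step


def noise_power_db(signal, bits):
    """Mean squared quantization error, expressed in dB."""
    errors = [(quantize(x, bits) - x) ** 2 for x in signal]
    return 10 * math.log10(sum(errors) / len(errors))


# Test signal: a sine whose frequency is not a simple fraction of the sample
# rate, so that the quantization error behaves roughly like noise.
signal = [math.sin(2 * math.pi * 0.01234567 * i) for i in range(4096)]

# Extra noise introduced by dropping from 8 bits to 7 bits.
extra_noise_db = noise_power_db(signal, 7) - noise_power_db(signal, 8)
print(round(extra_noise_db, 2))
```

Halving the number of quantization levels doubles the step size, which quadruples the error power; 10 * log10(4) is roughly 6.02 dB, and the measured value should land close to that.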
Example
After analysis, the levels of the first 16 of the 32 bands are:
Band         1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
Level (dB)   0   8  12  10   6   2  10  60  35  20  15   2   3   5   3   1
If the level of the 8th band is 60 dB, it gives a masking of 12 dB in the 7th band and 15 dB in the 9th.
Level in 7th band is 10 dB ( < 12 dB ), so ignore it.
Level in 9th band is 35 dB ( > 15 dB ), so send it.
Only the amount above the masking level needs to be sent, so instead of using 6 bits to encode it, we can use 4 bits saving 2 bits (= 12 dB).
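The arithmetic of this example can be written out as a short sketch. The band levels and masking amounts are the ones stated above for bands 7 to 9; the rule "bits = ceil(dB above the mask / 6)" is one reading of the slide's 6 dB per bit shortcut and is an assumption, not the allocation procedure of the standard. The names `bits_to_send` and `DB_PER_BIT` are hypothetical.

```python
import math

DB_PER_BIT = 6  # rule of thumb from the algorithm overview

# Levels (dB) of bands 7, 8 and 9, as given in the example.
levels_db = {7: 10, 8: 60, 9: 35}

# Masking (dB) that the 60 dB tone in band 8 casts on its neighbours.
masking_db = {7: 12, 9: 15}


def bits_to_send(band):
    """Bits needed for a band, counting only the level above its mask."""
    level = levels_db[band]
    mask = masking_db.get(band, 0)  # bands with no listed mask are unmasked
    if level <= mask:
        return 0  # fully masked: do not encode
    return math.ceil((level - mask) / DB_PER_BIT)


for band in sorted(levels_db):
    print(band, bits_to_send(band))

# Bits band 9 would need if masking were ignored.
unmasked_bits_band9 = math.ceil(levels_db[9] / DB_PER_BIT)
print(unmasked_bits_band9 - bits_to_send(9))
```

Band 7 comes out at 0 bits (masked), and band 9 at 4 bits instead of the 6 it would need without masking, matching the 2-bit saving described above.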
MPEG layers
Layer 1
DCT-type filter with one frame and equal frequency spread per band
Psychoacoustic model only uses frequency masking
Layer 2
Uses three frames in the filter (previous, current, next), a total of 1152 samples
Models a bit of temporal masking
Layer 3 (mp3)
Better critical band filter is used (non-equal frequencies)
Psychoacoustic model includes temporal masking effects
Takes into account stereo redundancy
Uses Huffman coder
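The frame sizes above translate into time spans as follows. This sketch takes the 1152-sample figure for Layer 2 and the 48 kHz example rate from the slides; the one-frame size of 384 samples is derived here as 1152 / 3, and the helper names are hypothetical.

```python
SAMPLE_RATE_HZ = 48_000                # example rate from the algorithm overview
N_SUBBANDS = 32                        # subbands produced by the filter bank
LAYER2_SAMPLES = 1152                  # three frames, as stated for Layer 2
LAYER1_SAMPLES = LAYER2_SAMPLES // 3   # one frame, derived: 384 samples


def frame_duration_ms(n_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Length of audio covered by n_samples, in milliseconds."""
    return 1000 * n_samples / sample_rate_hz


def samples_per_subband(n_samples, n_subbands=N_SUBBANDS):
    """Samples each subband contributes when the input is split evenly."""
    return n_samples // n_subbands


for name, n in (("Layer 1", LAYER1_SAMPLES), ("Layer 2", LAYER2_SAMPLES)):
    print(name, n, frame_duration_ms(n), samples_per_subband(n))
```

At 48 kHz the wider Layer 2 window spans three times as much audio as a single frame, which is what gives its psychoacoustic model room to account for some temporal masking.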
Related Work
MPEG phase 2
Multichannel (5.1) audio support
Significant in driving DVD sales
References
SFU CMPT 365 Course Contents, Spring 2003:
Basics of Digital Audio, http://www.cs.sfu.ca/CC/365/mark/material/notes/Chap3/Chap3.1/Chap3.1.html, retrieved January 7, 2004
Audio Compression, http://www.cs.sfu.ca/CC/365/mark/material/notes/Chap4/Chap4.4/Chap4.4.html, retrieved January 7, 2004
Audio and Multimedia Layer 3, http://www.iis.fraunhofer.de/amm/techinf/layer3/index.html, retrieved January 8, 2004
MP3 Backgrounder, http://www.audioactive.com/intro/papers/backbone.html, retrieved January 8, 2004
Scheirer, E. D., The MPEG-4 Structured Audio, Proceedings of ICASSP98
Scheirer, E. D., SAOL / MPEG-4 Structured Audio homepage, http://web.media.mit.edu/~eds/mpeg4-old/
Perkowski, M. A., Speech Signals in Time and Frequency Domain, http://www.ee.pdx.edu/~mperkows/CLASS_480/transmit1/A003.time-andfrequency-domain.pdf, retrieved January 11, 2004
Graps, A., "An Introduction to Wavelets," IEEE Computational Sciences and Engineering, Volume 2, Number 2, Summer 1995, pp 50-61. Also available at http://www.amara.com/IEEEwave/IEEEwavelet.html
Signals Demonstrations, http://www.jhu.edu/~signals/index.html