
BASICS OF AUDIO

Applications of Sound and Audio

Sound (and its derivatives: speech, music, etc., generally referred to as audio when
audible to humans) has a significant part to play in multimedia applications.
Audio is essential to a wide range of them: multi-modal user interfaces (e.g., surfing
the Web by voice) and text-to-speech systems (e.g., Apple Speech Technologies);
software agents capable of expressing themselves in natural language (e.g.,
VirtualFriend); Internet-based radio and TV (e.g., RealAudio, and Internet radio and
TV sites); video-conferencing (e.g., CU-SeeMe) and Internet telephony (e.g., VocalTec
Communications); computer-generated music, sounds for games, and
computer-controlled musical instruments (e.g., MIDI); and even personalised elevator
music and refrigerators that hum along to your mood.
This lecture presents the general properties of sound and explains how to convert it
into a bit stream that can be manipulated by a computer (digitization). Finally, we
give an overview of speech recognition and synthesis.

Properties of Sound

Sound is created by the vibration of matter and manifests itself when the pressure
waves in the air created by the vibration reach an acoustic device (such as an ear, tape
recorder, microphone, loudspeaker, etc.) capable of converting the pressure waves
into another form (for example, an electrical signal).
[Philosophical issues... the world is completely silent, sounds are only "inside our
heads". If a tree falls in a forest, and there is nothing to hear it, does it make a sound?
In space (a vacuum), nobody hears you scream.]
These vibrations displace the air, and the alterations in pressure propagate through the
air in a wave-like motion (a waveform; see Figure below).
The portion of the waveform that repeats at regular intervals is called a period. A
periodic waveform sounds musical (e.g., a bird singing); a waveform that is not
periodic sounds like noise (e.g., me singing!).
The frequency of a sound is the number of periods in a second and is measured
in hertz (Hz). 1000 Hz = 1 kilohertz (kHz).
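The definition of frequency as the number of periods per second can be illustrated with a small Python sketch. The function names `frequency_hz` and `hz_to_khz` are hypothetical helpers introduced here for illustration only.

```python
def frequency_hz(period_seconds: float) -> float:
    """Return the frequency in hertz of a waveform whose period lasts
    `period_seconds`. Frequency is the number of periods per second,
    which is the reciprocal of the duration of one period."""
    if period_seconds <= 0:
        raise ValueError("period must be a positive duration")
    return 1.0 / period_seconds


def hz_to_khz(hz: float) -> float:
    """Convert hertz to kilohertz (1000 Hz = 1 kHz)."""
    return hz / 1000.0


if __name__ == "__main__":
    # A waveform that repeats every millisecond has a frequency of about 1 kHz.
    f = frequency_hz(0.001)
    print(f"{f:.1f} Hz = {hz_to_khz(f):.3f} kHz")
```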
Frequencies audible to humans lie in the 20 Hz to 20 kHz range. Other frequency
ranges are:

Infra-sound 0 - 20Hz
Ultrasound 20kHz - 1GHz
Hypersound 1GHz - 10THz
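As a rough illustration, the ranges listed above can be written as a lookup table. This is a minimal sketch: the names `RANGES` and `classify_frequency` are hypothetical, and treating each range as including its lower bound and excluding its upper bound is an assumption, since the text does not say which range a boundary value belongs to.

```python
# (name, lower bound in Hz, upper bound in Hz), taken from the ranges in the text.
RANGES = [
    ("infra-sound", 0.0, 20.0),
    ("audible", 20.0, 20e3),
    ("ultrasound", 20e3, 1e9),
    ("hypersound", 1e9, 10e12),
]


def classify_frequency(hz: float) -> str:
    """Return the name of the range containing `hz`.

    Assumption: each range includes its lower bound and excludes its
    upper bound, so 20 kHz is reported as ultrasound."""
    for name, low, high in RANGES:
        if low <= hz < high:
            return name
    return "out of range"


if __name__ == "__main__":
    for f in (10, 440, 40_000, 2e9):
        print(f, "Hz ->", classify_frequency(f))
```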

The amplitude of a sound is a property subjectively heard as loudness.

Digitizing Audio

Natural sound occurs as continuous, and hence analog, pressure waves. In order to
convert these pressure waves into a representation a computer can manipulate, it is
necessary to digitize them.
An Analog-to-Digital Converter (ADC) measures the amplitude of pressure waves at
regular time intervals (the measurements are called samples) to generate a digital
representation of the sound. The reverse conversion, to play digital sound through an
analog device (such as speakers), is performed by a Digital-to-Analog Converter (DAC).
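The sampling step performed by an ADC can be imitated in software. The sketch below uses a pure sine tone as a stand-in for the analog pressure wave and measures its amplitude at regular time intervals. The function `sample_sine` and its parameters are hypothetical names chosen for illustration, and a real ADC works on an electrical signal rather than a formula.

```python
import math


def sample_sine(freq_hz: float, sampling_rate_hz: float,
                duration_s: float, amplitude: float = 1.0) -> list[float]:
    """Measure a sine tone at regular intervals of 1 / sampling_rate_hz
    seconds and return the list of measured amplitudes (the samples)."""
    n_samples = int(round(sampling_rate_hz * duration_s))
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * i / sampling_rate_hz)
        for i in range(n_samples)
    ]


if __name__ == "__main__":
    # One millisecond of a 1 kHz tone sampled 8000 times per second
    # yields 8 samples covering exactly one period.
    for i, s in enumerate(sample_sine(1000, 8000, 0.001)):
        print(i, round(s, 3))
```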
The number of samples taken per second is called the sampling rate. CD-quality
sound is sampled at 44,100 Hz, which means that it is sampled 44,100 times per
second. This appears to be well above the frequency range of the human ear.
However, the Nyquist sampling theorem states that "For lossless digitization, the
sampling rate should be at least twice the maximum frequency responses." In other
words, a given sampling rate can represent frequencies only up to half that rate. The
human ear can hear sound in the range 20 Hz to 20 kHz, so the upper limit of hearing
(20 kHz) is slightly less than half the CD standard sampling rate. Following the
Nyquist theorem, CD-quality sound can represent frequencies only up to 22,050 Hz,
which is much closer to the limit of human hearing.
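The arithmetic behind the Nyquist argument is short enough to check directly. This is a minimal sketch; the function names are hypothetical and only restate the "at least twice the maximum frequency" rule quoted above.

```python
def nyquist_frequency(sampling_rate_hz: float) -> float:
    """Highest frequency a given sampling rate can represent: half the rate."""
    return sampling_rate_hz / 2.0


def min_sampling_rate(max_frequency_hz: float) -> float:
    """Lowest sampling rate that satisfies the Nyquist rule for a signal
    whose highest frequency is `max_frequency_hz`."""
    return 2.0 * max_frequency_hz


def can_represent(frequency_hz: float, sampling_rate_hz: float) -> bool:
    """True if the sampling rate is at least twice the given frequency."""
    return sampling_rate_hz >= min_sampling_rate(frequency_hz)


if __name__ == "__main__":
    cd_rate = 44_100
    print("CD Nyquist frequency:", nyquist_frequency(cd_rate), "Hz")
    print("Rate needed for 20 kHz:", min_sampling_rate(20_000), "Hz")
    print("CD covers 20 kHz:", can_represent(20_000, cd_rate))
```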
Just as the waveform is sampled at discrete times, the value of the sample taken is also
represented as a discrete value. The resolution or quantization of a sample value is
dependent on the number of bits used to represent the amplitude. The greater the
number of bits used, the better the resolution, but the more storage space is required.
Typically, amplitude is quantized using either 8 bits (resulting in 256 possible sample
values) or 16 bits (yielding 65,536 values).
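The trade-off between resolution and storage can be sketched in a few lines of Python. The helper names are hypothetical. The mapping of amplitudes in the range -1.0 to 1.0 onto signed integers is one common convention and is an assumption here, as is the two-channel (stereo) figure used in the storage example; the text specifies neither.

```python
def quantization_levels(bits: int) -> int:
    """Number of distinct sample values available with `bits` bits."""
    return 2 ** bits


def quantize(sample: float, bits: int) -> int:
    """Map an amplitude in [-1.0, 1.0] to a signed integer of `bits` bits.

    Assumption: symmetric signed encoding; out-of-range input is clipped."""
    max_level = 2 ** (bits - 1) - 1
    clipped = max(-1.0, min(1.0, sample))
    return int(round(clipped * max_level))


def bytes_per_second(sampling_rate_hz: int, bits: int, channels: int = 1) -> int:
    """Storage needed for one second of uncompressed audio."""
    return sampling_rate_hz * bits * channels // 8


if __name__ == "__main__":
    print(quantization_levels(8), quantization_levels(16))
    print(quantize(1.0, 8), quantize(1.0, 16))
    # Assumed stereo (2 channels) at the CD sampling rate with 16-bit samples.
    print(bytes_per_second(44_100, 16, channels=2), "bytes per second")
```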
