
Audio Quality and Netcasting

by Robert Orban, VP Chief Engineer, and Greg Ogonowski, VP Product Development, Orban/CRL

Webcast Essentials 2005, www.streamingmedia.com

After a five-year hiatus, netcasting is once again making headlines in business publications. Thanks in part to streaming-enabled 3GPP devices, some analysts see netcasting as supplanting broadcasting, because netcasting offers a virtually limitless number of programming formats.

Meanwhile, the local FM broadcasters in a given market, out of economic necessity, have to broadcast mass-appeal formats that don’t fully satisfy anyone.

Netcasters have much to learn from broadcasters, however, about what is commonly called “broadcast-quality” audio. Without such understanding, netcasters cannot fully compete with traditional broadcast media. The topic of broadcast-quality audio can be divided into two main technology threads: audio processing and codecs.

Audio Processing

Broadcasters have been accustomed to processing audio for AM and FM transmission with transmission audio processors like Orban’s Optimod series. These processors compress dynamic range to make the signal comfortably listenable in noisy environments, and also to make the best use of the dynamic range limitations of the channel itself. In analog services (like FM radio), this dynamic range varies as a function of reception conditions, which are poorest in the fringes of the signal. Audio processing therefore also increases the potential coverage area of analog transmissions.

Digital transmissions behave differently. The technical specifications of the transmission system determine the signal-to-noise ratio. This does not change with the signal strength in wireless transmission (and is even more irrelevant in a wired environment). Internet reception anomalies are typically audio drop-outs rather than added noise.

What role does audio processing play in a system with a very low noise floor? It can still have several vital functions:

• First is dynamic range compression to accommodate the signal into typical listening environments like autos and homes. In autos, the acoustic dynamic range is severely limited by wind and road noise. In most apartments and multi-family dwellings, the available dynamic range is limited by the need to avoid disturbing family and neighbors with excessive sound levels. In public spaces like buses, subways, and airports, there is a wide variety of acoustic noise sources. There are relatively few environments where the full, uncompressed dynamic range of the original program material is usable or desirable.

• Second is to ensure a consistent presentation. In radio, program material from different producers is constantly juxtaposed. Yet most successful broadcasters agree that achieving a “major market” sonic image requires an overall consistency of sound texture and spectral balance from source to source. Multiband compression can achieve this. By setting a target spectral balance and automatically re-equalizing program material that does not have this balance, the multiband compression helps the radio station achieve a “big-time,” highly produced sound that sounds authoritative to listeners.

• Third is to reduce the peak-to-average ratio of the signal to increase its relative loudness by comparison to an unprocessed signal normalized to the same peak level. In netcasting, all signals share the same “pipe.” Each signal has exactly the same reach. So the only thing that a netcaster can do to stand out from his neighbors (and possible competitors) is to broadcast a louder and punchier audio signal. Experience has shown that a combination of multiband compression and sophisticated peak limiting is the most effective way to do this.

• The final function is to help improve the intelligibility of substandard program material, particularly news actualities and incoming telephone calls. Properly designed multiband compression like that used in Optimods can make startling improvements in this material without need for preprocessing in a production studio.

Preprocessing each program element before it is stored on a playout system is not as effective as preprocessing the mixed audio on the program line immediately before it is streamed. The latter technique maximizes the smoothness of transition between program elements and makes voice from announcers or presenters merge smoothly into the program flow, even if the announcer is talking over music.

It is important to understand that AM, FM, or TV audio processors that employ pre-emphasis/de-emphasis and/or clipping peak limiters are highly inappropriate for use with the perceptual audio coders used in netcasting. The pre-emphasis/de-emphasis limiting in these devices unnecessarily limits high frequency headroom. Further, their clipping limiters create high frequency components (distortion) that the perceptual audio coders would otherwise not encode.

An audio processor like Orban’s Optimod-PC 1100 Professional PCI Sound Card consists of several cascaded stages. These include Input Conditioning, including defeatable highpass filtering and defeatable phase rotation; Stereo Enhancement; Two-Band Gated Automatic Gain Control (AGC), with target-zone window gating and silence gating; Equalization, including high-frequency enhancement; Multiband Compression in either two or five bands, depending on the processing structure; and Look-Ahead Limiting.

A highpass filter removes low frequency noise that can contaminate some recordings and microphone chains. This noise can otherwise cause problems with the rest of the audio processing and with the codec, which should never waste its bit budget by encoding noise. The phase rotator makes speech more symmetrical, reducing its peak-to-average ratio by as much as six decibels without adding nonlinear distortion. Hence, phase rotation can be very useful for loudness processing of speech.

There are a number of stereo enhancement technologies available. Orban prefers one based on its patented algorithm that increases the energy in the stereo difference signal (L–R) whenever a transient is detected in the stereo sum signal (L+R). By operating only on transients, this algorithm increases width, brightness, and punch without unnaturally increasing reverb (which is usually predominantly in the L–R channel). Gating circuitry detects “mono” material with slight channel or phase imbalances and suppresses enhancement so this built-in imbalance is not exaggerated.

AGC compensates for varying input levels. This is particularly important nowadays because many CDs are aggressively processed for loudness when they are mastered. Older CDs with the same peak levels can have average levels (which correspond approximately to loudness) more than 10 dB below the average levels of current product. This means that the common technique of normalizing audio files for the same peak level causes huge problems with source-to-source loudness consistency, particularly when old and new material is juxtaposed. AGC can go a long way toward smoothing out such inconsistencies.

Equalization applies processing similar to “tone controls” to the signal but typically allows the user to adjust the tonal balance in a much more detailed way. EQ has two purposes in a broadcast processor. The first is to establish a signature for a given station that brands the station, creating a “house sound” by subtly emphasizing the bass, midrange, or high frequencies. The second purpose is to compensate for the frequency contouring caused by the subsequent multiband compression and limiting. These may create an overall spectral coloration that can be corrected or augmented by carefully chosen fixed EQ before these multiband dynamics stages.

Multiband compression and limiting may occur in one or two stages, depending on the developer. If it occurs in two stages, the multiband compressor and limiter can have different crossovers and even different numbers of bands. If it occurs in one stage, the compressor and limiter functions can “talk” to each other, optimizing their interaction. Both design approaches can yield good sound, and each has its own set of tradeoffs.

Usually using anywhere between four and six bands, the multiband compressor/limiter reduces dynamic range and increases audio density to achieve competitive loudness and impact. It’s common for each band to be gated at low levels to prevent noise rush-up, and developers often have proprietary algorithms for doing this while minimizing the audible side effects of the gating.

A look-ahead limiter controls the peak level at the output of the processor to prevent clipping the codec. The limiter prevents overshoots by examining a few milliseconds of the unprocessed sound before it is limited. This way the limiter can anticipate peaks that are coming up.

One can model any peak limiter as multiplying its input signal by a gain control signal. This is a form of amplitude modulation. Amplitude modulation produces sidebands around the “carrier” signal. In the case of a peak limiter, each Fourier component of the input signal is a separate “carrier” and the peak limiting process produces modulation sidebands around each Fourier component. Considered from this point of view, a hard clipper has a wideband gain control signal and thus introduces sidebands that are far removed in frequency from their associated Fourier “carriers.” The “carriers” hence have little ability to psychoacoustically mask the resulting sidebands when compared with the sidebands that a look-ahead limiter introduces, because the look-ahead limiter’s gain control signal has a much lower bandwidth. Therefore, compared to a hard clipper, a look-ahead limiter produces considerably less audible modulation distortion. This is particularly important when one is driving a low bitrate codec because one does not want to waste precious bits encoding this distortion.

Simple wideband look-ahead limiting can still produce audible intermodulation distortion between heavy bass and midrange material. Advanced-technology look-ahead limiters use sophisticated techniques to reduce such IM distortion without compromising loudness capability.

Codecs

The basic principle of perceptual coding is to divide the audio into frequency bands and then to code each frequency band with the minimum number of bits that will yield no audible change in that band. Reducing the number of bits used to encode a given frequency band raises the quantization noise floor in that band. If the noise floor is raised too far, it can become audible and cause artifacts.

A second major source of artifacts in codecs is pre- and post-echo caused by ringing of the narrow bandpass filters used to divide the signal into frequency bands. This ringing worsens as the number of bands increases, so some codecs may adaptively switch the number of bands in use, depending on whether the sound has significant transient content. This ringing manifests itself as a smearing of sharp transient sounds in music, such as those produced by claves and wood blocks.

Psychoacoustic Models

Perceptual coders exploit complex models of the human auditory system to estimate whether a given amount of added noise can be heard. They then adjust the number of bits used to code each frequency band such that the added noise is undetectable by the ear if the total “bit budget” is sufficiently high. Because the psychoacoustic model in a perceptual coder is an approximation that never exactly matches the behavior of the ear, it is desirable to leave some safety factor when choosing the number of bits to use for each frequency band. This safety factor is often called the “mask-to-noise ratio,” measured in dB. For example, a mask-to-noise ratio of 12 dB in a given band would mean that the quantization noise in that band could be raised by 12 dB before it would be heard. (That is, at roughly 6 dB per bit, there is a safety margin of two bits in that band’s coding.) For the most efficient coding, the mask-to-noise ratio should be the same in all bands, ensuring that the sound elements equitably share the available bits in the transmission channel.

Coding Efficiency

Different sounds will vary greatly in the efficiency with which a perceptual coding system can encode them. Therefore, for a constant transmission bitrate, the mask-to-noise ratio will constantly change. Pure sounds having an extended harmonic structure (such as a pitch pipe) are particularly difficult to encode because each harmonic must be encoded, the harmonics occupy many different frequency bands, and the overall spectrum has many “holes” that are not well-masked, so that added noise can be easily heard. The output of a multiband audio processor that uses clipping is another sound that is difficult to encode, because the clipper creates added distortion spectrum that does not mask quantization noise well, yet may cause the encoder to waste bits when trying to encode the distortion.

The AAC and aacPlus Codecs

AAC is intended for very high-quality coding with compression up to 12:1. The AAC codec is about 30% more efficient than MP3 and about twice as efficient as MP2. The AAC codec can achieve “transparency” (that is, listeners cannot audibly distinguish the codec’s output from its input in a statistically significant way) at a stereo bitrate of 128 kbps, while MP2 requires about 256 kbps for the same quality. The MP3 codec cannot achieve transparency at any bitrate, although its performance at 192 kbps and higher is still very good.

AAC stands for Advanced Audio Coding. Intended to replace MPEG Layer 3 (MP3), AAC was developed by the MPEG group that includes Dolby, Fraunhofer (FhG), AT&T, Sony, and Nokia, companies that have also been involved in the development of audio codecs such as MP3 and AC-3 (also known as Dolby Digital™). (AAC does not stand for Apple Audio Codec, although Apple was one of the first to implement this technology with the introduction of Apple iTunes, the most successful downloadable music source, and QuickTime 6.)

Coding Technologies’ “Spectral Band Replication” (SBR) process can be added to almost any codec. This system transmits only lower frequencies (for example, below 8 kHz) via the codec. The decoder at the receiver creates higher frequencies from the lower frequencies by a process similar to that used by “psychoacoustic exciters.” A low-bandwidth signal in the compressed bitstream provides “clues” to modulate these created high frequencies so that they will match the original high frequencies as closely as possible. Adding SBR to the basic AAC codec creates aacPlus, which offers the best subjective quality currently available at bitrates below 128 kbps. At bitrates below 128 kbps, full subjective transparency cannot be achieved at the current state-of-the-art, yet the sound can still be very satisfying. (In the phraseology of the ITU 1 to 5 subjective quality scale, this means that audible differences introduced by the codec are judged by expert listeners to be “detectable, but not annoying.”)

Coding Technologies’ aacPlus v2, the latest in MPEG-4 Audio and previously known as “Enhanced aacPlus,” is aacPlus coupled with the new MPEG Parametric Stereo technique created by Coding Technologies and Philips. Where SBR enables audio codecs to deliver the same quality at half the bitrate, Parametric Stereo enhances the codec efficiency a second time for low-bitrate stereo signals. Both SBR and Parametric Stereo are backward- and forward-compatible methods to enhance the efficiency of any audio codec. As a result, aacPlus v2 delivers streaming and downloadable 5.1 multichannel audio at 128 kbps, near CD-quality stereo at 32 kbps, excellent quality stereo at 24 kbps, and great quality for mixed content down to 16 kbps and below. MPEG standardized Coding Technologies’ aacPlus as MPEG-4 HE-AAC (MPEG ISO/IEC 14496-3:2001/AMD-1: Bandwidth Extension). With the addition of MPEG Parametric Stereo (MPEG ISO/IEC 14496-3:2001/AMD-2: Parametric coding for high quality audio), aacPlus v2 is the state-of-the-art in low-bitrate standards-based audio codecs. The Coding Technologies codecs provide the absolute best possible sound per bit that the current state-of-the-art will allow, without the typical resonant, phasey, watery character of older-technology codecs, like Windows Media Audio (WMA).

WMA has become a de facto “lowest common denominator” codec in netcasting, mostly because Microsoft ships a player with every copy of Windows. However, all third-party bias-controlled tests known to us that compared aacPlus to WMA at the same bitrate rated WMA lower when subjective audio quality was the performance metric. Because WMA has never been passed through formal MPEG testing procedures, it is unclear what bitrate one must use with WMA to achieve parity with any given aacPlus bitrate, if that is even possible.

Orban’s Opticodec-PC family of Streaming and File Audio Encoders uses the genuine Coding Technologies aacPlus codec. Because Opticodec-PC supports both RTSP/RTP (Real and QuickTime) and HTTP/ICY (SHOUTcast and Icecast2) standards-based streaming servers, and because free player clients are available from Real and Winamp and on many mobile phones, there is little reason not to consider using aacPlus to deliver quality audio to the increasingly sophisticated Internet streaming audio audience while saving on bandwidth costs thanks to aacPlus’s higher efficiency than MP3 and WMA.

Contact info

Readers can obtain more information about Orban/CRL at:

Orban
1525 Alvarado St.
San Leandro, CA 94577
510.351.3500
www.orban.com

Greg Ogonowski, gogonowski@orban.com
John Schaab, jschaab@orban.com
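
Two figures quoted in the article, the two-bit safety margin for a 12 dB mask-to-noise ratio and the 12:1 compression cited for AAC, can be sanity-checked with a short sketch. It rests on two assumptions that the article does not state: that one bit of quantization resolution is worth about 6.02 dB (20·log10 2, a common rule of thumb), and that the uncompressed reference is CD-format PCM (44.1 kHz, 16-bit, stereo). The function and constant names are illustrative only.

```python
import math

# Assumption: one bit of quantization resolution moves the noise floor by
# 20*log10(2) dB (about 6.02 dB). Rule of thumb, not a figure from the article.
DB_PER_BIT = 20.0 * math.log10(2.0)

# Assumption: the uncompressed reference is CD-format PCM.
# sample rate * bits per sample * channels, expressed in kbps (1411.2 kbps)
CD_KBPS = 44_100 * 16 * 2 / 1000.0


def safety_margin_bits(mask_to_noise_db: float) -> float:
    """Convert a mask-to-noise ratio in dB into an equivalent number of bits."""
    return mask_to_noise_db / DB_PER_BIT


def compression_ratio(coded_kbps: float, source_kbps: float = CD_KBPS) -> float:
    """Ratio of the uncompressed source bitrate to the coded bitrate."""
    return source_kbps / coded_kbps


if __name__ == "__main__":
    print(f"12 dB mask-to-noise ratio ~ {safety_margin_bits(12.0):.2f} bits")
    for kbps in (128, 32, 24):
        print(f"{kbps} kbps stereo ~ {compression_ratio(kbps):.1f}:1 versus CD PCM")
```

Under these assumptions, 128 kbps comes out at roughly 11:1 relative to CD audio, which is in line with the article's figure of compression up to 12:1 for AAC.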
