
A Guide to MPEG

Fundamentals and
Protocol Analysis
(Including DVB and ATSC)

Copyright © 1997, Tektronix, Inc. All rights reserved.


Contents

Section 1 Introduction to MPEG
1.1 Convergence
1.2 Why compression is needed
1.3 Applications of compression
1.4 Introduction to video compression
1.5 Introduction to audio compression
1.6 MPEG signals
1.7 Need for monitoring and analysis
1.8 Pitfalls of compression

Section 2 Compression in Video
2.1 Spatial or temporal coding?
2.2 Spatial coding
2.3 Weighting
2.4 Scanning
2.5 Entropy coding
2.6 A spatial coder
2.7 Temporal coding
2.8 Motion compensation
2.9 Bidirectional coding
2.10 I, P, and B pictures
2.11 An MPEG compressor
2.12 Preprocessing
2.13 Profiles and levels
2.14 Wavelets

Section 3 Audio Compression
3.1 The hearing mechanism
3.2 Subband coding
3.3 MPEG Layer 1
3.4 MPEG Layer 2
3.5 Transform coding
3.6 MPEG Layer 3
3.7 AC-3

Section 4 Elementary Streams
4.1 Video elementary stream syntax
4.2 Audio elementary streams

Section 5 Packetized Elementary Streams (PES)
5.1 PES packets
5.2 Time stamps
5.3 PTS/DTS

Section 6 Program Streams
6.1 Recording vs. transmission
6.2 Introduction to program streams

Section 7 Transport streams
7.1 The job of a transport stream
7.2 Packets
7.3 Program Clock Reference (PCR)
7.4 Packet Identification (PID)
7.5 Program Specific Information (PSI)

Section 8 Introduction to DVB/ATSC
8.1 An overall view
8.2 Remultiplexing
8.3 Service Information (SI)
8.4 Error correction
8.5 Channel coding
8.6 Inner coding
8.7 Transmitting digits

Section 9 MPEG Testing
9.1 Testing requirements
9.2 Analyzing a Transport Stream
9.3 Hierarchic view
9.4 Interpreted view
9.5 Syntax and CRC analysis
9.6 Filtering
9.7 Timing Analysis
9.8 Elementary stream testing
9.9 Sarnoff compliant bit streams
9.10 Elementary stream analysis
9.11 Creating a transport stream
9.12 Jitter generation
9.13 DVB tests

Glossary
SECTION 1
INTRODUCTION TO MPEG

MPEG is one of the most popular audio/video compression techniques because it is not just a single standard. Instead, it is a range of standards suitable for different applications but based on similar principles. MPEG is an acronym for the Moving Picture Experts Group, which was set up by the ISO (International Organization for Standardization) to work on compression.

MPEG can be described as the interaction of acronyms. As ETSI stated, "The CAT is a pointer to enable the IRD to find the EMMs associated with the CA system(s) that it uses." If you can understand that sentence, you don't need this book.

1.1 Convergence

Digital techniques have made rapid progress in audio and video for a number of reasons. Digital information is more robust and can be coded to substantially eliminate error. This means that generation loss in recording and losses in transmission are eliminated. The Compact Disc was the first consumer product to demonstrate this.

While the CD has improved sound quality with respect to its vinyl predecessor, comparison of quality alone misses the point. The real point is that digital recording and transmission techniques allow content manipulation to a degree that is impossible with analog. Once audio or video are digitized, they become data. Such data cannot be distinguished from any other kind of data; therefore, digital video and audio become the province of computer technology.

The convergence of computers and audio/video is an inevitable consequence of the key inventions of computing and Pulse Code Modulation. Digital media can store any type of information, so it is easy to use a computer storage device for digital video. The nonlinear workstation was the first example of an application of convergent technology that did not have an analog forerunner. Another example, multimedia, mixed the storage of audio, video, graphics, text and data on the same medium. Multimedia is impossible in the analog domain.

1.2 Why compression is needed

The initial success of digital video was in post-production applications, where the high cost of digital video was offset by its limitless layering and effects capability. However, production-standard digital video generates over 200 megabits per second of data, and this bit rate requires extensive capacity for storage and wide bandwidth for transmission. Digital video could only be used in wider applications if the storage and bandwidth requirements could be eased; easing these requirements is the purpose of compression.

Compression is a way of expressing digital audio and video by using less data. Compression has the following advantages:

A smaller amount of storage is needed for a given amount of source material. With high-density recording, such as with tape, compression allows highly miniaturized equipment for consumer and Electronic News Gathering (ENG) use. The access time of tape improves with compression because less tape needs to be shuttled to skip over a given amount of program. With expensive storage media such as RAM, compression makes new applications affordable.

When working in real time, compression reduces the bandwidth needed. Additionally, compression allows faster-than-real-time transfer between media, for example, between tape and disk.

A compressed recording format can afford a lower recording density, and this can make the recorder less sensitive to environmental factors and maintenance.

1.3 Applications of compression

Compression has a long association with television. Interlace is a simple form of compression giving a 2:1 reduction in bandwidth. The use of color-difference signals instead of GBR is another form of compression. Because the eye is less sensitive to color detail, the color-difference signals need less bandwidth. When color broadcasting was introduced, the channel structure of monochrome had to be retained, and composite video was developed. Composite video systems, such as PAL, NTSC and SECAM, are forms of compression because they use the same bandwidth for color as was used for monochrome.

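The bandwidth argument for color-difference signals in Section 1.3 can be made concrete with a short Python sketch. It is illustrative only and rests on assumptions this guide does not state: the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), signal levels normalized to 0.0 to 1.0, and 4:2:2 sampling in which Cr and Cb are each carried at half the horizontal rate of Y. The function names are hypothetical.

```python
# Sketch: G, B, R (each 0.0 to 1.0) to Y, Pb, Pr color-difference form.
# Assumes ITU-R BT.601 luma weights; not a definitive broadcast implementation.
KR, KB = 0.299, 0.114
KG = 1.0 - KR - KB  # 0.587


def gbr_to_ypbpr(g, b, r):
    """Return (Y, Pb, Pr); Pb and Pr are scaled to the range -0.5 to +0.5."""
    y = KR * r + KG * g + KB * b
    pb = (b - y) / (2.0 * (1.0 - KB))  # B-Y divided by 1.772
    pr = (r - y) / (2.0 * (1.0 - KR))  # R-Y divided by 1.402
    return y, pb, pr


def samples_per_line(width, chroma_subsampled=True):
    """Samples per line: three full-rate G, B, R signals or 4:2:2 Y, Cr, Cb."""
    if not chroma_subsampled:
        return 3 * width
    # Y at full rate; Cr and Cb each at half the horizontal rate.
    return width + 2 * (width // 2)


if __name__ == "__main__":
    print(gbr_to_ypbpr(1.0, 1.0, 1.0))  # white: color differences are zero
    print(gbr_to_ypbpr(0.0, 0.0, 1.0))  # pure red
    full = samples_per_line(720, chroma_subsampled=False)
    sub = samples_per_line(720)
    print(full, sub, full / sub)
```

For a 720-pixel line the sketch counts 2160 samples for G, B, R but 1440 for 4:2:2 Y, Cr, Cb, a 1.5:1 reduction obtained before any MPEG coding begins.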
Figure 1.1a shows that in traditional television systems, the GBR camera signal is converted to Y, Pr, Pb components for production and encoded into analog composite for transmission. Figure 1.1b shows the modern equivalent. The Y, Pr, Pb signals are digitized and carried as Y, Cr, Cb signals in SDI form through the production process prior to being encoded with MPEG for transmission. Clearly, MPEG can be considered by the broadcaster as a more efficient replacement for composite video. In addition, MPEG has greater flexibility because the bit rate required can be adjusted to suit the application. At lower bit rates and resolutions, MPEG can be used for video conferencing and video telephones.

[Figure 1.1. (a) Analog: camera G, B, R → matrix → Y, Pr, Pb → composite encoder (PAL, NTSC or SECAM) → analog composite out. (b) Digital: camera G, B, R → matrix → Y, Pr, Pb → ADC → Y, Cr, Cb (SDI) → production process → MPEG coder → compressed digital out.]

DVB and ATSC (the European- and American-originated digital-television broadcasting standards) would not be viable without compression because the bandwidth required would be too great. Compression extends the playing time of DVD (digital video/versatile disc), allowing full-length movies on a disc the same size as a standard compact disc. Compression also reduces the cost of Electronic News Gathering and other contributions to television production.

In tape recording, mild compression eases tolerances and adds reliability in Digital Betacam and Digital-S, whereas in SX, DVC, DVCPRO and DVCAM, the goal is miniaturization. In magnetic disk drives, such as the Tektronix® Profile storage system, which are used in file servers and networks (especially for news purposes), compression lowers storage cost. Compression also lowers bandwidth, which allows more users to access a given server. This characteristic is also important for VOD (Video On Demand) applications.

1.4 Introduction to video compression

In all real program material, there are two types of components of the signal: those which are novel and unpredictable and those which can be anticipated. The novel component is called entropy and is the true information in the signal. The remainder is called redundancy because it is not essential. Redundancy may be spatial, as it is in large plain areas of picture where adjacent pixels have almost the same value. Redundancy can also be temporal, as it is where similarities between successive pictures are used. All compression systems work by separating the entropy from the redundancy in the encoder. Only the entropy is recorded or transmitted, and the decoder computes the redundancy from the transmitted signal. Figure 1.2a shows this concept.

An ideal encoder would extract all the entropy, and only this would be transmitted to the decoder. An ideal decoder would then reproduce the original signal. In practice, this ideal cannot be reached. An ideal coder would be complex and cause a very long delay in order to use temporal redundancy. In certain applications, such as recording or broadcasting, some delay is acceptable, but in videoconferencing it is not. In some cases, a very complex coder would be too expensive. It follows that there is no one ideal compression system.

In practice, a range of coders is needed which have a range of processing delays and complexities. The power of MPEG is that it is not a single compression format, but a range of standardized coding tools that can be combined flexibly to suit a range of applications. The way in which coding has been performed is included in the compressed data so that the decoder can automatically handle whatever the coder decided to do.

MPEG coding is divided into several profiles that have different complexity, and each profile can be implemented at a different level depending on the resolution of the input picture. Section 2 considers profiles and levels in detail.

There are many different digital video formats, and each has a different bit rate. For example, a high definition system might have six times the bit rate of a standard definition system. Consequently, just knowing the bit rate out of the coder is not very useful. What matters is the compression factor, which is the ratio of the input bit rate to the compressed bit rate, for example 2:1, 5:1, and so on.

Unfortunately, the number of variables involved makes it very difficult to determine a suitable compression factor. Figure 1.2a shows that for an ideal coder, if all of the entropy is sent, the quality is good. However, if the compression factor is increased in order to reduce the bit rate, not all of the entropy is sent and the quality falls. Note that in a compressed system, once quality loss occurs, it is steep (Figure 1.2b). If the available bit rate is inadequate, it is better to avoid this area by reducing the entropy of the input picture. This can be done by filtering. The loss of resolution caused by the filtering is subjectively more acceptable than the compression artifacts.
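The compression factor defined in Section 1.4 is simple arithmetic once the input bit rate is known. The sketch below assumes a 625-line standard-definition 4:2:2 picture with 720 x 576 active luminance samples at 25 frames per second and 10-bit samples, counting the active picture only (blanking excluded); the 6 Mbit/s target is a hypothetical figure, not one taken from this guide.

```python
def active_picture_bit_rate(width, height, frames_per_second, bits_per_sample):
    """Bit rate of the active picture of a 4:2:2 signal, blanking excluded.

    Y is sampled at full rate; Cr and Cb each at half the horizontal rate.
    """
    luma_samples = width * height
    chroma_samples = 2 * (width // 2) * height
    return (luma_samples + chroma_samples) * bits_per_sample * frames_per_second


def compression_factor(input_bit_rate, compressed_bit_rate):
    """Input bit rate divided by compressed bit rate; 5.0 means 5:1."""
    if compressed_bit_rate <= 0:
        raise ValueError("compressed bit rate must be positive")
    return input_bit_rate / compressed_bit_rate


if __name__ == "__main__":
    source = active_picture_bit_rate(720, 576, 25, 10)  # assumed SD format
    target = 6_000_000                                   # hypothetical MPEG rate
    print(source)
    print(f"{compression_factor(source, target):.1f}:1")
```

With these assumptions the source runs at 207.36 Mbit/s, consistent with the "over 200 megabits per second" quoted in Section 1.2, and coding it at 6 Mbit/s is a compression factor of about 34.6:1. The same output bit rate would imply a much higher factor for a high-definition source, which is why the bit rate alone says little.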

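The separation of entropy from redundancy described in Section 1.4 can be illustrated with a deliberately simple technique that is not MPEG's method: sending each pixel of a line as the difference from its left neighbor and letting the decoder rebuild the line by accumulation. The sketch assumes a single row of integer pixel values and uses a zeroth-order Shannon estimate as a stand-in for the informal notion of entropy in the text; all names are hypothetical.

```python
import math
from collections import Counter


def encode_differences(row):
    """Send the first pixel as-is, then each pixel minus its left neighbor."""
    return [row[0]] + [cur - prev for prev, cur in zip(row, row[1:])]


def decode_differences(diffs):
    """Rebuild the row: the decoder recomputes the predictable part."""
    row = [diffs[0]]
    for d in diffs[1:]:
        row.append(row[-1] + d)
    return row


def entropy_bits_per_symbol(symbols):
    """Zeroth-order Shannon entropy estimate, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    ramp = list(range(100, 116))  # hypothetical smooth gradient, 16 pixels
    diffs = encode_differences(ramp)
    print(decode_differences(diffs) == ramp)  # lossless round trip
    print(entropy_bits_per_symbol(ramp), entropy_bits_per_symbol(diffs))
```

For the 16-pixel ramp, the estimate falls from 4 bits per symbol for the raw values to about 0.34 for the differences. A row of unrelated values would gain little, which matches the text's point that the entropy of real material varies.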
To identify the entropy perfectly, an ideal compressor would have to be extremely complex. A practical compressor may be less complex for economic reasons and must send more data to be sure of carrying all of the entropy. Figure 1.2b shows the relationship between coder complexity and performance. The higher the compression factor required, the more complex the encoder has to be.

The entropy in video signals varies. A recording of an announcer delivering the news has much redundancy and is easy to compress. In contrast, it is more difficult to compress a recording with leaves blowing in the wind or one of a football crowd that is constantly moving and therefore has less redundancy (more information or entropy). In either case, if not all of the entropy is sent, there will be quality loss. Thus, we may choose between a constant bit-rate channel with variable quality or a constant quality channel with variable bit rate. Telecommunications network operators tend to prefer a constant bit rate for practical purposes, but a buffer memory can be used to average out entropy variations if the resulting increase in delay is acceptable. In recording, a variable bit rate may be easier to handle, and DVD uses variable bit rate, speeding up the disc where difficult material exists.

[Figure 1.2. (a) PCM video: an ideal coder sends only the entropy; a non-ideal coder has to send more; a short-delay coder has to send even more. (b) Quality (worse to better) versus compression factor as coder complexity increases. (c) Quality versus compression factor as latency increases.]

Intra-coding (intra = within) is a technique that exploits spatial redundancy, or redundancy within the picture; inter-coding (inter = between) is a technique that exploits temporal redundancy. Intra-coding may be used alone, as in the JPEG standard for still pictures, or combined with inter-coding as in MPEG.

Intra-coding relies on two characteristics of typical images. First, not all spatial frequencies are simultaneously present, and second, the higher the spatial frequency, the lower the amplitude is likely to be. Intra-coding requires analysis of the spatial frequencies in an image. This analysis is the purpose of transforms such as wavelets and DCT (discrete cosine transform). Transforms produce coefficients which describe the magnitude of each spatial frequency. Typically, many coefficients will be zero, or nearly zero, and these coefficients can be omitted, resulting in a reduction in bit rate.

Inter-coding relies on finding similarities between successive pictures. If a given picture is available at the decoder, the next picture can be created by sending only the picture differences. The picture differences will be increased when objects move, but this increase can be offset by using motion compensation, since a moving object does not generally change its appearance very much from one picture to the next. If the motion can be measured, a closer approximation to the current picture can be created by shifting part of the previous picture to a new location. The shifting process is controlled by a vector that is transmitted to the decoder. The vector transmission requires less data than sending the picture-difference data.

MPEG can handle both interlaced and non-interlaced images. An image at some point on the time axis is called a "picture," whether it is a field or a frame. Interlace is not ideal as a source for digital compression because it is in itself a compression technique. Temporal coding is made more complex because pixels in one field are in a different position to those in the next.

Motion compensation minimizes but does not eliminate the differences between successive pictures. The picture difference is itself a spatial image and can be compressed using transform-based intra-coding as previously described. Motion compensation simply reduces the amount of data in the difference image.

The efficiency of a temporal coder rises with the time span over which it can act. Figure 1.2c shows that if a high compression factor is required, a longer time span in the input must be considered and thus a longer coding delay will be experienced. Clearly, temporally coded signals are difficult to edit because the content of a given output picture may be based on image data which was transmitted some time earlier. Production systems will have to limit the degree of temporal coding to allow editing, and this limitation will in turn limit the available compression factor.
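How a buffer memory lets a variable amount of data per picture leave on a constant bit-rate channel can be sketched as below. This is a simplified model under stated assumptions: each coded picture enters the buffer at once, the channel removes a fixed number of bits per picture period, and an empty buffer simply idles. Real MPEG rate control also feeds buffer occupancy back to the quantizer, which is not modeled; the function names and bit counts are hypothetical.

```python
def simulate_cbr_buffer(bits_per_picture, channel_bits_per_picture):
    """Encoder output-buffer occupancy (bits) at the end of each picture period.

    Assumed model: each coded picture enters the buffer in one go, the channel
    removes a fixed number of bits per picture period, and an empty buffer
    idles rather than going negative.
    """
    occupancy = 0
    history = []
    for bits in bits_per_picture:
        occupancy = max(0, occupancy + bits - channel_bits_per_picture)
        history.append(occupancy)
    return history


def added_delay_in_pictures(history, channel_bits_per_picture):
    """Approximate worst-case extra delay caused by the buffer, in pictures."""
    return max(history) / channel_bits_per_picture


if __name__ == "__main__":
    # Hypothetical bits per coded picture: easy, easy, hard, hard, easy, easy.
    pictures = [400_000, 400_000, 1_200_000, 1_000_000, 300_000, 300_000]
    channel = 600_000  # constant channel rate equal to the average above
    occupancy = simulate_cbr_buffer(pictures, channel)
    print(occupancy)
    print(round(added_delay_in_pictures(occupancy, channel), 2))
```

The peak occupancy divided by the channel rate gives the extra delay the buffer introduces, here about 1.7 picture periods, which is the price of averaging out the entropy variation.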
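Motion compensation, as described above, depends on measuring motion and sending a vector plus a picture difference. The sketch below uses exhaustive block matching with a sum-of-absolute-differences cost, one common way of estimating vectors. This guide does not say which search method an encoder uses, so the method, block size, search range, and sign convention (the vector points from the current block to its match in the previous picture) are assumptions, and the names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference-picture block displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
               for y in range(size) for x in range(size))


def best_vector(cur, ref, bx, by, size, search_range):
    """Exhaustive search for the displacement with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that would read outside the reference picture.
            if not (0 <= by + dy and by + dy + size <= height
                    and 0 <= bx + dx and bx + dx + size <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    if best is None:
        raise ValueError("block does not fit in the reference picture")
    return best[1], best[2]


def difference_block(cur, ref, bx, by, dx, dy, size):
    """Picture difference left after shifting the reference by the vector."""
    return [[cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx]
             for x in range(size)] for y in range(size)]


if __name__ == "__main__":
    # Hypothetical 8 x 8 pictures: a 2 x 2 bright object moves one pixel right.
    ref = [[0] * 8 for _ in range(8)]
    cur = [[0] * 8 for _ in range(8)]
    for y in (2, 3):
        for x in (2, 3):
            ref[y][x] = 200
        for x in (3, 4):
            cur[y][x] = 200
    dx, dy = best_vector(cur, ref, bx=2, by=2, size=4, search_range=2)
    print((dx, dy))
    print(difference_block(cur, ref, 2, 2, dx, dy, 4))  # motion compensated
    print(difference_block(cur, ref, 2, 2, 0, 0, 4))    # no vector
```

In the example the object moves one pixel to the right, the search returns the vector (-1, 0), and the motion-compensated difference is entirely zero, whereas the plain picture difference with no vector is not.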
1.5 Introduction to audio compression

The bit rate of a PCM digital audio channel is only about one megabit per second, which is about 0.5% of 4:2:2 digital video. With mild video compression schemes, such as Digital Betacam, audio compression is unnecessary. But, as the video compression factor is raised, it becomes necessary to compress the audio as well.

Audio compression takes advantage of two facts. First, in typical audio signals, not all frequencies are simultaneously present. Second, because of the phenomenon of masking, human hearing cannot discern every detail of an audio signal. Audio compression splits the audio spectrum into bands by filtering or transforms, and includes less data when describing bands in which the level is low. Where masking prevents or reduces audibility of a particular band, even less data needs to be sent.

Audio compression is not as easy to achieve as video compression because of the acuity of hearing. Masking only works properly when the masking and the masked sounds coincide spatially. Spatial coincidence is always the case in mono recordings but not in stereo recordings, where low-level signals can still be heard if they are in a different part of the soundstage. Consequently, in stereo and surround sound systems, a lower compression factor is allowable for a given quality. Another factor complicating audio compression is that delayed resonances in poor loudspeakers actually mask compression artifacts. Testing a compressor with poor speakers gives a false result, and signals which are apparently satisfactory may be disappointing when heard on good equipment.

1.6 MPEG signals

The output of a single MPEG audio or video coder is called an Elementary Stream. An Elementary Stream is an endless near real-time signal. For convenience, it can be broken into data blocks of manageable size in a Packetized Elementary Stream (PES). These data blocks need header information to identify the start of the packets and must include time stamps because packetizing disrupts the time axis.

Figure 1.3 shows that one video PES and a number of audio PES can be combined to form a Program Stream, provided that all of the coders are locked to a common clock. Time stamps in each PES ensure lip-sync between the video and audio. Program Streams have variable-length packets with headers. They find use in data transfers to and from optical and hard disks, which are error free and in which files of arbitrary sizes are expected. DVD uses Program Streams.

[Figure 1.3. Video data → video encoder → packetizer → video PES; audio data → audio encoder → packetizer → audio PES. The encoder outputs are elementary streams. The PES are multiplexed either by a Program Stream MUX into a Program Stream (DVD) or, together with data, by a Transport Stream MUX into a Single Program Transport Stream.]

For transmission and digital broadcasting, several programs and their associated PES can be multiplexed into a single Transport Stream. A Transport Stream differs from a Program Stream in that the PES packets are further subdivided into short fixed-size packets and in that multiple programs encoded with different clocks can be carried. This is possible because a transport stream has a program clock reference (PCR) mechanism which allows transmission of multiple clocks, one of which is selected and regenerated at the decoder. A Single Program Transport Stream (SPTS) is also possible, and this may be found between a coder and a multiplexer. Since a Transport Stream can genlock the decoder clock to the encoder clock, the Single Program Transport Stream (SPTS) is more common than the Program Stream.

A Transport Stream is more than just a multiplex of audio and video PES. In addition to the compressed audio, video and data, a Transport Stream includes a great deal of metadata describing the bit stream. This includes the Program Association Table (PAT) that lists every program in the transport stream. Each entry in the PAT points to a Program Map Table (PMT) that lists the elementary streams making up each program. Some programs will be open, but some programs may be subject to conditional access (encryption), and this information is also carried in the metadata.

The Transport Stream consists of fixed-size data packets, each containing 188 bytes. Each packet carries a packet identifier code (PID). Packets in the same elementary stream all have the same PID, so that the decoder (or a demultiplexer) can select the elementary stream(s) it wants and reject the remainder. Packet-continuity counts ensure that every packet that is needed to decode a stream is received. An effective synchronization system is needed so that decoders can correctly identify the beginning of each packet and deserialize the bit stream into words.
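The packet mechanisms described in Section 1.6 (188-byte packets, PIDs, continuity counts and synchronization) can be sketched with a small Python parser. The field layout (sync byte 0x47, a 13-bit PID, a 2-bit adaptation-field control and a 4-bit continuity counter in the 4-byte header, with PID 0x1FFF reserved for null packets) is taken from the MPEG-2 Systems standard, ISO/IEC 13818-1, rather than from this section. Requiring three consecutive sync bytes is an assumed threshold, and the continuity check is simplified: it ignores the single duplicate packet the standard permits and the discontinuity indicator. The helper names are hypothetical.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
NULL_PID = 0x1FFF


def find_sync(data, packets_to_check=3):
    """First offset at which the sync byte recurs every 188 bytes, or None."""
    for offset in range(TS_PACKET_SIZE):
        positions = range(offset, offset + packets_to_check * TS_PACKET_SIZE,
                          TS_PACKET_SIZE)
        if all(p < len(data) and data[p] == SYNC_BYTE for p in positions):
            return offset
    return None


def parse_header(packet):
    """Decode the 4-byte header of one transport packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("expected a 188-byte packet starting with 0x47")
    return {
        "transport_error": bool(packet[1] & 0x80),
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "scrambling_control": (packet[3] >> 6) & 0x03,
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,
    }


def continuity_errors(data, offset=0):
    """List (byte_position, pid, previous_cc, received_cc) for each gap."""
    last_cc = {}
    errors = []
    for pos in range(offset, len(data) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        header = parse_header(data[pos:pos + TS_PACKET_SIZE])
        has_payload = header["adaptation_field_control"] & 0x01
        # Null packets carry no stream; packets without payload keep their count.
        if header["pid"] == NULL_PID or not has_payload:
            continue
        pid, cc = header["pid"], header["continuity_counter"]
        if pid in last_cc and cc != (last_cc[pid] + 1) % 16:
            errors.append((pos, pid, last_cc[pid], cc))
        last_cc[pid] = cc
    return errors


if __name__ == "__main__":
    def make_packet(pid, cc, afc=1):
        """Hypothetical test packet: header plus 184 zero bytes of payload."""
        return bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF,
                      (afc << 4) | (cc & 0x0F)]) + bytes(184)

    # Two stray bytes, then PID 0x100 with counter values 0, 1, 3 (2 is lost).
    stream = (b"\x00\x01" + make_packet(0x100, 0) + make_packet(0x100, 1)
              + make_packet(0x100, 3))
    start = find_sync(stream)
    print(start, continuity_errors(stream, start))
```

In the example the parser locks on at offset 2, past the two stray bytes, and reports one gap at byte 378, where counter value 2 is missing.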

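The PAT-to-PMT link described in Section 1.6 can be sketched by decoding the program loop of one PAT section. The layout used here (table_id 0x00, a 12-bit section_length, an 8-byte header before the loop, 4-byte entries holding a 16-bit program_number and a 13-bit PID, and a trailing CRC_32) comes from ISO/IEC 13818-1. The sketch assumes a complete, single-section PAT that has already been reassembled from the packets on PID 0, and it does not verify the CRC; the function name is hypothetical.

```python
def parse_pat(section):
    """Return {program_number: PMT PID} from one complete PAT section.

    The CRC_32 at the end of the section is not checked in this sketch.
    """
    if section[0] != 0x00:
        raise ValueError("a PAT section must have table_id 0x00")
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    # section_length counts the bytes after itself, including the 4-byte CRC.
    loop_end = 3 + section_length - 4
    programs = {}
    for pos in range(8, loop_end, 4):  # the program loop follows 8 header bytes
        program_number = (section[pos] << 8) | section[pos + 1]
        pid = ((section[pos + 2] & 0x1F) << 8) | section[pos + 3]
        if program_number == 0:
            continue  # program 0 gives the network PID, not a PMT
        programs[program_number] = pid
    return programs


if __name__ == "__main__":
    # Hypothetical PAT: program 1 -> PMT on PID 0x100, program 2 -> PID 0x200.
    pat = bytes([0x00, 0xB0, 0x11,              # table_id, flags, section_length 17
                 0x00, 0x01, 0xC1, 0x00, 0x00,  # stream id, version, section numbers
                 0x00, 0x01, 0xE1, 0x00,
                 0x00, 0x02, 0xE2, 0x00]) + bytes(4)  # dummy CRC_32
    print({n: hex(pid) for n, pid in parse_pat(pat).items()})
```

A decoder would then look for packets on each returned PID to read the corresponding PMT and find the elementary streams of that program.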
1.7 Need for monitoring and analysis

The MPEG transport stream is an extremely complex structure using interlinked tables and coded identifiers to separate the programs and the elementary streams within the programs. Within each elementary stream, there is a complex structure, allowing a decoder to distinguish between, for example, vectors, coefficients and quantization tables.

Failures can be divided into two broad categories. In the first category, the transport system correctly multiplexes and delivers information from an encoder to a decoder with no bit errors or added jitter, but the encoder or the decoder has a fault. In the second category, the encoder and decoder are fine, but the transport of data from one to the other is defective. It is very important to know whether the fault lies in the encoder, the transport, or the decoder if a prompt solution is to be found.

Synchronizing problems, such as loss or corruption of sync patterns, may prevent reception of the entire transport stream. Transport-stream protocol defects may prevent the decoder from finding all of the data for a program, perhaps delivering picture but not sound. Correct delivery of the data but with excessive jitter can cause decoder timing problems.

If a system using an MPEG transport stream fails, the fault could be in the encoder, the multiplexer, or the decoder. How can this fault be isolated? First, verify that a transport stream is compliant with the MPEG-coding standards. If the stream is not compliant, a decoder can hardly be blamed for having difficulty. If it is, the decoder may need attention.

Traditional video testing tools (the signal generator, the waveform monitor and the vectorscope) are not appropriate in analyzing MPEG systems, except to ensure that the video signals entering and leaving an MPEG system are of suitable quality. Instead, a reliable source of valid MPEG test signals is essential for testing receiving equipment and decoders. With a suitable analyzer, the performance of encoders, transmission systems, multiplexers and remultiplexers can be assessed with a high degree of confidence. As a long-standing supplier of high-grade test equipment to the video industry, Tektronix continues to provide test and measurement solutions as the technology evolves, giving the MPEG user the confidence that complex compressed systems are functioning correctly and allowing rapid diagnosis when they are not.

1.8 Pitfalls of compression

MPEG compression is lossy in that what is decoded is not identical to the original. The entropy of the source varies, and when entropy is high, the compression system may leave visible artifacts when decoded. In temporal compression, redundancy between successive pictures is assumed. When this is not the case, the system fails. An example is video from a press conference where flashguns are firing. Individual pictures containing the flash are totally different from their neighbors, and coding artifacts become obvious.

Irregular motion or several independently moving objects on screen require a lot of vector bandwidth, and this requirement may only be met by reducing the picture-data bandwidth. Again, visible artifacts may occur whose level varies and depends on the motion. This problem often occurs in sports-coverage video.

Coarse quantizing results in luminance contouring and posterized color. These can be seen as blotchy shadows and blocking on large areas of plain color. Subjectively, compression artifacts are more annoying than the relatively constant impairments resulting from analog television transmission systems.

The only solution to these problems is to reduce the compression factor. Consequently, the compression user has to make a value judgment between the economy of a high compression factor and the level of artifacts.

In addition to extending the encoding and decoding delay, temporal coding also causes difficulty in editing. In fact, an MPEG bit stream cannot be arbitrarily edited at all. This restriction occurs because in temporal coding the decoding of one picture may require the contents of an earlier picture, and the contents may not be available following an edit. The fact that pictures may be sent out of sequence also complicates editing.

If suitable coding has been used, edits can take place only at splice points, which are relatively widely spaced. If arbitrary editing is required, the MPEG stream must undergo a read-modify-write process, which will result in generation loss.

The viewer is not interested in editing, but the production user will have to make another value judgment about the edit flexibility required. If greater flexibility is required, the temporal compression has to be reduced and a higher bit rate will be needed.

SECTION 2
COMPRESSION IN VIDEO

This section shows how video compression is based on the perception of the eye. Important enabling techniques, such as transforms and motion compensation, are considered as an introduction to the structure of an MPEG coder.

2.1 Spatial or temporal coding?

As was seen in Section 1, video compression can take advantage of both spatial and temporal redundancy. In MPEG, temporal redundancy is reduced first by using similarities between successive pictures. As much as possible of the current picture is created or "predicted" by using information from pictures already sent. When this technique is used, it is only necessary to send a difference picture which, when added to the prediction, eliminates the differences between the prediction and the actual picture. The difference picture is then subject to spatial compression. As a practical matter, it is easier to explain spatial compression prior to explaining temporal compression.

Spatial compression relies on similarities between adjacent pixels in plain areas of picture and on dominant spatial frequencies in areas of patterning. The JPEG system uses spatial compression only, since it is designed to transmit individual still pictures. However, JPEG may be used to code a succession of individual pictures for video. In the so-called "Motion JPEG" application, the compression factor will not be as good as if temporal coding were used, but the bit stream will be freely editable on a picture-by-picture basis.

2.2 Spatial coding

The first step in spatial coding is to perform an analysis of spatial frequency using a transform. A transform is simply a way of expressing a waveform in a different domain, in this case, the frequency domain. The output of a transform is a set of coefficients that describe how much of a given frequency is present. An inverse transform reproduces the original waveform. If the coefficients are handled with sufficient accuracy, the output of the inverse transform is identical to the original waveform.

The best-known transform is the Fourier transform. This transform finds each frequency in the input signal. It finds each frequency by multiplying the input waveform by a sample of a target frequency, called a basis function, and integrating the product. Figure 2.1 shows that when the input waveform does not contain the target frequency, the integral will be zero, but when it does, the integral will be a coefficient describing the amplitude of that component frequency.

[Figure 2.1. Input multiplied by a basis function: no correlation if the frequencies differ; high correlation if the frequencies are the same.]

The results will be as described if the frequency component is in phase with the basis function. However, if the frequency component is in quadrature with the basis function, the integral will still be zero. Therefore, it is necessary to perform two searches for each frequency, with the basis functions in quadrature with one another so that every phase of the input will be detected.

The Fourier transform has the disadvantage of requiring coefficients for both sine and cosine components of each frequency. In the cosine transform, the input waveform is time-mirrored with itself prior to multiplication by the basis functions. Figure 2.2 shows that this mirroring cancels out all sine components and doubles all of the cosine components. The sine basis function is unnecessary and only one coefficient is needed for each frequency.
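The correlation argument of Figures 2.1 and 2.2 can be checked numerically. The sketch below replaces integration with a sum over samples and shows that a cosine input correlates with a cosine basis function of the same frequency, but not with a sine of the same frequency or with a different frequency, which is why the Fourier approach needs two searches. It then mirrors a short block with itself and shows that the resulting DFT bins, after removing a half-sample phase shift, are purely real and equal to twice a cosine-only sum. That cosine-only sum is the DCT-II, one common discrete form of the cosine transform; the block values and function names are hypothetical.

```python
import cmath
import math


def correlate(signal, basis_function):
    """Multiply sample by sample and sum: a discrete stand-in for integrating."""
    return sum(s * b for s, b in zip(signal, basis_function))


def basis(kind, freq, n_samples):
    """Sampled cosine or sine basis function with `freq` cycles per block."""
    fn = math.cos if kind == "cos" else math.sin
    return [fn(2 * math.pi * freq * n / n_samples) for n in range(n_samples)]


def dct_ii(x, k):
    """Cosine-only coefficient k of x (unnormalized DCT-II)."""
    n = len(x)
    return sum(x[m] * math.cos(math.pi * k * (2 * m + 1) / (2 * n))
               for m in range(n))


def mirrored_dft_coefficient(x, k):
    """Mirror x with itself, take DFT bin k of the 2N samples, and remove
    the half-sample phase shift introduced by the mirror point."""
    n = len(x)
    y = list(x) + list(reversed(x))  # time-mirrored: x followed by x backwards
    big_n = 2 * n
    yk = sum(y[m] * cmath.exp(-2j * math.pi * k * m / big_n)
             for m in range(big_n))
    return yk * cmath.exp(-1j * math.pi * k / big_n)


if __name__ == "__main__":
    N = 64
    x = [math.cos(2 * math.pi * 3 * n / N) for n in range(N)]
    print(round(correlate(x, basis("cos", 3, N)), 6))  # about 32: in phase
    print(round(correlate(x, basis("sin", 3, N)), 6))  # about 0: quadrature
    print(round(correlate(x, basis("cos", 5, N)), 6))  # about 0: other frequency

    block = [10, 20, 30, 40, 35, 25, 15, 5]
    for k in range(4):
        c = mirrored_dft_coefficient(block, k)
        print(k, round(c.real, 6), round(c.imag, 6), round(2 * dct_ii(block, k), 6))
```

The imaginary (sine) part of every mirrored coefficient comes out as zero to rounding error, so one real number per frequency is enough.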
The discrete cosine transform (DCT) is the sampled version of the cosine transform and is used extensively in two-dimensional form in MPEG. A block of 8 x 8 pixels is transformed to become a block of 8 x 8 coefficients.
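A direct, unoptimized 8 x 8 two-dimensional DCT can be written in a few lines. The sketch assumes the orthonormal DCT-II scaling; MPEG's exact arithmetic and precision requirements are not reproduced, and the names are hypothetical. With this scaling the top-left (DC) coefficient is eight times the block average, so a block of 8-bit pixels all at 255 gives a DC value of 2040, which needs 11 bits; only the DC range is checked here. A pattern that varies only horizontally produces non-zero coefficients only in the top row.

```python
import math

SIZE = 8  # MPEG transforms 8 x 8 blocks


def scale(k):
    """Orthonormal DCT-II scale factor for frequency index k."""
    return math.sqrt(1 / SIZE) if k == 0 else math.sqrt(2 / SIZE)


def dct_2d(block):
    """Direct 2-D DCT-II of an 8 x 8 block given as 8 rows of 8 pixel values.

    out[v][u]: u is horizontal frequency (increasing across the top row),
    v is vertical frequency (increasing down the left column).
    """
    out = [[0.0] * SIZE for _ in range(SIZE)]
    for v in range(SIZE):
        for u in range(SIZE):
            total = 0.0
            for y in range(SIZE):
                for x in range(SIZE):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * SIZE))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * SIZE)))
            out[v][u] = scale(u) * scale(v) * total
    return out


if __name__ == "__main__":
    flat = [[255] * SIZE for _ in range(SIZE)]
    dc = dct_2d(flat)[0][0]
    print(round(dc, 6), int(round(dc)).bit_length())  # DC value and bits needed
```

A flat block puts all of its energy into the single DC coefficient, which is the simplest case of the many zero or near-zero coefficients that make later compression possible.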
Since the transform requires
multiplication by fractions,
there is wordlength extension,
Cosine Component resulting in coefficients that
Coherent Through Mirror
have longer wordlength than the
pixel values. Typically an 8-bit
Figure 2.2.

pixel block results in an 11-bit coefficient block. Thus, a DCT does not result in any compression; in fact, it results in the opposite. However, the DCT converts the source pixels into a form where compression is easier.

Figure 2.3 shows the results of an inverse transform of each of the individual coefficients of an 8 x 8 DCT. In the case of the luminance signal, the top-left coefficient is the average brightness or DC component of the whole block. Moving across the top row, horizontal spatial frequency increases. Moving down the left column, vertical spatial frequency increases. In real pictures, different vertical and horizontal spatial frequencies may occur simultaneously, and the coefficients elsewhere in the block represent all of the possible combinations of horizontal and vertical frequency.

Figure 2.3 also shows the coefficients as a one-dimensional horizontal waveform. Combining these waveforms with various amplitudes and either polarity can reproduce any combination of 8 pixels. Thus, combining the 64 coefficients of the 2-D DCT will result in the original 8 x 8 pixel block. Clearly, for color pictures, the color difference samples will also need to be handled. Y, Cr, and Cb data are assembled into separate 8 x 8 arrays and are transformed individually.

Figure 2.3. Inverse transforms of the individual coefficients of an 8 x 8 DCT (horizontal frequency H increases across the block, vertical frequency V increases down it), with the horizontal spatial frequency waveforms.

In much real program material, many of the coefficients will have zero or near-zero values and, therefore, will not be transmitted. This fact results in significant compression that is virtually lossless. If a higher compression factor is needed, then the wordlength of the non-zero coefficients must be reduced. This reduction will reduce the accuracy of these coefficients and will introduce losses into the process. With care, the losses can be introduced in a way that is least visible to the viewer.

2.3 Weighting
Figure 2.4 shows that the human perception of noise in pictures is not uniform but is a function of the spatial frequency. More noise can be tolerated at high spatial frequencies. Also, video noise is effectively masked by fine detail in the picture, whereas in plain areas it is highly visible. The reader will be aware that traditional noise measurements are always weighted so that the technical measurement relates to the subjective result.

Figure 2.4. Human vision sensitivity plotted against spatial frequency.

Compression reduces the accuracy of coefficients and has a similar effect to using shorter wordlength samples in PCM; that is, the noise level rises. In PCM, the result of shortening the wordlength is that the noise level rises equally at all frequencies. As the DCT splits the signal into different frequencies, it becomes possible to control the spectrum of the noise. Effectively, low-frequency coefficients are rendered more accurately than high-frequency coefficients by a process of weighting.
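The 8 x 8 DCT described above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration that assumes the orthonormal DCT-II scaling. Real coders use fast factorized transforms and may scale differently, and the names dct2, idct2 and BASIS are hypothetical. With this scaling the top-left (DC) coefficient is eight times the block average, so a block of 8-bit pixels all at 255 gives 2040. That value needs 11 bits, in line with the wordlength extension mentioned above.

```python
import math

N = 8  # MPEG transforms 8 x 8 blocks


def _scale(k):
    # Orthonormal DCT-II normalisation (an assumption; other scalings exist)
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


# BASIS[k][n]: value of the k-th cosine basis function at sample n
BASIS = [[_scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
         for k in range(N)]


def dct2(block):
    """Forward 2-D DCT. block[y][x] holds pixels; result[v][u] holds coefficients,
    u = horizontal and v = vertical spatial frequency, result[0][0] = DC."""
    return [[sum(BASIS[v][y] * BASIS[u][x] * block[y][x]
                 for y in range(N) for x in range(N))
             for u in range(N)]
            for v in range(N)]


def idct2(coeffs):
    """Inverse 2-D DCT: sums all 64 weighted basis patterns to rebuild the pixels."""
    return [[sum(BASIS[v][y] * BASIS[u][x] * coeffs[v][u]
                 for v in range(N) for u in range(N))
             for x in range(N)]
            for y in range(N)]


if __name__ == "__main__":
    flat = [[128] * N for _ in range(N)]
    print(round(dct2(flat)[0][0]))  # 1024: DC = 8 x average; every AC term is ~0
    ramp = [[x * 16 + y for x in range(N)] for y in range(N)]
    rebuilt = idct2(dct2(ramp))
    print(max(abs(rebuilt[y][x] - ramp[y][x]) for y in range(N) for x in range(N)) < 1e-9)  # True
```

Because the basis is orthonormal, the inverse reuses the same table. The round trip is exact apart from floating-point error, which illustrates why the DCT on its own is lossless.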

Figure 2.5 shows that, in the weighting process, the coefficients from the DCT are divided by constants that are a function of two-dimensional frequency. Low-frequency coefficients will be divided by small numbers, and high-frequency coefficients will be divided by large numbers. Following the division, the least-significant bit is discarded or truncated. This truncation is a form of requantizing. In the absence of weighting, this requantizing would have the effect of doubling the size of the quantizing step, but with weighting, it increases the step size according to the division factor.

As a result, coefficients representing low spatial frequencies are requantized with relatively small steps and suffer little increased noise. Coefficients representing higher spatial frequencies are requantized with large steps and suffer more noise. However, fewer steps means that fewer bits are needed to identify the step, and compression is obtained.

In the decoder, a low-order zero will be added to return the weighted coefficients to their correct magnitude. They will then be multiplied by inverse weighting factors. Clearly, at high frequencies the multiplication factors will be larger, so the requantizing noise will be greater. Following inverse weighting, the coefficients will have their original DCT output values, plus requantizing error, which will be greater at high frequency than at low frequency.

As an alternative to truncation, weighted coefficients may be nonlinearly requantized so that the quantizing step size increases with the magnitude of the coefficient. This technique allows higher compression factors but worse levels of artifacts.

Clearly, the degree of compression obtained and, in turn, the output bit rate obtained, is a function of the severity of the requantizing process. Different bit rates will require different weighting tables. In MPEG, it is possible to use various weighting tables, and the table in use can be transmitted to the decoder so that correct decoding automatically occurs.

Figure 2.5. Weighting: each input DCT coefficient is divided by the quant matrix value for its location and then by a single quant scale value for the complete 8 x 8 block.

Input DCT coefficients (a more complex block):
7842  199  448  362  342  112   31   22
 198  151  181  264   59   37   14    3
 142  291  218   87   27   88   27   12
 111  133  159  119   58   65   36    2
  49   85  217   50    8    3   14   12
  58  120   60   40   41   11    2    1
  30  121   61   22   30    1    0    1
  22   28    2   33   24   51   44   81

Output DCT coefficients (values for display only, not actual results):
 980   12   23   16   13    4    1    0
  12    9    8   11    2    1    0    0
   7   13    8    3    0    2    0    1
   5    6    6    4    2    1    0    0
   2    3    8    1    0    0    0    0
   2    4    2    1    1    0    0    0
   1    4    2    1    0    0    0    0
   0    0    1    0    0    0    0    0

Quant matrix values (the value used corresponds to the coefficient location):
 8  16  19  22  26  27  29  34
16  16  22  24  27  29  34  37
19  22  26  27  29  34  34  38
22  22  26  27  29  34  37  40
22  26  27  29  32  35  40  48
26  27  29  32  35  40  48  58
26  27  29  34  38  48  56  69
27  29  35  38  46  56  69  83

Quant scale values (one value used for the complete 8 x 8 block; not all code values are shown):
Code   Linear quant scale   Non-linear quant scale
   1                    2                        1
   8                   16                        8
  16                   32                       24
  20                   40                       40
  24                   48                       56
  28                   56                       88
  31                   62                      112
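The weighting and truncation described in section 2.3 can be sketched as follows, using the quant matrix printed in Figure 2.5. This is a simplified requantizer, not the exact MPEG-2 arithmetic, which involves additional scaling and treats the intra DC coefficient separately. Here qscale is just a number rather than a code from the quant scale table, and the function names are hypothetical. With a quant scale of 1, the first row of the figure's input reproduces the first row of its output. The figure notes that its values are for display only, so other entries may differ from this simple rule.

```python
# Weighting (quant matrix) values copied from Figure 2.5; entry [v][u]
# applies to the coefficient at the same location.
QUANT_MATRIX = [
    [8, 16, 19, 22, 26, 27, 29, 34],
    [16, 16, 22, 24, 27, 29, 34, 37],
    [19, 22, 26, 27, 29, 34, 34, 38],
    [22, 22, 26, 27, 29, 34, 37, 40],
    [22, 26, 27, 29, 32, 35, 40, 48],
    [26, 27, 29, 32, 35, 40, 48, 58],
    [26, 27, 29, 34, 38, 48, 56, 69],
    [27, 29, 35, 38, 46, 56, 69, 83],
]


def weight_and_truncate(coeffs, qscale=1):
    """Divide each coefficient by its weight and by one block-wide quant scale,
    then truncate toward zero (simplified requantizing)."""
    return [[int(coeffs[v][u] / (QUANT_MATRIX[v][u] * qscale)) for u in range(8)]
            for v in range(8)]


def inverse_weight(levels, qscale=1):
    """Decoder side: multiply back by the same factors. The remaining error is
    bounded by the step size, so it is larger where the weights are larger."""
    return [[levels[v][u] * QUANT_MATRIX[v][u] * qscale for u in range(8)]
            for v in range(8)]


if __name__ == "__main__":
    block = [[0] * 8 for _ in range(8)]
    block[0] = [7842, 199, 448, 362, 342, 112, 31, 22]  # first input row of Figure 2.5
    print(weight_and_truncate(block)[0])  # [980, 12, 23, 16, 13, 4, 1, 0]

    # The same coefficient value suffers more requantizing error at high frequency
    uniform = [[100] * 8 for _ in range(8)]
    rebuilt = inverse_weight(weight_and_truncate(uniform))
    print(100 - rebuilt[0][0], 100 - rebuilt[7][7])  # 4 17
```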

2.4 Scanning
In typical program material, the significant DCT coefficients are generally found in the top-left corner of the matrix. After weighting, low-value coefficients might be truncated to zero. More efficient transmission can be obtained if all of the non-zero coefficients are sent first, followed by a code indicating that the remainder are all zero. Scanning is a technique which increases the probability of achieving this result, because it sends coefficients in descending order of their probability of having a large magnitude.

Figure 2.6a shows that in a non-interlaced system, the probability of a coefficient having a high value is highest in the top-left corner and lowest in the bottom-right corner. A 45-degree diagonal zig-zag scan is the best sequence to use here.

In Figure 2.6b, the scan for an interlaced source is shown. In an interlaced picture, an 8 x 8 DCT block from one field extends over twice the vertical screen area, so that for a given picture detail, vertical frequencies will appear to be twice as great as horizontal frequencies. Thus, the ideal scan for an interlaced picture will be on a diagonal that is twice as steep. Figure 2.6b shows that a given vertical spatial frequency is scanned before scanning the same horizontal spatial frequency.

Figure 2.6. a) Zigzag or classic scan (nominally for frames); b) alternate scan (nominally for fields).

2.5 Entropy coding
In real video, not all spatial frequencies are simultaneously present; therefore, the DCT coefficient matrix will have zero terms in it. Despite the use of scanning, zero coefficients will still appear between the significant values. Run length coding (RLC) allows these coefficients to be handled more efficiently. Where repeating values, such as a string of 0s, are present, run length coding simply transmits the number of zeros rather than each individual zero.

The probability of occurrence of particular coefficient values in real video can be studied. In practice, some values occur very often; others occur less often. This statistical information can be used to achieve further compression using variable length coding (VLC). Frequently occurring values are converted to short code words, and infrequent values are converted to long code words. To aid deserialization, no code word can be the prefix of another.

2.6 A spatial coder
Figure 2.7 ties together all of the preceding spatial coding concepts. The input signal is assumed to be 4:2:2 SDI (Serial Digital Interface), which may have 8- or 10-bit wordlength. MPEG uses only 8-bit resolution; therefore, a rounding stage will be needed when the SDI signal contains 10-bit words. Most MPEG profiles operate with 4:2:0 sampling; therefore, a vertical low-pass filter/interpolation stage will be needed. Rounding and color subsampling introduce a small irreversible loss of information and a proportional reduction in bit rate. The raster-scanned input format will need to be stored so that it can be converted to 8 x 8 pixel blocks.

Figure 2.7. A spatial coder. Full-bit-rate 10-bit 4:2:2 data is converted to 8-bit 4:2:0 (information lost, data reduced), transformed by the DCT (no loss, no data reduction), quantized using the quantizing data (data reduced, information lost) and entropy coded (data reduced, no loss), then passes through a buffer as compressed data. Rate control links the buffer back to the quantizer. Quantizing reduces the number of bits for each coefficient and gives preference to certain coefficients, so the reduction can differ for each coefficient. Entropy coding comprises variable length coding, which uses short words for the most frequent values (like Morse code), and run length coding, which sends a unique code word instead of strings of zeros.
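The zig-zag scan of section 2.4 and the run length coding of section 2.5 can be combined in a short Python sketch. It is an illustration under simplifying assumptions. Only the classic scan is generated, not the alternate scan. The output is (zero run, level) pairs plus an end-of-block marker rather than actual VLC code words, and the intra DC coefficient, which MPEG codes separately, is treated like any other. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """Classic zig-zag scan as (row, column) positions: start at DC, then sweep
    alternate 45-degree diagonals so that low frequencies are sent first."""
    order = []
    for s in range(2 * n - 1):                 # s = row + column on one diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)              # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def run_length_code(levels):
    """Scan a block of quantized levels and emit (zero_run, level) pairs,
    ending with an end-of-block marker once only zeros remain."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(levels)):
        value = levels[r][c]
        if value == 0:
            run += 1                           # count zeros instead of sending each one
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")                        # trailing zeros are implied, not sent
    return pairs


if __name__ == "__main__":
    block = [[0] * 8 for _ in range(8)]
    block[0][0], block[0][1], block[1][0], block[2][0], block[0][2] = 980, 12, 7, 5, 3
    print(run_length_code(block))
    # [(0, 980), (0, 12), (0, 7), (0, 5), (1, 3), 'EOB']
```

Of the 64 positions, 58 zeros collapse into a single run count and the end-of-block marker. That is the saving the scanning step is designed to produce.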

The DCT stage transforms the picture information to the frequency domain. The DCT itself does not achieve any compression. Following DCT, the coefficients are weighted and truncated, providing the first significant compression. The coefficients are then zig-zag scanned to increase the probability that the significant coefficients occur early in the scan. After the last non-zero coefficient, an EOB (end of block) code is generated.

Coefficient data are further compressed by run length and variable length coding. In a variable bit-rate system, the quantizing is fixed, but in a fixed bit-rate system, a buffer memory is used to absorb variations in coding difficulty. Highly detailed pictures will tend to fill the buffer, whereas plain pictures will allow it to empty. If the buffer is in danger of overflowing, the requantizing steps will have to be made larger, so that the compression factor is effectively raised.

In the decoder, the bit stream is deserialized and the entropy coding is reversed to reproduce the weighted coefficients. The inverse weighting is applied and coefficients are placed in the matrix according to the zig-zag scan to recreate the DCT matrix. Following an inverse transform, the 8 x 8 pixel block is recreated. To obtain a raster-scanned output, the blocks are stored in RAM, which is read a line at a time. To obtain a 4:2:2 output from 4:2:0 data, a vertical interpolation process will be needed, as shown in Figure 2.8.

The chroma samples in 4:2:0 are positioned halfway between luminance samples in the vertical axis so that they are evenly spaced when an interlaced source is used.

Figure 2.8. Positions of luminance (Y) and chrominance (Cb, Cr) samples on lines N to N+3 for 4:2:2 (Rec 601), 4:1:1 and 4:2:0 sampling.

2.7 Temporal coding
Temporal redundancy can be exploited by inter-coding, or transmitting only the differences between pictures. Figure 2.9 shows that a one-picture delay combined with a subtractor can compute the picture differences. The picture difference is an image in its own right and can be further compressed by the spatial coder as was previously described. The decoder reverses the spatial coding and adds the difference picture to the previous picture to obtain the next picture.

Figure 2.9. The previous picture, obtained through a one-picture delay, is subtracted from this picture to form the picture difference.

There are some disadvantages to this simple system. First, as only differences are sent, it is impossible to begin decoding after the start of the transmission. This limitation makes it difficult for a decoder to provide pictures following a switch from one bit stream to another (as occurs when the viewer changes channels). Second, if any part of the difference data is incorrect, the error in the picture will propagate indefinitely.

The solution to these problems is to use a system that is not completely differential. Figure 2.10 shows that complete pictures are sent periodically. These are called Intra-coded pictures (or I-pictures), and they are obtained by spatial compression only. If an error or a channel switch occurs, it will be possible to resume correct decoding at the next I-picture.

Figure 2.10. A sequence that starts with an intra picture, continues with temporal difference pictures, and restarts at the next intra picture.
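A toy model shows both weaknesses of purely differential coding and how periodic I pictures cure them. In this sketch a "picture" is just a short list of pixel values, differences are sent losslessly, and encode, decode and intra_period are hypothetical names. It illustrates the principle only and does not reflect how MPEG formats its data.

```python
def encode(pictures, intra_period):
    """Send a complete (intra) picture every intra_period pictures,
    otherwise only the difference from the previous picture."""
    stream, previous = [], None
    for i, picture in enumerate(pictures):
        if i % intra_period == 0:
            stream.append(("I", list(picture)))
        else:
            stream.append(("D", [a - b for a, b in zip(picture, previous)]))
        previous = picture
    return stream


def decode(stream):
    """Rebuild pictures; None means nothing can be shown yet because no
    intra picture has been received since decoding began."""
    output, reference = [], None
    for kind, data in stream:
        if kind == "I":
            reference = list(data)  # restart point: needs no earlier picture
        elif reference is not None:
            reference = [r + d for r, d in zip(reference, data)]  # add the difference picture
        output.append(None if reference is None else list(reference))
    return output


if __name__ == "__main__":
    pictures = [[10, 20], [11, 21], [12, 22], [13, 23], [14, 24], [15, 25]]
    stream = encode(pictures, intra_period=3)
    print(decode(stream[1:]))   # joining late: [None, None, [13, 23], [14, 24], [15, 25]]

    stream[1] = ("D", [5, 1])   # corrupt one difference picture
    print(decode(stream)[1:4])  # [[15, 21], [16, 22], [13, 23]]: error lasts until the next I picture
```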

2.8 Motion compensation
Motion reduces the similarities between pictures and increases the data needed to create the difference picture. Motion compensation is used to increase the similarity. Figure 2.11 shows the principle. When an object moves across the TV screen, it may appear in a different place in each picture, but it does not change in appearance very much. The picture difference can be reduced by measuring the motion at the encoder. This is sent to the decoder as a vector. The decoder uses the vector to shift part of the previous picture to a more appropriate place in the new picture.

Figure 2.11. Part of a moving object in picture N and picture N+1, with its motion vector. Actions: 1. Compute the motion vector. 2. Shift data from picture N using the vector to make a predicted picture N+1. 3. Compare the actual picture with the predicted picture. 4. Send the vector and the prediction error.

One vector controls the shifting of an entire area of the picture that is known as a macroblock. The size of the macroblock is determined by the DCT coding and the color subsampling structure. Figure 2.12a shows that, with a 4:2:0 system, the vertical and horizontal spacing of color samples is exactly twice the spacing of luminance. A single 8 x 8 DCT block of color samples extends over the same area as four 8 x 8 luminance blocks; therefore, this is the minimum picture area which can be shifted by a vector. One 4:2:0 macroblock contains four luminance blocks, one Cr block and one Cb block.

In the 4:2:2 profile, color is only subsampled in the horizontal axis. Figure 2.12b shows that in 4:2:2, a single 8 x 8 DCT block of color samples extends over two luminance blocks. A 4:2:2 macroblock contains four luminance blocks, two Cr blocks and two Cb blocks.

Figure 2.12. a) 4:2:0 has 1/4 as many chroma sampling points as Y: a 16 x 16 macroblock holds four 8 x 8 Y blocks, one 8 x 8 Cr block and one 8 x 8 Cb block. b) 4:2:2 has twice as much chroma data as 4:2:0: four 8 x 8 Y blocks, two Cr blocks and two Cb blocks.

The motion estimator works by comparing the luminance data from two successive pictures. A macroblock in the first picture is used as a reference. When the input is interlaced, pixels will be at different vertical locations in the two fields, and it will, therefore, be necessary to interpolate one field before it can be compared with the other. The correlation between the reference and the next picture is measured at all possible displacements with a resolution of half a pixel over the entire search range. When the greatest correlation is found, this correlation is assumed to represent the correct motion.

The motion vector has a vertical and horizontal component. In typical program material, motion continues over a number of pictures. A greater compression factor is obtained if the vectors are transmitted differentially. For example, if an object moves at constant speed, the vectors do not change and the vector difference is zero.

Motion vectors are associated with macroblocks, not with real objects in the image, and there will be occasions where part of the macroblock moves and part of it does not. In this case, it is impossible to compensate properly. If the motion of the moving part is compensated by transmitting a vector, the stationary part will be incorrectly shifted, and it will need difference data to be corrected. If no vector is sent, the stationary part will be correct, but difference data will be needed to correct the moving part. A practical compressor might attempt both strategies and select the one which requires the least difference data.
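The motion estimator described above can be approximated by an exhaustive block-matching search. This sketch makes several simplifications. It searches whole-pixel displacements only, whereas the text describes half-pixel resolution. It uses the sum of absolute differences (SAD), a common match measure, in place of correlation, so the lowest SAD stands for the greatest correlation. It does no field interpolation. The names sad and motion_search are hypothetical.

```python
import random


def sad(ref, cur, top, left, dy, dx, size):
    """Sum of absolute differences between the block at (top, left) in the current
    picture and the block displaced by (dy, dx) in the reference picture."""
    return sum(abs(cur[top + y][left + x] - ref[top + dy + y][left + dx + x])
               for y in range(size) for x in range(size))


def motion_search(ref, cur, top, left, size=16, search=7):
    """Exhaustive whole-pixel search; returns ((dy, dx), sad) for the best match."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= top + dy <= height - size and 0 <= left + dx <= width - size):
                continue  # the candidate block would fall outside the reference picture
            cost = sad(ref, cur, top, left, dy, dx, size)
            if best is None or cost < best[1]:
                best = ((dy, dx), cost)
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    ref = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
    # Current picture: the content has moved up 2 lines and right 3 pixels
    cur = [[ref[(y + 2) % 32][(x - 3) % 32] for x in range(32)] for y in range(32)]
    print(motion_search(ref, cur, top=8, left=8, size=8, search=4))  # ((2, -3), 0)
```

The vector points from the block's position in the current picture to the place in the previous picture its data should be taken from. The search cost grows with the square of the search range, which is why practical estimators often use faster, non-exhaustive strategies.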

2.9 Bidirectional coding
When an object moves, it conceals the background at its leading edge and reveals the background at its trailing edge. The revealed background requires new data to be transmitted because the area of background was previously concealed and no information can be obtained from a previous picture. A similar problem occurs if the camera pans: new areas come into view and nothing is known about them. MPEG helps to minimize this problem by using bidirectional coding, which allows information to be taken from pictures before and after the current picture. If a background is being revealed, it will be present in a later picture, and the information can be moved backwards in time to create part of an earlier picture.

Figure 2.13 shows the concept of bidirectional coding. On an individual macroblock basis, a bidirectionally coded picture can obtain motion-compensated data from an earlier or later picture, or even use an average of earlier and later data. Bidirectional coding significantly reduces the amount of difference data needed by improving the degree of prediction possible. MPEG does not specify how an encoder should be built, only what constitutes a compliant bit stream. However, an intelligent compressor could try all three coding strategies and select the one that results in the least data to be transmitted.

Figure 2.13. Pictures N, N+1 and N+2 along a time axis. An area revealed in picture N+1 is not in picture N, but it is in picture N+2.

2.10 I, P and B pictures
In MPEG, three different types of pictures are needed to support differential and bidirectional coding while minimizing error propagation:

I pictures are Intra-coded pictures that need no additional information for decoding. They require a lot of data compared to other picture types, and therefore they are not transmitted any more frequently than necessary. They consist primarily of transform coefficients and have no vectors. I pictures allow the viewer to switch channels, and they arrest error propagation.

P pictures are forward Predicted from an earlier picture, which could be an I picture or a P picture. P-picture data consists of vectors describing where, in the previous picture, each macroblock should be taken from, and of transform coefficients that describe the correction or difference data that must be added to that macroblock. P pictures require roughly half the data of an I picture.

B pictures are Bidirectionally predicted from earlier or later I or P pictures. B-picture data consists of vectors describing where in earlier or later pictures data should be taken from. It also contains the transform coefficients that provide the correction. Because bidirectional prediction is so effective, the correction data are minimal, and this helps the B picture to typically require one quarter the data of an I picture.
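The per-macroblock choice described in section 2.9 (forward, backward or averaged prediction, whichever leaves the least difference data) can be sketched like this. The sketch makes several assumptions. A "macroblock" is a short list of already motion-compensated pixel values. The sum of absolute differences stands in for "least data". The average is rounded half up. The name choose_prediction is hypothetical.

```python
def choose_prediction(current, forward, backward):
    """Pick forward, backward or averaged prediction for one macroblock by the
    smallest sum of absolute differences; return the mode and the residual."""
    candidates = {
        "forward": forward,
        "backward": backward,
        "average": [(f + b + 1) // 2 for f, b in zip(forward, backward)],  # rounded half up
    }
    costs = {mode: sum(abs(c - p) for c, p in zip(current, prediction))
             for mode, prediction in candidates.items()}
    mode = min(costs, key=costs.get)  # ties go to the first mode listed
    residual = [c - p for c, p in zip(current, candidates[mode])]
    return mode, residual


if __name__ == "__main__":
    # Steady brightening: the average of the earlier and later pictures predicts exactly
    print(choose_prediction([10, 20, 30, 40], [8, 18, 28, 38], [12, 22, 32, 42]))
    # ('average', [0, 0, 0, 0])

    # Revealed background: absent from the earlier picture, present in the later one
    print(choose_prediction([50, 50, 50], [0, 0, 0], [50, 50, 50]))
    # ('backward', [0, 0, 0])
```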

Figure 2.14 introduces the concept of the GOP or Group Of Pictures. The GOP begins with an I picture and then has P pictures spaced throughout. The remaining pictures are B pictures. The GOP is defined as ending at the last picture before the next I picture. The GOP length is flexible, but 12 or 15 pictures is a common value. Clearly, if data for B pictures are to be taken from a future picture, that data must already be available at the decoder. Consequently, bidirectional coding requires that picture data be sent out of sequence and temporarily stored. Figure 2.14 also shows that the P-picture data are sent before the B-picture data. Note that the last B pictures in the GOP cannot be transmitted until after the I picture of the next GOP, since this data will be needed to bidirectionally decode them. In order to return pictures to their correct sequence, a temporal reference is included with each picture. As the picture rate is also embedded periodically in headers in the bit stream, an MPEG file may be displayed by, for example, a personal computer, in the correct order and timescale.

Figure 2.14. Rec 601 video frames in display order: I B B P B B P B B I B B P. Elementary stream (transmission) order: I P B B P B B I B B P B B, with temporal_reference values 0 3 1 2 6 4 5 0 7 8 3 1 2.

Sending picture data out of sequence requires additional memory at the encoder and decoder and also causes delay. The number of bidirectionally coded pictures between intra- or forward-predicted pictures must be restricted to reduce cost and to minimize delay, if delay is an issue.

Figure 2.15 shows the tradeoff that must be made between compression factor and coding delay. For a given quality, sending only I pictures requires more than twice the bit rate of an IBBP sequence. Where the ability to edit is important, an IB sequence is a useful compromise.

Figure 2.15. Bit rate (Mbit/s, 10 to 50) needed for I, IB and IBBP sequences, showing a constant-quality curve with regions of higher and lower quality.
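The reordering in Figure 2.14 follows a simple rule: each B picture is held back until the later I or P picture it depends on has been sent. A minimal sketch, assuming picture types are given as a display-order string and that every B picture depends only on the anchors either side of it (coded_order is a hypothetical name):

```python
def coded_order(display_types):
    """Convert display order to transmission order. Returns (type, display_index)
    pairs; the display index lets a decoder put the pictures back in sequence."""
    out, held_b = [], []
    for index, kind in enumerate(display_types):
        if kind == "B":
            held_b.append((kind, index))  # wait until the next anchor picture is sent
        else:
            out.append((kind, index))     # I or P picture goes first...
            out.extend(held_b)            # ...then the B pictures that need it
            held_b = []
    out.extend(held_b)  # in a real stream these would wait for the next GOP's anchor
    return out


if __name__ == "__main__":
    order = coded_order("IBBPBBPBBIBBP")
    print("".join(kind for kind, _ in order))  # IPBBPBBIBBPBB
    print([index for _, index in order])       # [0, 3, 1, 2, 6, 4, 5, 9, 7, 8, 12, 10, 11]
```

The resulting sequence of types matches the elementary stream order in Figure 2.14. Sorting by display index, as a decoder does using the temporal reference, restores the original order.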

2.11 An MPEG compressor
Figures 2.16a, b, and c show a typical bidirectional motion compensator structure. Preprocessed input video enters a series of frame stores that can be bypassed to change the picture order. The data then enter the subtractor and the motion estimator. To create an I picture (see Figure 2.16a), the end of the input delay is selected and the subtractor is turned off, so that the data pass straight through to be spatially coded. Subtractor output data also pass to a frame store that can hold several pictures. The I picture is held in the store.

Figure 2.16a. Bidirectional motion-compensated coder configured for I pictures (shaded areas are unused). Its blocks include the reordering frame delay, subtractors for the forward and backward prediction errors, spatial coders and decoders, a forward/backward decision, past and future picture stores, a motion estimator, forward and backward predictors, GOP control, quantizing tables and rate control.

To encode a P picture (see Figure 2.16b), the B pictures in the input buffer are bypassed, so that the future picture is selected. The motion estimator compares the I picture in the output store with the P picture in the input store to create forward motion vectors. The I picture is shifted by these vectors to make a predicted P picture. The predicted P picture is subtracted from the actual P picture to produce the prediction error, which is spatially coded and sent along with the vectors. The prediction error, after spatial decoding, is also added to the predicted P picture to create a locally decoded P picture that also enters the output store.

Figure 2.16b. The coder of Figure 2.16a configured for P pictures (shaded areas are unused).
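Local decoding matters because the decoder never sees the original pictures, only the requantized prediction error. The following toy comparison shows what happens when the encoder predicts from its locally decoded picture versus from the original. Several assumptions apply. A "picture" is a single pixel value in a list. Motion is zero. Requantizing rounds to the nearest multiple of a step. encode, decode and requantize are hypothetical names.

```python
def requantize(values, step):
    """Coarse requantizing of the prediction error: nearest multiple of step."""
    return [(v + step // 2) // step * step for v in values]


def encode(pictures, step, local_decoding=True):
    """Code pictures[0] losslessly as an I picture and every later picture as a
    P picture predicted from the stored reference (zero motion, for simplicity)."""
    reference = list(pictures[0])
    errors = []
    for picture in pictures[1:]:
        error = requantize([p - r for p, r in zip(picture, reference)], step)
        errors.append(error)
        if local_decoding:
            reference = [r + e for r, e in zip(reference, error)]  # what the decoder will hold
        else:
            reference = list(picture)  # the original, which the decoder never sees
    return errors


def decode(first, errors):
    """Decoder: add each received prediction error to the previous decoded picture."""
    reference = list(first)
    pictures = [list(first)]
    for error in errors:
        reference = [r + e for r, e in zip(reference, error)]
        pictures.append(reference)
    return pictures


if __name__ == "__main__":
    pictures = [[0], [3], [6], [9], [12], [15]]  # one pixel brightening steadily
    for mode in (True, False):
        out = decode(pictures[0], encode(pictures, step=4, local_decoding=mode))
        print(mode, [o[0] - p[0] for o, p in zip(out, pictures)])
    # True [0, 1, 2, -1, 0, 1]: error stays within half a step
    # False [0, 1, 2, 3, 4, 5]: error accumulates (drift)
```

With local decoding, each new prediction error includes the correction for earlier requantizing errors, so they do not accumulate.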

The output store then contains an I picture and a P picture. A B picture from the input buffer can now be selected. The motion compensator (see Figure 2.16c) will compare the B picture with the I picture that precedes it and the P picture that follows it to obtain bidirectional vectors. Forward and backward motion compensation is performed to produce two predicted B pictures. These are subtracted from the current B picture. On a macroblock-by-macroblock basis, the forward or backward data are selected according to which represent the smallest differences. The picture differences are then spatially coded and sent with the vectors.

Figure 2.16c. The coder of Figure 2.16a configured for B pictures (shaded area is unused).

When all of the intermediate B pictures are coded, the input memory will once more be bypassed to create a new P picture based on the previous P picture.

Figure 2.17 shows an MPEG coder. The motion compensator output is spatially coded and the vectors are added in a multiplexer. Syntactical data is also added which identifies the type of picture (I, P or B) and provides other information to help a decoder (see section 4). The output data are buffered to allow temporary variations in bit rate. If the bit rate shows a long-term increase, the buffer will tend to fill up, and to prevent overflow the quantization process will have to be made more severe. Equally, should the buffer show signs of underflow, the quantization will be relaxed to maintain the average bit rate.

Because the pictures in the output store are locally decoded, the store contains exactly what the store in the decoder will contain, so that the results of all previous coding errors are present. These will automatically be reduced when the predicted picture is subtracted from the actual picture because the difference data will be more accurate.

Figure 2.17. An MPEG coder. Input video enters a bidirectional coder (see Figures 2.16a to 2.16c) that uses the quantizing tables. Its spatial data pass through entropy and run length coding, and its motion vectors through a differential coder, to a multiplexer (MUX) that also adds syntactical data. The multiplexed data enter a buffer that is read out by a demand clock, and rate control feeds back from the buffer.
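The buffer-driven rate control described above can be illustrated with a toy simulation. Everything about the model is a labeled assumption. A picture of a given "complexity" coded at quant scale q is taken to produce complexity / q bits. The channel drains a fixed number of bits per picture. The quant scale is set in proportion to buffer fullness, between 1 and 31 to echo the code range in Figure 2.5. Real encoders use more elaborate schemes, and rate_control is a hypothetical name.

```python
def rate_control(complexities, bits_per_picture, buffer_size, q_min=1, q_max=31):
    """Return (qscale, fullness) for each picture under a toy bit-production model."""
    fullness = buffer_size / 2  # start with the buffer half full
    trace = []
    for complexity in complexities:
        # The fuller the buffer, the coarser the quantizing (higher compression factor)
        qscale = q_min + (q_max - q_min) * fullness / buffer_size
        bits = complexity / qscale  # hypothetical model: bits fall as steps get larger
        # The coder fills the buffer while the channel drains it at a constant rate;
        # in this toy model the buffer simply cannot go below empty
        fullness = max(0.0, fullness + bits - bits_per_picture)
        trace.append((qscale, fullness))
    return trace


if __name__ == "__main__":
    # Ten plain pictures, twenty highly detailed ones, then ten plain ones again
    trace = rate_control([400] * 10 + [2000] * 20 + [400] * 10,
                         bits_per_picture=100, buffer_size=1000)
    for i in (9, 29, 39):
        q, f = trace[i]
        print(f"picture {i}: qscale {q:.1f}, buffer {f:.0f} of 1000")
```

During the detailed pictures the buffer starts to fill. That raises the quant scale, which cuts the bits per picture, so the buffer settles without overflowing. When plain pictures return, the quantizing is relaxed again.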

2.12 Preprocessing
A compressor attempts to eliminate redundancy within the picture and between pictures. Anything which reduces that redundancy is undesirable. Noise and film grain are particularly problematic because they generally occur over the entire picture. After the DCT process, noise results in more non-zero coefficients, which the coder cannot distinguish from genuine picture data. Heavier quantizing will be required to encode all of the coefficients, reducing picture quality. Noise also reduces similarities between successive pictures, increasing the difference data needed.

Residual subcarrier in video decoded from composite video is a serious problem because it results in high spatial frequencies that are normally at a low level in component programs. Subcarrier also alternates from picture to picture, causing an increase in difference data. Naturally, any composite decoding artifact that is visible in the input to the MPEG coder is likely to be reproduced at the decoder.

Any practice that causes unwanted motion is to be avoided. Unstable camera mountings, in addition to giving a shaky picture, increase picture differences and vector transmission requirements. This will also happen with telecine material if film weave or hop due to sprocket hole damage is present. In general, video that is to be compressed must be of the highest quality possible. If high quality cannot be achieved, then noise reduction and other stabilization techniques will be desirable.

If a high compression factor is required, the level of artifacts can increase, especially if input quality is poor. In this case, it may be better to reduce the entropy entering the coder using prefiltering. The video signal is subject to two-dimensional, low-pass filtering, which reduces the number of coefficients needed and reduces the level of artifacts. The picture will be less sharp, but less sharpness is preferable to a high level of artifacts.

In most MPEG-2 applications, 4:2:0 sampling is used, which requires a chroma downsampling process if the source is 4:2:2. In MPEG-1, the luminance and chroma are further downsampled to produce an input picture or SIF (Source Input Format) that is only 352 pixels wide. This technique reduces the entropy by a further factor. For very high compression, the QSIF (Quarter Source Input Format) picture, which is 176 pixels wide, is used. Downsampling is a process that combines a spatial low-pass filter with an interpolator. Downsampling interlaced signals is problematic because vertical detail is spread over two fields, which may decorrelate due to motion.

When the source material is telecine, the video signal has different characteristics from normal video. In 50 Hz video, pairs of fields represent the same film frame, and there is no motion between them. Thus, the motion between fields alternates between zero and the motion between frames. Since motion vectors are sent differentially, this behavior would result in a serious increase in vector data. In 60 Hz video, 3:2 pulldown is used to obtain 60 Hz from 24 Hz film. One frame is made into two fields, the next is made into three fields, and so on. Consequently, one field in five is completely redundant. MPEG handles film material best by discarding the third field in 3:2 systems. A 24 Hz code in the transmission alerts the decoder to recreate the 3:2 sequence by re-reading a field store. In 50 and 60 Hz telecine, pairs of fields are deinterlaced to create frames, and then motion is measured between frames. The decoder can recreate interlace by reading alternate lines in the frame store.

A cut is a difficult event for a compressor to handle because it results in an almost complete prediction failure, requiring a large amount of correction data. If a coding delay can be tolerated, a coder may detect cuts in advance and modify the GOP structure dynamically, so that the cut is made to coincide with the generation of an I picture. In this case, the cut is handled with very little extra data. The last B pictures before the I frame will almost certainly need to use forward prediction. In some applications that are not real-time, such as DVD mastering, a coder could take two passes at the input video: one pass to identify the difficult or high-entropy areas and create a coding strategy, and a second pass to actually compress the input video.
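The 3:2 pulldown cadence, and the redundancy MPEG removes from it, can be shown symbolically. In this sketch each field is represented as a (film frame, parity) pair rather than as pixel data. A real coder would have to detect repeated fields by comparing picture content and cope with edits that break the cadence. The function names are hypothetical.

```python
def pulldown_32(frames):
    """Turn 24 Hz film frames into a 60 Hz field sequence: frames alternately
    contribute 2 and 3 fields, and field parity (top/bottom) always alternates."""
    fields, parity = [], 0
    for i, frame in enumerate(frames):
        for _ in range(2 if i % 2 == 0 else 3):
            fields.append((frame, "top" if parity == 0 else "bottom"))
            parity ^= 1
    return fields


def drop_repeated_fields(fields):
    """Discard any field identical to one already kept (same frame, same parity)."""
    kept, seen = [], set()
    for field in fields:
        if field not in seen:
            seen.add(field)
            kept.append(field)
    return kept


if __name__ == "__main__":
    fields = pulldown_32("ABCD")  # four film frames
    print(len(fields), len(drop_repeated_fields(fields)))  # 10 8: one field in five is redundant
```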

2.13 Profiles and levels
MPEG is applicable to a wide range of applications requiring different performance and complexity. Using all of the encoding tools defined in MPEG, there are millions of combinations possible. For practical purposes, the MPEG-2 standard is divided into profiles, and each profile is subdivided into levels (see Figure 2.18). A profile is basically a subset of the entire coding repertoire requiring a certain complexity. A level is a parameter such as the size of the picture or bit rate used with that profile. In principle, there are 24 combinations, but not all of these have been defined. An MPEG decoder having a given Profile and Level must also be able to decode lower profiles and levels.

The simple profile does not support bidirectional coding, and so only I and P pictures will be output. This reduces the coding and decoding delay and allows simpler hardware. The simple profile has only been defined at Main level (SP@ML).

The Main Profile is designed for a large proportion of uses. The low level uses a low-resolution input having only 352 pixels per line. The majority of broadcast applications will require the MP@ML (Main Profile at Main Level) subset of MPEG, which supports SDTV (Standard Definition TV).

The high-1440 level is a high definition scheme that doubles the definition compared to the main level. The high level not only doubles the resolution but maintains that resolution with 16:9 format by increasing the number of horizontal samples from 1440 to 1920.

In compression systems using spatial transforms and requantizing, it is possible to produce scaleable signals. A scaleable process is one in which the input results in a main signal and a "helper" signal. The main signal can be decoded alone to give a picture of a certain quality, but, if the information from the helper signal is added, some aspect of the quality can be improved.

For example, a conventional MPEG coder, by heavily requantizing coefficients, encodes a picture with a moderate signal-to-noise ratio. If, however, that picture is locally decoded and subtracted pixel-by-pixel from the original, a quantizing noise picture results. This picture can be compressed and transmitted as the helper signal. A simple decoder only decodes the main, noisy bit stream, but a more complex decoder can decode both bit streams and combine them to produce a low-noise picture. This is the principle of SNR scaleability.

As an alternative, coding only the lower spatial frequencies in an HDTV picture can produce a main bit stream that an SDTV receiver can decode. If the lower definition picture is locally decoded and subtracted from the original picture, a definition-enhancing picture would result. This picture can be coded into a helper signal. A suitable decoder could combine the main and helper signals to recreate the HDTV picture. This is the principle of spatial scaleability.

The High profile supports both SNR and spatial scaleability as well as allowing the option of 4:2:2 sampling.
The 4:2:2 profile has been
4:2:0 4:2:0, 4:2:2 developed for improved compat-
HIGH 1920x1152 1920x1152 ibility with digital production
80 Mb/s 100 Mb/s
I,P,B I,P,B equipment. This profile allows
4:2:2 operation without requiring
4:2:0 4:2:0 4:2:0, 4:2:2 the additional complexity of
1440x1152 1440x1152 1440x1152 using the high profile. For
HIGH-1440 60 Mb/s 60 Mb/s 80 Mb/s
I,P,B I,P,B I,P,B example, an HP@ML decoder
must support SNR scaleability,
4:2:0 4:2:0 4:2:0 4:2:0, 4:2:2 which is not a requirement for
720x576 720x576 4:2:2
MAIN 720x608 720x576 720x576 production. The 4:2:2 profile
15 Mb/s 15 Mb/s 15 Mb/s 20 Mb/s
I,P I,P,B 50 Mb/s has the same freedom of GOP
I,P,B I,P,B I,P,B
structure as other profiles, but
4:2:0 4:2:0 in practice it is commonly used
352x288 352x288 with short GOPs making editing
LOW 4 Mb/s 4 Mb/s
I,P,B I,P,B easier. 4:2:2 operation requires a
higher bit rate than 4:2:0, and
LEVEL SIMPLE MAIN
PROFILE 4:2:2 SNR SPATIAL HIGH the use of short GOPs requires
PROFILE
an even higher bit rate for a
given quality.
Figure 2.18.
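The SNR-scalability principle can be sketched numerically. This is a minimal illustration under stated assumptions, not the MPEG-2 SNR-scalable syntax: it works on a short list of transform coefficients rather than a whole picture, and the step sizes `coarse_step` and `fine_step` (and all function names) are hypothetical. The main layer is a heavily requantized version of the input; the helper layer codes the difference between the input and the locally decoded main layer.

```python
# Sketch of SNR scalability on a list of transform coefficients.
# coarse_step and fine_step are illustrative assumptions, not MPEG-2 values.

def quantize(values, step):
    """Uniform requantizing: map each value to an integer level."""
    return [round(v / step) for v in values]


def dequantize(levels, step):
    """Map integer levels back to reconstructed values."""
    return [q * step for q in levels]


def encode_snr_scalable(coeffs, coarse_step=16.0, fine_step=2.0):
    """Return (main, helper) layers as lists of integer levels."""
    main = quantize(coeffs, coarse_step)             # heavily requantized main layer
    local = dequantize(main, coarse_step)            # local decode inside the coder
    noise = [c - d for c, d in zip(coeffs, local)]   # quantizing-noise signal
    helper = quantize(noise, fine_step)              # helper layer carries the noise
    return main, helper


def decode(main, helper=None, coarse_step=16.0, fine_step=2.0):
    """A simple decoder uses main only; a complex one also adds helper."""
    out = dequantize(main, coarse_step)
    if helper is not None:
        out = [m + h for m, h in zip(out, dequantize(helper, fine_step))]
    return out


def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


coeffs = [103.0, -47.5, 22.0, 9.3, -4.1, 1.2, 0.0, -0.7]
main, helper = encode_snr_scalable(coeffs)
print("main only  :", max_error(coeffs, decode(main)))          # at most coarse_step / 2
print("main+helper:", max_error(coeffs, decode(main, helper)))  # at most fine_step / 2
```

A decoder that ignores `helper` still produces a usable but noisier result, with its error bounded by half the coarse step; adding the helper bounds the error by half the fine step.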

2.14 Wavelets
All transforms suffer from uncertainty: the more accurately the frequency domain is known, the less accurately the time domain is known (and vice versa). In most transforms, such as the DFT and DCT, the block length is fixed, so the time and frequency resolution is fixed. The frequency coefficients represent evenly spaced values on a linear scale. Unfortunately, because human senses are logarithmic, the even scale of the DFT and DCT gives inadequate frequency resolution at one end and excess resolution at the other.

The wavelet transform is not affected by this problem because its frequency resolution is a fixed fraction of an octave and therefore has a logarithmic characteristic. This is achieved by changing the block length as a function of frequency: as frequency goes down, the block becomes longer. Thus, a characteristic of the wavelet transform is that the basis functions all contain the same number of cycles, and these cycles are simply scaled along the time axis to search for different frequencies. Figure 2.19 contrasts the fixed block size of the DFT/DCT with the variable size of the wavelet.

Wavelets are especially useful for audio coding because they automatically adapt to the conflicting requirements of the accurate location of transients in time and the accurate assessment of pitch in steady tones.

For video coding, wavelets have the advantage of producing resolution-scalable signals with almost no extra effort. In moving video, the advantages of wavelets are offset by the difficulty of assigning motion vectors to a variable-size block, but in still-picture or I-picture coding this difficulty is not an issue.

Figure 2.19. Constant-size windows in the FFT (left) compared with a constant number of cycles in each wavelet basis function (right).
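The octave-band behavior can be illustrated with the simplest wavelet, the Haar wavelet. This is a sketch under stated assumptions: the Haar basis is chosen only for brevity (practical codecs use longer filters), and the function names are hypothetical. Each analysis stage splits the signal into a half-rate low band and high band, and only the low band is split again, so every step down in frequency doubles the effective block length. The low band left after each stage is also a half-resolution version of its input, which is why wavelets give resolution scalability almost for free.

```python
import math

SQRT_HALF = 1 / math.sqrt(2)


def haar_step(x):
    """One analysis stage: split x (even length) into half-rate low and high bands."""
    low = [(x[i] + x[i + 1]) * SQRT_HALF for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * SQRT_HALF for i in range(0, len(x), 2)]
    return low, high


def haar_dwt(x, levels):
    """Octave-band decomposition: only the low band is split again.

    Returns (low, details) where details[0] is the highest octave.
    len(x) must be divisible by 2**levels.
    """
    details, low = [], list(x)
    for _ in range(levels):
        low, high = haar_step(low)
        details.append(high)
    return low, details


def haar_idwt(low, details):
    """Invert haar_dwt by recombining octaves from the lowest upwards."""
    for high in reversed(details):
        merged = []
        for a, d in zip(low, high):
            merged += [(a + d) * SQRT_HALF, (a - d) * SQRT_HALF]
        low = merged
    return low


# A slow sine wave plus a single-sample transient at n = 11.
signal = [math.sin(0.2 * n) + (1.0 if n == 11 else 0.0) for n in range(16)]
low, details = haar_dwt(signal, 3)
print([len(d) for d in details], len(low))   # [8, 4, 2] 2: bands halve each octave
```

The transient shows up in the short, high-octave coefficients, while the slow sine wave is carried mostly by the long, low-octave ones.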

SECTION 3
AUDIO COMPRESSION
Lossy audio compression is based entirely on the characteristics of human hearing, which must be considered before any description of compression is possible. Surprisingly, human hearing, particularly in stereo, is actually more critically discriminating than human vision, and consequently audio compression should be undertaken with care. As with video compression, audio compression requires a number of different levels of complexity according to the required compression factor.

3.1 The hearing mechanism
Hearing comprises physical processes in the ear and nervous/mental processes that combine to give us an impression of sound. The impression we receive is not identical to the actual acoustic waveform present in the ear canal because some entropy is lost. Audio compression systems that lose only that part of the entropy that will be lost in the hearing mechanism will produce good results.

The physical hearing mechanism consists of the outer, middle and inner ears. The outer ear comprises the ear canal and the eardrum. The eardrum converts the incident sound into a vibration in much the same way as does a microphone diaphragm. The inner ear works by sensing vibrations transmitted through a fluid. The impedance of fluid is much higher than that of air, and the middle ear acts as an impedance-matching transformer that improves power transfer.

Figure 3.1 shows that vibrations are transferred to the inner ear by the stirrup bone, which acts on the oval window. Vibrations in the fluid in the ear travel up the cochlea, a spiral cavity in the skull (shown unrolled in Figure 3.1 for clarity). The basilar membrane is stretched across the cochlea. This membrane varies in mass and stiffness along its length. At the end near the oval window, the membrane is stiff and light, so its resonant frequency is high. At the distant end, the membrane is heavy and soft and resonates at low frequency. The range of resonant frequencies available determines the frequency range of human hearing, which in most people is from 20 Hz to about 15 kHz.

Different frequencies in the input sound cause different areas of the membrane to vibrate. Each area has different nerve endings to allow pitch discrimination. The basilar membrane also has tiny muscles controlled by the nerves that together act as a kind of positive feedback system that improves the Q factor of the resonance.

The resonant behavior of the basilar membrane closely parallels the behavior of a transform analyzer. According to the uncertainty theory of transforms, the more accurately the frequency domain of a signal is known, the less accurately the time domain is known. Consequently, the better a transform can discriminate between two frequencies, the less able it is to discriminate between the times of two events. Human hearing has evolved with a certain compromise that balances time discrimination and frequency discrimination; in the balance, neither ability is perfect.

The imperfect frequency discrimination results in the inability to separate closely spaced frequencies. This inability is known as auditory masking, defined as the reduced sensitivity to one sound in the presence of another.

Figure 3.1. The hearing mechanism: outer ear (ear drum), middle ear (stirrup bone) and inner ear (cochlea, shown unrolled, with the basilar membrane).
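The time/frequency compromise described above can be put in numbers for a fixed-length transform: a block of N samples at sampling rate fs lasts N/fs seconds and resolves frequencies fs/N apart, so the product of the two is always 1. The helper below is a hypothetical illustration; the block lengths are ones that appear later in this section (384 and 1152 samples for MPEG Layers 1 and 2, 512 for AC-3), evaluated at an assumed 48 kHz sampling rate.

```python
def resolution(block_length, sample_rate):
    """Return (block duration in ms, coefficient spacing in Hz) for a
    fixed-length transform of block_length samples."""
    duration_ms = 1000.0 * block_length / sample_rate
    spacing_hz = sample_rate / block_length
    return duration_ms, spacing_hz


for n in (384, 512, 1152):
    ms, hz = resolution(n, 48_000)
    # ms * hz is always 1000, because seconds * Hz == 1.
    print(f"{n:5d} samples: {ms:6.2f} ms long, {hz:7.2f} Hz between coefficients")
```

Longer blocks give finer frequency spacing but smear events in time, which is the same compromise the ear makes.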

Figure 3.2a shows that the threshold of hearing is a function of frequency. The greatest sensitivity is, not surprisingly, in the speech range. In the presence of a single tone, the threshold is modified as in Figure 3.2b. Note that the threshold is raised for tones at higher frequencies and, to some extent, at lower frequencies. In the presence of a complex input spectrum, such as music, the threshold is raised at nearly all frequencies. One consequence of this behavior is that the hiss from an analog audio cassette is only audible during quiet passages in music. Companding makes use of this principle by amplifying low-level audio signals prior to recording or transmission and returning them to their correct level afterwards.

The imperfect time discrimination of the ear is due to its resonant response. The Q factor is such that a given sound has to be present for at least about 1 millisecond before it becomes audible. Because of this slow response, masking can still take place even when the two signals involved are not simultaneous. Forward and backward masking occur when the masking sound masks quieter sounds that occur just after or just before its actual duration. Figure 3.3 shows this concept.

Masking raises the threshold of hearing, and compressors take advantage of this effect by raising the noise floor, which allows the audio waveform to be expressed with fewer bits. The noise floor can only be raised at frequencies at which there is effective masking. To maximize effective masking, it is necessary to split the audio spectrum into different frequency bands so that different amounts of companding and noise can be introduced in each band.

3.2 Subband coding
Figure 3.4 shows a band-splitting compandor. The band-splitting filter is a set of narrow-band, linear-phase filters that overlap and all have the same bandwidth. The output in each band consists of samples representing a waveform. In each frequency band, the audio input is amplified up to maximum level prior to transmission. Afterwards, each level is returned to its correct value. Noise picked up in the transmission is reduced in each band. If the noise reduction is compared with the threshold of hearing, it can be seen that greater noise can be tolerated in some bands because of masking. Consequently, in each band after companding, it is possible to reduce the wordlength of samples. This technique achieves compression because the noise introduced by the loss of resolution is masked.

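Figure 3.2a. Threshold of hearing in quiet: sound pressure against frequency, 20 Hz to 20 kHz.
Figure 3.2b. Masking threshold raised around a 1 kHz sine wave: sound pressure against frequency, 20 Hz to 20 kHz.

Figure 3.3. Pre-masking and post-masking around a masking sound: sound pressure against time.
Figure 3.4. Band-splitting compandor: the input passes through a sub-band filter, and a level detector sets the gain factor by which each band is multiplied to produce companded audio.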
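Per-band companding followed by wordlength reduction can be sketched as follows. This is a simplified illustration, not the MPEG quantizer: the scale factor is taken as the block's peak magnitude, the quantizer is a plain symmetric rounding quantizer, and the wordlength is fixed by hand rather than chosen by a masking model; the function names are hypothetical. The point it shows is that after each band is scaled to full level, the requantizing noise in a quiet band is proportional to that band's own level, not to the level of the loudest band.

```python
def compand_band(samples, bits):
    """Scale one band's block to full level and requantize it to `bits` bits.

    Returns (scale_factor, codes). The scale factor is the block's peak
    magnitude, a simplification of the MPEG scale-factor table.
    """
    if bits < 2:
        raise ValueError("need at least 2 bits")
    scale = max(abs(s) for s in samples) or 1.0   # one gain value per block per band
    levels = 2 ** (bits - 1) - 1                  # e.g. 4 bits -> codes -7..+7
    codes = [round(s / scale * levels) for s in samples]
    return scale, codes


def expand_band(scale, codes, bits):
    """Decoder side: undo the requantizing, then restore the band's level."""
    levels = 2 ** (bits - 1) - 1
    return [c / levels * scale for c in codes]


def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


loud = [0.9, -0.7, 0.8, -0.95]        # band containing a strong signal
quiet = [0.01, -0.02, 0.015, -0.005]  # band containing a weak signal
for name, band in (("loud", loud), ("quiet", quiet)):
    scale, codes = compand_band(band, bits=4)
    err = max_error(band, expand_band(scale, codes, bits=4))
    print(name, codes, f"max error {err:.5f}")   # error <= scale / (2 * levels)
```

In a real coder the masking model would then choose a different `bits` value for each band, spending fewer bits where more noise can be hidden.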

Figure 3.5 shows a simple band-splitting coder as used in MPEG Layer 1. The digital audio input is fed to a band-splitting filter that divides the spectrum of the signal into a number of bands; in MPEG this number is 32. The time axis is divided into blocks of equal length. In MPEG Layer 1, this is 384 input samples, so at the output of the filter there are 12 samples in each of 32 bands. Within each band, the level is amplified by multiplication to bring it up to maximum. The gain required is constant for the duration of a block, and a single scale factor is transmitted with each block for each band so that the process can be reversed at the decoder.

The filter bank output is also analyzed to determine the spectrum of the input signal. This analysis drives a masking model that determines the degree of masking that can be expected in each band. The more masking available, the less accurate the samples in each band need to be. The sample accuracy is reduced by requantizing to reduce wordlength. This reduction is also constant for every word in a band, but different bands can use different wordlengths. The wordlength is transmitted as a bit allocation code for each band so that the decoder can deserialize the bit stream properly.

Figure 3.5. Band-splitting coder: a bandsplitting filter produces 32 subbands that feed a scaler and quantizer; a dynamic bit and scale factor allocator and coder, driven by the masking thresholds, controls the quantizer, and the results are multiplexed to the output.

3.3 MPEG Layer 1
Figure 3.6 shows an MPEG Layer 1 audio bit stream. Following the synchronizing pattern and the header, there are 32 bit allocation codes of four bits each. These codes describe the wordlength of samples in each subband. Next come the 32 scale factors used in the companding of each band. These scale factors determine the gain needed in the decoder to return the audio to the correct level. The scale factors are followed, in turn, by the audio data in each band.

Figure 3.6. MPEG Layer 1 frame: header (12-bit sync and 20-bit system information), optional CRC, bit allocation (4-bit linear codes for subbands 0 to 31), scale factors (6-bit linear), subband samples in 12 granules (GR0 to GR11) and ancillary data of unspecified length. One frame carries 384 PCM audio input samples, a duration of 8 ms at 48 kHz.

Figure 3.7 shows the Layer 1 decoder. The synchronization pattern is detected by the timing generator, which deserializes the bit allocation and scale factor data. The bit allocation data then allows deserialization of the variable-length samples. The requantizing is reversed, and the companding is reversed by the scale factor data to put each band back to the correct level. These 32 separate bands are then combined in a combiner filter, which produces the audio output.

Figure 3.7. Layer 1 decoder: sync detection and demultiplexing, with bit allocation and scale factor data applied to the samples of all 32 bands before the combiner filter.
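The 32-bit header at the start of each frame (12 bits of sync plus 20 bits of system information) can be decoded with a few shifts and masks. The sketch below assumes the MPEG-1 Layer 1 field layout and bit-rate table of ISO/IEC 11172-3; it is illustrative only, handles Layer 1 alone, and rejects free-format and reserved values rather than supporting them. The function and dictionary key names are hypothetical. It also derives the frame length and the 8 ms frame duration at 48 kHz shown in Figure 3.6.

```python
# Layer 1 bit rates in kbit/s for MPEG-1 (ID bit = 1), indexed by the 4-bit
# bitrate_index; index 0 (free format) and 15 (forbidden) are not supported here.
LAYER1_KBPS = [None, 32, 64, 96, 128, 160, 192, 224, 256,
               288, 320, 352, 384, 416, 448, None]
SAMPLE_RATES = {0: 44100, 1: 48000, 2: 32000}   # index 3 is reserved


def parse_layer1_header(header: bytes) -> dict:
    """Decode the 32-bit MPEG-1 Layer 1 frame header (12-bit sync + 20 bits)."""
    if len(header) < 4:
        raise ValueError("need 4 header bytes")
    word = int.from_bytes(header[:4], "big")
    if (word >> 20) != 0xFFF:
        raise ValueError("12-bit sync pattern not found")
    if ((word >> 19) & 1) != 1 or ((word >> 17) & 0b11) != 0b11:
        raise ValueError("not an MPEG-1 Layer 1 header")
    kbps = LAYER1_KBPS[(word >> 12) & 0xF]
    fs = SAMPLE_RATES.get((word >> 10) & 0b11)
    if kbps is None or fs is None:
        raise ValueError("free-format or reserved field value")
    padding = (word >> 9) & 1
    return {
        "crc_present": ((word >> 16) & 1) == 0,   # protection_bit 0 means a CRC follows
        "bitrate_kbps": kbps,
        "sample_rate": fs,
        "mode": (word >> 6) & 0b11,               # 0 stereo, 1 joint, 2 dual, 3 mono
        # Layer 1 frames are counted in 4-byte slots; each frame holds 384 samples.
        "frame_bytes": (12 * kbps * 1000 // fs + padding) * 4,
        "duration_ms": 384 * 1000 / fs,
    }


info = parse_layer1_header(bytes([0xFF, 0xFF, 0xC4, 0x00]))
print(info)   # 384 kbit/s at 48 kHz: 384-byte frames lasting 8 ms
```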

3.4 MPEG Layer 2
Figure 3.8 shows that when the band-splitting filter is used to drive the masking model, the spectral analysis is not very accurate, since there are only 32 bands and the energy could be anywhere in the band. The noise floor cannot be raised very much because, in the worst case shown, the masking may not operate. A more accurate spectral analysis would allow a higher compression factor. In MPEG Layer 2, the spectral analysis is performed by a separate process. A 1024-point FFT working directly from the input is used to drive the masking model instead. To resolve frequencies more accurately, the time span of the transform has to be increased, which is done by raising the block size to 1152 samples.

Figure 3.8. Within a wide sub-band, the coder cannot tell where the energy lies, and therefore which of two possible masking curves applies.

While the block-companding scheme is the same as in Layer 1, not all of the scale factors are transmitted, since they contain a degree of redundancy on real program material. In a given band, the scale factors of successive blocks differ by more than 2 dB less than 10% of the time, and advantage is taken of this characteristic by analyzing sets of three successive scale factors. On stationary programs, only one scale factor out of three is sent. As transient content increases in a given subband, two or three scale factors will be sent. A scale factor select code is also sent to allow the decoder to determine what has been sent in each subband. This technique effectively halves the scale factor bit rate.

3.5 Transform coding
Layers 1 and 2 are based on band-splitting filters in which the signal is still represented as a waveform. However, Layer 3 adopts transform coding similar to that used in video coding. As was mentioned above, the ear performs a kind of frequency transform on the incident sound, and because of the Q factor of the basilar membrane, the response cannot increase or decrease rapidly. Consequently, if an audio waveform is transformed into the frequency domain, the coefficients do not need to be sent very often. This principle is the basis of transform coding. For higher compression factors, the coefficients can be requantized, making them less accurate. This process produces noise, which will be placed at frequencies where the masking is greatest. A by-product of a transform coder is that the input spectrum is accurately known, so a precise masking model can be created.

3.6 MPEG Layer 3
This complex level of coding is really only required when the highest compression factor is needed. It has a degree of commonality with Layer 2. A discrete cosine transform is used having 576 output coefficients per block. This output can be obtained by direct processing of the input samples, but in a multi-level coder, it is possible to use a hybrid transform incorporating the 32-band filtering of Layers 1 and 2 as a basis. If this is done, the 32 subbands from the QMF (Quadrature Mirror Filter) are each further processed by an 18-band MDCT (Modified Discrete Cosine Transform) to obtain 576 output coefficients.

Two window sizes are used to avoid pre-echo on transients. The window switching is performed by the psychoacoustic model. It has been found that pre-echo is associated with the entropy in the audio rising above the average value. To obtain the highest compression factor, nonuniform quantizing of the coefficients is used along with Huffman coding. This technique allocates the shortest wordlengths to the most common code values.

3.7 AC-3
The AC-3 audio coding technique is used with the ATSC system instead of one of the MPEG audio coding schemes. AC-3 is a transform-based system that obtains coding gain by requantizing frequency coefficients.

The PCM input to an AC-3 coder is divided into overlapping windowed blocks, as shown in Figure 3.9. These blocks contain 512 samples each, but because successive blocks overlap by half, every input sample appears in two blocks, so there is 100% redundancy. After the transform, there are 512 coefficients in each block, but because of the redundancy, these coefficients can be decimated to 256 coefficients using a technique called Time Domain Aliasing Cancellation (TDAC).

Figure 3.9. Overlapping windowed blocks of 512 samples.
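Time Domain Aliasing Cancellation can be demonstrated with a direct (slow) MDCT. This sketch makes several assumptions: it uses a sine window, which satisfies the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1 (AC-3 specifies its own window shape), it computes the transform by brute force rather than with an FFT, and the half-block size `n_half` is a parameter so that a small value can be used for testing; AC-3 corresponds to `n_half = 256`, that is 512-sample blocks and 256 coefficients. The function names are hypothetical. Each block of 2N windowed samples yields only N coefficients, and the time-domain aliasing left by the inverse transform cancels when consecutive half-overlapped blocks are added.

```python
import math


def sine_window(two_n):
    """Sine window; satisfies w[n]**2 + w[n + N]**2 == 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def _basis(n, k, n_half):
    return math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))


def mdct(block, window):
    """2N windowed samples in, N coefficients out (direct O(N^2) form)."""
    n_half = len(block) // 2
    x = [b * w for b, w in zip(block, window)]
    return [sum(x[n] * _basis(n, k, n_half) for n in range(2 * n_half))
            for k in range(n_half)]


def imdct(coeffs, window):
    """N coefficients in, 2N windowed samples out, still containing aliasing.
    The 2/N scaling pairs with a Princen-Bradley window."""
    n_half = len(coeffs)
    return [window[n] * 2.0 / n_half *
            sum(c * _basis(n, k, n_half) for k, c in enumerate(coeffs))
            for n in range(2 * n_half)]


def tdac_roundtrip(signal, n_half):
    """Half-overlapped MDCT analysis, then IMDCT and overlap-add."""
    if len(signal) % n_half:
        raise ValueError("signal length must be a multiple of n_half")
    window = sine_window(2 * n_half)
    padded = [0.0] * n_half + list(signal) + [0.0] * n_half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n_half, n_half):
        coeffs = mdct(padded[start:start + 2 * n_half], window)
        for i, v in enumerate(imdct(coeffs, window)):
            out[start + i] += v          # time-domain aliasing cancels here
    return out[n_half:n_half + len(signal)]


sig = [math.sin(0.3 * i) + 0.5 * math.cos(1.7 * i) for i in range(32)]
rec = tdac_roundtrip(sig, n_half=8)
print(max(abs(a - b) for a, b in zip(sig, rec)))   # rounding error only
```

Although each block's inverse transform alone is corrupted by aliasing, the overlap-add restores the input, so halving the number of coefficients costs nothing.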

The input waveform is analyzed, and if there is a significant transient in the second half of the block, the waveform will be split into two to prevent pre-echo. In this case, the number of coefficients remains the same, but the frequency resolution will be halved and the temporal resolution will be doubled. A flag is set in the bit stream to indicate to the decoder that this has been done.

The coefficients are output in floating-point notation as a mantissa and an exponent. The representation is the binary equivalent of scientific notation. Exponents are effectively scale factors. The set of exponents in a block produces a spectral analysis of the input to a finite accuracy on a logarithmic scale, called the spectral envelope. This spectral analysis is the input to the masking model that determines the degree to which noise can be raised at each frequency.

The masking model drives the requantizing process, which reduces the accuracy of each coefficient by rounding the mantissae. A significant proportion of the transmitted data consists of these mantissae.

The exponents are also transmitted, but not directly, as there is further redundancy within them that can be exploited. Within a block, only the first (lowest-frequency) exponent is transmitted in absolute form. The remainder are transmitted differentially, and the decoder adds each difference to the previous exponent. Where the input audio has a smooth spectrum, the exponents in several frequency bands may be the same. Exponents can be grouped into sets of two or four, with flags that describe what has been done.

Sets of six blocks are assembled into an AC-3 sync frame. The first block of the frame always has full exponent data, but in cases of stationary signals, later blocks in the frame can use the same exponents.
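The floating-point split and the differential exponent coding can be sketched as below. This is a simplification under stated assumptions: the exponent is taken as the number of left shifts needed to bring a coefficient in the range -1 to 1 up to a magnitude of at least 0.5 (capped at 24), and the differences are left unconstrained; real AC-3 exponent coding adds constraints and grouping rules that are not modelled here. The function names are hypothetical.

```python
def split_float(coeff, max_exp=24):
    """Return (exponent, mantissa) with coeff == mantissa * 2**-exponent.

    The exponent counts the left shifts (leading zeros) needed to bring
    |coeff| up to at least 0.5; it is capped at max_exp, so tiny values keep
    a small mantissa. Assumes -1 <= coeff <= 1.
    """
    exp = 0
    while exp < max_exp and abs(coeff) * 2.0 ** exp < 0.5:
        exp += 1
    return exp, coeff * 2.0 ** exp


def encode_exponents(exps):
    """First (lowest-frequency) exponent absolute, the rest as differences."""
    return exps[0], [b - a for a, b in zip(exps, exps[1:])]


def decode_exponents(first, diffs):
    out = [first]
    for d in diffs:
        out.append(out[-1] + d)   # the decoder adds each difference to the previous exponent
    return out


coeffs = [0.8, 0.3, 0.2, 0.05, -0.04, 0.01]   # lowest frequency first
pairs = [split_float(c) for c in coeffs]
exps = [e for e, _ in pairs]
first, diffs = encode_exponents(exps)
print(exps, first, diffs)   # [0, 1, 2, 4, 4, 6] 0 [1, 1, 2, 0, 2]
```

Because neighboring exponents on a smooth spectrum differ by small amounts, the differences need far fewer bits than the absolute values, while the mantissae carry the detail that the masking model allows to be rounded.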

SECTION 4
ELEMENTARY STREAMS
An elementary stream is basically the raw output of an encoder and contains no more than is necessary for a decoder to approximate the original picture or audio. The syntax of the compressed signal is rigidly defined in MPEG so that decoders can be guaranteed to operate on it. The encoder is not defined except that it must somehow produce the right syntax.

The advantage of this approach is that it suits the real world, in which there are likely to be many more decoders than encoders. Because the decoder is standardized, decoders can be made at low cost. In contrast, the encoder can be more complex and more expensive without a great cost penalty, but with the potential for better picture quality as complexity increases. When the encoder and the decoder are different in complexity, the coding system is said to be asymmetrical.

The MPEG approach also allows for the possibility that quality will improve as coding algorithms are refined while still producing bit streams that can be understood by earlier decoders. The approach also allows the use of proprietary coding algorithms, which need not enter the public domain.

4.1 Video elementary stream syntax
Figure 4.1 shows the construction of the elementary video stream. The fundamental unit of picture information is the DCT block, which represents an 8 x 8 array of pixels that can be Y, Cr, or Cb. The DC coefficient is sent first and is represented more accurately than the other coefficients. Following the remaining coefficients, an end of block (EOB) code is sent.

Figure 4.1. Construction of the video elementary stream: blocks of coefficients form macroblocks, which form slices, pictures, groups of pictures and the video sequence. Parameters shown at these layers include motion vectors, sync pattern, I/P/B picture type, time, global vectors, open/closed GOP, picture size, aspect ratio, chroma type, progressive/interlace, picture rate, profile, level and quantizing matrices.
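The order in which a block's coefficients are sent can be sketched as a zigzag scan followed by run-length pairs. This is illustrative only and rests on assumptions: it uses the conventional zigzag scan (MPEG-2 also defines an alternate scan), represents each nonzero coefficient as a (run of zeros, level) pair, and emits a symbolic "EOB" marker instead of the real variable-length code tables. The function names are hypothetical. The DC value is returned separately because it is sent first and with more precision.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in conventional zigzag order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[1] if (rc[0] + rc[1]) % 2 == 0 else rc[0]))


def serialize_block(block):
    """Return (dc, symbols): the DC value, then (run, level) pairs for the
    nonzero AC coefficients in zigzag order, ending with 'EOB'."""
    scan = [block[r][c] for r, c in zigzag_order(len(block))]
    dc, symbols, run = scan[0], [], 0
    for level in scan[1:]:
        if level == 0:
            run += 1                  # count zeros before the next nonzero value
        else:
            symbols.append((run, level))
            run = 0
    symbols.append("EOB")             # trailing zeros are implied by EOB
    return dc, symbols


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 50, -3, 2
print(serialize_block(block))   # (50, [(0, -3), (1, 2), 'EOB'])
```

Because requantizing leaves most high-frequency coefficients at zero, the EOB code replaces a long run of zeros with a single symbol.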

Blocks are assembled into macroblocks, which are the fundamental units of a picture and which can be motion compensated. Each macroblock has a two-dimensional motion vector in the header. In B pictures, the vectors can be backward as well as forward. The motion compensation can be field or frame based, and this is indicated. The scale used for coefficient requantizing is also indicated. Using the vectors, the decoder obtains information from earlier and later pictures to produce a predicted picture. The blocks are inverse transformed to produce a correction picture that is added to the predicted picture to produce the decoded output. In 4:2:0 coding, each macroblock will have four Y blocks and two color-difference blocks. To make it possible to identify which block describes which component, the blocks are sent in a specified order.

Macroblocks are assembled into slices that must always represent horizontal strips of picture from left to right. In MPEG, slices can start anywhere and be of arbitrary size, but in ATSC (Advanced Television Systems Committee) they must start at the left-hand edge of the picture. Several slices can exist across the screen width. The slice is the fundamental unit of synchronization for variable-length and differential coding. The first vectors in a slice are sent absolutely, whereas the remaining vectors are transmitted differentially. In I pictures, the first DC coefficients in the slice are sent absolutely and the remaining DC coefficients are transmitted differentially. In difference pictures, this technique is not worthwhile.

In the case of a bit error in the elementary stream, either the deserialization of the variable-length symbols will break down or subsequent differentially coded coefficients or vectors will be incorrect. The slice structure allows recovery by providing a resynchronizing point in the bit stream.

A number of slices are combined to make a picture that is the active part of a field or a frame. The picture header defines whether the picture was I, P, or B coded and includes a temporal reference so that the picture can be presented at the correct time. In the case of pans and tilts, the vectors in every macroblock will be the same. A global vector can be sent for the whole picture, and the individual vectors then become differences from this global value.

Pictures may be combined to produce a GOP (group of pictures) that must begin with an I picture. The GOP is the fundamental unit of temporal coding. In the MPEG standard, the use of a GOP is optional, but it is a practical necessity. Between I pictures, a variable number of P and/or B pictures may be placed, as was described in Section 2. A GOP may be open or closed. In a closed GOP, the last B pictures do not require the I picture in the next GOP for decoding, and the bit stream could be cut at the end of the GOP.

If GOPs are used, several GOPs may be combined to produce a video sequence. The sequence begins with a sequence start code, followed by a sequence header, and ends with a sequence end code. Additional sequence headers can be placed throughout the sequence. This approach allows decoding to begin part way through the sequence, as might happen in playback of digital video discs and tape cassettes. The sequence header specifies the vertical and horizontal size of the picture, the aspect ratio, the chroma subsampling format, the picture rate, the use of progressive scan or interlace, the profile, level, and bit rate, and the quantizing matrices used in intra- and inter-coded pictures.

Without the sequence header data, a decoder cannot understand the bit stream, and therefore sequence headers become entry points at which decoders can begin correct operation. The spacing of entry points influences the delay in correct decoding that occurs when the viewer switches from one television channel to another.

4.2 Audio elementary streams
Various types of audio can be embedded in an MPEG-2 multiplex. These types include audio coded according to MPEG Layers 1, 2, 3, or AC-3. The type of audio encoding used must be included in a descriptor that a decoder will read in order to invoke the appropriate type of decoding.

The audio compression process is quite different from the video process. There is no equivalent to the different I, P, and B frame types, and audio frames always contain the same amount of audio data. There is no equivalent of bidirectional coding, and audio frames are not transmitted out of sequence.

In MPEG-2 audio, the descriptor in the sequence header contains the layer that has been used to compress the audio and the type of compression used (for example, joint stereo), along with the original sampling rate. The audio sequence is assembled from a number of access units (AUs), which will be coded audio frames.

If AC-3 coding is used, as in ATSC, this usage will be reflected in the sequence header. The audio access unit (AU) is an AC-3 sync frame as described in Section 3.7. The AC-3 sync frame represents a time span equivalent to 1536 audio samples and will be 32 ms for 48 kHz sampling and 48 ms for 32 kHz.
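Because sequence headers, GOP headers, pictures and slices each begin with a unique start code (the byte pattern 00 00 01 followed by a code byte), a decoder joining a stream part way through can search for them to find entry and resynchronizing points. The scanner below is a sketch: it recognizes a few MPEG-2 video start code values (00 picture, 01 to AF slice, B2 user data, B3 sequence header, B5 extension, B7 sequence end, B8 group of pictures) and labels everything else as "other"; the function names and the test stream are hypothetical.

```python
START_CODE_NAMES = {
    0x00: "picture",
    0xB2: "user data",
    0xB3: "sequence header",
    0xB5: "extension",
    0xB7: "sequence end",
    0xB8: "group of pictures",
}


def scan_start_codes(data: bytes):
    """Yield (offset, name) for each 00 00 01 xx start code in a video ES."""
    i = data.find(b"\x00\x00\x01")
    while i != -1 and i + 3 < len(data):
        code = data[i + 3]
        if 0x01 <= code <= 0xAF:
            name = f"slice (vertical position {code})"
        else:
            name = START_CODE_NAMES.get(code, f"other 0x{code:02X}")
        yield i, name
        i = data.find(b"\x00\x00\x01", i + 3)


def entry_points(data: bytes):
    """Offsets of sequence headers, where a decoder can begin decoding."""
    return [off for off, name in scan_start_codes(data) if name == "sequence header"]


# Hypothetical stream fragment with dummy payload bytes between start codes.
stream = (b"\x00\x00\x01\xB3" + b"\x11" * 4 + b"\x00\x00\x01\xB8" + b"\x22" * 4
          + b"\x00\x00\x01\x00" + b"\x33" * 4 + b"\x00\x00\x01\x01" + b"\x44" * 4)
for offset, name in scan_start_codes(stream):
    print(offset, name)
```

After a bit error, the same search finds the next slice start code, which is where differential coding restarts.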

SECTION 5
PACKETIZED ELEMENTARY STREAMS (PES)
For practical purposes, the continuous elementary streams carrying audio or video from compressors need to be broken into packets. These packets are identified by headers that contain time stamps for synchronizing. PES packets can be used to create Program Streams or Transport Streams.

5.1 PES packets
In the Packetized Elementary Stream (PES), an endless elementary stream is divided into packets of a convenient size for the application. This size might be a few hundred kilobytes, although this would vary with the application.

Each packet is preceded by a PES packet header. Figure 5.1 shows the contents of a header. The packet begins with a start code prefix of 24 bits and a stream ID that identifies the contents of the packet as video or audio and further specifies the type of audio coding. These two parameters (start code prefix and stream ID) comprise the packet start code that identifies the beginning of a packet. It is important not to confuse the packet in a PES with the much smaller packet used in transport streams that, unfortunately, shares the same name.

Figure 5.1. Contents of a PES packet header.

Because MPEG only defines the transport stream, not the encoder, a designer might choose to build a multiplexer that converts from elementary streams to a transport stream in one step. In this case, the PES packets may never exist in an identifiable form; instead, they are logically present in the transport stream payload.

5.2 Time stamps
After compression, pictures are sent out of sequence because of bidirectional coding. They require a variable amount of data and are subject to variable delay due to multiplexing and transmission. In order to keep the audio and video locked together, time stamps are periodically attached to pictures.

A time stamp is a 33-bit number that is a sample of a counter driven by a 90-kHz clock. This clock is obtained by dividing the 27-MHz program clock by 300. Since presentation times are evenly spaced, it is not essential to include a time stamp in every presentation unit. Instead, time stamps can be interpolated by the decoder, but they must not be more than 700 ms apart in either program streams or transport streams.

Time stamps indicate where a particular access unit belongs in time. Lip sync is obtained by incorporating time stamps into the headers in both video and audio PES packets. When a decoder receives a selected PES packet, it decodes each access unit and buffers it into RAM. When the time-line count reaches the value of the time stamp, the RAM is read out. This operation has two desirable results. First, effective timebase correction is obtained in each elementary stream. Second, the video and audio elementary streams can be synchronized together to make a program.

5.3 PTS/DTS
When bidirectional coding is used, a picture may have to be decoded some time before it is presented, so that it can act as the source of data for a B picture. For example, although pictures can be presented in the order IBBP, they will be transmitted in the order IPBB. Consequently, two types of time stamp exist. The decode time stamp (DTS) indicates the time when a picture must be decoded, whereas the presentation time stamp (PTS) indicates when it must be presented at the decoder output.

B pictures are decoded and presented simultaneously, so they only contain a PTS. When an IPBB sequence is received, both I and P must be decoded before the first B picture. A decoder can only decode one picture at a time; therefore the I picture is decoded first and stored. While the P picture is being decoded, the decoded I picture is output so that it can be followed by the B pictures.
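A 33-bit time stamp does not fit in whole bytes, so the PES header spreads it over five bytes with a marker bit after each section. The sketch below follows the MPEG-2 systems layout as an assumption to check against the standard: a 4-bit prefix, then bits 32 to 30, 29 to 15 and 14 to 0, each section followed by a marker bit set to 1. The function names are hypothetical. It also shows the conversion from 90 kHz ticks to seconds.

```python
PTS_ONLY, PTS_WITH_DTS, DTS = 0b0010, 0b0011, 0b0001   # 4-bit prefixes


def encode_timestamp(value: int, prefix: int = PTS_ONLY) -> bytes:
    """Pack a 33-bit time stamp into the 5-byte PES header form."""
    value &= (1 << 33) - 1                     # the counter wraps at 2**33
    return bytes([
        (prefix << 4) | (((value >> 30) & 0x07) << 1) | 1,   # bits 32..30 + marker
        (value >> 22) & 0xFF,                                # bits 29..22
        (((value >> 15) & 0x7F) << 1) | 1,                   # bits 21..15 + marker
        (value >> 7) & 0xFF,                                 # bits 14..7
        ((value & 0x7F) << 1) | 1,                           # bits 6..0 + marker
    ])


def decode_timestamp(b: bytes) -> int:
    """Recover the 33-bit value, ignoring the prefix and marker bits."""
    return ((((b[0] >> 1) & 0x07) << 30) | (b[1] << 22) |
            (((b[2] >> 1) & 0x7F) << 15) | (b[3] << 7) | ((b[4] >> 1) & 0x7F))


def ticks_to_seconds(ticks: int) -> float:
    return ticks / 90_000          # 90 kHz = 27 MHz / 300


encoded = encode_timestamp(90_000)     # one second after the counter's zero
print(encoded.hex(), decode_timestamp(encoded), ticks_to_seconds(decode_timestamp(encoded)))
```

The marker bits guarantee that the time stamp bytes can never imitate a long run of zeros, which helps keep them from being mistaken for a start code prefix.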

Figure 5.2 shows that when an access unit containing an I picture is received, it will have both DTS and PTS in the header, and these time stamps will be separated by one picture period. If bidirectional coding is being used, a P picture must follow, and this picture also has a DTS and a PTS time stamp, but the separation between the two stamp times is three picture periods to allow for the intervening B pictures. Thus, if IPBB is received, I is delayed one picture period, P is delayed three picture periods, the two Bs are not delayed at all, and the presentation sequence becomes IBBP. Clearly, if the GOP structure is changed such that there are more B pictures between I and P, the difference between DTS and PTS in the P pictures will be greater.

The PTS/DTS flags in the packet header are set to indicate the presence of a PTS alone or of both PTS and DTS. Audio packets may contain several access units, and the packet header contains a PTS. Because audio packets are never transmitted out of sequence, there is no DTS in an audio packet.

Picture         I     P1    B1    B2    P2    B3
PTS             N+1   N+4   N+2   N+3   N+7   N+5
DTS             N     N+1   -     -     N+4   -

Time (pictures) N     N+1   N+2   N+3   N+4   N+5
Decode          I     P1    B1    B2    P2    B3
Present         -     I     B1    B2    P1    B3

Figure 5.2.
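The relationship in Figure 5.2 can be reproduced with a short calculation. This is a sketch under simplifying assumptions: time is counted in picture periods, the coder uses one picture period of reordering delay, and any B pictures left at the end of the display-order string are simply sent last. The function names are hypothetical. Each I or P picture is moved ahead of the B pictures that precede it in display order; its DTS is its slot in the transmitted order and its PTS is its display position plus one. A DTS is only listed when it differs from the PTS, which is why the B pictures carry a PTS alone.

```python
def transmission_order(display):
    """Display-order picture types (e.g. 'IBBPBBP') -> display indices in
    transmission order: each I or P anchor is sent before the B pictures
    that precede it in display order."""
    order, pending_b = [], []
    for idx, kind in enumerate(display):
        if kind == "B":
            pending_b.append(idx)          # hold until the next anchor is sent
        else:
            order.append(idx)
            order.extend(pending_b)
            pending_b = []
    return order + pending_b               # simplification: trailing Bs go last


def time_stamps(display, n=0):
    """(type, PTS, DTS or None) per picture in transmission order, in picture
    periods, assuming one picture period of reordering delay."""
    stamps = []
    for slot, idx in enumerate(transmission_order(display)):
        pts = n + idx + 1                  # presented one period after its display slot
        dts = n + slot                     # decoded in its transmission slot
        stamps.append((display[idx], pts, None if dts == pts else dts))
    return stamps


for row in time_stamps("IBBPBBP"):
    print(row)
```

With `n = 0`, the first six rows match Figure 5.2 with N = 0: I has PTS 1 and DTS 0, P1 has PTS 4 and DTS 1, and P2 has PTS 7 and DTS 4.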

SECTION 6
PROGRAM STREAMS
Program Streams are one way of combining several PES packet streams and are advantageous for recording applications such as DVD.

6.1 Recording vs. transmission
For a given picture quality, the data rate of compressed video will vary with picture content. A variable bit-rate channel will give the best results. In transmission, most practical channels are fixed, and the overall bit rate is kept constant by the use of stuffing (meaningless data).

In a DVD, the use of stuffing is a waste of storage capacity. However, a storage medium can be slowed down or sped up, either physically or, in the case of a disk drive, by changing the rate of data transfer requests. This approach allows a variable-rate channel to be obtained without a capacity penalty. When a medium is replayed, the speed can be adjusted to keep a data buffer approximately half full, irrespective of the actual bit rate, which can change dynamically. If the decoder reads from the buffer at an increased rate, it will tend to empty the buffer, and the drive system will simply increase the access rate to restore balance. This technique only works if the audio and video were encoded from the same clock; otherwise, they will slip over the length of the recording.

To satisfy these conflicting requirements, Program Streams and Transport Streams have been devised as alternatives. A Program Stream works well on a single program with variable bit rate in a recording environment; a Transport Stream works well on multiple programs in a fixed bit-rate transmission environment.

The problem of genlocking to the source does not occur in a DVD player. The player determines the time base of the video with a local SPG (internal or external) and simply obtains data from the disk in order to supply pictures on that time base. In transmission, the decoder has to recreate the time base at the encoder, or it will suffer overflow or underflow. Thus, a Transport Stream uses a Program Clock Reference (PCR), whereas a Program Stream has no need for the Program Clock.

6.2 Introduction to program streams
A Program Stream is a PES packet multiplex that carries several elementary streams that were encoded using the same master clock or system time clock (STC). This stream might be a video stream and its associated audio streams, or a multichannel audio-only program. The elementary video stream is divided into Access Units (AUs), each of which contains compressed data describing one picture. These pictures are identified as I, P, or B, and each carries an Access Unit number that indicates the correct display sequence. One video Access Unit becomes one program-stream packet. In video, these packets vary in size. For example, an I picture packet will be much larger than a B picture packet. Digital audio Access Units are generally of the same size, and several are assembled into one program-stream packet. These packets should not be confused with transport stream packets, which are smaller and of fixed size. Video and audio Access Unit boundaries rarely coincide on the time axis, but this lack of coincidence is not a problem because each boundary has its own time stamp structure.
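The half-full buffer rule from Section 6.1 can be sketched as a simple proportional adjustment of the drive's transfer rate. The function name, the multiplicative form, and the `gain` value are illustrative assumptions, not taken from any DVD specification.

```python
def adjust_drive_rate(rate: float, fill: float, capacity: float, gain: float = 0.5) -> float:
    """Return a new drive transfer rate that steers the buffer toward half full.

    Proportional control: the further the buffer is below half full, the faster
    the drive requests data; above half full, it requests data more slowly.
    """
    error = 0.5 - fill / capacity   # positive when the buffer is less than half full
    return max(0.0, rate * (1.0 + gain * error))
```

For a 100-unit buffer, a fill of 25 raises a rate of 100 to 112.5, a fill of 75 lowers it to 87.5, and a fill of exactly 50 leaves it unchanged. A practical controller would also filter the fill measurement and limit how quickly the rate may change.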

SECTION 7
TRANSPORT STREAMS
A transport stream is more than a multiplex of many PES packets. In Program Streams, time stamps are sufficient to recreate the time axis because the audio and video are locked to a common clock. For transmission down a data network over distance, there is an additional requirement to recreate the clock for each program at the decoder. This requires an additional layer of syntax to provide program clock reference (PCR) signals.

7.1 The job of a transport stream
The transport stream carries many different programs, and each may use a different compression factor and a bit rate that can change dynamically even though the overall bit rate stays constant. This behavior is called statistical multiplexing, and it allows a program that is handling difficult material to borrow bandwidth from a program handling easy material. Each video PES can have a different number of audio and data PESs associated with it. Despite this flexibility, a decoder must be able to change from one program to the next and correctly select the appropriate audio and data channels. Some of the programs can be protected so that they can only be viewed by those who have paid a subscription or fee. The transport stream must contain conditional access (CA) information to administer this protection. The transport stream contains program specific information (PSI) to handle these tasks.

The transport layer converts the PES data into small packets of constant size that are self-contained. When these packets arrive at the decoder, there may be jitter in the timing. The use of time division multiplexing also causes delay, but this delay is not fixed because the proportion of the bit stream allocated to each program is not fixed. Time stamps are part of the solution, but they only work if a stable clock is available. The transport stream must contain further data allowing the recreation of a stable clock.

The operation of digital video production equipment is heavily dependent on the distribution of a stable system clock for synchronization. In video production, genlocking is used, but over long distances, the distribution of a separate clock is not practical. In a transport stream, the different programs may have originated in different places that are not necessarily synchronized. As a result, the transport stream has to provide a separate means of synchronization for each program. This additional synchronization method is called a Program Clock Reference (PCR), and it recreates a stable reference clock that can be divided down to create a time line at the decoder, so that the time stamps for the elementary streams in each program become useful. Consequently, one definition of a program is a set of elementary streams sharing the same timing reference.

In a Single Program Transport Stream (SPTS), there will be one PCR channel that recreates one program clock for both audio and video. The SPTS is often used as the communication link between an audio/video coder and a multiplexer.

7.2 Packets
Figure 7.1 shows the structure of a transport stream packet. The size is a constant 188 bytes, and it is always divided into a header and a payload. Figure 7.1a shows the minimum header of 4 bytes. In this header, the most important information is:

The sync byte. This byte is recognized by the decoder so that the header and the payload can be deserialized.

The transport error indicator. This indicator is set if the error correction layer above the transport layer is experiencing a raw bit error rate (BER) that is too high to be correctable. It indicates that the packet may contain errors. See Section 8 for details of the error correction layer.

The Packet Identification (PID). This thirteen-bit code is used to distinguish between different types of packets. More will be said about PID later.

The continuity counter. This four-bit value is incremented by the encoder as each new packet having the same PID is sent. It is used to determine if any packets are lost, repeated, or out of sequence.

In some cases, more header information is needed, and if this is the case, the adaptation field control bits are set to indicate that the header is larger than normal. Figure 7.1b shows that when this happens, the extra header length is described by the adaptation field length code. Where the header is extended, the payload becomes smaller to maintain constant packet length.

Figure 7.1.
188-byte packet = header + payload.
a) Minimum 4-byte header (field widths in bits): sync byte (8), transport error indicator (1), payload unit start indicator (1), transport priority (1), PID (13), transport scrambling control (2), adaptation field control (2), continuity counter (4); followed by the optional adaptation field and the payload.
b) Adaptation field: adaptation field length (8), discontinuity indicator (1), random access indicator (1), elementary stream priority indicator (1), 5 flags (5), optional fields, stuffing bytes. The optional fields are PCR (48), OPCR (48), splice countdown (8), transport private data, and adaptation field extension.
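The fixed header fields of Figure 7.1a can be unpacked with a few bit masks. This is a minimal sketch: the class and function names are hypothetical, and the sync byte value 0x47 comes from the MPEG-2 Systems standard (ISO/IEC 13818-1) rather than from this text. Adaptation fields are not parsed.

```python
from dataclasses import dataclass

PACKET_SIZE = 188
SYNC_BYTE = 0x47  # value defined by ISO/IEC 13818-1


@dataclass
class TSHeader:
    transport_error: bool
    payload_unit_start: bool
    transport_priority: bool
    pid: int
    scrambling_control: int
    adaptation_field_control: int  # 1 = payload only, 2 = adaptation field only, 3 = both
    continuity_counter: int


def parse_header(packet: bytes) -> TSHeader:
    """Decode the fixed 4-byte header of one transport stream packet."""
    if len(packet) != PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("expected a 188-byte packet starting with sync byte 0x47")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return TSHeader(
        transport_error=bool(b1 & 0x80),
        payload_unit_start=bool(b1 & 0x40),
        transport_priority=bool(b1 & 0x20),
        pid=((b1 & 0x1F) << 8) | b2,          # 13 bits spread over bytes 1 and 2
        scrambling_control=(b3 >> 6) & 0x03,
        adaptation_field_control=(b3 >> 4) & 0x03,
        continuity_counter=b3 & 0x0F,
    )
```

For example, a packet starting `47 40 16 1A` has PID 22, the payload unit start flag set, a payload with no adaptation field, and continuity counter 10.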
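The continuity-counter check described in Section 7.2 can be sketched as follows. It is deliberately simplified and hypothetical: it follows the MPEG rule that the counter only advances on packets carrying payload, and it tolerates a repeated value as a duplicate packet, but it does not honor the discontinuity indicator or limit how many duplicates occur.

```python
NULL_PID = 0x1FFF


def count_continuity_errors(packets) -> int:
    """Count continuity-counter jumps, tracked separately for each PID."""
    last_cc = {}
    errors = 0
    for pkt in packets:
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        has_payload = bool(pkt[3] & 0x10)   # adaptation_field_control 01 or 11
        cc = pkt[3] & 0x0F
        if pid == NULL_PID or not has_payload:
            continue  # the counter only advances on packets that carry payload
        if pid in last_cc:
            prev = last_cc[pid]
            # A repeated value is tolerated as a duplicate packet; anything
            # other than prev + 1 (modulo 16) counts as lost or out of sequence.
            if cc not in (prev, (prev + 1) % 16):
                errors += 1
        last_cc[pid] = cc
    return errors
```

Counters are kept per PID because each elementary stream has its own sequence, and the modulo-16 comparison accepts the wrap from 15 back to 0.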
7.3 Program Clock Reference (PCR)
The encoder used for a particular program will have a 27 MHz program clock. In the case of an SDI (Serial Digital Interface) input, the bit clock can be divided by 10 to produce the encoder program clock. Where several programs originate in the same production facility, it is possible that they will all have the same clock. In the case of an analog video input, the H sync period will need to be multiplied by a constant in a phase-locked loop to produce 27 MHz.

The adaptation field in the packet header is periodically used to include the PCR (program clock reference) code that allows generation of a locked clock at the decoder. If the encoder or a remultiplexer has to switch sources, the PCR may have a discontinuity. The continuity count can also be disturbed. This event is handled by the discontinuity indicator, which tells the decoder to expect a disturbance. Otherwise, a discontinuity is an error condition.

Figure 7.2 shows how the PCR is used by the decoder to recreate a remote version of the 27-MHz clock for each program. The encoder clocks drive a constantly running binary counter, and the value of this counter is sampled periodically and placed in the header adaptation fields as the PCR. (The PCR is carried in a 48-bit field: a 33-bit base that, like the PTS, counts a 90-kHz clock, 6 reserved bits, and a 9-bit extension that counts the 27-MHz clock from 0 to 299, so the full PCR value is base x 300 + extension.) Each encoder produces packets having a different PID. The decoder recognizes the packets with the correct PID for the selected program and ignores others. At the decoder, a VCO generates a nominal 27-MHz clock, and this drives a local PCR counter. The local PCR is compared with the PCR from the packet header, and the difference is the PCR phase error. This error is filtered to control the VCO, which eventually will bring the local PCR count into step with the header PCRs. Heavy VCO filtering ensures that jitter in PCR transmission does not modulate the clock. The discontinuity indicator will reset the local PCR count and, optionally, may be used to reduce the filtering to help the system quickly lock to the new timing.

MPEG requires that PCRs are sent at a rate of at least 10 PCRs per second, whereas DVB specifies a minimum of 25 PCRs per second.

7.4 Packet Identification (PID)
A 13-bit field in the transport packet header contains the Packet Identification Code (PID). The PID is used by the demultiplexer to distinguish between packets containing different types of information. The transport-stream bit rate must be constant, even though the sum of the rates of all of the different streams it contains can vary. This requirement is handled by the use of null packets that contain all zeros in the payload. If the real payload rate falls, more null packets are inserted. Null packets always have the same PID, which is 8191, or thirteen 1s.

In a given transport stream, all packets belonging to a given elementary stream will have the same PID. Packets in another elementary stream will have another PID. The demultiplexer can easily select all data for a given elementary stream simply by accepting only packets with the right PID. Data for an entire program can be selected using the PIDs for video, audio, and teletext data. The demultiplexer can correctly select packets only if it can correctly associate them with the program to which they belong. The demultiplexer can do this task only if it knows what the right PIDs are. This is the function of the Program Specific Information (PSI).

Figure 7.2.
Encoder: a 27 MHz clock drives the video encoder and elementary stream formation. The transport stream carries 188-byte packets; if one packet holds PCR = X, a packet exactly n bits later holds PCR = X plus the time of exactly n bits.
Receiver: the transport stream decoder extracts the PCR and compares it with the local PCR; the difference passes through a low-pass filter to a 27 MHz crystal VCO, which clocks the local PCR counter (loaded from the received PCR) and provides the receiver's 27 MHz clock.
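Decoding the 48-bit PCR field of Figure 7.1b and checking the PCR repetition interval might look like this. It is a sketch assuming the standard field layout (33-bit base, 6 reserved bits, 9-bit extension); the function names are hypothetical, and counter wraparound is ignored.

```python
from __future__ import annotations

PCR_HZ = 27_000_000  # 27 MHz system clock


def parse_pcr(field: bytes) -> int:
    """Convert the 6-byte PCR field of an adaptation field into 27 MHz ticks."""
    if len(field) != 6:
        raise ValueError("the PCR field is 48 bits (6 bytes) long")
    raw = int.from_bytes(field, "big")
    base = raw >> 15          # top 33 bits: counts the 90 kHz clock
    extension = raw & 0x1FF   # bottom 9 bits: 0..299 at 27 MHz
    return base * 300 + extension  # the 6 reserved bits in between are ignored


def max_pcr_interval(pcr_ticks: list[int]) -> float:
    """Largest gap in seconds between successive PCRs (wraparound not handled)."""
    return max((b - a) / PCR_HZ for a, b in zip(pcr_ticks, pcr_ticks[1:]))
```

An interval above 0.1 s would break the MPEG minimum of 10 PCRs per second quoted above; an interval above 0.04 s would break the DVB minimum of 25.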

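PID filtering, the demultiplexer's basic operation, reduces to a set lookup. In this sketch (function names hypothetical), the example PIDs 54 and 48 are borrowed from program 1 in Figure 7.3; in practice, the wanted set would come from the PSI.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator

NULL_PID = 0x1FFF  # 8191: thirteen 1s


def pid_of(packet: bytes) -> int:
    """Extract the 13-bit PID from bytes 1 and 2 of the packet header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


def select_pids(packets: Iterable[bytes], wanted: set[int]) -> Iterator[bytes]:
    """Pass only packets whose PID is wanted; null packets are always dropped."""
    for packet in packets:
        pid = pid_of(packet)
        if pid != NULL_PID and pid in wanted:
            yield packet
```

Because the filter is a generator, it can sit directly on a live packet source without buffering the whole stream.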
7.5 Program Specific Information (PSI)
PSI is carried in packets having unique PIDs, some of which are standardized and some of which are specified by the Program Association Table (PAT) and the Conditional Access Table (CAT). These packets must be included periodically in every transport stream. The PAT always has a PID of 0, and the CAT always has a PID of 1. These values and the null packet PID of 8191 are the only fixed PIDs in the whole MPEG system. The demultiplexer must determine all of the remaining PIDs by accessing the appropriate table. However, in ATSC and DVB, PMTs may require specific PIDs. In this respect (and in some others), MPEG and DVB/ATSC are not fully interchangeable.

The programs that exist in the transport stream are listed in the program association table (PAT) packets (PID = 0), which specify the PIDs of all Program Map Table (PMT) packets. The first entry in the PAT, program 0, is always reserved for network data and contains the PID of the network information table (NIT) packets.

The PIDs for Entitlement Control Messages (ECM) and Entitlement Management Messages (EMM) are listed in the Conditional Access Table (CAT) packets (PID = 1).

As Figure 7.3 shows, the PIDs of the video, audio, and data elementary streams that belong in the same program are listed in the Program Map Table (PMT) packets. Each PMT packet has its own PID.

A given network information table contains details of more than just the transport stream carrying it. Also included are details of other transport streams that may be available to the same decoder, for example, by tuning to a different RF channel or steering a dish to a different satellite. The NIT may list a number of other transport streams, and each one may have a descriptor that specifies the radio frequency, orbital position, and so on. In MPEG, only the NIT is mandatory for this purpose. In DVB, additional metadata, known as DVB-SI, is included, and the NIT is considered to be part of DVB-SI. This operation is discussed in Section 8. When discussing the subject in general, the term PSI/SI is used.

Upon first receiving a transport stream, the demultiplexer must look for PIDs 0 and 1 in the packet headers. All PID 0 packets contain the Program Association Table (PAT). All PID 1 packets contain Conditional Access Table (CAT) data.

By reading the PAT, the demultiplexer can find the PIDs of the Network Information Table (NIT) and of each Program Map Table (PMT). By finding the PMTs, the demultiplexer can find the PIDs of each elementary stream. Consequently, if the decoding of a particular program is required, reference to the PAT and then the PMT is all that is needed to find the PIDs of all of the elementary streams in the program. If the program is encrypted, then access to the CAT will also be necessary. As demultiplexing is impossible without a PAT, the lockup speed is a function of how often the PAT packets are sent. MPEG specifies a maximum of 0.5 seconds between the PAT packets and the PMT packets that are referred to in those PAT packets. In DVB and ATSC, the NIT may reside in packets that have a specific PID.

Figure 7.3.
Program Association Table (PID 0): program 0 -> PID 16 (Network Information Table, private network data); program 1 -> PID 22; program 3 -> PID 33; ... program k -> PID 55.
Conditional Access Table (PID 1): conditional access data.
Program Map Table for program 1 (PID 22): stream 1 video, PID 54; stream 2 audio, PID 48; stream 3 audio, PID 49; ... stream k data, PID 66.
Program Map Table for program 3 (PID 33): stream 1 video, PID 19; stream 2 audio, PID 81; stream 3 audio, PID 82; ... stream k data, PID 88.
Transport stream (packet: PID): PAT: 0 | Prog 1 PMT: 22 | Prog 3 PMT: 33 | EMM: 1 | Prog 1 Audio 2: 49 | Prog 3 Audio 2: 82 | Prog 3 Video 1: 19 | Prog 3 Video 1: 19 | Prog 1 Video 1: 54 | Prog 3 Audio 1: 81 | Prog 3 Video 1: 19
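The PAT lookup described above can be sketched as a parser for one PAT section, starting at its table_id byte (that is, after the packet's pointer_field has been skipped). The section layout follows ISO/IEC 13818-1 as I understand it; the CRC_32 is not verified, the function name is hypothetical, and the example bytes, hand-built to match Figure 7.3, carry a dummy CRC.

```python
from __future__ import annotations


def parse_pat_section(section: bytes) -> dict[int, int]:
    """Map program_number -> PID from one PAT section (table_id 0x00).

    Program 0 maps to the network (NIT) PID; every other entry is a PMT PID.
    The CRC_32 at the end of the section is skipped, not verified.
    """
    if section[0] != 0x00:
        raise ValueError("not a PAT section (table_id must be 0x00)")
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    end = 3 + section_length - 4          # stop before the 4-byte CRC_32
    programs = {}
    for i in range(8, end, 4):            # entries follow the 8-byte section header
        program_number = (section[i] << 8) | section[i + 1]
        pid = ((section[i + 2] & 0x1F) << 8) | section[i + 3]
        programs[program_number] = pid
    return programs


# Hand-built example matching Figure 7.3 (the CRC bytes are a dummy, not a valid CRC).
EXAMPLE_PAT = bytes([
    0x00, 0xB0, 0x15,              # table_id, flags + section_length = 21
    0x00, 0x01, 0xC1, 0x00, 0x00,  # transport_stream_id, version/current_next, section numbers
    0x00, 0x00, 0xE0, 0x10,        # program 0 -> PID 16 (NIT)
    0x00, 0x01, 0xE0, 0x16,        # program 1 -> PID 22
    0x00, 0x03, 0xE0, 0x21,        # program 3 -> PID 33
    0x00, 0x00, 0x00, 0x00,        # CRC_32 placeholder
])

if __name__ == "__main__":
    print(parse_pat_section(EXAMPLE_PAT))
```

Having found PID 22 for program 1, the demultiplexer would next read the PMT on that PID to obtain 54, 48, 49, and 66.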

SECTION 8
INTRODUCTION TO DVB/ATSC
MPEG compression is already being used in broadcasting and will become increasingly important in the future. A discussion of the additional requirements of broadcasting data follows.

8.1 An overall view
ATSC stands for the Advanced Television Systems Committee, which is a U.S. organization that defines standards for terrestrial digital broadcasting and cable distribution. DVB refers to the Digital Video Broadcasting Project and to the standards and practices established by the DVB Project. This project was originally a European project, but it produces standards and guides accepted in many areas of the world. These standards and guides encompass all transmission media, including satellite, cable, and terrestrial broadcasting.

Digital broadcasting has different distribution and transmission requirements, as is shown in Figure 8.1. Broadcasters will produce transport streams that contain several television programs. Transport streams have no protection against errors, and in compressed data, the effect of errors is serious. Transport streams need to be delivered error-free to transmitters, satellite uplinks, and cable head ends. In this context, error-free means a BER of 1 x 10^-11. This task is normally entrusted to Telecommunications Network Operators, who will use an additional layer of error correction as needed. This layer should be transparent to the destination.

A particular transmitter or cable operator may not want all of the programs in a transport stream. Several transport streams may be received, and a selection of channels may be made and encoded into a single output transport stream using a remultiplexer. The configuration may change dynamically.

Broadcasting in the digital domain consists of conveying the entire transport stream to the viewer. Whether the channel is cable, satellite, or terrestrial, the problems are much the same. Metadata describing the transmission must be encoded into the transport stream in a standardized way. In DVB, this metadata is called Service Information (DVB-SI) and includes details of programs carried on other multiplexes and services such as teletext.

In broadcasting, there is much less control of the signal quality, and noise or interference is a possibility. This requires some form of forward error correction (FEC) layer. Unlike the FEC used by the Telecommunications Network Operators, which can be proprietary (or standardized as per ETSI, which defines DVB transmission over SDH and PDH networks), the FEC used in broadcasting must be standardized so that receivers will be able to handle it.

The addition of error correction obviously increases the bit rate as far as the transmitter or cable is concerned. Unfortunately, reliable, economical radio and cable transmission of data requires more than serializing the data. Practical systems require channel coding.

8.2 Remultiplexing
Remultiplexing is a complex task because it has to output a compliant bit stream that is assembled from parts of others. The required data from a given input transport stream can be selected with reference to the program association table and the program map tables that will disclose the PIDs of the programs required. It is possible that the same PIDs have been used in two input transport streams; therefore, the PIDs of one or more elementary streams may have to be changed. The packet headers must pass on the program clock reference (PCR) that will allow the final decoder to recreate a 27 MHz clock. As the position of packets containing PCR may be different in the new multiplex, the remultiplexer may need to edit the PCR values to reflect their new position on the time axis.

The program map tables and program association tables will need to be edited to reflect the new transport stream structure, as will the Conditional Access Tables (CAT).

Figure 8.1.
Contributors A and B each multiplex Prog 1, Prog 2, and Prog 3 into a transport stream (TS), which a Telecom Network Operator carries (add FEC, send, error correct) to the presentation suite. There, a remultiplexer adds DVB-SI and outputs a new TS containing A1, A3, B2, and B3. The TV transmitter applies energy dispersal, outer FEC, interleave, inner FEC, and modulation.
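Changing an elementary stream's PID during remultiplexing touches only the 13 PID bits of each packet header, as sketched below (the function name is hypothetical). A real remultiplexer would also rewrite the PAT and PMT entries, recompute their CRCs, and restamp PCRs, none of which is attempted here.

```python
from __future__ import annotations


def remap_pid(packet: bytes, pid_map: dict[int, int]) -> bytes:
    """Return the packet with its PID rewritten according to pid_map."""
    old_pid = ((packet[1] & 0x1F) << 8) | packet[2]
    new_pid = pid_map.get(old_pid, old_pid)   # unmapped PIDs pass through unchanged
    if not 0 <= new_pid <= 0x1FFF:
        raise ValueError("a PID must fit in 13 bits")
    out = bytearray(packet)
    out[1] = (out[1] & 0xE0) | (new_pid >> 8)  # keep the error, start, and priority flags
    out[2] = new_pid & 0xFF
    return bytes(out)
```

Masking with 0xE0 preserves the three flag bits that share byte 1 with the top of the PID, so only the identification changes.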
If the sum of the selected program bit rates is less than the output bit rate, the remultiplexer will create stuffing packets with suitable PIDs. However, if the transport streams have come from statistical multiplexers, it is possible that the instantaneous bit rate of the new transport stream will exceed the channel capacity. This condition might occur if several selected programs in different transport streams simultaneously contain high entropy. In this case, the only solution is to recompress and create new, shorter coefficients in one or more bit streams to reduce the bit rate.

8.3 Service Information (SI)
In the future, digital delivery will mean that there will be a large number of programs, teletext, and services available to the viewer, and these may be spread across a number of different transport streams. Both the viewer and the Integrated Receiver Decoder (IRD) will need help to display what is available and to output the selected service. This capability requires metadata beyond the capabilities of MPEG-PSI (Program Specific Information) and is referred to as DVB-SI (Service Information). DVB-SI is considered to include the NIT, which must be present in all MPEG transport streams.

DVB-SI is embedded in the transport stream as additional transport packets with unique PIDs and carries technical information for IRDs. DVB-SI also contains Electronic Program Guide (EPG) information, such as the nature of a program, the timing and the channel on which it can be located, and the countries in which it is available. Programs can also be rated so that parental judgment can be exercised.

DVB-SI can include the following options over and above MPEG-PSI:

Service Description Table (SDT). Each service in a DVB transport stream can have a service descriptor, and these descriptors are assembled into the Service Description Table. A service may be television, radio, or teletext. The service descriptor includes the name of the service provider.

Event Information Table (EIT). EIT is an optional table for DVB that contains program names, start times, durations, and so on.

Bouquet Association Table (BAT). The BAT is an optional table for DVB that provides details of bouquets, which are collections of services marketed as a single product.

Time and Date Table (TDT). The TDT is an option that embeds a UTC time and date stamp in the transport stream.

8.4 Error correction
Error correction is necessary because conditions on long transmission paths cannot be controlled. In some systems, error detection is sufficient because it can be used to request a retransmission. Clearly, this approach will not work with real-time signals such as television. Instead, a system called Forward Error Correction (FEC) is used, in which sufficient extra bits, known as redundancy, are added to the data to allow the decoder to perform corrections in real time.

The FEC used in modern systems is usually based on Reed-Solomon (R-S) codes. A full discussion of these is outside the scope of this book. Briefly, R-S codes add redundancy to the data to make a code word such that when each symbol is used as a term in a minimum of two simultaneous equations, the sum (or syndrome) is always zero if there is no error. This zero condition is obtained irrespective of the data and makes checking easy. In transport streams, the packets are always 188 bytes long. The addition of 16 bytes of R-S redundancy produces a standard FEC code word of 204 bytes.

In the event that the syndrome is non-zero, solving the simultaneous equations will result in two values needed for error correction: the location of the error and the nature of the error. However, if the size of the error exceeds half the amount of redundancy added, the error cannot be corrected.

Unfortunately, in typical transmission channels, the signal quality is statistical. This means that while single bits may be in error due to noise, on occasion a large number of bits, known as a burst, can be corrupted together. This corruption might be due to lightning or interference from electrical equipment.

It is not economic to protect every code word against such bursts because they do not occur often enough. The solution is to use a technique known as interleaving. Figure 8.2 shows that, when interleaving is used, the source data are FEC coded, but prior to transmission, they are fed into a RAM buffer. Figure 8.3 shows one possible technique, in which data enter the RAM in rows and are then read out in columns. The reordered data are now transmitted. On reception, the data are put back to their original order, or deinterleaved, by using a second RAM. The result of the interleaving process is that a burst of errors in the channel becomes, after deinterleaving, a large number of single-symbol errors, which are more readily correctable.

Figure 8.2.
Data in -> RAM with non-sequential (reordering) addressing -> data out.
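The R-S property described above, a zero syndrome for every valid code word, can be checked with a compact sketch. It builds GF(2^8) arithmetic and a systematic encoder that appends 16 check bytes, turning a 188-byte packet into a 204-byte code word. To my knowledge, DVB's RS(204,188) code uses the field polynomial x^8 + x^4 + x^3 + x^2 + 1 and generator roots a^0 to a^15, which is what is assumed here. All names are hypothetical, and the sketch only computes syndromes; locating and correcting errors needs an additional decoding step (for example, Berlekamp-Massey) that is not shown.

```python
from __future__ import annotations

PRIM = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1

EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]  # duplicate the table so exponent sums never need a modulo


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def generator_poly(nsym: int) -> list[int]:
    """g(x) = (x + a^0)(x + a^1)...(x + a^(nsym-1)), highest power first."""
    g = [1]
    for i in range(nsym):
        nxt = [0] * (len(g) + 1)
        for j, coef in enumerate(g):
            nxt[j] ^= coef                      # coef * x
            nxt[j + 1] ^= gf_mul(coef, EXP[i])  # coef * a^i
        g = nxt
    return g


def rs_encode(msg: bytes, nsym: int = 16) -> bytes:
    """Systematic encoding: append the remainder of msg(x) * x^nsym divided by g(x)."""
    gen = generator_poly(nsym)
    buf = list(msg) + [0] * nsym
    for i in range(len(msg)):
        coef = buf[i]
        if coef:
            for j in range(1, len(gen)):
                buf[i + j] ^= gf_mul(gen[j], coef)
    return bytes(msg) + bytes(buf[len(msg):])


def syndromes(codeword: bytes, nsym: int = 16) -> list[int]:
    """Evaluate the received polynomial at a^0 .. a^(nsym-1); all zero means no error detected."""
    result = []
    for j in range(nsym):
        root = EXP[j]
        s = 0
        for c in codeword:
            s = gf_mul(s, root) ^ c  # Horner's rule, first byte = highest power
        result.append(s)
    return result
```

Encoding 188 bytes yields 204; flipping any byte of the result makes at least one syndrome non-zero.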

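Figure 8.3. Block interleave: a data array protected by horizontal codewords and vertical codewords (each cell is one symbol).
Figure 8.4. Convolutional interleave: a sheared data array, with a vertical codeword running through it (each cell is one symbol).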
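The row-in, column-out reordering of Figure 8.3 takes only a few lines. This is a minimal block-interleaver sketch with hypothetical function names; the transmitters in Figures 8.9 and 8.10 use convolutional interleaving, which this does not implement.

```python
def interleave(data, rows, cols):
    """Write row by row into a rows x cols array, then read it column by column."""
    if len(data) != rows * cols:
        raise ValueError("data must fill the array exactly")
    return [data[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(data, rows, cols):
    """Undo interleave(): reading the transposed array column-wise restores the order."""
    return interleave(data, cols, rows)
```

With 4 rows of 6 symbols, where each row stands for one code word, a burst that corrupts 4 consecutive transmitted symbols comes back after deinterleaving as exactly one bad symbol in each row, which is the effect the text describes.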

When a burst error reaches the maximum correctable size, the system is vulnerable to random bit errors that make code words uncorrectable. The use of an inner code applied after interleave and corrected before deinterleave can prevent random errors from entering the deinterleave memory.

As Figure 8.3 shows, when this approach is used with a block interleave structure, the result is a product code. Figure 8.4 shows that interleave can also be convolutional, in which the data array is sheared by applying a different delay to each row. Convolutional, or cross, interleave has the advantage that less memory is needed to interleave and deinterleave.

8.5 Channel coding
Raw serial binary data is unsuitable for transmission for several reasons. Runs of identical bits cause DC offsets and lack a bit clock. There is no control of the spectrum, and the bandwidth required is too great. In practical radio and cable systems, a modulation scheme called a channel code is necessary.

In terrestrial broadcasting, channel bandwidth is at a premium, and channel codes which minimize bandwidth are required. In multi-level signaling, the transmitter power is switched between, for example, eight different levels. This approach results in each symbol representing three data bits, requiring one third the bandwidth of binary. As an analog transmitter requires about 8-bit resolution, transmitting three-bit resolution can be achieved with a much lower SNR and with lower transmitter power. In the ATSC system, it is proposed to use a system called 8-VSB. Eight-level modulation is used, and the transmitter output has a vestigial lower sideband like that used in analog television transmissions.

In satellite broadcasting, there is less restriction on bandwidth, but the received signal has a poor SNR, which precludes using multi-level signaling. In this case, it is possible to transmit data by modulating the phase of a signal. In Phase Shift Keying (PSK), the carrier is transmitted in different phases according to the bit pattern. In Quadrature Phase Shift Keying (QPSK), there are four possible phases. The phases are 90 degrees apart; therefore, each symbol carries two bits.

For cable operation, multi-level signaling and phase shift keying can be combined in Quadrature Amplitude Modulation (QAM), which is a discrete version of the modulation method used in the composite video subcarrier. Figure 8.5 shows that if two quadrature carriers, known as I and Q, are individually modulated into eight levels each, the result is a carrier which can have 64 possible combinations of amplitude and phase. Each symbol carries six bits; therefore, the bandwidth required is one third that of QPSK.

Figure 8.5.
64-QAM modulator (6 bits/symbol): 3 bits drive an 8-level DAC whose output modulates the carrier (I); the other 3 bits drive a second 8-level DAC whose output modulates the carrier after a 90-degree shift (Q), each in a four-quadrant modulator. The two products are summed to form the 64-QAM output.
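The bit-to-symbol step of Figure 8.5 can be sketched as splitting each 6-bit group into two 3-bit indices, one per carrier. The plain binary mapping to the eight levels is chosen for illustration only; real cable standards define their own constellation mappings, and the function name is hypothetical.

```python
from __future__ import annotations

LEVELS = (-7, -5, -3, -1, 1, 3, 5, 7)  # eight amplitude levels per carrier


def qam64_map(bits: list[int]) -> list[tuple[int, int]]:
    """Map each group of 6 bits to an (I, Q) pair: 3 bits per carrier."""
    if len(bits) % 6:
        raise ValueError("the bit count must be a multiple of 6")
    symbols = []
    for k in range(0, len(bits), 6):
        i_index = bits[k] << 2 | bits[k + 1] << 1 | bits[k + 2]
        q_index = bits[k + 3] << 2 | bits[k + 4] << 1 | bits[k + 5]
        symbols.append((LEVELS[i_index], LEVELS[q_index]))
    return symbols
```

Six thousand bits become 1000 symbols, whereas QPSK at two bits per symbol would need 3000, which is the factor-of-three bandwidth saving the text describes.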

Figure 8.6.
Transmitter: data in + pseudo-random sequence generator -> randomized transmit data. Receiver: randomized data + identical pseudo-random sequence generator -> data out.

In the schemes described above, the transmitted signal spectrum is signal dependent. Some parts of the spectrum may contain high energy and cause interference to other services, whereas other parts of the spectrum may contain little energy and be susceptible to interference. In practice, randomizing is necessary to decorrelate the transmitted spectrum from the data content. Figure 8.6 shows that when randomizing or energy dispersal is used, a pseudo-random sequence is added to the serial data before it is input to the modulator. The result is that the transmitted spectrum is noise-like, with relatively stationary statistics. Clearly, an identical and synchronous sequence must be subtracted at the receiver as shown. Randomizing cannot be applied to sync patterns, or they could not be detected.

In the above systems, a baseband signal is supplied to a modulator that operates on a single carrier to produce the transmitted sideband(s). In digital systems with fixed bit rates, the frequency of the sidebands is known. An alternative to a wideband system is one that produces many narrowband carriers at carefully regulated spacing. Figure 8.7a shows that a digitally modulated carrier has a spectral null at each side. Another identical carrier can be placed here without interference because the two are mutually orthogonal, as Figure 8.7b shows. This is the principle of COFDM (Coded Orthogonal Frequency Division Multiplexing), which is proposed for use with DVB terrestrial broadcasts (DVB-T).

8.6 Inner coding
The inner code of a FEC system is designed to prevent random errors from reducing the power of the interleave scheme. A suitable inner code can prevent such errors by giving an apparent increase to the SNR of the transmission. In trellis coding, which can be used with multi-level signaling, several multi-level symbols are associated into a group. The waveform that results from a particular group of symbols is called a trellis. If each symbol can have eight levels, then in three symbols there can be 512 (8 x 8 x 8) possible trellises.

In trellis coding, the data are coded such that only certain trellis waveforms represent valid data. If only 64 of the trellises represent error-free data, then two data bits per symbol can be sent instead of three, because 64 = 2^6 valid trellises carry six bits in three symbols. The remaining bit is a form of redundancy because trellises other than the correct 64 must be due to errors. If a trellis is received in which the level of one of the symbols is ambiguous due to noise, the ambiguity can be resolved because the correct level must be the one which gives a valid trellis. This technique is known as maximum-likelihood decoding.

Figure 8.7.
a) A digitally modulated carrier, showing the spectral nulls on each side.
b) Mutually orthogonal carriers, each placed at the other's spectral null.
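The energy-dispersal step of Figure 8.6 can be sketched with a linear-feedback shift register. The generator 1 + x^14 + x^15 and the preset 100101010000000 are, to my knowledge, what DVB specifies for its randomizer, but this sketch is simplified and not bit-exact: it restarts the sequence for every packet and simply leaves the sync byte untouched. The function names are hypothetical.

```python
INIT_STATE = (1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0)  # register stages 1..15


def prbs_bytes(count: int) -> bytes:
    """Generate `count` bytes from a 15-stage LFSR with feedback 1 + x^14 + x^15."""
    reg = list(INIT_STATE)
    out = bytearray()
    for _ in range(count):
        byte = 0
        for _ in range(8):
            bit = reg[13] ^ reg[14]   # XOR of stages 14 and 15
            reg = [bit] + reg[:-1]    # shift, feeding the result back into stage 1
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def randomize_packet(packet: bytes) -> bytes:
    """XOR everything after the sync byte with the pseudo-random sequence.

    Running the same function again removes the randomization, because XOR
    with an identical, synchronized sequence is its own inverse.
    """
    seq = prbs_bytes(len(packet) - 1)
    return packet[:1] + bytes(b ^ s for b, s in zip(packet[1:], seq))
```

Because addition and subtraction are the same operation (XOR) in binary, the receiver's "subtract the sequence" step is just the transmitter's function applied a second time, which requires the two generators to be synchronized.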

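The orthogonality behind Figure 8.7b can be checked numerically: two carriers that each complete a whole number of cycles in the symbol period, and differ in frequency, correlate to zero over that period. This sketch uses real cosines and a hypothetical function name; a practical COFDM modulator builds all carriers at once with an inverse FFT of complex values, which is not shown.

```python
import math


def carrier_correlation(k1: float, k2: float, n: int = 64) -> float:
    """Correlate two cosine carriers over one symbol period of n samples.

    k1 and k2 are the numbers of cycles each carrier completes in the symbol.
    """
    return sum(
        math.cos(2 * math.pi * k1 * t / n) * math.cos(2 * math.pi * k2 * t / n)
        for t in range(n)
    )
```

Carriers with 3 and 4 cycles give a correlation of essentially zero, a carrier with itself gives n/2 (32 here), and a spacing of half a cycle (3 and 3.5) leaves a non-zero residue, which is why the carrier spacing must be carefully regulated.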
The 64 valid trellises should be made as different as possible to make the system continue to work with a poorer signal-to-noise ratio. If the trellis coder makes an error, the outer code will correct it.
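Maximum-likelihood selection can be sketched as a search for the nearest valid group. The codebook below is hypothetical: the first two symbols are free and the third is fixed by a made-up rule, so exactly 64 of the 512 three-symbol groups are valid. It is not the ATSC trellis code, and a real trellis decoder uses the Viterbi algorithm rather than this exhaustive search.

```python
from itertools import product

LEVELS = (-7, -5, -3, -1, 1, 3, 5, 7)

# Hypothetical codebook: 8 x 8 free choices for the first two symbols, with the
# third fixed by a made-up rule, so only 64 of the 512 groups are valid.
CODEBOOK = [
    (LEVELS[a], LEVELS[b], LEVELS[(a + b) % 8])
    for a, b in product(range(8), repeat=2)
]


def ml_decode(received):
    """Return the valid group closest (in squared Euclidean distance) to `received`."""
    return min(CODEBOOK, key=lambda cw: sum((r - c) ** 2 for r, c in zip(received, cw)))
```

If the third symbol of the valid group (1, 3, -5) arrives as -4, halfway between two levels, the decoder still returns (1, 3, -5), because (1, 3, -3) is not a valid group.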
In DVB, convolutional coding (often called Viterbi coding, after the algorithm used to decode it) may be used. Figure 8.8 shows that following interleave, the data are fed to a shift register. The contents of the shift register produce two outputs that represent different parity checks on the input data so that bit errors can be corrected. Clearly, there will be two output bits for every input bit; therefore, the coder shown is described as a 1/2 rate coder. Any rate between 1/1 and 1/2 would still allow the original data to be transmitted, but the amount of redundancy would vary. Failing to transmit the entire 1/2 output is called puncturing, and it allows any required balance to be obtained between bit rate and correcting power.

Figure 8.8.
Rate 1/2 convolutional coder: the data input feeds a seven-stage shift register (stages 1 to 7), and taps on the register form two parity outputs, 1 and 2, which together make up the data output.

8.7 Transmitting digits
Figure 8.9 shows the elements of an ATSC digital transmitter. Service information describing the transmission is added to the transport stream. This stream is then randomized prior to routing to an outer R-S error correction coder that adds redundancy to the data. A convolutional interleave process then reorders the data so that adjacent data in the transport stream are not adjacent in the transmission. An inner trellis coder is then used to produce a multi-level signal for the VSB modulator.

Figure 8.10 shows a DVB-T transmitter. Service information is added as before, followed by the randomizing stage for energy dispersal. Outer R-S check symbols are added prior to interleaving. After the interleaver, the inner coding process takes place, and the coded data is fed to a COFDM modulator. The modulator output is then upconverted to produce the RF output.

At the receiver, the bit clock is extracted and used to control the timing of the whole system. The channel coding is reversed to obtain the raw data plus the transmission errors. The inner code corrects random errors and may identify larger errors to help the outer decoder after deinterleaving. The randomizing is removed, and the result is the original transport stream. The receiver must identify the Program Association Table (PAT) and the Service Information (SI) and Program Map Tables (PMT) that the PAT points to, so that the viewer can be told what is available in the transport stream.
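A rate 1/2 coder like the one in Figure 8.8, followed by a puncturing step, can be sketched as follows. The generator taps 171 and 133 (octal) on a seven-bit window are, to my knowledge, those used for the DVB inner code, but the tap-to-stage ordering and the puncturing pattern here are illustrative assumptions, and the function names are hypothetical.

```python
from __future__ import annotations

G1, G2 = 0o171, 0o133  # parity tap masks over a 7-bit window


def parity(value: int) -> int:
    """Return 1 if value has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1


def conv_encode(bits: list[int]) -> list[int]:
    """Rate 1/2: emit two parity bits (X, Y) for every input bit."""
    window = 0  # newest bit in bit 6, oldest in bit 0
    out = []
    for b in bits:
        window = (window >> 1) | (b << 6)
        out.append(parity(window & G1))
        out.append(parity(window & G2))
    return out


def puncture(coded: list[int], keep: tuple[int, ...]) -> list[int]:
    """Drop coded bits wherever the repeating keep pattern has a 0."""
    return [bit for i, bit in enumerate(coded) if keep[i % len(keep)]]
```

With the keep pattern (1, 1, 0, 1), three of every four coded bits are sent, so eight input bits produce 12 transmitted bits and the 1/2 rate coder becomes a 2/3 rate one.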

Figure 8.9.
TS (with SI) -> energy dispersal -> outer FEC -> convolutional interleave -> trellis coder -> VSB modulator -> RF stage

TS
Energy Outer Convolutional Viterbi COFDM RF
Dispersal FEC Interleave Encoder Modulator Stage
DVB-SI

Figure 8.10.
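The puncturing idea described at the start of this passage can be illustrated with a rate-1/2 convolutional coder followed by a stage that drops coded bits according to a repeating keep-mask. This is a minimal sketch under stated assumptions: the constraint length of 7, the generator polynomials 171 and 133 (octal), and the mask `RATE_2_3_MASK` are illustrative choices, not taken from Figure 8.8; each broadcast standard specifies its own code and puncturing patterns. The matching decoder is not shown.

```python
# Minimal sketch: a rate-1/2 convolutional coder punctured to rate 2/3.
# Assumptions (illustrative, not from Figure 8.8): constraint length 7,
# generator polynomials 171 and 133 (octal), and the keep-mask below.
from typing import List

G_X = 0o171  # taps for output X; bit 6 is the current input bit
G_Y = 0o133  # taps for output Y


def parity(value: int) -> int:
    """Return 1 if value has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1


def encode_rate_half(bits: List[int]) -> List[int]:
    """Emit two coded bits (X, Y) for every input bit."""
    state = 0  # 7-bit window: current bit in bit 6, oldest bit in bit 0
    coded = []
    for bit in bits:
        state = (bit << 6) | (state >> 1)
        coded.append(parity(state & G_X))
        coded.append(parity(state & G_Y))
    return coded


# Coded bits arrive as X1, Y1, X2, Y2, ...; this mask keeps X1, Y1 and Y2,
# so 3 bits are sent for every 2 input bits (rate 2/3).
RATE_2_3_MASK = [1, 1, 0, 1]


def puncture(coded: List[int], mask: List[int]) -> List[int]:
    """Drop every coded bit whose position in the repeating mask is 0."""
    return [b for i, b in enumerate(coded) if mask[i % len(mask)]]


data = [1, 0, 1, 1, 0, 0, 1, 0]
half = encode_rate_half(data)                # 16 bits: rate 1/2
two_thirds = puncture(half, RATE_2_3_MASK)   # 12 bits: rate 2/3
print(len(data), len(half), len(two_thirds))  # 8 16 12
```

Changing only the mask moves the balance between bit rate and correcting power; the underlying coder stays the same.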

SECTION 9 MPEG TESTING
The ability to analyze existing transport streams for compliance is essential, but this ability must be complemented by an ability to create known-compliant transport streams.

9.1 Testing requirements
Although the technology of MPEG differs dramatically from the technology that preceded it, the testing requirements are basically the same. On an operational basis, the user wants to have a simple, regular confidence check that ensures all is well. In the event of a failure, the location of the fault needs to be established rapidly. For the purpose of equipment design, the nature of problems needs to be explored in some detail. As with all signal testing, the approach is to combine the generation of known valid signals for insertion into a system with the ability to measure signals at various points.

One of the characteristics of MPEG that distances it most from traditional broadcast video equipment is the existence of multiple information layers, in which each layer is hoped to be transparent to the one below. It is very important to be able to establish in which layer any fault resides to avoid a fruitless search. For example, if the picture monitor on an MPEG decoder is showing visible defects, these defects could be due to a number of possibilities. Perhaps the encoder is not allowing a high enough bit rate, or perhaps the encoder is faulty, causing the transport stream to deliver the faulty information. On the other hand, the encoder might operate correctly, but the transport layer corrupts the data. In DVB, there are even more layers, such as energy dispersal, error correction, and interleaving. Such complexity requires a structured approach to fault finding, using the right tools. The discussion of protocol analysis of the compressed data in this primer may help the user derive such an approach. Reading the discussion of another important aspect of testing for compressed television, picture-quality assessment, may also be helpful. That discussion is found in the Tektronix publication, A Guide to Video Measurements for Compressed Television Systems.

9.2 Analyzing a Transport Stream
An MPEG transport stream has an extremely complex structure, but an analyzer such as the MTS 100 can break down the structure in a logical fashion such that the user can observe any required details. Many general types of analysis can take place in real time on a live transport stream. These include displays of the hierarchy of programs in the transport stream and of the proportion of the stream bit rate allocated to each stream.

More detailed analysis is only possible if part of a transport stream is recorded so that it can be picked apart later. This technique is known as deferred time testing and could be used, for example, to examine the contents of a time stamp.

When used for deferred time testing, the MPEG transport-stream analyzer acts like a logic analyzer that provides data-interpretation tools specific to MPEG. Like all logic analyzers, it needs a real-time triggering mechanism to determine the time or conditions under which capture will take place. Figure 9.1 shows that an analyzer contains a real-time section, a storage section, and a deferred section. In real-time analysis, only the real-time section operates, and a signal source needs to be connected. For capture, the real-time section is used to determine when to trigger the capture. The analyzer includes tools known as filters that allow selective analysis to be applied before or after capture.

Once the capture is completed, the deferred section can operate on the captured data and the input signal is no longer necessary. There is a good parallel in the storage oscilloscope, which can display the real-time input directly or save it for later study.

Figure 9.1. Analyzer structure: input (In) → Real Time Analysis + Triggering → Storage System → Deferred Time Analysis.
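The trigger, storage and deferred stages of Figure 9.1, together with the filters mentioned above, can be modeled in a few lines. This is a hypothetical sketch, not the analyzer's implementation: `make_packet`, `capture` and the `pre`/`post` counts are illustrative names, packets are assumed to be 188-byte transport packets, and the trigger is formed by concatenating two filters (a PID match and the payload-unit-start flag, which marks the start of a PES packet or table section).

```python
# Hypothetical sketch of real-time triggering, storage and deferred analysis.
from collections import deque
from typing import Callable, Iterable, List

Packet = bytes                     # one 188-byte transport packet
Filter = Callable[[Packet], bool]  # returns True if the packet passes


def pid_of(pkt: Packet) -> int:
    """13-bit PID from bytes 1 and 2 of the packet header."""
    return ((pkt[1] & 0x1F) << 8) | pkt[2]


def payload_unit_start(pkt: Packet) -> bool:
    """True when a PES packet or table section starts in this packet."""
    return bool(pkt[1] & 0x40)


def pid_filter(pid: int) -> Filter:
    return lambda pkt: pid_of(pkt) == pid


def all_of(*filters: Filter) -> Filter:
    """Concatenate filters: a packet passes only if every filter passes."""
    return lambda pkt: all(f(pkt) for f in filters)


def capture(stream: Iterable[Packet], trigger: Filter,
            pre: int = 2, post: int = 3) -> List[Packet]:
    """Keep `pre` packets before the trigger and `post` packets after it."""
    history: deque = deque(maxlen=pre)  # real-time section: rolling buffer
    packets = iter(stream)
    for pkt in packets:
        if trigger(pkt):
            stored = list(history) + [pkt]  # storage section
            for _ in range(post):
                nxt = next(packets, None)
                if nxt is None:
                    break
                stored.append(nxt)
            return stored
        history.append(pkt)
    return []  # trigger condition never met


def make_packet(pid: int, start: bool = False) -> Packet:
    """Hypothetical test helper: payload-only packet, continuity counter 0."""
    byte1 = (0x40 if start else 0x00) | ((pid >> 8) & 0x1F)
    return bytes([0x47, byte1, pid & 0xFF, 0x10]) + bytes(184)


stream = [make_packet(0x100), make_packet(0x101),
          make_packet(0x100, start=True), make_packet(0x102), make_packet(0x103)]
trigger = all_of(pid_filter(0x100), payload_unit_start)
stored = capture(stream, trigger, pre=2, post=1)
# Deferred section: the stored packets can now be examined offline.
print([hex(pid_of(p)) for p in stored])  # ['0x100', '0x101', '0x100', '0x102']
```

The first packet on PID 0x100 does not fire the trigger because it lacks the payload-unit-start flag; only the concatenated condition does.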

9.3 Hierarchic view
When analyzing an unfamiliar transport stream, the hierarchic view is an excellent starting point because it enables a graphic view of every component in the stream. Figure 9.2 shows an example of a hierarchic display such as that provided by the Tektronix MTS 200. Beginning at the top left with the entire transport stream, the stream splits and an icon is presented for every stream component. Figure 9.3 shows the different icons that the hierarchical view uses and their meaning. The user can very easily see how many program streams are present and the video and audio content of each. Each icon represents the top layer of a number of lower analysis and information layers.

The analyzer creates the hierarchic view by using the PAT and PMT in the Program Specific Information (PSI) data in the transport stream. The PIDs from these tables are displayed beneath each icon. PAT and PMT data are fundamental to the operation of any demultiplexer or decoder; if the analyzer cannot display a hierarchic view or displays a view which is obviously wrong, the transport stream under test has a PAT/PMT error. It is unlikely that equipment further up the line will be able to interpret the stream at all.

Figure 9.2.

Figure 9.3. Icons used in the hierarchic view and the element type each represents:
Multiplex transport packets - This icon represents all transport packets (188- and 204-byte) that make up the stream. If you visualize the transport stream as a train, this icon represents every car in the train, regardless of its configuration (flat car, boxcar, or hopper, for example) and what it contains.
Transport packets of a particular PID (Packet Identifier) - Other elements (tables, clocks, PES packets) are the "payload" contained within transport packets or are constructed from the payload of several transport packets that have the same PID. The PID number appears under the icon. In the hierarchic view, the icon to the right of this icon represents the payload of packets with this PID.
Transport packets that contain independent PCR clocks - The PID appears under the icon.
PAT (Program Association Table) sections - Always contained in PID 0 transport packets.
PMT (Program Map Table) sections
NIT (Network Information Table) - Provides access to SI tables through the PSI/SI command from the Selection menu. Also used for private sections. When the DVB option (in the Options menu) is selected, this icon can also represent SDT, BAT, EIT, and TDT sections.
Packetized Elementary Stream (PES) - This icon represents all packets that, together, contain a given elementary stream. Individual PES packets are assembled from the payloads of several transport packets.
Video elementary stream
Audio elementary stream
Data elementary stream
ECM (Entitlement Control Message) sections
EMM (Entitlement Management Message) sections
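The first step in building the hierarchic view, reading the PAT, can be sketched as follows. This is a minimal sketch that assumes a complete PAT section has already been reassembled from PID 0 packets; it follows the PAT layout of the MPEG-2 Systems standard (ISO/IEC 13818-1) but skips CRC checking and multi-section tables. `parse_pat` is a hypothetical name.

```python
# Minimal sketch: map program numbers to PMT PIDs from one PAT section.
from typing import Dict


def parse_pat(section: bytes) -> Dict[int, int]:
    """Return {program_number: PID}; program 0 maps to the network (NIT) PID."""
    if section[0] != 0x00:
        raise ValueError("table_id is not 0x00, so this is not a PAT")
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    end = 3 + section_length - 4  # stop before the 4-byte CRC_32
    programs = {}
    # The 8-byte header is followed by 4-byte entries:
    # 16-bit program_number, 3 reserved bits, 13-bit PID.
    for pos in range(8, end, 4):
        program_number = (section[pos] << 8) | section[pos + 1]
        programs[program_number] = ((section[pos + 2] & 0x1F) << 8) | section[pos + 3]
    return programs


# Example section: program 0 -> NIT on PID 0x0010, program 1 -> PMT on PID 0x0100.
pat = bytes([0x00, 0xB0, 17,            # table_id, flags + section_length = 17
             0x00, 0x01,                # transport_stream_id
             0xC1, 0x00, 0x00,          # version/current_next, section numbers
             0x00, 0x00, 0xE0, 0x10,    # program 0 -> PID 0x0010
             0x00, 0x01, 0xE1, 0x00,    # program 1 -> PID 0x0100
             0x00, 0x00, 0x00, 0x00])   # CRC_32 placeholder (not checked here)
print(parse_pat(pat))  # {0: 16, 1: 256}
```

Each PMT PID found this way would then be filtered to read that program's PMT, which lists the elementary-stream PIDs shown beneath the lower icons.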

The ability of a demux or decoder to lock to a transport stream depends on the frequency with which the PSI data are sent. The PSI/SI rate option shown in Figure 9.4 displays the frequency of insertion of system information. PSI/SI information should also be consistent with the actual content in the bit stream. For example, if a given PID is referenced in a PMT, it should be possible to find packets with that PID in the bit stream. The consistency check function makes such a comparison. Figure 9.5 shows a consistency-error readout.

A MUX allocation chart may graphically display the proportions of the transport stream allocated to each PID or program. Figure 9.6 shows an example of a MUX-allocation pie-chart display. The hierarchical view and the MUX allocation chart show the number of elements in the transport stream and the proportion of bandwidth allocated, but they do not show how the different elements are distributed in time in the multiplex. The PID map function displays, in order of reception, a list of the PID of every packet in a section of the transport stream. Each PID type is color coded differently, so it is possible to assess the uniformity of the multiplexing. Bursts of SI data in a transport stream, seen as contiguous packets, may cause buffer overflows in the decoder.

Figure 9.4.

Figure 9.5.

Figure 9.6.

9.4 Interpreted view
As an alternative to checking for specific data in unspecified places, it is possible to analyze unspecified data in specific places, including in the individual transport stream packets, the tables, or the PES packets. This analysis is known as the interpreted view because the analyzer automatically parses and decodes the data and then displays its meaning. Figure 9.7 shows an example of an interpreted view. As the selected item is changed, the elapsed time and byte count relative to the start of the stream can be displayed.

Figure 9.7.
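The interpreted view, the MUX allocation chart, and the consistency check all start from the 4-byte transport packet header. The sketch below decodes that header into named fields, derives each PID's share of the multiplex, and lists PIDs that are referenced (for example in a PMT) but never appear in the stream. It is a minimal sketch: it assumes 188-byte packets (204-byte packets would first need their Reed-Solomon bytes stripped), and the function names and the `packet` helper are hypothetical.

```python
# Minimal sketch: interpreted header fields, PID allocation and a consistency check.
from collections import Counter
from typing import Dict, Iterable, List


def interpret_header(pkt: bytes) -> Dict[str, int]:
    """Decode the 4-byte transport packet header into named fields."""
    if len(pkt) != 188 or pkt[0] != 0x47:
        raise ValueError("expected a 188-byte packet starting with sync byte 0x47")
    return {
        "transport_error_indicator": (pkt[1] >> 7) & 1,
        "payload_unit_start_indicator": (pkt[1] >> 6) & 1,
        "transport_priority": (pkt[1] >> 5) & 1,
        "pid": ((pkt[1] & 0x1F) << 8) | pkt[2],
        "transport_scrambling_control": (pkt[3] >> 6) & 0x3,
        "adaptation_field_control": (pkt[3] >> 4) & 0x3,
        "continuity_counter": pkt[3] & 0xF,
    }


def mux_allocation(packets: Iterable[bytes]) -> Dict[int, float]:
    """Percentage of packets (and so of the bit rate) carried on each PID."""
    counts = Counter(interpret_header(p)["pid"] for p in packets)
    total = sum(counts.values())
    return {pid: 100.0 * n / total for pid, n in counts.items()}


def missing_pids(referenced: Iterable[int], packets: Iterable[bytes]) -> List[int]:
    """PIDs that are referenced but never occur in the stream."""
    seen = {interpret_header(p)["pid"] for p in packets}
    return sorted(set(referenced) - seen)


def packet(pid: int, cc: int = 0) -> bytes:
    """Hypothetical helper: payload-only packet with the given PID and counter."""
    return bytes([0x47, (pid >> 8) & 0x1F, pid & 0xFF, 0x10 | cc]) + bytes(184)


stream = [packet(0x100, 0), packet(0x100, 1), packet(0x100, 2), packet(0x1FFF)]
print(interpret_header(stream[1]))
print(mux_allocation(stream))                # {256: 75.0, 8191: 25.0}
print(missing_pids([0x100, 0x101], stream))  # [257]
```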

The interpreted view can also be expanded to give a full explanation of the meaning of any field in the interpreted item. Figure 9.8 shows an example of a field explanation. Because MPEG has so many different parameters, field explanations can help recall their meanings.

9.5 Syntax and CRC analysis
To ship program material, the transport stream relies completely on the accurate use of syntax by encoders. Without correct settings of fixed flag bits, sync patterns, packet-start codes, and packet counts, a decoder may misinterpret the bit stream. The syntax check function considers all bits that are not program material and displays any discrepancies. Spurious discrepancies could be due to transmission errors; consistent discrepancies point to a faulty encoder or multiplexer. Figure 9.9 shows a syntax-error display.

Many MPEG tables have checksums or CRCs attached for error detection. The analyzer can recalculate the checksums and compare them with the actual checksum. Again, spurious CRC mismatches could be due to stream-bit errors, but consistent CRC errors point to a hardware fault.

9.6 Filtering
A transport stream contains a great amount of data, and in real fault conditions, it is probable that, unless a serious problem exists, much of the data is valid and that perhaps only one elementary stream or one program is affected. In this case, it is more effective to test selectively, which is the function of filtering. Essentially, filtering allows the user of an analyzer to be more selective when examining a transport stream. Instead of accepting every bit, the user can analyze only those parts of the data that meet certain conditions.

One condition results from filtering packet headers so that only packets with a given PID are analyzed. This approach makes it very easy to check the PAT by selecting PID 0, and, from there, all other PIDs can be read out. If the PIDs of a suspect stream are known, perhaps from viewing a hierarchical display, it is easy to select a single PID for analysis.

Alternatively, the filtering can be used to search for unknown PIDs based on some condition. For example, if the audio-compression levels being used are not known, audio-sequence headers containing, say, Layer 2 coding can be searched for. It may be useful to filter so that only certain elementary-stream access units are selected. At a very low level, it may be useful to filter on a specific four-byte bit pattern. In the case of false detection of sync patterns, this filter would allow the source of the false syncs to be located.

In practice, it is useful to be able to concatenate filters. In other words, it is possible to filter not just on packets having a PID of a certain value, but perhaps only packets of that PID value which also contain a PES header. By combining filters in this way, it is possible to be very specific about the data to be displayed. As fault finding proceeds, this progressive filtering allows optimization of the search process.

Figure 9.8.

Figure 9.9.
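The CRC recalculation described in 9.5 can be sketched for PSI sections such as the PAT and PMT, whose CRC_32 field uses the MPEG-2 variant of CRC-32 (polynomial 0x04C11DB7, initial value 0xFFFFFFFF, no bit reflection, no final XOR). Because of those parameters, running the same CRC over a whole section, including its own CRC_32 field, gives zero when the section is intact. This bitwise version is a sketch for clarity; practical analyzers typically use a lookup table.

```python
# Bitwise CRC-32/MPEG-2, as used for the CRC_32 field of PSI sections.
def crc32_mpeg2(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24  # feed the byte in at the top of the register
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def section_crc_ok(section: bytes) -> bool:
    """True if the CRC over the whole section, CRC_32 field included, is zero."""
    return crc32_mpeg2(section) == 0


# A one-program PAT header plus its entry; the CRC_32 is appended big-endian.
body = bytes([0x00, 0xB0, 0x0D, 0x00, 0x01, 0xC1, 0x00, 0x00,
              0x00, 0x01, 0xE1, 0x00])
section = body + crc32_mpeg2(body).to_bytes(4, "big")
corrupted = section[:9] + bytes([section[9] ^ 0x01]) + section[10:]  # one bit flipped
print(section_crc_ok(section), section_crc_ok(corrupted))  # True False
```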

9.7 Timing Analysis
The tests described above check for the presence of the correct elements and syntax in the transport stream. However, to display real-time audio and video correctly, the transport stream must also deliver accurate timing to the decoders. This task can be confirmed by analyzing the PCR and time-stamp data.

The correct transfer of program clock data is vital because this data controls the entire timing of the decoding process. PCR analysis can show that, in each program, PCR data is sent at a sufficient rate and with sufficient accuracy to be compliant.

The PCR data from a multiplexer may be precise, but remultiplexing may put the packets of a given program at a different place on the time axis, requiring that the PCR data be edited by the remultiplexer. Consequently, it is important to test for PCR jitter after the data is remultiplexed.

Figure 9.10 shows a PCR display that indicates the times at which PCRs were received. At the next display level, each PCR can be opened to display the PCR data, as is shown in Figure 9.11. To measure jitter, the analyzer predicts the PCR value by using the previous PCR and the bit rate to produce what is called the interpolated PCR or PCRI. The actual PCR value is subtracted from PCRI to give an estimate of the jitter. The figure also shows the time since the previous PCR arrived.

Figure 9.10.

Figure 9.11.

An alternate approach, shown in Figure 9.12, provides a graphical display of PCR jitter and PCR repetition rate, which is updated in real time.

Figure 9.12.

Once PCR data is known to be correct, the time stamps can be analyzed. Figure 9.13 shows a time-stamp display for a selected elementary stream. The time of arrival, the presentation time, and, where appropriate, the decode times are all shown.

In MPEG, the reordering and use of different picture types causes delay and requires buffering at both encoder and decoder. A given elementary stream must be encoded within the constraints of the availability of buffering at the decoder. MPEG defines a model decoder called the T-STD (Transport Stream System Target Decoder); an encoder or multiplexer must not distort the data flow beyond the buffering ability of the T-STD. The video elementary stream carries a VBV (Video Buffering Verifier) parameter specifying the amount of buffering it needs.

The T-STD analysis displays the buffer occupancy graphically so that overflows or underflows can be easily seen. Figure 9.14 shows a buffering display.

The output of a normal compressor/multiplexer is of limited use because it is not deterministic. If a decoder defect is seen, there is no guarantee that the same defect will be seen on a repeat of the test because the same video signal will not result in the same transport stream. In this case, an absolutely repeatable transport stream is essential so that the defect can be made to occur at will for study or rectification.

Transport stream jitter should be within certain limits, but a well-designed decoder should be able to recover programs beyond these limits in order to guarantee reliable operation. There is no way to test for this capability using existing transport streams because, if they are compliant, the decoder is not being tested. If there is a failure, it will not be reproducible and it may not be clear whether the failure was due to jitter or some other noncompliance. The solution is to generate a transport stream that is compliant in every respect and then add a controlled amount of jitter to it so that jitter is then known to be the only source of noncompliance. The multiplexer feature of the MTS 200 is designed to create such signals.

Figure 9.13.
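The PCR checks of 9.7 reduce to arithmetic on 27 MHz clock counts. The sketch below decodes the 6-byte PCR field of an adaptation field (33-bit base in 90 kHz units, 6 reserved bits, 9-bit extension), forms PCRI by extrapolating the previous PCR with the byte distance and an assumed constant transport rate, and reports PCRI minus the actual PCR as the jitter estimate, as the text describes. The 0.1 s default in `repetition_ok` is the maximum PCR interval in MPEG-2 Systems; the function names are hypothetical and PCR wraparound is ignored.

```python
# Minimal sketch: PCR decoding, PCRI jitter estimate and PCR interval check.
from typing import List

PCR_HZ = 27_000_000  # the PCR counts a 27 MHz clock


def decode_pcr(field: bytes) -> int:
    """Decode a 6-byte PCR field to a 27 MHz count (base * 300 + extension)."""
    value = int.from_bytes(field, "big")
    base = value >> 15          # top 33 bits, in 90 kHz units
    extension = value & 0x1FF   # bottom 9 bits, 0..299
    return base * 300 + extension


def pcr_jitter(prev_pcr: int, actual_pcr: int,
               bytes_between: int, bitrate_bps: int) -> int:
    """Return PCRI - PCR in 27 MHz ticks, assuming a constant transport rate.

    bytes_between is the distance from the byte carrying the previous PCR
    to the byte carrying this one. PCR wraparound is ignored here.
    """
    elapsed = (bytes_between * 8 * PCR_HZ + bitrate_bps // 2) // bitrate_bps
    pcri = prev_pcr + elapsed   # interpolated PCR
    return pcri - actual_pcr


def repetition_ok(arrival_times_s: List[float], limit_s: float = 0.1) -> bool:
    """True if successive PCRs of one program arrive no more than limit_s apart."""
    return all(b - a <= limit_s for a, b in zip(arrival_times_s, arrival_times_s[1:]))


field = ((1 << 15) | (0x3F << 9) | 5).to_bytes(6, "big")  # base 1, reserved bits set, ext 5
print(decode_pcr(field))                        # 305
# 1000 bytes at 8 Mbit/s take 1 ms, i.e. 27,000 ticks.
print(pcr_jitter(0, 27_000, 1000, 8_000_000))   # 0
print(pcr_jitter(0, 26_973, 1000, 8_000_000))   # 27 ticks, i.e. 1 microsecond
```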

Figure 9.14.

9.8 Elementary stream testing
Because of the flexible nature of the MPEG bit stream, the number of possibilities and combinations it can contain is almost incalculable. As the encoder is not defined, encoder manufacturers are not compelled to use every possibility; indeed, for economic reasons, this is unlikely. This fact makes testing quite difficult because the fact that a decoder works with a particular encoder does not prove compliance. That encoder may simply not be using the modes that cause the decoder to fail.

A further complication occurs because encoders are not deterministic and will not produce the same bit stream if the video or audio input is repeated. There is little chance that the same alignment will exist between I, P, and B pictures and the video frames. If a decoder fails a given test, it may not fail the next time the test is run, making fault-finding difficult. A failure with a given encoder does not determine whether the fault lies with the encoder or the decoder. The coding difficulty depends heavily on the nature of the program material, and any given program material will not necessarily exercise every parameter over the whole coding range.

To make tests that have meaningful results, two tools are required:
- A known source of compliant test signals that deliberately explore the whole coding range. These signals must be deterministic so that a decoder failure will give repeatable symptoms. The Sarnoff compliant bit streams are designed to perform this task.
- An elementary stream analyzer that allows the entire syntax from an encoder to be checked for compliance.

9.9 Sarnoff compliant bit streams
These bit streams have been specifically designed by the David Sarnoff Research Center for decoder compliance testing. They can be multiplexed into a transport stream feeding a decoder.

No access to the internal working of the decoder is required. To avoid the need for lengthy analysis of the decoder output, the bit streams have been designed to create a plain picture when they complete, so that it is only necessary to connect a picture monitor to the decoder output to view them.

There are a number of these simple pictures. Figure 9.15 shows the gray verify screen. The user should examine the verify screen to look for discrepancies that will display well against the gray field. There are also some verify pictures which are not gray.

Some tests will result in no picture at all if there is a failure. These tests display the word "SUCCESS!" on screen when they complete.

Figure 9.15.

Further tests require the viewer to check for smooth motion of a moving element across the picture. Timing or ordering problems will cause visible jitter.

The suite of Sarnoff tests may be used to check all of the MPEG syntax elements in turn. In one test, the bit stream begins with I pictures only, adds P pictures, and then adds B pictures to test whether all MPEG picture types can be handled and correctly reordered. Backward compatibility with MPEG-1 can be proven. Another bit stream tests a range of different GOP structures. There are tests that check the operation of motion vectors over the whole range of values, and there are tests that vary the size of slices or the amount of stuffing.

In addition to providing decoder tests, the Sarnoff streams also include sequences that cause a good decoder to produce standard video test signals to check DACs, signal levels, and composite or Y/C encoders. These sequences turn the decoder into a video test-pattern generator capable of producing conventional video signals such as zone plates, ramps, and color bars.

9.10 Elementary stream analysis
An elementary stream is a payload that the transport stream must deliver transparently. The transport stream will do so whether or not the elementary stream is compliant. In other words, testing a transport stream for compliance simply means checking that it is delivering elementary streams unchanged. It does not mean that the elementary streams were properly assembled in the first place. The elementary-stream structure or syntax is the responsibility of the compressor. Therefore an elementary stream test is essentially a form of compressor test. It should be noted that a compressor can produce compliant syntax, and yet still have poor audio or video quality. However, if the syntax is incorrect, a decoder may not be able to interpret the elementary stream. Since compressors are algorithmic rather than deterministic, an elementary stream may be intermittently noncompliant if some less common mode of operation is not properly implemented.
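Elementary-stream analysis works from the top of the syntax downwards, and the sequence header tells the decoder the main coding parameters. The sketch below assumes an MPEG-2 video elementary stream: it finds every 0x000001xx start code and decodes the leading fields of a sequence header (start code value 0xB3) using the ISO/IEC 13818-2 bit layout, stopping before the optional quantizer matrices and extensions. The function names and the `START_CODE_NAMES` table are illustrative.

```python
# Minimal sketch: locate start codes and decode the leading sequence-header fields.
from typing import Dict, Iterator, Tuple

START_CODE_NAMES = {0x00: "picture", 0xB3: "sequence_header", 0xB5: "extension",
                    0xB7: "sequence_end", 0xB8: "group_of_pictures"}


def find_start_codes(es: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (offset, start_code_value) for each 0x000001 prefix."""
    i = es.find(b"\x00\x00\x01")
    while i != -1 and i + 3 < len(es):
        yield i, es[i + 3]
        i = es.find(b"\x00\x00\x01", i + 3)


def parse_sequence_header(es: bytes, offset: int) -> Dict[str, int]:
    """Decode the 64 bits that follow the 0x000001B3 start code."""
    b = es[offset + 4: offset + 12]
    return {
        "horizontal_size": (b[0] << 4) | (b[1] >> 4),
        "vertical_size": ((b[1] & 0x0F) << 8) | b[2],
        "aspect_ratio_information": b[3] >> 4,
        "frame_rate_code": b[3] & 0x0F,
        "bit_rate_value": (b[4] << 10) | (b[5] << 2) | (b[6] >> 6),  # 400 bit/s units
        "vbv_buffer_size_value": ((b[6] & 0x1F) << 5) | (b[7] >> 3),  # 16 kbit units
    }


# Test header: 720 x 576, aspect code 2, frame-rate code 3,
# bit_rate_value 15000, marker bit set, vbv_buffer_size_value 112.
fields = ((720 << 52) | (576 << 40) | (2 << 36) | (3 << 32)
          | (15000 << 14) | (1 << 13) | (112 << 3))
es = b"\x00\x00\x01\xb3" + fields.to_bytes(8, "big") + b"\x00\x00\x01\xb8" + bytes(4)
for offset, code in find_start_codes(es):
    print(offset, START_CODE_NAMES.get(code, hex(code)))
    if code == 0xB3:
        print(parse_sequence_header(es, offset))
```

In this example, bit_rate_value 15000 corresponds to 15000 × 400 = 6 Mbit/s, and vbv_buffer_size_value 112 to 112 × 16,384 bits of decoder buffering, the VBV parameter mentioned in 9.7.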

As transport streams often contain several programs that come from different coders, elementary-stream problems tend to be restricted to one program, whereas transport stream problems tend to affect all programs. If problems are noted with the output of a particular decoder, then the Sarnoff compliance tests should be run on that decoder. If these are satisfactory, the fault may lie in the input signal. If the transport stream syntax has been tested, or if other programs are working without fault, then an elementary stream analysis is justified.

Elementary stream analysis can begin at the top level of the syntax and continue downwards. Sequence headers are very important as they tell the decoder all of the relevant modes and parameters used in the compression. The elementary-stream syntax described in Sections 4.1 and 4.2 should be used as a guide. Figure 9.16 shows part of a sequence header displayed on an MTS 200. At a lower level of syntax, Figure 9.17 shows a picture header.

Figure 9.16.

Figure 9.17.

9.11 Creating a transport stream
Whenever the decoder is suspect, it is useful to be able to generate a test signal of known quality. Figure 9.18 shows that an MPEG transport stream must include Program Specific Information (PSI), such as PAT, PMT and NIT, describing one or more program streams. Each program stream must contain its own program clock reference (PCR) and elementary streams having periodic time stamps.

Figure 9.18. Creating a transport stream: several programs (each with video, audio and PTS/DTS) plus PAT, PMT and NIT feed a TS MUX.

A DVB transport stream will contain additional service information, such as BAT, SDT and EIT tables. A PSI/SI editor enables insertion of any desired compliant combination of PSI/SI into a custom test stream.

Clearly, each item requires a share of the available transport-stream rate. The multiplexer provides a rate gauge to display the total bit rate used. The remainder of the bit rate is used up by inserting stuffing packets with a PID of all 1s (8191), which a decoder will reject.

9.12 Jitter generation
The MPEG decoder has to recreate a continuous clock by using the clock samples in PCR data to drive a phase-locked loop. The loop needs filtering and damping so that jitter in the time of arrival of PCR data does not cause instability in the clock.

To test the phase-locked loop performance, a signal with known jitter is required; otherwise, the test is meaningless. The MTS 200 can generate simulated jitter for this purpose. Because it is a reference generator, the MTS 200 has highly stable clock circuits and the actual output jitter is very small. To create the effect of jitter, the timing of the PCR data is not changed at all. Instead, the PCR values are modified so that the PCR count they contain is slightly different from the ideal. The modified value results in phase errors at the decoder that are indistinguishable from real jitter.

The advantage of this approach is that jitter of any required magnitude can easily be added to any program stream simply by modifying the PCR data and leaving all other data intact. Other program streams in the transport stream need not have jitter added. In fact, it may be best to have a stable program stream to use as a reference.

For different test purposes, the time base may be modulated in a number of ways that determine the spectrum of the loop phase error in order to test the loop filtering. Square-wave jitter alternates between values which are equally early or late. Sinusoidal jitter values cause the phase error to be a sampled sine wave. Random jitter causes the phase error to be similar to noise.

The MTS 200 also has the ability to add a fixed offset to the PCR data. This offset will cause a constant shift in the decode time. If two programs in the same transport stream are decoded to baseband video, a PCR shift in one program produces a change of sync phase with respect to the unmodified program.
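The PCR-modification method of 9.12 can be sketched as a function that leaves packet timing alone and rewrites only the PCR values of one program. The sketch models the square-wave, sinusoidal, random, and fixed-offset cases described above; amplitudes are in 27 MHz ticks, and the `period` (in PCRs) and `seed` parameters are illustrative assumptions, not MTS 200 settings.

```python
# Minimal sketch: add controlled jitter to a program's PCR values.
import math
import random
from typing import List

PCR_WRAP = (1 << 33) * 300  # PCR wraps when its 33-bit base wraps


def jittered_pcrs(ideal: List[int], amplitude: int, shape: str,
                  period: int = 8, seed: int = 0) -> List[int]:
    """Return PCR values offset from the ideal by the chosen jitter shape."""
    rng = random.Random(seed)  # seeded so the test stream is repeatable
    out = []
    for n, pcr in enumerate(ideal):
        if shape == "square":      # alternately early and late by the same amount
            offset = amplitude if (n // (period // 2)) % 2 == 0 else -amplitude
        elif shape == "sine":      # phase error becomes a sampled sine wave
            offset = round(amplitude * math.sin(2 * math.pi * n / period))
        elif shape == "random":    # phase error resembles noise
            offset = rng.randint(-amplitude, amplitude)
        elif shape == "offset":    # constant shift in decode time
            offset = amplitude
        else:
            raise ValueError(f"unknown jitter shape: {shape}")
        out.append((pcr + offset) % PCR_WRAP)
    return out


ideal = [n * 27_000 for n in range(8)]  # one PCR per millisecond
square = jittered_pcrs(ideal, 270, "square", period=4)
print([p - i for p, i in zip(square, ideal)])  # [270, 270, -270, -270, 270, 270, -270, -270]
```

Seeding the random generator keeps the "random" case repeatable, which matches the need in 9.7 for a test stream whose defects can be reproduced at will.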

9.13 DVB tests
DVB transmission testing can be divided into distinct areas. The purpose of the DVB equipment (FEC coder, transmitter, receiver, and error correction) is to deliver a transport stream to the receiver with negligible error. In this sense, the DVB layer is (or should be) transparent to MPEG, and certain testing can assume that the DVB transmission layer is intact. This assumption is reasonable because any defect in the DVB layer will usually result in excessive bit errors, and this condition will cause both an MPEG decoder and a syntax tester to misinterpret the stream. Consequently, if a transport stream tester connected to a DVB/ATSC receiver finds a wide range of errors, the problem is probably not in the MPEG equipment but in the transmission.

Finding transmission faults requires the ability to analyze various points in the DVB chain, checking for conformance with DVB specifications and MPEG standards. Figure 8.1 showed the major processes in the DVB chain. For example, if an encoder and an IRD (integrated receiver decoder) are incompatible, which one is in error? Or, if the energy dispersal randomizing is not correct at either encoder or decoder, how is one to proceed? The solution is to use an analyzer that can add or remove compliant energy dispersal. Figure 9.19 shows that if energy dispersal is removed, the encoder's randomizer can be tested. Alternatively, if randomized data are available, the energy dispersal removal at the IRD can be tested.

Figure 9.19. Testing an encoder's randomizer: TS → Randomizer (part of encoder) → Energy Dispersal Remover → TS Test.

Similarly, the MTS 200 can add and check Reed-Solomon protection and can interleave and deinterleave data. These stages in the encoder and the reverse stages in the decoder can be tested. The inner redundancy after interleaving can also be added and checked. The only process the MTS 200 cannot perform is the modulation, which is an analog process. However, the MTS 100 can drive a modulator with a compliant signal so that it can be tested on a constellation analyzer or with an IRD. This approach is also a very simple way of producing compliant test transmissions.

Once the DVB layer is found to be operating at an acceptable error rate, a transport stream tester at the receiver is essentially testing the transport stream as it left the encoder or remultiplexer. Therefore, the transport-stream test should be performed at a point prior to the transmitter. The transport stream should then be run through the tests outlined above. Most of these tests operate on transport stream headers, and so there should be no difficulty if the program material is scrambled.

In addition to these tests, the DVB-specific tests include testing for the presence and the frequency of the SDT (Service Description Table), NIT (Network Information Table), and the EIT (Event Information Table). This testing can be performed using the DVB option in the MTS 200. Figure 9.20 shows a hierarchic view of DVB-SI tables from a transport stream. Figure 9.21 shows a display of decoded descriptor data.

Figure 9.20.

Figure 9.21.
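Adding and removing energy dispersal, as in Figure 9.19, rests on XORing the data with a pseudo-random binary sequence. The sketch follows the DVB generator polynomial 1 + x^14 + x^15 with the initialization sequence 100101010000000; applying it twice restores the original bytes, which is why a compliant remover can be used to test a randomizer. It is deliberately simplified: the real DVB randomizer reinitializes every eight packets, inverts the first sync byte of each group, and leaves sync bytes unrandomized, none of which is modeled here. ATSC uses a different randomizer.

```python
# Simplified sketch of DVB-style energy dispersal (PRBS 1 + x^14 + x^15).
from typing import List

INIT = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # stages 1..15


def prbs_bytes(count: int) -> bytes:
    """Return `count` PRBS bytes, first generated bit in the MSB."""
    reg: List[int] = list(INIT)
    out = bytearray()
    for _ in range(count):
        byte = 0
        for _ in range(8):
            bit = reg[13] ^ reg[14]  # XOR of stages 14 and 15
            reg = [bit] + reg[:14]   # shift, feeding the result back to stage 1
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def disperse(data: bytes) -> bytes:
    """XOR data with the PRBS; the same call also removes the dispersal."""
    return bytes(d ^ p for d, p in zip(data, prbs_bytes(len(data))))


payload = bytes(range(16))
scrambled = disperse(payload)
print(prbs_bytes(1).hex(), disperse(scrambled) == payload)  # 03 True
```

Passing an all-zero payload through `disperse` exposes the raw sequence, one simple way to compare a randomizer's output with a reference.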

Glossary
AAU - Audio Access Unit. See Access unit.
AC-3 - An audio-coding technique used with ATSC. (See Section 4, Elementary Streams.)
Access unit - The coded data for a picture or block of sound and any stuffing (null values) that follows it.
ATSC - Advanced Television Systems Committee.
Bouquet - A group of transport streams in which programs are identified by a combination of network ID and PID (part of DVB-SI).
CAT - Conditional Access Table. Packets having PID (see Section 7, Transport Streams) codes of 1 and that contain information about the scrambling system. See ECM and EMM.
Channel code - A modulation technique that converts raw data into a signal that can be recorded or transmitted by radio or cable.
CIF - Common Intermediate Format. A 352 x 288 pixel format for 30 fps video conferencing.
Closed GOP - A Group Of Pictures in which the last pictures do not need data from the next GOP for bidirectional coding. Closed GOP is used to make a splice point in a bit stream.
Coefficient - A number specifying the amplitude of a particular frequency in a transform.
DCT - Discrete Cosine Transform.
DTS - Decoding Time Stamp. Part of PES header indicating when an access unit is to be decoded.
DVB - Digital Video Broadcasting. The broadcasting of television programs using digital modulation of a radio frequency carrier. The broadcast can be terrestrial or from a satellite.
DVB-IRD - See IRD.
DVB-SI - DVB Service Information. Information carried in a DVB multiplex describing the contents of different multiplexes. Includes NIT, SDT, EIT, TDT, BAT, RST, and ST. (See Section 8, Introduction to DVB/ATSC.)
DVC - Digital Video Cassette.
Elementary Stream - The raw output of a compressor carrying a single video or audio signal.
ECM - Entitlement Control Message. Conditional access information specifying control words or other stream-specific scrambling parameters.
EIT - Event Information Table. Part of DVB-SI.
EMM - Entitlement Management Message. Conditional access information specifying authorization level or services of specific decoders. An individual decoder or a group of decoders may be addressed.
ENG - Electronic News Gathering. Term used to describe use of video recording instead of film in news coverage.
EPG - Electronic Program Guide. A program guide delivered by data transfer rather than printed paper.
FEC - Forward Error Correction. System in which redundancy is added to the message so that errors can be corrected dynamically at the receiver.
GOP - Group Of Pictures. A GOP starts with an I picture and ends with the last picture before the next I picture.
Inter-coding - Compression that uses redundancy between successive pictures; also known as temporal coding.
Interleaving - A technique used with error correction that breaks up burst errors into many smaller errors.
Intra-coding - Compression that works entirely within one picture; also known as spatial coding.
IRD - Integrated Receiver Decoder. A combined RF receiver and MPEG decoder that is used to adapt a TV set to digital transmissions.
Level - The size of the input picture in use with a given profile. (See Section 2, Compression in Video.)
Macroblock - The screen area represented by several luminance and color-difference DCT blocks that are all steered by one motion vector.
Masking - A psychoacoustic phenomenon whereby certain sounds cannot be heard in the presence of others.
NIT - Network Information Table. Information in one transport stream that describes many transport streams.
Null packets - Packets of "stuffing" that carry no data but are necessary to maintain a constant bit rate with a variable payload. Null packets always have a PID of 8191 (all 1s). (See Section 7, Transport Streams.)
PAT - Program Association Table. Data appearing in packets having PID (see Section 7, Transport Streams) code of zero that the MPEG decoder uses to determine which programs exist in a transport stream. PAT points to PMT, which, in turn, points to the video, audio, and data content of each program.
PCM - Pulse Code Modulation. A technical term for an analog source waveform, for example, audio or video signals, expressed as periodic, numerical samples. PCM is an uncompressed digital signal.
PCR - Program Clock Reference. The sample of the encoder clock count that is sent in the adaptation field of transport packets to synchronize the decoder clock.
PCRI - Interpolated Program Clock Reference. A PCR estimated from a previous PCR and used to measure jitter.

PID - Packet Identifier. A 13-bit code in the transport packet header. PID 0 indicates that the packet contains a PAT. (See Section 7, Transport Streams.) PID 1 indicates a packet that contains a CAT. PID 8191 (all 1s) indicates null (stuffing) packets. All packets belonging to the same elementary stream have the same PID.

PMT - Program Map Tables. The tables, located through the PAT, that point to the video, audio, and data content of each program in a transport stream.

Packets - A term used in two contexts. In program streams, a packet is a unit that contains one or more presentation units. In transport streams, a packet is a small, fixed-size (188-byte) data quantum.

Preprocessing - The video signal processing that occurs before MPEG encoding. Noise reduction, downsampling, cut-edit identification, and 3:2 pulldown identification are examples of preprocessing.

Profile - Specifies the coding syntax used.

Program Stream - A bit stream containing compressed video, audio, and timing information.

PTS - Presentation Time Stamp. The time at which a presentation unit is to be available to the viewer.

PSI - Program Specific Information. Information that keeps track of the different programs in an MPEG transport stream and of the elementary streams in each program. PSI includes PAT, PMT, NIT, CAT, ECM, and EMM.

PSI/SI - A general term for combined MPEG PSI and DVB-SI.

PU - Presentation Unit. One compressed picture or block of audio.

QCIF - One-quarter-resolution (176 x 144 pixels) Common Interchange Format. See CIF.

QSIF - One-quarter-resolution Source Input Format. See SIF.

RLC - Run Length Coding. A coding scheme that counts the number of similar bits instead of sending them individually.

SDI - Serial Digital Interface. A serial coaxial cable interface standard intended for production digital video signals.

STC - System Time Clock. The common clock used to encode video and audio in the same program.

SDT - Service Description Table. A table listing the providers of each service in a transport stream.

SI - See DVB-SI.

SIF - Source Input Format. A half-resolution input signal used by MPEG-1.

ST - Stuffing Table.

Scalability - A characteristic of MPEG-2 that provides multiple quality levels by providing layers of video data. A complex decoder can use more layers of data to produce a better picture. A simpler decoder can still produce a picture using only the first layer of data.

Stuffing - Meaningless data added to maintain a constant bit rate.

Syndrome - The initial result of an error checking calculation. Generally, if the syndrome is zero, there is assumed to be no error.

TDAC - Time Domain Aliasing Cancellation. A coding technique used in AC-3 audio compression.

TDT - Time and Date Table. Used in DVB-SI.

T-STD - Transport Stream System Target Decoder. A model decoder with a defined amount of buffer memory that an encoder assumes is present.

Transport stream - A multiplex of several programs that are carried in packets. Demultiplexing is achieved by different packet IDs (PIDs). See PSI, PAT, PMT, and PCR.

Truncation - Shortening the wordlength of a sample or coefficient by removing low-order bits.

VAU - Video Access Unit. One compressed picture in a program stream.

VLC - Variable Length Coding. A compression technique that allocates short codes to frequent values and long codes to infrequent values.

VOD - Video On Demand. A system in which television programs or movies are transmitted to a single consumer only when requested.

Vector - A motion compensation parameter that tells a decoder how to shift part of a previous picture to more closely approximate the current picture.

Wavelet - A transform in which the basis function is not of fixed length but grows longer as frequency reduces.

Weighting - A method of changing the distribution of the noise that is due to truncation by premultiplying values.
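The PID entry can be made concrete by extracting the 13-bit field from a packet header. The sketch assumes the standard header layout. A 188-byte transport packet begins with the sync byte 0x47. The low five bits of the second byte carry the most significant PID bits, and the third byte carries the remaining eight. Function names are hypothetical, and the name table covers only the PIDs listed in this glossary.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47

# Reserved PIDs named in this glossary; everything else is assigned through PSI.
PID_NAMES = {0x0000: "PAT", 0x0001: "CAT", 0x1FFF: "null (stuffing)"}


def packet_pid(packet: bytes) -> int:
    """Return the 13-bit PID from a transport packet header (hypothetical helper)."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte transport packet starting with 0x47")
    # Mask off the three flag bits in byte 1, keep its low 5 bits as the PID's
    # high bits, then append the 8 bits of byte 2.
    return ((packet[1] & 0x1F) << 8) | packet[2]


def describe_pid(pid: int) -> str:
    return PID_NAMES.get(pid, "assigned through PSI (for example a PMT or elementary stream)")


null_packet = bytes([SYNC_BYTE, 0x1F, 0xFF, 0x10]) + bytes(TS_PACKET_SIZE - 4)
pat_packet = bytes([SYNC_BYTE, 0x40, 0x00, 0x10]) + bytes(TS_PACKET_SIZE - 4)
for pkt in (pat_packet, null_packet):
    pid = packet_pid(pkt)
    print(pid, describe_pid(pid))  # prints "0 PAT", then "8191 null (stuffing)"
```

In the PAT example, the 0x40 in the second byte is the payload unit start indicator. The 0x1F mask strips it off before the PID is assembled.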

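The RLC entry can be sketched in a few lines: each run of identical symbols is replaced by the symbol and a count. This generic form is an assumption for illustration. MPEG video applies the idea in a more specialized way, coding runs of zero coefficients together with the nonzero value that follows them. The function names are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rlc_encode(symbols: Sequence[T]) -> List[Tuple[T, int]]:
    """Replace each run of identical symbols with a (symbol, run_length) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(symbols)]


def rlc_decode(pairs: Sequence[Tuple[T, int]]) -> List[T]:
    """Expand (symbol, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


bits = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
encoded = rlc_encode(bits)
print(encoded)  # prints [(0, 5), (1, 2), (0, 3)]
assert rlc_decode(encoded) == bits
```

Ten bits become three pairs here. The saving grows with run length and disappears when symbols rarely repeat.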

Copyright © 1997, Tektronix, Inc. All rights reserved. Tektronix products are covered by U.S. and foreign patents, issued and pending.
Information in this publication supersedes that in all previously published material. Specification and price change privileges reserved.
TEKTRONIX and TEK are registered trademarks.
10/97 FL5419 25W–11418–0
