
Digital Image Processing Fall 2010

Prof. Dmitry Goldgof

Digital Video Processing


Matthew Shreve

Computer Science and Engineering


University of South Florida

mshreve@cse.usf.edu

Outline

Basics of Video
Digital Video
MPEG
Summary

Basics of Video
Static scene capture → Image
Bring in motion → Video
Image sequence: a 3-D signal
2 spatial dimensions & 1 time dimension
Continuous I(x, y, t) → discrete I(m, n, t_k)

Video Camera
Frame-by-frame capturing
CCD sensors (Charge-Coupled Devices)

2-D array of solid-state sensors


Each sensor corresponds to a pixel
Stored in a buffer and sequentially read out
Widely used

Progressive vs. Interlaced Videos

Progressive
Every pixel on the screen is refreshed in order (monitors) or
simultaneously (films)

Interlaced
Refreshed twice every frame; the electron gun at the back of your
CRT lights the phosphors on the even-numbered rows of pixels
first and then the odd-numbered rows
NTSC frame-rate of 29.97 means the screen is redrawn 59.94
times a second
In other words, 59.94 half-frames per second or 59.94 fields per
second

Progressive vs. Interlaced Videos
How interlaced video could cause problems
Suppose you resize a 720 x 480 interlaced video to 576
x 384 (20% reduction)
How does resizing work?
It takes a sample of the pixels from the original source and
blends them together to create the new pixels

In the case of interlaced video, you might end up blending
scan lines of two completely different images!

Progressive vs. Interlaced Videos

Image in full 720 x 480 resolution

Observe distinct scan lines

Progressive vs. Interlaced Videos

Image after being resized to 576x384

Some scan lines blended together!

DIGITAL VIDEO

Why Digital?
Exactness
Exact reproduction without degradation
Accurate duplication of processing result

Convenient & powerful computer-aided processing
Can perform rather sophisticated processing through
hardware or software

Easy storage and transmission
1 DVD can store a three-hour movie!!!
Transmission of high quality video through network in
reasonable time

Digital Video Coding


The basic idea is to remove redundancy in video
and encode it
Perceptual redundancy
The Human Visual System is less sensitive to color and
high frequencies

Spatial redundancy
Pixels in a neighborhood have close luminance levels
Low frequency

How about temporal redundancy?


Differences between subsequent frames can be small.
Shouldn't we exploit this?

Hybrid Video Coding


Hybrid ~ combination of Spatial, Perceptual, &
Temporal redundancy removal
Issues to be handled
Not all regions are easily inferable from previous frame
Occlusion ~ solved by backward prediction using future frames as
reference
The decision of whether to use prediction or not is made adaptively

Drifting and error propagation


Solved by encoding reference regions or frames at constant intervals of
time

Random access
Solved by encoding frame without prediction at constant intervals of
time

Bit allocation
according to statistics
constant and variable bit-rate requirement

MPEG combines all of these features !!!

MPEG
MPEG = Moving Picture Experts Group
Coding of moving pictures and associated audio

Picture part
Can achieve compression ratio of about 50:1 through storing only
the difference between successive frames
Even higher compression ratios possible

Bit Rate
Defined in two ways
bits per second (all inter-frame compression algorithms)
bits per frame (most intra-frame compression algorithms
except DV and MJPEG)

What does this mean?


If you encode something in MPEG and specify it to be 1.5
Mbps, it doesn't matter what the frame-rate is; it takes
the same amount of space. A lower frame-rate will look
sharper but less smooth.
If you do the same with a codec like Huffyuv or Intel
Indeo, you will get the same image quality at any
frame-rate, but the smoothness and file sizes will change
as the frame-rate changes.

MPEG-1 Compression Aspects


Lossless and Lossy compression are both used for a high
compression rate
Down-sampled chrominance
Perceptual redundancy

Intra-frame compression
Spatial redundancy
Correlation/compression within a frame
Based on baseline JPEG compression standard

Inter-frame compression
Temporal redundancy
Correlation/compression between like frames

Audio compression
Three different layers (MP3)

Perceptual Redundancy
Here is an image represented with 8-bits per pixel

Perceptual Redundancy
The same image at 7-bits per pixel

Perceptual Redundancy
At 6-bits per pixel

Perceptual Redundancy
At 5-bits per pixel

Perceptual Redundancy
At 4-bits per pixel

Perceptual Redundancy
It is clear that we don't need all these bits!
Our previous example illustrated the eye's sensitivity
to luminance

We can build a perceptual model


Give more importance to what is perceivable to the
Human Visual System
Usually this is a function of the spatial frequency

Fundamentals of JPEG

Encoder: DCT → Quantizer → Entropy coder → compressed image data

Decoder: compressed image data → Entropy decoder → Dequantizer → IDCT

Fundamentals of JPEG

JPEG works on 8×8 blocks

Extract an 8×8 block of pixels
Convert to the DCT domain
Quantize each coefficient
Different step size for each coefficient
Based on sensitivity of the human visual system

Order coefficients in zig-zag order
Similar frequencies are grouped together

Run-length encode the quantized values and then
use Huffman coding on what is left

Random Access and Inter-frame Compression

Temporal Redundancy
Only perform repeated encoding of the parts of a picture
frame that are rapidly changing
Do not repeatedly encode background elements and still
elements

Random access capability


Prediction that does not depend upon the user accessing the first
frame (skipping through movie scenes, arbitrary point pick-up)

Sample (2-D) Motion Field

(Figure: anchor frame, target frame, and the motion field between them)

2-D Motion Corresponding to Camera Motion

Camera zoom

Camera rotation around Z-axis (roll)

General Considerations
for Motion Estimation
Two categories of approaches:
Feature based (more often used in object tracking, 3D
reconstruction from 2D)
Intensity based (based on constant intensity
assumption) (more often used for motion compensated
prediction, required in video coding, frame
interpolation)

Three important questions


How to represent the motion field?
What criteria to use to estimate motion parameters?
How to search motion parameters?

Motion Representation

Global: entire motion field is represented by a few global parameters

Pixel-based: one MV at each pixel, with some smoothness constraint between adjacent MVs

Block-based: entire frame is divided into blocks, and motion in each block is characterized by a few parameters

Region-based: entire frame is divided into regions, each region corresponding to an object or sub-object with consistent motion, represented by a few parameters

Also mesh-based

Examples

(Figures: anchor frame, target frame, motion field, and predicted target frame for each method)

Half-pel Exhaustive Block Matching Algorithm (EBMA)
Three-level Hierarchical Block Matching Algorithm
EBMA vs. Mesh-based Motion Estimation

Motion Compensated Prediction

Divide current frame, i, into disjoint 16×16 macroblocks
Search a window in previous frame, i−1, for the closest match
Calculate the prediction error
For each of the four 8×8 blocks in the macroblock, perform DCT-based coding
Transmit motion vector + entropy-coded prediction error (lossy coding)

MPEG-1 Video Coding

Most MPEG-1 implementations use a large number of I
frames to ensure fast access
Somewhat low compression ratio by itself

For predictive coding, P frames depend on only a small
number of past frames
Using fewer past frames reduces error propagation

To further enhance compression in an MPEG-1 file,
introduce a third frame type: the B frame (bidirectional frame)
B frames are encoded using predictive coding from only two other
frames: a past frame and a future frame

By looking at both the past and the future, we can reduce the
prediction error due to rapid changes from frame to frame
(e.g. a fight scene or other fast-action scene)

Predictive coding hierarchy: I, P and B frames

I frames (black) do not depend on any other frame and are
encoded separately
Called anchor frames

P frames (red) depend on the last P frame or I frame
(whichever is closer)
Also called anchor frames

B frames (blue) depend on two frames: the closest past P or
I frame, and the closest future P or I frame
B frames are NOT used to predict other B frames; only P frames and
I frames are used for predicting other frames

MPEG-1 Temporal Order of Compression

I frames are generated and compressed first
Have no frame dependence

P frames are generated and compressed second
Only depend upon past I/P frame values

B frames are generated and compressed last
Depend on surrounding frames
Forward and backward prediction needed

Adaptive Predictive Coding in MPEG-1

Coding each block in a P-frame
Predictive block, using previous I/P frame as reference
Intra-block ~ encode without prediction
use this if prediction costs more bits than non-prediction
good for occluded areas
can also avoid error propagation

Coding each block in a B-frame

Intra-block ~ encode without prediction
Predictive block
use previous I/P frame as reference (forward prediction)
or use future I/P frame as reference (backward prediction)
or use both for prediction

MPEG Library
The MPEG Library is a C library for decoding MPEG-1
video streams and dithering them to a variety of color
schemes.
Most of the code in the library comes directly from an
old version of the Berkeley MPEG player (mpeg_play)
The Library can be downloaded from
http://starship.python.net/~gward/mpeglib/mpeg_lib-1.3.1.tar.gz

It works well on all modern Unix and Unix-like
platforms with an ANSI C compiler. I have tested it on
grad.

NOTE - This is not the best library available, but it works well for MPEG-1 and it is fairly easy to
use. If you are inquisitive, you should check the MPEG Software Simulation Group at
http://www.mpeg.org/MPEG/MSSG/ where you can find a free MPEG-2 video coder/decoder.

MPEGe Library
The MPEGe(ncoding) Library is designed to allow you to
create MPEG movies from your application
The library can be downloaded from the files section of
http://groups.yahoo.com/group/mpegelib/
The encoder library uses the Berkeley MPEG encoder
engine, which handles all the complexities of MPEG
streams
As was the case with the decoder, this library can write
only one MPEG movie at a time
The library works well with most of the common image
formats
To keep things simple, we will stick to PPM

MPEGe Library Functions


The library consists of 3 simple functions
MPEGe_open for initializing the encoder.
MPEGe_image called each time you want to add a frame
to the sequence. The format of the image pointed to by
image is that used by the SDSC Image library
SDSC is a powerful library which will allow you to read/write 32
different image types and also contains functions to manipulate
them. The source code as well as pre-compiled binaries can be
downloaded at ftp://ftp.sdsc.edu/pub/sdsc/graphics/

MPEGe_close called to end the MPEG sequence. This
function will reset the library to a sane state, create
the MPEG end sequence, and close the output file

Note: All functions return non-NULL (i.e. TRUE) on success and zero (or
FALSE) on failure.

Usage Details

You are not required to write code using the libraries to decode and encode
MPEG streams
Copy the binary executables from

http://www.csee.usf.edu/~mshreve/readframes
http://www.csee.usf.edu/~mshreve/encodeframes

Usage

To read frames from an MPEG movie (say test.mpg) and store them in a directory
extractframes (relative to your current working directory) with the filename prefix
testframe:
readframes test.mpg extractframes/testframe

This will decode all the frames of test.mpg into the directory extractframes with
the filenames testframe0.ppm, testframe1.ppm
To encode,
encodeframes 0 60 extractframes/testframe testresult.mpg

This will encode images testframe0.ppm to testframe60.ppm from the directory
extractframes into testresult.mpg

In order to convert between PPM and PGM formats, copy the script from

http://www.csee.usf.edu/~mshreve/batchconvert
