
Digital Image Processing Fall 2010

Prof. Dmitry Goldgof

Digital Video Processing


Matthew Shreve

Computer Science and Engineering


University of South Florida

mshreve@cse.usf.edu

Outline

Basics of Video
Digital Video
MPEG
Summary

Basics of Video
Static scene capture → Image
Bring in motion → Video
Image sequence: a 3-D signal
2 spatial dimensions & 1 time dimension
Continuous I(x, y, t) → discrete I(m, n, t_k)

Video Camera
Frame-by-frame capturing
CCD sensors (Charge-Coupled Devices)

2-D array of solid-state sensors


Each sensor corresponds to a pixel
Stored in a buffer and sequentially read out
Widely used

Progressive vs. Interlaced Videos

Progressive
Every pixel on the screen is refreshed in order (monitors) or
simultaneously (films)

Interlaced
Refreshed twice every frame; the electron gun at the back of your
CRT lights the phosphors on the even-numbered rows of pixels
first and then the odd-numbered rows
NTSC frame-rate of 29.97 means the screen is redrawn 59.94
times a second
In other words, 59.94 half-frames per second or 59.94 fields per
second

Progressive vs. Interlaced Videos
How interlaced video could cause problems
Suppose you resize a 720 x 480 interlaced video to 576
x 384 (20% reduction)
How does resizing work?
It takes a sample of the pixels from the original source and
blends them together to create the new pixels

In the case of interlaced video, you might end up blending
scan lines of two completely different images!

Progressive vs. Interlaced Videos

Image in full 720 x 480 resolution

Observe distinct scan lines

Progressive vs. Interlaced Videos

Image after being resized to 576x384

Some scan lines blended together!

DIGITAL VIDEO

Why Digital?
Exactness
Exact reproduction without degradation
Accurate duplication of processing result

Convenient & powerful computer-aided processing
Can perform rather sophisticated processing through
hardware or software

Easy storage and transmission
1 DVD can store a three-hour movie!!!
Transmission of high quality video through network in
reasonable time

Digital Video Coding


The basic idea is to remove redundancy in video
and encode it
Perceptual redundancy
The Human Visual System is less sensitive to color and
high frequencies

Spatial redundancy
Pixels in a neighborhood have close luminance levels
Low frequency

How about temporal redundancy?


Differences between subsequent frames can be small.
Shouldn't we exploit this?

Hybrid Video Coding


Hybrid ~ combination of Spatial, Perceptual, &
Temporal redundancy removal
Issues to be handled
Not all regions are easily inferable from previous frame
Occlusion ~ solved by backward prediction using future frames as
reference
The decision of whether to use prediction or not is made adaptively

Drifting and error propagation


Solved by encoding reference regions or frames at constant intervals of
time

Random access
Solved by encoding frame without prediction at constant intervals of
time

Bit allocation
according to statistics
constant and variable bit-rate requirement

MPEG combines all of these features !!!

MPEG
MPEG = Moving Picture Experts Group
Coding of moving pictures and associated audio

Picture part
Can achieve compression ratio of about 50:1 through storing only
the difference between successive frames
Even higher compression ratios possible

Bit Rate
Defined in two ways
bits per second (all inter-frame compression algorithms)
bits per frame (most intra-frame compression algorithms
except DV and MJPEG)

What does this mean?


If you encode something in MPEG and specify it to be 1.5
Mbps, it doesn't matter what the frame-rate is; it takes
the same amount of space. A lower frame-rate will look
sharper but less smooth.
If you do the same with a codec like Huffyuv or Intel
Indeo, you will get the same image quality at any
frame-rate, but the smoothness and file sizes will change
as the frame-rate changes.

MPEG-1 Compression Aspects


Lossless and Lossy compression are both used for a high
compression rate
Down-sampled chrominance
Perceptual redundancy

Intra-frame compression
Spatial redundancy
Correlation/compression within a frame
Based on baseline JPEG compression standard

Inter-frame compression
Temporal redundancy
Correlation/compression between like frames

Audio compression
Three different layers (MP3)

Perceptual Redundancy
Here is an image represented with 8-bits per pixel

Perceptual Redundancy
The same image at 7-bits per pixel

Perceptual Redundancy
At 6-bits per pixel

Perceptual Redundancy
At 5-bits per pixel

Perceptual Redundancy
At 4-bits per pixel

Perceptual Redundancy
It is clear that we don't need all these bits!
Our previous example illustrated the eye's sensitivity
to luminance

We can build a perceptual model


Give more importance to what is perceivable to the
Human Visual System
Usually this is a function of the spatial frequency

Fundamentals of JPEG

Encoder: DCT → Quantizer → Entropy coder → compressed image data

Decoder: compressed image data → Entropy decoder → Dequantizer → IDCT

Fundamentals of JPEG

JPEG works on 8×8 blocks

Extract an 8×8 block of pixels
Convert to the DCT domain
Quantize each coefficient
Different step size for each coefficient
Based on sensitivity of the human visual system

Order coefficients in zig-zag order
Similar frequencies are grouped together

Run-length encode the quantized values and then
use Huffman coding on what is left

Random Access and Inter-frame Compression

Temporal Redundancy
Only perform repeated encoding of the parts of a picture
frame that are rapidly changing
Do not repeatedly encode background elements and still
elements

Random access capability


Prediction that does not depend upon the user accessing the first
frame (skipping through movie scenes, arbitrary point pick-up)

Sample (2-D) Motion Field

(Figure: anchor frame, target frame, and the motion field between them)

2-D Motion Corresponding to Camera Motion

Camera zoom

Camera rotation around Z-axis (roll)

General Considerations
for Motion Estimation
Two categories of approaches:
Feature based (more often used in object tracking, 3D
reconstruction from 2D)
Intensity based (based on constant intensity
assumption) (more often used for motion compensated
prediction, required in video coding, frame
interpolation)

Three important questions


How to represent the motion field?
What criteria to use to estimate motion parameters?
How to search motion parameters?

Motion Representation

Global: entire motion field is represented by a few global parameters

Pixel-based: one MV at each pixel, with some smoothness constraint between adjacent MVs

Block-based: entire frame is divided into blocks, and motion in each block is characterized by a few parameters

Region-based: entire frame is divided into regions, each region corresponding to an object or sub-object with consistent motion, represented by a few parameters

Also mesh-based

Examples

(Figures: anchor frame, target frame, motion field, and predicted target frame for each method)

Half-pel Exhaustive Block Matching Algorithm (EBMA)
Three-level Hierarchical Block Matching Algorithm
EBMA vs. Mesh-based Motion Estimation

Motion Compensated Prediction

Divide current frame, i, into disjoint 16×16 macroblocks
Search a window in previous frame, i−1, for the closest match
Calculate the prediction error
For each of the four 8×8 blocks in the macroblock, perform DCT-based coding
Transmit motion vector + entropy-coded prediction error (lossy coding)

MPEG-1 Video Coding

Most MPEG-1 implementations use a large number of I
frames to ensure fast access
Somewhat low compression ratio by itself

For predictive coding, P frames depend on only a small
number of past frames
Using fewer past frames reduces error propagation

To further enhance compression in an MPEG-1 file,
introduce a third frame type: the B frame (bidirectional frame)
B frames are encoded using predictive coding from only two other
frames: a past frame and a future frame

By looking at both the past and the future, we can reduce the
prediction error due to rapid changes from frame to frame
(e.g. a fight scene or other fast-action scene)

Predictive coding hierarchy: I, P and B frames

I frames (black) do not depend on any other frame and are
encoded separately
Called anchor frames

P frames (red) depend on the last P frame or I frame
(whichever is closer)
Also called anchor frames

B frames (blue) depend on two frames: the closest past P or
I frame, and the closest future P or I frame
B frames are NOT used to predict other B frames; only P frames and
I frames are used for predicting other frames

MPEG-1 Temporal Order of Compression

I frames are generated and compressed first
Have no frame dependence

P frames are generated and compressed second
Only depend upon past I/P frame values

B frames are generated and compressed last
Depend on surrounding frames
Forward and backward prediction needed

Adaptive Predictive Coding in MPEG-1

Coding each block in a P-frame
Predictive block, using previous I/P frame as reference
Intra-block ~ encode without prediction
use this if prediction costs more bits than non-prediction
good for occluded areas
can also avoid error propagation

Coding each block in a B-frame

Intra-block ~ encode without prediction
Predictive block
use previous I/P frame as reference (forward prediction)
or use future I/P frame as reference (backward prediction)
or use both for prediction

MPEG Library
The MPEG Library is a C library for decoding MPEG-1
video streams and dithering them to a variety of color
schemes.
Most of the code in the library comes directly from an
old version of the Berkeley MPEG player (mpeg_play)
The Library can be downloaded from
http://starship.python.net/~gward/mpeglib/mpeg_lib-1.3.1.tar.gz

It works well on all modern Unix and Unix-like
platforms with an ANSI C compiler. I have tested it on
grad.

NOTE - This is not the best library available, but it works well for MPEG-1 and it is fairly easy to
use. If you are inquisitive, you should check the MPEG Software Simulation Group at
http://www.mpeg.org/MPEG/MSSG/ where you can find a free MPEG-2 video coder/decoder.

MPEGe Library
The MPEGe(ncoding) Library is designed to allow you to
create MPEG movies from your application
The library can be downloaded from the files section of
http://groups.yahoo.com/group/mpegelib/
The encoder library uses the Berkeley MPEG encoder
engine, which handles all the complexities of MPEG
streams
As was the case with the decoder, this library can write
only one MPEG movie at a time
The library works well with most of the common image
formats
To keep things simple, we will stick to PPM

MPEGe Library Functions


The library consists of 3 simple functions
MPEGe_open for initializing the encoder.
MPEGe_image called each time you want to add a frame
to the sequence. The format of the image pointed to by
image is that used by the SDSC Image library
SDSC is a powerful library which will allow you to read/write 32
different image types and also contains functions to manipulate
them. The source code as well as pre-compiled binaries can be
downloaded at ftp://ftp.sdsc.edu/pub/sdsc/graphics/

MPEGe_close called to end the MPEG sequence. This
function will reset the library to a sane state, create
the MPEG end sequence, and close the output file

Note: All functions return non-NULL (i.e. TRUE) on success and zero (or
FALSE) on failure.

Usage Details

You are not required to write code using the libraries to decode and encode
MPEG streams
Copy the binary executables from

http://www.csee.usf.edu/~mshreve/readframes
http://www.csee.usf.edu/~mshreve/encodeframes

Usage

To read frames from an MPEG movie (say test.mpg) and store them in a directory
extractframes (relative to your current working directory) with the filename prefix
testframe:
readframes test.mpg extractframes/testframe

This will decode all the frames of test.mpg into the directory extractframes with
the filenames testframe0.ppm, testframe1.ppm
To encode,
encodeframes 0 60 extractframes/testframe testresult.mpg

This will encode images testframe0.ppm to testframe60.ppm from the directory
extractframes into testresult.mpg

In order to convert between PPM and PGM formats, copy the script from

http://www.csee.usf.edu/~mshreve/batchconvert
