
DATA COMPRESSION
NAME: JAGARNATH PASWAN (DC)
Email: paswan.jagarnath@gmail.com
What is Data Compression

• Data compression refers to the process of reducing the amount of data required to represent a given quantity of information.
• Data are the means by which information is conveyed.
• Data that either provide no relevant information or simply restate what is already known are said to contain data redundancy.
Why Data Compression
• In terms of communications, the bandwidth of a digital communication link can be effectively increased by compressing data at the sending end and decompressing it at the receiving end.
• In terms of storage, the capacity of a storage device can be effectively increased with methods that compress a body of data on its way to the storage device and decompress it when it is retrieved.
Compression Techniques: Basic Techniques

Entropy Coding:
  - Run-Length Coding
  - Huffman Coding
  - Arithmetic Coding
  - Lempel-Ziv-Welch (LZW) Coding

Source Coding:
  - Prediction: DPCM
  - Transformation: DCT
  - Layer Coding: Subband Coding
  - Vector Quantization

Hybrid Coding:
  - JPEG
  - MPEG
Data Compression Methods

Lossless Compression:
  - Text: Entropy methods (Shannon-Fano, Huffman, Arithmetic)
  - Text: Dictionary methods (LZ, LZW)

Lossy Compression:
  - Audio (codec part): DPCM, Sub-band coding
  - Image: RLE, DCT
  - Video (codec part): DCT, Vector Quantization

Hybrid Compression:
  - Video conferencing: JPEG, MPEG

Lossless Data Compression

1838
-Samuel Finley Breese Morse (Morse code)

Information Theory
"A Mathematical Theory of Communication"

1948
-Prof. Dr. Claude Elwood Shannon
Information Theory

Entropy (in our context): the smallest number of bits needed, on average, to represent a symbol (the average over all the symbols' code lengths).

Note: -log2(pi) is the uncertainty in symbol i (the "surprise" when we see this symbol). Entropy is the average "surprise".

Assumption: there are no dependencies between the symbols' appearances.

Information is quantifiable as:

Information = -log2(probability of occurrence)
For English, with 26 equally likely letters: -log2(1/26) ≈ 4.7 bits per letter
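As a numerical sketch of this definition, the entropy of a source can be computed directly from its symbol probabilities (assuming independent symbols, as stated above):

```python
import math

def entropy(probs):
    """Average number of bits per symbol: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 26 equally likely letters: each carries -log2(1/26) ≈ 4.7 bits,
# and the entropy of the uniform source is the same value.
print(round(-math.log2(1 / 26), 2))      # 4.7
print(round(entropy([1 / 26] * 26), 2))  # 4.7
```

A skewed distribution has lower entropy than a uniform one over the same alphabet, which is exactly the redundancy the coders below exploit.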
Shannon-Fano Data Compression
"The Transmission of Information", Technical Report No. 65

1949
-Prof. Dr. Claude Elwood Shannon
-Prof. Dr. Robert Mario Fano
Shannon-Fano Data Compression
1. Line up the symbols by falling probability of incidence.
2. Divide the symbols into two groups, so that both groups have equal or almost equal sums of probabilities.
3. Assign the value 0 to the first group and the value 1 to the second.
4. For each of the two groups, go to step 2.

Symbol        A        B        C        D        E
Count         15       7        6        6        5
Probability   0.3846   0.1795   0.1538   0.1538   0.1282
Code          00       01       10       110      111

Average code length = (2 bits * (15+7+6) + 3 bits * (6+5)) / 39 symbols = 2.28 bits per symbol
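The procedure can be sketched as a short recursive Python implementation (the most-balanced-split search is one reasonable reading of step 2; symbols and counts are taken from the example):

```python
def shannon_fano(freqs):
    """freqs: list of (symbol, count) pairs. Returns dict symbol -> code string."""
    codes = {}

    def split(group, prefix):
        if len(group) == 1:                       # single symbol: code complete
            codes[group[0][0]] = prefix or "0"
            return
        total = sum(c for _, c in group)
        best_cut, best_diff = 1, float("inf")
        for i in range(1, len(group)):            # find the most balanced split
            left = sum(c for _, c in group[:i])
            diff = abs(left - (total - left))
            if diff < best_diff:
                best_cut, best_diff = i, diff
        split(group[:best_cut], prefix + "0")     # first group gets 0
        split(group[best_cut:], prefix + "1")     # second group gets 1

    split(sorted(freqs, key=lambda sc: -sc[1]), "")   # falling probability
    return codes

freqs = [("A", 15), ("B", 7), ("C", 6), ("D", 6), ("E", 5)]
codes = shannon_fano(freqs)
avg = sum(len(codes[s]) * c for s, c in freqs) / sum(c for _, c in freqs)
print(codes)          # {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
print(round(avg, 2))  # 2.28
```

The computed average of 2.28 bits per symbol matches the slide's hand calculation.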
Huffman Data Compression
“ A Method for the Construction of Minimum-Redundancy Codes ”

1952
Dr. David Albert Huffman
Huffman Data Compression
1. Line up the symbols by falling probabilities.
2. Link the two symbols with the least probabilities into one new symbol whose probability is the sum of the two symbols' probabilities.
3. Repeat step 2 until you generate a single symbol whose probability is 1.
4. Trace the coding tree from the root (the generated symbol with probability 1) to the original symbols, assigning 1 to each lower branch and 0 to each upper branch.

Symbol        A        B        C        D        E
Count         15       7        6        6        5
Probability   0.3846   0.1795   0.1538   0.1538   0.1282
Code length   1 bit    3 bits   3 bits   3 bits   3 bits

Average code length = (1 bit * 15 + 3 bits * (7+6+6+5)) / 39 symbols = 2.23 bits per symbol
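The merging procedure above can be sketched with a min-heap in Python. Branch labels (and hence the exact codewords) depend on tie-breaking, so only the code lengths are checked against the slide:

```python
import heapq
from itertools import count

def huffman(freqs):
    """freqs: dict symbol -> count. Returns dict symbol -> code string."""
    tiebreak = count()              # keeps heap comparisons away from the dicts
    heap = [(c, next(tiebreak), {s: ""}) for s, c in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, a = heapq.heappop(heap)    # the two least probable subtrees
        c2, _, b = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in a.items()}        # upper branch 0
        merged.update({s: "1" + code for s, code in b.items()})  # lower branch 1
        heapq.heappush(heap, (c1 + c2, next(tiebreak), merged))
    return heap[0][2]

freqs = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = huffman(freqs)
avg = sum(len(codes[s]) * c for s, c in freqs.items()) / sum(freqs.values())
print({s: len(codes[s]) for s in "ABCDE"})  # {'A': 1, 'B': 3, 'C': 3, 'D': 3, 'E': 3}
print(round(avg, 2))                        # 2.23
```

The average of 2.23 bits per symbol improves slightly on the Shannon-Fano result (2.28) for the same source.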
Arithmetic Coding
"Generalized Kraft Inequality and Arithmetic Coding"

1976
- Prof. Peter Elias
-Prof. Jorma Rissanen
-Prof. Richard Clark Pasco
Arithmetic Coding

Source Symbol   Probability   Initial Subinterval
a1              0.2           [0.0, 0.2)
a2              0.2           [0.2, 0.4)
a3              0.4           [0.4, 0.8)
a4              0.2           [0.8, 1.0)

Let the message to be encoded be a3 a3 a1 a2 a4.

Each symbol narrows the current interval to its subinterval:
[0.0, 1.0) --a3--> [0.4, 0.8) --a3--> [0.56, 0.72) --a1--> [0.56, 0.592)
--a2--> [0.5664, 0.5728) --a4--> [0.57152, 0.5728)

Any number inside the final interval identifies the whole message.
Arithmetic Coding
Decoding:
Decode 0.572.
Since 0.8 > 0.572 > 0.4, the first symbol must be a3. Narrowing to [0.4, 0.8) and repeating the comparison at each step retraces the encoder's intervals:
[0.4, 0.8) -> [0.56, 0.72) -> [0.56, 0.592) -> [0.5664, 0.5728) -> [0.57152, 0.5728)
Therefore, the message is a3 a3 a1 a2 a4.
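Both directions of the example can be sketched in Python from the interval table (floating-point arithmetic stands in for the bit-exact integer schemes used in practice):

```python
# The slide's model: symbol -> (low, high) of its initial subinterval
model = {"a1": (0.0, 0.2), "a2": (0.2, 0.4), "a3": (0.4, 0.8), "a4": (0.8, 1.0)}

def encode(message):
    low, high = 0.0, 1.0
    for sym in message:
        s_low, s_high = model[sym]
        width = high - low                 # narrow to the symbol's subinterval
        low, high = low + width * s_low, low + width * s_high
    return low, high                       # any value in [low, high) works

def decode(value, n):
    message, low, high = [], 0.0, 1.0
    for _ in range(n):
        width = high - low
        scaled = (value - low) / width     # where does the value fall?
        for sym, (s_low, s_high) in model.items():
            if s_low <= scaled < s_high:
                message.append(sym)
                low, high = low + width * s_low, low + width * s_high
                break
    return "".join(message)

low, high = encode(["a3", "a3", "a1", "a2", "a4"])
print(round(low, 5), round(high, 5))   # 0.57152 0.5728
print(decode(0.572, 5))                # a3a3a1a2a4
```

Note that the decoder needs the message length (or a terminator symbol) to know when to stop.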
LZ Data Compression
“ A Universal Algorithm for Sequential Data Compression “

1977
-Prof. Abraham Lempel
-Prof. Dr. Jacob Ziv
LZ Data Compression
[Worked example presented as an image]
Total number of bits = 23; after compression = 13 bits
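The LZ idea of replacing repeated substrings with (offset, length, next-symbol) references into a sliding window can be sketched as follows. This is a simplified illustration, not the 1977 paper's exact scheme:

```python
def lz77_compress(data, window=255):
    """Emit (offset, length, next_char) triples over a sliding window."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):     # try every window start
            length = 0
            while (i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1                        # extend the match
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decompress(triples):
    out = []
    for off, length, ch in triples:
        for _ in range(length):
            out.append(out[-off])                  # copy from the window
        out.append(ch)
    return "".join(out)

triples = lz77_compress("abababab")
print(triples)                     # [(0, 0, 'a'), (0, 0, 'b'), (2, 5, 'b')]
print(lz77_decompress(triples))    # abababab
```

Note the third triple: its length (5) exceeds its offset (2), i.e. the match overlaps the text being produced, which the decoder handles naturally by copying one symbol at a time.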


Comparison

                   Huffman                  Arithmetic          Lempel-Ziv
Probabilities      Known in advance         Known in advance    Not known in advance
Alphabet           Known in advance         Known in advance    Not known in advance
Data loss          None                     None                None
Symbol             Not used                 Not used            Used - better
dependencies                                                    compression
Preprocessing      Tree building            None                First pass on data
                                                                (can be eliminated)
Entropy reached    If probabilities are     Very close          Best results when
                   negative powers of 2                         alphabet not known
Codewords          One codeword for         One codeword for    Codewords for sets
                   each symbol              all the data        of symbols
Intuition          Intuitive                Not intuitive       Not intuitive
LZW Data Compression
"A Technique for High Performance Data Compression"

1984
-Prof. Abraham Lempel
-Prof. Dr. Jacob Ziv
-Dr. Terry A. Welch
LZW Compression Algorithm

Input pixel sequence: 39 39 126 126 39 39 126 126 39 39 126 126

Currently Recognized  Pixel Being  Encoded Output  Dictionary  Dictionary
Sequence              Processed    (Code Word)     Location    Entry
                      39
39                    39           39              256         39-39
39                    126          39              257         39-126
126                   126          126             258         126-126
126                   39           126             259         126-39
39-39                 126          256             260         39-39-126
126-126               39           258             261         126-126-39
39-39-126             126          260             262         39-39-126-126
126                   eof          126

12 input pixels are encoded with 8 code words.
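The trace above can be reproduced with a short sketch of the LZW encoder. The dictionary starts with the 256 single byte values (locations 0-255), so new entries begin at 256, as in the slide:

```python
def lzw_compress(pixels):
    """LZW over byte values; dictionary pre-loaded with single values 0-255."""
    dictionary = {(v,): v for v in range(256)}
    next_code = 256
    current, out = (), []
    for p in pixels:
        candidate = current + (p,)
        if candidate in dictionary:
            current = candidate              # keep growing the recognized run
        else:
            out.append(dictionary[current])  # emit code for what we recognized
            dictionary[candidate] = next_code
            next_code += 1
            current = (p,)                   # restart from the new pixel
    if current:
        out.append(dictionary[current])      # flush at end of input
    return out

pixels = [39, 39, 126, 126, 39, 39, 126, 126, 39, 39, 126, 126]
print(lzw_compress(pixels))   # [39, 39, 126, 126, 256, 258, 260, 126]
```

The output matches the table's code-word column: 12 pixels become 8 codes, and the decoder can rebuild the identical dictionary on the fly, so no table is transmitted.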
Rate-Distortion Theory
“ A Mathematical Theory of Communication ”

1948
-Prof. Dr. Claude Elwood Shannon
Rate-Distortion Theory
Rate-distortion theory is a major branch of information theory which provides the theoretical foundations for lossy data compression. It addresses the problem of determining the minimal amount of entropy (or information) R that should be communicated over a channel, so that the source (input signal) can be approximately reconstructed at the receiver (output signal) without exceeding a given distortion D.

Where:
R(D) = rate-distortion function
H = trade-off rate
D = distortion
Distortion Measures
• A distortion measure is a mathematical quantity that specifies how close an approximation is to its original.
  - The average pixel difference is given by the mean squared error (MSE).
• The size of the error relative to the signal is given by the signal-to-noise ratio (SNR).
• Another common measure is the peak signal-to-noise ratio (PSNR).
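The three measures can be sketched directly; the sample values are the DPCM example's input and reconstruction from a later slide:

```python
import math

def mse(x, y):
    """Mean squared error: average squared sample difference."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def snr(x, y):
    """Signal-to-noise ratio in dB: signal power relative to error power."""
    signal = sum(a * a for a in x) / len(x)
    return 10 * math.log10(signal / mse(x, y))

def psnr(x, y, peak=255):
    """Peak signal-to-noise ratio in dB for samples with maximum value `peak`."""
    return 10 * math.log10(peak ** 2 / mse(x, y))

orig  = [130, 150, 140, 200, 230]   # DPCM example input fn
recon = [130, 154, 134, 200, 223]   # its reconstruction fn''
print(mse(orig, recon))             # (0 + 16 + 36 + 0 + 49) / 5 = 20.2
print(round(snr(orig, recon), 1), round(psnr(orig, recon), 1))
```

PSNR is usually quoted for images because it normalizes by the fixed peak value (255 for 8-bit pixels) rather than by the image's own power.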
DPCM Data Compression
"Differential Quantization of Communication Signals"

1950
-C. Chapin Cutler
DPCM Data Compression

[Figure: an audio signal and the DPCM schematic diagram]
DPCM Data Compression

fn              = 130  150  140  200  230   (input samples)
fn'             = 130  130  142  144  167   (predictions)
e = fn - fn'    = 0    20   -2   56   63    (prediction errors)
e'              = 0    24   -8   56   56    (quantized errors)
fn'' = fn' + e' = 130  154  134  200  223   (reconstructions)

Predictor:  fn' = (fn-1'' + fn-2'') / 2,  e.g. (154 + 134)/2 = 144
Quantizer:  e' = Q[en] = 16 * trunc((255 + en)/16) - 256 + 8

Prediction error: e = fn - fn'
Reconstruction error = quantization error:  fn'' - fn = e' - e = q
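The numbers above can be regenerated with a short sketch. The start-up rule is an assumption (the slide only shows that the first sample is reproduced exactly and that the second prediction equals it); everything else follows the predictor and quantizer formulas:

```python
def quantize(e):
    """The slide's quantizer: e' = 16 * trunc((255 + e) / 16) - 256 + 8."""
    return 16 * ((255 + e) // 16) - 256 + 8   # 255 + e >= 0 here, so // == trunc

def dpcm(samples):
    preds, errors, recon = [], [], []
    for n, f in enumerate(samples):
        if n == 0:
            pred, e = f, 0                    # assumed: first sample sent exactly
        else:
            if n == 1:
                pred = recon[0]               # assumed: only one prior sample yet
            else:
                pred = (recon[n - 1] + recon[n - 2]) // 2   # fn' from two fn''
            e = quantize(f - pred)            # quantized prediction error e'
        preds.append(pred)
        errors.append(e)
        recon.append(pred + e)                # decoder's reconstruction fn''
    return preds, errors, recon

fn = [130, 150, 140, 200, 230]
preds, e_prime, fn_recon = dpcm(fn)
print(preds)     # [130, 130, 142, 144, 167]
print(e_prime)   # [0, 24, -8, 56, 56]
print(fn_recon)  # [130, 154, 134, 200, 223]
```

Note that the predictor runs on the reconstructions fn'', not the originals, so the encoder and decoder stay in lockstep and the quantization error never accumulates.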
DCT Data Compression
"Discrete Cosine Transform"

1974
-Dr. Nasir Ahmed
-Dr. T. Natarajan
-Dr. Kamisetty R. Rao
DCT Data Compression
The One-Dimensional DCT
The most common DCT definition of a 1-D sequence of length N (here N = 8) is

  C(u) = a(u) * sum_{x=0..N-1} f(x) * cos[(2x+1) * u * pi / (2N)]

for u = 0, 1, 2, ..., N-1, with a(0) = sqrt(1/N) and a(u) = sqrt(2/N) for u > 0.
C(u) = transform coefficient, f(x) = 1-D pixel value.

Similarly, the inverse transformation is defined as

  f(x) = sum_{u=0..N-1} a(u) * C(u) * cos[(2x+1) * u * pi / (2N)]

for x = 0, 1, 2, ..., N-1.
DCT Data Compression
The Two-Dimensional DCT
The 2-D DCT is a direct extension of the 1-D case and is given by

  C(u,v) = a(u) * a(v) * sum_{x=0..N-1} sum_{y=0..N-1} f(x,y)
           * cos[(2x+1) * u * pi / (2N)] * cos[(2y+1) * v * pi / (2N)]

for u, v = 0, 1, 2, ..., N-1, where a(u) = sqrt(1/N) if u = 0 and sqrt(2/N) if u > 0 (and likewise for a(v)).

The inverse transform is defined as

  f(x,y) = sum_{u=0..N-1} sum_{v=0..N-1} a(u) * a(v) * C(u,v)
           * cos[(2x+1) * u * pi / (2N)] * cos[(2y+1) * v * pi / (2N)]

for x, y = 0, 1, 2, ..., N-1.
DCT Data Compression

[Figure: the one-dimensional cosine basis functions and the matrix form of the equation]
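The 1-D DCT pair can be checked numerically with a direct O(N^2) sketch; the sample row of pixel values below is illustrative:

```python
import math

def alpha(u, N):
    """Normalization: sqrt(1/N) for the DC term, sqrt(2/N) otherwise."""
    return math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)

def dct_1d(f):
    """C(u) = a(u) * sum_x f(x) * cos((2x + 1) * u * pi / (2N))."""
    N = len(f)
    return [alpha(u, N) * sum(f[x] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              for x in range(N))
            for u in range(N)]

def idct_1d(C):
    """f(x) = sum_u a(u) * C(u) * cos((2x + 1) * u * pi / (2N))."""
    N = len(C)
    return [sum(alpha(u, N) * C[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                for u in range(N))
            for x in range(N)]

f = [52, 55, 61, 66, 70, 61, 64, 73]     # one sample row of pixel values
C = dct_1d(f)
print(round(C[0], 1))                    # DC coefficient = sum(f) / sqrt(8)
print([round(v) for v in idct_1d(C)])    # [52, 55, 61, 66, 70, 61, 64, 73]
```

Because the transform is orthonormal, the inverse recovers the input exactly (up to floating-point error); the 2-D version is obtained by applying this 1-D transform to the rows and then to the columns of the block.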


DCT Data Compression
Step 1: Sample the image into 8x8 blocks. Step 2: Level off the original image by subtracting 128 from each entry.
DCT Data Compression
Step 3: Apply the DCT by matrix manipulation: D = T M T'. Step 4: Divide the DCT matrix entrywise by the quantization table.

162.3 = DC coefficient
[Quantization table, quality level 50; DCT transform matrix]
DCT Data Compression
Step 5: Divide D by Q and round each entry to the nearest integer. Step 6: Zig-zag scan to compress the AC coefficients.

[Quantized matrix; zig-zag scan order]
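Steps 5 and 6 can be sketched as follows; the table is the standard JPEG quality-50 luminance quantization table:

```python
# Standard JPEG luminance quantization table, quality level 50
Q50 = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize_block(D, Q):
    """Step 5: divide each DCT coefficient by the table entry and round."""
    return [[round(D[i][j] / Q[i][j]) for j in range(8)] for i in range(8)]

def zigzag(block):
    """Step 6: read an 8x8 block in zig-zag order (runs of zeros group together)."""
    order = sorted(((i, j) for i in range(8) for j in range(8)),
                   key=lambda ij: (ij[0] + ij[1],                 # diagonal index
                                   ij[0] if (ij[0] + ij[1]) % 2   # odd: go down
                                   else ij[1]))                   # even: go up
    return [block[i][j] for i, j in order]

block = [[8 * i + j for j in range(8)] for i in range(8)]
print(zigzag(block)[:10])   # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```

Since quantization zeroes most high-frequency coefficients, the zig-zag order pushes the surviving values to the front and leaves a long zero tail that run-length coding handles cheaply.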


DCT Data Compression
Decompression:

N = [reconstructed 8x8 block, presented as an image]
DCT Data Compression
Comparison between the original and decompressed images
DCT Data Compression

[Figure: Baboon original image vs. DCT decompressed image]


JPEG (Joint Photographic Experts Group)
JPEG (pronounced jay-peg) is the most commonly used standard method of lossy compression for photographic images.

JPEG itself specifies only how an image is transformed into a stream of bytes, but not how those bytes are encapsulated in any particular storage medium.

A further standard, created by the Independent JPEG Group, called JFIF (JPEG File Interchange Format), specifies how to produce a file suitable for computer storage and transmission from a JPEG stream.

In common usage, when one speaks of a "JPEG file" one generally means a JFIF file, or sometimes an Exif JPEG file.

JPEG/JFIF is the format most used for storing and transmitting photographs on the web. It is not as well suited for line drawings and other textual or iconic graphics, because its compression method performs badly on these types of images.
Baseline JPEG Compression
Y = luminance
Cb, Cr = chrominance

Y = 0.299R + 0.587G + 0.114B

U = Cb = 0.492(B − Y) = −0.147R − 0.289G + 0.436B
V = Cr = 0.877(R − Y) = 0.615R − 0.515G − 0.100B
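The color-space conversion above, sketched directly from the slide's equations:

```python
def rgb_to_ycbcr(r, g, b):
    """Y carries luminance; Cb and Cr carry the two color-difference signals."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.492 * (b - y)
    cr = 0.877 * (r - y)
    return y, cb, cr

# On any gray pixel (R = G = B) both chrominance channels vanish,
# which is why Cb and Cr can be subsampled aggressively in JPEG.
y, cb, cr = rgb_to_ycbcr(255, 255, 255)
print(round(y), round(cb), round(cr))   # 255 0 0
```

Separating luminance from chrominance lets the encoder keep Y at full resolution while downsampling Cb and Cr, exploiting the eye's lower sensitivity to color detail.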
JPEG File Interchange Format (JFIF)

The encoded data is written into the JPEG File Interchange Format (JFIF), which, as the name suggests, is a simplified format allowing JPEG-compressed images to be shared across multiple platforms and applications.

JFIF includes embedded image and coding parameters, framed by appropriate header information.

Specifically, aside from the encoded data, a JFIF file must store all coding and quantization tables that are necessary for the JPEG decoder to do its job properly.
MPEG Data Compression
"Moving Picture Experts Group"

1988
-Moving Picture Experts Group (MPEG)
MPEG-1 Data Compression
[MPEG-1 slides: content was presented as images]
MPEG-2 Data Compression
[MPEG-2 slides: content was presented as images]
MPEG-4 Data Compression
[MPEG-4 slides: content was presented as images]
MPEG-7 Data Compression
[MPEG-7 slides: content was presented as images]
H.261 Data Compression
ITU-T (formerly CCITT) video coding standard
[H.261 slides: content was presented as images]
References
Digital Image Processing, 2nd Edition
-by Rafael C. Gonzalez and Richard E. Woods

Multimedia Fundamentals, Vol. 1
-by Ralf Steinmetz and Klara Nahrstedt

http://en.wikipedia.org/wiki/Data_compression

http://navatrump.de/Technology/Datacompression/compression.html