
DATA COMPRESSION
NAME: JAGARNATH PASWAN (DC)
Email: paswan.jagarnath@gmail.com
What is Data Compression

• Data compression refers to the process of reducing the amount of data required to represent a given quantity of information.
• Data are the means by which information is conveyed.
• Data that either provide no relevant information or simply restate what is already known are said to contain data redundancy.
Why Data Compression
• In terms of communications, the bandwidth of a digital communication link can be effectively increased by compressing data at the sending end and decompressing it at the receiving end.
• In terms of storage, the capacity of a storage device can be effectively increased with methods that compress a body of data on its way to the storage device and decompress it when it is retrieved.
Compression Techniques: Basic Techniques

Entropy Coding:
  - Run-Length Coding
  - Huffman Coding
  - Arithmetic Coding
  - Lempel-Ziv-Welch (LZW) Coding

Source Coding:
  - Prediction: DPCM
  - Transformation: DCT
  - Layer Coding: Subband Coding
  - Vector Quantization

Hybrid Coding:
  - JPEG
  - MPEG
Data Compression Methods

Lossless Compression:
  - Text: Entropy methods (Shannon-Fano, Huffman, Arithmetic)
  - Text: Dictionary methods (LZ, LZW)

Lossy Compression:
  - Audio (codec part): DPCM, Sub-band coding
  - Image: RLE, DCT
  - Video (codec part): DCT, Vector Quantization

Hybrid Compression:
  - Video conferencing: JPEG, MPEG

Lossless Data Compression

1838
-Samuel Finley Breese Morse (Morse code)

Information Theory
"A Mathematical Theory of Communication"

1948
-Prof. Dr. Claude Elwood Shannon
Information Theory

Entropy (in our context): the smallest number of bits needed, on average, to represent a symbol (the average over all the symbols' code lengths).

Note: -log2(pi) is the uncertainty in symbol i (the "surprise" when we see this symbol). Entropy is the average "surprise".

Assumption: there are no dependencies between the symbols' appearances.

Information is quantifiable as:

Information = -log2(probability of occurrence)
For English, with 26 equally likely letters: -log2(1/26) ≈ 4.7 bits per letter
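As a numerical sketch of this definition, the entropy of a source can be computed directly from its symbol probabilities (assuming independent symbols, as stated above):

```python
import math

def entropy(probs):
    """Average number of bits per symbol: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 26 equally likely letters: each carries -log2(1/26) ≈ 4.7 bits,
# and the entropy of the uniform source is the same value.
print(round(-math.log2(1 / 26), 2))      # 4.7
print(round(entropy([1 / 26] * 26), 2))  # 4.7
```

A skewed distribution has lower entropy than a uniform one over the same alphabet, which is exactly the redundancy the coders below exploit.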
Shannon-Fano Data Compression
"The Transmission of Information", Technical Report No. 65

1949
-Prof. Dr. Claude Elwood Shannon
-Prof. Dr. Robert Mario Fano
Shannon-Fano Data Compression
1. Line up the symbols by falling probability of incidence.
2. Divide the symbols into two groups, so that both groups have equal or almost equal sums of probabilities.
3. Assign the value 0 to the first group and the value 1 to the second.
4. For each of the two groups, go to step 2.

Symbol        A        B        C        D        E
Count         15       7        6        6        5
Probability   0.3846   0.1795   0.1538   0.1538   0.1282
Code          00       01       10       110      111

Average code length = (2 bits * (15+7+6) + 3 bits * (6+5)) / 39 symbols = 2.28 bits per symbol
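The procedure can be sketched as a short recursive Python implementation (the most-balanced-split search is one reasonable reading of step 2; symbols and counts are taken from the example):

```python
def shannon_fano(freqs):
    """freqs: list of (symbol, count) pairs. Returns dict symbol -> code string."""
    codes = {}

    def split(group, prefix):
        if len(group) == 1:                       # single symbol: code complete
            codes[group[0][0]] = prefix or "0"
            return
        total = sum(c for _, c in group)
        best_cut, best_diff = 1, float("inf")
        for i in range(1, len(group)):            # find the most balanced split
            left = sum(c for _, c in group[:i])
            diff = abs(left - (total - left))
            if diff < best_diff:
                best_cut, best_diff = i, diff
        split(group[:best_cut], prefix + "0")     # first group gets 0
        split(group[best_cut:], prefix + "1")     # second group gets 1

    split(sorted(freqs, key=lambda sc: -sc[1]), "")   # falling probability
    return codes

freqs = [("A", 15), ("B", 7), ("C", 6), ("D", 6), ("E", 5)]
codes = shannon_fano(freqs)
avg = sum(len(codes[s]) * c for s, c in freqs) / sum(c for _, c in freqs)
print(codes)          # {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
print(round(avg, 2))  # 2.28
```

The computed average of 2.28 bits per symbol matches the slide's hand calculation.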
Huffman Data Compression
“ A Method for the Construction of Minimum-Redundancy Codes ”

1952
Dr. David Albert Huffman
Huffman Data Compression
1. Line up the symbols by falling probabilities.
2. Link the two symbols with the least probabilities into one new symbol whose probability is the sum of the two symbols' probabilities.
3. Repeat step 2 until you generate a single symbol whose probability is 1.
4. Trace the coding tree from the root (the generated symbol with probability 1) to the original symbols, assigning 1 to each lower branch and 0 to each upper branch.

Symbol        A        B        C        D        E
Count         15       7        6        6        5
Probability   0.3846   0.1795   0.1538   0.1538   0.1282
Code length   1 bit    3 bits   3 bits   3 bits   3 bits

Average code length = (1 bit * 15 + 3 bits * (7+6+6+5)) / 39 symbols = 2.23 bits per symbol
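The merging procedure above can be sketched with a min-heap in Python. Branch labels (and hence the exact codewords) depend on tie-breaking, so only the code lengths are checked against the slide:

```python
import heapq
from itertools import count

def huffman(freqs):
    """freqs: dict symbol -> count. Returns dict symbol -> code string."""
    tiebreak = count()              # keeps heap comparisons away from the dicts
    heap = [(c, next(tiebreak), {s: ""}) for s, c in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, a = heapq.heappop(heap)    # the two least probable subtrees
        c2, _, b = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in a.items()}        # upper branch 0
        merged.update({s: "1" + code for s, code in b.items()})  # lower branch 1
        heapq.heappush(heap, (c1 + c2, next(tiebreak), merged))
    return heap[0][2]

freqs = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = huffman(freqs)
avg = sum(len(codes[s]) * c for s, c in freqs.items()) / sum(freqs.values())
print({s: len(codes[s]) for s in "ABCDE"})  # {'A': 1, 'B': 3, 'C': 3, 'D': 3, 'E': 3}
print(round(avg, 2))                        # 2.23
```

The average of 2.23 bits per symbol improves slightly on the Shannon-Fano result (2.28) for the same source.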
Arithmetic Coding
"Generalized Kraft Inequality and Arithmetic Coding"

1976
- Prof. Peter Elias
-Prof. Jorma Rissanen
-Prof. Richard Clark Pasco
Arithmetic Coding

Source Symbol   Probability   Initial Subinterval
a1              0.2           [0.0, 0.2)
a2              0.2           [0.2, 0.4)
a3              0.4           [0.4, 0.8)
a4              0.2           [0.8, 1.0)

Let the message to be encoded be a3 a3 a1 a2 a4.

Each symbol narrows the current interval to its subinterval:
[0.0, 1.0) --a3--> [0.4, 0.8) --a3--> [0.56, 0.72) --a1--> [0.56, 0.592)
--a2--> [0.5664, 0.5728) --a4--> [0.57152, 0.5728)

Any number inside the final interval identifies the whole message.
Arithmetic Coding
Decoding:
Decode 0.572.
Since 0.8 > 0.572 > 0.4, the first symbol must be a3. Narrowing to [0.4, 0.8) and repeating the comparison at each step retraces the encoder's intervals:
[0.4, 0.8) -> [0.56, 0.72) -> [0.56, 0.592) -> [0.5664, 0.5728) -> [0.57152, 0.5728)
Therefore, the message is a3 a3 a1 a2 a4.
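Both directions of the example can be sketched in Python from the interval table (floating-point arithmetic stands in for the bit-exact integer schemes used in practice):

```python
# The slide's model: symbol -> (low, high) of its initial subinterval
model = {"a1": (0.0, 0.2), "a2": (0.2, 0.4), "a3": (0.4, 0.8), "a4": (0.8, 1.0)}

def encode(message):
    low, high = 0.0, 1.0
    for sym in message:
        s_low, s_high = model[sym]
        width = high - low                 # narrow to the symbol's subinterval
        low, high = low + width * s_low, low + width * s_high
    return low, high                       # any value in [low, high) works

def decode(value, n):
    message, low, high = [], 0.0, 1.0
    for _ in range(n):
        width = high - low
        scaled = (value - low) / width     # where does the value fall?
        for sym, (s_low, s_high) in model.items():
            if s_low <= scaled < s_high:
                message.append(sym)
                low, high = low + width * s_low, low + width * s_high
                break
    return "".join(message)

low, high = encode(["a3", "a3", "a1", "a2", "a4"])
print(round(low, 5), round(high, 5))   # 0.57152 0.5728
print(decode(0.572, 5))                # a3a3a1a2a4
```

Note that the decoder needs the message length (or a terminator symbol) to know when to stop.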
LZ Data Compression
“ A Universal Algorithm for Sequential Data Compression “

1977
-Prof. Abraham Lempel
-Prof. Dr. Jacob Ziv
LZ Data Compression
[Worked example presented as an image]
Total number of bits = 23; after compression = 13 bits
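The LZ idea of replacing repeated substrings with (offset, length, next-symbol) references into a sliding window can be sketched as follows. This is a simplified illustration, not the 1977 paper's exact scheme:

```python
def lz77_compress(data, window=255):
    """Emit (offset, length, next_char) triples over a sliding window."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):     # try every window start
            length = 0
            while (i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1                        # extend the match
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decompress(triples):
    out = []
    for off, length, ch in triples:
        for _ in range(length):
            out.append(out[-off])                  # copy from the window
        out.append(ch)
    return "".join(out)

triples = lz77_compress("abababab")
print(triples)                     # [(0, 0, 'a'), (0, 0, 'b'), (2, 5, 'b')]
print(lz77_decompress(triples))    # abababab
```

Note the third triple: its length (5) exceeds its offset (2), i.e. the match overlaps the text being produced, which the decoder handles naturally by copying one symbol at a time.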


Comparison

                   Huffman                  Arithmetic          Lempel-Ziv
Probabilities      Known in advance         Known in advance    Not known in advance
Alphabet           Known in advance         Known in advance    Not known in advance
Data loss          None                     None                None
Symbol             Not used                 Not used            Used - better
dependencies                                                    compression
Preprocessing      Tree building            None                First pass on data
                                                                (can be eliminated)
Entropy reached    If probabilities are     Very close          Best results when
                   negative powers of 2                         alphabet not known
Codewords          One codeword for         One codeword for    Codewords for sets
                   each symbol              all the data        of symbols
Intuition          Intuitive                Not intuitive       Not intuitive
LZW Data Compression
"A Technique for High Performance Data Compression"

1984
-Prof. Abraham Lempel
-Prof. Dr. Jacob Ziv
-Dr. Terry A. Welch
LZW Compression Algorithm

Input pixel sequence: 39 39 126 126 39 39 126 126 39 39 126 126

Currently Recognized  Pixel Being  Encoded Output  Dictionary  Dictionary
Sequence              Processed    (Code Word)     Location    Entry
                      39
39                    39           39              256         39-39
39                    126          39              257         39-126
126                   126          126             258         126-126
126                   39           126             259         126-39
39-39                 126          256             260         39-39-126
126-126               39           258             261         126-126-39
39-39-126             126          260             262         39-39-126-126
126                   eof          126

12 input pixels are encoded with 8 code words.
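The trace above can be reproduced with a short sketch of the LZW encoder. The dictionary starts with the 256 single byte values (locations 0-255), so new entries begin at 256, as in the slide:

```python
def lzw_compress(pixels):
    """LZW over byte values; dictionary pre-loaded with single values 0-255."""
    dictionary = {(v,): v for v in range(256)}
    next_code = 256
    current, out = (), []
    for p in pixels:
        candidate = current + (p,)
        if candidate in dictionary:
            current = candidate              # keep growing the recognized run
        else:
            out.append(dictionary[current])  # emit code for what we recognized
            dictionary[candidate] = next_code
            next_code += 1
            current = (p,)                   # restart from the new pixel
    if current:
        out.append(dictionary[current])      # flush at end of input
    return out

pixels = [39, 39, 126, 126, 39, 39, 126, 126, 39, 39, 126, 126]
print(lzw_compress(pixels))   # [39, 39, 126, 126, 256, 258, 260, 126]
```

The output matches the table's code-word column: 12 pixels become 8 codes, and the decoder can rebuild the identical dictionary on the fly, so no table is transmitted.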
Rate-Distortion Theory
“ A Mathematical Theory of Communication ”

1948
-Prof. Dr. Claude Elwood Shannon
Rate-Distortion Theory
Rate-distortion theory is a major branch of information theory which provides the theoretical foundations for lossy data compression. It addresses the problem of determining the minimal amount of entropy (or information) R that should be communicated over a channel, so that the source (input signal) can be approximately reconstructed at the receiver (output signal) without exceeding a given distortion D.

Where:
R(D) = rate-distortion function
H = trade-off rate
D = distortion
Distortion Measures
• A distortion measure is a mathematical quantity that specifies how close an approximation is to its original.
  - The average pixel difference is given by the mean squared error (MSE).
• The size of the error relative to the signal is given by the signal-to-noise ratio (SNR).
• Another common measure is the peak signal-to-noise ratio (PSNR).
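The three measures can be sketched directly; the sample values are the DPCM example's input and reconstruction from a later slide:

```python
import math

def mse(x, y):
    """Mean squared error: average squared sample difference."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def snr(x, y):
    """Signal-to-noise ratio in dB: signal power relative to error power."""
    signal = sum(a * a for a in x) / len(x)
    return 10 * math.log10(signal / mse(x, y))

def psnr(x, y, peak=255):
    """Peak signal-to-noise ratio in dB for samples with maximum value `peak`."""
    return 10 * math.log10(peak ** 2 / mse(x, y))

orig  = [130, 150, 140, 200, 230]   # DPCM example input fn
recon = [130, 154, 134, 200, 223]   # its reconstruction fn''
print(mse(orig, recon))             # (0 + 16 + 36 + 0 + 49) / 5 = 20.2
print(round(snr(orig, recon), 1), round(psnr(orig, recon), 1))
```

PSNR is usually quoted for images because it normalizes by the fixed peak value (255 for 8-bit pixels) rather than by the image's own power.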
DPCM Data Compression
"Differential Quantization of Communication Signals"

1950
-C. Chapin Cutler
DPCM Data Compression

[Figure: an audio signal and the DPCM schematic diagram]
DPCM Data Compression

fn              = 130  150  140  200  230   (input samples)
fn'             = 130  130  142  144  167   (predictions)
e = fn - fn'    = 0    20   -2   56   63    (prediction errors)
e'              = 0    24   -8   56   56    (quantized errors)
fn'' = fn' + e' = 130  154  134  200  223   (reconstructions)

Predictor:  fn' = (fn-1'' + fn-2'') / 2,  e.g. (154 + 134)/2 = 144
Quantizer:  e' = Q[en] = 16 * trunc((255 + en)/16) - 256 + 8

Prediction error: e = fn - fn'
Reconstruction error = quantization error:  fn'' - fn = e' - e = q
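The numbers above can be regenerated with a short sketch. The start-up rule is an assumption (the slide only shows that the first sample is reproduced exactly and that the second prediction equals it); everything else follows the predictor and quantizer formulas:

```python
def quantize(e):
    """The slide's quantizer: e' = 16 * trunc((255 + e) / 16) - 256 + 8."""
    return 16 * ((255 + e) // 16) - 256 + 8   # 255 + e >= 0 here, so // == trunc

def dpcm(samples):
    preds, errors, recon = [], [], []
    for n, f in enumerate(samples):
        if n == 0:
            pred, e = f, 0                    # assumed: first sample sent exactly
        else:
            if n == 1:
                pred = recon[0]               # assumed: only one prior sample yet
            else:
                pred = (recon[n - 1] + recon[n - 2]) // 2   # fn' from two fn''
            e = quantize(f - pred)            # quantized prediction error e'
        preds.append(pred)
        errors.append(e)
        recon.append(pred + e)                # decoder's reconstruction fn''
    return preds, errors, recon

fn = [130, 150, 140, 200, 230]
preds, e_prime, fn_recon = dpcm(fn)
print(preds)     # [130, 130, 142, 144, 167]
print(e_prime)   # [0, 24, -8, 56, 56]
print(fn_recon)  # [130, 154, 134, 200, 223]
```

Note that the predictor runs on the reconstructions fn'', not the originals, so the encoder and decoder stay in lockstep and the quantization error never accumulates.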
DCT Data Compression
"Discrete Cosine Transform"

1974
-Dr. Nasir Ahmed
-Dr. T. Natarajan
-Dr. Kamisetty R. Rao
DCT Data Compression
The One-Dimensional DCT
The most common DCT definition of a 1-D sequence of length N (here N = 8) is

  C(u) = a(u) * sum_{x=0..N-1} f(x) * cos[(2x+1) * u * pi / (2N)]

for u = 0, 1, 2, ..., N-1, with a(0) = sqrt(1/N) and a(u) = sqrt(2/N) for u > 0.
C(u) = transform coefficient, f(x) = 1-D pixel value.

Similarly, the inverse transformation is defined as

  f(x) = sum_{u=0..N-1} a(u) * C(u) * cos[(2x+1) * u * pi / (2N)]

for x = 0, 1, 2, ..., N-1.
DCT Data Compression
The Two-Dimensional DCT
The 2-D DCT is a direct extension of the 1-D case and is given by

  C(u,v) = a(u) * a(v) * sum_{x=0..N-1} sum_{y=0..N-1} f(x,y)
           * cos[(2x+1) * u * pi / (2N)] * cos[(2y+1) * v * pi / (2N)]

for u, v = 0, 1, 2, ..., N-1, where a(u) = sqrt(1/N) if u = 0 and sqrt(2/N) if u > 0 (and likewise for a(v)).

The inverse transform is defined as

  f(x,y) = sum_{u=0..N-1} sum_{v=0..N-1} a(u) * a(v) * C(u,v)
           * cos[(2x+1) * u * pi / (2N)] * cos[(2y+1) * v * pi / (2N)]

for x, y = 0, 1, 2, ..., N-1.
DCT Data Compression

[Figure: the one-dimensional cosine basis functions and the matrix form of the equation]
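The 1-D DCT pair can be checked numerically with a direct O(N^2) sketch; the sample row of pixel values below is illustrative:

```python
import math

def alpha(u, N):
    """Normalization: sqrt(1/N) for the DC term, sqrt(2/N) otherwise."""
    return math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)

def dct_1d(f):
    """C(u) = a(u) * sum_x f(x) * cos((2x + 1) * u * pi / (2N))."""
    N = len(f)
    return [alpha(u, N) * sum(f[x] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              for x in range(N))
            for u in range(N)]

def idct_1d(C):
    """f(x) = sum_u a(u) * C(u) * cos((2x + 1) * u * pi / (2N))."""
    N = len(C)
    return [sum(alpha(u, N) * C[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                for u in range(N))
            for x in range(N)]

f = [52, 55, 61, 66, 70, 61, 64, 73]     # one sample row of pixel values
C = dct_1d(f)
print(round(C[0], 1))                    # DC coefficient = sum(f) / sqrt(8)
print([round(v) for v in idct_1d(C)])    # [52, 55, 61, 66, 70, 61, 64, 73]
```

Because the transform is orthonormal, the inverse recovers the input exactly (up to floating-point error); the 2-D version is obtained by applying this 1-D transform to the rows and then to the columns of the block.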


DCT Data Compression
Step 1: Sample the image into 8x8 blocks. Step 2: Level off the original image by subtracting 128 from each entry.
DCT Data Compression
Step 3: Apply the DCT by matrix manipulation: D = T M T'. Step 4: Divide the DCT matrix entrywise by the quantization table.

162.3 = DC coefficient
[Quantization table, quality level 50; DCT transform matrix]
DCT Data Compression
Step 5: Divide D by Q and round each entry to the nearest integer. Step 6: Zig-zag scan to compress the AC coefficients.

[Quantized matrix; zig-zag scan order]
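Steps 5 and 6 can be sketched as follows; the table is the standard JPEG quality-50 luminance quantization table:

```python
# Standard JPEG luminance quantization table, quality level 50
Q50 = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize_block(D, Q):
    """Step 5: divide each DCT coefficient by the table entry and round."""
    return [[round(D[i][j] / Q[i][j]) for j in range(8)] for i in range(8)]

def zigzag(block):
    """Step 6: read an 8x8 block in zig-zag order (runs of zeros group together)."""
    order = sorted(((i, j) for i in range(8) for j in range(8)),
                   key=lambda ij: (ij[0] + ij[1],                 # diagonal index
                                   ij[0] if (ij[0] + ij[1]) % 2   # odd: go down
                                   else ij[1]))                   # even: go up
    return [block[i][j] for i, j in order]

block = [[8 * i + j for j in range(8)] for i in range(8)]
print(zigzag(block)[:10])   # [0, 1, 8, 16, 9, 2, 3, 10, 17, 24]
```

Since quantization zeroes most high-frequency coefficients, the zig-zag order pushes the surviving values to the front and leaves a long zero tail that run-length coding handles cheaply.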


DCT Data Compression
Decompression:

N = [reconstructed 8x8 block, presented as an image]
DCT Data Compression
Comparison between the original and decompressed images
DCT Data Compression

[Figure: Baboon original image vs. DCT decompressed image]


JPEG (Joint Photographic Experts Group)
JPEG (pronounced jay-peg) is the most commonly used standard method of lossy compression for photographic images.

JPEG itself specifies only how an image is transformed into a stream of bytes, but not how those bytes are encapsulated in any particular storage medium.

A further standard, created by the Independent JPEG Group, called JFIF (JPEG File Interchange Format), specifies how to produce a file suitable for computer storage and transmission from a JPEG stream.

In common usage, when one speaks of a "JPEG file" one generally means a JFIF file, or sometimes an Exif JPEG file.

JPEG/JFIF is the format most used for storing and transmitting photographs on the web. It is not as well suited for line drawings and other textual or iconic graphics, because its compression method performs badly on these types of images.
Baseline JPEG Compression
Y = luminance
Cb, Cr = chrominance

Y = 0.299R + 0.587G + 0.114B

U = Cb = 0.492(B − Y) = −0.147R − 0.289G + 0.436B
V = Cr = 0.877(R − Y) = 0.615R − 0.515G − 0.100B
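The color-space conversion above, sketched directly from the slide's equations:

```python
def rgb_to_ycbcr(r, g, b):
    """Y carries luminance; Cb and Cr carry the two color-difference signals."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.492 * (b - y)
    cr = 0.877 * (r - y)
    return y, cb, cr

# On any gray pixel (R = G = B) both chrominance channels vanish,
# which is why Cb and Cr can be subsampled aggressively in JPEG.
y, cb, cr = rgb_to_ycbcr(255, 255, 255)
print(round(y), round(cb), round(cr))   # 255 0 0
```

Separating luminance from chrominance lets the encoder keep Y at full resolution while downsampling Cb and Cr, exploiting the eye's lower sensitivity to color detail.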
JPEG File Interchange Format (JFIF)

The encoded data is written into the JPEG File Interchange Format (JFIF), which, as the name suggests, is a simplified format allowing JPEG-compressed images to be shared across multiple platforms and applications.

JFIF includes embedded image and coding parameters, framed by appropriate header information.

Specifically, aside from the encoded data, a JFIF file must store all coding and quantization tables that are necessary for the JPEG decoder to do its job properly.
MPEG Data Compression
"Moving Picture Experts Group"

1988
-Moving Picture Experts Group (MPEG)
MPEG-1 Data Compression
[MPEG-1 slides: content was presented as images]
MPEG-2 Data Compression
[MPEG-2 slides: content was presented as images]
MPEG-4 Data Compression
[MPEG-4 slides: content was presented as images]
MPEG-7 Data Compression
[MPEG-7 slides: content was presented as images]
H.261 Data Compression
ITU-T (formerly CCITT) video coding standard
[H.261 slides: content was presented as images]
References
Digital Image Processing, 2nd Edition
-by Rafael C. Gonzalez and Richard E. Woods

Multimedia Fundamentals, Vol. 1
-by Ralf Steinmetz and Klara Nahrstedt

http://en.wikipedia.org/wiki/Data_compression

http://navatrump.de/Technology/Datacompression/compression.html