Technical Seminar Presentation 2005

National Institute of Science and Technology

A Review of Data Compression Techniques

Presented by Sudeepta Mishra (Roll No. CS200117052)
at NIST, Berhampur

Under the guidance of Mr. Rowdra Ghatak
Sudeepta Mishra CS200117052 [1]

Introduction

• Data compression is the process of encoding data so that it takes less storage space or less transmission time than it would if it were not compressed.
• Compression is possible because most real-world data is highly redundant: it contains repeated patterns and predictable structure.
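A quick way to see the effect of redundancy is to compress a repetitive byte string and a pseudo-random one of the same length. This sketch uses Python's standard `zlib` module (DEFLATE, which combines LZ77 with Huffman coding) only as a convenient stand-in compressor; the two sample inputs are arbitrary, illustrative choices.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Return the size in bytes of `data` after zlib (DEFLATE) compression."""
    return len(zlib.compress(data))

# Highly redundant input: the same two bytes repeated 1000 times (2000 bytes).
redundant = b"ab" * 1000

# Input with (almost) no redundancy: 2000 pseudo-random bytes from a fixed seed.
rng = random.Random(42)
noisy = bytes(rng.getrandbits(8) for _ in range(2000))

print(len(redundant), compressed_size(redundant))
print(len(noisy), compressed_size(noisy))
```

The repetitive input shrinks to well under 100 bytes, while the pseudo-random input does not shrink at all (it can even grow by a few bytes of framing), because there is no redundancy for the compressor to remove.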


Different Compression Techniques

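• There are two main types of data compression techniques:
– Lossless compression: the original data can be reconstructed exactly.
Useful for spreadsheets, text, and executable programs.
– Lossy compression: some information is discarded, so only an approximation of the original is reconstructed.
Used for images, movies, and sound.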


Types of Lossless Data Compression

• Dictionary coders.
– Zip (file format).
– Lempel Ziv.
• Entropy encoding.
– Huffman coding (simple entropy coding).
• Run-length encoding.


Dictionary-Based Compression

• Dictionary-based algorithms do not encode single symbols as variable-length bit strings;
they encode variable-length strings of symbols
as single tokens.
• The tokens form an index into a phrase
dictionary.
• If the tokens are smaller than the phrases
they replace, compression occurs.
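The idea can be sketched with a tiny, hypothetical static phrase dictionary: each phrase found in the input is replaced by its index (a token), and anything not in the dictionary is passed through as a literal. The dictionary contents (`PHRASES`), the greedy longest-match rule, and the tuple token format are illustrative assumptions, not a specific published scheme.

```python
from typing import List, Tuple, Union

# Hypothetical static phrase dictionary shared by encoder and decoder.
PHRASES = ["compression ", "dictionary ", "the ", "data ", "is "]

Token = Tuple[str, Union[int, str]]  # ("idx", phrase index) or ("lit", character)

def encode(text: str) -> List[Token]:
    """Greedily replace the longest dictionary phrase at each position with its index."""
    out: List[Token] = []
    i = 0
    while i < len(text):
        # Find the longest phrase that matches at position i (if any).
        best = -1
        for idx, phrase in enumerate(PHRASES):
            if text.startswith(phrase, i) and (best < 0 or len(phrase) > len(PHRASES[best])):
                best = idx
        if best >= 0:
            out.append(("idx", best))   # one token stands for a whole phrase
            i += len(PHRASES[best])
        else:
            out.append(("lit", text[i]))  # no match: copy the character through
            i += 1
    return out

def decode(tokens: List[Token]) -> str:
    """Invert encode(): look indices up in the dictionary, copy literals through."""
    return "".join(PHRASES[v] if kind == "idx" else v for kind, v in tokens)
```

For example, `encode("the dictionary is the data ")` yields five index tokens for 27 characters; if each token is stored in one byte, the phrases are replaced by something much smaller, which is exactly when compression occurs.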


Types of Dictionary

• Static Dictionary: fixed in advance and known to both the encoder and the decoder.
• Semi-Adaptive Dictionary: built from the data to be compressed in a first pass and sent to the decoder along with the compressed data.
• Adaptive Dictionary.
– Lempel Ziv algorithms belong to this category of dictionary coders. The dictionary is built in a single pass while the data is being encoded.
– The decoder can build up the dictionary in the same way as the encoder while decompressing the data.


Dictionary-Based Compression: Example

• Using an English dictionary, the string:
“A good example of how dictionary based compression works”
• gives: 1/1 822/3 674/4 1343/60 928/75 550/32 173/46 421/2
• Using the dictionary as a lookup table, each word is coded as x/y, where x is the page number and y is the position of the word on that page. If the dictionary has 2,200 pages with fewer than 256 entries per page, x requires 12 bits (2^11 = 2048 < 2,200 ≤ 4096 = 2^12) and y requires 8 bits, i.e. 20 bits (2.5 bytes) per word. In ASCII the letters of the string alone take 48 bytes (56 bytes including spaces), whereas the eight codes above need only 8 × 2.5 = 20 bytes: a reduction of more than 50%.
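The bit counts above come from rounding log2 of each range up to a whole number of bits. A small sketch of that arithmetic, using the example's figures (2,200 pages, at most 256 entries per page, the eight listed codes, 48 bytes of ASCII letters); `bits_needed` is a hypothetical helper name.

```python
def bits_needed(n_values: int) -> int:
    """Smallest number of bits that can represent n_values distinct values (0..n_values-1)."""
    return max(1, (n_values - 1).bit_length())

pages = 2200          # dictionary size in pages (from the example)
words_per_page = 256  # upper bound on entries per page (from the example)

x_bits = bits_needed(pages)           # page number
y_bits = bits_needed(words_per_page)  # word position on the page
bits_per_word = x_bits + y_bits

codes = 8         # number of x/y codes listed in the example
ascii_bytes = 48  # letters in the example sentence, one byte each

encoded_bytes = codes * bits_per_word / 8
print(x_bits, y_bits, bits_per_word, encoded_bytes, encoded_bytes / ascii_bytes)
```

This gives 12 + 8 = 20 bits per word and 20 bytes in total, about 42% of the 48-byte ASCII letters. Using integer `bit_length` avoids any floating-point rounding in the log2 step.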


Lempel Ziv

• It is a family of algorithms, stemming from the two algorithms proposed by Jacob Ziv and Abraham Lempel in their landmark papers in 1977 and 1978.

[Figure: the Lempel Ziv family tree. LZ77 branch: LZR, LZSS, LZB, LZH. LZ78 branch: LZJ, LZFG, LZW, LZC, LZT, LZMW.]


LZW Algorithm

• An improved version of the LZ78 algorithm.
• Published by Terry Welch in 1984.
• A dictionary that is indexed by “codes” is used.
The dictionary is assumed to be initialized with
256 entries (indexed with ASCII codes 0 through
255) representing the ASCII table.


The LZW Algorithm (Compression)

W = NIL;
while (there is input) {
    K = next symbol from input;
    if (WK exists in the dictionary) {
        W = WK;
    } else {
        output(index(W));
        add WK to the dictionary;
        W = K;
    }
}
if (W is not NIL)
    output(index(W));   /* flush the last phrase */
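A minimal Python sketch of the compression loop, including the final output of W once the input runs out. It assumes every input symbol is present in the initial dictionary, which the caller passes in (for example a = 1, b = 2, c = 3, d = 4 as in the worked example, or the 256 byte values); new codes are numbered consecutively after the largest initial code, and the dictionary grows without limit. The function name `lzw_compress` is illustrative.

```python
from typing import Dict, List

def lzw_compress(text: str, initial: Dict[str, int]) -> List[int]:
    """LZW-compress `text`, starting from a copy of the `initial` symbol -> code map."""
    dictionary = dict(initial)                # phrase -> code (caller's dict is not modified)
    next_code = max(dictionary.values()) + 1  # first code available for new phrases
    w = ""                                    # W: longest phrase matched so far
    output: List[int] = []
    for k in text:                            # K: next symbol from the input
        wk = w + k
        if wk in dictionary:
            w = wk                            # keep extending the current phrase
        else:
            output.append(dictionary[w])      # emit the code for W
            dictionary[wk] = next_code        # add WK as a new phrase
            next_code += 1
            w = k                             # restart matching from K
    if w:
        output.append(dictionary[w])          # flush the last phrase
    return output
```

For instance, `lzw_compress("abdabdac", {"a": 1, "b": 2, "c": 3, "d": 4})` returns `[1, 2, 4, 5, 7, 3]`, the same codes produced step by step in the worked example.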


The LZW Algorithm (Compression) Flow Chart

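START
1. W = NULL.
2. Is EOF? YES: output the index of W (if W is not NULL), then STOP. NO: go to step 3.
3. K = NEXT INPUT.
4. Is WK found in the dictionary? YES: W = WK, go to step 2. NO: go to step 5.
5. OUTPUT INDEX OF W; ADD WK TO DICTIONARY; W = K; go to step 2.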


The LZW Algorithm (Compression) Example

• Input string is a b d a b d a c.
• The initial dictionary contains the symbols a, b, c, d with index values 1, 2, 3, 4 respectively.
• The input string is read from left to right, starting from a.


The LZW Algorithm (Compression) Example

• W = NULL; input string: a b d a b d a c
• K = a (1st symbol).
• WK = a is in the dictionary, so set W = a.
Dictionary: a=1, b=2, c=3, d=4


The LZW Algorithm (Compression) Example

• K = b (2nd symbol of a b d a b d a c).
• WK = ab is not in the dictionary.
• Add WK to the dictionary as code 5.
• Output the code for W (a = 1).
• Set W = b.
Output so far: 1
Dictionary: a=1, b=2, c=3, d=4, ab=5


The LZW Algorithm (Compression) Example

• K = d (3rd symbol of a b d a b d a c).
• WK = bd is not in the dictionary.
• Add bd to the dictionary as code 6.
• Output the code for W (b = 2).
• Set W = d.
Output so far: 1 2
Dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6


The LZW Algorithm (Compression) Example

• K = a (4th symbol of a b d a b d a c).
• WK = da is not in the dictionary.
• Add it to the dictionary as code 7.
• Output the code for W (d = 4).
• Set W = a.
Output so far: 1 2 4
Dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6, da=7


The LZW Algorithm (Compression) Example

• K = b (5th symbol of a b d a b d a c).
• WK = ab is in the dictionary, so set W = ab.
Output so far: 1 2 4
Dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6, da=7


The LZW Algorithm (Compression) Example

• K = d (6th symbol of a b d a b d a c).
• WK = abd is not in the dictionary.
• Add WK to the dictionary as code 8.
• Output the code for W (ab = 5).
• Set W = d.
Output so far: 1 2 4 5
Dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6, da=7, abd=8


The LZW Algorithm (Compression) Example

• K = a (7th symbol of a b d a b d a c).
• WK = da is in the dictionary, so set W = da.
Output so far: 1 2 4 5
Dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6, da=7, abd=8


The LZW Algorithm (Compression) Example

• K = c (8th symbol of a b d a b d a c).
• WK = dac is not in the dictionary.
• Add WK to the dictionary as code 9.
• Output the code for W (da = 7).
• Set W = c.
• No input is left, so output the code for W (c = 3).
Output so far: 1 2 4 5 7 3
Dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6, da=7, abd=8, dac=9


The LZW Algorithm (Compression) Example

• The final output string for a b d a b d a c is 1 2 4 5 7 3.
• Stop.
Final dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6, da=7, abd=8, dac=9


LZW Decompression Algorithm

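read a code k;
w = dictionary entry for k;   /* the first code is always a single symbol */
output w;
while ( read a code k ) {
    if ( k is in the dictionary )
        entry = dictionary entry for k;
    else                      /* k is the code about to be added (cScSc case) */
        entry = w + w[0];
    output entry;
    add w + entry[0] to dictionary;
    w = entry;
}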
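A minimal Python sketch of the decoder. It assumes the caller supplies the initial symbol-to-code dictionary, that new codes are numbered consecutively after the largest initial code, and that the dictionary has no size limit. It also handles the case in which a code arrives before the decoder has added it: when the encoder emits a phrase it created on its immediately preceding step (input of the form cScSc, e.g. a a a), the missing entry must be W + W[0]. The name `lzw_decompress` is illustrative.

```python
from typing import Dict, List

def lzw_decompress(codes: List[int], initial: Dict[str, int]) -> str:
    """Rebuild the text from LZW `codes`, starting from the `initial` symbol -> code map."""
    if not codes:
        return ""
    dictionary = {code: sym for sym, code in initial.items()}  # code -> phrase
    next_code = max(dictionary) + 1   # next code the encoder would have assigned
    w = dictionary[codes[0]]          # first code is always a single symbol
    output = [w]
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        elif k == next_code:
            entry = w + w[0]          # code defined on this very step (cScSc case)
        else:
            raise ValueError(f"invalid LZW code {k}")
        output.append(entry)
        dictionary[next_code] = w + entry[0]  # same phrase the encoder added
        next_code += 1
        w = entry
    return "".join(output)
```

With the worked example's dictionary, `lzw_decompress([1, 2, 4, 5, 7, 3], {"a": 1, "b": 2, "c": 3, "d": 4})` returns `"abdabdac"`, and the codes `[1, 5]` decode to `"aaa"` through the special case.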


LZW Decompression Algorithm Flow Chart

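START
1. K = INPUT; output the dictionary entry for K; W = that entry.
2. Is EOF? YES: STOP. NO: go to step 3.
3. K = NEXT INPUT.
4. ENTRY = DICTIONARY ENTRY at index K (if K is not yet in the dictionary, ENTRY = W + W[0]).
5. Output ENTRY; ADD W + ENTRY[0] TO DICTIONARY; W = ENTRY; go to step 2.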


The LZW Algorithm (Decompression) Example

Codes: 1 2 4 5 7 3 (K = 1st code)
• K = 1
• Output K (i.e. a).
• W = K (i.e. a).
Output so far: a
Dictionary: a=1, b=2, c=3, d=4


The LZW Algorithm (Decompression) Example

Codes: 1 2 4 5 7 3 (K = 2nd code)
• K = 2
• entry = b
• Output entry.
• Add W + entry[0] to the dictionary (a + b = ab, code 5).
• W = entry (i.e. b).
Output so far: a b
Dictionary: a=1, b=2, c=3, d=4, ab=5


The LZW Algorithm (Decompression) Example

Codes: 1 2 4 5 7 3 (K = 3rd code)
• K = 4
• entry = d
• Output entry.
• Add W + entry[0] to the dictionary (b + d = bd, code 6).
• W = entry (i.e. d).
Output so far: a b d
Dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6


The LZW Algorithm (Decompression) Example

Codes: 1 2 4 5 7 3 (K = 4th code)
• K = 5
• entry = ab
• Output entry.
• Add W + entry[0] to the dictionary (d + a = da, code 7).
• W = entry (i.e. ab).
Output so far: a b d a b
Dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6, da=7


The LZW Algorithm (Decompression) Example

Codes: 1 2 4 5 7 3 (K = 5th code)
• K = 7
• entry = da
• Output entry.
• Add W + entry[0] to the dictionary (ab + d = abd, code 8).
• W = entry (i.e. da).
Output so far: a b d a b d a
Dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6, da=7, abd=8


The LZW Algorithm (Decompression) Example

Codes: 1 2 4 5 7 3 (K = 6th code)
• K = 3
• entry = c
• Output entry.
• Add W + entry[0] to the dictionary (da + c = dac, code 9).
• W = entry (i.e. c).
Output so far: a b d a b d a c
Dictionary: a=1, b=2, c=3, d=4, ab=5, bd=6, da=7, abd=8, dac=9


Advantages

• Because LZW uses an adaptive dictionary, there is no need to transmit the dictionary explicitly.
• The decoder rebuilds it while decoding.
• LZW can be made very fast: in its basic form it reads a fixed number of bits per code from the input, so bit parsing is easy, and each code indexes the dictionary table directly.
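To illustrate why fixed-width codes are easy to parse, this sketch packs codes into bytes at a constant width and reads them back by taking the same number of bits each time. The 12-bit width, the most-significant-bit-first order, and the helper names `pack_codes`/`unpack_codes` are assumptions for this example; real formats differ in bit order, and many (GIF, for example) widen the codes as the dictionary grows.

```python
from typing import List

def pack_codes(codes: List[int], width: int = 12) -> bytes:
    """Pack each code into exactly `width` bits, most significant bit first."""
    acc = 0      # bit accumulator
    nbits = 0    # number of bits currently held in acc
    out = bytearray()
    for code in codes:
        if not 0 <= code < (1 << width):
            raise ValueError(f"code {code} does not fit in {width} bits")
        acc = (acc << width) | code
        nbits += width
        while nbits >= 8:                 # emit every complete byte
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1           # keep only the bits not yet emitted
    if nbits:                             # pad the final partial byte with zeros
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)

def unpack_codes(data: bytes, count: int, width: int = 12) -> List[int]:
    """Read back `count` codes by taking `width` bits at a time."""
    total = len(data) * 8
    if count * width > total:
        raise ValueError("not enough data for the requested number of codes")
    acc = int.from_bytes(data, "big")
    mask = (1 << width) - 1
    return [(acc >> (total - (i + 1) * width)) & mask for i in range(count)]
```

For example, the six codes 1 2 4 5 7 3 occupy exactly 6 × 12 = 72 bits, i.e. 9 bytes, and the reader recovers them with the same shift-and-mask for every code.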


Problems with the encoder

• What if the dictionary runs out of space?
– Keep track of unused entries and replace them using LRU (Least Recently Used).
– Monitor compression performance and flush the dictionary when performance is poor.


Conclusion

• LZW has influenced the development of many later compression techniques.
• It has been implemented in well-known formats such as GIF, TIFF and Adobe Acrobat PDF, and in many compression packages.
• Combined with other compression techniques, it has led to further variants such as LZMS.


Thank You
