LOSSLESS SOURCE CODING ALGORITHMS

Frans M.J. Willems

Department of Electrical Engineering
Eindhoven University of Technology

ISIT 2013, Istanbul, Turkey

Outline

1 INTRODUCTION
2 HUFFMAN and TUNSTALL
  Binary IID Sources
  Huffman Code
  Tunstall Code
3 ENUMERATIVE CODING
  Lexicographical Ordering
  FV: Pascal- Method
  VF: Petry Code
4 ARITHMETIC CODING
  Intervals
  Universal Coding, Individual Redundancy
5 CONTEXT-TREE WEIGHTING
  IID, unknown θ
  Binary Tree-Sources
  Context Trees
  Coding Probabilities
6 REPETITION TIMES
  LZ77
  Repetition Times, Kac
  Repetition-Time Algorithm
  Achieving Entropy
7 CONCLUSION

Choosing a Topic

POSSIBLE TOPICS:
Multi-user Information Theory (with Edward van der Meulen (KUL), Andries Hekstra)
Lossless Source Coding (with Tjalling Tjalkens, Yuri Shtarkov (IPPI), Paul Volf)
Watermarking, Embedding, and Semantic Coding (with Martin van Dijk, Ton Kalker (Philips Research))
Biometrics (with Tanya Ignatenko)

LOSSLESS SOURCE CODING ALGORITHMS. WHY?
Not many sessions at ISIT 2012! Is lossless source coding DEAD?
Lossless Source Coding is about UNDERSTANDING data. Universal Lossless Source Coding focuses on FINDING STRUCTURE in data. MDL principle [Rissanen].
ALGORITHMS are fun (Piet Schalkwijk).

Lecture Structure

TUTORIAL: binary case, my favorite algorithms, ...

REMARKS: open problems, ...


Binary Sources, Sequences, IID

Binary Source → x1 x2 ... xN

The binary source produces a sequence x_1^N = x_1 x_2 ... x_N with components
x_n ∈ {0, 1}, occurring with probability P(x_1^N).

Definition (Binary IID Source)
For an independent identically distributed (i.i.d.) source with parameter θ,
where 0 ≤ θ ≤ 1,
    P(x_1^N) = ∏_{n=1}^{N} P(x_n),
where
    P(1) = θ, and P(0) = 1 − θ.
A sequence x_1^N containing N − w zeros and w ones has probability
    P(x_1^N) = (1 − θ)^{N−w} θ^w.

Entropy of an IID Source
The ENTROPY of this source is
    h(θ) = (1 − θ) log2 1/(1 − θ) + θ log2 1/θ  (bits).
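As a quick illustration (a sketch, not part of the slides), the entropy and sequence-probability formulas above can be evaluated directly; the function names are my own:

```python
from math import log2

def h(theta):
    """Binary entropy h(theta) in bits; h(0) = h(1) = 0 by convention."""
    if theta in (0.0, 1.0):
        return 0.0
    return (1 - theta) * log2(1 / (1 - theta)) + theta * log2(1 / theta)

def seq_prob(x, theta):
    """P(x_1^N) = (1 - theta)^(N - w) * theta^w for an i.i.d. binary source."""
    w = sum(x)  # w = number of ones in the sequence
    return (1 - theta) ** (len(x) - w) * theta ** w

print(round(h(0.3), 3))                    # the h(0.3) = 0.881 used in the examples
print(round(seq_prob([1, 0, 0], 0.3), 3))  # (1 - 0.3)^2 * 0.3
```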


Fixed-to-Variable (FV) Length Codes

IDEA:
Give more probable sequences shorter codewords than less probable sequences.

Definition (FV-Length Code)
A FV-length code assigns to source sequence x_1^N a binary codeword c(x_1^N)
of length L(x_1^N). The rate of a FV code is
    R = E[L(X_1^N)] / N  (code-symbols/source-symbol).

GOAL:
We would like to find decodable FV-length codes that MINIMIZE this rate.

Prefix Codes

Definition (Prefix Code)
In a prefix code no codeword is the prefix of any other codeword.

We focus on prefix codes. Codewords in a prefix code can be regarded as
leaves in a rooted tree. Prefix codes lead to instantaneous decodability.

Example
    x_1^N   c(x_1^N)   L(x_1^N)
    00      0          1
    01      10         2
    10      110        3
    11      111        3

[Figure: the code tree with leaves 0, 10, 110, 111.]
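The instantaneous decodability of the example code can be demonstrated in a few lines (my own sketch; the dictionary mirrors the example table):

```python
# The example prefix code: source block -> codeword.
code = {'00': '0', '01': '10', '10': '110', '11': '111'}
decode_map = {cw: src for src, cw in code.items()}

def decode(bits):
    """Instantaneous decoding: a codeword is recognized the moment it ends."""
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in decode_map:             # no codeword is a prefix of another,
            out.append(decode_map[buf])   # so this match is unambiguous
            buf = ''
    return ''.join(out)

encoded = ''.join(code[s] for s in ['00', '10', '01', '11'])
print(encoded)           # '0' + '110' + '10' + '111'
print(decode(encoded))
```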

Prefix Codes (cont.)

Theorem (Kraft, 1949)
(a) The lengths of the codewords in a prefix code satisfy Kraft's inequality
    Σ_{x_1^N ∈ X^N} 2^{−L(x_1^N)} ≤ 1.
(b) For codeword lengths satisfying Kraft's inequality there exists a prefix
code with these lengths.

This leads to:

Theorem (Fano, 1961)
(a) Any prefix code satisfies
    E[L(X_1^N)] ≥ H(X_1^N) = N h(θ),
or equivalently R ≥ h(θ). The minimum is achieved if and only if
L(x_1^N) = −log2(P(x_1^N)) (ideal codeword length) for all x_1^N ∈ X^N with
nonzero P(x_1^N).
(b) There exist prefix codes with
    E[L(X_1^N)] < H(X_1^N) + 1 = N h(θ) + 1,
or equivalently R < h(θ) + 1/N.
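Kraft's inequality is easy to check numerically for any proposed set of lengths; a one-line sketch (function name is mine), applied to the earlier example code {0, 10, 110, 111}:

```python
def kraft_sum(lengths):
    """Sum of 2^(-L) over all codewords; at most 1 for any binary prefix code."""
    return sum(2 ** -l for l in lengths)

print(kraft_sum([1, 2, 3, 3]))  # the example code is complete: the sum is 1.0
print(kraft_sum([1, 2, 3]))     # strictly below 1: a codeword could be added
```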

Huffman's Code

Definition (Optimal FV-Length Code)
A code that minimizes the expected codeword-length E[L(X_1^N)] (and hence the
rate R) is called optimal.

Theorem (Huffman, 1952)
The Huffman construction leads to an optimal FV-length code.

CONSTRUCTION:
Consider the set of probabilities {P(x_1^N), x_1^N ∈ X^N}.
Replace the two smallest probabilities by a probability which is their sum.
Label the branches from these two smallest probabilities to their sum
with code-symbols 0 and 1.
Continue like this until only one probability (equal to 1) is left.

Obviously Huffman's code results in E[L(X_1^N)] < H(X_1^N) + 1 = N h(θ) + 1
and therefore R < h(θ) + 1/N.
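The merge steps above can be sketched with a heap (my own minimal implementation, tracking only codeword lengths rather than the tree; the tie-breaking index is an implementation detail, any tie-break yields an optimal code):

```python
import heapq
from itertools import product

def huffman_lengths(probs):
    """Codeword lengths from the Huffman construction (binary merges).

    Repeatedly merge the two smallest weights; every symbol under a merge
    moves one edge deeper in the final code tree.
    """
    heap = [(p, i, [i]) for i, p in enumerate(probs)]  # (weight, tie-break, symbols below)
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, ids1 = heapq.heappop(heap)
        p2, t, ids2 = heapq.heappop(heap)
        for i in ids1 + ids2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, t, ids1 + ids2))
    return lengths

# All 2^3 source sequences for N = 3, theta = 0.3:
theta = 0.3
seqs = list(product([0, 1], repeat=3))
probs = [(1 - theta) ** (3 - sum(s)) * theta ** sum(s) for s in seqs]
EL = sum(p * l for p, l in zip(probs, huffman_lengths(probs)))
print(round(EL, 3), round(EL / 3, 3))  # E[L] = 2.726, R = 0.909 as in the example
```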


Huffman's Construction

Example
Let N = 3 and θ = 0.3, then h(0.3) = 0.881.

[Figure: Huffman tree for the 2^3 = 8 sequence probabilities .343, .147, .147,
.147, .063, .063, .063, .027, with intermediate sums .090, .126, .216, .294,
.363, .637 and root 1.00.]

Now E[L(X_1^N)] = 4(.027 + .063 + .063 + .063) + 3(.147 + .147) +
2(.147 + .343) = 2.726. Therefore R = 2.726/3 = 0.909.

Remarks: Huffman Code

Note that R → h(θ) when N → ∞.

Always E[L(X_1^N)] ≥ 1. For θ → 0 a Huffman code has expected
codeword length E[L(X_1^N)] → 1 and rate R → 1/N.

Better bounds exist for Huffman codes than E[L(X_1^N)] < H(X_1^N) + 1.
E.g. Gallager [1978] showed that
    E[L(X_1^N)] − H(X_1^N) ≤ max_{x_1^N} P(x_1^N) + 0.086.

Adaptive Huffman Codes (Gallager [1978]).

Variable-to-Fixed (VF) Length Codes

IDEA:
Parse the source output into variable-length segments of roughly the same
probability. Code all these segments with codewords of fixed length.

Definition (VF-Length Code)
A VF-length code is defined by a set of variable-length source segments.
Each segment x in the set gets a unique binary codeword c(x) of length L.
The length of a segment x is denoted as N(x). The rate of a VF code is
    R = L / E[N(X)]  (code-symbols/source-symbol).

GOAL:
We would like to find parsable VF-length codes that MINIMIZE this rate.

Proper-and-Complete Segment Sets

Definition (Proper-and-Complete Segment Sets)
A set of source segments is proper-and-complete if each semi-infinite
source sequence has a unique prefix in this segment set.

We focus on proper-and-complete segment sets. Segments in a
proper-and-complete set can be regarded as leaves in a rooted tree. Such
sets guarantee instantaneous parsability.

Example
    x      N(x)   c(x)
    1      1      11
    01     2      10
    001    3      01
    000    3      00

[Figure: the segment tree with leaves 1, 01, 001, 000.]

Proper-and-Complete Segment Sets: Leaf-Node Lemma

Assume that the source is IID with parameter θ. Consider a set of segments
and all their prefixes. Depict them in a tree. The segments are leaves, the
prefixes nodes. Note that all the nodes and leaves have a probability, e.g.
P(10) = θ(1 − θ). Let F(·) be a function on nodes and leaves.

[Figure: a tree whose nodes and leaves carry values F(λ), F(0), F(1), F(10),
F(11), F(100), F(101), where λ denotes the root.]

Lemma (Massey, 1983)
    Σ_{l ∈ leaves} P(l)[F(l) − F(λ)] =
        Σ_{n ∈ nodes} P(n) Σ_{s ∈ sons of n} (P(s)/P(n)) [F(s) − F(n)].

Let F(x) = number of edges from x to the root; then
    E[N(X)] = Σ_{x ∈ nodes} P(x).

Let F(x) = −log2 P(x); then
    H(X) = E[N(X)] h(θ).
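Both consequences of the lemma can be verified numerically on the example segment set {1, 01, 001, 000} (my own sketch; `prob` and the variable names are assumptions, not the slides' notation):

```python
from math import log2

theta = 0.3

def prob(seg, theta):
    """Probability of a segment under the IID source with P(1) = theta."""
    return (1 - theta) ** seg.count('0') * theta ** seg.count('1')

leaves = ['1', '01', '001', '000']
nodes = ['', '0', '00']  # the root and the proper prefixes of the leaves

# First consequence: E[N(X)] equals the sum of the node probabilities.
EN_direct = sum(len(l) * prob(l, theta) for l in leaves)
EN_lemma = sum(prob(n, theta) for n in nodes)
print(round(EN_direct, 6), round(EN_lemma, 6))

# Second consequence: H(X) = E[N(X)] h(theta).
h = (1 - theta) * log2(1 / (1 - theta)) + theta * log2(1 / theta)
H_direct = -sum(prob(l, theta) * log2(prob(l, theta)) for l in leaves)
print(round(H_direct, 6), round(EN_lemma * h, 6))
```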


Proper-and-Complete Segment Sets: Result

Theorem
For any proper-and-complete segment set with no more than 2^L segments
    L ≥ H(X) = E[N(X)] h(θ),
or R = L/E[N(X)] ≥ h(θ).

More precisely, since
    R = L / E[N(X)] = (L / H(X)) h(θ),
we should make H(X) as close as possible to L; hence all segments
should have roughly the same probability.

Tunstall's Code

Consider 0 < θ ≤ 1/2.

Definition (Optimal VF-Length Code)
A code that maximizes the expected segment-length E[N(X)] is called
optimal. Such a code minimizes the rate R.

Theorem (Tunstall, 1967)
The Tunstall construction leads to an optimal code.

CONSTRUCTION:
Start with the empty segment, which has unit probability.
As long as the number of segments is smaller than 2^L, replace a
segment s with largest probability P(s) by the two segments s0 and s1.
The probabilities of the new segments (leaves) are
P(s0) = P(s)(1 − θ) and P(s1) = P(s)θ.

The Tunstall construction results in H(X) ≥ L − log2(1/θ) and therefore
R ≤ L/(L + log2 θ) · h(θ) (Jelinek and Schneider [1972]).
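The construction is again a natural fit for a heap, this time a max-heap that always splits the most probable segment (my own sketch; ties are broken lexicographically, which is an implementation choice, not part of Tunstall's rule):

```python
import heapq

def tunstall(theta, L):
    """Tunstall segment set for a binary IID source with P(1) = theta <= 1/2.

    Repeatedly split the most probable segment until 2^L segments exist.
    Returns a list of (segment, probability) pairs.
    """
    heap = [(-1.0, '')]  # max-heap via negated probabilities
    while len(heap) < 2 ** L:
        p, s = heapq.heappop(heap)                      # most probable segment
        heapq.heappush(heap, (p * (1 - theta), s + '0'))
        heapq.heappush(heap, (p * theta, s + '1'))
    return [(s, -p) for p, s in sorted(heap, key=lambda e: e[1])]

segments = tunstall(0.3, 3)
EN = sum(len(s) * p for s, p in segments)
print(len(segments), round(EN, 3), round(3 / EN, 3))  # 8 segments, E[N] = 3.283, R = 0.914
```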

Tunstall's Construction

Example
Let L = 3 and θ = 0.3. Again h(0.3) = 0.881.

[Figure: Tunstall tree obtained by repeatedly splitting the most probable
segment, with node probabilities 1.00, .700, .300, .490, .210, .343, .240
and leaf probabilities .210, .147, .147, .103, .090, .168, .072, .063.]

Now E[N(X)] = 1.0 + .7 + .3 + .49 + .21 + .343 + .240 = 3.283 and
therefore R = 3/3.283 = 0.914.

Tunstalls Construction
LOSSLESS SOURCE
CODING ALGORITHMS
Frans M.J. Willems
INTRODUCTION

Example
Let L = 3 and = 0.3. Again h(0.3) = 0.881.

HUFFMAN-TUNSTALL

.090

Binary IID Sources


Huffman Code
Tunstall Code

Lexicographical Ordering
FV: Pascal- Method
VF: Petry Code
ARITHMETIC CODING
Intervals
Universal Coding,
Individual Redundancy
CONTEXT-TREE
WEIGHTING
IID, unknown
Tree Sources
Context Trees
Coding Prbs., Redundancy
REPETITION TIMES
LZ77
Repetition Times, Kac
Repetition-Time Algorithm
Achieving Entropy
CONCLUSION

.063

ENUMERATIVE CODING

.300

1.00

1
0

.210

.210

.147

.700
0

.147

.343

.490

1
0

.103
1

.072

.168

.240

Now E [N(X )] = 1.0 + .7 + .3 + .49 + .21 + .343 + .240 = 3.283 and


therefore R = 3/3.283 = 0.914.

Tunstalls Construction
LOSSLESS SOURCE
CODING ALGORITHMS
Frans M.J. Willems
INTRODUCTION

Example
Let L = 3 and = 0.3. Again h(0.3) = 0.881.

HUFFMAN-TUNSTALL

.090

Binary IID Sources


Huffman Code
Tunstall Code

Lexicographical Ordering
FV: Pascal- Method
VF: Petry Code
ARITHMETIC CODING
Intervals
Universal Coding,
Individual Redundancy
CONTEXT-TREE
WEIGHTING
IID, unknown
Tree Sources
Context Trees
Coding Prbs., Redundancy
REPETITION TIMES
LZ77
Repetition Times, Kac
Repetition-Time Algorithm
Achieving Entropy
CONCLUSION

.063

ENUMERATIVE CODING

.300

1.00

1
0

.210

.210

.147

.700
0

.147

.343

.490

1
0

.103
1

.072

.168

.240

Now E [N(X )] = 1.0 + .7 + .3 + .49 + .21 + .343 + .240 = 3.283 and


therefore R = 3/3.283 = 0.914.

Tunstalls Construction
LOSSLESS SOURCE
CODING ALGORITHMS
Frans M.J. Willems
INTRODUCTION

Example
Let L = 3 and = 0.3. Again h(0.3) = 0.881.

HUFFMAN-TUNSTALL

.090

Binary IID Sources


Huffman Code
Tunstall Code

Lexicographical Ordering
FV: Pascal- Method
VF: Petry Code
ARITHMETIC CODING
Intervals
Universal Coding,
Individual Redundancy
CONTEXT-TREE
WEIGHTING
IID, unknown
Tree Sources
Context Trees
Coding Prbs., Redundancy
REPETITION TIMES
LZ77
Repetition Times, Kac
Repetition-Time Algorithm
Achieving Entropy
CONCLUSION

.063

ENUMERATIVE CODING

.300

1.00

1
0

.210

.210

.147

.700
0

.147

.343

.490

1
0

.103
1

.072

.168

.240

Now E [N(X )] = 1.0 + .7 + .3 + .49 + .21 + .343 + .240 = 3.283 and


therefore R = 3/3.283 = 0.914.

Remarks: Tunstall Code



Note that R → h(θ) when L → ∞.

For θ → 0 a Tunstall code has expected segment length E[N(X)] → 2^L − 1 and rate R → L/(2^L − 1). Better than Huffman for L = N.

In each step of the Tunstall procedure, a leaf with the largest probability is changed into a node. This leads to:
the largest increase in expected segment length (Massey LN-lemma),
and P(n) ≥ P(l) for all nodes n and leaves l.
Therefore for any two leaves l and l′ we can say that

    P(l) ≥ θ·P(n) ≥ θ·P(l′).

So leaves cannot differ too much in probability. This fact is used to lower-bound H(X) by L − log2(1/θ).

Optimal VF-length codes can also be found by fixing a threshold ∆ and defining a node to be internal if its probability is at least ∆ (Khodak, 1969). The size of the segment set is then not completely controllable.

Run-length codes (Golomb [1966]).


Lexicographical Ordering

IDEA:
Sequences having the same weight (and probability) only need to be INDEXED. The binary representation of the index can be taken as the codeword.

Definition (Lexicographical Ordering, Index)
In a lexicographical ordering (0 < 1) we say that x1N < y1N if xn < yn for the smallest index n such that xn ≠ yn.
Consider a subset S of the set {0, 1}N. Let iS(x1N) be the lexicographical index of x1N ∈ S, i.e., the number of sequences y1N ∈ S with y1N < x1N.

Example
Let N = 5 and S = {x1N : w(x1N) = 2}, where w(x1N) is the weight of x1N. Then |S| = (5 choose 2) = 10 and:

    iS(11000) = 9    iS(01010) = 4
    iS(10100) = 8    iS(01001) = 3
    iS(10010) = 7    iS(00110) = 2
    iS(10001) = 6    iS(00101) = 1
    iS(01100) = 5    iS(00011) = 0

Sequential Enumeration

Theorem (Cover, 1973)
From the sequence x1N ∈ S we can compute the index

    iS(x1N) = Σ_{n=1,…,N : xn=1} #S(x1, x2, …, x_{n−1}, 0),

where #S(x1, x2, …, xk) denotes the number of sequences in S having prefix x1, …, xk.
Moreover, from the index iS(x1N) the sequence x1N can be computed if the numbers #S(x1, x2, …, x_{n−1}, 0) for n = 1, …, N are available.
The index of a sequence can be represented by a codeword of fixed length ⌈log2 |S|⌉.

Example
Index iS(10100) = #S(0) + #S(100) = (4 choose 2) + (2 choose 1) = 6 + 2 = 8; hence, since |S| = 10, the corresponding codeword is 1000.
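Cover's formula and its inverse are easy to state in code for the fixed-weight set S of the example; the prefix counts #S(·) become binomial coefficients because the remaining positions may carry the remaining ones in any arrangement. A sketch (function names `index_of` and `sequence_of` are mine):

```python
from math import comb

def index_of(x):
    """Cover's formula: lexicographic index of x within the set S of
    binary sequences of the same length and weight (a sketch)."""
    N, w = len(x), sum(x)
    i, ones_left = 0, w
    for n, bit in enumerate(x):
        if bit == 1:
            # sequences in S with prefix x[:n] + (0,): the remaining
            # N - n - 1 positions must still carry ones_left ones
            i += comb(N - n - 1, ones_left)
            ones_left -= 1
    return i

def sequence_of(i, N, w):
    """Inverse mapping: recover the sequence from its index."""
    x, ones_left = [], w
    for n in range(N):
        c = comb(N - n - 1, ones_left)   # sequences with a 0 at position n
        if i >= c:
            x.append(1); i -= c; ones_left -= 1
        else:
            x.append(0)
    return x

print(index_of([1, 0, 1, 0, 0]))   # prints: 8
print(sequence_of(8, 5, 2))        # prints: [1, 0, 1, 0, 0]
```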

FV: Pascal-Triangle Method

IDEA:
Index sequences of fixed weight. Later use a Huffman code (or a fixed-length code) to describe the weights.

Example (Lynch (1966), Davisson (1966), Schalkwijk (1972))
Let N = 5 and S = {x1N : Σn xn = 2}. Then |S| = (5 choose 2) = 10.

[Figure: Pascal triangle holding the counts #S; the path for 10100 passes the counts 6 and 2.]

Index from Sequence:
i(10100) = 6 + 2 = 8.

Sequence from Index:
Index i = 8, now
a) 8 ≥ 6, hence x1 = 1,
b) i < 6 + 3, hence x2 = 0,
c) i ≥ 6 + 2, hence x3 = 1,
d) x4 = x5 = 0.
The required counts are the entries of a Pascal triangle.

FV: Pascal-Triangle Method (cont.)

First note that

    H(X1N) = H(X1N, w(X1N)) = H(W) + H(X1N | W).

If we use enumerative coding for X1N given weight w, since all sequences with a fixed weight have equal probability,

    E[L(X1N | W)] = Σ_{w=0,1,…,N} P(w) ⌈log2 (N choose w)⌉ < Σ_{w=0,1,…,N} P(w) log2 (N choose w) + 1 = H(X1N | W) + 1.

If W is encoded using a Huffman code we obtain

    E[L(X1N)] = E[L(W)] + E[L(X1N | W)] ≤ H(W) + 1 + H(X1N | W) + 1 = H(X1N) + 2.

Worse than Huffman, but no big code-table is needed.

Remarks: FV Pascal-Triangle Method



Enumeration also works for sequences generated by Markov sources (Cover [1973]).

Universal approach: Davisson [1966].
If W is encoded with a fixed-length codeword of ⌈log2(N + 1)⌉ bits, then entropy is achieved for every θ as N → ∞.

Lexicographical ordering is also possible for variable-length source segments.

VF: Petry Code

IDEA:
Modify the Tunstall segment sets such that the segments can be indexed.
Again let 0 < θ ≤ 1/2. It can be shown that a proper-and-complete segment set is a Tunstall set (maximal E[N(X)] given the number of segments) if and only if for all nodes n and all leaves l

    P(n) ≥ P(l).

Consequence
If the segments x in a proper-and-complete segment set satisfy

    P(x⁻) > ∆ ≥ P(x),

where x⁻ is x without its last symbol, this segment set is a Tunstall set. The constant ∆ determines the size of the set.
Since

    P(x) = (1 − θ)^{n0(x)} θ^{n1(x)},

where n0(x) is the number of zeros in x and n1(x) the number of ones in x, this is equivalent to

    A·n0(x⁻) + B·n1(x⁻) < C ≤ A·n0(x) + B·n1(x)

for A = −logb(1 − θ), B = −logb θ, C = −logb ∆, and some log-base b.

VF: Petry Code (cont.)

Note that the log-base b has to satisfy

    1 = (1 − θ) + θ = b^{−A} + b^{−B}.

For special values of θ, A and B are integers. E.g. for θ = (1 − θ)² we obtain A = 1 and B = 2 for b = (1 + √5)/2. Now C can also be assumed to be an integer. The corresponding codes are called Petry codes.

Definition (Petry (Schalkwijk), 1982)
Fix integers A and B. The segments x in a proper-and-complete Petry segment set satisfy

    A·n0(x⁻) + B·n1(x⁻) < C ≤ A·n0(x) + B·n1(x).

The integer C can be chosen to control the size of the set.

Linear Array
Petry codes can be implemented using a linear array.

VF: Petry Code (cont.)

Example
Consider A = 1, B = 2 (the step-sizes, or costs).
For given C, let S(C) denote the resulting segment set and µ(C) its cardinality. Let S(−1) = S(0) = {∅}; then S(1) = {0, 1}, S(2) = {00, 01, 1}, etc. Moreover, µ(−1) = µ(0) = 1, µ(1) = 2 and µ(2) = 3, etc. It is easy to see that

    µ(C) = µ(C − 1) + µ(C − 2),

and therefore µ(3) = 5, µ(4) = 8, µ(5) = 13, µ(6) = 21, µ(7) = 34, and µ(8) = 55.
Now take C = 8. Note that 010010 ∈ S(8).
We can now determine the index i(010010) using Cover's formula:

    i(010010) = #S(00) + #S(01000) = µ(6) + µ(2) = 21 + 3 = 24.
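For the A = 1, B = 2 example, the prefix counts #S(·) in Cover's formula depend only on the remaining cost budget, so a single linear array of the Fibonacci-like numbers µ(·) suffices. A sketch (helper names are mine):

```python
def petry_index(x, A=1, B=2, C=8):
    """Index a Petry segment via Cover's formula (a sketch for the
    A = 1, B = 2 example on this slide).  m(r) counts the segments
    that can still follow when a cost budget of r remains."""
    # mu(C) = mu(C-1) + mu(C-2), with mu(-1) = mu(0) = 1
    mu = [1, 1]                           # mu[r + 1] holds mu(r)
    for r in range(1, C + 1):
        mu.append(mu[-1] + mu[-2])

    def m(r):                             # mu(r), clamped for r <= 0
        return mu[r + 1] if r > 0 else 1

    i, cost = 0, 0
    for bit in x:
        if bit == 1:
            # segments with prefix x[:n] + (0,): budget C - cost - A remains
            i += m(C - cost - A)
        cost += A if bit == 0 else B
    return i

print(petry_index([0, 1, 0, 0, 1, 0]))    # prints: 24
```

The result matches the slide: µ(6) + µ(2) = 21 + 3 = 24.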

VF: Petry Code (cont.)

Theorem (Tjalkens & W. (1987))
A Petry code with parameters A < B and C is a Tunstall code for parameter q, where q = b^{−B} when b is the solution of b^{−A} + b^{−B} = 1. For arbitrary θ the rate

    ⌈log2 µ(C)⌉ / E[N(X)] ≤ ((C + (B − 1)) / C) · (h(θ) + d(θ‖q)).

Example
In the table, q for several values of A and B:

    A \ B    2        3        4        5
    1        0.382    0.318    0.276    0.245
    2                 0.430    0.382    0.346
    3                          0.450    0.412

Remarks: VF Petry Code

Note that log2 µ(C)/E[N(X)] → h(θ) + d(θ‖q) when C → ∞, hence a Petry code achieves entropy for θ = q.

Tjalkens and W. investigated VF-length Petry codes for Markov sources, again with a linear array for each state.

VF-length universal enumerative solutions exist (Lawrence [1977], Tjalkens and W. [1992]).

The numbers in the linear array show exponential behaviour. Also an array of rounded powers ⌊2^{i/M}⌉ for i = 1, …, M can be used, through which we make steps (Tjalkens [PhD, 1987]). This reduces the storage complexity and is similar to Rissanen's [1976] multiplication-avoiding arithmetic coding (generalized Kraft inequality).


Idea Elias

Elias:
If source sequences are ORDERED LEXICOGRAPHICALLY, then codewords can be COMPUTED SEQUENTIALLY from the source sequence, using conditional PROBABILITIES of the next symbol given the previous ones, and vice versa.

Source Intervals

Definition
Order the source sequences x1N ∈ {0, 1}N lexicographically according to 0 < 1.
Now, to each source sequence x1N ∈ {0, 1}N there corresponds a source interval

    I(x1N) = [Q(x1N), Q(x1N) + P(x1N))

with

    Q(x1N) = Σ_{x̃1N < x1N} P(x̃1N).

By construction the source intervals are all disjoint. Their union is [0, 1).

Example
Consider an IID source with θ = 0.2 and N = 2.

    x1N    P(x1N)    Q(x1N)    I(x1N)
    00     0.64      0         [0, 0.64)
    01     0.16      0.64      [0.64, 0.8)
    10     0.16      0.8       [0.8, 0.96)
    11     0.04      0.96      [0.96, 1)

Code Intervals

A codeword c with length L can be regarded as a binary fraction .c.
If we concatenate this codeword with others, the corresponding fraction can increase, but by no more than 2^{−L}.

Definition
To a codeword c(x1N) with length L(x1N) there corresponds a code interval

    J(x1N) = [.c(x1N), .c(x1N) + 2^{−L(x1N)}).

Note that J(x1N) ⊆ [0, 1).

Arithmetic Coding: Encoding and Decoding

Procedure
ENCODING: Choose c such that the code interval ⊆ source interval, i.e.

    [.c, .c + 2^{−L}) ⊆ [Q(x1N), Q(x1N) + P(x1N)).

DECODING: Is possible since there is only one source interval that contains the code interval.

Theorem
For sequence x1N with source interval I(x1N) = [Q(x1N), Q(x1N) + P(x1N)), take c(x1N) as the codeword with

    L(x1N) = ⌈log2 (1/P(x1N))⌉ + 1 and .c(x1N) = ⌈Q(x1N) · 2^{L(x1N)}⌉ · 2^{−L(x1N)}.

Then

    J(c(x1N)) ⊆ I(x1N) and L(x1N) < log2 (1/P(x1N)) + 2,

i.e. less than two bits above the ideal codeword length.
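The encoding rule in the theorem can be checked with exact rational arithmetic; `elias_codeword` is an illustrative name of mine, and the sketch assumes the binary IID source of the running example.

```python
from fractions import Fraction
from math import ceil, log2

def elias_codeword(x, theta=Fraction(1, 5)):
    """Elias codeword for a binary IID sequence x (a sketch):
    L = ceil(log2 1/P) + 1 and .c = ceil(Q * 2^L) * 2^(-L)."""
    P, Q = Fraction(1), Fraction(0)
    for bit in x:
        p0 = P * (1 - theta)            # probability of a 0 at this position
        if bit == 0:
            P = p0
        else:
            Q += p0                     # sequences with a 0 here precede x
            P *= theta
    L = ceil(log2(1 / P)) + 1
    scaled = Q * 2 ** L
    num = -((-scaled.numerator) // scaled.denominator)   # exact ceiling
    return format(num, "0{}b".format(L))

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, elias_codeword(x))          # codewords: 00, 1011, 1101, 111110
```

For θ = 0.2 and N = 2 this reproduces the code intervals of the example: [0, 1/4), [11/16, 3/4), [13/16, 7/8), [31/32, 63/64).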

Example
IID source with θ = 0.2 and N = 2.

    x1N    I(x1N)         c(x1N)    J(x1N)
    00     [0, 0.64)      00        [0, 1/4)
    01     [0.64, 0.8)    1011      [11/16, 3/4)
    10     [0.8, 0.96)    1101      [13/16, 7/8)
    11     [0.96, 1)      111110    [31/32, 63/64)

Source intervals are disjoint ⇒ code intervals are disjoint ⇒ the prefix condition holds.


Arithmetic Coding: Sequential Computation (Elias)

Example (Connection to Cover's formula)
Let N = 3 and θ = 0.2.

[Figure: binary tree over the eight sequences 000, …, 111 with their source intervals.]

    Q(101) = P(0) + P(100) = 0.8 + 0.2 · 0.8 · 0.8 = 0.928.
    P(101) = P(1)P(0)P(1) = 0.2 · 0.8 · 0.2 = 0.032.

Arithmetic Coding: Sequential Computation (Elias)

In general

    Q(x1N) = Σ_{n=1,…,N : xn=1} P(x1, x2, …, x_{n−1}, 0),

    P(x1N) = Π_{n=1,…,N} P(xn | x1, x2, …, x_{n−1}).

Sequential Computation
If we have access to P(x1, x2, …, xn, 0) and P(x1, x2, …, xn, 1) after having processed P(x1, x2, …, xn) for n = 1, 2, …, N, we can compute I(x1N) sequentially.

Universal Coding

Coding Probabilities
If the actual probabilities P(x1N) are not known, arithmetic coding is still possible if instead of P(x1N) we use coding probabilities Pc(x1N) satisfying

    Pc(x1N) > 0 for all x1N, and Σ_{x1N} Pc(x1N) ≤ 1.

Then

    L(x1N) < log2 (1/Pc(x1N)) + 2.

[Diagram: x1N → Encoder → c(x1N) → Decoder → x1N; both encoder and decoder have access to Pc(x1 ⋯ xn−1, 0) and Pc(x1 ⋯ xn−1, 1) for n = 1, …, N.]

PROBLEM: How do we choose the coding probabilities Pc(x1N)?

Individual Redundancy

Definition
The individual redundancy ρ(x1N) of a sequence x1N is defined as

    ρ(x1N) = L(x1N) − log2 (1/P(x1N)),

i.e. codeword length minus ideal codeword length.

Bound on the Individual Redundancy
Arithmetic coding based on coding probabilities {Pc(x1N), x1N ∈ {0, 1}N} yields

    ρ(x1N) < log2 (1/Pc(x1N)) + 2 − log2 (1/P(x1N)) = log2 (P(x1N)/Pc(x1N)) + 2.

We say that the CODING redundancy is < 2 bits.
The coding probabilities should be as large as possible (as close as possible to the actual probabilities). Next we focus on the remaining part of the individual redundancy, log2 (P(x1N)/Pc(x1N)).

Remarks: Arithmetic Coding

Shannon [1948] already described the relation between codewords and intervals, however with ordered probabilities. Called the Shannon-Fano code.

Shannon-Fano-Elias: arbitrary ordering, but not sequential.

Finite-precision issues in arithmetic coding were solved by Pasco [1976] and Rissanen [1976].


CTW: Universal Codes

IDEA:
Find good coding probabilities for sources with UNKNOWN PARAMETERS and STRUCTURE. Use WEIGHTING!

Coding for a Binary IID Source, Unknown θ

Definition (Krichevsky-Trofimov estimator (1981))
A good coding probability Pc(x1N) for a sequence x1N that contains a zeros and b = N − a ones is

    Pe(a, b) = ∫_{θ=0}^{1} (1/(π·√(θ(1 − θ)))) · (1 − θ)^a θ^b dθ

(Dirichlet-(1/2, 1/2) prior, weighting).

Theorem
Upper bound on the PARAMETER redundancy:

    log2 (P(x1N)/Pc(x1N)) = log2 ((1 − θ)^a θ^b / Pe(a, b)) ≤ (1/2) log2 (a + b) + 1 = (1/2) log2 (N) + 1

for all θ and all x1N with a zeros and b ones.

The probability of a sequence with a zeros and b ones followed by a zero is

    Pe(a + 1, b) = ((a + 1/2)/(a + b + 1)) · Pe(a, b),

hence SEQUENTIAL COMPUTATION is possible!
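The sequential update in the theorem gives a one-line estimator; `kt_probability` is my name for it. A sketch:

```python
from math import log2

def kt_probability(x):
    """Krichevsky-Trofimov coding probability Pe(a, b) of a binary
    sequence, computed sequentially: after a zeros and b ones the
    next zero has probability (a + 1/2) / (a + b + 1)  (a sketch)."""
    a = b = 0
    p = 1.0
    for bit in x:
        if bit == 0:
            p *= (a + 0.5) / (a + b + 1)
            a += 1
        else:
            p *= (b + 0.5) / (a + b + 1)
            b += 1
    return p

# Pe depends only on the counts (a, b), not on the order of the symbols:
print(kt_probability([0, 0, 0, 1]), kt_probability([1, 0, 0, 0]))
```

Both calls return Pe(3, 1) = 5/128 ≈ 0.039, and for any θ the parameter redundancy log2((1 − θ)^3 θ / Pe(3, 1)) stays below (1/2) log2 4 + 1 = 2 bits.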

Individual Redundancy, Binary IID Source

The total individual redundancy

    ρ(x1N) < log2 ((1 − θ)^a θ^b / Pe(a, b)) + 2 ≤ (1/2) log2 (N) + 1 + 2

for all θ and all x1N with a zeros and b ones.

Shtarkov [1988]: the (1/2) log2 N behaviour is asymptotically optimal for the individual redundancy for N → ∞ (NML estimator)!
Rissanen [1984]: also for the expected redundancy, the (1/2) log2 N behaviour is asymptotically optimal.

CTW: Binary Tree-Sources

Definition

[Figure: a tree model with leaves 1, 10, 00; the source symbol xn is generated with the parameter determined by the context (xn−1, xn−2, …).]

(tree-)model M = {00, 10, 1}, parameters θ1 = 0.1, θ10 = 0.3, θ00 = 0.5:

    P(Xn = 1 | ⋯, Xn−1 = 1) = 0.1
    P(Xn = 1 | ⋯, Xn−2 = 1, Xn−1 = 0) = 0.3
    P(Xn = 1 | ⋯, Xn−2 = 0, Xn−1 = 0) = 0.5
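The tree model M = {00, 10, 1} can be evaluated directly in code; `tree_source_prob` is an illustrative name of mine, and the sketch assumes a past of zeros before the sequence starts.

```python
def tree_source_prob(x, past=(0, 0)):
    """Probability of x under the tree model M = {00, 10, 1} with
    theta_1 = 0.1, theta_10 = 0.3, theta_00 = 0.5 (a sketch of the
    slide's example; `past` supplies the context before x starts)."""
    theta = {"1": 0.1, "10": 0.3, "00": 0.5}   # P(next = 1 | context)
    p = 1.0
    hist = list(past)
    for bit in x:
        if hist[-1] == 1:
            s = "1"                            # context is x_{n-1} = 1
        else:
            s = "10" if hist[-2] == 1 else "00"
        t = theta[s]
        p *= t if bit == 1 else 1 - t
        hist.append(bit)
    return p

print(tree_source_prob([0, 1, 1, 0]))   # 0.5 * 0.5 * 0.1 * 0.9
```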

CTW: Problem, Concepts

PROBLEM:
What are good coding probabilities for sequences x1N produced by a tree-source with
an unknown tree-model,
and unknown parameters?

CONCEPTS:
CONTEXT TREE (Rissanen [1983]).
WEIGHTING: If P1(x) and P2(x) are two alternative coding probabilities for sequence x, then the weighted probability

    Pw(x) = (P1(x) + P2(x))/2 ≥ max(P1(x), P2(x))/2,

thus we lose at most a factor of 2, which is one bit in redundancy.

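The one-bit bound is easy to check numerically. A small sketch (the two probabilities below are made-up values, not from the talk):

```python
import math

def weighted(p1, p2):
    """Pw = (P1 + P2) / 2, which is >= max(P1, P2) / 2."""
    return (p1 + p2) / 2

# hypothetical coding probabilities of the same sequence x
p1, p2 = 1e-6, 3e-4
pw = weighted(p1, p2)
# extra codeword length versus the better alternative, in bits
loss = math.log2(max(p1, p2)) - math.log2(pw)
print(loss)   # always between 0 and 1 bit
```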

Context Trees

Definition (Context Tree)
(Figure: binary context tree of depth D = 3: root λ, internal nodes 1, 0, 11, 01, 10, 00, and leaves 111, 011, 101, 001, 110, 010, 100, 000.)
Node s contains the sequence of source symbols that have occurred following context s. The depth of the tree is D.

Context-tree splits up sequences in subsequences

Example
(Figure: a source sequence x_1^7 together with its past; each symbol index n = 1, …, 7 is routed to the context-tree node that matches its context, so the indices 1234567 at the root are split among the deeper nodes. The context tree thus partitions the sequence into subsequences, one per node.)
Coding Probabilities: Leaves of the Context-Tree

(Figure: context tree of depth 3; the leaves of the tree-model M = {00, 10, 1} are marked.)

The subsequence corresponding to a leaf s of the context tree is IID. A good coding probability for this subsequence is therefore

Pw(s) = Pe(a_s, b_s),

where a_s and b_s are the numbers of zeroes and ones of this subsequence.

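The talk does not spell out Pe here; assuming it is the Krichevsky-Trofimov estimator (the standard choice in CTW), it can be computed sequentially with P(next = 1 | a zeroes, b ones) = (b + 1/2)/(a + b + 1). A minimal sketch:

```python
from fractions import Fraction

def pe(bits):
    """Krichevsky-Trofimov estimate Pe(a, b) of a binary subsequence,
    built up one symbol at a time (a, b = zeroes and ones seen so far)."""
    p, a, b = Fraction(1), 0, 0
    for x in bits:
        if x == 1:
            p *= Fraction(2 * b + 1, 2 * (a + b) + 2)   # (b + 1/2)/(a + b + 1)
            b += 1
        else:
            p *= Fraction(2 * a + 1, 2 * (a + b) + 2)   # (a + 1/2)/(a + b + 1)
            a += 1
    return p

print(pe([0, 1, 1, 0]))   # Pe(2, 2) = 3/128
```

Note that the result depends only on the counts (a, b), not on the order of the symbols.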
Coding Probabilities: Internal Nodes of the Context-Tree

(Figure: context tree of depth 3; the tree-model M = {00, 10, 1} is marked.)

The subsequence corresponding to a node s of the context tree is
IID, if the node s is not an internal node of the actual tree-model;
a combination of the subsequences that correspond to nodes 0s and 1s, if s is an internal node of the actual tree-model.

Weighting

Weighting the coding probabilities corresponding to both alternatives yields the coding probability

Pw(s) = (Pe(a_s, b_s) + Pw(0s) · Pw(1s)) / 2

for the subsequence that corresponds to node s.

Recursively we find in the root λ of the context-tree the coding probability Pw(λ) for the entire source sequence x_1^N.
IMPORTANT:
Coding probability Pw(λ) can be computed sequentially.

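The recursion can be illustrated directly with a naive (but exact) sketch; the KT estimator and the orientation of the context strings are assumptions consistent with the slides, and this batch version deliberately ignores the more efficient sequential evaluation:

```python
from fractions import Fraction
from itertools import product

def kt(a, b):
    """Krichevsky-Trofimov estimator Pe(a, b)."""
    p = Fraction(1)
    for i in range(a):
        p *= Fraction(2 * i + 1, 2 * i + 2)
    for j in range(b):
        p *= Fraction(2 * j + 1, 2 * (a + j) + 2)
    return p

def pw(seq, n0, s, D):
    """Weighted probability of the subsequence in context-tree node s.
    s is a tuple holding the last len(s) context symbols (oldest first)."""
    d = len(s)
    sub = [seq[n] for n in range(n0, len(seq)) if tuple(seq[n - d:n]) == s]
    pe = kt(sub.count(0), sub.count(1))
    if d == D:                               # leaf of the context tree
        return pe
    return (pe + pw(seq, n0, (0,) + s, D) * pw(seq, n0, (1,) + s, D)) / 2

def ctw(x, past, D):
    """Coding probability Pw(lambda) of block x, given >= D past symbols."""
    seq = tuple(past) + tuple(x)
    return pw(seq, len(past), (), D)

# Pw(lambda) is a proper coding distribution: it sums to 1 over all blocks.
total = sum(ctw(x, (0, 0), 2) for x in product((0, 1), repeat=3))
print(total)   # 1
```

The sum-to-1 check works because Pw(λ) is exactly a mixture of valid model probabilities, which is the point of the weighting.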
Redundancy (tree-model M = {00, 10, 1})

Actual probability:

P(x_1^N) = (1 − θ_00)^{a_00} θ_00^{b_00} · (1 − θ_10)^{a_10} θ_10^{b_10} · (1 − θ_1)^{a_1} θ_1^{b_1}.

Lower bound on the coding probability:

Pw(λ) ≥ (1/2) Pw(0) Pw(1)
      ≥ (1/2)^3 Pw(00) Pw(10) Pe(a_1, b_1)
      ≥ (1/2)^5 Pe(a_00, b_00) Pe(a_10, b_10) Pe(a_1, b_1).

Parameter redundancy bounds for the subsequences in the leaves of tree-model M = {00, 10, 1}:

log2 [ (1 − θ_00)^{a_00} θ_00^{b_00} / Pe(a_00, b_00) ] ≤ (1/2) log2(a_00 + b_00) + 1,
log2 [ (1 − θ_10)^{a_10} θ_10^{b_10} / Pe(a_10, b_10) ] ≤ (1/2) log2(a_10 + b_10) + 1,
log2 [ (1 − θ_1)^{a_1} θ_1^{b_1} / Pe(a_1, b_1) ] ≤ (1/2) log2(a_1 + b_1) + 1.

Redundancy (General)

ρ(x_1^N) < log2 [ P(x_1^N) / Pw(λ) ] + 2
        ≤ log2 32 + (1/2) log2(a_00 + b_00) + 1 + (1/2) log2(a_10 + b_10) + 1 + (1/2) log2(a_1 + b_1) + 1 + 2
        ≤ 5 + (3/2) log2(N/3) + 3 + 2,

for all x_1^N, and all θ_00, θ_10, and θ_1.

Theorem (W., Shtarkov, and Tjalkens (1995))
In general, for a tree source with |M| leaves (parameters):

ρ(x_1^N) < (2|M| − 1) + (|M|/2) log2(N/|M|) + |M| + 2 bits

(model, parameter, and coding redundancies).

Simulation: Model plus Parameter Redundancies

Redundancies for the CTW method, but also for methods focussing on M = {λ}, M = {0, 1}, the actual model M = {00, 10, 1}, M = {0, 01, 11}, and M = {00, 10, 01, 11}. The CTW method improves over the best model!

(Figure: redundancies(n) in bits, roughly 10 to 40, plotted for n = 50, 100, …, 500.)

Remarks: Context-Tree Weighting

CTW implements a weighting (Bayes mixture) over all tree-models with depth not exceeding D, i.e.

Pw(λ) = Σ_{M : depth(M) ≤ D} P(M) Pe(x_1^N | M),

with Pe(x_1^N | M) = Π_{s ∈ M} Pe(a_s, b_s) and P(M) = 2^{−(2|M|−1)}.

There is one tree-model of depth 0 (i.e., the IID model). If there are #_d models of depth not exceeding d, then #_{d+1} = #_d^2 + 1. Therefore #_1 = 2, #_2 = 5, #_3 = 26, #_4 = 677, #_5 = 458330, #_6 = 210066388901, #_7 ≈ 4.4128 · 10^22, #_8 ≈ 1.9473 · 10^45, etc.
Straightforward analysis. No model estimation that only gives asymptotic results as in e.g. Rissanen [1983, 1986] or Weinberger, Rissanen, and Feder [1995].
The number of computations needed to process the source sequence x_1^N is linear in N. The same holds for the storage complexity.

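The model-count recursion is easy to reproduce; a small sketch:

```python
def tree_model_counts(dmax):
    """#_d = number of tree-models of depth not exceeding d:
    #_0 = 1 (the IID model) and #_{d+1} = #_d**2 + 1, since a model of
    depth d+1 is either the single root leaf or a pair of depth-d subtrees."""
    counts = [1]
    for _ in range(dmax):
        counts.append(counts[-1] ** 2 + 1)
    return counts

print(tree_model_counts(6))
# [1, 2, 5, 26, 677, 458330, 210066388901]
```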
Remarks: Context-Tree Weighting (cont.)

Optimal parameter redundancy behavior in the Rissanen [1984] sense (i.e., (1/2) log2 N bits per parameter).
A modified version achieves entropy not only for tree sources but for all stationary ergodic sources.
More general context-algorithms (splitting rules) were proposed. The context of x_n need not be x_{n−D}, x_{n−D+1}, …, x_{n−1}.
A two-pass version (context-tree maximizing) exists that finds the best model (MDL) matching the source sequence. Now

Pm(s) = max[ Pe(a_s, b_s), Pm(0s) · Pm(1s) ] / 2.

If a (minimal) tree source generates the sequence x_1^N, the maximizing method produces a model estimate which is correct with probability one as N → ∞.


Lempel-Ziv 1977 Compression

IDEA:
Let the data speak for itself.
LZ77 compression is achieved by replacing repeated segments in the data with pointers and lengths. To avoid deadlock an uncoded symbol is added to each pointer and length.

Example (LZ77)

search buffer   look-ahead buffer   output
(empty)         abracadabra         (0,-,a)
a               bracadabra          (0,-,b)
ab              racadabra           (0,-,r)
abr             acadabra            (3,1,c)
abrac           adabra              (2,1,d)
abracad         abra                (7,4, )

(The last triple carries no extra symbol, since the data ends with the match.)

QUESTION:
Why does this method work? Note that the statistics of the data are unknown!

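A toy sketch of such an encoder and decoder (an illustration under simplifying assumptions: unbounded search buffer and naive match search; a pointer and length of 0 play the role of the "-" entries, and the final triple may carry no symbol):

```python
def lz77_encode(data):
    """Emit (pointer, length, symbol) triples; pointer 0 with length 0
    means 'no match', and the uncoded symbol avoids deadlock."""
    i, out = 0, []
    while i < len(data):
        best_len, best_ptr = 0, 0
        for ptr in range(1, i + 1):          # distance back into the buffer
            l = 0
            while i + l < len(data) and data[i + l - ptr] == data[i + l]:
                l += 1                       # overlapping matches are fine
            if l > best_len:
                best_len, best_ptr = l, ptr
        nxt = data[i + best_len] if i + best_len < len(data) else None
        out.append((best_ptr, best_len, nxt))
        i += best_len + 1
    return out

def lz77_decode(triples):
    out = []
    for ptr, length, nxt in triples:
        for _ in range(length):
            out.append(out[-ptr])            # copy from ptr positions back
        if nxt is not None:
            out.append(nxt)
    return "".join(out)

print(lz77_encode("abracadabra"))
# [(0, 0, 'a'), (0, 0, 'b'), (0, 0, 'r'), (3, 1, 'c'), (2, 1, 'd'), (7, 4, None)]
```

Decoding simply replays the copies, so the triples above reproduce "abracadabra" exactly.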

Repetition Times

Consider the discrete stationary and ergodic process

…, X_{−3}, X_{−2}, X_{−1}, X_0, X_1, X_2, ….

Suppose that X_1 = x for symbol-value x ∈ X with Pr{X_1 = x} > 0. We say that the repetition time of the x that occurred at time t = 1 is m if X_{1−m} = x and X_t ≠ x for t = 2 − m, …, 0.

(Figure, m = 4: X_{−3} = x, X_{−2} ≠ x, X_{−1} ≠ x, X_0 ≠ x, X_1 = x.)

Definition (Average Repetition Time)
Let Q_x(m) be the conditional probability that the repetition time of the x occurring at t = 1 is m, hence

Q_x(m) = Pr{X_{1−m} = x, X_{2−m} ≠ x, …, X_0 ≠ x | X_1 = x}.

The average repetition time for symbol-value x with Pr{X_1 = x} > 0 is now defined as

T(x) = Σ_{m=1,2,…} m Q_x(m).


Kac's Result

Example
Consider an IID (binary) process and assume that Pr{X_1 = 1} = θ > 0. Then

Q_1(m) = θ (1 − θ)^{m−1} and T(1) = Σ_{m=1,2,…} m θ (1 − θ)^{m−1} = 1/θ.

Theorem (Kac, 1947)
For stationary and ergodic processes, for any x with Pr{X_1 = x} > 0,

T(x) = 1 / Pr{X_1 = x}.

Note that Kac's result also holds for sliding N-blocks, hence

T((x_1, x_2, …, x_N)) = 1 / Pr{(X_1, X_2, …, X_N) = (x_1, x_2, …, x_N)},

if Pr{(X_1, X_2, …, X_N) = (x_1, x_2, …, x_N)} > 0. Now the repetition time is equal to m when m is the smallest positive integer such that (x_{1−m}, x_{2−m}, …, x_{N−m}) = (x_1, x_2, …, x_N).

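Kac's theorem is easy to test empirically; a sketch on simulated IID bits (the pattern, θ, and sample size below are arbitrary choices for illustration):

```python
import random

def avg_repetition_time(bits, pattern):
    """Average gap between successive occurrences of `pattern`:
    a sliding-block estimate of the mean repetition time."""
    N = len(pattern)
    pos = [i for i in range(len(bits) - N + 1) if bits[i:i + N] == pattern]
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    return sum(gaps) / len(gaps)

rng = random.Random(1)
theta = 0.3
bits = [1 if rng.random() < theta else 0 for _ in range(200_000)]
pattern = [1, 0, 1]
p = theta * (1 - theta) * theta      # Pr{(X1, X2, X3) = (1, 0, 1)}
print(avg_repetition_time(bits, pattern), 1 / p)   # both close to 1/p ~ 15.87
```

By the ergodic theorem the empirical average converges to T((1, 0, 1)) = 1/p, as Kac's theorem predicts.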

Universal Source Coding Based on Repetition Times

Suppose that our source is binary, i.e. X_t ∈ {0, 1} for all integer t.

(Figure, m = 4, N = 3: the buffer x_{−7}, …, x_0 and the block x_1, x_2, x_3; the block is a copy of x_{−3}, x_{−2}, x_{−1}, which occurred m = 4 positions earlier.)

A. The encoder wants to convey a source block x_1^N = (x_1, x_2, …, x_N) to the decoder. Both encoder and decoder have access to buffers containing all previous source symbols …, x_{−2}, x_{−1}, x_0.
B. Using these previous source symbols the encoder can determine the repetition time m of x_1^N. It is the smallest integer m that satisfies

x_{1−m}^{N−m} = x_1^N,

where x_{1−m}^{N−m} = (x_{1−m}, x_{2−m}, …, x_{N−m}).

Universal Source Coding Based on Repetition Times (cont.)

C. Repetition time m is now encoded and sent to the decoder. The code for m consists of a preamble p(m) and an index i(m), and has length l(m).

Example
Code table for the waiting time m for N = 3:

m     p(m)   i(m)               l(m)
1     00     (empty)            2 + 0 = 2
2     01     0                  2 + 1 = 3
3     01     1                  2 + 1 = 3
4     10     00                 2 + 2 = 4
5     10     01                 2 + 2 = 4
6     10     10                 2 + 2 = 4
7     10     11                 2 + 2 = 4
≥ 8   11     copy of x1 x2 x3   2 + 3 = 5

In general there are N + 1 groups. There are index groups with 1, 2, up to 2^{N−1} elements, hence the index lengths are 0, 1, up to N − 1. The last group is the copy-group; a copy has length N. We use a preamble p(m) of ⌈log2(N + 1)⌉ bits to specify one of these N + 1 alternatives.

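The lengths in the table follow a simple rule that works for arbitrary N; a small sketch of the length function:

```python
import math

def code_length(m, N):
    """l(m): a preamble of ceil(log2(N + 1)) bits selects one of the N + 1
    groups; then floor(log2 m) index bits if m < 2**N, or an N-bit copy
    of the block if m >= 2**N."""
    preamble = math.ceil(math.log2(N + 1))
    if m < 2 ** N:
        return preamble + (m.bit_length() - 1)   # floor(log2 m)
    return preamble + N

print([code_length(m, 3) for m in range(1, 9)])
# [2, 3, 3, 4, 4, 4, 4, 5]  -- reproduces the N = 3 table
```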
Universal Source Coding Based on Repetition Times (cont.)

For arbitrary N we get for the code-block length l(m)

l(m) = ⌈log2(N + 1)⌉ + ⌊log2 m⌋  if m < 2^N,
l(m) = ⌈log2(N + 1)⌉ + N         if m ≥ 2^N.

This results in the upper bound

l(m) ≤ ⌈log2(N + 1)⌉ + log2 m.

D. After decoding m the decoder can reconstruct x_1^N using the previous source symbols in the buffer. With this block x_1^N both the encoder and decoder can update their buffers.
E. Then the next block

x_{N+1}^{2N} = x_{N+1}, x_{N+2}, …, x_{2N}

is processed in a similar way, etc.

Note:
Buffers need only contain the previous 2^N − 1 source symbols!


Analysis of the Repetition-Time Algorithm

Assume that a certain x_1^N occurred as first block. What is then the average codeword length L(x_1^N) for x_1^N?

L(x_1^N) = Σ_{m=1,2,…} Q_{x_1^N}(m) l(m)
   (a)≤ ⌈log2(N + 1)⌉ + Σ_{m=1,2,…} Q_{x_1^N}(m) log2 m
   (b)≤ ⌈log2(N + 1)⌉ + log2 Σ_{m=1,2,…} m Q_{x_1^N}(m)
   (c)= ⌈log2(N + 1)⌉ + log2 [ 1 / Pr{X_1^N = x_1^N} ].

Here (a) follows from the upper bound for l(m), and (b) from Jensen's inequality. Furthermore (c) follows from Kac's theorem.
Ideal codeword length plus ⌈log2(N + 1)⌉.

Analysis of the Repetition-Time Algorithm (cont.)

The probability that x_1^N occurs as block is Pr{X_1^N = x_1^N}. For the average codeword length L(X_1^N) we therefore get

L(X_1^N) = Σ_{x_1^N} Pr{X_1^N = x_1^N} L(x_1^N)
        ≤ Σ_{x_1^N} Pr{X_1^N = x_1^N} [ ⌈log2(N + 1)⌉ + log2 (1 / Pr{X_1^N = x_1^N}) ]
        = ⌈log2(N + 1)⌉ + H(X_1^N).

For the rate R_N we now obtain

R_N = L(X_1^N)/N ≤ H(X_1^N)/N + ⌈log2(N + 1)⌉/N.

Theorem (W., 1986, 1989)
The repetition-time algorithm achieves entropy since

lim_{N→∞} R_N ≤ lim_{N→∞} ( H(X_1^N)/N + ⌈log2(N + 1)⌉/N ) = H_∞(X).


Remarks: Repetition-Time Algorithm

Universal algorithm.
Assume that …, X_{−1}, X_0, X_1, X_2, … is stationary and ergodic with entropy H_∞(X). Let the random variable M be the repetition time of the source block X_1^N.

Theorem (Wyner & Ziv, 1989)
Fix an ε > 0. Then

lim_{N→∞} Pr{ M ≥ 2^{N(H_∞(X)+ε)} } = 0.

This result implies that the buffer can be much smaller than 2^N − 1 if the entropy is known to be smaller than 1.
This result was crucial in proving that the LZ77 algorithm achieves entropy (Wyner & Ziv [1994]).
Elias [1987]: interval and recency-rank coding methods (symbols).
Hershkovitz and Ziv [1998] studied conditional repetition times.

When better than CTW?
CTW incremental redundancy for an N-block is NK/(2B ln 2) bits for K parameters. This redundancy is larger than log2(N + 1) for K/2^N > 2 ln(N + 1)/N (taking buffer length B = 2^N). For N = 24 we get K/2^N > 0.2682.


CONCLUSION

Recent developments:
DUDE (Weissman, Ordentlich, Seroussi, Verdú, Weinberger [2005]) resulted in the study of bi-directional contexts and splitting rules (Ordentlich, Weinberger, and Weissman [2005]).
Directed Mutual Information (Marko [1973], Massey [1990]):

I(X_1^N → Y_1^N) = Σ_n [ H(Y_n | Y_1^{n−1}) − H(Y_n | Y_1^{n−1}, X_1^{n−1}, X_n) ]

is a generalisation of Granger causality [1969]. CTW methods were used to estimate these quantities (Jiao, Permuter, Zhao, Kim, and Weissman [2012]).

Questions:
LZ learns from seeing once. CTW is optimal for tree sources but seems to take more time. What are the algorithms between CTW and LZ?
Suppose that the data have left-right symmetry, hence P(a, b) = P(b, a), P(a, b, c) = P(c, b, a), P(a, b, c, d) = P(d, c, b, a), etc. This reduces the number of parameters. Algorithm? Relevant for image compression.
CTW can handle side-information by considering it as context (e.g. Cai, Kulkarni and Verdú [2005]). But what if the side-information is not properly aligned? Relevant for reference-based genome compression (Chern et al. [2012]).

Source Coding is FUN!



(Ulm, 1997)
