LOSSLESS SOURCE CODING ALGORITHMS
Frans M.J. Willems
Outline
1 INTRODUCTION
2 HUFFMAN and TUNSTALL
  Binary IID Sources
  Huffman Code
  Tunstall Code
3 ENUMERATIVE CODING
  Lexicographical Ordering
  FV: Pascal-Δ Method
  VF: Petry Code
4 ARITHMETIC CODING
  Intervals
  Universal Coding, Individual Redundancy
5 CONTEXT-TREE WEIGHTING
  IID, unknown θ
  Binary Tree-Sources
  Context Trees
  Coding Probabilities
6 REPETITION TIMES
  LZ77
  Repetition Times, Kac
  Repetition-Time Algorithm
  Achieving Entropy
7 CONCLUSION
Choosing a Topic
POSSIBLE TOPICS:
Multi-user Information Theory (with Edward van der Meulen (KUL), Andries Hekstra)
Lossless Source Coding (with Tjalling Tjalkens, Yuri Shtarkov (IPPI), Paul Volf)
Watermarking, Embedding, and Semantic Coding (with Martin van Dijk, Ton Kalker (Philips Research))
Biometrics (with Tanya Ignatenko)

LOSSLESS SOURCE CODING ALGORITHMS. WHY?
Not many sessions at ISIT 2012! Is lossless source coding DEAD?
Lossless Source Coding is about UNDERSTANDING data. Universal Lossless Source Coding is focusing on FINDING STRUCTURE in data. MDL principle [Rissanen].
ALGORITHMS are fun (Piet Schalkwijk).
Binary IID Sources

[Figure: a binary IID source emitting symbols x1 x2 … xN.]

where

P(1) = θ, and P(0) = 1 − θ.

A sequence x1N = x1 x2 … xN then has probability P(x1N) = ∏_{n=1..N} P(xn). The binary entropy of the source is

h(θ) = θ log2 (1/θ) + (1 − θ) log2 (1/(1 − θ)) (bits).
IDEA:
Give more probable sequences shorter codewords than less probable
sequences.
The rate of an FV-length code is

R = E[L(X1N)] / N (code-symbols/source-symbol).
GOAL:
We would like to find decodable FV-length codes that MINIMIZE this rate.
Prefix Codes
Example

c(x1N)   L(x1N)
0        1
10       2
110      3
111      3

[Figure: the corresponding binary code tree, with the codewords 0, 10, 110, and 111 as leaves.]
Prefix-Codes (cont.)
Theorem (Kraft)
(a) The codeword lengths L(x1N) of a prefix code satisfy Kraft's inequality

∑_{x1N ∈ X^N} 2^{−L(x1N)} ≤ 1.

(b) For codeword lengths satisfying Kraft's inequality there exists a prefix code with these lengths.

This leads to: there exist prefix codes with H(X1N) ≤ E[L(X1N)] < H(X1N) + 1.
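A quick sketch (plain Python, helper names of my own choosing) that verifies both the prefix condition and Kraft's inequality for the example code {0, 10, 110, 111}:

```python
# Check Kraft's inequality and the prefix condition for the example code.
# A code is prefix-free iff no codeword is a proper prefix of another;
# its lengths then necessarily satisfy sum 2^{-L(c)} <= 1.

def kraft_sum(codewords):
    """Sum of 2^{-L(c)} over all codewords."""
    return sum(2.0 ** -len(c) for c in codewords)

def is_prefix_free(codewords):
    """True iff no codeword is a proper prefix of another."""
    return not any(a != b and b.startswith(a)
                   for a in codewords for b in codewords)

code = ["0", "10", "110", "111"]
assert is_prefix_free(code)
assert kraft_sum(code) == 1.0  # complete prefix code: equality in Kraft
```

Equality in Kraft's inequality reflects that this code tree is complete: every internal node has both sons.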
Huffman's Construction

Example
Let N = 3 and θ = 0.3, then h(0.3) = 0.881.

[Figure: Huffman code tree for the 8 source sequences, with leaf probabilities .343, .147 (three times), .063 (three times), and .027. Repeatedly merging the two smallest probabilities creates the internal nodes .090, .126, .216, .294, .363, .637, and finally 1.00.]
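Huffman's construction for this example can be sketched as follows (a minimal heap-based implementation; the helper name `huffman_lengths` is my own, not from the slides):

```python
import heapq

# Huffman's construction for N = 3, theta = 0.3: the 8 source sequences
# have probabilities (1-theta)^{n0} * theta^{n1}; repeatedly merge the
# two least probable subtrees. Every merge adds one bit to each leaf
# of the merged subtrees.

def huffman_lengths(probs):
    """Return codeword lengths of a binary Huffman code for `probs`."""
    # heap items: (probability, tiebreak, leaf indices in this subtree)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)
    while len(heap) > 1:
        p1, _, leaves1 = heapq.heappop(heap)
        p2, _, leaves2 = heapq.heappop(heap)
        for i in leaves1 + leaves2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, tie, leaves1 + leaves2))
        tie += 1
    return lengths

theta = 0.3
seqs = [f"{i:03b}" for i in range(8)]
probs = [(1 - theta) ** s.count("0") * theta ** s.count("1") for s in seqs]
lengths = huffman_lengths(probs)
rate = sum(p * l for p, l in zip(probs, lengths)) / 3
# E[L] = 2.726, so rate = 0.9087, slightly above h(0.3) = 0.881
```

The resulting length multiset is {2, 2, 3, 3, 4, 4, 4, 4}, matching the merge tree in the example.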
IDEA:
Parse the source output into variable-length segments of roughly the same
probability. Code all these segments with codewords of fixed length.
Definition (VF-Length Code)
A VF-length code is defined by a set of variable-length source segments.
Each segment x in the set gets a unique binary codeword c(x ) of length
L. The length of a segment x is denoted as N(x ). The rate of a
VF-code is
R = L / E[N(X)] (code-symbols/source-symbol).
GOAL:
We would like to find parsable VF-length codes that MINIMIZE this rate.
Example

x     N(x)   c(x)
1     1      11
01    2      10
001   3      01
000   3      00

[Figure: the segment set {1, 01, 001, 000} depicted as a binary parsing tree.]
Assume that the source is IID with parameter θ. Consider a set of segments and all their prefixes. Depict them in a tree. The segments are leaves, the prefixes nodes. Note that all the nodes and leaves have a probability, e.g. P(10) = θ(1 − θ). Let F(·) be a function on nodes and leaves.

[Figure: example tree with root λ, internal nodes 1 and 10, and leaves 0, 11, 100, 101, each labeled with its value F(·).]

Telescoping the increments of F along the paths from the root to the leaves gives

∑_{l ∈ leaves} P(l) F(l) = F(λ) + ∑_{n ∈ nodes} P(n) ∑_{s ∈ sons of n} (P(s)/P(n)) [F(s) − F(n)].
Theorem
For any proper-and-complete segment set with no more than 2^L segments,

R = L / E[N(X)] ≥ H(X) / E[N(X)] = h(θ),

since H(X) ≤ L and, by the tree identity above applied to F = log2(1/P), H(X) = E[N(X)] · h(θ).
Tunstall's Construction

Example
Let L = 3 and θ = 0.3. Again h(0.3) = 0.881.

[Figure: Tunstall tree built by repeatedly splitting the most probable leaf, starting from the root (1.00). Intermediate node probabilities are .700, .490, .343, .300, .240, and .210; the resulting 2^3 = 8 segments have probabilities .210, .168, .147, .147, .103, .090, .072, and .063.]
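Tunstall's construction can be sketched the same way (splitting the most probable leaf until 2^L segments exist; helper names are my own):

```python
import heapq

# Tunstall's construction for L = 3, theta = 0.3: start from the root
# segment and repeatedly split the most probable leaf until there are
# 2^L = 8 segments.

def tunstall_segments(theta, num_segments):
    """Return the VF segment set as (segment, probability) pairs."""
    # max-heap via negated probabilities; start with the empty segment
    heap = [(-1.0, "")]
    while len(heap) < num_segments:
        neg_p, seg = heapq.heappop(heap)  # most probable leaf
        p = -neg_p
        heapq.heappush(heap, (-(p * (1 - theta)), seg + "0"))
        heapq.heappush(heap, (-(p * theta), seg + "1"))
    return [(seg, -neg_p) for neg_p, seg in heap]

segments = tunstall_segments(0.3, 8)
# segment probabilities as in the example tree:
# .210, .168, .147, .147, .103, .090, .072, .063
# rate L / E[N(X)] = 3 / 3.283 = 0.914, above h(0.3) = 0.881
```

Each split replaces one leaf by two, so exactly 2^L − 1 splits are needed; splitting the most probable leaf maximizes E[N(X)].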
Lexicographical Ordering
IDEA:
Sequences having the same weight (and probability) only need to be
INDEXED. The binary representation of the index can be taken as
codeword.
Definition (Lexicographical Ordering, Index)
In a lexicographical ordering (0 < 1) we say that x1N < y1N if xn < yn for the smallest index n such that xn ≠ yn.
Consider a subset S of the set {0, 1}^N. Let iS(x1N) be the lexicographical index of x1N ∈ S, i.e., the number of sequences y1N < x1N for y1N ∈ S.

Example
Let N = 5 and S = {x1N : w(x1N) = 2} where w(x1N) is the weight of x1N. Then |S| = C(5, 2) = 10 and:

iS(11000) = 9    iS(01010) = 4
iS(10100) = 8    iS(01001) = 3
iS(10010) = 7    iS(00110) = 2
iS(10001) = 6    iS(00101) = 1
iS(01100) = 5    iS(00011) = 0
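The index table can be checked by brute force (a short sketch; it simply generates S in sorted order and reads off positions):

```python
from itertools import combinations

# Brute-force lexicographical index for N = 5, weight w = 2:
# generate S, sort it (0 < 1), and each sequence's index is its position.

N, w = 5, 2
S = sorted("".join("1" if i in ones else "0" for i in range(N))
           for ones in combinations(range(N), w))
index = {x: i for i, x in enumerate(S)}
assert len(S) == 10            # |S| = C(5, 2)
assert index["00011"] == 0
assert index["01100"] == 5
assert index["11000"] == 9
```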
Sequential Enumeration

Cover's formula: the index can be computed sequentially as

iS(x1N) = ∑_{n=1..N: xn=1} C(N − n, w(xn..xN)),

where w(xn..xN) is the number of ones from position n on. E.g. iS(10100) = C(4, 2) + C(2, 1) = 6 + 2 = 8.

IDEA:
Index sequences of fixed weight. Later use a Huffman code (or a fixed-length code) to describe the weights.
[Figure: Pascal-triangle trellis for N = 5 and weight 2, with binomial coefficients 1, 2, 3, 6, 10 labeling the nodes. Following the path of 10100 and adding the coefficient below each branch where xn = 1 gives i(10100) = 6 + 2 = 8.]
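Cover's sequential enumeration formula can be sketched directly (assuming the standard binomial form consistent with the i(10100) example):

```python
from math import comb

# Cover's formula: the index of x1..xN among the sequences of its weight
# is the sum, over positions n with xn = 1, of C(N - n, ones remaining
# from position n on).

def cover_index(x):
    """Lexicographical index of binary string x among sequences of its weight."""
    N = len(x)
    total = 0
    for n, bit in enumerate(x, start=1):   # n = 1..N
        if bit == "1":
            remaining = x[n - 1:].count("1")   # w(xn..xN)
            total += comb(N - n, remaining)
    return total

assert cover_index("10100") == 8   # C(4, 2) + C(2, 1) = 6 + 2
```

No table of all C(5, 2) = 10 sequences is ever built: the index is computed symbol by symbol, which is the whole point of sequential enumeration.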
VF: Petry Code
E [L(X1N |W )]
CONTEXT-TREE
WEIGHTING
IID, unknown
Tree Sources
Context Trees
Coding Prbs., Redundancy
REPETITION TIMES
LZ77
Repetition Times, Kac
Repetition-Time Algorithm
Achieving Entropy
CONCLUSION
P(w ) log2 d
w =0,1,N
ARITHMETIC CODING
Intervals
Universal Coding,
Individual Redundancy
<
P(w ) log2
N
e
w
N
w =0,1,N
+ 1 = H(X1N |W ) + 1.
E [L(W )] + E [L(X1N |W )]
H(W ) + 1 + H(X1N |W ) + 1
H(X1N ) + 2.
IDEA:
Modify the Tunstall segment sets such that the segments can be indexed.
Again let 0 < θ ≤ 1/2. It can be shown that a proper-and-complete segment set is a Tunstall set (maximal E[N(X)] given the number of segments) if and only if for all nodes n and all leaves l

P(n) ≥ P(l).
Consequence
The segments x in a proper-and-complete segment set satisfy

P(x) = (1 − θ)^{n0(x)} θ^{n1(x)},

where n0(x) and n1(x) count the zeros and ones in x. Now write 1 − θ = b^{−A} and θ = b^{−B}, so that

1 = (1 − θ) + θ = b^{−A} + b^{−B};

then comparing segment probabilities amounts to comparing the integer costs n0(x)A + n1(x)B.
For special values of θ, A and B are integers. E.g. for θ = (1 − θ)² we obtain that A = 1 and B = 2 with b = (1 + √5)/2. Now C can also be assumed to be integer. The corresponding codes are called Petry codes.
Example
Consider A = 1, B = 2 (costs, step-sizes). For given C, let S(C) denote the resulting segment set and σ(C) its cardinality. Let S(−1) = S(0) = {λ}, then S(1) = {0, 1}, S(2) = {00, 01, 1}, etc. Moreover now σ(−1) = σ(0) = 1, σ(1) = 2 and σ(2) = 3, etc. It is easy to see that

σ(C) = σ(C − 1) + σ(C − 2),

and therefore σ(3) = 5, σ(4) = 8, σ(5) = 13, σ(6) = 21, σ(7) = 34, and σ(8) = 55.
Now take C = 8. Note that 010010 ∈ S(8). We can now determine the index i(010010) using Cover's formula.

[Figure: cost trellis with node counts σ(·) = 55, 34, 21, 13, … used to index the segments of S(8).]

The rate satisfies

R = log2 σ(C) / E[N(X)] ≤ ((C + (B − 1)) / C) · (h(θ) + d(θ‖q)),

where q = b^{−B} and d(θ‖q) is the binary divergence.
Example
In the table, q for several values of A and B:

B \ A    1        2        3
2        0.382
3        0.318    0.430
4        0.276    0.382    0.450
5        0.245    0.346    0.412
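The quantities in this section can be reproduced with a short sketch (function names are my own; q is obtained by bisection on b^{−A} + b^{−B} = 1):

```python
# Petry-code quantities: the segment-set size sigma(C) follows the
# Fibonacci-like recursion sigma(C) = sigma(C-1) + sigma(C-2) for
# A = 1, B = 2, and q = b^{-B} where b solves b^{-A} + b^{-B} = 1.

def sigma(C):
    """Segment-set cardinality for step sizes A = 1, B = 2."""
    a, b = 1, 1          # sigma(-1) = sigma(0) = 1
    for _ in range(C):
        a, b = b, a + b
    return b

def petry_q(A, B):
    """Solve x^A + x^B = 1 for x = 1/b in (0, 1); return q = x^B."""
    lo, hi = 0.0, 1.0
    for _ in range(100):  # bisection: x^A + x^B is increasing in x
        mid = (lo + hi) / 2
        if mid ** A + mid ** B < 1.0:
            lo = mid
        else:
            hi = mid
    return ((lo + hi) / 2) ** B

# sigma(1..8) -> 2, 3, 5, 8, 13, 21, 34, 55; petry_q reproduces the q table
```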
Idea Elias
Elias:
If source sequences are ORDERED LEXICOGRAPHICALLY then
codewords can be COMPUTED SEQUENTIALLY from the source
sequence using conditional PROBABILITIES of next symbol given the
previous ones, and vice versa.
Source Intervals
Definition
Order the source sequences x_1^N ∈ {0,1}^N lexicographically according to 0 < 1. Then to each source sequence x_1^N there corresponds a source interval

    I(x_1^N) = [Q(x_1^N), Q(x_1^N) + P(x_1^N))

with

    Q(x_1^N) = Σ_{x̃_1^N < x_1^N} P(x̃_1^N).

By construction the source intervals are all disjoint. Their union is [0, 1).
Example
Consider an IID source with θ = 0.2 and N = 2.

x_1^N   P(x_1^N)   Q(x_1^N)   I(x_1^N)
00      0.64       0          [0, 0.64)
01      0.16       0.64       [0.64, 0.8)
10      0.16       0.8        [0.8, 0.96)
11      0.04       0.96       [0.96, 1)
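The table above can be reproduced directly from the definition of Q; a minimal sketch, not part of the original slides (θ is the probability of a one, and the function name is mine):

```python
from itertools import product

def source_intervals(N, theta):
    """Source intervals [Q, Q+P) for all x in {0,1}^N under an IID source;
    theta is the probability of a one, sequences in lexicographic order (0 < 1)."""
    intervals = {}
    Q = 0.0
    for x in product((0, 1), repeat=N):     # lexicographic order
        ones = sum(x)
        P = theta ** ones * (1 - theta) ** (N - ones)
        intervals[x] = (Q, Q + P)
        Q += P
    return intervals

ivals = source_intervals(2, 0.2)
for x, (lo, hi) in ivals.items():
    print(x, round(lo, 4), round(hi, 4))    # matches the table above
```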
Code Intervals
Procedure

Theorem
For a sequence x_1^N with source interval I(x_1^N) = [Q(x_1^N), Q(x_1^N) + P(x_1^N)), take as codeword c(x_1^N) a binary string of length

    L(x_1^N) = ⌈log_2 (1/P(x_1^N))⌉ + 1

whose code interval J(c(x_1^N)) = [.c(x_1^N), .c(x_1^N) + 2^{−L(x_1^N)}) satisfies

    J(c(x_1^N)) ⊆ I(x_1^N).

Then

    L(x_1^N) < log_2 (1/P(x_1^N)) + 2,

i.e. less than two bits above the ideal codeword length.
Example
IID source with θ = 0.2 and N = 2. Source intervals and the corresponding code intervals:

x_1^N   I(x_1^N)       c(x_1^N)   J(c(x_1^N))
00      [0, 0.64)      00         [0, 1/4)
01      [0.64, 0.8)    1011       [11/16, 3/4)
10      [0.8, 0.96)    1101       [13/16, 7/8)
11      [0.96, 1)      111110     [31/32, 63/64)
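The codewords in such an example can be found mechanically: take L = ⌈log_2(1/P)⌉ + 1 and choose the smallest L-bit dyadic point at or above Q. A sketch under these assumptions (the helper name is mine):

```python
import math

def elias_codeword(Q, P):
    """Codeword for the source interval [Q, Q+P): length L = ceil(log2(1/P)) + 1,
    value = smallest L-bit dyadic point at or above Q."""
    L = math.ceil(math.log2(1.0 / P)) + 1
    c = math.ceil(Q * 2 ** L)              # numerator of the left code-interval end
    assert Q <= c / 2 ** L and (c + 1) / 2 ** L <= Q + P  # J(c) inside I(x)
    return format(c, f"0{L}b")

print([elias_codeword(Q, P) for Q, P in
       [(0.0, 0.64), (0.64, 0.16), (0.8, 0.16), (0.96, 0.04)]])
# -> ['00', '1011', '1101', '111110']
```

The internal assertion checks the theorem: since 2^{1−L} ≤ P, the code interval always fits inside the source interval.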
[figure: binary code tree with leaves 000, ..., 111; the interval of sequence 101 has lower endpoint Q(101) and width P(101)]

In general

    Q(x_1^N) = Σ_{n=1,...,N: x_n=1} P(x_1^{n−1} 0)   and   P(x_1^N) = Π_{n=1}^N P(x_n | x_1^{n−1}).
Sequential Computation
If we have access to P(x_1, x_2, ..., x_n, 0) and P(x_1, x_2, ..., x_n, 1) after having processed P(x_1, x_2, ..., x_n) for n = 1, 2, ..., N, we can compute I(x_1^N) sequentially.
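This sequential refinement is easy to sketch for an IID source (not part of the original slides; θ is the probability of a one and the function name is mine):

```python
def refine(Q, P, bit, theta):
    """Split the interval [Q, Q+P) according to the next symbol; for an IID
    source the conditional probability of a zero is always 1 - theta."""
    p0 = P * (1 - theta)                       # mass of the '0' branch
    return (Q, p0) if bit == 0 else (Q + p0, P - p0)

Q, P = 0.0, 1.0
for bit in (0, 1):                             # process x1 x2 = 01
    Q, P = refine(Q, P, bit, 0.2)
print(round(Q, 4), round(P, 4))                # I(01) = [0.64, 0.64 + 0.16)
```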
Universal Coding

Coding Probabilities
If the actual probabilities P(x_1^N) are not known, arithmetic coding is still possible if instead of P(x_1^N) we use coding probabilities Pc(x_1^N) satisfying

    Pc(x_1^N) > 0   and   Σ_{x_1^N} Pc(x_1^N) ≤ 1.

Then

    L(x_1^N) < log_2 (1/Pc(x_1^N)) + 2.

[diagram: x_1^N → Encoder → c(x_1^N) → Decoder → x_1^N]
Individual Redundancy

Definition
The individual redundancy ρ(x_1^N) of a sequence x_1^N is defined as

    ρ(x_1^N) = L(x_1^N) − log_2 (1/P(x_1^N)),

the actual codeword length minus the ideal codeword length. Consequently

    ρ(x_1^N) < log_2 (1/Pc(x_1^N)) + 2 − log_2 (1/P(x_1^N)) = log_2 (P(x_1^N)/Pc(x_1^N)) + 2.
IDEA:
Find good coding probabilities for sources with UNKNOWN PARAMETERS and STRUCTURE. Use WEIGHTING!

IID, unknown θ: for a sequence x_1^N containing a zeros and b ones take as coding probability the Krichevsky-Trofimov estimate

    Pc(x_1^N) = Pe(a, b),

which can be computed sequentially: a zero following a subsequence with a zeros and b ones gets probability (a + 1/2)/(a + b + 1). The parameter redundancy satisfies

    log_2 ((1 − θ)^a θ^b / Pe(a, b)) ≤ (1/2) log_2 (a + b) + 1 = (1/2) log_2 (N) + 1,

hence

    ρ(x_1^N) < log_2 ((1 − θ)^a θ^b / Pe(a, b)) + 2 ≤ (1/2) log_2 (N) + 1 + 2.

This (1/2) log_2 N behaviour is asymptotically optimal.
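The sequential update above can be sketched in a few lines (the function name is mine):

```python
def kt_sequential(x):
    """Sequentially computed Krichevsky-Trofimov probability Pe(a, b) of a binary string."""
    a = b = 0                    # counts of zeros and ones seen so far
    Pe = 1.0
    for bit in x:
        if bit == 0:
            Pe *= (a + 0.5) / (a + b + 1)
            a += 1
        else:
            Pe *= (b + 0.5) / (a + b + 1)
            b += 1
    return Pe

# Pe depends only on the counts a and b, not on the order of the symbols:
print(kt_sequential([0, 0, 1]), kt_sequential([1, 0, 0]))   # both ~ 1/16
```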
Definition
A binary tree source produces the next symbol x_n with a probability that depends on the context (x_{n−1}, x_{n−2}, ...). Example with tree model {00, 10, 1} and parameters θ_1 = 0.1, θ_10 = 0.3, θ_00 = 0.5:

    P(X_n = 1 | ..., X_{n−1} = 1) = θ_1 = 0.1,
    P(X_n = 1 | ..., X_{n−2} = 1, X_{n−1} = 0) = θ_10 = 0.3,
    P(X_n = 1 | ..., X_{n−2} = 0, X_{n−1} = 0) = θ_00 = 0.5.
PROBLEM:
What are good coding probabilities for sequences x_1^N produced by a tree source with an unknown tree model and unknown parameters?

IDEA: weight two alternative coding probabilities P1(x) and P2(x):

    Pw(x) = (P1(x) + P2(x))/2 ≥ (1/2) max(P1(x), P2(x)),

which costs at most one bit relative to the better of the two alternatives.
Context Trees

[figure: binary context tree of depth 3 with root λ, internal nodes 0, 1, 00, 01, 10, 11, and leaves 000, ..., 111]

Node s contains the subsequence of source symbols that have occurred following context s. The depth of the context tree is D.
Example
[figure: context tree after processing x_1^7; each node s holds the time indices n of the symbols x_n that occurred following context s, e.g. the root node holds 1, 2, ..., 7]
[figure: context tree of depth 3 with the leaves of the tree model M = {00, 10, 1} marked]

CTW: Leaves
In a leaf s of the context tree take

    Pw(s) = Pe(a_s, b_s),

where a_s and b_s are the numbers of zeros and ones of the subsequence corresponding to node s.
[figure: the same context tree of depth 3]

The subsequence corresponding to a node s of the context tree is
- IID if the node s is not an internal node of the actual tree model,
- a combination of the subsequences that correspond to the nodes 0s and 1s, if s is an internal node of the actual tree model.
Weighting

In an internal node s of the context tree weight the two alternatives (s is a leaf of the model vs. s is internal):

    Pw(s) = (1/2) Pe(a_s, b_s) + (1/2) Pw(0s) Pw(1s).

Actual probability:

    P(x_1^N) = (1 − θ_00)^{a_00} θ_00^{b_00} (1 − θ_10)^{a_10} θ_10^{b_10} (1 − θ_1)^{a_1} θ_1^{b_1}.

For the weighted probability Pw(λ) in the root we get

    Pw(λ) ≥ (1/2) Pw(0) Pw(1)
          ≥ (1/2) · (1/2) Pw(00) Pw(10) · (1/2) Pe(a_1, b_1)
          ≥ (1/2) · (1/2) · (1/2) Pe(a_00, b_00) · (1/2) Pe(a_10, b_10) · (1/2) Pe(a_1, b_1)
          = (1/32) Pe(a_00, b_00) Pe(a_10, b_10) Pe(a_1, b_1),

while, as before, for each leaf of the model

    log_2 ((1 − θ_00)^{a_00} θ_00^{b_00} / Pe(a_00, b_00)) ≤ (1/2) log_2 (a_00 + b_00) + 1,
    log_2 ((1 − θ_10)^{a_10} θ_10^{b_10} / Pe(a_10, b_10)) ≤ (1/2) log_2 (a_10 + b_10) + 1,
    log_2 ((1 − θ_1)^{a_1} θ_1^{b_1} / Pe(a_1, b_1)) ≤ (1/2) log_2 (a_1 + b_1) + 1.
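The CTW recursion can be sketched in a few lines; this toy version (not from the original slides, names mine) takes a table of per-node (zeros, ones) counts and returns Pw at the root:

```python
def Pe(a, b):
    """Krichevsky-Trofimov probability of a binary sequence with a zeros and b ones."""
    num = 1.0
    for i in range(a):
        num *= i + 0.5
    for j in range(b):
        num *= j + 0.5
    den = 1.0
    for k in range(a + b):
        den *= k + 1
    return num / den

def ctw(s, counts, D):
    """Weighted probability Pw(s): pure KT in the leaves, weighted in internal nodes.
    counts maps a context string s to its (zeros, ones) pair; absent nodes count (0, 0)."""
    a, b = counts.get(s, (0, 0))
    if len(s) == D:
        return Pe(a, b)
    return 0.5 * Pe(a, b) + 0.5 * ctw("0" + s, counts, D) * ctw("1" + s, counts, D)

# toy counts consistent with a root subsequence of 2 zeros and 1 one
counts = {"": (2, 1), "0": (1, 1), "1": (1, 0)}
print(ctw("", counts, 1))  # -> 0.0625
```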
Redundancy (General)

    ρ(x_1^N) < log_2 (P(x_1^N)/Pw(λ)) + 2
             ≤ log_2 32 + (1/2) log_2 (a_00 + b_00) + 1 + (1/2) log_2 (a_10 + b_10) + 1 + (1/2) log_2 (a_1 + b_1) + 1 + 2
             ≤ 5 + (3/2) log_2 (N/3) + 3 + 2,

where the last step uses Jensen's inequality (the three subsequence lengths add up to N).
Redundancies for the CTW method, but also for methods focussing on the single models M = {λ}, M = {0, 1}, the actual model M = {00, 10, 1}, M = {0, 01, 11}, and M = {00, 10, 01, 11}. The CTW method improves over the best model!

[figure: redundancies (in bits) versus sequence length n = 50, 100, ..., 500 for these methods]
There is one tree model of depth 0 (i.e., the IID model). If there are #_d models of depth not exceeding d, then #_{d+1} = (#_d)^2 + 1. Therefore #_1 = 2, #_2 = 5, #_3 = 26, #_4 = 677, #_5 = 458330, #_6 = 210066388901, #_7 ≈ 4.4128 · 10^22, #_8 ≈ 1.9473 · 10^45, etc.

Straightforward analysis. No model estimation that only gives asymptotic results as in e.g. Rissanen [1983, 1986] or Weinberger, Rissanen, and Feder [1995].

The number of computations needed to process the source sequence x_1^N is linear in N. The same holds for the storage complexity.
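The recursion #_{d+1} = (#_d)^2 + 1 (a model of depth ≤ d + 1 is either the IID model or a pair of subtree models) is easy to check numerically:

```python
def num_tree_models(d):
    """Number of binary tree models of depth not exceeding d: #0 = 1, #(d+1) = #d**2 + 1."""
    count = 1                     # depth 0: only the IID model
    for _ in range(d):
        count = count ** 2 + 1    # choose a pair of subtrees, or be a leaf
    return count

print([num_tree_models(d) for d in range(7)])
# -> [1, 2, 5, 26, 677, 458330, 210066388901]
```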
For model estimation, the weighting in the internal nodes can be replaced by a maximization:

    Pm(s) = (1/2) max(Pe(a_s, b_s), Pm(0s) Pm(1s)).

If a (minimal) tree source with model M generates the sequence x_1^N, the maximizing method produces a model estimate which is correct with probability one as N → ∞.
IDEA:
Let the data speak for itself.

LZ77
Compression is achieved by replacing repeated segments in the data with pointers and lengths. To avoid deadlock, an uncoded symbol is added to each pointer-length pair.

Example (LZ77)
Source sequence "abracadabra":

search buffer   look-ahead buffer   output
                abracadabra         (0,-,a)
a               bracadabra          (0,-,b)
ab              racadabra           (0,-,r)
abr             acadabra            (3,1,c)
abrac           adabra              (2,1,d)
abracad         abra                (7,4, )

QUESTION:
Why does this method work? Note that the statistics of the data are unknown!
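A toy greedy LZ77 encoder reproduces the triples of the example (a sketch, names mine; literals are written (0, 0, symbol) instead of (0, -, symbol), and window limits are ignored):

```python
def lz77_encode(s):
    """Greedy LZ77: emit (distance, length, next_symbol) triples.
    A literal is emitted as (0, 0, symbol); simplified sketch without window limits."""
    out, i = [], 0
    while i < len(s):
        best_d = best_l = 0
        for d in range(1, i + 1):                 # candidate match distances
            l = 0
            while i + l < len(s) and s[i + l - d] == s[i + l]:
                l += 1                            # matches may overlap position i
            if l > best_l:
                best_d, best_l = d, l
        nxt = s[i + best_l] if i + best_l < len(s) else ""
        out.append((best_d, best_l, nxt))
        i += best_l + 1
    return out

print(lz77_encode("abracadabra"))
# -> [(0, 0, 'a'), (0, 0, 'b'), (0, 0, 'r'), (3, 1, 'c'), (2, 1, 'd'), (7, 4, '')]
```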
Repetition Times

[figure: ..., X_{−3}, X_{−2}, X_{−1}, X_0, X_1 with X_1 = x; going back, the symbols in between are ≠ x until the first index m with X_{1−m} = x, the repetition time]

Let Q_x(m) denote the probability that the repetition time of x = X_1 equals m. The average repetition time is

    T̄(x) = Σ_{m=1,2,...} m Q_x(m).
Kac's Result

Example
Consider an IID (binary) process and assume that Pr{X1 = 1} = θ > 0. Then

    Q_1(m) = (1 − θ)^{m−1} θ   and   T̄(1) = Σ_{m=1,2,...} m (1 − θ)^{m−1} θ = 1/θ.

Kac [1947]: for a stationary and ergodic source the average repetition time satisfies

    T̄(x) = 1/Pr{X1 = x}.

Note that Kac's result holds also for sliding N-blocks, hence

    T̄((x1, x2, ..., xN)) = 1/Pr{(X1, X2, ..., XN) = (x1, x2, ..., xN)}.
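Kac's result is easy to check by simulation for the IID example; a sketch (names mine, repetition times of a 1 are geometric with mean 1/θ):

```python
import random

def mean_repetition_time(theta, trials=20000, seed=7):
    """Monte-Carlo estimate of the mean repetition time of symbol 1 in an IID
    binary source with Pr{X = 1} = theta."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        m = 1
        while rng.random() >= theta:   # look one position further back
            m += 1
        total += m
    return total / trials

est = mean_repetition_time(0.25)
print(est)   # close to 1/theta = 4, in line with Kac's theorem
```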
Repetition-Time Algorithm
Suppose that our source is binary, i.e. X_t ∈ {0, 1} for all integer t.

A. The encoder and the decoder both store previous source symbols in a buffer. The encoder observes the next block x_1^N.

B. The encoder determines the smallest repetition time m, i.e. the smallest m such that

    x_{1−m}^{N−m} = x_1^N, where x_{1−m}^{N−m} = (x_{1−m}, x_{2−m}, ..., x_{N−m}).

[figure: example with N = 3 and m = 4: the block (x_1, x_2, x_3) also occurs as (x_{−3}, x_{−2}, x_{−1})]
C. The repetition time m is now encoded and sent to the decoder. The code for m consists of a preamble p(m) and an index i(m) and has length l(m).

Example
Code table for the waiting time m for N = 3:

m   p(m)   i(m)               l(m)
1   00     -                  2 + 0 = 2
2   01     0                  2 + 1 = 3
3   01     1                  2 + 1 = 3
4   10     00                 2 + 2 = 4
5   10     01                 2 + 2 = 4
6   10     10                 2 + 2 = 4
7   10     11                 2 + 2 = 4
8   11     copy of x1 x2 x3   2 + 3 = 5
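The table generalizes to a preamble of ⌈log_2(N + 1)⌉ bits holding ⌊log_2 m⌋, followed by ⌊log_2 m⌋ index bits; a sketch for the table above (the function name is mine):

```python
import math

def encode_m(m, N=3):
    """Encode repetition time m: preamble = floor(log2 m) in ceil(log2(N+1)) bits,
    then floor(log2 m) index bits; m >= 2**N escapes to an uncoded copy of the block."""
    w = math.ceil(math.log2(N + 1))         # preamble width (2 bits for N = 3)
    if m >= 2 ** N:
        return format(N, f"0{w}b")          # escape preamble; copy of x_1^N follows
    k = m.bit_length() - 1                  # floor(log2 m), so m = 2**k + index
    idx = format(m - 2 ** k, f"0{k}b") if k else ""
    return format(k, f"0{w}b") + idx

print([encode_m(m) for m in range(1, 9)])
# -> ['00', '010', '011', '1000', '1001', '1010', '1011', '11']
```

This gives exactly the length bound l(m) ≤ ⌈log_2(N + 1)⌉ + log_2 m used in the analysis below.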
D. After decoding m the decoder can reconstruct x_1^N using the previous source symbols in the buffer. With this block x_1^N both the encoder and the decoder can update their buffers.

E. Then the next block x_{N+1}^{2N} = (x_{N+1}, x_{N+2}, ..., x_{2N}) is processed in the same way.

Note:
The buffers need only contain the previous 2^N − 1 source symbols!
Assume that a certain x_1^N occurred as first block. What is then the average codeword length L̄(x_1^N) for x_1^N?

    L̄(x_1^N) = Σ_{m=1,2,...} Q_{x_1^N}(m) l(m)
      (a)    ≤ Σ_{m=1,2,...} Q_{x_1^N}(m) (⌈log_2 (N + 1)⌉ + log_2 m)
      (b)    ≤ ⌈log_2 (N + 1)⌉ + log_2 Σ_{m=1,2,...} m Q_{x_1^N}(m)
      (c)    = ⌈log_2 (N + 1)⌉ + log_2 (1/Pr{X_1^N = x_1^N}).

Here (a) follows from the upper bound on l(m), (b) from Jensen's inequality, and (c) from Kac's theorem: the ideal codeword length plus ⌈log_2 (N + 1)⌉.
The probability that x_1^N occurs as first block is Pr{X_1^N = x_1^N}. For the average codeword length L̄(X_1^N) we therefore get

    L̄(X_1^N) = Σ_{x_1^N} Pr{X_1^N = x_1^N} L̄(x_1^N)
             ≤ Σ_{x_1^N} Pr{X_1^N = x_1^N} (⌈log_2 (N + 1)⌉ + log_2 (1/Pr{X_1^N = x_1^N}))
             = ⌈log_2 (N + 1)⌉ + H(X_1^N).

Hence

    L̄(X_1^N)/N ≤ H(X_1^N)/N + ⌈log_2 (N + 1)⌉/N → H_∞(X) as N → ∞.
Universal algorithm.
This result implies that the buffer can be much smaller than 2^N − 1 if the entropy is known to be smaller than 1.
This result was crucial in proving that the LZ77 algorithm achieves entropy (Wyner & Ziv [1994]).
CONCLUSION
Recent developments:
DUDE (Weissman, Ordentlich, Seroussi, Verdu, Weinberger [2005]) resulted in the study of bi-directional contexts and splitting rules (Ordentlich, Weinberger, and Weissman [2005]).
Directed mutual information (Marko [1973], Massey [1990]):

    I(X_1^N → Y_1^N) = Σ_{n=1}^N [H(Y_n | Y_1^{n−1}) − H(Y_n | Y_1^{n−1}, X_1^n)]

is a generalisation of Granger causality [1969]. CTW methods were used to estimate these quantities (Jiao, Permuter, Zhao, Kim, and Weissman [2012]).

Questions:
LZ learns from seeing once. CTW is optimal for tree sources but seems to take more time. What are the algorithms between CTW and LZ?
Suppose that the data have left-right symmetry, hence P(a, b) = P(b, a), P(a, b, c) = P(c, b, a), P(a, b, c, d) = P(d, c, b, a), etc. This reduces the number of parameters. Algorithm? Relevant for image compression.
CTW can handle side-information by considering it as context (e.g. Cai, Kulkarni and Verdu [2005]). But what if the side-information is not properly aligned? Relevant for reference-based genome compression (Chern et al. [2012]).
(Ulm, 1997)