
Information Theory and Coding

(Lecture 1)

Dr. Farman Ullah

farmankttk@gmail.com

What is Information Theory about?
It is a mathematical theory of information.
Information is usually obtained by getting some "messages" (speech, text,
images, etc.) from others.
When obtaining information from a message, you may care about:
What is the meaning of a message?
How important is the message?
How much information can I get from the message?

Information theory is about the quantification of information.


What is Information Theory about?
It is a mathematical theory of information (primarily for communication systems).
It establishes the fundamental limits of various types of information processing.
It is built upon probability theory and statistical decision theory.

Main focus: the ultimate performance limit (usually the efficiency of information processing) as certain resources (usually the total amount of time) scale to an asymptotic regime, given that the desired task is accomplished at a certain "satisfactory" level.
Course Overview

Claude E. Shannon (1916-2001)

Information theory: a mathematical theory of communication.

Origin of Information Theory

Shannon's 1948 paper is generally considered the "birth" of information theory and of (modern) digital communication.
It made clear that information theory is about the quantification of information.
In particular, it focuses on characterizing the necessary and sufficient condition for whether or not a destination terminal can reproduce a message generated by a source terminal.
What is Information Theory about?
It is about the analysis of fundamental limits.

1 Stochastic modeling
It is a unified theory based on stochastic modeling (information source, noisy channel, etc.).

2 Theorems, not just definitions
It provides rigorous theorems on the optimal performance of algorithms (coding schemes), rather than merely definitions.

3 Sharp phase transition
It draws the boundary between what is possible to achieve and what is impossible, leading to math-driven system design.
The general communication system:
Source -> Source coder -> Channel coder -> Channel -> Channel decoder -> Source decoder -> Sink (receiver)

Source: any source of information.
Source coder: change to an efficient representation, i.e., data compression.
Channel coder: change to an efficient representation for transmission, i.e., error control coding.
Channel decoder: recover from channel distortion.
Source decoder: uncompress.

The channel is anything transmitting or storing information: a radio link, a cable, a disk, a CD, a piece of paper, ...
Fundamental Entities

(Same block diagram as above, annotated with H at the source, R at the source coder output, and C at the channel.)

H: The information content of the source.
R: Rate from the source coder.
C: Channel capacity.
Fundamental Theorems

(Same annotated block diagram: H at the source, R at the source coder output, C at the channel.)

Shannon 1: Error-free transmission is possible if R ≥ H (the source coding theorem, simplified) and C ≥ R (the channel coding theorem, simplified).

Shannon 2: Source coding and channel coding can be optimized independently, and binary symbols can be used as the intermediate format. Assumption: arbitrarily long delays.
What is Coding Theory?

This will include the following topics: theory of compression and coding of
image and speech signals based on Shannon's information theory;
introduction to information theory (entropy, etc.); characteristics of sources
such as voice and image; sampling theorem; methods and properties of
lossless and lossy coding; vector quantization; transform coding; and
subband coding.
What is Channel Coding Theory?

Channel coding refers to the class of signal transformations designed to improve communications performance by enabling the transmitted signals to better withstand the effects of various channel impairments, such as noise, fading, and jamming. Usually the goal of channel coding is to reduce the probability of bit error, or to reduce the required signal-to-noise ratio, at the cost of expending more bandwidth. In channel codes, redundancy is inserted into the transmitted data stream so that the receiver can detect and possibly correct errors that occur during transmission. This course deals with block codes and convolutional codes.
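As a small illustration of this redundancy idea (a sketch in Python, not taken from the course material), the example below uses a (3,1) repetition code: each data bit is transmitted three times over a noisy channel and the receiver decodes by majority vote, so any single flipped bit per block is corrected.

    import random

    def encode(bits):
        # (3,1) repetition code: repeat each data bit three times
        return [b for b in bits for _ in range(3)]

    def noisy_channel(coded, p_flip=0.05):
        # binary symmetric channel: flip each bit with probability p_flip
        return [b ^ (random.random() < p_flip) for b in coded]

    def decode(coded):
        # majority vote over each block of three received bits
        blocks = [coded[i:i + 3] for i in range(0, len(coded), 3)]
        return [1 if sum(block) >= 2 else 0 for block in blocks]

    data = [random.randint(0, 1) for _ in range(1000)]
    received = decode(noisy_channel(encode(data)))
    errors = sum(d != r for d, r in zip(data, received))
    print(f"bit errors after decoding: {errors} / {len(data)}")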
Part 2: Stochastic sources
A source outputs symbols X1, X2, ...
Each symbol takes its value from an alphabet A = (a1, a2, ...).
Model: P(X1, ..., XN) is assumed to be known for all combinations.

Example 1: A text, where the source outputs symbols X1, X2, ..., each taking its value from the alphabet A = (a, ..., z, A, ...).

Example 2: A (digitized) grayscale image.
Two Special Cases
1. The Memoryless Source
Each symbol is independent of the previous ones.
P(X1, X2, ..., Xn) = P(X1) · P(X2) · ... · P(Xn)
2. The Markov Source
Each symbol depends on the previous one.
P(X1, X2, ..., Xn) = P(X1) · P(X2|X1) · P(X3|X2) · ... · P(Xn|Xn-1)
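A minimal sketch of these two factorizations, assuming a binary alphabet and made-up probabilities (none of the numbers below come from the slides):

    def p_memoryless(seq, p):
        # P(X1,...,Xn) = P(X1) * P(X2) * ... * P(Xn)
        prob = 1.0
        for x in seq:
            prob *= p[x]
        return prob

    def p_markov(seq, p_first, p_trans):
        # P(X1,...,Xn) = P(X1) * P(X2|X1) * ... * P(Xn|Xn-1)
        prob = p_first[seq[0]]
        for prev, cur in zip(seq, seq[1:]):
            prob *= p_trans[prev][cur]
        return prob

    p = {0: 0.8, 1: 0.2}                                   # symbol probabilities
    p_first = {0: 0.8, 1: 0.2}                             # initial distribution
    p_trans = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}   # P(next | previous)

    print(p_memoryless([0, 0, 1], p))             # 0.8 * 0.8 * 0.2
    print(p_markov([0, 0, 1], p_first, p_trans))  # 0.8 * 0.9 * 0.1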
The Markov Source
A symbol depends only on the previous symbol, so the source can be modelled by a state diagram.

(State diagram: a ternary source with alphabet A = (a, b, c); the transition probabilities 0.3, 0.7, 1.0, 0.5, 0.2 and 0.3 are marked on the edges between the states.)
The Markov Source
Assume we are in state a, i.e., Xk = a. The probabilities for the next symbol are:
P(Xk+1 = a | Xk = a) = 0.3
P(Xk+1 = b | Xk = a) = 0.7
P(Xk+1 = c | Xk = a) = 0
The Markov Source
So, if Xk+1 = b, we know that Xk+2 will equal c:
P(Xk+2 = a | Xk+1 = b) = 0
P(Xk+2 = b | Xk+1 = b) = 0
P(Xk+2 = c | Xk+1 = b) = 1
The Markov Source
If all the states can be reached, the stationary probabilities for the states can be calculated from the given transition probabilities.

Markov models can be used to represent sources with dependencies more than one step back: use a state diagram with several symbols in each state.
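A minimal sketch of this calculation for the ternary example above. Rows a and b of the transition matrix are given on the slides; row c (0.5, 0.2, 0.3) is read off the state diagram and should be treated as an assumption:

    import numpy as np

    P = np.array([[0.3, 0.7, 0.0],    # from a: P(a|a), P(b|a), P(c|a)
                  [0.0, 0.0, 1.0],    # from b (given)
                  [0.5, 0.2, 0.3]])   # from c (assumed from the diagram)

    # Solve w P = w with sum(w) = 1, i.e. (P^T - I) w = 0 plus normalization.
    A = np.vstack([P.T - np.eye(3), np.ones(3)])
    b = np.array([0.0, 0.0, 0.0, 1.0])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(dict(zip("abc", w.round(4))))   # stationary probabilities of a, b, c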
Analysis and Synthesis
Stochastic models can be used for
analysing a source.
Find a model that represents the real-world source well, and then analyse the model instead of the real world.
Stochastic models can be used for
synthesizing a source.
Use a random number generator in each step of
a Markov model to generate a sequence
simulating the source.
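A small sketch of such a synthesis for the ternary Markov example, using Python's random number generator (row c of the transition matrix is again the assumed one):

    import random

    P = {"a": [("a", 0.3), ("b", 0.7)],
         "b": [("c", 1.0)],
         "c": [("a", 0.5), ("b", 0.2), ("c", 0.3)]}   # row c: assumption

    def synthesize(n, start="a"):
        state, out = start, []
        for _ in range(n):
            symbols, probs = zip(*P[state])
            state = random.choices(symbols, weights=probs)[0]  # one random step
            out.append(state)
        return "".join(out)

    print(synthesize(40))   # a sequence simulating the source, e.g. 'bcabcc...'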
Part 3: Information and Entropy
Assume a binary memoryless source, e.g., a flip of
a coin. How much information do we receive when
we are told that the outcome is heads?
If it's a fair coin, i.e., P(heads) = P(tails) = 0.5, we say
that the amount of information is 1 bit.
If we already know that it will be (or was) heads, i.e.,
P(heads) = 1, the amount of information is zero!
If the coin is not fair, e.g., P(heads) = 0.9, the amount of
information is more than zero but less than one bit!
Intuitively, the amount of information received is the same if P(heads) = 0.9 or P(heads) = 0.1.
Self Information
So, let's look at it the way Shannon did.
Assume a memoryless source with
alphabet A = (a1, ..., an)
symbol probabilities (p1, ..., pn).
How much information do we get when finding out that the next symbol is ai?
According to Shannon, the self information of ai is
I(ai) = -log pi = log(1/pi)
Why?
Assume two independent events A and B, with probabilities P(A) = pA and P(B) = pB.

For both events to happen, the probability is pA · pB. However, the amounts of information should be added, not multiplied.

Logarithms satisfy this, since log(pA · pB) = log pA + log pB.

Now, we want the information to increase with decreasing probabilities, so let's use the negative logarithm.
Self Information
Example 1: A fair coin, P(heads) = 0.5, gives I = -log2 0.5 = 1 bit.

Example 2: A biased coin, P(heads) = 0.9, gives I = -log2 0.9 ≈ 0.15 bits.

Which logarithm? Pick the one you like! If you pick the natural log, you'll measure in nats; if you pick the 10-log, you'll get Hartleys; if you pick the 2-log (like everyone else), you'll get bits.
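A short sketch computing the self information of an outcome with probability P in all three units:

    import math

    def self_information(p, base=2):
        # I(a) = -log_base(p)
        return -math.log(p, base)

    for p in (0.5, 0.9, 0.99):
        print(f"P = {p}: {self_information(p, 2):.3f} bits, "
              f"{self_information(p, math.e):.3f} nats, "
              f"{self_information(p, 10):.3f} Hartleys")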
Self Information

On average over all the symbols, we get:
H(X) = Σi pi · I(ai) = -Σi pi · log pi

H(X) is called the first order entropy of the source.

This can be regarded as the degree of uncertainty about the following symbol.
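A minimal sketch of this average, i.e., the first order entropy of a memoryless source:

    import math

    def entropy(probs):
        # H(X) = -sum_i p_i * log2(p_i); terms with p_i = 0 contribute nothing
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))     # fair coin: 1.0 bit/symbol
    print(entropy([0.9, 0.1]))     # biased coin: ~0.469 bits/symbol
    print(entropy([0.25] * 4))     # four equiprobable symbols: 2.0 bits/symbol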
Entropy
Example: Binary Memoryless Source
BMS: 01101000 ...

Let p = P(1) and 1 - p = P(0).
Then H(X) = -p · log p - (1 - p) · log(1 - p), often denoted h(p).

(Plot of h(p) for p from 0 to 1: the curve is zero at p = 0 and p = 1 and reaches its maximum value 1 at p = 0.5.)
The uncertainty (information) is greatest when p = 0.5.
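A small numerical check of h(p), assuming base-2 logarithms:

    import math

    def h(p):
        # binary entropy function h(p) = -p*log2(p) - (1-p)*log2(1-p)
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
        print(f"h({p:.1f}) = {h(p):.3f}")   # maximum of 1.000 at p = 0.5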
Entropy: Three properties
1. It can be shown that 0 ≤ H ≤ log N.
2. Maximum entropy (H = log N) is reached when all symbols are equiprobable, i.e., pi = 1/N.
3. The difference log N - H is called the redundancy of the source.
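For example, a quick check of property 3 for the biased coin with P(heads) = 0.9 (N = 2), as a sketch:

    import math

    N = 2                                                   # binary alphabet
    H = -(0.9 * math.log2(0.9) + 0.1 * math.log2(0.1))      # ~0.469 bits
    print(f"redundancy = log2(N) - H = {math.log2(N) - H:.3f} bits")   # ~0.531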
Part 4: Entropy for Memory Sources

Assume a block of source symbols (X1, ..., Xn) and define the block entropy:
H(X1, ..., Xn) = -Σ P(X1, ..., Xn) · log P(X1, ..., Xn)
That is, the summation is done over all possible combinations of n symbols.

The entropy for a memory source is defined as:
H = lim n→∞ (1/n) · H(X1, ..., Xn)
That is, let the block length go towards infinity, and divide by n to get the number of bits per symbol.
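A brute-force sketch of the block entropy for an assumed memoryless binary source with P(1) = 0.2 (not an example from the slides); for a memoryless source, H(X1, ..., Xn)/n equals the first order entropy for every n:

    import itertools
    import math

    p = {0: 0.8, 1: 0.2}   # assumed symbol probabilities of the memoryless source

    def block_entropy(n):
        # H(X1,...,Xn) = -sum over all length-n blocks of P(block) * log2 P(block)
        H = 0.0
        for block in itertools.product((0, 1), repeat=n):
            prob = math.prod(p[x] for x in block)   # memoryless: product of P(Xi)
            H -= prob * math.log2(prob)
        return H

    for n in (1, 2, 4, 8):
        print(n, round(block_entropy(n) / n, 4))    # ~0.7219 bits/symbol for all n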
Entropy for a Markov Source
The entropy for a state Sk can be expressed as
H(Sk) = -Σl Pkl · log Pkl
where Pkl is the transition probability from state k to state l.

Averaging over all states, we get the entropy for the Markov source as
H = Σk wk · H(Sk)
where wk is the stationary probability of state k.
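Putting the pieces together for the ternary Markov example (row c of the transition matrix is still the assumed (0.5, 0.2, 0.3)):

    import numpy as np

    P = np.array([[0.3, 0.7, 0.0],    # from state a (given)
                  [0.0, 0.0, 1.0],    # from state b (given)
                  [0.5, 0.2, 0.3]])   # from state c (assumed from the diagram)

    # stationary probabilities w: solve w P = w with sum(w) = 1
    A = np.vstack([P.T - np.eye(3), np.ones(3)])
    w, *_ = np.linalg.lstsq(A, np.array([0.0, 0.0, 0.0, 1.0]), rcond=None)

    def state_entropy(row):
        # H(S_k) = -sum_l P_kl * log2(P_kl)
        return -sum(q * np.log2(q) for q in row if q > 0)

    H = sum(wk * state_entropy(row) for wk, row in zip(w, P))
    print(f"H = {H:.3f} bits/symbol")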
