
R. Srikant's Notes on Modeling and Control of High-Speed Networks

Large deviations

Consider a sequence of i.i.d. random variables $\{X_i\}$. The central limit theorem provides an estimate of the probability
\[
P\left(\frac{\sum_{i=1}^n X_i - n\mu}{\sigma\sqrt{n}} > x\right),
\]
where $\mu = E(X_1)$ and $\sigma^2 = \mathrm{Var}(X_1)$. Thus, the CLT estimates the probability of $O(\sqrt{n})$ deviations from the mean of the sum of the random variables $\{X_i\}_{i=1}^n$. These deviations are small compared to the mean of $\sum_{i=1}^n X_i$, which is an $O(n)$ quantity. On the other hand, large deviations of the order of the mean itself, i.e., $O(n)$ deviations, are the subject of this section.
The simplest large deviations result is the Chernoff bound, which we review here. First, recall the Markov inequality: for a positive random variable $X$ and any $\epsilon > 0$,
\[
P(X \ge \epsilon) \le \frac{E(X)}{\epsilon}.
\]
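For example, taking $\epsilon = 2E(X)$ shows that a positive random variable can exceed twice its mean with probability at most one half:
\[
P(X \ge 2E(X)) \le \frac{E(X)}{2E(X)} = \frac{1}{2}.
\]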


For $\theta \ge 0$,
\[
P\Big(\sum_{i=1}^n X_i \ge nx\Big) \le P\big(e^{\theta \sum_{i=1}^n X_i} \ge e^{\theta n x}\big) \le \frac{E\big(e^{\theta \sum_{i=1}^n X_i}\big)}{e^{\theta n x}},
\]
where the first inequality becomes an equality if $\theta > 0$ and the second inequality follows from the Markov inequality. Since this is true for all $\theta \ge 0$,
\[
P\Big(\sum_{i=1}^n X_i \ge nx\Big) \le \inf_{\theta \ge 0} \frac{E\big(e^{\theta \sum_{i=1}^n X_i}\big)}{e^{\theta n x}} = e^{-n \sup_{\theta \ge 0} \{\theta x - \log M(\theta)\}}, \tag{1}
\]
where $M(\theta) := E(e^{\theta X_1})$ is the moment generating function of $X_1$. The above result is called the Chernoff bound. The following theorem quantifies the tightness of this bound.
Theorem 1 (Cramér-Chernoff Theorem) Let $X_1, X_2, \ldots$ be i.i.d. and suppose that their common moment generating function satisfies $M(\theta) < \infty$ for all $\theta$ in some neighborhood $B_0$ of $\theta = 0$. Further suppose that the supremum in the following definition of the rate function $I(x)$ is attained at some interior point of this neighborhood:
\[
I(x) = \sup_\theta \{\theta x - \Lambda(\theta)\}, \tag{2}
\]
where $\Lambda(\theta) := \log M(\theta)$ is called the log moment generating function or the cumulant generating function. In other words, we assume that there exists $\theta^* \in \mathrm{Interior}(B_0)$ such that
\[
I(x) = \theta^* x - \Lambda(\theta^*).
\]
Fix any $x > E(X_1)$. Then, for each $\epsilon > 0$, there exists $N$ such that, for all $n \ge N$,
\[
e^{-n(I(x)+\epsilon)} \le P\Big(\sum_{i=1}^n X_i \ge nx\Big) \le e^{-nI(x)}. \tag{3}
\]


Note: Another way to state the result of the above theorem is
\[
\lim_{n \to \infty} \frac{1}{n} \log P\Big(\sum_{i=1}^n X_i \ge nx\Big) = -I(x). \tag{4}
\]
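To make the rate function concrete, consider the standard Gaussian example: if $X_1 \sim N(\mu, \sigma^2)$, then $\Lambda(\theta) = \mu\theta + \sigma^2\theta^2/2$, the supremum in (2) is attained at $\theta^* = (x-\mu)/\sigma^2$, and
\[
I(x) = \sup_\theta \big\{\theta x - \mu\theta - \sigma^2\theta^2/2\big\} = \frac{(x-\mu)^2}{2\sigma^2},
\]
so for $x > \mu$ the probability $P(\sum_{i=1}^n X_i \ge nx)$ decays as $e^{-n(x-\mu)^2/(2\sigma^2)}$.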

Proof: We first prove the upper bound and then the lower bound.
The upper bound in (3): This follows from the Chernoff bound if we show that the value of the supremum in (1) does not change if we relax the condition $\theta \ge 0$ and allow $\theta$ to take negative values. Since $e^x$ is a convex function of $x$, by Jensen's inequality, we have
\[
M(\theta) \ge e^{\theta\mu}.
\]
If $\theta < 0$, since $x - \mu > 0$, then $e^{\theta(\mu - x)} > 1$. Thus,
\[
M(\theta)\, e^{-\theta x} \ge e^{\theta\mu}\, e^{-\theta x} = e^{\theta(\mu - x)} > 1.
\]
Taking the logarithm of both sides yields
\[
\theta x - \Lambda(\theta) < 0 \quad \text{for all } \theta < 0.
\]
Noting that $\theta x - \Lambda(\theta) = 0$ when $\theta = 0$, we have
\[
\sup_{\theta \ge 0} \{\theta x - \Lambda(\theta)\} = \sup_{\theta} \{\theta x - \Lambda(\theta)\}.
\]
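In the Gaussian example above this can be seen directly: $\theta x - \Lambda(\theta) = \theta(x - \mu) - \sigma^2\theta^2/2$ is negative for every $\theta < 0$ when $x > \mu$, so extending the supremum to negative $\theta$ changes nothing, and the unrestricted maximizer $\theta^* = (x-\mu)/\sigma^2$ is indeed nonnegative.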

The lower bound in (3): Let $p(x)$ be the pdf of $X_1$. Then, for any $\delta > 0$,
\begin{align*}
P\Big(\sum_{i=1}^n X_i \ge nx\Big) &= \int_{\sum_{i=1}^n x_i \ge nx} \prod_{i=1}^n p(x_i)\,dx_i \\
&\ge \int_{nx \le \sum_{i=1}^n x_i \le n(x+\delta)} \prod_{i=1}^n p(x_i)\,dx_i \\
&\ge \frac{M^n(\theta^*)}{e^{\theta^* n(x+\delta)}} \int_{nx \le \sum_{i=1}^n x_i \le n(x+\delta)} \prod_{i=1}^n \frac{e^{\theta^* x_i}\, p(x_i)}{M(\theta^*)}\,dx_i \\
&= \frac{M^n(\theta^*)}{e^{\theta^* n(x+\delta)}} \int_{nx \le \sum_{i=1}^n x_i \le n(x+\delta)} \prod_{i=1}^n q(x_i)\,dx_i,
\end{align*}
where the third step uses the fact that $e^{-\theta^* \sum_i x_i} \ge e^{-\theta^* n(x+\delta)}$ on the region of integration (recall that $\theta^* > 0$ since $x > E(X_1)$), and
\[
q(y) := \frac{e^{\theta^* y}\, p(y)}{M(\theta^*)}. \tag{5}
\]


Note that $\int q(y)\,dy = 1$. Thus, $q(y)$ is a pdf. Let $Y$ be a random variable with $q(y)$ as its pdf. The moment generating function of $Y$ is given by
\[
M_Y(\theta) = \int e^{\theta y}\, q(y)\,dy = \frac{M(\theta + \theta^*)}{M(\theta^*)}.
\]
Thus,
\[
E(Y) = \frac{dM_Y(\theta)}{d\theta}\Big|_{\theta=0} = \frac{M'(\theta^*)}{M(\theta^*)}.
\]

From the assumptions of the theorem, $\theta^*$ achieves the supremum in (2). Thus,
\[
\frac{d}{d\theta}\big(\theta x - \log M(\theta)\big) = 0
\]
at $\theta = \theta^*$. From this, we obtain $x = M'(\theta^*)/M(\theta^*)$. Therefore, $E(Y) = x$. In other words, the pdf $q(y)$ defines a set of i.i.d. random variables $\{Y_i\}$, each with mean $x$. Thus, from (5), the probability of a large deviation of the sum $\sum_{i=1}^n X_i$ can be lower bounded by the probability that $\sum_{i=1}^n Y_i$ is near its mean $nx$ as follows:
\[
P\Big(\sum_{i=1}^n X_i \ge nx\Big) \ge e^{-nI(x)}\, e^{-n\theta^*\delta}\, P\Big(nx \le \sum_{i=1}^n Y_i \le n(x+\delta)\Big),
\]
where we used $M^n(\theta^*)\, e^{-\theta^* n(x+\delta)} = e^{-n(\theta^* x - \Lambda(\theta^*))}\, e^{-n\theta^*\delta} = e^{-nI(x)}\, e^{-n\theta^*\delta}$.

By the central limit theorem,
\[
P\Big(nx \le \sum_{i=1}^n Y_i \le n(x+\delta)\Big) = P\left(0 \le \frac{\sum_{i=1}^n (Y_i - x)}{\sigma_Y \sqrt{n}} \le \frac{\delta\sqrt{n}}{\sigma_Y}\right) \to \frac{1}{2},
\]
where $\sigma_Y^2 = \mathrm{Var}(Y_1)$, since the upper limit of the interval grows without bound while the lower limit is the mean.

Given $\epsilon > 0$, first choose $\delta > 0$ small enough that $\theta^* \delta \le \epsilon/2$, and then choose $N$ (dependent on $\delta$) such that, for all $n \ge N$,
\[
P\Big(nx \le \sum_{i=1}^n Y_i \le n(x+\delta)\Big) \ge \frac{1}{4}
\quad \text{and} \quad
\frac{1}{4} \ge e^{-n\epsilon/2}.
\]
Then
\[
P\Big(\sum_{i=1}^n X_i \ge nx\Big) \ge e^{-nI(x)}\, e^{-n\epsilon/2} \cdot \frac{1}{4} \ge e^{-n(I(x)+\epsilon)}.
\]
Thus, the theorem is proved.

Remark 1 The key idea in the proof of the above theorem is the definition of a new pdf $q(y)$ under which the random variable has mean $x$ instead of $\mu$. This changes the nature of the event from a large deviation to a small deviation from the mean, and thus allows the use of the central limit theorem to complete the proof. Changing the pdf from $p(y)$ to $q(y)$ is called a change of measure. The new distribution $\int_{-\infty}^x q(y)\,dy$ is called the twisted distribution or exponentially tilted distribution.
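For example, exponentially tilting a Gaussian pdf simply shifts its mean: if $p(y)$ is the $N(\mu, \sigma^2)$ density, completing the square shows that $q(y) = e^{\theta^* y} p(y)/M(\theta^*)$ is the $N(\mu + \sigma^2\theta^*, \sigma^2)$ density, and with $\theta^* = (x-\mu)/\sigma^2$ the tilted mean is exactly $x$ while the variance is unchanged.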

Remark 2 The theorem is also applicable when $x < E(X_1)$. To see this, define $Y_i = -X_i$, and consider
\[
P\Big(\sum_{i=1}^n Y_i \ge -nx\Big).
\]
Note that $M(-\theta)$ is the moment generating function of $Y_1$. Further,
\[
I_Y(-x) = \sup_\theta \{-\theta x - \Lambda(-\theta)\} = \sup_\theta \{\theta x - \Lambda(\theta)\} = I(x).
\]
Thus, the rate function for $Y$ evaluated at $-x$ is the same as the rate function for $X$ at $x$. Since $-x > E(Y_1)$, the theorem applies to $\{Y_i\}$.
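In the Gaussian example, this symmetry is explicit: $I(x) = (x-\mu)^2/(2\sigma^2)$ depends only on $|x - \mu|$, so the lower tail $P(\sum_{i=1}^n X_i \le nx)$ for $x < \mu$ decays at the same exponential rate as the corresponding upper tail.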
Remark 3 It should be noted that the proof of the theorem can be easily modified to yield the following result: for any $\delta > 0$,
\[
\lim_{n \to \infty} \frac{1}{n} \log P\Big(nx \le \sum_{i=1}^n X_i \le n(x+\delta)\Big) = -I(x).
\]
Noting that, for small $\delta$, $P(nx \le \sum_{i=1}^n X_i \le n(x+\delta))$ can be interpreted as $P(\sum_{i=1}^n X_i \approx nx)$, this result states that the probability that the sum of the random variables exceeds $nx$ is approximately equal (up to logarithmic equivalence) to the probability that the sum is equal to $nx$.
Properties of the rate function:

Lemma 1 $I(x)$ is a convex function.
Proof: For any $\lambda \in [0, 1]$,
\begin{align*}
I(\lambda x_1 + (1-\lambda) x_2) &= \sup_\theta \big\{\theta(\lambda x_1 + (1-\lambda) x_2) - \Lambda(\theta)\big\} \\
&= \sup_\theta \big\{\lambda(\theta x_1 - \Lambda(\theta)) + (1-\lambda)(\theta x_2 - \Lambda(\theta))\big\} \\
&\le \lambda \sup_\theta \{\theta x_1 - \Lambda(\theta)\} + (1-\lambda) \sup_\theta \{\theta x_2 - \Lambda(\theta)\} \\
&= \lambda I(x_1) + (1-\lambda) I(x_2).
\end{align*}

Lemma 2 Let $I(x)$ be the rate function of a random variable $X$ with mean $\mu$. Then,
\[
I(x) \ge I(\mu) = 0.
\]
Proof: Recall that
\[
I(x) = \sup_\theta \{\theta x - \Lambda(\theta)\}.
\]
Thus,
\[
I(x) \ge 0 \cdot x - \Lambda(0) = 0.
\]
By Jensen's inequality,
\[
M(\theta) = E(e^{\theta X}) \ge e^{\theta\mu}.
\]
Thus,
\[
\Lambda(\theta) = \log M(\theta) \ge \theta\mu
\quad \Longrightarrow \quad
\theta\mu - \Lambda(\theta) \le 0
\quad \Longrightarrow \quad
I(\mu) = \sup_\theta \{\theta\mu - \Lambda(\theta)\} \le 0.
\]
Since we have also shown that $I(x) \ge 0$ for all $x$, we have the desired result.
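As an illustration, for $X \sim \mathrm{Bernoulli}(p)$ we have $\Lambda(\theta) = \log(1 - p + pe^\theta)$, and carrying out the supremum gives, for $x \in [0, 1]$,
\[
I(x) = x \log\frac{x}{p} + (1-x) \log\frac{1-x}{1-p},
\]
which is zero at the mean $x = p$ and strictly positive elsewhere, as Lemma 2 requires.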


Lemma 3
\[
\Lambda(\theta) = \sup_x \{\theta x - I(x)\}.
\]
Proof: We will prove this under the assumption that all functions of interest are differentiable. From the definition of $I(x)$ and the convexity of $\Lambda(\theta)$,
\[
I(x) = \theta^* x - \Lambda(\theta^*), \tag{6}
\]
where $\theta^*$ solves
\[
\Lambda'(\theta^*) = x. \tag{7}
\]
Since $I(x)$ is convex, it is enough to show that, for each $\theta$, there exists an $x_\theta$ such that
\[
\Lambda(\theta) = \theta x_\theta - I(x_\theta) \tag{8}
\]
and $\theta = I'(x_\theta)$. We claim that such an $x_\theta$ is given by $x_\theta = \Lambda'(\theta)$. To see this, we note from (6)-(7) that
\[
I(x_\theta) = \theta x_\theta - \Lambda(\theta),
\]
which verifies (8).
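As a check, in the Gaussian example $I(x) = (x-\mu)^2/(2\sigma^2)$: the supremum of $\theta x - (x-\mu)^2/(2\sigma^2)$ over $x$ is attained at $x_\theta = \mu + \sigma^2\theta$, where it equals $\mu\theta + \sigma^2\theta^2/2 = \Lambda(\theta)$, as the lemma asserts.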
