
Dynamics of Bayesian Learning:

Prices are updated over time in accordance with the market maker’s learning about the
underlying conditions.
Let us consider how the posteriors evolve when the trades arrive as an independent and identically distributed (i.i.d.) process over time.
We can prove that
(i) these prices converge almost surely (a.s.) to the true value;
(ii) the convergence is exponentially fast.

To prove the above we need to refer to the ‘Strong Law of Large Numbers’.

Let {X1, X2, . . . , XT} be an i.i.d. sample from a distribution with mean µ.

Mean of this sample $= \dfrac{1}{T}\sum_{t=1}^{T} X_t$
The Strong Law of Large Numbers states that the sample mean tends almost surely to the mean µ of the distribution as T → ∞:

$$\lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} X_t = \mu \quad \text{a.s.}$$
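As a quick illustration, here is a minimal Python sketch of the Strong Law at work; the exponential distribution, its mean, and the sample sizes are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 2.5                                            # true mean (assumed for illustration)
draws = rng.exponential(scale=mu, size=100_000)     # i.i.d. sample X_1, ..., X_T

# Running sample mean (1/T) * sum_{t=1}^{T} X_t for T = 1, 2, ...
running_mean = np.cumsum(draws) / np.arange(1, draws.size + 1)

for T in (10, 1_000, 100_000):
    print(f"T = {T:>7}: sample mean = {running_mean[T - 1]:.4f}")
# The printed means settle near mu = 2.5 as T grows, as the SLLN predicts.
```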

We shall use this law to describe the movement of the prices set by the market maker.

First, note that since the trades are i.i.d., the order in which buys and sales arrive does not matter; it suffices to keep track of the aggregate numbers of buys and sales.

Notation:

(1) The value V of the asset is either ‘low’ ($\underline{V}$) or ‘high’ ($\overline{V}$).


(2) q is the probability that a trade is a buy, and 1 – q the probability that it is a sale, when the value is low:

$q = P[B \mid \underline{V}], \qquad 1 - q = P[S \mid \underline{V}]$

(3) p is the probability that a trade is a buy, and 1 – p the probability that it is a sale, when the value is high:

$p = P[B \mid \overline{V}], \qquad 1 - p = P[S \mid \overline{V}]$

(4) b = total number of buys


s = total number of sales
b+s = total number of trades
Bayes' Rule gives

$$\text{Posterior belief} = P(\text{event} \mid \text{data}) = \frac{\text{Prior belief} \times P(\text{data} \mid \text{event})}{\text{Marginal likelihood of data}}$$
Let us use this to find $P(V = \underline{V} \mid b, s)$, the posterior belief that the value is low given b buys and s sales.

Prior belief: $P(V = \underline{V})$.

P(data | event) = probability of observing b buys and s sales given $V = \underline{V}$. Since q and 1 − q are the probabilities of a buy and a sale respectively when $V = \underline{V}$,

$P(\text{data} \mid \text{event}) = q^{b}(1-q)^{s}$

Since p and 1 − p are the probabilities of a buy and a sale respectively when $V = \overline{V}$,

$P(\text{data} \mid \text{not event}) = p^{b}(1-p)^{s}$

Marginal likelihood of data = P(event) P(data | event) + P(not event) P(data | not event)

$= P(V = \underline{V})\, q^{b}(1-q)^{s} + P(V = \overline{V})\, p^{b}(1-p)^{s}$

Hence,

$$P(\underline{V} \mid b, s) = \frac{P(V=\underline{V})\, q^{b}(1-q)^{s}}{P(V=\underline{V})\, q^{b}(1-q)^{s} + P(V=\overline{V})\, p^{b}(1-p)^{s}}$$

Similarly,

$$P(\overline{V} \mid b, s) = \frac{P(V=\overline{V})\, p^{b}(1-p)^{s}}{P(V=\underline{V})\, q^{b}(1-q)^{s} + P(V=\overline{V})\, p^{b}(1-p)^{s}}$$
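A minimal Python sketch of this update (the prior, p, q, and the trade counts are illustrative assumptions):

```python
import numpy as np

def posterior_low(prior_low, q, p, b, s):
    """Posterior probability that V is low, given b buys and s sales.

    prior_low : prior P(V = V_low)
    q, p      : P(buy | V_low) and P(buy | V_high)
    """
    # Work in logs to avoid underflow when b + s is large.
    log_like_low = b * np.log(q) + s * np.log(1 - q)
    log_like_high = b * np.log(p) + s * np.log(1 - p)
    log_num = np.log(prior_low) + log_like_low
    log_den = np.logaddexp(log_num, np.log(1 - prior_low) + log_like_high)
    return np.exp(log_num - log_den)

# Example: q = 0.4 (buys are less likely when the value is low), p = 0.6.
print(posterior_low(prior_low=0.5, q=0.4, p=0.6, b=30, s=70))  # close to 1
```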

Dividing,

$$\frac{P(\overline{V} \mid b, s)}{P(\underline{V} \mid b, s)} = \frac{P(V=\overline{V})\, p^{b}(1-p)^{s}}{P(V=\underline{V})\, q^{b}(1-q)^{s}}$$

Taking natural logs on both sides,

$$\log \frac{P(\overline{V} \mid b, s)}{P(\underline{V} \mid b, s)} = \log \frac{P(V=\overline{V})}{P(V=\underline{V})} + b \log \frac{p}{q} + s \log \frac{1-p}{1-q}$$

Dividing throughout by b + s,

$$\frac{1}{b+s} \log \frac{P(\overline{V} \mid b, s)}{P(\underline{V} \mid b, s)} = \frac{1}{b+s} \log \frac{P(V=\overline{V})}{P(V=\underline{V})} + \frac{b}{b+s} \log \frac{p}{q} + \frac{s}{b+s} \log \frac{1-p}{1-q}$$
Suppose the true value is low, $V = \underline{V}$. Taking limits as b + s → ∞,

$$\frac{1}{b+s} \to 0,$$

and, by the Strong Law of Large Numbers,

$$\frac{b}{b+s} \to \text{probability of a buy when } V = \underline{V}, \text{ i.e. } q, \qquad \frac{s}{b+s} \to \text{probability of a sale when } V = \underline{V}, \text{ i.e. } 1 - q.$$

Substituting these limits in the equation above,

$$\frac{1}{b+s} \log \frac{P(\overline{V} \mid b, s)}{P(\underline{V} \mid b, s)} \to q \log \frac{p}{q} + (1-q) \log \frac{1-p}{1-q} = -I_q(p) \qquad (1)$$

$I_q(p)$ is the relative entropy (Kullback–Leibler divergence) between the trade distributions under the two values; it is a measure of the distance between the probabilities. It is defined as

$$I_q(p) = q \log \frac{q}{p} + (1-q) \log \frac{1-q}{1-p}$$

There are three useful properties of this measure to note:

(1) $I_q(p) \ge 0 \;\; \forall p, q$
(2) $I_q(q) = 0$
(3) $I_q(p) > 0$ if $p \ne q$

Proof:

$I_q(p) = q \log q - q \log p + (1-q)\log(1-q) - (1-q)\log(1-p)$

$$\frac{\partial}{\partial p} I_q(p) = -\frac{q}{p} - \frac{1-q}{1-p}(-1) = -\frac{q}{p} + \frac{1-q}{1-p} = \frac{p-q}{p(1-p)}$$

When p < q, $\partial I_q(p)/\partial p < 0$, i.e., $I_q(p)$ decreases as p increases.

When p > q, $\partial I_q(p)/\partial p > 0$, i.e., $I_q(p)$ increases as p increases.

When p = q, $I_q(p) = 0$.

To sum up, as p increases from 0 to q, $I_q(p)$ decreases from a positive value to zero; as p increases from q to 1, it increases from zero to a positive value. Moreover, $\partial^2 I_q(p)/\partial p^2 = q/p^2 + (1-q)/(1-p)^2 > 0$, so $I_q(p)$ is convex in p. It is never negative and attains its minimum value 0 when p = q.
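These properties can be checked numerically with a minimal sketch (q and the grid of p values are arbitrary choices for illustration):

```python
import numpy as np

def relative_entropy(q, p):
    """I_q(p) = q log(q/p) + (1-q) log((1-q)/(1-p)) for Bernoulli distributions."""
    return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))

q = 0.4
p_grid = np.linspace(0.01, 0.99, 99)
values = relative_entropy(q, p_grid)

print(values.min() >= 0)                           # True: I_q(p) is never negative
print(np.isclose(relative_entropy(q, q), 0.0))     # True: I_q(q) = 0
print(p_grid[np.argmin(values)])                   # approximately 0.4: minimum attained at p = q
```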

Equation (1), together with the properties above, implies that, if p ≠ q,

$$\frac{1}{b+s} \log \frac{P(\overline{V} \mid b, s)}{P(\underline{V} \mid b, s)}$$

tends to a strictly negative value as b + s tends to ∞.

[If p = q, a buy is equally likely whether $V = \underline{V}$ or $V = \overline{V}$, so the trade data are uninformative and beliefs remain at the original prior.]

Hence

$$\log \frac{P(\overline{V} \mid b, s)}{P(\underline{V} \mid b, s)} \to -\infty, \qquad \frac{P(\overline{V} \mid b, s)}{P(\underline{V} \mid b, s)} \to 0,$$

i.e., $P(V = \overline{V} \mid b, s) \to 0$ and $P(V = \underline{V} \mid b, s) \to 1$,

i.e., the posteriors converge a.s. to the true value.


Let us now discuss the speed of this convergence.
Trades take place sequentially, one per period.
Let t denote the total number of trades, t = b + s.

Equation (1) gives

$$\frac{1}{t} \log \frac{P(\overline{V} \mid b, s)}{P(\underline{V} \mid b, s)} \to -I_q(p) \quad \text{a.s.}$$

so that, for large t,

$$\frac{P(\overline{V} \mid b, s)}{P(\underline{V} \mid b, s)} \approx \exp\!\left[-t\, I_q(p)\right].$$

Thus $\dfrac{P(\overline{V} \mid b, s)}{P(\underline{V} \mid b, s)}$ converges to zero exponentially, at rate $I_q(p)$.
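The convergence and its rate can be checked with a minimal simulation sketch (the true value is taken to be low, and p, q, the prior, and the number of trades are illustrative assumptions); it tracks the posterior odds against the predicted decay rate:

```python
import numpy as np

rng = np.random.default_rng(1)

q, p = 0.4, 0.6          # P(buy | V low), P(buy | V high)
prior_low = 0.5
T = 2_000                # number of trades to simulate

# Simulate trades under the true (low) value: 1 = buy with probability q, 0 = sale.
trades = rng.random(T) < q
b = np.cumsum(trades)                       # running number of buys
s = np.arange(1, T + 1) - b                 # running number of sales

# Log posterior odds log[P(V_high | b,s) / P(V_low | b,s)] after each trade.
log_odds = (np.log((1 - prior_low) / prior_low)
            + b * np.log(p / q) + s * np.log((1 - p) / (1 - q)))

I_qp = q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))
for t in (100, 500, 2_000):
    print(f"t = {t:>5}: log-odds/t = {log_odds[t - 1] / t:+.4f},  -I_q(p) = {-I_qp:+.4f}")
# log-odds/t settles near -I_q(p), so the posterior odds decay like exp(-t * I_q(p)).
```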

CONTINUOUS RANDOM VARIABLES

So far, the variables under discussion have been assumed to be discrete. The case of continuous variables is important because many decision problems involve such variables, and in practice many models use normally distributed variables. In this section we study the general updating procedure and derive the posterior distribution when the variables are normally distributed.

Procedure:
X is the continuous random variable under study.
µ is a parameter for the density of X.
The prior density of µ is g(µ).
The conditional density of X given µ is f(X / µ).
We observe i.i.d. draws of X.
Using Bayes Rule, we find the posterior distribution of µ, given an observation x.

$$\text{Posterior belief} = \frac{\text{Prior belief} \times \text{Conditional probability of data}}{\text{Marginal likelihood of the data}}$$

$$g(\mu \mid x) = \frac{g(\mu)\, f(x \mid \mu)}{\displaystyle\int g(\mu)\, f(x \mid \mu)\, d\mu} \qquad (*)$$
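For intuition, the update (*) can be approximated numerically on a grid of µ values. A minimal sketch, assuming (purely for illustration) a standard-normal prior and a normal likelihood with unit variance:

```python
import numpy as np

def grid_posterior(mu_grid, prior_pdf, likelihood_pdf, x):
    """Approximate g(mu | x) on an equally spaced grid by normalizing g(mu) * f(x | mu)."""
    unnormalized = prior_pdf(mu_grid) * likelihood_pdf(x, mu_grid)
    dmu = mu_grid[1] - mu_grid[0]
    return unnormalized / (unnormalized.sum() * dmu)   # divide by the approximate marginal likelihood

# Illustrative choices: prior mu ~ N(0, 1), likelihood x | mu ~ N(mu, 1).
normal_pdf = lambda z, mean, var: np.exp(-(z - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
mu_grid = np.linspace(-5, 5, 2001)
posterior = grid_posterior(mu_grid,
                           prior_pdf=lambda m: normal_pdf(m, 0.0, 1.0),
                           likelihood_pdf=lambda x, m: normal_pdf(x, m, 1.0),
                           x=2.0)
print(mu_grid[np.argmax(posterior)])   # posterior mode near 1.0, between prior mean 0 and observation 2
```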

Example:

The prior density g(µ) is $N(m, \sigma_\mu^2)$ and the conditional density of X given µ is $N(\mu, \sigma_x^2)$:

$$g(\mu) = \frac{1}{\sigma_\mu \sqrt{2\pi}} \exp\!\left[-\frac{(\mu - m)^2}{2\sigma_\mu^2}\right], \qquad -\infty < \mu < \infty$$

$$f(x \mid \mu) = \frac{1}{\sigma_x \sqrt{2\pi}} \exp\!\left[-\frac{(x - \mu)^2}{2\sigma_x^2}\right], \qquad -\infty < x < \infty$$
Substitute in (*).

 
$$\text{Numerator} = \frac{1}{2\pi\, \sigma_\mu \sigma_x} \exp\!\left[-\frac{(\mu - m)^2}{2\sigma_\mu^2} - \frac{(x - \mu)^2}{2\sigma_x^2}\right] = \frac{1}{2\pi\, \sigma_\mu \sigma_x} \exp(-B), \text{ say.} \qquad (**)$$

$$B = \frac{(\mu^2 + m^2 - 2m\mu)\,\sigma_x^2 + (x^2 + \mu^2 - 2\mu x)\,\sigma_\mu^2}{2\sigma_x^2 \sigma_\mu^2}$$

$$= \frac{\mu^2(\sigma_x^2 + \sigma_\mu^2) - 2\mu(m\sigma_x^2 + x\sigma_\mu^2) + (m^2\sigma_x^2 + x^2\sigma_\mu^2)}{2\sigma_x^2 \sigma_\mu^2}$$

$$= \frac{\sigma_x^2 + \sigma_\mu^2}{2\sigma_x^2 \sigma_\mu^2} \left[\left(\mu - \frac{m\sigma_x^2 + x\sigma_\mu^2}{\sigma_x^2 + \sigma_\mu^2}\right)^2 - \left(\frac{m\sigma_x^2 + x\sigma_\mu^2}{\sigma_x^2 + \sigma_\mu^2}\right)^2 + \frac{m^2\sigma_x^2 + x^2\sigma_\mu^2}{\sigma_x^2 + \sigma_\mu^2}\right]$$

$$= \frac{\sigma_x^2 + \sigma_\mu^2}{2\sigma_x^2 \sigma_\mu^2} \left[\left(\mu - \frac{m\sigma_x^2 + x\sigma_\mu^2}{\sigma_x^2 + \sigma_\mu^2}\right)^2 + \frac{(m - x)^2\, \sigma_x^2 \sigma_\mu^2}{\left(\sigma_x^2 + \sigma_\mu^2\right)^2}\right]$$

(after simplifying the last two terms).
We can write $\dfrac{\sigma_x^2 + \sigma_\mu^2}{2\sigma_x^2 \sigma_\mu^2} = \dfrac{1}{2}\left(\dfrac{1}{\sigma_\mu^2} + \dfrac{1}{\sigma_x^2}\right) = \dfrac{1}{2\sigma^2}$, say.

$$\therefore\; \sigma^2 = \left(\frac{1}{\sigma_\mu^2} + \frac{1}{\sigma_x^2}\right)^{-1} \qquad (\#)$$

Also,

$$\frac{m\sigma_x^2 + x\sigma_\mu^2}{\sigma_x^2 + \sigma_\mu^2} = \frac{\dfrac{m}{\sigma_\mu^2} + \dfrac{x}{\sigma_x^2}}{\dfrac{1}{\sigma_\mu^2} + \dfrac{1}{\sigma_x^2}} = A, \text{ say.} \qquad (\#\#)$$

$$\therefore\; B = \frac{(\mu - A)^2}{2\sigma^2} + \frac{(m - x)^2}{2\left(\sigma_x^2 + \sigma_\mu^2\right)} = \frac{(\mu - A)^2}{2\sigma^2} + K$$

Substituting in (**),

$$\text{Numerator of }(*) = g(\mu)\, f(x \mid \mu) = \frac{1}{2\pi\, \sigma_x \sigma_\mu} \exp(-K) \exp\!\left[\frac{-(\mu - A)^2}{2\sigma^2}\right]$$

$$\text{Denominator of }(*) = \int_{-\infty}^{+\infty} g(\mu)\, f(x \mid \mu)\, d\mu = \frac{1}{2\pi\, \sigma_x \sigma_\mu} \exp(-K) \int_{-\infty}^{+\infty} \exp\!\left[\frac{-(\mu - A)^2}{2\sigma^2}\right] d\mu$$

To evaluate the integral, put $z = \dfrac{\mu - A}{\sigma}$, so that $d\mu = \sigma\, dz$:

$$\int_{-\infty}^{+\infty} \exp\!\left[\frac{-(\mu - A)^2}{2\sigma^2}\right] d\mu = \int_{-\infty}^{+\infty} \exp\!\left(\frac{-z^2}{2}\right) \sigma\, dz = \sqrt{2\pi}\, \sigma$$

Cancelling the common terms in the numerator and denominator,

$$g(\mu \mid x) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp\!\left[\frac{-(\mu - A)^2}{2\sigma^2}\right]$$

$\sim N(A, \sigma^2)$,

where A and $\sigma^2$ are given in (##) and (#).
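A minimal Python sketch of this conjugate update; the function name and the numeric values of m, the variances, and x are illustrative assumptions:

```python
def normal_posterior(m, var_prior, x, var_x):
    """Posterior of mu after one observation x, when the prior is N(m, var_prior)
    and x | mu ~ N(mu, var_x). Returns (A, sigma^2) as in (##) and (#)."""
    precision = 1.0 / var_prior + 1.0 / var_x          # posterior precision
    post_var = 1.0 / precision                          # sigma^2  (#)
    post_mean = (m / var_prior + x / var_x) * post_var  # A        (##)
    return post_mean, post_var

# Example: prior N(0, 4), observation x = 3 with noise variance 1.
print(normal_posterior(m=0.0, var_prior=4.0, x=3.0, var_x=1.0))
# -> (2.4, 0.8): the posterior mean is pulled toward x, weighted by the precisions.
```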

We note the following points:

(1) In the case of a normal random variable whose mean has a normal prior, the posterior of
the mean is also normal.
(2) The mean of the posterior is

$$A = \frac{\dfrac{m}{\sigma_\mu^2} + \dfrac{x}{\sigma_x^2}}{\dfrac{1}{\sigma_\mu^2} + \dfrac{1}{\sigma_x^2}},$$

which is the weighted average of m (the mean of the prior) and x (the observation), the weights being the respective precisions $\dfrac{1}{\sigma_\mu^2}$ and $\dfrac{1}{\sigma_x^2}$. This formula can therefore be used to write down the mean of the posterior directly.

(3) Just as in the discrete case, the Bayesian posterior beliefs converge to the true value as repeated draws of x are made. After T independent draws $\{x_1, x_2, \ldots, x_T\}$ from $N(\mu, \sigma_x^2)$, where the prior of µ is $N(m, \sigma_\mu^2)$, the posterior of µ is normal with

$$\text{Mean} = \frac{\dfrac{m}{\sigma_\mu^2} + \dfrac{\sum_{t=1}^{T} x_t}{\sigma_x^2}}{\dfrac{1}{\sigma_\mu^2} + \dfrac{T}{\sigma_x^2}} \qquad \text{and} \qquad \text{Variance} = \left(\frac{1}{\sigma_\mu^2} + \frac{T}{\sigma_x^2}\right)^{-1}.$$

Applying the Strong Law of Large Numbers to the above,

$$\lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} x_t = \mu \quad \text{a.s.}$$

Dividing the numerator and denominator of the posterior mean by T,

$$\text{Mean of the posterior} = \frac{\dfrac{m}{T\sigma_\mu^2} + \dfrac{\sum_{t=1}^{T} x_t}{T\sigma_x^2}}{\dfrac{1}{T\sigma_\mu^2} + \dfrac{1}{\sigma_x^2}} \;\to\; \frac{0 + \dfrac{\mu}{\sigma_x^2}}{0 + \dfrac{1}{\sigma_x^2}} = \mu,$$

and the variance of the posterior → 0.

Hence, after repeated draws, the posterior converges to the true value.
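A minimal sketch illustrating this convergence (the true µ, the variances, and the sample sizes are illustrative assumptions):

```python
import numpy as np

def posterior_after_draws(m, var_prior, xs, var_x):
    """Posterior mean and variance of mu after observing the draws xs,
    with prior mu ~ N(m, var_prior) and xs[i] | mu ~ N(mu, var_x)."""
    T = len(xs)
    post_var = 1.0 / (1.0 / var_prior + T / var_x)
    post_mean = post_var * (m / var_prior + np.sum(xs) / var_x)
    return post_mean, post_var

rng = np.random.default_rng(2)
true_mu = 1.7
for T in (10, 1_000, 100_000):
    xs = rng.normal(true_mu, 2.0, size=T)          # draws with sigma_x = 2, i.e. var_x = 4
    mean, var = posterior_after_draws(m=0.0, var_prior=1.0, xs=xs, var_x=4.0)
    print(f"T = {T:>7}: posterior mean = {mean:.4f}, posterior variance = {var:.2e}")
# The posterior mean approaches the true mu = 1.7 and the variance shrinks toward 0.
```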
