Hidden Markov Models

Tapas Kanungo
Center for Automation Research, University of Maryland
Web: www.cfar.umd.edu/~kanungo
Email: kanungo@cfar.umd.edu
Outline

1. Markov models
2. Hidden Markov models
3. Forward-Backward algorithm
4. Viterbi algorithm
5. Baum-Welch estimation algorithm
Markov Models
Observable states: $1, 2, \ldots, N$

Observed sequence: $q_1, q_2, \ldots, q_t, \ldots, q_T$

Stationarity:

$$P(q_t = j \mid q_{t-1} = i) = P(q_{t+l} = j \mid q_{t+l-1} = i)$$
Markov Models
State transition matrix $A$:

$$A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1N} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2N} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{iN} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{N1} & a_{N2} & \cdots & a_{Nj} & \cdots & a_{NN}
\end{bmatrix}$$

where

$$a_{ij} = P(q_t = j \mid q_{t-1} = i), \qquad 1 \le i, j \le N$$

Constraints on $a_{ij}$:

$$a_{ij} \ge 0 \;\; \forall i, j; \qquad \sum_{j=1}^{N} a_{ij} = 1 \;\; \forall i$$
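As a small illustration, a Markov chain with a row-stochastic transition matrix can be checked and sampled directly. The 3-state matrix below is an invented example, not one from these slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state transition matrix; A[i, j] = P(q_t = j | q_{t-1} = i).
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

# The two constraints: non-negative entries, and each row sums to one.
assert (A >= 0).all() and np.allclose(A.sum(axis=1), 1.0)

def sample_chain(A, q0, T, rng):
    """Sample a state sequence q_1, ..., q_T starting from state q0."""
    q = [q0]
    for _ in range(T - 1):
        # Next state depends only on the current state (Markov property).
        q.append(int(rng.choice(len(A), p=A[q[-1]])))
    return q

print(sample_chain(A, 0, 10, rng))
```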
Exponential Markov chain duration density: the probability of staying in state $i$ for exactly $d$ steps is $p_i(d) = (a_{ii})^{d-1}(1 - a_{ii})$. What is the expected value of the duration $d$ in state $i$?

$$\bar{d}_i = \sum_{d=1}^{\infty} d \, p_i(d)
= \sum_{d=1}^{\infty} d \,(a_{ii})^{d-1}(1 - a_{ii})
= (1 - a_{ii}) \frac{\partial}{\partial a_{ii}} \left( \sum_{d=1}^{\infty} (a_{ii})^{d} \right)
= (1 - a_{ii}) \frac{\partial}{\partial a_{ii}} \left( \frac{a_{ii}}{1 - a_{ii}} \right)
= \frac{1}{1 - a_{ii}}$$
Avg. number of consecutive cloudy days $= 1/(1 - 0.6) = 2.5$
Avg. number of consecutive rainy days $= 1/(1 - 0.4) \approx 1.67$
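The closed form $\bar{d}_i = 1/(1 - a_{ii})$ can be sanity-checked by simulation. The self-transition probabilities below (0.6 for cloudy, 0.4 for rainy) are chosen to reproduce the averages of 2.5 and 1.67 days; they are an assumption about the underlying weather model, not values stated on this slide:

```python
import random

def expected_duration(a_ii):
    """Closed form: mean of the geometric density p_i(d) = a_ii^(d-1) (1 - a_ii)."""
    return 1.0 / (1.0 - a_ii)

def simulated_duration(a_ii, runs=200_000, seed=0):
    """Average length of a stay in state i when each step continues w.p. a_ii."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        d = 1
        while rng.random() < a_ii:
            d += 1
        total += d
    return total / runs

print(expected_duration(0.6))                # cloudy: 2.5
print(round(expected_duration(0.4), 2))      # rainy: 1.67
print(round(simulated_duration(0.6), 2))     # simulation agrees closely with 2.5
```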
Each urn has a possibly different distribution of colors.

Sequence generation algorithm:
1. Pick the initial urn according to some random process.
2. Randomly pick a ball from the urn, record its color, and then replace it.
3. Select another urn according to a random selection process associated with the current urn.
4. Repeat steps 2 and 3.
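The urn-and-ball procedure can be sketched directly. The urn count, color distributions, selection probabilities, and initial distribution below are all invented for illustration:

```python
import random

rng = random.Random(42)

# Hypothetical model: 2 urns, 3 colors. B[i][c] = P(color c | urn i).
colors = ["red", "green", "blue"]
B = [[0.6, 0.3, 0.1],
     [0.1, 0.2, 0.7]]
# Urn-to-urn selection probabilities and initial urn distribution.
A = [[0.7, 0.3],
     [0.4, 0.6]]
pi = [0.5, 0.5]

def generate(T):
    """Steps 1-4: pick an initial urn, then repeatedly draw a ball (with
    replacement) and move to the next urn. Only the colors are observed;
    the urn sequence stays hidden."""
    urn = rng.choices(range(2), weights=pi)[0]           # step 1
    observations = []
    for _ in range(T):
        ball = rng.choices(colors, weights=B[urn])[0]    # step 2 (ball replaced)
        observations.append(ball)
        urn = rng.choices(range(2), weights=A[urn])[0]   # step 3
    return observations

print(generate(8))
```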
The Trellis

[Figure: trellis of the $N$ states unrolled over time; the vertical axis lists the states, the horizontal axis the observations.]
1. Given observation $O = (o_1, o_2, \ldots, o_T)$ and model $\lambda = (A, B, \pi)$, efficiently compute $P(O \mid \lambda)$.
   - Hidden states complicate the evaluation.
   - Given two models $\lambda_1$ and $\lambda_2$, this can be used to choose the better one.
2. Given observation $O = (o_1, o_2, \ldots, o_T)$ and model $\lambda$, find the optimal state sequence $q = (q_1, q_2, \ldots, q_T)$.
   - The optimality criterion has to be decided (e.g., maximum likelihood).
   - An "explanation" for the data.
3. Given $O = (o_1, o_2, \ldots, o_T)$, estimate model parameters $\lambda = (A, B, \pi)$ that maximize $P(O \mid \lambda)$.
Solution to Problem 1
Problem: Compute $P(o_1, o_2, \ldots, o_T \mid \lambda)$.

Algorithm: Let $q = (q_1, q_2, \ldots, q_T)$ be a state sequence. Assume the observations are independent:

$$P(O \mid q, \lambda) = \prod_{t=1}^{T} P(o_t \mid q_t, \lambda) = b_{q_1}(o_1)\, b_{q_2}(o_2) \cdots b_{q_T}(o_T)$$

Also, $P(O, q \mid \lambda) = P(O \mid q, \lambda)\, P(q \mid \lambda)$.

Enumerate paths and sum probabilities:

$$P(O \mid \lambda) = \sum_{q} P(O \mid q, \lambda)\, P(q \mid \lambda)$$
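For a small model the enumeration can be carried out literally; its cost grows as $N^T$, which is exactly why the forward algorithm below is needed. The 2-state, 2-symbol model here is a made-up example:

```python
import itertools
import numpy as np

# Hypothetical model lambda = (A, B, pi); B[i, k] = b_i(k).
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
O = [0, 1, 1, 0]                     # observation sequence (symbol indices)

def brute_force_likelihood(A, B, pi, O):
    """P(O|lambda) = sum over all N^T state paths q of P(O|q, lambda) P(q|lambda)."""
    N, T = len(pi), len(O)
    total = 0.0
    for q in itertools.product(range(N), repeat=T):
        p = pi[q[0]] * B[q[0], O[0]]
        for t in range(1, T):
            p *= A[q[t - 1], q[t]] * B[q[t], O[t]]
        total += p
    return total

print(brute_force_likelihood(A, B, pi, O))
```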
[Figure: one induction step of the trellis. State $k$ at time $t+1$ collects the contributions of all $N$ states at time $t$ through the transitions $a_{1k}, \ldots, a_{Nk}$.]
Forward Algorithm
Define the forward variable $\alpha_t(i)$ as

$$\alpha_t(i) = P(o_1, o_2, \ldots, o_t,\; q_t = i \mid \lambda),$$

the probability of observing the partial sequence $o_1, o_2, \ldots, o_t$ such that the state at time $t$ is $i$.

1. Initialization:
$$\alpha_1(i) = \pi_i\, b_i(o_1), \qquad 1 \le i \le N$$
2. Induction:
$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(o_{t+1}), \qquad 1 \le t \le T-1, \; 1 \le j \le N$$
3. Termination:
$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$

Complexity: $O(N^2 T)$.
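The three steps transcribe almost line for line; the model numbers below are the same kind of invented toy example, not values from the slides:

```python
import numpy as np

def forward(A, B, pi, O):
    """Forward algorithm: returns P(O|lambda) in O(N^2 T) time."""
    N, T = len(pi), len(O)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                          # 1. initialization
    for t in range(T - 1):                              # 2. induction
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
    return alpha[-1].sum()                              # 3. termination

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
print(forward(A, B, pi, [0, 1, 1, 0]))
```

The result agrees with brute-force enumeration over all state paths, at a tiny fraction of the cost.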
Example
Consider the following coin-tossing experiment:

          State 1   State 2   State 3
  P(H)    0.5       0.75      0.25
  P(T)    0.5       0.25      0.75

All state-transition probabilities are equal to 1/3, and the initial state probabilities are equal to 1/3.

1. You observe $O = (H, H, H, H, T, H, T, T, T, T)$. What state sequence $q$ is most likely? What is the joint probability $P(O, q \mid \lambda)$ of the observation sequence and the state sequence?
2. What is the probability that the observation sequence came entirely from state 1?
3. How would your answers to parts 1 and 2 change?
4. If the state transition probabilities were

$$A' = \begin{bmatrix} 0.9 & 0.05 & 0.05 \\ 0.45 & 0.1 & 0.45 \\ 0.45 & 0.45 & 0.1 \end{bmatrix},$$

how would the new model $\lambda'$ change your answers to parts 1-3?
Backward Algorithm

[Figure: trellis for the backward recursion. State $i$ at time $t$ connects through $a_{i1}, \ldots, a_{iN}$ to every state at time $t+1$; the observations $o_1, o_2, \ldots, o_T$ run along the time axis.]
Backward Algorithm
Define the backward variable $\beta_t(i)$ as

$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda),$$

the probability of observing the partial sequence $o_{t+1}, o_{t+2}, \ldots, o_T$ given that the state at time $t$ is $i$.

Initialization:
$$\beta_T(i) = 1, \qquad 1 \le i \le N$$
Induction:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \qquad 1 \le i \le N, \; t = T-1, \ldots, 1$$
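The backward recursion gives a second route to the same likelihood, via $P(O \mid \lambda) = \sum_i \pi_i\, b_i(o_1)\, \beta_1(i)$; this is a useful consistency check against the forward pass. Toy model assumed:

```python
import numpy as np

def backward(A, B, pi, O):
    """Backward recursion; returns P(O|lambda) = sum_i pi_i b_i(o_1) beta_1(i)."""
    N, T = len(pi), len(O)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                     # beta_T(i) = 1
    for t in range(T - 2, -1, -1):                     # t = T-1, ..., 1
        # beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return (pi * B[:, O[0]] * beta[0]).sum()

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
print(backward(A, B, pi, [0, 1, 1, 0]))
```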
Solution to Problem 2
Choose the most likely path: find the path $q = (q_1, q_2, \ldots, q_T)$ that maximizes the likelihood

$$P(q_1, q_2, \ldots, q_T \mid O, \lambda)$$

Define $\delta_t(i)$ as the highest probability of a single path that ends in state $i$ at time $t$ and accounts for the first $t$ observations:

$$\delta_t(i) = \max_{q_1, \ldots, q_{t-1}} P(q_1, \ldots, q_{t-1},\; q_t = i,\; o_1, \ldots, o_t \mid \lambda)$$

By induction we have:

$$\delta_{t+1}(j) = \left[ \max_i \delta_t(i)\, a_{ij} \right] b_j(o_{t+1})$$
Viterbi Algorithm

[Figure: Viterbi trellis. The best path into state $k$ at time $t+1$ comes from the maximizing predecessor at time $t$ over the transitions $a_{1k}, \ldots, a_{Nk}$; the observations $o_1, \ldots, o_T$ run along the time axis.]
Viterbi Algorithm
Initialization:
$$\delta_1(i) = \pi_i\, b_i(o_1), \qquad 1 \le i \le N$$
$$\psi_1(i) = 0$$
Recursion:
$$\delta_t(j) = \max_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij} \right] b_j(o_t), \qquad
\psi_t(j) = \arg\max_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij} \right]$$
for $2 \le t \le T$, $1 \le j \le N$.
Termination:
$$P^* = \max_{1 \le i \le N} \delta_T(i), \qquad q_T^* = \arg\max_{1 \le i \le N} \delta_T(i)$$
Path backtracking:
$$q_t^* = \psi_{t+1}(q_{t+1}^*), \qquad t = T-1, T-2, \ldots, 1$$
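Applied to the earlier coin-tossing example (uniform transitions and initial probabilities, the emission table from that slide, with $H = 0$ and $T = 1$), the algorithm reduces to picking the state with the highest emission probability at each step:

```python
import numpy as np

def viterbi(A, B, pi, O):
    """Viterbi algorithm: most likely state path and its probability P*."""
    N, T = len(pi), len(O)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                    # initialization
    for t in range(1, T):                         # recursion
        scores = delta[t - 1][:, None] * A        # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    q = [int(delta[-1].argmax())]                 # termination
    for t in range(T - 1, 0, -1):                 # path backtracking
        q.append(int(psi[t][q[-1]]))
    return list(reversed(q)), float(delta[-1].max())

A = np.full((3, 3), 1 / 3)
pi = np.full(3, 1 / 3)
B = np.array([[0.5, 0.5], [0.75, 0.25], [0.25, 0.75]])
O = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]                # H H H H T H T T T T
path, p_star = viterbi(A, B, pi, O)
print(path)    # [1, 1, 1, 1, 2, 1, 2, 2, 2, 2]  (0-indexed: states 2,2,2,2,3,2,3,3,3,3)
print(p_star)  # (1/3 * 0.75)^10 = 0.25^10
```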
Solution to Problem 3
Estimate $\lambda = (A, B, \pi)$ to maximize $P(O \mid \lambda)$. There is no analytic method because of the complexity, so an iterative solution is used.

Baum-Welch algorithm:
1. Let the initial model be $\lambda_0$.
2. Compute a new $\lambda$ based on $\lambda_0$ and the observation $O$.
3. If $\log P(O \mid \lambda) - \log P(O \mid \lambda_0) < \Delta$, stop.
4. Else set $\lambda_0 \leftarrow \lambda$ and go to step 2.
Baum-Welch: Preliminaries
Define $\xi_t(i, j)$ as the probability of being in state $i$ at time $t$ and in state $j$ at time $t+1$, given the model and the observation sequence:

$$\xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}
= \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}$$

Define $\gamma_t(i)$ as the probability of being in state $i$ at time $t$, given the observation sequence:

$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$$
Baum-Welch: Update Rules

$$\bar{\pi}_i = \gamma_1(i)$$

$$\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$

$$\bar{b}_j(k) = \frac{\sum_{t:\, o_t = k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$
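Putting the $\xi$/$\gamma$ quantities and the update rules together, one re-estimation step can be sketched as follows. The toy model and data are invented, and a practical implementation would also scale $\alpha$ and $\beta$ (see the implementation issues slide):

```python
import numpy as np

def forward_backward(A, B, pi, O):
    """Unscaled forward and backward passes for a single observation sequence."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return alpha, beta

def baum_welch_step(A, B, pi, O):
    """One EM step: compute xi and gamma, then apply the update rules."""
    T, N = len(O), len(pi)
    alpha, beta = forward_backward(A, B, pi, O)
    p_O = alpha[-1].sum()                          # P(O|lambda)
    # xi[t, i, j] = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O|lambda)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :]
    xi /= p_O
    # gamma_t(i) = sum_j xi_t(i, j) for t < T; gamma_T from alpha_T beta_T / P(O).
    gamma = np.append(xi.sum(axis=2),
                      (alpha[-1] * beta[-1] / p_O)[None, :], axis=0)
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        mask = np.array(O) == k
        new_B[:, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)
    return new_A, new_B, new_pi, p_O

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
O = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0]
for _ in range(5):
    A, B, pi, p = baum_welch_step(A, B, pi, O)
    print(p)   # the likelihood is non-decreasing across iterations (EM guarantee)
```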
Properties
Covariance of the estimated parameters
Convergence rates
Types of HMM

Ergodic
Continuous density
State duration
Implementation Issues
Scaling
Initial parameters
Multiple observation sequences
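Scaling matters most in practice: the raw $\alpha_t(i)$ underflow for long sequences, so each column of the trellis is normalized and the log-likelihood is accumulated from the scale factors. A minimal sketch, with an invented toy model:

```python
import numpy as np

def forward_log_likelihood(A, B, pi, O):
    """Scaled forward pass: returns log P(O|lambda) without underflow."""
    alpha = pi * B[:, O[0]]
    log_p = 0.0
    for t in range(len(O)):
        if t > 0:
            alpha = (alpha @ A) * B[:, O[t]]
        c = alpha.sum()            # scale factor c_t
        log_p += np.log(c)         # log P(O|lambda) = sum_t log c_t
        alpha = alpha / c          # normalized alpha_t sums to one
    return log_p

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
O = np.array([0, 1] * 500)         # T = 1000: unscaled alphas would underflow
print(forward_log_likelihood(A, B, pi, O))
```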
Comparison of HMMs
What is a natural distance function? If $D(\lambda_1, \lambda_2)$ is large, does it mean that the models are really different?