Hidden Markov Models

Tapas Kanungo
Center for Automation Research, University of Maryland
Web: www.cfar.umd.edu/~kanungo
Email: kanungo@cfar.umd.edu
Outline

1. Markov models
2. Hidden Markov models
3. Forward-Backward algorithm
4. Viterbi algorithm
5. Baum-Welch estimation algorithm
Markov Models
Observable states: $1, 2, \ldots, N$

Observed sequence: $q_1, q_2, \ldots, q_t, \ldots, q_T$

Stationarity:

$$P(q_t = j \mid q_{t-1} = i) = P(q_{t+l} = j \mid q_{t+l-1} = i)$$
Markov Models
State transition matrix $A$:

$$A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1N} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2N} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{iN} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{N1} & a_{N2} & \cdots & a_{Nj} & \cdots & a_{NN}
\end{bmatrix}$$

where

$$a_{ij} = P(q_t = j \mid q_{t-1} = i), \qquad 1 \le i, j \le N$$

Constraints on $a_{ij}$:

$$a_{ij} \ge 0 \;\; \forall i, j; \qquad \sum_{j=1}^{N} a_{ij} = 1 \;\; \forall i$$
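As a small illustration, a Markov chain with a row-stochastic transition matrix can be checked and sampled directly. The 3-state matrix below is an invented example, not one from these slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state transition matrix; A[i, j] = P(q_t = j | q_{t-1} = i).
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

# The two constraints: non-negative entries, and each row sums to one.
assert (A >= 0).all() and np.allclose(A.sum(axis=1), 1.0)

def sample_chain(A, q0, T, rng):
    """Sample a state sequence q_1, ..., q_T starting from state q0."""
    q = [q0]
    for _ in range(T - 1):
        # Next state depends only on the current state (Markov property).
        q.append(int(rng.choice(len(A), p=A[q[-1]])))
    return q

print(sample_chain(A, 0, 10, rng))
```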
Exponential Markov chain duration density: the probability of staying in state $i$ for exactly $d$ steps is $p_i(d) = (a_{ii})^{d-1}(1 - a_{ii})$. What is the expected value of the duration $d$ in state $i$?

$$\bar{d}_i = \sum_{d=1}^{\infty} d \, p_i(d)
= \sum_{d=1}^{\infty} d \,(a_{ii})^{d-1}(1 - a_{ii})
= (1 - a_{ii}) \frac{\partial}{\partial a_{ii}} \left( \sum_{d=1}^{\infty} (a_{ii})^{d} \right)
= (1 - a_{ii}) \frac{\partial}{\partial a_{ii}} \left( \frac{a_{ii}}{1 - a_{ii}} \right)
= \frac{1}{1 - a_{ii}}$$
Avg. number of consecutive cloudy days $= 1/(1 - 0.6) = 2.5$
Avg. number of consecutive rainy days $= 1/(1 - 0.4) \approx 1.67$
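The closed form $\bar{d}_i = 1/(1 - a_{ii})$ can be sanity-checked by simulation. The self-transition probabilities below (0.6 for cloudy, 0.4 for rainy) are chosen to reproduce the averages of 2.5 and 1.67 days; they are an assumption about the underlying weather model, not values stated on this slide:

```python
import random

def expected_duration(a_ii):
    """Closed form: mean of the geometric density p_i(d) = a_ii^(d-1) (1 - a_ii)."""
    return 1.0 / (1.0 - a_ii)

def simulated_duration(a_ii, runs=200_000, seed=0):
    """Average length of a stay in state i when each step continues w.p. a_ii."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        d = 1
        while rng.random() < a_ii:
            d += 1
        total += d
    return total / runs

print(expected_duration(0.6))                # cloudy: 2.5
print(round(expected_duration(0.4), 2))      # rainy: 1.67
print(round(simulated_duration(0.6), 2))     # simulation agrees closely with 2.5
```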
Each urn has a possibly different distribution of colors.

Sequence generation algorithm:
1. Pick the initial urn according to some random process.
2. Randomly pick a ball from the urn, record its color, and then replace it.
3. Select another urn according to a random selection process associated with the current urn.
4. Repeat steps 2 and 3.
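The urn-and-ball procedure can be sketched directly. The urn count, color distributions, selection probabilities, and initial distribution below are all invented for illustration:

```python
import random

rng = random.Random(42)

# Hypothetical model: 2 urns, 3 colors. B[i][c] = P(color c | urn i).
colors = ["red", "green", "blue"]
B = [[0.6, 0.3, 0.1],
     [0.1, 0.2, 0.7]]
# Urn-to-urn selection probabilities and initial urn distribution.
A = [[0.7, 0.3],
     [0.4, 0.6]]
pi = [0.5, 0.5]

def generate(T):
    """Steps 1-4: pick an initial urn, then repeatedly draw a ball (with
    replacement) and move to the next urn. Only the colors are observed;
    the urn sequence stays hidden."""
    urn = rng.choices(range(2), weights=pi)[0]           # step 1
    observations = []
    for _ in range(T):
        ball = rng.choices(colors, weights=B[urn])[0]    # step 2 (ball replaced)
        observations.append(ball)
        urn = rng.choices(range(2), weights=A[urn])[0]   # step 3
    return observations

print(generate(8))
```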
The Trellis

[Figure: trellis of the $N$ states unrolled over time; the vertical axis lists the states, the horizontal axis the observations.]
1. Given observation $O = (o_1, o_2, \ldots, o_T)$ and model $\lambda = (A, B, \pi)$, efficiently compute $P(O \mid \lambda)$.
   - Hidden states complicate the evaluation.
   - Given two models $\lambda_1$ and $\lambda_2$, this can be used to choose the better one.
2. Given observation $O = (o_1, o_2, \ldots, o_T)$ and model $\lambda$, find the optimal state sequence $q = (q_1, q_2, \ldots, q_T)$.
   - The optimality criterion has to be decided (e.g., maximum likelihood).
   - An "explanation" for the data.
3. Given $O = (o_1, o_2, \ldots, o_T)$, estimate model parameters $\lambda = (A, B, \pi)$ that maximize $P(O \mid \lambda)$.
Solution to Problem 1
Problem: Compute $P(o_1, o_2, \ldots, o_T \mid \lambda)$.

Algorithm: Let $q = (q_1, q_2, \ldots, q_T)$ be a state sequence. Assume the observations are independent:

$$P(O \mid q, \lambda) = \prod_{t=1}^{T} P(o_t \mid q_t, \lambda) = b_{q_1}(o_1)\, b_{q_2}(o_2) \cdots b_{q_T}(o_T)$$

Also, $P(O, q \mid \lambda) = P(O \mid q, \lambda)\, P(q \mid \lambda)$.

Enumerate paths and sum probabilities:

$$P(O \mid \lambda) = \sum_{q} P(O \mid q, \lambda)\, P(q \mid \lambda)$$
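For a small model the enumeration can be carried out literally; its cost grows as $N^T$, which is exactly why the forward algorithm below is needed. The 2-state, 2-symbol model here is a made-up example:

```python
import itertools
import numpy as np

# Hypothetical model lambda = (A, B, pi); B[i, k] = b_i(k).
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
O = [0, 1, 1, 0]                     # observation sequence (symbol indices)

def brute_force_likelihood(A, B, pi, O):
    """P(O|lambda) = sum over all N^T state paths q of P(O|q, lambda) P(q|lambda)."""
    N, T = len(pi), len(O)
    total = 0.0
    for q in itertools.product(range(N), repeat=T):
        p = pi[q[0]] * B[q[0], O[0]]
        for t in range(1, T):
            p *= A[q[t - 1], q[t]] * B[q[t], O[t]]
        total += p
    return total

print(brute_force_likelihood(A, B, pi, O))
```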
[Figure: one induction step of the trellis. State $k$ at time $t+1$ collects the contributions of all $N$ states at time $t$ through the transitions $a_{1k}, \ldots, a_{Nk}$.]
Forward Algorithm
Define the forward variable $\alpha_t(i)$ as

$$\alpha_t(i) = P(o_1, o_2, \ldots, o_t,\; q_t = i \mid \lambda),$$

the probability of observing the partial sequence $o_1, o_2, \ldots, o_t$ such that the state at time $t$ is $i$.

1. Initialization:
$$\alpha_1(i) = \pi_i\, b_i(o_1), \qquad 1 \le i \le N$$
2. Induction:
$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(o_{t+1}), \qquad 1 \le t \le T-1, \; 1 \le j \le N$$
3. Termination:
$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$

Complexity: $O(N^2 T)$.
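The three steps transcribe almost line for line; the model numbers below are the same kind of invented toy example, not values from the slides:

```python
import numpy as np

def forward(A, B, pi, O):
    """Forward algorithm: returns P(O|lambda) in O(N^2 T) time."""
    N, T = len(pi), len(O)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                          # 1. initialization
    for t in range(T - 1):                              # 2. induction
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
    return alpha[-1].sum()                              # 3. termination

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
print(forward(A, B, pi, [0, 1, 1, 0]))
```

The result agrees with brute-force enumeration over all state paths, at a tiny fraction of the cost.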
Example
Consider the following coin-tossing experiment:

          State 1   State 2   State 3
  P(H)    0.5       0.75      0.25
  P(T)    0.5       0.25      0.75

All state-transition probabilities are equal to 1/3, and the initial state probabilities are equal to 1/3.

1. You observe $O = (H, H, H, H, T, H, T, T, T, T)$. What state sequence $q$ is most likely? What is the joint probability $P(O, q \mid \lambda)$ of the observation sequence and the state sequence?
2. What is the probability that the observation sequence came entirely from state 1?
3. How would your answers to parts 1 and 2 change?
4. If the state transition probabilities were

$$A' = \begin{bmatrix} 0.9 & 0.05 & 0.05 \\ 0.45 & 0.1 & 0.45 \\ 0.45 & 0.45 & 0.1 \end{bmatrix},$$

how would the new model $\lambda'$ change your answers to parts 1-3?
Backward Algorithm

[Figure: trellis for the backward recursion. State $i$ at time $t$ connects through $a_{i1}, \ldots, a_{iN}$ to every state at time $t+1$; the observations $o_1, o_2, \ldots, o_T$ run along the time axis.]
Backward Algorithm
Define the backward variable $\beta_t(i)$ as

$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \ldots, o_T \mid q_t = i, \lambda),$$

the probability of observing the partial sequence $o_{t+1}, o_{t+2}, \ldots, o_T$ given that the state at time $t$ is $i$.

Initialization:
$$\beta_T(i) = 1, \qquad 1 \le i \le N$$
Induction:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \qquad 1 \le i \le N, \; t = T-1, \ldots, 1$$
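The backward recursion gives a second route to the same likelihood, via $P(O \mid \lambda) = \sum_i \pi_i\, b_i(o_1)\, \beta_1(i)$; this is a useful consistency check against the forward pass. Toy model assumed:

```python
import numpy as np

def backward(A, B, pi, O):
    """Backward recursion; returns P(O|lambda) = sum_i pi_i b_i(o_1) beta_1(i)."""
    N, T = len(pi), len(O)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                     # beta_T(i) = 1
    for t in range(T - 2, -1, -1):                     # t = T-1, ..., 1
        # beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return (pi * B[:, O[0]] * beta[0]).sum()

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
print(backward(A, B, pi, [0, 1, 1, 0]))
```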
Solution to Problem 2
Choose the most likely path: find the path $q = (q_1, q_2, \ldots, q_T)$ that maximizes the likelihood

$$P(q_1, q_2, \ldots, q_T \mid O, \lambda)$$

Define $\delta_t(i)$ as the highest probability of a single path that ends in state $i$ at time $t$ and accounts for the first $t$ observations:

$$\delta_t(i) = \max_{q_1, \ldots, q_{t-1}} P(q_1, \ldots, q_{t-1},\; q_t = i,\; o_1, \ldots, o_t \mid \lambda)$$

By induction we have:

$$\delta_{t+1}(j) = \left[ \max_i \delta_t(i)\, a_{ij} \right] b_j(o_{t+1})$$
Viterbi Algorithm

[Figure: Viterbi trellis. The best path into state $k$ at time $t+1$ comes from the maximizing predecessor at time $t$ over the transitions $a_{1k}, \ldots, a_{Nk}$; the observations $o_1, \ldots, o_T$ run along the time axis.]
Viterbi Algorithm
Initialization:
$$\delta_1(i) = \pi_i\, b_i(o_1), \qquad 1 \le i \le N$$
$$\psi_1(i) = 0$$
Recursion:
$$\delta_t(j) = \max_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij} \right] b_j(o_t), \qquad
\psi_t(j) = \arg\max_{1 \le i \le N} \left[ \delta_{t-1}(i)\, a_{ij} \right]$$
for $2 \le t \le T$, $1 \le j \le N$.
Termination:
$$P^* = \max_{1 \le i \le N} \delta_T(i), \qquad q_T^* = \arg\max_{1 \le i \le N} \delta_T(i)$$
Path backtracking:
$$q_t^* = \psi_{t+1}(q_{t+1}^*), \qquad t = T-1, T-2, \ldots, 1$$
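Applied to the earlier coin-tossing example (uniform transitions and initial probabilities, the emission table from that slide, with $H = 0$ and $T = 1$), the algorithm reduces to picking the state with the highest emission probability at each step:

```python
import numpy as np

def viterbi(A, B, pi, O):
    """Viterbi algorithm: most likely state path and its probability P*."""
    N, T = len(pi), len(O)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                    # initialization
    for t in range(1, T):                         # recursion
        scores = delta[t - 1][:, None] * A        # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    q = [int(delta[-1].argmax())]                 # termination
    for t in range(T - 1, 0, -1):                 # path backtracking
        q.append(int(psi[t][q[-1]]))
    return list(reversed(q)), float(delta[-1].max())

A = np.full((3, 3), 1 / 3)
pi = np.full(3, 1 / 3)
B = np.array([[0.5, 0.5], [0.75, 0.25], [0.25, 0.75]])
O = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]                # H H H H T H T T T T
path, p_star = viterbi(A, B, pi, O)
print(path)    # [1, 1, 1, 1, 2, 1, 2, 2, 2, 2]  (0-indexed: states 2,2,2,2,3,2,3,3,3,3)
print(p_star)  # (1/3 * 0.75)^10 = 0.25^10
```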
Solution to Problem 3
Estimate $\lambda = (A, B, \pi)$ to maximize $P(O \mid \lambda)$. There is no analytic method because of the complexity, so an iterative solution is used.

Baum-Welch algorithm:
1. Let the initial model be $\lambda_0$.
2. Compute a new $\lambda$ based on $\lambda_0$ and the observation $O$.
3. If $\log P(O \mid \lambda) - \log P(O \mid \lambda_0) < \Delta$, stop.
4. Else set $\lambda_0 \leftarrow \lambda$ and go to step 2.
Baum-Welch: Preliminaries
Define $\xi_t(i, j)$ as the probability of being in state $i$ at time $t$ and in state $j$ at time $t+1$, given the model and the observation sequence:

$$\xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}
= \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}$$

Define $\gamma_t(i)$ as the probability of being in state $i$ at time $t$, given the observation sequence:

$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$$
Baum-Welch: Update Rules

$$\bar{\pi}_i = \gamma_1(i)$$

$$\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$

$$\bar{b}_j(k) = \frac{\sum_{t:\, o_t = k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$
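Putting the $\xi$/$\gamma$ quantities and the update rules together, one re-estimation step can be sketched as follows. The toy model and data are invented, and a practical implementation would also scale $\alpha$ and $\beta$ (see the implementation issues slide):

```python
import numpy as np

def forward_backward(A, B, pi, O):
    """Unscaled forward and backward passes for a single observation sequence."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return alpha, beta

def baum_welch_step(A, B, pi, O):
    """One EM step: compute xi and gamma, then apply the update rules."""
    T, N = len(O), len(pi)
    alpha, beta = forward_backward(A, B, pi, O)
    p_O = alpha[-1].sum()                          # P(O|lambda)
    # xi[t, i, j] = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O|lambda)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :]
    xi /= p_O
    # gamma_t(i) = sum_j xi_t(i, j) for t < T; gamma_T from alpha_T beta_T / P(O).
    gamma = np.append(xi.sum(axis=2),
                      (alpha[-1] * beta[-1] / p_O)[None, :], axis=0)
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        mask = np.array(O) == k
        new_B[:, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)
    return new_A, new_B, new_pi, p_O

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
O = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0]
for _ in range(5):
    A, B, pi, p = baum_welch_step(A, B, pi, O)
    print(p)   # the likelihood is non-decreasing across iterations (EM guarantee)
```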
Properties
Covariance of the estimated parameters
Convergence rates
Types of HMM

Ergodic
Continuous density
State duration
Implementation Issues
Scaling
Initial parameters
Multiple observation sequences
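Scaling matters most in practice: the raw $\alpha_t(i)$ underflow for long sequences, so each column of the trellis is normalized and the log-likelihood is accumulated from the scale factors. A minimal sketch, with an invented toy model:

```python
import numpy as np

def forward_log_likelihood(A, B, pi, O):
    """Scaled forward pass: returns log P(O|lambda) without underflow."""
    alpha = pi * B[:, O[0]]
    log_p = 0.0
    for t in range(len(O)):
        if t > 0:
            alpha = (alpha @ A) * B[:, O[t]]
        c = alpha.sum()            # scale factor c_t
        log_p += np.log(c)         # log P(O|lambda) = sum_t log c_t
        alpha = alpha / c          # normalized alpha_t sums to one
    return log_p

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
O = np.array([0, 1] * 500)         # T = 1000: unscaled alphas would underflow
print(forward_log_likelihood(A, B, pi, O))
```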
Comparison of HMMs
What is a natural distance function? If $D(\lambda_1, \lambda_2)$ is large, does it mean that the models are really different?