
HMM Tutorial © Tapas Kanungo

Hidden Markov Models

Tapas Kanungo
Center for Automation Research, University of Maryland
Web: www.cfar.umd.edu/~kanungo
Email: kanungo@cfar.umd.edu

Outline

1. Markov models
2. Hidden Markov models
3. Forward-Backward algorithm
4. Viterbi algorithm
5. Baum-Welch estimation algorithm

Markov Models

Observable states:
    1, 2, ..., N

Observed sequence:
    q_1, q_2, ..., q_t, ..., q_T

First-order Markov assumption:
    P(q_t = j | q_{t-1} = i, q_{t-2} = k, ...) = P(q_t = j | q_{t-1} = i)

Stationarity:
    P(q_t = j | q_{t-1} = i) = P(q_{t+l} = j | q_{t+l-1} = i)

Markov Models

State transition probability matrix A:

        | a_11  a_12  ...  a_1N |
        | a_21  a_22  ...  a_2N |
    A = |  ...   ...  ...   ... |
        | a_i1  a_i2  ...  a_iN |
        |  ...   ...  ...   ... |
        | a_N1  a_N2  ...  a_NN |

where
    a_ij = P(q_t = j | q_{t-1} = i),    1 ≤ i, j ≤ N

Constraints on a_ij:
    a_ij ≥ 0    for all i, j
    Σ_{j=1}^{N} a_ij = 1    for all i

Markov Models: Example

States:
    1. Rainy (R)
    2. Cloudy (C)
    3. Sunny (S)

State transition probability matrix:

        | 0.4  0.3  0.3 |
    A = | 0.2  0.6  0.2 |
        | 0.1  0.1  0.8 |

Compute the probability of observing SSSRRSCS given that today is S.

Markov Models: Example

Basic conditional probability rule:
    P(A, B) = P(A|B) P(B)

The Markov chain rule:
    P(q_1, q_2, ..., q_T)
        = P(q_T | q_1, q_2, ..., q_{T-1}) P(q_1, q_2, ..., q_{T-1})
        = P(q_T | q_{T-1}) P(q_1, q_2, ..., q_{T-1})
        = P(q_T | q_{T-1}) P(q_{T-1} | q_{T-2}) P(q_1, q_2, ..., q_{T-2})
        = P(q_T | q_{T-1}) P(q_{T-1} | q_{T-2}) ... P(q_2 | q_1) P(q_1)

Markov Models: Example

Observation sequence O:
    O = (S, S, S, R, R, S, C, S)

Using the chain rule we get:
    P(O | model)
        = P(S) P(S|S) P(S|S) P(R|S) P(R|R) P(S|R) P(C|S) P(S|C)
        = π_3 a_33 a_33 a_31 a_11 a_13 a_32 a_23
        = (1)(0.8)(0.8)(0.1)(0.4)(0.3)(0.1)(0.2)
        = 1.536 × 10^{-4}

The prior probability is π_i = P(q_1 = i); here π_3 = 1 because today is known to be S.
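The computation above can be checked with a short script. This is a minimal sketch; the state indexing (0 = Rainy, 1 = Cloudy, 2 = Sunny) and the function name `sequence_prob` are illustrative choices, not from the slides:

```python
# Markov chain probability of an observed state sequence.
# States: 0 = Rainy (R), 1 = Cloudy (C), 2 = Sunny (S).
A = [[0.4, 0.3, 0.3],
     [0.2, 0.6, 0.2],
     [0.1, 0.1, 0.8]]

def sequence_prob(seq, A, prior):
    """P(q_1, ..., q_T) = prior[q_1] * prod_t A[q_{t-1}][q_t]."""
    p = prior[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev][cur]
    return p

# O = (S, S, S, R, R, S, C, S); "today is S" makes the prior on S equal to 1.
O = [2, 2, 2, 0, 0, 2, 1, 2]
prior = [0.0, 0.0, 1.0]
print(sequence_prob(O, A, prior))  # ≈ 1.536e-4
```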

Markov Models: Example

What is the probability that the sequence remains in state i for exactly d time units?

    p_i(d) = P(q_2 = i, ..., q_d = i, q_{d+1} ≠ i | q_1 = i)
           = (a_ii)^{d-1} (1 - a_ii)

This is the exponential (geometric) Markov chain duration density.

What is the expected value of the duration d in state i?

    d̄_i = Σ_{d=1}^{∞} d p_i(d)
        = Σ_{d=1}^{∞} d (a_ii)^{d-1} (1 - a_ii)
        = (1 - a_ii) Σ_{d=1}^{∞} d (a_ii)^{d-1}
        = (1 - a_ii) ∂/∂a_ii [ Σ_{d=1}^{∞} (a_ii)^d ]
        = (1 - a_ii) ∂/∂a_ii [ a_ii / (1 - a_ii) ]
        = 1 / (1 - a_ii)

Markov Models: Example

Avg. number of consecutive sunny days
    = 1 / (1 - a_33) = 1 / (1 - 0.8) = 5

Avg. number of consecutive cloudy days = 1 / (1 - 0.6) = 2.5
Avg. number of consecutive rainy days = 1 / (1 - 0.4) ≈ 1.67
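The closed form 1/(1 - a_ii) can be checked numerically against a truncated sum of d·p_i(d). A minimal sketch; the truncation limit 10000 is an arbitrary choice:

```python
# Expected state duration: closed form 1/(1 - a_ii) vs. truncated sum of d * p_i(d),
# where p_i(d) = a_ii**(d-1) * (1 - a_ii) is the geometric duration density.
A = [[0.4, 0.3, 0.3],   # Rainy
     [0.2, 0.6, 0.2],   # Cloudy
     [0.1, 0.1, 0.8]]   # Sunny

def expected_duration(a_ii, max_d=10000):
    return sum(d * a_ii ** (d - 1) * (1 - a_ii) for d in range(1, max_d + 1))

for name, i in [("Rainy", 0), ("Cloudy", 1), ("Sunny", 2)]:
    a = A[i][i]
    print(name, round(expected_duration(a), 4), round(1 / (1 - a), 4))
```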

Hidden Markov Models

- States are not observable.
- Observations are probabilistic functions of state.
- State transitions are still probabilistic.

Urn and Ball Model

- N urns containing colored balls
- M distinct colors of balls
- Each urn has a possibly different distribution of colors

Sequence generation algorithm:
1. Pick an initial urn according to some random process.
2. Randomly pick a ball from the urn and then replace it.
3. Select another urn according to a random selection process associated with the current urn.
4. Repeat steps 2 and 3.
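The four steps above can be sketched directly as a sampler: the hidden state is the current urn and the observation is the drawn ball's color. The specific probabilities below are illustrative, not from the slides:

```python
import random

# Urn-and-ball HMM sampler: hidden state = current urn, observation = ball color.
# N = 2 urns, M = 3 colors; the numbers below are illustrative.
A  = [[0.6, 0.4],          # urn-to-urn transition probabilities
      [0.3, 0.7]]
B  = [[0.7, 0.2, 0.1],     # color distribution of urn 0
      [0.1, 0.3, 0.6]]     # color distribution of urn 1
pi = [0.5, 0.5]            # initial urn distribution

def sample(T, A, B, pi, rng):
    states, obs = [], []
    q = rng.choices(range(len(pi)), weights=pi)[0]                  # step 1: initial urn
    for _ in range(T):
        states.append(q)
        obs.append(rng.choices(range(len(B[q])), weights=B[q])[0])  # step 2: draw a ball
        q = rng.choices(range(len(A[q])), weights=A[q])[0]          # step 3: next urn
    return states, obs                                              # step 4: repeat

rng = random.Random(0)
states, obs = sample(10, A, B, pi, rng)
print(states, obs)
```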

The Trellis

[Figure: a trellis with states 1, ..., N on the vertical axis and time 1, ..., T on the horizontal axis; observations o_1, o_2, ..., o_T are emitted along the time axis.]

Elements of Hidden Markov Models

- N: the number of hidden states
- Q: the set of states, Q = {1, 2, ..., N}
- M: the number of symbols
- V: the set of symbols, V = {1, 2, ..., M}
- A: the state-transition probability matrix,
      a_ij = P(q_{t+1} = j | q_t = i),    1 ≤ i, j ≤ N
- B: the observation probability distribution,
      b_j(k) = P(o_t = k | q_t = j),    1 ≤ k ≤ M
- π: the initial state distribution,
      π_i = P(q_1 = i),    1 ≤ i ≤ N
- λ: the entire model, λ = (A, B, π)
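As a concrete representation, λ = (A, B, π) can be held as plain lists with the stochasticity constraints checked. A minimal sketch; the particular numbers (N = 2, M = 3) are illustrative, not from the slides:

```python
# An HMM λ = (A, B, π) as plain Python lists, with stochasticity constraints checked.
A  = [[0.6, 0.4], [0.3, 0.7]]            # A[i][j] = P(q_{t+1} = j | q_t = i)
B  = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]  # B[j][k] = P(o_t = k | q_t = j)
pi = [0.5, 0.5]                          # pi[i]   = P(q_1 = i)

def check_model(A, B, pi, tol=1e-9):
    N, M = len(B), len(B[0])
    assert len(A) == N and all(len(row) == N for row in A) and len(pi) == N
    assert all(abs(sum(row) - 1) < tol for row in A)   # each row of A is a distribution
    assert all(abs(sum(row) - 1) < tol for row in B)   # each row of B is a distribution
    assert abs(sum(pi) - 1) < tol
    return N, M

print(check_model(A, B, pi))  # (2, 3)
```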

Three Basic Problems

1. Given observation O = (o_1, o_2, ..., o_T) and model λ = (A, B, π), efficiently compute P(O | λ).
   - Hidden states complicate the evaluation.
   - Given two models λ_1 and λ_2, this can be used to choose the better one.
2. Given observation O = (o_1, o_2, ..., o_T) and model λ, find the optimal state sequence q = (q_1, q_2, ..., q_T).
   - The optimality criterion has to be decided (e.g. maximum likelihood).
   - This is an "explanation" for the data.
3. Given O = (o_1, o_2, ..., o_T), estimate model parameters λ = (A, B, π) that maximize P(O | λ).

Solution to Problem 1

Problem: Compute P(o_1, o_2, ..., o_T | λ).

Algorithm: Let q = (q_1, q_2, ..., q_T) be a state sequence. Assume the observations are independent given the states:

    P(O | q, λ) = Π_{t=1}^{T} P(o_t | q_t, λ)
                = b_{q_1}(o_1) b_{q_2}(o_2) ... b_{q_T}(o_T)

The probability of a particular state sequence is:

    P(q | λ) = π_{q_1} a_{q_1 q_2} a_{q_2 q_3} ... a_{q_{T-1} q_T}

Also, P(O, q | λ) = P(O | q, λ) P(q | λ).

Enumerate all paths and sum the probabilities:

    P(O | λ) = Σ_q P(O | q, λ) P(q | λ)

There are N^T state sequences, each requiring O(T) calculations. Complexity: O(T N^T) calculations.
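The enumeration above can be written directly; this sketch is only practical for tiny N and T, which is the point of the complexity remark. The model numbers are illustrative, not from the slides:

```python
from itertools import product

# Brute-force evaluation of P(O | λ): enumerate all N**T state paths and sum
# P(O | q, λ) P(q | λ).
A  = [[0.6, 0.4], [0.3, 0.7]]
B  = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
pi = [0.5, 0.5]

def brute_force_likelihood(O, A, B, pi):
    N, total = len(pi), 0.0
    for q in product(range(N), repeat=len(O)):          # all N**T paths
        p = pi[q[0]] * B[q[0]][O[0]]                    # π_{q_1} b_{q_1}(o_1)
        for t in range(1, len(O)):
            p *= A[q[t-1]][q[t]] * B[q[t]][O[t]]        # a_{q_{t-1} q_t} b_{q_t}(o_t)
        total += p
    return total

O = [0, 2, 1, 0]
print(brute_force_likelihood(O, A, B, pi))
```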

Forward Procedure: Intuition

[Figure: states 1, ..., N at time t feed state k at time t+1 through transitions a_1k, ..., a_Nk; the probability mass arriving in state k is accumulated over all predecessors.]

Forward Algorithm

Define the forward variable α_t(i) as:

    α_t(i) = P(o_1, o_2, ..., o_t, q_t = i | λ)

α_t(i) is the probability of observing the partial sequence (o_1, o_2, ..., o_t) such that the state at time t is i.

Induction:
1. Initialization:
       α_1(i) = π_i b_i(o_1)
2. Induction:
       α_{t+1}(j) = [ Σ_{i=1}^{N} α_t(i) a_ij ] b_j(o_{t+1})
3. Termination:
       P(O | λ) = Σ_{i=1}^{N} α_T(i)

Complexity: O(N^2 T).
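The recursion can be sketched and checked against brute-force path enumeration; both should give the same P(O | λ). The model numbers are illustrative, not from the slides:

```python
from itertools import product

# Forward algorithm vs. brute-force enumeration.
A  = [[0.6, 0.4], [0.3, 0.7]]
B  = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
pi = [0.5, 0.5]

def forward(O, A, B, pi):
    N = len(pi)
    alpha = [pi[i] * B[i][O[0]] for i in range(N)]             # initialization
    for t in range(1, len(O)):                                  # induction
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][O[t]]
                 for j in range(N)]
    return sum(alpha)                                           # termination

def brute_force(O, A, B, pi):
    N, total = len(pi), 0.0
    for q in product(range(N), repeat=len(O)):
        p = pi[q[0]] * B[q[0]][O[0]]
        for t in range(1, len(O)):
            p *= A[q[t-1]][q[t]] * B[q[t]][O[t]]
        total += p
    return total

O = [0, 2, 1, 0, 1]
print(abs(forward(O, A, B, pi) - brute_force(O, A, B, pi)) < 1e-12)  # True
```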

Example

Consider the following coin-tossing experiment:

              State 1   State 2   State 3
    P(H)        0.5       0.75      0.25
    P(T)        0.5       0.25      0.75

All state-transition probabilities equal 1/3, and all initial state probabilities equal 1/3.

1. You observe O = (H, H, H, H, T, H, T, T, T, T). What state sequence q is most likely? What is the joint probability, P(O, q | λ), of the observation sequence and the state sequence?
2. What is the probability that the observation sequence came entirely from state 1?
3. Consider the observation sequence
       Õ = (H, T, T, H, T, H, H, T, T, H).
   How would your answers to parts 1 and 2 change?
4. If the state transition probabilities were

              | 0.9   0.05  0.05 |
       A' =   | 0.45  0.1   0.45 |
              | 0.45  0.45  0.1  |

   how would the new model λ' change your answers to parts 1-3?

Backward Algorithm

[Figure: a trellis in which β_t(i) at state i, time t is computed from the transitions a_i1, ..., a_iN into the states at time t+1; observations o_1, ..., o_T run along the time axis.]

Backward Algorithm

Define the backward variable β_t(i) as:

    β_t(i) = P(o_{t+1}, o_{t+2}, ..., o_T | q_t = i, λ)

β_t(i) is the probability of observing the partial sequence (o_{t+1}, o_{t+2}, ..., o_T) given that the state at time t is i.

Induction:
1. Initialization:
       β_T(i) = 1
2. Induction:
       β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j),    1 ≤ i ≤ N,  t = T-1, ..., 1
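A sketch of the backward recursion; the identity P(O | λ) = Σ_i π_i b_i(o_1) β_1(i) gives a check against the forward recursion. The model numbers are illustrative, not from the slides:

```python
# Backward algorithm, checked against the forward recursion.
A  = [[0.6, 0.4], [0.3, 0.7]]
B  = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
pi = [0.5, 0.5]

def backward(O, A, B, pi):
    N, T = len(pi), len(O)
    beta = [[0.0] * N for _ in range(T)]
    beta[T-1] = [1.0] * N                                     # initialization
    for t in range(T - 2, -1, -1):                            # induction, t = T-1, ..., 1
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j] for j in range(N))
    return beta

def forward_likelihood(O, A, B, pi):
    N = len(pi)
    alpha = [pi[i] * B[i][O[0]] for i in range(N)]
    for t in range(1, len(O)):
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][O[t]] for j in range(N)]
    return sum(alpha)

O = [0, 2, 1, 0]
beta = backward(O, A, B, pi)
lik = sum(pi[i] * B[i][O[0]] * beta[0][i] for i in range(len(pi)))
print(abs(lik - forward_likelihood(O, A, B, pi)) < 1e-12)  # True
```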

Solution to Problem 2

Choose the most likely path: find the path (q_1, q_2, ..., q_T) that maximizes the likelihood P(q_1, q_2, ..., q_T | O, λ).

Solution by dynamic programming. Define:

    δ_t(i) = max_{q_1, q_2, ..., q_{t-1}} P(q_1, q_2, ..., q_t = i, o_1, o_2, ..., o_t | λ)

δ_t(i) is the probability of the highest-probability path ending in state i at time t.

By induction we have:

    δ_{t+1}(j) = [ max_i δ_t(i) a_ij ] b_j(o_{t+1})

Viterbi Algorithm

[Figure: a trellis in which state k at time t+1 is reached through transitions a_1k, ..., a_Nk; the recursion keeps only the highest-probability incoming path.]

Viterbi Algorithm

Initialization:
    δ_1(i) = π_i b_i(o_1),    1 ≤ i ≤ N
    ψ_1(i) = 0

Recursion (2 ≤ t ≤ T, 1 ≤ j ≤ N):
    δ_t(j) = max_{1≤i≤N} [ δ_{t-1}(i) a_ij ] b_j(o_t)
    ψ_t(j) = argmax_{1≤i≤N} [ δ_{t-1}(i) a_ij ]

Termination:
    P* = max_{1≤i≤N} δ_T(i)
    q*_T = argmax_{1≤i≤N} δ_T(i)

Path (state sequence) backtracking:
    q*_t = ψ_{t+1}(q*_{t+1}),    t = T-1, T-2, ..., 1
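The δ/ψ recursions translate directly into code. A minimal sketch on an illustrative two-state model (not the coin example from the slides):

```python
# Viterbi decoding following the δ/ψ recursions.
A  = [[0.6, 0.4], [0.3, 0.7]]
B  = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
pi = [0.5, 0.5]

def viterbi(O, A, B, pi):
    N, T = len(pi), len(O)
    delta = [pi[i] * B[i][O[0]] for i in range(N)]            # initialization
    psi = [[0] * N for _ in range(T)]
    for t in range(1, T):                                     # recursion
        new = [0.0] * N
        for j in range(N):
            i_best = max(range(N), key=lambda i: delta[i] * A[i][j])
            psi[t][j] = i_best
            new[j] = delta[i_best] * A[i_best][j] * B[j][O[t]]
        delta = new
    q = [0] * T
    q[T-1] = max(range(N), key=lambda i: delta[i])            # termination
    p_star = delta[q[T-1]]
    for t in range(T - 2, -1, -1):                            # backtracking
        q[t] = psi[t+1][q[t+1]]
    return q, p_star

O = [0, 0, 2, 2]
path, p = viterbi(O, A, B, pi)
print(path, p)  # [0, 0, 1, 1] with probability ≈ 0.0148176
```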

Solution to Problem 3

Estimate λ = (A, B, π) to maximize P(O | λ). There is no analytic method because of the complexity, so an iterative solution is used.

Baum-Welch Algorithm:
1. Let the initial model be λ_0.
2. Compute a new model λ based on λ_0 and the observation O.
3. If log P(O | λ) - log P(O | λ_0) < DELTA, stop.
4. Else set λ_0 ← λ and go to step 2.

Baum-Welch: Preliminaries

Define ξ_t(i, j) as the probability of being in state i at time t and in state j at time t+1:

    ξ_t(i, j) = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / P(O | λ)
              = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / [ Σ_{i=1}^{N} Σ_{j=1}^{N} α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) ]

Define γ_t(i) as the probability of being in state i at time t, given the observation sequence:

    γ_t(i) = Σ_{j=1}^{N} ξ_t(i, j)

Σ_{t=1}^{T} γ_t(i) is the expected number of times state i is visited.

Σ_{t=1}^{T-1} ξ_t(i, j) is the expected number of transitions from state i to state j.

HMM Tutorial c Tapas Kanungo 27

Baum-Welch: Update Rules


  = expected frequency in state i at time t = 1 = 1i: a  = expected number of transition from state i to state j  expected nubmer of transitions from state i: P  i; j  a  = P i
t ij t j

 b k = expected number of

times in state j and observing symbol k expected number of times in state j : P


 b k =
j t;o

t =k
t

j  j 
t

&
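One re-estimation step can be sketched using the ξ and γ quantities just defined; since this is an EM step, the likelihood should not decrease. The model and observation sequence below are illustrative, not from the slides:

```python
# One Baum-Welch re-estimation step (EM), using ξ_t(i,j) and γ_t(i).
A  = [[0.6, 0.4], [0.3, 0.7]]
B  = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
pi = [0.5, 0.5]
O  = [0, 2, 1, 0, 2, 2]

def forward_backward(O, A, B, pi):
    N, T = len(pi), len(O)
    alpha = [[0.0] * N for _ in range(T)]
    beta  = [[1.0] * N for _ in range(T)]
    alpha[0] = [pi[i] * B[i][O[0]] for i in range(N)]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j] for j in range(N))
    return alpha, beta, sum(alpha[T-1])

def baum_welch_step(O, A, B, pi):
    N, M, T = len(pi), len(B[0]), len(O)
    alpha, beta, PO = forward_backward(O, A, B, pi)
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t+1]] * beta[t+1][j] / PO
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # γ_t(i) = α_t(i) β_t(i) / P(O|λ), equivalent to Σ_j ξ_t(i, j) for t < T
    gamma = [[alpha[t][i] * beta[t][i] / PO for i in range(N)] for t in range(T)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][j] for t in range(T) if O[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(M)] for j in range(N)]
    return new_A, new_B, new_pi

A2, B2, pi2 = baum_welch_step(O, A, B, pi)
_, _, before = forward_backward(O, A, B, pi)
_, _, after  = forward_backward(O, A2, B2, pi2)
print(after >= before)  # EM never decreases the likelihood → True
```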

Properties

- Covariance of the estimated parameters
- Convergence rates

Types of HMM

- Ergodic
- Continuous density
- State duration

Implementation Issues

- Scaling
- Initial parameters
- Multiple observation sequences
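On scaling: for long sequences the forward variables α_t(i) underflow, since they are products of many probabilities. One standard remedy (sketched here; the slides name the issue but do not spell out this particular fix) is to run the recursion in log space with log-sum-exp. The model numbers are illustrative:

```python
import math

# Log-space forward recursion with log-sum-exp, avoiding underflow of α_t(i).
A  = [[0.6, 0.4], [0.3, 0.7]]
B  = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
pi = [0.5, 0.5]

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_forward(O, A, B, pi):
    N = len(pi)
    la = [math.log(pi[i]) + math.log(B[i][O[0]]) for i in range(N)]
    for t in range(1, len(O)):
        la = [logsumexp([la[i] + math.log(A[i][j]) for i in range(N)]) +
              math.log(B[j][O[t]]) for j in range(N)]
    return logsumexp(la)      # log P(O | λ)

O = [0, 2, 1] * 200           # T = 600: a direct product of probabilities would underflow
print(log_forward(O, A, B, pi))
```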

Comparison of HMMs

What is a natural distance function? If D(λ_1, λ_2) is large, does it mean that the models are really different?
