
Hidden Markov Models

Machine Learning 

CSx824/ECEx242
Bert Huang
Virginia Tech
Outline

• Hidden Markov models (HMMs)

• Forward-backward for HMMs

• Baum-Welch learning (expectation maximization)


Hidden State Transitions

[Figure, repeated across three slides: a submarine moving between locations that are hidden, marked with question marks]
Hidden Markov Models
p(y_t | x_t): observation probability (SONAR noisiness)

p(x_t | x_{t-1}): transition probability (submarine locomotion)


p(X, Y) = p(x_1) ∏_{t=1}^{T-1} p(x_{t+1} | x_t) ∏_{t'=1}^{T} p(y_{t'} | x_{t'})

[Graphical model: chain x_1 → x_2 → x_3 → x_4 → x_5, with each state x_t emitting an observation y_t]
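To make the factorization concrete, here is a minimal sketch (not from the slides) that evaluates the joint probability of one state/observation sequence for a discrete HMM; pi, A, and B are illustrative names for the initial, transition, and observation distributions:

```python
import numpy as np

# A minimal sketch (not from the slides) of evaluating the joint
#   p(X, Y) = p(x_1) * prod_{t=1}^{T-1} p(x_{t+1} | x_t) * prod_{t=1}^{T} p(y_t | x_t)
# for a discrete HMM. The names pi, A, B are illustrative:
#   pi[i]   = p(x_1 = i)                 initial state distribution
#   A[i, j] = p(x_{t+1} = j | x_t = i)   transition matrix
#   B[i, k] = p(y_t = k | x_t = i)       observation (emission) matrix
def joint_probability(x, y, pi, A, B):
    prob = pi[x[0]] * B[x[0], y[0]]
    for t in range(1, len(x)):
        prob *= A[x[t - 1], x[t]] * B[x[t], y[t]]
    return prob
```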
Hidden State Inference
Inference targets: p(X | Y) and p(x_t | Y)

α_t(x_t) = p(x_t, y_1, ..., y_t)        β_t(x_t) = p(y_{t+1}, ..., y_T | x_t)

α_t(x_t) β_t(x_t) = p(x_t, y_1, ..., y_t) p(y_{t+1}, ..., y_T | x_t) = p(x_t, Y) ∝ p(x_t | Y)

normalize to get the conditional probability

note: not the same as p(x_1, ..., x_T, Y)


Forward Inference
α_t(x_t) = p(x_t, y_1, ..., y_t)

p(x_1, y_1) = p(x_1) p(y_1 | x_1) = α_1(x_1)

p(x_2, y_1, y_2) = Σ_{x_1} p(x_1, y_1) p(x_2 | x_1) p(y_2 | x_2), so α_2(x_2) = Σ_{x_1} α_1(x_1) p(x_2 | x_1) p(y_2 | x_2)

p(x_{t+1}, y_1, ..., y_{t+1}) = α_{t+1}(x_{t+1}) = Σ_{x_t} α_t(x_t) p(x_{t+1} | x_t) p(y_{t+1} | x_{t+1})
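As a concrete illustration of this recursion, here is a minimal (unscaled) forward pass in NumPy; the arrays pi, A, and B are the same illustrative stand-ins for p(x_1), p(x_{t+1} | x_t), and p(y_t | x_t) used above, with integer-coded observations:

```python
import numpy as np

# Minimal sketch of the (unscaled) forward recursion.
def forward(y, pi, A, B):
    T, n_states = len(y), len(pi)
    alpha = np.zeros((T, n_states))
    alpha[0] = pi * B[:, y[0]]                         # alpha_1(x_1) = p(x_1) p(y_1 | x_1)
    for t in range(T - 1):
        # alpha_{t+1}(x_{t+1}) = sum_{x_t} alpha_t(x_t) p(x_{t+1} | x_t) p(y_{t+1} | x_{t+1})
        alpha[t + 1] = (alpha[t] @ A) * B[:, y[t + 1]]
    return alpha
```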
Backward Inference
β_t(x_t) = p(y_{t+1}, ..., y_T | x_t)

p({} | x_T) = 1 = β_T(x_T)

β_{t-1}(x_{t-1}) = p(y_t, ..., y_T | x_{t-1}) = Σ_{x_t} p(x_t | x_{t-1}) p(y_t, y_{t+1}, ..., y_T | x_t)

                 = Σ_{x_t} p(x_t | x_{t-1}) p(y_t | x_t) p(y_{t+1}, ..., y_T | x_t)

                 = Σ_{x_t} p(x_t | x_{t-1}) p(y_t | x_t) β_t(x_t)
Backward Inference
β_t(x_t) = p(y_{t+1}, ..., y_T | x_t)

p({} | x_T) = 1 = β_T(x_T)

β_{t-1}(x_{t-1}) = p(y_t, ..., y_T | x_{t-1}) = Σ_{x_t} p(x_t | x_{t-1}) p(y_t | x_t) β_t(x_t)
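A matching sketch of the backward pass under the same illustrative setup; it fills β_t from β_{t+1}, which is the recursion above with the index shifted by one:

```python
import numpy as np

# Minimal sketch of the (unscaled) backward recursion, using the same
# illustrative A, B arrays and integer-coded observations y.
def backward(y, A, B):
    T, n_states = len(y), A.shape[0]
    beta = np.zeros((T, n_states))
    beta[T - 1] = 1.0                                  # beta_T(x_T) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(x_t) = sum_{x_{t+1}} p(x_{t+1} | x_t) p(y_{t+1} | x_{t+1}) beta_{t+1}(x_{t+1})
        beta[t] = A @ (B[:, y[t + 1]] * beta[t + 1])
    return beta
```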
Fusing the Messages
α_t(x_t) = p(x_t, y_1, ..., y_t)        β_t(x_t) = p(y_{t+1}, ..., y_T | x_t)

α_t(x_t) β_t(x_t) = p(x_t, y_1, ..., y_t) p(y_{t+1}, ..., y_T | x_t) = p(x_t, Y) ∝ p(x_t | Y)

p(x_t, x_{t+1} | Y) = p(x_t, x_{t+1}, y_1, ..., y_t, y_{t+1}, y_{t+2}, ..., y_T) / p(Y)

  = p(x_t, y_1, ..., y_t) p(x_{t+1} | x_t) p(y_{t+2}, ..., y_T | x_{t+1}) p(y_{t+1} | x_{t+1}) / Σ_{x_T} p(x_T, Y)

  = α_t(x_t) p(x_{t+1} | x_t) β_{t+1}(x_{t+1}) p(y_{t+1} | x_{t+1}) / Σ_{x_T} α_T(x_T)
Forward-Backward Inference
α_1(x_1) = p(x_1) p(y_1 | x_1)        α_{t+1}(x_{t+1}) = Σ_{x_t} α_t(x_t) p(x_{t+1} | x_t) p(y_{t+1} | x_{t+1})

β_T(x_T) = 1                          β_{t-1}(x_{t-1}) = Σ_{x_t} p(x_t | x_{t-1}) p(y_t | x_t) β_t(x_t)

p(x_t, Y) = α_t(x_t) β_t(x_t)         p(x_t | Y) = α_t(x_t) β_t(x_t) / Σ_{x_t'} α_t(x_t') β_t(x_t')

p(x_t, x_{t+1} | Y) = α_t(x_t) p(x_{t+1} | x_t) β_{t+1}(x_{t+1}) p(y_{t+1} | x_{t+1}) / Σ_{x_T} α_T(x_T)
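Putting the pieces together, a minimal sketch that fuses the forward and backward messages into the state marginals and pairwise posteriors, reusing the illustrative forward() and backward() functions above:

```python
import numpy as np

# Minimal sketch: fuse forward and backward messages into posteriors.
def posteriors(y, pi, A, B):
    alpha = forward(y, pi, A, B)
    beta = backward(y, A, B)
    p_Y = alpha[-1].sum()                              # p(Y) = sum_{x_T} alpha_T(x_T)

    # State marginals: p(x_t | Y) = alpha_t(x_t) beta_t(x_t) / p(Y)
    gamma = alpha * beta / p_Y

    # Pairwise posteriors: p(x_t = i, x_{t+1} = j | Y)
    n_states = len(pi)
    xi = np.zeros((len(y) - 1, n_states, n_states))
    for t in range(len(y) - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, y[t + 1]] * beta[t + 1])[None, :] / p_Y
    return gamma, xi
```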
Normalization
To avoid underflow, re-normalize at each time step

α̃_t(x_t) = α_t(x_t) / Σ_{x_t'} α_t(x_t')        β̃_t(x_t) = β_t(x_t) / Σ_{x_t'} β_t(x_t')

Exercise: why is this okay?
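One way this can look in code (a sketch, not taken from the slides): rescale α_t by its sum over states at each step. The per-step normalizers also recover the log-likelihood, since their product equals p(Y):

```python
import numpy as np

# Minimal sketch of the forward recursion with per-step renormalization.
# Rescaling keeps the messages from underflowing on long sequences, and the
# per-step normalizers multiply out to p(Y), giving the log-likelihood.
def forward_scaled(y, pi, A, B):
    T, n_states = len(y), len(pi)
    alpha_tilde = np.zeros((T, n_states))
    log_likelihood = 0.0
    for t in range(T):
        a = pi * B[:, y[0]] if t == 0 else (alpha_tilde[t - 1] @ A) * B[:, y[t]]
        z = a.sum()                                    # per-step normalizer
        alpha_tilde[t] = a / z
        log_likelihood += np.log(z)
    return alpha_tilde, log_likelihood
```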


Learning
• Parameterize and learn


p(x_{t+1} | x_t): conditional probability table (transition matrix)

p(y_t | x_t): observation model (emission model)
• If x is fully observed, learning is super easy (see the sketch after this list)

• If x is hidden (the usual case), treat it as a latent variable

• E.g., expectation maximization
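For the fully observed case, maximum likelihood amounts to counting and normalizing; a minimal sketch with illustrative names, assuming integer-coded states and observations:

```python
import numpy as np

# Minimal sketch: maximum-likelihood estimation when the state sequence is
# fully observed. With integer-coded states x and observations y, the MLE
# is just counting transitions and emissions, then normalizing the rows.
def mle_fully_observed(x, y, n_states, n_obs):
    A = np.zeros((n_states, n_states))                 # transition counts
    B = np.zeros((n_states, n_obs))                    # emission counts
    for t in range(len(x) - 1):
        A[x[t], x[t + 1]] += 1
    for t in range(len(x)):
        B[x[t], y[t]] += 1
    A /= A.sum(axis=1, keepdims=True)                  # rows -> p(x_{t+1} | x_t)
    B /= B.sum(axis=1, keepdims=True)                  # rows -> p(y_t | x_t)
    return A, B
```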


Baum-Welch Algorithm

EM using forward-backward inference as E-step


Baum-Welch Details
Compute p(x_t | Y) and p(x_t, x_{t+1} | Y) using forward-backward

Maximize the weighted (expected) log-likelihood:

Initial distribution:   p(x_1) ← (1/T) Σ_{t=1}^{T} p(x_t | Y)   or   p(x_1 | Y)

Transitions:   p(x_{t'+1} = i | x_{t'} = j) ← Σ_{t=1}^{T-1} p(x_{t+1} = i, x_t = j | Y) / Σ_{t=1}^{T-1} p(x_t = j | Y)

Emissions (e.g., Gaussian mean):   μ_x ← Σ_{t=1}^{T} p(x_t = x | Y) y_t / Σ_{t'=1}^{T} p(x_{t'} = x | Y)

Emissions (e.g., multinomial):   p(y | x) ← Σ_{t=1}^{T} p(x_t = x | Y) I(y_t = y) / Σ_{t'=1}^{T} p(x_{t'} = x | Y)
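A compact sketch of one such iteration under the same illustrative setup (discrete emissions, a single sequence), reusing the posteriors() function sketched above; a real implementation would repeat this until the likelihood converges:

```python
import numpy as np

# Minimal sketch of one Baum-Welch (EM) iteration for a discrete-emission HMM.
def baum_welch_step(y, pi, A, B):
    y = np.asarray(y)

    # E-step: gamma[t, i] = p(x_t = i | Y), xi[t, i, j] = p(x_t = i, x_{t+1} = j | Y)
    gamma, xi = posteriors(y, pi, A, B)

    # M-step: re-estimate parameters from expected counts
    pi_new = gamma[0]                                  # p(x_1) <- p(x_1 | Y)
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[y == k].sum(axis=0)        # expected emission counts
    B_new /= gamma.sum(axis=0)[:, None]
    return pi_new, A_new, B_new
```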
Summary
• HMMs represent hidden states

• Transitions between adjacent states

• Observations based on the hidden states

• Forward-backward inference to incorporate all evidence

• Expectation maximization to train parameters (Baum-Welch)

• Treat states as latent variables
