
Understanding Hidden Markov Models

Part 1: Overview of the Hidden Markov Model

Markov models are widely used in speech recognition; in biology, for tasks such as gene recognition and protein classification; in signal processing and image processing; and in other applications that involve sequences of transitions or combinations of components and events. In the electric power sector, Markov models are used as a tool for forecasting electricity prices from related data.

One of the characteristic parameters of a Markov model is its set of states. The states differ depending on what is being modeled. Markov models have limitations in many applications: the state transition values are fixed in advance and do not change, whereas the objects and data being observed vary over time. To overcome this, we use the Hidden Markov Model (HMM).

A Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the task is to determine the hidden parameters from the observable ones under that assumption. The extracted model parameters can then be used for further analysis, for example in pattern recognition applications.

In an ordinary Markov model, the state is directly visible to the observer, so the state transition probabilities are the only parameters. A hidden Markov model adds outputs: each state has a probability distribution over the possible output symbols. As a result, the sequence of symbols generated by an HMM does not directly reveal the sequence of states.

Many practical problems are expressed as cause-and-effect relationships in which only the effect is observed and the cause is hidden. HMMs are used to solve problems of establishing such local cause-and-effect relationships (Fragmentation, Classification, Similarity Search).

A hidden Markov model consists of the following parameters:

1) $N$, the number of states in the model; these states are hidden. They are denoted $S = (S_1, \dots, S_N)$, the set of all hidden states.
2) $M$, the number of distinct observation symbols per state. They are denoted $V = (V_1, \dots, V_M)$, the set of all observable symbols.
3) $A = [a_{ij}]$, the state transition probabilities, defined by
$$a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i), \qquad 1 \le i, j \le N.$$
In the special case where any state can reach any other state in a single step, $a_{ij} > 0$ for all $i$ and $j$. In other types of HMM, $a_{ij} = 0$ for some pairs $(i, j)$.
4) $B = [b_j(k)]$, the symbol emission probabilities, $b_j(k) = P(O_t = V_k \mid q_t = S_j)$.
5) $\pi = [\pi_i]$, the initial state probabilities, $\pi_i = P(q_1 = S_i)$.
6) $q_t$, the state at time $t$.
7) $O_t$, the symbol observed at time $t$.

Given appropriate values of $N$, $M$, $A$, $B$ and $\pi$, the HMM generates an observation sequence $O = O_1 O_2 \dots O_T$ (where each $O_t$ is one of the symbols in $V$ and $T$ is the number of observations in the sequence) as follows:

1) Choose the initial state $q_1 = S_i$ according to the initial state probabilities $\pi$.
2) Set $t = 1$.
3) Choose $O_t = V_k$ according to the emission probabilities $b_i(k)$ of the current hidden state $S_i$.
4) Move to a new state $q_{t+1} = S_j$ according to the transition probabilities $a_{ij}$ of the current state $S_i$.
5) Set $t = t + 1$. If $t \le T$, return to step 3; otherwise terminate.

Applications of the hidden Markov model

The model is widely used in fields such as:

- Speech recognition.
- Handwriting recognition.
- Statistical natural language processing.
- Machine translation.
- Bioinformatics: approximate matching of multiple sequences, motif finding, similarity search.
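The five-step generation procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_hmm`, the list-of-lists layout for `A` and `B`, and the example numbers are not from the original text, and states and symbols are represented by their 0-based indices.

```python
import random


def sample_hmm(pi, A, B, T, rng=None):
    """Draw (states, observations), each of length T, from an HMM.

    pi[i]   : initial probability of state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability of emitting symbol k while in state i
    """
    rng = rng or random.Random()
    state_ids = range(len(pi))
    symbol_ids = range(len(B[0]))

    # Step 1: choose the initial state according to pi.
    state = rng.choices(state_ids, weights=pi)[0]
    states, observations = [], []
    # Steps 2 and 5: the loop plays the role of the clock t = 1..T.
    for _ in range(T):
        # Step 3: emit a symbol according to the current state's row of B.
        symbol = rng.choices(symbol_ids, weights=B[state])[0]
        states.append(state)
        observations.append(symbol)
        # Step 4: move to the next state according to row `state` of A.
        state = rng.choices(state_ids, weights=A[state])[0]
    return states, observations


# Hypothetical two-state, two-symbol model (illustrative numbers only).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
hidden, visible = sample_hmm(pi, A, B, T=10, rng=random.Random(0))
print(hidden, visible)
```

Only `visible` would be available to an observer; `hidden` is returned here purely so the two sequences can be compared.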

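The Markov property

A random sequence of states is said to have the Markov property if the probability of moving to the next state depends only on the current state and not on the earlier history:
$$P(q_{t+1} \mid q_t, q_{t-1}, \dots, q_1) = P(q_{t+1} \mid q_t).$$

- If the sequence of state transitions is observable, the model is a Markov chain.
- If the sequence of state transitions is not observable, the model is a hidden Markov model.

The hidden Markov model (HMM)

In the state-transition graph of an HMM:

- The nodes are the states.
- The arcs are the transitions, each labeled with a probability.
- The nodes emit symbols according to their emission probabilities.
- The emitted symbols are observed, while the sequence of state transitions is hidden.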

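Example: weather forecasting

Suppose you are locked in a room for several days and are asked about the weather outside. The only clue you have is whether the caretaker who brings your meals comes in carrying an umbrella. Assume the following probabilities:

Weather    Probability that the caretaker carries an umbrella
Sunny      0.1
Rainy      0.8
Foggy      0.3

The Markov expression for predicting the weather before you were locked in is
$$P(w_1, \dots, w_n) = P(w_1) \prod_{i=2}^{n} P(w_i \mid w_{i-1}),$$
where $w_i$ is the weather on day $i$.

Now the weather is hidden from you. We use Bayes' rule:
$$P(w_1, \dots, w_n \mid u_1, \dots, u_n) = \frac{P(u_1, \dots, u_n \mid w_1, \dots, w_n)\, P(w_1, \dots, w_n)}{P(u_1, \dots, u_n)},$$
where $u_i$ is TRUE if the caretaker carried an umbrella on day $i$ and FALSE otherwise. The probability $P(w_1, \dots, w_n)$ is determined as in the Markov model, and $P(u_1, \dots, u_n)$ is the prior probability of seeing a particular umbrella sequence (for example TRUE, FALSE, TRUE). The probability $P(u_1, \dots, u_n \mid w_1, \dots, w_n)$ can be estimated as $\prod_{i=1}^{n} P(u_i \mid w_i)$ if we assume that, given $w_i$, the value $u_i$ is independent of all $u_j$ and $w_j$ with $j \ne i$.

Worked examples

Suppose the day you were locked in was sunny. The next day, the caretaker carried an umbrella into the room. Assuming that the prior probability of the caretaker carrying an umbrella on any given day is 0.5, the probability that the second day was rainy is
$$P(w_2 = \text{Rainy} \mid w_1 = \text{Sunny}, u_2 = \text{TRUE}) = \frac{P(u_2 = \text{TRUE} \mid w_2 = \text{Rainy})\, P(w_2 = \text{Rainy} \mid w_1 = \text{Sunny})}{P(u_2 = \text{TRUE})} = \frac{0.8 \cdot P(w_2 = \text{Rainy} \mid w_1 = \text{Sunny})}{0.5}.$$

Suppose the day you were locked in was not a sunny day, and the caretaker carried an umbrella on day 2 but not on day 3. Again assuming that the prior probability of the caretaker carrying an umbrella is 0.5, the probability that day 3 was foggy is obtained by summing over the possible weather on day 2:
$$P(w_3 = \text{Foggy} \mid w_1, u_2 = \text{TRUE}, u_3 = \text{FALSE}) = \sum_{w_2} \frac{P(u_3 = \text{FALSE} \mid w_3 = \text{Foggy})\, P(u_2 = \text{TRUE} \mid w_2)\, P(w_3 = \text{Foggy} \mid w_2)\, P(w_2 \mid w_1)}{P(u_3 = \text{FALSE})\, P(u_2 = \text{TRUE})}.$$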
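The first calculation (rain on the day after a sunny day, given that an umbrella was seen) is plain arithmetic once a weather transition probability is available. The sketch below uses the umbrella probabilities from the example; the transition value 0.05 for sunny-to-rainy is a made-up placeholder, since the text does not state the transition probabilities, and the function name `posterior_next_day` is likewise only illustrative.

```python
# Umbrella probabilities given in the example.
P_UMBRELLA = {"sunny": 0.1, "rainy": 0.8, "foggy": 0.3}


def posterior_next_day(weather, p_transition, p_umbrella_prior=0.5):
    """P(w2 = weather | w1, u2 = TRUE) by Bayes' rule.

    p_transition is P(w2 = weather | w1); it must be supplied by the caller
    because the example text does not list the transition probabilities.
    """
    return P_UMBRELLA[weather] * p_transition / p_umbrella_prior


# ASSUMPTION: 0.05 is a hypothetical value for P(rainy | sunny).
p_rain = posterior_next_day("rainy", p_transition=0.05)
print(round(p_rain, 4))  # 0.08 under the assumed transition value
```

Changing the assumed transition value scales the result linearly, which makes it easy to see how strongly the conclusion depends on the weather model rather than on the umbrella observation.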

Speech recognition

In speech recognition, the basic idea is to find the most likely sequence of words given some acoustic input:
$$\hat{w} = \arg\max_{w} P(w \mid y),$$
where $w$ is a sequence of words in the language and $y$ is the set of acoustic vectors. By Bayes' rule:
$$P(w \mid y) = \frac{P(y \mid w)\, P(w)}{P(y)}.$$
For a single speech input, the acoustics $y$ are constant, so $P(y)$ is also constant. We therefore only need to find
$$\hat{w} = \arg\max_{w} P(y \mid w)\, P(w).$$
First, we determine the probability of the acoustics given a word. In general there are many acoustic vectors making up one word.

[Figure: model of the word "of". Example: speech recognition]

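The three basic problems of HMMs

Problem 1 (the evaluation problem): given an observation sequence $O = o_1 o_2 \dots o_T$ and an HMM $\lambda$, determine the probability $P(O \mid \lambda)$ that the model generates the sequence.

Problem 2 (the decoding problem): given an observation sequence $O = o_1 o_2 \dots o_T$ and an HMM $\lambda$, determine the state sequence $Q = q_1 q_2 \dots q_T$ that is most likely to have generated $O$ (the optimal path).

Problem 3 (the learning problem): adjust the HMM $\lambda$ so as to maximize the probability $P(O \mid \lambda)$ of generating $O$ (find the model that best fits the observation sequence).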

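Here $P(O \mid T)$ is the probability that the model generates the sequence $O$. A model is determined by its architecture $T$ (the number of states; not to be confused with the sequence length $T$ used elsewhere) and its parameters $\theta = (e_i(\cdot), a_{ij})$, so $P(O \mid T)$ is also written $P(O \mid \theta)$, or simply $P(O)$, when the architecture and the parameters are clear from context. The corresponding joint probabilities coincide in the same way when both the parameters and the architecture are fixed. In the third problem, however, we always write $P(O \mid \theta)$ to emphasize that we are looking for the value $\theta^*$ at which $P(O \mid \theta)$ is maximal.

Solving the problems

Problem 1

The goal is to determine the probability of an observation sequence $O = O_1 O_2 \dots O_T$. Consider a fixed state sequence $Q = q_1 q_2 \dots q_T$, where $q_1$ is the initial state. The probability of the observation sequence $O$ given this state sequence is
$$P(O \mid Q, \lambda) = \prod_{t=1}^{T} P(O_t \mid q_t, \lambda).$$
Assuming the observations are statistically independent given the states, we get
$$P(O \mid Q, \lambda) = b_{q_1}(O_1)\, b_{q_2}(O_2) \cdots b_{q_T}(O_T).$$
The probability of the state sequence can be written as
$$P(Q \mid \lambda) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T}.$$
The joint probability is
$$P(O, Q \mid \lambda) = P(O \mid Q, \lambda)\, P(Q \mid \lambda).$$
The probability of the observation sequence $O$ is obtained by summing this joint probability over all possible state sequences:
$$P(O \mid \lambda) = \sum_{Q} P(O \mid Q, \lambda)\, P(Q \mid \lambda) = \sum_{q_1, \dots, q_T} \pi_{q_1} b_{q_1}(O_1)\, a_{q_1 q_2} b_{q_2}(O_2) \cdots a_{q_{T-1} q_T} b_{q_T}(O_T).$$
At the initial time $t = 1$ we are in state $q_1$ with initial probability $\pi_{q_1}$ and emit the symbol $O_1$ with emission probability $b_{q_1}(O_1)$. Continuing in the same way up to time $T$, we emit the symbol $O_T$ with emission probability $b_{q_T}(O_T)$. Direct evaluation of this sum requires on the order of $2T \cdot N^T$ operations, which is why the forward algorithm described later is used in practice.

Problem 2

Unlike the first problem, the second problem can be solved in several ways. One approach introduces the variable
$$\gamma_t(i) = P(q_t = S_i \mid O, \lambda),$$
the probability of being in state $S_i$ at time $t$ given the observation sequence and the model. This can be rewritten as
$$\gamma_t(i) = \frac{\alpha_t(i)\, \beta_t(i)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i)},$$
where $\alpha_t(i)$ accounts for the partial observation sequence $O_1 O_2 \dots O_t$ and state $S_i$ at time $t$, while $\beta_t(i)$ accounts for the remainder of the observation sequence $O_{t+1} O_{t+2} \dots O_T$ given state $S_i$ at time $t$. The normalization gives
$$\sum_{i=1}^{N} \gamma_t(i) = 1.$$
From this, the individually most likely state $q_t$ at time $t$ is
$$q_t = \arg\max_{1 \le i \le N} \gamma_t(i), \qquad 1 \le t \le T.$$

Problem 3

The third problem is much harder than the first two. The requirement is to find a method for adjusting the model parameters $(A, B, \pi)$ so as to maximize the probability of the observation sequence. No analytical method is known for maximizing the probability of the observation sequence. In fact, given any finite observation sequence as training data, there is no optimal way of estimating the model parameters. However, we can choose $\lambda = (A, B, \pi)$ such that $P(O \mid \lambda)$ is locally maximized, using an iterative procedure such as the Baum-Welch algorithm.

To solve this problem, let $\xi_t(i, j)$ be the probability of being in hidden state $S_i$ at time $t$ and in hidden state $S_j$ at time $t + 1$:
$$\xi_t(i, j) = P(q_t = S_i, q_{t+1} = S_j \mid O, \lambda).$$
The sequence of events involved is: the partial observations up to time $t$ ending in $S_i$, the transition from $S_i$ to $S_j$, the emission of $O_{t+1}$ in $S_j$, and the remaining observations from time $t + 2$ onward. Accordingly, $\xi_t(i, j)$ can be written as
$$\xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)},$$
where the numerator is $P(q_t = S_i, q_{t+1} = S_j, O \mid \lambda)$.

Earlier we defined $\gamma_t(i)$ as the probability of being in hidden state $S_i$ at time $t$. The quantities $\gamma_t(i)$ and $\xi_t(i, j)$ are related by
$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j).$$
Summing $\gamma_t(i)$ over time gives the expected number of times the hidden state $S_i$ is visited during the observation period. Similarly, summing $\xi_t(i, j)$ over time (from $t = 1$ to $t = T - 1$) gives the expected number of transitions from state $S_i$ to state $S_j$.

Using these quantities we can give a method for re-estimating the parameters of the HMM. The re-estimation formulas for $\pi$, $A$ and $B$ are
$$\bar{\pi}_i = \gamma_1(i),$$
$$\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)},$$
$$\bar{b}_j(k) = \frac{\sum_{t:\, O_t = V_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}.$$
If we take the current model $\lambda = (A, B, \pi)$ and use it to compute the right-hand sides of these expressions, the left-hand sides define the re-estimated model $\bar{\lambda} = (\bar{A}, \bar{B}, \bar{\pi})$. The re-estimated model is one under which the observation sequence is at least as likely to have been produced, that is, $P(O \mid \bar{\lambda}) \ge P(O \mid \lambda)$. Repeating the procedure with $\bar{\lambda}$ in place of $\lambda$, the probability of the observation sequence $O$ improves until a limiting point is reached. The final result of this re-estimation procedure is a maximum likelihood estimate of the HMM (a local maximum).

The forward algorithm

In this and the following algorithms, $o(t)$ denotes the observation at time $t$ (written $O_t$ above) and $q_i$ denotes the $i$-th state (written $S_i$ above). Let $\alpha_t(i)$ be the probability of the partial observation sequence $O_t = \{o(1), o(2), \dots, o(t)\}$ being produced by all state sequences that end in the $i$-th state:
$$\alpha_t(i) = P(o(1), o(2), \dots, o(t),\ q(t) = q_i).$$
The unconditional probability of the partial observation sequence is then the sum of $\alpha_t(i)$ over all states $i = 1, \dots, N$.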

The forward algorithm is recursive. First, the probability of the first symbol is computed for each state as the product of the initial probability of the $i$-th state and the probability of emitting the symbol $o(1)$ in the $i$-th state. The recursive formula is then applied. Suppose $\alpha_t(i)$ has been computed for some $t$. To compute $\alpha_{t+1}(j)$, multiply each $\alpha_t(i)$ by the corresponding transition probability from state $i$ to state $j$, sum the products over all states $i$, and multiply the result by the probability of emitting the symbol $o(t+1)$ in state $j$. At the end of the computation we have $\alpha_T(i)$ for every state; summing these values over all states gives the required probability.
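The recursion just described can be sketched in Python and checked against the brute-force sum over all state sequences from Problem 1. The function names, the list-of-lists layout for `A` and `B`, and the example numbers are assumptions made for illustration; observations are 0-based symbol indices, and no scaling is applied, so this sketch is only suitable for short sequences.

```python
from itertools import product


def forward(pi, A, B, obs):
    """Return P(O) for a non-empty observation sequence via the forward recursion."""
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o(1)).
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Recursion: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o(t+1)).
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
    # Termination: P(O) = sum_i alpha_T(i).
    return sum(alpha)


def brute_force(pi, A, B, obs):
    """Return P(O) by enumerating all N**T state sequences (for checking only)."""
    n = len(pi)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total


# Hypothetical two-state, two-symbol model (illustrative numbers only).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
obs = [0, 1, 0]
print(forward(pi, A, B, obs), brute_force(pi, A, B, obs))
```

Both functions compute the same quantity, but the forward recursion needs on the order of N^2 T multiplications, whereas the enumeration grows as N^T.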

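Algorithm:

Initialization:
$$\alpha_1(i) = \pi_i\, b_i(o(1)), \qquad i = 1, \dots, N.$$
Recursion:
$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(o(t+1)), \qquad j = 1, \dots, N;\ t = 1, \dots, T-1.$$
Termination:
$$P(O) = \sum_{i=1}^{N} \alpha_T(i).$$

The backward algorithm

In a similar way we can introduce a symmetric backward variable $\beta_t(i)$: the conditional probability of the partial observation sequence from $o(t+1)$ to the end, produced by all state sequences that start in the $i$-th state at time $t$:
$$\beta_t(i) = P(o(t+1), o(t+2), \dots, o(T) \mid q(t) = q_i).$$

Algorithm:

Initialization:
$$\beta_T(i) = 1, \qquad i = 1, \dots, N.$$
Strictly speaking, the definition above does not cover $\beta_T(i)$; setting it to 1 is a formal extension of the recursion below to $t = T$.

Recursion:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o(t+1))\, \beta_{t+1}(j), \qquad i = 1, \dots, N;\ t = T-1, T-2, \dots, 1.$$
Termination:
$$P(O) = \sum_{i=1}^{N} \pi_i\, b_i(o(1))\, \beta_1(i).$$

Naturally, the forward and backward algorithms must give the same value for the total probability $P(O) = P(o(1), o(2), \dots, o(T))$.

The Viterbi algorithm

This algorithm chooses the best state sequence, the one with the maximum probability for the given observation sequence. Let $\delta_t(i)$ be the maximal probability of state sequences of length $t$ that end in state $i$ and produce the first $t$ observations:
$$\delta_t(i) = \max_{q(1), \dots, q(t-1)} P(q(1), q(2), \dots, q(t-1),\ q(t) = q_i,\ o(1), o(2), \dots, o(t)).$$
The Viterbi algorithm is a dynamic programming algorithm that uses the same schema as the forward algorithm except for two differences:

1. It uses maximization in place of summation at the recursion and termination steps.
2. It keeps track of the arguments that maximize $\delta_t(i)$ for each $t$ and $i$, storing them in the $N$ by $T$ matrix $\psi$. This matrix is used to retrieve the optimal state sequence at the backtracking step.

Algorithm:

Initialization:
$$\delta_1(i) = \pi_i\, b_i(o(1)), \qquad \psi_1(i) = 0, \qquad i = 1, \dots, N.$$
There is no predecessor state at $t = 1$, so $\psi_1(i)$ is not defined by the recursion; setting it to 0 is a formal extension of the recursion below.

Recursion:
$$\delta_t(j) = \max_i \left[ \delta_{t-1}(i)\, a_{ij} \right] b_j(o(t)), \qquad \psi_t(j) = \arg\max_i \left[ \delta_{t-1}(i)\, a_{ij} \right], \qquad j = 1, \dots, N;\ t = 2, \dots, T.$$
Termination:
$$p^* = \max_i \left[ \delta_T(i) \right], \qquad q^*_T = \arg\max_i \left[ \delta_T(i) \right].$$
Path backtracking:
$$q^*_t = \psi_{t+1}(q^*_{t+1}), \qquad t = T-1, T-2, \dots, 1.$$

The Baum-Welch algorithm

The probability $\xi_t(i, j)$ of being in state $q_i$ at time $t$ and in state $q_j$ at time $t + 1$, given the observation sequence and the model, is defined as
$$\xi_t(i, j) = P(q(t) = q_i,\ q(t+1) = q_j \mid O, \lambda).$$
In terms of the forward and backward variables:
$$\xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o(t+1))\, \beta_{t+1}(j)}{P(O \mid \lambda)}.$$
The output probability of the sequence is
$$P(O \mid \lambda) = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(o(t+1))\, \beta_{t+1}(j) = \sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i).$$
The probability of being in state $q_i$ at time $t$ is
$$\gamma_t(i) = \frac{\alpha_t(i)\, \beta_t(i)}{P(O \mid \lambda)}.$$

Estimation:

Initial probabilities:
$$\bar{\pi}_i = \gamma_1(i).$$
Transition probabilities:
$$\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}.$$
Emission probabilities:
$$\bar{b}_j(k) = \frac{\sum^{*}_{t} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}.$$
In the above equation $\sum^{*}$ denotes the sum over $t$ such that $o(t) = o_k$.

Other algorithms

Besides the algorithms above, there are other learning algorithms for HMMs:

- The Baldi-Chauvin learning algorithm (uses gradient descent).
- The Mamitsuka learning algorithm (combines Baum-Welch and Baldi-Chauvin and allows learning from negative examples as well).

Application example

Hidden Markov models can be applied to problems in a number of fields, such as the following.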

Finding genes in a DNA sequence

Task of the HMM: identify the coding and non-coding regions in an unlabeled string of DNA nucleotides.

Significance: the model helps annotate (classify) the data obtained from sequenced genomes. It also helps us better understand the mechanisms of gene transcription, splicing, and so on.

Advantages of using an HMM for gene finding:

- Classification: observations are classified as a sequence.
- Order: a DNA sequence is an ordered sequence.
- Grammar: the grammar (the HMM structure) is well defined.
- Model quality: measured by the number of exons that are labeled correctly.
- Training data: DNA sequences that have already been labeled (by other, manual methods).
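Labeling a nucleotide string as coding or non-coding is an instance of the decoding problem, so it can be illustrated with a small Viterbi sketch. Everything numeric here is an assumption made for illustration: a two-state model in which the coding state favors G and C and the non-coding state favors A and T is a toy stand-in, not a real gene model, and the names `viterbi` and `label_dna` are hypothetical. Probabilities are multiplied directly without log scaling, so the sketch is only suitable for short, non-empty sequences.

```python
def viterbi(pi, A, B, obs):
    """Return (probability, state path) of the most likely state sequence."""
    n = len(pi)
    # Initialization: delta_1(i) = pi_i * b_i(o(1)).
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    psi = []  # psi[t-2][j] = best predecessor of state j at time t (t = 2..T)
    for o in obs[1:]:
        prev = delta
        # Recursion: maximize over predecessors instead of summing.
        best = [max(range(n), key=lambda i: prev[i] * A[i][j]) for j in range(n)]
        delta = [prev[best[j]] * A[best[j]][j] * B[j][o] for j in range(n)]
        psi.append(best)
    # Termination: pick the best final state.
    last = max(range(n), key=lambda i: delta[i])
    # Backtracking: follow the stored predecessors from the end to the start.
    path = [last]
    for best in reversed(psi):
        path.append(best[path[-1]])
    path.reverse()
    return delta[last], path


# ASSUMPTION: toy parameters, chosen only to make the example readable.
STATES = "CN"      # C = coding, N = non-coding
SYMBOLS = "ACGT"
PI = [0.5, 0.5]
A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.1, 0.4, 0.4, 0.1],   # coding state: G/C-rich (hypothetical)
     [0.4, 0.1, 0.1, 0.4]]   # non-coding state: A/T-rich (hypothetical)


def label_dna(dna):
    """Label each nucleotide of `dna` with C (coding) or N (non-coding)."""
    obs = [SYMBOLS.index(ch) for ch in dna]
    _, path = viterbi(PI, A, B, obs)
    return "".join(STATES[s] for s in path)


print(label_dna("ATATGCGCGC"))  # NNNNCCCCCC
```

With labeled training sequences, the parameters would be estimated from counts rather than chosen by hand, and model quality could then be measured as described above, by how many exons are labeled correctly.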
