
GVHD: TS.V C LUNG SVTH: L HONG H TRN LONG KHNH


CHAPTER 1
OVERVIEW OF SPEECH RECOGNITION SYSTEMS
1.1 Problem statement
With the advance of science and technology, machines have become increasingly capable, helping people work more efficiently and freeing them from many stages of their work. Communication between people and machines, however, still has many shortcomings, because it has to go through standard input/output devices. To make communicating with and controlling machines more natural, research on speech recognition methods has been pursued, and several products that recognize English fairly well have appeared, such as ViaVoice from IBM, the spoken-language toolkit from CSLU (Center for Spoken Language Understanding) and Speech Recognition from Microsoft.
For Vietnamese, however, no complete recognition system has been built yet; the problem has only received research attention in recent years. Several studies have approached it from different directions, but the results remain modest (for example the Vspeech software by a group of students at Ho Chi Minh City University of Technology, and VN Voice by Assoc. Prof. Dr. Lương Chi Mai of the Institute of Information Technology). This raises the need for a complete foundation from which a suitable direction for speech recognition research can be found and better results achieved.
1.2 State of development
Industries such as control and automation, which are closely tied to language, are developing rapidly. This calls for tight links between the different kinds of language (scientific language, machine language, ...) so that control and processing tasks become easier and more convenient. The amount of work that can be done increases considerably when information exchange is convenient. Speech processing and recognition applied to these fields is therefore a real requirement.
As computers developed, many scientists turned to the question of how to recognize speech, converting speech to text and text back to speech, so that people can communicate with computers in natural language and, from there, communicate easily with other devices by voice. The progress of computers makes speech processing and recognition look easy, but in practice it is not entirely so.

Reasons why speech processing and recognition are complicated:
- There are many languages in the world; each country and people has its own language, so it is very hard to build one general recognition system that anyone can use.
- The anatomy of the larynx differs from person to person, so the same word is pronounced noticeably differently by different speakers.
- Speech recognition is not just the recognition of one word or a few words but of a continuous sequence of words. The system must also decide whether the recognized words are grammatically correct and whether they match the designed commands, and how to handle them if they do not.
- Speech itself is not stable and is produced in noisy environments, which distorts its features and affects recognition.

1.3 Research and applications
1.3.1 Research on speech recognition:
There is a great deal of research worldwide on speech recognition, in many fields and with a wide variety of methods. Recognition and processing systems differ in two respects:
- The speech features that are extracted for processing and recognition.
- The method used to discriminate between those features.
Common speech recognition methods:
Acoustic-phonetic approach
The acoustic-phonetic approach is based on phoneme theory, which asserts that spoken language contains a finite and unique set of basic phonetic units called phonemes. They are divided into vowels and consonants, voiceless and voiced sounds, sonorants and plosives, and so on. Phonemes can be identified by a set of features in the spectrum of the speech signal over time.
The most important feature of a phoneme is the formant: the frequency regions where the signal resonates most strongly. Other features include pitch and loudness.
A recognition system based on this approach extracts features from the speech signal and determines which phonemes they correspond to. Then, using a pronunciation dictionary, it determines which word the phoneme sequence most likely pronounces.
In principle the method looks very simple. Practical experiments, however, show that its recognition accuracy is low, for the following reasons:
- The method requires a great deal of phonetic knowledge, especially about the acoustic properties of phonemes, and this knowledge has generally not been studied thoroughly.
- Formants are stable only for vowels; for consonants they are unstable and very hard to determine. Formant estimation is also not very accurate, especially under noise (which is common in practice).
- Phonemes are hard to tell apart from the spectrum alone, especially voiceless consonants. Some consonants closely resemble noise (for example /s/ and /h/).
Artificial intelligence approach
The artificial intelligence approach studies how humans learn to speak and to listen, investigates phonetic, grammatical, semantic and contextual rules, and integrates them into other methods to improve recognition results.
For example, expert systems and fuzzy-logic rules about phonetics and phonemes can be added to acoustic-phonetic recognizers to identify phonemes more accurately (which, as noted above, is very hard when only spectral information is used).
For pattern recognition systems, one improvement is to have the system select, for each object to be recognized, several patterns that are most similar to it, and then check these candidates against grammatical, semantic and contextual rules to determine the best match.
One artificial intelligence method that is currently widely studied in speech recognition is the neural network. Depending on how it is used, a neural network can be regarded as an extension of either the pattern recognition approach or the acoustic-phonetic approach.
Pattern recognition approach
The pattern recognition approach relies on probability and statistics. Its idea is to compare the object to be recognized with previously collected patterns and find the pattern "most similar" to the object.
A recognition system of this kind therefore goes through 2 phases:
Training phase: collect samples, classify them, and train the system to memorize the patterns.
[Block diagram: Input source -> Acquisition and preprocessing -> Feature extraction -> (training data) -> Training samples, classification -> (features) -> Training of the recognizer -> Recognizer]

Figure 1.1 Diagram of the training phase of the pattern-matching method


Recognition phase: take in the object to be recognized, compare it with the stored patterns, and output the pattern most similar to the object.
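[Block diagram: Input source -> Acquisition and preprocessing -> Feature extraction -> (object to be recognized, features) -> Recognizer -> (classification information) -> Class selection, post-processing -> Result]

Figure 1.2 Diagram of the recognition phase of the pattern-matching method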

Most successful recognition systems in the world use this approach. It has the following advantages:
- It is simple to use, easy to understand and mathematically well founded (probability and statistics, machine learning theory, ...).
- It is less affected by variations in vocabulary, feature set, recognition unit and surrounding environment.
- It gives good results, as has been verified in practice.
Pattern recognition with hidden Markov models is the method chosen for this project, so it is presented in detail later in the report.

1.3.2 Applications:
A speech recognition system is a system that accepts human speech as input. It has 3 basic applications:
- Voice commands: the user speaks a command and the computer performs the corresponding task.
- Text dictation: the user dictates and the computer writes down what was said. This requires the computer to handle the entire vocabulary of the language.
- Speaker recognition: the user reads a given sentence and the computer identifies the user from the characteristics of the voice.

1.4 Overview of the situation worldwide and in Vietnam
Worldwide:
Many scientists at universities and large companies around the world have invested in speech recognition and have achieved notable results. Some representative research projects are listed below.
CMU SPHINX: SPHINX for short, a speech recognition system developed at Carnegie Mellon University. It consists of a recognition engine called SPHINX and an acoustic model trainer called SphinxTrain. Since 2001 the Sphinx team has released the source code of several components of the recognizer, from Sphinx 2 up to the current Sphinx 4. Several accompanying programs are required to use Sphinx: the acoustic model trainer, a language model compiler and a pronunciation dictionary (cmudict). Details on Sphinx are available at http://cmusphinx.sourceforge.net.
Microsoft Speech Recognition: since 1993, after recruiting X. Huang, the head of the Sphinx research group, from Carnegie Mellon University, Microsoft has been active in research on speech recognition and text-to-speech. The company later built the Speech API (SAPI), a programming interface for developers of speech applications on Windows. The current version is SAPI 5.4; details are available at
http://www.microsoft.com/speech/speech2007/default.mspx.
Julius: an open-source project for research and development of large-vocabulary speech recognition (about 60000 words) based on context-dependent hidden Markov models. Its main goal is large-vocabulary continuous speech recognition for Japanese. Because it is open source, it can be studied and extended to many other languages; details are available at http://julius.sourceforge.jp/en_index.php.
Dragon: a well-known commercial product for voice input and speech synthesis from the company Nuance. The company develops its own recognizer and has released many Dragon editions for different needs, such as Dragon for the medical field and Dragon for education. Because it is closed source, however, it cannot be used for research. Details are available at http://www.nuance.com/dragon/index.htm.
In Vietnam:
Although there are many successful research projects and applications worldwide, for various reasons no speech recognition system has been fully developed in Vietnam, and very few research projects address this field. The best-known work on Vietnamese recognition is the Vspeech program by a group of students at Ho Chi Minh City University of Technology, which won the Trí tuệ Việt Nam award in 2004. Vspeech runs on top of the Microsoft speech recognition engine built into Windows.
Vspeech works in a very simple way. It relies on the fact that the Latin transcriptions of some English and Vietnamese words are fairly similar: the Vspeech team runs the recognizer provided by Microsoft and then maps the recognized English word to the Vietnamese word whose transcription is closest.
For example, the Vietnamese word "hai" is pronounced much like the English "hi" (a greeting). When someone says "hai", the Microsoft recognizer returns "hi"; Vspeech then maps "hi" to the Vietnamese "hai" and returns "hai" as the final recognition result, which is the word the speaker intended.
With this approach, the main work of the Vspeech team is to find English words whose pronunciation is close to Vietnamese words, and then to restrict the Microsoft (English) recognizer so that it accepts only those words as results. This English vocabulary is mapped to a set of Vietnamese words with corresponding pronunciations. All that remains is to take the English word returned by the Microsoft recognizer, map it to the corresponding Vietnamese word, and return that Vietnamese word as the recognition result.
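The mapping step described above amounts to a lookup from a restricted English vocabulary to Vietnamese words. A minimal sketch in Python, under stated assumptions: the table `EN_TO_VI` and the function name are hypothetical, only the "hi" to "hai" pair comes from the text, and the English recognizer is replaced by a plain string input.

```python
# Hypothetical lookup table: English word returned by the recognizer -> Vietnamese word.
# Only the "hi" -> "hai" pair is taken from the text; a real table would be much larger.
EN_TO_VI = {"hi": "hai"}


def map_to_vietnamese(english_word, table=EN_TO_VI):
    """Return the Vietnamese word mapped to an English recognition result, or None."""
    return table.get(english_word.lower())
```

A word outside the table yields `None`, which mirrors the first limitation of the approach: a Vietnamese word without a similar-sounding English word cannot be recognized at all.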
The advantage of this approach is that no recognizer has to be built; an existing one is reused, so the application can be developed quickly.
However, the approach is not sustainable because:
- Not every Vietnamese word has an English word with a similar pronunciation.
- Vietnamese words that differ only in tone sound alike and are hard to distinguish during recognition.
- It cannot recognize several words spoken continuously.
These weaknesses arise because the Microsoft recognizer is built for English and some other languages and does not support Vietnamese, while phonetics, grammar and language models differ from one language to another. As a result, recognition accuracy is low and responsiveness is poor.
For this reason the project team studies in detail the techniques used in speech recognition in general, as well as open-source recognizers for other languages, in order to build a speech recognizer for Vietnamese step by step.
















CHAPTER 2
THEORETICAL BACKGROUND
This chapter briefly presents the basic and important theory used to build the speech recognition system in the next chapter.
2.1. PHONETIC THEORY
2.1.1. Theory of speech
The purpose of speech is to convey information. Speech communication can be characterized in several ways. According to information theory, speech can be represented in terms of its message content, or information. Alternatively, speech can be represented as the signal that carries the message. Although information-theoretic ideas play a leading role in sophisticated communication systems, we will see that representations based on the waveform or on a parametric model are the ones mainly used in practical applications.
To consider the process of speech communication, first think of the message as some abstract form in the speaker's mind. Through the complex process of speech production, the information in this message is eventually converted into an acoustic signal. Along the way the message can be represented in several different forms. For example, it is first converted into a set of neural signals that control the articulatory mechanism (the movements of the tongue, lips and vocal cords). The articulators move in response to these neural signals and produce a sequence of gestures, whose end result is an acoustic waveform containing the information of the original message.
The information conveyed by speech is discrete in nature: it can be represented by concatenating elements from a finite set of symbols. The symbols into which every sound can be classified are called phonemes. Each language has its own set of phonemes, typically between 30 and 50. For example, English can be represented with about 42 phonemes and Vietnamese with about 33 (12 vowels such as a, o, u, ... and 21 consonants such as k, l, m, ph, ...).
Information theory also considers the rate at which information is conveyed. For speech, taking into account the physical limits on how fast the articulators can move, a rough estimate is that people produce speech at an average rate of about 10 phonemes per second. If each phoneme is represented by a binary number, a 6-bit code is more than enough to represent all English phonemes. With an average rate of about 10 phonemes per second, and ignoring the correlation between adjacent phonemes, we obtain an estimate of 60 bits/second for the average information rate of speech. In other words, the written equivalent of speech carries about 60 bits/second at a normal speaking rate. The lower bound on the true information content of speech is of course considered to be higher than this rate, since the estimate ignores factors such as the state of the speaker, the speaking rate, the loudness of the voice, and so on.
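The 60 bits/second estimate is plain arithmetic: the smallest code length that can index every phoneme, multiplied by the phoneme rate. A small sketch in Python, where the function names are illustrative rather than taken from the text:

```python
import math


def bits_per_symbol(num_symbols):
    """Smallest whole number of bits that can index num_symbols distinct symbols."""
    return math.ceil(math.log2(num_symbols))


def information_rate(num_symbols, symbols_per_second):
    """Rough information rate in bits/second, ignoring correlation between symbols."""
    return bits_per_symbol(num_symbols) * symbols_per_second
```

With the 42 English phonemes and 10 phonemes per second quoted above, `information_rate(42, 10)` gives 6 x 10 = 60 bits/second; the 33 Vietnamese phonemes also need 6 bits each.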
2.1.2. Some phonetic characteristics of Vietnamese
An obvious characteristic of Vietnamese is that it is a monosyllabic language (each simple word has exactly one syllable) and non-inflectional (pronunciation and spelling do not change in any grammatical context). In this respect Vietnamese is completely different from Indo-European languages such as English and French, which are polysyllabic and inflectional.
In terms of transcription, a Vietnamese syllable has the general structure consonant + rhyme. For example, the syllable "tin" has the consonant t and the rhyme in. The consonant is a phoneme, and it is only loosely attached to the rest of the syllable (as the spoonerism phenomenon shows).
The rhyme is in turn composed of smaller phonemes, one of which, the vowel, is the main one.
The following figure shows the spectrogram of the syllable "ba". The background-noise region and the spectral regions of the consonant b and the vowel a can be clearly distinguished (darker regions have higher energy density).

Figure 2.1: Spectrogram of the syllable "ba", showing the background-noise (silence) region and the signal regions of the consonant /b/ and the vowel /a/ (darker regions have higher energy density).

Observing the spectrograms of similar syllables leads to the following conclusion: consonants and vowels are clearly distinguished by how their energy is distributed over frequency. For example, consonants have low energy at low frequencies, whereas vowels have high energy extending into the high-frequency region. Regions without speech (background noise and silence) have low energy concentrated at very low frequencies.
The spectra of different vowels differ quite clearly. The following figure illustrates the differences between the spectra of 5 basic vowels. Dark regions have high energy density.





Figure 2.2: Differences between the spectra of 5 basic vowels. Dark regions have high energy density (formant regions).

According to Đoàn Thiện Thuật, from the phonetic and phonological point of view the Vietnamese syllable has the following schema:
Tone (spans the whole syllable)
Initial | Rhyme: medial, nucleus, final

The schema shows that the Vietnamese syllable has a clear and stable structure. It also shows that Vietnamese is a tonal language. The tone system has 6 tones: bằng, huyền, sắc, hỏi, ngã, nặng.
The tone of a syllable is a suprasegmental phoneme (it is realized over the whole syllable). Tonal features are therefore less distinct in the speech signal than the other components of the syllable.
Vietnamese pronunciation differs markedly with gender, age and especially geographic region (the Northern, Central and Southern accents are very different).
2.1.3. Phoneme theory
2.1.3.1. Definition of the phoneme
The phoneme has been defined in several ways:
According to the book "Ngữ âm học tiếng Việt hiện đại", NXBGD 1972, by Cù Đình Tú and co-authors, the phoneme is the smallest phonetic unit that serves to distinguish meanings and identify words.
According to the book "Ngữ âm tiếng Việt" by Đoàn Thiện Thuật, the phoneme is the totality of distinctive features that occur simultaneously (and are perceived by people in sequential order) and that serve to distinguish the sound form of words or morphemes.
According to Professor Cao Xuân Hạo, these definitions are not fully satisfactory: they are impressionistic and confuse the perception of simultaneity with that of succession. In his view the phoneme is the smallest sound-distinguishing unit that can take part in a phonological opposition in temporal order, or equivalently the smallest linear phonological unit.
According to Đinh Lê Thư and Nguyễn Văn Huệ, the phoneme is usually defined as the smallest unit of the sound structure of a language, used to build and distinguish the phonetic forms of the meaningful units of the language, namely words and morphemes. For example, the Vietnamese words "tôi" and "đôi", or "ta" and "đa", are distinguished by the phonemes /t/ and /đ/. Replacing one phoneme with another in the same syllable changes the meaning of the syllable or makes it meaningless. For example, replacing the phoneme /t/ in the word "toàn" with /h/ gives "hoàn", which has a different meaning.

2.1.3.2. Function of the phoneme
In principle, phonemes must differ from each other in at least one feature. Thanks to this difference, phonemes create differences in the sound form of morphemes and words, producing signals that people perceive as distinct. Phonemes therefore have 2 basic functions: distinguishing the sound forms of morphemes and words, and forming the components of meaningful units.
2.1.3.3. Segmenting and identifying phonemes in a waveform
The number of words in any language is very large, so building a word-based recognizer that can distinguish that many words is a real challenge. Instead, recognizers are built on a phoneme-based approach: recognizing a few dozen phonemes is enough to recognize every word of a language (statistically, the number of phonemes in a language ranges from about 20 to 60). This is also the right approach for Vietnamese speech recognition. It raises a different difficulty, however: segmenting and identifying phonemes in the waveform. Given a recorded speech signal, the task is to find the boundaries of all phonemes and to say which phoneme each segment is. Leaving aside whether a machine can do this, even people make mistakes when segmenting phonemes by hand, because the boundaries between phonemes are often blurred or overlapping. Even two phonemes that belong to two different syllables can overlap when they are adjacent. Segmenting phonemes in a waveform is therefore only approximate.
Advantages and difficulties of Vietnamese speech recognition
Advantages
The phonetic characteristics of Vietnamese give Vietnamese speech recognition the following advantages:
- Vietnamese is monosyllabic and the number of syllables is not too large. This makes it much easier for a recognizer to determine syllable boundaries. For recognizers of Indo-European languages (English, French, ...), determining syllable boundaries (endpoint detection) is very hard and strongly affects recognition results.
- Vietnamese is non-inflectional. Its syllables are stable and clearly structured; in particular, no two syllables are pronounced the same but written differently. This makes it easier to build syllable models, and lexical decoding (going from transcription to vocabulary) is simpler than for Indo-European languages, where it is a hard problem in its own right.
Difficulties
Besides these advantages, Vietnamese speech recognition also faces many difficulties:
- Vietnamese is a tonal language (6 tones). Tone is a suprasegmental phoneme, and tonal features are less distinct in the speech signal than the other components of the syllable.
- Vietnamese pronunciation varies considerably with geographic region. Regional accents are very diverse (each region has its own characteristic accent).
- The grammar and semantics of Vietnamese are very complex and hard to integrate into a recognizer in order to improve its performance. The transcription system is also not yet standardized.
- Research on Vietnamese recognition is still limited and not widely known. The biggest difficulty is that there is currently no standard data set for training and testing Vietnamese recognizers.


2.2. Digital signal processing theory
In speech communication systems the speech signal is transmitted, stored and processed in many ways. Technical concerns lead to many different representations of the speech signal. There are 2 main concerns:
- Preserving the message content of the speech signal.
- Representing the speech signal in a form that is convenient to transmit or store, or in a flexible form that can be modified without affecting the message content.
The representation must allow the information content to be extracted easily, either by a human listener or automatically by a machine. Signal processing methods play a fundamental role in designing and processing these representations.
2.2.1 Signal processing
For speech signals, the information source that is measured or observed is in general the acoustic waveform. Signal processing consists first of obtaining a representation of the signal based on a given model, and then of applying a higher-level transformation that puts the signal into a more convenient form. The information-processing operations are depicted in the figure below.

The last step of signal processing is the extraction and use of information. This step can be carried out either by a human listener or automatically by a machine. As an example, a system that automatically identifies a speaker from a given set of speakers can use a time-dependent spectral representation of the speech signal. One possible transformation is to average the spectrum over a whole sentence, compare this average spectrum with the stored average spectrum of each speaker, and then identify the speaker from a spectral similarity measure. In this example the information in the signal is used to identify the speaker. Computing similarity measures and finding the best match for a speech signal involve probability and statistics, which are presented later in the report.
2.2.2 Digital signal processing
Digital signal processing is concerned with 2 main tasks: obtaining discrete representations of signals, and the theory, design and implementation of numerical procedures for processing those representations. Its objectives are the same as those of analog signal processing. Why, then, are digital techniques used to study speech communication? First and most important, very sophisticated signal processing functions can be implemented with digital techniques. In speech processing, many of these systems cannot be regarded as approximations of analog systems.
Digital signal processing techniques were first used in speech processing to simulate complex analog systems. The initial view was that analog systems should be simulated on a computer to avoid having to build experimental hardware. The main catalysts were faster computers and rapid advances in digital signal processing theory. Digital systems can simulate analog systems well, and developments in theory and digital hardware have increased their advantage over analog systems. Digital systems are reliable and very compact. Technology allows extremely complex systems to run on a single chip, and logic components are fast enough that the large number of computations required by many signal processing functions can be carried out in real time at speech sampling rates.
There are many other reasons to use digital techniques in speech communication systems. For example, with suitable coding, digitized speech can be transmitted reliably over very noisy channels. Once in digital form, a speech signal is also identical to any other kind of data, so one system can transmit speech together with other signals without distinguishing between them except at decoding. In addition, when speech transmission must be secure, digital representation has a clear advantage over analog systems. For these and many other reasons, digital techniques are used more and more in speech problems.
2.2.3 Fundamentals of digital signal processing
In almost every situation in which information is processed or transmitted, one starts by representing the signal as a continuously varying pattern. The acoustic wave produced in speech is of this nature. Mathematically, such patterns can be represented as functions of a continuous variable t that represents time. In this section, x(t) denotes a continuous-time waveform. A speech signal can also be represented as a sequence of numbers, denoted x(n).
The study of speech processing systems uses a few basic sequences, such as the unit sample sequence, the unit step sequence and the exponential sequence.

Signal processing transforms a signal into a form that is more desirable in some sense. The class of linear shift-invariant (LSI) systems is commonly used in speech processing. Such systems are completely characterized by their response to a unit sample input. For these systems, the output y(n) is computed from the input x(n) and the unit sample response h(n) by the convolution:

$$y(n) = x(n) * h(n)$$

where * denotes discrete convolution. The equivalent expression is:

$$y(n) = \sum_{k=-\infty}^{\infty} x(k)\, h(n-k)$$

LSI systems are commonly used to filter speech signals and, in particular, appear in models of speech production.
Once a digital representation of speech is available, signal processing is carried out with transforms such as the Z transform (ZT), the Fourier transform (FT) and the discrete Fourier transform (DFT). The theory of digital signal processing can be found in textbooks on the subject.
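The discrete convolution that defines the output of an LSI system can be written in a few lines for finite sequences. The sketch below is in Python and makes two assumptions not stated in the text: both sequences start at n = 0 and are zero elsewhere, and the function name `convolve` is illustrative.

```python
def convolve(x, h):
    """Discrete convolution y(n) = sum_k x(k) h(n-k) of two finite sequences.

    Both sequences are assumed to start at n = 0 and to be zero elsewhere,
    so the result has len(x) + len(h) - 1 samples.
    """
    if not x or not h:
        return []
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            # x(k) contributes to output sample n = k + m with weight h(m) = h(n - k).
            y[k + m] += xk * hm
    return y
```

Feeding a unit sample `[1]` through the system returns `h` itself, which is why an LSI system is fully characterized by its unit sample response.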








CHAPTER 3
THE HMM AND SPEECH RECOGNITION
3.1. Introduction
This chapter introduces the basics of the hidden Markov model (HMM): its definitions, its parameter sets, its essential problems and the main algorithms used when it is applied to speech recognition. The main parts of a recognizer and the main stages of HMM-based recognition are then introduced briefly.
3.2. Basic issues of the HMM
The HMM is a statistical model that is frequently used to model speech for recognition. The internal structure of an HMM is not based on knowledge about speech, yet it is used in recognition to compute the numerical characteristics of speech.
3.3. Mathematical background of the HMM and issues in applying it to speech recognition
The hidden Markov model (HMM) is a statistical model based on the Markov model. To understand the HMM we therefore first review the Markov model and statistical models in general.
3.3.1. Introduction to statistical recognition and the HMM
A statistical process determines the probabilities of events, and of the relationships between events, at different points in time within a process.
Random events and the probability density function
Let $X = \{X_1, X_2, \ldots, X_n\}$ be a set of random events such that exactly one event of the set occurs (the events are mutually exclusive and exhaustive). Let $P(X_i)$ be the probability of event $X_i$. Then the probability of X is P(X) = 1 and we have:

$$P(X) = \sum_{i=1}^{n} P(X_i) = 1 \tag{3.1}$$

Conditional probability:
Let A and B be random events. Suppose event A occurs with probability P(A). The probability of A given that B has occurred, written P(A|B), is:

$$P(A \mid B) = \frac{P(AB)}{P(B)}, \qquad P(B) \neq 0 \tag{3.2}$$

Statistical recognition commonly uses the following two conditional probability functions:
- $P(x \mid C_i)$: the probability density of the measurement vector x given that it belongs to class $C_i$, assuming that the probability $P(C_i)$ is known in advance or can be estimated.
- $P(C_i \mid x)$: the probability of the (unknown) discrete class $C_i$ given that x has been observed, obtained from the class probability $P(C_i)$ and from $P(x \mid C_i)$.
A process is called a Markov process if the probability of an event at a given time depends only on a fixed number of past events. The Markov model is thus a purely statistical model. The notion of state helps describe how events evolve over time: the basic event in a Markov process is "being in state i at time t". A Markov process used in automatic speech recognition (ASR) systems satisfies the following conditions:
1. Time is discrete: t = 1, 2, ..., T;
2. The number of states is finite: $\{s_t\} = \{i\}$, i = 1, 2, ..., n;
3. The current state depends only on the state one step earlier:

$$P(s_t \mid s_{t-1}, s_{t-2}, \ldots, s_{t-k}) = P(s_t \mid s_{t-1}) \tag{3.3}$$

The main task of a Markov process is to generate a state sequence $S = s_1, s_2, \ldots, s_T$.
The hidden Markov model (HMM) is built on the Markov model. The main aspects of using the HMM in speech recognition are presented below.
3.3.2. Main components of an HMM
The main components of an HMM are a set of states and the main parameter sets.

Figure 3.1 A 6-state HMM

The set of states and three parameter sets define an HMM. It characterizes a process whose state order is not known in advance, as depicted in [Figure 3.1].
First, $S = \{s_i,\ i = 1, \ldots, n\}$ (where n is the number of states of the HMM) is the set of states of the model.
The first parameter set is called the transition probabilities and is defined as:

$$a_{ij} = P(s_t = j \mid s_{t-1} = i) \tag{3.4}$$

The set of initial probabilities of the HMM is $\pi = \{\pi_i\}$. In speech recognition, however, HMMs are mostly started in the first state, so this set is omitted.
$a_{ij}$ is the probability of moving from state i at time t-1 to state j at time t. The matrix $A = \{a_{ij}\}$ is called the state transition matrix. In a Markov model, once the previous state is known, the probabilities of moving to the next state are completely determined.
To apply the HMM to acoustic waveforms, a statistical function is attached to each state. The speech waveform is first converted into a sequence of feature vectors over time (see the section on feature extraction). This sequence is called the observation sequence $O = o_1, o_2, \ldots, o_T$, where $o_t$ is the feature vector of the waveform at time t. The statistical function gives the probability of the feature vector $o_t$ in state j at time t. This probability is called the observation probability or output probability, and it forms the 2nd parameter set of the HMM:

$$b_j(o_t) = P(o_t \mid s_t = j) \qquad (t = 1, 2, \ldots, T;\ j = 1, 2, \ldots, n) \tag{3.5}$$

Let $B = \{b_j(o_t)\}$.
In general an HMM can start in any state; the probability that it starts in state i is denoted $\pi_i$. The set $\pi = \{\pi_i\}$, called the initial probabilities, is the 3rd parameter set of the HMM. In speech recognition, however, almost all HMMs start in the first state, so this third set is not needed.
In speech recognition an HMM is therefore characterized by its state set and the 2 parameter sets A and B above: $\lambda = (A, B)$.
3.3.3. Example of HMM-based isolated word recognition
Suppose the waveform of the word to be recognized is encoded as a sequence of feature vectors (the observation sequence) $O = o_1, o_2, \ldots, o_T$, where $o_t$ is the feature vector at time t. Recognizing an isolated word then means computing:

$$\arg\max_{i} \{ P(w_i \mid O) \} \tag{3.6}$$

That is, we look for the word with the highest probability $P(w_i \mid O)$, where $w_i$ is a word (for Vietnamese, a syllable) in the list of words to be recognized (in general, a word in the dictionary), and each word is modeled by one or more HMMs. This probability cannot be computed directly, but Bayes' rule can be used:

$$P(w_i \mid O) = \frac{P(O \mid w_i)\, P(w_i)}{P(O)} \tag{3.7}$$

Here P(O) is constant over all candidate words $w_i$ for a given O. The probability $P(w_i)$ depends only on the language model (introduced later) and can also be treated as a constant. Recognition therefore reduces to computing $P(O \mid w_i)$.
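The decision rule in (3.6) and (3.7) can be sketched as a small function that picks the word with the highest score. This is a minimal illustration in Python, not the recognizer built in this report: the function name, the use of log probabilities and the example scores are assumptions, and the per-word likelihoods are taken as given rather than computed from HMMs.

```python
def recognize(log_likelihoods, log_priors=None):
    """Return the word w maximizing log P(O|w) + log P(w).

    log_likelihoods: dict mapping each candidate word to log P(O|w).
    log_priors: optional dict mapping each word to log P(w); when omitted,
    P(w) is treated as a constant, so only P(O|w) decides.
    P(O) is the same for every word and is therefore left out.
    """
    def score(word):
        prior = log_priors[word] if log_priors is not None else 0.0
        return log_likelihoods[word] + prior

    return max(log_likelihoods, key=score)
```

Working with logarithms does not change the argmax and avoids numerical underflow when the probabilities of long observation sequences become very small.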
For example, in the simplest recognition setup each syllable $w_i$ is modeled by one HMM $\lambda_i$, so computing $P(O \mid w_i)$ reduces to computing $P(O \mid \lambda_i)$.
In phoneme-based recognition, a syllable $w_i$ does not correspond to a single model $\lambda_i$. Each syllable corresponds to a sequence of phonemes, and each phoneme is modeled by one HMM, so each syllable is modeled by a sequence of phoneme HMMs $\lambda_1, \lambda_2, \ldots, \lambda_k$. Computing $P(O \mid w_i)$ then reduces to computing $P(O \mid \lambda_1 \lambda_2 \ldots \lambda_k)$.
3.3.4. Two basic assumptions for building an HMM-based recognizer
Assumption 1: the assumption on the transition probabilities:
The first assumption comes from the Markov model itself. Because the transition probability $a_{ij}$ depends only on the preceding state and not on the observation sequence, $a_{ij}$ can be treated as a constant. Transition probabilities at different times and in different states can be treated as independent. The probability of a state sequence S in an HMM $\lambda$ therefore reduces to:

$$P(S \mid \lambda) = \prod_{t=2}^{T} P(s_t \mid s_{t-1}) = \prod_{t=2}^{T} a_{s_{t-1} s_t} \tag{3.8}$$

Assumption 2: the assumption on the observation probabilities:
Consider the state sequence of the HMM $\lambda$ that corresponds to the observation vector sequence O, and call the search for the most probable state sequence for a given O "observing O". From the definition of the observation probability b, determining the state sequence that corresponds to the feature vector sequence O is also a statistical process. It is not a Markov chain, however. When an HMM models a speech observation sequence (that is, when the state sequence corresponding to the acoustic feature vectors is determined), each state can generate any observation vector (subject to some constraints), but with different probabilities. It is therefore not known which state goes with which vector, and so it is not known which state sequence $S = s_1 s_2 \ldots s_T$ generated the given feature vector sequence O. This is why the Markov process is called hidden. Nevertheless, the probabilities of observing the vectors $O_t$ at different times t are independent and can be computed (this is essential for all HMM-based computations).
The second assumption is therefore:
Given a state sequence S of the HMM $\lambda$, we have:

$$P(O \mid S, \lambda) = \prod_{t=1}^{T} b_{s_t}(O_t) \tag{3.9}$$

where $b_{s_t}(O_t)$ is the probability of the feature vector $O_t$ in state $s_t$ at time t.
With $P(S \mid \lambda)$ and $P(O \mid S, \lambda)$ computed under the 2 assumptions above, the joint probability of O and S generated by the model $\lambda$ is:

$$P(O, S \mid \lambda) = P(S \mid \lambda)\, P(O \mid S, \lambda) \tag{3.10}$$

In practice the state sequence S is unknown, so to compute $P(O \mid \lambda)$ we must sum over all possible sequences S:

$$P(O \mid \lambda) = \sum_{S} P(S \mid \lambda)\, P(O \mid S, \lambda) \tag{3.11}$$
We therefore have:

$$P(O \mid S, \lambda) = b_{s_1}(O_1)\, b_{s_2}(O_2) \cdots b_{s_T}(O_T) \tag{3.12}$$

$$P(S \mid \lambda) = a_{s_1 s_2}\, a_{s_2 s_3} \cdots a_{s_{T-1} s_T} \tag{3.13}$$

$$P(O \mid \lambda) = \sum_{S} P(O \mid S, \lambda)\, P(S \mid \lambda) = \sum_{S} b_{s_1}(O_1)\, a_{s_1 s_2}\, b_{s_2}(O_2) \cdots a_{s_{T-1} s_T}\, b_{s_T}(O_T) \tag{3.14}$$

Some references include the initial probability $\pi$, but because the HMM always starts in the first state, it can be omitted here.
According to the formula above, for an HMM with N states this probability requires about $2 \cdot T \cdot N^T$ multiplications. Even for N = 5 and T = 100 this is roughly $10^{72}$ multiplications. This complexity is far too high, so a more efficient algorithm is needed to compute $P(O \mid \lambda)$.
3.3.5. Three essential problems of the HMM and their solutions
Suppose we are given an HMM $\lambda = (A, B)$ and an observation sequence $O = O_1, O_2, \ldots, O_T$.
Every application based on hidden Markov models has to solve the following 3 basic problems:
i. The evaluation problem: how to compute $P(O \mid \lambda)$ (the probability that the observation sequence O occurs under the HMM) with as little computation as possible.
ii. The parameter estimation problem: this problem arises during training, where the models are determined from the training data. How can the parameters of the HMM $\lambda = (A, B)$ be chosen so that $P(O \mid S, \lambda)$ (or $P(O, S \mid \lambda)$) is maximal?
iii. The decoding (recognition) problem: given a trained HMM $\lambda$, how can we find the state sequence $S = s_1, s_2, \ldots, s_T$ corresponding to the observation sequence O for which $P(O, S \mid \lambda)$ is maximal?
Before solving these 3 problems, we present 2 related algorithms, the forward and backward algorithms.
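Forward algorithm
The forward function $\alpha_t(i)$ is defined as:

$$\alpha_t(i) = P(O_1, O_2, \ldots, O_t, s_t = i \mid \lambda) \qquad (3.15)$$

That is, $\alpha_t(i)$ is the joint probability of observing the first t acoustic vectors and being in state i at time t.
$\alpha_t(i)$ can be computed recursively:
1. $\alpha_1(i) = \pi_i\, b_i(O_1)$, $1 \le i \le N$ (N is the number of states)
2. For $t = 1, 2, \ldots, T-1$ and $1 \le j \le N$:

$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(O_{t+1}) \qquad (3.16)$$

The required probability is then:

$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i) \qquad (3.17)$$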
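The recursion (3.15) to (3.17) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the observations are discrete symbols (the thesis works with continuous feature vectors), the initial probabilities `pi` are kept explicit, and the toy values of `pi`, `A` and `B` are invented for the example. The `brute_force` function evaluates the direct sum (3.14) over all state sequences so that the two results can be compared.

```python
from itertools import product

def forward(pi, A, B, obs):
    """Return P(O | lambda) using the forward recursion (3.15)-(3.17)."""
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(O_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(O_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)

def brute_force(pi, A, B, obs):
    """Direct evaluation of (3.14): sum over all N^T state sequences."""
    n, total = len(pi), 0.0
    for seq in product(range(n), repeat=len(obs)):
        p = pi[seq[0]] * B[seq[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[seq[t - 1]][seq[t]] * B[seq[t]][obs[t]]
        total += p
    return total

# Toy two-state HMM with two discrete symbols (illustrative values only)
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]   # B[state][symbol]
obs = [0, 1, 1, 0]
print(forward(pi, A, B, obs), brute_force(pi, A, B, obs))
```

Both calls should print the same value; the forward version touches each of the N*N transitions once per time step, whereas the brute-force version enumerates every state sequence.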
Backward algorithm
The backward function $\beta_t(i)$ is defined as:

$$\beta_t(i) = P(O_{t+1}, O_{t+2}, \ldots, O_T \mid s_t = i, \lambda) \qquad (3.18)$$

That is, $\beta_t(i)$ is the probability of the partial sequence of acoustic vectors from the (t+1)-th to the T-th, given state i at the t-th vector and the model $\lambda$. Unlike in the forward algorithm, the state at the t-th feature vector (state i) is taken as known.
$\beta_t(i)$ can be computed recursively:
1. For t = T:
$\beta_T(i) = 1$, $1 \le i \le N$
2. For $t = T-1, T-2, \ldots, 1$:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j) \qquad (3.19)$$

We then have:

$$P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i\, b_i(O_1)\, \beta_1(i) \qquad (3.20)$$

3.3.5.1. Solving the evaluation problem
This problem is solved with the forward algorithm. Using it, we need only on the order of $N^2 T$ multiplications, compared with $2T \cdot N^T$ for direct computation. For example, with N = 5 and T = 100 the computer performs only about 3000 multiplications instead of about $10^{72}$ for the direct method.

$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i) \qquad (3.21)$$

3.3.5.2. Solving the parameter estimation problem for HMMs
Two algorithms address this problem: segmental K-means and Baum-Welch.
Baum-Welch parameter estimation: here the model parameters $\lambda = (A, B)$ are re-estimated so as to increase $P(O \mid \lambda)$ until it reaches a maximum. As described above, $P(O \mid \lambda)$ is the sum of $P(O, I \mid \lambda)$ over all state sequences I, not the value for one particular sequence.
The core of the Baum-Welch algorithm is expectation-maximisation (EM). EM is used to deal with incomplete information in the training data (the state sequence is unknown). The EM criterion most commonly used in speech recognition is maximum likelihood (ML). The ML solution yields formulas that update the current HMM parameter values. For training to reach good parameters, good initial parameter sets are needed, because Baum-Welch only converges to a local optimum.
The idea of ML is to estimate the HMM parameters so that the likelihood $P(O \mid \lambda)$ is maximal over the set of observation sequences {O}. A further point is that the likelihood need not be accumulated over every state sequence, but only over the few sequences S that dominate the sum. This reduces the computational cost considerably while degrading the result very little. In practice, for computational convenience, the log-likelihood (the logarithm of the likelihood) is used instead of the raw likelihood.
In practice the parameters of an HMM are rarely estimated with a naive implementation of Baum-Welch, because its computational cost is too high. Instead, Baum-Welch is computed with the help of the forward and backward algorithms. For this reason the Baum-Welch algorithm is often referred to as the forward-backward algorithm.
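The backward recursion (3.18) to (3.20) can be sketched in the same way as the forward one. The sketch below again assumes discrete observation symbols and an explicit initial distribution `pi`, and the toy model values are invented for illustration. As a sanity check it sums $P(O \mid \lambda)$ over every possible observation sequence of length 3, which should give 1 for a valid model.

```python
from itertools import product

def backward(pi, A, B, obs):
    """Return P(O | lambda) using the backward recursion (3.18)-(3.20)."""
    n = len(pi)
    beta = [1.0] * n                      # beta_T(i) = 1
    # Walk backwards: o plays the role of O_{t+1} for t = T-1, ..., 1
    for o in reversed(obs[1:]):
        beta = [sum(A[i][j] * B[j][o] * beta[j] for j in range(n))
                for i in range(n)]
    # Termination: P(O | lambda) = sum_i pi_i * b_i(O_1) * beta_1(i)
    return sum(pi[i] * B[i][obs[0]] * beta[i] for i in range(n))

# Toy two-state HMM with two discrete symbols (illustrative values only)
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]   # B[state][symbol]

# Probabilities of all length-3 observation sequences should sum to 1
total = sum(backward(pi, A, B, list(o)) for o in product(range(2), repeat=3))
print(total)
```

In Baum-Welch training the forward and backward quantities are combined at every time step to obtain expected state occupancies and transition counts, which is where the cost saving over explicit enumeration comes from.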
3.3.5.3. Solving the decoding problem
This problem is solved with the Viterbi algorithm.
The algorithm finds the state sequence $S = s_1, s_2, \ldots, s_T$ that best explains the feature-vector sequence $O = O_1, O_2, \ldots, O_T$, that is, the state sequence S for which $P(O, S \mid \lambda)$ is maximal. It is an inductive algorithm, used in the recognition stage after training has produced the HMMs. The Viterbi algorithm avoids an exhaustive search over the large space of state sequences of the HMM and thus keeps the computational cost low.

$$P(O, S \mid \lambda) = P(O \mid S, \lambda)\, P(S \mid \lambda) = b_{s_1}(O_1)\, a_{s_1 s_2}\, b_{s_2}(O_2) \cdots a_{s_{T-1} s_T}\, b_{s_T}(O_T) \qquad (3.22)$$

We additionally define:

$$U(s_1, s_2, \ldots, s_T) = -\left[ \ln b_{s_1}(O_1) + \sum_{t=2}^{T} \ln\!\left( a_{s_{t-1} s_t}\, b_{s_t}(O_t) \right) \right] \qquad (3.23)$$

It follows that:

$$P(O, S \mid \lambda) = \exp\!\left( -U(s_1, s_2, \ldots, s_T) \right) \qquad (3.24)$$

Therefore the problem of finding the state sequence with the highest probability:

$$\max_{\{s_t\}_{t=1}^{T}} P(O, s_1 \ldots s_T \mid \lambda) \qquad (3.25)$$

is equivalent to the problem of finding:

$$\min_{\{s_t\}_{t=1}^{T}} U(s_1, s_2, \ldots, s_T) \qquad (3.26)$$

The term $-\ln\!\left( a_{s_j s_k}\, b_{s_k}(O_t) \right)$ can be viewed as the cost of moving from state $s_j$ to state $s_k$ at time t (at the t-th feature vector).
The Viterbi algorithm can be described as follows.
Suppose we are at the t-th feature vector of the sequence and in state i of the HMM, and we want to move to state j. The cost of the transition from state i to state j is the weight $-\ln(a_{ij}\, b_j(O_t))$, where $a_{ij}\, b_j(O_t)$ is the probability of moving from state i to state j at time t and emitting the feature vector $O_t$ in state j; $O_t$ is the vector of $O = O_1, O_2, \ldots, O_T$ consumed on arrival in state j. At time t = 1 the weight is $-\ln(b_i(O_1))$, called the initial weight. The total weight of a state sequence is the sum of the transition weights between adjacent states. Finding the state sequence with the highest probability is therefore equivalent to finding the state sequence with the smallest total weight.
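The minimum-weight search described above can be sketched as follows. The code is a minimal illustration, assuming discrete observation symbols and an explicit initial distribution `pi` (so the initial weight is $-\ln(\pi_i b_i(O_1))$); it is not the decoder used by HTK or Julian. Costs are negative log probabilities, so impossible transitions receive an infinite cost.

```python
import math

def viterbi(pi, A, B, obs):
    """Return (best_path, cost) minimising U = -ln P(O, S | lambda)."""
    n = len(pi)

    def cost(p):
        # Weight of an event with probability p; impossible events cost infinity
        return -math.log(p) if p > 0 else math.inf

    # Initial weights for t = 1
    u = [cost(pi[i] * B[i][obs[0]]) for i in range(n)]
    back = []                              # back[t][j] = best predecessor of j
    for o in obs[1:]:
        step, new_u = [], []
        for j in range(n):
            # Cheapest predecessor i for arriving in state j with observation o
            best_i = min(range(n), key=lambda i: u[i] + cost(A[i][j] * B[j][o]))
            step.append(best_i)
            new_u.append(u[best_i] + cost(A[best_i][j] * B[j][o]))
        back.append(step)
        u = new_u
    # Backtrack from the cheapest final state
    last = min(range(n), key=lambda i: u[i])
    path = [last]
    for step in reversed(back):
        path.append(step[path[-1]])
    path.reverse()
    return path, u[last]

# Deterministic toy model: states alternate and each state emits its own symbol
path, total_cost = viterbi([1.0, 0.0],
                           [[0.0, 1.0], [1.0, 0.0]],
                           [[1.0, 0.0], [0.0, 1.0]],
                           [0, 1, 0])
print(path, total_cost)
```

The probability of the best path is recovered as `math.exp(-total_cost)`, which mirrors equation (3.24).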
3.4. Speech recognition and phoneme recognition based on HMMs
3.4.1. Recognition model
Any HMM-based recognition program has to go through two stages in order: first the training stage and then the recognition stage.
Training stage:
In this stage, feature parameters are extracted from the speech waveform data and phoneme labels are attached; the final result is the set of HMMs of the phonemes. This stage is decisive for the accuracy of recognition. The operations performed in this stage are shown in Figure 3.3.1.

[Figure 3.3.1. HMM training diagram (blocks: waveform, feature extraction, features, label file, HMM prototype, phoneme dictionary, training)]

During training we use the Baum-Welch algorithm to estimate the HMM parameters. Training can be carried out on labelled or unlabelled data.
Speech data labelled at the phoneme level is data in which the boundaries of the phonemes of each word have been marked on the waveform. These labels are assigned by hand. Unlabelled data comes in several degrees. In this work, unlabelled data means that only the syllable sequence (and hence the phoneme sequence) corresponding to the waveform is given.
Training estimates each phoneme HMM from the set of all waveform segments labelled with that phoneme.
The parameters estimated for each HMM are the transition probabilities $a_{ij}$ from state i to state j and, for each state, the mean vector $\mu$ and the covariance vector $\Sigma$.
Recognition stage:
In recognition, the system relies on the trained HMMs and on a language model or grammar, and applies a search strategy to find the syllable sequence corresponding to the input waveform. The details are shown in Figure 3.3.2.
Note that training operates on phoneme models, whereas recognition operates on syllables. The search strategy is built on the Viterbi algorithm.
3.4.2. Basic components of an HMM-based speech recognition system and the relationships between them
Having examined the core of HMMs as applied to speech recognition, we now consider the basic components of an HMM-based recognition system. These components are:
1. Feature extraction
2. Acoustic model
3. Language model
4. Search strategy
Feature extraction converts the digitised waveform into a sequence of observation vectors $O = o_1, o_2, \ldots, o_T$. Each vector $o_t$ has 10 to 40 components that retain the information in the waveform. Feature extraction is described in a later section.

[Figure 3.3.2. Recognition with the trained HMMs (blocks: waveform, feature extraction, features, grammar, trained HMMs, phoneme dictionary, recognition, recognised words)]

The acoustic model uses HMMs to model the acoustic patterns of a unit of speech. This unit may be a word (syllable) or a sub-word unit (phoneme). Each HMM in the system models the actual realisations of one speech unit. The units modelled in training sometimes differ from the units used in recognition. For example, in most speech recognition systems the modelling units are phones (phonemes plus a few non-speech units) or triphones (context-dependent phonemes), while the recognition unit is the word (or syllable). The reason is technical. Because the number of words is very large (about 8000 for Vietnamese), discriminating directly between words is not feasible. Words (or syllables), however, are composed of a small number of basic units, the phonemes. Most recognition systems therefore model sub-word units such as phonemes and recognise words by searching for their corresponding phoneme sequences. Such systems are called sub-word based systems.
The link between the HMM states and the acoustic space is the observation probability function b, which has two basic forms: discrete-density and continuous-density. Accordingly there are two kinds of HMM: continuous-density HMMs (CDHMM) and discrete-density HMMs (DDHMM). DDHMM systems use more memory but compute faster than CDHMM systems. The parameters of both DDHMMs and CDHMMs are estimated with the Baum-Welch algorithm.
The language model in speech recognition describes the transitions that can occur between words (syllables). It contains information such as the syntactic constraints of the natural language. Two kinds of language model are commonly used in applications: statistical language models and deterministic (rule-based) language models. The statistical model most often used is the word n-gram. For example, a 2-gram model gives the probability of the current word conditioned on the word before it. The parameters of such models can be estimated from the text of the speech database (the transcripts that describe the word sequences in the waveform data). Transitions between words that were not seen during training can be given small probabilities, so that they remain possible when the program is tested on the test data of the corpus (a corpus consists of recorded speech together with the text files containing the corresponding content). The other kind, the deterministic language model, specifies all the valid transitions between words. These rules can be learned from text data or can be written by hand for systems in which every sentence follows fixed grammatical rules.
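The 2-gram model described above can be illustrated with a small sketch. Everything here is an assumption made for the example: the three toy transcripts, the function name `train_bigram` and the floor value given to unseen word pairs. The floor is a deliberately crude stand-in for a proper smoothing scheme, so the resulting probabilities are not renormalised.

```python
from collections import Counter

def train_bigram(sentences, floor=1e-3):
    """Estimate P(cur | prev) from word sequences; unseen pairs get `floor`."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        for prev, cur in zip(words, words[1:]):
            unigrams[prev] += 1            # count of prev as a left context
            bigrams[(prev, cur)] += 1

    def prob(prev, cur):
        if bigrams[(prev, cur)] == 0:
            return floor                   # small value for transitions never seen
        return bigrams[(prev, cur)] / unigrams[prev]

    return prob

# Toy transcripts written in the Telex convention (hypothetical training text)
transcripts = [["GOIJ", "HAF"], ["GOIJ", "KHASNH"], ["BAASM", "MOOJT"]]
prob = train_bigram(transcripts)
print(prob("GOIJ", "HAF"), prob("BAASM", "HAF"))
```

A seen pair such as ("GOIJ", "HAF") receives its relative frequency, while an unseen pair such as ("BAASM", "HAF") stays possible but unlikely.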
The language model plays a major role in speech recognition. Acoustic models alone are limited, and the number of possible word combinations is very large; with a language model this number can be reduced drastically. Because the search space shrinks considerably, both the speed and the accuracy of recognition increase substantially, especially with a large vocabulary. The design and mathematical structure of language models, the methods for estimating their parameters and their effect on the performance of recognition systems are covered in the literature on language modelling (for example, Shih et al, 1995).
The last component, the search strategy, is also very important for the accuracy and efficiency of the system and for balancing the three components above. It becomes critical when the recognition vocabulary is very large. The search space consists of all possible connections between phonemes and between words over a large dictionary. Search algorithms are designed for each specific recognition task. The algorithm most often used for search is the Viterbi algorithm.
CHAPTER 4
SIGNAL PREPROCESSING, FEATURE EXTRACTION AND CONSTRUCTION OF THE HMM
4.1. Speech preprocessing
4.1.1. The purpose of speech preprocessing
To recognise speech, the speech must first be recorded. The better the quality of the recorded speech, the more accurate the recognition. In practice, however, whether in the laboratory or in everyday surroundings, the recorded speech always comes with unwanted signals, including background noise (fans, traffic and so on), interference (lip smacks, breathing, electronic noise) and reverberation. In addition, the recorded sound may be distorted by the equipment (microphone, sound card) or be too loud or too quiet. A preprocessing step is therefore inserted into the recognition pipeline (immediately after recording) to bring the audio quality to the best level for the subsequent stages (model training, recognition and so on).
Technically, preprocessing applies machine-learning methods, algorithms, or one or more filters to the freshly recorded speech signal.
Let:
$Y_i$ be the recorded signal at time i
$X_i$ be the pure speech signal at time i
$N_i$ be the other signals (noise, distortion) at time i
Then:

$$Y_i = X_i + N_i \qquad (4.1)$$

Here $N_i$ may be the combination of M different noise sources:

$$N_i = \sum_{j=1}^{M} N_i[j] \qquad (4.2)$$

The input of preprocessing is thus the recorded audio sequence of length k:
$Y = \{Y_1, Y_2, \ldots, Y_k\}$
and the output is the ideal speech sequence of the same length:
$X = \{X_1, X_2, \ldots, X_k\}$
Good preprocessing improves recognition quality considerably. In practice, however, preprocessing can only limit the unwanted effects that degrade recognition; it cannot eliminate them completely.
4.1.2. Some tasks in speech preprocessing
4.1.2.1. Pre-emphasis
A high-pass filter is applied to compensate for the attenuation of signal strength in the speech data. As its name suggests, a high-pass filter keeps the high-frequency components and attenuates the low-frequency ones. In speech, the high-frequency components carry much less energy than the low-frequency components, yet they hold an important part of the speech information. We therefore boost the signal in the high-frequency region to balance the spectrum across frequency bands. This process is also called flattening the speech signal. The filter is applied as follows.
For each value $X_i$ of the input sequence $X = \{X_1, X_2, \ldots, X_k\}$, apply:

$$Y_i = X_i - a\, X_{i-1} \qquad (4.3)$$

where
$Y = \{Y_1, Y_2, \ldots, Y_k\}$, $i = 1..k$, is the processed signal,
i identifies the sample at time i,
a is the pre-emphasis coefficient, usually chosen between 0.95 and 0.97. The larger the coefficient a, the more strongly the low-frequency components are attenuated.
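The pre-emphasis filter can be sketched in a few lines. This is a minimal illustration, assuming the usual first-order high-pass form $Y_i = X_i - a X_{i-1}$ and passing the first sample through unchanged (a boundary convention chosen here for simplicity).

```python
def preemphasis(x, a=0.97):
    """First-order high-pass filter y[i] = x[i] - a * x[i-1]."""
    if not x:
        return []
    # The first sample has no predecessor, so it is copied unchanged
    return [x[0]] + [x[i] - a * x[i - 1] for i in range(1, len(x))]

constant = preemphasis([1.0, 1.0, 1.0, 1.0])         # low-frequency (DC) input
alternating = preemphasis([1.0, -1.0, 1.0, -1.0])    # highest-frequency input
print(constant, alternating)
```

A constant input is almost cancelled after the first sample, while a rapidly alternating input is nearly doubled in amplitude, which is the high-pass behaviour described above.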







[Figure 4.1. Waveform of the word "hai" before and after pre-emphasis]

4.1.2.2. Noise filtering
Nonlinear Spectral Subtraction (NSS)
The NSS algorithm makes use of a frequency-dependent signal-to-noise ratio (SNR). With nonlinear subtraction, the subtraction factor is reduced for spectral components with a high SNR and increased for spectral components with a low SNR. In addition, the noise model is extended by using both the average noise spectrum and the over-threshold noise spectrum. NSS is expressed by the following formula:

$$\hat{X}_i(\omega) = H_i(\omega)\, Y_i(\omega) \qquad (4.4)$$

where $H_i(\omega)$ depends on the smoothed estimate of the magnitude spectrum of the noisy signal, $|\bar{Y}_i(\omega)|$, and on the nonlinear subtraction factor $\beta_i(\omega)$:

$$H_i(\omega) = \frac{|\bar{Y}_i(\omega)| - \beta_i(\omega)}{|\bar{Y}_i(\omega)|} \qquad (4.5)$$

The subtraction factor $\beta_i(\omega)$ is computed as follows:

$$\beta_i(\omega) = \frac{\max_{i-M \le \tau \le i} |N_\tau(\omega)|}{1 + \gamma\, \rho_i(\omega)} \qquad (4.6)$$

with

$$\rho_i(\omega) = \frac{|\bar{Y}_i(\omega)|}{|\bar{N}_i(\omega)|}$$

where $\gamma$ is a scaling constant that depends on the range of $\rho_i(\omega)$.
Range of $\beta_i(\omega)$:

$$|\bar{N}_i(\omega)| \le \beta_i(\omega) \le 3\, |\bar{N}_i(\omega)|$$

MMSE estimator
Ephraim and Malah proposed this algorithm for the short-time spectral amplitude components of the noisy signal. In this method, the spectral components of the speech and of the noise are modelled as Gaussian random variables. The algorithm estimates the amplitude of the k-th spectral component with the following filter:

$$H(\omega_k) = \frac{\sqrt{\pi}}{2}\, \frac{\sqrt{v_k}}{\gamma_k}\, \exp\!\left( -\frac{v_k}{2} \right) \left[ (1 + v_k)\, I_0\!\left( \frac{v_k}{2} \right) + v_k\, I_1\!\left( \frac{v_k}{2} \right) \right] \qquad (4.7)$$

where $I_0\!\left( \frac{v_k}{2} \right)$ and $I_1\!\left( \frac{v_k}{2} \right)$ are the modified Bessel functions of order 0 and order 1.
$v_k$ is computed as follows:

$$v_k = \frac{\xi_k}{1 + \xi_k}\, \gamma_k$$

$\xi_k$ and $\gamma_k$ are the a priori and a posteriori SNR of the k-th spectral component.
4.2. Feature extraction
4.2.1. Introduction
The speech signal captured from a microphone is large. A personal computer does not have the speed and memory needed to run recognition on that amount of raw data. To solve this problem, only the most essential information is extracted from the recorded signal and processed. The extracted values are called the features of the original signal. The features must satisfy the following conditions:
- They are much smaller than the original signal (to save memory and reduce processing time).
- They still retain the most important characteristics of the original signal.
There are many kinds of features, which can be divided into three groups:

Group 1:
Features obtained by simulating a model of the human speech production apparatus. Features in this group are extracted with linear prediction. Common methods are Linear Predictive Coding (LPC) and Perceptual Linear Prediction (PLP).
Group 2:
Features based on the way the human ear perceives sound. Features in this group use cepstral and spectral analysis (also called filterbank methods). The method most often used is Mel-Frequency Cepstral Coefficients (MFCC).
Group 3:
Other acoustic features concerning pitch, intonation and stress, which are suitable for identifying the spoken language, the tone, the mood of the speaker and so on.

Group 3 features are rarely used for syllable recognition. Features from groups 1 and 2 are used in most cases.
4.2.2. Some feature extraction methods
4.2.2.1. Linear predictive coding (LPC)
Linear predictive coding extracts the linear prediction coefficients. LPC is widely used because it gives a fairly good model of the speech signal and runs faster than filterbank techniques. (With present-day computers, however, speed is no longer a major concern.)
The method rests on two assumptions:
- The vocal cords are independent of the vocal tract.
- The vocal tract is a linear system.
In the resonant environment of a linear vocal tract, the current value of the signal depends on the preceding values: the signal s(t) at time t is expressed as a sum of products of the p preceding signal values in the frame and a set of coefficients:

$$s(t) = a_1 s(t-1) + a_2 s(t-2) + \cdots + a_p s(t-p) = \sum_{i=1}^{p} a_i\, s(t-i) \qquad (4.8)$$

The coefficients $a_i$ cannot be determined exactly; they can only be estimated so as to find the closest values.
The assumption behind the formula above is not exact, because the resonances of the vocal tract vary slowly over time. An error term is therefore added to the result:

$$s(t) = \sum_{i=1}^{p} a_i\, s(t-i) + e(t) \qquad (4.9)$$

The transfer function is built as an all-pole filter of the form:

$$H(z) = \frac{1}{\sum_{i=0}^{p} a_i z^{-i}} \qquad (4.10)$$

where p is the number of poles and $a_0 \equiv 1$. With this convention the filter coefficients $a_1, \ldots, a_p$ in (4.10) are the negatives of the prediction coefficients in (4.8). The filter coefficients $\{a_i\}$ are chosen so that the mean squared prediction error is minimal. Two methods are used: autocorrelation and covariance. The covariance method is used for sequential signals. We consider only the autocorrelation method: for each frame of N samples, the first p+1 autocorrelation coefficients are computed as:

$$r_i = \sum_{j=1}^{N-i} s(j)\, s(j+i), \qquad 0 \le i \le p$$

The filter coefficients are then computed recursively using the reflection coefficients. Let E be the prediction error energy. Initially $E = r_0$. Let $k^{(i-1)}$ and $a^{(i-1)}$ be the reflection coefficients and filter coefficients of the filter of order (i-1). The filter of order (i) is computed in three steps.
Step 1: Compute the new reflection coefficients:

$$k_j^{(i)} = k_j^{(i-1)}, \qquad 1 \le j \le i-1$$

$$k_i^{(i)} = \left\{ r_i + \sum_{j=1}^{i-1} a_j^{(i-1)}\, r_{i-j} \right\} / E^{(i-1)}$$

Step 2: Update the prediction error energy:

$$E^{(i)} = \left( 1 - k_i^{(i)} k_i^{(i)} \right) E^{(i-1)}$$

Step 3: Compute the new filter coefficients:

$$a_j^{(i)} = a_j^{(i-1)} - k_i^{(i)}\, a_{i-j}^{(i-1)}, \qquad 1 \le j \le i-1$$

$$a_i^{(i)} = -k_i^{(i)}$$

The filter coefficients are obtained after p iterations.
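The autocorrelation method and the three-step recursion above can be sketched as follows. This is an illustrative implementation under the sign convention of the all-pole filter with $a_0 = 1$ (so `a[1..p]` are the negatives of the predictor weights); the function names and test signal are chosen for the example and are not taken from HTK.

```python
def autocorrelation(s, p):
    """r_i = sum_j s[j] * s[j+i] for 0 <= i <= p (one analysis frame)."""
    n = len(s)
    return [sum(s[j] * s[j + i] for j in range(n - i)) for i in range(p + 1)]

def levinson_durbin(r, p):
    """Return (a, E): filter coefficients a[0..p] with a[0] = 1, and error energy."""
    a = [1.0] + [0.0] * p
    E = r[0]                                  # initially E = r_0
    for i in range(1, p + 1):
        # Step 1: new reflection coefficient k_i
        k = (r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / E
        # Step 3: update a_1..a_{i-1} from the order i-1 filter, then a_i = -k_i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        a[i] = -k
        # Step 2: update the prediction error energy
        E = (1.0 - k * k) * E
    return a, E

# Decaying exponential: each sample is 0.9 times the previous one
signal = [0.9 ** n for n in range(200)]
a, E = levinson_durbin(autocorrelation(signal, 1), 1)
print(a, E)
```

For this signal the single filter coefficient comes out close to -0.9, which corresponds to the predictor s(t) = 0.9 s(t-1).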
4.2.2.2. Mel-frequency cepstral coefficients (MFCC)
Davis and Mermelstein introduced the term Mel-Frequency Cepstral Coefficients (MFCC) in 1980, when they combined non-uniformly spaced filters with the discrete cosine transform (DCT) into a complete front-end algorithm for continuous speech recognition.
MFCC is one of the filterbank methods. It extracts the features in two stages. First, the energies of the narrow filterbank bands are obtained with a Mel-scale filterbank. Then these energies are encoded with a Fourier-type transform (the discrete cosine transform in the steps below); this is the cepstral analysis stage.
MFCC feature extraction generally goes through the following steps:
- Pre-emphasis (see the preprocessing section):
sf[n] = preemphasis(s[n])
- The signal is multiplied by overlapping windows and split into frames 20 to 30 ms long, with an overlap of 10 to 15 ms between windows:

$$sf[n] = s[n] \cdot w[n] \qquad (4.13)$$

w[n] is usually a Hamming window:

$$w[n] = \begin{cases} 0.54 - 0.46 \cos(2\pi n / N), & 0 \le n \le N \\ 0, & \text{otherwise} \end{cases} \qquad (4.14)$$

- A Fourier transform (DFT, FFT) is applied to each frame to compute the signal spectrum, after which the logarithm is taken:

$$S_f(\omega) = \log \left| \mathrm{DFT}\{ sf[n] \} \right| \qquad (4.15)$$

- A bank of 24 triangular filters with approximately logarithmic spacing (the Mel filterbank) is applied to $S_f(\omega)$. The energy in each band is integrated to give 24 spectral coefficients. The Mel-scale filterbank consists of overlapping triangular filters whose centre frequencies and bandwidths follow the Mel frequency scale. The Mel scale, like the Bark scale used in PLP, is based on psychoacoustic studies of human hearing. Each step on the Mel scale corresponds to an equal increment in the pitch of a tone as perceived by a listener. Experiments confirm that this spacing of the filterbank improves the recognition rate. Finally, the cepstral coefficients are computed with the discrete cosine transform (DCT):

$$c = \mathrm{DCT}\{ S(\omega) \} \qquad (4.16)$$

The first 12 coefficients (excluding the 0th) are the MFCC features.
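The framing and windowing step listed above can be sketched with the standard library alone. The sketch assumes a 16 kHz signal, 25 ms frames (400 samples) and a 10 ms frame shift (160 samples); these numbers and the helper names are illustrative. The Fourier transform, Mel filterbank and DCT stages are left out to keep the example short.

```python
import math

def hamming(n_samples):
    """w[n] = 0.54 - 0.46 * cos(2*pi*n / N) for n = 0..N, with N = n_samples - 1."""
    big_n = n_samples - 1
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / big_n)
            for n in range(n_samples)]

def frames(signal, frame_len, shift):
    """Split `signal` into overlapping frames and apply the Hamming window."""
    w = hamming(frame_len)
    out = []
    for start in range(0, len(signal) - frame_len + 1, shift):
        out.append([signal[start + n] * w[n] for n in range(frame_len)])
    return out

one_second = [1.0] * 16000                     # dummy signal, 1 s at 16 kHz
windowed = frames(one_second, frame_len=400, shift=160)
print(len(windowed), windowed[0][0], windowed[0][200])
```

Each frame tapers towards 0.08 at its edges and is close to 1 in the middle, which reduces the spectral leakage caused by cutting the signal into blocks.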
4.2.2.3. LPC-based Mel-frequency cepstral coefficients (MFCC)
This method is similar to MFCC. The difference is that, after the windowing step, the signal spectrum is computed through linear prediction (LPC) instead of a Fourier transform.

[Figure 4.2. MFCC extraction based on linear prediction and on the Fourier transform. Common front end: speech signal, pre-emphasis, windowing. Fourier branch: FFT, Mel filterbank, log |Mel-filtered signal|, DCT, MFCC features. LPC branch: LPC, Mel filterbank, log |Mel-filtered signal|, DCT, LPC-based MFCC features.]

4.2.2.4. Delta (D) and acceleration (A) coefficients
The performance of a speech recognition system can be improved considerably by adding time derivatives to the basic static parameters. Here we consider the delta coefficients D and the acceleration coefficients A.
The D coefficients are the first-order time derivatives of the original features; the acceleration coefficients A are the second-order derivatives. The A coefficients cannot be used on their own. The delta coefficients are computed with the following regression formula:

$$d_t = \frac{\sum_{\theta=1}^{\Theta} \theta \left( c_{t+\theta} - c_{t-\theta} \right)}{2 \sum_{\theta=1}^{\Theta} \theta^2} \qquad (4.17)$$

where $d_t$ is the delta coefficient at time t, computed from the static coefficients $c_{t+\theta}$ and $c_{t-\theta}$.
Because this equation depends on both past and future parameter values, it has to be adjusted at the beginning and the end of the utterance. The usual approach is to replicate the first or the last vector. An alternative is to use simple differences at the start and the end of the utterance:

$$d_t = c_{t+1} - c_t, \qquad t < \Theta$$

and

$$d_t = c_t - c_{t-1}, \qquad t \ge T - \Theta$$

where T is the length of the data.
When delta and acceleration coefficients are used, they are computed for all the original features, including the energy if present. In some applications the absolute energy is unimportant, whereas the time derivatives of the energy are important.
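The regression formula (4.17) can be sketched as follows. The example assumes a window of $\Theta = 2$ and handles the utterance boundaries by replicating the first and last vectors, one of the two options mentioned above; the function name and the ramp-shaped test data are illustrative.

```python
def deltas(c, theta=2):
    """Regression deltas (4.17) for a list of static feature vectors `c`."""
    T = len(c)
    denom = 2 * sum(th * th for th in range(1, theta + 1))

    def frame(t):
        # Replicate the first/last vector beyond the ends of the utterance
        return c[min(max(t, 0), T - 1)]

    out = []
    for t in range(T):
        out.append([
            sum(th * (frame(t + th)[k] - frame(t - th)[k])
                for th in range(1, theta + 1)) / denom
            for k in range(len(c[t]))
        ])
    return out

static = [[float(t)] for t in range(10)]    # one feature rising by 1 per frame
d = deltas(static)                          # delta coefficients
acc = deltas(d)                             # acceleration = deltas of the deltas
print(d[5], d[0], acc[5])
```

For a feature that rises linearly the interior deltas equal the slope, and the acceleration in the interior is zero; the values near the edges differ because of the replicated boundary vectors.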
4.3. Comparison of the feature extraction methods
LPC versus LPC-based MFCC
Spectral LPC with a Mel-scale filterbank (LPC-based MFCC) performs slightly better than plain linear-prediction spectral LPC. The likely reason is the smoothing effect of the Mel-scale filterbank in the high-frequency region, which reduces spurious peaks at high frequencies by widening the bandwidth of the Mel filters.
LPC-based MFCC versus FFT-based MFCC
LPC-based MFCC tends to be speaker dependent and is therefore better suited to speaker recognition tasks. For speech that is free of noise or similar to the training speech, the LPC method performs worse than the FFT method (both using the Mel-scale filterbank), but for speech in noisy environments or speech unlike the training speech, the LPC spectral estimate performs better.
CHAPTER 5
IMPLEMENTATION OF THE SAMPLE PROGRAM
5.1 Overview of the implementation
The aim of the team is to build a program that can recognise a few short commands such as "Gọi Hà", "Gọi Khánh", "Bấm 0" and "Bấm 1", in order to demonstrate in detail the stages of HMM-based speech recognition and to recognise commands for controlling a computer.
As explained in the theory part, recognition is performed on phonemes rather than on words: the number of phonemes is far smaller than the number of words, which keeps the implementation feasible. However, since there is no standard transcription scheme for a Vietnamese phoneme dictionary, the team splits each word into phonemes according to the following model:

Word = Initial consonant | Vowel | Final consonant

Figure 5.1. Phoneme decomposition model for a word

For example, the word "Khánh" is split into the phonemes KH - Á - NH.
In addition, because the tools in HTK (the Hidden Markov Model Toolkit, the tool set used here to build the HMMs) do not support Unicode, Vietnamese text is written using the Telex typing convention. For example:
Á - AS
ƯƠ - UOW
Ộ - OOJ
These are the conventions adopted by the team for the sample program. To run the program, a number of tools must first be installed and configured, as described in the next section.
5.2 Installing the required programs
The following programs are needed:
HTK toolkit: the tool set for building HMMs.
Cygwin: a program that provides a Linux-like environment on Windows.
Audacity: a recording program for the PC that allows waveforms to be viewed and edited.
Julian: a speech recognition engine.
Installation of these programs is described in detail at
http://www.voxforge.org/home/dev/acousticmodels/windows/create/htkjulius/tutorial/download
Once the programs have been installed and configured as instructed, we can start on the implementation steps.
5.3. Implementation of the sample program
The implementation consists of the following steps:
1. Data preparation
a. Recording the data
b. Labelling the data
c. Feature extraction
2. Building the language model
3. Training
a. Initialising the HMM parameters
b. Training the monophone models
4. Building the recognition program
5.3.1. Data preparation
5.3.1.1. Recording the data
The HParse tool of HTK is used to generate random sentences (as text); in the standard HTK workflow HParse compiles the grammar into a word network and HSGen generates random sentences from it. These sentences are then recorded, each one saved in a separate file. The audio files are stored in WAV format (Microsoft 16-bit PCM) with the following settings:
Default Sample Rate: 48000 Hz
Default Sample Format: 16 bit
Channel: Mono
The audio quality settings are adjusted in the Audacity interface.
To generate random sentences, the tool must be given a vocabulary list. This list must contain at least the words to be recognised. The file containing the generated sentences looks as follows:

Figure 5.1. File containing the randomly generated sentences

During recording, background noise (breathing, traffic, fans) should be minimised and there should be a short pause between words. A stretch of silence at the beginning and the end of each sentence serves as a marker.
5.3.1.2. Labelling the data
We only need to create files that describe the order of the syllables in each sentence and then use the phoneme dictionary to convert the syllable sequence into a phoneme sequence. When labelling, a silence phoneme should be inserted between two syllables. This improves recognition considerably, because in real speech there is usually a short silence between two syllables. Leaving out the silence phoneme between two syllables does not, however, prevent training. The output of this step is the two files phones0.mlf and phones1.mlf (mlf: master label file), which contain the text of all the recorded sentences transcribed at the phoneme level.
5.3.1.3. Feature extraction
In this step the waveform data is converted into MFCC features (Mel-frequency cepstral coefficients). The tool used is HCopy. HCopy performs the conversion according to the parameters we supply, which in this project are:
# Coding parameters
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F

Figure 5.2. Configuration of the feature extraction program

The waveform data is sampled at 16000 samples per second with 16 bits per sample.
Each analysis window is 25 ms long, and consecutive windows overlap by 15 ms. The filterbank has 26 channels. The configuration above can be read as follows: the target kind is MFCC with $C_0$ as the energy component; the frame period is 10 ms (HTK uses units of 100 ns); the output is stored in compressed form with a CRC checksum. The FFT uses a Hamming window (USEHAMMING = T), and first-order pre-emphasis is applied with a coefficient of 0.97 (PREEMCOEF = 0.97). The whole process with HCopy is shown in Figure 5.3.
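The time values in the configuration are given in HTK units of 100 ns, which is easy to misread. The short sketch below converts WINDOWSIZE and TARGETRATE into milliseconds and samples; the helper name `frame_geometry` is hypothetical and the 16 kHz sampling rate is the one stated in the text above.

```python
HTK_UNIT_S = 100e-9   # HTK expresses durations in units of 100 ns

def frame_geometry(sample_rate_hz, windowsize, targetrate):
    """Return (window_ms, shift_ms, overlap_ms, samples_per_window, samples_per_shift)."""
    window_s = windowsize * HTK_UNIT_S
    shift_s = targetrate * HTK_UNIT_S
    return (window_s * 1e3,
            shift_s * 1e3,
            (window_s - shift_s) * 1e3,
            round(window_s * sample_rate_hz),
            round(shift_s * sample_rate_hz))

# WINDOWSIZE = 250000.0 and TARGETRATE = 100000.0 from the configuration above
geometry = frame_geometry(16000, 250000.0, 100000.0)
print(geometry)
```

With these settings each window is 25 ms (400 samples at 16 kHz), a new frame starts every 10 ms (160 samples), and neighbouring windows share 15 ms.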


















[Figure 5.3. Feature extraction with HCopy (inputs: waveform files sample1.wav, sample2.wav; configuration file config; script file listing the source and target files; outputs: MFCC files sample1.mfc, sample2.mfc)]

5.3.2. Initialising the HMM
As mentioned for the embedded training algorithm, the initialisation of the HMMs is very important: it affects both the convergence speed and the accuracy of the algorithm.
In this project the team uses the flat start method to initialise the HMM parameters. This is HTK's method for initialising from unlabelled data. The idea is to use the supplied training data to initialise all HMMs with identical mean and covariance vectors, equal to the global mean vector and the global covariance vector of the training data.
The first thing to do in the training stage is to define a prototype model. The parameter values of this model are not important; its main purpose is to define the topology of the model. The prototype model provided in HTK has the following form:

~o <VecSize> 39 <MFCC_0_D_A>
~h "proto"
<BeginHMM>
<NumStates> 5
<State> 2
<Mean> 39
0.0 0.0 0.0 ...
<Variance> 39
1.0 1.0 1.0 ...
<State> 3
<Mean> 39
0.0 0.0 0.0 ...
<Variance> 39
1.0 1.0 1.0 ...

<State> 4
<Mean> 39
0.0 0.0 0.0 ...
<Variance> 39
1.0 1.0 1.0 ...

<TransP> 5
0.0 1.0 0.0 0.0 0.0
0.0 0.6 0.4 0.0 0.0
0.0 0.0 0.6 0.4 0.0
0.0 0.0 0.0 0.7 0.3
0.0 0.0 0.0 0.0 0.0
<EndHMM>

Figure 5.4. Prototype model from the HTK Book (proto)

In this model each vector has length 39. It contains the extracted MFCC_0 features (13 features), to which two further sets are added: the delta coefficients (13 features) and the acceleration coefficients (13 features). Each feature vector therefore has 39 components.
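The flat start idea described in this section, giving every state the global mean and variance of the training data, can be sketched as follows. In HTK this step is carried out by the HCompV tool; the code below is only a conceptual illustration with hypothetical function names, tiny two-dimensional vectors instead of 39-dimensional ones, and a plain dictionary in place of HTK's model format.

```python
def global_mean_variance(vectors):
    """Per-dimension mean and variance over all training feature vectors."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(dim)]
    var = [sum((v[k] - mean[k]) ** 2 for v in vectors) / n for k in range(dim)]
    return mean, var

def flat_start(vectors, n_emitting_states=3):
    """Give every emitting state the same global mean and variance."""
    mean, var = global_mean_variance(vectors)
    return [{"mean": list(mean), "variance": list(var)}
            for _ in range(n_emitting_states)]

# Toy training data: two 2-dimensional feature vectors (illustrative only)
training = [[1.0, 2.0], [3.0, 6.0]]
states = flat_start(training)
print(states[0])
```

The prototype above has five states, of which states 2 to 4 emit observations, so three identical state definitions are produced; the embedded training that follows is what makes them diverge.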

5.3.3. Training the HMMs
The HTK tool for embedded training is HERest. HERest uses the freshly initialised HMMs and the training data to train context-independent phoneme HMMs (monophones). The training process is shown in the figure below. The embedded training algorithm for data without phoneme-level labels was described in an earlier section.

[Figure 5.4. Monophone training with HERest (inputs: phoneme label file phones0.mlf, initial HMMs in hmmdefs, HMM list monophones0, training file list trains.scp; output: the trained monophones, i.e. the new hmmdefs)]

HERest loads the initial HMMs stored in the file hmmdefs (the file containing the HMM definitions) for the phonemes listed in monophones0. Parameter estimation uses the training files listed in trains.scp and the corresponding phoneme sequences in the file phones0.mlf to produce new HMMs. The trained HMMs are saved in the directory new_hmm. Training is repeated several times until convergence. Convergence is judged by the average log-likelihood per frame of training data. Training is usually repeated 2 to 5 times. If there are too few iterations, the HMMs are not accurate enough. If there are too many, over-training occurs and the HMMs generalise less well. The number of iterations therefore has to be chosen for best effect. In this project training is run 3 times and the result is stored in the corresponding directory hmm3.
For polysyllabic languages, accuracy is usually improved further by fixing the HMM for sp, the short pause between syllables. Vietnamese, however, is a monosyllabic language in which each word is a single syllable, so this step is not needed. The HMMs from the last step can therefore be supplied directly to the Julian engine for recognition.
5.3.4. Recognition with Julian
Julian supports the HTK HMM format, so the hmmdefs file obtained in the previous step can be used by Julian directly. However, to restrict the set of words the engine listens for and thereby increase accuracy, the engine must be given a language model or a grammar file that defines the rules for combining words. Since the vocabulary in this project is small, the team builds a grammar file for Julian with the following content:
S : NS_B SENT NS_E
SENT: CALL_V NAME_N
SENT: DIAL_V DIGIT
Figure 5.5. Grammar used in the program

The vocabulary for the category labels in the grammar is supplied in the .voca file as follows:
% CALL_V
GOIJ g oij
% DIAL_V
BAASM b aas m
% NAME_N
KHASNH kh as nh
HAF h af
THAAFY th aaf y
DDAJT dd aj t
LINH l i nh
% DIGIT
KHOONG kh oo ng
MOOJT m ooj t
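To see which commands the grammar and vocabulary above allow, the rules can be expanded exhaustively. The sketch below copies the rules and word lists from the two listings into Python dictionaries; treating NS_B and NS_E as empty (silence) words is an assumption made here, since the listing does not show their vocabulary entries. It is not how Julian itself compiles a grammar.

```python
from itertools import product

GRAMMAR = {
    "S": [["NS_B", "SENT", "NS_E"]],
    "SENT": [["CALL_V", "NAME_N"], ["DIAL_V", "DIGIT"]],
}
VOCA = {
    "CALL_V": ["GOIJ"],
    "DIAL_V": ["BAASM"],
    "NAME_N": ["KHASNH", "HAF", "THAAFY", "DDAJT", "LINH"],
    "DIGIT": ["KHOONG", "MOOJT"],
    # Assumption: sentence-boundary silences contribute no words
    "NS_B": [""],
    "NS_E": [""],
}

def expand(symbol):
    """Yield every word sequence that can be derived from `symbol`."""
    if symbol in VOCA:
        for w in VOCA[symbol]:
            yield [w] if w else []
        return
    for rule in GRAMMAR[symbol]:
        # Combine every expansion of each symbol on the right-hand side
        for parts in product(*(list(expand(s)) for s in rule)):
            yield [w for part in parts for w in part]

sentences = [" ".join(s) for s in expand("S")]
print(sentences)
```

The expansion yields seven commands: five of the form "GOIJ name" and two of the form "BAASM digit". A sequence such as "GOIJ MOOJT" is not derivable and would therefore never be returned by the recogniser.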
Julian is thus restricted to commands of the form "Gọi Hà", "Gọi Khánh" or "Bấm không", "Bấm một" and accepts no other results, which matches the intended use. With the three files hmmdefs, mygram.grammar and mygram.voca, Julian can perform recognition. Before running Julian, however, the engine must be configured so that it picks up the files that were created, as follows:

## Grammar definition file (DFA and dictionary)
-dfa mygram.dfa
-v mygram.dict
## Acoustic HMM file
-h hmm3/hmmdefs
-smpFreq 48000 # sampling rate (Hz)

Figure 5.6. Configuration for Julian (file julian.jconf)

Finally, Julian is started with the following command:
$julian -input mic -C julian.jconf
5.4. Evaluation of results
5.4.1. Results
With a vocabulary consisting of the words /GỌI/ /BẤM/ /HÀ/ /KHÁNH/ /THẦY/ /ĐẠT/ /LINH/ /KHÔNG/ /MỘT/ /HAI/ /BA/ /BỐN/ /NĂM/ /SÁU/ /BẢY/ /TÁM/ /CHÍN/ /MƯỜI/, the program reaches an accuracy of 86% in quiet conditions with clearly spoken commands.
If the vocabulary is extended, the program could serve basic voice command applications.
5.4.2. Remarks
Although the words above are recognised fairly accurately, the vocabulary used by the program is still small, and its accuracy with a large vocabulary has not been tested.
Another issue is that the project stops at building monophone HMMs and has not yet built triphone models, so the accuracy is not yet high.
5.5. Difficulties and future work
For a Vietnamese recognition program, the greatest difficulty is the lack of standard reference work on Vietnamese phonetics. When applying HTK, the team therefore had to rely on intuition for some techniques, and their correctness is not guaranteed.
In addition, because this field is still fairly new in Vietnam, few projects have studied it in depth. The work therefore started from material intended for English, a language quite different from Vietnamese, and some of the tools are accurate only for English.
Future work on the program includes increasing the vocabulary, building a standard and acoustically accurate pronunciation dictionary (very important, since with a large vocabulary transcription cannot be done by hand and must be automated) and building triphone models to increase accuracy. Once the desired accuracy is reached, training can continue with many different voices to obtain a speaker-independent recognition program and make the program more usable.