GVHD: TS. V c Lung SVTH: V Vn Ha Tn Thanh Hng


CHAPTER 1. OVERVIEW
1.1. Introduction to the topic
This thesis is titled "A study of Vietnamese speech recognition and its application to control". To recognize speech, we must build what is called an Automatic Speech Recognition (ASR) system, that is, a system that converts a sequence of speech sounds into a sequence of words. Building such a system is not simple: the development team must understand techniques and theory from many fields, including acoustics and physics, phonetics, linguistics, probability and statistics, machine learning, and artificial intelligence. Around the world, many research groups have successfully developed speech recognizers for major languages such as English, Chinese, and Japanese, but recognition solutions for Vietnamese still have many limitations.
1.2. Related work
1.2.1. Worldwide
Human-machine communication is a large and difficult research field with many practical applications. Speech is the most natural means of human communication, and so research on making computers understand human speech, known as Automatic Speech Recognition (ASR), has gone through some 70 years of development. The first research efforts on ASR were made in the 1950s, with the main idea based on phonetics. Because digital signal processing techniques and computing power were still limited, recognizers of that time concentrated on the spectral resonances of the vowels in the signal, measured after it had passed through analog filters. Notable systems of this period include the isolated-digit recognizer of Bell Labs (1952) and the 13-phoneme recognizer of University College in England (1958) [1, p. 8].
In the 1960s, the most notable contribution was the idea of the Russian author Vintsyuk, who proposed a speech recognition method based on dynamic programming for time alignment (Dynamic Time Warping, DTW) [2, p. 1]. Unfortunately, the method did not become known to the rest of the world until the 1980s. In the late 1960s, Reddy at CMU (USA) proposed the first ideas for continuous speech recognition, using path marking and backtracking to find the result [2, p. 2].
In the 1970s, speech recognition research began to produce encouraging results that laid the foundation for later developments. First, the isolated-word recognition problem was solved using ideas from Russian and Japanese scientists. Velichko and Zagoruyko (Russia) pioneered the application of pattern classification to ASR. Sakoe and Chiba (Japan) proposed techniques based on dynamic programming. Itakura, during his time at Bell Labs, introduced Linear Predictive Coding (LPC), which paved the way for using LPC spectral parameters in ASR. Notable ASR systems of this period include Harpy and Hearsay-II from CMU (USA) and the HWIM system from BBN [2, p. 2].
ASR research in the 1980s marked a methodological shift, from the template-matching approach to the statistical-modelling approach. Today most ASR systems are based on the statistical models developed in that decade, together with improvements from the 1990s. One of the most important developments of the 1980s is the Hidden Markov Model (HMM). Although HMMs had already been applied successfully in a few laboratories (mainly IBM and a research institute of the US Department of Defense), it was several years before the model was published and became widely known. Two other important proposals of this period are cepstrum feature combinations and the language model:
- Furui proposed using a combination of spectral coefficients together with their first- and second-order derivatives as the basic features for ASR. Although the method was proposed in the late 1970s, it went unused for a long time. Today most speech recognition systems use this feature combination [2, p. 4].
- Scientists at IBM pioneered the development of the language model (LM). It is an effective tool for selecting the recognized word sequence and has been applied successfully in all current ASR systems, especially large-vocabulary continuous speech recognizers.
ASR systems that appeared in this period include Sphinx from CMU, Byblos from BBN, Decipher from SRI, and other systems from Lincoln Labs, MIT, and AT&T Bell Labs.
The 1990s saw new research results in pattern classification. Specifically, the classification problem under a statistical model (based on the Bayes decision rule), which requires estimating distributions for the data, was recast as an optimization problem that minimizes the empirical classification error. This change stems from the following idea: the goal of classification is to minimize error, not to provide a distribution function that fits the data. The error-minimization concept gave rise to techniques such as discriminative training. Two typical forms of this training are MCE (Minimum Classification Error) and MMI (Maximum Mutual Information). Experiments showed that these training methods gave better recognition results than the earlier maximum-likelihood training. Speech recognition in noisy environments also received a great deal of attention. To improve recognition performance on noisy data, techniques such as MLLR (Maximum Likelihood Linear Regression) and PMC (Parallel Model Combination) were proposed [2, p. 3].
Finally, applications developed in this period include an automatic flight information system (Air Travel Information Service, ATIS) and a system for transcribing radio news (Broadcast News Transcription System).
In the early years of the 21st century, research concentrated on improving speech recognition results through a program called EARS (Effective Affordable Reusable Speech-to-Text) [2, p. 3]. By this time, speech was assumed to be recorded under ordinary conditions, without any constraints (earlier assumptions typically required speech recorded in a clean, sound-proofed room and read by a native speaker). The goal of the program is the ability to recognize, summarize, and translate audio segments so that readers can quickly grasp their content instead of listening to all of it.
Today, for widely spoken languages such as English, French, and Spanish, speech recognition research has achieved very good results, and many practical applications have been deployed, such as:
- Automatic question-answering information systems over the telephone.
- Spoken information retrieval systems.
- Automatic cross-language speech translation systems.
- Control stations and control systems operated by voice.
- Speech applications on mobile devices.
Commercially, speech recognition technology has changed how people interact with systems and devices: they are no longer bound to traditional interaction (such as a computer keyboard or a phone keypad) but can interact directly by voice. In a competitive economy, applications are gradually integrating voice interaction. Letting applications and customers interact through sound does not mean abandoning the traditional graphical interface; it provides an additional, more convenient and natural way to access information and services.
From a research standpoint, current speech recognition systems are all based on statistical methods and pattern matching. This approach requires phonetic knowledge and a large amount of training data, both audio and text, to train the recognizer. The larger the training data, the more likely the recognizer is to produce accurate results.
1.2.2. In Vietnam
In Vietnam there are two main research groups working on large-vocabulary continuous speech recognition (LVCSR). The first, at the Institute of Information Technology and led by Assoc. Prof. Lương Chi Mai, uses the ANN approach and the CSLU toolkit [3]. The second, at the University of Science, Ho Chi Minh City, led by Assoc. Prof. Vũ Hải Quân, uses the HMM approach and the HTK toolkit; its research focuses on Vietnamese spoken information retrieval, speech recognition, human-machine dialogue systems, and voice search.
In addition, LIG (Laboratoire Informatique de Grenoble) has recently collaborated with the MICA laboratory in Hanoi on the portability of acoustic models.
Other related projects in Vietnam include: "A dictation program", which uses vector quantization (VQ) and is limited with respect to continuous speech recognition; "Developing synthesis results and recognizing continuous Vietnamese commands and digit strings on mobile phones" [4]; "Improving the accuracy of a neural network system for Vietnamese recognition" [5]; and "A program for recognizing continuous 10-digit commands over the telephone" from the Institute of Information Technology, which uses the CSLU toolkit [6], an Artificial Neural Network (ANN) model, Viterbi decoding, and the CSLU acoustic sample database.
1.3. Objectives of the thesis
Overall objective: to study and apply knowledge of speech recognition in order to build a Vietnamese speech recognition program and use it to control both simulated and real devices.
Detailed objectives:
a. Study the concepts related to speech recognition systems in order to clarify the key factors in using the supporting tools.
b. Study how to install the tools that support building a speech recognition system.
c. Study how to build acoustic models and language models suited to Vietnamese.
d. Build a demonstration program, run experiments comparing the models with one another, and draw conclusions and remarks.
1.4. Scope
The thesis addresses the recognition of basic control commands through two experiments (demos):
- Controlling the Google Chrome web browser by voice, with a command set of 45 words.
- Controlling a remote-controlled model car by voice, with a command set of 24 words.
Because of limited time and the scope of the thesis, the demos have so far been recognized successfully for only one or two users (the students who carried out the thesis), with an actual recognition accuracy of approximately 90%.
1.5. Highlights of the thesis
The thesis is presented clearly and with an appropriate amount of background so that later developers can easily pick it up and extend it.
The demo runs not only on a computer but also controls a physical object. Although the principle is exactly the same, we tried to deploy in several environments and applications to increase the practicality and credibility of the work, in contrast to earlier recognition projects, which were mostly recognition demos only.
1.6. Structure of the thesis
Chapter 1: Overview. Introduces the topic, defines the objectives and the problems to be solved, limits the scope, identifies the approach, and finally points out the highlights of the work.
Chapter 2: Theoretical background. Covers the basics of acoustics and phonetics, the characteristics of Vietnamese, and the basic knowledge needed to build and use a speech recognition system; the theory of feature extraction, one of the key concepts in speech recognition systems; and the theory of the Hidden Markov Model (HMM), including its definition, the related algorithms, and its role in a speech recognition system.
Chapter 3: Introduces the basic, important concepts of HTK and Sphinx 4, the two most popular frameworks for building a speech recognition system. This thesis uses Sphinx 4 to build the demonstration program.
Chapter 4: Describes in detail the installation of Sphinx 4, audio recording, building the training set, training, interpreting the training results, an experimental comparison of HTK and Sphinx, and finally building the demo program.
Chapter 5: Presents the conclusions, the results achieved, the remaining limitations, and the lessons learned, and proposes directions for improvement, research, and further development.

CHAPTER 2. THEORETICAL BACKGROUND
2.1. Overview of acoustics and speech
2.1.1. Acoustics
2.1.1.1. Concept
When a source emits sound (a drum, a musical instrument, a voice), we hear and perceive it. The object that produces the sound is called the sound source. Sound itself is the mechanical vibration of the particles of a medium, which propagates, reaches our ears, and is then perceived. In a medium with no matter, such as a vacuum, there are no mechanical waves and therefore no sound. In social life, sound is the most common and the oldest means of human communication and information exchange. When studying sound, two aspects are usually considered: its physical characteristics and its biological (perceptual) characteristics.
2.1.1.2. Representing an audio signal in the time and frequency domains
A sound is usually represented in the time domain by a mathematical function x(t), where:
- t: time
- x: the amplitude, that is, the displacement.
We can therefore plot x(t) against time. For a pure sinusoid, let

$$x(t) = A\sin(\omega t) = A\sin(2\pi F_0 t)$$

Figure 2.1. Representation of an audio signal
Signal spectrum: a representation of the components that make up x(t) as a function of frequency. For the sinusoid above, the spectrum is a single line of height A at frequency $F_0$; we call this a line spectrum. In practice, for an arbitrary, time-varying, non-periodic x(t), Fourier analysis is used to compute the spectrum, which is then a continuous spectrum $X(\omega)$.
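As a minimal sketch of the line-spectrum idea, the snippet below samples a sinusoid and evaluates a naive discrete Fourier transform with the Python standard library. The values A = 1.0, F0 = 5 Hz and the 64 Hz sampling rate are illustrative assumptions, not figures from the thesis. Because the DFT is two-sided, the amplitude A is split between the bins at +F0 and -F0, so the single-sided bin shows A/2.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return |X(k)| / N for k = 0..N-1 using a naive O(N^2) DFT."""
    n_samples = len(samples)
    mags = []
    for k in range(n_samples):
        acc = sum(
            samples[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples)
        )
        mags.append(abs(acc) / n_samples)
    return mags


# Assumed illustration values: amplitude, tone frequency (Hz), sampling rate (Hz).
A, F0, FS = 1.0, 5, 64
# One second of signal, so DFT bin k corresponds to k Hz.
signal = [A * math.sin(2 * math.pi * F0 * n / FS) for n in range(FS)]
spectrum = dft_magnitudes(signal)

# Search only the non-negative frequencies (bins 0 .. FS/2 - 1).
peak_bin = max(range(FS // 2), key=lambda k: spectrum[k])
```

All the energy lands in one bin (`peak_bin`), which is the sampled counterpart of the single spectral line described above.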
2.1.1.3. Types of sound
Mechanical vibrations that humans can hear are called sound.
A sound can be represented over time, but it can also be represented over frequency, because an audio signal can be decomposed into a combination of different frequency components (Fourier series, Fourier integral). Put more simply and practically, a sound can be a combination of many pure tones or many instruments, each with its own vibration frequency.
The audible frequency range is 20 Hz to 20000 Hz. Ultrasound is sound above 20000 Hz, and infrasound is sound below 20 Hz. The human ear cannot hear ultrasound or infrasound.
Speech (voice) is sound produced by the human mouth and carried through the air to the listener's ear. The frequency range of clearly intelligible speech is 300 Hz to 3500 Hz, which is the standard band used for telephony. High-quality speech may cover 200 Hz to 7000 Hz, as used for auditorium amplifiers.
Music is sound produced by musical instruments. Its frequency range is 20 Hz to 15000 Hz.
Animal calls are sounds produced by the mouths of animals. Dolphin sounds lie in the range 1-164 kHz, bat sounds in 20-115 kHz, and whale sounds in 30-8000 Hz (these figures need to be verified).
Impact sounds are produced when objects collide, for example two cups knocking together, a door slamming, or a book falling.
Noise is any unwanted sound.
In summary, in terms of the signal and of human auditory perception, there are two kinds of sound:
- Periodic sounds, including speech, music, and so on.
- Non-periodic sounds, such as noise and some fricative consonants like sh and s.
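2.1.1.4. Units of sound measurement
Human perception of loudness is not directly proportional to sound intensity; it follows an approximately logarithmic law. Sound levels are therefore expressed as logarithmic ratios:
Bel: $L = \lg(P_2/P_1)$
Decibel: $L = 10\lg(P_2/P_1) = 20\lg(A_2/A_1)$
where $P_1, P_2$ are powers (or intensities) and $A_1, A_2$ are the corresponding amplitudes, such as sound pressure.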
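As a small worked example, the standard decibel definitions can be written as two helper functions: 10 times the base-10 logarithm of a power ratio, and 20 times the base-10 logarithm of an amplitude ratio. The function names are illustrative and do not come from the thesis.

```python
import math


def power_ratio_to_db(p2, p1):
    """Level difference in dB between two powers (or intensities)."""
    return 10.0 * math.log10(p2 / p1)


def amplitude_ratio_to_db(a2, a1):
    """Level difference in dB between two amplitudes (e.g. sound pressure)."""
    return 20.0 * math.log10(a2 / a1)


# A 100x increase in power and a 10x increase in amplitude describe the
# same change in level, because power is proportional to amplitude squared.
db_from_power = power_ratio_to_db(100.0, 1.0)
db_from_amplitude = amplitude_ratio_to_db(10.0, 1.0)
```

The factor 20 for amplitudes follows from the factor 10 for powers, since squaring the ratio doubles its logarithm.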
2.1.2. Speech
Speech is the sound produced by the (human) mouth. The study of speech covers the human vocal apparatus, auditory perception by the human ear, and the classification of speech.
The human vocal apparatus consists of:
- The lungs, which act as an air pump and supply the energy that forms sound.
- The vocal folds (vocal cords), two muscles in the larynx that are joined at two ends and free to vibrate at the other two, at a fundamental frequency F0 (also called pitch). F0 lies roughly in the range 100-200 Hz for men, 300-400 Hz for women, and 500-600 Hz for children.
- The larynx and the palate, which act as a resonant cavity and shape the frequency content of the vibration produced by the vocal folds. The frequency response of this cavity has several resonance peaks, called formants.
- The mouth, which radiates the sound outward.
- The tongue, which moves to produce different formant frequencies.
Different sounds result from the relative positions of the formants.
Classification of speech by voicing:
- Voiced sounds (French: voisé) are produced with the vocal folds vibrating, so they are periodic with frequency F0. The spectrum of a vowel is therefore a line spectrum, and the spacing between the lines equals F0.
- Unvoiced sounds (French: non voisé) are produced when the vocal folds do not vibrate, for example the final part of the word "English", where "sh" is a fricative. The signal spectrum resembles white noise, with evenly distributed energy.
Classification of speech sounds:
- A vowel is a sound that can be sustained. All vowels are voiced, that is, periodic and fairly stable over a span of a few tens of milliseconds.
- A consonant is produced only briefly and cannot be sustained. There are voiced consonants and unvoiced consonants.
The tones of Vietnamese correspond to the written marks: no mark (ngang), huyền, hỏi, ngã, sắc, and nặng. Instrumental analysis shows that tone is the change in F0, the fundamental frequency or pitch, during the pronunciation of vowels, and that the human ear perceives it. Vietnamese has 6 tones, which reflects its richness and distinctiveness, whereas Chinese has 4. However, people in some regions of Vietnam may not distinguish the hỏi and ngã tones and therefore often misspell them.
A high-pitched voice or a low-pitched voice corresponds to a high or low F0. F0 thus plays a very important role in how humans perceive sound.
High tones and low tones correspond to high and low frequency bands. In practice, a bass speaker or subwoofer reproduces the low band, while a tweeter (treble speaker) is suited to sounds in the high-frequency band.
2.2. The Vietnamese phonetic system
2.2.1. Characteristics of Vietnamese
Unlike some other languages such as English and French, Vietnamese is a monosyllabic language: every written word is read as a single syllable, and no (native Vietnamese) word is pronounced with two or more syllables. A word is built from two kinds of components, vowels V and consonants C, combined in three ways:
- C+V (consonant + vowel). Example: ba
- C+V+C (consonant + vowel + consonant). Examples: con, mong
- V+C (vowel + consonant). Example: an
Besides the two main components, vowels and consonants, Vietnamese has further components that make the classification within a syllable clearer: diphthongs, triphthongs, single consonants, and compound consonants. Learners of Vietnamese must memorize from the start the vowels, consonants, diphthongs, triphthongs, single and compound consonants, and the rules for joining these components into a syllable or word. When a Vietnamese word is written, its pronunciation follows from these combination rules. A word written in violation of the fixed combination rules cannot be read and has no meaning. A Vietnamese word has only one pronunciation (apart from regional and dialectal variants). English is different: there is no definite rule for forming a word, a word exists only if it appears in the dictionary, and it must come with its pronunciation before it can be read.
Besides being monosyllabic (as noted above), Vietnamese is also tonal. It has 6 tones, written with 5 different marks: sắc (acute, as in á), huyền (grave, as in à), hỏi (hook above, as in ả), ngã (tilde, as in ã), and nặng (dot below, as in ạ). This is summarized as "5 marks, 6 tones". A syllable without a mark carries the level tone (thanh ngang).
2.2.2. The Vietnamese alphabet and phonetic system
2.2.2.1. The Vietnamese alphabet
The Vietnamese alphabet has 29 letters [7], in the following order:
[a, ă, â, b, c, d, đ, e, ê, g, h, i, k, l, m, n, o, ô, ơ, p, q, r, s, t, u, ư, v, x, y]
It is divided into two parts: main letters (called vowels when pronounced) and secondary letters (called consonants when pronounced):
- Vowels: besides single vowels, Vietnamese has diphthongs and triphthongs. The relationship between vowel spellings and their pronunciation is complex. One vowel letter can stand for several pronunciations depending on whether it appears in a single vowel, a diphthong, or a triphthong, and different vowel spellings often represent the same pronunciation.
Table 1. Possible pronunciations for each vowel spelling [8]:
Spelling   Pronunciation      Spelling   Pronunciation
a          //, //, //         o          //, /w/, /w/
ă          //                 ô          /o/, /w/, //
â          //                 ơ          //, //
e          //                 u          /u/, /w/
ê          /e/, //            ư          //
i          /i/, /j/           y          /i/, /j/

Table 2. Possible spellings for each diphthong and triphthong pronunciation
Pronunciation   Spelling             Pronunciation   Spelling
Diphthongs
/uj/            ui                   /iw/            iu
/oj/            ôi                   /ew/            êu
/j/             oi                   /w/             eo
/j/             ơi                   /w/             ơu
/j/             ây, ê                /w/             âu, ô
/j/             ai                   /w/             ao
/j/             ay, a                /w/             au, o
/j/             ưi                   /w/             ưu
/i/             ia, ya, iê, yê       /u/             ua, uô
//              ưa, ươ
Triphthongs
/iw/            iêu, yêu             /uj/            uôi
/j/             ươi                  /w/             ươu

- Consonants: Vietnamese has 17 single consonants in the alphabet above:
[b, c, d, đ, g, h, k, l, m, n, p, q, r, s, t, v, x]
and 11 compound consonants:
[gi, gh, qu, ch, kh, ng, ngh, nh, ph, th, tr]
Of these, only 8 consonants can appear at the end of a word:
[c, m, n, p, t, ng, nh, ch]
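To illustrate how these inventories might be used, the sketch below splits a written syllable into its initial consonant spelling and the remainder by greedy longest-prefix matching. It is a purely orthographic approximation under stated assumptions: the function name `split_onset` is hypothetical, and ambiguous spellings (for example "gi" followed by a rhyme that itself starts with i) are not resolved.

```python
# Consonant inventories as listed in the text above.
SINGLE_CONSONANTS = ["b", "c", "d", "đ", "g", "h", "k", "l", "m", "n",
                     "p", "q", "r", "s", "t", "v", "x"]
COMPOUND_CONSONANTS = ["gi", "gh", "qu", "ch", "kh", "ng", "ngh", "nh",
                       "ph", "th", "tr"]

# Try longer spellings first so that "ngh" wins over "ng" and "n".
_ONSETS = sorted(SINGLE_CONSONANTS + COMPOUND_CONSONANTS, key=len, reverse=True)


def split_onset(syllable):
    """Split a written syllable into (initial consonant spelling, remainder).

    Returns an empty onset when the syllable starts with a vowel letter.
    """
    word = syllable.lower()
    for onset in _ONSETS:
        # Require at least one letter after the onset so the rhyme is never empty.
        if word.startswith(onset) and len(word) > len(onset):
            return onset, word[len(onset):]
    return "", word
```

For example, `split_onset("nghe")` separates the compound consonant "ngh" from the vowel "e", while a vowel-initial syllable such as "an" yields an empty onset.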
2.2.2.2. The Vietnamese phonetic system
In Vietnamese the syllable has a tight structure, and each phoneme has a fixed position in the syllable. According to a number of researchers of Vietnamese phonology, the Vietnamese syllable is structured as follows:

    Tone
    Initial | Rhyme: Medial | Nucleus (main vowel) | Final
a. Initial
In the first position of the syllable, the initial opens the syllable. Syllables whose spelling shows no initial, such as "an", begin with the glottis closing and then opening abruptly, which produces a burst. This opening gesture has the value of a consonant and is called the glottal stop (symbol: /ʔ/). A Vietnamese syllable therefore always has an initial consonant. For syllables that begin with the glottal stop, the initial is not recorded in writing: its written position is zero, and it is represented by the absence of a letter.
b. Medial
The medial is the second element of the syllable. It distinguishes rounded syllables (such as "toán") from unrounded ones (such as "tán"). The Vietnamese medial takes two forms: the semivowel phoneme /u/ (in "toán") and the empty phoneme (in "tán"). In writing, the empty medial is shown by the absence of a letter, and the medial /u/ is written with the letter "u" (as in "tuân") or the letter "o" (as in "loan").
c. Nucleus
In the third position of the syllable, the nucleus (main vowel) is the peak of the syllable. It carries the main timbre of the syllable and is always a vowel. Because it is the core of the syllable, no pronounceable word lacks a nucleus. The nucleus is also the element that carries the tone of the syllable.
d. Final
The final is the last element of the syllable and closes it. Vietnamese syllables contrast in how they end. Some syllables end by simply sustaining the vowel unchanged (as in "ba"), while others end with a change in the last part of the syllable caused by a final sound (as in syllables that end in the consonant n or the semivowel i). In the first case the final is the empty phoneme; in the second case the final is a semivowel or a consonant.
e. Tone
Tone expresses the pitch and the change in pitch within each syllable. Every Vietnamese syllable must be realized with a tone. Tone distinguishes both the sound and the meaning of words. There are different views about where the tone sits in the syllable; the most credible one holds that the tone spans the whole pronunciation of the syllable (it lies over the entire syllable).
2.3. Speech recognition systems
2.3.1. Overview
Speech recognition is a system that enables a machine to recognize the meaning of spoken language. In essence, it is the process of converting the acoustic signal captured from a speaker through a microphone, a telephone line, or another device into a sequence of words. The recognition result can be used for device control, data entry, dictation, automatic telephone dialing, or passed on to a higher-level language processing stage.

Figure 2.2. General speech recognition diagram

Figure 2.3. Basic components of a speech recognition system
Figure 2.3 shows the structure of a speech recognition system. The speech signal is first pre-processed and its features are extracted. The result of this stage is a set of acoustic features, arranged into one or more vectors called feature vectors.
Before any comparison can be made, the system must first be trained to build the reference features; only then can they be compared with the input parameters for recognition.
During training, the system uses the incoming feature vectors to estimate the parameters of the patterns (called reference patterns). A reference pattern is the template used for comparison and recognition; it models a word, a syllable, or even a phoneme.
During recognition, the sequence of feature vectors is compared with the reference patterns built above. The system then computes the likelihood of the feature vector sequence for each reference pattern. This computation uses algorithms that have proven effective, such as the Viterbi algorithm (in the Hidden Markov Model). The pattern with the highest likelihood is taken as the recognition result.
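The Viterbi decoding mentioned above can be sketched on a toy discrete HMM. Everything in the example is an assumption made for illustration: the two states ("sil", "speech"), the two observation symbols ("low", "high" frame energy), and all probabilities are invented and are not taken from the thesis or from any toolkit.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its probability) for a discrete HMM."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
            column[s] = (prob, prev)
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Hypothetical toy model: silence vs. speech, observed through frame energy.
STATES = ("sil", "speech")
START_P = {"sil": 0.8, "speech": 0.2}
TRANS_P = {"sil": {"sil": 0.6, "speech": 0.4},
           "speech": {"sil": 0.3, "speech": 0.7}}
EMIT_P = {"sil": {"low": 0.9, "high": 0.1},
          "speech": {"low": 0.2, "high": 0.8}}

path, prob = viterbi(["low", "high", "high", "low"],
                     STATES, START_P, TRANS_P, EMIT_P)
```

In a real recognizer the states would belong to phone or word models and the emission scores would come from acoustic feature vectors; practical implementations also work with log-probabilities to avoid numerical underflow on long utterances.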
2.3.2. Classification of speech recognition systems
2.3.2.1. Continuous and isolated-word recognition
A speech recognition system can take one of two forms: continuous recognition or isolated-word recognition. Continuous recognition handles speech uttered continuously in a signal, such as a sentence, a command, or a passage read by the user. Such systems are very complex: words spoken continuously are hard to process in time (when real-time operation is required) and hard to separate when the speaker does not pause between them (which is very common in practice). The result of word segmentation strongly affects the later steps, so this stage must be handled carefully. In isolated-word recognition, by contrast, each word is pronounced separately, with pauses before and after it. This model is naturally simpler than continuous recognition and has practical applications, such as voice control and voice dialing, with fairly high accuracy, but it is hard to apply as widely as the continuous model.
2.3.2.2. Speaker-dependent and speaker-independent recognition
A speaker-dependent recognizer serves only one person; it will not understand other speakers unless it is retrained from scratch. Speaker-dependent systems are therefore hard to adopt widely, because not everyone has the knowledge, and especially the patience, to train the system. In particular, such systems cannot be used in public places. A speaker-independent system is closer to the ideal: it is more widely applicable and meets most requirements. Unfortunately, such an ideal system runs into problems, above all with accuracy. In practice every person has a different voice, and even the same person sounds different at different times. This greatly affects recognition and can reduce accuracy many times over. To overcome this weakness, a speaker-independent system needs a more complex design and far more training data (collected from many voices of many people), yet this alone does not improve recognition quality much. A practical compromise is the semi-speaker-independent approach. The system is trained on samples from a large number of different voices. In use, it is then adapted to the user's voice by learning a few sentences that contain the required words (the user goes through a short training session before using the system).
Speaker-independent recognition is much harder than speaker-dependent recognition. Even one person trying to pronounce the same word in exactly the same way will produce differences. For the human brain, a near-perfect system, these differences can be ignored thanks to context and to the brain's ability to smooth them over. For a computer, however, it is very hard to build a model that copes with all of these variations.
2.3.3. Speech recognition methods
Three methods are commonly used in speech recognition today:
- The acoustic-phonetic method.
- The pattern recognition method.
- The artificial intelligence method.
2.3.3.1. The acoustic-phonetic method
This method is based on acoustic-phonetic theory, which states that speech contains definite, distinctive phonetic units and that each phonetic unit is characterized by a set of speech signal properties. Recognition proceeds in two steps:
Step 1: Segmentation and labeling. The speech signal is divided into segments whose acoustic properties characterize one (or a few) phonetic units, and each segment is assigned one or more matching phonetic labels.
Step 2: Recognition. Using constraints from the vocabulary, grammar, and so on, a word or a sequence of words is determined from the phonetic label sequences produced in step 1.
2.3.3.2. The pattern recognition method
The pattern recognition method does not need to determine acoustic properties or segment the speech; it uses speech patterns directly during recognition. Systems based on this method are developed in two steps:
Step 1: Use a set of speech samples (a speech sample database) to train characteristic speech patterns (reference patterns) or system parameters.
Step 2: Match the incoming speech against the characteristic patterns to reach a decision.
In this method, if the training database contains enough versions of the patterns to be recognized, training can determine the acoustic properties of each pattern accurately (a pattern here may be a phoneme, a word, or a phrase). Pattern recognition techniques that have been applied successfully to speech recognition include vector quantization, Dynamic Time Warping (DTW), the Hidden Markov Model (HMM), and the Artificial Neural Network (ANN).
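Of the techniques listed, DTW is the simplest to show in a few lines. The sketch below is a minimal illustration under stated assumptions: features are one-dimensional numbers rather than real feature vectors, the local cost is the absolute difference, and the template labels "up" and "down" are hypothetical.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic Time Warping distance between two numeric sequences."""
    inf = float("inf")
    rows, cols = len(seq_a), len(seq_b)
    # cost[i][j] = minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch seq_b
                                     cost[i][j - 1],      # stretch seq_a
                                     cost[i - 1][j - 1])  # advance both
    return cost[rows][cols]


def recognize(sample, templates):
    """Return the label of the reference template closest to the sample."""
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))


# Hypothetical reference patterns (step 1) and an incoming sample (step 2).
TEMPLATES = {"up": [1, 2, 3, 4], "down": [4, 3, 2, 1]}
best_label = recognize([1, 1, 2, 3, 4], TEMPLATES)
```

The sample is a slower rendition of the "up" template; the warping absorbs the repeated value, so its distance to "up" is zero and it is matched despite the different length.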
2.3.3.3. The artificial intelligence method
The artificial intelligence method combines the methods above to make the most of their strengths, and imitates human abilities in analyzing and perceiving external events so as to apply them to speech recognition.
Systems using this method have the following characteristics:
- An expert system is used for segmentation and phonetic labeling, which simplifies the system compared with the acoustic-phonetic method.
- An artificial neural network is used to learn the relationships between phonetic units and is then used for recognition.
The expert system brings human knowledge into the recognizer:
- Acoustic knowledge: to analyze the spectrum and determine the acoustic properties of speech samples.
- Lexical knowledge: to combine phonetic units into the words to be recognized.
- Syntactic knowledge: to combine words into the sentences to be recognized.
- Semantic knowledge: to determine whether the recognized sentences are logical.
2.4. Speech signal feature extraction
2.4.1. Introduction
Feature extraction is one of the most important stages of speech recognition. Speech data, normally stored on a computer as a waveform, is hard to process, to learn from, and to compare, so extracting features from it is necessary. The result of feature extraction is one or more feature vectors. These vectors contain parameters that carry the important information in the speech signal, greatly reduce the amount of computation required, and make the difference between two speech signals clearer. The figure below illustrates the feature extraction process.
There are many feature extraction methods; two of them are MFCC and LPC.

Figure 2.4. The feature extraction stage
Figure 2.4 describes the feature extraction process. The audio signal stored in the computer is a digital signal [9]. It is modelled mathematically as a function s(n), where n is the discrete time index (time is usually expressed in ms) and s(n) is the sound amplitude.
2.4.2. Pre-emphasis
According to acoustic studies, the voice attenuates by 20 dB/decade toward higher frequencies because of the physiology of the human speech production system. To compensate, the signal must be boosted by about 20 dB/decade. In addition, the human auditory system tends to be more sensitive in the high-frequency region. For these reasons we apply a high-pass filter to pre-process the captured signal, emphasizing the part of the signal that the human ear can hear. The filter uses the following formula:
$$H(z) = 1 - a_{pre}\, z^{-1} \qquad (2.1)$$

where $a_{pre}$ is the pre-emphasis coefficient, typically 0.9700002861.
This filter boosts the signal at high frequencies (above 1 kHz).
The digitized speech signal s(n) is passed through a low-order digital system to flatten its spectrum and to make it less sensitive to finite-precision effects later in the signal processing. The digital system used in the pre-emphasis block is either fixed or slowly adaptive (for example, to average transmission conditions, background noise, or even the average signal spectrum).
When the filter of equation 2.1 is used, the output of the pre-emphasis stage, $\tilde{s}(n)$, is related to the input signal s(n) by the difference equation:

$$\tilde{s}(n) = s(n) - a_{pre}\, s(n-1) \qquad (2.2)$$
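The difference equation can be sketched directly in Python. This is an illustration rather than the thesis implementation: the function name is hypothetical, the default coefficient is rounded to 0.97, and the sample before the first one is assumed to be zero.

```python
def pre_emphasize(samples, a_pre=0.97):
    """Apply s~(n) = s(n) - a_pre * s(n-1) to a list of samples.

    Assumption: s(-1) = 0, so the first output equals the first input.
    """
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - a_pre * samples[n - 1])
    return out


# A constant (zero-frequency) input is almost cancelled after the first sample,
# while an alternating (highest-frequency) input is nearly doubled.
flat = pre_emphasize([1.0, 1.0, 1.0])
alternating = pre_emphasize([1.0, -1.0, 1.0])
```

The two inputs show the high-pass behaviour: slowly varying content is attenuated and rapidly varying content is amplified.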
2.4.3. Word segmentation
After pre-emphasis, the speech signal s(n) is passed on to word segmentation, the stage that divides the whole captured signal into segments that each contain a single word.
There are many ways to find the start and end points of a word within the speech signal; the most widely used is the short-time energy function. For a window of N samples ending at sample m, the short-time energy E(m) is defined as:

$$E_m = \sum_{n=m-N+1}^{m} s^2(n) \qquad (2.3)$$
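A minimal sketch of energy-based endpoint detection follows, assuming a rectangular window and a fixed threshold. The window length, the threshold, and the function names are illustrative choices, not values from the thesis.

```python
def short_time_energy(samples, m, window_len):
    """Sum of squared samples in the window of `window_len` samples ending at m."""
    start = max(0, m - window_len + 1)
    return sum(x * x for x in samples[start:m + 1])


def find_endpoints(samples, window_len, threshold):
    """Return (first, last) sample index whose short-time energy exceeds
    the threshold, or None if the signal never does."""
    active = [
        m for m in range(len(samples))
        if short_time_energy(samples, m, window_len) > threshold
    ]
    return (active[0], active[-1]) if active else None


# Toy signal: silence, a short burst, then silence again.
toy_signal = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
endpoints = find_endpoints(toy_signal, window_len=2, threshold=0.5)
```

Because each window looks backward from sample m, the detected end point trails the true end of the burst by up to one window length; practical detectors usually add smoothing or hangover logic on top of this.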
2.4.3.1. Framing
The speech signal varies over time and is not stationary, so it cannot be processed directly as a whole. It is therefore divided into relatively short frames to obtain segments that are approximately stationary, which are then processed by the following filters. Studies show that the speech signal is approximately stationary over intervals of 10-20 ms. In this step the signal is therefore usually split into frames of 20-30 ms. To avoid losing or interrupting parts of the original signal, adjacent frames overlap by about 10-15 ms.
In this step the pre-emphasized signal $\tilde{s}(n)$ is divided into frames of N samples, with adjacent frames separated by M samples.

Figure 2.5. Example of splitting a signal segment into frames
The first frame contains the first N speech samples. The second frame starts M samples after the first and overlaps it by N - M samples. Similarly, the third frame starts 2M samples after the first frame (or M samples after the second) and overlaps the first frame by N - 2M samples. This continues until all the speech is accounted for in one or more frames. It is easy to see that if $M \le N$, adjacent frames overlap and the resulting LPC spectral estimates are correlated from frame to frame; if $M \ll N$, the LPC spectral estimates change quite smoothly from frame to frame. On the other hand, if $M > N$, adjacent frames do not overlap; in fact, part of the speech signal is lost entirely (it never appears in any analysis frame), and the correlation between the LPC spectral estimates of adjacent frames contains a noise component whose magnitude grows with M (that is, as more speech is skipped). This situation is unacceptable in LPC analysis for speech recognition. If we denote the $\ell$-th frame of speech by $x_\ell(n)$ and there are L frames in the whole speech signal, then

$$x_\ell(n) = \tilde{s}(M\ell + n), \quad n = 0, 1, \ldots, N-1, \quad \ell = 0, 1, \ldots, L-1 \qquad (2.4)$$
That is, the first frame of speech, $x_0(n)$, contains the samples $\tilde{s}(0), \tilde{s}(1), \ldots, \tilde{s}(N-1)$; the second frame, $x_1(n)$, contains the samples $\tilde{s}(M), \tilde{s}(M+1), \ldots, \tilde{s}(M+N-1)$; and the L-th frame, $x_{L-1}(n)$, contains the samples $\tilde{s}(M(L-1)), \tilde{s}(M(L-1)+1), \ldots, \tilde{s}(M(L-1)+N-1)$.
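The framing rule, where frame l starts at sample M·l and holds N samples, can be sketched as follows. The function name is hypothetical, and the sketch assumes that a trailing partial frame is simply dropped; padding it with zeros would be an equally reasonable choice.

```python
def split_into_frames(samples, frame_len, frame_shift):
    """Split `samples` into frames of `frame_len` (N) samples,
    with consecutive frames starting `frame_shift` (M) samples apart.

    Assumption: an incomplete frame at the end of the signal is discarded.
    """
    if len(samples) < frame_len:
        return []
    n_frames = 1 + (len(samples) - frame_len) // frame_shift
    return [
        samples[l * frame_shift: l * frame_shift + frame_len]
        for l in range(n_frames)
    ]


# N = 4 and M = 2 give a 50% overlap: each frame shares N - M = 2 samples
# with its predecessor.
frames = split_into_frames(list(range(10)), frame_len=4, frame_shift=2)
```

With a 16 kHz recording, for example, a 25 ms frame and a 10 ms shift would correspond to N = 400 and M = 160; these figures are given only to connect the sample counts to the millisecond values discussed earlier.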
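2.4.4. Windowing the frames
To sharpen the signal and to reduce the discontinuities at the beginning and end of each frame during feature extraction, each frame is multiplied by a window function, usually the Hamming window. This smooths the edges of the frame and so reduces the spurious high-frequency components (spectral leakage) that the discontinuities would otherwise introduce into the spectrum. The general form of the window function is:

$$w(n) = \alpha - (1 - \alpha)\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1 \qquad (2.5)$$

where w(n) is the window function. Different values of $\alpha$ give different windows.
With $\alpha = 0.54$ we obtain the Hamming window:

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1 \qquad (2.6)$$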
With the window w(n) defined as above for $0 \le n \le N-1$, the result of windowing the frame $x_\ell(n)$ is:

$$\tilde{x}_\ell(n) = x_\ell(n)\, w(n), \quad 0 \le n \le N-1 \qquad (2.7)$$

Figure 2.6. The Hamming window and a signal after multiplication by the Hamming window
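As a short illustration, the window and its application to a frame can be written in a few lines of Python, assuming the generalized form w(n) = α - (1 - α)·cos(2πn/(N-1)), which gives the Hamming window for α = 0.54. The function names are illustrative.

```python
import math


def generalized_window(n_points, alpha=0.54):
    """Return w(n) = alpha - (1 - alpha) * cos(2*pi*n / (N - 1)), n = 0..N-1.

    alpha = 0.54 gives the Hamming window. Requires n_points >= 2.
    """
    return [
        alpha - (1.0 - alpha) * math.cos(2.0 * math.pi * n / (n_points - 1))
        for n in range(n_points)
    ]


def apply_window(frame, window):
    """Multiply a frame by a window of the same length, sample by sample."""
    return [x * w for x, w in zip(frame, window)]


hamming = generalized_window(5)
windowed = apply_window([1.0, 1.0, 1.0, 1.0, 1.0], hamming)
```

The window tapers from 0.08 at both ends to 1.0 in the middle, which is what suppresses the discontinuities at the frame boundaries.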
The purpose of windowing is to prepare well-conditioned frame data for the discrete Fourier transform.
2.4.5. Feature extraction
2.4.5.1. The MFCC feature extraction method
In speech processing and recognition, pre-processing the captured signal and extracting features are essential techniques that every recognition system must have. Feature extraction plays an important, decisive role in the performance of pattern recognition (both during recognition and during training). The task of this step is spectral analysis: identifying the important, characteristic information of speech and discarding what is unnecessary for recognition, so as to reduce the amount of data to process.
Mel-Frequency Cepstral Coefficients (MFCC) is a filterbank-based feature extraction method introduced by Davis and Mermelstein in 1980, when they combined non-uniformly spaced filters with the Discrete Cosine Transform into a complete algorithm for continuous speech recognition. They also defined the concepts of cepstral coefficients and the Mel frequency scale.

Hnh 2.7 Tng qut phng php rt trch c trng MFCC
Trang | 24

GVHD: TS. V c Lung SVTH: V Vn Ha Tn Thanh Hng
Tm tt qu trnh rt trch c trng theo MFCC s nh sau: Ban u tn hiu
sau khi qua tin x l s c chia thnh cc Frame c khong thi gian ngn. T mi
frame sau khi p dng cc bc chuyn i v lc s ra c mt vecto tng ng.
V xong qu trnh ny, ta s c c trng ca dy tn hiu input vo l mt dy vecto
c trng output ra.
f. Fast Fourier Transform (FFT)
The FFT is simply the discrete Fourier transform (DFT) computed with a fast algorithm, which makes it suitable for real-time processing in fields such as audio and image processing. The transform is invertible and preserves linearity, periodicity and time-shift properties. It converts a sampled time-domain signal into the frequency domain using the following formulas.
Forward transform (used to analyse the signal):
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \quad k = 0, 1, 2, \ldots, N-1 \tag{2.8}$$
Inverse transform (used to resynthesise the signal):
$$x(n) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\, e^{j 2\pi k n / N}, \quad n = 0, 1, 2, \ldots, N-1 \tag{2.9}$$
where, in general, $x(n) = a(n) + j\,b(n)$ is a complex sequence.
The result of applying the FFT to frame t is the sequence $X_t(k)$, which is passed to the Mel-scale filterbank.
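To make the relation between the DFT definition and the FFT concrete, the sketch below evaluates the forward transform directly and compares it with NumPy's FFT. It assumes the standard DFT definition with kernel $e^{-j2\pi kn/N}$; `dft_direct` and `power_spectrum` are illustrative names, and the FFT length of 512 is an arbitrary choice.

```python
import numpy as np

def dft_direct(x):
    """Direct O(N^2) evaluation of X(k) = sum_n x(n) exp(-j 2 pi k n / N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    n = np.arange(N)
    k = n[:, None]
    return np.exp(-2j * np.pi * k * n / N) @ x

def power_spectrum(frame, n_fft=512):
    """Power spectrum |X(k)|^2 of one real frame, for k = 0..n_fft/2."""
    return np.abs(np.fft.rfft(frame, n=n_fft)) ** 2

x = np.random.default_rng(0).standard_normal(64)
X_slow = dft_direct(x)        # definition
X_fast = np.fft.fft(x)        # same values, computed in O(N log N)
```

For a real frame only the first n_fft/2 + 1 bins are needed, which is why `power_spectrum` uses `rfft`.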
g. Mel-scale filterbank
Research on speech recognition requires understanding and accurately modelling how the human ear perceives sound frequency. For this purpose researchers constructed a frequency scale, the Mel scale, from repeated experiments on human auditory perception. The Mel scale is defined from the physical frequency by the formula:
$$m = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \tag{2.10}$$
where m is the frequency on the Mel scale, in Mel, and f is the physical frequency, in Hz.

Figure 2.8 The Mel frequency scale plotted against physical frequency.
The plot shows that below 1 kHz the curve is almost linear, that is, Mel frequency and physical frequency are approximately proportional. Above 1 kHz the relationship is logarithmic.
From experiments on human hearing, the physical frequencies that the ear can hear and that carry the most information were identified. These frequencies are converted to the Mel scale to build the following filterbank:

Figure 2.9 The Mel-frequency filterbank.
This filterbank is applied to the spectrum obtained from the FFT.

Figure 2.10 Passing the signal through the Mel-frequency filterbank.
The result of this step is the set of values $Y_t(m)$, one per filter, obtained by weighting the spectrum $X_t(k)$ with the m-th Mel filter.
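The Mel mapping and a filterbank of the kind described above can be sketched as follows. Several choices here are assumptions rather than details taken from this work: the mapping uses the common form $m = 2595\log_{10}(1 + f/700)$, the filters are triangular with centres equally spaced on the Mel scale, and all function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centres equally spaced on the Mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    # n_filters + 2 edge frequencies, equally spaced in Mel, converted back to Hz
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    bin_hz = np.arange(n_fft // 2 + 1) * sample_rate / n_fft
    fbank = np.zeros((n_filters, len(bin_hz)))
    for m in range(n_filters):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        fbank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fbank

fbank = mel_filterbank(n_filters=26, n_fft=512, sample_rate=8000)
# For one frame, Y_t(m) = fbank @ power_spectrum gives one value per filter.
```

Because the edges are equally spaced in Mel, the filters are narrow at low frequencies and progressively wider at high frequencies, mirroring the resolution of the ear.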
h. Logarithm of the filter energies
This step compresses the filter outputs into a smaller range of values, which makes them easier to process. The value obtained from each filter channel is therefore replaced by its logarithm.

i. Discrete cosine transform
The spectrum of human speech is fairly smooth in the frequency domain, so the energies taken from neighbouring filters are strongly correlated, and features taken directly from them are not very distinctive. We therefore apply the discrete cosine transform (DCT) to decorrelate these values, which makes the parameters more discriminative. The values obtained after this step are called cepstral coefficients:
$$c_i = \sqrt{\frac{2}{N}} \sum_{j=1}^{N} M_j \cos\left(\frac{\pi i}{N}\,(j - 0.5)\right) \tag{2.11}$$
where N is the number of filter channels, $M_j$ is the log energy of the j-th filter, and i is the order of the cepstral coefficient.
Usually i is taken in the range [1, 12], which is the number of features in each feature vector. In recognition systems, 10 to 15 features are enough to give reasonable recognition results without making the data too large.
After the DCT, the magnitudes of the coefficients differ widely with their order: the high-order cepstral coefficients are numerically much smaller than the low-order ones. This spread causes difficulties in later modelling and processing, because a wide range of values is needed to represent the data and the probabilistic models must cope with very different variances from one coefficient to another. To obtain coefficients better suited to the later stages, we rescale them to reduce the spread. This is done with the formula:
$$c'_i = \left(1 + \frac{L}{2}\sin\frac{\pi i}{L}\right) c_i \tag{2.12}$$
where L is the liftering parameter.
Finally we obtain the refined cepstral coefficients. These are the MFCC features used for training and recognition.
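The last three MFCC steps (logarithm, DCT and rescaling) can be illustrated together. This is a sketch under stated assumptions: the DCT uses the $\sqrt{2/N}$ normalisation and the rescaling uses a sinusoidal lifter with L = 22, both of which are conventions found in HTK rather than values confirmed by this text; the energy floor of 1e-10 and the function name are also illustrative.

```python
import numpy as np

def mfcc_from_filter_energies(energies, n_ceps=12, lifter=22):
    """Turn the N filterbank energies of one frame into n_ceps liftered cepstra.

    c_i  = sqrt(2/N) * sum_{j=1..N} log(E_j) * cos(pi * i * (j - 0.5) / N)
    c'_i = (1 + (L/2) * sin(pi * i / L)) * c_i
    """
    energies = np.asarray(energies, dtype=float)
    N = len(energies)
    log_e = np.log(np.maximum(energies, 1e-10))     # floor avoids log(0)
    i = np.arange(1, n_ceps + 1)[:, None]            # cepstral order 1..n_ceps
    j = np.arange(1, N + 1)[None, :]                 # filter index 1..N
    basis = np.sqrt(2.0 / N) * np.cos(np.pi * i * (j - 0.5) / N)
    ceps = basis @ log_e
    weights = 1.0 + (lifter / 2.0) * np.sin(np.pi * i[:, 0] / lifter)
    return weights * ceps

c = mfcc_from_filter_energies(np.linspace(1.0, 5.0, 26))
```

A perfectly flat set of filter energies yields zero for every coefficient of order 1 and above, which shows that these coefficients describe the shape of the spectrum rather than its overall level.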
2.4.5.2. The LPC feature extraction method
j. Introduction
LPC stands for Linear Predictive Coding. It is one of the most widely used methods for extracting features from audio signals (also called parameterising the audio signal) and plays an important role in speech analysis. It is also regarded as an effective method for compressing speech, that is, encoding it with good quality at a low bit rate.
k. Acoustic basis of LPC
Speech, the sound a person produces through the mouth, originates in the vibration of the vocal cords, driven by changes in the pressure of the air coming up from the lungs. This vibration has two characteristics: intensity and frequency. The sound then travels through the throat to the oral and nasal cavities. There, the shape of the mouth, the position and movement of the tongue and the muscles of the mouth produce resonances, called formants, and the result is the speech we hear. LPC analyses a speech signal by estimating the formants, removing their effect from the signal, and estimating the intensity and frequency of the sound that remains. The removal step is called inverse filtering, and the remaining sound is called the residue. The output of LPC is a set of numbers that describe the important properties of the formants and of the residue. These numbers can stand in for the original speech signal and are more efficient for storing, analysing and transmitting speech. LPC is also used to resynthesise speech from these numbers.
l. The LPC feature extraction method
The idea behind LPC is that a speech sample at time n, s(n), can be approximated by a linear combination of the p preceding speech samples:
$$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \cdots + a_p s(n-p) \tag{2.13}$$
where $a_1, a_2, \ldots, a_p$ are assumed constant over the speech analysis frame. We turn (2.13) into an equality by adding an excitation term, $Gu(n)$:
$$s(n) = \sum_{i=1}^{p} a_i\, s(n-i) + G\,u(n) \tag{2.14}$$
where u(n) is the normalised excitation and G is the gain of the excitation. Expressing (2.14) in the z-domain gives:
$$S(z) = \sum_{i=1}^{p} a_i z^{-i} S(z) + G\,U(z) \tag{2.15}$$
which leads to the transfer function:
$$H(z) = \frac{S(z)}{G\,U(z)} = \frac{1}{1 - \sum_{i=1}^{p} a_i z^{-i}} = \frac{1}{A(z)} \tag{2.16}$$
Figure 2.11 illustrates this equation: the normalised excitation source u(n) is scaled by the gain G and acts as the input to the all-pole system $H(z) = \frac{1}{A(z)}$, which produces the speech signal s(n).

Figure 2.11 Linear prediction model of speech
The actual excitation for speech is essentially either a quasi-periodic pulse train (for voiced sounds) or a random noise source (for unvoiced sounds). The corresponding synthesis model for speech, matching the LPC analysis, is shown in Figure 2.12. Here the normalised excitation source is selected by a switch whose position is controlled by the voiced/unvoiced character of the speech: it selects the quasi-periodic pulse train as excitation for voiced sounds and the random noise sequence for unvoiced sounds. The gain G of the source is estimated from the speech signal, and the scaled source is used as the input to a digital filter H(z), which is controlled by the vocal tract parameters of the speech being produced. The parameters of this model are therefore the voiced/unvoiced classification, the pitch period for voiced sounds, the gain, and the coefficients of the digital filter, $\{a_k\}$. All of these parameters vary slowly with time.

Figure 2.12 Speech analysis model for the LPC method
Based on the model in Figure 2.12, the exact relation between s(n) and u(n) is
$$s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\,u(n) \tag{2.17}$$
We regard the linear combination of past speech samples as the estimate $\tilde{s}(n)$, defined as:
$$\tilde{s}(n) = \sum_{k=1}^{p} a_k\, s(n-k) \tag{2.18}$$
We now form the prediction error, e(n), defined as:
$$e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} a_k\, s(n-k) \tag{2.19}$$
with the error transfer function:
$$A(z) = \frac{E(z)}{S(z)} = 1 - \sum_{k=1}^{p} a_k z^{-k} \tag{2.20}$$
Clearly, when s(n) is actually generated by a linear system of the kind shown in Figure 2.11, the prediction error e(n) equals $Gu(n)$, the scaled excitation.
The basic problem of linear prediction analysis is to determine the set of predictor coefficients, $\{a_k\}$, directly from the speech signal. Because the spectral characteristics of speech vary over time, the predictor coefficients at a given time n must be estimated from a short segment of the speech signal around time n. The basic approach is therefore to find the set of predictor coefficients that minimises the mean-squared prediction error over a short segment of the speech waveform. (This kind of short-time spectral analysis is usually carried out on successive speech frames, with a frame spacing of about 10 ms.)
The following parts present the main steps of feature analysis with the LPC method.
m. Autocorrelation analysis
After the windowing step (equation 2.7), each windowed frame is autocorrelated to give:
$$r_l(m) = \sum_{n=0}^{N-1-m} \tilde{x}_l(n)\, \tilde{x}_l(n+m), \quad m = 0, 1, \ldots, p \tag{2.21}$$
where the highest autocorrelation lag, p, is the order of the LPC analysis. Values of p from 8 to 16 are typical, with p = 8 used in many LPC analysis systems.
n. LPC analysis
The next processing step is the LPC analysis, which converts the p+1 autocorrelation values of each frame into an "LPC parameter set", here the LPC coefficients. The formal method for converting autocorrelation coefficients into an LPC parameter set (for the autocorrelation method of LPC) is known as Durbin's method and can be stated as the following algorithm (for convenience we omit the subscript l of $r_l(m)$):
$$E^{(0)} = r(0) \tag{2.22}$$
$$k_i = \frac{r(i) - \sum_{j=1}^{i-1} \alpha_j^{(i-1)}\, r(|i-j|)}{E^{(i-1)}}, \quad 1 \le i \le p \tag{2.23}$$
$$\alpha_i^{(i)} = k_i \tag{2.24}$$
$$\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\, \alpha_{i-j}^{(i-1)}, \quad 1 \le j \le i-1 \tag{2.25}$$
$$E^{(i)} = (1 - k_i^2)\, E^{(i-1)} \tag{2.26}$$
where the sum in (2.23) is omitted for i = 1. Equations (2.22) to (2.26) are solved recursively for i = 1, 2, ..., p, and the final solution is given by
$$a_m = \text{LPC coefficients} = \alpha_m^{(p)}, \quad 1 \le m \le p \tag{2.27}$$
$$k_m = \text{PARCOR coefficients} \tag{2.28}$$
$$g_m = \text{log area ratio coefficients} = \log\left(\frac{1 - k_m}{1 + k_m}\right) \tag{2.29}$$
At this point the LPC coefficients could be used as the feature vector of each frame. However, a further transformation of the LPC coefficients yields a more compact and robust set of coefficients: cepstral analysis.
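The autocorrelation step and Durbin's recursion can be sketched as follows. This is an illustrative implementation of equations (2.21) to (2.26), not the code used in this work; the function names are hypothetical and the input frame is assumed to be already windowed and not all zeros.

```python
import numpy as np

def autocorrelate(frame, p):
    """r(m) = sum_{n=0}^{N-1-m} x(n) x(n+m) for m = 0..p (equation 2.21)."""
    x = np.asarray(frame, dtype=float)
    N = len(x)
    return np.array([np.dot(x[:N - m], x[m:]) for m in range(p + 1)])

def levinson_durbin(r):
    """Durbin recursion, equations (2.22)-(2.26).

    Returns (a, k, E): LPC coefficients a_1..a_p, PARCOR coefficients
    k_1..k_p and the final prediction error E^(p).
    """
    p = len(r) - 1
    a = np.zeros(p + 1)           # a[j] holds alpha_j^(i); a[0] is unused
    k = np.zeros(p + 1)
    E = r[0]                                             # (2.22)
    for i in range(1, p + 1):
        # r(i) - sum_{j=1}^{i-1} alpha_j^(i-1) r(i - j); the sum is empty for i = 1
        acc = r[i] - np.dot(a[1:i], r[i - 1:0:-1])
        k[i] = acc / E                                   # (2.23)
        prev = a.copy()
        a[i] = k[i]                                      # (2.24)
        for j in range(1, i):
            a[j] = prev[j] - k[i] * prev[i - j]          # (2.25)
        E = (1.0 - k[i] ** 2) * E                        # (2.26)
    return a[1:], k[1:], E
```

The returned coefficients solve the Toeplitz normal equations $\sum_j a_j\, r(|i-j|) = r(i)$, which is a convenient way to check the recursion; the log area ratios of (2.29) follow directly as `np.log((1 - k) / (1 + k))`.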
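o. Cepstral analysis
A very important LPC parameter set, which can be derived directly from the LPC coefficients, is the set of LPC cepstral coefficients, c(m). The recursion is:
$$c_0 = \ln \sigma^2$$
$$c_m = a_m + \sum_{k=1}^{m-1} \left(\frac{k}{m}\right) c_k\, a_{m-k}, \quad 1 \le m \le p \tag{2.30}$$
$$c_m = \sum_{k=1}^{m-1} \left(\frac{k}{m}\right) c_k\, a_{m-k}, \quad p < m \le Q \tag{2.31}$$
where $\sigma^2$ is the gain term of the LPC model and $a_{m-k}$ is taken as zero whenever $m - k > p$. The cepstral coefficients, which are the coefficients of the Fourier transform representation of the log magnitude spectrum, have been shown to be a more robust and reliable feature set than the LPC coefficients. Here $Q \approx \left(\frac{3}{2}\right) p$.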
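The recursion (2.30) and (2.31) translates directly into code. The sketch below is illustrative: `lpc_to_cepstrum` is a hypothetical name, $c_0$ is left out because it only depends on the gain, and $a_m$ is treated as zero for m > p.

```python
import numpy as np

def lpc_to_cepstrum(a, Q):
    """Cepstral coefficients c_1..c_Q from LPC coefficients a_1..a_p.

    c_m = a_m + sum_{k=1}^{m-1} (k/m) c_k a_{m-k}, with a_m = 0 for m > p.
    """
    p = len(a)
    # a_ext[m] = a_m for 1 <= m <= p, and 0 for m = 0 or m > p
    a_ext = np.concatenate(([0.0], a, np.zeros(max(0, Q - p))))
    c = np.zeros(Q + 1)
    for m in range(1, Q + 1):
        acc = sum((k / m) * c[k] * a_ext[m - k] for k in range(1, m))
        c[m] = a_ext[m] + acc
    return c[1:]

c = lpc_to_cepstrum(np.array([0.5]), Q=5)
```

For a single-pole model with $a_1 = 0.5$, the log transfer function expands as $\sum_m (0.5^m/m)\, z^{-m}$, so the recursion should return $0.5^m/m$; this gives a simple analytic check.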
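p. Weighting the cepstral coefficients
Because the low-order cepstral coefficients are sensitive to the overall spectral slope and the high-order cepstral coefficients are sensitive to noise (and other noise-like variations), it has become a standard technique to weight the cepstral coefficients with a tapered window so as to minimise these sensitivities. The weighted cepstrum is:
$$\hat{c}_m = w_m\, c_m, \quad 1 \le m \le Q \tag{2.32}$$
A suitable weighting function is the bandpass lifter (a filter in the cepstral domain):
$$w_m = 1 + \frac{Q}{2}\sin\left(\frac{\pi m}{Q}\right), \quad 1 \le m \le Q \tag{2.33}$$
q. Computing the cepstral derivative
$$\Delta c_m(t) = \frac{\partial c_m(t)}{\partial t} \approx \mu \sum_{k=-K}^{K} k\, c_m(t+k) \tag{2.34}$$
where $\mu$ is a normalisation constant and (2K+1) is the number of frames used in the computation. K = 3 is a suitable value for the first-order derivative. The feature vector of the signal consists of Q cepstral coefficients and Q cepstral derivative coefficients.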
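Both operations act on a matrix with one row of Q cepstral coefficients per frame, as the sketch below illustrates. The choice $\mu = 1/\sum_k k^2$ and the repetition of the first and last frames at the boundaries are assumptions made for this example; the text only states that $\mu$ is a normalisation constant.

```python
import numpy as np

def weight_cepstrum(c):
    """Apply w_m = 1 + (Q/2) sin(pi m / Q), m = 1..Q, to cepstral vectors.

    c has shape (T, Q): one row of Q cepstral coefficients per frame.
    """
    Q = c.shape[1]
    m = np.arange(1, Q + 1)
    return c * (1.0 + (Q / 2.0) * np.sin(np.pi * m / Q))

def delta_cepstrum(c, K=3):
    """Approximate the time derivative by mu * sum_{k=-K}^{K} k * c(t + k).

    mu = 1 / sum_k k^2 (assumed); frames beyond either end are replaced
    by copies of the first or last frame.
    """
    T = c.shape[0]
    padded = np.concatenate([np.repeat(c[:1], K, axis=0), c,
                             np.repeat(c[-1:], K, axis=0)])
    mu = 1.0 / sum(k * k for k in range(-K, K + 1))
    delta = np.zeros_like(c, dtype=float)
    for k in range(-K, K + 1):
        delta += k * padded[K + k:K + k + T]
    return mu * delta

ramp = 2.0 * np.arange(10)[:, None] * np.ones((1, 4))   # slope 2 per frame
features = np.hstack([weight_cepstrum(ramp), delta_cepstrum(ramp)])
```

With this normalisation a coefficient that rises by a constant amount per frame has a delta equal to that amount, away from the padded boundaries.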
r. Remarks on the LPC method
In summary, the LPC analysis model above requires the following parameters to be specified:
- N: number of samples in each analysis frame
- M: number of samples between the starts of two adjacent frames
- p: order of the LPC analysis
- Q: dimension of the cepstral vectors derived from the LPC coefficients
- K: number of frames over which the cepstral derivatives are computed.
Although each parameter can vary over a wide range, the following table gives typical values for analysis systems at three sampling frequencies (6.67 kHz, 8 kHz and 10 kHz).
Table 3 Typical values of the LPC analysis parameters for speech recognition systems
Parameter   Fs = 6.67 kHz   Fs = 8 kHz    Fs = 10 kHz
N           300 (45 ms)     240 (30 ms)   300 (30 ms)
M           100 (15 ms)     80 (10 ms)    100 (10 ms)
p           8               10            10
Q           12              12            12
K           3               3             3
LPC is a suitable model for processing speech signals. In voiced regions of speech, which are close to steady state, the all-pole model of LPC gives a good approximation to the spectral envelope of the vocal tract. In unvoiced regions the LPC model is less effective than in voiced regions, but it remains a useful model for speech recognition purposes. The LPC model is simple and easy to implement in both hardware and software.
2.4.6. Formants
To understand the notion of a formant, we first need the notion of resonance. Resonance is a phenomenon of forced oscillation that occurs when an oscillating body is driven by a periodic external force whose frequency equals the natural frequency of the body.

Figure 2.13 Illustration of resonance.
A formant is a frequency band that is reinforced by resonance in the vocal tract and characterises the timbre of each vowel. Within each such band, the frequency that is reinforced most strongly is called the formant peak.

Figure 2.14 Illustration of formants
Formants are among the most important components of speech acoustics and give each sound its character. A vowel produced by a speaker has several formants:
F1: corresponds to resonance in the pharynx
F2: corresponds to resonance in the oral cavity
During speech, further resonances in the nasal cavity create other formants (F3, F4, F5, ...). These formants determine the characteristics of each person's voice, because the nasal cavity of each person is fixed and its shape depends on the individual. The other two resonators are the oral cavity and the pharynx, but these can change flexibly depending on the position of other organs such as the lips, tongue and jaw. When these three organs move, they create resonators that differ in volume, shape and air path through the larynx, which changes the timbre. This is why different utterances of the same word are not necessarily pronounced identically. For this reason, acoustics concentrates on the formants produced in these two regions, F1 and F2.
Formants are identified from the concentration of acoustic energy around a particular frequency in the spectrum. Several formants can be identified in one spectrum; formants may have the same intensity but differ in frequency.
2.5. Gaussian Mixture Model
The Gaussian Mixture Model (GMM) is a statistical model whose parameters are trained from learning data. The GMM is also known under other names such as Weighted Normal Distribution Sums or Radial Basis Function Approximations.

Figure 2.15: Gaussian density functions
In essence, a GMM approximates a probability density function by a mixture of Gaussian density functions. Figure 2.15 shows two Gaussian densities with different parameters. Formally, the probability density function of the Gaussian distribution $f_N(x, \mu, \sigma^2)$ is given by:
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \tag{2.35}$$
where $\mu$ is the mean and $\sigma$ is the standard deviation. When x is a vector with D components, the probability density function of the Gaussian distribution $f_N(x, \mu, \Sigma)$ is given by:
$$p(x) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)'\,\Sigma^{-1}(x-\mu)\right) \tag{2.36}$$
where $\mu$ is the mean vector and $\Sigma$ is the covariance matrix. Choosing $\mu = 0$ and $\sigma = 1$, equation (2.35) becomes the standard normal density:
$$p(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) \tag{2.37}$$
The name "Gauss" comes from the German mathematician Carl Friedrich Gauss, who defined the Gaussian density function and applied it to the analysis of astronomical data.

Figure 2.16: The GMM model
Given M Gaussian distributions with density functions $p_1, p_2, \ldots, p_M$, the probability density function of the GMM illustrated in Figure 2.16 is the weighted sum of the M Gaussian distributions:
$$p_{GMM}(x) = \sum_{i=1}^{M} w_i\, p_i(x) \tag{2.38}$$
where $w_i$ is the weight of the i-th Gaussian distribution, subject to the constraints $0 \le w_i \le 1$ and $\sum_{i=1}^{M} w_i = 1$. The weights express how much each Gaussian distribution contributes to the GMM: the larger the weight (and variance) of a Gaussian component, the greater its influence on the output of the model. Figure 2.17 shows the influence of each Gaussian distribution on the GMM.

Figure 2.17: Density function of a GMM with 3 Gaussian distributions
A GMM with M Gaussian distributions is thus represented by the parameter set $\lambda = \{ w_i, \mu_i, \Sigma_i \}$, $i \in [1, M]$. Depending on how the covariance matrices are organised, a GMM can take several forms:
- Nodal covariance matrices GMM: each Gaussian distribution in the GMM has its own covariance matrix.
- Grand covariance matrix GMM: all Gaussian distributions in one GMM share a single covariance matrix.
- Global covariance matrix GMM: all Gaussian distributions in all GMMs share a single covariance matrix.
In addition, the covariance matrix can be either full or diagonal. The nodal, diagonal covariance form is the most commonly used.
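As an illustration of equation (2.38) with the commonly used nodal, diagonal covariance form, the sketch below evaluates a GMM density at one point. The function name and argument layout are hypothetical, and the example parameters are made up for demonstration only.

```python
import numpy as np

def gmm_density(x, weights, means, variances):
    """p_GMM(x) = sum_i w_i N(x; mu_i, diag(var_i)) for one D-dimensional x.

    weights: (M,), means: (M, D), variances: (M, D) diagonal covariances.
    """
    x = np.asarray(x, dtype=float)
    D = x.shape[0]
    diff = x - means
    # log of the normalising constant 1 / ((2 pi)^(D/2) |Sigma|^(1/2))
    log_norm = -0.5 * (D * np.log(2.0 * np.pi) + np.sum(np.log(variances), axis=1))
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=1)
    return float(np.sum(weights * np.exp(log_comp)))

# Illustrative two-component, one-dimensional mixture
weights = np.array([0.3, 0.7])
means = np.array([[0.0], [2.0]])
variances = np.array([[1.0], [0.5]])
p0 = gmm_density([0.0], weights, means, variances)
```

With a diagonal covariance the determinant is the product of the variances and the quadratic form reduces to a sum over dimensions, so no matrix inversion is needed. For high-dimensional feature vectors it is safer to stay in the log domain instead of exponentiating each component.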
2.6. Hidden Markov Model
2.6.1. Introduction to Markov chains

Figure 2.18 A Markov chain model of the weather
A Markov chain is specified by the following components:
$S = \{S_1, S_2, \ldots, S_N\}$: the set of states; $q_t$ denotes the state reached at time t.
$A = \{a_{ij}\}$: the state transition probability matrix; each $a_{ij}$ is the probability of moving from state $S_i$ to state $S_j$,
$$a_{ij} = p(q_{t+1} = S_j \mid q_t = S_i), \quad 1 \le i, j \le N, \quad \text{such that } a_{ij} \ge 0,\ \sum_{j=1}^{N} a_{ij} = 1.$$
$\pi = \{\pi_i\}$: $\pi_i$ is the initial probability of state $S_i$, that is, the probability of being in state $S_i$ at time t = 1:
$$\pi_i = p(q_1 = S_i), \quad 1 \le i \le N, \quad \sum_{i=1}^{N} \pi_i = 1.$$

2.6.2. Hidden Markov models
In a hidden Markov model, the observable events are emitted within each state and depend on the probability distribution attached to that state.

Figure 2.19 A 3-state hidden Markov model
Figure 2.19 shows a 3-state hidden Markov model in which the events that can be observed in each state are $V = \{v_1, v_2, v_3, v_4\}$. The chance of observing event $v_k$ in state $S_j$ is given by the observation probability $b_j(k)$. The function b is called the probability distribution of the observed events.
Figure 2.20 shows a simple HMM example relating the number of ice creams eaten to the weather. The number of ice creams eaten each day is the observation, for example O = {1, 2, 3}. Hot and cold weather correspond to the two states. Suppose we know the number of ice creams eaten but not the weather: can we infer what the weather was on each day? The weather (the state) is hidden with respect to the number of ice creams eaten (the observation). This is why the model is called a Hidden Markov Model (HMM).

Figure 2.20 A simple HMM example
A hidden Markov model is specified by the following components:
$S = \{S_1, S_2, \ldots, S_N\}$: the set of states; $q_t$ is the state reached at time t.
$V = \{v_1, v_2, \ldots, v_M\}$: the set of observation symbols; $O_t$ denotes the symbol observed at time t.
$A = \{a_{ij}\}$: the state transition probability matrix; each $a_{ij}$ is the probability of moving from state $S_i$ to state $S_j$,
$$a_{ij} = p(q_{t+1} = S_j \mid q_t = S_i), \quad 1 \le i, j \le N, \quad \text{such that } a_{ij} \ge 0,\ \sum_{j=1}^{N} a_{ij} = 1.$$
$B = \{b_j(k)\}$: the matrix of observation probabilities in each state,
$$b_j(k) = p(v_k \text{ at } t \mid q_t = S_j), \quad 1 \le j \le N, \quad 1 \le k \le M,$$
subject to the constraint
$$\sum_{k=1}^{M} b_j(k) = 1.$$
$\pi = \{\pi_i\}$: $\pi_i$ is the initial probability of state $S_i$, that is, the probability of being in state $S_i$ at time t = 1:
$$\pi_i = p(q_1 = S_i), \quad 1 \le i \le N, \quad \sum_{i=1}^{N} \pi_i = 1.$$
For convenience, each HMM is represented by the parameter set $\lambda = (\pi, A, B)$.
2.6.3. The three basic problems of HMMs
Before HMMs can be applied to complex real-world applications, satisfactory solutions are needed for three basic problems:
- Problem 1: given an observation sequence $O = O_1 O_2 \ldots O_T$ and an HMM represented by the parameter set $\lambda = (\pi, A, B)$, how can we efficiently compute $p(O \mid \lambda)$, the probability that the model generates O?
- Problem 2: given an observation sequence $O = O_1 O_2 \ldots O_T$ and an HMM represented by the parameter set $\lambda = (\pi, A, B)$, how do we find the optimal state sequence $Q = q_1 q_2 \ldots q_T$ that generated O? This problem is central to recognition.
- Problem 3: given an observation sequence $O = O_1 O_2 \ldots O_T$, how do we determine the model parameters $\lambda = (\pi, A, B)$ that maximise the probability $p(O \mid \lambda)$? This is the learning problem, also called model training. It gives HMMs a very important capability: modelling a specific real-world object from learning data.
The following subsections present the solutions to these three problems in turn.
2.6.3.1. Problem 1: computing the likelihood
The goal of the first problem is to compute $p(O \mid \lambda)$, the probability that the model $\lambda$ generates O. A feasible way of computing $p(O \mid \lambda)$ is the forward-backward algorithm. First, we define the forward variable $\alpha_t(i)$ as the probability of being in state $S_i$ at time t and having observed the partial sequence $O_1, O_2, \ldots, O_t$, given the model $\lambda$:
$$\alpha_t(i) = p(O_1 O_2 \ldots O_t,\ q_t = S_i \mid \lambda) \tag{2.39}$$
The variables $\alpha_t(i)$ can be computed by induction (that is, by dynamic programming) as follows:
1) Initialisation:
$$\alpha_1(i) = \pi_i\, b_i(O_1), \quad 1 \le i \le N \tag{2.40}$$
2) Induction:
$$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(O_{t+1}), \quad 1 \le t \le T-1, \quad 1 \le j \le N \tag{2.41}$$
3) Termination:
$$p(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i) \tag{2.42}$$
The computational complexity of the forward algorithm is $O(N^2 T)$.
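The forward recursion (2.40) to (2.42) can be sketched for a discrete HMM as follows. The array layout (`B[j, k]` for $b_j(k)$, observations as zero-based symbol indices) and the two-state ice-cream numbers are assumptions made for this example, not values from this work.

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward variables alpha[t, i] and the likelihood p(O | lambda).

    pi: (N,), A: (N, N), B: (N, M) with B[j, k] = b_j(k);
    obs: sequence of observation symbol indices in 0..M-1.
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                            # (2.40)
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]    # (2.41)
    return alpha, alpha[-1].sum()                           # (2.42)

# Hypothetical hot/cold weather model; symbols 0, 1, 2 mean 1, 2, 3 ice creams
pi = np.array([0.8, 0.2])
A = np.array([[0.6, 0.4], [0.5, 0.5]])
B = np.array([[0.2, 0.4, 0.4], [0.5, 0.4, 0.1]])
obs = [2, 0, 2]
alpha, likelihood = forward(pi, A, B, obs)
```

Each time step costs one N-by-N matrix-vector product, which gives the $O(N^2 T)$ complexity stated above. For long sequences the alpha values underflow, so practical implementations scale each row or work with log probabilities.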
As in the forward procedure, the backward procedure first defines the backward variable $\beta_t(i)$ as the probability of observing the partial sequence $O_{t+1}, O_{t+2}, \ldots, O_T$ given state $S_i$ at time t and the model $\lambda$:
$$\beta_t(i) = p(O_{t+1} O_{t+2} \ldots O_T \mid q_t = S_i,\ \lambda) \tag{2.43}$$
The variables $\beta_t(i)$ are also computed by induction:
1) Initialisation:
$$\beta_T(i) = 1, \quad 1 \le i \le N \tag{2.44}$$
2) Induction:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1, \quad 1 \le i \le N \tag{2.45}$$
3) Termination:
$$p(O \mid \lambda) = \sum_{j=1}^{N} \pi_j\, b_j(O_1)\, \beta_1(j) \tag{2.46}$$
Like the forward algorithm, the backward algorithm has complexity $O(N^2 T)$. The forward-backward procedure is therefore feasible, with an entirely acceptable computational cost.
To solve problem 1, only the forward part of the forward-backward procedure is needed. The backward part, however, is used in the solution to problem 3.
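The backward recursion can be sketched in the same way as the forward one. The array layout is the same assumption as before (`B[j, k]` for $b_j(k)$, zero-based symbol indices), and the termination step weights $\beta_1(j)$ by $\pi_j\, b_j(O_1)$ so that it yields $p(O \mid \lambda)$.

```python
import numpy as np

def backward(pi, A, B, obs):
    """Backward variables beta[t, i] and p(O | lambda) computed from beta[0]."""
    T, N = len(obs), len(pi)
    beta = np.ones((T, N))                                  # (2.44)
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])      # (2.45)
    likelihood = np.sum(pi * B[:, obs[0]] * beta[0])        # (2.46)
    return beta, likelihood
```

Running the forward and backward procedures on the same model and sequence must give the same likelihood, which is a useful consistency check before using both sets of variables in training.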
2.6.3.2. Problem 2: decoding (the Viterbi algorithm)
The goal of problem 2 is to find the optimal state sequence $Q = q_1 q_2 \ldots q_T$ that generated O. Note that there are many possible optimality criteria for choosing Q, so the solution to this problem depends on the criterion chosen.
The Viterbi algorithm defines the variable $\delta_t(i)$ as the highest probability of a partial state sequence that ends in $S_i$ at time t and accounts for the observations $O_1, O_2, \ldots, O_t$, given the model $\lambda$:
$$\delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} p(q_1 q_2 \ldots q_t = S_i,\ O_1 O_2 \ldots O_t \mid \lambda) \tag{2.47}$$
$\delta_t(i)$ can be computed by induction:
$$\delta_{t+1}(j) = \left[\max_i \delta_t(i)\, a_{ij}\right] b_j(O_{t+1}) \tag{2.48}$$
The Viterbi algorithm also needs the array $\psi_t(j)$ to store the argument i that maximises expression (2.48), that is, the state i that precedes j on the best path at time t, so that the path can be traced back. The steps of the Viterbi algorithm are as follows:
1) Initialisation:
$$\delta_1(i) = \pi_i\, b_i(O_1), \quad \psi_1(i) = 0, \quad 1 \le i \le N \tag{2.49}$$
2) Recursion:
$$\delta_t(j) = \max_{1 \le i \le N}\left[\delta_{t-1}(i)\, a_{ij}\right] b_j(O_t), \quad 2 \le t \le T, \quad 1 \le j \le N \tag{2.50}$$
$$\psi_t(j) = \arg\max_{1 \le i \le N}\left[\delta_{t-1}(i)\, a_{ij}\right], \quad 2 \le t \le T, \quad 1 \le j \le N \tag{2.51}$$
3) Termination:
$$p^* = \max_{1 \le i \le N}\left[\delta_T(i)\right], \quad q_T^* = \arg\max_{1 \le i \le N}\left[\delta_T(i)\right] \tag{2.52}$$
4) Backtracking:
$$q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, T-2, \ldots, 1 \tag{2.53}$$
When the algorithm terminates, the sequence $Q = q_1^* q_2^* \ldots q_T^*$ is the solution to problem 2.
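The four steps of the Viterbi algorithm can be sketched for a discrete HMM as follows. As in the earlier sketches, the array layout and zero-based state and symbol indices are assumptions of this example; plain probabilities are used for clarity, whereas practical decoders usually work with log probabilities to avoid underflow.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely state sequence (zero-based indices) and its probability p*."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                        # (2.49)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A              # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)                  # (2.51)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]    # (2.50)
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()                       # (2.52)
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]               # (2.53)
    return path, delta[-1].max()
```

The only difference from the forward recursion is that the sum over predecessor states is replaced by a maximum, with the maximising predecessor stored for backtracking.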
2.6.3.3. Problem 3: learning (Forward-Backward)
The goal of the third problem, which is also the most difficult of the three, is to re-estimate the model parameters $\lambda = (\pi, A, B)$ so as to maximise $p(O \mid \lambda)$, the probability of observing the sequence O given the model.
For an arbitrary finite observation sequence O used as training data, there is no known method that estimates the parameters $\lambda = (\pi, A, B)$ so as to reach the global maximum. However, the parameters can be chosen so that $p(O \mid \lambda)$ reaches a local maximum, using the Forward-Backward algorithm, also known as the Baum-Welch algorithm, which is an instance of the Expectation-Maximization (EM) algorithm.
First, we define $\xi_t(i,j)$ as the probability of being in state $S_i$ at time t and in state $S_j$ at time t+1, given the model $\lambda$ and the observation sequence O:
$$\xi_t(i,j) = p(q_t = S_i,\ q_{t+1} = S_j \mid O, \lambda) \tag{2.54}$$
Applying Bayes' rule and the forward and backward variables gives:
$$\xi_t(i,j) = \frac{p(q_t = S_i,\ q_{t+1} = S_j,\ O \mid \lambda)}{p(O \mid \lambda)} = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{p(O \mid \lambda)} \tag{2.55}$$

Figure 2.21 Computing $p(q_t = S_i,\ q_{t+1} = S_j,\ O \mid \lambda)$
Let $\gamma_t(i)$ be the probability of being in state $S_i$ at time t, given the observation sequence O and the model $\lambda$:
$$\gamma_t(i) = p(q_t = S_i \mid O, \lambda) \tag{2.56}$$
Applying Bayes' rule gives:
$$\gamma_t(i) = \frac{p(q_t = S_i,\ O \mid \lambda)}{p(O \mid \lambda)} \tag{2.57}$$
Using the forward and backward probabilities as shown in Figure 2.22, we obtain:
$$\gamma_t(i) = \frac{\alpha_t(i)\, \beta_t(i)}{p(O \mid \lambda)} \tag{2.58}$$

Figure 2.22: $\gamma_t(i)$ computed from the forward and backward probabilities
Summing $\gamma_t(i)$ over $t \in [1, T-1]$ gives the expected number of transitions out of state $S_i$. Similarly, summing $\xi_t(i,j)$ over $t \in [1, T-1]$ gives the expected number of transitions from state $S_i$ to $S_j$:
$$\sum_{t=1}^{T-1} \gamma_t(i) = \text{expected number of transitions from } S_i \tag{2.59}$$
$$\sum_{t=1}^{T-1} \xi_t(i,j) = \text{expected number of transitions from } S_i \text{ to } S_j \tag{2.60}$$
With these quantities, the re-estimation formulas for the HMM parameters are as follows:
$$\bar{\pi}_i = \text{expected frequency in state } S_i \text{ at time } (t = 1) = \gamma_1(i) \tag{2.61}$$
$$\bar{a}_{ij} = \frac{exnum(S_i, S_j)}{exnum(S_i)} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)} \tag{2.62}$$
$$\bar{b}_j(k) = \frac{exnum\_in(S_j, v_k)}{exnum\_in(S_j)} = \frac{\sum_{t=1,\ O_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)} \tag{2.63}$$
where
$exnum(S_i, S_j)$: expected number of transitions from state $S_i$ to state $S_j$.
$exnum(S_i)$: expected number of transitions out of state $S_i$.
$exnum\_in(S_j, v_k)$: expected number of times in state $S_j$ observing symbol $v_k$.
$exnum\_in(S_j)$: expected number of times in state $S_j$.
From the initial model $\lambda = (A, B, \pi)$ and the observation sequence O, we compute the right-hand sides of (2.61), (2.62) and (2.63); the results are the new model parameters $\bar{\lambda} = (\bar{A}, \bar{B}, \bar{\pi})$. Baum proved that $p(O \mid \bar{\lambda}) \ge p(O \mid \lambda)$, with equality only when the model has already reached a stationary point of the likelihood (in which case $\bar{\lambda} = \lambda$).
The Baum-Welch algorithm is therefore applied iteratively, re-estimating and updating the model parameters until convergence. The final result, however, is only a local optimum. In general, initialising the parameters with suitable values improves the chance that training reaches the global optimum.
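One re-estimation step for a discrete HMM and a single observation sequence can be sketched as follows. This is an illustrative, unscaled implementation of (2.55), (2.58) and (2.61) to (2.63): the function name and array layout are assumptions, every state is assumed to have non-zero occupancy, and practical implementations add scaling and pool statistics over many sequences.

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One re-estimation step (2.61)-(2.63) for a discrete HMM.

    Returns the updated (pi, A, B) and p(O | lambda) under the old parameters.
    """
    obs = np.asarray(obs)
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    likelihood = alpha[-1].sum()

    gamma = alpha * beta / likelihood                           # (2.58)
    # xi[t, i, j] = alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j) / p(O | lambda)  (2.55)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood

    new_pi = gamma[0]                                           # (2.61)
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]    # (2.62)
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[obs == k].sum(axis=0)               # numerator of (2.63)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, likelihood
```

Calling the function repeatedly, feeding each output back in as the next input, gives the iterative training loop; the returned likelihood should never decrease from one iteration to the next, which is a practical way to detect implementation errors.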
2.7. Mixture Of Gaussians Hidden Markov Model
2.7.1. Model specification
Section 2.6 presented the HMM and its three basic problems for the case where the probability distribution of the observations is discrete. Such an HMM is called a discrete HMM. When the probability density function in each state is continuous, we have a continuous HMM. There are two main forms of continuous HMM:
- Gaussian Hidden Markov Model (GHMM): the probability density function in each state is a Gaussian density.
- Mixture of Gaussians Hidden Markov Model (MGHMM): the probability density function in each state is a mixture of Gaussian densities (the GMM presented in section 2.5 above).

Figure 2.23: A 3-state MGHMM
This section presents the MGHMM. It is a form of continuous HMM in which the probability density of the observation vector $O_t$ is given by a GMM:
$$b_j(O_t) = p_{GMM_j}(O_t) \tag{2.64}$$
where $p_{GMM_j}$ is the output density of the GMM attached to state $S_j$. The chance of observing a vector in a state is thus governed by the GMM of that state. Figure 2.23 shows an MGHMM with 3 states.
As in the definition of the HMM, an MGHMM with N states and M Gaussian distributions in each state is represented by the parameter set $\lambda = \{\pi, A, B\}$, where:
$A = \{a_{ij}\}$, $a_{ij}$ is the probability of moving from state $S_i$ to state $S_j$.
$\pi = \{\pi_i\}$, $\pi_i$ is the initial probability of state $S_i$.
$B = \{b_j\}$, $b_j$ is the probability density function in state $S_j$.
$a_{ij}$ and $\pi_i$ are unchanged from the HMM; the main difference lies in $b_j$. From (2.38) we have:
$$b_j(O_t) = \sum_{m=1}^{M} c_{jm}\, G(O_t, \mu_{jm}, U_{jm}) \tag{2.65}$$
where j is the state index, $j \in [1, N]$, $O_t$ is the vector observed at time t, $G(\cdot, \mu_{jm}, U_{jm})$ is the m-th Gaussian density in state $S_j$ with mean vector $\mu_{jm}$ and covariance matrix $U_{jm}$ (U is used instead of $\Sigma$ to avoid confusion with the summation sign), and $c_{jm}$ is the (non-negative) weight of the m-th Gaussian distribution in state $S_j$, subject to the constraint $\sum_{m=1}^{M} c_{jm} = 1$.
Regarding the three basic problems of HMMs, the solutions to problems 1 and 2 carry over unchanged to the MGHMM. For problem 3, the re-estimation formulas for the parameter B of the MGHMM differ in some respects from those of the HMM.
2.7.2. Parameter training
The solution to the training problem for the MGHMM $\lambda = \{A, B, \pi\}$ is again the Baum-Welch algorithm (equivalent to EM [2]). Training differs little from the HMM case, and the re-estimation formulas for A and $\pi$ are still:
$$\bar{\pi}_i = \gamma_1(i) \tag{2.66}$$
$$\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)} \tag{2.67}$$
For $B = \{b_j\}$, however, each $b_j$ consists of the three sub-parameters $\{c_{jm}, \mu_{jm}, U_{jm}\}$, and the estimation is more involved. First, we define $\gamma_t(j,k)$ as the probability of being in state $S_j$ at time t with the k-th Gaussian distribution accounting for the vector $O_t$, given the sequence of observation vectors O and the model $\lambda$:
$$\gamma_t(j,k) = \left[\frac{\alpha_t(j)\, \beta_t(j)}{\sum_{i=1}^{N} \alpha_t(i)\, \beta_t(i)}\right] \left[\frac{c_{jk}\, G(O_t, \mu_{jk}, U_{jk})}{\sum_{m=1}^{M} c_{jm}\, G(O_t, \mu_{jm}, U_{jm})}\right] \tag{2.68}$$
where $\alpha_t(j)$ and $\beta_t(j)$ are the forward and backward variables, and $c_{jk}$, $\mu_{jk}$ and $U_{jk}$ are respectively the weight, mean vector and covariance matrix of the k-th Gaussian distribution in state $S_j$. With $\gamma_t(j,k)$, the re-estimation formulas for the components of $b_j$ are:
$$\bar{c}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T} \sum_{k=1}^{M} \gamma_t(j,k)} \tag{2.69}$$
$$\bar{\mu}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, O_t}{\sum_{t=1}^{T} \gamma_t(j,k)} \tag{2.70}$$
$$\bar{U}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)\, (O_t - \mu_{jk})(O_t - \mu_{jk})'}{\sum_{t=1}^{T} \gamma_t(j,k)} \tag{2.71}$$
Training algorithm:
- Input: the sequence of feature vectors O = { O_1, O_2, …, O_T } and the model λ = {A, B, π}
- Output: λ* = {A*, B*, π*}
while (!converged && iter < maxIter)
{
    // Compute the forward variables
    for i = 1 … N
        α_1(i) = π_i b_i(O_1)
    for t = 1 … T-1
        for j = 1 … N
            α_{t+1}(j) = [ Σ_{i=1..N} α_t(i) a_ij ] b_j(O_{t+1})
    // Compute the backward variables
    for i = 1 … N
        β_T(i) = 1
    for t = T-1 … 1
        for i = 1 … N
            β_t(i) = Σ_{j=1..N} a_ij b_j(O_{t+1}) β_{t+1}(j)
    // Update the model parameters
    λ = updateParameters(λ, O, α, β)
    iter = iter + 1
}
The updateParameters method re-estimates the MGHMM parameters using formulas (2.66), (2.67), (2.69), (2.70) and (2.71).
As mentioned above, the Baum-Welch algorithm can only train the model to a local optimum, and the result is not always the global optimum. There is no complete solution to this problem, but a good initialisation of the model parameters increases the chance of reaching the global optimum and also speeds up convergence during training.

CHAPTER 3. SPEECH RECOGNITION FRAMEWORKS
3.1. HTK Framework
3.1.1. Introduction
HTK (Hidden Markov Model Toolkit) is a set of tools for building hidden Markov models (HMMs). HTK is designed primarily for building HMM-based speech processing and recognition models, and the facilities it provides all serve that purpose.

Figure 3.1 Speech recognition using HTK
Figure 3.1 shows a speech recognition system built with HTK. It consists of two main processing stages:
- The HTK training tools are used to estimate the parameters of a set of HMMs from training utterances and their associated transcriptions.
- An unknown utterance is then passed through the HTK recognition tools, which output the transcription of that speech.
HTK is used as a library and is easy to extend and develop. It is an ideal tool for research on HMM-based modelling and recognition, and it leaves more time for developing our own tools to improve recognition. As a result, we can readily apply it to Vietnamese speech recognition without spending time rebuilding HMM tooling; HTK is very well developed for this task.
3.1.2. Overview of HTK
HTK provides a set of tools that carry out the different HMM-related tasks in a recognition system. The tools are written in C and use a set of basic functions from a common library to build HMMs.

Figure 3.2 Architecture of HTK
Figure 3.2 shows the general architecture of HTK, which consists of many modules divided by purpose, including tools that specifically support recognition. Figure 3.3 below shows the stages of building a speech recognition system and the tools HTK provides for each stage: data preparation, training, testing, and finally analysis of the results.

Figure 3.3 Stages of building a speech recognition system with HTK
HTK uses different data files to pass data between the tools. These files may contain audio data (waveforms or sequences of acoustic feature vectors), labelled audio data (transcriptions), HMMs (the parameters defining each HMM) or recognition networks. HMM parameters can be shared between mixtures, states or whole HMMs. In HTK, the smallest recognition unit (defined by the user; it may be a phoneme, a syllable or a word) is modelled by one HMM and is called a phone. For combining models in sub-word based recognisers, HTK provides context-dependent HMMs, so that bi-phone or tri-phone models can be used for each possible context between phones.
The main tools in HTK are:
- HCode: encodes a speech waveform file into a parameter file (a sequence of observation vectors).
- HInit: initialises an HMM from a set of labelled training data segments.
- HRest: re-estimates parameters using the Baum-Welch algorithm; it also works on a set of labelled training data segments.
- HERest: performs embedded training based on the Baum-Welch algorithm. It trains a set of HMMs simultaneously using a set of training data (continuously spoken sentences together with their transcriptions).
- HVite: uses the Viterbi algorithm to recognise continuous speech, subject to syntactic constraints and a beam search.
- HResults: compares two transcription files using a dynamic programming alignment and then reports recognition statistics.
- HSLab: displays speech as a waveform together with its transcription in graphical form.
3.1.3. HTK in detail
3.1.3.1. Speech signal processing
HTK supports feature extraction, that is, converting a waveform audio signal into a sequence of parameter vectors. The input to this step is audio stored as .wav files, and the output is a sequence of parameters in the compressed format MFCC_C (compressed mel-cepstral coefficients). HTK in fact accepts input and produces output in many other forms. For example, the input can be audio captured directly through HAudio, and the output can be the parameters of another feature extraction method such as LPC or PLP.
This process is shown in Figure 3.4. HTK treats waveform files and parameterized files alike: both represent the same sequence of signal samples, and they differ only in that one is stored in binary form and the other as multi-component vectors. A parameter vector normally represents a fixed span of the audio signal, typically 10-25 msec (HTK users can redefine this value).
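The relation between signal length and the number of parameter vectors can be sketched as follows. This is a minimal illustration, assuming a 25 msec analysis window advanced in 10 msec steps and counting only complete windows; the function name `frame_count` is hypothetical, and HTK's own handling of the final partial frame may differ.

```python
def frame_count(num_samples, sample_rate, window_ms=25.0, shift_ms=10.0):
    """Number of complete analysis windows that fit in the signal.

    Assumption: a window is counted only when it fits entirely
    inside the signal; partial windows at the end are dropped.
    """
    window = int(sample_rate * window_ms / 1000)  # window length in samples
    shift = int(sample_rate * shift_ms / 1000)    # step between windows
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // shift


# One second of audio sampled at 16 kHz
print(frame_count(16000, 16000))  # 98
```

With these settings each second of speech yields roughly one hundred parameter vectors, which is why the parameterized file is far shorter than the waveform in samples.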

Figure 3.4 Speech encoding process
Speech processing in HTK works in the same way as described in the Feature Extraction section of this report. In HTK, the HCopy command performs this step.
3.1.3.2. Estimating model parameters
HTK provides 4 basic tools for estimating HMM parameters: HCompV, HInit, HRest and HERest. HCompV and HInit initialize the parameter values. HCompV computes the mean and variance of each Gaussian component in the HMM definition and sets them equal to the mean and variance of the speech training data. HInit computes the values of a new HMM using estimation formulas based on the Viterbi algorithm.
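The initialization performed by HCompV can be pictured with a small sketch. It only illustrates the idea of a global mean and variance computed over all training vectors; the function name and the list-of-lists data layout are assumptions made for the example, not HTK code.

```python
def global_mean_variance(frames):
    """Per-dimension mean and variance over all training frames.

    frames: list of equal-length feature vectors (lists of floats).
    Every Gaussian in a flat-start model would receive these values.
    """
    count = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / count for d in range(dim)]
    variance = [
        sum((f[d] - mean[d]) ** 2 for f in frames) / count
        for d in range(dim)
    ]
    return mean, variance


mean, variance = global_mean_variance([[1.0, 2.0], [3.0, 6.0]])
print(mean)      # [2.0, 4.0]
print(variance)  # [1.0, 4.0]
```

Because every state starts from the same statistics, the models are initially identical; the later re-estimation passes are what make them differ.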
3.1.3.3. Language models used in HTK speech recognition
As with Sphinx, good recognition results require applying the grammar and semantics of the language. In phone-based continuous speech recognition in particular, the raw recognition result is a sequence of consecutive phones. To output ordinary sentences, we cannot simply look up a single dictionary word for each phone sequence, because the same phone sequence may correspond to several different word sequences.
A word network (referred to in this report as a semantic network) represents the word sequences that the system can recognize. A word-level network defines the possible orderings of all dictionary words across different sentences; it is in effect the grammar of the words in a sentence. The transitions between words in the network usually carry a probability that tells how likely one word is to follow another in the language. With this network we can obtain the most probable result for a given set of recognized phones.
HTK offers several tools for building such a network, including HParse, HSGen and HLStats:
HParse: converts a grammar file over the dictionary words into a word network that encodes the allowed word orderings.
HSGen: works in the opposite direction to HParse; it generates a series of sentences from the network obtained.
HLStats: reads a list of HMMs and a set of speech transcriptions. It computes various statistics for analyzing the phonetic training data and generates simple language models for recognition.
3.2. CMU Sphinx Framework
3.2.1. Introduction to CMU Sphinx
Sphinx is a complete speech recognition system; the version studied here, Sphinx 4, is written in Java. Sphinx was developed by the Sphinx research group at Carnegie Mellon University, and its source code was first published and shared in 2000. With later help from several organizations (Sun Microsystems Laboratories, Hewlett Packard, Mitsubishi Electric Research Labs) and universities (University of California at Santa Cruz, Massachusetts Institute of Technology), the group continued its research and released new versions (Sphinx 2, Sphinx 3, ...), the latest being Sphinx 4, each making the recognizer more efficient. This report concentrates on Sphinx 4.
3.2.2. Features of CMU Sphinx
Although Sphinx has only recently been adopted in recognition applications, it is becoming a powerful speech recognition framework and is widely used in practical applications, thanks to the following features:
- Supports recognition in live mode and batch mode, for both isolated and continuous speech.
- Is a large recognition system, yet highly pluggable. It ships with the features a recognizer needs: filters, window functions, transforms, and tools for extracting features by many different methods.
- Supports many language models: ASCII and binary versions of unigram, bigram and trigram models, Java Speech API Grammar Format (JSGF) grammars, and ARPA-format FST grammars.
- Includes optimized search algorithms (breadth first, word pruning) that are easy to tune to the recognition task.
3.2.3. Sphinx 4 architecture
Sphinx 4 is a large and fairly complex speech recognition framework. It is made up of loosely coupled components, each packaged as a module with its own function. These components can be modified and reconnected to suit the application without breaking the structure of the system.

Figure 3.5 General architecture of Sphinx 4
Sphinx 4 consists of three basic components: the front end (FrontEnd), the decoder (Decoder) and the linguist (Linguist). To picture how Sphinx works as a whole, we briefly outline the function of each. The front end provides the tools for acquiring and preprocessing the incoming digital signal and parameterizes it into a set of feature vectors (Feature). The linguist reads the language model together with the pronunciation information in the dictionary and the structural information of the acoustic models, and models them as a search graph (Search Graph). The decoder has the most important task, connecting the other two components. Specifically, the search manager (Search manager) inside the decoder takes the features from the front end and combines them with the search graph produced by the linguist to decode the input and compute the recognition result.

Figure 3.6 Illustration of how Sphinx operates
The Sphinx developers studied acoustics closely and therefore exposed a large number of configuration parameters that can be tuned to different languages and audio conditions. We can change them, for example the feature extraction method or the search method, in the system's config file without touching the internals. Sphinx also comes with supporting tools for the recognition process, such as training tools and tools that monitor and report on the system.
3.2.3.1. The front end (FrontEnd)
Function: receives the external signal, passes it through a series of filters and data processing steps, and produces a set of feature vectors.

Figure 3.7 General architecture of the front end
Structure: internally, the front end is a chain of connected sub-modules, called DataProcessors, each able to process the signal. Several such chains can run in parallel at the same time.


Figure 3.8 Chain of DataProcessors during processing
Chaining DataProcessors gives flexibility in feature extraction: for one signal we can extract features with different methods and compare them to find the best, or apply several methods in succession. Allowing the chains to run in parallel also lets the system recognize in real time even when a large amount of signal is coming in.
3.2.3.2. The linguist (Linguist)
Function: using linguistic tools and methods, this component reads the files that describe the structure of a language and models them as the search graph used during recognition.
Structure: this component is fairly complex because it determines almost the entire scope of the language to be recognized. It comprises the following parts:
a. Language model
This part reads the word-level description of the language structure. It plays an important role in determining what the system needs to recognize. The language structure is modeled here in one of two ways: a graph-driven grammar or a stochastic N-Gram model.
- Graph-driven grammar: a directed word graph in which each node represents a single word and each arc represents the probability of moving on to a word.
- Stochastic N-Gram model: gives the probability of a word given the n-1 words observed before it.
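The stochastic N-Gram idea can be illustrated for n = 2 with a short sketch. It estimates bigram probabilities by simple counting, without the smoothing that real language model toolkits apply; the function name and the two toy sentences are assumptions made for the example.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Maximum-likelihood estimate of P(word | previous word).

    Each sentence is wrapped in <s> ... </s> markers so that sentence
    starts and ends get probabilities too. No smoothing is applied.
    """
    context_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        context_counts.update(words[:-1])            # every word that has a successor
        pair_counts.update(zip(words[:-1], words[1:]))
    return {
        pair: count / context_counts[pair[0]]
        for pair, count in pair_counts.items()
    }


probs = bigram_probabilities(["mot hai ba", "mot ba"])
print(probs[("mot", "hai")])  # 0.5
print(probs[("<s>", "mot")])  # 1.0
```

Here "mot" is followed once by "hai" and once by "ba", so each continuation receives probability 0.5.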
Sphinx supports many language formats, such as:
- SimpleWordListGrammar: defines a grammar from a list of words. An optional parameter specifies whether the grammar loops. A non-looping grammar is used for isolated word recognition. A looping grammar supports trivial connected word recognition and is equivalent to a unigram grammar with equal probabilities.
- JSGFGrammar: supports the Java Speech API Grammar Format (JSGF), which defines a BNF-style, platform-independent, Unicode representation of grammars.
- LMGrammar: defines a grammar based on a statistical language model. LMGrammar generates one grammar node per word and works well with unigrams and bigrams of up to roughly 1000 words.
- FSTGrammar: supports a finite-state transducer in the ARPA FST grammar format.
- SimpleNGramModel: supports ASCII N-Gram models in ARPA format. SimpleNGramModel does not optimize memory use, so it works best with small language models.
- LargeTrigramModel: supports true N-Gram models generated by the CMU-Cambridge Statistical Language Modeling Toolkit. LargeTrigramModel optimizes memory use, which lets it work with very large files of more than 100MB.
b. Dictionary
This part provides the pronunciations of the words in the language model and supports grouping words into classes to help the search.
c. Acoustic model
The acoustic model provides a mapping between a unit of speech and an HMM (Hidden Markov Model) that can be scored against the features supplied by the front end. The mapping may take into account word position and context information from the language model. The definition of this context is built from the grammar structure of the language model.
Unlike other recognition systems, which keep the HMM as a fixed structure in memory, the HMM in Sphinx is simply a directed graph of objects. In this graph each node corresponds to an HMM state and each arc represents the probability of a transition from one state to another. Because the HMM is represented as a directed graph of objects rather than a fixed structure, an implementation of the acoustic model can easily supply HMMs with different topologies.
Each HMM state can produce a score from an observed feature. The score is computed by the HMM state itself, which hides its implementation from the rest of the system and even allows different probability density functions to be used in different HMM states. The acoustic model also allows components to be shared at every level. That is, the components that make up an HMM state, such as Gaussian mixtures, transition matrices and mixture weights, can be shared by any HMM states.
d. Search graph (Search Graph)
This is the final product of the linguist and is what the decoder uses. The search graph is a directed graph in which each node is called a search state (SearchState) and is either emitting or non-emitting. The arcs represent the possible state transitions. Each arc carries a probability computed from the acoustic model, which represents the likelihood of moving from one state to the next. A state can have several arcs leading to other states.

Figure 3.9 Example of a simple search graph
3.2.3.3. The decoder (Decoder)
Function: the decoder's main task is to take the features (Features) from the front end, combine them with the search graph produced by the linguist, decode the input, and apply algorithms to derive the recognition result.
Structure: the decoder has few parts, but its algorithms are complex and determine search speed. It consists of a pluggable search manager (SearchManager) and other supporting code that simplifies decoding for an application. This report therefore introduces only the search manager.
The search manager's task is to take the sets of feature vectors and find their corresponding mapping in the search graph. To help obtain accurate results from the search graph, Sphinx provides utilities that can generate lattices and confidence scores from the results. Another feature that distinguishes Sphinx from other systems is that the search space can be modified during the search to improve search performance.
To further improve recognition results, Sphinx adds tools for evaluating the hypotheses obtained: the scorer (Scorer) and the pruner (Pruner). The Scorer is a module that estimates state probabilities by supplying state output density values on request. When the search manager needs a score for a state, it calls the Scorer, which analyzes the feature information for that state and computes the score.

CHAPTER 4. BUILDING THE EXPERIMENTAL PROGRAM
4.1. Installing Sphinx
4.1.1. Preparing the operating system
Linux is the most suitable environment for installing Sphinx and running training, and Ubuntu is one of the most popular Linux distributions. Sphinx installs easily on Ubuntu. The team also tried installing it on Windows using CYGWIN [10] but did not succeed.
4.1.2. Preparing the Sphinx packages
The packages are:
Pocketsphinx: a recognition library written in C.
Sphinxbase: the base library package, providing the libraries the other packages need.
Sphinx4: a recognition package written in Java.
CMUclmtk: the toolkit for building language models.
Sphinxtrain: the toolkit for training acoustic models.
The packages can be downloaded directly from the CMU Sphinx home page [11].
4.1.3. Installing Sphinx
Create a directory named sphinx in the Home folder (of the Ubuntu virtual machine). Copy the files downloaded in the previous section (Sphinxbase, Sphinxtrain, Pocketsphinx, CMUclmtk) into it and extract them. (Note: remove the version number from each directory name after extracting.)
Open a Terminal window in Ubuntu: Ctrl+Alt+T.
Type sudo apt-get update, then enter the password when prompted (the password is not shown as you type; enter it carefully and press Enter). This command updates the package index that apt-get uses. When the update finishes, type cd sphinx to move into the sphinx directory just created.
Install the packages required before SphinxBase by typing the following commands:
- sudo apt-get install bison (confirm to download and install bison)
- sudo apt-get install autoconf
- sudo apt-get install automake
- sudo apt-get install libtool

a. Installing SphinxBase
Type cd sphinxbase to enter the sphinxbase directory.
Run the following commands and wait for each to finish:
- ./autogen.sh
- ./configure
- make
- sudo make install
b. Installing Sphinxtrain
From the sphinxbase directory, move to the sphinxtrain directory with cd ../sphinxtrain. Run the following commands and wait for each to finish:
- ./configure
- make
- sudo make install
c. Installing PocketSphinx
Move to the pocketsphinx directory, then run the following commands and wait for each to finish:
- ./autogen.sh
- ./configure
- make
- sudo make install
Finally, type sudo ldconfig in the Terminal so that the operating system refreshes its dynamic libraries.
The installation process is described in detail in [12].



Figure 4.1 The sphinx directory with the downloaded files and the directories after extraction and renaming
4.2. Preparing the training set for Sphinx
Create a training directory with a name that is meaningful for the training data set, for example: docso.
Inside it create 2 subdirectories, etc and wav.
Then create the files in the following structure:
etc
|___ your_db.dic - Phonetic dictionary (phonemes, syllables)
|___ your_db.phone - List of phonemes
|___ your_db.lm.DMP - Language model
|___ your_db.filler - List of silences
|___ your_db_train.fileids - List of training files
|___ your_db_train.transcription - Text of the training files
|___ your_db_test.fileids - List of test files
|___ your_db_test.transcription - Text of the test files

wav
|___ speaker_1
|___ file_1.wav - Recording of one sentence spoken by the training speaker
|___
|___ speaker_2
|___ file_2.wav
a. Phonetic Dictionary (your_db.dic)
This file describes how each word in the training set is pronounced. For example, the word HELLO is pronounced as the following sequence of phonemes: HH AH L OW (the example given on the Sphinx home page). The file then contains the line:
- HELLO HH AH L OW
Each line of the file defines the pronunciation of one word.
The file is case sensitive. Building it normally requires studying how words are pronounced in the language concerned. For English, pronunciations of English words are available in existing dictionaries. This is also an important step in building a successful training set.
In Vietnamese, the pronunciation and the spelling of a word are almost directly tied together, so no pronunciation guide is needed when learning Vietnamese. In English, pronunciation and spelling are not tied together in this way, for example lead (to guide) and head (part of the body). To build this file for Vietnamese, a word can be defined in several ways, such as:
- BAN B A N
Here the word BAN is treated as one syllable made up of 3 phonemes: B, A, N.
- BAN B AN
Here the word BAN is treated as one syllable made up of 2 phonemes: B, AN.
Sphinx does not support word-based definitions, that is, the pronunciation of a word may not be the word itself. For example, BAN BAN is not allowed. An equivalent workaround exists for those who want a word-based setup: define the word as having several pronunciations, for example:
BAN BAN BANG
The definition above means that the word ban can be read in 2 ways: as ban (the standard pronunciation) or as bang (the southern pronunciation).
Use only the characters a-z, A-Z and 0-9 to make sure the file causes no errors.
Tones are handled as follows:
A phoneme together with its tone is treated as an independent phoneme. So instead of treating the tone as a separate phoneme, as in the following definition (for the word bản):
BA3N B A 3 N
we treat ả as a different phoneme, independent of a, and define the word as:
BA3N B A3 N
According to several master's theses, this approach gives better recognition results for syllables that carry tones.
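The tone convention just described can be sketched in a few lines. This is only an illustration under a strong simplifying assumption: every letter is one phoneme and a tone digit is attached to the letter before it. Multi-letter phonemes are not handled, and the function name is hypothetical.

```python
def dictionary_entry(word):
    """Build a dictionary line such as 'BA3N B A3 N'.

    Assumption: one letter is one phoneme, and a tone digit (0-9)
    is merged into the preceding phoneme instead of standing alone.
    """
    phonemes = []
    for ch in word:
        if ch.isdigit() and phonemes:
            phonemes[-1] += ch   # tone joins the vowel before it
        else:
            phonemes.append(ch)
    return word + " " + " ".join(phonemes)


print(dictionary_entry("BA3N"))  # BA3N B A3 N
print(dictionary_entry("BAN"))   # BAN B A N
```

A real dictionary for Vietnamese would also need to keep consonant clusters such as NH or CH together, which this sketch deliberately leaves out.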
b. Phoneset file (your_db.phone)
This file lists all the phonemes used in the dictionary file above, one phoneme per line. The phonemes should be sorted so that Sphinx can manage them easily (this requirement is stated explicitly for HTK training; we did not find it stated for Sphinx). Remember to add one special phoneme to this file, SIL, which stands for silence.
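Deriving the phoneset from the dictionary can be sketched as follows, assuming each dictionary line is a word followed by its phonemes separated by whitespace. The function name is hypothetical; the sort uses Python's default string order.

```python
def phone_set(dictionary_lines):
    """Collect the unique phonemes of a dictionary, sorted, plus SIL."""
    phones = {"SIL"}                 # silence must always be present
    for line in dictionary_lines:
        fields = line.split()
        phones.update(fields[1:])    # first field is the word itself
    return sorted(phones)


print(phone_set(["BAN B A N", "BA3N B A3 N"]))
# ['A', 'A3', 'B', 'N', 'SIL']
```

Generating the list this way keeps the phoneset file consistent with the dictionary whenever words are added or removed.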
c. Language model file (your_db.lm.DMP)
This file should be in ARPA or DMP format. How to create the DMP file is left for the team's later study. The file defines the grammar of the sentences to be recognized and is used for training, testing and running the program. It can be generated automatically by various tools.
d. Filler dictionary (your_db.filler)
This file contains the filler syllables, which are normally silences. It can simply be defined as follows:
<s> SIL
</s> SIL
<sil> SIL
e. List of training files (your_db_train.fileids)
This is a text file with the paths of the recordings (.wav files) inside the wav directory shown in the directory layout above. For example:
speaker_1/file_1
Do not include the .wav extension. Each line names one file.
f. Text transcript of the wav files (your_db_train.transcription)
This file holds the content of the recorded wav files. To train Sphinx to understand what was said, we must supply a text file from which Sphinx can learn what each recording contains. A transcript file has many lines; each line is the content of one wav file followed by the name of that file. For example:
<s> hello world </s> (file_1)
Note that every sentence must be enclosed in the tags <s> </s>.
Do the same for your_db_test.fileids and your_db_test.transcription.
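Keeping the two files in step by hand is error-prone, so they can be produced together. The sketch below assumes the recordings are described by pairs of relative wav path and spoken text; the function name and the input layout are hypothetical.

```python
def training_lists(utterances):
    """Build matching .fileids and .transcription lines.

    utterances: list of (path relative to the wav directory, text) pairs.
    """
    fileids, transcripts = [], []
    for wav_path, text in utterances:
        # Drop the .wav extension, as the fileids format requires
        file_id = wav_path[:-4] if wav_path.lower().endswith(".wav") else wav_path
        name = file_id.split("/")[-1]   # file name without its directory
        fileids.append(file_id)
        transcripts.append(f"<s> {text} </s> ({name})")
    return fileids, transcripts


ids, lines = training_lists([("speaker_1/file_1.wav", "hello world")])
print(ids[0])    # speaker_1/file_1
print(lines[0])  # <s> hello world </s> (file_1)
```

Because both lists come from one loop, line k of the transcription always belongs to line k of the file list.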
4.3. Recording procedure
To obtain the wav training files we have to record audio, and the more the better. The number of words to train and the hours of audio to record are related as follows:
Table 4 Parameters corresponding to the size of the training set
Vocabulary  Hours in db  Senones  Densities  Example
20          5            200      8          Tidigits Digits Recognition
100         20           2000     8          RM1 Command and Control
5000        30           4000     16         WSJ1 5k Small Dictation
20000       80           4000     32         WSJ1 20k Big Dictation
60000       200          6000     16         HUB4 Broadcast News
60000       2000         12000    64         Fisher Rich Telephone Transcription
Recording demands patience and care, and it was the hardest part of this thesis. The recording tool used is Audacity [13]. A headset microphone is the best choice. The recording environment must be quiet. Set the sampling rate to 16kHz with 16 bit mono audio (for recognition on a computer) or 8kHz with 16 bit mono audio (for recognition on mobile devices), turn off the computer speakers while recording, and place the microphone slightly below the mouth so that breath does not add noise to the signal. Detailed guidance on preparing for recording is available from VoxForge [14].
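The recording settings above can be checked automatically before training. The sketch below uses Python's standard `wave` module; the function name `check_wav` and the wording of the messages are illustrative choices.

```python
import wave


def check_wav(path, expected_rate=16000):
    """Return a list of format problems; an empty list means the file is fine.

    Checks the settings recommended above: the expected sampling rate,
    16-bit samples, and a single (mono) channel.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != expected_rate:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != 2:          # sample width in bytes
            problems.append(f"sample width is {8 * wav.getsampwidth()} bit")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels")
    return problems
```

For recordings intended for mobile devices, the same check can be run with `expected_rate=8000`.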
4.4. Training the model with Sphinx
Once a training folder has been prepared as described above (the directory holding all the prepared files and the audio files, known as the task folder), we use SphinxTrain commands to generate the training scripts automatically. The training scripts carry out every stage of training: preprocessing the audio signal, extracting acoustic features, and building and estimating the HMMs with the Baum-Welch algorithm.
To create the working directories (which Sphinx generates automatically and uses during training) and the training script files, run the following commands on the Linux command line:
- For Sphinxtrain 1.0.7 and earlier:
../SphinxTrain/scripts_pl/setup_SphinxTrain.pl -task [task_folder_name]
../pocketsphinx/scripts/setup_sphinx.pl -task [task_folder_name]
- For the Sphinxtrain snapshot release:
sphinxtrain -t [task_folder_name] setup
These commands make Sphinxtrain create the following directories in preparation for training:
bin (may be absent in newer Sphinxtrain releases)
bwaccumdir
etc
feat
logdir
model_parameters
model_architecture
python (may be absent in newer Sphinxtrain releases)
scripts_pl (may be absent in newer Sphinxtrain releases)
wav
Once these directories have been created, we adjust a few settings before starting training. Open the file etc/sphinx_train.cfg, find the following lines, and change the values.
a. Setting the audio format of the training files
$CFG_WAVFILES_DIR = "$CFG_BASE_DIR/wav";
$CFG_WAVFILE_EXTENSION = 'sph';
$CFG_WAVFILE_TYPE = 'nist'; # one of nist, mswav, raw
Change sph to wav and nist to mswav, as follows:
$CFG_WAVFILES_DIR = "$CFG_BASE_DIR/wav";
$CFG_WAVFILE_EXTENSION = 'wav';
$CFG_WAVFILE_TYPE = 'mswav'; # one of nist, mswav, raw

b. Setting the paths to the prepared files
Check that the following settings still match the current directory. These are the paths that Sphinx generates to reach the files we prepared, where $CFG_DB_NAME is the name of the task folder above.
$CFG_DICTIONARY = "$CFG_LIST_DIR/$CFG_DB_NAME.dic";
$CFG_RAWPHONEFILE = "$CFG_LIST_DIR/$CFG_DB_NAME.phone";
$CFG_FILLERDICT = "$CFG_LIST_DIR/$CFG_DB_NAME.filler";
$CFG_LISTOFFILES = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.fileids";
$CFG_TRANSCRIPTFILE = "$CFG_LIST_DIR/${CFG_DB_NAME}_train.transcription";

c. Tuning the type and parameters of the trained model
$CFG_FINAL_NUM_DENSITIES = 8;
The number of senones depends on the size of the vocabulary and of the training audio; a suitable number can be looked up in Table 4 (Parameters corresponding to the size of the training set). The values in that table are only a guide. In practice we can train with several different senone counts and keep the one that gives the best results.
# Number of tied states (senones) to create in decision-tree clustering
$CFG_N_TIED_STATES = 1000;

The final stage is to run the training:
For Sphinxtrain 1.0.7 and earlier:
./scripts_pl/make_feats.pl -ctl etc/an4_train.fileids
./scripts_pl/make_feats.pl -ctl etc/an4_test.fileids
./scripts_pl/RunAll.pl
For the Sphinxtrain snapshot release:
sphinxtrain run
Errors during training are mostly caused by incorrectly prepared files (phone, transcript, fileids, ...). They are recorded in detail in a log file named [task_folder_name].html in the task folder.
Installation and training are described in detail in [12].
4.5. Using Sphinx in a Java program
Sphinxtrain is the toolkit for building the acoustic model (Acoustic Model), which consists of the HMM parameter files used during recognition. We can use this model to build a recognition application. CMU Sphinx offers several options for doing so. For an application on mobile devices (phones, tablets, ...), the pocketsphinx package can be used. For an application on the PC platform, we use Sphinx4. Sphinx4 is written in Java so that applications can run on different operating systems. To build an application that uses the newly trained acoustic model, follow the steps in the guide [15].
4.6. Installing HTK
4.6.1. Installing the HTK framework
This guide was carried out on Linux with HTK version 3.4.1.
Prerequisite: HTK-3.4.1.tar.gz
Copy the source archive into the installation directory. In this example it is extracted in the Linux home directory.
Installation steps:
Change to the home directory and extract HTK-3.4.1.tar.gz with:
tar -xvzf HTK-3.4.1.tar.gz
This creates the HTK directory in home. Change into it and run the following commands:
./configure --prefix=/home/yourusername/htk
make all
make install
These commands compile the source code into executables in the bin directory. Next, add this bin directory to the Linux PATH so that the HTK tools can be run from a Terminal window:
- Open the .profile file in the home directory.
- Append the following line to the end of .profile and save it:
PATH="$PATH:/home/hoavo/HTK/htk/bin"
- Check that the path is set correctly by typing HInit in a Terminal window.
4.6.2. Preparing the directories needed for training
hmm0 ... hmm15: directories holding the hmm files of each training step
cfg (config): config files for some of the commands used in training.
ins (instruction): .hed and .led files.
mlf (master label file): .mlf files.
ph (phones): phone lists: mono, tri.
pl (Perl script): scripts written in Perl.
txt (other files): miscellaneous files such as dictionaries, file lists, wdnet, gram, train
wave: Wave files.
mfc: mfc files
Preparing the training files:
a. Preparing the dictionary file dict.txt
Each dictionary entry has 2 parts: the word and its transcription. The transcription deserves particular care because it affects recognition accuracy. The file is written by hand, and the pronunciations are tuned for the best results.
Note: because spoken sentences usually contain silences between words, for recognition to work well the dictionary defines an additional transcription element for the words.
# dict.txt
anh a nh
ba b a
barn b ar n
bary b ar y
bawsn b aws n
beejnh b eej nh
boosn b oos n
caf c af
car c ar
chaajm ch aaj m
d. Preparing the file prompts.txt
This file is the transcript of the wav files: each entry gives the name of a wav file and the transcript of what it contains. There are 2 ways to create it:
- Use the HSGen tool included in HTK to generate the prompts at random from the words in the dictionary.
- Write the file by hand, based on the dictionary defined above.
e. Recording the wav files used for training and testing
Record following the content of prompts.txt and number the wave files to match. Audacity is recommended for recording, with the following settings and notes:
- Set the microphone boost of the recording device to 0
- Ideally adjust the level so that the recorded sound stays between -0.5 and 0.5 dB
- Set Audacity to record in mono with a rate of 48000Hz and 16 bit format
Save the files in the format: WAV (Microsoft 16 bit PCM)
f. Preparing the proto file
This is the template file used to train the HMMs (it follows the instructions in the HTK Book).
#proto
~o <VecSize> 39 <MFCC_0_D_A>
~h "proto"
<BeginHMM>
<NumStates> 5
<State> 2
<Mean> 39
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
<Variance> 39
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
<State> 3
<Mean> 39
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
<Variance> 39
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
<State> 4
<Mean> 39
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
<Variance> 39
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
<TransP> 5
0.0 1.0 0.0 0.0 0.0
0.0 0.6 0.4 0.0 0.0
0.0 0.0 0.6 0.4 0.0
0.0 0.0 0.0 0.7 0.3
0.0 0.0 0.0 0.0 0.0
<EndHMM>
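A hand-written prototype is easy to get wrong, so the transition matrix can be checked before training. The sketch below assumes the convention visible in the listing above: every row sums to 1 except the last (exit) row, which is all zeros. The function name is hypothetical and the check is not part of HTK.

```python
def bad_transition_rows(matrix, tol=1e-6):
    """Return the indices of rows that break the expected row sums.

    Expected: each row sums to 1.0, except the last row (the exit
    state), which has no outgoing transitions and sums to 0.0.
    """
    size = len(matrix)
    bad = []
    for i, row in enumerate(matrix):
        expected = 0.0 if i == size - 1 else 1.0
        if len(row) != size or abs(sum(row) - expected) > tol:
            bad.append(i)
    return bad


PROTO_TRANSP = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
print(bad_transition_rows(PROTO_TRANSP))  # []
```

The matrix of the prototype above passes the check; a row whose probabilities do not add up to 1 would be reported by its index.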
g. Preparing "mkphones0.led" and "mkphones1.led" (by hand)
#mkphones0.led
EX
IS sil sil
DE sp
Explanation:
- EX: replaces every word in words.mlf with its transcription from the dictionary dict.
- IS: inserts the silence model (silence - sil) at the start and end of every utterance.
- DE: deletes all the short pause (sp) labels that appear after the EX command.
#mkphones1.led
EX
IS sil sil
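The effect of the two edit scripts can be imitated on a single utterance. The sketch below is not HLEd; it only mirrors the three commands under the assumption that every dictionary pronunciation ends with an sp label. The function name and the toy dictionary are hypothetical.

```python
def expand_to_phones(words, pronunciations, delete_sp):
    """Mimic EX, IS sil sil and (optionally) DE sp on one utterance.

    pronunciations: word -> list of phones. Assumption: each
    pronunciation ends with 'sp'.
    """
    phones = [p for word in words for p in pronunciations[word]]  # EX
    phones = ["sil"] + phones + ["sil"]                           # IS sil sil
    if delete_sp:
        phones = [p for p in phones if p != "sp"]                 # DE sp
    return phones


PRON = {"ba": ["b", "a", "sp"], "anh": ["a", "nh", "sp"]}
print(expand_to_phones(["ba", "anh"], PRON, delete_sp=True))
# ['sil', 'b', 'a', 'a', 'nh', 'sil']
print(expand_to_phones(["ba", "anh"], PRON, delete_sp=False))
# ['sil', 'b', 'a', 'sp', 'a', 'nh', 'sp', 'sil']
```

The first call corresponds to mkphones0.led and the second to mkphones1.led, which differ only in whether the short pauses survive.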
h. Preparing the file "sil.hed", used in training at the step that creates the sp model
AT 2 4 0.2 {sil.transP}
AT 4 2 0.2 {sil.transP}
AT 1 3 0.3 {sp.transP}
TI silst {sil.state[3],sp.state[2]}
i. Preparing the file "mktri.led", used to create triphones
#mktri.led
WB sp
WB sil
TC
4.6.3. Preparation steps for training
Create the file "wordlist.txt", which contains all the words in "prompts.txt":
Run the command:
perl pl/prompts2wlist.pl txt/prompts.txt txt/wordlist.txt
Input: prompts.txt, the file with the transcripts of the training wave files
Output: the wordlist file, structured as follows:
# wordlist.txt
ba
bary
boosn
chisn
hai
khoong
moojt
nawm
sasu
tasm
Then, to mark where a sentence begins and ends, add the following words to wordlist.txt at their correct positions in sorted order.
SENT-END
SENT-START
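The word list step can be sketched in Python as an alternative to the Perl script. It assumes each line of prompts.txt starts with the recording name followed by the words, and it sorts with Python's default string order, which places the upper-case sentence markers before lower-case words; whether that matches the order HTK expects for a given word list is an assumption to verify.

```python
def build_wordlist(prompt_lines):
    """Unique words of the prompts, sorted, plus the sentence markers."""
    words = {"SENT-END", "SENT-START"}
    for line in prompt_lines:
        words.update(line.split()[1:])   # first field is the recording name
    return sorted(words)


print(build_wordlist(["*/f1 hai ba", "*/f2 ba"]))
# ['SENT-END', 'SENT-START', 'ba', 'hai']
```

Generating the list this way adds the two markers and keeps the file sorted in one step, so no manual editing is needed afterwards.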
The dictionary above stores the training words, but HTK trains at the lowest level, the phone level, so we must derive the list of phones used in the dictionary transcriptions with the following step:
Create the files "monophones1.txt" and "mydict.txt" from "wordlist.txt" and "dict.txt"
Command:
HDMan -A -D -T 1 -m -w txt/wordlist.txt -n ph/monophones1.txt -l log/dict.log txt/mydict.txt dict.txt
Input:
- wordlist.txt: simply the list of words used in the word network, one word per line, structured as shown above.
- dict.txt: the dictionary file in which we define the training words
Output:
- mydict.txt: the words used in training with their transcriptions, produced by combining wordlist.txt and dict.txt
- dict.log: log of the messages reported by the HTK tool
- monophones1.txt: the list of phones used in the dictionary transcriptions
# monophones1.txt
a
nh
sp
b
ar
n
y
aws
eej
oos
To train the HTK models on words only, without sp, create the file monophones0.txt with the same content as monophones1.txt but with sp removed.
Create the file "words.mlf" from "prompts.txt":
HTK does not use the prompts file directly for recognition. We need to create other files from it, specifically MLF files (Master Label File; see the HTK Book). "words.mlf" is simply the word-level expansion of "prompts.txt" in MLF format. Create it with the command:
perl pl/prompts2mlf.pl mlf/words.mlf txt/prompts.txt

#!MLF!#
"*/hoa2_01.lab"
cuoojn
xuoosng
xosa
lijch
suwr
susng
guwri
thuw
haxy
ddosng
taast
..
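The conversion from prompts to a word-level MLF can be sketched as follows. It assumes each prompt line starts with the recording name (for example */hoa2_01) followed by the words, and it ends every label entry with a line containing a single period, as the MLF format requires. The function name is hypothetical and this is not the Perl script used above.

```python
def prompts_to_mlf(prompt_lines):
    """Convert prompt lines into the text of a word-level MLF."""
    out = ["#!MLF!#"]
    for line in prompt_lines:
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        out.append(f'"{fields[0]}.lab"')  # label file name in quotes
        out.extend(fields[1:])            # one word per line
        out.append(".")                   # end of this label entry
    return "\n".join(out) + "\n"


print(prompts_to_mlf(["*/hoa2_01 cuoojn xuoosng"]), end="")
```

For the single prompt shown, the output is the MLF header, the quoted label name, the two words on separate lines, and the closing period.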
Create the 2 files "phones0.mlf" and "phones1.mlf":
The files "phones0.mlf" and "phones1.mlf" have the same content, except that phones0.mlf contains only "sil" while phones1.mlf also contains "sp", just like the corresponding files monophones0.txt and monophones1.txt above.
In effect, phones#.mlf is the phone-level expansion of words.mlf.
HLEd -A -D -T 1 -l '*' -d txt/mydict.txt -i mlf/phones0.mlf ins/mkphones0.led mlf/words.mlf
HLEd -A -D -T 1 -l '*' -d txt/mydict.txt -i mlf/phones1.mlf ins/mkphones1.led mlf/words.mlf
Input:
- mydict.txt: the dictionary obtained earlier.
- words.mlf: just created above.
- mkphones#.led: the script commands that convert words.mlf into phones#.mlf
Output:
- phones#.mlf
Note:
phones0.mlf does not contain the sp phone, while phones1.mlf does. The sp phone is added only to make the later recognition process more effective.
Create the file "listwavmfc.scp", which holds the paths of the wave files and of the mfc file corresponding to each wave file.
perl pl/listwavmfc.pl train/wav train/listwavmfc.scp

Input:
- train/wav: the directory containing the .wav files
Output:
- listwavmfc.scp: a text file listing the paths of the wave files.
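The content of such a script file can be sketched as pairs of source and target paths, one pair per line. The directory names train/wav and train/mfc follow the commands in this section; the function name and the decision to ignore files without a .wav extension are assumptions made for the example.

```python
import os


def wav_to_mfc_pairs(wav_names, wav_dir="train/wav", mfc_dir="train/mfc"):
    """Build 'source.wav target.mfc' lines for a feature extraction list."""
    lines = []
    for name in sorted(wav_names):
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".wav":
            continue                     # ignore anything that is not a wav file
        lines.append(f"{wav_dir}/{name} {mfc_dir}/{stem}.mfc")
    return lines


for line in wav_to_mfc_pairs(["b.wav", "a.wav", "notes.txt"]):
    print(line)
# train/wav/a.wav train/mfc/a.mfc
# train/wav/b.wav train/mfc/b.mfc
```

In practice the list of names would come from `os.listdir` on the recording directory.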
Create the .mfc file corresponding to each .wav file
At this step the features of the recorded audio files are extracted. HTK supports feature kinds such as MFCC and LPC; here we use MFCC features. The remaining settings are stored in the configuration file HCopy.cfg
#config_HCopy.txt
#coding parameters - HCopy
SOURCEKIND = WAVEFORM
SOURCEFORMAT = WAV
TARGETKIND = MFCC_0_D_A
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
Command that extracts the features:
HCopy -A -D -T 1 -C cfg/HCopy.cfg -S train/listwavmfc.scp
Input:
- HCopy.cfg: the file with the configuration parameters for feature extraction shown above.
- listwavmfc.scp: the list of wave files and their corresponding mfc files.
Output:
- the .mfc feature files named in listwavmfc.scp.
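Two numbers in this configuration are worth checking by hand: the vector size and the frame timing. The sketch below uses simplified rules (qualifier _0 or _E adds one static coefficient; _D and _A each add a copy of the static vector) and the fact that HTK expresses times in units of 100 ns. Both function names are hypothetical.

```python
def feature_dimension(num_ceps, target_kind):
    """Vector size implied by NUMCEPS and TARGETKIND (simplified rule)."""
    qualifiers = target_kind.split("_")[1:]
    static = num_ceps + sum(1 for q in qualifiers if q in ("0", "E"))
    copies = 1 + sum(1 for q in qualifiers if q in ("D", "A"))
    return static * copies


def htk_time_to_ms(units):
    """Convert an HTK time value (units of 100 ns) to milliseconds."""
    return units / 10000.0


print(feature_dimension(12, "MFCC_0_D_A"))  # 39
print(htk_time_to_ms(100000.0))             # 10.0
print(htk_time_to_ms(250000.0))             # 25.0
```

The result 39 agrees with the <VecSize> 39 declared in the proto file, and the timing values correspond to a 25 msec window moved in 10 msec steps.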



Create the file "train.scp", which lists the paths of the mfc files just extracted in the previous step.
perl pl/mkTrainFile.pl train/mfc train/train.scp
Input:
- mfc: the directory containing the .mfc files.
Output:
- train.scp: the file listing the .mfc files.
4.6.4. Training stage
1. Creating flat start monophones
At this step we define a prototype for the HMMs. The values assigned to the prototype do not matter; its purpose is to define the structure. A good topology suggested by the HTK Book is a 3-state left-to-right model.

HCompV -A -D -T 1 -C cfg/HCompV.cfg -f 0.01 -m -S train/train.scp -M hmm0 proto
Input:
- HCompV.cfg: the configuration file used by HCompV (see the file for its contents).
- -f 0.01: writes a vFloors file containing a floor vector equal to 0.01 times the variance vector.
- -S train.scp: the list of mfc feature files.
- -M hmm0: the directory where HCompV stores the resulting proto (it must be created beforehand).
- proto: the file holding the prototype structure presented above (remember to save it in the hmm0 directory).
After HCompV has run, the two files proto and vFloors are created in the hmm0 directory.
2. Generate macros automatically
perl pl\mkMacrosFile.pl hmm0\vFloors hmm0\macros
Input:
- hmm0/vFloors: the vFloors file produced by the HCompV command above.
Output:
- hmm0/macros: the macros file to be created.
3. Generate hmmdefs automatically
perl pl\mkHmmdefsFile.pl hmm0\proto ph\monophones0.txt hmm0\hmmdefs
Input:
- hmm0/proto: the proto file obtained in the previous step.
- ph/monophones0.txt: the monophones0 list obtained in an earlier step.
Output:
- hmm0/hmmdefs: the file in which the HMM definitions are saved.
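The contents of mkHmmdefsFile.pl are not shown here. A common way to build a flat-start hmmdefs, and the one assumed in this sketch, is to copy the prototype's <BEGINHMM> ... <ENDHMM> body once per phone and give each copy its own ~h name. The function name and the sample prototype text in the usage line are hypothetical.

```python
def build_hmmdefs(proto_text, phones):
    """Clone the prototype model body once for every phone in the list."""
    lines = proto_text.splitlines()
    # Skip the header (~o options and the ~h "proto" line); keep the model body.
    start = next(i for i, line in enumerate(lines)
                 if line.strip().upper().startswith("<BEGINHMM>"))
    body = lines[start:]
    out = []
    for phone in phones:
        out.append(f'~h "{phone}"')
        out.extend(body)
    return "\n".join(out) + "\n"
```

For example, build_hmmdefs(open("hmm0/proto").read(), ["a", "b", "sil"]) would yield three identically initialised models.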
4. Re-estimate the parameters in hmmdefs
HERest -A -D -T 1 -C cfg\HERest.cfg -I mlf\phones0.mlf -t 250.0 150.0 1000.0 -S
train\train.scp -H hmm0\macros -H hmm0\hmmdefs -M hmm1 ph\monophones0.txt
Input:
- -C cfg\HERest.cfg: configuration file.
- -I mlf\phones0.mlf: the MLF file created earlier.
- -t 250.0 150.0 1000.0: pruning thresholds.
- -S train\train.scp: list of the .mfc files.
- -H hmm0\macros: just created.
- -H hmm0\hmmdefs: just created.
- ph\monophones0.txt: list of phones (excluding sp).
Output:
- -M hmm1: directory that receives the new hmmdefs and macros files.
Once hmm1 is available, we continue training hmm2 and hmm3 with HERest. Note that each training pass starts from the HMMs produced by the previous pass.
HERest -A -D -T 1 -C cfg\HERest.cfg -I mlf\phones0.mlf -t 250.0 150.0 1000.0 -S
train\train.scp -H hmm1\macros -H hmm1\hmmdefs -M hmm2 ph\monophones0.txt

HERest -A -D -T 1 -C cfg\HERest.cfg -I mlf\phones0.mlf -t 250.0 150.0 1000.0 -S
train\train.scp -H hmm2\macros -H hmm2\hmmdefs -M hmm3 ph\monophones0.txt
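Since every pass only differs in its source and target directories, the three commands can be generated rather than typed. The following Python sketch builds the argument lists using only the flags that appear in the commands above; the function name and its defaults are assumptions, and the target directories are expected to exist already.

```python
def herest_command(src, dst, mlf="mlf/phones0.mlf",
                   hmmlist="ph/monophones0.txt",
                   pruning=("250.0", "150.0", "1000.0")):
    """Argument list for one HERest pass reading from src and writing to dst."""
    return ["HERest", "-A", "-D", "-T", "1", "-C", "cfg/HERest.cfg",
            "-I", mlf, "-t", *pruning, "-S", "train/train.scp",
            "-H", f"{src}/macros", "-H", f"{src}/hmmdefs",
            "-M", dst, hmmlist]

# hmm0 -> hmm1 -> hmm2 -> hmm3: each pass starts from the previous result.
commands = [herest_command(f"hmm{i}", f"hmm{i + 1}") for i in range(3)]
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be handed to subprocess.run(cmd, check=True) so that a failing pass stops the chain.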
5. Build the sp model, then adjust it with the file "sil.hed"
- Copy the files in the hmm3 directory into the hmm4 directory.
- Open the hmmdefs file in hmm4 and edit it as follows:
- Copy the sil model, paste it at the end of the file and rename the copy to sp.
- In the sp model, delete states 2 and 4.
- Change <NUMSTATES> to 3.
- Change <STATE> to 2.
- Change <TRANSP> to 3.
- Change the matrix under <TRANSP> to:
0.0 1.0 0.0
0.0 0.9 0.1
0.0 0.0 0.0
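Hand-editing hmmdefs is error-prone, so it can help to check the new matrix before running HHEd. In an HTK transition matrix every row except the last must sum to 1, and the last row (the exit state) is all zeros. The sketch below checks exactly that; the function name is illustrative.

```python
SP_TRANSP = [
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 0.0],
]

def is_valid_transp(matrix, tol=1e-6):
    """True if the matrix is square, rows sum to 1, and the exit row is zero."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        return False
    rows_ok = all(abs(sum(row) - 1.0) <= tol for row in matrix[:-1])
    exit_ok = all(value == 0.0 for value in matrix[-1])
    return rows_ok and exit_ok

print(is_valid_transp(SP_TRANSP))
```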
Generate the new, modified HMM set with the command:
HHEd -A -D -T 1 -H hmm4/macros -H hmm4/hmmdefs -M hmm5 ins/sil.hed
ph/monophones1.txt
Input:
- hmm4/macros: macros file produced by the previous steps.
- hmm4/hmmdefs: hmmdefs file after the sp model has been added.
- ins/sil.hed: file containing the edit commands for the new HMM set.
- ph/monophones1.txt: list of phones (including sp).
Output:
- hmm5: directory containing the new hmmdefs and macros.

6. Train the models adjusted for sp in the previous step
HERest -A -D -T 1 -C cfg/HERest.cfg -I mlf/phones1.mlf -t 250.0 150.0 3000.0 -S
train/train.scp -H hmm5/macros -H hmm5/hmmdefs -M hmm6 ph/monophones1.txt

HERest -A -D -T 1 -C cfg/HERest.cfg -I mlf/phones1.mlf -t 250.0 150.0 3000.0 -S
train/train.scp -H hmm6/macros -H hmm6/hmmdefs -M hmm7 ph/monophones1.txt
7. Realign the training data
The pronunciation dictionary contains some words with several pronunciations. In the earlier step, HLEd picked one of them arbitrarily. In this step we realign the transcription file words.mlf, so that the pronunciation that best matches the acoustic data is chosen.
Note: if the dictionary is small, this step can be skipped by creating a file mlf/aligned.mlf with the contents of words.mlf and using it in the following steps, but recognition performance will then be lower.
HVite -A -D -T 1 -l * -o SWT -b sent-end -C cfg/HERest.cfg -H hmm7/macros -H
hmm7/hmmdefs -i mlf/aligned.mlf -m -t 250.0 150.0 1000.0 -y lab -a -I mlf/words.mlf -S
train/train.scp txt/mydict.txt ph/monophones1.txt > Hvite_log
8. Train two more passes
HERest -A -D -T 1 -C cfg/HERest.cfg -I mlf/aligned.mlf -t 250.0 150.0 3000.0 -S
train/train.scp -H hmm7/macros -H hmm7/hmmdefs -M hmm8 ph/monophones1.txt

HERest -A -D -T 1 -C cfg/HERest.cfg -I mlf/aligned.mlf -t 250.0 150.0 3000.0 -S
train/train.scp -H hmm8/macros -H hmm8/hmmdefs -M hmm9 ph/monophones1.txt
9. Create the triphone list and wintri.mlf from aligned.mlf
HLEd -A -D -T 1 -n ph/triphones1 -l * -i mlf/wintri.mlf ins/mktri.led mlf/aligned.mlf
Input:
- ins/mktri.led: contains the commands that create triphones from monophones.
- mlf/aligned.mlf: the realigned monophone transcription.
Output:
- -n ph/triphones1: list of the triphones.
- -i mlf/wintri.mlf: the triphone transcription.
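The contents of mktri.led are not listed here. Assuming it follows the usual HTK recipe (sp and sil declared as word boundaries, followed by a triphone-conversion command), each phone is renamed left-phone+right according to its neighbours inside the same word. The Python sketch below imitates that renaming; the phone labels in the example are made up.

```python
BOUNDARIES = {"sp", "sil"}

def to_word_internal_triphones(phones):
    """Rename phones as l-p+r, never looking across sp/sil word boundaries."""
    out = []
    for i, phone in enumerate(phones):
        if phone in BOUNDARIES:
            out.append(phone)  # boundary symbols stay context-independent
            continue
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i + 1 < len(phones) else None
        name = phone
        if left is not None and left not in BOUNDARIES:
            name = f"{left}-{name}"
        if right is not None and right not in BOUNDARIES:
            name = f"{name}+{right}"
        out.append(name)
    return out

print(to_word_internal_triphones(["sil", "m", "o", "sp", "t", "a", "p", "sil"]))
# ['sil', 'm+o', 'm-o', 'sp', 't+a', 't-a+p', 'a-p', 'sil']
```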
10. Create "mktri.hed"
perl pl/mkTriHed.pl ph/monophones1.txt ph/triphones1 ins/mktri.hed
11. Create the new HMM set for the triphones in triphones1
HHEd -A -D -T 1 -H hmm9/macros -H hmm9/hmmdefs -M hmm10 ins/mktri.hed
ph/monophones1.txt
Input:
- -H hmm9/macros -H hmm9/hmmdefs: the monophone HMMs.
- ins/mktri.hed: file containing the commands that tie the transition matrices of each triphone listed in triphones1.
- -B (optional, not used in the command above): store hmmdefs in binary rather than text form to save space.
Output:
- -M hmm10: directory that receives the HMM set converted to triphones.
12. Train two passes
HERest -A -D -T 1 -C cfg/HERest.cfg -I mlf/wintri.mlf -t 250.0 150.0 3000.0 -S
train/train.scp -H hmm10/macros -H hmm10/hmmdefs -M hmm11 ph/triphones1

HERest -A -D -T 1 -C cfg/HERest.cfg -I mlf/wintri.mlf -t 250.0 150.0 3000.0 -s stats -S
train/train.scp -H hmm11/macros -H hmm11/hmmdefs -M hmm12 ph/triphones1
13. Create the full list and the full HMM set
HDMan -A -D -T 1 -b sp -n fulllist -g global.ded -l flog dict-tri dict.txt
Next, create a new file fulllist1 and copy the entire contents of fulllist and triphones1 into it, then run the following command to remove all duplicate lines from fulllist1:
$perl fixfulllist.pl fulllist1 fulllist
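fixfulllist.pl itself is not listed in this document. The merge-and-deduplicate step it performs can be sketched in a few lines of Python; the function name is an assumption, and first occurrences are kept in their original order.

```python
def merge_unique_lines(*line_groups):
    """Concatenate several lists of model names, dropping blanks and repeats."""
    seen = set()
    merged = []
    for group in line_groups:
        for line in group:
            name = line.strip()
            if name and name not in seen:
                seen.add(name)
                merged.append(name)
    return merged
```

Reading fulllist and triphones1 with open(...).read().splitlines() and passing both lists to this function would give the contents of the cleaned fulllist.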
Then create the file tree.hed:
perl pl/mkTree.pl TB 350 ph/monophones0.txt ins/tree.hed
14. Create the new HMM set
HHEd -A -D -T 1 -H hmm12/macros -H hmm12/hmmdefs -M hmm13 ins/tree.hed
ph/triphones1
Input:
- -H hmm12/macros -H hmm12/hmmdefs: the HMMs created in the previous step.
- ins/tree.hed: the set of directives used to find suitable contexts for clustering.
- ph/triphones1: list of the triphones.
Output:
- -M hmm13: directory containing the new HMM set.
15. Train two passes
HERest -A -D -T 1 -C cfg/HERest.cfg -I mlf/wintri.mlf -s stats -t 250.0 150.0 3000.0 -S
train/train.scp -H hmm13/macros -H hmm13/hmmdefs -M hmm14 tiedlist

HERest -A -D -T 1 -C cfg/HERest.cfg -I mlf/wintri.mlf -s stats -t 250.0 150.0 3000.0 -S
train/train.scp -H hmm14/macros -H hmm14/hmmdefs -M hmm15 tiedlist

4.6.5. Testing the trained model
1. Create the file "listwavmfc.scp", which lists the path of each test wave file and of its corresponding mfc file
perl pl/listwavmfc.pl test/wav listwavmfc_test.scp
2. Create the .mfc file corresponding to each .wav file
HCopy -T 1 -C cfg/HCopy.cfg -S listwavmfc_test.scp
3. Create the file "test.scp", which lists the paths of the .mfc files
perl pl/mkTrainFile.pl mfc test/test.scp
4. Testing
HVite -T 1 -C cfg/HVite.cfg -H hmm15/macros -H hmm15/hmmdefs -S test/test.scp -i
test/recout.mlf -w wdnet txt/mydict.txt tiedlist
Explanation
-C cfg/HVite.cfg: input, the configuration file.
-H hmm15/macros -H hmm15/hmmdefs: input, the trained models.
-S test/test.scp: input, file listing the .mfc files to be recognized.
-i test/recout.mlf: output, the recognized transcription.
-w wdnet: input, the word network created in the early steps.
txt/mydict.txt: input, the pronunciation dictionary.
tiedlist: input, the list of phones produced by the CO tiedlist command in tree.hed.


Note
Because the triphones are built word-internally, as described in the previous section, the configuration file HVite.cfg needs two additional parameters, FORCECXTEXP = T and ALLOWXWRDEXP = F. The reasons are explained in chapter 12 of the HTK Book.
HVite has a few other parameters, such as -p and -s, which users can tune as needed.
4.6.6. Results
- With 500 wave files of training data and a test on 100 wave files, the program achieved the following recognition performance:
------------------------ Overall Results --------------------------
SENT: %Correct=22.80 [H=114, S=386, N=500]
WORD: %Corr=99.78, Acc=87.55 [H=3991, D=0, S=9, I=489, N=4000]
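The two word-level figures follow from the counts in brackets: HTK reports %Corr = H / N and Acc = (H - I) / N, where N = H + D + S is the number of reference words. The short Python sketch below recomputes them from the counts above (the function name is illustrative); the results agree with the reported 99.78 and 87.55 up to rounding, and the large number of insertions explains the gap between the two.

```python
def htk_word_scores(hits, deletions, substitutions, insertions):
    """Return (N, %Corr, Acc) from the counts printed by HResults."""
    n = hits + deletions + substitutions
    percent_correct = 100.0 * hits / n
    accuracy = 100.0 * (hits - insertions) / n
    return n, percent_correct, accuracy

n, corr, acc = htk_word_scores(hits=3991, deletions=0, substitutions=9, insertions=489)
sent_correct = 100.0 * 114 / 500  # SENT line: H / N
print(n, corr, acc, sent_correct)
```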
4.7. Deploying the demo applications
4.7.1. Google Chrome control application
a. Introduction
This demo application uses voice to control the Google Chrome web browser: the user operates the browser by speaking. The demo supports about 20 control commands, composed from 47 single words.
Table 5. The 47 words used to control Google Chrome
bn duyt m sang to xa
chuyn hy mi sau ti xung
ca kha nghe s tra

cui kim ngng s tri

cun kim nhc ti trang

u li nh tp trnh

i ln phi thu tr

lch phng th trc

ng lu qup tm t

These single words can be combined into meaningful control commands, e.g. "mở tập mới" (open a new tab), "mở cửa sổ" (open a window), "mở trang quép" (open a web page). The full list of control commands is given in the appendix.
b. The program
The program is written in C# and combines the Julius.dll library with the acoustic model trained with HTK. It consists of two main modules: the recognition module and the browser control module.

Figure 4.2 Operating model of the demo program
The recognition module uses the functions provided by the Julius.dll library to perform recognition, using the acoustic model trained in the section above (taken from the HTK training stage) together with the language model. The recognition result is converted to text and finally passed to the control module.

Figure 4.3 Diagram of the recognition module
The control module is called with the recognized speech result as its parameter and carries out the control action based on that content. It is built as a dynamic link library (dll) so that the control functions can be upgraded easily later. At present the demo can only control a few basic functions of the operating system, the Google Chrome browser and the WMPlayer music player.
The module contains one main class, CommandReceiver.cs, which receives commands as text and executes the corresponding control action. To map spoken commands to actual control actions, the module is linked to a command list file named voiceCommand.txt. The commands in this file are loaded when the program starts. When the text of the user's spoken command is received, the module looks it up in the command database just loaded and determines the action to carry out. The contents of the command list are given in the appendix.
The module also contains three main classes that handle control for the three main targets of this demo: Windows, Chrome and the WMPlayer music player. To control Chrome and the other programs, the program sends keyboard events to the Windows operating system; these keyboard events are the shortcut key combinations used by Chrome (e.g. Ctrl + T: open a new tab, Ctrl + W: close the tab, ...).
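The format of voiceCommand.txt is given only in the appendix, so the following Python sketch assumes a simple "spoken phrase=key combination" layout to illustrate the lookup described above. The file format, the function names and the second sample entry are all hypothetical; only the Ctrl + T and Ctrl + W shortcuts come from the text.

```python
def load_commands(text):
    """Parse 'spoken phrase=key combination' lines (hypothetical format)."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        phrase, _, keys = line.partition("=")
        table[phrase.strip()] = tuple(k.strip() for k in keys.split("+"))
    return table

SAMPLE = """
# phrase=shortcut (illustrative entries only)
mở tập mới=Ctrl+T
đóng tập=Ctrl+W
"""

COMMANDS = load_commands(SAMPLE)  # loaded once, at program start

def lookup(recognized_text):
    """Return the key combination for a recognized phrase, or None."""
    return COMMANDS.get(recognized_text.strip().lower())

print(lookup("mở tập mới"))  # ('Ctrl', 'T')
```

The returned tuple would then be turned into keyboard events by whatever mechanism the control classes use.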

Figure 4.4 Structure of the control module

Figure 4.5 User interface of the computer control program
c. Remarks
The accuracy of the program is determined largely by the output of the recognition module. Noise during recognition can make the recognized command differ slightly from the command actually spoken. For example, the user says "mở tập mới" (open a new tab), but because of background noise the recognizer may return "hủy mở tập mới". The simplest remedy is to compare the similarity between the recognized command and each command template instead of requiring an exact word-for-word match. Overall, recognition is fairly accurate (about 90%) in environments that are not too noisy.
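The text does not say which similarity measure is used. As one possible illustration, the Python sketch below compares word sequences with the standard library's difflib.SequenceMatcher and accepts the best-scoring template above a threshold; the threshold of 0.6 and the function name are assumptions.

```python
from difflib import SequenceMatcher

def best_command(recognized, commands, threshold=0.6):
    """Return the template most similar to the recognized text, or None."""
    rec_words = recognized.split()
    best, best_score = None, 0.0
    for command in commands:
        # Compare word by word, so one spurious word costs little.
        score = SequenceMatcher(None, rec_words, command.split()).ratio()
        if score > best_score:
            best, best_score = command, score
    return best if best_score >= threshold else None

templates = ["mở tập mới", "mở cửa sổ"]
print(best_command("hủy mở tập mới", templates))  # mở tập mới
```

Here the noisy result shares three of its four words with "mở tập mới" (ratio 6/7) but only one with "mở cửa sổ" (ratio 2/7), so the intended command is recovered.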
4.7.2. Tank model control application
a. Introduction
This demo application uses voice to control a remote-controlled model tank. The user speaks movement and other control commands into a microphone, and through the demo program the model tank performs the action corresponding to the spoken command. The program supports more than 30 forms of control command, composed from 25 single words.
Table 6. The 25 words used in the tank model control demo
ba dng mi quay ti
bn li ngng sang tri
chy ln nng su trm
i lui phi sng va
lui qua tin xoay
b. The program
The program consists of two modules: the spoken-command recognition module (written in Java) and the tank model control module (written in C#). The two modules, written in different languages, communicate with each other over a socket.
The speech recognition module works in the same way as in the previous demo, with the addition of a socket connection to the control module. Once a spoken command has been recognized, the program sends the result to the control module over the socket.
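The message format exchanged between the two modules is not specified in the text. The sketch below shows the general pattern in Python, with both ends in one process for brevity: a controller that accepts a connection and reads one UTF-8 line, and a recognizer side that sends a recognized command. The loopback address, the line-based format and the sample command "forward" are assumptions; in the demo the two ends are the Java and C# modules.

```python
import socket
import threading

def run_controller(server, received):
    """Controller side: accept one connection and record the command sent."""
    conn, _ = server.accept()
    with conn:
        received.append(conn.recv(1024).decode("utf-8").strip())

def send_command(port, text):
    """Recognizer side: send one recognized command as a UTF-8 line."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall((text + "\n").encode("utf-8"))

server = socket.create_server(("127.0.0.1", 0))  # port 0: the OS picks a free port
port = server.getsockname()[1]
received = []
worker = threading.Thread(target=run_controller, args=(server, received))
worker.start()
send_command(port, "forward")
worker.join(timeout=5)
server.close()
print(received)
```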
The control module is written in C# and acts like a driver for a USB device: through it, the computer sends commands directly to the tank's remote control, and the remote control in turn operates the model tank from a distance.

Figure 4.6 The model tank used in the program
The demo device is a remote-controlled model tank with its remote control. The remote was modified by fitting an additional control circuit inside it; this circuit receives input signals from the USB port and uses them to drive the remote. Once modified, the remote has a USB input (see Figure 4.7), and a USB cable connects it to the computer.

Figure 4.7 The tank's remote control with a USB port for connection to the computer


Figure 4.8 Inside of the remote with the added USB control circuit

Figure 4.9 User interface of the voice-controlled tank program
c. Remarks
The control results depend mainly on the speech recognition module. Word recognition under normal conditions (not too noisy) reaches 89%. This is somewhat lower than in the previous demo because of interference from the motor noise of the model tank.
4.8. Comparison with HTK
4.8.1. Introduction
As introduced in chapter 3, HTK and Sphinx are two of the most widely used open-source speech recognition frameworks in the world today. Many articles, reports and theses in Vietnam have presented HTK and demonstrated its capability for Vietnamese speech recognition. One of the laboratories in Vietnam that uses HTK heavily is AILAB at the University of Science, led by Dr. Vũ Hải Quân. Sphinx, by contrast, was developed later and is still fairly new in Vietnam. This section compares how well the two frameworks can be applied to Vietnamese. Its purpose is to show the basic differences between the two automatic speech recognition tools as well as their relative effectiveness.
Comparing Sphinx and HTK is a new experiment in Vietnam. We are well aware that comparing two large frameworks the way we do here is not scientifically rigorous, but it does address several basic questions we set out to answer:
- Compare the difficulty of installing the two systems.
- Compare their ability to recognize Vietnamese at a basic level (without changing many of the fine-tuning parameters).
- Compare how readily the models can be applied in a real program by users who are not specialists in speech recognition research.
4.8.2. Procedure
a. Preparation
Before training and testing, we installed HTK and Sphinx following the instructions in [16] and [12].
We built identical training and test data sets for the two frameworks.
The word list of the pronunciation dictionary is given in the appendix; both frameworks share the same pronunciation dictionary.
The grammar used with HTK is a word network generated from the list of commands. The grammar used with Sphinx is a trigram model, also generated from the same list of commands. The command list is given in the appendix.
The decoder used with HTK is HVite [16], and with Sphinx it is the Sphinx3 decoder.
The training parameters of both frameworks were left at their defaults.
The reason for keeping the defaults is important: the basic parameters preset by the developers of the two frameworks are compatible with most training data across different languages, and fine-tuning them only serves research purposes and makes the acoustic model better suited to a particular language. Keeping these parameters unchanged makes the comparison objective. New users thus get some guidance about which framework, with its developers' default parameters, is better suited to their language, which in our case is Vietnamese.
The audio was recorded in the standard format of both frameworks: sampling rate (sample rate):
b. Training data
The training data is a set of audio recordings from 2 members of the group. The total recording time is approximately 15 hours, with 5300 training sentences consisting of commands for controlling the computer and the model tank. The training vocabulary comprises 120 single words built from 72 phonemes.
c. Test data
The test data is a set of audio recordings from the same 2 members of the group. A total of 1000 sentences are used for testing, with about 2 hours of recorded audio.
d. Results
Table 7. Comparison of HTK and Sphinx
          Sentence correct (%)   Word correct (%)   Word accuracy (%)
HTK       41.60                  99.97              94.38
Sphinx    68                     98.2               96.7
Table 8. Breakdown of errors
          Insertions   Deletions   Substitutions
HTK       833          28          4
Sphinx    206          43          227
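The sentence correct rate is computed as follows: a sentence counts as correct only when every word in it is correct; if at least one word is wrong, the sentence counts as wrong. The sentence correct rate is the number of correct sentences divided by the total number of test sentences, here 1000.
The word correct rate is the ratio of the number of words the system recognizes correctly to the total number of words to be recognized.
Word accuracy additionally takes every kind of word error into account: insertions (added words), deletions (missing words) and substitutions (replaced words). With N reference words, it is (N - D - S - I) / N, whereas the word correct rate, (N - D - S) / N, ignores insertions. Unlike the word correct rate, this measure therefore reflects how accurately the system recognizes words overall. The higher it is, the more accurately the system recognizes single words.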
4.8.3. Evaluation of the results
From the experiments and the results obtained, we make the following general observations about the two frameworks:
- The word correct rate of both frameworks is very high (>98%), with HTK performing better.
- Compared with Sphinx, however, HTK makes far too many insertion errors, which lowers the accuracy of the recognizer considerably, including its sentence-level accuracy.
- Decoding the test set takes much less time with Sphinx than with HTK.
- Training with HTK frequently runs into difficulties because the procedure involves more, and more complicated, steps than with Sphinx.
- The documentation for HTK is plentiful but quite detailed and heavily technical, so beginners will find it hard going. Sphinx, in contrast, provides a home page [11] with fairly complete information and a community forum that is regularly updated and answers questions. For anyone who wants to develop a speech recognizer for a language quickly, Sphinx is therefore the preferred choice.
- Regarding licensing, Sphinx is provided completely free of charge, and users may use the libraries and source code for many purposes, from research to building commercial applications. HTK is also an open-source framework, but it imposes strict terms on users and requires them to register before they can download and use it.

CHAPTER 5. CONCLUSIONS AND FUTURE WORK
5.1. Results achieved
After studying and researching methods for Vietnamese speech recognition, this thesis has met the following objectives:
a. Studied the concepts related to speech recognition systems in order to understand and apply several key factors in using the supporting tools.
b. Studied how to install the Sphinx and HTK toolkits for building speech recognition systems, including a detailed account of each installation and execution step and of the steps for building the trained models.
c. Built a training set with about 15 hours of recorded data.
d. Experimentally compared how well Sphinx and HTK can be applied to Vietnamese, including experiments with different decoders such as HVite, Sphinx4 and Julius, from which conclusions and observations were drawn.
e. Built a program demonstrating Vietnamese speech recognition on a computer using an acoustic model trained with Sphinx. The program recognizes more than 60 single words, combined into about 100 control commands. In addition, we carried out a demo on a physical model, using a control circuit to drive it. Both demos achieved a very high accuracy rate.
5.2. Remaining limitations
Because this is a relatively difficult topic, because we had not been trained in digital signal processing, speech processing or the underlying mathematical models, and because material on acoustics and phonetics is limited, the thesis inevitably has many shortcomings.
The limitations of the thesis are:
a. The vocabulary is still very small compared with the full set of Vietnamese single words (more than 7000). Building a larger vocabulary requires a great deal of effort, including data collection, building an accurate pronunciation model, recording, grammar analysis, and so on.
b. The acoustic model is still limited. The system recognizes speech with high accuracy only for the 2 members of the group; for a speaker it has not been trained on, the model can still recognize speech, but with low accuracy.
c. The two demos of the thesis were built only to show how a Vietnamese automatic speech recognition model built with these frameworks can be applied. Their practical usefulness is still limited. However, building a genuinely practical application is not too difficult; what matters is to define the practical goal of a specific application and then build a suitable acoustic model for it.
5.3. Directions for research and development
So far we have successfully tested a Vietnamese dictionary model, compiled from several sources (the model is not yet fully accurate, but it is usable at an acceptable level), with tools for building automatic speech recognition systems, namely HTK and Sphinx. We ran experiments with different decoders such as HVite, Julius and Sphinx4 and found the results very promising. Starting from these results, many new research directions are possible, specifically:
Studying and building a pronunciation model for Vietnamese, which would be of great value for both speech synthesis and speech recognition. If developed in a scientific and sound way, such a model would greatly raise the recognition accuracy of acoustic models built with these tools. This work requires research by acousticians, phoneticians and scholars of the Vietnamese language.
Studying the internals of these speech recognition frameworks in more depth, in order to understand how they operate more clearly and so build the acoustic model best suited to Vietnamese.
Expanding the vocabulary of the dictionary and recording on a larger scale, with a greater variety of voices, in order to build a speaker-independent speech recognition system.
Building more concrete applications that use the trained models: applications in which people interact with devices by voice, making devices smarter or assisting people with disabilities.

REFERENCES

[1] B.H. Juang, Lawrence R. Rabiner, "Automatic Speech Recognition: A Brief History of the Technology".
[2] S. Furui, "50 years of progress in speech and speaker recognition".
[3] [Online]. Available: http://www.cslu.ogi.edu/toolkit/. [Accessed 7 2012].
[4] L. C. Mai, "Phát triển các kết quả tổng hợp, nhận dạng câu lệnh, chuỗi số tiếng Việt liên tục trên môi trường điện thoại di động," 2006.
[5] Đặng Ngọc Đức, Lương Chi Mai, "Tăng cường độ chính xác của hệ thống mạng neuron nhận dạng tiếng Việt," 2003.
[6] B. H. Khang, "Báo cáo tổng kết Khoa học và Kỹ thuật Đề tài Nghiên cứu phát triển công nghệ nhận dạng, tổng hợp và xử lý ngôn ngữ tiếng Việt," 2004.
[7] "Vietnamese alphabet," Wikipedia, [Online]. Available: http://en.wikipedia.org/wiki/Vietnamese_alphabet. [Accessed 7 2012].
[8] "IPA for Vietnamese," Wikipedia, [Online]. Available: http://en.wikipedia.org/wiki/Wikipedia:IPA_for_Vietnamese. [Accessed 7 2012].
[9] "Digital audio," [Online]. Available: http://en.wikipedia.org/wiki/Digital_audio. [Accessed 7 2012].
[10] Red Hat, [Online]. Available: http://www.cygwin.com/. [Accessed 7 2012].
[11] Carnegie Mellon University, [Online]. Available: http://cmusphinx.sourceforge.net/. [Accessed 7 2012].
[12] "Training Acoustic Model For CMUSphinx," Carnegie Mellon University, [Online]. Available: http://cmusphinx.sourceforge.net/wiki/tutorialam. [Accessed 7 2012].
[13] [Online]. Available: http://audacity.sourceforge.net/. [Accessed 7 2012].
[14] "Recording the Test Data," [Online]. Available: http://www.voxforge.org/home/dev/acousticmodels/windows/test/htk--julius/data-prep/step-3. [Accessed 7 2012].
[15] "Sphinx-4 Application Programmer's Guide," Carnegie Mellon University, [Online]. Available: http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4. [Accessed 7 2012].
[16] Steve Young, Gunnar Evermann, Mark Gales, Thomas Hain, Dan Kershaw, Xunying (Andrew) Liu, Gareth Moore, Julian Odell, Dave Ollason, Dan Povey, Valtcho Valtchev, Phil Woodland, HTK Book, Cambridge University Engineering Department, 2009.
[17] L. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, 1989.
