
Introduction:

Speech is the most basic means of human communication, and speaking is the simplest and most effective way to express oneself. People have long dreamed of automatic control systems that can communicate in natural human speech. Today, with advances in science and technology, especially in computing, automated systems are gradually replacing people in many jobs. Communicating with machines by voice is therefore an important need, and it is the most natural and civilized way to do so.

Speech recognition is not a new problem. Around the world there have been, and continue to be, many research projects on it, using many different recognition methods, and this research has achieved notable successes. Examples include IBM's ViaVoice English speech recognition system, the Spoken Toolkit from CSLU (Center for Spoken Language Understanding), Microsoft's Speech Recognition Engine, the Hidden Markov Model Toolkit from Cambridge University, and CMU Sphinx from Carnegie Mellon University. Recognition systems for French, German, Chinese and other languages are also fairly well developed. For Vietnamese there is work from groups such as AILab, Vietvoice and Vspeech. In Vietnam, however, speech recognition is still a fairly new field. Although there have been many studies on Vietnamese speech recognition, with some achievements, in general they have not yet reached the results needed to build highly practical products.

With the aim of understanding how people and computers can communicate, this thesis studies speech recognition methods and, from there, builds a demo program for Vietnamese speech recognition that lets the computer understand what a person says.

SPHINX:

Sphinx is open-source software that many experts and users rely on as a speech recognition tool. Its Java implementation, Sphinx 4, is among the most complete speech recognition systems written in the Java language. Sphinx was researched and developed by the Sphinx group at Carnegie Mellon University (CMU), which first released it as open source in 2000. Later, with help from Sun Microsystems Laboratories, MERL (Mitsubishi Electric Research Labs) and HP (Hewlett Packard), and with contributions from UCSC (University of California at Santa Cruz) and MIT (Massachusetts Institute of Technology), the Sphinx project continued its research and added new versions (Sphinx 2, Sphinx 3, ...), the most recent being Sphinx 4.

Main features:
- Live and batch speech recognition, for both isolated and continuous speech.
- A general, pluggable front-end architecture. It includes pluggable implementations of preemphasis, the Hamming window, the fast Fourier transform, the Mel-frequency filter bank, the discrete cosine transform, cepstral normalization, and extraction of cepstra, delta cepstra and double delta cepstra features.
- A general, pluggable language model architecture. It includes support for ASCII and binary versions of unigram, bigram and trigram models, Java Speech API Grammar Format (JSGF), and ARPA-format FST grammars.
- A general acoustic model architecture. It includes support for Sphinx3 acoustic models.
- A general search manager. It includes support for breadth-first and word-pruning searches.
- Utilities for post-processing recognition results, including computing confidence scores, generating lattices, and embedding ECMA script in JSGF tags.
- Standalone tools, including tools for displaying waveforms and spectrograms and for extracting features from audio files.

Sphinx has become a powerful speech recognition framework used in many recognition systems, including telephony programs such as Cairo, Freeswitch and jvoicexml, and control programs such as Gnome-Voice-Control, Voicekey and SpeechLion.

SPHINX ARCHITECTURE:

The Sphinx framework is designed with a high degree of flexibility and modularity.

The architecture figure gives an overview of the system. Each labelled component represents a module that can easily be replaced, so researchers can experiment with a different module without changing the rest of the system. The Sphinx framework has three main modules: the front end (FrontEnd), the decoder (Decoder) and the linguist (Linguist).
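The idea of replaceable modules can be illustrated with a minimal Python sketch. It is not Sphinx4 code: the function names (`recognize`, `energy_front_end`, `tiny_linguist`, `threshold_decoder`) and the toy "energy" feature are assumptions made up for illustration. The only point is that each of the three roles sits behind a simple interface and can be swapped without touching the others.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the three module roles (not Sphinx4 classes).
FrontEnd = Callable[[List[float]], List[List[float]]]      # samples -> feature frames
Linguist = Callable[[], Dict[str, float]]                   # () -> toy "search graph"
Decoder = Callable[[List[List[float]], Dict[str, float]], str]


def recognize(samples: List[float], front_end: FrontEnd,
              linguist: Linguist, decoder: Decoder) -> str:
    """Wire the three modules together; any one can be replaced independently."""
    features = front_end(samples)
    graph = linguist()
    return decoder(features, graph)


def energy_front_end(samples: List[float]) -> List[List[float]]:
    # Toy feature: one frame per 4 samples, holding the sum of squares divided by 4.
    return [[sum(s * s for s in samples[i:i + 4]) / 4]
            for i in range(0, len(samples), 4)]


def tiny_linguist() -> Dict[str, float]:
    # Toy "search graph": each word is keyed by a minimum energy threshold.
    return {"loud": 0.5, "quiet": 0.0}


def threshold_decoder(features: List[List[float]], graph: Dict[str, float]) -> str:
    mean = sum(frame[0] for frame in features) / len(features)
    # Choose the word with the highest threshold that the mean energy reaches.
    candidates = [(threshold, word) for word, threshold in graph.items()
                  if mean >= threshold]
    return max(candidates)[1]


print(recognize([1.0] * 8, energy_front_end, tiny_linguist, threshold_decoder))  # loud
```

Replacing `energy_front_end` with any other function of the same shape changes the features without any change to the linguist or decoder, which is the property the architecture aims for.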

The front end takes one or more digital signals and parameterizes them into a sequence of features (Feature). The linguist translates any standard language model, together with pronunciation information from the dictionary (Dictionary) and structural information from one or more sets of acoustic models (AcousticModel), into a search graph (SearchGraph). The search manager (SearchManager) in the decoder uses the features from the front end and the search graph from the linguist to perform the decoding and produce results (Result). At any time before or during recognition, the application (Application) can issue controls to each module, which makes it an effective partner in the recognition process.

Sphinx4 has a large number of configuration parameters for tuning system performance. The configuration manager (ConfigurationManager) is used to set these parameters. It also lets Sphinx4 load and configure modules dynamically at run time, which makes Sphinx4 flexible and pluggable.

Front end - FrontEnd:

Figure: Feature extraction in the front end using MFCC

The purpose of the front end is to parameterize an input signal (audio) into a sequence of output features. The front end consists of one or more parallel chains of replaceable, communicating signal-processing modules called DataProcessors. Supporting multiple chains allows different types of parameters to be computed simultaneously from the same or from different input signals. This makes it possible to build systems that decode using several parameter types at the same time.
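A chain of processing stages can be sketched in a few lines of Python. This is an illustration only, not the Sphinx4 DataProcessor API: `chain`, `preemphasis` and `hamming_window` are hypothetical names, and the preemphasis coefficient 0.97 is a commonly used value assumed here rather than taken from the text. The two stages shown are the first two steps named in the feature list (preemphasis and the Hamming window).

```python
import math
from typing import Callable, List

Processor = Callable[[List[float]], List[float]]


def preemphasis(samples: List[float], alpha: float = 0.97) -> List[float]:
    """y[n] = x[n] - alpha * x[n-1]; the first sample is passed through unchanged."""
    if not samples:
        return []
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]


def hamming_window(frame: List[float]) -> List[float]:
    """Scale each sample by the Hamming weight 0.54 - 0.46 * cos(2*pi*n / (N-1))."""
    size = len(frame)
    if size < 2:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * n / (size - 1)))
            for n, x in enumerate(frame)]


def chain(*processors: Processor) -> Processor:
    """Connect processors so that each one's output is the next one's input."""
    def run(data: List[float]) -> List[float]:
        for processor in processors:
            data = processor(data)
        return data
    return run


# Assumed example: a two-stage chain applied to a single short frame.
front_end = chain(preemphasis, hamming_window)
features = front_end([1.0, 1.0, 1.0, 1.0])
```

Because every stage has the same input and output shape, stages can be reordered, removed, or replaced, and several such chains could be run side by side on the same signal.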

Figure 4.3. Chain of DataProcessors

Each DataProcessor in the front end provides one input and one output that can be connected to another DataProcessor, which allows arbitrarily long, specialized chains to be built. Sphinx4 can generate parallel sequences of features and allows any number of parallel streams. Using the ConfigurationManager, users can chain DataProcessors together in any way they like, and can also plug in DataProcessor implementations of their own design.

Linguist:

The linguist generates the search graph (SearchGraph) that the decoder uses during the search, while hiding the complexity involved in generating it. In Sphinx4 the linguist is a pluggable module, so users can dynamically configure the system with different linguist implementations. A typical linguist implementation builds the search graph from the language structure described by a given LanguageModel and from the topological structure of the acoustic model (the HMMs for the basic sound units used by the system). The linguist may also use a dictionary (usually a pronunciation dictionary) to map words from the language model to sequences of acoustic model elements. When generating the search graph, the linguist may also incorporate subword units with contexts of arbitrary length, if provided. By allowing different linguist implementations to be plugged in at run time, Sphinx4 lets individuals provide different configurations for different systems and recognition requirements.

Decoder:

The main role of the decoder is to use the features (Features) from the front end together with the search graph from the linguist to generate results (Result). The decoder block consists of a pluggable search manager (SearchManager) and other supporting code that simplifies decoding for an application. The component of most interest in the decoder is therefore the search manager.

The decoder simply tells the search manager to recognize a set of feature frames. At each step of the process, the search manager creates a result object that contains all paths that have reached a final non-emitting state. To process the result, Sphinx provides utilities that can generate a lattice and confidence scores from it. Unlike in other systems, applications can modify the search space and the result object between steps, which lets the application become a partner in the recognition process. Like the linguist, the search manager is not tied to any particular implementation.

Operation:

Figure 5.3. Operation diagram of the demo program

First, the speech signal from the microphone is fed into the front end, where it is parameterized into a sequence of features and passed to the decoder. The linguist translates the language model, the pronunciation information in the dictionary and the sound structure information in the acoustic model into a search graph inside the decoder. The decoder then finds the path through the search graph that best matches the speech features supplied by the front end and produces the result.

(LANGUAGE MODEL)

Acoustic pattern matching and knowledge about language are equally important in recognizing and understanding natural speech. In practical speech recognition it may be impossible to separate the use of these different levels of knowledge, so they are often tightly integrated. We will look at formal language theory and at probabilistic language models.

A formal language has two basic parts: the grammar and the parsing algorithm. The grammar is a formal description of the structures the language permits. Parsing is the method of analyzing a sentence to see whether its structure conforms to the grammar. Given large volumes of text whose structure has been annotated by hand, the probabilistic relationships among sequences of words can be derived and modelled directly from the corpus. Such a model is called a stochastic language model; the n-gram is one example. Stochastic language models play an essential role in building a working spoken language system.

The linguist's language model provides the word-level language structure, which can be represented by any number of pluggable implementations. These implementations typically fall into one of two categories: graph-driven grammars and stochastic N-gram models. A graph-driven grammar represents a directed word graph in which each node represents a single word and each arc represents the probability of a transition to a word. Stochastic N-gram models provide the probability of a word given the observation of the previous n-1 words.

ACOUSTIC MODEL

The acoustic model provides a mapping between a unit of speech and an HMM that can be scored against the features supplied by the front end. The mapping may take word position and context into account. For triphones, for example, the context describes the single phonemes to the left and right of the given phoneme, and the word position indicates whether the triphone is at the beginning, in the middle or at the end of a word (or is a word by itself). This definition of context is not fixed by Sphinx4: it allows acoustic models that contain allophones as well as acoustic models whose contexts do not need to be adjacent to the unit.

Typically, the linguist breaks each word in the active vocabulary into a sequence of context-dependent subword units. It then passes these units and their contexts to the acoustic model to retrieve the HMM graphs associated with them. Finally, it uses these HMM graphs together with the language model to build the search graph.

SEARCH GRAPH

The search graph is the primary data structure used throughout decoding. It is a directed graph in which each node, called a search state (SearchState), represents either an emitting or a non-emitting state. Emitting states can be scored against incoming acoustic features, while non-emitting states are generally used to represent higher-level linguistic constructs such as words and phonemes, which cannot be scored directly against the incoming features. The arcs between states represent the possible state transitions, and each arc carries a probability giving the likelihood of a transition along it.

MAIN FEATURE OF THE SEARCH COMPONENT:

The implementation of a search state does not need to be fixed. Each linguist implementation therefore typically provides its own concrete implementation of SearchState, which can vary with the characteristics of that particular linguist.

BENEFITS

The Sphinx source code is clearly written and easy to read. Many researchers want not only to use Sphinx as a tool but also to modify the source code to suit their own purposes.
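As a rough illustration of the search graph structure described above, the Python sketch below models states with an emitting flag and arcs that carry transition probabilities. It is not the Sphinx4 SearchState interface: the class layout, the state names and the probabilities are all assumptions invented for the example, and the path search only handles graphs without cycles (real HMM states usually have self-loops).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SearchState:
    """A node in a toy search graph (hypothetical, not the Sphinx4 interface)."""
    name: str
    emitting: bool  # True if the state can be scored against acoustic features
    arcs: List[Tuple["SearchState", float]] = field(default_factory=list)

    def add_arc(self, target: "SearchState", probability: float) -> None:
        self.arcs.append((target, probability))


def best_path_probability(state: SearchState, final: SearchState) -> float:
    """Highest product of arc probabilities from state to final (acyclic graphs only)."""
    if state is final:
        return 1.0
    return max((p * best_path_probability(nxt, final) for nxt, p in state.arcs),
               default=0.0)


# Assumed toy graph: two candidate words, each expanded to one emitting HMM state.
start = SearchState("<start>", emitting=False)
word_a = SearchState("word:a", emitting=False)
word_b = SearchState("word:b", emitting=False)
hmm_a = SearchState("hmm:a", emitting=True)
hmm_b = SearchState("hmm:b", emitting=True)
final = SearchState("<final>", emitting=False)

start.add_arc(word_a, 0.6)
start.add_arc(word_b, 0.4)
word_a.add_arc(hmm_a, 1.0)
word_b.add_arc(hmm_b, 1.0)
hmm_a.add_arc(final, 0.5)
hmm_b.add_arc(final, 0.9)

best = best_path_probability(start, final)
```

In this toy graph the path through `word:b` wins (0.4 x 0.9 against 0.6 x 0.5). A real decoder would also combine these transition probabilities with acoustic scores computed at the emitting states.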
