USING THE TMS320C6713 KIT FOR SPEECH RECOGNITION APPLIED TO COMPUTER MOUSE CONTROL
Students: Phạm Đức Linh(1), Nguyễn Văn Lâm(1), L ng Trng(1), Nguyễn Hữu Trung(2)
Classes: (1)07CLC2, (2)08CLC2, Faculty of High-Quality Engineering Training, University of Science and Technology, The University of Danang
Supervisor: Assoc. Prof. Dr. Đoàn Quang Vinh, Faculty of Electrical Engineering, University of Science and Technology, The University of Danang
ABSTRACT
Speech recognition is a field of artificial intelligence that plays an important role in the revolution in human-machine communication. Speech recognition systems are widely used in applications such as control, security, communication, and translation. This paper presents a speech recognition method based on the HMM (Hidden Markov Model). The method is applied with the TMS320C6713 kit to control a computer mouse.
1. Introduction
The foundations of speech recognition were laid by MacCarthy in 1958, and its applications in everyday life are now fairly common around the world. In Vietnam it is still a new field: research on speech recognition remains largely theoretical and has rarely been put into practice. The goal of this project is speech recognition and its applications on the TMS320C6713 kit. One of these applications is controlling a computer mouse by voice, which gives people with disabilities the opportunity to communicate and to use the functions of a computer in the normal way.
2. Speech recognition method and algorithms
2.1. Recognition process
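Figure 1: Block diagram of the speech recognition system. Training path: Reference Speech Sample, Feature Extraction, Training. Recognition path: Test Speech Sample, Feature Extraction, Pattern Matching, Recognition Decision, Recognition Output, Computer Mouse.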
In general, a speech recognition system is structured as shown in Figure 1. It consists of two main stages:
- Training: the speech signal is sampled at 8 kHz and then converted into feature vectors (feature extraction). Algorithms such as HMM, DTW, or VQ are used to train on these vectors.
- Recognition: the speech sample to be recognised is compared with the previously trained samples. Finally, the test speech sample is assigned a label in the Recognition Decision stage.
2.1.1. Feature extraction
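Figure 2: Diagram of the feature extraction process to be embedded in the DSP.
Features of the sample (training) signal are extracted with the MFCC method. The audio signal to be recognised is captured with a microphone. The speech signal is an excitation pulse signal convolved with the low-frequency signal produced by the vocal tract, and it is the vocal-tract contribution that we want to recover:
s(n) = e(n) * t(n) (*: convolution) (1)
- Preprocessing: noise filtering and word (key word) detection.
- A Hamming window is used to cut the key word into frames, with 63% overlap between consecutive frames. This process is called windowing.
$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right)$; n = 0 .. N-1 (2)
- FFT. The FFT (Fast Fourier Transform) computes the discrete Fourier transform of each windowed frame x(n):
$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}$; k = 0 .. N-1 (3)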
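The windowing and FFT steps above can be sketched as follows. This is a minimal illustration in pure Python, not the code that runs on the DSP: the frame length of 256 samples, the function names, and the 1 kHz test tone are assumptions chosen for the example, and a direct DFT is used in place of a real FFT routine because it yields the same values.

```python
import cmath
import math


def hamming(N):
    """Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), n = 0..N-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def split_frames(signal, frame_len, overlap=0.63):
    """Cut a signal into Hamming-windowed frames.

    `overlap` is the fraction of samples shared by consecutive frames
    (0.63 follows the text; frame_len is an assumption of this sketch).
    """
    hop = max(1, round(frame_len * (1.0 - overlap)))
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = signal[start:start + frame_len]
        frames.append([x * w for x, w in zip(chunk, window)])
    return frames


def dft_magnitude(frame):
    """Magnitude spectrum |X(k)| by direct DFT, O(N^2).

    An FFT computes exactly the same X(k), only faster.
    """
    N = len(frame)
    return [
        abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
        for k in range(N)
    ]


# Usage: 0.1 s of a 1 kHz tone sampled at 8 kHz (hypothetical test signal).
fs = 8000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(800)]
frames = split_frames(tone, frame_len=256)
spectrum = dft_magnitude(frames[0])
# Index of the strongest bin in the lower half of the spectrum.
peak_bin = max(range(128), key=lambda k: spectrum[k])
print(len(frames), peak_bin, peak_bin * fs / 256)
```

With a 256-sample frame at 8 kHz each frame covers 32 ms, and the hop between frames is about 95 samples for 63% overlap. On a real DSP target a library FFT would replace `dft_magnitude`.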
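$S(e^{j\omega})$ is the spectrum of the speech signal, $E(e^{j\omega})$ is the spectrum of the excitation pulses, and $T(e^{j\omega})$ is the spectrum of the vocal tract. The magnitudes of these three components are therefore related by:
$|S(e^{j\omega})| = |E(e^{j\omega})| \cdot |T(e^{j\omega})|$ (4)
Taking the logarithm turns the product into a sum:
$\log|S(e^{j\omega})| = \log|E(e^{j\omega})| + \log|T(e^{j\omega})|$ (5)
Here the contribution of $T(e^{j\omega})$ appears in the low-frequency region, because the vocal tract changes slowly, while the contribution of the excitation pulses $E(e^{j\omega})$ appears mainly in the high-frequency region. The linear operation commonly applied is the inverse Fourier transform of $\log|S(e^{j\omega})|$, which yields the cepstrum of the signal (the term cepstral is spectral with its first letters reversed):
$c_s(n) = \mathrm{IDFT}\{\log|S(e^{j\omega})|\} = \mathrm{IDFT}\{\log|E(e^{j\omega})|\} + \mathrm{IDFT}\{\log|T(e^{j\omega})|\} = c_e(n) + c_t(n)$ (6)
- Mel frequency: to describe accurately how the human ear perceives frequency, a dedicated frequency scale is constructed. The Mel scale is based on experimental measurements of human hearing perception.
- DCT (in essence it plays the role of the IFFT step), where x(n) is the n-th log Mel-filter output:
$c(k) = \sum_{n=0}^{N-1} x(n)\cos\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right)k\right]$; k = 0 .. N-1 (7)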
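The Mel-scale and DCT steps can be illustrated with a short sketch. The text does not give the Mel mapping it uses, so the formula below (2595 log10(1 + f/700)) is an assumption: it is one commonly used definition, not necessarily the one adopted in this project. The function names and the example filter-bank energies are likewise hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to Mels (assumed formula, common in MFCC work)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def dct_ii(x):
    """DCT-II: c(k) = sum_n x(n) * cos(pi/N * (n + 1/2) * k), k = 0..N-1."""
    N = len(x)
    return [
        sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k) for n in range(N))
        for k in range(N)
    ]


def cepstral_coefficients(filterbank_energies, num_coeffs):
    """Log of the Mel filter-bank energies followed by a DCT.

    The energies must be strictly positive; num_coeffs selects how many
    low-order coefficients are kept as the feature vector.
    """
    log_energies = [math.log(e) for e in filterbank_energies]
    return dct_ii(log_energies)[:num_coeffs]


# Usage with made-up filter-bank energies for one frame.
energies = [4.0, 2.5, 1.2, 0.8, 0.5, 0.3]
features = cepstral_coefficients(energies, num_coeffs=4)
print(hz_to_mel(1000.0), features)
```

Keeping only the low-order coefficients retains the slowly varying spectral envelope, which is the vocal-tract contribution that the cepstral analysis is meant to isolate.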
The result of the training process is an acoustic vector for each frame. A speech signal consists of many frames; the set of acoustic vectors of these frames is called the codebook and is stored as the database.
2.1.2. The HMM (Hidden Markov Model) algorithm
A Hidden Markov Model (HMM) is a statistical model in which the system being modelled is assumed to be a Markov process with unknown parameters, and the task is to determine the hidden parameters from the observable ones under this assumption. The extracted model parameters can then be used for further analysis, for example in pattern recognition applications. In an ordinary Markov model the state is directly visible to the observer, so the state transition probabilities are the only parameters. A hidden Markov model adds outputs: each state has a probability distribution over the possible output symbols. Consequently, the sequence of symbols generated by an HMM does not directly reveal the sequence of states. The adjacent figure illustrates this model.
Figure 3: HMM training and recognition scheme. Training: the training samples (1, 2, 3) of each word ("một", "hai", "ba") are used for parameter estimation. Recognition: for an observation sequence O, a probability of the form P(O/λ) is evaluated for each of the three trained models.
2.1.3. Using the HMM in speech recognition
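To make the recognition scheme concrete, the sketch below scores an observation sequence against several word models with the forward algorithm and selects the most probable word. It is an illustration under stated assumptions, not the project's implementation (which relies on the CMUSphinx toolkit): observations are discrete symbols such as codebook indices, the two toy models and their probabilities are invented, and the word labels "mot" and "hai" are placeholders.

```python
def forward_probability(obs, pi, A, B):
    """Return P(O | lambda) for a discrete HMM lambda = (A, B, pi).

    obs: list of observation symbol indices
    pi:  initial state probabilities, pi[i]
    A:   state transition probabilities, A[i][j]
    B:   emission probabilities, B[i][symbol]
    """
    n_states = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
            for j in range(n_states)
        ]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)


def recognise(obs, models):
    """Score obs against every word model and return (best_word, scores)."""
    scores = {word: forward_probability(obs, *params) for word, params in models.items()}
    return max(scores, key=scores.get), scores


# Two hypothetical two-state left-to-right models, stored as (pi, A, B).
LEFT_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "mot": ([1.0, 0.0], LEFT_RIGHT, [[0.9, 0.1], [0.2, 0.8]]),
    "hai": ([1.0, 0.0], LEFT_RIGHT, [[0.1, 0.9], [0.8, 0.2]]),
}

word, scores = recognise([0, 0, 1, 1], models)
print(word, scores)
```

For long observation sequences the products underflow, so practical systems work with scaled or logarithmic probabilities; that refinement is omitted here for brevity.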
[Diagram: left-right HMM with states 1 to 6; transition labels 33, 34, 44, 45, 55, 56.]
Figure 4: Left-right HMM with six states.
After feature extraction we have a database of feature vectors for each word. We now build a hidden Markov model whose training data are these feature vectors. The HMM training and recognition scheme is shown in Figure 3 for a vocabulary of three words: một, hai, ba. For each word to be recognised we have a database of features from several utterances (three recordings in the diagram). We then estimate the parameters of the model λ = (A, B, π) so that the probability P(O|λ) is maximised; each word has its own λ. To recognise a word, we simply compute the probability of its observation sequence under each trained λ and choose the model with the highest probability, that is, the recognised word is $\arg\max_{v} P(O \mid \lambda_v)$.
From the references and from reports on successfully built recognition systems, we see that for speech recognition the HMM is usually chosen to be a left-right model with six states (see Figure 4). The training data were recorded at the MICA centre from 30 speakers, with about 2000 utterances. The project uses the CMUSphinx toolkit from Carnegie Mellon University, running on Linux. The HMM used for recognition has 6 states and a vocabulary of 10 words (the ten digits 0 to 9); the results are fairly good, as shown in the table below.
2.1.4. Controlling the computer mouse
Once a key word has been recognised, it is converted into a logic signal that drives the computer mouse. On Linux, xdotool is combined with the recognition result to control the mouse. The mouse responds fairly quickly and accurately.
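A possible way to turn a recognised key word into an xdotool call is sketched below. The key-word set (up, down, left, right, click), the 20-pixel step, and all function names are assumptions made for illustration; the paper does not specify how its program maps words to commands. Only the `mousemove_relative` and `click` subcommands of xdotool are used.

```python
import shutil
import subprocess

STEP = 20  # pixels moved per spoken command (hypothetical value)

# Hypothetical mapping from a recognised key word to xdotool arguments.
# "--" stops option parsing so that negative offsets are not read as flags.
COMMANDS = {
    "up": ["mousemove_relative", "--", "0", str(-STEP)],
    "down": ["mousemove_relative", "--", "0", str(STEP)],
    "left": ["mousemove_relative", "--", str(-STEP), "0"],
    "right": ["mousemove_relative", "--", str(STEP), "0"],
    "click": ["click", "1"],  # button 1 is the left mouse button
}


def build_command(keyword):
    """Return the xdotool argument vector for a recognised key word."""
    if keyword not in COMMANDS:
        raise ValueError("no mouse action defined for %r" % keyword)
    return ["xdotool"] + COMMANDS[keyword]


def execute(keyword, dry_run=False):
    """Run the mouse action, or only return the command when dry_run is set
    or xdotool is not installed on this machine."""
    argv = build_command(keyword)
    if dry_run or shutil.which("xdotool") is None:
        return argv
    subprocess.run(argv, check=True)
    return argv


# Usage: show the commands for a short sequence of recognised words.
for spoken in ["left", "left", "click"]:
    print(" ".join(execute(spoken, dry_run=True)))
```

Keeping the mapping in a table separates the recogniser from the actuator, so new key words can be added without touching the recognition code.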
2.2. Results
Recognition was carried out successfully on a computer, with the results shown in Table 1.
Table 1: Single-word recognition results
Key word    Probability
Một         92%
Hai         90%
Ba          95%
Bốn         92%
Năm         96%
Sáu         93%
Bảy         90%
Tám         97%
Chín        89%
3. Conclusion
The project has successfully built a speech recognition algorithm and recognises all the digits from 0 to 9 with a fairly good success rate. This is a stepping stone towards recognising the words used to control a computer mouse, such as: up, down, left, right, click. Using the recognition results, the project has also successfully built a mouse control program on Linux. This provides the foundation for porting the application to the TMS320C6713 DSP kit.
Advantages: real-time response and the ability to train new words. The program can run in a variety of noisy environments. Words are trained and recognised regardless of whether the speaker is male or female and regardless of accent.
Limitations: the number of recognised words is still small, words are recognised one at a time, and continuous recognition has not yet been implemented. In addition, mouse control has not yet been built on the DSP kit; it currently works only on the Linux operating system.
REFERENCES
[1] Open Source Toolkit For Speech Recognition - Project by Carnegie Mellon University, http://cmusphinx.sourceforge.net/wiki/
[2] Lê Tiến Thường, Xử Lý Số Tín Hiệu và Wavelets, Tập 1, Nhà xuất bản Đại Học Quốc Gia TPHCM, 2002.
[3] TS Nguyễn Văn Giáp, KS Trần Việt Hồng, Kỹ thuật nhận dạng tiếng nói và ứng dụng trong điều khiển, Bộ môn Cơ điện tử, khoa Cơ khí, Đại Học Bách khoa TPHCM.
[4] Nguyễn Phú Bình, Bài giảng Xử lý tiếng nói, Đại học Bách khoa Hà Nội.
[5] Digital Signal Processing and Applications with the C6713 and C6416 DSK, John Wiley & Sons, 2004.