
Nhận dạng tiếng Việt dùng mạng neuron kết hợp trích đặc trưng dùng LPC và AMDF

Vietnamese Speech Recognition Using Neural Networks Combined with LPC Formant Extraction and AMDF Pitch Detection
Hoàng Đình Chiến

Abstract: This paper describes a method for building a neural-network-based recognizer for isolated Vietnamese words. Formant extraction uses the widely used LPC (Linear Predictive Coding) model, whose parameters are converted to cepstral coefficients. The neural network estimates the word probabilities, and the word with the maximum probability is chosen as the result. LPC is also combined with the AMDF (Average Magnitude Difference Function) method to increase accuracy. The experiments are simulated in Matlab 6.5. The high accuracy of the results leads to the conclusion that the approach is suitable.

Keywords: Vietnamese speech recognition, neural network, LPC, AMDF.

I. INTRODUCTION

Speech recognition is a technique that can be applied in many areas of life: control (of robots, motors, wheelchairs for disabled people, ...), security and defence, and so on. In Vietnam, some initial research results on Vietnamese speech recognition have appeared in recent years, but they are still limited in accuracy and vocabulary size, and the tonal character specific to Vietnamese has hardly been addressed. This paper presents an approach that applies tone information to the recognition of isolated Vietnamese words in order to improve accuracy: LPC feature extraction is combined with fundamental-period extraction using AMDF, which gives high recognition accuracy in experiments.

II. THEORETICAL BASIS

1. Linear Predictive Coding (LPC) [4]

Let r(k) be the autocorrelation of the signal at a shift of k samples:

\[
r(k) = \sum_{n=-\infty}^{\infty} x(n)\, x(n+k) \tag{1}
\]

The LPC coefficients are then the solution of the system:

\[
\begin{bmatrix}
r(0) & r(1) & r(2) & \cdots & r(P-1) \\
r(1) & r(0) & r(1) & \cdots & r(P-2) \\
r(2) & r(1) & r(0) & \cdots & r(P-3) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r(P-1) & r(P-2) & r(P-3) & \cdots & r(0)
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_P \end{bmatrix}
=
\begin{bmatrix} r(1) \\ r(2) \\ r(3) \\ \vdots \\ r(P) \end{bmatrix}
\tag{2}
\]
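As an illustration of equations (1) and (2), the following Python sketch computes the autocorrelation of a finite frame (so the infinite sum is truncated to the frame) and solves the Toeplitz system in two ways: directly, and with a Levinson-Durbin recursion. The function names are hypothetical and the use of NumPy is an assumption; the paper's own experiments were run in Matlab. The recursion is written in the error-filter sign convention, with the coefficients negated at the end so that they satisfy system (2).

```python
import numpy as np

def autocorr(x, max_lag):
    """r(k) = sum_n x(n) x(n+k) over a finite frame, for k = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) for k in range(max_lag + 1)])

def lpc_direct(r):
    """Solve the Toeplitz system (2) directly for a_1..a_P."""
    P = len(r) - 1
    R = np.array([[r[abs(i - j)] for j in range(P)] for i in range(P)])
    return np.linalg.solve(R, r[1:])

def lpc_levinson(r):
    """Levinson-Durbin recursion on r(0..P); returns (a_1..a_P, E_P)."""
    P = len(r) - 1
    a = np.zeros(P + 1)            # a[k] holds a_p(k); a[0] is unused
    a[1] = -r[1] / r[0]            # first-order coefficient
    E = r[0] * (1.0 - a[1] ** 2)   # first-order mean squared error
    for p in range(2, P + 1):
        # K_p = -(r(p) + sum_{k=1}^{p-1} a_{p-1}(k) r(p-k)) / E_{p-1}
        K = -(r[p] + np.dot(a[1:p], r[p - 1:0:-1])) / E
        prev = a.copy()
        a[p] = K
        for k in range(1, p):
            a[k] = prev[k] + K * prev[p - k]
        E *= (1.0 - K ** 2)
    return -a[1:], E               # final sign flip gives the solution of (2)
```

For a well-conditioned frame the two solvers agree to rounding error; the recursion needs on the order of P^2 operations instead of P^3 and also yields the prediction error E_P.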

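This system of equations is solved with the Levinson-Durbin algorithm. All of the LPC coefficients serve as features of the speech signal.

a) Levinson-Durbin algorithm [4]

Initialization: p = 1. Compute the first-order mean squared error:

\[
E_1 = r(0)\left(1 - a_1^2(1)\right) \tag{3}
\]

where

\[
a_1(1) = -\frac{r(1)}{r(0)} \tag{4}
\]

Recursion: for p = 2, 3, ..., P

1. Compute the coefficient K_p:

\[
K_p = -\frac{r(p) + \mathbf{r}_{p-1}^{bT}\, \mathbf{a}_{p-1}}{E_{p-1}} \tag{5}
\]

where \mathbf{r}_{p-1}^{b} = [r(p-1), ..., r(1)]^T is the autocorrelation vector in reversed order and \mathbf{a}_{p-1} = [a_{p-1}(1), ..., a_{p-1}(p-1)]^T.

2. Compute the prediction coefficients of order p:

\[
a_p(p) = K_p \tag{6}
\]

\[
a_p(k) = a_{p-1}(k) + K_p\, a_{p-1}(p-k), \quad k = 1, 2, \ldots, p-1 \tag{7}
\]

3. Compute the mean squared error of order p:

\[
E_p = E_{p-1}\left(1 - K_p^2\right) \tag{8}
\]

4. Return to step 1 with p replaced by p + 1, as long as the new p ≤ P.

End: negate the coefficients, [a_P(1) a_P(2) ... a_P(P)] = -[a_P(1) a_P(2) ... a_P(P)].

b) Average Magnitude Difference Function (AMDF) [3]

The average magnitude difference function is the magnitude of the difference between the signal and itself shifted by p samples:

\[
d(p) = \sum_{n=0}^{N-1-p} \left| x(n) - x(n+p) \right| \tag{9}
\]

If x(n) is periodic with period T (samples), the AMDF reaches a minimum when the signal is shifted by exactly T samples. The speakers to be recognized have fundamental frequencies from 80 Hz (corresponding to a lag of n1 = Fs/80 samples) to 200 Hz (corresponding to n2 = Fs/200, where Fs is the sampling frequency). The AMDF of the signal is computed for lags from n2 to n1. Suppose the AMDF reaches its minimum at lag P0 (samples). This is the period of the signal (or the value closest to it), and the fundamental frequency of the signal is F0 = Fs/P0. This value is the feature of the signal with respect to tone. Because speech is a non-stationary signal, new values must be computed every 30 ms. All of the computed values form the features of a word and are used to train the neural network.

III. NEURAL NETWORKS

Each artificial neuron has a number of inputs (from the raw data or from the outputs of other neurons in the network). Each input connection has a weight, and the neuron has a threshold value. The signal is passed through an activation function (also called a transfer function) to produce the neuron's output. A neural network learns from examples and maps the input data through the transfer functions to produce a result. The most widely used type is the back-propagation network, whose basic technique is to update the weights in the direction of decreasing gradient in order to find the best position on the error surface.

Figure 1: Structure of the neural network

IV. FEATURE EXTRACTION

1. Feature extraction with LPC [4]

Step 1: Noise filtering, using a high-pass filter with transfer function:

\[
H(s) = \frac{s}{s + w_c} \tag{10}
\]

with a lower cutoff frequency of 300 Hz, to remove the low-frequency noise introduced by the microphone.

Step 2: Pre-emphasis, using a high-pass filter defined by:

\[
y(n) = x(n) - a\, x(n-1), \quad 0.9 \le a \le 1 \tag{11}
\]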

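Step 3: Detect the start and end points of a word using the short-time energy function:

\[
E_m = \sum_{n=m}^{m+N-1} \left[ x(n)\, w(n-m) \right]^2 \tag{12}
\]

Step 4: Split the signal into frames (these frames differ from the frames used in the endpoint-detection stage). Each frame has N samples and overlaps its neighbour by M samples, typically M = N/3.

Step 5: Windowing. The most commonly used window is the Hamming window, defined as:

\[
w(n) =
\begin{cases}
0.54 - 0.46 \cos\left(2\pi n / M\right), & 0 \le n \le M \\
0, & n \notin [0, M]
\end{cases}
\tag{13}
\]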
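Steps 2 to 5 can be sketched in Python as below. This is a minimal illustration under stated assumptions: the function names, the default a = 0.95, and the choice x(-1) = 0 at the start of the signal are not taken from the paper, and the energy threshold used to decide the word endpoints is left out because the text does not give one.

```python
import numpy as np

def pre_emphasis(x, a=0.95):
    """y(n) = x(n) - a*x(n-1), equation (11); x(-1) is taken as 0."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= a * x[:-1]
    return y

def hamming(M):
    """w(n) = 0.54 - 0.46 cos(2*pi*n/M) for n = 0..M, equation (13)."""
    n = np.arange(M + 1)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / M)

def short_time_energy(x, m, w):
    """E_m = sum_{n=m}^{m+N-1} [x(n) w(n-m)]^2, equation (12), N = len(w)."""
    seg = np.asarray(x[m:m + len(w)], dtype=float)
    return float(np.sum((seg * w[:len(seg)]) ** 2))

def split_frames(x, N, overlap):
    """Frames of N samples; consecutive frames share `overlap` samples."""
    hop = N - overlap
    starts = range(0, len(x) - N + 1, hop)
    return np.array([x[s:s + N] for s in starts])
```

For example, with frames of N = 240 samples and an overlap of N/3 = 80, `split_frames(pre_emphasis(x), 240, 80) * hamming(239)` gives the windowed frames that the later steps operate on.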

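Step 6: Determine the linear prediction coefficients using the Levinson-Durbin algorithm.

Step 7: Convert the linear prediction coefficients to cepstral coefficients:

\[
c_m = a_m + \frac{1}{m} \sum_{k=1}^{m-1} k\, c_k\, a_{m-k}, \quad 1 \le m \le P \tag{14}
\]

and

\[
c_m = \frac{1}{m} \sum_{k=1}^{m-1} k\, c_k\, a_{m-k}, \quad m > P \tag{15}
\]

where a_j is taken as 0 for j > P. These cepstral coefficients are more concentrated and more reliable than the linear prediction coefficients. Typically Q = (3/2)P is chosen.

Step 8: Convert to weighted cepstral coefficients:

\[
\hat{c}_m = w_m\, c_m, \quad 1 \le m \le Q \tag{16}
\]

A suitable weighting function is a band-pass filter (in the cepstral domain):

\[
w_m = 1 + \frac{Q}{2} \sin\left(\frac{\pi m}{Q}\right), \quad 1 \le m \le Q \tag{17}
\]

Step 9: Compute the cepstral derivative:

\[
\frac{\partial c_m(t)}{\partial t} = \Delta c_m(t) \approx \mu \sum_{k=-K}^{K} k\, c_m(t+k) \tag{18}
\]

where μ is a normalization constant and (2K+1) is the number of frames used in the computation. K = 3 is a suitable value for the first-order derivative. The feature vector of the signal consists of Q cepstral coefficients and Q cepstral derivative coefficients.

2. Fundamental-period extraction with AMDF [3]

Steps 1, 2 and 3 are the same as in the LPC method. For noise filtering, a high-pass filter with a cutoff frequency of 60 Hz is used, because the human fundamental frequency lies between 80 Hz and 200 Hz.

Step 4: The signal is clipped in order to emphasize the fundamental period, according to equation (19) [equation not recoverable from the source], where the value C is about 1/3 of the maximum amplitude of the signal.

Step 5: The clipped signal is passed to the average magnitude difference function:

\[
d(p) = \sum_{n=0}^{N-1-p} \left| x(n) - x(n+p) \right| \tag{20}
\]

where N is the frame length and p is taken over the pitch range corresponding to fundamental frequencies of 80 to 200 Hz. The lag P0 at which d is minimal is chosen; it is the pitch period, and the fundamental frequency is Fs/P0. Frames with d(P0) > 0.7 d_max(p) are classified as unvoiced and assigned F0 = 0.

Step 6: After F0 has been determined for the whole syllable, the frames with F0 = 0 must be processed. If the unvoiced frames are at the beginning or end of the syllable, their F0 is replaced by that of the adjacent voiced frame. If the unvoiced frames are in the middle of the syllable, their F0 is replaced by the average of the two voiced frames on either side.

Step 7: The F0 contour is smoothed with a weighted averaging filter with impulse response h = [0.1 0.2 0.4 0.2 0.1].

Step 8: Because the number of inputs of the neural network is fixed, the length of the F0 contour must be normalized; the magnitude of F0 must also be normalized and converted to a log scale:

\[
F0_n[i] = -20 \log\left( \frac{F0_a[i] - \min + \varepsilon}{\max - \min} \right), \quad i = 0, 1, \ldots, L-1
\]

where max and min are the maximum and minimum values of F0 over the entire data set, and ε is a small positive number that avoids log 0.

Step 9: Take L derivative values of log F0 and concatenate the L log F0 values with the L derivative values to form the feature vector of the word.

V. NETWORK TRAINING AND RESULTS

1. Training the network with LPC

Features are extracted from the speech signal with the LPC-cepstrum method. The feature vector of each word has 144 coefficients. The neural network therefore has 144 inputs, 10 output nodes (corresponding to the 10 digits) and 220 hidden nodes. In the results table, the vertical axis is the digit being recognized, the number of correct recognitions appears on the diagonal, and the horizontal axis shows how many times that digit was misrecognized.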
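Table 1: Recognition results for the network trained with LPC (digits 0 to 9)
Correct recognitions per digit as they appear in the source, in source order: 98, 84, 88, 100, 91, 84, 98, 94, 99, 100. The positions of the off-diagonal (misrecognition) counts are not recoverable from the source layout.
Average accuracy: 93.6%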

For example, the digit "1" was read 100 times and recognized correctly 84 times. It was misrecognized as 4 13 times, as 6 once, and as 8 twice. With LPC-based recognition, some words with similar pronunciation are confused frequently, for example "một" (1) and "bốn" (4), "hai" (2) and "bảy" (7), "năm" (5) and "tám" (8). To overcome this, the combination of AMDF with LPC is investigated.

2. Training with LPC combined with AMDF

The idea is to partition the set of patterns in order to overcome the limitations of the LPC method. AMDF feature extraction proves effective because it can separate the words "bảy" (7) and "một" (1) into a group of their own. As mentioned above, the AMDF method extracts only the tonal features of the signal, so it is less affected by pronunciation than the LPC method. Moreover, in terms of tone, the level tone (thanh ngang) is close to the rising tone (thanh sắc), and both differ strongly from the heavy tone (thanh nặng) and the falling-rising tone (thanh hỏi), so separating "một" and "bảy" is feasible.

Recognition method: the patterns are divided into two groups. Group 1 contains the words with the hỏi and nặng tones, namely "một" and "bảy". Group 2 contains the words with the ngang and sắc tones, namely "không", "hai", "ba", "bốn", "năm", "sáu", "tám" and "chín". First, the AMDF method is used to decide which group the word belongs to; the word is then passed to a second neural network that identifies the specific word. Three neural networks are therefore needed: two that recognize with the AMDF method and a third that recognizes with the LPC method.

Figure 2: Block diagram of the recognition method
[Diagram: the digits 0 to 9 enter an AMDF network (10 input nodes, 40 hidden nodes, 2 output nodes). The group không, hai, ba, bốn, năm, sáu, tám, chín is passed to an LPC network (144 input nodes, 220 hidden nodes, 8 output nodes) that outputs the individual word. The group một, bảy is passed to a second AMDF network (10 input nodes, 40 hidden nodes, 2 output nodes) that outputs một or bảy.]

Table 2: Recognition results for the network trained with LPC combined with AMDF (digits 0 to 9)
Correct recognitions per digit as they appear in the source, in source order: 97, 99, 96, 99, 90, 94, 97, 95, 100, 95. The positions of the off-diagonal (misrecognition) counts are not recoverable from the source layout.
Average accuracy: 96.2%

For example, the digit 1 was recognized correctly 99 times and incorrectly once. This shows that the high recognition accuracy comes from a sensible combination of processing methods.

Remarks on the AMDF method: combining the LPC and AMDF methods raises the probability of correct recognition, while the training time increases only negligibly. The advantage of AMDF is that it needs few inputs, so the trained network is small. In addition, the AMDF method depends little on pronunciation, so its error rate is lower than that of the LPC method. However, its drawback is that it distinguishes speech only by tone, so its practical applicability is limited. Another drawback is that it is very difficult to use for continuously spoken words. Compared with AMDF, the LPC method gives more specific results, but it needs a fairly large number of features and is easily affected by the speaker's pronunciation.

These advantages and drawbacks show that combining the two methods is reasonable. More complete studies are still needed, however, to extend the approach to a larger vocabulary and to apply the AMDF method more widely in practice. Note that the digits are read in isolation and that each trained speaker is handled independently.

Neural networks have two advantages in speech recognition. First, they are superior in training speed as well as recognition speed. Second, they make it easy to extend the vocabulary, which is developed further below for the recognition of control words.
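The tone features fed to the AMDF networks rest on a per-frame F0 estimate. The sketch below follows the description in the text (search range 80 to 200 Hz, F0 = Fs/P0, unvoiced when d(P0) > 0.7 d_max, smoothing with h = [0.1 0.2 0.4 0.2 0.1]). The function names are hypothetical, and the clipping step and the filling of unvoiced frames are omitted, so this is an assumption-laden illustration rather than the author's implementation.

```python
import numpy as np

def amdf(frame, p):
    """d(p) = sum_{n=0}^{N-1-p} |x(n) - x(n+p)| for one frame of length N."""
    frame = np.asarray(frame, dtype=float)
    return float(np.sum(np.abs(frame[:len(frame) - p] - frame[p:])))

def estimate_f0(frame, fs, f_lo=80.0, f_hi=200.0, voiced_ratio=0.7):
    """Return F0 in Hz for one frame, or 0.0 if the frame is judged unvoiced."""
    lag_min = int(fs / f_hi)          # n2 = Fs/200
    lag_max = int(fs / f_lo)          # n1 = Fs/80
    lags = np.arange(lag_min, lag_max + 1)
    d = np.array([amdf(frame, p) for p in lags])
    p0 = lags[np.argmin(d)]           # lag of the AMDF minimum
    if d.min() > voiced_ratio * d.max():
        return 0.0                    # shallow minimum: treat as unvoiced
    return float(fs / p0)

def smooth_contour(f0, h=(0.1, 0.2, 0.4, 0.2, 0.1)):
    """Weighted moving average of the F0 contour (edges are attenuated)."""
    return np.convolve(np.asarray(f0, dtype=float), h, mode="same")
```

The frame must be longer than the largest lag; at an assumed sampling rate of 8000 Hz, a 30 ms frame has 240 samples and the largest lag is 100. Because the sum in d(p) has fewer terms at larger lags, the 0.7 threshold is sensitive to the frame length, which is one reason to treat the unvoiced decision here as a rough heuristic.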
3. Recognition of control words

This paper also builds a small vocabulary intended for automatic control. The vocabulary is: lên (up), xuống (down), trái (left), phải (right), tới (forward), lui (backward), xoay (rotate), dừng (stop). Training on the control vocabulary is no different from training on the digit vocabulary described above.

Table 3: Results with the control vocabulary (200 utterances per word; the word each error was confused with is not recoverable from the source layout)

Word     Correct   Misrecognized
Lên      200       0
Xuống    200       0
Trái     196       4
Phải     200       0
Tới      200       0
Lui      198       2
Xoay     197       3
Dừng     200       0

Average accuracy: 99.4%

The neural network was also tested with several compound words: xoay trái, xoay phải, xoay lên, xoay xuống, đi tới, đi lui, dừng lại, tiếp tục. Each compound is read with a pause; for example, "xoay trái" is read with a pause between "xoay" and "trái". The results show that the recognition accuracy is very high.

Table 4: Results with the compound control vocabulary using the neural network (200 utterances per word; the word each error was confused with is not recoverable from the source layout)

Word         Correct   Misrecognized
Xoay lên     199       1
Xoay xuống   194       6
Xoay trái    199       1
Xoay phải    200       0
Đi tới       200       0
Đi lui       200       0
Tiếp tục     200       0
Dừng lại     200       0

Average accuracy: 99.5%

VI. FUTURE DEVELOPMENT

Combining LPC with AMDF gives very high accuracy for Vietnamese speech recognition, thanks to a sensible combination of recognition methods. The next steps are to develop a larger vocabulary and to improve the algorithm.

The AMDF method works very well with isolated words. At present, however, it has not been well exploited, possibly because it distinguishes speech only by tone. Ideas are needed for combining it with other methods in order to improve accuracy and to recognize speech in noise.

Further directions are to study the combination of neural networks with other tools for Vietnamese speech recognition, for example fuzzy logic and wavelets, and to build a recognizer for continuously spoken words using phoneme models and multiple speakers.

REFERENCES:
[1] Lê Tiến Thường, Hoàng Đình Chiến. Vietnamese Speech Recognition Applied to Robot Communications. AU Journal of Technology, Volume 7, No. 3, January 2004. Published by Assumption University (ABAC), Hua Mak, Bangkok, Thailand.
[2] Lê Tiến Thường (chủ nhiệm), Hoàng Đình Chiến, Trần Tiến Đức. Ứng dụng Wavelets nhận dạng tiếng nói tiếng Việt trong điều khiển và thông tin. Báo cáo nghiệm thu đề tài NCKH trọng điểm ĐHQG TP HCM, 28-01-2004.
[3] Lê Tiến Thường, Trần Tiến Đức. Nhận dạng thanh điệu tiếng nói tiếng Việt bằng mạng neuron phân tầng. Tạp chí Tin học và Điều khiển học, 2004.
[4] L. Rabiner and B. H. Juang. Fundamentals of Speech Recognition. Prentice Hall, Englewood Cliffs, New Jersey 07632, 1993.
[5] Hoàng Đình Chiến, Lê Tiến Thường. An Efficient Approach Combining Wavelets and Neural Networks for Signal Processing in Digital Communications. Proceedings of the IASTED International Conference on Communication and Computer Networks, MIT, Cambridge, USA, November 8-10, 2004.
[6] Cao Xuân Hạo. Tiếng Việt mấy vấn đề ngữ âm ngữ pháp ngữ nghĩa. Nhà xuất bản Giáo Dục, 1998.
[7] Claudio Becchetti and Lucio Prina Ricotti. Speech Recognition: Theory & C++ Implementation. Fondazione Ugo Bordoni, Rome, Italy. John Wiley and Sons, Ltd.
[8] Patrick M. Mills. Fuzzy Speech Recognition. University of South Carolina, 1996.
[9] Quách Tuấn Ngọc. Xử lý tín hiệu số. Nhà Xuất bản Giáo dục, 1995.
[10] Lê Tiến Thường, Trần Tiến Đức. Nhận dạng tiếng nói tiếng Việt liên tục bằng mạng neuron. Tạp chí Phát triển Khoa học và Công nghệ, Đại học Quốc gia TP. HCM, số 10, Tập 5, 2002.

Received: 05/08/2005

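ABOUT THE AUTHOR

HOÀNG ĐÌNH CHIẾN
Born 17-4-1955 in Quảng Ngãi. Graduated from MTYCI-Moscow in 1979. Received a Master's degree in Electronics and Telecommunications from Ho Chi Minh City University of Technology (ĐH Bách khoa TP. HCM) in 1998 and a PhD in 2003. Currently lecturing at the Faculty of Electrical and Electronics Engineering, Ho Chi Minh City University of Technology. Research interests: satellite communications, digital signal processing, communication systems, wavelets, neural networks. Email: hdchien@hcmut.edu.vn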
