
SPEECH RECOGNITION TECHNIQUES AND THEIR APPLICATION IN CONTROL

Dr. Nguyen Van Giap
Eng. Tran Viet Hong
Department of Mechatronics, Faculty of Mechanical Engineering, Ho Chi Minh City University of Technology
nvgiap@dme.hcmut.edu.vn; tvhong@dme.hcmut.edu.vn


ABSTRACT
Research on speech recognition methods has attracted considerable investment and effort from scientists around the world. So far, however, the results have not fully satisfied researchers, because the object being recognized, human speech, is highly complex and variable. For Vietnamese the results are even more limited. This paper presents an approach to Vietnamese speech recognition that extracts speech features with the MFCC method and performs recognition with hidden Markov models (HMMs). The results are verified in practice on a radio-controlled model car.
1 PROBLEM STATEMENT
1.1 Introduction
Today, with the development of electronics and information technology, automated machine systems are gradually replacing people in many stages of work, and machines can work far more efficiently and productively than humans. So far, however, human-machine communication, although much improved, remains very manual: it relies on keyboards and other data-entry devices. Communicating with machines by voice would be the most civilized and natural mode of interaction: the feeling of operating a machine would disappear and be replaced by the feeling of talking to another person. Once perfected, voice would be the most convenient and efficient way to communicate with machines.
Because languages differ phonetically, recognition programs developed for other languages cannot simply be applied to Vietnamese. A speech recognition system for Vietnam must be built on the basis of Vietnamese speech.
1.2 Research status in Vietnam and abroad
Vietnamese speech recognition has only received research attention in recent years, and no complete recognition program has yet been published.
Worldwide, many speech recognition systems (for English) are already applied very effectively, such as IBM's ViaVoice and the CSLU Toolkit from CSLU (Center for Spoken Language Understanding), but for Vietnamese the work is still very limited.
1.3 Objectives of the project
This project investigates and tests an approach to Vietnamese speech recognition that extracts speech features with MFCC (Mel-Frequency Cepstral Coefficients) and performs recognition with HMMs (Hidden Markov Models). In addition, a Vietnamese voice-control model with a small vocabulary is built, setting up a voice-control system with a fixed command set. This command set is used to control a robot, and a complete voice-controlled car model is the practical, experimental application of the project.
2 BUILDING THE SPEECH RECOGNITION SYSTEM
A recognition system generally consists of two parts: the training phase and the recognition phase. Training is the process in which the system learns reference patterns supplied by different utterances (words or sounds), which together form the system's vocabulary. Recognition is the process of deciding which word was spoken, based on the trained vocabulary. The general diagram of the speech recognition system is shown in Figure 1.
To make testing and evaluating the results convenient, we divided the recognition program shown in the diagram into three separate modules:
- Module 1: records the speech signal, separates the speech from the background noise and stores it in the database.
- Module 2: extracts features from the speech signal obtained in Module 1 using the MFCC method and vector-quantizes the resulting feature vectors.
- Module 3: builds a six-state hidden Markov model, optimizes the HMM parameters for each word in the vocabulary, and recognizes a word spoken into the microphone.





2.1 Implementing Module 1
This module acquires the signal from the microphone and uses endpoint-detection techniques to distinguish the speech portion of the signal from the noise portion. The speech can then be separated from the background noise, so that only the speech signal is kept.
Although there are many methods for isolating speech, the authors' research and experiments showed that combining the short-time energy function with the zero-crossing rate gives better results.
The method relies on two properties: the energy of the speech signal is usually larger than that of the noise, and the zero-crossing rate of the noise is higher than that of the speech. Figure 2 shows the relation between the acquired signal, the short-time energy and the zero-crossing rate.

Figure 2 Relation between the speech signal and the background noise.
For a window ending at sample m, the short-time energy E(m) is defined by:
$$E(m) = \sum_{n=-\infty}^{\infty} \left[ s(n)\, w(m-n) \right]^2$$ (2.1) [4-6]
The short-time energy of a signal segment is plotted in Figure 3.
[Figure 3 panels: (a) Signal, amplitude vs. time (s); (b) Short-Time Energy vs. time (frame)]
Figure 3 Signal (a) and its short-time energy (b).
The zero-crossing rate is a parameter that counts how many times the signal amplitude passes through zero within a given time interval. It is defined by:
$$Z_s(m) = \frac{1}{N} \sum_{n=m-N+1}^{m} \frac{\left| \operatorname{sgn}\{s(n)\} - \operatorname{sgn}\{s(n-1)\} \right|}{2}\, w(m-n)$$ (2.2)
where N is the length of the window w(m-n).
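Equations (2.1) and (2.2) translate directly into code. The Python sketch below is illustrative (the function names are hypothetical); it assumes the signal is a list of samples and uses a rectangular window by default, since the paper does not say which window is used for endpoint detection.

```python
def rect_window(N):
    """Rectangular window w(k) = 1 for k = 0..N-1 (an assumed default)."""
    return [1.0] * N


def short_time_energy(s, m, w):
    """E(m) = sum_n [s(n) w(m-n)]^2, eq. (2.1).

    w has length N and is zero outside 0..N-1, so only n = m-N+1..m contribute.
    """
    N = len(w)
    return sum((s[n] * w[m - n]) ** 2 for n in range(max(0, m - N + 1), m + 1))


def sgn(x):
    """Sign function with the convention sgn(0) = +1."""
    return 1 if x >= 0 else -1


def zero_crossing_rate(s, m, w):
    """Z_s(m) = (1/N) sum_{n=m-N+1}^{m} |sgn s(n) - sgn s(n-1)| / 2 * w(m-n), eq. (2.2)."""
    N = len(w)
    total = 0.0
    # start at n >= 1 so that s(n-1) exists
    for n in range(max(1, m - N + 1), m + 1):
        total += abs(sgn(s[n]) - sgn(s[n - 1])) / 2 * w[m - n]
    return total / N
```

Evaluating both functions at window end points m spaced one hop apart gives frame-by-frame curves like the one in Figure 3 (b).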
Many endpoint-detection algorithms rely on the magnitude of the short-time energy and on the zero-crossing rate to locate the endpoints as accurately as possible. The basic procedure is as follows. A small sample of background noise is taken during the silence interval before the speech begins. From it, a speech threshold is determined from the silence energy and the peak energy. Endpoints are first placed where the signal energy crosses this threshold; the distance between the two points is then checked to see whether it matches the length of a word. The same procedure is applied to the zero-crossing rate.
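The endpoint procedure above can be sketched as follows. This is not the authors' code: the number of leading silence frames, the threshold rule (a fraction alpha of the way from the silence energy to the peak energy), the ZCR threshold and the minimum word length are all assumed parameters. Following the observation that noise has a higher zero-crossing rate than speech, a frame is marked as speech when its energy is above the energy threshold and its ZCR is below the ZCR threshold, which yields a pulse train of the kind shown in Figure 5.

```python
def speech_pulse(energy, zcr, n_silence=10, alpha=0.1, beta=1.0):
    """Binary pulse train with 1 for frames judged to be speech.

    Assumed parameters (not from the paper): the first n_silence frames are
    silence; the energy threshold lies a fraction alpha of the way from the mean
    silence energy to the peak energy; the ZCR threshold is beta times the mean
    silence ZCR.
    """
    e_sil = sum(energy[:n_silence]) / n_silence
    z_sil = sum(zcr[:n_silence]) / n_silence
    e_thr = e_sil + alpha * (max(energy) - e_sil)
    z_thr = beta * z_sil
    # speech: high energy and a lower zero-crossing rate than the noise
    return [1 if e > e_thr and z < z_thr else 0 for e, z in zip(energy, zcr)]


def find_word(pulse, min_len=5):
    """(first, last) frame of the longest run of 1s lasting at least min_len frames, else None."""
    best, start = None, None
    for i, p in enumerate(pulse + [0]):  # the appended 0 closes a run that reaches the end
        if p and start is None:
            start = i
        elif not p and start is not None:
            length = i - start
            if length >= min_len and (best is None or length > best[1] - best[0] + 1):
                best = (start, i - 1)
            start = None
    return best
```

Runs shorter than min_len, such as clicks or brief noise bursts, are discarded; this is the role of the minimum word length.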
Example: the signal captured from the microphone, containing both background noise and speech, may look like this:

Figure 4 Signal of the word "tới" (forward).
After processing with the procedure above, the following pulse waveform is obtained:
Figure 1 General diagram of the speech recognition system (Module 1, Module 2, Module 3).
[Figure 5 panels: speech and noise; zero-crossing rate; short-time energy]
Figure 5 Pulse waveform after processing that combines the short-time energy and the zero-crossing rate.
Figure 5 shows that, once the minimum length of a word is specified, the word can be separated from the background noise. This completes the task of Module 1. Module 1 is a very important part of a speech recognition system, because it strongly affects the recognition result.
2.2 Implementing Module 2
At this point we have denoised speech samples. Module 2 extracts features from the speech samples obtained in Module 1. There are many feature-extraction methods, such as wavelets, LPC and MFCC... Here MFCC (feature extraction on the mel frequency scale) was chosen because it is fast to compute, highly reliable, and used very effectively in speech recognition programs around the world.
The MFCC algorithm is outlined below.
Figure 6 Process of computing the MFCC coefficients.
- Windowing
Classical spectral estimation methods are reliable only for stationary signals, i.e. signals whose characteristics do not change over time. For speech this holds only over short intervals, which is achieved by windowing a signal x(n) into a sequence of consecutive windows x_t(n), t = 1, 2, ..., T, called frames.
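Framing itself is only slicing. In the sketch below the frame length and the hop between frames are assumptions, since the paper does not state them; for example, 256 samples with a hop of 128 gives 50% overlap between consecutive frames.

```python
def frame_signal(x, frame_len, hop):
    """Split x(n) into frames x_t(n) of frame_len samples whose starts are hop samples apart.

    Trailing samples that do not fill a whole frame are dropped.
    """
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]
```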
In automatic recognition systems the most commonly used window is the Hamming window, whose impulse response is a raised cosine:
$$w(n) = \begin{cases} 0.54 - 0.46 \cos\left( \dfrac{2\pi n}{N-1} \right), & n = 0, \dots, N-1 \\ 0, & \text{otherwise} \end{cases}$$
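The Hamming window and its application to each frame x_t(n) can be sketched as follows (the names are hypothetical).

```python
import math


def hamming(N):
    """Hamming window w(n) = 0.54 - 0.46 cos(2 pi n / (N - 1)), n = 0..N-1, for N >= 2."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def window_frames(frames):
    """Multiply every frame x_t(n) sample by sample by a Hamming window of the same length."""
    return [[x * wn for x, wn in zip(frame, hamming(len(frame)))] for frame in frames]
```

The window tapers both ends of each frame to 0.08, which reduces the spectral leakage caused by cutting the signal abruptly.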


- Spectral analysis
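If the frequencies are sampled at equally spaced points, i.e. $\omega_k = 2\pi k / N$, the discrete Fourier transform (DFT) of each frame of the signal is:
$$X_t(k) = X_t\left(e^{j 2\pi k / N}\right) = \sum_{n=0}^{N-1} x_t(n)\, e^{-j 2\pi k n / N}, \qquad k = 0, \dots, N-1.$$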


Moreover, if the number of samples N is a power of 2 (N = 2^p, p an integer), using the FFT (Fast Fourier Transform) reduces the computational cost considerably, from on the order of N^2 to N log2 N operations.
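To illustrate the saving, the sketch below implements both the direct DFT and a recursive radix-2 FFT in plain Python and checks that they agree; in practice a library FFT would be used. It also returns the power spectrum |X_t(k)|^2 for k = 0..N/2, the non-redundant half for a real-valued frame. All function names are illustrative.

```python
import cmath


def dft(x):
    """Direct DFT X(k) = sum_n x(n) e^{-j 2 pi k n / N}; O(N^2) operations."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    if N % 2:
        raise ValueError("length must be a power of 2")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t
    return out


def power_spectrum(frame):
    """|X(k)|^2 for k = 0..N/2 of a real-valued frame."""
    X = fft(frame)
    return [abs(X[k]) ** 2 for k in range(len(frame) // 2 + 1)]
```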
- Filtering
Physiological studies show that human perception of the frequency content of speech does not follow a linear scale. Each tone has a frequency f measured in Hz. To describe accurately how the auditory system perceives frequency, a different scale was constructed: the mel scale. The mel scale is linear in frequency below 1000 Hz and logarithmic above 1000 Hz. The mapping between the physical frequency scale (Hz) and the perceptual mel scale is given by:
$$F_{mel} = \frac{1000}{\log_{10} 2} \log_{10}\left( 1 + \frac{F_{Hz}}{1000} \right)$$
or
$$F_{mel} = 2595 \log_{10}\left( 1 + \frac{F_{Hz}}{700} \right)$$ (2.3)
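Both expressions can be checked numerically. The sketch below (function names hypothetical) confirms that each maps 1000 Hz to about 1000 mel, using the widely used form 2595 log10(1 + F_Hz/700) for the second one. It also adds the inverse of that form, which a filter-bank design needs to place filter edges back on the Hz axis.

```python
import math


def hz_to_mel_log2(f_hz):
    """F_mel = (1000 / log10 2) * log10(1 + F_Hz / 1000)."""
    return 1000.0 / math.log10(2.0) * math.log10(1.0 + f_hz / 1000.0)


def hz_to_mel(f_hz):
    """F_mel = 2595 * log10(1 + F_Hz / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(f_mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)
```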
Spectral analysis reveals the characteristics of the speech signal that are produced by the shape of the vocal tract. The spectral features of speech are obtained by passing the signal through a set of filters. On the mel scale, one filter is used for each desired frequency component (Figure 7). Each filter has a triangular frequency response, and the filter spacing, or bandwidth, is set by a constant mel interval.
Figure 7 An example of a mel-scale filter bank.
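A triangular mel filter bank can be built by spacing the filter edges uniformly in mel and converting them back to Hz. The sketch below is one common construction, not necessarily the one drawn in Figure 7: the number of filters, FFT size and sampling rate are assumptions, the 2595/700 mel formula is used, and the filters are applied to the power spectrum |X_t(k)|^2.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, fs, f_low=0.0, f_high=None):
    """Triangular filters with edges equally spaced on the mel scale.

    Returns n_filters rows of n_fft // 2 + 1 weights, one weight per FFT bin.
    """
    f_high = fs / 2 if f_high is None else f_high
    m_lo, m_hi = hz_to_mel(f_low), hz_to_mel(f_high)
    # n_filters + 2 points give every triangle its left edge, centre and right edge
    mel_pts = [m_lo + i * (m_hi - m_lo) / (n_filters + 1) for i in range(n_filters + 2)]
    hz_pts = [mel_to_hz(m) for m in mel_pts]
    bin_freqs = [k * fs / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = hz_pts[j - 1], hz_pts[j], hz_pts[j + 1]
        row = []
        for f in bin_freqs:
            if left <= f <= centre:
                row.append((f - left) / (centre - left))    # rising edge
            elif centre < f <= right:
                row.append((right - f) / (right - centre))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def filterbank_energies(power, bank):
    """Weighted sum of the power spectrum under each triangular filter."""
    return [sum(p * w for p, w in zip(power, row)) for row in bank]
```

Because the edges are uniform in mel, the filters are narrow at low frequencies and widen as frequency increases, mimicking the ear's resolution.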
- Log energy (LOG)
The preceding steps smooth the spectrum, performing processing similar to that of the human ear. This step computes the logarithm of the squared magnitude of the filter-bank output coefficients. Note that the human ear performs magnitude and logarithmic processing very well. Moreover, taking the magnitude discards unnecessary information (the phase), while the logarithm compresses the dynamic range, making the extracted features less sensitive to dynamic variations.
- Mel-frequency cepstrum
The final step in computing the mel-frequency cepstral coefficients (MFCC) is to take the inverse DFT of the logarithm of the filter-bank output magnitudes. Because the log power spectrum is real and symmetric, the inverse DFT reduces to the discrete cosine transform (DCT). The DCT produces features that are largely decorrelated from one another. It also smooths the spectrum when only the first coefficients are kept. In speech recognition the number of MFCC coefficients is usually less than 15 [6].
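The log and DCT steps together can be sketched as below, under these assumptions: the filter-bank energies already come from the power spectrum (so the squaring is done), the natural logarithm is used, the DCT is the unnormalized DCT-II, and 13 coefficients are kept, which is consistent with "fewer than 15". The floor value is a hypothetical guard against log(0) on silent frames.

```python
import math


def log_energies(fb_energies, floor=1e-10):
    """Natural log of each filter-bank output, floored to avoid log(0)."""
    return [math.log(max(e, floor)) for e in fb_energies]


def dct_ii(x, n_coeffs):
    """c_i = sum_{j=0}^{M-1} x_j cos(pi * i * (j + 0.5) / M), i = 0..n_coeffs-1 (unnormalized DCT-II)."""
    M = len(x)
    return [sum(x[j] * math.cos(math.pi * i * (j + 0.5) / M) for j in range(M))
            for i in range(n_coeffs)]


def mfcc_from_filterbank(fb_energies, n_coeffs=13):
    """MFCC vector of one frame: DCT of the log filter-bank energies, keeping the first n_coeffs."""
    return dct_ii(log_energies(fb_energies), n_coeffs)
```

A flat log spectrum produces only a nonzero c_0, which shows how the low-order coefficients describe the smooth spectral envelope.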
After feature extraction, each spoken word is characterized by a matrix of real coefficients (one MFCC vector per frame). Because a discrete HMM is used for recognition, these feature vectors must be vector-quantized (VQ) into discrete codebook indices. A popular algorithm for codebook design is LBG (Linde, Buzo and Gray).

Figure 8 Vector quantization (VQ) in recognition.
Vector quantization is performed with the K-means method.
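A sketch of codebook design and quantization follows. LBG grows the codebook by splitting every codeword in two and then refining with K-means iterations, so it includes the K-means step mentioned above. This sketch assumes squared Euclidean distance and additive ±eps splitting (classic LBG scales each codeword by 1±eps instead; the additive form also works when a centroid is zero), and the codebook size must be a power of 2. All names and parameters are illustrative.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], v))


def kmeans_refine(codebook, data, n_iter=20):
    """K-means iterations: assign vectors to the nearest codeword, move codewords to centroids."""
    for _ in range(n_iter):
        clusters = [[] for _ in codebook]
        for v in data:
            clusters[nearest(codebook, v)].append(v)
        codebook = [
            [sum(col) / len(c) for col in zip(*c)] if c else cw  # empty cell: keep old codeword
            for c, cw in zip(clusters, codebook)
        ]
    return codebook


def lbg(data, size, eps=0.01, n_iter=20):
    """LBG: start from the global centroid, split every codeword, refine with K-means."""
    codebook = [[sum(col) / len(data) for col in zip(*data)]]
    while len(codebook) < size:
        codebook = [[x + s for x in cw] for cw in codebook for s in (eps, -eps)]
        codebook = kmeans_refine(codebook, data, n_iter)
    return codebook


def quantize(codebook, vectors):
    """Map each feature vector to its codebook index, the discrete HMM observation symbol."""
    return [nearest(codebook, v) for v in vectors]
```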
2.3 Implementing Module 3
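After the two modules above are complete, we have a database of feature vectors for each word. In this module we build a hidden Markov model whose training data are the feature vectors obtained from Module 2. The HMM training and recognition scheme is shown in Figure 9 for a vocabulary of 3 words: tới (forward), lui (backward) and trái (left).
[Figure 9 panels: Training, in which the training samples of tới, lui and trái go through parameter estimation to give the models λ_tới, λ_lui, λ_trái; Recognition, in which an observation sequence O is scored by P(O|λ_tới), P(O|λ_lui), P(O|λ_trái)]
Figure 9 HMM scheme.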
For each word to be recognized we have a database of features from different utterances (3 recordings in the diagram above). We then estimate the model parameters λ = (A, B, π) so that the probability P(O|λ) is maximized, giving one λ for each word. To recognize a word, we simply compute the probability of its observation sequence under each trained model and choose the model with the highest probability.
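The recognition rule can be sketched with the forward algorithm for a discrete HMM λ = (A, B, π). This is not the authors' code: it scales α at every step so that long observation sequences do not underflow and returns log P(O|λ). Training by Baum-Welch re-estimation is not shown, and the dictionary layout of the models is an assumption.

```python
import math


def log_likelihood(O, A, B, pi):
    """log P(O | lambda) for a discrete HMM, via the scaled forward algorithm.

    A[i][j]: transition probability i -> j; B[j][k]: probability of symbol k in state j;
    pi[i]: initial probability of state i; O: list of codebook indices.
    """
    N = len(pi)
    alpha = [pi[i] * B[i][O[0]] for i in range(N)]
    log_p = 0.0
    for t in range(len(O)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][O[t]] for j in range(N)]
        c = sum(alpha)
        if c == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        log_p += math.log(c)  # P(O) is the product of the scale factors
        alpha = [a / c for a in alpha]
    return log_p


def recognize(O, models):
    """Return the word whose model (A, B, pi) gives the largest P(O | lambda_word)."""
    return max(models, key=lambda w: log_likelihood(O, *models[w]))
```

With one trained model per vocabulary word, recognize(O, models) returns the word with the highest likelihood, matching the decision rule above.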
Based on the literature and on information about successfully built recognition systems, we found that for speech recognition the HMM is usually chosen to be a left-right model with 5 to 6 states. In our experiments the 6-state model gave better results, so the authors built an HMM with 6 states for their program (see Figure 10).
Figure 10 Left-right HMM with 6 states.
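A possible initial model for Figure 10 is sketched below, assuming each state can only stay or move to the next state (no skip transitions), a uniform emission matrix over a hypothetical 64-symbol codebook, and a self-loop probability of 0.5. Training would then re-estimate these values; transitions that start at zero stay zero under Baum-Welch re-estimation, which is what keeps the model left-right.

```python
def left_right_hmm(n_states=6, n_symbols=64, p_stay=0.5):
    """Initial (A, B, pi) for a left-right HMM in which each state stays or advances by one."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        A[i][i] = p_stay
        A[i][i + 1] = 1.0 - p_stay
    A[-1][-1] = 1.0  # the final state is absorbing
    B = [[1.0 / n_symbols] * n_symbols for _ in range(n_states)]  # uniform emissions
    pi = [1.0] + [0.0] * (n_states - 1)  # always start in the first state
    return A, B, pi
```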
3 VOICE-CONTROLLED CAR SYSTEM MODEL
The diagram of the radio-controlled car, driven by voice commands from a computer, is shown in Figure 11.

The radio car can be controlled remotely by voice from the computer. The spoken command word is captured and recognized by the speech recognizer, which passes the recognized word sequence to the decision block; the decision block then issues a control command through the COM port. An interface circuit connected to the computer's serial port (RS232) was designed for control. It receives the command and opens or closes switches that act as the keys of the remote controller. Whenever a switch is closed or a key combination is pressed, the remote controller encodes it accordingly and sends it to the transmitting antenna. The control signal is modulated and transmitted to the car by radio on a carrier frequency $F_C = 27$ MHz, and the controller on the car then drives the car. The model works well with a four-word vocabulary, phải (right), trái (left), tới (forward) and lui (backward), with good results (99%).
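The decision block can be sketched as a lookup from the recognized word to a switch of the remote controller. Everything here is hypothetical: the paper does not specify the data sent over RS232, so this assumes one byte per command with bit n-1 closing switch SWn, and that SW1 to SW4 correspond to phải, trái, tới and lui in the order drawn in Figure 11. Words are NFC-normalized so that decomposed Unicode input from the recognizer still matches.

```python
import unicodedata

# Hypothetical mapping: word -> n of switch SWn in Figure 11 (assumed order).
_SWITCHES = {"phải": 1, "trái": 2, "tới": 3, "lui": 4}
SWITCH_OF_WORD = {unicodedata.normalize("NFC", w): n for w, n in _SWITCHES.items()}


def command_byte(word):
    """Byte to send over RS232: bit n-1 set closes switch SWn; 0 means all switches open."""
    n = SWITCH_OF_WORD.get(unicodedata.normalize("NFC", word))
    return 0 if n is None else 1 << (n - 1)
```

Sending the byte would then be a single write on an open serial port, for example `serial.Serial("COM1", 9600).write(bytes([command_byte(word)]))` with the third-party pyserial package; the port name and baud rate are placeholders.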
4 CONCLUSION
The experimental Vietnamese speech recognition model that combines MFCC and HMM still has many limitations, but it meets the objectives of the project. The program has been used to control a robot with a small vocabulary (fewer than 16 words) with acceptable accuracy (above 90%). Next, the authors will optimize the recognition program to achieve higher accuracy and faster processing.
REFERENCES
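1. Phạm Văn Ất, Kỹ thuật lập trình C [C programming techniques], Science and Technics Publishing House, 1999.
2. Nguyễn Hoàng Hải, Nguyễn Khắc Kiểm, Lập trình Matlab [Matlab programming], Science and Technics Publishing House, 2003.
3. Nguyễn Hữu Phương, Xử lý tín hiệu số [Digital signal processing], Transport Publishing House, 2000.
4. Lê Tiến Thường, Xử lý tín hiệu số và wavelets [Digital signal processing and wavelets], Vietnam National University Ho Chi Minh City Press, 2002.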














5. Claudio Becchetti and Lucio Prina Ricotti, Speech Recognition: Theory and C++ Implementation, John Wiley & Sons, Ltd., 2000.
6. Gordon E. Pelton, Voice Processing, McGraw-Hill, 1992.
7. John R. Deller, John G. Proakis and John H. L. Hansen, Discrete-Time Processing of Speech Signals, Macmillan Publishing Company, 1993.
8. F. J. Owens, Signal Processing of Speech, Macmillan, 1993.

[Figure 11 blocks: remote controller with switches SW1 to SW4 (phải, trái, tới, lui) and transmitting antenna; receiving antenna and on-car controller (phải, trái, tới, lui)]
Figure 11 Overview diagram of the test system.
