$$E_m = \sum_{n=-\infty}^{\infty} \left[ s(n)\, w(m-n) \right]^2$$ (2.1) [4-6]
The short-time energy function of a signal segment is plotted in Figure 3.
Figure 3. Signal (a) and short-time energy (b). [Panel (a): "Signal", amplitude versus time (s). Panel (b): "Short-Time Energy" versus time (frame).]
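As a minimal sketch of the short-time energy computation, the following assumes a rectangular window, so that each frame's energy is simply the sum of its squared samples. The function name `short_time_energy`, the frame length, and the hop size are illustrative assumptions, not values taken from the paper.

```python
def short_time_energy(signal, frame_len, hop):
    """Energy of each frame: sum of squared samples under a rectangular window."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies

# Toy signal (made up): silence, a short burst, silence again.
toy = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 4
print(short_time_energy(toy, frame_len=4, hop=4))  # [0.0, 1.0, 0.0]
```

The energy rises only in the frame that contains the burst, which is the behaviour panel (b) of Figure 3 illustrates for real speech.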
The zero-crossing rate is a parameter that gives the number of times the signal amplitude crosses zero within a given time interval. It is defined by:
$$Z_s(m) = \frac{1}{N} \sum_{n=m-N+1}^{m} \frac{\left| \operatorname{sgn}\{s(n)\} - \operatorname{sgn}\{s(n-1)\} \right|}{2}\, w(m-n)$$ (2.2)
where N is the length of the window w(m-n).
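The zero-crossing rate can be sketched as follows, assuming a rectangular window of length N and the convention that a zero sample counts as positive. Both choices, and the names `sgn` and `zero_crossing_rate`, are assumptions made for illustration.

```python
def sgn(x):
    # Sign convention assumed here: zero counts as positive.
    return 1 if x >= 0 else -1

def zero_crossing_rate(signal, m, N):
    """Z_s(m) over the N samples ending at index m, rectangular window."""
    total = 0.0
    for n in range(m - N + 1, m + 1):
        if n < 1:
            continue  # s(n-1) is not available before the first sample
        total += abs(sgn(signal[n]) - sgn(signal[n - 1])) / 2
    return total / N

s = [1, -1, 1, -1, 1, 1, 1, 1]  # made-up samples
print(zero_crossing_rate(s, m=4, N=4))  # 1.0  (a sign change at every sample)
print(zero_crossing_rate(s, m=7, N=4))  # 0.25 (one sign change in four samples)
```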
Many endpoint detection algorithms rely on the magnitude of the short-time energy and on the zero-crossing rate to locate the endpoints as accurately as possible. The basic procedure is as follows. A short sample of the background noise is taken during the silence interval just before the start of the speech signal. From this sample, a speech threshold is determined based on the silence energy and the peak energy. Initially, the endpoints are placed where the signal energy crosses this threshold; the distance between the two points is then computed to check whether it satisfies the length of a word. The same procedure is applied to the zero-crossing rate.
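The procedure above can be sketched on a sequence of frame energies. The paper does not state how the threshold is derived from the silence and peak energies, so the rule used here (a fraction `alpha` of the way from the silence level to the peak) is an assumption, as are the function name `detect_word` and all numeric values.

```python
def detect_word(energies, n_silence, min_len, alpha=0.1):
    """Return (start, end) frame indices of the first run long enough to be a word."""
    silence = sum(energies[:n_silence]) / n_silence  # noise estimate from leading frames
    peak = max(energies)
    # Assumed rule: threshold sits a fraction alpha of the way from silence to peak.
    threshold = silence + alpha * (peak - silence)
    start = None
    for i, e in enumerate(energies + [float("-inf")]):  # sentinel closes a trailing run
        if e > threshold and start is None:
            start = i
        elif e <= threshold and start is not None:
            if i - start >= min_len:
                return start, i - 1
            start = None  # too short to be a word: treat it as a noise spike
    return None

# Made-up frame energies: a one-frame spike, then a four-frame word.
frames = [0.1, 0.1, 0.1, 2.0, 0.1, 0.1, 3.0, 4.0, 3.5, 2.5, 0.1, 0.1]
print(detect_word(frames, n_silence=3, min_len=3))  # (6, 9)
```

The isolated spike at frame 3 crosses the threshold but is rejected by the minimum-length check, which is the role the word-length test plays in the text.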
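Example: the signal captured from the microphone, which contains both background noise and speech, has the following waveform:
Figure 4. Signal of the word "tới" (forward).
After processing with the procedure above, we obtain the following pulse-shaped plot: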
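Figure 1. General block diagram of the speech recognition system. [Blocks: Module 1, Module 2, Module 3.]
Figure 5. Pulse waveform after combined processing with the short-time energy function and the zero-crossing rate. [Plot labels: speech, noise, zero-crossing rate, short-time energy function.]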
Figure 5 shows that it is enough to fix the minimum length of a word in order to separate the word from the background noise. At this point Module 1 has completed its task. This is a very important part of a speech recognition system, since it strongly affects the recognition result.
2.2 Implementation of Module 2
At this point we have speech samples from which the noise has been removed. Module 2 extracts features from the speech samples obtained in Module 1. There are many feature extraction methods, such as wavelets, LPC, and MFCC. Here the MFCC method (feature extraction on the Mel frequency scale) is chosen because of its high computation speed, its high reliability, and its very effective use in speech recognition programs around the world.
The flow of the MFCC algorithm is as follows:
Figure 6. Computation of the MFCC coefficients.
• Signal windowing
Classical spectral estimation methods are reliable only for stationary signals, that is, signals whose characteristics do not change with time. For speech this holds only over a short time interval, which is handled by "windowing" a signal $x(n)$ into a sequence of successive windows $x_t(n)$, $t = 1, 2, \ldots, T$, called frames.
In automatic recognition systems the most commonly used window is the Hamming window, whose impulse response is a raised cosine:
$$w(n) = \begin{cases} 0.54 - 0.46 \cos\left( \dfrac{2\pi n}{N-1} \right), & n = 0, \ldots, N-1 \\ 0, & \text{otherwise} \end{cases}$$
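A minimal sketch of framing and Hamming windowing is given below. The helper names `hamming` and `frames` and the hop size are assumptions for illustration; the paper does not give the frame length or overlap it used.

```python
import math

def hamming(N):
    """w(n) = 0.54 - 0.46 cos(2*pi*n/(N-1)) for n = 0..N-1 (requires N > 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def frames(x, N, hop):
    """Split x into frames x_t(n) of length N and apply the Hamming window to each."""
    w = hamming(N)
    return [[x[s + n] * w[n] for n in range(N)]
            for s in range(0, len(x) - N + 1, hop)]

w = hamming(5)
print([round(v, 2) for v in w])  # [0.08, 0.54, 1.0, 0.54, 0.08]
```

The window tapers each frame toward 0.08 at both ends, which reduces the spectral leakage caused by cutting the signal abruptly.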
• Spectral analysis
If the frequency values $\omega$ are equally spaced, that is, $\omega = 2\pi k / N$, then the discrete Fourier transform (DFT) of each frame of the signal is:
$$X_t(k) = X_t\left( e^{j 2\pi k / N} \right), \quad k = 0, \ldots, N-1.$$
In addition, if the number of samples N is a power of 2 ($N = 2^p$, with p an integer), the computational complexity is reduced considerably by using the FFT (Fast Fourier Transform).
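To illustrate the complexity remark, the sketch below compares a direct DFT, which costs on the order of N^2 operations, with a recursive radix-2 FFT, which costs on the order of N log N and requires N to be a power of 2. This is a textbook illustration written for this section, not the implementation used in the paper.

```python
import cmath

def dft(x):
    """Direct DFT: X(k) = sum_n x(n) e^{-j 2 pi k n / N}."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / N) * odd[k] for k in range(N // 2)]
    return ([even[k] + tw[k] for k in range(N // 2)] +
            [even[k] - tw[k] for k in range(N // 2)])

x = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]  # made-up frame, N = 8
print(max(abs(a - b) for a, b in zip(dft(x), fft(x))) < 1e-9)  # True
```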
• Filter processing
Physiological studies have shown that human perception of the frequency content of speech does not follow a linear scale. Each tone has a frequency f, measured in Hz. To describe accurately how the auditory system perceives frequency, a different scale was constructed: the Mel scale. The Mel frequency scale is linear below 1000 Hz and logarithmic above 1000 Hz. The mapping between the real (physical, Hz) frequency scale and the perceptual Mel frequency scale is given by:
$$F_{mel} = \frac{1000}{\log_{10} 2} \, \log_{10}\left( 1 + \frac{F_{Hz}}{1000} \right)$$
or, in its commonly used form,
$$F_{mel} = 2595 \, \log_{10}\left( 1 + \frac{F_{Hz}}{700} \right)$$ (2.3)
Spectral analysis reveals the characteristics of the speech signal that are produced by the shape of the vocal tract. The spectral features of the speech signal are obtained after passing it through a bank of filters. On the Mel frequency scale there is one filter for each desired frequency component (Figure 7). Each filter has a triangular frequency response, and the spacing and the bandwidth are determined by a constant Mel interval.
Figure 7. An example of a Mel-scale filter bank.
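The sketch below builds a triangular filter bank whose centre frequencies are equally spaced on the Mel axis, using the first form of the Hz-to-Mel mapping (the one with the factor 1000 / log10 2). The function names, the number of filters, the number of DFT bins, and the sample rate are all illustrative assumptions; the paper does not list the values it used.

```python
import math

def hz_to_mel(f):
    return 1000.0 / math.log10(2.0) * math.log10(1.0 + f / 1000.0)

def mel_to_hz(m):
    return 1000.0 * (2.0 ** (m / 1000.0) - 1.0)  # inverse of hz_to_mel

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters over DFT bins 0..n_bins-1, which span 0..sample_rate/2 Hz."""
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edges: filter i spans [edge i, edge i+2] and peaks at edge i+1.
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * (sample_rate / 2.0) / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising edge
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

print(round(hz_to_mel(1000.0), 6))   # 1000.0
bank = mel_filterbank(n_filters=4, n_bins=33, sample_rate=8000)
print(len(bank), len(bank[0]))       # 4 33
```

Because the edges are uniform in Mel, the filters are narrow at low frequencies and become wider as the frequency increases.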
• Log-energy computation (LOG)
The previous steps smooth the spectrum and perform processing similar to that of the human ear. This step computes the logarithm of the squared magnitude of the coefficients at the filter outputs. Note that the human ear performs magnitude and logarithmic processing very well. Moreover, magnitude processing discards unnecessary information, while logarithmic processing performs a dynamic compression, which makes the extracted features less sensitive to dynamic variations.
• Mel-frequency cepstrum computation
The final step in computing the Mel-frequency cepstral coefficients (MFCC) is to apply an inverse DFT to the log magnitude of the filter outputs. Note that because the log power spectrum is real and symmetric, the inverse DFT reduces to a discrete cosine transform (DCT). A property of the DCT is that it produces largely uncorrelated features. The DCT also smooths the spectrum if only the first coefficients are kept. In speech recognition the number of MFCC coefficients is usually smaller than 15. [6]
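The last two steps, the logarithm followed by the DCT, can be sketched as below. The unnormalised DCT-II used here is one common choice and is an assumption, as are the function name `mfcc_from_filterbank` and the small floor that guards against log(0).

```python
import math

def mfcc_from_filterbank(energies, n_coeffs):
    """Log of the filter-bank energies followed by a DCT-II; keeps the first n_coeffs."""
    logs = [math.log(max(e, 1e-12)) for e in energies]  # floor avoids log(0)
    M = len(logs)
    return [sum(logs[j] * math.cos(math.pi * i * (j + 0.5) / M) for j in range(M))
            for i in range(n_coeffs)]

# Made-up input: a perfectly flat spectrum whose log energy is 1 in every band.
flat = [math.e] * 4
print([round(c, 6) + 0.0 for c in mfcc_from_filterbank(flat, 3)])  # [4.0, 0.0, 0.0]
```

A flat spectrum puts everything into the zeroth coefficient; the higher coefficients capture how the log spectrum varies across bands, and keeping only the first few of them gives the smoothing effect mentioned above.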
After feature extraction, each spoken word is characterised by a matrix of real coefficients. Because a discrete HMM is used for recognition, these feature vectors must be vector-quantised (VQ) into discrete codebook indices. A popular algorithm for designing the codebook is LBG (Linde, Buzo, and Gray).
Figure 8. Vector quantisation (VQ) in recognition.
The method used here for vector quantisation is K-means.
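A minimal K-means sketch for building a codebook and mapping each feature vector to a discrete index is shown below. The initial codewords are passed in explicitly to keep the example deterministic; the data, the codebook size, and the function names are made up for illustration.

```python
def nearest(codebook, v):
    """Index of the codeword closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)))

def kmeans(vectors, init, n_iter=10):
    """Plain K-means; init is the list of starting codewords."""
    codebook = [list(c) for c in init]
    for _ in range(n_iter):
        clusters = [[] for _ in codebook]
        for v in vectors:
            clusters[nearest(codebook, v)].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old codeword if a cluster is empty
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]  # toy 2-D "features"
cb = kmeans(data, init=[data[0], data[2]])
print(cb)                              # [[0.0, 0.5], [10.0, 10.5]]
print([nearest(cb, v) for v in data])  # [0, 0, 1, 1]
```

The second printed line is the kind of discrete observation sequence that a discrete HMM takes as input.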
2.3 Implementation of Module 3
After the two modules above have run, we have a database of feature vectors for each word. In this module we build a hidden Markov model whose training data are the feature vectors obtained from Module 2.
The training and recognition scheme based on the HMM is shown in Figure 9 for a vocabulary of three words: tới (forward), lui (backward), trái (left).
Figure 9. Diagram of the HMM model. [Panels: training; recognition.]
For each word to be recognised we have a database of features from several utterances (three recordings per word in the diagram). We then estimate the model parameters $\lambda = (A, B, \pi)$ so that the probability $P(O|\lambda)$ is maximised; each word has its own $\lambda$. To recognise a word, we simply compute the probability of its observation sequence under each trained $\lambda$ and choose the model with the highest probability.
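The recognition step can be sketched with the forward algorithm, which computes the probability of an observation sequence under a discrete HMM, followed by a maximum over the word models. Training (parameter estimation) is not shown. The two-state models and their numbers are invented for illustration and are not the trained parameters from the paper.

```python
def forward_prob(A, B, pi, obs):
    """P(O | lambda) for a discrete HMM lambda = (A, B, pi), forward algorithm."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

def recognise(models, obs):
    """Choose the word whose model gives the observation sequence the highest probability."""
    return max(models, key=lambda word: forward_prob(*models[word], obs))

# Hypothetical toy models over a two-symbol codebook {0, 1}.
A = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "tới": (A, [[0.9, 0.1], [0.8, 0.2]], [1.0, 0.0]),  # mostly emits symbol 0
    "lui": (A, [[0.1, 0.9], [0.2, 0.8]], [1.0, 0.0]),  # mostly emits symbol 1
}
print(recognise(models, [0, 0, 0]))  # tới
print(recognise(models, [1, 1, 1]))  # lui
```

For long observation sequences the forward variables become very small, so a practical implementation would scale them or work with logarithms.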
Based on the references and on reports of successfully built recognition systems, we found that for speech recognition the HMM is usually chosen to be a left-right model with 5 to 6 states. In our experiments the model with 6 states gave better results, so the authors built an HMM with 6 states in their program; see Figure 10.
Figure 10. Left-right HMM with 6 states.
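A left-right topology corresponds to a transition matrix with zeros below the diagonal. The sketch below assumes that each state can only stay where it is or move to the next state, with no skip transitions, and uses a placeholder self-loop probability; the exact topology of Figure 10 and the trained probabilities are not recoverable from the text.

```python
def left_right_transitions(n_states, p_stay=0.5):
    """Transition matrix in which state i can only stay or move to state i + 1."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        A[i][i] = p_stay
        A[i][i + 1] = 1.0 - p_stay
    A[-1][-1] = 1.0  # the final state is absorbing
    return A

A6 = left_right_transitions(6)
pi6 = [1.0] + [0.0] * 5  # the model always starts in the first state
for row in A6:
    print(row)
```

Such a matrix would serve only as an initial guess; re-estimation keeps the zero entries at zero, so the left-right structure is preserved during training.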
3 MODEL OF THE CONTROLLED VEHICLE SYSTEM
The diagram of the radio-controlled vehicle operated by voice from a computer is shown in Figure 11.
[Figure 9 labels: training samples for "tới", "lui", "trái"; parameter estimation; $P(O|\lambda_{tới})$, $P(O|\lambda_{lui})$, $P(O|\lambda_{trái})$.]
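The radio-controlled vehicle can be operated remotely by voice from a computer. The spoken command word is recorded and recognised by the speech recogniser, and the recognised word sequence is passed to a decision unit, which issues the control command through the COM port. An interface circuit connected to the computer through the serial port (RS232) was designed for this purpose. The interface circuit receives the signal and opens or closes switches, converting it into the signals of the remote control unit. Whenever a switch is closed or a key combination is pressed, the remote control unit encodes it appropriately and sends it to the transmitting antenna. The control signal is modulated and transmitted to the vehicle by radio with a carrier frequency $F_C$ = 27 MHz. The controller on the vehicle then drives the vehicle. The model works well with a vocabulary of four words, phải (right), trái (left), tới (forward), and lui (backward), with good results (99%).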
4 CONCLUSION
Although the experimental model for Vietnamese speech recognition based on the combination of MFCC and HMM still has many limitations, it meets the objectives of the project. The program has been used to control a robot with a small vocabulary (fewer than 16 words) and gives acceptable accuracy (above 90%). In the near future the authors will optimise the recognition program to achieve better results and a higher processing speed.
REFERENCES
1. Phạm Văn Ất, Kỹ thuật lập trình C, Nhà xuất bản Khoa học và Kỹ thuật, 1999.
2. Nguyễn Hoàng Hải, Nguyễn Khắc Kiểm, Lập trình Matlab, Nhà xuất bản Khoa học và Kỹ thuật, 2003.
3. Nguyễn Hữu Phương, Xử lý tín hiệu số, Nhà xuất bản Giao thông vận tải, 2000.
4. Lê Tiến Thường, Xử lý tín hiệu số và wavelets, Nhà xuất bản Đại học Quốc gia TP. Hồ Chí Minh, 2002.
5. Claudio Becchetti and Lucio Prina Ricotti, Speech Recognition: Theory and C++ Implementation, John Wiley & Sons, Ltd, 2000.
6. Gordon E. Pelton, Voice Processing, McGraw-Hill, 1992.
7. John R. Deller, John G. Proakis, and John H. L. Hansen, Discrete-Time Processing of Speech Signals, Macmillan Publishing Company, 1993.
8. F. J. Owens, Signal Processing of Speech, Macmillan, 1993.
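Figure 11. Overview diagram of the experimental system. [Blocks: remote control unit with switches SW1 to SW4 (phải, trái, tới, lui) and transmitting antenna; on-vehicle controller with receiving antenna and outputs phải, trái, tới, lui.]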