(SPEAKER RECOGNITION)
1 Problem description
Features of the voice:
fundamental frequency: describes how high or low the voice is (f1, f2, f3)
MFCC (mel-frequency cepstral coefficients): describe the shape of the oral cavity.
Speaker recognition here is based on MFCC features and the Gaussian method.
2 Method description
The method consists of 3 main steps:
Step 1: Prepare the data
To recognize n speakers, create n directories, each holding the .wav recordings of one
speaker's voice (30 s per file). If a speaker has several .wav files, concatenate all of
that speaker's data into a single file.
This gives n files for n speakers (f1, f2, ..., fn). This is the training data set.
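The concatenation in Step 1 can be sketched as follows. This is a minimal illustration using Python's standard `wave` module, not the report's own tooling; the function name `concatenate_wavs` is hypothetical, and the sketch assumes all recordings of one speaker share the same sample rate, channel count, and sample width.

```python
import wave


def concatenate_wavs(input_paths, output_path):
    """Join several .wav files with the same audio format into one file."""
    params = None
    chunks = []
    for path in input_paths:
        with wave.open(path, "rb") as wav_in:
            current = wav_in.getparams()
            fmt = (current.nchannels, current.sampwidth, current.framerate)
            if params is None:
                params = current
            elif fmt != (params.nchannels, params.sampwidth, params.framerate):
                # Assumption: files are not resampled here, so formats must match.
                raise ValueError(f"{path} has a different audio format")
            chunks.append(wav_in.readframes(current.nframes))
    if params is None:
        raise ValueError("no input files given")
    with wave.open(output_path, "wb") as wav_out:
        wav_out.setnchannels(params.nchannels)
        wav_out.setsampwidth(params.sampwidth)
        wav_out.setframerate(params.framerate)
        # writeframes also fixes up the frame count in the header.
        wav_out.writeframes(b"".join(chunks))
```

Calling `concatenate_wavs(["a.wav", "b.wav"], "f1.wav")` (hypothetical file names) would produce the single training file for the first speaker.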
Step 2: Train the machine to recognize each speaker's voice.
Build n models, one per speaker, which amounts to building n multivariate Gaussian
functions.
Algorithm for building the n multivariate Gaussian functions:
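for i = 1 to n // after each iteration we have the Gaussian function of one speaker
{
ai = wavread('fi.wav'); // read the training recording of speaker i
bi = mfcc(ai); // extract the MFCC features
// Gi: Gaussian with mean μi and covariance Σi estimated from bi
}
end for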
After the loop we have n Gaussian functions (G1, G2, ..., Gn).
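The parameter estimation behind each Gaussian can be sketched in Python as below. Two simplifications are assumptions of this sketch and not stated in the report: the MFCC vectors are taken as already computed (plain lists of floats), and the covariance is restricted to its diagonal so that no matrix library is needed. The names `fit_diagonal_gaussian` and `train_models` and the variance floor are hypothetical.

```python
def fit_diagonal_gaussian(features, variance_floor=1e-6):
    """Estimate the per-dimension mean and variance of a list of feature vectors."""
    count = len(features)
    dim = len(features[0])
    mean = [sum(vec[d] for vec in features) / count for d in range(dim)]
    # Maximum-likelihood variance (divide by count), floored to avoid zeros.
    var = [
        max(sum((vec[d] - mean[d]) ** 2 for vec in features) / count, variance_floor)
        for d in range(dim)
    ]
    return mean, var


def train_models(features_by_speaker):
    """Build one (mean, variance) model per speaker, mirroring the loop over i."""
    return {
        speaker: fit_diagonal_gaussian(features)
        for speaker, features in features_by_speaker.items()
    }


# Toy two-dimensional "features" for two speakers (made-up numbers).
models = train_models({
    1: [[1.0, 2.0], [3.0, 6.0]],
    2: [[10.0, 0.0], [12.0, 0.0]],
})
```

A full-covariance model, as the phrase "multivariate Gaussian" suggests, would replace the variance list with a covariance matrix; the training loop itself would stay the same.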
Step 3: Recognize the speaker
Prepare the test data: a set of test files (x1, x2, ..., xz).
Test each file xi in turn. For each xi:
Compute bxi = mfcc(xi)
Compute n values p from the n Gaussian functions trained in Step 2:
p1 = G1(bxi; μ1, Σ1)
p2 = G2(bxi; μ2, Σ2)
...
pn = Gn(bxi; μn, Σn)
Bayes decision: file xi is attributed to speaker i if pi is the largest value.
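The scoring and decision rule of Step 3 can be sketched as follows. As assumptions of this sketch, each model is a diagonal-covariance Gaussian given as a (mean, variance) pair, the score is the average log density per frame (the logarithm does not change which score is largest), and the feature vectors are made-up numbers. The function names are hypothetical.

```python
import math


def log_likelihood(frames, mean, var):
    """Average per-frame log density of the frames under a diagonal Gaussian."""
    total = 0.0
    for vec in frames:
        for x, m, v in zip(vec, mean, var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)


def recognize(frames, models):
    """Pick the speaker whose model gives the largest score."""
    scores = {
        speaker: log_likelihood(frames, mean, var)
        for speaker, (mean, var) in models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Two toy speaker models and one toy test recording close to speaker 2.
models = {
    1: ([0.0, 0.0], [1.0, 1.0]),
    2: ([5.0, 5.0], [1.0, 1.0]),
}
test_frames = [[4.8, 5.1], [5.2, 4.9]]
best, scores = recognize(test_frames, models)
```

Choosing the largest likelihood is the Bayes decision when all speakers are considered equally likely beforehand.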
3 Results
3.1 Statistical method (how the recognition results are tallied)
Test file: train/s1.wav - Recognize: 1
Test file: train/s2.wav - Recognize: 1
Test file: train/s3.wav - Recognize: 1
Test file: train/s4.wav - Recognize: 4
Test file: train/s5.wav - Recognize: 1
Test file: train/s6.wav - Recognize: 1
Test file: train/s7.wav - Recognize: 1
Test file: train/s8.wav - Recognize: 4
Test file: train/s9.wav - Recognize: 10
Test file: train/s10.wav - Recognize: 10
Test file: train/s11.wav - Recognize: 10
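Given M Gaussian distributions p1, p2, ..., pM, the probability density function of the
model is the weighted sum of the M Gaussian distributions, according to the formula:
p(x) = w1 p1(x) + w2 p2(x) + ... + wM pM(x)
where:
- w1, ..., wM are the mixture weights, with w1 + w2 + ... + wM = 1.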
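The weighted sum of Gaussian densities can be sketched in Python as below. For brevity the sketch is one-dimensional, which is an assumption of the example; the speech features in this report are multi-dimensional. The names `gaussian_pdf` and `mixture_pdf` and the numeric values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    """Weighted sum of M Gaussian densities; the weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(
        w * gaussian_pdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    )


# A mixture of M = 2 components with made-up parameters.
density = mixture_pdf(0.5, weights=[0.3, 0.7], means=[0.0, 2.0], variances=[1.0, 0.5])
```

With M = 1 and weight 1 the mixture reduces to a single Gaussian, which is the model used in Steps 2 and 3.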
-15.6611 -16.8774 -28.7799 -28.4373
-14.9673 -13.9481 -14.4844 -29.7783 -29.9242
-15.2371 -14.1288 -13.0496 -28.0165 -28.3526
-31.7188 -30.9641 -31.9618 -9.9211 -9.5705
-36.5947 -33.8140 -36.5920 -10.2937 -8.6886
Result: 100% correct.
Using 5 training files and 5 test files; number of Gaussian components: 2
A =
-15.6719 -16.5151 -18.0927 -28.1566 -27.0804
-15.3359 -14.8170 -14.8154 -23.9762 -24.0877
-15.4261 -14.7310 -13.8210 -25.6009 -25.6359
-35.6308 -34.3877 -34.7713 -11.2017 -11.1764
-37.4841 -36.0404 -36.9911 -11.8098 -10.5282
Result: 60% correct.
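The 60% figure can be checked against the 25 values of A with a short Python sketch. The layout is an assumption: the values are arranged five per row in the order listed, row i holds the scores of test file i, column j belongs to model j, and test file i is taken to come from speaker i. The function name `accuracy_from_scores` is hypothetical.

```python
def accuracy_from_scores(scores):
    """Fraction of rows whose largest score lies on the diagonal."""
    correct = 0
    for row_index, row in enumerate(scores):
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == row_index:
            correct += 1
    return correct / len(scores)


# Score matrix A from the experiment with 2 Gaussian components.
A = [
    [-15.6719, -16.5151, -18.0927, -28.1566, -27.0804],
    [-15.3359, -14.8170, -14.8154, -23.9762, -24.0877],
    [-15.4261, -14.7310, -13.8210, -25.6009, -25.6359],
    [-35.6308, -34.3877, -34.7713, -11.2017, -11.1764],
    [-37.4841, -36.0404, -36.9911, -11.8098, -10.5282],
]

accuracy = accuracy_from_scores(A)  # 3 of 5 rows peak on the diagonal
```

Under this reading three of the five test files are assigned to the right speaker, which agrees with the reported 60%. Reading the same values column by column (the transposed matrix) also gives three correct decisions out of five.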
4.3 Comparison of results with the Gaussian method
GMM system
The GMM system that was built achieved 96.8% classification accuracy on a clean-speech
data set (audio recorded with a good-quality microphone) of 10 speakers, with each test
segment 8 seconds long; and 94.5% classification accuracy on a telephone-speech data set
of 5 speakers, with each test segment 8 seconds long.
Gaussian model
This model achieved 67.1% classification accuracy on a speech data set of 16 speakers,
with each test segment 5 seconds long.