HỒ CHÍ MINH
Student IDs: 11520094, 11520603
Class: KHTN2011
Course class: CS110.E11
Cohort: 2011
INTRODUCTION
Learning has always been central to how humans acquire knowledge. In computing, expert systems have not been able to handle every problem that needs solving, and keeping them up to date as circumstances change is very costly. The proposed solution is to let computers learn automatically and solve problems from real-world data. Machine learning is an important branch of artificial intelligence that studies methods and techniques allowing a computer to learn from data automatically in order to solve a specific problem.
In human knowledge acquisition, classification is a natural process that lets knowledge be received and stored in an organized way. Many classification methods have been studied and applied. Today, Support Vector Machines is one of the strongest and most effective methods for nonlinear classification problems. It grew out of work by Vapnik and Chervonenkis and was introduced in its soft-margin form by Cortes and Vapnik in 1995.
ACKNOWLEDGEMENTS
First, we would like to express our deepest gratitude to MSc. Nguyễn Đình Hiển, Faculty of Computer Science, University of Information Technology, VNU-HCM, who devotedly guided our group through the fundamental and essential knowledge needed to complete this project.
We also thank our classmates in CS110.E11 for the discussions and additional knowledge that helped us finish the project.
TABLE OF CONTENTS
LIST OF FIGURES
CHAPTER I. PROJECT OVERVIEW
1. General information
2. Division of work
CHAPTER II. BASIC CONCEPTS
1. The classification problem and the Support Vector Machines method
2. Linear classification
3. The Gram matrix
4. Feature space
5. Kernel functions
6. Chapter summary
CHAPTER III. THE SUPPORT VECTOR MACHINES METHOD
1. Idea
2. Theoretical basis
3. Two-class classification with SVM
4. Multi-class classification with SVM
5. Main steps of the SVM method
6. Comparison and some improvements
CHAPTER IV. APPLICATIONS
1. Some applications of SVM
2. Simulating the SVM method geometrically in Matlab
3. Building a spam filter program
CHAPTER V. CONCLUSION
APPENDIX
Trial run of the geometric SVM simulation in Matlab
REFERENCES
LIST OF FIGURES
Figure 2.1 Mapping from the data space X to the feature space F
Figure 2.2 Example of a kernel function
Figure 3.1 Illustration of the SVM method
Figure 3.2 Linearly separable data set
Figure 3.3 Linearly separable data set with noise
Figure 3.4 Data set that is not linearly separable
Figure 3.5 Example of a data set represented in 2-dimensional space
Figure 4.1 Facial expression classification
Figure 4.2 Image classification
Figure 4.3 Email preprocessing
Figure 4.4 Email information extraction
Figure 4.5 The data learning process
Figure 4.6 Sample words likely to indicate spam
Figure 4.7 Trial run of the program
CHAPTER I
PROJECT OVERVIEW
1. General information.
* Project title:
The Support Vector Machines Method: Theory and Applications.
* Supervisor:
MSc. Nguyễn Đình Hiển.
* Students:
Nguyễn Trí Hải, Nguyễn Hoàng Nghĩa.
* Cohort:
Cohort 2011 (enrolled September 2011).
* Student contact information:
No.  Name                 Student ID  Email
1    Nguyễn Trí Hải       11520094    11520094@gm.uit.edu.vn
2    Nguyễn Hoàng Nghĩa   11520603    11520603@gm.uit.edu.vn
* Programs and applications used:
- Programming software: Mathworks Matlab R2012b.
- Programming environment: Matlab.
2. Division of work.
Searching for and compiling materials (whole group):
- Planning the work needed for the project and finding materials: schedule; theory content, supporting programs and applications.
Carrying out the project:
- Nguyễn Trí Hải: researching and building the application content.
- Nguyễn Hoàng Nghĩa: compiling and building the theory content.
- Whole group: writing the report, presenting the slides, fixing errors.
CHAPTER II.
BASIC CONCEPTS
1. The classification problem and the Support Vector Machines method.
Classification is the process of grouping similar objects into a class based on the features of their data. When studying an object or phenomenon, we can rely only on a finite number of its features. In other words, we consider only a representation of the object or phenomenon in a finite-dimensional space, where each dimension corresponds to a selected feature.
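The classification problem can be stated as follows:
Given a sample set $D = \{(x_i, c_i) \mid x_i \in X;\ c_i \in C;\ i = 1, \ldots, N\}$,
find a mapping $f: X \to C$ such that $f(x_i) - c_i = 0$ for all $(x_i, c_i) \in D$, $i = 1, \ldots, N$.
Where:
N: the number of samples;
x_i: the i-th data sample;
c_i: the class of the i-th data sample.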
This mapping can be understood as a model, a set of rules that determines which class each object belongs to based on its features. In practice, choosing the features used for classification is affected by several factors that make classification harder: objects have many attributes, and one has to decide which attributes are necessary and which are not.
Classification with the Support Vector Machines (SVM) method aims to find a hyperplane with maximum margin between the negative and the positive samples, while minimizing the number of training samples that cannot be separated. SVM rests on a solid mathematical foundation. Training an SVM, however, requires solving an optimization problem in many variables. SVM was first developed for classification problems; later, thanks to its strengths, it also came to be widely used for regression problems.
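2. Linear classification.
Binary classification is performed using a real-valued function
$$f(x) = \langle w \cdot x \rangle + b = \left(\sum_{i=1}^{n} w_i x_i\right) + b$$
The input $x$ is assigned to the positive class if $f(x) > 0$, and to the negative class otherwise.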
3. The Gram matrix.
Given a set $\{x_1, x_2, \ldots, x_\ell\}$ of vectors in an inner product space X, the $\ell \times \ell$ matrix G with entries $G_{ij} = \langle x_i \cdot x_j \rangle$ is called the Gram matrix.
The important property of the Gram matrix is that all the input data needed by learning programs can be expressed entirely through it.
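As a small illustration of the definition above, the following Python sketch builds a Gram matrix from a list of vectors. The function names `dot` and `gram_matrix` and the three sample vectors are assumptions made for this example; they do not come from the project's Matlab code.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_matrix(vectors):
    """Return G with G[i][j] = <x_i . x_j> for every pair of vectors."""
    return [[dot(xi, xj) for xj in vectors] for xi in vectors]


# Hypothetical sample vectors, chosen only for illustration.
X = [(1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
G = gram_matrix(X)
for row in G:
    print(row)
```

The matrix is symmetric because the inner product is symmetric, so in practice only half of it needs to be computed.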
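4. Feature space.
The complexity of the target function, and therefore the difficulty of learning it, depends on how it is represented. When the data is represented appropriately, the learning problem becomes easier. A very common step in machine learning is therefore to transform the data from the input space X into a feature space:
$$x = (x_1, x_2, \ldots, x_n) \mapsto \phi(x) = (\phi_1(x), \ldots, \phi_N(x))$$
Here n is the dimension of the input (the number of attributes) and N is the dimension of the feature space. The data is mapped into a feature space with N > n.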
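For two-dimensional vectors $x = (x_1, x_2)$ and $z = (z_1, z_2)$:
$$\begin{aligned}
\langle x \cdot z \rangle^2 &= \left(\sum_{i=1}^{2} x_i z_i\right)^2 = (x_1 z_1 + x_2 z_2)^2 \\
&= x_1^2 z_1^2 + 2 x_1 z_1 x_2 z_2 + x_2^2 z_2^2 \\
&= (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2) \cdot (z_1^2, \sqrt{2}\, z_1 z_2, z_2^2) \\
&= \phi(x) \cdot \phi(z)
\end{aligned}$$
So choosing $K(x, z) = \langle x \cdot z \rangle^2$ gives $K(x, z) = \phi(x) \cdot \phi(z)$, and the mapping $\phi$ never has to be computed explicitly.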
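The identity above can be checked numerically. The sketch below, written in Python for illustration, compares the kernel value $\langle x \cdot z \rangle^2$ with the inner product of the explicit feature vectors. The names `phi` and `poly_kernel` and the two test points are assumptions of this example.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map for a 2-D input: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly_kernel(x, z):
    """K(x, z) = <x . z>^2, computed without forming phi."""
    return dot(x, z) ** 2


# Hypothetical test points.
x, z = (1.0, 2.0), (3.0, -1.0)
lhs = poly_kernel(x, z)       # works in the 2-D input space
rhs = dot(phi(x), phi(z))     # works in the 3-D feature space
print(lhs, rhs)
```

Both values agree up to floating-point rounding, while the kernel side needs only one inner product in the input space.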
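b) Kernel functions in linear learning machines:
A nonlinear learning machine on the input space is built in two steps: first, a nonlinear mapping transforms the data into a feature space; then a linear classifier is applied in that feature space.
A linear learning machine in the feature space corresponds to the function:
$$f(x) = \sum_{i=1}^{N} w_i \phi_i(x) + b$$
The weight vector $w$ does not need to be determined explicitly. Substituting $w = \sum_{i=1}^{\ell} \alpha_i y_i \phi(x_i)$, where $\ell$ is the number of training samples, gives:
$$f(x) = \sum_{i=1}^{\ell} \alpha_i y_i \, \phi(x_i) \cdot \phi(x) + b$$
$$f(x) = \sum_{i=1}^{\ell} \alpha_i y_i K(x_i, x) + b$$
Let SV be the set of indices $i$ for which $\alpha_i \neq 0$. Then:
$$f(x) = \sum_{i \in SV} \alpha_i y_i K(x_i, x) + b$$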
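To make the last formula concrete, here is a minimal Python sketch that evaluates $f(x)$ as a sum over support vectors only. The support vectors, labels, coefficients $\alpha_i$ and bias $b$ are hand-picked, hypothetical values and not the result of any training run; `decision_value` and `linear_kernel` are names invented for this example.

```python
def linear_kernel(x, z):
    """K(x, z) = <x . z>, the simplest possible kernel."""
    return sum(a * b for a, b in zip(x, z))


def decision_value(x, support_vectors, labels, alphas, b, kernel):
    """f(x) = sum over SV of alpha_i * y_i * K(x_i, x), plus b."""
    total = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return total + b


# Hypothetical values chosen by hand for illustration.
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
labels = [+1, -1]
alphas = [0.25, 0.25]
b = 0.0

value = decision_value((2.0, 0.5), support_vectors, labels, alphas, b, linear_kernel)
print(value)
```

Swapping `linear_kernel` for another kernel function changes the decision surface without changing `decision_value` itself, which is the point of writing the classifier in terms of $K$.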
K(x, z) = [expression not recoverable from the source], defined for $-1 < x \cdot z < 1$
CHAPTER III.
THE SUPPORT VECTOR MACHINES METHOD
In the current information technology era, technological progress has brought a huge increase in the volume of information stored and exchanged. Organizing, storing, and accessing information efficiently has therefore become a top priority. The approach taken is to organize, search, and classify information effectively. People themselves perceive the world around them by classifying and organizing what they remember. Classifying by classes, and describing those classes, gives knowledge a form in which it can be stored.
Support Vector Machines (SVM) is a classification method that originates from statistical learning theory and is based on the principle of Structural Risk Minimisation. SVM tries to classify the data so that the error on the test set is as small as possible (Test Error Minimisation). It is a relatively new method in artificial intelligence. When SVM first appeared, computing power was still very limited, so the method received little attention. Since 1995, however, algorithms for SVM have developed rapidly and, together with much greater computing power, have led to many important applications.
1. Idea.
Given a training set represented in a vector space, in which each document is a point, the method finds the best decision hyperplane f that divides the points of the space into two separate classes, the + class and the - class. The quality of the hyperplane is determined by the distance (called the margin) from the nearest data point of each class to the hyperplane. The larger the margin, the better the decision hyperplane and the more accurate the classification.
The idea is to map the data (linearly or nonlinearly) into a space of feature vectors in which an optimal hyperplane is found that separates the data of the two classes.
The goal of the SVM method is to find the largest margin.
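The separating hyperplane is given by
$$w \cdot x + b = 0$$
Let
$$f(x) = \operatorname{sign}(w \cdot x + b) = \begin{cases} +1, & w \cdot x + b > 0 \\ -1, & w \cdot x + b < 0 \end{cases}$$
Thus $f(x)$ represents the assignment of $x$ to one of the two classes described above.
We say $y_i = +1$ if $x_i$ belongs to class I and $y_i = -1$ if $x_i$ belongs to class II.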
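The decision rule can be sketched in a few lines of Python. The weight vector `w`, the bias `b` and the test points are hypothetical values picked so that the arithmetic is easy to follow. The helper `distance_to_hyperplane` is an addition of this example: it computes the standard point-to-hyperplane distance $|w \cdot x + b| / \|w\|$, the quantity the margin is measured in.

```python
import math


def signed_value(w, b, x):
    """Compute w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 or -1 following sign(w . x + b)."""
    s = signed_value(w, b, x)
    if s > 0:
        return +1
    if s < 0:
        return -1
    return 0  # exactly on the hyperplane; the text leaves this case undefined


def distance_to_hyperplane(w, b, x):
    """Geometric distance |w . x + b| / ||w|| from x to the hyperplane."""
    return abs(signed_value(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical hyperplane 3*x1 + 4*x2 - 5 = 0 and two test points.
w, b = (3.0, 4.0), -5.0
for point in [(3.0, 4.0), (0.0, 0.0)]:
    print(point, classify(w, b, point), distance_to_hyperplane(w, b, point))
```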
3. Two-class classification with SVM.
The problem is to determine a classification function that can classify future samples: given a new data sample $x_i$, decide whether $x_i$ belongs to class +1 or class -1.
We consider three cases. Each case has its own optimization problem, and solving that problem yields the required hyperplane.
Case 1:
The data set D is linearly separable without noise (all points labelled +1 lie on the positive side of the hyperplane, and all points labelled -1 lie on the negative side).
Case 2:
The data set D is linearly separable but contains noise. In this case most points are separated correctly by the hyperplane, but some points are noisy: a point with a positive label lies on the negative side of the hyperplane, or a point with a negative label lies on the positive side.
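The optimization problem in this case is:
$$\min L(w, \xi) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{\ell} \xi_i$$
subject to
$$\begin{cases} y_i (w \cdot x_i + b) \geq 1 - \xi_i, & i = 1, \ldots, \ell \\ \xi_i \geq 0, & i = 1, \ldots, \ell \end{cases}$$
Here $\ell$ is the number of training samples and C is a parameter fixed in advance that weights the constraint violations: the larger C is, the more heavily empirical errors are penalized. The empirical error is the error made during training, computed as the ratio of the number of misclassified samples to the total number of training samples.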
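The role of C can be seen by evaluating the objective for a fixed hyperplane. The Python sketch below does not solve the optimization problem; it only computes $\frac{1}{2}\|w\|^2 + C \sum \xi_i$ for a given $(w, b)$, assuming each slack takes its smallest feasible value $\xi_i = \max(0, 1 - y_i(w \cdot x_i + b))$. The data, the hyperplane and the function name are hypothetical.

```python
def soft_margin_objective(w, b, C, X, y):
    """Return (objective, slacks) for 1/2 ||w||^2 + C * sum(xi_i).

    Each slack is the smallest value satisfying both constraints:
    xi_i = max(0, 1 - y_i * (w . x_i + b)).
    """
    slacks = []
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        slacks.append(max(0.0, 1.0 - margin))
    objective = 0.5 * sum(wj * wj for wj in w) + C * sum(slacks)
    return objective, slacks


# Hypothetical data: the third point is "noise", labelled +1 on the negative side.
X = [(2.0, 2.0), (-2.0, -2.0), (-0.5, -0.5)]
y = [+1, -1, +1]
w, b = (0.5, 0.5), 0.0

for C in (0.1, 10.0):
    print(C, soft_margin_objective(w, b, C, X, y))
```

With a small C the noisy point adds little to the objective, so a wide margin is preferred; with a large C the same violation dominates the objective.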
Case 3:
The data set D is not linearly separable. We map the data vectors x from the n-dimensional space into an m-dimensional space (m > n) such that D becomes linearly separable in the m-dimensional space.
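$$\min L(w, \xi) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{\ell} \xi_i$$
subject to
$$\begin{cases} y_i (\phi(x_i) \cdot w + b) \geq 1 - \xi_i, & i = 1, \ldots, \ell \\ \xi_i \geq 0, & i = 1, \ldots, \ell \end{cases}$$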
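A toy case shows why mapping to a higher-dimensional space helps. In the Python sketch below, four one-dimensional points (n = 1) cannot be split by any single threshold, but after the assumed mapping $\phi(x) = (x, x^2)$ into two dimensions (m = 2) a hand-picked hyperplane separates them. The data, the mapping and the hyperplane are all hypothetical choices for illustration, not taken from the report.

```python
def phi(x):
    """Assumed feature map from 1-D to 2-D: x -> (x, x^2)."""
    return (x, x * x)


def sign_of(w, b, v):
    """Return +1 if w . v + b > 0, otherwise -1."""
    s = sum(wi * vi for wi, vi in zip(w, v)) + b
    return 1 if s > 0 else -1


# Outer points are class +1, inner points are class -1:
# no threshold on x alone separates them.
xs = [-2.0, -0.5, 0.5, 2.0]
ys = [+1, -1, -1, +1]

# Hand-picked hyperplane in feature space: x^2 - 1 = 0.
w, b = (0.0, 1.0), -1.0
predictions = [sign_of(w, b, phi(x)) for x in xs]
print(predictions == ys)
```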
Example:
To make this easier to follow, consider the following geometric example: in 2-dimensional space (n = 2), the data set is given by a set of points in the plane.
6. Comparison and some improvements.
Other methods such as neural networks, fuzzy logic, and fuzzy-neural networks have also been used successfully for classification. Their advantage is that they do not require a mathematical model of the object, so they cope well with large and complex systems.
SVM has two basic characteristics:
- It always works with data that has a physical meaning, so its results are easy to explain explicitly.
- It needs only a very small set of training samples.
The SVM method is currently regarded as one of the most powerful and sophisticated tools for nonlinear classification problems. It has several variants, such as C-SVC and ν-SVC. The most recent published improvement of the SVM method is the NNSRM algorithm (Nearest Neighbor and Structural Risk Minimization), which combines the SVM and Nearest Neighbor techniques.
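CHAPTER IV.
APPLICATIONS
1. Some applications of SVM.
SVM and its variants and improvements now have many practical applications in solving real-world problems. Some notable applications are:
- Facial expression classification (Figure 4.1).
- Image classification (Figure 4.2).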
2. Simulating the SVM method geometrically in Matlab.
The files in the Matlab project:
setup.m : simulates the classification process with SVM
data1.mat : sample data set 1
data2.mat : sample data set 2
data3.mat : sample data set 3
svmTrain.m : SVM training function
svmPredict.m : SVM prediction function
plotData.m : plots 2-dimensional data
visualizeBoundaryLinear.m : plots a linear boundary
visualizeBoundary.m : plots a nonlinear boundary
linearKernel.m : linear kernel for SVM
gaussianKernel.m : Gaussian kernel for SVM
dataset3Params.m : parameters used for sample data set 3
The program plots the sample data sets on a 2-dimensional graph and finds the hyperplane for the given data, using the Gaussian kernel function:
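$$K_{gaussian}(x^{(i)}, x^{(j)}) = \exp\left(-\frac{\|x^{(i)} - x^{(j)}\|^2}{2\sigma^2}\right) = \exp\left(-\frac{\sum_{k=1}^{n} (x_k^{(i)} - x_k^{(j)})^2}{2\sigma^2}\right)$$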
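For readers without Matlab, the Gaussian kernel formula can be sketched in Python as follows. This is not the project's `gaussianKernel.m`, whose source is not reproduced in the report; the function name, argument order and test vectors are assumptions of this example.

```python
import math


def gaussian_kernel(x1, x2, sigma):
    """exp(-||x1 - x2||^2 / (2 * sigma^2)) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Hypothetical inputs: squared distance is 1 + 4 + 4 = 9, so the exponent is -9/8.
value = gaussian_kernel((1.0, 2.0, 1.0), (0.0, 4.0, -1.0), sigma=2.0)
print(value)
```

The kernel equals 1 for identical points and decays towards 0 as the points move apart; a smaller `sigma` makes that decay faster.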
A feature vector of an email has the form:
$$x = \begin{bmatrix} 0 \\ \vdots \\ 1 \\ 0 \\ \vdots \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
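One plausible way to obtain such a 0/1 vector is sketched below in Python, assuming each entry marks whether a word from a fixed vocabulary occurs in the email. The report does not spell out this rule, so the function `email_features`, the vocabulary and the sample email are all hypothetical.

```python
def email_features(words, vocabulary):
    """Binary vector x: x[j] = 1 if vocabulary[j] occurs in the email, else 0."""
    present = set(words)
    return [1 if term in present else 0 for term in vocabulary]


# Hypothetical vocabulary and an already-preprocessed (lowercased, tokenized) email.
vocabulary = ["buy", "click", "free", "meeting", "report"]
email = "click here to buy now buy cheap".split()

x = email_features(email, vocabulary)
print(x)
```

Repeated words do not change the vector, since only presence is recorded under this assumption.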
CHAPTER V.
CONCLUSION
The Support Vector Machines method is currently one of the most sophisticated methods for nonlinear classification. Its extensions and applications continue to be studied and developed to improve its effectiveness.
This project presents the basic concepts, the content, and the steps of the basic Support Vector Machines method, together with some information on its extensions. The applications part introduces some applications of the method and rebuilds a spam filter program in Matlab.
APPENDIX
* Trial run of the geometric SVM simulation in Matlab.
Sample data set 1:
Run the program by typing the command: setup
Sample data set 3: