
VIETNAM NATIONAL UNIVERSITY, HO CHI MINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY
FACULTY OF COMPUTER SCIENCE

PROJECT REPORT
INTRODUCTION TO KNOWLEDGE ENGINEERING AND MACHINE LEARNING

THE SUPPORT VECTOR MACHINES METHOD
THEORY AND APPLICATIONS

Supervisor: Nguyen Dinh Hien, M.Sc.

Students:
Nguyen Tri Hai (11520094)
Nguyen Hoang Nghia (11520603)

Class: KHTN2011
Course section: CS110.E11
Cohort: 2011

Ho Chi Minh City, 4 December 2013


PREFACE
Learning has always been central to how people acquire knowledge. In computing, expert systems have not been able to cover every problem that needs solving, and keeping them up to date as the world changes is very costly. The proposed remedy is to let computers learn automatically and solve problems from real data. Machine learning is an important branch of artificial intelligence that studies methods and techniques that let a computer learn from data in order to solve a specific problem.
Classification is a natural part of how people take in knowledge: it lets what is learned be organized and stored systematically. Many classification methods have been studied and applied. Today, Support Vector Machines (SVM) are among the strongest and most effective methods for nonlinear classification problems. Their modern soft-margin form was published by Cortes and Vapnik in 1995, building on earlier work by Vapnik and Chervonenkis.

ACKNOWLEDGEMENTS
First, we would like to express our deepest gratitude to Nguyen Dinh Hien, M.Sc., of the Faculty of Computer Science, University of Information Technology, VNU-HCM, who patiently guided our group through the basic and essential knowledge needed to complete this project.
We also thank our classmates in CS110.E11 for the discussions and the additional knowledge that helped us finish the project.

Ho Chi Minh City, 4 December 2013

Nguyen Tri Hai, Nguyen Hoang Nghia

SUPERVISOR'S COMMENTS

TABLE OF CONTENTS
LIST OF FIGURES
CHAPTER I. PROJECT OVERVIEW
1. General information
2. Division of work
CHAPTER II. BASIC CONCEPTS
1. The classification problem and the Support Vector Machines method
2. Linear classification
3. The Gram matrix
4. Feature space
5. Kernel functions
6. Chapter summary
CHAPTER III. THE SUPPORT VECTOR MACHINES METHOD
1. Idea
2. Theoretical basis
3. Two-class classification with SVM
4. Multi-class classification with SVM
5. Main steps of the SVM method
6. Comparison and some improvements
CHAPTER IV. APPLICATIONS
1. Some applications of SVM
2. Simulating the SVM method geometrically in Matlab
3. Building a spam filter
CHAPTER V. CONCLUSION
APPENDIX
Trial run of the geometric SVM simulation in Matlab
REFERENCES

LIST OF FIGURES
Figure 2.1 Mapping from the data space X to the feature space F
Figure 2.2 Example of a kernel function
Figure 3.1 Illustration of the SVM method
Figure 3.2 A linearly separable data set
Figure 3.3 A linearly separable data set with noise
Figure 3.4 A data set that is not linearly separable
Figure 3.5 Example of a data set in 2-dimensional space
Figure 4.1 Facial expression classification
Figure 4.2 Image classification
Figure 4.3 Email preprocessing
Figure 4.4 Extracting the email features
Figure 4.5 The training process
Figure 4.6 Sample words likely to indicate spam
Figure 4.7 Trial run of the program

Supervisor: Nguyen Dinh Hien, M.Sc.

Report for Introduction to Knowledge Engineering and Machine Learning

CHAPTER I
PROJECT OVERVIEW
1. General information.
* Project title:
The Support Vector Machines Method: Theory and Applications.
* Supervisor:
Nguyen Dinh Hien, M.Sc.
* Students:
Nguyen Tri Hai, Nguyen Hoang Nghia.
* Cohort:
2011 (enrolled September 2011).
* Student contact information:
No.  Name                Student ID  Email
1    Nguyen Tri Hai      11520094    11520094@gm.uit.edu.vn
2    Nguyen Hoang Nghia  11520603    11520603@gm.uit.edu.vn
* Software used:
- Programming software: MathWorks Matlab R2012b.
- Programming environment: Matlab.
2. Division of work.
Task                                                   Carried out by
Searching for and compiling material                   Whole group
(planning the content of the project and finding
sources: schedule, theory, programs and supporting
applications)
Carrying out the project:
- Researching and building the application content     Nguyen Tri Hai
- Compiling and building the theory content            Nguyen Hoang Nghia
- Writing the report and presenting the slides         Whole group
- Fixing errors                                        Whole group


CHAPTER II.
BASIC CONCEPTS
1. The classification problem and the Support Vector Machines method.
Classification is the process of grouping similar objects into one class based on their data features. When we study an object or phenomenon, we can only rely on a finite number of its features. In other words, we only consider its representation in a finite-dimensional space in which each dimension corresponds to one selected feature.
The classification problem can be stated as follows:

Given the sample set $D = \{(X_i, C_i) \mid X_i \in X;\ C_i \in C;\ i = 1, \dots, N\}$,
find a mapping $f: X \to C$ such that the error of $f$ is as close to 0 as possible for every $(X_i, C_i) \in D$, $i = 1, \dots, N$, that is, $f(X_i) = C_i$ wherever possible.
Where:
N: the number of samples;
X_i: the i-th data sample;
C_i: the class of the i-th data sample.
This mapping can be understood as a model, a set of rules that decides which class each object belongs to on the basis of its features. In practice, choosing the features for classification is affected by many factors that make the process harder: an object has many attributes, and one must decide which attributes are needed and which are not.
Classification with Support Vector Machines (SVM) aims to find a hyperplane with maximum margin between the negative class and the positive class while minimizing the number of training samples that cannot be separated. SVM rests on a solid mathematical foundation. However, training an SVM requires solving an optimization problem in many variables. SVM was originally developed for classification problems and, because of its advantages, was later widely used for regression problems as well.
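2. Linear classification.
Binary classification is carried out with a real-valued function $f: X \subseteq \mathbb{R}^n \to \mathbb{R}$. Here $f$ is a linear function whose sign corresponds to the output {-1, 1}, as follows:
The input $x = (x_1, x_2, \dots, x_n)$ is assigned to the class labelled 1 if $f(x) \ge 0$, and otherwise to the class labelled -1. When $f(x)$ is a linear function of $x$, it can be written as:
$$f(x) = \langle w, x \rangle + b = \sum_{i=1}^{n} w_i x_i + b$$
where $\langle \cdot, \cdot \rangle$ denotes the inner product.
Geometrically, the elements of the input space X fall into one of the two parts separated by the hyperplane defined by:
$$\langle w, x \rangle + b = 0$$
Here w is the normal vector of the hyperplane; changing the threshold b produces hyperplanes that are parallel to each other.
Each data sample x whose class is not yet known is then classified as:
$$x \in \begin{cases} \text{class } 1, & \langle w, x \rangle + b \ge 0 \\ \text{class } {-1}, & \text{otherwise} \end{cases}$$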
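The decision rule above can be sketched in a few lines of Python (used here only for illustration; the project itself is written in Matlab). The hyperplane `x1 + x2 - 3 = 0` and the two test points are hypothetical values, not taken from the report.

```python
def linear_decision(w, x, b):
    """Return f(x) = <w, x> + b for equal-length sequences w and x."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, x, b):
    """Assign label 1 when f(x) >= 0 and label -1 otherwise."""
    return 1 if linear_decision(w, x, b) >= 0 else -1


# Hypothetical hyperplane x1 + x2 - 3 = 0 in the plane.
w, b = [1.0, 1.0], -3.0
labels = [classify(w, point, b) for point in ([2.0, 2.0], [0.5, 1.0])]
```

The first point lies on the positive side of the hyperplane and the second on the negative side, so they receive different labels.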

3. The Gram matrix.
Given a set $\{x_1, x_2, \dots, x_l\}$ of vectors in an inner-product space X, the $l \times l$ matrix G with entries $G_{ij} = \langle x_i, x_j \rangle$ is called the Gram matrix.
The important property of the Gram matrix is that the input data needed by a learning algorithm can be represented entirely through the Gram matrix.
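As a minimal sketch of this definition, the Gram matrix of a small set of vectors can be built as follows. The three sample points are hypothetical, and the optional `kernel` argument is an illustrative addition that lets the same helper work with any inner product.

```python
def dot(u, v):
    """Standard inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def gram_matrix(vectors, kernel=dot):
    """Return G with G[i][j] = kernel(vectors[i], vectors[j])."""
    return [[kernel(xi, xj) for xj in vectors] for xi in vectors]


# Hypothetical sample of three points in the plane.
points = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
G = gram_matrix(points)
```

Because the inner product is symmetric, so is G; its diagonal holds the squared norms of the vectors.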
4. Feature space.
The complexity of the target function, and hence of the learning process, depends on how the data is represented. When the data is represented suitably, the learning problem becomes easier. A very common step in machine learning is therefore to map the data from the input space X into a feature space:
$$x = (x_1, x_2, \dots, x_n) \mapsto \phi(x) = (\phi_1(x), \dots, \phi_N(x))$$
where n is the dimension of the input (the number of attributes) and N is the dimension of the feature space. The data is mapped into a feature space with N > n.
The feature space is denoted F:
$$F = \{\phi(x) \mid x \in X\}$$

Figure 2.1 Mapping from the data space X to the feature space F

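5. Kernel functions.
a) Concept:
A kernel function is a function K such that for all $x, z \in X$:
$$K(x, z) = \langle \phi(x), \phi(z) \rangle$$
Here $\langle \cdot, \cdot \rangle$ is the inner product in the feature space.
Example:
Consider the mapping from the input space $X = \mathbb{R}^2$ into the feature space $F = \mathbb{R}^3$ given by $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$:

Figure 2.2 Example of a kernel function

Taking x = (x1, x2) and z = (z1, z2), we get:
$$\langle x, z \rangle^2 = \Big(\sum_{i=1}^{2} x_i z_i\Big)^2 = (x_1 z_1 + x_2 z_2)^2$$
$$= x_1^2 z_1^2 + 2 x_1 z_1 x_2 z_2 + x_2^2 z_2^2$$
$$= \langle (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2), (z_1^2, \sqrt{2}\,z_1 z_2, z_2^2) \rangle$$
$$= \langle \phi(x), \phi(z) \rangle$$
So choosing $K(x, z) = \langle x, z \rangle^2$ gives $K(x, z) = \langle \phi(x), \phi(z) \rangle$, and we never need to compute the mapping $\phi$ explicitly.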
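The identity in this example can be checked numerically. The sketch below assumes the mapping phi(x) = (x1^2, sqrt(2) x1 x2, x2^2) read off from the derivation; the two test vectors are hypothetical.

```python
import math


def phi(x):
    """Explicit feature map from R^2 to R^3 used in the example."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def squared_dot_kernel(x, z):
    """K(x, z) = <x, z>^2, computed without leaving the input space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


# Hypothetical test vectors.
x, z = (1.0, 2.0), (3.0, -1.0)
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))
implicit = squared_dot_kernel(x, z)
```

Both routes give the same number up to floating-point rounding, but the kernel route needs only one inner product in the 2-dimensional input space.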
b) Kernel functions in linear learning machines:
A learning machine that is nonlinear in the input space is built in two steps: first a nonlinear mapping transforms the data into a feature space, and then a linear classifier is used in that feature space.
A linear machine in the feature space corresponds to the function:
$$f(x) = \sum_{i=1}^{N} w_i \phi_i(x) + b$$
We do not need to determine the weight vector w explicitly. Substituting the expansion $w = \sum_{i=1}^{l} \alpha_i y_i \phi(x_i)$ gives:
$$f(x) = \sum_{i=1}^{l} \alpha_i y_i \langle \phi(x_i), \phi(x) \rangle + b$$
Moreover, there is no need to construct the mapping $\phi$ explicitly, because using the kernel function gives:
$$f(x) = \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b$$
Let SV be the set of indices i that satisfy $\alpha_i \ne 0$. Then:
$$f(x) = \sum_{i \in SV} \alpha_i y_i K(x_i, x) + b$$
The $x_i$ with $\alpha_i \ne 0$ are called support vectors; the classification function is determined entirely by these vectors.
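A minimal sketch of this kernel form of the decision function follows. The two support vectors, their coefficients (alpha = 0.25 each) and the bias are hypothetical values chosen by hand so that both points sit exactly on the margin; they are not output of the project's svmTrain.m.

```python
def dot(u, v):
    """Linear kernel: the plain inner product."""
    return sum(a * b for a, b in zip(u, v))


def decision(x, support, bias, kernel=dot):
    """f(x) = sum over support vectors of alpha_i * y_i * K(x_i, x) + b.

    `support` is a list of (alpha_i, y_i, x_i) triples with alpha_i != 0.
    """
    return sum(alpha * y * kernel(xi, x) for alpha, y, xi in support) + bias


# Hypothetical model with one support vector per class.
support = [(0.25, +1, (1.0, 1.0)), (0.25, -1, (-1.0, -1.0))]
bias = 0.0
f_pos = decision((1.0, 1.0), support, bias)
f_neg = decision((-1.0, -1.0), support, bias)
f_new = decision((2.0, 0.0), support, bias)
```

Only the stored support vectors enter the sum, so prediction cost grows with the number of support vectors rather than with the size of the training set.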
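c) Some kernel functions:
$K(x, y) = \langle x, y \rangle$
$K(x, y) = (1 + \langle x, y \rangle)^d$, where d is a user-defined parameter.
$K(x, y) = \dfrac{1 - \langle x, y \rangle^N}{1 - \langle x, y \rangle}$, where N is a user-defined parameter and $-1 < \langle x, y \rangle < 1$.
$K(x, y) = \dfrac{1}{1 - \langle x, y \rangle}$, with $-1 < \langle x, y \rangle < 1$.
$K(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$, where $\gamma$ is user-defined.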
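Three of these kernels (linear, polynomial and the exponential one) can be sketched as plain functions. The default parameter values `d=2` and `gamma=0.5` and the test vectors are assumptions chosen only for illustration.

```python
import math


def linear_kernel(x, y):
    """K(x, y) = <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, d=2):
    """K(x, y) = (1 + <x, y>)^d with user-chosen degree d."""
    return (1.0 + linear_kernel(x, y)) ** d


def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2) with user-chosen gamma > 0."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical test vectors.
x, y = (1.0, 2.0), (2.0, 0.0)
values = (linear_kernel(x, y), polynomial_kernel(x, y), rbf_kernel(x, y))
```

The exponential kernel always returns 1 for identical inputs and decays toward 0 as the points move apart, which is why it is often read as a similarity measure.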


6. Chapter summary.
This chapter introduced the classification problem and the concepts of linear classification and the Gram matrix and, most importantly, the feature space and kernel functions, together with the expressive power that kernel functions provide in the feature space.
These are the most basic concepts related to the classification problem and to the SVM method used to solve it.


CHAPTER III.
THE SUPPORT VECTOR MACHINES METHOD
In today's information age, technological progress has brought an enormous increase in the volume of information that is stored and exchanged. Organizing, storing and accessing information efficiently has therefore become a top priority. The approach taken is to organize, search and classify information effectively. People themselves perceive the world around them by classifying it and by organizing what they remember. Sorting things into classes, and describing those classes, gives knowledge a shape and a place to be stored.
Support Vector Machines (SVM) are a classification method that comes from statistical learning theory and is based on the principle of Structural Risk Minimisation. SVM tries to classify the data so that the error on the test set is as small as possible (Test Error Minimisation). It is a relatively new method in artificial intelligence. When SVM first appeared, computing power was very limited, so the method received little attention. Since 1995, however, the algorithms used for SVM have developed rapidly and, together with much more powerful computers, have led to major applications.
1. Idea.
Given a training set represented in a vector space, in which each document is a point, the method finds the best decision hyperplane f that splits the points in this space into two separate classes, class + and class -. The quality of this hyperplane is determined by the distance (called the margin) from the nearest data point of each class to the hyperplane. The larger the margin, the better the decision surface and the more accurate the classification.
The idea is to map the data (linearly or nonlinearly) into a space of feature vectors in which an optimal hyperplane can be found that separates the data of the two classes.
The goal of the SVM method is to find the largest margin:

Figure 3.1 Illustration of the SVM method

The bold line is the best hyperplane, and the points enclosed in rectangles are the points closest to it; they are called support vectors. The dashed lines on which the support vectors lie are called the margins.
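2. Theoretical basis.
SVM is in essence an optimization problem: the goal of the algorithm is to find a space F and a decision hyperplane f on F such that the classification error is as small as possible.
Let the sample set be $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_l, y_l)\}$ with $x_i \in \mathbb{R}^n$, where $y_i \in \{-1, 1\}$ is the class label of $x_i$ (+1 denotes class I, -1 denotes class II).
The equation of the hyperplane containing the vector $x$ in the space is:
$$\langle w, x \rangle + b = 0$$
Let
$$f(x) = \operatorname{sign}(\langle w, x \rangle + b) = \begin{cases} +1, & \langle w, x \rangle + b > 0 \\ -1, & \langle w, x \rangle + b < 0 \end{cases}$$
Thus $f(x)$ represents the assignment of $x$ to one of the two classes above.
We say $y_i = +1$ if $x_i$ belongs to class I and $y_i = -1$ if $x_i$ belongs to class II.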
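3. Two-class classification with SVM.
The problem is to determine a classification function that classifies future samples: for a new data sample $x_t$, we must decide whether it belongs to class +1 or class -1.
We consider three cases. Each case leads to an optimization problem, and solving that problem yields the hyperplane we are looking for.

Case 1:
The set D is linearly separable and contains no noise (all points labelled +1 lie on the positive side of the hyperplane, and all points labelled -1 lie on the negative side).

Figure 3.2 A linearly separable data set

We look for a separating hyperplane with weight vector $w \in \mathbb{R}^n$ and bias $b \in \mathbb{R}$ such that:
$$f(x_i) = \operatorname{sign}(\langle w, x_i \rangle + b) = \begin{cases} +1, & y_i = +1 \\ -1, & y_i = -1 \end{cases} \quad \forall (x_i, y_i) \in D$$
We then need to solve the optimization problem:
$$\min_{w, b} \ \frac{1}{2} \lVert w \rVert^2 \quad \text{subject to} \quad y_i(\langle w, x_i \rangle + b) \ge 1, \ i = 1, \dots, l$$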
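The link between this objective and the margin can be made explicit with a short derivation. It assumes the usual scaling convention, not spelled out in the report, that $w$ and $b$ are normalized so that the closest points of each class satisfy $y_i(\langle w, x_i \rangle + b) = 1$.

```latex
% The two margin hyperplanes under the scaling convention:
%   H_{+}: \langle w, x \rangle + b = +1, \qquad H_{-}: \langle w, x \rangle + b = -1
% Take x_{+} \in H_{+} and x_{-} \in H_{-}. Subtracting the two equations gives
\langle w, x_{+} - x_{-} \rangle = 2
% The distance between H_{+} and H_{-} is the projection of x_{+} - x_{-}
% onto the unit normal w / \lVert w \rVert:
\text{margin} = \frac{\langle w, x_{+} - x_{-} \rangle}{\lVert w \rVert} = \frac{2}{\lVert w \rVert}
% Hence maximizing the margin is equivalent to
\max_{w, b} \frac{2}{\lVert w \rVert}
\iff
\min_{w, b} \frac{1}{2} \lVert w \rVert^{2}
```

The squared norm is used instead of the norm because it gives a smooth quadratic objective with the same minimizer.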

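Case 2:
The data set D is linearly separable but contains noise. In this case most points are separated correctly by the hyperplane, but a few noisy points are not: a point with a positive label lies on the negative side of the hyperplane, or a point with a negative label lies on the positive side.

Figure 3.3 A linearly separable data set with noise

In this case we introduce slack variables $\xi_i \ge 0$ such that $y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i$, $i = 1, \dots, l$.
The optimization problem becomes:
$$\min_{w, b, \xi} \ \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{l} \xi_i$$
$$\text{subject to} \quad \begin{cases} y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i, & i = 1, \dots, l \\ \xi_i \ge 0, & i = 1, \dots, l \end{cases}$$
Here C is a parameter fixed in advance that sets the weight of the constraint violations: the larger C is, the more heavily empirical errors are penalized (empirical errors are the errors made during training, measured as the ratio of misclassified samples to the total number of training samples).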
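The effect of C can be seen by evaluating the objective for a fixed hyperplane. The sketch below uses the standard fact that, for given w and b, the smallest feasible slack is max(0, 1 - y_i f(x_i)). The hyperplane, the three data points (one of them noisy) and the values of C are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def slack(w, b, x, y):
    """Smallest xi >= 0 satisfying y * (<w, x> + b) >= 1 - xi."""
    return max(0.0, 1.0 - y * (dot(w, x) + b))


def soft_margin_objective(w, b, data, C):
    """0.5 * ||w||^2 + C * (sum of slacks) for labelled pairs (x, y)."""
    return 0.5 * dot(w, w) + C * sum(slack(w, b, x, y) for x, y in data)


# Hypothetical hyperplane and data; the last point is labelled -1
# but lies on the positive side, so it needs a slack of 1.5.
w, b = (0.5, 0.5), 0.0
data = [((1.0, 1.0), +1), ((-1.0, -1.0), -1), ((0.5, 0.5), -1)]
objective_small_c = soft_margin_objective(w, b, data, C=1.0)
objective_large_c = soft_margin_objective(w, b, data, C=10.0)
```

With the same hyperplane, raising C from 1 to 10 multiplies the cost of the single noisy point by ten, which pushes the optimizer toward solutions that misclassify fewer training points at the price of a narrower margin.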
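Case 3:
The data set D is not linearly separable. We map the data vectors x from the n-dimensional space into an m-dimensional space (m > n) in which D becomes linearly separable.

Figure 3.4 A data set that is not linearly separable

Let $\phi$ be a nonlinear mapping from the space $\mathbb{R}^n$ into the space $\mathbb{R}^m$:
$$\phi: \mathbb{R}^n \to \mathbb{R}^m$$
The optimization problem becomes:
$$\min_{w, b, \xi} \ \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{l} \xi_i$$
$$\text{subject to} \quad \begin{cases} y_i(\langle w, \phi(x_i) \rangle + b) \ge 1 - \xi_i, & i = 1, \dots, l \\ \xi_i \ge 0, & i = 1, \dots, l \end{cases}$$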
Example:
To make this easier to follow, consider the following geometric illustration. In a 2-dimensional space (n = 2), the data set is given by a set of points in the plane.

Figure 3.5 Example of a data set in 2-dimensional space

We now look for the separating hyperplane (1) using the SVM method. We find two parallel hyperplanes (dashed lines in the figure) whose distance from each other is as large as possible while still splitting the data into two sides (we call them the two separating hyperplanes). Hyperplane (1) lies midway between these two (bold line in the figure).
The figure above shows a linearly separable data set. Now consider a data set that is not linearly separable. We handle it by mapping the data into a new space of higher dimension than the original one (called the feature space), in which the data becomes linearly separable. In the feature space we again look for two separating hyperplanes, as in the first case.
The points lying on the two separating hyperplanes are called support vectors. These points determine the separating function. From this we can see that the SVM solution does not depend on all of the original data samples but only on the support vectors (which determine the two separating hyperplanes). Even if the other points are deleted, the algorithm still produces the same result. This is what distinguishes SVM from other methods, in which every point in the data set is used to optimize the result.
In practice the mapping $\phi$ is not specified explicitly. Instead a kernel function is used, as presented in Section 5 of Chapter II, which speeds up the computation while making the data close to linearly separable. Usually one of the standard, ready-made kernel functions is used.

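4. Multi-class classification with SVM.
For multi-class classification, the SVM technique splits the data space into two parts and then continues with each part that has been split off. The decision function that assigns data to class i is:
$$f_i(x) = w_i^T x + b_i$$
The elements x are support vectors if they satisfy:
$$f_i(x) = \begin{cases} 1, & x \in \text{class } i \\ -1, & x \notin \text{class } i \end{cases}$$
Suppose the problem has k classes ($k \ge 2$). We carry out $k(k - 1)/2$ binary classifications with the SVM method. Each class is separated from each of the remaining k-1 classes, which gives k-1 separating functions per class (the one-against-one strategy).
Multi-class classification with SVM is still an area of active research and development.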
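The one-against-one strategy can be sketched as pairwise voting. In the sketch below the pairwise classifiers are hypothetical stand-ins (nearest class center on a line), not trained SVMs; in a real system each entry of `pair_classifiers` would be a binary SVM trained on the samples of its two classes.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_pairs(classes):
    """All unordered class pairs: k * (k - 1) / 2 of them for k classes."""
    return list(combinations(classes, 2))


def predict_one_vs_one(x, classes, pair_classifiers):
    """Let every pairwise classifier vote and return the most voted class."""
    votes = Counter(pair_classifiers[pair](x) for pair in one_vs_one_pairs(classes))
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for trained binary SVMs: each pair is decided
# by which of two 1-D class centers is nearer to x.
centers = {"a": 0.0, "b": 5.0, "c": 10.0}
classes = list(centers)


def make_pair_classifier(first, second):
    def classify(x):
        near_first = abs(x - centers[first]) <= abs(x - centers[second])
        return first if near_first else second
    return classify


pair_classifiers = {pair: make_pair_classifier(*pair) for pair in one_vs_one_pairs(classes)}
predicted = predict_one_vs_one(6.0, classes, pair_classifiers)
```

For three classes there are three pairwise classifiers; the point 6.0 collects two votes for "b" and one for "c".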
5. Main steps of the SVM method.
- Data preprocessing: the SVM method requires the data to be represented as vectors of real numbers. If the input is not yet numeric, it must be converted into the numeric form SVM expects. Avoid very large values; it is usually best to scale the data to the range [-1, 1] or [0, 1].
- Choosing the kernel function: choose a kernel function suited to the specific problem in order to achieve high classification accuracy.
- Use cross-validation to determine the parameters for the application.
- Use these parameters to train on the sample set.
- Evaluate on the test data set.
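The scaling step in the list above can be sketched as per-feature min-max scaling to [-1, 1]. The helper names and the sample rows are hypothetical; mapping constant features to the lower bound is one possible convention, chosen here only to avoid division by zero.

```python
def fit_min_max(rows):
    """Return the (min, max) of every feature column of the training rows."""
    return [(min(column), max(column)) for column in zip(*rows)]


def scale(rows, ranges, lo=-1.0, hi=1.0):
    """Map each feature linearly from its [min, max] range onto [lo, hi]."""
    scaled = []
    for row in rows:
        new_row = []
        for value, (mn, mx) in zip(row, ranges):
            if mx == mn:
                new_row.append(lo)  # constant feature: no spread to rescale
            else:
                new_row.append(lo + (hi - lo) * (value - mn) / (mx - mn))
        scaled.append(new_row)
    return scaled


# Hypothetical training rows with two features on very different scales.
train = [[0.0, 100.0], [5.0, 300.0], [10.0, 200.0]]
ranges = fit_min_max(train)
train_scaled = scale(train, ranges)
```

The ranges are computed once on the training rows and then reused unchanged for the test rows, so that both sets are transformed in the same way.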

6. Comparison and some improvements.
Other methods such as neural networks, fuzzy logic and neuro-fuzzy networks have also been used successfully for classification problems. Their advantage is that they do not require a mathematical model of the object (so they handle large and complex systems well).
SVM has two basic characteristics:
- It always works with data that has physical meaning, so its results can be explained explicitly.
- It needs only a very small set of training samples.
The SVM method is currently regarded as one of the most powerful and sophisticated tools for nonlinear classification problems. It has several variants, such as C-SVC and ν-SVC. One published improvement of the SVM method is the NNSRM algorithm (Nearest Neighbor and Structural Risk Minimization), which combines the SVM and Nearest Neighbor techniques.


CHAPTER IV.
APPLICATIONS
1. Some applications of SVM.
SVM and its variants and improvements now have many practical applications. Some notable ones are:
- Computer virus diagnosis: suppose there is a set of objects to diagnose, $\{x_1, x_2, \dots, x_l\}$, in the space $\mathbb{R}^n$. Based on the features of the $x_i$, we apply the SVM method to assign each $x_i$ to class I (may be infected with virus V) or class II (not infected with virus V).
- Spam filtering (mail classification): for each text, we extract and select features and vectorize the text into an n-dimensional vector $x = (x_1, x_2, \dots, x_n)$ in the data form that SVM can read. We then let the program learn from an input data set of pairs $(x_i, y_i)$ with $y_i \in \{-1, 1\}$, and build a hyperplane with the SVM method on this sample data. For each new text, we vectorize it and classify it by the sign of the decision function, that is, by the side of the hyperplane on which it falls.
- In addition, multi-class SVM is applied to image classification and to facial expression classification.

Figure 4.1 Facial expression classification

Figure 4.2 Image classification



2. Simulating the SVM method geometrically in Matlab.
Files in the Matlab project:
setup.m : simulates the classification process with SVM
data1.mat : sample data set 1
data2.mat : sample data set 2
data3.mat : sample data set 3
svmTrain.m : SVM training function
svmPredict.m : SVM prediction function
plotData.m : plots 2-dimensional data
visualizeBoundaryLinear.m : plots a linear boundary
visualizeBoundary.m : plots a nonlinear boundary
linearKernel.m : linear kernel for SVM
gaussianKernel.m : Gaussian kernel for SVM
dataset3Params.m : parameters used for sample data set 3
The program plots the sample data sets in two dimensions and finds the separating hyperplane for the given data using the Gaussian kernel:
$$K_{gaussian}(x^{(i)}, x^{(j)}) = \exp\left(-\frac{\lVert x^{(i)} - x^{(j)} \rVert^2}{2\sigma^2}\right) = \exp\left(-\frac{\sum_{k=1}^{n} (x_k^{(i)} - x_k^{(j)})^2}{2\sigma^2}\right)$$

3. Building a spam filter.
Files used in the Matlab project:
setup_spam : runs the spam email classification process
spamTrain.mat : spam training set
spamTest.mat : spam test set
emailSample.txt : sample email 1
emailSample2.txt : sample email 2
spamSample.txt : sample spam email 1
spamSample2.txt : sample spam email 2
vocab.txt : vocabulary list
getVocabList.m : loads the vocabulary list
porterStemmer.m : stemming function
readFile.m : reads a file
processEmail.m : preprocesses an email
emailFeatures.m : extracts features from an email
Steps in the email classification problem:
Step 1. Preprocess the email
Step 2. Extract the email features
Step 3. Train a linear SVM
Step 4. Test the spam classifier
Step 5. Select the words in the mail that most strongly indicate spam
Step 6. Feed in an email and classify it
The steps in detail:
Step 1. Preprocessing the email:
Using the existing vocabulary (vocab.txt), determine which words in the email appear in the vocabulary.

Figure 4.3 Email preprocessing
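Step 1 can be sketched as a lookup of each email word in the vocabulary. This is a simplified Python illustration, not the project's processEmail.m: it only lower-cases and tokenizes the text, it skips the stemming done by porterStemmer.m, and the five-word vocabulary with 1-based indices is hypothetical.

```python
import re


def word_indices(email_text, vocab):
    """Return the 1-based vocabulary index of every email word found in vocab."""
    index_of = {word: i for i, word in enumerate(vocab, start=1)}
    tokens = re.findall(r"[a-z0-9]+", email_text.lower())
    return [index_of[token] for token in tokens if token in index_of]


# Hypothetical miniature vocabulary standing in for vocab.txt.
vocab = ["buy", "click", "free", "meeting", "now"]
indices = word_indices("Click NOW to buy... free!!", vocab)
```

Words that are not in the vocabulary, such as "to" here, are simply dropped.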


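Step 2. Extracting the email features:
The program prints the length of the feature vector and the number of non-zero entries.
The feature vector of an email has the form:
$$x = \begin{bmatrix} 0 \\ \vdots \\ 1 \\ 0 \\ \vdots \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \in \mathbb{R}^n$$

Figure 4.4 Extracting the email features

Result:
+ Length of the feature vector: 1899
+ Number of non-zero entries: 45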
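The feature vector can be sketched as a binary indicator over the vocabulary: entry i is 1 if vocabulary word i occurs in the email and 0 otherwise. The vocabulary size 1899 is the length reported above; the list of word indices is hypothetical, and the function is an illustration rather than the project's emailFeatures.m.

```python
def email_features(word_indices, vocab_size):
    """Binary vector x with x[i - 1] = 1 for every 1-based index i present."""
    x = [0] * vocab_size
    for index in word_indices:
        x[index - 1] = 1
    return x


# Hypothetical indices; a repeated word (index 5) still sets a single entry.
features = email_features([2, 5, 1, 3, 5], vocab_size=1899)
non_zero = sum(features)
```

Repeated words do not increase the count, so the number of non-zero entries equals the number of distinct vocabulary words in the email.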
Steps 3 and 4. Training uses the data set spamTrain.mat, and the trained model is checked on the test data spamTest.mat.
The SVM is trained with the function svmTrain.m on the emails converted into feature vectors. The accuracy on the training set is then computed with the function svmPredict.m, as is the accuracy on the test set (using the data in spamTest.mat).


Figure 4.5 The training process

Result:
+ Training accuracy: 99.85%
+ Test accuracy: 98.5%
Step 5. Selecting the words in the mail that most strongly indicate spam
The program lists the words that are most likely to indicate spam in a mail.

Figure 4.6 Sample words likely to indicate spam
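The report does not say how these words are chosen. One common approach for a linear SVM, assumed here, is to rank the vocabulary by the learned weight of each word, since a large positive weight pushes the decision toward the spam class. The vocabulary and weights below are hypothetical.

```python
def top_predictors(weights, vocab, k):
    """Return the k (word, weight) pairs with the largest weights."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return [(vocab[i], weights[i]) for i in order[:k]]


# Hypothetical vocabulary and linear-SVM weights.
vocab = ["buy", "click", "free", "meeting", "now"]
weights = [0.42, 0.50, 0.31, -0.27, 0.05]
top_words = top_predictors(weights, vocab, k=3)
```

Words with negative weights, such as "meeting" in this made-up example, count as evidence against spam.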


Step 6. Feed in an email and classify it

Figure 4.7 Trial run of the program

The program is given the two emails spamSample2.txt and emailSample2.txt and returns results that match the nature of each mail (result: 1 means spam, 0 means not spam).


CHAPTER V.
CONCLUSION
The Support Vector Machines method is currently one of the most sophisticated methods for nonlinear classification problems. Its extensions and applications continue to be studied and developed to make it more effective.
This project aims to present the basic concepts, the content and the steps of the Support Vector Machines method in a simple way, and also gives some information about extensions of the method. The applications part introduces some applications to which the method can be applied and rebuilds a spam filter in Matlab.


APPENDIX
* Trial run of the geometric SVM simulation in Matlab.
Sample data set 1:
Run the program by typing the command: setup

The program opens sample data set 1 and displays it on a plot.

Press Enter to continue; training runs and produces the result:


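Result of evaluating the Gaussian kernel for sample data set 1:

Gaussian kernel between x1 = [1; 2; 1] and x2 = [0; 4; -1]:
0.324652
The program output labels this value with sigma = 0.5. However, the squared distance between x1 and x2 is 9, so 0.324652 = exp(-9/8) corresponds to sigma = 2; with sigma = 0.5 the kernel value would be exp(-18), about 1.5e-8.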
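The value 0.324652 can be checked by hand with a few lines of Python implementing the Gaussian kernel formula from Chapter IV. The function below is an illustrative re-implementation, not the project's gaussianKernel.m; the two vectors are the ones printed above.

```python
import math


def gaussian_kernel(x1, x2, sigma):
    """K(x1, x2) = exp(-||x1 - x2||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


x1, x2 = [1.0, 2.0, 1.0], [0.0, 4.0, -1.0]
value_sigma_2 = gaussian_kernel(x1, x2, sigma=2.0)
value_sigma_half = gaussian_kernel(x1, x2, sigma=0.5)
```

The squared distance is 1 + 4 + 4 = 9, so sigma = 2 gives exp(-9/8), which matches 0.324652, while sigma = 0.5 gives exp(-18), a value very close to zero.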
Sample data set 2:

Training runs with the Gaussian kernel; result:

Sample data set 3:

Training runs with the Gaussian kernel; result:


REFERENCES

[1] Thai Son. Master's thesis: The Support Vector Machines technique and its applications. Applied Mathematics and Informatics, Hanoi University of Science and Technology, 2006.
[2] Assoc. Prof. Dr. Vu Thanh Nguyen, Thi Minh Nguyen. Some improvements to text classification using the SVM algorithm and their application to Vietnamese text analysis. University of Information Technology, VNU-HCM, 2011.
[3] SVM spam classification: https:\\githug.com\.
