
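Proceedings of the 8th National Conference on Fundamental and Applied Information Technology Research (FAIR); Hanoi, 9-10 July 2015

RECOMMENDATION SYSTEMS USING A SWARM OPTIMIZATION ALGORITHM

Pham Minh Chuan (1), Le Thanh Huong (2), Tran Dinh Khang (2), Nguyen Van Hau (1)
(1) Faculty of Information Technology, Hung Yen University of Technology and Education
(2) School of Information and Communication Technology, Hanoi University of Science and Technology
chuanpm@gmail.com, huonglt@soict.hust.edu.vn, khangtd@soict.hust.edu.vn, nvhau66@gmail.com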

ABSTRACT - Collaborative filtering (CF) is one of the most widely used recommendation techniques and is integrated into the recommender systems of many e-commerce websites (for example amazon.com, barnesandnoble.com, Yahoo! news, TripAdvisor.com). CF rests on the assumption that users with the same preferences will be interested in similar sets of items. The Iterative Clustered CF method (ICCF) and the weight-optimized iterative CF method based on the PSO algorithm (PSO-Feature Weighted) have proved effective for recommender systems whose ratings take values in the set {1, 2, ..., 5}. However, these techniques cannot be applied directly to real-world recommender systems whose ratings take values in {0, 1}. This paper therefore proposes improvements to ICCF and PSO-Feature Weighted so that they can be applied to recommender systems with ratings in {0, 1}. Experiments with the two proposed methods on a job recommendation data set show that the accuracy of the prediction model improves markedly over traditional CF, and that the methods also address the data sparsity problem that CF commonly faces.
Keywords - Recommender systems, item-based collaborative filtering, user-based collaborative filtering, clustered collaborative filtering, weight-optimized collaborative filtering, particle swarm optimization, spectral clustering, k-means algorithm.

I. INTRODUCTION
Recommender systems [1, 2] analyze information about users' preferences for items in order to recommend the items that best match each user's needs and tastes. In practice, a recommender system tries to collect users' preferences and to model the interaction between users and items.
In collaborative filtering (CF), recommendations for a user are derived from the opinions of other users who share that user's preferences. A CF system represents each user by his or her ratings over the set of items. It selects like-minded users according to a similarity or correlation measure, then predicts ratings for the items the user has not yet rated or looked at. Finally, the system recommends the items with the highest predicted ratings to the target user. The success of CF has been confirmed by many studies and experiments in a wide range of real applications [2, 3, 4].
In general, the quality of a collaborative recommender system can be improved by improving the similarity measure and the selection of the neighbor set. The main limitations of CF, namely data sparsity, scalability and missing data [5, 6], strongly affect recommendation quality. Although many researchers have tried to address these problems, CF still needs further improvement to raise the accuracy of the recommendation model. Because CF relies on the opinions of the neighbor set, that is, the users who share the target user's preferences, choosing the neighbor set accurately is essential. The better the similarity measure, the more accurate the neighbor selection and the more appropriate the recommendations.
Many methods have been proposed to improve the similarity measure, including the PIP (Proximity-Impact-Popularity) measure [7], Union similarity [8], random walk counting [9], user-class similarity [10], and an iterative message passing procedure [11]. Each has its own strengths and suits different situations. However, most of them focus on one specific problem and suffer from some limitations. For example, PIP was proposed to address the cold-start problem but is constrained by the similarity computation of traditional user-based CF. User-class similarity can be used only when class information is available; it does not give meaningful results on large data sets, and the similarity measure has to be updated many times. Union similarity works with sparse data but does not scale.
An Iterative Clustered CF (ICCF) recommender system is proposed in [12]. In it, spectral clustering is applied iteratively in both the user-based and the item-based CF approaches to predict the unknown ratings. As a result, ICCF succeeds in handling data sparsity and cold start. However, all users and items have the same influence when the similarity is computed, whereas a similarity measure should reflect the differing importance of the features.
Several studies have reported improved accuracy when features are weighted in distance computations [13]. In CF, feature weighting assigns a weight to each feature (user or item) that measures how important the feature is in the overall similarity. Breese et al. [14] adapted the idea of inverse document frequency to weight features in CF. The main idea of this approach, called inverse user frequency, is that popular items provide little information about a user's true preferences, so the weights of widely rated items should be reduced. The same idea of down-weighting popular, well-known items has been implemented with variance weighting [15]: items with larger variance discriminate better between users' preferences and therefore receive larger weights. However, the authors also state that this method considerably reduces the mean absolute error (MAE) compared with the unweighted case. Yu et al. [16] introduced an information-theoretic approach to feature weighting and showed that mutual information gives better results. The authors of [17] presented another automatic weighting scheme for CF recommender systems. This method tries to find item weights that bring each user closer to his or her neighbors and farther from dissimilar users. It borrows the idea of model-based approaches and reduces the mean absolute error. In addition, S. H. Min and I. Han [18] proposed the GA-CF model as a feature weighting scheme for traditional user-based CF.
Nevertheless, all feature weighting methods proposed so far try to improve the similarity computation without considering the limitations of CF, which strongly affect the performance of a recommender system.
From a mathematical point of view, feature weighting can be seen as a nonlinear, non-convex optimization problem with multiple local minima [19]. Particle Swarm Optimization (PSO) can find the global optimum with simple initial settings. It also uses only primitive operations, which keeps its memory and processing costs low [20].
The PSO-Feature Weighted method proposed by Abdelwahab et al. [27] uses PSO to find an optimal set of weights $\omega_U$ (representing the importance of users) and $\omega_I$ (representing the importance of items) for computing the similarity between users and between items, which largely determines the accuracy of the prediction model. The method strengthens user and item clustering by using not only the explicit feedback recorded in the rating matrix $R_{m \times n}$ (m is the number of users, n the number of items) but also weights that express the importance of each user and item. This improves the neighbor set. In addition, the prediction model is iterated to extrapolate the unknown ratings in R: the result of one step is used as the input of the next, until a denser, optimized matrix R is obtained, which in turn raises the accuracy of the recommendation model.
A significant limitation of ICCF and PSO-Feature Weighted is that they cannot be applied to recommender systems whose rating matrix R takes only binary values. Examples are a job recommender system, in which job seekers select the jobs they apply for, and a scientific paper recommender system, in which users add the papers they are interested in to their personal libraries. In this paper we therefore propose improvements to ICCF and PSO-Feature Weighted so that they apply to recommendation problems with a binary rating matrix R. We also adjust the way the unknown values $r_{ij}$ in R are estimated to suit the job recommendation problem. In addition, we determine the weights of the linear hybrid of user-based and item-based CF ($\omega_{IF}$ and $\omega_{UF}$) together with the search for the optimal weights that represent the importance of users and items in the similarity computation ($\omega_U$ and $\omega_I$). The aim is to exploit the hybrid of user-based and item-based CF effectively and so improve the quality of the recommender system.
II. RELATED BACKGROUND
2.1. Spectral clustering
ICCF extrapolates the unknown ratings in matrix R through an iterative process. In this technique the prediction model uses spectral clustering [22] in both the user-based and the item-based CF approaches [21]. Spectral clustering proceeds as follows:
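Step 1: Compute the similarity between users and between items:

$$S_{ij} = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right) \qquad (1)$$

where $x_i$ and $x_j$ are the vectors of rows i and j of R, representing users i and j, when computing the similarity between users (and the vectors of columns i and j of R when computing the similarity between items i and j), and $\sigma$ is a scaling parameter that controls the size of the neighborhood. A small $\sigma$ gives a better local configuration of the neighborhood, but if $\sigma$ is too small the points become separated (far from each other). A suitable value of $\sigma$ is therefore computed as in [23]:

$$\sigma = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} d(x_i, x_j) \qquad (2)$$

where $d$ is the distance between $x_i$ and $x_j$, and n is the number of users or the number of items.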


Step 2: D is the diagonal degree matrix, whose diagonal entries are computed as:

$$D_{ii} = \sum_{j=1}^{n} S_{ij} \qquad (3)$$

Step 3: From the similarity matrix S, spectral clustering builds the corresponding graph and computes its Laplacian matrix L:

$$L = D^{-1/2} S D^{-1/2} \qquad (4)$$

Step 4: After computing L, the algorithm takes the k eigenvectors $(v_1, v_2, \ldots, v_k)$ associated with the k largest eigenvalues, which satisfy:

$$L v_i = \lambda_i v_i \qquad (5)$$

Step 5: Build a new matrix $V \in \mathbb{R}^{n \times k}$ whose columns are the eigenvectors $(v_1, v_2, \ldots, v_k)$.
Step 6: Let $y_i \in \mathbb{R}^k$ be the row vectors of V.
Step 7: Use k-means to partition the points $(y_i)_{i = 1, 2, \ldots, n}$ into k clusters $(C_1, C_2, \ldots, C_k)$.
Step 8: Assign each original data point $(x_i)_{i = 1, 2, \ldots, n}$ to the cluster $C_j$ that contains the corresponding point $y_i$.
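The eight steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the default `sigma` is assumed to be the mean pairwise distance, the rows of V are normalized as in Ng et al. [22], and k-means uses a simple farthest-point initialization so that the example is deterministic.

```python
import numpy as np

def spectral_embedding(X, k, sigma=None):
    """Steps 1-6: Gaussian affinity, degrees, L = D^-1/2 S D^-1/2, top-k eigenvectors."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))        # pairwise Euclidean distances
    if sigma is None:
        sigma = dist.mean()                        # assumption: mean pairwise distance
    S = np.exp(-dist ** 2 / (2.0 * sigma ** 2))    # similarity matrix S
    inv_sqrt_deg = 1.0 / np.sqrt(S.sum(axis=1))    # D^-1/2 stored as a vector
    L = S * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]
    _, vecs = np.linalg.eigh(L)                    # eigenvalues come in ascending order
    V = vecs[:, -k:]                               # eigenvectors of the k largest eigenvalues
    return V / np.linalg.norm(V, axis=1, keepdims=True)  # row normalization (assumption)

def kmeans(Y, k, n_iter=50):
    """Step 7: plain Lloyd iterations with farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(1, k):
        d2 = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(Y), dtype=int)
    for _ in range(n_iter):
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = Y[labels == c].mean(axis=0)
    return labels

def spectral_clustering(X, k, sigma=None):
    """Step 8: each original point x_i inherits the cluster of its embedded point y_i."""
    return kmeans(spectral_embedding(X, k, sigma), k)
```

For user clustering, X would be the rows of R; for item clustering, the columns (that is, the rows of the transpose of R).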
2.2. Particle Swarm Optimization (PSO)
Particle swarm optimization is one of the algorithms built on the concept of swarm intelligence to search for solutions of optimization problems over a given search space. PSO is a population-based evolutionary algorithm in which the individuals of a population interact to explore the search space. PSO results from modeling the way a flock of birds flies in search of food, which is why it is usually classified as a swarm intelligence algorithm. It was introduced in 1995 at an IEEE conference by James Kennedy and Russell C. Eberhart [20]. The algorithm has been applied successfully in many fields. An effective application of PSO in biology is presented in [25].
PSO [26] is initialized with a group of random particles and then searches for the optimum by updating generations. In each generation, every particle is updated according to two best positions. The first is the best position the particle itself has reached so far, called Pbest. The second is the global best, Gbest, which is the best position found by the whole swarm during the search so far. In other words, each particle in the swarm updates its position according to its own best position and the best position of the whole swarm up to the current time (Figure 1).

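Figure 1. Diagram of a search point in the PSO method
where:
$X_i^k$: position of particle i at generation k
$V_i^k$: velocity of particle i at generation k
$X_i^{k+1}$: position of particle i at generation k+1
$V_i^{k+1}$: velocity of particle i at generation k+1
$Pbest_i$: best position of particle i
$Gbest$: best position in the swarm
The velocity and position of each particle in the swarm are computed as follows:

$$V_i^{k+1} = w V_i^k + c_1 r_1 (Pbest_i - X_i^k) + c_2 r_2 (Gbest - X_i^k) \qquad (6)$$

$$X_i^{k+1} = X_i^k + V_i^{k+1} \qquad (7)$$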
where:
$w$: the inertia weight
$c_1$, $c_2$: acceleration coefficients, taking values from 1.5 to 2.5
$r_1$, $r_2$: random numbers in the interval [0, 1]
The inertia weight $w$ decreases linearly from 1 to 0 over a predetermined number of iterations. Researchers have found that a large $w$ lets the particles widen the search range, while a small $w$ favors fine changes that lead to a local optimum. The best performance is therefore obtained by using a large $w$ (for example 0.9) at the beginning and then decreasing it gradually to a smaller value.
The PSO algorithm
1. Initialize the swarm:
(a) Set the constants $k_{max}$, $c_1$, $c_2$.
(b) Randomly initialize the particle positions $x_0^i \in D \subset \mathbb{R}^n$ for i = 1, 2, ..., s.
(c) Randomly initialize the particle velocities $0 \le v_0^i \le v_0^{max}$ for i = 1, ..., s.
(d) Set k = 1.
2. Optimize:
(a) Evaluate the function value $f_k^i$ at the coordinates $x_k^i$ in the search space.
(b) If $f_k^i < f_{best}^i$ then $f_{best}^i = f_k^i$ and $p_k^i = x_k^i$.
(c) If $f_k^i < f_{best}^g$ then $f_{best}^g = f_k^i$ and $p_k^g = x_k^i$.
(d) If the convergence criterion is satisfied, stop and go to step 3.
(e) Update all velocities $v_k^i$ and positions $x_k^i$.
(f) Increase i. If i > s, set i = 1 and increase k.
(g) Go back to step 2(a).
3. Terminate.
Here $k_{max}$ is the maximum number of iterations.
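The procedure above can be sketched as a small minimizer. This is a minimal sketch under stated assumptions rather than the paper's code: the function name `pso_minimize`, the box bounds `lo` and `hi`, the clipping of positions to that box, the initial velocity range and the linear inertia schedule from `w_start` to `w_end` are all illustrative choices.

```python
import numpy as np

def pso_minimize(f, dim, n_particles=15, k_max=100, c1=2.0, c2=2.0,
                 w_start=0.9, w_end=0.4, lo=-5.0, hi=5.0, seed=0):
    """Minimize f over the box [lo, hi]^dim with a basic particle swarm."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, size=(n_particles, dim))      # initial positions
    v = rng.uniform(-1.0, 1.0, size=(n_particles, dim))   # initial velocities (assumed range)
    pbest = x.copy()
    pbest_f = np.array([f(p) for p in x])
    g = int(np.argmin(pbest_f))
    gbest, gbest_f = pbest[g].copy(), float(pbest_f[g])
    for k in range(k_max):
        # inertia weight decreases linearly over the iterations
        w = w_start + (w_end - w_start) * k / max(k_max - 1, 1)
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # velocity update
        x = np.clip(x + v, lo, hi)                                  # position update
        fx = np.array([f(p) for p in x])
        improved = fx < pbest_f                                     # update each Pbest
        pbest[improved] = x[improved]
        pbest_f[improved] = fx[improved]
        g = int(np.argmin(pbest_f))
        if pbest_f[g] < gbest_f:                                    # update Gbest
            gbest, gbest_f = pbest[g].copy(), float(pbest_f[g])
    return gbest, gbest_f
```

A call such as `pso_minimize(lambda p: float(np.sum(p ** 2)), dim=2)` returns the best position found and its function value. Because Gbest is only replaced by strictly better positions, the returned value is never worse than the best particle of the initial swarm.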
2.3. Weight-optimized iterative collaborative filtering based on PSO
The main purpose of PSO-Feature Weighted is to address the data sparsity and missing data problems of traditional CF. The weights of users and items are determined with particle swarm optimization; they express the importance of each user and each item when computing the similarities used in the recommendation process. The optimal weights strengthen the similarity measure between users and between items, which considerably improves neighbor selection in the clustering step. The method integrates weight optimization into the iterative clustered CF algorithm to raise the accuracy of the recommender system.
Experiments applying this method to recommender systems (on the MovieLens and Book-Crossing data sets) show that recommendation quality improves considerably over existing methods, while some of their limitations are also overcome.
III. IMPROVING ICCF AND PSO-FEATURE WEIGHTED FOR JOB RECOMMENDER SYSTEMS
PSO-Feature Weighted builds on collaborative filtering, so when applying it to job recommendation we consider only the information about which jobs each user has applied for, apply(useid, jobid). In the rating matrix $R_{m \times n} = (r_{ij})_{m \times n}$, $r_{ij} = 1$ when user i has applied for job j, where m and n are the numbers of users and jobs respectively. Each user is thus represented by a row vector of R and each job by a column vector of R. This section presents our improvements to ICCF and PSO-Feature Weighted so that they can be applied to job recommendation.
3.1. The improved ICCF method (ICCF-Improved)

The ICCF method proposed by Abdelwahab et al. uses an iterative process to interpolate the unknown values $r_{ij} \in \{1, 2, 3, 4, 5\}$ in R in order to address data sparsity. In job recommendation, however, R is a binary matrix, so the unknown values in R cannot be estimated directly. We therefore propose a formula for the level of interest of user i in job j, denoted $p_{ij}$. Based on the value of $p_{ij}$, we select only the (user, job) pairs whose level of interest exceeds a given threshold to estimate the unknown values $r_{ij}$ in R. The ICCF-Improved algorithm is as follows:
(1) Determine the user clusters and job clusters with the spectral clustering method presented in Section 2.1.
(2) Determine the user's level of interest in a job based on the user clusters (user-based CF)
The level of interest of user i in job j, $p_{ij}^U$, is estimated by:

$$p_{ij}^U = \frac{1}{N_U} \sum_{l} sim(i, l) \qquad (8)$$

where l indexes the users in the same cluster as user i who have applied for job j, $sim(i, l)$ is the similarity between user i and user l, and $N_U$ is the total number of users in the same cluster as user i who have applied for job j.
(3) Determine the user's level of interest in a job based on the job clusters (item-based CF)
The level of interest of user i in job j, $p_{ij}^I$, is estimated by formula (9):

$$p_{ij}^I = \frac{1}{N_I} \sum_{k} sim(k, j) \qquad (9)$$

where k indexes the jobs in the same cluster as job j that user i has applied for, $sim(k, j)$ is the similarity between job k and job j, and $N_I$ is the total number of jobs in the same cluster as job j that user i has applied for.
(4) Determine the user's final level of interest in a job
The levels of interest $p_{ij}^U$ and $p_{ij}^I$ obtained from the user-based and job-based spectral clustering approaches are combined into the final level of interest $p_{ij}$ of user i in job j according to formula (10).
(5) Estimate the unknown values in R from the users' levels of interest in jobs:

$$r_{ij} = 1 \ \text{if} \ p_{ij} > p_i \qquad (11)$$

We choose the threshold $p_i$ as the average level of interest of each user over all the jobs he or she has applied for, computed by formula (12):

$$p_i = \frac{1}{|R_i|} \sum_{j \in R_i} p_{ij} \qquad (12)$$

where $R_i$ is the set of jobs user i has applied for. In this method we use the cosine measure to compute the similarity between two users and between two jobs.
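One pass of the estimation steps (2) to (5) can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function names are hypothetical, the plain average of the user-based and job-based interest stands in for the combination rule of formula (10), which is not recoverable here, and excluding user i and job j from their own neighbor sets is also an assumption.

```python
import numpy as np

def cosine_sim(M):
    """Cosine similarity between the rows of M (all-zero rows get similarity 0)."""
    M = np.asarray(M, dtype=float)
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    U = M / norms
    return U @ U.T

def iccf_improved_step(R, user_cluster, job_cluster):
    """One pass: interest scores from both cluster views, then thresholded filling of R."""
    R = np.asarray(R)
    user_cluster = np.asarray(user_cluster)
    job_cluster = np.asarray(job_cluster)
    m, n = R.shape
    sim_u, sim_j = cosine_sim(R), cosine_sim(R.T)
    P = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            # users in i's cluster who applied for job j (excluding i: assumption)
            peers = (user_cluster == user_cluster[i]) & (R[:, j] == 1)
            peers[i] = False
            # jobs in j's cluster that user i applied for (excluding j: assumption)
            jobs = (job_cluster == job_cluster[j]) & (R[i, :] == 1)
            jobs[j] = False
            p_u = sim_u[i, peers].mean() if peers.any() else 0.0
            p_i = sim_j[jobs, j].mean() if jobs.any() else 0.0
            P[i, j] = 0.5 * (p_u + p_i)   # assumption: plain average as the combination
    R_new = R.copy()
    for i in range(m):
        applied = R[i] == 1
        if applied.any():
            threshold = P[i, applied].mean()           # per-user mean interest over applied jobs
            R_new[i, (~applied) & (P[i] > threshold)] = 1
    return R_new
```

The function only ever turns unknown entries into 1, so the returned matrix is at least as dense as its input. Repeating the pass with re-clustered users and jobs gives the iterative scheme described above.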

[Figure: flowchart. Sparse matrix R; spectral clustering of users and jobs into user clusters and job clusters; compute $p_{ij}^U$ (8) from the user clusters and $p_{ij}^I$ (9) from the job clusters; compute $p_{ij}$ (10) from (8) and (9); estimate the unknown $r_{ij}$ in R (11); if the stopping condition is false, repeat; if it is true, output the denser estimated matrix R.]

Figure 2. The ICCF-Improved algorithm

3.2. The improved PSO-Feature Weighted method (PSO-FW Improved)

In this section we propose an improvement of PSO-Feature Weighted (PSO-FW Improved for short) built on the ICCF-Improved method presented above. It focuses on adjusting the way the unknown values in R are estimated, and it also determines the weights $\omega_{IF}$ and $\omega_{UF}$ in formula (17) with the PSO algorithm. As Figure 4 shows, the algorithm starts by choosing particles that are the weight vectors ($\omega_I$, $\omega_U$, $\omega_{IF}$, $\omega_{UF}$). Each value in a particle expresses the importance of the corresponding feature. A value of 1 marks the highest importance, while a value of 0 means that the feature is unimportant and is not used when computing the similarity.
3.2.1 Configuration of the PSO algorithm
PSO particles are constructed as in Figure 3 and are represented as $(\omega_{I_1}, \ldots, \omega_{I_n}, \omega_{U_1}, \ldots, \omega_{U_m}, \omega_{IF}, \omega_{UF})$, where $\omega_{I_j}$, j = 1, 2, ..., n, and $\omega_{U_i}$, i = 1, 2, ..., m, are the weights of the jobs and of the users respectively, and $\omega_{UF}$, $\omega_{IF}$ are the weights that set the priority of user-based CF and item-based CF with spectral clustering in the linear hybrid that produces the model's final recommendation. All these weights take real values in [0, 1]. The initial swarm is a random set of particles, each represented by arbitrary values in the search space of candidate solutions of the optimization problem. The parameters of the algorithm are given in Table 1. They control how fast the algorithm estimates the optimal weights and balance local against global search.


1   0   0.5   0.7   1   0.3   0.7

[From left to right: job weights, used in the user similarity matrix; user weights, used in the job similarity matrix; weights for the priority of the two recommendation techniques.]
Figure 3. Example of a particle representation for the PSO algorithm

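Parameter                                                        Value
Swarm size (number of particles)                                 15
Maximum number of iterations (number of generations)             100
Local acceleration coefficient (c1)                              2
Global acceleration coefficient (c2)                             2
Initial inertia weight                                           0.9
Final inertia weight                                             0.4
Iteration at which the inertia weight reaches its final value    80
Table 1. Parameter configuration of the PSO algorithm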
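The last three rows of Table 1 describe an inertia schedule. A minimal sketch of one way to read them is given below, assuming a linear decrease from the initial value to the final value that is completed at iteration 80 and held constant afterwards; the function name and the linear shape between the two end points are assumptions.

```python
def inertia_weight(k, w_start=0.9, w_end=0.4, k_final=80):
    """Inertia weight at iteration k: linear from w_start to w_end, then constant."""
    if k >= k_final:
        return w_end
    return w_start + (w_end - w_start) * k / k_final
```

With these defaults the weight is 0.9 at iteration 0, 0.65 at iteration 40, and 0.4 from iteration 80 up to the maximum of 100 iterations.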
3.2.2 Prediction model and fitness function of the PSO algorithm used in PSO-FW Improved
In this section we use a prediction model in which the feature weights are the particles of the PSO algorithm. In this model the weights are used to update the cosine similarity as in expression (13), in the same way as in [27].

The prediction model used in PSO-FW Improved proceeds as follows:
(1) Cluster users and jobs with spectral clustering
When the user similarity matrix is built, both the ratings and the job weights are used to compute the similarity between users. Likewise, when the job similarity matrix is built, the ratings and the user weights are used to compute the similarity between jobs, according to formula (14).

We then continue with the remaining steps of the spectral clustering algorithm described in Section 2.1 and obtain the clusters of users and of jobs.
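To make the role of the weights concrete, the sketch below shows one common form of a weighted cosine similarity. It is an assumption for illustration only: the exact expression used in the paper follows [27] and may differ, and the function name is hypothetical.

```python
import numpy as np

def weighted_cosine(a, b, w):
    """Cosine similarity of vectors a and b with a non-negative weight per feature."""
    a, b, w = (np.asarray(v, dtype=float) for v in (a, b, w))
    num = np.sum(w * a * b)
    den = np.sqrt(np.sum(w * a * a)) * np.sqrt(np.sum(w * b * b))
    return float(num / den) if den > 0 else 0.0
```

For two users, `a` and `b` would be their rows of R and `w` the job weights; for two jobs, `a` and `b` would be columns of R and `w` the user weights. With all weights equal to 1 the function reduces to the ordinary cosine similarity, and a weight of 0 removes the corresponding feature from the comparison.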
(2) Apply user-based CF with spectral clustering
When the user clustering is applied, the level of interest of user i in job j, $p_{ij}^U$, is estimated by:

$$p_{ij}^U = \frac{1}{N_U} \sum_{l} sim(i, l) \qquad (15)$$

where l indexes the users in the same cluster as user i who have applied for job j, $sim(i, l)$ is the similarity between user i and user l, and $N_U$ is the total number of users in the same cluster as user i who have applied for job j.
(3) Apply item-based CF with spectral clustering
When the job clustering is applied, the level of interest of user i in job j, $p_{ij}^I$, is estimated by:

$$p_{ij}^I = \frac{1}{N_I} \sum_{k} sim(k, j) \qquad (16)$$

where k indexes the jobs in the same cluster as job j that user i has applied for, $sim(k, j)$ is the similarity between job k and job j, and $N_I$ is the total number of jobs in the same cluster as job j that user i has applied for.

(4) Determine the level of interest of user i in job j
The levels of interest $p_{ij}^U$ and $p_{ij}^I$ obtained from item-based CF and user-based CF with spectral clustering are combined into the final level of interest $p_{ij}$ of user i in job j:

$$p_{ij} = \omega_{UF} \, p_{ij}^U + \omega_{IF} \, p_{ij}^I \qquad (17)$$

where $0 \le \omega_{UF}, \omega_{IF} \le 1$ are determined experimentally by the PSO algorithm.
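The linear hybrid of step (4) can be sketched as a one-line function. The function and parameter names are hypothetical, and the range check simply mirrors the stated constraint that both technique weights lie in [0, 1].

```python
def hybrid_interest(p_user, p_item, w_uf, w_if):
    """Linear combination of the user-based and item-based interest scores."""
    if not (0.0 <= w_uf <= 1.0 and 0.0 <= w_if <= 1.0):
        raise ValueError("technique weights must lie in [0, 1]")
    return w_uf * p_user + w_if * p_item
```

For example, with the technique weights 0.3 and 0.7 shown in Figure 3, scores of 0.2 (user-based) and 0.6 (item-based) combine to 0.3 x 0.2 + 0.7 x 0.6 = 0.48. The arguments may be scalars or NumPy arrays of scores.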

(5) Estimate the unknown values in R from the users' levels of interest in jobs, using formula (11).
(6) Compute the fitness of each particle in the PSO algorithm
When PSO-FW Improved is applied to job recommendation, the fitness of each particle is computed from the degree of convergence of the estimated matrix $R^{(k)}$ at step k: the less $R^{(k)}$ and $R^{(k-1)}$ differ, the smaller the fitness. The fitness of each particle is computed by formula (18), in which $card(R^{(k)})$ is the number of entries $r_{ij} = 1$ in $R^{(k)}$.

The fitness value indicates the distance between the current position of a particle and the optimal position. At each iteration the algorithm tries to reduce this distance, so it becomes a minimization process in which every particle tries to shorten the distance between its current position and the optimal position. A fitness value of 0 therefore means that the current position of the particle is optimal.
3.2.3. Weight optimization and prediction of the unknown values $r_{ij}$
This is the core of the PSO-FW Improved algorithm. It consists of the following steps:
Step 1: Estimate the unknown values $r_{ij}$ in R from two aspects, user-based and job-based, using formula (19), where $N_U$ is the total number of jobs user i has applied for and $N_I$ is the total number of users who have applied for job j.
Step 2: Configure the parameters of the PSO algorithm as in Section 3.2.1.
Step 3: For each particle, do the following:
a. Apply the recommendation model described in Section 3.2.2 to estimate the unknown ratings and compute the fitness of the particle;
b. Update the best position of the particle (Pbest) following the PSO algorithm described in Section 2.2.
Step 4: Update the best position of the whole swarm (Gbest) following the PSO algorithm described in Section 2.2.
Step 5: Update the velocity and position of each particle using formulas (6) and (7).
Step 6: Build the rating matrix R by using the best position of the whole swarm in the recommendation model; the matrix just built is used as input for the next step.
Step 7: If the stopping condition is satisfied (the maximum number of iterations is reached or the fitness no longer changes), go to step 8; otherwise go back to step 3.
Step 8: Obtain the optimal weights ($\omega_U$, $\omega_I$, $\omega_{IF}$, $\omega_{UF}$) and a denser rating matrix R.

[Figure: flowchart. Sparse matrix R; set the constants $k_{max}$, $c_1$, $c_2$; initialize the particle positions $x_0^i$ ($\omega_U$, $\omega_I$, $\omega_{IF}$, $\omega_{UF}$) and random velocities $0 \le v_0^i \le v_0^{max}$ for i = 1, ..., s; the prediction model takes the weights and returns a fitness value for each particle; update Pbest; update Gbest; update the position and velocity of each particle; update R from Gbest with the prediction model using formula (11); if the stopping condition is false, repeat; if it is true, output the optimal feature weights and the denser matrix R.]

Figure 4. Flowchart of the PSO-FW Improved algorithm
3.3. Experiments

3.3.1 Preparing the experimental data
We evaluate the proposed methods on a job data set (see footnote 1) that contains 10054 users and 1682 jobs, in which every user has applied for at least one job and every job has been applied for by at least one user. The data are represented by a user-job matrix R with 11054 rows and 1682 columns.
The data are divided into a training set and a test set. The original data set is split into 5 parts based on the (user, applied job) pairs in R. Each part in turn is used as the test set, and the remaining parts form the training set.
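The five-part split over (user, applied job) pairs can be sketched as follows. This is an illustrative sketch under assumptions: the function name is hypothetical, and the random shuffling with a fixed seed is an added detail, since the text only states that the pairs are divided into 5 parts.

```python
import numpy as np

def five_fold_splits(R, n_folds=5, seed=0):
    """Yield (train_matrix, test_pairs): each fold hides one part of the 1-entries of R."""
    R = np.asarray(R)
    pairs = np.argwhere(R == 1)              # all (user, job) pairs with an application
    rng = np.random.default_rng(seed)
    rng.shuffle(pairs)                       # shuffles the rows (pairs) in place
    folds = np.array_split(pairs, n_folds)
    for test_pairs in folds:
        train = R.copy()
        train[test_pairs[:, 0], test_pairs[:, 1]] = 0   # hide the test applications
        yield train, test_pairs
```

Each training matrix keeps the shape of R, so the clustering and prediction steps can run on it unchanged, while the hidden pairs serve as ground truth for evaluation.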

Footnote 1: http://www.kaggle.com/c/job-recommendation

In addition, the paper considers the sparsity of the data set. The sparsity of the data set represented by the matrix R is computed as:

$$\text{Sparsity}(R) = 1 - \frac{\sum_{i,j} r_{ij}}{m \times n} \qquad (20)$$

The sparsity of the data set represented by R is therefore 0.9848, which is quite high.
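As a small worked illustration of the sparsity measure, the sketch below computes one minus the fraction of non-zero entries of a matrix. The function name is hypothetical.

```python
import numpy as np

def sparsity(R):
    """Share of entries of R that are zero: 1 - (number of non-zeros) / (rows * columns)."""
    R = np.asarray(R)
    return 1.0 - np.count_nonzero(R) / R.size
```

For the 2 x 2 matrix [[1, 0], [0, 0]] this gives 1 - 1/4 = 0.75. A value of 0.9848 means that only about 1.5% of all user-job pairs are observed applications.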
3.3.2 Evaluating the accuracy of the recommendation model
To evaluate the accuracy of a recommendation model we can use precision, recall, or a combination of the two:
Precision = #TP / (#TP + #FP) (21)
Recall (True Positive Rate) = #TP / (#TP + #FN) (22)
F-Measure = 2 x (Precision x Recall) / (Precision + Recall) (23)
where #TP is the number of correctly predicted jobs, #FP is the number of incorrectly predicted jobs, and #FN is the number of jobs that were not predicted.
In the matrix R, a value of 0 means either that the user did not apply for the job or that the user did not know about it. This makes it difficult to compute the precision of the prediction model. A value of 1, by contrast, states with certainty that the user applied for the job, so we use recall to evaluate the accuracy of the prediction model. Recall@N is formula (22) evaluated on the N top-ranked jobs recommended to a user:

$$\text{Recall@}N = \frac{\#TP_N}{\#TP_N + \#FN_N} \qquad (24)$$

To evaluate the whole recommendation model we use the mean Recall@N over the 5 test sets. Within each test set, the recall of the model is the mean recall over all users. The closer this value is to 1, the more accurate the recommendation model.
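A per-user Recall@N computation can be sketched as follows. It is a hedged sketch: the function name and arguments are hypothetical, and excluding the jobs already known from the training set before ranking is an assumption that the text does not state.

```python
import numpy as np

def recall_at_n(scores, test_jobs, n, known_jobs=()):
    """Fraction of a user's held-out jobs that appear among the N top-scored jobs."""
    known = set(known_jobs)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranked = [int(j) for j in order if int(j) not in known]   # skip training jobs (assumption)
    top_n = set(ranked[:n])
    test_jobs = set(test_jobs)
    return len(top_n & test_jobs) / len(test_jobs)
```

Here `scores` holds the predicted interest of one user in every job. Averaging the returned value over all users of a test set, and then over the 5 test sets, gives the overall figure described above.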
3.3.3 Kt qu thc nghim

[Figure: plot of Recall (vertical axis, 0 to 1) against TopN (horizontal axis: 5, 10, 15, 20, 25, 30) for CF, ICCF-Improved and PSO-FW Improved.]

Figure 5. Experimental comparison of the CF, ICCF-Improved and PSO-FW Improved methods for the job recommender system

In this section we run experiments on the job recommendation data with traditional collaborative filtering, ICCF-Improved and PSO-FW Improved, using the data described above.
Figure 5 shows the recall of traditional CF and of the two methods we propose, ICCF-Improved and PSO-FW Improved, when the number of recommended jobs (TopN) is 5, 10, 15, 20, 25 and 30. The plot shows that traditional CF is markedly less accurate than both ICCF-Improved and PSO-FW Improved, even when TopN is raised to 30. ICCF-Improved is considerably more accurate than CF but remains below PSO-FW Improved for every value of TopN. PSO-FW Improved thus clearly outperforms the other two methods for all numbers of recommended jobs considered, which is a notable result.
IV. CONCLUSION
ICCF and PSO-Feature Weighted work well for recommender systems whose ratings are numbers in the set {1, 2, ..., 5}. Unfortunately they cannot be applied directly to many recommender systems, such as paper, job and news recommender systems, in which ratings are binary. To address this problem, the paper makes two contributions.
First, we adjust the way the unknown ratings $r_{ij}$ in the matrix R are estimated in both ICCF and PSO-Feature Weighted so that they apply to recommendation problems with binary ratings. Second, we show how to determine the weights of the linear hybrid of user-based CF and item-based CF with spectral clustering ($\omega_{IF}$ and $\omega_{UF}$) together with the search for the optimal weights that represent the importance of users and jobs in the similarity computation ($\omega_U$ and $\omega_I$). In this way we aim to exploit the hybrid of user-based CF and item-based CF effectively and improve the quality of the recommender system.
We will continue to improve the proposed methods by exploiting the descriptive information of jobs, in order to handle the new-job problem, that is, jobs that nobody has applied for yet, and to further improve recommendation quality.
V. REFERENCES
[1] B. N. Miller, J. A. Konstan, and J. Riedl. Toward a personal recommender system. ACM Trans. Inform. Syst., 22(3), pages 437-476, 2004.
[2] G. Adomavicius and A. Tuzhilin. Towards the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng., 17(6), pages 734-749, 2005.
[3] R. Baraglia and F. Silvestri. An online recommender system for large web sites. In proceedings of IEEE/WIC/ACM
International Conference on Web Intelligence, Beijing, China, pages 199-205, 2004.
[4] J. Bobadilla, F. Serradilla, and A. Hernando. Collaborative filtering adapted to recommender systems of e-learning.
Knowledge- Based Systems, 22, pages 261-265, 2009.
[5] M. Grcar, D. Mladenic, B. Fortuna, and M. Grobelnik. Data sparsity issues in the collaborative filtering framework.
In Proceedings of Advances in Web Mining and Web Usage Analysis (LNAI. 4198), pages 58-76, 2006.
[6] M. Deshpande, G. Karypis. Item-based top-n recommendation algorithms. ACM Transactions on Information
Systems, 22, pages 5-53, 2004.
[7] H. Ahn. A new similarity measure for collaborative filtering to alleviate the new user cold-starting problem.
Information Sciences, 178(1), pages 37-51, 2008.
[8] P. Symeonidis, A. Nanopoulos, A.N. Papadopoulos, and Y. Manolopoulos. Collaborative filtering: fallacies and
insights in measuring similarity. In Proceedings of the 10th PKDD Workshop on Web Mining (WEBMine2006),
Berlin, Germany, pages 56-67, 2006.
[9] F. Fouss, A. Pirotte, J. M. Renders, and M. Saerens. Random-walk computation of similarities between nodes of a
graph with application to collaborative recommendation. IEEE Transactions on Knowledge and Data
Engineering, 19(3), pages 355-369, 2007.
[10] C. Zeng, C. Xing, L. Zhou, and X. Zheng. Similarity measure and instance selection for collaborative filtering.
International Journal of Electronic Commerce, 8(4), pages 115-129, 2004.
[11] B. Jeong, J. Lee, and H. Cho. Improving memory-based collaborative filtering via similarity updating and prediction modulation. Information
Sciences, 180(5), pages 602-612, 2010.
[12] A. Amira, H. Sekiya, I. Matsuba, Y. Horiuchi, and S. Kuroiwa. Collaborative Filtering Based on an Iterative
Prediction Method to Alleviate the Sparsity Problem. In Proceedings of the 11th International Conference on
Information Integration and Web-based Applications and Services (iiWAS2009), pages 373-377, Kuala Lumpur,
Malaysia, 2009.
[13] D. Wettschereck, D. W. Aha, and T. Mohri. A review and empirical evaluation of feature weighting methods for a
class of lazy learning algorithms. Artif. Intell. Rev., 11(1-5), pages 273-314, 1997.
[14] J. S. Breese, D. Heckerman, and C. M. Kadie. Empirical analysis of predictive algorithms for collaborative
filtering. In Proceedings of the 14th Conf. of Uncertainty in Artificial Intelligence, pages 43-52, Madison, WI,
1998.
[15] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl. An algorithmic framework for performing collaborative
filtering. In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development
in information retrieval, pages 230-237, Berkeley, California, United States, 1999.
[16] K. Yu, X. Xu, M. Ester, and H. P. Kriegel. Feature weighting and instance selection for collaborative filtering: An
information-theoretic approach. Knowl. Inf. Syst., 5(8), pages 201-224, 2003.
[17] R. Jin, J. Y. Chai, and L. Si. An automatic weighting scheme for collaborative filtering. In Proceedings of the 27th
annual international ACM SIGIR conference on Research and development in information retrieval , pages 337-
344, Sheffield, United Kingdom, 2004.
[18] S. H. Min and I. Han. Optimizing Collaborative Filtering Recommender Systems. Lecture Notes in Computer
Science, vol.3528, pages 313-319, 2005.
[19] D. Pudjianto, S. Ahmed, and G. Strbac. Allocation of VAr support using LP and NLP based optimal power flows.
IEE Proc. Generation, transmission and distribution, 149(4), pages 377-383,2002.
[20] J. Kennedy, and R.C. Eberhart. Particle swarm optimization. In Proceedings of the IEEE International Joint
Conference on Neural Networks, pages 1942- 1948, 1995.
[21] M. Papagelis , D. Plexousakis. Qualitative analysis of user-based and item-based prediction algorithms for
recommendation agents. Engineering Applications of Artificial Intelligence, 18(7), pages 781-789, 2005.
[22] A. Y. Ng, M. I. Jordan, Y. Weiss. On spectral clustering: Analysis and an algorithm. Advances in Neural
Information Processing Systems, vol.14, pages 849-856, 2001.
[23] A. Afifi, T. Nakaguchi, N. Tsumura, Y. Miyake. A Model Optimization Approach to the Automatic Segmentation
of Medical Images. IEICE Trans. on Information and Systems, E93-D(4), pages 882-889, 2010.
[24] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman, and A. Y. Wu. An Efficient k-Means
Clustering Algorithm: Analysis and Implementation. IEEE Transactions on Pattern Analysis and Machine
Intelligence, Vol.24, pages 881-892, 2002.
[25] R. Poli. Analysis of the publications on the applications of particle swarm optimization applications. Artificial
Evolution and Applications , 2007.
[26] A. Lazinica, Particle Swarm Optimization, In-Tech, Croatia, 2009.
[27] Amira Abdelwahab, Hiroo Sekiya, Ikuo Matsuba,Yasuo Horiuchi, Shingo Kuroiwa, Feature Optimization
Approach for Improving the Collaborative Filtering Performance Using Particle Swarm Optimization, Journal of
Computational Information Systems, vol. 8, no. 1, pp. 435-450, 2012.
[28] Pham Minh Chuan, Le Thanh Huong, Tran Dinh Khang and Cao Xuan Bach. Hệ thống khuyến nghị công việc (Job recommendation system).
DOI 10.15625/FAIR VII.2014-0336, pages 153-159.

RECOMMENDATION SYSTEMS USING SWARM OPTIMIZATION ALGORITHM
Pham Minh Chuan, Le Thanh Huong, Tran Dinh Khang, Nguyen Van Hau
ABSTRACT - Collaborative Filtering (CF), one of the most commonly used techniques in recommendation systems, has been integrated into e-commerce sites (such as amazon.com, barnesandnoble.com, Yahoo! news, TripAdvisor.com). The CF approach is based on the underlying assumption that if a person X has the same opinion as another person Y on a particular issue, then X is more likely to share Y's opinion on another issue than the opinion of a randomly chosen person. The Iterative Clustered CF (ICCF) and the Feature-Weighted ICCF using particle swarm optimization (PSO) methods are effective for recommendation systems in which the rating values are in the set {1, 2, ..., 5}. However, such techniques cannot be directly applied to recommendation systems in which the rating values belong to the set {0, 1}. To deal with this issue, this paper proposes improvements to ICCF and PSO-Feature Weighted so that the proposed methods can be applied to recommendation systems in which the rating values are in the set {0, 1}. When the proposed methods are applied to a job recommender system, experimental results show that the accuracy of the predictive models improves considerably compared with the traditional CF method, and that a frequent issue of CF, data sparsity, is largely resolved.
