
PREFACE

Advanced Data Structures is a foundational course for everyone who studies or works in information technology. In this course we study many data structures together with their practical applications. This document introduces the splay tree and its application in "A semantic web personalization technique in the case of bursty web visits".

We would like to express our sincere gratitude to Dr. Nguyen Manh Hung, who taught the Advanced Data Structures course. We also sincerely thank the friends and colleagues who helped us complete this assignment. Despite our best efforts, this document inevitably contains shortcomings. We therefore look forward to comments and suggestions from our teacher and classmates so that it can be improved. Thank you very much!

Hanoi, January 2012
Student group: Quang Ha - L Thanh Mai

Part 1: Splay tree


1.1 Introduction to the splay tree

The splay tree was introduced by D.D. Sleator and R.E. Tarjan in 1983. A splay tree is a binary search tree in which every operation is accompanied by a restructuring of the tree, called splaying. Like AVL trees and red-black trees, it is a self-adjusting tree.

With an AVL tree or a red-black tree we do not care how often each data item is accessed. Instead, we guarantee that the tree never becomes unbalanced at any node, so every operation on the tree takes O(log n) time. To implement an AVL tree or a red-black tree, each node must carry extra information about its balance or its colour.

Splaying, by contrast, aims to reduce the total access time by moving frequently accessed data close to the root, so that later accesses to those data are faster. The advantage of a splay tree is that no balance information has to be stored in the nodes, which saves memory and makes the implementation simpler. In addition, a splay tree is a binary search tree, so it is clear, easy to understand, and the basic operations (search, delete, insert, and so on) are easy to implement.

The idea of the splay tree is to move recently accessed, frequently used nodes to the root, whether they are internal nodes with many descendants or leaves. Moving an arbitrary node v to the root is simple using tree rotations (left or right); each rotation moves v up one level. For example, an accessed node x is moved to the root by rotation:
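The single rotations mentioned above can be sketched as follows. This is a minimal illustration and not code from the course material: the names Node, rotate_right, rotate_left and inorder are assumptions chosen for the example, and the tree is kept without parent pointers so that each rotation simply returns the new root of the rotated subtree.

```python
class Node:
    """A binary search tree node (hypothetical helper for this sketch)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def rotate_right(y):
    """Move the left child x of y up one level; return x, the new subtree root."""
    x = y.left
    y.left = x.right      # the middle subtree changes parent
    x.right = y
    return x


def rotate_left(x):
    """Mirror image: move the right child y of x up one level; return y."""
    y = x.right
    x.right = y.left
    y.left = x
    return y


def inorder(n):
    """Keys in sorted order; a rotation must never change this list."""
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)


# x = 10 is the left child of the root 20; one right rotation makes it the root.
root = Node(20, Node(10, Node(5), Node(15)), Node(30))
before = inorder(root)
root = rotate_right(root)
```

After the rotation the key 10 is at the root and the in-order sequence of keys is unchanged, which is why rotations preserve the binary search tree property.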

1.2 How the splay tree works

On a splay tree, rotations are defined by rotation rules that gradually move the accessed node to the root. There are two methods:
- Bottom Up: starting from the accessed node, rotate it upwards until it becomes the root.
- Top Down: starting from the root, restructure the tree downwards until the accessed node is reached.

If x is already the root of the tree, nothing has to be done.

1.2.1 The Bottom Up method

Depending on the structure of the access path, and always keeping the splay tree a binary search tree, the Bottom Up method uses the following three basic rotation rules.

Case 1: Zig
The parent of node x is the root of the tree. Rotate x about its parent.

Case 2: Zig-Zig
The parent of node x is not the root of the tree, and x and its parent are both left children (or both right children). Rotate the parent of x about the grandparent of x, then rotate x about its parent.

[Figure: zig-zig step with y the parent and z the grandparent of x: rotate y about z, then rotate x about y]

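Case 3: Zig-Zag
The parent of node x is not the root of the tree; x is a left child (or right child) and the parent of x is a right child (or left child). Rotate x about its parent, then rotate x about its former grandparent.

[Figure: zig-zag step with y the parent and z the grandparent of x: rotate x about y, then rotate x about z]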
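The three bottom-up rules can be sketched in code as below. This is an illustrative sketch under stated assumptions, not the course's reference implementation: nodes carry a parent pointer, and the names Node, rotate_up, splay and bst_insert are hypothetical helpers introduced for the example.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def rotate_up(x):
    """Rotate x about its parent p, moving x up one level."""
    p, g = x.parent, x.parent.parent
    if p.left is x:                       # right rotation
        p.left, x.right = x.right, p
        if p.left is not None:
            p.left.parent = p
    else:                                 # left rotation
        p.right, x.left = x.left, p
        if p.right is not None:
            p.right.parent = p
    p.parent, x.parent = x, g
    if g is not None:                     # reattach x under the old grandparent
        if g.left is p:
            g.left = x
        else:
            g.right = x


def splay(x):
    """Bottom-up splay: apply zig, zig-zig or zig-zag until x is the root."""
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is None:
            rotate_up(x)                              # zig
        elif (g.left is p) == (p.left is x):
            rotate_up(p)                              # zig-zig: parent first,
            rotate_up(x)                              # then x
        else:
            rotate_up(x)                              # zig-zag: x twice
            rotate_up(x)
    return x


def bst_insert(root, key):
    """Plain BST insertion that maintains parent pointers; returns (root, new node)."""
    node, cur, parent = Node(key), root, None
    while cur is not None:
        parent = cur
        cur = cur.left if key < cur.key else cur.right
    node.parent = parent
    if parent is None:
        return node, node
    if key < parent.key:
        parent.left = node
    else:
        parent.right = node
    return root, node


root, nodes = None, {}
for k in [50, 30, 70, 20, 40]:
    root, nodes[k] = bst_insert(root, k)
root = splay(nodes[40])      # 40 is a right child of a left child: zig-zag
```

In this small tree, node 40 reaches the root with a single zig-zag step, leaving 30 as its left child and 50 as its right child.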

Example: apply the rotation rules to make node a the root.

[Figure: bottom-up splay of node a in a tree with nodes a to i and subtrees A to J, shown after each of the four steps below]

- The parent of a is d, which is not the root; d is a left child and a is a right child. Apply the zig-zag rule.
- The parent of a is b, which is not the root; b and a are both right children. Apply the zig-zig rule.
- The parent of a is f, which is not the root; f is a left child and a is a right child. Apply the zig-zag rule.
- The parent of a is h, which is not the root; a and h are both left children. Apply the zig-zig rule.

1.2.2 The Top Down method

Depending on the structure of the access path, and always keeping the splay tree a binary search tree, the Top Down method uses the following four basic rules.

Case 1: Zig
The parent X of node Y is the root of the tree. Rotate Y about its parent; Y becomes the parent of X.

Case 2: Zig-Zig
The parent of node Z is not the root, and Z and its parent are both left children (or both right children). Rotate the parent of Z about the grandparent of Z, then rotate Z about its parent.

Case 3: Zig-Zag
The parent of node Z is not the root of the tree; Z is a left child (or right child) and the parent of Z is a right child (or left child). Rotate the parent of Z about the grandparent of Z.

Case 4: Reassembling
Reassemble the tree: the pieces set aside during the descent are joined back under the accessed node.
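A common way to code the Top Down method is the loop below, given here as a hedged sketch rather than as the method of the course slides. It keeps a left tree (keys smaller than the target) and a right tree (keys larger than the target) while descending, performs a rotation in the zig-zig situation, and reassembles at the end. The names Node, splay, find and inorder, and the temporary header node, are assumptions of this sketch.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def splay(t, key):
    """Top-down splay: the returned root holds key, or the last node visited."""
    if t is None:
        return None
    header = Node(None)          # header.right collects the left tree,
    l = r = header               # header.left collects the right tree
    while key != t.key:
        if key < t.key:
            if t.left is not None and key < t.left.key:   # zig-zig: rotate right
                y = t.left
                t.left, y.right = y.right, t
                t = y
            if t.left is None:
                break
            r.left = t           # link t into the right tree and go left
            r = t
            t = t.left
        else:
            if t.right is not None and key > t.right.key:  # zig-zig: rotate left
                y = t.right
                t.right, y.left = y.left, t
                t = y
            if t.right is None:
                break
            l.right = t          # link t into the left tree and go right
            l = t
            t = t.right
    # Reassembling: hang the remaining subtrees and the two side trees under t.
    l.right, r.left = t.left, t.right
    t.left, t.right = header.right, header.left
    return t


def find(root, key):
    """Splay on key and report whether the new root actually holds it."""
    root = splay(root, key)
    return root, (root is not None and root.key == key)


def inorder(n):
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)


tree = Node(50, Node(30, Node(20), Node(40)), Node(70))
tree, hit = find(tree, 40)       # 40 is present and becomes the root
tree, miss = find(tree, 45)      # 45 is absent; the last node visited becomes the root
```

Because every access goes through splay, a search for a missing key still restructures the tree, which matches the Find rule described in section 1.3.1.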

Example: apply the rotation rules to make node 18 the root.

[Figure: top-down splay of node 18, showing in turn a Zig-Zag step, a Zig-Zig step, a Zig step and the Reassemble step]

1.3 Update operations on the splay tree

1.3.1 Find (i, T)
Search for node i in tree T.
- Search for node i in T as in an ordinary BST.
- If it is found, splay node i to the root.
- If i is not in the tree, splay the last node visited on the search path to the root.

Example: find node 65 in tree T.
[Figure: tree T before and after each step of Find(65, T)]

- Zig-zag case: rotate node 65 about node 70, then rotate node 65 about node 60.
- Zig case: rotate node 65 about node 50.

Example: find node 42 in tree T.

[Figure: tree T before and after Find(42, T)]

Node 42 is not found in the tree, and node 43 is the last node visited on the search path, so node 43 is splayed to the root.
- Zig-zag case: rotate node 43 about node 40, then rotate node 43 about node 50.

1.3.2 Catenate (T1,T2)
Join two trees T1 and T2 into a single BST. Every key in T1 is assumed to be smaller than every key in T2.
- Find the largest node i in T1.
- Splay i to the root of T1.
- Attach T2 as the right child of the root i.

[Figure: i splayed to the root of T1, then T2 attached as the right subtree of i]

[Figure: example of catenating two trees T1 and T2]

1.3.3 Split (i,T)
Split tree T at node i.

Case 1: i is in T
- Splay node i to the root of T.
- Cut the left link or the right link of node i.
=> T is split at node i into two trees T1 and T2.

[Figure: i splayed to the root; cutting the right link of i leaves i at the root of T1, cutting the left link of i leaves i at the root of T2]

Case 2: i is not in T
- Splay node i- (the node with the key immediately before i) or i+ (the node with the key immediately after i) to the root of T.
- Cut the right link of node i- or the left link of node i+.
=> T is split at node i- or i+ into two trees T1 and T2.

[Figure: i- splayed to the root and its right link cut; i+ splayed to the root and its left link cut]

Example: split tree T at node 60.

[Figure: tree T; node 60 is splayed to the root (zig case); the left link of node 60 is cut, giving trees T1 and T2]
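Catenate (1.3.2) and Split (1.3.3) can both be written as a splay followed by one link change. The sketch below is illustrative only: it reuses a top-down splay, the function names are assumptions, split always cuts so that the first tree holds the keys less than or equal to i, and the key sequence used to build the sample tree is invented for the example (the shape of the tree in the figure is not reproduced).

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def splay(t, key):
    """Top-down splay: the returned root holds key, or the last node visited."""
    if t is None:
        return None
    header = Node(None)
    l = r = header
    while key != t.key:
        if key < t.key:
            if t.left is not None and key < t.left.key:
                y = t.left
                t.left, y.right = y.right, t
                t = y
            if t.left is None:
                break
            r.left = t
            r = t
            t = t.left
        else:
            if t.right is not None and key > t.right.key:
                y = t.right
                t.right, y.left = y.left, t
                t = y
            if t.right is None:
                break
            l.right = t
            l = t
            t = t.right
    l.right, r.left = t.left, t.right
    t.left, t.right = header.right, header.left
    return t


def catenate(t1, t2):
    """Join t1 and t2, assuming every key of t1 is smaller than every key of t2."""
    if t1 is None:
        return t2
    m = t1
    while m.right is not None:        # find the largest node i of t1
        m = m.right
    t1 = splay(t1, m.key)             # i becomes the root and has no right child
    t1.right = t2
    return t1


def split(t, key):
    """Return (tree with keys <= key, tree with keys > key)."""
    if t is None:
        return None, None
    t = splay(t, key)
    if t.key <= key:                  # root is i or i-: cut its right link
        t1, t2 = t, t.right
        t.right = None
    else:                             # root is i+: cut its left link
        t1, t2 = t.left, t
        t.left = None
    return t1, t2


def bst_insert(t, key):
    if t is None:
        return Node(key)
    if key < t.key:
        t.left = bst_insert(t.left, key)
    else:
        t.right = bst_insert(t.right, key)
    return t


def inorder(n):
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)


tree = None
for k in [50, 40, 70, 20, 43, 60, 16, 65, 63, 66]:
    tree = bst_insert(tree, k)
t1, t2 = split(tree, 60)              # 60 is present (case 1)
joined = catenate(t1, t2)
a, b = split(joined, 42)              # 42 is absent (case 2)
```

Splitting and then catenating returns a tree with the same keys, although usually with a different shape, because each splay restructures the access path.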

1.3.4 Insert (i,T)

Insertion method 1:
- Insert the node as in an ordinary BST.
- Splay the newly inserted node to the root.
- If i is already in T, splay that node to the root.

Insertion method 2:
- Perform Split(i,T) to obtain two subtrees T1 and T2.
- Attach T1 as the left child of node i and T2 as the right child of node i.

[Figure: Split(i,T) gives T1 and T2; T1 becomes the left child of i and T2 becomes the right child of i]

Example: insert node 42 into tree T.

[Figure: tree T; Split(42,T) gives trees T1 and T2; T1 is attached as the left child of 42 and T2 as the right child of 42]

1.3.5 Delete (i,T)
Delete node i from tree T.
- Splay node i to the root of T (if node i is not in T, splay the last node visited on the search path).
- Cut the left and right links of node i to obtain two trees T1 and T2.
- Delete node i.
- Catenate(T1, T2).

[Figure: node i is splayed to the root; its left and right links are cut and i is deleted, leaving T1 and T2; Catenate(T1, T2) gives the final tree T]

Example: delete node 40 from tree T.

[Figure: tree T; node 40 is splayed to the root, its links are cut, and the two remaining trees are catenated]

Example: delete node 80 from tree T.

[Figure: tree T before and after Delete(80, T); node 80 is not in the tree, so only the splay of the last node visited takes place]
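Insert (method 2) and Delete from sections 1.3.4 and 1.3.5 can be sketched on top of a top-down splay as follows. This is a hedged illustration, not the course's code: the function names are assumptions, the split and catenate steps are written inline, and the key sequence used to build the sample tree is invented for the example.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def splay(t, key):
    """Top-down splay: the returned root holds key, or the last node visited."""
    if t is None:
        return None
    header = Node(None)
    l = r = header
    while key != t.key:
        if key < t.key:
            if t.left is not None and key < t.left.key:
                y = t.left
                t.left, y.right = y.right, t
                t = y
            if t.left is None:
                break
            r.left = t
            r = t
            t = t.left
        else:
            if t.right is not None and key > t.right.key:
                y = t.right
                t.right, y.left = y.left, t
                t = y
            if t.right is None:
                break
            l.right = t
            l = t
            t = t.right
    l.right, r.left = t.left, t.right
    t.left, t.right = header.right, header.left
    return t


def insert(t, key):
    """Insertion method 2: split around key, then make key the new root."""
    if t is None:
        return Node(key)
    t = splay(t, key)
    if t.key == key:                  # already present: it is now the root
        return t
    node = Node(key)
    if key < t.key:                   # root is i+: T1 = t.left, T2 = t
        node.left, node.right, t.left = t.left, t, None
    else:                             # root is i-: T1 = t, T2 = t.right
        node.left, node.right, t.right = t, t.right, None
    return node


def delete(t, key):
    """Splay key to the root, drop it, and catenate its two subtrees."""
    if t is None:
        return None
    t = splay(t, key)
    if t.key != key:                  # absent: only the splay takes place
        return t
    t1, t2 = t.left, t.right
    if t1 is None:
        return t2
    t1 = splay(t1, key)               # key exceeds every key of t1, so its maximum comes up
    t1.right = t2
    return t1


def bst_insert(t, key):
    if t is None:
        return Node(key)
    if key < t.key:
        t.left = bst_insert(t.left, key)
    else:
        t.right = bst_insert(t.right, key)
    return t


def inorder(n):
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)


keys = [50, 40, 60, 20, 43, 70, 16, 25, 65, 63, 66]
tree = None
for k in keys:
    tree = bst_insert(tree, k)
tree = insert(tree, 42)               # 42 becomes the root
tree = delete(tree, 40)               # 40 is present and is removed
tree = delete(tree, 80)               # 80 is absent; the largest key ends at the root
```

Deleting a key that is not in the tree leaves the set of keys unchanged, but the tree is still restructured by the splay, as in the second deletion example above.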

Part 2: Scientific paper
A SEMANTIC WEB PERSONALIZATION TECHNIQUE
In the case of bursty web visits

Abstract
The explosive growth in the size and usage of the World Wide Web continues to create great new challenges and needs. The need to predict users' preferences, in order to ease and improve the browsing of a website, can be met by personalizing websites. Personalization is carried out on the basis of the user's explicitly declared preferences and of an iterative process that monitors the user's browsing activity, collects the requests made for ontological objects, and stores them in profiles in order to deliver personalized content. The problem we address is the case where some web pages become popular for a short period and are accessed very frequently within a limited space and time. Our goal is to cope with these bursts of visits and to keep recommending the heavily visited pages to potential future users who share the same interests. In this paper we therefore propose a new web personalization technique based on advanced data structures. The data structures used are splay trees (1) and binary heaps (2). We describe the architecture of the technique, analyze its space and time complexity, and demonstrate the performance of the proposal. In addition, we compare the proposed technique with other approaches, both theoretically and experimentally, to demonstrate its efficiency. Our solution achieves O(P^2) space complexity and runs in k.logP time, where k is the number of pages and P is the number of ontologies of the web pages.

Keywords: personalization, ontologies, adaptive data structures.

I. Introduction
The Semantic Web has become a lever that takes the integration of knowledge on the Web to new levels. Despite the effort devoted to research and engineering issues, very few real applications of the Semantic Web have been deployed and evaluated with real users.
The Semantic Web can deliver only if it is driven by users' needs, context and profiles, so that knowledge can be integrated into the web continuously and the content that users actually expect can be provided. Context and customization are among the main factors that determine the accuracy, effectiveness and relevance of the information accessed in Internet digital libraries and, more generally, on the Semantic Web. In traditional Web applications, users browse along predefined hypertext structures. Finding content therefore requires the user to understand the layout of the website, which is not always clear. Enriching Web applications with personalized recommendations provides alternative paths to the published data and improves users' ability to find the data they are interested in. However, the effectiveness of personalization depends on the quality of the user profile and on the relationships among content objects. Modeling both the published data and the user profiles with ontologies makes it possible to express users' interests and the relationships among pieces of information more effectively, by leveraging the advanced features of Semantic Web technology. These semantic relationships can be exploited to obtain more accurate personalization results.

Personalization is carried out on the basis of the user's explicitly declared preferences and of an iterative process that monitors the user's browsing activity, collects the requests made for ontological objects, and stores them in profiles in order to deliver personalized content. We aim to store data about the relationships among ontological objects with respect to their popularity and to users' requests for the ontological objects involved in browsing that content. Personalization and recommendation algorithms aim to recommend web pages to users based on their current visit and on their past browsing patterns.

The problem we are interested in is the case where some web pages become popular for a short time and are accessed frequently within a limited space and time. Our goal is to handle these bursts of visits and to recommend the heavily visited pages to potential future users who share the same interests. In this paper we therefore propose a new web personalization technique based on advanced data structures. The data structures used are splay trees (1) and binary heaps (2). We describe the architecture of the technique, analyze its space and time complexity, and demonstrate the performance of the proposal. In addition, we compare the proposed technique with other approaches, both theoretically and experimentally, to demonstrate its efficiency. Our solution achieves O(P^2) space complexity and runs in k.logP time, where k is the number of pages and P is the number of ontologies of the web pages.

II. Previous work
Web personalization has become an important issue because of the popularity of e-commerce applications [1,7,9]. Several methods for website personalization have been proposed [1,3,4,6].
The goal of a personalized website is to benefit from the knowledge gained by analyzing users' browsing behavior, combined with other collected information such as the user's location, past browsing patterns, or items purchased online [16,5,6,7,9]. Another very important aspect is the structure of a website and statistical studies of the links and pages within that structure. UPR is a PageRank-style algorithm that combines usage data with link analysis techniques in order to rank web pages by their importance in an overall picture of how pages are browsed [15]. Another widely used technique is personalization based on Web usage data mining [1]. For example, a classification algorithm for Web personalization based on Web usage mining has been proposed. This algorithm takes into account both static user information, through classical clustering techniques, and users' dynamic behavior, and on that basis proposes a new and more effective re-classification algorithm [17]. Computational Intelligence has also been applied to Web personalization, with various examples of intelligent systems designed to give Web users the information they are looking for without requiring them to ask for it explicitly [19].


On the other hand, an online novel reading system builds profile models and makes recommendations without requiring guidance from the user [18]. In general, personalization has recently been applied in a number of other areas. In advertising, a new technique for targeting online advertisements has been proposed [28]; it uses and adapts several well-tested, fairly powerful lexical and information retrieval techniques to estimate a user's affinity for particular products and services from an analysis of the user's browsing behavior. Furthermore, design research approaches that combine behavioral and technological techniques have been proposed to support better user modeling in personalized mobile advertising applications [29]. Even for downloading mobile games on handsets, a personalized mobile game recommendation system has been presented that analyzes the time of day and the day of the week to provide a more personalized experience [31].

Once bursts of visits came to be treated as an algorithmic problem, several papers were published on the subject. A new framework and algorithm for detecting bursts was introduced: a family of data structures that generalizes the Shifted Binary Tree, and a heuristic search algorithm that finds an efficient data structure for the given input.

In addition, semantic personalization has been improved in digital libraries and Web portals. Semantic browsing provides Web content generated dynamically according to context, thereby adapting knowledge more closely to what users want. For example, an existing medical digital library, the National electronic Library of Infection (NeLI, www.neli.org.uk) [32], was extended with an ontology of the infection domain, which allows new semantic services to be developed. In this process, group profiling is used to improve semantic browsing by integrating distributed knowledge sources.
The service was evaluated through an analysis of web server logs, through the dynamic refinement of profiles, and through qualitative feedback from actual users of the NeLI portal.

The Internet consists of websites built on different kinds of structures, which serve as the backbone of their construction. Users, however, browse by content and pay no attention to structure. [33] discusses the possibility of using ontologies to discover the structure of websites and to generate browsing suggestions for their visitors. It introduces a special logging system for collecting access data, together with the techniques used for data mining. The ontology of user profiles is built by mining user-oriented models.

Furthermore, enriching web applications with personalized data is a major concern for improving users' access to published content and thus for ensuring that their browsing succeeds. [34] defines a model for adding personalized recommendations based on user profiles, ontological domain models, and semantic reasoning. This approach gives a high-level presentation of applications designed with a domain-specific metamodel for Web applications called WebML.

Integrating usage data with content, structure, or user profile data improves the results of personalization. [35] presents SEWeP, a system that uses both the usage logs and the semantics of a website's content in order to personalize it. Web content is semantically annotated using a concept hierarchy (taxonomy). The paper introduces C-logs, an extended form of Web usage logs that includes knowledge derived from the semantics of the links. C-logs are used as input to Web usage mining and yield a broader, semantically focused set of recommendations.

The challenge for Semantic Web mining in e-learning relates to providing personalized experiences to users; in particular, such applications can capture the individual needs and requirements of learners. [36] proposes a framework for e-learning personalization that combines usage profiles with domain ontologies. The authors distinguish two stages in the process: an offline stage covering data preparation, ontology creation and usage mining, and an online stage concerned with producing recommendations for users.

Web usage mining has been used effectively as an approach to automatic personalization and as a way to overcome shortcomings of traditional approaches such as collaborative filtering. Despite their success, such systems, like more traditional ones, do not take semantic knowledge of the underlying domain into account. Without this semantic knowledge, personalization systems cannot recommend different kinds of complex objects on the basis of their underlying attributes. Nor can they automatically interpret or reason about user models or recommendations. Integrating semantic knowledge is in practice the main challenge for the next generation of personalization systems. [37] gives an overview of approaches for incorporating semantic knowledge into Web usage mining and personalization. In particular, it discusses the issues and requirements for successfully integrating semantic knowledge from different sources, such as the content and structure of websites, for use in personalization.
Finally, it presents a general framework for fully integrating domain ontologies with Web usage mining and personalization at different stages, including preprocessing and pattern discovery, as well as the final stage in which the discovered patterns are used for personalization.

III. Personalization and bursts of visits
Personalization can be defined as designing, managing and delivering content based on known, observed or predicted information. Personalization techniques match an individual user, his or her preferences and browsing habits on the website, with content selected according to that user's profile. In today's world of information overload, many similar technologies are used to filter and organize the data that matter most to users. Done correctly, personalizing a visitor's experience makes the time spent on a website or application more productive and engaging. Personalization can also be valuable to an organization, a portal or an online store, since it drives desired business results such as increased user response or the promotion of information to customers.

In this study we try to handle the case of bursts of visits to web pages. Many aspects of everyday life are described in terms of events [27]. An unexpectedly large number of events occurring within a certain time frame is called a burst, and it leads to unusual actions or processes. Bursts occur in many everyday settings, from economics to natural phenomena, for example sales activity or shooting-star events. Depending on the importance of the observed phenomenon or process, detecting bursts efficiently can be essential. Specifically, a burst depends on the time span we focus on, also called the window size.

Bursts also occur in the traffic of a website and affect its operation in many ways. As more and more commercial enterprises do business online, it is essential for them to make their websites attractive to customers. One way to increase a site's traffic is online advertising on search engines. In this case, an advertisement is displayed next to the search results on the search page. A problem that arises with pay-per-click is click fraud: someone may use automated scripts or programs to simulate a large number of browser clicks on an advertising link. Naturally, the number of clicks must be large enough to obtain the desired amount of money. A burst of clicks may therefore be regarded as fraudulent clicks.

In this paper we deal with bursts of visits to a webpage and with how knowledge can be gained from them to help with web personalization. A pattern of visits or accesses is considered bursty when the visits occur with high intensity over a limited period of time. Specifically, in bursty cases a few web pages become very popular for a short time and are accessed very frequently within a limited time span. Such patterns have been observed in a large number of Internet applications, according to a number of studies [10]. With bursty web search patterns, users try to find specific results belonging to a limited set of ontologies of interest within a short period. As this is a continuous process, a technique is needed that efficiently collects and retains the personalized ontologies and the results users visit frequently. We are given a set of ontologies of webpages and a random number of visits made by users to all the webpages.
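The notion of a bursty access pattern described above, many visits within a limited window, can be illustrated with a simple sliding-window counter. This is only a sketch under stated assumptions and is not the detection rule of the paper: the class name BurstDetector, the threshold and window parameters, and the numeric timestamps are all hypothetical choices made for the example.

```python
from collections import deque


class BurstDetector:
    """Flags an ontology as bursty when it receives at least `threshold`
    visits within the last `window` time units (both values are assumptions)."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        self.hits = {}                       # ontology -> deque of recent visit times

    def visit(self, ontology, t):
        """Record a visit at time t and report whether it completes a burst."""
        q = self.hits.setdefault(ontology, deque())
        q.append(t)
        while q and q[0] <= t - self.window:  # forget visits outside the window
            q.popleft()
        return len(q) >= self.threshold


detector = BurstDetector(threshold=3, window=10)
flags = [detector.visit("video", t) for t in (0, 1, 2)]
late = detector.visit("video", 50)           # the earlier visits have expired
```

The window size plays the role of the time span mentioned in the text: the same sequence of visits may or may not count as a burst depending on how wide the window is.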
We define a set of webpages as desired by users when these webpages become the most visited ones, as determined by the number of visits recorded in each time interval. More specifically, for each webpage we count how many accesses have been made since it was last visited. If this number is large enough to confirm that the page is popular, and the time over which the accesses were made qualifies, then the access pattern is considered a burst of visits.

Because of the sheer number of webpages, working with ontologies is more convenient, since it makes the problem clearer. As the number of webpages grows steadily, it becomes hard for a user to locate the desired information on a website. To make things easier for users, many websites organize their webpages into ontologies, so that a webpage can be found through the ontology to which it is assigned. Our aim is therefore to benefit from the organization of webpages into ontologies and to use it in handling bursts of visits to a particular ontology of webpages. For example, suppose that a user frequently visits a certain ontology of an online store's website and, for shopping purposes, visits the ontology of video and audio. At this point, a web personalization technique that handles bursts of visits should suggest to the user the ontologies of webpages that other users who chose the video and audio ontology also visited.

IV. Web pages and ontologies


Before describing our personalization algorithm, we must explain how web pages can be assigned to ontologies. In computer science, an ontology is an object that represents entities, ideas or events, together with their properties and relationships, according to a particular system of categories. We use a Web log analysis tool called ORGAN, which provides an integrated solution for the analysis process and operates on both the semantics of the site's content and the visits to its pages. Information about user preferences regarding the site's topics is extracted and then combined within ORGAN as an aid to the administrator's decisions about reorganizing the website's structure. Therefore, before applying our personalization algorithm to a site, we use ORGAN to assign the site's web pages to the appropriate ontologies.

V. Problem description
We describe the problem as follows: we have a set of P ontologies of webpages and N users. Each webpage belongs to a particular ontology, and each user profile is stored in a splay tree. As the profile, we take the log file of the webpages the user has visited. In the splay tree we store the ontologies of the webpages. By the properties of a splay tree, the most recently visited item is moved to the root. In our case we modify the tree so that the most frequently accessed item is the one moved to the root. In practice, we splay an ontology when we observe a burst of visits to it. After an ontology has been splayed to the root because of a burst, no further restructuring or splaying of the tree is needed. Thus the ontologies that appear in the upper levels of each user's splay tree are the ones that belong to that user's preferences.


Since we keep a profile for each user, we want to build a data structure that stores, for the visitors of an ontology A, the ontologies that those visitors most want to visit as well. For each ontology we therefore build a queue. The queues we choose for this purpose are binary heaps. Each ontology keeps a binary heap of the other ontologies together with their popularity. By the properties of a priority queue, the root holds the smallest key, so it can be accessed in O(1) time. We store the popularity of each ontology with a minus sign (-), so that the largest value is kept at the root of the priority queue. Each time we observe a burst of visits to an ontology A, we increase by one the popularity count of this ontology in all the priority queues of the ontologies found in the top levels of the splay tree of the user concerned. We also increase the counts of those ontologies in the priority queue of ontology A. In this way we can obtain, in constant time, the ontology that is most popular among the users who visit ontology A.

Depending on whether the splay tree stores the ontologies of the webpages, a different approach may be taken. In the case of webpages, to ensure that at least a large part of the pages of an ontology are splayed to the root, we can use the following technique. Suppose that a page x is accessed at time k, and that this access establishes x as the most frequently accessed page. Then, as long as the parent of the current node belongs to the same ontology, we walk up the tree from the bottom. The upper bound on the number of levels we walk up depends on the number of ontologies and webpages. Let z be the last ancestor of x visited that belongs to the same ontology as x. We then splay all the nodes in the subtree of z to the root.

Next, we want to distinguish, or mark, the ontologies most visited by the user. The ontologies close to the root of the tree are the ones the user prefers. The only remaining decision is therefore the depth of the tree that serves as the limit above which all ontologies are considered to be preferred by the user.

Algorithm
1. When (webpage A of ontology W is accessed by the user)
   /* Collect the information from the webpage's log file */
2.   If (this access produces a burst of visits to ontology W)
3.   Then
     /* Rearrange the user's splay tree so that the ontology with the latest bursty access pattern is moved to the root */
4.     Splay ontology W to the root of the user's splay tree
     /* Update the priority queues of the ontologies in order to obtain the user's popular ontologies */
5.     Define the set of ontologies, TOP, found in the top levels of the splay tree
6.     Increase the count of W in the priority queues of all the ontologies in TOP
7.     Increase the counts of the ontologies in TOP in the priority queue of W
8.     Return as the recommendation the root of the priority queue of W
9.   else
10.    continue
11.  endif

VI. Analysis

A. Space requirements
For the space complexity, the space is expected to be dominated by the two data structures.
- Splay trees: we need one splay tree per user. In the worst case each splay tree stores W webpages. Counting the additional fields needed in each splay tree node, the space required is 5.N.W.
- Priority queues: we use one priority queue per ontology. For P ontologies, the space required is therefore O(P^2).

B. Time requirements
For the time complexity, each access requires:
- .log(/) for the node of the splay tree. This value is also .log(#pages).
- O(1) time to return the root of each priority queue. It therefore takes N.O(1) to recommend one ontology to N users.
Finally, we need to update the priority queues.
In other words, before recommending the root of an ontology's priority queue, we must increase the keys of the ontologies that we find among the user's preferences, provided they are already in the priority queue of the splayed ontology. Finally, we must increase the key of the splayed ontology in all the priority queues of the ontologies in the top levels of the user's splay tree. In total this takes k.logP time.

VII. Conclusion and future work
Recommendation and personalization algorithms aim to recommend web pages to users based on the content they are currently accessing and on their past browsing patterns. In this paper we presented a web personalization technique based on advanced data structures. Its main idea is to cope with bursts of visits to a web page by building an algorithm that recommends, to the visitors of the pages of a particular ontology A, the ontologies of pages that earlier visitors of A also wanted to browse. The data structures used are splay trees (1) and binary heaps (2). We described the architecture of the technique and analyzed its space and time complexity. Our solution achieves O(P^2) space complexity and runs in k.logP time, where k is the number of pages and P is the number of ontologies.

Future research steps include improving the algorithms so that they take into account users' implicit feedback on their final product choices, and not only for online stores or services. This would be particularly useful for e-business activities based on lightweight RESTful mobile Web services.
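Steps 5 to 8 of the algorithm in section V, together with the negated-key trick for the priority queues, can be sketched as follows. This is a hedged illustration and not the authors' implementation: the set TOP is assumed to have been read from the top levels of the user's splay tree elsewhere and is passed in as a plain set, the names PopularityQueue and on_burst are hypothetical, and instead of increasing a key in place the sketch pushes a fresh entry and discards stale ones lazily, which is a simplification.

```python
import heapq
from collections import defaultdict


class PopularityQueue:
    """Max-priority queue of ontologies built on heapq (a binary min-heap)
    by storing popularity counts with a minus sign."""

    def __init__(self):
        self.count = defaultdict(int)
        self.heap = []                       # entries are (-count, ontology)

    def increment(self, ontology):
        self.count[ontology] += 1
        heapq.heappush(self.heap, (-self.count[ontology], ontology))

    def top(self):
        """Most popular ontology, read from the root of the heap."""
        while self.heap and -self.heap[0][0] != self.count[self.heap[0][1]]:
            heapq.heappop(self.heap)         # drop an outdated entry
        return self.heap[0][1] if self.heap else None


def on_burst(queues, w, top_set):
    """Update the queues after a burst of visits to ontology w and return
    the recommendation; top_set stands for TOP (assumed given)."""
    for t in top_set:
        if t != w:                           # w itself is skipped in this sketch
            queues[t].increment(w)           # step 6
            queues[w].increment(t)           # step 7
    return queues[w].top()                   # step 8


queues = defaultdict(PopularityQueue)        # one priority queue per ontology
on_burst(queues, "video", {"audio"})
on_burst(queues, "video", {"audio", "books"})
recommendation = on_burst(queues, "audio", {"video"})
```

Keeping one queue per ontology, each of which may hold up to P entries, is what gives the O(P^2) space bound stated in the analysis; the lazy variant used here can hold additional stale entries, so it illustrates the idea rather than the exact bound.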


CONCLUSION
The splay tree was introduced by D.D. Sleator and R.E. Tarjan in 1983. Splaying aims to reduce the total data access time by moving frequently accessed data close to the root, so that later accesses to those data are faster. The advantage of a splay tree is that no balance information has to be stored in the nodes, which saves memory and makes the implementation simpler. Because of limited time, we have not been able to study the splay tree in depth or to explore more of its applications. We look forward to comments from our teacher and from the other students.


REFERENCES

[1]. Algorithms textbook (Giao trinh thuat toan). Statistics Publishing House (NXB Thong ke), 2002. Translated by the Ngoc Anh Thu group.
[2]. Lecture slides for the Advanced Data Structures course. Dr. Nguyen Manh Hung.
[3]. COMP670 Online Algorithm course material: Self-organized Splay Tree. Hung Lau Yung.
[4]. Handbook of Data Structures and Applications. 2005. Dinesh P. Mehta and Sartaj Sahni.
[5]. Advanced data structures. Website congdongCviet.com.

