
Question 1. What is data mining? Distinguish data from knowledge. Give examples.
1.1. Data mining

Data mining is the extraction of useful information (non-trivial, implicit, previously unknown and potentially useful) and of patterns from the data held in large databases. It is also known under several other names:
- Knowledge discovery in databases (KDD)
- Knowledge extraction
- Data/pattern analysis
- Data archaeology
- Business intelligence
and a number of less common ones.

A data mining application usually starts with the data analyst (data miner) gaining an understanding of the application domain, then identifying suitable data sources and the target data. Given the data, mining is carried out in three main steps:
- Pre-processing: Raw data is usually not suitable for mining for various reasons, for example because it is too large or contains many irrelevant attributes. It has to be cleaned to remove noise and anomalies.
- Data mining: The processed data is fed to a data mining algorithm, which produces patterns or knowledge.
- Post-processing: In many cases not all discovered patterns are useful. This step identifies the patterns that are useful for the application. Various evaluation and visualization techniques are used to make this selection.

1.2. Distinguishing data from knowledge

Data are raw, discrete numbers or facts that are observed or measured, without context or interpretation. Data is expressed by encoding it and is easy to transmit. Data becomes information when value is added to it through context, categorization, calculation, correction and evaluation. Knowledge is information that has been structured and verified and can be used for a specific purpose. Knowledge usually shows itself in concrete situations, combined with experience and with judgment or decision making.

Question 2 (answer not certain)

The knowledge discovery process in databases consists of the following steps, which are repeated iteratively:
- Data cleaning: this phase removes noisy or irrelevant data from the database.
- Data integration: multiple data sources, which are often heterogeneous, are combined into one common source.
- Data selection: the data relevant to the analysis is decided on and retrieved from the integrated collection.
- Data transformation: the selected data is converted into forms suitable for the mining procedures.
- Data mining: the key step, in which techniques are applied to extract potentially useful patterns.
- Pattern evaluation: the patterns that truly represent knowledge are identified on the basis of given measures.
- Knowledge representation: the final phase, in which the discovered knowledge is presented visually to the user. Visualization techniques help users understand and interpret the mining results.

Question 3: Types of data and how they are represented/processed: relational data, XML, image data, multimedia data, web data. Examples.

Types of data that can be mined:
- Database-oriented data
  o Relational databases
  o Data warehouses
  o Transactional databases
- Advanced data
  o Sensor data
  o Data streams
  o Sequence data
  o Structure data
  o Spatial data
  o Object-relational data
  o Graph data (graphs)
  o Multimedia data
  o Text data
  o Web data

Many techniques are used to represent/process the data: database-based techniques, data warehouse analysis, machine learning, statistics, visualization, and so on.

Question 5. What is a sequential pattern? Concepts and algorithms? Examples.


5.1 Sequential pattern mining: what is a sequential pattern?

Association rule mining ignores the order in which items appear in customer transactions (or the order in which events occur). Time, or order, is nevertheless very important in many applications, such as analysing Web access patterns, analysing scientific experiments, treating epidemic diseases, forecasting natural disasters and analysing DNA sequences. Even in the analysis of customer shopping habits, being able to predict the order in which products are bought gives a bigger advantage for targeted marketing and also brings more benefit to users. For example, in market analysis: people buy a bed first and only afterwards buy bed sheets.

Example 1: On an online bookshop, when a customer views the book Oracle Database 11g PL/SQL Programming (Osborne ORACLE Press Series), the system recommends first buying Oracle Essentials: Oracle Database 11g to get the necessary basic knowledge. Oracle Database 11g PL/SQL Programming (Osborne ORACLE Press Series) should be bought together with Oracle Database 11g SQL (Osborne ORACLE Press Series). Finally, the customer should go on to buy Oracle Database 11g The Complete Reference (Osborne ORACLE Press Series) and Oracle PL/SQL Programming Covers Versions Through Oracle Database 11g Release 2 to deepen that knowledge.

Motivated by such benefits, the sequential pattern mining problem was introduced by Rakesh Agrawal and Ramakrishnan Srikant in 1995: given a set of sequences, where each sequence is an ordered list of transactions and each transaction is a set of items, together with a user-specified threshold min_support, sequential pattern mining finds all frequent subsequences, that is, the subsequences that occur in the set of sequences with a frequency not lower than min_support.

Example 2: From the transaction history database of a bookshop we can find frequent sequential purchasing patterns, such as: 80% of the customers who buy Database Management also buy Data Warehouse with it and, after some time, come back to buy Web Information System. In this example the books do not have to be bought at the same time or consecutively; what matters is the order in which they are bought and that they are bought by the same customer. The figure of 80% is the percentage of customers who buy the three books in exactly that order.

Sequential patterns are widely used in many fields, such as analysing users' access patterns on websites or using the history of symptoms to predict a disease; with sequential pattern mining a retailer can also manage inventory more effectively. Sequential patterns capture correlations between transactions, whereas association rules describe relationships within a single transaction.
In association rule mining on a transaction database, the result tells us which items are frequently bought together, and those items must belong to the same transaction. The result of sequential pattern mining tells us which items are bought in a certain order by the same customer; these items may belong to several different transactions.

5.2 Concepts

Let I = {i1, i2, …, in} be the set of all items. A sequence is a list of transactions (itemsets) ordered by time (recall: an itemset X is a non-empty set of items with X ⊆ I).
- A sequence is written s = <t1, t2, …, tm>, where each ti is a transaction.
- A transaction is written ti = {x1, x2, …, xk}, where each xj is an item of I.
Without loss of generality, the items in a transaction of a sequence are assumed to be sorted lexicographically by their identifiers. An item can appear only once in a transaction, but it can appear several times in a sequence.
- The size of a sequence is the number of transactions in it.
- The length of a sequence is the number of items in it. A sequence of length k is called a k-sequence. If an item occurs several times in a sequence, each occurrence counts towards the length.
- A sequence s1 = <a1, a2, …, ar> is a subsequence of a sequence s2 = <b1, b2, …, bv>, or s2 is a supersequence of s1, if there exist integers 1 ≤ j1 < j2 < … < jr ≤ v such that a1 ⊆ bj1, a2 ⊆ bj2, …, ar ⊆ bjr. We then also say that s2 contains s1.

Example 3: Let I = {1, 2, 3, 4, 5, 6, 7, 8, 9}. The sequence <{3}{4, 5}{8}> is a subsequence of <{6}{3, 7}{9}{4, 5, 8}{3, 8}> because {3} ⊆ {3, 7}, {4, 5} ⊆ {4, 5, 8} and {8} ⊆ {3, 8}. However, <{3}{8}> is not a subsequence of <{3, 8}>.

The input of the sequential pattern mining problem is a sequence database S, a set of records of the form [sid, s], where sid is the identifier (primary key) of the sequence and s is the sequence data, for example [2, <{30} {90}>]. The sid is usually the customer ID, because each record here collects all the transactions of a single customer.
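The containment test from the definition above can be sketched in a few lines. This is a minimal illustration, not part of the original notes: the function name `contains` and the list-of-sets representation are assumptions made here.

```python
def contains(data_seq, pattern):
    """Return True if `pattern` is a subsequence of `data_seq`.

    Both arguments are lists of sets, one set per transaction.
    Matching each pattern element at the earliest possible transaction
    (greedy, left to right) never rules out a later match.
    """
    pos = 0
    for element in pattern:
        # Skip transactions that do not include every item of this element.
        while pos < len(data_seq) and not element <= data_seq[pos]:
            pos += 1
        if pos == len(data_seq):
            return False
        pos += 1  # the next element must match a strictly later transaction
    return True


s = [{6}, {3, 7}, {9}, {4, 5, 8}, {3, 8}]
print(contains(s, [{3}, {4, 5}, {8}]))  # the first case of Example 3
print(contains([{3, 8}], [{3}, {8}]))   # the second case of Example 3
```

The second call fails because the two pattern elements need two different transactions, while <{3, 8}> has only one.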
- Support count: the support count of a sequence s (written s.count) is the number of sequences in the sequence database S that contain s. The support of s is s.support = s.count / n, where n is the number of data sequences in S.

- Sequential pattern: A sequence s is a sequential pattern, or frequent sequence, if s.support ≥ minsup (the threshold given by the user).
- Sequential rule: A sequential rule has the form X → Y, where Y is a frequent sequence and X is a proper subsequence of Y (Y contains X and Y is longer than X). As with association rules, the strength of a sequential rule is expressed by two values, support and confidence. The support of a sequential rule, however, is computed as:

support(X → Y) = Y.count / n    (1)

because X is a subsequence of Y, so every data sequence in S that contains Y also contains X. Similarly, the confidence of a sequential rule is:

confidence(X → Y) = Y.count / X.count    (2)
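Formulas (1) and (2) can be checked on a small database. The sketch below is illustrative only: it hard-codes the five customer sequences used in the worked examples of these notes, and the helper names (`contains`, `count`, `support`, `confidence`) are assumptions made here.

```python
from fractions import Fraction

# The five customer sequences used in the worked examples of these notes.
DB = [
    [{30}, {90}],
    [{10, 20}, {30}, {10, 40, 60, 70}],
    [{30, 50, 70, 80}],
    [{30}, {30, 40, 70, 80}, {90}],
    [{90}],
]


def contains(data_seq, pattern):
    """True if `pattern` (a list of sets) is a subsequence of `data_seq`."""
    pos = 0
    for element in pattern:
        while pos < len(data_seq) and not element <= data_seq[pos]:
            pos += 1
        if pos == len(data_seq):
            return False
        pos += 1
    return True


def count(db, s):
    """s.count: number of data sequences in db that contain s."""
    return sum(contains(d, s) for d in db)


def support(db, s):
    """s.support = s.count / n; for a rule X -> Y this is applied to Y."""
    return Fraction(count(db, s), len(db))


def confidence(db, x, y):
    """confidence(X -> Y) = Y.count / X.count, with X a subsequence of Y."""
    return Fraction(count(db, y), count(db, x))


x, y = [{40}], [{30}, {40, 70}]
print(support(db=DB, s=y), confidence(DB, x, y))
```

Using `Fraction` keeps the results exact (2/5 and 1 for this rule) instead of rounding them to floats.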

Goal of sequential pattern mining: given a set S of data sequences (a sequence database), find all sequences that satisfy the user-specified threshold minsup. Each such sequence is called a frequent sequence, or a sequential pattern. The support of a sequence is the percentage of data sequences in S that contain it.

The goal of sequential pattern mining therefore corresponds only to the step of finding frequent itemsets in association rule mining. Generating sequential rules from the patterns found is usually not covered, because many applications need only the patterns. In the demo application of this project, an e-commerce system, the rule generation step is nevertheless needed to produce recommendations for customers.

Example 4: An example of market basket analysis: each sequence in this application is the time-ordered list of the transactions of one particular customer. Each transaction is the set of items the customer bought at one time. The order of the transactions in the sequence follows the order of time.

Table 1 shows a transaction database sorted by customer ID (primary key) and transaction time (secondary key). Table 2 shows the data sequences (also called customer sequences) built from the data of Table 1. Table 3 shows the sequential patterns obtained by applying a sequential pattern mining algorithm to the data of Table 2 with a minimum support of 25%.

Customer ID   Transaction Time   Transaction (items bought)
1             July 20, 2005      30
1             July 25, 2005      90
2             July 9, 2005       10, 20
2             July 14, 2005      30
2             July 20, 2005      10, 40, 60, 70
3             July 25, 2005      30, 50, 70, 80
4             July 25, 2005      30
4             July 29, 2005      30, 40, 70, 80
4             August 2, 2005     90
5             July 12, 2005      90

Table 1. A set of transactions sorted by customer ID and transaction time

All transactions of each customer are grouped and sorted chronologically to form one record <sid, s>, where sid is the Customer ID. The set of these records is the sequence database:

Customer ID   Data Sequence
1             <{30} {90}>
2             <{10, 20} {30} {10, 40, 60, 70}>
3             <{30, 50, 70, 80}>
4             <{30} {30, 40, 70, 80} {90}>
5             <{90}>

Table 2. Sequence database built from the transactions of Table 1

Applying a sequential pattern mining algorithm to this sequence database gives the following sequential patterns:

- Length 1: <{30}> (support = 4/5); <{40}> (support = 2/5); <{70}> (support = 3/5); <{80}> (support = 2/5); <{90}> (support = 3/5).
- Length 2: <{30} {40}> (support = 2/5); <{30} {70}> (support = 2/5); <{30} {90}> (support = 2/5); <{30, 70}> (support = 2/5); <{30, 80}> (support = 2/5); <{40, 70}> (support = 2/5); <{70, 80}> (support = 2/5).
- Length 3: <{30} {40, 70}> (support = 2/5); <{30, 70, 80}> (support = 2/5).

Length        Sequential patterns with support ≥ 25%
1-sequences   <{30}>, <{40}>, <{70}>, <{80}>, <{90}>
2-sequences   <{30} {40}>, <{30} {70}>, <{30} {90}>, <{30, 70}>, <{30, 80}>, <{40, 70}>, <{70, 80}>
3-sequences   <{30} {40, 70}>, <{30, 70, 80}>

Table 3. Output of sequential pattern mining on Table 2

With a confidence threshold minconf = 1, the sequential rules that can be derived from these patterns include <{40}> → <{30} {40, 70}> (<{40}>.count = <{30} {40, 70}>.count = 2, so the confidence is 1) and <{80}> → <{30, 70, 80}> (<{80}>.count = <{30, 70, 80}>.count = 2, so the confidence is 1).

As with association rule mining algorithms, sequential pattern mining algorithms differ in memory usage and processing speed, but they all produce the same set of sequential patterns, which follows the general definition of a sequential pattern and satisfies the user's minsup.
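The grouping step that turns Table 1 into Table 2 can be sketched as follows. This is an illustration under assumptions: the tuple layout `(customer_id, date, items)` and the function name `build_sequence_db` are chosen here, not taken from the notes.

```python
from datetime import date
from itertools import groupby

# Rows of Table 1 as (customer id, transaction date, items bought).
TRANSACTIONS = [
    (1, date(2005, 7, 20), {30}),
    (1, date(2005, 7, 25), {90}),
    (2, date(2005, 7, 9), {10, 20}),
    (2, date(2005, 7, 14), {30}),
    (2, date(2005, 7, 20), {10, 40, 60, 70}),
    (3, date(2005, 7, 25), {30, 50, 70, 80}),
    (4, date(2005, 7, 25), {30}),
    (4, date(2005, 7, 29), {30, 40, 70, 80}),
    (4, date(2005, 8, 2), {90}),
    (5, date(2005, 7, 12), {90}),
]


def build_sequence_db(transactions):
    """Group transactions by customer and order them by time."""
    # Sort on (customer, date) only; the item sets themselves are not compared.
    ordered = sorted(transactions, key=lambda t: (t[0], t[1]))
    return {
        cid: [items for _, _, items in group]
        for cid, group in groupby(ordered, key=lambda t: t[0])
    }


for cid, seq in build_sequence_db(TRANSACTIONS).items():
    print(cid, seq)
```

Sorting before `groupby` matters: `itertools.groupby` only merges adjacent rows with the same key.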
5.3 Algorithms
5.3.1 The GSP algorithm

GSP is the representative Apriori-based algorithm for sequential pattern mining. It works very much like Apriori:

- It makes multiple passes over the database; pass k finds all frequent sequences of length k (k-sequences).
- It uses a set Ck to store the candidate sequences of length k; the candidates whose support satisfies minsup are put into the set Fk of frequent sequences of length k.
- It uses a function candidate-gen-SPM(Fk-1) to generate the candidate set Ck from the frequent set Fk-1 of the previous pass.
- The algorithm ends when no frequent sequences are left to extend, in particular when candidate-gen-SPM(Fk-1) generates no candidate of length k.

Algorithm GSP(S)

1.  C1 ← init-pass(S);                           // first pass over the sequence database S
2.  F1 ← {<{f}> | f ∈ C1, f.count/n ≥ minsup};   // n is the number of sequences in S
3.  for (k = 2; Fk-1 ≠ ∅; k++) do                // subsequent passes over S
4.      Ck ← candidate-gen-SPM(Fk-1);
5.      for each data sequence s ∈ S do
6.          for each candidate c ∈ Ck do
7.              if s contains c then
8.                  c.count++;
9.          endfor
10.     endfor
11.     Fk ← {c ∈ Ck | c.count/n ≥ minsup}
12. endfor
13. return ∪k Fk;

Table 4. The GSP algorithm

The main difference between GSP and Apriori is that the function candidate-gen-SPM(Fk-1) of GSP replaces candidate-gen(Fk-1) of Apriori. candidate-gen-SPM also has two steps: the join step combines frequent sequences in Fk-1 into candidate sequences of length k, which are put into Ck, and the prune step checks the candidates against the Apriori property. The join step combines two sequences s1 and s2 only if the subsequence obtained by dropping the first item of s1 is the same as the subsequence obtained by dropping the last item of s2. The resulting sequence s3 is formed by adding the last item of s2 to the end of s1, which can happen in two ways:

- If the last item of s2 forms a transaction on its own, it becomes the last transaction of s3.
- If the last item of s2 is one of several items in a transaction, it becomes an item of the last transaction of s3.

Example 5: Illustration of the join step of candidate-gen-SPM(Fk-1). Suppose F3 contains s1 = <{a, b}{c}> and s2 = <{b}{c}{d}>. Dropping the first item of s1 or the last item of s2 gives the same subsequence <{b}{c}>, so s1 and s2 can be joined into the candidate 4-sequence s3 = <{a, b}{c}{d}>.

Note: if s2 = <{b}{c, d}> (d is not a transaction on its own but one item of a transaction), dropping the last item d of s2 still gives <{b}{c}>, but the join produces a different result: s3 = <{a, b}{c, d}> (d becomes an item of the last transaction of s3 rather than a separate transaction as in the previous case).

Example 6: Illustration of the prune step of candidate-gen-SPM(Fk-1):

F3 (frequent 3-sequences)   C4 after join      C4 after prune
<{1, 2}{4}>                 <{1, 2}{4, 5}>     <{1, 2}{4, 5}>
<{1, 2}{5}>                 <{1, 2}{4}{6}>
<{1}{4, 5}>
<{1, 4}{6}>
<{2}{4, 5}>
<{2}{4}{6}>

Table 5. Pruning the candidate set

The join step that produces C4 is carried out by candidate-gen-SPM(F3):
- <{1, 2}{4}> joined with <{2}{4, 5}> gives <{1, 2}{4, 5}>
- <{1, 2}{4}> joined with <{2}{4}{6}> gives <{1, 2}{4}{6}>
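The join and prune steps of Examples 5 and 6 can be sketched as below. This is a minimal sketch under assumptions: sequences are tuples of sorted item tuples, the function names are hypothetical, and only the case k ≥ 3 is covered (joining F1 with itself to build C2 needs separate handling, since each pair yields both <{x}{y}> and <{x, y}>).

```python
def drop_first(seq):
    """Remove the first item of the first transaction."""
    head = seq[0][1:]
    return ((head,) if head else ()) + seq[1:]


def drop_last(seq):
    """Remove the last item of the last transaction."""
    tail = seq[-1][:-1]
    return seq[:-1] + ((tail,) if tail else ())


def join(s1, s2):
    """Join two frequent (k-1)-sequences, or return None if they do not match."""
    if drop_first(s1) != drop_last(s2):
        return None
    x = s2[-1][-1]
    if len(s2[-1]) == 1:                 # x is a transaction on its own
        return s1 + ((x,),)
    return s1[:-1] + (s1[-1] + (x,),)    # x joins the last transaction of s1


def one_item_removed(seq):
    """All subsequences obtained by deleting exactly one item."""
    for i, element in enumerate(seq):
        for j in range(len(element)):
            rest = element[:j] + element[j + 1:]
            yield seq[:i] + ((rest,) if rest else ()) + seq[i + 1:]


def candidate_gen(frequent):
    """Join step followed by the Apriori prune step (k >= 3 only)."""
    joined = set()
    for a in frequent:
        for b in frequent:
            c = join(a, b)
            if c is not None:
                joined.add(c)
    return {c for c in joined
            if all(s in frequent for s in one_item_removed(c))}


F3 = {((1, 2), (4,)), ((1, 2), (5,)), ((1,), (4, 5)),
      ((1, 4), (6,)), ((2,), (4, 5)), ((2,), (4,), (6,))}
print(candidate_gen(F3))
```

On the F3 of Table 5 the join yields two candidates and the prune step keeps only <{1, 2}{4, 5}>.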

The prune step removes the candidate 4-sequence <{1, 2}{4}{6}> because it violates the Apriori property: its 3-subsequences <{1}{4}{6}> and <{1, 2}{6}> are not in F3.

Example 7: A complete run of GSP. Suppose the input is the following sequence database:

Customer ID   Sequence Data
1             <{30} {90}>
2             <{10, 20} {30} {10, 40, 60, 70}>
3             <{30, 50, 70, 80}>
4             <{30} {30, 40, 70, 80} {90}>
5             <{90}>

With minsup = 25%, the first pass gives the following set F1 of frequent 1-sequences (the support count is written next to each sequence):

F1 = { <{30}>: 4, <{40}>: 2, <{70}>: 3, <{80}>: 2, <{90}>: 3 }

The subsequent passes give:

C2 = { <{x}{y}> for every ordered pair x, y ∈ {30, 40, 70, 80, 90}, including x = y (25 candidates) } ∪ { <{30, 40}>, <{30, 70}>, <{30, 80}>, <{30, 90}>, <{40, 70}>, <{40, 80}>, <{40, 90}>, <{70, 80}>, <{70, 90}>, <{80, 90}> }

Forms such as <{30, 30}> or <{40, 30}> are not separate candidates, because an item occurs at most once in a transaction and the items of a transaction are kept in sorted order.

F2 = { <{30}{90}>: 2, <{30}{40}>: 2, <{30}{70}>: 2, <{30, 70}>: 2, <{30, 80}>: 2, <{40, 70}>: 2, <{70, 80}>: 2 }
C3 = { <{30}{40, 70}>, <{30, 70, 80}> }
F3 = { <{30}{40, 70}>: 2, <{30, 70, 80}>: 2 }

The algorithm ends because C4 = ∅.

Note: the following sequences do not make it into C3, because each of them has a 2-subsequence that is not in F2 (Apriori property):
- <{30}{40}{70}>: <{40}{70}> is not frequent
- <{30}{70}{80}>: <{70}{80}> is not frequent
- <{30}{70, 80}>: <{30}{80}> is not frequent
- <{30, 70}{80}>: <{70}{80}> is not frequent
- <{40, 70}{80}>: <{70}{80}> is not frequent
- <{40, 70, 80}>: <{40, 80}> is not frequent

5.3.2 The PrefixSpan algorithm
Concepts specific to PrefixSpan

Besides the basic concepts of sequential pattern mining, PrefixSpan uses a few concepts of its own. Let α = <a1 a2 … an> and β = <b1 b2 … bm> be two sequences (each ai and bi is a transaction) with m ≤ n.

- Prefix: β is a prefix of α if and only if:
  + bi = ai for all i ≤ m-1;
  + bm ⊆ am;
  + all items in (am - bm) come after the items of bm in lexicographic order.
  Example: β = <{a}{abc}{a}> is a prefix of α = <{a}{abc}{ac}{d}{cf}>, but β = <{a}{abc}{c}> is not a prefix of α.

- Projection: let α and β be two sequences such that β is a subsequence of α. A subsequence α' of α is a projection of α with respect to the prefix β if and only if:
  + α' has prefix β;
  + there is no proper supersequence of α' that is a subsequence of α and also has prefix β.
  Example: for α = <{a}{abc}{ac}{d}{cf}> and β = <{bc}{a}>, α' = <{bc}{ac}{d}{cf}> is a projection of α with respect to the prefix β.


- Postfix: let α' = <a1 a2 … an> be the projection of α with respect to the prefix β = <a1 a2 … am-1 a'm> (m ≤ n). The sequence γ = <a''m am+1 … an> is called the postfix of α with respect to the prefix β, written γ = α/β or α = β·γ, where a''m = (am - a'm).
  Example: α' = <{a}{abc}{ac}{d}{cf}> is a projection of α with respect to the prefix β = <{a}{abc}{a}>, so γ = <{_c}{d}{cf}> is the postfix of α with respect to β.
  Note: the symbol _ records that the last item of the prefix lies in the same transaction as item c. This information is essential for the operation of the algorithm.
- Projected database: let α be a sequential pattern in the sequence database S. The α-projected database, written S|α, is the set of all postfixes of the sequences in S with respect to the prefix α.

Example: Let α = <{30}> and let S be the sequence database:

Customer ID   Sequence Data
1             <{30} {90}>
2             <{10, 20} {30} {10, 40, 60, 70}>
3             <{30, 50, 70, 80}>
4             <{30} {30, 40, 70, 80} {90}>
5             <{90}>

The projected database of S with respect to α (the α-projected database) is:

Customer ID   Sequence Data
1             <{90}>
2             <{10, 40, 60, 70}>
3             <{_50, 70, 80}>
4             <{30, 40, 70, 80} {90}>

Table 6. Projected database
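The projection in Table 6 can be reproduced with a short sketch for the special case of a one-item prefix. The representation is an assumption made here: a postfix is a list of tuples, and a leading "_" in the first tuple marks items that share a transaction with the prefix item. The function name `project` is hypothetical.

```python
DB = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {10, 40, 60, 70}],
    3: [{30, 50, 70, 80}],
    4: [{30}, {30, 40, 70, 80}, {90}],
    5: [{90}],
}


def project(seq, item):
    """Postfix of `seq` w.r.t. the prefix <{item}>, or None if item is absent.

    Only the first transaction containing `item` is used, so the postfix is
    the longest one possible.
    """
    for i, element in enumerate(seq):
        if item in element:
            # Items after `item` in the same transaction form the "_" part.
            rest = sorted(x for x in element if x > item)
            postfix = [("_", *rest)] if rest else []
            return postfix + [tuple(sorted(e)) for e in seq[i + 1:]]
    return None


# Keep only sequences that contain the prefix and leave a non-empty postfix.
projected = {sid: p for sid, seq in DB.items()
             for p in [project(seq, 30)] if p}
for sid, postfix in projected.items():
    print(sid, postfix)
```

No infrequent items are removed here, which is why items 10, 50 and 60 still appear, exactly as in Table 6.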
The PrefixSpan algorithm

Algorithm PrefixSpan
Input: a sequence database S and the threshold minsup set by the user
Output: the complete set F of sequential patterns
Method (main program): call PrefixSpan(<>, 0, S)

Subroutine PrefixSpan(α, l, S|α)
Parameters:
1. α: a sequential pattern
2. l: the length of α (not needed when the algorithm is implemented in an object-oriented language)
3. S|α: the α-projected database of S (S itself when α = <>)

Method: body of PrefixSpan(α, l, S|α)
1. Scan S|α once and find every frequent item x such that:
   (a) x can be added to the last transaction of α to form a new sequential pattern, or
   (b) the transaction {x} can be appended to the end of α to form a new sequential pattern.
2. For each frequent item x, add it to α to form a new sequential pattern α' and put α' into the result set F.
3. For each new pattern α', build the α'-projected database S|α' and call PrefixSpan(α', l+1, S|α') recursively.

Table 7. The PrefixSpan algorithm

How are the frequent items x that satisfy the conditions of step 1 found, and how exactly is the projected database of step 3 built? The following example makes this clear.

Example 8: A complete run of PrefixSpan. Let minsup = 25% and let S be the sequence database of Table 2:

Customer ID   Sequence Data
1             <{30} {90}>
2             <{10, 20} {30} {10, 40, 60, 70}>
3             <{30, 50, 70, 80}>
4             <{30} {30, 40, 70, 80} {90}>
5             <{90}>

First call, PrefixSpan(<>, 0, S):

Step 1: Find all sequential patterns of length 1 (these are the frequent items):

Item      Support count
<{30}>    4
<{40}>    2
<{70}>    3
<{80}>    2
<{90}>    3

Step 2: Put all the patterns found into the result set F.

Step 3: For each pattern α' found, build its projected database and call PrefixSpan(α', l+1, S|α') recursively. Consider the sequential pattern <{30}>:

<{30}>-projected database S|30

Sequence ID   Sequence Data
1             <{90}>
2             <{10, 40, 70}>
3             <{_70, 80}>
4             <{30, 40, 70, 80} {90}>

Table 8. Projected database S|30

Notes on projecting a sequence on the prefix:
- Sequence 1: projecting <{30}{90}> on <{30}> removes the transaction {30}; the postfix is <{90}>.
- Sequence 2: projecting <{10, 20} {30} {10, 40, 60, 70}> on <{30}> removes all transactions up to and including {30}; the postfix is <{10, 40, 60, 70}>. Item 60 is then removed because <{60}> is not among the frequent patterns of length 1 found in step 1, so by the Apriori property it can never appear in a frequent pattern. (By the same argument item 10, whose support is 1/5, could also be dropped; it is kept in Table 8 and does not affect the result.)
- Sequence 3: projecting <{30, 50, 70, 80}> on <{30}> removes item 30 from the transaction; its position is marked by the character _, giving the postfix <{_70, 80}> after the infrequent item 50 is removed.
- Sequence 4: handled like sequence 1.
- Sequence 5: <{90}> does not contain the prefix <{30}>, so its postfix with respect to <{30}> is empty and it is left out of the projected database.

Recursive call PrefixSpan(<{30}>, 1, S|30):

Step 1: Find all frequent items:

Item       Support count
<{_70}>    2
<{40}>     2
<{70}>     2
<{_80}>    2
<{90}>     2

How the support counts are computed: to find all sequential patterns with prefix {30}, the algorithm extends the prefix step by step, adding one frequent item at a time; these items are found in the projected database S|30. There are two ways to add an item to the prefix:
- The new item x is added to the last transaction of the prefix: <{30, x}> (here <{_70}> and <{_80}>).
- The new item x becomes a new last transaction after the prefix: <{30}{x}> (here <{70}>, <{40}>, <{90}>).

Finding the items that give patterns of the form <{30, x}> in S|30: use the two templates {_x} and {30, x} (if the last transaction of the prefix has several items, 30 is replaced by all the items of that transaction) to count the support of x, where x is any item occurring in S|30. If a template matches several times within the same sequence, it is counted only once:
- {_70}: 2 (in <{_70, 80}> and <{30, 40, 70, 80}{90}>)
- {_80}: 2 (in <{_70, 80}> and <{30, 40, 70, 80}{90}>)

Finding the items that give patterns of the form <{30}{x}>, that is, <{x}> in S|30:
- {90}: 2 (in <{90}> and <{30, 40, 70, 80} {90}>)
- {40}: 2 (in <{10, 40, 70}> and <{30, 40, 70, 80} {90}>)
- {70}: 2 (in <{10, 40, 70}> and <{30, 40, 70, 80} {90}>)

Step 2: Form the frequent patterns of length 2 from the result of step 1 and put them into the result set F. The new patterns are <{30, 70}>, <{30, 80}>, <{30}{90}>, <{30}{40}> and <{30}{70}>.

Step 3: For each prefix α' just found, build its projected database and call PrefixSpan(α', l+1, S|α') recursively. Consider the prefix <{30}{40}>:

<{30}{40}>-projected database S|{30}{40}

Sequence ID   Sequence Data
2             <{_70}>
4             <{_70, 80} {90}>

Table 9. Projected database S|{30}{40}

Recursive call PrefixSpan(<{30}{40}>, 2, S|{30}{40}):
Step 1: Only one frequent item is found: {_70}.
Step 2: The new sequential pattern <{30}{40, 70}> is added to F.
Step 3: The <{30}{40, 70}>-projected database S|{30}{40, 70} contains no frequent item (only sequence 4 leaves a non-empty postfix, <{_80}{90}>), so the search along this branch ends. The remaining prefixes that start with {30} are processed in the same way, followed by the prefixes {40}, {70}, {80} and {90}.

The final set F found by PrefixSpan is identical to the one found by GSP in Example 7, but PrefixSpan typically finds F considerably faster than GSP.
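The whole run of Example 8 can be checked with a compact prefix-growth sketch. It is a simplification, not the exact procedure of Table 7: instead of materialising each projected database, it keeps for every sequence only the index of the transaction where the leftmost match of the prefix ends (a form of pseudo-projection), and infrequent items are not physically removed. All names are assumptions made for this illustration.

```python
from collections import Counter

DB = [
    [{30}, {90}],
    [{10, 20}, {30}, {10, 40, 60, 70}],
    [{30, 50, 70, 80}],
    [{30}, {30, 40, 70, 80}, {90}],
    [{90}],
]


def prefixspan(db, min_count):
    """Return {pattern: support count}; a pattern is a tuple of item tuples."""
    patterns = {}

    def first_match(seq, start, itemset):
        # Index of the first transaction at or after `start` containing itemset.
        for k in range(start, len(seq)):
            if itemset <= seq[k]:
                return k
        return None

    def record(pattern, proj):
        patterns[pattern] = len(proj)
        grow(pattern, proj)

    def grow(prefix, proj):
        # proj holds (sequence, index where the leftmost match of prefix ends).
        last = prefix[-1] if prefix else ()
        s_ext, i_ext = Counter(), Counter()
        for seq, j in proj:
            # (b) items that can start a new transaction after the prefix
            s_ext.update(set().union(*seq[j + 1:]))
            if prefix:
                # (a) items that can extend the last transaction of the prefix
                i_ext.update({x for e in seq[j:] if set(last) <= e
                              for x in e if x > last[-1]})
        for x in sorted(i_ext):
            if i_ext[x] >= min_count:
                new_last = last + (x,)
                new_proj = [(seq, k) for seq, j in proj
                            for k in [first_match(seq, j, set(new_last))]
                            if k is not None]
                record(prefix[:-1] + (new_last,), new_proj)
        for x in sorted(s_ext):
            if s_ext[x] >= min_count:
                new_proj = [(seq, k) for seq, j in proj
                            for k in [first_match(seq, j + 1, {x})]
                            if k is not None]
                record(prefix + ((x,),), new_proj)

    grow((), [(seq, -1) for seq in db])
    return patterns


result = prefixspan(DB, min_count=2)  # minsup = 25% of 5 sequences
for pattern, support_count in sorted(result.items(), key=lambda kv: len(kv[0])):
    print(pattern, support_count)
```

With `min_count=2` the sketch returns the same 14 patterns as Table 3. Requiring `x > last[-1]` when extending a transaction keeps each pattern from being generated twice.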
5.3.3 Characteristics of PrefixSpan and improvement techniques
- PrefixSpan does not generate candidate sequences: unlike the Apriori-based algorithms, PrefixSpan only grows longer sequential patterns from shorter ones. It never generates or tests a candidate sequence that does not occur in the original database or in the projected databases (GSP generates such sequences and has to remove them in the prune step of candidate-gen-SPM). GSP, by contrast, generates a large number of candidates in every iteration and has to test the whole set, so the search space of PrefixSpan is much smaller than that of GSP.
- Projected databases keep shrinking: a projected database is always smaller than the database it was projected from, because it contains only postfixes of the data sequences in that database. In practice the reduction can be substantial for two reasons: (1) usually only a small set of sequential patterns in the original database grows long, so the number of records in the projected databases drops sharply as the prefix gets longer; (2) the projection keeps only the postfix of a sequence and discards all transactions up to and including the prefix, so the records in the projected database also get shorter quickly.
- The main cost of PrefixSpan is building the projected databases: in the worst case PrefixSpan has to build a projected database for every sequential pattern. If the original database contains many sequential patterns, this cost is not negligible. In addition, in the first iterations, when the prefix is short, the total size of the projected databases may not be much smaller than the original database, so they may have to be written to disk instead of being kept in main memory, which makes reading and writing the data slower.

Question 6: What is supervised learning? Examples?

Supervised learning is a machine learning technique for building a function from training data. The training data consists of pairs of an input object (usually a vector) and the desired output. The output of the function can be a continuous value (regression) or a predicted class label for the input object (classification). The task of a supervised learner is to predict the value of the function for any valid input object after seeing a number of training examples (that is, pairs of inputs and corresponding outputs). To do this, the learner has to generalize from the available data to unseen situations in a "reasonable" way (see inductive bias). (Compare with unsupervised learning.)

Supervised learning can produce two kinds of models. Most commonly it produces a global model that maps input objects to the desired outputs. In some cases, however, the mapping is implemented as a set of local models (as in case-based reasoning or the nearest-neighbour algorithm).

- The dataset consists of examples, each of which comes with a class label or desired output value.
- The goal is to learn (approximate) a hypothesis (e.g. a classifier or a target function) that fits the available data.
- The learned hypothesis is then used to classify or predict new examples.

In short: in supervised learning the training set is a set of samples, each containing a pair of values: (1) the input data, also called the features, and (2) the desired output. This dataset is usually labelled by hand so that the outputs are correct.

Example: learning to recognize handwriting. Several steps have to be considered:
1. Decide on the type of training examples. Before doing anything else, the engineer should decide what kind of data will be used as an example. It could be a single handwritten character, a whole handwritten word or a whole line of handwriting.
2. Collect the training set. The training set has to be representative of the real-world use of the function. A set of input objects is therefore collected together with the corresponding outputs, either from experts or from measurements and calculations.
3. Decide how the input features are represented for the function to be learned. The accuracy of the learned function depends strongly on how the input objects are represented. Typically the input object is converted into a feature vector containing a number of features that describe the object. The number of features should not be too large, because of the curse of dimensionality, but it must be large enough to predict the output accurately.
4. Decide on the structure of the function to be learned and the corresponding learning algorithm. For example, the engineer may choose an artificial neural network or a decision tree.
5. Complete the design. The engineer runs the learning algorithm on the collected training set. The parameters of the learning algorithm can be tuned by optimizing performance on a subset of the training set (called the validation set) or by cross-validation. After learning and parameter tuning, the performance of the algorithm can be measured on a test set that is independent of the training set.
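The workflow above (labelled pairs, a learned predictor, evaluation on held-out data) can be illustrated with the nearest-neighbour method mentioned earlier. This is a minimal sketch: the two-dimensional feature vectors and the labels "a" and "b" are hypothetical stand-ins for features extracted from handwritten characters.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_neighbour(train, x):
    """Predict the label of x as the label of the closest training example."""
    _, label = min(train, key=lambda pair: squared_distance(pair[0], x))
    return label


def accuracy(train, held_out):
    """Fraction of held-out examples whose label is predicted correctly."""
    hits = sum(nearest_neighbour(train, x) == y for x, y in held_out)
    return hits / len(held_out)


# Hypothetical training pairs: (feature vector, desired output).
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"),
         ((4.0, 4.2), "b"), ((3.8, 4.0), "b")]
# Hypothetical examples kept apart from the training set for evaluation.
held_out = [((0.9, 1.1), "a"), ((4.1, 3.9), "b")]

print(nearest_neighbour(train, (0.9, 1.1)))
print(accuracy(train, held_out))
```

Nearest neighbour is a local model: nothing is fitted in advance, and every prediction consults the stored training pairs directly.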

Question 7:

Decision tree learning is one of the most widely used techniques for classification. Its classification accuracy is competitive with other learning methods, and it is very efficient. The classification model is represented as a tree, called a decision tree.

A decision tree is a tree-shaped decision structure that takes as input a set of attribute values describing an object or a situation and returns a discrete value. The tree has two kinds of nodes: decision nodes (the internal nodes) and leaf nodes. A decision node specifies a test (for example, asks a question) on a single attribute. A leaf node represents a class.

The ID3 algorithm

Choosing the best attribute: an important point in a decision tree building algorithm is the choice of the best attribute at each node. Ideally, the chosen attribute splits the dataset into subsets that each carry a single label, so that only one attribute test is needed to classify. ID3 uses entropy as the measure of the homogeneity of a dataset. Based on entropy, the algorithm computes the information gain as the increase in homogeneity and uses it to determine the best attribute at each node.

Characteristics of decision tree learning:
- ID3 searches for a decision tree that fits the training data.
- It is a greedy, top-down search that starts from the empty tree. The evaluation function is the information gain.
- ID3 tends to prefer simple decision trees, that is, trees with few nodes in which the attributes with high information gain are placed closer to the root.

Issues:
- Overfitting: the decision tree achieves good accuracy on the training data but poor accuracy on data in general.
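The attribute selection used by ID3 can be sketched with the standard definitions of entropy and information gain. The four training rows below are a hypothetical toy dataset invented for this illustration, not data from the notes.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Reduction in entropy of `target` obtained by splitting on `attr`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy data: "outlook" separates the classes perfectly,
# "windy" carries no information about the class.
rows = [
    {"windy": "yes", "outlook": "sunny", "play": "no"},
    {"windy": "no", "outlook": "sunny", "play": "no"},
    {"windy": "yes", "outlook": "rain", "play": "yes"},
    {"windy": "no", "outlook": "rain", "play": "yes"},
]

best = max(["windy", "outlook"],
           key=lambda a: information_gain(rows, a, "play"))
print(best)
```

ID3 would place the attribute with the highest gain at the current node and repeat the same computation on each resulting subset.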

Question 8: Classification based on association rules. Examples:


A decision tree can be converted into a set of rules, and we have seen that a set of rules can also be found directly for classification. It is therefore natural to expect that association rules, in particular class association rules (CAR), can be used for classification too. They can indeed. In fact, normal association rules can be used for classification as well. CBA, which stands for Classification Based on Associations, is the first reported system that uses association rules for classification. This part describes three approaches to using association rules for classification:
1. Using class association rules for classification directly.
2. Using class association rules as features or attributes.
3. Using normal (classic) association rules for classification.

The first two approaches can be applied to tabular data or to transactional data. The last approach is usually used only for transactional data. All of these methods are useful in the Web environment, because many kinds of Web data come in the form of transactions, for example search queries issued by users and Web pages clicked by visitors. Transactional datasets are hard to handle with traditional classification techniques but are natural for association rules. The three approaches are described in turn below. Note that sequential rules can be used for classification in similar ways when sequential datasets are involved.

1. Classification using class association rules

Recall that a class association rule (CAR) is an association rule with only a class label on the right-hand side as its consequent. For example, from the data in Table 3.1 the following rule can be found:

Own_house = false, Has_job = true → Class = Yes [sup = 3/15, conf = 3/3],

which is also a rule from the decision tree in Fig. 3.3.

In fact, there is no difference between the rules from a decision tree and CARs if we consider only categorical attributes. The differences lie in the mining process and in the final rule set. CAR mining finds all rules in the data that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf). A decision tree or a rule induction system finds only a subset of the rules, expressed as a tree or as a list of rules, for classification.

Example: Recall that the decision tree in Fig. 3.3 gives the following three rules:

[rule listing missing from the source]

However, many other rules exist in the data, for example

[rule listing missing from the source]

and many more rules, if we use minsup = 2/15 = 13.3% and minconf = 70%. In many cases, rules that do not appear in the decision tree can yield more accurate classification. Empirical comparisons reported by several researchers show that classification using CARs is more accurate on many data sets than decision trees and rule induction systems.

The complete set of rules produced by CAR mining is also beneficial from a rule usage point of view: in some applications, the user wants to act on some of the rules. The following differences should be kept in mind:
1. Decision tree learning and rule induction do not use minsup or minconf constraints.
2. CAR mining does not handle continuous (numeric) attributes, whereas decision trees handle continuous attributes naturally.

Mining class association rules for classification
Many techniques use CARs to build classifiers.

Rule pruning: CARs are highly redundant, so rule pruning is necessary. The idea of pruning CARs is basically the same as tree pruning in decision tree building or rule pruning in rule induction.

Multiple minimum class supports: a single minsup is inadequate for CAR mining because many practical classification data sets have uneven class distributions, that is, some classes cover a large proportion of the data while others cover only a very small proportion.

Example 14: Suppose we have a data set with two classes, Y and N. 99% of the data belong to class Y, and only 1% belong to class N. If we set minsup = 1.5%, we will not find any rule for class N. To solve the problem, we need to lower minsup. Suppose we set minsup = 0.2%. We may then find a huge number of overfitting rules for class Y, because minsup = 0.2% is too low for class Y.

Multiple minimum class supports can be applied to deal with this problem. We can assign a different minimum class support minsup_i to each class c_i, so that all rules of class c_i must satisfy minsup_i. Alternatively, we can provide a single total minsup, denoted t_minsup, which is then distributed to each class according to the class distribution:
minsup_i = t_minsup x sup(c_i)
where sup(c_i) is the support of class c_i in the training data. The formula gives frequent classes higher minsups and infrequent classes lower minsups.

Parameter selection: The parameters used in CAR mining are the minimum supports and the minimum confidences. Note that a different minimum confidence may also be used for each class. However, minimum confidences do not affect classification much, because classifiers tend to use high-confidence rules.

Data formats: The CAR mining algorithms operate on transaction data sets. However, many classification data sets are in table format. A tabular data set can easily be converted to a transaction data set.

Use the strongest rule: This is perhaps the simplest strategy. It simply uses the CARs directly for classification. For each test instance, it finds the strongest rule that covers the instance.

Select a subset of the rules to build a classifier: The representative method of this category is the one used in the CBA system. The method is similar to the sequential covering method, but is applied to class association rules with additional enhancements.

Combine multiple rules: Like the first approach, this approach does not take any additional step to build a classifier.

2. Class association rules as features
In the methods above, rules are used directly for classification. In this method, rules are used as features to augment the original data, or simply to form a new data set, which is then fed to a traditional classification algorithm. To use CARs as features, only the conditional part of each rule is needed, and it is usually treated as a Boolean feature/attribute. If a data instance in the original data contains the conditional part, the value of the feature/attribute is set to 1; otherwise it is set to 0. This approach is useful because CARs capture multi-attribute or multi-item correlations with class labels. Many classification algorithms (e.g., naive Bayesian) do not find such correlations, yet they can be quite useful.

3. Classification using normal association rules
Not only class association rules but also normal association rules can be used for classification. For example, normal association rules are used in e-commerce Web sites for product recommendations, which work as follows: when a customer purchases some products, the system recommends other related products based on what the customer has already purchased. Recommendation is essentially a classification or prediction problem: it predicts what a customer is likely to buy. Association rules are naturally applicable to such applications. The classification process is as follows:
1. The system first uses previous purchase transactions (the same as market basket transactions) to mine association rules. In this case there are no fixed classes. Any item can appear on the left-hand side or the right-hand side of a rule. For recommendation purposes, usually only one item appears on the right-hand side of a rule.
2. At prediction (e.g., recommendation) time, given a transaction (e.g., a set of items already purchased by a customer), all the rules that cover the transaction are selected. The strongest rule is chosen, and the item on its right-hand side (i.e., the consequent) is the predicted item that is recommended to the user. If several rules are very strong, several items can be recommended.
(For further reading, see http://www.scribd.com/doc/89253259/20/Phan-lo%E1%BA%A1i-d%E1%BB%B1a-tren-s%E1%BB%B1-k%E1%BA%BFt-h%E1%BB%A3p)
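As a rough illustration of two ideas from this section, the sketch below distributes a total minimum support over classes with minsup_i = t_minsup x sup(c_i), and picks a recommendation from the strongest rule covering a basket. The function names, the toy rules, and the choice of "strongest" as highest confidence with ties broken by support are assumptions made for this example only.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

def class_minsups(t_minsup: float, class_support: Dict[str, float]) -> Dict[str, float]:
    """Distribute a total minsup over classes: minsup_i = t_minsup * sup(c_i)."""
    return {c: t_minsup * s for c, s in class_support.items()}

# A rule is (antecedent items, consequent item, confidence, support).
Rule = Tuple[FrozenSet[str], str, float, float]

def recommend(basket: FrozenSet[str], rules: List[Rule]) -> Optional[str]:
    """Return the consequent of the strongest rule that covers the basket."""
    # A rule covers the basket when its whole antecedent is in the basket.
    covering = [r for r in rules if r[0] <= basket and r[1] not in basket]
    if not covering:
        return None
    # Assumption: "strongest" means highest confidence, then highest support.
    best = max(covering, key=lambda r: (r[2], r[3]))
    return best[1]

# Class distribution from Example 14, with a total minsup of 1.5%.
minsups = class_minsups(0.015, {"Y": 0.99, "N": 0.01})

# Hypothetical rules mined from earlier purchase transactions.
rules = [
    (frozenset({"bread"}), "butter", 0.80, 0.10),
    (frozenset({"bread", "milk"}), "eggs", 0.90, 0.05),
    (frozenset({"tea"}), "sugar", 0.95, 0.04),
]
suggestion = recommend(frozenset({"bread", "milk"}), rules)
```

With these numbers the rare class N receives a much lower threshold than class Y, which is the effect the formula is meant to produce.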

Question 9:
1. Data classification
Classification is one of the main research directions in data mining. In practice, there is a need to extract smart business decisions from databases that hold a great deal of hidden information. Classification is a form of data analysis that extracts a model describing important data classes or predicting future data trends. Classification predicts categorical labels or discrete values, meaning that it works with data objects whose set of possible values is known in advance. For example, using rules about customers' purchasing trends in supermarkets, sales staff can make sound decisions about the quantity and type of goods to display and where to display them.

In recent years, classification has attracted researchers from many fields such as machine learning, expert systems, and statistics. The technology is also applied in many domains: commerce, marketing, market research, insurance, healthcare, banking, education, and so on. Most early algorithms assumed memory-resident data and usually handled small volumes of data. Some later algorithms use disk-resident techniques, which significantly improves their scalability to large data sets of up to billions of records.

Classification is a form of supervised learning.

Definitions of data classification
- Given a database D = {t1, t2, ..., tn} and a set of classes C = {C1, ..., Cm}, classification is the problem of determining a mapping f : D -> C such that each ti is assigned to one class.
- Classification is a form of data analysis that extracts models describing data classes or predicting data trends.
- Classification takes a set of pre-classified samples and builds a model for each class.
Purpose: assign new samples to classes with the highest possible accuracy.

Examples of classification:
- Classifying bank customers to decide whether or not to grant credit.
- Diagnosing tumor cells as benign or malignant.
- Forecasting when floods will occur.
- Classifying credit card transactions as legitimate or fraudulent.
- Classifying news stories into topics such as entertainment, weather, sports, education, and lifestyle.
- Medical diagnosis.

Steps of the classification process
The classification process consists of two steps.

Learning/training step
The learning process builds a model describing a predetermined set of data classes or concepts. The input is a structured data set described by attributes and made up of tuples of attribute values. Each tuple is generically called a data tuple, and may also be referred to as a sample, example, object, record, or case. In this data set, each data tuple is assumed to belong to a predefined class, where the class is the value of an attribute chosen as the class label attribute. The output of this step is usually a set of classification rules in the form of if-then rules, a decision tree, a logic formula, or a neural network. This process is illustrated in the figure for the learning/training step.

Figure: Learning/training step

Classification step
The second step uses the model built in the previous step to classify new data. First, the predictive accuracy of the newly built classifier is estimated. Holdout is a simple technique for estimating this accuracy. It uses a test set of samples whose class labels are known. These samples are chosen at random and are independent of the samples in the training set. The accuracy of the model on the test set is the percentage of test samples that the model classifies correctly (compared with their actual labels). If the accuracy were estimated on the training set, the result would be overly optimistic, because the model tends to overfit the data. Overfitting occurs when the classification results match the training data too closely, because the model-building process may incorporate peculiarities specific to that data set. A test set that is independent of the training set is therefore required. If the accuracy of the model is acceptable, the model is used to classify future data, or data whose class label is unknown.
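The holdout estimate described above can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the 30% test fraction, the one-dimensional toy data, and the threshold learner are all assumptions chosen to keep the example short.

```python
import random

def holdout_split(data, test_fraction=0.3, seed=0):
    """Randomly split labeled samples into disjoint training and test sets."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train_threshold(train):
    """Toy learner: put a threshold halfway between the two class means."""
    low = [x for x, y in train if y == "low"]
    high = [x for x, y in train if y == "high"]
    threshold = (sum(low) / len(low) + sum(high) / len(high)) / 2
    return lambda x: "high" if x >= threshold else "low"

def accuracy(model, test):
    """Fraction of test samples whose predicted label equals the true label."""
    correct = sum(1 for x, y in test if model(x) == y)
    return correct / len(test)

# Hypothetical, well-separated data: five "low" and five "high" samples.
data = [(float(v), "low") for v in range(5)]
data += [(float(v), "high") for v in range(100, 105)]

train, test = holdout_split(data)
model = train_threshold(train)
test_accuracy = accuracy(model, test)  # measured only on unseen samples
```

The key point is that `accuracy` is evaluated on `test`, which the learner never saw; evaluating it on `train` would give the optimistic estimate the text warns about.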

Figure: Classification step (evaluation and application)

In a classification model, the classification algorithm plays the central role and determines the success of the model. The key to the classification problem is therefore finding a classification algorithm that is fast, effective, highly accurate, and scalable. Of these properties, scalability has received particular attention and development.

Data classification techniques:
- Classification with decision trees
- Classification with Bayesian networks
- Classification with neural networks
- Classification with k-nearest neighbors
- Classification with case-based reasoning
- Classification based on genetic algorithms
- Classification with rough sets
- Classification with fuzzy sets
- Rule-based methods
- Naive Bayes and Bayesian belief networks
- Support vector machines
- Memory-based reasoning

Why classify data?
Consider the following situations:
- In a bank, given information about a particular loan applicant, should the bank grant the loan or not?
- In a university, given information such as students' final grades in their courses, how can we determine which students will graduate?
- In a hospital, how can doctors diagnose a patient's disease as accurately as possible?

These situations show that there are many reasons to build classification models:
- Speed: for example, reading postal codes by machine makes it possible to sort and classify mail automatically, easing the work of postal staff.
- Assessing risk in loan applications: based on information about borrowers, we can place them into low-, medium-, or high-risk groups so that the right decision can be made.
- In medicine, we hope to avoid misdiagnosis, so an independent, reliable diagnostic system based on the symptoms reported by patients is needed.
- Providing a predictive basis for decisions in stock trading and real estate investment.

Criteria for classification models
Each specific application requires a suitable classification model. The choice is made by comparing classification models with one another according to the following criteria:

- Predictive accuracy: the ability of the model to correctly predict the class label of new or previously unseen data. The reliability of a rule depends on its classification accuracy. Some errors are more serious than others, so accuracy should be adjusted and evaluated with the more important errors in mind.
- Speed: the computational cost of building and using the model. In some situations classification speed is a key factor: a classifier with 92% accuracy may be preferred over one with 95% accuracy that is 100 times slower in testing.
- Robustness: the ability of the model to make correct predictions from noisy data or data with missing values.
- Scalability: the ability of the learned model to run efficiently on large amounts of data.
- Interpretability: the level of understanding and insight provided by the results of the learned model. An easy-to-understand classifier gives users confidence in the system and helps them avoid misinterpreting the result of a rule produced by the system.
- Learning time: this matters when the system is used in frequently changing environments, which requires the system to learn a classification rule very quickly or to quickly adjust a learned rule to fit the actual circumstances.
These are also the issues that a classifier needs to address.

Preparing data for classification
Data preprocessing is indispensable for classification and plays an important role in determining whether the classification model can be applied. Preprocessing helps improve the accuracy, efficiency, and scalability of the classification model. It consists of the following tasks.

Data cleaning
Data cleaning deals with noise and missing values in the original data set. Noise consists of random errors or invalid values of variables in the data set; smoothing techniques can be used to handle it. Missing values are attribute values that are absent. They may be caused by human error during data entry, or arise in specific cases where the attribute value does not exist or is not important. They can be handled by replacing the missing value with the most common value of the attribute, or with the most probable value based on statistics. Although most classification algorithms have mechanisms for handling missing values and noise in the data set, this preprocessing step can reduce confusion during learning (building the classification model).

Relevance analysis
Many attributes in the data set may be completely unnecessary or irrelevant to a specific classification problem. For example, the day of the week is irrelevant to an application that analyzes the risk of bank loans, so this attribute is redundant. Relevance analysis aims to remove unnecessary or redundant attributes from the learning process, since such attributes slow down, complicate, and mislead learning, leading to an unusable classification model.

Data transformation
Generalizing data to higher-level concepts is sometimes necessary during preprocessing. This is particularly useful for continuous (numeric) attributes. For example, numeric values of the customer income attribute can be generalized into discrete ranges: low, medium, high. Similarly, categorical attributes such as street address can be generalized to city. Generalization compresses the original training data, so fewer input/output operations are involved in learning.

2. Data classification with Bayesian networks
Bayes' theorem
X: a tuple/object (evidence)
H: a hypothesis

For example, H may be the hypothesis that the tuple with a given RID belongs to class yes (buys_computer = yes).

Bayes' theorem: P(H|X) = P(X|H) P(H) / P(X)
- P(H|X): posterior probability, the conditional probability of H given X.
  Example: P(buys_computer=yes | age=young, income=high) is the probability that a customer who is young and has a high income buys a computer.
- P(X|H): the conditional probability of X given H (the likelihood).
  Example: P(age=young, income=high | buys_computer=yes) is the probability that a customer who buys a computer is young and has a high income.
  P(age=young, income=high | buys_computer=yes) = 0
  P(age=young, income=high | buys_computer=no) = 2/5 = 0.4
- P(H): prior probability of H.
  Example: P(buys_computer=yes) is the probability that a customer in general buys a computer.
  P(buys_computer=yes) = 9/14 = 0.643
  P(buys_computer=no) = 5/14 = 0.357
- P(X): prior probability of X.
  Example: P(age=young, income=high) is the probability that a customer is young and has a high income.
  P(age=young, income=high) = 2/14 = 0.143

P(H), P(X|H), and P(X) can be computed from the given data set. P(H|X) is then computed from Bayes' theorem:

P(buys_computer=yes | age=young, income=high) = P(age=young, income=high | buys_computer=yes) P(buys_computer=yes) / P(age=young, income=high) = 0
P(buys_computer=no | age=young, income=high) = P(age=young, income=high | buys_computer=no) P(buys_computer=no) / P(age=young, income=high) = 0.4 * 0.357 / 0.143 = 0.9986
Computed exactly, the second value is (2/5)(5/14)/(2/14) = 1; the figure 0.9986 arises only from rounding the intermediate values.

Given a training data set D with class labels Ci, i = 1..m, a tuple/object X = (x1, x2, ..., xn) is classified with a Bayesian classifier as follows:
- X is assigned to Ci if and only if P(Ci|X) > P(Cj|X) for all 1 <= j <= m, j <> i.
  -> Maximize P(Ci|X) (i.e., choose the Ci for which P(Ci|X) is largest).
  -> Maximize P(X|Ci)P(Ci), since P(X) is the same for every class.
  -> Either assume P(C1) = P(C2) = ... = P(Cm), or estimate P(Ci) = |Ci,D|/|D|.

P(X|Ci) is computed under the class conditional independence assumption:
P(X|Ci) = P(x1|Ci) * P(x2|Ci) * ... * P(xn|Ci)
where xk, k = 1..n, is the value of attribute Ak of X. P(xk|Ci) is computed as follows:
- Ak is a categorical attribute: P(xk|Ci) = |{X' | x'k = xk and X' in Ci}| / |Ci,D|
- Ak is a continuous attribute: P(xk|Ci) is assumed to follow some probability distribution (e.g., a Gaussian distribution).

If P(xk|Ci) = 0 for some k, then P(X|Ci) = 0. The estimate can be adjusted to avoid this:
- Original: P(xk|Ci) = |{X' | x'k = xk and X' in Ci}| / |Ci,D|
- Laplace (Pierre Laplace, French mathematician, 1749-1827): P(xk|Ci) = (|{X' | x'k = xk and X' in Ci}| + 1) / (|Ci,D| + m)
- z-estimate: P(xk|Ci) = (|{X' | x'k = xk and X' in Ci}| + z*P(xk)) / (|Ci,D| + z)
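The classification rule above (maximize P(X|Ci)P(Ci) under class conditional independence) can be sketched in a few lines. The training table below is hypothetical: it was chosen only to agree with the counts quoted in this section (9 yes, 5 no, and two young, high-income customers who both did not buy). The smoothing variant with `alpha=1` adds the number of distinct values of each attribute to the denominator, which is a common convention assumed for this sketch and may differ from the m used in the Laplace formula above.

```python
from collections import Counter, defaultdict

# Hypothetical table: (age, income) -> buys_computer.
DATA = [
    (("young", "high"), "no"), (("young", "high"), "no"),
    (("middle", "high"), "yes"), (("senior", "medium"), "yes"),
    (("senior", "low"), "yes"), (("senior", "low"), "no"),
    (("middle", "low"), "yes"), (("young", "medium"), "no"),
    (("young", "low"), "yes"), (("senior", "medium"), "yes"),
    (("young", "medium"), "yes"), (("middle", "medium"), "yes"),
    (("middle", "high"), "yes"), (("senior", "medium"), "no"),
]

def train(data):
    """Count classes, per-class attribute values, and distinct values."""
    class_counts = Counter(label for _, label in data)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> counts
    distinct = defaultdict(set)          # attribute index -> seen values
    for x, label in data:
        for k, value in enumerate(x):
            value_counts[(k, label)][value] += 1
            distinct[k].add(value)
    return class_counts, value_counts, distinct

def class_scores(x, model, alpha=0.0):
    """Return P(Ci) * prod_k P(xk|Ci) for every class Ci."""
    class_counts, value_counts, distinct = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_c in class_counts.items():
        score = n_c / total  # P(Ci) = |Ci,D| / |D|
        for k, value in enumerate(x):
            count = value_counts[(k, label)][value]
            # alpha = 0: plain frequency estimate; alpha = 1: add-one smoothing.
            score *= (count + alpha) / (n_c + alpha * len(distinct[k]))
        scores[label] = score
    return scores

model = train(DATA)
plain = class_scores(("young", "high"), model)
smoothed = class_scores(("young", "high"), model, alpha=1.0)
predicted = max(plain, key=plain.get)
```

Note that the naive estimate multiplies per-attribute frequencies, so P(X|yes) is not zero here even though no single "yes" customer is both young and high-income; the joint estimate used in the worked numbers above gives 0 instead. Both approaches predict "no" for this tuple.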

Question 10.
Supervised learning discovers patterns in the data that relate data attributes to a class attribute. These patterns are then used to predict the value of the class attribute for future data instances. In some other applications, however, the data have no class attribute. The user wants to explore the data to find intrinsic structures in them. Clustering is one technique for finding such structures: it organizes data instances into groups of similar instances. Clustering is often called unsupervised learning because, unlike supervised learning, class values denoting an a priori partition or grouping of the data are not given. By this definition, association rule mining can also be regarded as unsupervised learning. Clustering has been shown to be one of the most commonly used data analysis techniques. It is used in many fields such as medicine, psychology, botany, sociology, and so on.

Clustering is the process of organizing data instances into groups whose members are similar in some way. A cluster is a collection of data instances that are "similar" to one another and "dissimilar" to data instances in other clusters. In the clustering literature, a data instance is also called an object, since the instance may represent an object in the real world. It is also called a data point, since it can be seen as a point in an r-dimensional space, where r is the number of attributes in the data. Figure 4.1 shows a 2-dimensional data set. We can clearly see three groups of data points. Each group is a cluster. The task of clustering is to find the three clusters hidden in the data. Although it is easy for a person to find clusters in a 2-dimensional or even 3-dimensional space, it becomes very hard, if not impossible, as the number of dimensions increases. In addition, in many applications the clusters are not as clear-cut or well separated as the three clusters in Figure 4.1. Automatic techniques are therefore needed for clustering.

Example 1: A company wants to conduct a marketing campaign to promote its products. The most effective strategy is to design a set of personalized marketing materials for each customer according to his or her profile and financial situation. However, this is too expensive for a large number of customers. At the other extreme, the company designs only one set of marketing materials to be used for all customers. The most cost-effective approach is to segment the customers into a small number of groups according to their similarities and to design marketing materials for each group. This segmentation task is commonly done with clustering algorithms, which partition customers into groups of similar customers. In market research, clustering is often called segmentation.

Question 11: K-means

K-means algorithm
This algorithm has been studied by researchers in many different fields, most notably Lloyd (1957, published 1982), Forgy (1965), Friedman and Rubin (1967), and MacQueen (1967).

Input data
The input data for clustering includes:
- Categorical data
- Binary data
- Transaction data
- Time series data
- Symbolic data

Standardizing the input data
In many clustering applications, the raw data, or actual measurements, are not used directly. Preparing data for clustering requires some kind of transformation, such as normalization and standardization.

Standardization makes the data dimensionless (unit-free). It is useful for defining standardized indices. After standardization, all knowledge of the location and scale of the original data may be lost. Standardizing variables is necessary when the dissimilarity measure (for example, Euclidean distance) is sensitive to differences in the scales of the variables. There are two types of approach: global standardization and within-cluster standardization. Global standardization standardizes the variables across all elements in the data set. Within-cluster standardization refers to standardization that occurs within clusters on each variable. Some forms of standardization can be used both globally and within clusters, but others can be used only globally. It is impossible to standardize variables within clusters directly, because the clusters are not known before standardization.

Overview of K-means
K-means is a partitional clustering approach: each cluster is associated with a centroid, and each point is assigned to the cluster with the nearest centroid. The algorithm takes an input parameter k and partitions a set of n objects into k clusters so that intra-cluster similarity is high while inter-cluster similarity is low. Cluster similarity is measured with respect to the mean value of the objects in a cluster, which can be viewed as the cluster's centroid.

The algorithm proceeds as follows. First, it randomly selects k objects, each of which represents a cluster mean or center. Each of the remaining objects is assigned to the cluster to which it is most similar, based on the distance between the object and the cluster mean. The new mean of each cluster is then computed. This process repeats until the criterion function converges. The squared-error criterion is typically used, defined as:

E = \sum_{i=1}^{k} \sum_{x \in C_i} |x - m_i|^2

where x is the point in space representing a given object and m_i is the mean of cluster C_i (both x and m_i are multidimensional). This criterion tries to make the resulting k clusters as compact and as separate as possible.
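The squared-error criterion E can be computed directly from a clustering. The sketch below is only an illustration; the function names and the four sample points are assumptions made for the example.

```python
from typing import List, Sequence

def cluster_mean(points: List[Sequence[float]]) -> List[float]:
    """Component-wise mean m_i of the points in one cluster."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def squared_error(clusters: List[List[Sequence[float]]]) -> float:
    """E = sum over clusters C_i and points x in C_i of |x - m_i|^2."""
    total = 0.0
    for points in clusters:
        m = cluster_mean(points)
        for p in points:
            total += sum((a - b) ** 2 for a, b in zip(p, m))
    return total

# Two compact groups versus a grouping that mixes them.
good = [[(1.0, 1.0), (1.0, 3.0)], [(8.0, 8.0), (10.0, 8.0)]]
bad = [[(1.0, 1.0), (8.0, 8.0)], [(1.0, 3.0), (10.0, 8.0)]]

error_good = squared_error(good)
error_bad = squared_error(bad)
```

A lower E indicates tighter clusters, which is why the compact grouping scores far better than the mixed one.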

Idea
K-means is one of the algorithms for data clustering. Its purpose is to produce k clusters {C1, C2, ..., Ck} from a data set containing n objects in a d-dimensional space.
Input: a database (a customer data set) and the number of clusters k.
Output: a set of k clusters that minimizes the squared-error criterion.
Algorithm:
- Step 1: randomly choose k samples, one for each of the k clusters, and take each cluster's center to be the sample it contains.
- Step 2: compute the new center of each cluster.
- Step 3: assign each sample to the cluster whose center is nearest to it.
- Step 4: if no cluster changed in step 3, go to step 5; otherwise go back to step 2.
- Step 5: stop.
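The steps above can be sketched as a short function. This is a minimal illustration rather than a reference implementation: the optional `init` argument, the `seed` and `max_iter` parameters, and the sample points are assumptions added so the example is reproducible.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]

def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points: List[Point], k: int, init: Optional[List[Point]] = None,
           seed: int = 0, max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Return (centers, labels) after alternating assignment and update."""
    # Step 1: pick k samples as initial centers (randomly unless init is given).
    chosen = init if init is not None else random.Random(seed).sample(points, k)
    centers = [list(p) for p in chosen]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Step 3: assign each sample to the nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        # Step 4: stop when no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 2: recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, labels

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (100.0, 100.0), (100.0, 101.0), (101.0, 100.0)]
centers, labels = kmeans(points, k=2, init=[points[0], points[3]])
random_centers, random_labels = kmeans(points, k=2, seed=1)
```

Passing `init` makes the run deterministic; with random initialization the cluster numbering, and on harder data the clusters themselves, can differ between runs.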

K-means represents each cluster by the centroid of the objects in that cluster. The detailed K-means algorithm is as follows:

BEGIN
  Input n data objects
  Input k, the number of clusters
  MSE = +infinity
  For i = 1 to k do m[i] = X[1 + (i-1)*[n/k]]   // initialize k centroids
  Do {
    OldMSE = MSE
    MSE' = 0
    For j = 1 to k do { m'[j] = 0; n'[j] = 0 } End for
    For i = 1 to n do
      For j = 1 to k do
        Compute the squared Euclidean distance D2(X[i]; m[j])
      End for
      Find the centroid m[h] nearest to X[i]
      m'[h] = m'[h] + X[i]; n'[h] = n'[h] + 1
      MSE' = MSE' + D2(X[i]; m[h])
    End for
    For j = 1 to k do { n[j] = max(n'[j], 1); m[j] = m'[j]/n[j] } End for
    MSE = MSE'
  } while (MSE < OldMSE)
END

Where:
- MSE is the mean squared error, i.e., the criterion function.
- D2(X[i]; m[j]) is the squared Euclidean distance from the i-th object to centroid j.
- OldMSE, MSE', m'[j], n'[j] are temporary variables that hold intermediate values for the corresponding variables.

Complexity and limitations of the K-means clustering algorithm
The quality of K-means depends heavily on its input parameters: the number of clusters k and the k initial centroids. If the initial centroids are far from the natural cluster centroids, the clustering quality is very low, that is, the discovered clusters deviate considerably from the actual clusters. In practice there is no general solution for choosing the input parameters; the most common approach is to experiment with different values of k and then choose the best solution.

The complexity of K-means is O(n*K*I*d), where n is the number of points, K is the number of clusters, I is the number of iterations, and d is the number of attributes.

Some ways to overcome the limitations of K-means are:
- Initial centroids: run K-means multiple times, use hierarchical clustering on a sample to obtain initial centroids, preprocess the data, or use bisecting K-means.
- Clusters that differ in size, density, or shape: split the data into a larger number of smaller clusters, which are easier to find.
- Noisy data: use data preprocessing and postprocessing, for example splitting clusters that are sparse on average, merging clusters whose centers are close together, or removing very small clusters, since their points are usually noise.

Question 15: Web usage mining

1. Concept
Web usage mining is the mining of Web access logs to discover the patterns in which users access a Web site. By analyzing and examining the regularities in recorded Web accesses, we can identify customers in e-commerce, improve the quality of Internet information services delivered to users, and improve the performance of Web server systems. In addition, Web sites can be improved automatically by learning from users' access patterns. Analyzing users' Web logs also helps build better customized Web services for individual users.

2. Applications of Web usage mining
- Finding potential customers in e-commerce.
- E-government (e-Gov) and e-learning (e-Learning).
- Identifying promising advertisements.
- Improving the delivery quality of Internet information services to end users.
- Improving the performance of Web server systems.
- Personalizing Web services by analyzing individual user characteristics.
- Improving Web design by analyzing browsing habits and the content patterns of the pages users access.
- Detecting fraud and intrusion in e-commerce and other Web services.
- Predicting user behavior during information searching by analyzing users' access sequences.

3. Techniques used in Web usage mining
- Association rules: find Web pages that users frequently access together, and items that customers choose together in e-commerce.
- Clustering: cluster users based on their browsing patterns to find relationships among Web users and their behaviors.

4. Examples of Web usage mining
- Identify the users who access a Web site, and analyze particular users to find the important ones through their access level, the time they spend on Web pages, and how much they like those pages.
- Analyze special topics and the depth of Web content, for example the daily activities of a country, tour introductions, and so on.
- Examine the natural relationship between users and Web content to find services that are attractive and convenient for users.
- Depending on how effectively the Web site is accessed and the conditions under which it is browsed, the Web site content can be anticipated and evaluated better.
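The association-rule technique from section 3 can be sketched on a handful of sessions. Everything here is a hypothetical illustration: the page paths, the thresholds, and the restriction to rules between two single pages are assumptions, and real Web logs would first need to be split into user sessions.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

def page_rules(sessions: List[Set[str]], min_support: float,
               min_conf: float) -> List[Tuple[str, str, float, float]]:
    """Return rules (lhs page, rhs page, support, confidence) over page pairs."""
    n = len(sessions)
    single = Counter(page for s in sessions for page in s)
    pairs = Counter(frozenset(c) for s in sessions
                    for c in combinations(sorted(s), 2))
    rules = []
    for pair, count in pairs.items():
        support = count / n  # share of sessions containing both pages
        if support < min_support:
            continue
        a, b = sorted(pair)
        for lhs, rhs in ((a, b), (b, a)):
            conf = count / single[lhs]  # share of lhs sessions that also hit rhs
            if conf >= min_conf:
                rules.append((lhs, rhs, support, conf))
    return sorted(rules)

# Hypothetical sessions: the set of pages each visitor opened.
sessions = [
    {"/home", "/products", "/cart"},
    {"/home", "/products"},
    {"/home", "/about"},
    {"/products", "/cart"},
]
rules = page_rules(sessions, min_support=0.5, min_conf=0.9)
```

Here the only rule that passes both thresholds says that every session visiting `/cart` also visited `/products`, the kind of co-access pattern that can inform site design or recommendations.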
