
DATA MINING & APPLICATIONS

Lecturer: NGUYỄN HOÀNG TÚ ANH

REVIEW

ASSESSMENT AND GRADING


Grading scale:
Theory: final theory exam 6 points; group and individual exercises 1 point.
Practice: submitted exercises 3 points; project 3 + 1 points; fun contest 1 point.
Total: 11-12 points

Final exam structure

1. Duration: 120 minutes.
2. Course materials are allowed; laptops are not.
3. The exam has 3 questions:
a) Question 1 (2 points): Chapter 3, frequent itemsets and association rules.
b) Question 2 (2 points): Chapters 4 and 5, classification and clustering.
c) Question 3 (0.5 points): an analysis question that draws on all chapters and on the seminar reports.

Chapter 1: 1. What is data mining?

The non-trivial process of identifying valid, novel, useful, and ultimately understandable patterns hidden in databases.

2. What made the field of data mining necessary?

Advances in hardware, in data collection and data management technology, and rapidly improving algorithms; and the need to analyze data to support decision making.

Chapter 1: 3. Which data types and kinds of information can be used in knowledge discovery from data (KDD)?

Kinds of information: commercial, manufacturing, scientific, and personal information. Data types: data with non-numeric or continuous attributes; static and dynamic data; distributed data; text, web, and multimedia data (images, audio, video, ...).

4. Give some real-world examples that use the data and information types listed above.

www.kdnuggets.com/solutions/index.html

Chapter 1: 5. What kinds of tasks does data mining include?
Descriptive tasks and predictive tasks. From your own experience, which data mining tasks do you think receive the most attention and are most widely applied in (a) business and (b) education?

Chapter 2: 6. Why is data preparation needed?

Because real-world data quality is poor, and data quality affects the decision-making process.

7. What are the steps of data preparation? Give a concrete example for each step.

Data cleaning -> Data selection/integration -> Data transformation/encoding -> Data reduction

8. In which steps of data preparation can binning and histograms be applied?

Data cleaning: noise removal; data encoding: discretization; data reduction.
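Binning for noise removal can be illustrated with a small Python sketch. It assumes equal-depth (equal-frequency) bins and smoothing by bin means, which is only one of several binning variants; the function names and the price list are hypothetical examples, not part of the course material.

```python
def equal_depth_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of (nearly) equal size."""
    data = sorted(values)
    size, extra = divmod(len(data), n_bins)
    bins, start = [], 0
    for b in range(n_bins):
        # the first `extra` bins take one additional value
        end = start + size + (1 if b < extra else 0)
        bins.append(data[start:end])
        start = end
    return bins


def smooth_by_bin_means(values, n_bins):
    """Replace each value by the mean of its bin; results come back in sorted order."""
    smoothed = []
    for bin_values in equal_depth_bins(values, n_bins):
        if not bin_values:  # only possible when n_bins > len(values)
            continue
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


# Hypothetical price data
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(equal_depth_bins(prices, 3))     # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smooth_by_bin_means(prices, 3))  # [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Replacing each value by a bin label instead of the bin mean turns the same routine into a discretization step.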

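Chapter 3: 9. State the frequent itemset mining problem and the association rule mining problem.
Frequent itemset mining is the problem of finding all itemsets S (frequent itemsets) whose support satisfies the minimum support threshold minsupp: $\mathrm{supp}(S) \ge minsupp$.
Association rule mining is the problem of finding all rules of the form $X \to Y$ (with $X, Y \subset I$, where I is the set of all items, and $X \cap Y = \emptyset$) that satisfy the minimum support and minimum confidence thresholds: $\mathrm{supp}(X \to Y) \ge minsupp$ and $\mathrm{conf}(X \to Y) \ge minconf$.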
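Both definitions can be checked directly on data. This Python sketch uses a hypothetical five-transaction basket dataset and illustrative function names; support is computed as a fraction of transactions, and supp(X → Y) is taken to be supp(X ∪ Y).

```python
# Hypothetical toy basket data: each transaction is a set of items
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]


def supp(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def conf(X, Y, transactions):
    """conf(X -> Y) = supp(X ∪ Y) / supp(X)."""
    return supp(set(X) | set(Y), transactions) / supp(X, transactions)


def is_association_rule(X, Y, transactions, minsupp, minconf):
    """Check X -> Y against the definition: X and Y non-empty and disjoint,
    supp(X ∪ Y) >= minsupp and conf(X -> Y) >= minconf."""
    X, Y = set(X), set(Y)
    if not X or not Y or X & Y:
        return False
    # with minsupp > 0, conf() is only reached when X occurs, so supp(X) is never 0
    return (supp(X | Y, transactions) >= minsupp
            and conf(X, Y, transactions) >= minconf)


print(supp({"diaper", "beer"}, TRANSACTIONS))                              # 0.6
print(is_association_rule({"diaper"}, {"beer"}, TRANSACTIONS, 0.5, 0.7))  # True
```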

Chapter 3: 10. What are the properties of frequent itemsets? Define maximal frequent itemsets and closed frequent itemsets.
Every subset of a frequent itemset is frequent. Equivalently, if an itemset is not frequent, none of its supersets is frequent.
Maximal frequent itemset: a frequent itemset none of whose supersets is frequent.
Closed frequent itemset: a frequent itemset none of whose supersets has the same support.
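The two definitions translate into a simple check over the complete collection of frequent itemsets. In this Python sketch the input dictionary (itemset → support count) is a hypothetical result of mining a small basket dataset with a minimum support count of 3; the function name is illustrative.

```python
# Hypothetical input: every frequent itemset with its support count (min count 3)
FREQUENT = {
    frozenset({"bread"}): 4, frozenset({"milk"}): 4,
    frozenset({"diaper"}): 4, frozenset({"beer"}): 3,
    frozenset({"bread", "milk"}): 3, frozenset({"bread", "diaper"}): 3,
    frozenset({"milk", "diaper"}): 3, frozenset({"diaper", "beer"}): 3,
}


def maximal_and_closed(frequent):
    """Split the frequent itemsets into maximal ones and closed ones."""
    maximal, closed = [], []
    for s, count in frequent.items():
        # proper supersets of s that are themselves frequent
        supersets = [t for t in frequent if s < t]
        if not supersets:  # no frequent superset -> maximal
            maximal.append(s)
        # a superset with the same support would also be frequent,
        # so checking frequent supersets is enough for closedness
        if all(frequent[t] != count for t in supersets):
            closed.append(s)
    return maximal, closed


maximal, closed = maximal_and_closed(FREQUENT)
print(len(maximal), len(closed))  # 4 7
```

Here every 2-itemset is maximal, and every itemset except {beer} is closed, because its superset {diaper, beer} has the same support 3. Every maximal frequent itemset is also closed.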

Chapter 3: 11. Describe the procedure for finding association rules. Propose an improvement to the rule-generation method (Step 2 of the procedure), explain why it is more efficient, and give an illustrative example.
Step 1: Find all frequent itemsets (with respect to minsupp).
Step 2: Generate rules from the frequent itemsets found in Step 1.
For each frequent itemset S, generate all non-empty proper subsets of S. For each such subset A, the rule $A \to (S - A)$ is an association rule if
$\mathrm{conf}(A \to (S - A)) = \mathrm{supp}(S) / \mathrm{supp}(A) \ge minconf$.

For improvements to Step 2, study P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Chapter 6, http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf, p. 23.
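The basic Step 2 can be sketched in Python as follows. It enumerates every non-empty proper subset exactly as described above, not the improved version from Tan et al.; the dictionary of frequent itemsets and support counts is a hypothetical Step 1 output, and the function name is illustrative.

```python
from itertools import combinations

# Hypothetical Step 1 output: frequent itemsets with support counts (min count 3)
FREQUENT = {
    frozenset({"bread"}): 4, frozenset({"milk"}): 4,
    frozenset({"diaper"}): 4, frozenset({"beer"}): 3,
    frozenset({"bread", "milk"}): 3, frozenset({"bread", "diaper"}): 3,
    frozenset({"milk", "diaper"}): 3, frozenset({"diaper", "beer"}): 3,
}


def generate_rules(frequent, minconf):
    """Return (antecedent, consequent, confidence) for every rule A -> S - A with conf >= minconf."""
    rules = []
    for S, supp_S in frequent.items():
        if len(S) < 2:
            continue  # a rule needs a non-empty antecedent and a non-empty consequent
        for r in range(1, len(S)):  # sizes of non-empty proper subsets
            for A in map(frozenset, combinations(sorted(S), r)):
                # A is frequent (subset of a frequent itemset), so its support is known
                confidence = supp_S / frequent[A]
                if confidence >= minconf:
                    rules.append((set(A), set(S - A), confidence))
    return rules


for A, B, c in generate_rules(FREQUENT, 0.8):
    print(sorted(A), "->", sorted(B), c)  # ['beer'] -> ['diaper'] 1.0
```

Since supp(A) can only grow as A shrinks, moving items from the antecedent into the consequent can only lower the confidence; this is one property an improved Step 2 can use to prune rules.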

Chapter 3: 12. The Apriori algorithm. Describe one way to improve Apriori.
Note: generate the candidates $C_{k+1}$ only from the frequent itemsets $L_k$, strictly following the candidate-join rule, and perform the pruning step that removes every candidate containing at least one infrequent subset. Improvements: study one of the references listed in the lecture.
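A compact Python sketch of Apriori that follows the note above: itemsets are kept as sorted tuples, candidates $C_{k+1}$ are built only by joining two itemsets of $L_k$ that share their first k-1 items, and a candidate is pruned if any of its k-subsets is not in $L_k$. The basket data and function name are hypothetical, and support counting is a plain scan of the transactions with no optimization.

```python
from itertools import combinations


def apriori(transactions, minsupp_count):
    """Return {frozenset(itemset): support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # L1: frequent 1-itemsets, stored as sorted tuples
    level = {}
    for item in sorted({i for t in transactions for i in t}):
        c = count(frozenset([item]))
        if c >= minsupp_count:
            level[(item,)] = c
    frequent = {frozenset(k): c for k, c in level.items()}
    k = 1
    while level:
        prev = sorted(level)
        prev_set = set(prev)
        candidates = []
        for a in range(len(prev)):
            for b in range(a + 1, len(prev)):
                p, q = prev[a], prev[b]
                if p[:-1] != q[:-1]:  # join step: the first k-1 items must match
                    continue
                cand = p + (q[-1],)
                # prune step: drop candidates with an infrequent k-subset
                if all(sub in prev_set for sub in combinations(cand, k)):
                    candidates.append(cand)
        level = {}
        for cand in candidates:
            c = count(frozenset(cand))
            if c >= minsupp_count:
                level[cand] = c
        frequent.update({frozenset(cand): c for cand, c in level.items()})
        k += 1
    return frequent


# Hypothetical toy basket data
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
print(len(apriori(TRANSACTIONS, 3)))  # 8
```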

Chapter 3: 13. The FP-growth algorithm. How does it compare with Apriori?

Note: before building the FP-tree, and likewise every conditional FP-tree, sort the items of each transaction (or conditional pattern base) in the order of the f-list, i.e. the frequent 1-itemsets in descending order of support.
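The ordering rule in the note is easiest to see in code. Below is a minimal recursive FP-growth sketch in Python. It assumes transactions are passed as (items, count) pairs so that the same function can mine conditional pattern bases, and it breaks ties in the f-list alphabetically, which is an arbitrary choice; class, function, and data names are hypothetical.

```python
from collections import defaultdict


class FPNode:
    """One FP-tree node: an item, its count, and links to parent and children."""
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(weighted_transactions, minsupp_count):
    """Build an FP-tree; return the f-list, header table (item -> nodes) and item supports."""
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    # f-list: frequent single items in descending support (ties broken by name)
    f_list = sorted((i for i, s in support.items() if s >= minsupp_count),
                    key=lambda i: (-support[i], i))
    rank = {item: r for r, item in enumerate(f_list)}
    root, header = FPNode(None, None), defaultdict(list)
    for items, count in weighted_transactions:
        node = root
        # keep only frequent items and insert them in f-list order
        for item in sorted((i for i in set(items) if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return f_list, header, support


def fp_growth(weighted_transactions, minsupp_count, suffix=()):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    f_list, header, support = build_fp_tree(weighted_transactions, minsupp_count)
    result = {}
    for item in reversed(f_list):  # least frequent item first
        itemset = suffix + (item,)
        result[frozenset(itemset)] = support[item]
        # conditional pattern base: the prefix path above every node of `item`
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # mine the conditional FP-tree recursively
        result.update(fp_growth(base, minsupp_count, itemset))
    return result


# Hypothetical toy basket data, each transaction with weight 1
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
print(len(fp_growth([(t, 1) for t in TRANSACTIONS], 3)))  # 8
```

Any correct Apriori implementation run on the same data and threshold returns the same itemsets, so the two algorithms are easy to cross-check on exercises.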
14. The Interest measure determines how interesting a rule is:

$$\mathrm{Interest}(X \to Y) = \frac{P(X, Y)}{P(X)\,P(Y)}$$
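Interest (often called lift) compares how often X and Y occur together with what independence would predict: a value above 1 suggests positive correlation, exactly 1 independence, and below 1 negative correlation. A minimal Python sketch, estimating the probabilities as relative frequencies over a hypothetical basket dataset; function names are illustrative.

```python
# Hypothetical toy basket data
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]


def prob(itemset, transactions):
    """P(itemset): relative frequency of transactions containing all its items."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def interest(X, Y, transactions):
    """Interest(X -> Y) = P(X, Y) / (P(X) * P(Y))."""
    p_xy = prob(set(X) | set(Y), transactions)
    return p_xy / (prob(X, transactions) * prob(Y, transactions))


print(round(interest({"diaper"}, {"beer"}, TRANSACTIONS), 4))  # 1.25   (> 1: positive correlation)
print(round(interest({"bread"}, {"milk"}, TRANSACTIONS), 4))   # 0.9375 (< 1: negative correlation)
```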

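Chapter 4: 15. State the classification problem and give a real-world application.
Given a database $D = \{t_1, t_2, \ldots, t_n\}$ and a set of classes $C = \{C_1, \ldots, C_m\}$, classification is the problem of determining a mapping $f: D \to C$ such that each $t_i$ is assigned to one class. http://www.kdnuggets.com/software/classification.html

16. Decision-tree-based classification?

Applies to data whose attributes are non-numeric. Splits are chosen based on the homogeneity of the data, measured by information gain (IG) or the Gini index.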
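The two homogeneity measures can be computed as follows. This Python sketch assumes categorical attributes stored in dictionaries; the attribute with the highest information gain (or the lowest weighted Gini index after splitting) would be chosen for the split. The toy rows and function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy(S) = -sum over classes of p_c * log2(p_c)."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())


def gini(labels):
    """Gini(S) = 1 - sum over classes of p_c^2."""
    n = len(labels)
    return 1 - sum((k / n) ** 2 for k in Counter(labels).values())


def partition(rows, labels, attribute):
    """Group the class labels by the value each row has for `attribute`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    return list(groups.values())


def information_gain(rows, labels, attribute):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition(rows, labels, attribute))
    return entropy(labels) - remainder


def gini_split(rows, labels, attribute):
    """Weighted Gini index of the partition induced by `attribute` (lower is purer)."""
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in partition(rows, labels, attribute))


# Hypothetical training data with two categorical attributes
ROWS = [{"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
        {"outlook": "rainy", "windy": "no"}, {"outlook": "rainy", "windy": "yes"}]
LABELS = ["no", "no", "yes", "yes"]
for attr in ("outlook", "windy"):
    print(attr, information_gain(ROWS, LABELS, attr), gini_split(ROWS, LABELS, attr))
# outlook 1.0 0.0
# windy 0.0 0.5
```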

Chapter 4: 17. Rule-based classification?

Rules are built directly from the data (the ILA algorithm) or indirectly (extracting rules from a decision tree). For ILA, pay attention to the order of the attribute combinations in the list of combined attributes.

18. Classification based on a probabilistic model: Naïve Bayes?

Applies to data with non-numeric attributes as well as numeric (continuous) attributes. Use Laplace smoothing when estimating the probabilities.
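A Naïve Bayes classifier with Laplace smoothing for categorical attributes can be sketched in Python as follows. Each conditional probability is estimated as (count + 1) / (class count + number of distinct values of the attribute), and log-probabilities are summed to avoid underflow. Continuous attributes are not handled here; they would need, for example, a density estimate or discretization. Data and names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    domains = defaultdict(set)           # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def predict(model, row):
    """Return the class maximising log P(C) + sum_i log P(x_i | C)."""
    class_counts, value_counts, domains = model
    n = sum(class_counts.values())
    best_class, best_score = None, float("-inf")
    for c, n_c in class_counts.items():
        score = log(n_c / n)  # prior P(C)
        for i, value in enumerate(row):
            # Laplace smoothing: (count + 1) / (n_c + number of distinct values)
            score += log((value_counts[(i, c)][value] + 1) / (n_c + len(domains[i])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical training data: (outlook, temperature) -> play?
ROWS = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
LABELS = ["no", "no", "yes", "yes", "yes"]
MODEL = train_naive_bayes(ROWS, LABELS)
print(predict(MODEL, ("sunny", "cool")))  # no
print(predict(MODEL, ("rainy", "hot")))   # yes
```

In this data "sunny" never occurs with class "yes" and "cool" never with class "no", so without smoothing both class scores for ("sunny", "cool") would be zero; the +1 terms keep every estimate positive.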

Chapter 4: 19. Instance-based classification: the k-NN algorithm. Compare the classification methods.
Assign the new sample to the class held by the majority of its k nearest neighbours (or, for a numeric target, give it the mean value of those k samples). Normalize the attribute values before running k-NN, for example with min-max normalization:

$$a_i = \frac{v_i - \min v_i}{\max v_i - \min v_i}$$
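A minimal k-NN classifier using the min-max normalization above, in Python. It assumes numeric attributes, Euclidean distance (`math.dist`, Python 3.8+) and simple majority voting; vote ties are resolved by `Counter.most_common`, which is an arbitrary choice. The training points and function names are hypothetical.

```python
from collections import Counter
from math import dist


def min_max_normalize(points):
    """Scale each attribute to [0, 1] using the training min and max; return the scaler too."""
    columns = list(zip(*points))
    lo = [min(c) for c in columns]
    hi = [max(c) for c in columns]

    def scale(p):
        # a constant attribute (max == min) is mapped to 0.0
        return tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(p, lo, hi))

    return [scale(p) for p in points], scale


def knn_classify(train_points, train_labels, query, k):
    """Majority class among the k nearest training points after normalization."""
    normalized, scale = min_max_normalize(train_points)
    q = scale(query)
    nearest = sorted(range(len(normalized)), key=lambda i: dist(normalized[i], q))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (age, income) -> buys?
POINTS = [(25, 40000), (35, 60000), (45, 80000), (20, 20000), (50, 100000)]
LABELS = ["no", "yes", "yes", "no", "yes"]
print(knn_classify(POINTS, LABELS, (28, 45000), k=3))  # no
print(knn_classify(POINTS, LABELS, (42, 75000), k=3))  # yes
```

Without normalization, the income column (tens of thousands) would dominate the age column in every distance.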

Chapter 5: 20. State the clustering problem and give a real-world application.
Given a database $D = \{t_1, t_2, \ldots, t_n\}$ and an integer k, clustering is the problem of determining a mapping $f: D \to \{1, \ldots, k\}$ such that each $t_i$ is assigned to one cluster (class) $K_j$, $1 \le j \le k$. http://www.kdnuggets.com/software/clustering.html

21. Partitioning clustering: the k-means algorithm?

The complexity of the algorithm depends on the choice of the k initial cluster centers. Euclidean distance is used to measure the distance between objects.
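The basic k-means loop in Python, assuming the k initial centres are supplied by the caller (so the effect of the initial choice can be tried directly) and Euclidean distance via `math.dist`. An empty cluster keeps its previous centre, which is a simplifying assumption. Points and names are hypothetical.

```python
from math import dist


def k_means(points, initial_centers, max_iter=100):
    """Alternate assignment and centre update until the centres stop moving."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centre
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each centre moves to the mean of its cluster
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, clusters


POINTS = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]  # hypothetical 2-D data
centers, clusters = k_means(POINTS, initial_centers=[(1, 1), (2, 1)])
print(centers)
print(clusters)
```

Starting from the poorly placed centres (1, 1) and (2, 1), the loop needs three passes before the centres settle at about (1.33, 1.33) and (8.33, 8.33).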

Chapter 5: 22. Hierarchical clustering: the AGNES algorithm?
Note: distinguish the two ways of computing the distance between two clusters, single link and complete link. When drawing the dendrogram, clearly show the order in which clusters are merged and the position on the Y axis (the distance) at which they are merged.
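AGNES with both linkage rules can be sketched in Python. It starts from singleton clusters and repeatedly merges the closest pair, where the distance between two clusters is the minimum pairwise distance (single link) or the maximum (complete link); the recorded merge distances are the Y-axis heights of the dendrogram. The one-dimensional points and names are hypothetical, and ties between equally close pairs are broken by iteration order.

```python
from itertools import combinations


def agnes(points, distance, linkage="single"):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, height)."""
    # single link: closest pair of members; complete link: farthest pair
    combine = min if linkage == "single" else max
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            d = combine(distance(points[i], points[j]) for i in a for j in b)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((sorted(a), sorted(b), d))  # indices of merged points, height
    return merges


def gap(x, y):
    """Distance between two 1-D points."""
    return abs(x - y)


POINTS = [0, 1, 3, 7]  # hypothetical 1-D data
for linkage in ("single", "complete"):
    print(linkage, [height for _, _, height in agnes(POINTS, gap, linkage)])
# single [1, 2, 4]
# complete [1, 3, 7]
```

Both linkages merge the same clusters here, but at different heights, so the two dendrograms differ on the Y axis.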

Chapter 5: 23. Density-based clustering: the DBSCAN algorithm. Compare the strengths and weaknesses of the clustering methods.
Key concepts: Eps, MinPts. Use the WEKA software.
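A compact DBSCAN in Python, useful for checking hand-worked exercises alongside WEKA. Assumptions: Euclidean distance, the Eps-neighbourhood of a point includes the point itself, and a point is a core point when its neighbourhood holds at least MinPts points (conventions differ slightly between textbooks and tools, so compare with the definition used in class). Noise is labelled -1; data and names are hypothetical.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Return a cluster id per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n

    def region(i):
        # Eps-neighbourhood of point i (includes i itself)
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(j_neighbours)
    return labels


# Hypothetical 2-D data: two dense groups and one outlier
POINTS = [(1, 1), (1.2, 1), (1, 1.2), (1.1, 1.1), (5, 5), (5.1, 5), (5, 5.1), (9, 9)]
print(dbscan(POINTS, eps=0.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```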

Chapter 6: 24. What is text mining? What are the related fields? What are the tasks of text mining? Give a real-world application.
A branch of data mining whose goal is to search for and extract knowledge from text documents. Related fields: natural language processing, information extraction, information retrieval, web mining, and standard data mining. Tasks: text classification, document clustering, summarization, prediction, trend tracking, ... http://www.kdnuggets.com/software/text.html

Chapter 6: 25. What is web mining? How is web mining categorized? Give a real-world application.

Web mining = data mining (applied to Web documents and services) + Web technology.
Web Content Mining: discovers knowledge from Web content (many data types such as documents, images, audio, video, hyperlinks, ...).
Web Structure Mining: discovers the models underlying the link structure of the Web.
Web Usage Mining: discovers knowledge from users' behaviour and their use of the Web.
http://www.kdnuggets.com/solutions/webmining.html

To become an expert in data mining, you need to learn more and build real-world applications.
