(DATA MINING)
Lecturer: NGUYỄN HOÀNG TÚ ANH
REVIEW
Laptop.
3. The exam consists of 3 questions
: Frequent itemsets and association rules
b) Question 2 (2 points): content from chapters 4 and 5: classification, clustering
c) Question 3 (0.5 points): analysis-type question,
Chapter 1: 1. What is data mining?
Strong development of hardware technology, of data collection & management technology, and of algorithms. The need to analyze data in order to support decision making.
Chapter 1: 3. Which data types and which kinds of information can be used in the knowledge discovery in databases (KDD) process?
Kinds of information: commercial, manufacturing, scientific, and personal information.
Data types: data with non-numeric attributes, data with continuous attributes, static data, dynamic data, distributed data, text data, web data, multimedia data (images, audio, video), ...
www.kdnuggets.com/solutions/index.html
Chapter 1: 5. What types of tasks does data mining include?
Descriptive tasks and predictive tasks. Based on your own experience, which type of data mining task do you think receives the most attention and is applied most widely in:
Business
Education
Chapter 2: 6. Why is data preparation needed?
Because the quality of real-world data is poor. Data quality affects the decision-making process.
a concrete step-by-step example.
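Chapter 3: 9. State the frequent itemset mining problem and the association rule mining problem.
The frequent itemset mining problem is to find all itemsets S (frequent itemsets S) whose support meets the minimum support threshold minsupp: $\mathrm{supp}(S) \ge \mathrm{minsupp}$
The association rule mining problem is to find all rules of the form $X \rightarrow Y$ ($X, Y \subseteq I$ and $X \cap Y = \emptyset$) that meet the minimum support and minimum confidence thresholds: $\mathrm{supp}(X \rightarrow Y) \ge \mathrm{minsupp}$ and $\mathrm{conf}(X \rightarrow Y) \ge \mathrm{minconf}$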
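The two thresholds can be checked by direct counting. The sketch below is only an illustration: the transaction data is a made-up toy set, and it assumes the usual convention that the support of a rule X → Y is the support of X ∪ Y and that its confidence is supp(X ∪ Y) / supp(X).

```python
from fractions import Fraction

# Hypothetical toy transaction database (not taken from the course material).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def supp(itemset, transactions=TRANSACTIONS):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def conf(x, y, transactions=TRANSACTIONS):
    """Confidence of the rule X -> Y, assumed to be supp(X u Y) / supp(X)."""
    return supp(set(x) | set(y), transactions) / supp(x, transactions)


def is_valid_rule(x, y, minsupp, minconf):
    """True if X -> Y meets both the minimum support and minimum confidence."""
    return supp(set(x) | set(y)) >= minsupp and conf(x, y) >= minconf
```

With this toy data, `supp({"milk", "diapers"})` is 3/5 and `conf({"milk", "diapers"}, {"beer"})` is 2/3, so the rule passes `minsupp = 0.4` but fails `minconf = 0.7`.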
Chapter 3: 10. Present the properties of frequent itemsets. Maximal frequent itemsets, closed frequent itemsets.
Every subset of a frequent itemset is frequent. If a subset is not frequent, then any itemset containing it (its superset) is not frequent.
Maximal frequent itemset: a frequent itemset for which no proper superset is frequent.
Closed frequent itemset: a frequent itemset for which no proper superset has the same support.
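A brute-force check of the two definitions on a tiny example may help. The transactions and the threshold `MIN_COUNT` below are invented for illustration, and enumerating every itemset is only practical at this toy scale.

```python
from itertools import combinations

# Hypothetical toy data: four transactions over the items a, b, c.
TRANSACTIONS = [set("abc"), set("ab"), set("ac"), set("a")]
MIN_COUNT = 2  # assumed minimum support, expressed as a transaction count


def count(itemset):
    """Number of transactions containing the itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


# Enumerate every itemset and keep the frequent ones with their counts.
items = sorted(set().union(*TRANSACTIONS))
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        s = frozenset(combo)
        c = count(s)
        if c >= MIN_COUNT:
            frequent[s] = c

# Maximal: no proper superset is frequent.
maximal = {s for s in frequent if not any(s < t for t in frequent)}

# Closed: no proper superset has the same support. A superset with the same
# support as a frequent itemset is itself frequent, so searching `frequent`
# is sufficient.
closed = {
    s for s in frequent
    if not any(s < t and frequent[t] == frequent[s] for t in frequent)
}
```

Here the frequent itemsets are {a}, {b}, {c}, {a,b}, {a,c}. The maximal ones are {a,b} and {a,c}, and the closed ones are {a}, {a,b}, {a,c}, which illustrates that every maximal frequent itemset is also closed.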
Chapter 3: 11. Present the procedure for finding association rules. Propose an improvement to the method of generating association rules from frequent itemsets (Step 2 of the procedure). Explain why it is more efficient. Give an illustrative example.
Step 1: Find all frequent itemsets (using the minsupp threshold)
Step 2: Generate rules from the frequent itemsets (found in Step 1)
For each frequent itemset S, generate all nonempty proper subsets of S. For each such subset A of S,
o The rule $A \rightarrow (S - A)$ is a sought association rule if: $\mathrm{conf}(A \rightarrow (S - A)) = \mathrm{supp}(S) / \mathrm{supp}(A) \ge \mathrm{minconf}$
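Step 2 can be sketched as a plain enumeration. This is the unimproved baseline, not the improved method the question asks for. The function name `generate_rules` and the support table `EXAMPLE_SUPPORTS` are hypothetical, and the table is assumed to hold the support of every frequent itemset found in Step 1.

```python
from fractions import Fraction
from itertools import combinations


def generate_rules(supports, minconf):
    """Baseline Step 2: try every nonempty proper subset A of each itemset S.

    `supports` maps each frequent itemset (a frozenset) to its support.
    Returns a list of (antecedent, consequent, confidence) triples.
    """
    rules = []
    for s, supp_s in supports.items():
        for size in range(1, len(s)):  # sizes 1 .. |S|-1: nonempty and proper
            for combo in combinations(sorted(s), size):
                a = frozenset(combo)
                confidence = supp_s / supports[a]  # supp(S) / supp(A)
                if confidence >= minconf:
                    rules.append((a, s - a, confidence))
    return rules


# Hypothetical supports for the frequent itemsets of a four-transaction toy set.
EXAMPLE_SUPPORTS = {
    frozenset("a"): Fraction(1),
    frozenset("b"): Fraction(1, 2),
    frozenset("c"): Fraction(1, 2),
    frozenset("ab"): Fraction(1, 2),
    frozenset("ac"): Fraction(1, 2),
}
```

Because every subset of a frequent itemset is frequent, `supports[a]` is always available. One property that an improved version can exploit: for a fixed S, shrinking the antecedent A can only lower supp(S) / supp(A), so once a rule fails minconf, rules from the same S with a smaller antecedent contained in A fail as well.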
Study the improvement to Step 2 in the reference P.-N. Tan, M. Steinbach, V. Kumar, Chapter 6 - Introduction to Data Mining http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf, pp.23
Chapter 3: 12. The Apriori algorithm? Present one method for improving the Apriori algorithm.
Note: Generate the candidates $C_{k+1}$ only from the frequent itemsets $L_k$, strictly following the candidate join rule, and always perform the pruning step that removes candidates containing at least one infrequent subset.
Improvement methods: study one of the references listed in the lecture.
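The join and prune steps from the note can be sketched as follows. This is an illustrative version that assumes the common formulation in which two k-itemsets are joined when they share their first k-1 items in sorted order; the name `apriori_gen` is a label chosen here.

```python
from itertools import combinations


def apriori_gen(l_k):
    """Build the candidate set C_{k+1} from L_k (frozensets, all of size k)."""
    l_k = set(l_k)
    ordered = sorted(tuple(sorted(s)) for s in l_k)
    candidates = set()
    for i, p in enumerate(ordered):
        for q in ordered[i + 1:]:
            # Join step: p and q must agree on their first k-1 items.
            if p[:-1] != q[:-1]:
                continue
            cand = frozenset(p) | frozenset(q)
            # Prune step: drop the candidate if any k-subset is not in L_k.
            if all(frozenset(sub) in l_k for sub in combinations(cand, len(p))):
                candidates.add(cand)
    return candidates
```

For example, with L3 = {abc, abd, acd, ace, bcd}, the join step produces abcd and acde, and the prune step removes acde because its subset ade is not in L3.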
Chapter 3: 13. The FP-growth algorithm? Compare it with Apriori.
Note: Before building the FP-tree, and likewise each conditional FP-tree, sort the transactions / conditional pattern bases according to the order of the f-list (the frequent 1-itemsets in descending order of support).
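The ordering step in the note can be sketched as below. Only the preprocessing is shown, not the tree construction. The function names are illustrative, and breaking ties alphabetically is an assumption made here so that the order is deterministic.

```python
from collections import Counter


def build_f_list(transactions, min_count):
    """Frequent single items in descending order of support count."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [(item, c) for item, c in counts.items() if c >= min_count]
    # Descending count; ties broken alphabetically (an assumption of this sketch).
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in frequent]


def order_transaction(transaction, f_list):
    """Drop infrequent items and sort the rest by their f-list position."""
    rank = {item: i for i, item in enumerate(f_list)}
    kept = [item for item in set(transaction) if item in rank]
    return sorted(kept, key=rank.__getitem__)


# Hypothetical toy transactions, one character per item.
TRANSACTIONS = [
    list("facdgimp"),
    list("abcflmo"),
    list("bfhjo"),
    list("bcksp"),
    list("afcelpmn"),
]
```

With `min_count = 3`, this toy data gives the f-list `['c', 'f', 'a', 'b', 'm', 'p']`, and the first transaction is inserted into the tree as `['c', 'f', 'a', 'm', 'p']`. The same ordering would be applied to each conditional pattern base before its conditional tree is built.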
$\mathrm{Interest}(X \rightarrow Y) = \dfrac{P(X, Y)}{P(X) \cdot P(Y)}$
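As a small worked example, the interest measure can be computed from raw counts, reading it as P(X, Y) divided by P(X) * P(Y). The counts used below are hypothetical.

```python
from fractions import Fraction


def interest(n_xy, n_x, n_y, n_total):
    """Interest of X -> Y from counts: P(X, Y) / (P(X) * P(Y))."""
    p_xy = Fraction(n_xy, n_total)
    p_x = Fraction(n_x, n_total)
    p_y = Fraction(n_y, n_total)
    return p_xy / (p_x * p_y)


# Hypothetical survey: 1000 people, 200 drink tea, 800 drink coffee, 150 both.
tea_coffee = interest(n_xy=150, n_x=200, n_y=800, n_total=1000)
```

Here the rule tea → coffee has confidence 150/200 = 0.75, yet its interest is 15/16, slightly below 1. A value of 1 corresponds to statistical independence, so despite the high confidence the two items are mildly negatively associated in this made-up data.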
rules
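Chapter 4: 15. State the classification problem. Give an example of a real-world application.
Given a database $D = \{t_1, t_2, \ldots, t_n\}$ and a set of classes $C = \{C_1, \ldots, C_m\}$, classification is the problem of determining a mapping $f : D \rightarrow C$ such that each $t_i$ is assigned to one class.
http://www.kdnuggets.com/software/classification.html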
... decision trees?
Applies to data whose attributes are non-numeric. Splits are chosen based on the homogeneity of the partitioned data: information gain (IG), Gini index.
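Both homogeneity measures can be computed in a few lines for non-numeric attributes. The sketch below uses the standard entropy-based information gain and the weighted Gini index of a split. The helper names and the four-row data set are invented for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(rows, labels, attr):
    """Group the class labels by the value each row takes on `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return groups


def information_gain(rows, labels, attr):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    groups = partition(rows, labels, attr).values()
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)


def gini_split(rows, labels, attr):
    """Weighted Gini index of the partitions (lower means purer)."""
    n = len(labels)
    groups = partition(rows, labels, attr).values()
    return sum(len(g) / n * gini(g) for g in groups)


# Hypothetical data: `outlook` separates the classes, `windy` does not.
ROWS = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
LABELS = ["no", "no", "yes", "yes"]
```

On this toy data, `outlook` has information gain 1 and split Gini 0, while `windy` has information gain 0 and split Gini 0.5, so either criterion would pick `outlook` as the splitting attribute.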
Chapter 4: 17. Rule-based classification?
Build rules directly or indirectly from the data: the ILA algorithm / extracting rules from a decision tree.
ILA algorithm: pay attention to the order of the attribute combinations in the list of attribute combinations.
... probability: Naïve Bayes?
Applies to data with non-numeric attributes as well as numeric / continuous-valued attributes. Laplace smoothing should be used when estimating the probabilities.
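The effect of Laplace smoothing can be seen in a minimal Naïve Bayes sketch. It handles non-numeric attributes only (continuous attributes would need a density estimate, which is not shown), and it assumes the add-one form of smoothing: count + 1 over class count + number of distinct attribute values. All names and data are illustrative.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def train(rows, labels):
    """Collect the counts needed for the class priors and conditionals."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attr) -> Counter of values
    domains = defaultdict(set)           # attr -> set of observed values
    for row, label in zip(rows, labels):
        for attr, value in row.items():
            value_counts[(label, attr)][value] += 1
            domains[attr].add(value)
    return class_counts, value_counts, domains


def posterior_scores(model, row):
    """Unnormalized P(class) * product of smoothed P(value | class)."""
    class_counts, value_counts, domains = model
    n = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = Fraction(n_c, n)
        for attr, value in row.items():
            # Laplace smoothing: add 1 to the count, |domain| to the denominator.
            score *= Fraction(value_counts[(c, attr)][value] + 1,
                              n_c + len(domains[attr]))
        scores[c] = score
    return scores


def predict(model, row):
    scores = posterior_scores(model, row)
    return max(scores, key=scores.get)


# Hypothetical training data with a single attribute.
ROWS = [{"outlook": "sunny"}, {"outlook": "sunny"},
        {"outlook": "rain"}, {"outlook": "overcast"}]
LABELS = ["no", "no", "yes", "yes"]
MODEL = train(ROWS, LABELS)
```

For the query `{"outlook": "overcast"}`, class "no" never co-occurs with "overcast" in the training data. Without smoothing its score would be 0. With smoothing it is 1/2 * 1/5 = 1/10, against 1/2 * 2/5 = 1/5 for "yes".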
Chapter 4: 19. Instance-based classification: the k-NN algorithm? Compare the classification methods.
Assign the new sample to the class that holds the majority among its k nearest neighbors (or give the new sample the mean value of those k samples). Attribute values should be normalized before running k-NN:
$a_i = \dfrac{v_i - \min v_i}{\max v_i - \min v_i}$
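A minimal sketch of min-max normalization followed by majority-vote k-NN is given below. It assumes numeric attributes and Euclidean distance, and the training data is made up. Without normalization the income column would dominate the distance.

```python
from collections import Counter
from math import dist


def min_max_normalize(rows):
    """Scale every attribute to [0, 1]; also return the scaling function."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]

    def scale(row):
        # (v - min) / (max - min); constant columns are mapped to 0.0.
        return tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                     for v, lo, hi in zip(row, lows, highs))

    return [scale(r) for r in rows], scale


def knn_predict(train_rows, train_labels, query, k):
    """Majority class among the k training rows closest to `query`."""
    nearest = sorted(zip(train_rows, train_labels),
                     key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (age, income) -> class.
TRAIN_ROWS = [(25, 30000), (30, 32000), (45, 90000), (50, 95000)]
TRAIN_LABELS = ["A", "A", "B", "B"]

SCALED_ROWS, SCALE = min_max_normalize(TRAIN_ROWS)
PREDICTION = knn_predict(SCALED_ROWS, TRAIN_LABELS, SCALE((27, 31000)), k=3)
```

The query must be scaled with the same minimum and maximum values as the training data, which is why `min_max_normalize` returns the `scale` function alongside the scaled rows.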
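Chapter 5: 20. State the clustering problem. Give an example of a real-world application.
Given a database $D = \{t_1, t_2, \ldots, t_n\}$ and an integer $k$, clustering is the problem of determining a mapping $f : D \rightarrow \{1, \ldots, k\}$ such that each $t_i$ is assigned to one cluster (class) $K_j$, $1 \le j \le k$.
http://www.kdnuggets.com/software/clustering.html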
The complexity of the algorithm depends on the choice of the k initial cluster centers. Use the Euclidean measure to compute the distance between objects.
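The sensitivity to the initial centers can be illustrated with a small centroid-based sketch. It assumes the algorithm being described is of the k-means kind (assign each object to its nearest center by Euclidean distance, then recompute the centers), which the surviving slide text does not state explicitly. The points and starting centers are invented.

```python
from math import dist


def kmeans(points, centers, max_iter=100):
    """Alternate assignment and center update until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center (Euclidean).
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[j]
            for j, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Hypothetical data: the four corners of a 4 x 1 rectangle.
POINTS = [(0, 0), (0, 1), (4, 0), (4, 1)]
GOOD_START = kmeans(POINTS, [(0, 0), (4, 0)])
BAD_START = kmeans(POINTS, [(0, 0), (0, 1)])
```

Starting from `(0, 0)` and `(4, 0)` the procedure converges to the left and right pairs, with centers `(0.0, 0.5)` and `(4.0, 0.5)`. Starting from `(0, 0)` and `(0, 1)` it settles on the bottom and top pairs, with centers `(2.0, 0.0)` and `(2.0, 1.0)`, which is a much poorer grouping of the same data.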
Chapter 5: 22. Hierarchical clustering: the AGNES algorithm?
Note: distinguish between the two ways of computing the distance between two clusters, single link and complete link. When drawing the dendrogram, clearly show the order in which clusters are merged as well as the position on the Y axis of the distance at which they are merged.
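The difference between the two linkages shows up in both the merge order and the merge heights. The sketch below is a naive agglomerative loop, assuming single link is the minimum pairwise distance and complete link the maximum. The one-dimensional points are made up.

```python
from math import dist


def single_link(a, b):
    """Distance between the two closest members of clusters a and b."""
    return min(dist(p, q) for p in a for q in b)


def complete_link(a, b):
    """Distance between the two farthest members of clusters a and b."""
    return max(dist(p, q) for p in a for q in b)


def agnes(points, linkage):
    """Merge the closest pair of clusters until one cluster remains.

    Returns the merges in order as (cluster, cluster, height) triples,
    which is the information a dendrogram needs.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        height = linkage(clusters[i], clusters[j])
        merges.append((sorted(clusters[i]), sorted(clusters[j]), height))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical one-dimensional points, written as 1-tuples.
POINTS = [(0.0,), (1.0,), (2.5,), (4.5,)]
SINGLE = agnes(POINTS, single_link)
COMPLETE = agnes(POINTS, complete_link)
```

With single link the merge heights are 1.0, 1.5, 2.0, and the point 2.5 joins {0, 1} second. With complete link the heights are 1.0, 2.0, 4.5, and the second merge is 2.5 with 4.5 instead. These heights are the Y-axis positions to draw in the dendrogram.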
Chapter 5: 23. Density-based clustering: the DBSCAN algorithm? Compare the advantages and disadvantages of the clustering methods.
Main concepts: Eps, MinPts. Use the WEKA software.
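The roles of Eps and MinPts can be illustrated with a compact, unoptimized sketch written in plain Python rather than WEKA. It assumes that the Eps-neighborhood of a point includes the point itself and that a point is a core point when this neighborhood holds at least MinPts points. The data and names are illustrative.

```python
from math import dist

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (i included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not a core point; may become a border point later
            continue
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster_id += 1
    return labels


# Hypothetical data: two dense squares and one isolated point.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
LABELS = dbscan(POINTS, eps=1.5, min_pts=3)
```

With `eps=1.5` and `min_pts=3`, every corner of each square is a core point, so the two squares become clusters 0 and 1, and the isolated point `(5, 5)` is labeled as noise. Unlike the centroid-based approach, the number of clusters is not given in advance.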
Chapter 6: 24. What is text mining? Related fields? Tasks of text mining? Present an example of a real-world application.
A branch of data mining.
Goal: find and extract knowledge from text documents.
Related fields: natural language processing, information extraction, information retrieval, web mining, standard data mining.
Text classification, document clustering, summarization, prediction, trend tracking, ...
http://www.kdnuggets.com/software/text.html
Chapter 6: 25. What is web mining? How is web mining classified? Present an example of a real-world application.
Web mining = data mining (applied to Web documents and services) + Web technology.
Web Content Mining: discovering knowledge from Web content (many kinds of data such as documents, images, audio, video, hyperlinks, ...)
Web Structure Mining: discovering the models underlying the link structures of the Web
Web Usage Mining: discovering knowledge from users' behavior and their web usage
http://www.kdnuggets.com/solutions/webmining.html