WEKA
Contents
Introduction to Weka
Development history
Versions
Subprograms
File format
Data preprocessing
Association rule mining
Classification
Clustering
8/15/2012
Weka
Waikato Environment for Knowledge Analysis, developed by the University of Waikato, New Zealand.
Open-source software that integrates machine learning and data mining algorithms.
Provides the basic functions of data mining:
  Data exploration
  Statistics and machine learning
  Model experimentation and validation
Used in research, teaching, and applications.
Weka versions
Snapshot: the latest bug-fix builds, updated continuously.
Book: the version whose features are described in the book Data Mining: Practical Machine Learning Tools and Techniques by Ian Witten and Eibe Frank (3.4.x).
Developer: the version with many new features that is not yet stable (3.7.x).
Stable: the most stable version (3.6.x).
Weka homepage: http://www.cs.waikato.ac.nz/ml/weka/
Source code: https://svn.scms.waikato.ac.nz/svn/weka/
Explorer
Weka's main program. Covers data exploration, preprocessing, classification, and clustering.
Simple CLI
Similar to Explorer, but with a command-line interface.
Experimenter
Runs experiments to validate and compare machine learning models across datasets.
KnowledgeFlow
Includes the functions of Explorer and also lets you connect and combine algorithms through a visual graph interface.
ARFF format
1. Header section: contains the declarations
@relation <dataset name>
@attribute <attribute name 1> <data type>
@attribute <attribute name 2> <data type>
...
@attribute <attribute name n> <data type>
2. Data section: contains the data
@data
Instance 1
Instance 2
...
Instance n
Data types
Numeric: @ATTRIBUTE name numeric
Nominal: @ATTRIBUTE class {setosa, versicolor}
String: @ATTRIBUTE name string
Date: @ATTRIBUTE discovered date
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
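The header/data layout described above can be read with a few lines of code. The following is a minimal sketch, not Weka's own loader: the function name `parse_arff` is hypothetical, and it assumes unquoted, comma-separated values with `?` marking a missing value. Quoted strings, date formats, and sparse ARFF are deliberately not handled.

```python
SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""


def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows).

    Hypothetical helper for illustration only. Each attribute is a
    (name, kind) pair, where kind is a lowercase type name such as
    "numeric" or a list of nominal values.
    """
    relation = None
    attributes = []
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                if value == "?":              # missing value
                    row[name] = None
                elif kind == "numeric":
                    row[name] = float(value)
                else:
                    row[name] = value
            rows.append(row)
            continue
        lower = line.lower()
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            if kind.startswith("{"):          # nominal: list of allowed values
                kind = [v.strip() for v in kind.strip("{}").split(",")]
            else:
                kind = kind.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE)
print(relation, len(attributes), len(rows))
```

The sample reuses the first four instances of the heart-disease example; the missing cholesterol value in the last instance comes back as `None`.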
Preprocessing
Preprocessing tasks in Weka are called Filters.
Filter configuration
Filters supported by Weka
Provides 68 filters.
Supports common operations such as resampling, discretization, normalization, missing-value replacement, attribute selection, attribute combination, ...
Instance filters: Resample, Randomize, Normalize, Remove Frequent Values, ...
Attribute filters: Discretize, Replace Missing Values, Add, Copy, Remove, Reorder, Add Values, Add Noise, ...
Each filter can be configured individually.
Several filters can be combined (Multi Filter).
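To illustrate the idea of chaining filters, here is a minimal sketch in plain Python. It does not call Weka: the function names `replace_missing`, `normalize`, and `multi_filter` are hypothetical stand-ins, and the behavior (mean imputation, then linear rescaling to [0, 1]) is an assumption about what such filters typically do, applied to the cholesterol column of the heart-disease example.

```python
def replace_missing(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def normalize(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def multi_filter(*filters):
    """Compose filters so that they run in the order given."""
    def apply(column):
        for f in filters:
            column = f(column)
        return column
    return apply


cholesterol = [233, 286, 229, None]   # values from the ARFF example, None = "?"
pipeline = multi_filter(replace_missing, normalize)
result = pipeline(cholesterol)
print(result)
```

Order matters in such a chain: imputing before rescaling keeps the missing entry from breaking the min/max computation.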
Example: discretizing data
Before discretization
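Discretization turns a numeric attribute into a small number of intervals. The sketch below shows one common scheme, equal-width binning, in plain Python. The function name `equal_width_bins` and the choice of three bins are assumptions made for illustration; the slides do not state which scheme or bin count was used.

```python
def equal_width_bins(values, bins):
    """Assign each value to one of `bins` equal-width intervals (0-based index)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    labels = []
    for v in values:
        if width == 0:                # all values identical: single bin
            index = 0
        else:
            # the maximum value would land in bin `bins`, so clamp it into the last bin
            index = min(int((v - lo) / width), bins - 1)
        labels.append(index)
    return labels


ages = [63, 67, 67, 38]               # age column from the ARFF example
age_bins = equal_width_bins(ages, 3)
print(age_bins)
```

After this step the attribute takes only the bin indices as values, so it can be treated as nominal by algorithms that require discrete attributes.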
Classification
About 80 classification algorithms, divided into several groups:
Bayes: Naive Bayes, Bayes Net, ...
Functions: Gaussian, Linear Regression, SVM, ...
Trees: ID3, J48, NBTree, ...
Rules: rule-based classification algorithms.
Lazy: K*, ...
Clustering
Provides 11 clustering algorithms:
DBScan
Simple K Means
X Means
Farthest First
EM
Apriori
Mines frequent itemsets and association rules.
Supports several interestingness measures.
All attributes must be discrete.
Apriori parameters
car: when true, only the class attribute may appear on the right-hand side of a rule.
Set it to false to mine all association rules.
metricType: the interestingness measure used for association rules
Confidence (default)
Lift
Leverage
Conviction
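The four measures can be computed from simple counts. The sketch below uses the common textbook definitions; the function name `rule_metrics` is hypothetical, and Weka's own implementation may apply additional corrections, so the values it reports can differ (for conviction in particular). The example counts are for the rule humidity=normal ==> play=yes: the premise count 7 and joint count 6 appear in the results of the example, while the totals of 14 instances and 9 instances with play=yes are assumptions based on the standard weather.nominal dataset.

```python
def rule_metrics(n, premise, consequence, both):
    """Interestingness measures for a rule A ==> B, from raw counts.

    n           total number of instances
    premise     instances matching A
    consequence instances matching B
    both        instances matching A and B
    """
    confidence = both / premise                      # P(B | A)
    lift = confidence / (consequence / n)            # P(B | A) / P(B)
    leverage = both / n - (premise / n) * (consequence / n)  # P(A,B) - P(A)P(B)
    if premise == both:
        conviction = float("inf")                    # the rule is never violated
    else:
        # P(A) * P(not B) / P(A, not B), written with integer counts
        conviction = premise * (n - consequence) / (n * (premise - both))
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


metrics = rule_metrics(n=14, premise=7, consequence=9, both=6)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A lift above 1 and a positive leverage both indicate that the premise and consequence occur together more often than they would if they were independent.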
Example
Data: Weather.Nominal
numRules: 10, minsup: 0.1, minconf: 0.8
Results:
1. outlook=overcast 4 ==> play=yes 4 conf:(1)
2. temperature=cool 4 ==> humidity=normal 4 conf:(1)
3. humidity=normal windy=FALSE 4 ==> play=yes 4 conf:(1)
4. outlook=sunny play=no 3 ==> humidity=high 3 conf:(1)
5. outlook=sunny humidity=high 3 ==> play=no 3 conf:(1)
6. outlook=rainy play=yes 3 ==> windy=FALSE 3 conf:(1)
7. outlook=rainy windy=FALSE 3 ==> play=yes 3 conf:(1)
8. temperature=cool play=yes 3 ==> humidity=normal 3 conf:(1)
9. humidity=normal 7 ==> play=yes 6 conf:(0.86)
10. play=no 5 ==> humidity=high 4 conf:(0.8)
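Each result line reads: the premise and the number of instances matching it, then the consequence and the number of instances matching both, then the confidence (the second count divided by the first). The sketch below recomputes these numbers by brute-force counting. It is not the Apriori algorithm itself, the function names are hypothetical, and the 14 rows are an assumption: they reproduce the weather.nominal dataset as it is commonly distributed with Weka.

```python
FIELDS = ("outlook", "temperature", "humidity", "windy", "play")

# Assumed contents of the weather.nominal dataset (14 instances).
WEATHER = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]
ROWS = [dict(zip(FIELDS, values)) for values in WEATHER]


def matches(row, items):
    """True if the row has every attribute=value pair in `items`."""
    return all(row[name] == value for name, value in items.items())


def rule_stats(rows, premise, consequence):
    """Return (premise count, premise-and-consequence count, confidence)."""
    premise_count = sum(1 for r in rows if matches(r, premise))
    both_count = sum(
        1 for r in rows if matches(r, premise) and matches(r, consequence)
    )
    return premise_count, both_count, both_count / premise_count


rule_1 = rule_stats(ROWS, {"outlook": "overcast"}, {"play": "yes"})
rule_9 = rule_stats(ROWS, {"humidity": "normal"}, {"play": "yes"})
rule_10 = rule_stats(ROWS, {"play": "no"}, {"humidity": "high"})
print(rule_1, rule_9, rule_10)
```

Under that dataset assumption, the counts for rules 1, 9, and 10 agree with the listing above (4 and 4, 7 and 6, 5 and 4).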