
8/15/2012

University of Science, Ho Chi Minh City (Đại học Khoa học Tự nhiên TP.HCM), Faculty of Information Technology

WEKA

Compiled by: Nguyễn Hoàng Khai

Contents
Introduction to Weka
  Development history
  Versions
  Subprograms
  File format
Exploring data with Explorer
  Data preprocessing
  Association rule mining
  Classification
  Clustering


Weka
Waikato Environment for Knowledge Analysis, developed by the University of Waikato, New Zealand.
Open-source software that integrates machine learning and data mining algorithms.
Provides the basic functions of data mining:
  Data exploration
  Statistics and machine learning
  Experimentation and model validation
Used in research, teaching, and applications.

Development history

1993: Built the foundational architecture and interface
1994: Released the first version of Weka
1997: Began rebuilding Weka in Java
1999: Completed the 100% Java version of Weka
2005: SIGKDD Data Mining and Knowledge Discovery Service Award

Rank on SourceForge.net (2009/09/16): 245 (1,630,936 downloads)


Weka versions
Snapshot: the latest bug-fix builds, updated continuously.
Book: the version whose features are described in the book Data Mining: Practical Machine Learning Tools and Techniques by Ian Witten and Eibe Frank (3.4.x).
Developer: supports many new features but is not yet stable (3.7.x).
Stable: the most stable version (3.6.x).
Weka home page: http://www.cs.waikato.ac.nz/ml/weka/
Source code: https://svn.scms.waikato.ac.nz/svn/weka/

Subprograms

Explorer
Weka's main application. Data exploration, preprocessing, classification, clustering.

Simple CLI
Similar to Explorer, but with a command-line interface.

Experimenter
Runs experiments to validate and compare machine learning models across data sets.

KnowledgeFlow
Includes Explorer's functions and also lets you connect and combine algorithms through a visual graph interface.

ARFF format
1. Header section: contains the declarations
   @relation <dataset name>
   @attribute <attribute name 1> <data type>
   @attribute <attribute name 2> <data type>
   @attribute <attribute name n> <data type>
2. Data section: contains the data
   @data
   Instance 1
   Instance 2
   Instance n

ARFF format
Data types
Numeric: @ATTRIBUTE name numeric
Nominal: @ATTRIBUTE class {setosa, versicolor}
String: @ATTRIBUTE name string
Date: @ATTRIBUTE discovered date

Missing data: denoted by ?
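The header/data layout and the ? marker can be illustrated with a small writer. This is a minimal sketch in plain Python, not part of Weka: the function name to_arff and the demo data are hypothetical, and it assumes attribute names and values contain no spaces or commas (real ARFF requires quoting in those cases).

```python
def to_arff(relation, attributes, rows):
    """Render a data set as ARFF text.

    attributes: list of (name, type) pairs; type is a string such as
                "numeric" or a list of nominal values.
    rows:       list of value lists; None marks a missing value.
    Assumption: names and values need no quoting (no spaces or commas).
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            # Nominal attribute: list the allowed values in braces.
            kind = "{" + ", ".join(kind) + "}"
        lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    for row in rows:
        # Missing values are written as a question mark.
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical demo data, for illustration only.
text = to_arff(
    "demo",
    [("age", "numeric"), ("sex", ["female", "male"])],
    [[63, "male"], [None, "female"]],
)
print(text)
```

The second demo row has no age, so it is written as ?,female.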


ARFF format
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
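To make the file structure concrete, here is a minimal sketch of reading such a file in plain Python. It is not Weka's parser: parse_arff is a hypothetical helper that handles only numeric and nominal attributes, unquoted values, and % comment lines. The sample reuses two instances from the heart-disease example above.

```python
def parse_arff(text):
    """Parse a simple ARFF document into (relation, attributes, rows).

    Assumptions: only numeric and nominal attributes, no quoted values.
    Missing values (?) become None; numeric values become floats.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            row = []
            values = [v.strip() for v in line.split(",")]
            for (_, kind), value in zip(attributes, values):
                if value == "?":
                    row.append(None)
                elif kind == "numeric":
                    row.append(float(value))
                else:
                    row.append(value)
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            if kind.startswith("{"):
                # Nominal attribute: keep the list of allowed values.
                kind = [v.strip() for v in kind.strip("{}").split(",")]
            else:
                kind = kind.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, len(attributes), rows[1])
```

The missing cholesterol value in the second instance comes back as None, which is how this sketch represents the ? marker.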

Exploring data with Explorer

Preprocessing
Association rule mining
Classification
Clustering
Attribute selection
Data visualization

Version used in these slides: Weka 3.6.1

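Displaying information about the data

Data set:
  Data set name
  Number of instances
  Number of attributes
Attribute:
  Attribute name
  Data type
  Number of instances with missing values
  Number of distinct values
  Number of instances with a unique value
  Other information
Chart of the relationships between attributes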

Preprocessing
Preprocessing tasks in Weka are called filters (Filter).

Configuring a filter

Filters supported by Weka
Provides 68 filters.
Supports common operations such as resampling, discretization, normalization, missing-value replacement, attribute selection, attribute aggregation, ...
Instance filters: Resample, Randomize, Normalize, Remove Frequent Values, ...
Attribute filters: Discretize, Replace Missing Values, Add, Copy, Remove, Reorder, Add Values, Add Noise, ...
Each filter can be configured individually.
Several filters can be combined (Multi Filter).

Example: Discretizing data
Before discretization

Discretize with the Discretize filter, splitting into 5 bins by depth (equal frequency).
Result
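Equal-depth binning can be sketched in a few lines of plain Python. This is an illustration of the idea only, not Weka's Discretize implementation: equal_depth_bins and the age values are hypothetical, and this simple rank-based version may place tied values in different bins.

```python
from collections import Counter


def equal_depth_bins(values, n_bins):
    """Assign each value a bin index in 0..n_bins-1 so that the bins
    hold (nearly) the same number of values.

    Values are ranked from smallest to largest and the ranks are cut
    into n_bins consecutive groups. Ties are not kept together.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


# Hypothetical ages, for illustration only.
ages = [63, 67, 66, 38, 45, 52, 58, 41, 70, 49]
bins = equal_depth_bins(ages, 5)
print(bins)           # bin index for each age, in input order
print(Counter(bins))  # how many values fell into each bin
```

With 10 values and 5 bins, each bin receives exactly 2 values, which is the defining property of binning by depth (as opposed to binning by equal width).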


Association rule mining

Provides 6 association rule mining algorithms:
Apriori
Predictive Apriori
Filtered Associator
Generalized Sequential Patterns
Hotspot
Tertius

Classification
About 80 classification algorithms, divided into several groups:
Bayes: Naive Bayes, Bayes Net, ...
Classification functions: Gaussian, Linear Regression, SVM, ...
Trees: ID3, J48, NBTree, ...
Rule-based classification algorithms.
Lazy: K*, ...

Clustering
Provides 11 clustering algorithms, including:
DBScan
Simple K Means
X Means
Farthest First
EM

Example: Association rule mining with Weka

Apriori
FPGrowth
Filtered Associator
Generalized Sequential Patterns
Hot Spot
Predictive Apriori
Tertius

Apriori
Mines frequent itemsets and association rules.
Supports several interestingness measures.
All attributes must be discrete.

Apriori parameters

car: only the class attribute may appear on the right-hand side of a rule.
  Use false to mine all association rules.
classIndex: position of the class attribute in the data.
  Default: -1 (the last attribute).
lowerBoundMinSupport: the MinSup value for frequent itemsets.
outputItemSets: output the frequent itemsets.
metricType: the interestingness measure for association rules.
  Confidence (default)
  Lift
  Leverage
  Conviction
minMetric: the minimum acceptable interestingness value.
numRules: the number of association rules to mine.
  A large number is typically used to mine all rules.

significanceLevel: recomputes the 4 interestingness measures for the association rules after mining.
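The four metricType options can be computed from simple counts. The sketch below uses the usual textbook definitions for a rule X ==> Y; it is an illustration in plain Python, not Weka's code, and Weka's exact computation may differ in detail. The function rule_metrics and the counts in the usage lines are hypothetical.

```python
def rule_metrics(n, count_x, count_y, count_xy):
    """Interestingness measures for a rule X ==> Y from raw counts.

    n:        total number of instances
    count_x:  instances matching the left-hand side X
    count_y:  instances matching the right-hand side Y
    count_xy: instances matching both X and Y
    """
    p_x, p_y, p_xy = count_x / n, count_y / n, count_xy / n
    confidence = p_xy / p_x            # P(Y | X)
    lift = confidence / p_y            # P(Y | X) / P(Y)
    leverage = p_xy - p_x * p_y        # P(X, Y) - P(X) * P(Y)
    if count_xy == count_x:
        # The rule never fails, so conviction is unbounded.
        conviction = float("inf")
    else:
        conviction = (1 - p_y) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Illustrative counts (assumed): 14 instances, X matches 7, Y matches 9,
# and 6 instances match both.
metrics = rule_metrics(n=14, count_x=7, count_y=9, count_xy=6)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A lift above 1 and a positive leverage both indicate that X and Y occur together more often than they would if they were independent.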


Example
Data: weather.nominal
numRules: 10, minSup: 0.1, minConf: 0.8
Results:
 1. outlook=overcast 4 ==> play=yes 4 conf:(1)
 2. temperature=cool 4 ==> humidity=normal 4 conf:(1)
 3. humidity=normal windy=FALSE 4 ==> play=yes 4 conf:(1)
 4. outlook=sunny play=no 3 ==> humidity=high 3 conf:(1)
 5. outlook=sunny humidity=high 3 ==> play=no 3 conf:(1)
 6. outlook=rainy play=yes 3 ==> windy=FALSE 3 conf:(1)
 7. outlook=rainy windy=FALSE 3 ==> play=yes 3 conf:(1)
 8. temperature=cool play=yes 3 ==> humidity=normal 3 conf:(1)
 9. humidity=normal 7 ==> play=yes 6 conf:(0.86)
10. play=no 5 ==> humidity=high 4 conf:(0.8)
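The numbers in each rule can be checked by counting. In the output above, the number after the left-hand side is how many instances match it, the number after the right-hand side is how many of those also match the right-hand side, and conf is their ratio. The sketch below recounts a few rules in plain Python; it is not Weka code. The 14 rows are an assumption: they are the standard weather.nominal data set distributed with Weka, reproduced here from memory, and the helper names are hypothetical.

```python
ATTRS = ("outlook", "temperature", "humidity", "windy", "play")

# Assumed contents of weather.nominal (14 instances).
WEATHER = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def count(rows, conditions):
    """Number of rows matching every attribute=value pair in conditions."""
    return sum(
        all(row[ATTRS.index(attr)] == value for attr, value in conditions.items())
        for row in rows
    )


def check_rule(rows, antecedent, consequent):
    """Return (count of X, count of X and Y, confidence) for X ==> Y."""
    count_x = count(rows, antecedent)
    count_xy = count(rows, {**antecedent, **consequent})
    return count_x, count_xy, count_xy / count_x


rule_1 = check_rule(WEATHER, {"outlook": "overcast"}, {"play": "yes"})
rule_9 = check_rule(WEATHER, {"humidity": "normal"}, {"play": "yes"})
rule_10 = check_rule(WEATHER, {"play": "no"}, {"humidity": "high"})
for label, (cx, cxy, conf) in (("1", rule_1), ("9", rule_9), ("10", rule_10)):
    print(f"rule {label}: {cx} ==> {cxy} conf:({conf:.2f})")
```

Under that assumption about the data, the recount agrees with the listed output for rules 1, 9, and 10.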
