
Vietnam National University, Ho Chi Minh City

University of Science

Faculty of Information Technology
Department of Computer Science

Data Mining and Applications

Reference material

USER GUIDE
WEKA EXPLORER 3.6.3

August 2011

Weka Explorer 3.6.3

CTT305 Data Mining & Applications

TABLE OF CONTENTS
1. Introduction
1.1. Features of Weka Explorer
1.2. Exploring the data
2. Data preprocessing
3. Frequent itemsets & association rules
4. Classification
5. Clustering
6. Some file formats


1. Introduction
1.1. Features of Weka Explorer

The main features of Weka Explorer are presented as tabs on the main screen:
Preprocess: Opens, edits, and saves a data file. This tab also contains the algorithms used for data preprocessing.
Classify: Provides models for classification or regression.
Cluster: Provides clustering models.
Associate: Mines frequent itemsets and association rules.
Select attributes: Selects the most relevant attributes in the dataset.
Visualize: Displays the data as charts.

1.2. Exploring the data

Use the Preprocess tab.
(1) Open file: Opens a data file.
(2) Edit: Displays the data and lets you edit it by hand if necessary.
(3) Save: Saves the current data to a file.
Weka Explorer supports several file formats; the two main ones to know are *.arff and *.csv (see Section 6).
(4) Filter: Preprocessing tasks are called filters (see Section 2).

Department of Computer Science | Faculty of Information Technology | University of Science, Ho Chi Minh City


(5) Selected attribute: Information about the currently selected attribute:
  o Type: The attribute's data type (Numeric: numeric; Nominal: discrete/non-numeric).
  o Missing: The number of instances with no value for this attribute.
  o Distinct: The number of distinct values.
  o Unique: The number of instances whose value is not shared with any other instance.
  o Statistics table:
    Nominal attributes: lists each value and its frequency.
    Numeric attributes: shows summary statistics such as the minimum, maximum, mean, and standard deviation.
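To make the counters in the Selected attribute panel concrete, here is a minimal Python sketch that computes the same kinds of figures for one column. The function name `summarize_attribute` and the choice of the sample standard deviation are assumptions made for illustration; this is not Weka's own code.

```python
from collections import Counter
from statistics import mean, stdev


def summarize_attribute(values, missing="?"):
    """Summarise one attribute column; `missing` marks absent values."""
    present = [v for v in values if v != missing]
    counts = Counter(present)
    summary = {
        "missing": len(values) - len(present),
        "distinct": len(counts),
        # "Unique" counts instances whose value occurs exactly once.
        "unique": sum(1 for c in counts.values() if c == 1),
    }
    if present and all(isinstance(v, (int, float)) for v in present):
        summary.update(
            minimum=min(present),
            maximum=max(present),
            mean=mean(present),
            # Sample standard deviation (assumed); needs two or more values.
            std_dev=stdev(present) if len(present) > 1 else 0.0,
        )
    else:
        # Nominal column: report each value with its frequency.
        summary["frequencies"] = dict(counts)
    return summary
```

For a nominal column such as `["sunny", "rainy", "sunny", "?", "overcast"]` the sketch reports one missing value, three distinct values, and two unique values.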


2. Data preprocessing
Choose: Selects a filter.

Textbox: Shows the parameters of the selected filter; click it to change them.

  o Filters that can operate on individual attributes usually let you restrict their scope to the attributes you are interested in.
  o More: Shows detailed information about the filter.
  o Capabilities: The requirements the data must meet for the filter to run.
Apply: Runs the filter with the specified parameters on the current data.


Example: unsupervised.attribute.Discretize
  o The parameter dialog for this binning method includes parameters such as the number of bins (bins) and the choice between equal-width and equal-depth binning (useEqualFrequency).
[Figure: Discretize parameter dialog]
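The difference between the two binning modes can be sketched in a few lines of Python. The function names are hypothetical, and the tie handling is a simplification (equal values may land in different bins in the equal-frequency version), so this illustrates the idea rather than reproducing the filter.

```python
def equal_width_bins(values, bins):
    """Assign each value to one of `bins` intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value is clamped into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def equal_frequency_bins(values, bins):
    """Assign values to bins that hold roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        # The rank in sorted order decides the bin, not the value itself.
        labels[i] = rank * bins // len(values)
    return labels
```

With skewed data such as `[1, 2, 3, 4, 100, 200]` and two bins, equal-width binning puts five values in the first bin, while equal-frequency binning puts three values in each.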

Example: unsupervised.attribute.Normalize: Min-max normalization, with parameters for the size of the output range (scale) and its minimum value (translation).
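As a minimal sketch of min-max normalization with these two parameters, the following Python function maps a column onto the interval [translation, translation + scale]. The function name and the treatment of a constant column are assumptions for illustration.

```python
def min_max_normalize(values, scale=1.0, translation=0.0):
    """Rescale values linearly onto [translation, translation + scale]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: no spread to rescale (handling assumed here).
        return [translation for _ in values]
    return [(v - lo) / (hi - lo) * scale + translation for v in values]
```

With the defaults, `[10, 20, 30]` becomes `[0.0, 0.5, 1.0]`; with `scale=2` and `translation=-1` the same column is mapped onto [-1, 1].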


3. Frequent itemsets & association rules

Use the Associate tab.

Associator: The association rule mining method.

  o Choose: Selects a method.
  o Textbox: Changes the parameters of the selected method.

Example: Apriori: Mines frequent itemsets and association rules.

  o [lowerBoundMinSupport, upperBoundMinSupport]: The support of the mined itemsets lies within this range.
  o metricType: The measure of rule interestingness: Confidence, Lift, Leverage, or Conviction.
  o minMetric: Mined rules must score at least this value on the chosen measure.


  o numRules and delta: The algorithm starts with the minimum support set to upperBoundMinSupport. Once numRules rules have been found, it stops; otherwise it lowers the minimum support by delta (down to lowerBoundMinSupport at most) and searches again.
  o outputItemSets: Includes the frequent itemsets in the output.

Results:
  o Frequent itemsets: The list of itemsets and their support.

  o Association rules: The rules and their interestingness measure.
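The four interestingness measures offered by metricType can be illustrated with a short Python sketch using their textbook definitions. The function names and the toy transactions in the usage note are assumptions, and the sketch assumes both the antecedent and the consequent occur in at least one transaction.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Interestingness measures for the rule antecedent -> consequent."""
    s_a = support(transactions, antecedent)
    s_b = support(transactions, consequent)
    s_ab = support(transactions, set(antecedent) | set(consequent))
    confidence = s_ab / s_a
    return {
        "support": s_ab,
        "confidence": confidence,
        # Lift above 1 means the two sides co-occur more than by chance.
        "lift": confidence / s_b,
        "leverage": s_ab - s_a * s_b,
        # Conviction is unbounded when the rule never fails.
        "conviction": float("inf") if confidence == 1
        else (1 - s_b) / (1 - confidence),
    }
```

For example, over the four transactions `{bread, milk}`, `{bread, butter}`, `{bread, milk, butter}`, `{milk}`, the rule bread -> milk has support 0.5 and confidence 2/3, and its lift is below 1.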


Example: FP-Growth: Mines association rules.

Besides parameters like those of Apriori, FP-Growth in Weka offers a few additional options:
  o findAllRulesForSupportLevel: Mines all rules that satisfy the chosen measure.
  o maxNumberOfItems: The maximum number of items in a mined itemset.
  o rulesMustContain and transactionsMustContain: Restrict mining to the items of interest.


4. Classification
Use the Classify tab.
(1): Classifier: Selects the classifier and its parameters.
(2): Test Options: Options for testing the model:
  o Use training set: Evaluates the model on the training data itself.
  o Supplied test set: Evaluates the model on a separate dataset.
  o Cross-validation: Splits the data into several parts (Folds) and evaluates the result over several runs.
  o Percentage split: Splits the data into two parts by percentage; one part is used to build the model and the rest is used for testing.
  o More Options: Adjusts some further settings:
    - Output predictions: Outputs the detailed classification result for each instance in the test data.
    - Preserve order for % Split: Assigns instances to the training and test sets without random selection; the order of the current data is kept.
    - Other settings that control which information is written to the output.

(3): Result list: The list of results from each run of the algorithm. You can interact with this list to perform some additional functions:
    - Load model, Save model: Loads a classification model from a file or saves it to a file.
    - Visualize tree: Some classifiers based on decision trees can display the tree as a picture.
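The Percentage split and Cross-validation options can be sketched in Python as follows. The function names, the seed, and the interleaved fold assignment are assumptions for illustration; in particular, the sketch does no stratification by class, so it should be read as the general idea rather than Weka's exact procedure.

```python
import random


def percentage_split(instances, train_percent, preserve_order=False, seed=1):
    """Split instances into (train, test) by a percentage."""
    data = list(instances)
    if not preserve_order:
        # Shuffle unless the original order must be kept.
        random.Random(seed).shuffle(data)
    cut = round(len(data) * train_percent / 100)
    return data[:cut], data[cut:]


def cross_validation_folds(instances, folds):
    """Yield (train, test) pairs; each instance is tested exactly once."""
    data = list(instances)
    for k in range(folds):
        test = data[k::folds]
        train = [x for i, x in enumerate(data) if i % folds != k]
        yield train, test
```

With ten instances and five folds, each round trains on eight instances and tests on the remaining two, and the five test sets together cover the whole dataset.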


(4): Classifier output:

The results are listed as text in the following distinct sections:
  o Run information:
    General information about the algorithm used and the dataset.
  o Classifier model


    Details of the classification model. For some classifiers, however, the model cannot be shown in full as text.
  o Summary
    Overall information about the accuracy of the classifier in the experiment that has just been run.

  o Detailed Accuracy By Class and Confusion Matrix
    Detailed accuracy of the classifier for each class.
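To show how per-class figures follow from a confusion matrix, here is a small Python sketch. It assumes rows are actual classes and columns are predicted classes, and reports only precision and recall; the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def per_class_accuracy(matrix, labels):
    """Precision and recall for each class, read off the matrix."""
    report = {}
    for i, label in enumerate(labels):
        true_positive = matrix[i][i]
        actual_total = sum(matrix[i])                     # row sum
        predicted_total = sum(row[i] for row in matrix)   # column sum
        report[label] = {
            "recall": true_positive / actual_total if actual_total else 0.0,
            "precision": true_positive / predicted_total
            if predicted_total else 0.0,
        }
    return report
```

The diagonal of the matrix holds the correctly classified instances, so overall accuracy is the diagonal sum divided by the number of instances.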

5. Clustering
Use the Cluster tab.
(1): Clusterer: Selects the clustering model and its parameters.
(2): Cluster mode: Options for evaluating the model:
  o Use training set: Evaluates the model on the training data itself.
  o Supplied test set: Evaluates the model on a separate dataset.
  o Percentage split: Splits the data into two parts by percentage; one part is used to build the model and the rest is used for testing.
  o Classes to clusters evaluation:
    Clusters the whole dataset and evaluates the result using the assignment that gives the lowest error. With this option, external evaluation measures can be applied to assess clustering quality.
Ignore attributes: Ignores the specified attributes during clustering.
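The idea behind Classes to clusters evaluation can be sketched as a search for the cluster-to-class assignment with the fewest errors. The Python below is a brute-force illustration under the assumption of a one-to-one assignment; the function name is hypothetical, and the exhaustive search is only practical for a handful of clusters.

```python
from itertools import permutations


def classes_to_clusters_errors(cluster_ids, class_labels):
    """Return (errors, mapping) for the lowest-error cluster-to-class map."""
    clusters = sorted(set(cluster_ids))
    classes = sorted(set(class_labels))
    # Pad with None so that surplus clusters can stay unmapped.
    candidates = classes + [None] * max(0, len(clusters) - len(classes))
    best_errors, best_mapping = None, None
    for assigned in permutations(candidates, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        # An instance is an error if its cluster's class is not its own.
        errors = sum(
            1 for c, y in zip(cluster_ids, class_labels) if mapping[c] != y
        )
        if best_errors is None or errors < best_errors:
            best_errors, best_mapping = errors, mapping
    return best_errors, best_mapping
```

The error count returned here corresponds to the number of instances that do not match the class assigned to their cluster.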

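(3): Clusterer output: Contains the clustering results.

  o Model information: What is shown depends on the clusterer used. For example, for the Farthest First algorithm only the cluster centroids are shown, whereas for the HAC algorithm it is the list of clusters at each iteration. The output of the KMeans algorithm also includes the SSE (sum of squared errors).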


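  o Clustering results: Shows the number of instances that were clustered and the number that were not. With the Classes to clusters evaluation option, the output also reports the number of incorrectly clustered instances.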
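The SSE figure mentioned for KMeans is the sum of squared distances from each instance to the centroid of its cluster. A minimal Python sketch, with a hypothetical function name, is given below; a tool may compute the figure on internally rescaled data, so the numbers from this sketch need not match reported values.

```python
def sum_of_squared_errors(points, assignments, centroids):
    """Sum of squared Euclidean distances from points to their centroids."""
    total = 0.0
    for point, cluster in zip(points, assignments):
        centroid = centroids[cluster]
        total += sum((x - c) ** 2 for x, c in zip(point, centroid))
    return total
```

A lower SSE means the instances lie closer to their centroids, which is why the value is useful for comparing runs with the same number of clusters.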

6. Some file formats
Attribute-Relation File Format (*.arff)
  o A text file consisting of two parts:

    Header section


    Data section

  o Header section:

    @relation <dataset name>
    @attribute <attribute 1 name> <data type>
    @attribute <attribute 2 name> <data type>
    ...
    @attribute <attribute n name> <data type>

  o Data types
    Numeric
      Numeric data
      Example: @ATTRIBUTE name numeric
    Nominal
      Discrete data
      Example: @ATTRIBUTE class {setosa, versicolor}
    String
      String data
      Example: @ATTRIBUTE name string
    Date
      Date data
      Example: @ATTRIBUTE discovered date
    Missing data is denoted by a question mark (?).
  o Data section:
    The data section begins with an @data line. Each instance is placed on one line, with its attribute values listed in order from left to right and separated by commas (,).
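To illustrate how the header and data sections fit together, here is a minimal Python reader for simple ARFF text. It is a sketch under several assumptions: the function name `parse_arff` and the sample content are made up for illustration, values are kept as strings, and quoted names or values containing spaces or commas are not handled.

```python
SAMPLE_ARFF = """
@relation iris_sample
@attribute petal_length numeric
@attribute class {setosa, versicolor}
@data
1.4,setosa
?,versicolor
"""


def parse_arff(text):
    """Return (relation, attributes, rows) from simple ARFF text."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lowered = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            # A question mark denotes a missing value.
            rows.append([None if v == "?" else v for v in values])
        elif lowered.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lowered.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif lowered.startswith("@data"):
            in_data = True
    return relation, attributes, rows
```

Calling `parse_arff(SAMPLE_ARFF)` yields the relation name, two attribute declarations, and two rows, the second of which has a missing first value.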
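Comma Separated Values (*.csv)
  o A text file.
  o Its structure is similar to the data section of an arff file: each instance is stored on one line, and attribute values are separated by commas.
  o The first line contains the attribute names.
Example:
A csv file with the following content:
[Figure: contents of the example csv file]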


This means the data consists of 14 instances and 5 attributes (outlook, temperature, humidity, windy, play).
The same file displayed in ArffViewer:
[Figure: the file opened in ArffViewer]
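Reading a csv file with this layout (attribute names on the first line, one instance per following line) can be sketched with Python's standard `csv` module. The function name and the two sample rows are made up for illustration; the rows only reuse the attribute names mentioned above and are not the contents of the original example file.

```python
import csv
import io

# Illustrative content only: a header line plus two made-up instances.
SAMPLE_CSV = """outlook,temperature,humidity,windy,play
sunny,hot,high,FALSE,no
overcast,hot,high,FALSE,yes
"""


def read_csv_table(text):
    """Return (attribute_names, rows) from csv text with a header line."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)                    # first line: attribute names
    rows = [row for row in reader if row]    # skip blank lines
    return header, rows
```

Counting the returned rows and header fields gives the number of instances and attributes, which is how figures such as "14 instances and 5 attributes" can be checked for a full file.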
