
University of Science, Faculty of Information Technology

INTRODUCTION TO THE WEKA SOFTWARE

Lab instructors: Nguyễn Ngọc Thảo, Nguyễn Hải Minh

OUTLINE
Introduction to Weka
Software features
Exploring the Explorer application

DEVELOPMENT HISTORY

WEKA: Waikato Environment for Knowledge Analysis.
A data mining software package developed as a research project at the University of Waikato, New Zealand.
Goal: build a modern tool for developing machine learning techniques and applying them to real-world data mining problems.

DEVELOPMENT HISTORY

1993: The University of Waikato, New Zealand, starts the project and builds the first version of Weka.
1997: The decision is made to rebuild Weka from scratch in Java, including implementations of modeling algorithms.
2005: Weka receives the SIGKDD Data Mining and Knowledge Discovery Service Award.
Sourceforge.net rank as of 25-06-2007: 241 (907,318 downloads).

SOFTWARE STRUCTURE
WEKA is written in Java and consists of more than 600 classes organized into 10 packages.

Main functions of the software:
Data exploration: data preprocessing, classification, clustering, and association rule mining.
Model experimentation: provides facilities for validating and evaluating learning models.
Data visualization: presents data visually using many different chart types.

WEKA HOME PAGE
Home page: http://www.cs.waikato.ac.nz/ml/weka/

Weka machine learning project
  Development history
  Team members
  Publications

Weka software
  Installable versions
  User guides
  Datasets

Reference materials

WEKA VERSIONS
Snapshots: the latest bug-fix builds, usually updated nightly.
Book versions: releases whose features match those described in the book Data Mining: Practical Machine Learning Tools and Techniques (2nd Edition) by Ian H. Witten and Eibe Frank.
Developer versions: experimental releases that offer many new features but are not yet stable.

DATA EXPLORATION
Explorer: a sub-application for experimenting with common data mining tasks such as:
  Data preprocessing
  Association rule mining
  Classification
  Clustering

MODEL EXPERIMENTATION

Experimenter: a sub-application that provides an experimental environment for validating learning models and comparing them against one another for evaluation.

DATA CONNECTIVITY
ArffViewer: a sub-application that displays the contents of a *.ARFF dataset as a data table.
SqlViewer: connects to a database (MySQL, PostgreSQL) and runs queries to retrieve information.

VISUALIZATION

Weka lets users visualize data through common chart types: axis-based plots, trees, graphs, and area charts.



EXPLORER FEATURES
Association rule mining
Data preprocessing
Attribute selection
Classification
Clustering

DATA PREPROCESSING
Displays information about the dataset under consideration:
  Dataset: name, number of samples, number of attributes.
  Attributes: name, data type, attribute values, percentages...
  Charts illustrating this information.

Provides common data filters, for example:
  ReplaceMissingValues: replaces missing values.
  Normalize: scales data to the interval [0, 1].
  Discretize: discretizes data.
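
To make the three filters above concrete, here is a minimal Python sketch of the ideas behind them. It is not Weka's code: the function names are hypothetical, missing values are filled with the column mean, and discretization uses equal-width bins, both of which are assumptions made for illustration. The humidity column reuses the values 85, 90, 86 from the weather example, with one missing entry added.

```python
from statistics import mean

def replace_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def normalize(values):
    """Min-max scale values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def discretize(values, bins=3):
    """Equal-width binning: return a bin index in 0..bins-1 per value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]

# Illustrative column: None stands for a missing value ("?" in ARFF).
humidity = [85, 90, None, 86]
filled = replace_missing(humidity)
scaled = normalize(filled)
binned = discretize(filled, bins=3)
```

Applying the filters in this order (fill, then scale or bin) avoids computing minima and maxima over missing entries.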


ASSOCIATION RULE MINING

Provides association rule mining algorithms:
  Apriori
  PredictiveApriori: an improved variant of the Apriori algorithm.
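
As a rough illustration of what Apriori computes, the following Python sketch finds frequent itemsets level by level and then derives the confidence of a rule. It is a textbook-style sketch, not Weka's implementation; the function names and the shopping-basket data are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if support(c) >= min_support}
    frequent = {}
    k = 1
    while current:
        frequent.update({c: support(c) for c in current})
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

# Hypothetical transactions, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
freq = apriori(baskets, min_support=0.5)
rule_conf = confidence(freq, {"butter"}, {"bread"})
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is discarded without counting as soon as one of its subsets is known to be infrequent.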


CLASSIFICATION
Provides a large number of classification algorithms, grouped by theoretical basis or function.
  Bayes: Bayesian networks, Naive Bayes...
  Functions: SVM, regression methods, linear models.
  Trees: ID3, C4.5 (J48).
  Rules: rule-based classification methods.
  Meta: Bagging, AdaBoost.


CLUSTERING
Provides popular clustering algorithms, for example:
  DBSCAN
  EM (Expectation Maximization)
  K-Means
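
Of the algorithms listed, K-Means is the simplest to show in a few lines. The sketch below is a plain Python version of the standard assign-and-update loop (Lloyd's algorithm); it is not Weka's implementation, which may differ in initialization and distance handling. The function name, parameters, and sample points are hypothetical.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Cluster equal-length numeric tuples; return (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
```

The number of clusters k must be chosen in advance, which is the main practical difference from DBSCAN, where the number of clusters follows from the density parameters.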


ARFF FILE STRUCTURE

ARFF is Weka's own data format; it organizes data according to a predefined structure.
An *.ARFF file consists of two parts:
  Header: contains the relation declaration and the list of attributes (name, data type).
  Data: consists of many lines, each giving the attribute values for one sample.

ARFF FILE STRUCTURE

% This is a relation about weather
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes

Annotations: the line starting with % is a comment; @relation gives the relation name; each @attribute line gives an attribute name followed by its data type; each line after @data is one sample.
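
To show how the header and data parts fit together, here is a small Python sketch that reads the weather example into plain Python structures. It is an illustrative reader for the subset of ARFF shown on these slides, not a complete parser and not part of Weka; the function name `parse_arff` is hypothetical, all values are kept as strings, and attribute names containing spaces are not supported.

```python
import csv

def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            # One sample per line; single quotes protect embedded spaces.
            fields = next(csv.reader([line], quotechar="'", skipinitialspace=True))
            rows.append([None if f.strip() == "?" else f.strip() for f in fields])
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()
        if keyword == "@relation":
            relation = rest.strip()
        elif keyword == "@attribute":
            name, _, spec = rest.strip().partition(" ")
            spec = spec.strip()
            if spec.startswith("{"):
                # Nominal attribute: keep the list of allowed values.
                spec = [v.strip() for v in spec.strip("{}").split(",")]
            attributes.append((name, spec))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows

WEATHER = """% This is a relation about weather
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
"""

relation, attributes, rows = parse_arff(WEATHER)
```

Each row comes back as a list aligned with the attribute list, so `rows[i][j]` is the value of attribute `attributes[j]` for sample `i`, with `None` standing for a missing value.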


ARFF FILE STRUCTURE

Data types supported in ARFF:
  numeric: numbers, covering both real and integer.
  nominal: a value chosen from a declared list of values.
  string: text strings.
  date: date and time values (day, month, year; hour, minute, second).

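ARFF FILE STRUCTURE

% Employee relation: hoten = full name, ngaysinh = date of birth,
% gioitinh = gender (nam = male, nu = female), hesoluong = salary coefficient
@relation nhanvien
@attribute hoten string
@attribute ngaysinh date "dd/MM/yyyy"
@attribute gioitinh {nam, nu}
@attribute hesoluong real
@data
'Nguyen Van A', 10/12/1957, nam, 1.34
'Tran Thi B', ?, nu, 1.5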

ARFF FILE STRUCTURE

Comment lines begin with %.
Missing values are represented by ?.
Strings that contain spaces must be enclosed in single quotes.
Values in the data section must strictly match the declarations in the header.
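
The last rule, that data values must match the header, can be checked mechanically. The Python sketch below validates one data row against a list of attribute declarations, using the employee example as input. It is a hypothetical helper, not a Weka feature: the name `check_row`, the tuple representation of attributes, and the third test row are assumptions, and the date pattern is written in Python's strptime syntax rather than the Java syntax used inside ARFF files.

```python
from datetime import datetime

def check_row(attributes, row):
    """Return a list of problems found in one data row (empty if none)."""
    if len(row) != len(attributes):
        return [f"expected {len(attributes)} values, got {len(row)}"]
    problems = []
    for (name, spec), value in zip(attributes, row):
        if value is None:
            continue  # "?" in the file: a missing value is always allowed
        if isinstance(spec, list):
            # Nominal: the value must be one of the declared values.
            if value not in spec:
                problems.append(f"{name}: {value!r} not in {spec}")
        elif spec in ("numeric", "real", "integer"):
            try:
                float(value)
            except ValueError:
                problems.append(f"{name}: {value!r} is not a number")
        elif isinstance(spec, tuple) and spec[0] == "date":
            try:
                datetime.strptime(value, spec[1])
            except ValueError:
                problems.append(f"{name}: {value!r} does not match {spec[1]}")
        # string attributes accept any text
    return problems

# Header of the employee example, with the date pattern in strptime syntax.
attributes = [
    ("hoten", "string"),
    ("ngaysinh", ("date", "%d/%m/%Y")),
    ("gioitinh", ["nam", "nu"]),
    ("hesoluong", "real"),
]
ok = check_row(attributes, ["Nguyen Van A", "10/12/1957", "nam", "1.34"])
missing = check_row(attributes, ["Tran Thi B", None, "nu", "1.5"])
# Hypothetical invalid row: wrong date layout, undeclared nominal, non-number.
bad = check_row(attributes, ["Le Van C", "1957-12-10", "male", "abc"])
```

Both rows from the slide pass with no problems reported, while the invalid row yields one message per offending attribute.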


THANK YOU FOR YOUR ATTENTION.
