
1) The ID3 algorithm

The ID3 algorithm was proposed by Quinlan (University of Sydney, Australia) and published in the late 1970s. It was later presented in the paper "Induction of Decision Trees", Machine Learning, 1986. ID3 is regarded as an improvement on CLS: at each step it is able to select the best attribute with which to continue expanding the tree. ID3 builds the decision tree top-down [5].
1.1. Entropy: a measure of the homogeneity of a data set. The entropy of a set S is computed by formula (2.1):

$$\mathrm{Entropy}(S) = -p_{+}\log_2(p_{+}) - p_{-}\log_2(p_{-}) \tag{2.1}$$

This formula applies when the data samples fall into two classes, "yes" (+) and "no" (-). Here p+ denotes the proportion of samples in S whose decision attribute has the value "yes", and p- the proportion whose decision attribute has the value "no". In the general case, for a set S with n classes, we have the following formula:
$$\mathrm{Entropy}(S) = \sum_{i=1}^{n} \left(-p_i \log_2(p_i)\right) \tag{2.2}$$

where p_i is the proportion of samples belonging to class i in the set S of examples.


Special cases (stated here for two classes):
- If all samples in S belong to the same class, Entropy(S) = 0.
- If the samples in S are evenly distributed between the classes, Entropy(S) = 1.
- In all other cases, 0 < Entropy(S) < 1.
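The special cases above can be checked numerically. The following is a minimal Python sketch of formula (2.2); the function name `entropy` and the list-of-labels input are assumptions made for illustration, not part of the original text.

```python
import math
from collections import Counter

def entropy(labels):
    """Base-2 entropy of a sequence of class labels, as in formula (2.2)."""
    total = len(labels)
    counts = Counter(labels)
    # Classes that do not occur never appear in the Counter,
    # so the 0 * log2(0) term is skipped (treated as 0).
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

pure = entropy(["yes"] * 4)                  # all samples in one class
balanced = entropy(["yes", "no"] * 3)        # two classes, evenly split
mixed = entropy(["yes"] * 9 + ["no"] * 5)    # a [9+, 5-] set

print(pure, balanced, round(mixed, 3))
```

With more than two classes the same function applies unchanged; the upper bound then becomes log2(n) rather than 1.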

1.2.) Information Gain (abbreviated Gain): Gain is a quantity that measures how effective an attribute is when it is selected for classification. It is computed from two values, Information and Entropy.
- Given a data set S with n attributes A_i (i = 1, 2, ..., n), the Information value of attribute A_i, denoted Information(A_i), is defined by the formula:

$$\mathrm{Information}(A_i) = -\sum_{i=1}^{n} p_i \log_2(p_i) = \mathrm{Entropy}(S) \tag{2.3}$$

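The Gain value of attribute A on the set S, denoted Gain(S, A), is computed by the following formula:

$$\mathrm{Gain}(S, A) = \mathrm{Information}(A) - \mathrm{Entropy}(A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v) \tag{2.4}$$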

where:
S is the initial set with attribute A, and v ranges over the values of attribute A.
S_v is the subset of S in which attribute A has the value v.
|S_v| is the number of elements of S_v.
|S| is the number of elements of S.
When building a decision tree with ID3, at each step of expanding the tree the attribute chosen for expansion is the one with the largest Gain value.
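Formula (2.4) can be sketched in a few lines of Python. The representation of examples as dictionaries, the names `information_gain`, `rows`, `attribute`, `target`, and the four-row toy data set are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Base-2 entropy of a sequence of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Gain(S, A) = Entropy(S) - sum over v of |S_v|/|S| * Entropy(S_v)."""
    partitions = defaultdict(list)            # value v -> class labels of S_v
    for row in rows:
        partitions[row[attribute]].append(row[target])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy([row[target] for row in rows]) - remainder

# Hypothetical toy data: wind alone determines the class.
toy = [
    {"wind": "weak", "play": "yes"},
    {"wind": "weak", "play": "yes"},
    {"wind": "strong", "play": "no"},
    {"wind": "strong", "play": "no"},
]
gain_wind = information_gain(toy, "wind", "play")
print(gain_wind)
```

In this toy set each partition is pure, so the weighted entropy of the partitions is zero and the gain equals Entropy(S).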
The decision-tree construction function in the ID3 algorithm [2]
Function induce_tree(example_set, attribute_set)
begin
    if all examples in example_set belong to the same class then
        return a leaf node labeled with that class
    else if attribute_set is empty then
        return a leaf node labeled with the disjunction of all classes in example_set
    else begin
        select an attribute P and make it the root of the current tree;
        remove P from attribute_set;
        for each value V of P
        begin
            create a branch of the tree labeled V;
            put into partition_V the examples in example_set that have value V for attribute P;
            call induce_tree(partition_V, attribute_set) and attach the result to branch V
        end
    end
end
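The pseudocode above can be turned into a short runnable sketch. This Python version makes several assumptions that are not in the original: examples are dictionaries, the tree is a nested dictionary, "select an attribute P" is implemented as "highest Gain", and a leaf reached with no attributes left is labeled by majority vote rather than by the disjunction of classes. The toy data set is hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain(rows, attr, target):
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder

def induce_tree(rows, attributes, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]                       # leaf: every example has this class
    if not attributes:
        # Assumption: majority class instead of the disjunction of classes.
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(rows, a, target))
    remaining = [a for a in attributes if a != best]   # remove P from the set
    tree = {best: {}}
    for value in sorted({r[best] for r in rows}):      # one branch per value V
        partition = [r for r in rows if r[best] == value]
        tree[best][value] = induce_tree(partition, remaining, target)
    return tree

# Hypothetical toy data, not the tennis table.
toy = [
    {"outlook": "sunny", "wind": "weak", "play": "no"},
    {"outlook": "sunny", "wind": "strong", "play": "no"},
    {"outlook": "rain", "wind": "weak", "play": "yes"},
    {"outlook": "rain", "wind": "strong", "play": "no"},
    {"outlook": "overcast", "wind": "weak", "play": "yes"},
    {"outlook": "overcast", "wind": "strong", "play": "yes"},
]
toy_tree = induce_tree(toy, ["outlook", "wind"], "play")
print(toy_tree)
```

Branches are created only for attribute values that actually occur in the current partition, so this sketch never produces an empty partition.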

Illustrative example

Consider the problem of deciding whether or not we play tennis under given weather conditions. The ID3 algorithm learns a decision tree from the following set of examples:
Day   Outlook    Temperature   Humidity   Wind     Play Tennis
D1    Sunny      Hot           High       Weak     No
D2    Sunny      Hot           High       Strong   No
D3    Overcast   Hot           High       Weak     Yes
D4    Rain       Mild          High       Weak     Yes
D5    Rain       Cool          Normal     Weak     Yes
D6    Rain       Cool          Normal     Strong   No
D7    Overcast   Cool          Normal     Strong   Yes
D8    Sunny      Mild          High       Weak     No
D9    Sunny      Cool          Normal     Weak     Yes
D10   Rain       Mild          Normal     Weak     Yes
D11   Sunny      Mild          Normal     Strong   Yes
D12   Overcast   Mild          High       Strong   Yes
D13   Overcast   Hot           Normal     Weak     Yes
D14   Rain       Mild          High       Strong   No

Table 2.1. Example data set for playing tennis


This data set contains 14 examples. Each example describes a weather condition through the attributes outlook, temperature, humidity and wind, and each has a classification attribute Play Tennis (yes, no). "No" means we do not play tennis in that weather; "Yes" means we do. The classification takes only two values (yes, no); in other words, the examples of this concept are divided into two classes. The attribute Play Tennis is also called the target attribute.

Each attribute has a finite set of values. Outlook has three values: overcast, rain, sunny; temperature has three values: hot, cool, mild; humidity has two values: high, normal; and wind has two values: strong, weak. These values are the symbols used to represent the problem.
From this training data set, ID3 learns a decision tree that classifies the examples in the set correctly and that will, we hope, also classify correctly future examples that are not in the set. One decision tree that ID3 can induce is:

Figure 2.2. Decision tree produced by the ID3 algorithm


Each internal node of the decision tree represents a test on some attribute, and each possible value of that attribute corresponds to one branch of the tree. The leaf nodes give the classification of the examples on that branch, that is, the value of the classification attribute.

Once the algorithm has induced the decision tree, the tree is used to classify all future examples, or instances. The tree does not change until ID3 is run again on a different training data set.
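Classifying an instance with an induced tree amounts to following one branch per test until a leaf is reached. The sketch below assumes a nested-dictionary encoding of the tennis tree, with English attribute names; the encoding, the function name `classify` and the instance `unseen` are assumptions made for illustration.

```python
# Assumed encoding: {attribute: {value: subtree_or_class_label}}
tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Weak": "Yes", "Strong": "No"}},
    }
}

def classify(node, instance):
    """Walk from the root to a leaf, following the branch for each tested attribute."""
    while isinstance(node, dict):
        attribute = next(iter(node))           # the attribute tested at this node
        node = node[attribute][instance[attribute]]
    return node                                # a leaf holds the class label

# D1 from the training table, and a combination that does not occur in it.
d1 = {"Outlook": "Sunny", "Temperature": "Hot", "Humidity": "High", "Wind": "Weak"}
unseen = {"Outlook": "Rain", "Temperature": "Hot", "Humidity": "Normal", "Wind": "Weak"}

print(classify(tree, d1), classify(tree, unseen))
```

An attribute value that has no branch in the tree would raise a `KeyError` here; how to handle such instances is left open in this sketch.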
For a given training data set there are many decision trees that classify all the training examples correctly. Their sizes differ depending on the order in which the attributes are tested.
We have: S = [9+, 5-]
Entropy(S) = Entropy(9+, 5-)
= -p+ log2(p+) - p- log2(p-) = -(9/14)log2(9/14) - (5/14)log2(5/14)
= 0.940
Values(Outlook) = {Sunny, Overcast, Rain}
S_Sunny = [2+, 3-]
S_Overcast = [4+, 0-]
S_Rain = [3+, 2-]

$$\mathrm{Gain}(S, \mathrm{Outlook}) = \mathrm{Entropy}(S) - \sum_{v \in \{\mathrm{Sunny},\, \mathrm{Overcast},\, \mathrm{Rain}\}} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

= Entropy(S) - (5/14)Entropy(S_Sunny) - (4/14)Entropy(S_Overcast) - (5/14)Entropy(S_Rain)
where:
Entropy(S) = 0.940
Entropy(S_Sunny) = -(2/5)log2(2/5) - (3/5)log2(3/5) = 0.5288 + 0.4422 = 0.971
Entropy(S_Overcast) = -(4/4)log2(4/4) - (0/4)log2(0/4) = 0 + 0 = 0 (taking 0·log2(0) = 0)
Entropy(S_Rain) = -(3/5)log2(3/5) - (2/5)log2(2/5) = 0.4422 + 0.5288 = 0.971
Therefore:
Gain(S, Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0 - (5/14)*0.971 = 0.246
Values(Temperature) = {Hot, Mild, Cool}
S_Hot = [2+, 2-]
S_Mild = [4+, 2-]
S_Cool = [3+, 1-]

$$\mathrm{Gain}(S, \mathrm{Temperature}) = \mathrm{Entropy}(S) - \sum_{v \in \{\mathrm{Hot},\, \mathrm{Mild},\, \mathrm{Cool}\}} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

= Entropy(S) - (4/14)Entropy(S_Hot) - (6/14)Entropy(S_Mild) - (4/14)Entropy(S_Cool)
where:
Entropy(S) = 0.940
Entropy(S_Hot) = -(2/4)log2(2/4) - (2/4)log2(2/4) = 0.5 + 0.5 = 1
Entropy(S_Mild) = -(4/6)log2(4/6) - (2/6)log2(2/6) = 0.3900 + 0.5283 = 0.9183
Entropy(S_Cool) = -(3/4)log2(3/4) - (1/4)log2(1/4) = 0.31128 + 0.5 = 0.81128
Therefore:
Gain(S, Temperature) = 0.940 - (4/14)*1 - (6/14)*0.9183 - (4/14)*0.81128 = 0.029
Values(Humidity) = {High, Normal}
S_High = [3+, 4-]
S_Normal = [6+, 1-]

$$\mathrm{Gain}(S, \mathrm{Humidity}) = \mathrm{Entropy}(S) - \sum_{v \in \{\mathrm{High},\, \mathrm{Normal}\}} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

= Entropy(S) - (7/14)Entropy(S_High) - (7/14)Entropy(S_Normal)
where:
Entropy(S) = 0.940
Entropy(S_High) = -(3/7)log2(3/7) - (4/7)log2(4/7) = 0.5238 + 0.4613 = 0.9851
Entropy(S_Normal) = -(6/7)log2(6/7) - (1/7)log2(1/7) = 0.1906 + 0.4010 = 0.5916
Therefore:
Gain(S, Humidity) = 0.940 - (7/14)*0.9851 - (7/14)*0.5916 ≈ 0.151
Values(Wind) = {Weak, Strong}
S_Weak = [6+, 2-]
S_Strong = [3+, 3-]

$$\mathrm{Gain}(S, \mathrm{Wind}) = \mathrm{Entropy}(S) - \sum_{v \in \{\mathrm{Weak},\, \mathrm{Strong}\}} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

= Entropy(S) - (8/14)Entropy(S_Weak) - (6/14)Entropy(S_Strong)
where:
Entropy(S) = 0.940
Entropy(S_Weak) = -(6/8)log2(6/8) - (2/8)log2(2/8) = 0.3112 + 0.5 = 0.8112
Entropy(S_Strong) = -(3/6)log2(3/6) - (3/6)log2(3/6) = 0.5 + 0.5 = 1
Therefore:
Gain(S, Wind) = 0.940 - (8/14)*0.811 - (6/14)*1 = 0.048
We obtain the following results:
Gain(S, Outlook) = 0.246
Gain(S, Temperature) = 0.029
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Outlook) is the largest value, so Outlook is chosen as the root node.
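The hand computation above can be cross-checked with a short script. The sketch below encodes the 14 examples of Table 2.1 with English value names (an assumption of this sketch, as are the names `rows`, `gain` and `root`) and evaluates Gain for each attribute. Because it works at full floating-point precision, its results may differ from the hand-rounded figures in the third decimal place.

```python
import math
from collections import Counter

COLUMNS = ("Outlook", "Temperature", "Humidity", "Wind", "PlayTennis")
DATA = """
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
"""
# One dict per example, D1 to D14 in table order.
rows = [dict(zip(COLUMNS, line.split())) for line in DATA.strip().splitlines()]

def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain(rows, attribute, target="PlayTennis"):
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([row[target] for row in rows]) - remainder

gains = {a: gain(rows, a) for a in COLUMNS[:-1]}
root = max(gains, key=gains.get)               # attribute with the largest Gain
for attribute, value in gains.items():
    print(f"Gain(S, {attribute}) = {value:.3f}")
print("root:", root)
```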
Outlook
    Sunny:    {D1, D2, D8, D9, D11}    [2+,3-]   attribute 1?
    Overcast: {D3, D7, D12, D13}       [4+,0-]   Yes
    Rain:     {D4, D5, D6, D10, D14}   [3+,2-]   attribute 2?

After building the first level of the decision tree, we consider the Sunny branch.

Computing Entropy and Gain for the Sunny branch gives the following results:
Gain(S_Sunny, Humidity) = 0.970
Gain(S_Sunny, Temperature) = 0.570
Gain(S_Sunny, Wind) = 0.019
Humidity therefore has the highest gain in the Sunny branch, so we choose Humidity as the next node.
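The same computation can be repeated on the Sunny partition alone. The sketch below hard-codes the five Sunny examples (D1, D2, D8, D9, D11) with English value names; the names `sunny`, `gain` and `best` are assumptions of this sketch. Its full-precision results may differ from the figures quoted above in the last digit.

```python
import math
from collections import Counter

COLUMNS = ("Temperature", "Humidity", "Wind", "PlayTennis")
sunny = [dict(zip(COLUMNS, values)) for values in [
    ("Hot", "High", "Weak", "No"),        # D1
    ("Hot", "High", "Strong", "No"),      # D2
    ("Mild", "High", "Weak", "No"),       # D8
    ("Cool", "Normal", "Weak", "Yes"),    # D9
    ("Mild", "Normal", "Strong", "Yes"),  # D11
]]

def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain(rows, attribute, target="PlayTennis"):
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([row[target] for row in rows]) - remainder

gains = {a: gain(sunny, a) for a in COLUMNS[:-1]}
best = max(gains, key=gains.get)
for attribute, value in gains.items():
    print(f"Gain(S_Sunny, {attribute}) = {value:.3f}")
print("next node:", best)
```

Humidity splits the Sunny examples into two pure subsets, which is why its gain equals the entropy of the whole Sunny partition.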
Proceeding in the same way for the remaining branches, we obtain the complete decision tree:

Outlook
    Sunny:    {D1, D2, D8, D9, D11}    [2+,3-]   Humidity
        High:   {D1, D2, D8}    [0+,3-]   No
        Normal: {D9, D11}       [2+,0-]   Yes
    Overcast: {D3, D7, D12, D13}       [4+,0-]   Yes
    Rain:     {D4, D5, D6, D10, D14}   [3+,2-]   Wind
        Weak:   {D4, D5, D10}   [3+,0-]   Yes
        Strong: {D6, D14}       [0+,2-]   No

Because it computes the Gain value to select the best attribute for expanding the tree, ID3 is regarded as an improvement on the CLS algorithm. However, ID3 cannot handle data containing numeric (continuous) attributes, and it has difficulty with missing data and noisy data. These problems are addressed by the C4.5 algorithm, described next.
