
Chapter 3: Association Rules

Association rule mining
Find frequent patterns, associations, correlations, or causal structures among sets of objects in transactional databases, relational databases, and other information repositories.
Requirements:
- Understandability: the rules are easy to understand
- Usability: the rules provide practical information
- Efficiency:
Applications:
- Market basket analysis, commerce, ...
- Catalogue design, website design, graphics, ...
- Biology, maintenance and repair, ...

Association rule mining
Typical notation for association rules:
- Towels ⇒ Beer [0.5%, 60%]
- Buy:Towels ⇒ Buy:Beer [0.5%, 60%]
- IF a customer buys towels THEN they also buy beer in 60% of cases.
Towels and beer are bought together in 0.5% of the data rows.
Other representations:
- Buy(x, towels) ⇒ Buy(x, beer) [0.5%, 60%]
- Faculty(x, IT) ∧ Studies(x, DB) ⇒ Grade(x, A) [1%, 75%]

Approaches to association rules
Binary association rules:
An attribute is considered only in terms of whether or not it occurs in a database transaction. How often it occurs is ignored.
Example:
- Making 10 phone calls is treated the same as making 1 call (was there a call or not: Yes or No?)
- IF long-distance call = yes AND mobile call = yes
THEN international call = yes AND call to the 108 service = yes
with support 20% and confidence 80%

Approaches to association rules
Association rules with quantitative and categorical attributes (Quantitative and Categorical Association Rules)
Database attributes come in many types (binary, quantitative, categorical, ...).
Discretization converts this kind of rule into the binary form.
Example:
IF call method = Automatic
AND call time ∈ 23:00...23:59
AND call duration ∈ 20...30 minutes
THEN long-distance call = yes,
with support 23.53% and confidence 80%.
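The discretization step can be sketched as a function that maps each record to a set of binary items, one item per "attribute = interval" pair. After this mapping, the binary rule-mining methods apply unchanged. This is a minimal sketch under stated assumptions: the records, the field names, the function name `to_binary_items`, and the 10-minute bins are all hypothetical and were chosen only to mirror the example rule.

```python
# Hypothetical call records; field names and values are illustrative only.
calls = [
    {"method": "automatic", "hour": 23, "minutes": 25},
    {"method": "operator", "hour": 14, "minutes": 3},
    {"method": "automatic", "hour": 23, "minutes": 12},
]

def to_binary_items(call):
    """Map one record to a set of binary items (one item per attribute=value/interval)."""
    items = {f"method={call['method']}"}          # categorical attribute kept as-is
    # Quantitative attributes are discretized into fixed intervals (assumed bins).
    items.add("hour=23:00-23:59" if call["hour"] == 23 else "hour=other")
    lo = (call["minutes"] // 10) * 10              # 10-minute bins: 0-10, 10-20, 20-30, ...
    items.add(f"minutes={lo}-{lo + 10}")
    return items

transactions = [to_binary_items(c) for c in calls]
for t in transactions:
    print(sorted(t))
```

A record now either contains an item such as `minutes=20-30` or it does not, which is exactly the binary setting described earlier.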

Approaches to association rules
Multi-level association rules
The first rule form below generalizes the second, and the generalization can be made at several different levels.
Example:
Rules of the form:
IF buys a PC
THEN buys an operating system
AND buys office software
instead of only overly specific rules:
IF buys an IBM PC
THEN buys Microsoft Windows
AND buys Microsoft Office

Approaches to association rules
Fuzzy association rules
When quantitative attributes are discretized, fuzzy association rules aim to overcome the limitations of that discretization and to express the rules in a more natural form that is closer to the user.
Example:
IF private subscriber = yes
AND call duration is long (fuzzified attribute)
AND intra-province charge = yes
THEN invalid charge = yes
with support 4% and confidence 85%.
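A fuzzified attribute such as "call duration is long" replaces a sharp interval with a membership degree between 0 and 1. The sketch below is a minimal illustration under stated assumptions. The trapezoid-style ramp, its thresholds of 10 and 20 minutes, and the name `mu_long` are hypothetical. The fuzzy support shown here is the average membership degree, which is one common definition and not necessarily the one these slides assume.

```python
def mu_long(minutes, start=10.0, full=20.0):
    """Membership degree of 'call duration is long' (hypothetical thresholds)."""
    if minutes <= start:
        return 0.0
    if minutes >= full:
        return 1.0
    return (minutes - start) / (full - start)   # linear ramp between the thresholds

durations = [5, 12, 18, 25, 40]                 # hypothetical call durations in minutes
degrees = [mu_long(d) for d in durations]       # 0.0, 0.2, 0.8, 1.0, 1.0
# One common fuzzy support: the average membership degree over all records.
fuzzy_support = sum(degrees) / len(degrees)     # about 0.6
print(degrees, fuzzy_support)
```

Unlike a sharp interval, a 12-minute call and an 18-minute call now contribute partially (0.2 and 0.8) instead of both being counted as 0.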

Association rule mining
Anatomy of an association rule:
IF a customer buys towels THEN they also buy beer in 60% of cases. Towels and beer are bought together in 0.5% of the data rows.
Towels ⇒ Beer [0.5%, 60%]
1. Antecedent: Towels (left-hand side)
2. Consequent: Beer (right-hand side, or head)
3. Support: 0.5%, the frequency (also called prevalence): in what percentage of the data do the left-hand side and the right-hand side occur together?
4. Confidence: 60%, the strength (also called conditional probability, reliability, or cohesion): when the left-hand side occurs, how likely is the right-hand side to occur?

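Association rule mining
Support: how frequently the rule occurs among the transactions of the database D
$\mathrm{support}(A \Rightarrow B) = P(A \cup B) = \dfrac{|\{T \in D : A \cup B \subseteq T\}|}{|D|}$
where $P(A \cup B)$ is the probability that a transaction contains every item of both A and B.

Confidence: the percentage of the transactions containing A that also contain B
$\mathrm{confidence}(A \Rightarrow B) = P(B \mid A) = \dfrac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)}$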
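Both definitions translate directly into code. This minimal sketch assumes that transactions are Python sets. The function names `support` and `confidence` are chosen here only for illustration. The data are the four transactions (TIDs 100, 200, 400, 500) of the worked example in this chapter.

```python
def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    itemset = set(itemset)
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(lhs, rhs, db):
    """confidence(lhs => rhs) = support(lhs ∪ rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs), db) / support(lhs, db)

# Transactions 100, 200, 400, 500 from the A..F example in this chapter.
db = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
print(support({"A", "C"}, db))        # 0.5
print(confidence({"A"}, {"C"}, db))   # about 0.667
print(confidence({"C"}, {"A"}, db))   # 1.0
```

Note that confidence is not symmetric: A ⇒ C and C ⇒ A share the same support but have different confidences.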

Association rule mining
Minimum support (min support):
- High:
⇒ few frequent itemsets
⇒ few valid rules, each occurring very often
- Low:
⇒ many valid rules, many of which occur rarely
Minimum confidence (min confidence):
- High:
⇒ few rules, but almost all of them hold
- Low:
⇒ many rules, most of them very uncertain
Typical values:
minsupport: 2-10%, minconfidence: 70-90%

Association rule mining
Items and itemsets:
I = {i1, i2, …, in} is a set of n items (an item is also called an attribute). A subset X ⊆ I is called an itemset.
Transactions:
T = {t1, t2, …, tm} is a set of m transactions (a transaction is also called a record). Each transaction is identified by a TID (transaction identifier).
Frequent itemset:
An itemset whose support is greater than or equal to the minimum support (minsupport).

Association rule mining
Given: a database of transactions, where each transaction is the list of items bought (in one customer purchase).
Find all rules with minsupport = 50% and minconfidence = 50%.

TID | Items bought
100 | A B C
200 | A C
400 | A D
500 | B E F

Two-step association rule mining process:
- Step 1: Find the frequent itemsets, i.e. the itemsets that have at least the minimum support.
* Apriori trick: every subset of a frequent itemset is also a frequent itemset.
E.g. if {A, B} is frequent, then {A} and {B} are frequent.
* Iterate, finding the frequent itemsets of size 1 up to size k (k-itemsets).
- Step 2: Use the frequent itemsets to generate the association rules.

Association rule mining
Transactions: 100: A B C; 200: A C; 400: A D; 500: B E F

Itemset | Support
{A} | 3 = 75%
{B} | 2 = 50%
...
{A, B} | 1 = 25%
{A, C} | 2 = 50%
...

Rule | Support | Confidence
A ⇒ C | support({A, C}) = 50% | 2/3 = 66.7%
C ⇒ A | support({C, A}) = 50% | 2/2 = 100%

A ⇒ C [50%, 66.7%]
C ⇒ A [50%, 100%]

Frequent itemsets with the Apriori trick
1. Join step: C_k is generated by joining L_{k-1} with itself.
2. Prune step: a (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset, so every candidate with an infrequent (k-1)-subset is removed.
3. Pseudocode:
C_k: the set of candidate itemsets of size k
L_k: the set of frequent itemsets of size k
L1 = {frequent items}
FOR (k = 1; L_k != ∅; k++)
    C_{k+1} = {candidates generated from L_k}
    FOR each transaction t in the database
        increment the count of every candidate in C_{k+1} that is contained in t
    L_{k+1} = {candidates in C_{k+1} with at least the minimum support}
END FOR
RETURN ∪_k L_k
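The pseudocode above can be sketched in Python as follows. Assumptions: transactions are sets, and `min_support` is a fraction of the number of transactions. The join step forms unions of two frequent k-itemsets that yield a (k+1)-itemset, rather than the prefix-based join. The prune step checks every k-subset. The usage reproduces the Database D example later in this chapter, assuming a minimum support of 50%, which matches its L1 and L2 tables.

```python
from itertools import combinations

def apriori(db, min_support):
    """Return {frozenset: count} for every itemset with support >= min_support.

    db: list of sets; min_support: fraction of len(db).
    """
    min_count = min_support * len(db)
    # L1: count single items in one scan.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: n for s, n in counts.items() if n >= min_count}
    frequent = dict(level)
    k = 1
    while level:
        # Join: unions of two frequent k-itemsets that form a (k+1)-itemset.
        candidates = set()
        for a, b in combinations(list(level), 2):
            u = a | b
            # Prune: every k-subset of the candidate must itself be frequent.
            if len(u) == k + 1 and all(frozenset(s) in level for s in combinations(u, k)):
                candidates.add(u)
        # One database scan counts all candidates of size k+1.
        counts = {cand: 0 for cand in candidates}
        for t in db:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        level = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent

db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]   # Database D, TIDs 100-400
result = apriori(db, 0.5)
for itemset, count in sorted(result.items(), key=lambda x: (len(x[0]), sorted(x[0]))):
    print(sorted(itemset), count)
```

Each iteration of the `while` loop corresponds to one pass of the outer `FOR` in the pseudocode, with exactly one scan of the database.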

Generating candidates with the Apriori trick
Apriori principle:
Every subset of a frequent itemset must also be frequent.

Example:
Given L3 = {abc, abd, acd, ace, bcd}
Self-join (combine two itemsets of L3 that share two items):
* {abcd}
* {abce}
* {acde}
Prune:
* {abce} is removed because bce is not in L3
* {acde} is removed because ade is not in L3
C4 = {abcd}
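The join and prune steps of this example can be sketched as a small candidate-generation function. The name `apriori_gen` and the encoding of each itemset as a string of letters are assumptions made for the illustration. The join takes the union of any two (k-1)-itemsets whose union has exactly k items, which is the join used in the example above.

```python
from itertools import combinations

def apriori_gen(L_prev, k):
    """Build C_k from the frequent (k-1)-itemsets L_prev (join step + prune step)."""
    L_prev = {frozenset(s) for s in L_prev}
    # Join: two (k-1)-itemsets that share k-2 items give a k-itemset.
    joined = {a | b for a, b in combinations(L_prev, 2) if len(a | b) == k}
    # Prune: drop every candidate that has an infrequent (k-1)-subset.
    return {c for c in joined
            if all(frozenset(s) in L_prev for s in combinations(c, k - 1))}

L3 = ["abc", "abd", "acd", "ace", "bcd"]   # each string is one itemset of letters
C4 = apriori_gen(L3, 4)
print(sorted("".join(sorted(c)) for c in C4))   # ['abcd']
```

Before pruning, `joined` holds exactly {abcd, abce, acde}, the three candidates listed in the self-join step.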

Association rule mining with Apriori
Database D
TID | Items bought
100 | 1 3 4
200 | 2 3 5
300 | 1 2 3 5
400 | 2 5

Scan D → C1
Itemset | Support
{1} | 2 = 50%
{2} | 3 = 75%
{3} | 3 = 75%
{4} | 1 = 25%
{5} | 3 = 75%

L1
Itemset | Support
{1} | 2 = 50%
{2} | 3 = 75%
{3} | 3 = 75%
{5} | 3 = 75%

C2 (generated from L1): {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}

Scan D → C2
Itemset | Support
{1,2} | 1 = 25%
{1,3} | 2 = 50%
{1,5} | 1 = 25%
{2,3} | 2 = 50%
{2,5} | 3 = 75%
{3,5} | 2 = 50%

L2
Itemset | Support
{1,3} | 2 = 50%
{2,3} | 2 = 50%
{2,5} | 3 = 75%
{3,5} | 2 = 50%

Association rule mining with Apriori (continued)
C3 (generated from L2): {2,3,5}

Scan D → C3
Itemset | Support
{2,3,5} | 2 = 50%

L3
Itemset | Support
{2,3,5} | 2 = 50%

Search space of database D
[Figure: itemset search space of database D]

Search space of database D: applying the Apriori search trick at level 1
[Figure]

Search space of database D: applying the Apriori search trick at level 2
[Figure]

Extracting association rules from frequent itemsets
Pseudocode:
FOR each frequent itemset l
    generate all non-empty proper subsets s of l
    FOR each non-empty proper subset s of l
        IF support(l) / support(s) ≥ min_conf
            output the rule s ⇒ (l - s)
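A minimal Python sketch of this pseudocode follows. It assumes that the frequent itemsets arrive as a dict mapping each `frozenset` to its support count, as an Apriori pass would produce. The name `generate_rules` is illustrative. Because every subset of a frequent itemset is frequent, the support of `s` is always available in the same dict. The usage applies it to the A..F example with minconfidence = 50%.

```python
from itertools import combinations

def generate_rules(frequent, min_conf):
    """frequent: dict frozenset -> support count, containing every frequent itemset.
    Returns (lhs, rhs, confidence) triples with confidence >= min_conf."""
    rules = []
    for itemset, supp_l in frequent.items():      # itemset plays the role of l
        for r in range(1, len(itemset)):          # sizes of the non-empty proper subsets s
            for subset in combinations(itemset, r):
                s = frozenset(subset)
                conf = supp_l / frequent[s]       # s is frequent (Apriori property)
                if conf >= min_conf:
                    rules.append((s, itemset - s, conf))
    return rules

# Frequent itemsets (with counts) of the A..F example at minsupport = 50%.
frequent = {frozenset("A"): 3, frozenset("B"): 2, frozenset("C"): 2, frozenset("AC"): 2}
for lhs, rhs, conf in generate_rules(frequent, min_conf=0.5):
    print(sorted(lhs), "=>", sorted(rhs), round(conf, 3))
```

Restricting `r` to `1 .. len(itemset) - 1` keeps both sides of every rule non-empty.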

Worked example
Given the item set I = {i1, i2, i3, i4, i5, i6, i7, i8} and the transaction set O = {O1, O2, O3, O4, O5, O6},
where:
O1 = {i1, i7, i8}
O2 = {i1, i2, i6, i7, i8}
O3 = {i1, i2, i6, i7}
O4 = {i1, i7, i8}
O5 = {i3, i4, i5, i6, i8}
O6 = {i1, i4, i5}
a. Build the data mining context.
b. Find the maximal frequent itemsets with min_supp = 0.3.
c. Find the association rules with min_supp = 0.3 and min_conf = 1.0.

Worked example
a. Data mining context
Transaction database (1 means the transaction contains the item):

   | i1 i2 i3 i4 i5 i6 i7 i8
O1 |  1  0  0  0  0  0  1  1
O2 |  1  1  0  0  0  1  1  1
O3 |  1  1  0  0  0  1  1  0
O4 |  1  0  0  0  0  0  1  1
O5 |  0  0  1  1  1  1  0  1
O6 |  1  0  0  1  1  0  0  0

b. Find the maximal frequent itemsets with min_supp = 0.3
C1
Itemset | Support
{i1} | 5/6 = 0.83
{i2} | 2/6 = 0.33
{i3} | 1/6 = 0.17
{i4} | 2/6 = 0.33
{i5} | 2/6 = 0.33
{i6} | 3/6 = 0.5
{i7} | 4/6 = 0.67
{i8} | 4/6 = 0.67

L1
Itemset | Support
{i1} | 5/6 = 0.83
{i2} | 2/6 = 0.33
{i4} | 2/6 = 0.33
{i5} | 2/6 = 0.33
{i6} | 3/6 = 0.5
{i7} | 4/6 = 0.67
{i8} | 4/6 = 0.67

[Figure: itemset lattice over i1 ... i8]

C2
Itemset | Support
{i1, i2} | 2 = 0.33
{i1, i4} | 1 = 0.17
{i1, i5} | 1 = 0.17
{i1, i6} | 2 = 0.33
{i1, i7} | 4 = 0.67
{i1, i8} | 3 = 0.5
{i2, i4} | 0
{i2, i5} | 0
{i2, i6} | 2 = 0.33
{i2, i7} | 2 = 0.33
{i2, i8} | 1 = 0.17
{i4, i5} | 2 = 0.33
{i4, i6} | 1 = 0.17
{i4, i7} | 0
{i4, i8} | 1 = 0.17
{i5, i6} | 1 = 0.17
{i5, i7} | 0
{i5, i8} | 1 = 0.17
{i6, i7} | 2 = 0.33
{i6, i8} | 2 = 0.33
{i7, i8} | 3 = 0.5

L2
Itemset | Support
{i1, i2} | 2 = 0.33
{i1, i6} | 2 = 0.33
{i1, i7} | 4 = 0.67
{i1, i8} | 3 = 0.5
{i2, i6} | 2 = 0.33
{i2, i7} | 2 = 0.33
{i4, i5} | 2 = 0.33
{i6, i7} | 2 = 0.33
{i6, i8} | 2 = 0.33
{i7, i8} | 3 = 0.5

[Figure: itemset lattice]

C3
Itemset | Support
{i1, i2, i6} | 2 = 0.33
{i1, i2, i7} | 2 = 0.33
{i1, i2, i8} | 1 = 0.17
{i1, i6, i7} | 2 = 0.33
{i1, i6, i8} | 1 = 0.17
{i1, i7, i8} | 3 = 0.5 (contained in O1, O2, and O4)
{i2, i6, i7} | 2 = 0.33
{i2, i7, i8} | 1 = 0.17
{i6, i7, i8} | 1 = 0.17
(With the Apriori prune step, {i1, i2, i8} and {i2, i7, i8} would be discarded before counting, because their subset {i2, i8} is not in L2.)

L3
Itemset | Support
{i1, i2, i6} | 2 = 0.33
{i1, i2, i7} | 2 = 0.33
{i1, i6, i7} | 2 = 0.33
{i1, i7, i8} | 3 = 0.5
{i2, i6, i7} | 2 = 0.33

[Figure: itemset lattice]

C4
Itemset | Support
{i1, i2, i6, i7} | 2 = 0.33

L4
Itemset | Support
{i1, i2, i6, i7} | 2 = 0.33

Maximal frequent itemsets with min_supp = 0.3:
1. {i4, i5}
2. {i6, i8}
3. {i1, i7, i8}
4. {i1, i2, i6, i7}

[Figure: itemset lattice with the maximal frequent itemsets]

c. Find the association rules with min_supp = 0.3 and min_conf = 1.0
For each itemset, conf(X ⇒ Y) = count(X ∪ Y) / count(X), using the transaction counts.
Consider {i4, i5} (count 2):
conf({i4} ⇒ {i5}) = 2/2 = 1 OK
conf({i5} ⇒ {i4}) = 2/2 = 1 OK
Consider {i6, i8} (count 2):
conf({i6} ⇒ {i8}) = 2/3
conf({i8} ⇒ {i6}) = 2/4
Consider {i1, i7, i8} (count 3: O1, O2, O4):
conf({i1} ⇒ {i7, i8}) = 3/5
conf({i7} ⇒ {i1, i8}) = 3/4
conf({i8} ⇒ {i1, i7}) = 3/4
conf({i1, i7} ⇒ {i8}) = 3/4
conf({i1, i8} ⇒ {i7}) = 3/3 OK
conf({i7, i8} ⇒ {i1}) = 3/3 OK
Consider {i1, i2, i6, i7} (count 2):
conf({i1} ⇒ {i2, i6, i7}) = 2/5
conf({i2} ⇒ {i1, i6, i7}) = 2/2 OK
conf({i6} ⇒ {i1, i2, i7}) = 2/3
conf({i7} ⇒ {i1, i2, i6}) = 2/4
conf({i1, i2} ⇒ {i6, i7}) = 2/2 OK
conf({i1, i6} ⇒ {i2, i7}) = 2/2 OK
conf({i1, i7} ⇒ {i2, i6}) = 2/4
conf({i2, i6} ⇒ {i1, i7}) = 2/2 OK
conf({i2, i7} ⇒ {i1, i6}) = 2/2 OK
conf({i6, i7} ⇒ {i1, i2}) = 2/2 OK
conf({i2, i6, i7} ⇒ {i1}) = 2/2 OK
conf({i1, i6, i7} ⇒ {i2}) = 2/2 OK
conf({i1, i2, i7} ⇒ {i6}) = 2/2 OK
conf({i1, i2, i6} ⇒ {i7}) = 2/2 OK

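c. Find the association rules with min_supp = 0.3 and min_conf = 1.0

From the maximal frequent itemsets, 14 rules are obtained:
{i4} ⇒ {i5} [33%, 100%]
{i5} ⇒ {i4} [33%, 100%]
{i1, i8} ⇒ {i7} [50%, 100%]
{i7, i8} ⇒ {i1} [50%, 100%]
{i2} ⇒ {i1, i6, i7} [33%, 100%]
{i1, i2} ⇒ {i6, i7} [33%, 100%]
{i1, i6} ⇒ {i2, i7} [33%, 100%]
{i2, i6} ⇒ {i1, i7} [33%, 100%]
{i2, i7} ⇒ {i1, i6} [33%, 100%]
{i6, i7} ⇒ {i1, i2} [33%, 100%]
{i2, i6, i7} ⇒ {i1} [33%, 100%]
{i1, i6, i7} ⇒ {i2} [33%, 100%]
{i1, i2, i7} ⇒ {i6} [33%, 100%]
{i1, i2, i6} ⇒ {i7} [33%, 100%]
Note: these rules come only from the maximal itemsets. Applying the rule-extraction pseudocode to every frequent itemset also yields rules such as {i2} ⇒ {i1} and {i7} ⇒ {i1}, which likewise have confidence 1.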
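The counts in this example are easy to get wrong by hand, so a brute-force check is useful. The script below is a minimal sketch and not the Apriori method itself. It enumerates all 2^8 itemsets, keeps those with support ≥ 0.3, finds the maximal ones, and lists the rules with confidence 1 obtained either from the maximal itemsets only or from every frequent itemset. The helper names `count` and `rules_from` are illustrative.

```python
from itertools import combinations

db = [
    {"i1", "i7", "i8"},                 # O1
    {"i1", "i2", "i6", "i7", "i8"},     # O2
    {"i1", "i2", "i6", "i7"},           # O3
    {"i1", "i7", "i8"},                 # O4
    {"i3", "i4", "i5", "i6", "i8"},     # O5
    {"i1", "i4", "i5"},                 # O6
]
items = sorted(set().union(*db))

def count(itemset):
    """Number of transactions containing every item of itemset."""
    return sum(1 for t in db if itemset <= t)

# Brute force: every itemset with support >= 0.3 (at least 2 of the 6 transactions).
frequent = {}
for r in range(1, len(items) + 1):
    for combo in combinations(items, r):
        c = count(frozenset(combo))
        if c / len(db) >= 0.3:
            frequent[frozenset(combo)] = c

# Maximal frequent itemsets: no frequent proper superset.
maximal = [s for s in frequent if not any(s < t for t in frequent)]

def rules_from(itemset, min_conf=1.0):
    """All rules lhs => itemset - lhs with confidence >= min_conf."""
    out = []
    for r in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), r):
            lhs = frozenset(lhs)
            if frequent[itemset] / frequent[lhs] >= min_conf:
                out.append((lhs, itemset - lhs))
    return out

maximal_rules = [rule for m in maximal for rule in rules_from(m)]
all_rules = [rule for f in frequent for rule in rules_from(f)]
print(len(maximal), len(maximal_rules), len(all_rules))   # 4 14 31
```

The first two numbers match the maximal itemsets and the 14 rules listed above. The third shows how many more rules with confidence 1 exist once all frequent itemsets are considered.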

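Limitations of the Apriori algorithm
- The database must be scanned many times.
- When mining long patterns, a huge number of candidate itemsets must be generated.

Example: in the worst case, where the whole itemset I = {i1, i2, ..., i100} is frequent, Apriori needs:
- Number of database scans: 100
- Number of candidates: 2^100 - 1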
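The candidate count follows from the Apriori property. If the 100-itemset is frequent, then every one of its non-empty subsets is frequent and must appear as a candidate at some level. The sum over all levels is therefore the sum of the binomial coefficients, which this short check confirms.

```python
from math import comb

n = 100
# Every non-empty subset of the 100 items is a candidate at some level k = 1..100.
candidates = sum(comb(n, k) for k in range(1, n + 1))
print(candidates == 2**n - 1)   # True
print(f"{candidates:.3e}")       # about 1.268e+30
```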

The FP-Growth algorithm
Procedure:
- Step 1: * Find the frequent 1-itemsets (first database scan).
* Sort the frequent items in descending order of support into the list F_List.
* Reorder the items of each transaction according to F_List, dropping infrequent items (second database scan), and build the FP-tree.
- Step 2: Build the conditional pattern base for each frequent item.
- Step 3: Build the conditional FP-tree from each conditional pattern base.
- Step 4: Recursively mine the conditional FP-trees and grow the frequent patterns until a conditional FP-tree contains only a single path ⇒ generate all combinations of the nodes on that path as frequent patterns.

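Worked example
Given the database
(minsupp = 60%, i.e. 3 of the 5 transactions):

TID | Items bought
100 | {f,a,c,d,g,i,m,p}
200 | {a,b,c,f,l,m,o}
300 | {b,f,h,j,o,w}
400 | {b,c,k,s,p}
500 | {a,f,c,e,l,p,m,n}

- Step 1: * Find the frequent 1-itemsets (first database scan).
* Sort them in descending order of support into the list F_List.

Frequent 1-itemsets, sorted in descending order of support:
Item | Support
f | 4
c | 4
a | 3
b | 3
m | 3
p | 3
F_List = f, c, a, b, m, p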

Worked example
- Step 1 (continued): scan the database again, reorder each transaction by the frequent 1-itemsets in F_List (keeping only those items), and build the FP-tree.

TID | Items bought | Ordered frequent items
100 | {f,a,c,d,g,i,m,p} | {f,c,a,m,p}
200 | {a,b,c,f,l,m,o} | {f,c,a,b,m}
300 | {b,f,h,j,o,w} | {f,b}
400 | {b,c,k,s,p} | {c,b,p}
500 | {a,f,c,e,l,p,m,n} | {f,c,a,m,p}
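The two scans of Step 1 can be sketched as follows, with each transaction written as a string of single-letter items. One assumption matters here. The example breaks ties in support (f and c at 4; a, b, m, and p at 3) as f, c, a, b, m, p, and the sketch reproduces that order with an explicit, hand-chosen tie-break string `tie_order`. Any fixed tie-break gives a valid F_List. This one simply matches the table above.

```python
from collections import Counter

db = {
    100: "facdgimp",
    200: "abcflmo",
    300: "bfhjow",
    400: "bcksp",
    500: "afcelpmn",
}
min_count = 3  # minsupp = 60% of 5 transactions

# First scan: count single items and keep the frequent ones.
counts = Counter(item for items in db.values() for item in set(items))
frequent = {item: c for item, c in counts.items() if c >= min_count}

# F_List: descending support; ties broken to match the example (assumed order).
tie_order = "fcabmp"
f_list = sorted(frequent, key=lambda i: (-frequent[i], tie_order.index(i)))

# Second scan: keep only frequent items of each transaction, ordered by F_List.
ordered = {tid: [i for i in f_list if i in items] for tid, items in db.items()}
print(f_list)        # ['f', 'c', 'a', 'b', 'm', 'p']
for tid, items in ordered.items():
    print(tid, items)
```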

Worked example
- Step 1 (continued): scan the database, reorder each transaction by F_List, and insert it into the FP-tree.

After inserting TID 100 ({f,c,a,m,p}):
{}
└─ f:1
   └─ c:1
      └─ a:1
         └─ m:1
            └─ p:1

After inserting TID 200 ({f,c,a,b,m}): the shared prefix f, c, a is incremented, and a new branch b:1 → m:1 is added under a.
{}
└─ f:2
   └─ c:2
      └─ a:2
         ├─ m:1
         │  └─ p:1
         └─ b:1
            └─ m:1

After inserting TID 300 ({f,b}):
{}
└─ f:3
   ├─ c:2
   │  └─ a:2
   │     ├─ m:1
   │     │  └─ p:1
   │     └─ b:1
   │        └─ m:1
   └─ b:1

After inserting TID 400 ({c,b,p}): the transaction does not start with f, so a new branch starts at the root.
{}
├─ f:3
│  ├─ c:2
│  │  └─ a:2
│  │     ├─ m:1
│  │     │  └─ p:1
│  │     └─ b:1
│  │        └─ m:1
│  └─ b:1
└─ c:1
   └─ b:1
      └─ p:1

After inserting TID 500 ({f,c,a,m,p}), the final FP-tree:
{}
├─ f:4
│  ├─ c:3
│  │  └─ a:3
│  │     ├─ m:2
│  │     │  └─ p:2
│  │     └─ b:1
│  │        └─ m:1
│  └─ b:1
└─ c:1
   └─ b:1
      └─ p:1

Worked example
Step 2: Build the conditional pattern base for each frequent item, i.e. for each item in the header table of the final FP-tree from Step 1. The conditional pattern base of an item is the set of prefix paths that lead to its nodes, each counted with that node's count.

Header table:
Item | Support
f | 4
c | 4
a | 3
b | 3
m | 3
p | 3

Item | Conditional pattern base
f | {}
c | f:3
a | fc:3
b | fca:1, f:1, c:1
m | fcab:1, fca:2
p | fcam:2, cb:1
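Building the FP-tree and reading off the conditional pattern bases can be sketched with a small node class. Assumptions: the input transactions are already reduced and ordered by F_List, as in the table of Step 1. Each item is a single character. `Node`, `build_fp_tree`, and `conditional_pattern_base` are illustrative names. The header table keeps the nodes of each item in insertion order, which is why m's base is listed as fca:2 before fcab:1 here. The set of prefix paths is the same as in the table above.

```python
class Node:
    """One FP-tree node: an item, its count, its parent, and its children."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions):
    """transactions: item sequences already ordered by F_List.
    Returns the root and a header table mapping item -> list of its nodes."""
    root = Node(None, None)
    header = {}
    for t in transactions:
        node = root
        for item in t:
            child = node.children.get(item)
            if child is None:                  # no shared prefix: start a new branch
                child = Node(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1                   # shared prefix: just increment the count
            node = child
    return root, header

def conditional_pattern_base(item, header):
    """Prefix path (root excluded) of every node holding item, with that node's count."""
    base = []
    for node in header[item]:
        path, p = [], node.parent
        while p.item is not None:
            path.append(p.item)
            p = p.parent
        if path:                               # an empty prefix contributes nothing
            base.append(("".join(reversed(path)), node.count))
    return base

ordered = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]   # TIDs 100-500, ordered by F_List
root, header = build_fp_tree(ordered)
for item in "fcabmp":
    print(item, conditional_pattern_base(item, header))
```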

Worked example
Step 3: Build the conditional FP-tree from each conditional pattern base:
- The conditional pattern base of p is {fcam:2, cb:1}.
Count of each item within this base: f:2, c:3, a:2, m:2, b:1.
Since minsupp = 3, only c:3 is frequent in the conditional pattern base of p.

p-conditional FP-tree:
Item | Frequency
c | 3
{} → c:3

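Worked example
Step 3 (continued): the conditional FP-trees built from the other conditional pattern bases:

c-conditional (base f:3)
Item | Frequency
f | 3
{} → f:3

a-conditional (base fc:3)
Item | Frequency
f | 3
c | 3
{} → f:3 → c:3

m-conditional (base fcab:1, fca:2)
Item | Frequency
f | 3
c | 3
a | 3
{} → f:3 → c:3 → a:3

p-conditional (base fcam:2, cb:1)
Item | Frequency
c | 3
{} → c:3

(The base of b, {fca:1, f:1, c:1}, gives f:2, c:2, a:1, so no item reaches 3 and the b-conditional tree is empty. The base of f is empty.)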

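Worked example
Step 4: Recursively mine the conditional FP-trees and grow the frequent patterns until a conditional FP-tree contains only a single path ⇒ generate all combinations of the nodes on that path as frequent patterns.
Rules:
- Pattern-growth property:
Let α be a frequent itemset in the database, B the conditional pattern base of α, and β an itemset in B.
Then α ∪ β is frequent in the database if and only if β is frequent in B.
- If an FP-tree T consists of a single path P, the complete set of frequent patterns of T is generated by enumerating all combinations of the nodes of P. The support of each combination is the smallest count among its nodes.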
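The single-path rule can be sketched directly. Each non-empty combination β of the path's nodes yields the pattern β ∪ α, and its support is the smallest count in β. In the sketch, the path is a list of `(item, count)` pairs from the root down, and the suffix α is a string. The name `single_path_patterns` is illustrative. The pattern α itself (here m:3) is produced one level up, when the recursion first selects m.

```python
from itertools import combinations

def single_path_patterns(path, suffix):
    """path: list of (item, count) from the root down; suffix: the pattern alpha.
    Each non-empty combination beta of path nodes gives beta ∪ alpha,
    with support equal to the minimum count in beta."""
    patterns = {}
    for r in range(1, len(path) + 1):
        for beta in combinations(path, r):
            items = "".join(item for item, _ in beta) + suffix
            patterns[items] = min(count for _, count in beta)
    return patterns

# m-conditional FP-tree from the example: the single path f:3 -> c:3 -> a:3.
print(single_path_patterns([("f", 3), ("c", 3), ("a", 3)], "m"))
```

The seven patterns produced (fm, cm, am, fcm, fam, cam, fcam, each with support 3) are exactly the patterns involving m listed in the example, apart from m itself.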

Worked example
Step 4 (continued)
p-conditional FP-tree: {} → c:3
m-conditional FP-tree: {} → f:3 → c:3 → a:3 (a single path)

All frequent patterns involving p:
p:3,
cp:3
All frequent patterns involving m:
m:3,
fm:3, cm:3, am:3,
fcm:3, fam:3, cam:3,
fcam:3

Worked example
Step 4 (continued)
c-conditional FP-tree: {} → f:3
a-conditional FP-tree: {} → f:3 → c:3

All frequent patterns involving c:
c:4,
fc:3
All frequent patterns involving a:
a:3,
fa:3, ca:3,
fca:3

Worked example
Item | Conditional FP-tree | Frequent patterns
f | {} | f
c | {(f:3)} | c | c, fc
a | {(f:3, c:3)} | a | a, fa, ca, fca
b | {} | b
m | {(f:3, c:3, a:3)} | m | m, fm, cm, am, fcm, fam, cam, fcam
p | {(c:3)} | p | p, cp

The FP-Growth algorithm
Procedure FP_Growth(Tree, α)
    IF Tree contains a single path P THEN
        FOR each combination β of the nodes in path P
            - generate the pattern β ∪ α with support = the smallest support among the nodes in β
    ELSE
        FOR each a_i in the header table of Tree
            - generate the pattern β = a_i ∪ α
            - with support = a_i.support
            - build the conditional pattern base of β and, from it, the conditional FP-tree Tree_β
            - IF Tree_β ≠ ∅ THEN FP_Growth(Tree_β, β)
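The procedure can be sketched as a recursive Python generator. This sketch makes several assumptions. It omits the single-path shortcut and always takes the `ELSE` branch, which produces the same set of patterns with more recursive calls. Conditional pattern bases are passed as `(items, count)` pairs, so one function builds both the initial tree and every conditional tree. Ties in the item order are broken alphabetically rather than by the example's F_List, which changes the tree's shape but not the patterns found. All names are illustrative.

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(weighted_transactions, min_count):
    """weighted_transactions: list of (items, count).
    Returns (header, frequent): item -> list of nodes, and item -> total count."""
    support = defaultdict(int)
    for items, n in weighted_transactions:
        for item in items:
            support[item] += n
    frequent = {i: s for i, s in support.items() if s >= min_count}
    root, header = Node(None, None), defaultdict(list)
    for items, n in weighted_transactions:
        # Keep frequent items, ordered by descending support (ties by item name).
        path = sorted((i for i in items if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += n
    return header, frequent

def fp_growth(weighted_transactions, min_count, suffix=()):
    """Yield (pattern, support) for every frequent pattern that extends suffix."""
    header, frequent = build_tree(weighted_transactions, min_count)
    for item, supp in frequent.items():
        beta = (item,) + suffix                      # beta = a_i ∪ alpha
        yield frozenset(beta), supp
        # Conditional pattern base of beta: prefix paths of every node of item.
        base = []
        for node in header[item]:
            path, p = [], node.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        if base:                                     # Tree_beta may still turn out empty
            yield from fp_growth(base, min_count, beta)

db = ["facdgimp", "abcflmo", "bfhjow", "bcksp", "afcelpmn"]
patterns = dict(fp_growth([(t, 1) for t in db], min_count=3))
print(len(patterns))                   # 18
print(patterns[frozenset("fcam")])     # 3
print(patterns[frozenset("c")])        # 4
```

The 18 patterns are the union of the "Frequent patterns" column in the summary table above.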

Interestingness measures
What makes a rule good, or interesting?
- Association rule mining algorithms tend to generate too many rules
- Many of the rules are uninteresting or redundant
Objective measures:
- Support (supp)
- Confidence (conf)
- More than 20 other measures
Subjective measures:
- Interestingness: an association rule is called interesting if it is novel, surprising, and potentially actionable
- Interestingness measures help to prune or limit the number of rules

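Interestingness measures
Example:
Out of 5000 students:
- 3000 play football
- 3750 like drinking beer
- 2000 play football and like drinking beer
+ With minsupp = 40% and minconf = 66.7%:
plays football ⇒ likes beer (support 2000/5000 = 40%, confidence 2000/3000 = 66.7%)
This rule is misleading, because the overall percentage of students who like beer is 75% > 66.7%.
+ With minsupp = 20% and minconf = 33.3%:
plays football ⇒ does not like beer (support 1000/5000 = 20%, confidence 1000/3000 = 33.3%)
This rule is more meaningful in practice, even though its support and confidence are lower.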
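The arithmetic behind the example is short enough to check directly. The key comparison is between the rule's confidence and the baseline probability of its consequent over all students. The variable names below are illustrative.

```python
n = 5000
football, beer, both = 3000, 3750, 2000

supp1 = both / n                        # plays football => likes beer
conf1 = both / football
baseline = beer / n                     # P(likes beer) over all students

football_no_beer = football - both      # 1000 students
supp2 = football_no_beer / n            # plays football => does not like beer
conf2 = football_no_beer / football

print(f"{supp1:.1%} {conf1:.1%} vs baseline {baseline:.1%}")   # 40.0% 66.7% vs baseline 75.0%
print(f"{supp2:.1%} {conf2:.1%}")                              # 20.0% 33.3%
```

Because the first rule's confidence is below the baseline, playing football actually makes liking beer slightly less likely, despite the rule's high support and confidence.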

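Interestingness measures
- We need a measure of the dependence, or correlation, between events.
- Interest (also called lift):
$\mathrm{Interest}(X, Y) = \dfrac{P(X \cup Y)}{P(X)\,P(Y)}$
where $P(X \cup Y)$ is the probability that a transaction contains both X and Y.
- X and Y are negatively correlated if Interest < 1, positively correlated if Interest > 1, and independent if Interest = 1.
Example:
$\mathrm{Interest}(\text{plays football}, \text{likes beer}) = \dfrac{2000/5000}{(3000/5000) \times (3750/5000)} = \dfrac{0.4}{0.6 \times 0.75} \approx 0.89$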
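The Interest measure can be computed directly from the formula above. The second call applies the same formula to the rule "plays football ⇒ does not like beer", using the 1250 students who do not like beer. It shows why that rule is the more meaningful one. The function name `interest` is illustrative.

```python
def interest(p_xy, p_x, p_y):
    """Interest (lift) = P(X ∪ Y) / (P(X) P(Y)); < 1 negative, > 1 positive correlation."""
    return p_xy / (p_x * p_y)

n = 5000
lift_beer = interest(2000 / n, 3000 / n, 3750 / n)      # plays football, likes beer
lift_no_beer = interest(1000 / n, 3000 / n, 1250 / n)   # plays football, does not like beer
print(round(lift_beer, 2))      # 0.89 (negative correlation)
print(round(lift_no_beer, 2))   # 1.33 (positive correlation)
```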

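Exercises
Given the database (1 means the transaction contains the item):

   | i1 i2 i3 i4 i5 i6 i7 i8
O1 |  1  0  0  0  0  0  1  1
O2 |  1  1  0  0  0  1  1  1
O3 |  1  1  0  0  0  1  1  0
O4 |  1  0  0  0  0  0  1  1
O5 |  0  0  1  1  1  1  0  1
O6 |  1  0  0  1  1  0  0  0

a. Find the frequent itemsets with the Apriori method, with minsupp = 30%.
b. Find the frequent itemsets with the FP-Growth method, with minsupp = 30%. Compare the result with Apriori.
c. Find the association rules and compute the Interest measure for the rules found in each case.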
