Association rule mining
Finding frequent patterns, associations, correlations, or causal
structures among sets of objects in transactional databases, relational
databases, and other information repositories.
Requirements:
- Understandability: the rules are easy to understand
- Usability: the rules provide practical information
- Efficiency
Applications:
- Market basket analysis, commerce, ...
- Catalogue design, website design, graphics, ...
- Biology, repair, ...
Chapter 3: Association Rules
Typical formats for writing an association rule:
- Diapers → Beer [0.5%, 60%]
- Buys:Diapers → Buys:Beer [0.5%, 60%]
- IF a customer buys diapers THEN they buy beer in 60% of cases.
  Diapers and beer are bought together in 0.5% of the data rows.
Other forms of representation:
- Buys(x, diapers) → Buys(x, beer) [0.5%, 60%]
- Department(x, IT) ∧ Takes(x, DB) → Grade(x, A) [1%, 75%]
Approaches to association rules
Binary association rules:
Only the presence or absence of an attribute in a database transaction
matters; how often it occurs is ignored.
Example:
- Making 10 phone calls and making 1 call are treated the same (was there
  a call or not: yes or no?)
- IF long-distance call = yes AND mobile call = yes
  THEN international call = yes AND call to service 108 = yes,
  with support 20% and confidence 80%
Quantitative and categorical association rules:
Database attributes come in various types (binary, quantitative,
categorical, ...). Discretization converts rules of this kind into
binary form.
Example:
IF call method = automatic
AND call time ∈ 23:00..23:59
AND call duration ∈ 20..30 minutes
THEN long-distance call = yes,
with support 23.53% and confidence 80%.
Multi-level association rules:
The first rule below is a generalization of the second, and rules can be
generalized at several different levels.
Example:
A rule of the form:
IF buys a PC
THEN buys an operating system
AND buys office productivity software
instead of only overly specific rules such as:
IF buys an IBM PC
THEN buys the Microsoft Windows operating system
AND buys Microsoft Office
Fuzzy association rules:
Fuzzy association rules aim to overcome the limitations that arise when
discretizing quantitative attributes, and to bring association rules into
a more natural form that is closer to the user.
Example:
IF private subscriber = yes
AND call duration is long (a fuzzified attribute)
AND local charge = yes
THEN invalid charge = yes,
with support 4% and confidence 85%.
Anatomy of an association rule:
IF a customer buys diapers THEN they buy beer in 60% of cases. Diapers
and beer are bought together in 0.5% of the data rows.
Diapers → Beer [0.5%, 60%]
1. Antecedent: Diapers (left-hand side)
2. Consequent: Beer (right-hand side, head)
3. Support: 0.5%, the frequency (also called prevalence): in what
   percentage of the data do the left-hand side and the right-hand side
   occur together?
4. Confidence: 60%, the strength (also called conditional probability or
   cohesion): if the left-hand side occurs, how likely is the right-hand
   side to occur?
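The two numbers in brackets can be written as formulas. The block below is a sketch in standard notation; the symbols X, Y (antecedent and consequent) and T (the set of transactions) are assumptions introduced here for illustration. The last line only rearranges the definition for the diapers/beer figures.

```latex
\[
\mathrm{supp}(X \rightarrow Y) = \frac{|\{\, t \in T : X \cup Y \subseteq t \,\}|}{|T|},
\qquad
\mathrm{conf}(X \rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}
\]
% Diapers -> Beer [0.5%, 60%]: the antecedent alone must then have
\[
\mathrm{supp}(\text{Diapers}) = \frac{0.5\%}{60\%} \approx 0.83\%
\]
```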
Support: indicates how often the rule occurs among the transactions
Minimum support (min support):
- High:
  few frequent itemsets
  few valid rules, each occurring very often
- Low:
  many valid rules, some occurring only rarely
Minimum confidence (min confidence):
- High:
  few rules, but almost all of them hold
- Low:
  many rules, most of them very uncertain
Typical values:
minsupport: 2-10%, minconfidence: 70-90%
Items and itemsets:
I = {i1, i2, ..., in} is a set of n items (an item is also called an
attribute). A set X ⊆ I is called an itemset.
Transactions:
T = {t1, t2, ..., tm} is a set of m transactions (a transaction is also
called a record). Each transaction is identified by a TID (transaction
identifier).
Frequent itemset:
An itemset whose support is ≥ the minimum support (minsupport)
Given: a database of transactions, where each transaction is the list of
items bought (in one customer visit).
Find all rules with minsupport = 50% and minconfidence = 50%.

TID   Items bought
100   A B C
200   A C
400   A D
500   B E F
Itemset        Support
{A}            3 = 75%
{B}            2 = 50%
...
{A} and {B}    1 = 25%
{A} and {C}    2 = 50%
...

Rule     Support   Confidence
A → C    50%       2/3 = 66.7%
C → A    50%       2/2 = 100%

A → C [50%, 66.7%]
C → A [50%, 100%]
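The figures for the four-transaction example (TIDs 100, 200, 400, 500) can be checked with a few lines of Python. This is a minimal sketch; the function names `support` and `confidence` are chosen here for illustration.

```python
# Four-transaction example: TID -> set of items bought.
transactions = {
    100: {"A", "B", "C"},
    200: {"A", "C"},
    400: {"A", "D"},
    500: {"B", "E", "F"},
}

def support(itemset, db):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= items for items in db.values()) / len(db)

def confidence(antecedent, consequent, db):
    """support(antecedent ∪ consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, db) / support(antecedent, db)

print(support({"A", "C"}, transactions))                 # 0.5
print(round(confidence({"A"}, {"C"}, transactions), 3))  # 0.667
print(confidence({"C"}, {"A"}, transactions))            # 1.0
```

Both rules pass minsupport = 50% and minconfidence = 50%, which is why both are reported.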
Finding frequent itemsets with the Apriori heuristic
1. Join step: Ck is generated by joining Lk-1 with itself
2. Prune step: a (k-1)-itemset that is not frequent cannot be a subset of
   a frequent k-itemset
3. Pseudocode:
   Ck: the set of candidate itemsets of size k
   Lk: the set of frequent itemsets of size k
   L1 = {frequent items}
   FOR (k = 1; Lk != ∅; k++)
       Ck+1 = {candidates generated from Lk}
       FOR each transaction t in the database
           increment the count of every candidate in Ck+1 that is
           contained in t
       END FOR
       Lk+1 = {candidates in Ck+1 with support ≥ the minimum support}
   END FOR
   RETURN ∪k Lk
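The pseudocode can be sketched in Python as follows. This is an illustrative version, not a tuned implementation: the join step simply unions pairs of frequent k-itemsets whose union has k + 1 items, and the function name `apriori` and the parameter `min_count` (an absolute support count) are assumptions made here. The sample database is a small four-transaction set used only to exercise the code.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # L1: count single items in one pass
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 1
    while level:
        # Join step: unions of two frequent k-itemsets that have k + 1 items
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        # Count step: one pass over the database per level
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(level)
        k += 1
    return frequent

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(D, min_count=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The loop stops as soon as a level comes back empty, which mirrors the `Lk != ∅` condition in the pseudocode.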
Generating candidates with the Apriori heuristic
Apriori principle:
Every subset of a frequent itemset must also be frequent
Example:
Given: L3 = {abc, abd, acd, ace, bcd}
Self-join:
* {abcd}
* {abce}
* {acde}
Prune:
* {abce} is removed because bce is not in L3
* {acde} is removed because ade is not in L3
C4 = {abcd}
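The join and prune steps of this example can be reproduced in a few lines. The sketch below assumes the join is "union of any two itemsets that yields k + 1 items", which gives the three joined sets listed above; `generate_candidates` is a name chosen here for illustration.

```python
from itertools import combinations

def generate_candidates(prev_level):
    """Join frequent k-itemsets pairwise, then prune by the Apriori principle."""
    prev_level = {frozenset(s) for s in prev_level}
    k = len(next(iter(prev_level)))
    # Join: keep unions that have exactly k + 1 items
    joined = {a | b for a in prev_level for b in prev_level if len(a | b) == k + 1}
    # Prune: drop a candidate if any of its k-subsets is not frequent
    pruned = {c for c in joined
              if all(frozenset(s) in prev_level for s in combinations(c, k))}
    return joined, pruned

L3 = ["abc", "abd", "acd", "ace", "bcd"]
joined, C4 = generate_candidates(L3)
print(sorted("".join(sorted(c)) for c in joined))  # ['abcd', 'abce', 'acde']
print(sorted("".join(sorted(c)) for c in C4))      # ['abcd']
```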
Association rule mining with Apriori
Database D
TID   Items bought
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

C1 (after scanning D)
Itemset   Support
{1}       2 = 50%
{2}       3 = 75%
{3}       3 = 75%
{4}       1 = 25%
{5}       3 = 75%

L1
Itemset   Support
{1}       2 = 50%
{2}       3 = 75%
{3}       3 = 75%
{5}       3 = 75%

C2 (generated from L1): {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}

C2 (after scanning D)
Itemset   Support
{1,2}     1 = 25%
{1,3}     2 = 50%
{1,5}     1 = 25%
{2,3}     2 = 50%
{2,5}     3 = 75%
{3,5}     2 = 50%

L2
Itemset   Support
{1,3}     2 = 50%
{2,3}     2 = 50%
{2,5}     3 = 75%
{3,5}     2 = 50%

C3 (generated from L2): {2,3,5}

C3 (after scanning D)
Itemset   Support
{2,3,5}   2 = 50%

L3
Itemset   Support
{2,3,5}   2 = 50%
Search space of database D
[Figures: the search space of database D; the search space after applying the Apriori heuristic at level 1; the search space after applying the Apriori heuristic at level 2]
Extracting association rules from the frequent itemsets
Pseudocode:
FOR each frequent itemset l
    generate all non-empty proper subsets s of l
    FOR each non-empty proper subset s of l
        IF support(l) / support(s) ≥ min_conf
            output the rule s → (l - s)
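A minimal Python sketch of this rule-extraction loop is shown below. It assumes the support counts of all frequent itemsets are already available in a dictionary; the name `generate_rules` and the sample counts (three itemsets over items A and C from a four-transaction database) are illustrative.

```python
from itertools import combinations

def generate_rules(supports, min_conf):
    """supports: {frozenset: support count} for every frequent itemset."""
    rules = []
    for itemset, count in supports.items():
        for r in range(1, len(itemset)):             # non-empty proper subsets
            for s in map(frozenset, combinations(sorted(itemset), r)):
                conf = count / supports[s]           # support(l) / support(s)
                if conf >= min_conf:
                    rules.append((set(s), set(itemset - s), conf))
    return rules

# Illustrative support counts out of 4 transactions
supports = {
    frozenset("A"): 3,
    frozenset("C"): 2,
    frozenset("AC"): 2,
}
rules = generate_rules(supports, min_conf=0.5)
for lhs, rhs, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), round(conf, 3))
```

Because every subset of a frequent itemset is itself frequent, the lookup `supports[s]` always succeeds when the dictionary is complete.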
Worked example
Given the item set I = {i1, i2, i3, i4, i5, i6, i7, i8} and the transaction set O = {o1, o2,
o3, o4, o5, o6}
where:
O1 = {i1, i7, i8}
O2 = {i1, i2, i6, i7, i8}
O3 = {i1, i2, i6, i7}
O4 = {i1, i7, i8}
O5 = {i3, i4, i5, i6, i8}
O6 = {i1, i4, i5}
a. Build the data mining context
b. Find the maximal frequent itemsets with min_supp = 0.3
c. Find the association rules with min_supp = 0.3 and min_conf = 1.0
a. Data mining context
Transaction database as a binary matrix (1 = the item occurs in the transaction):
      i1  i2  i3  i4  i5  i6  i7  i8
O1     1   0   0   0   0   0   1   1
O2     1   1   0   0   0   1   1   1
O3     1   1   0   0   0   1   1   0
O4     1   0   0   0   0   0   1   1
O5     0   0   1   1   1   1   0   1
O6     1   0   0   1   1   0   0   0
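Building the context is a mechanical step: each transaction becomes a row of 0/1 flags over the item list. The sketch below shows one way to do it in Python; the variable names are chosen here for illustration.

```python
items = [f"i{k}" for k in range(1, 9)]
transactions = {
    "O1": {"i1", "i7", "i8"},
    "O2": {"i1", "i2", "i6", "i7", "i8"},
    "O3": {"i1", "i2", "i6", "i7"},
    "O4": {"i1", "i7", "i8"},
    "O5": {"i3", "i4", "i5", "i6", "i8"},
    "O6": {"i1", "i4", "i5"},
}

# One row per transaction, one 0/1 column per item
matrix = {tid: [int(item in t) for item in items] for tid, t in transactions.items()}

print("    " + " ".join(f"{item:>2}" for item in items))
for tid, row in matrix.items():
    print(tid + "  " + " ".join(f"{v:>2}" for v in row))

# Column sums are the support counts of the single items
column_sums = [sum(col) for col in zip(*matrix.values())]
print(column_sums)
```

The column sums give the support counts needed for C1 in the next step.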
b. Finding the maximal frequent itemsets with min_supp = 0.3
C1
Itemset   Support
{i1}      5/6 = 0.83
{i2}      2/6 = 0.33
{i3}      1/6 = 0.17
{i4}      2/6 = 0.33
{i5}      2/6 = 0.33
{i6}      3/6 = 0.5
{i7}      4/6 = 0.67
{i8}      4/6 = 0.67

L1
Itemset   Support
{i1}      5/6 = 0.83
{i2}      2/6 = 0.33
{i4}      2/6 = 0.33
{i5}      2/6 = 0.33
{i6}      3/6 = 0.5
{i7}      4/6 = 0.67
{i8}      4/6 = 0.67
C2
Itemset    Support
{i1,i2}    2 = 0.33
{i1,i4}    1 = 0.17
{i1,i5}    1 = 0.17
{i1,i6}    2 = 0.33
{i1,i7}    4 = 0.67
{i1,i8}    3 = 0.5
{i2,i4}    0
{i2,i5}    0
{i2,i6}    2 = 0.33
{i2,i7}    2 = 0.33
{i2,i8}    1 = 0.17
{i4,i5}    2 = 0.33
{i4,i6}    1 = 0.17
{i4,i7}    0
{i4,i8}    1 = 0.17
{i5,i6}    1 = 0.17
{i5,i7}    0
{i5,i8}    1 = 0.17
{i6,i7}    2 = 0.33
{i6,i8}    2 = 0.33
{i7,i8}    3 = 0.5

L2
Itemset    Support
{i1,i2}    2 = 0.33
{i1,i6}    2 = 0.33
{i1,i7}    4 = 0.67
{i1,i8}    3 = 0.5
{i2,i6}    2 = 0.33
{i2,i7}    2 = 0.33
{i4,i5}    2 = 0.33
{i6,i7}    2 = 0.33
{i6,i8}    2 = 0.33
{i7,i8}    3 = 0.5
C3
Itemset       Support
{i1,i2,i6}    2 = 0.33
{i1,i2,i7}    2 = 0.33
{i1,i2,i8}    1 = 0.17
{i1,i6,i7}    2 = 0.33
{i1,i6,i8}    1 = 0.17
{i1,i7,i8}    3 = 0.5
{i2,i6,i7}    2 = 0.33
{i2,i7,i8}    1 = 0.17
{i6,i7,i8}    1 = 0.17

L3
Itemset       Support
{i1,i2,i6}    2 = 0.33
{i1,i2,i7}    2 = 0.33
{i1,i6,i7}    2 = 0.33
{i1,i7,i8}    3 = 0.5
{i2,i6,i7}    2 = 0.33

Note: {i1,i7,i8} occurs in O1, O2 and O4, so its support count is 3.
C4
Itemset          Support
{i1,i2,i6,i7}    2 = 0.33

L4
Itemset          Support
{i1,i2,i6,i7}    2 = 0.33

The maximal frequent itemsets with min_supp = 0.3:
1. {i4,i5}
2. {i6,i8}
3. {i1,i7,i8}
4. {i1,i2,i6,i7}
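The four maximal frequent itemsets can be cross-checked by brute force, since there are only 8 items. The sketch below enumerates every non-empty itemset, which is only reasonable for a toy example of this size; it is not the Apriori procedure itself, and the variable names are chosen here for illustration.

```python
from itertools import combinations

transactions = [
    {"i1", "i7", "i8"},
    {"i1", "i2", "i6", "i7", "i8"},
    {"i1", "i2", "i6", "i7"},
    {"i1", "i7", "i8"},
    {"i3", "i4", "i5", "i6", "i8"},
    {"i1", "i4", "i5"},
]
min_supp = 0.3
items = sorted(set().union(*transactions))

def support(itemset):
    """Fraction of transactions containing the whole itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Brute force over all non-empty itemsets (2^8 - 1 = 255 of them)
frequent = [frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)
            if support(frozenset(c)) >= min_supp]

# Maximal: frequent itemsets that have no frequent proper superset
maximal = [s for s in frequent if not any(s < other for other in frequent)]
for s in sorted(maximal, key=lambda s: (len(s), sorted(s))):
    print(sorted(s), round(support(s), 2))
```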
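c. Finding the association rules with min_supp = 0.3 and min_conf = 1.0
Support counts used below: {i1} = 5, {i4} = 2, {i5} = 2, {i6} = 3, {i7} = 4, {i8} = 4, {i1,i7} = 4, {i1,i8} = 3, {i7,i8} = 3, and {i1,i7,i8} = 3 (it occurs in O1, O2 and O4).
Consider {i4, i5} (support count 2):
conf({i4} → {i5}) = 2/2 = 1    OK
conf({i5} → {i4}) = 2/2 = 1    OK
Consider {i6, i8} (support count 2):
conf({i6} → {i8}) = 2/3
conf({i8} → {i6}) = 2/4
Consider {i1, i7, i8} (support count 3):
conf({i1} → {i7, i8}) = 3/5
conf({i7} → {i1, i8}) = 3/4
conf({i8} → {i1, i7}) = 3/4
conf({i1, i7} → {i8}) = 3/4
conf({i1, i8} → {i7}) = 3/3 = 1    OK
conf({i7, i8} → {i1}) = 3/3 = 1    OK
Consider {i1, i2, i6, i7} (support count 2): a rule s → rest has confidence 1 exactly when s also has support count 2. This holds for {i2} and for every 2-item or 3-item subset except {i1, i7}.
Taking rules only from the four maximal frequent itemsets, 14 rules reach confidence 1.0:
{i4} → {i5} [33%, 100%]
{i5} → {i4} [33%, 100%]
{i1, i8} → {i7} [50%, 100%]
{i7, i8} → {i1} [50%, 100%]
{i2} → {i1, i6, i7} [33%, 100%]
{i1, i2} → {i6, i7} [33%, 100%]
{i1, i6} → {i2, i7} [33%, 100%]
{i2, i6} → {i1, i7} [33%, 100%]
{i2, i7} → {i1, i6} [33%, 100%]
{i6, i7} → {i1, i2} [33%, 100%]
{i1, i2, i6} → {i7} [33%, 100%]
{i1, i2, i7} → {i6} [33%, 100%]
{i1, i6, i7} → {i2} [33%, 100%]
{i2, i6, i7} → {i1} [33%, 100%]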
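The confidence-1.0 rules can be recomputed directly from the six transactions O1 to O6. The sketch below draws rules only from the four maximal frequent itemsets, as in the worked example; `count` is a helper name chosen here for illustration.

```python
from itertools import combinations

transactions = [
    {"i1", "i7", "i8"},
    {"i1", "i2", "i6", "i7", "i8"},
    {"i1", "i2", "i6", "i7"},
    {"i1", "i7", "i8"},
    {"i3", "i4", "i5", "i6", "i8"},
    {"i1", "i4", "i5"},
]
maximal = [
    {"i4", "i5"},
    {"i6", "i8"},
    {"i1", "i7", "i8"},
    {"i1", "i2", "i6", "i7"},
]

def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(set(itemset) <= t for t in transactions)

rules = []
for l in maximal:
    for r in range(1, len(l)):                      # non-empty proper subsets
        for s in combinations(sorted(l), r):
            if count(l) == count(s):                # confidence = count(l) / count(s) = 1.0
                rhs = tuple(sorted(l - set(s)))
                rules.append((s, rhs, count(l) / len(transactions)))

for lhs, rhs, supp in rules:
    print(lhs, "->", rhs, f"[{supp:.0%}, 100%]")
print(len(rules), "rules")
```

Comparing counts instead of dividing avoids any floating-point comparison against 1.0.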
Limitations of the Apriori algorithm
The FP-Growth algorithm
Procedure:
- Step 1: * Find the frequent 1-itemsets (first scan of the database)
          * Sort the frequent items in descending order of support into the list F_List
          * Reorder each transaction according to F_List (second scan of
            the database) and build the FP-tree
- Step 2: Build the conditional pattern base for each frequent item.
- Step 3: Build the conditional FP-tree from each conditional pattern
  base
- Step 4: Recursively mine the conditional FP-trees and grow the frequent patterns
  until a conditional FP-tree contains only a single path, then generate all
  combinations of its frequent patterns
Worked example
Given the database
(minsupp = 60%, i.e. 3 transactions):
TID   Items
100   {f,a,c,d,g,i,m,p}
200   {a,b,c,f,l,m,o}
300   {b,f,h,j,o,w}
400   {b,c,k,s,p}
500   {a,f,c,e,l,p,m,n}

Frequent 1-itemsets, sorted in descending order of support:
Item   Support
f      4
c      4
a      3
b      3
m      3
p      3
- Step 1 (continued):
Scan the database, reorder each transaction by the frequent 1-itemsets, and
build the FP-tree
TID   Items bought           Ordered frequent items
100   {f,a,c,d,g,i,m,p}      {f,c,a,m,p}
200   {a,b,c,f,l,m,o}        {f,c,a,b,m}
300   {b,f,h,j,o,w}          {f,b}
400   {b,c,k,s,p}            {c,b,p}
500   {a,f,c,e,l,p,m,n}      {f,c,a,m,p}

FP-tree after inserting TID 100:
{}
+-- f:1
    +-- c:1
        +-- a:1
            +-- m:1
                +-- p:1
FP-tree after inserting TID 200, {f,c,a,b,m}:
{}
+-- f:2
    +-- c:2
        +-- a:2
            +-- m:1
            |   +-- p:1
            +-- b:1
                +-- m:1
FP-tree after inserting TID 300, {f,b}:
{}
+-- f:3
    +-- c:2
    |   +-- a:2
    |       +-- m:1
    |       |   +-- p:1
    |       +-- b:1
    |           +-- m:1
    +-- b:1
FP-tree after inserting TID 400, {c,b,p}:
{}
+-- f:3
|   +-- c:2
|   |   +-- a:2
|   |       +-- m:1
|   |       |   +-- p:1
|   |       +-- b:1
|   |           +-- m:1
|   +-- b:1
+-- c:1
    +-- b:1
        +-- p:1
FP-tree after inserting TID 500, {f,c,a,m,p} (the complete FP-tree):
{}
+-- f:4
|   +-- c:3
|   |   +-- a:3
|   |       +-- m:2
|   |       |   +-- p:2
|   |       +-- b:1
|   |           +-- m:1
|   +-- b:1
+-- c:1
    +-- b:1
        +-- p:1
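The tree construction in Step 1 can be sketched in Python as below. The `Node` class, `build_fp_tree` and `dump` are illustrative names, and the header-table links between nodes of the same item are left out for brevity. Because several items tie on support, the F_List order is passed in explicitly and copied from the slides rather than derived by sorting.

```python
from collections import Counter

class Node:
    """One FP-tree node: an item, its count, and its children keyed by item."""
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, f_list):
    rank = {item: pos for pos, item in enumerate(f_list)}
    root = Node(None)
    for t in transactions:
        # Keep only the frequent items and put them in F_List order
        ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item)
            node = node.children[item]
            node.count += 1
    return root

def dump(node, depth=0):
    """Return the tree as indented 'item:count' lines (children in insertion order)."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(dump(child, depth + 1))
    return lines

db = ["facdgimp", "abcflmo", "bfhjow", "bcksp", "afcelpmn"]   # TIDs 100..500
counts = Counter(item for t in db for item in set(t))
frequent = {item for item, c in counts.items() if c >= 3}     # minsupp = 3
f_list = ["f", "c", "a", "b", "m", "p"]                       # tie order copied from the slides
tree = build_fp_tree(db, f_list)
print("\n".join(dump(tree)))
```

Transactions that share a prefix in F_List order share a path in the tree, which is what keeps the structure compact.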
Step 2: Build the conditional pattern base for
each frequent item (each node in the FP-tree)
Header table:
Item   Support
f      4
c      4
a      3
b      3
m      3
p      3
Step 3: Build the conditional FP-tree from each conditional
pattern base:
- The conditional pattern base of p is {fcam:2, cb:1}
  Item counts within this pattern base: f:2, c:3, a:2, m:2, b:1
  With minsupp = 3, only c:3 is frequent in the conditional pattern base of p

Item   Conditional pattern base
f      {}
c      f:3
a      fc:3
b      fca:1, f:1, c:1
m      fcab:1, fca:2
p      fcam:2, cb:1

p-conditional FP-tree:
Item   Frequency
c      3
{} -> c:3
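The conditional pattern bases can be cross-checked without walking the tree: the prefix path of an item in the FP-tree is the same as the part of each ordered transaction that precedes that item. The sketch below relies on that shortcut, which is an assumption made here for brevity; the function names are illustrative.

```python
from collections import Counter

# Ordered frequent items of TIDs 100..500
ordered = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]

def conditional_pattern_base(item, ordered_transactions):
    """Prefixes that precede `item`, with the number of transactions sharing each."""
    base = Counter()
    for t in ordered_transactions:
        if item in t:
            prefix = t[:t.index(item)]
            if prefix:                      # an empty prefix contributes nothing
                base[prefix] += 1
    return dict(base)

def frequent_in_base(base, min_count):
    """Items whose total count inside the pattern base reaches min_count."""
    counts = Counter()
    for prefix, n in base.items():
        for i in prefix:
            counts[i] += n
    return {i: c for i, c in counts.items() if c >= min_count}

for item in "fcabmp":
    base = conditional_pattern_base(item, ordered)
    print(item, base, frequent_in_base(base, 3))
```

The items that survive `frequent_in_base` are the ones that make up each conditional FP-tree.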
Conditional FP-trees (items with frequency ≥ 3 in each conditional pattern base):
c-conditional:  f:3                 tree: {} -> f:3
a-conditional:  f:3, c:3            tree: {} -> f:3 -> c:3
m-conditional:  f:3, c:3, a:3       tree: {} -> f:3 -> c:3 -> a:3
p-conditional:  c:3                 tree: {} -> c:3
Step 4: Recursively mine the conditional FP-trees and grow the frequent patterns until
a conditional FP-tree contains only a single path, then generate all combinations
of its frequent patterns
Rule:
- Based on the pattern-growth property:
  Let α be a frequent itemset in the database, B the conditional pattern base
  of α, and β a set of items in B.
  Then α ∪ β is frequent in the database if and only if β is
  frequent in B.
Results
Item   Conditional FP-tree          Frequent patterns
f      {}                           f
c      {(f:3)} | c                  c, fc
a      {(f:3, c:3)} | a             a, fa, ca, fca
b      {}                           b
m      {(f:3, c:3, a:3)} | m        m, fm, cm, am, fcm, fam, cam, fcam
p      {(c:3)} | p                  p, cp
The FP-Growth algorithm
Procedure FP_Growth(Tree, α)
If (Tree contains a single path P) then
    For each combination β of the nodes in path P
        - Generate the pattern β ∪ α with support = the minimum support of the
          nodes in β
Else
    For each ai in the header table of Tree
        - Generate the pattern β = ai ∪ α
        - Support = ai.support
        - Build the conditional pattern base of β and construct the conditional
          FP-tree Tree_β
        - If Tree_β ≠ ∅ then FP_Growth(Tree_β, β)
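The recursion can be sketched compactly in Python. This is a simplified variant, not the procedure exactly as stated: it recurses on conditional pattern bases held as plain lists instead of building conditional FP-trees, and it does not use the single-path shortcut. The name `fp_growth` and its parameters are chosen here for illustration.

```python
from collections import Counter

def fp_growth(pattern_base, min_count, suffix=frozenset()):
    """pattern_base: list of (ordered item tuple, count).

    Returns {frozenset: support count} for every frequent pattern that
    extends `suffix`.
    """
    # Count each item inside the current (conditional) pattern base
    counts = Counter()
    for items, n in pattern_base:
        for i in items:
            counts[i] += n

    patterns = {}
    for item, c in counts.items():
        if c < min_count:
            continue
        new_pattern = suffix | {item}
        patterns[new_pattern] = c
        # Conditional pattern base of `item`: what precedes it in each entry
        conditional = [(items[:items.index(item)], n)
                       for items, n in pattern_base
                       if item in items and items.index(item) > 0]
        patterns.update(fp_growth(conditional, min_count, new_pattern))
    return patterns

# Ordered frequent items of TIDs 100..500, each with count 1
ordered = [
    ("f", "c", "a", "m", "p"),
    ("f", "c", "a", "b", "m"),
    ("f", "b"),
    ("c", "b", "p"),
    ("f", "c", "a", "m", "p"),
]
patterns = fp_growth([(t, 1) for t in ordered], min_count=3)
for p, c in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(p)), c)
```

Because every entry keeps the F_List order, each pattern is generated exactly once, starting from its last item in that order.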
Interestingness measures
What makes a rule good or interesting?
- Association rule mining algorithms tend to generate too
  many rules
- Many of the rules are uninteresting or redundant
Objective measures:
- Support (supp)
- Confidence (conf)
- More than 20 other measures
Subjective measures:
- Interestingness: an association rule is called interesting if it is
  novel, surprising, and potentially applicable
- Interestingness measures help prune or limit the set of rules
Example:
Among 5000 students:
- 3000 play ball
- 3750 like drinking beer
- 2000 play ball and like drinking beer
+ With minsupp = 40% and minconf = 66.7%:
  plays ball → likes drinking beer
  This rule is misleading, because the overall share of students who like drinking beer is 75% > 66.7%
+ With minsupp = 20% and minconf = 33.3%:
  plays ball → does not like drinking beer
  This rule is more meaningful in practice, even though its supp and conf are lower
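- We need to measure the dependence or correlation between events
- Defining the interestingness measure, with A = plays ball and B = likes drinking beer:
\[ \mathrm{interest}(A, B) = \frac{P(A \wedge B)}{P(A)\,P(B)} = \frac{0.4}{0.6 \times 0.75} \approx 0.89 \]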
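The value 0.89 follows directly from the student counts in the example. The sketch below recomputes it and, for comparison, the same ratio for the negative rule (plays ball → does not like drinking beer); the variable names are chosen here for illustration.

```python
n_students = 5000
n_ball = 3000     # play ball
n_beer = 3750     # like drinking beer
n_both = 2000     # play ball and like drinking beer

p_ball = n_ball / n_students
p_beer = n_beer / n_students
p_both = n_both / n_students

supp = p_both                           # 0.4
conf = p_both / p_ball                  # about 0.667
interest = p_both / (p_ball * p_beer)   # below 1: negatively correlated

# Negative rule: plays ball -> does not like drinking beer
p_ball_not_beer = (n_ball - n_both) / n_students
interest_neg = p_ball_not_beer / (p_ball * (1 - p_beer))

print(round(supp, 2), round(conf, 3), round(interest, 2))  # 0.4 0.667 0.89
print(round(interest_neg, 2))                               # 1.33
```

A ratio below 1 means the two events occur together less often than independence would predict, which is why the first rule is misleading despite passing both thresholds.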
Exercise
Given the database:
      i1  i2  i3  i4  i5  i6  i7  i8
O1     1   0   0   0   0   0   1   1
O2     1   1   0   0   0   1   1   1
O3     1   1   0   0   0   1   1   0
O4     1   0   0   0   0   0   1   1
O5     0   0   1   1   1   1   0   1
O6     1   0   0   1   1   0   0   0