DATA CLUSTERING
Data clustering is a task in data mining. Clustering lets us organize data so that it is no longer scattered. For a large, scattered database, clustering is necessary and practically indispensable.
PURPOSE OF CLUSTERING
The purpose of data clustering is to discover the structure of the data, forming data sets (clusters) out of a large collection of data.
REQUIREMENTS OF DATA CLUSTERING
Data clustering groups data so that items within the same cluster are similar to one another, while items in different clusters are dissimilar. The similarity measure is defined by the user and is determined from the attributes that describe the objects. Usually we measure the distance between objects.
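REQUIREMENTS OF DATA CLUSTERING (continued)
+ Scalability with respect to the size of the data set.
+ Ability to handle many different types of attributes.
+ Ability to discover clusters of arbitrary shape.
+ Minimal requirements for domain knowledge when determining input parameters.
+ Ability to handle noisy data.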
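REQUIREMENTS OF DATA CLUSTERING (continued)
+ Incremental clustering that is independent of the input data order.
+ Ability to handle high-dimensional data.
+ Ability to cluster under constraints.
+ Interpretability and usability.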
Internal validation
Evaluates the clustering result using quantities computed from the vectors of the data set itself (the proximity matrix). Measures: Hubert's statistic, Silhouette index, Dunn's index.
$$ \mathrm{Entropy}(I) = \sum_i p_i \left( -\sum_j p_{ij} \log p_{ij} \right) $$
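As an illustration of one of the internal measures named above, Dunn's index can be sketched as the smallest distance between points of different clusters divided by the largest distance between points of the same cluster. This is one common variant of the index. The function names `euclidean` and `dunn_index`, the choice of Euclidean distance, and the sample clusters are assumptions made for illustration and do not come from the slides.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dunn_index(clusters, dist=euclidean):
    """Dunn's index for a list of clusters, each a list of points.

    Assumes at least two clusters and at least one cluster with two
    or more distinct points (otherwise the ratio is undefined).
    """
    # Smallest distance between two points lying in different clusters.
    min_between = min(
        dist(p, q)
        for c1, c2 in combinations(clusters, 2)
        for p in c1
        for q in c2
    )
    # Largest distance between two points lying in the same cluster.
    max_within = max(
        dist(p, q)
        for c in clusters
        for p, q in combinations(c, 2)
    )
    return min_between / max_within


# Hypothetical example: two compact, well separated clusters.
clusters = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 0.0), (5.0, 1.0)]]
print(dunn_index(clusters))  # 5.0
```

A larger value suggests compact clusters that are far apart. The sketch compares all pairs of points, so it is only practical for small data sets.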
ISSUES TO ADDRESS
Data type representation
+ We are only concerned with the data types that clustering actually needs.
+ We define d(i,j) as the distance between two objects i and j:
$$ d(i,j) \ge 0, \quad d(i,i) = 0, \quad d(i,j) = d(j,i), \quad d(i,j) \le d(i,k) + d(k,j) $$
where k is any point other than i and j.
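The four properties of d(i,j) can be checked numerically on a small sample. The sketch below is only a sanity check on the given points, not a proof. The helper names `euclidean`, `squared_euclidean` and `satisfies_distance_properties`, the tolerance `tol`, and the sample points are assumptions for illustration.

```python
import math
from itertools import product


def euclidean(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_euclidean(a, b):
    """Squared Euclidean distance; it violates the triangle inequality."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def satisfies_distance_properties(points, d, tol=1e-12):
    """Check the four distance properties on every combination of points."""
    for i in points:
        if abs(d(i, i)) > tol:                  # d(i,i) = 0
            return False
    for i, j in product(points, repeat=2):
        if d(i, j) < 0:                         # d(i,j) >= 0
            return False
        if abs(d(i, j) - d(j, i)) > tol:        # d(i,j) = d(j,i)
            return False
    for i, j, k in product(points, repeat=3):
        if d(i, j) > d(i, k) + d(k, j) + tol:   # triangle inequality
            return False
    return True


points = [(0.0,), (1.0,), (2.0,)]
print(satisfies_distance_properties(points, euclidean))          # True
print(satisfies_distance_properties(points, squared_euclidean))  # False
```

The second call fails because the squared distance from 0 to 2 is 4, which exceeds 1 + 1 for the path through the point 1.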
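ISSUES TO ADDRESS (continued)
Objects i and j are represented by vectors x and y. The similarity between i and j is computed by the formula
$$ x = (x_1, \dots, x_p), \qquad y = (y_1, \dots, y_p) $$
$$ s(x, y) = \frac{x_1 y_1 + \dots + x_p y_p}{\left(x_1^2 + \dots + x_p^2\right)^{1/2} \left(y_1^2 + \dots + y_p^2\right)^{1/2}} $$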
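The similarity s(x, y) above is the cosine of the angle between the two vectors. A minimal sketch follows; the function name `cosine_similarity` and the choice to raise an error for a zero vector (where the formula divides by zero) are assumptions for illustration.

```python
import math


def cosine_similarity(x, y):
    """s(x, y) = (x . y) / (||x|| * ||y||) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of attributes")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # The formula is undefined when either vector is all zeros.
        raise ValueError("similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


print(cosine_similarity([1, 0], [0, 1]))        # orthogonal vectors: 0.0
print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # parallel vectors: close to 1
```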
ISSUES TO ADDRESS (continued)
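Interval-scaled variables/attributes
+ Mean absolute deviation
$$ s_f = \frac{1}{n}\left( |x_{1f} - m_f| + |x_{2f} - m_f| + \dots + |x_{nf} - m_f| \right) $$
where $m_f$ is the mean of attribute f:
$$ m_f = \frac{1}{n}\left( x_{1f} + x_{2f} + \dots + x_{nf} \right) $$
+ Z-score measurement
$$ z_{if} = \frac{x_{if} - m_f}{s_f} $$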
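The standardization above can be sketched for a single attribute f as follows. The function name `standardize`, the error raised when all values are equal (s_f = 0), and the sample values are assumptions for illustration.

```python
def standardize(values):
    """Return the z-scores of one attribute, using the mean absolute
    deviation s_f (not the standard deviation) as the scale."""
    n = len(values)
    m_f = sum(values) / n                         # mean of the attribute
    s_f = sum(abs(x - m_f) for x in values) / n   # mean absolute deviation
    if s_f == 0:
        # All values are identical, so the z-score is undefined.
        raise ValueError("attribute has zero deviation")
    return [(x - m_f) / s_f for x in values]


# Hypothetical attribute values: mean 5, mean absolute deviation 2.
print(standardize([2, 4, 6, 8]))  # [-1.5, -0.5, 0.5, 1.5]
```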
ISSUES TO ADDRESS (continued)
Formulas for distance measures
+ Minkowski distance
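The slide names the Minkowski distance without a surviving formula. The sketch below uses the standard textbook definition, $d(i,j) = \left(\sum_f |x_{if} - x_{jf}|^q\right)^{1/q}$, which is an assumption about what the slide showed. The function name `minkowski` and the parameter name `q` are likewise chosen for illustration.

```python
def minkowski(x, y, q):
    """Minkowski distance of order q between two equal-length vectors.

    q = 1 gives the Manhattan distance, q = 2 the Euclidean distance.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of attributes")
    if q < 1:
        raise ValueError("q must be at least 1 for a valid distance")
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


print(minkowski((0, 0), (3, 4), 1))  # 7.0
print(minkowski((0, 0), (3, 4), 2))  # 5.0
```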
ISSUES TO ADDRESS (continued)
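Binary variables/attributes
Contingency table (rows: Obj i, columns: Obj j):

i \ j    1      0      sum
1        a      b      a+b
0        c      d      c+d
sum      a+c    b+d    p

+ Symmetric binary attributes:
$$ d(i,j) = \frac{b + c}{a + b + c + d} $$
+ Asymmetric binary attributes:
$$ d(i,j) = \frac{b + c}{a + b + c} $$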
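The counts a, b, c, d and the two dissimilarities can be sketched as below for objects encoded as lists of 0 and 1. The function names and the sample objects are assumptions for illustration.

```python
def binary_counts(obj_i, obj_j):
    """Return (a, b, c, d) from the contingency table of two 0/1 vectors."""
    a = sum(1 for u, v in zip(obj_i, obj_j) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(obj_i, obj_j) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(obj_i, obj_j) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(obj_i, obj_j) if u == 0 and v == 0)
    return a, b, c, d


def symmetric_dissimilarity(obj_i, obj_j):
    """(b + c) / (a + b + c + d): both outcomes carry equal weight."""
    a, b, c, d = binary_counts(obj_i, obj_j)
    return (b + c) / (a + b + c + d)


def asymmetric_dissimilarity(obj_i, obj_j):
    """(b + c) / (a + b + c): joint absences (d) are ignored."""
    a, b, c, _ = binary_counts(obj_i, obj_j)
    if a + b + c == 0:
        # Both objects are all zeros, so the ratio is undefined.
        raise ValueError("no attribute is 1 in either object")
    return (b + c) / (a + b + c)


# Hypothetical objects with six binary attributes each.
obj_i = [1, 0, 1, 0, 0, 0]
obj_j = [1, 0, 0, 0, 1, 0]
print(binary_counts(obj_i, obj_j))             # (1, 1, 1, 3)
print(symmetric_dissimilarity(obj_i, obj_j))   # 2/6
print(asymmetric_dissimilarity(obj_i, obj_j))  # 2/3
```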
ISSUES TO ADDRESS (continued)
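Variables/attributes of mixed types
$$ d(i,j) = \frac{\sum_{f=1}^{p} \delta_{ij}^{(f)} d_{ij}^{(f)}}{\sum_{f=1}^{p} \delta_{ij}^{(f)}} $$
+ If $x_{if}$ or $x_{jf}$ is missing, then $\delta_{ij}^{(f)} = 0$.
+ f (variable/attribute) binary (nominal): $d_{ij}^{(f)} = 0$ if $x_{if} = x_{jf}$, otherwise $d_{ij}^{(f)} = 1$.
+ f ordinal: use the normalized rank $\frac{r_{if} - 1}{M_f - 1}$ in place of $x_{if}$.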
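A minimal sketch of the mixed-type dissimilarity follows. It makes several assumptions that are not in the slides: missing values are encoded as `None`, the indicator is 1 whenever both values are present, only two attribute kinds (`"nominal"` and `"interval"`) are handled, and interval attributes contribute the absolute difference divided by a supplied range. All names (`mixed_dissimilarity`, `kinds`, `ranges`) and the sample objects are hypothetical.

```python
def mixed_dissimilarity(x_i, x_j, kinds, ranges):
    """Weighted average of per-attribute dissimilarities.

    kinds[f]  : "nominal" (also used for binary) or "interval"
    ranges[f] : max - min of attribute f, used only for interval attributes
    None      : marks a missing value, which sets the indicator to 0
    """
    numerator = 0.0
    denominator = 0.0
    for f, kind in enumerate(kinds):
        a, b = x_i[f], x_j[f]
        if a is None or b is None:
            continue                         # indicator = 0: skip attribute f
        if kind == "nominal":
            d_f = 0.0 if a == b else 1.0     # simple matching
        elif kind == "interval":
            d_f = abs(a - b) / ranges[f]     # assumed range normalization
        else:
            raise ValueError(f"unsupported attribute kind: {kind}")
        numerator += d_f                     # indicator = 1
        denominator += 1.0
    if denominator == 0:
        raise ValueError("no attribute is present in both objects")
    return numerator / denominator


# Hypothetical objects: one nominal and two interval attributes.
x_i = ("red", 10.0, None)
x_j = ("blue", 15.0, 3.0)
kinds = ["nominal", "interval", "interval"]
ranges = [None, 20.0, 5.0]
print(mixed_dissimilarity(x_i, x_j, kinds, ranges))  # (1 + 0.25) / 2 = 0.625
```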
ISSUES TO ADDRESS (continued)
$$ z_{if} = \frac{r_{if} - 1}{M_f - 1} $$
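The rank normalization can be sketched as below, assuming $r_{if}$ is the 1-based rank of object i's value among the $M_f$ ordered levels of attribute f. That reading of the symbols, the function name `normalized_ranks`, and the sample levels are assumptions for illustration.

```python
def normalized_ranks(values, ordered_levels):
    """Map each ordinal value to z = (r - 1) / (M - 1), a number in [0, 1].

    ordered_levels lists the M possible levels from lowest to highest.
    """
    m_f = len(ordered_levels)
    if m_f < 2:
        raise ValueError("at least two levels are needed")
    # Rank 1 for the lowest level, rank M for the highest.
    rank = {level: r for r, level in enumerate(ordered_levels, start=1)}
    return [(rank[v] - 1) / (m_f - 1) for v in values]


levels = ["low", "medium", "high"]
print(normalized_ranks(["low", "high", "medium"], levels))  # [0.0, 1.0, 0.5]
```

After this mapping, the values can be treated like an interval-scaled attribute.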