
VIETNAM NATIONAL UNIVERSITY, HANOI

UNIVERSITY OF ENGINEERING AND TECHNOLOGY

Vũ Ngọc Anh

APPLYING OLAP TECHNIQUES AND DATA WAREHOUSING
TO FINANCIAL FORECASTING

UNDERGRADUATE GRADUATION THESIS (FULL-TIME PROGRAM)

Major: Information Systems

HANOI - 2010

Supervisor: Dr. Nguyễn Hà Nam
Co-supervisor: MSc. Nguyễn Thu Trang

Acknowledgements
First of all, I would like to express my thanks and deep gratitude to Dr. Nguyễn Hà Nam and MSc. Nguyễn Thu Trang, who devotedly advised and guided me throughout the work on this graduation thesis.
I sincerely thank the lecturers who created favorable conditions for me to study and do research at the University of Engineering and Technology.
I thank my friends in the Data Warehouse and OLAP group, who discussed and exchanged ideas with me and helped me a great deal while I was collecting materials.
I am endlessly grateful to my family, friends, and loved ones, who stood by me and encouraged me throughout the work on this thesis.
Thank you very much!

Student
Vũ Ngọc Anh

Table of Contents
List of Figures
Abbreviations
Preface
Chapter 1. Introduction to data warehouses and financial data
1.1. Data in the financial domain
1.2. Data warehouse
1.2.1. Data warehouse
1.2.2. Purpose of a data warehouse
1.2.3. Benefits of a data warehouse
1.2.4. Components of a data warehouse
1.2.5. Structure of a data warehouse
1.2.6. Entity model in a data warehouse
1.2.7. Application areas of data warehouses
Chapter 2. OLAP analysis techniques
2.1. Introduction to OLAP
2.2. Multidimensional data model
2.3. Cube architecture of OLAP
2.4. Comparing OLAP and OLTP
2.5. Components of OLAP
2.6. Converting data from OLTP to OLAP
2.7. Storage models supporting OLAP
2.7.1. Multidimensional OLAP (MOLAP) model
2.7.2. Relational OLAP (ROLAP) model
2.7.3. Hybrid OLAP (HOLAP) model
2.7.4. Comparison of the models
Chapter 3. The Pentaho tool suite
3.1. Overview
3.2. BI capabilities of Pentaho
3.3. Features and benefits
Chapter 4. The problem implemented on Pentaho and the results achieved
4.1. Problem description
4.2. Data collection and processing
4.3. Creating the data warehouse
4.4. Processing data with OLAP techniques
4.4.1. Creating the cube
4.4.2. Analysis View
Conclusion
References

List of Figures

Figure 1. Components of a data warehouse
Figure 2. Star schema
Figure 3. Snowflake schema
Figure 4. Fact constellation schema
Figure 5. Illustration of the dimensions in a business
Figure 6. MOLAP data model
Figure 7. ROLAP data model
Figure 8. HOLAP data model
Figure 9. Pentaho architecture
Figure 10. Exchange rate data
Figure 11. Gold price data
Figure 12. Oil price data
Figure 13. VnIndex data
Figure 14. Combined data
Figure 15. Data warehouse model
Figure 16. Spoon workspace
Figure 17. Data input in Spoon
Figure 18. Combination Lookup/Update
Figure 19. Changing attributes
Figure 20. Database connection
Figure 21. Creating the Dim_time table
Figure 22. Creating the dim_factor table
Figure 23. Creating the Table Output
Figure 24. Creating the fact_price table
Figure 25. Loading data
Figure 26. Database connection
Figure 27. Cube architecture
Figure 28. Repository Login
Figure 29. Database connection
Figure 30. Pentaho workspace
Figure 31. Selecting the schema and cube
Figure 32. Schema and cube data
Figure 33. Analysis content
Figure 34. Selecting Measures
Figure 35. Selecting the factor
Figure 36. Selecting the year to analyze
Figure 37. Selecting day and month detail
Figure 38. Selecting the chart type
Figure 39. USD/VND exchange rate chart
Figure 40. Gold price chart
Figure 41. Oil price chart
Figure 42. VnIndex chart
Figure 43. Gold price and oil price chart
Figure 44. Exchange rate and gold price chart
Figure 45. Gold price and VNIndex chart

Abbreviations
OLAP     Online Analytical Processing
MOLAP    Multidimensional Online Analytical Processing
ROLAP    Relational Online Analytical Processing
HOLAP    Hybrid Online Analytical Processing
BI       Business Intelligence
OLTP     Online Transaction Processing

Preface
As information technology is applied ever more widely in almost every area of life, the economy, and society, the volume of data collected over time keeps growing. An essential requirement for enterprises is therefore to exploit this data effectively so that it serves the business better and better.
This thesis, titled "Applying OLAP Techniques and Data Warehousing to Financial Forecasting", introduces data warehouses and the OLAP method and applies them to analyzing movements in the oil price, the gold price, and the VNIndex using the Pentaho tool suite.
The thesis consists of four chapters:
Chapter 1. Introduction to data warehouses and financial data: describes the characteristics of financial data and gives an overview of data warehouses, their structure, their components, how they are designed, and where they are applied.
Chapter 2. OLAP analysis techniques: gives an overview of OLAP, the storage models that support it, the advantages and disadvantages of each model, and the steps for moving data from OLTP to OLAP.
Chapter 3. The Pentaho tool suite: gives an overview of Pentaho, its architecture, its technology, and its utilities.
Chapter 4. The problem implemented on Pentaho and the results achieved: deploys Pentaho on a real problem, applying data warehouse and OLAP techniques to solve it.
The conclusion summarizes the main results and contributions of the thesis.

Chapter 1. Introduction to data warehouses and financial data

1.1. Data in the financial domain
Because it computes accurately, quickly, and objectively, information technology was adopted quite widely in finance from very early on.
Data in the financial domain has the following characteristics:
- It changes constantly.
- It is distributed.
- Transactions overlap.
- The number of transactions is large.
An effective data storage strategy is therefore needed. Systems that meet the characteristics above belong to the group of online transaction processing (OLTP) systems [4].
OLTP applications let users access information directly in a client/server arrangement. OLTP consists of a sequence of operations: gathering the input data, processing the data, and updating the existing data with the newly entered and processed data.
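The gather, process, update sequence described above can be sketched as a single transaction. This is only a minimal illustration using Python's built-in sqlite3 module; the `stock` table, the `record_sale` function, and the sample product are hypothetical names chosen for the example and do not come from the thesis.

```python
import sqlite3

def record_sale(conn, product, quantity):
    """Gather one sale, process it, and update the stored stock level."""
    # Gathering: accept and validate the input data.
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    # "with conn" commits on success and rolls back if an exception is raised.
    with conn:
        row = conn.execute(
            "SELECT on_hand FROM stock WHERE product = ?", (product,)
        ).fetchone()
        # Processing: check the request against the current data.
        if row is None or row[0] < quantity:
            raise ValueError("insufficient stock")
        remaining = row[0] - quantity
        # Updating: replace the old value with the newly computed one.
        conn.execute(
            "UPDATE stock SET on_hand = ? WHERE product = ?", (remaining, product)
        )
    return remaining

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (product TEXT PRIMARY KEY, on_hand INTEGER)")
conn.execute("INSERT INTO stock VALUES ('gold_bar', 10)")
conn.commit()

remaining = record_sale(conn, "gold_bar", 3)
print(remaining)
```

Each call touches a single row identified by its key and finishes quickly, which is the access pattern OLTP systems are built for.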
OLTP is an effective approach when users want to:
- Process incoming data whose volume and frequency cannot be estimated in advance.
- Get immediate access to up-to-date data that reflects earlier transactions.
- Change data immediately to reflect the transaction just processed.
The basic functions of OLTP [4]: in addition to accessing and updating shared data, OLTP systems give users online access, immediate availability, fast response, and a low cost per transaction.
Simple business questions such as "What is this month's revenue?" or "How many products were sold this month?" concern individual products and detailed figures, and an OLTP system answers them quickly. Senior managers in an enterprise, however, do not need data at that level of detail. They want information that supports planning and leadership, for example: "This item is selling well in one region; will it sell well in other regions too?" Answering such questions on an OLTP system is difficult and inefficient because OLTP data is too detailed and is stored in a distributed fashion. Data warehouse systems emerged to solve this problem, together with OLAP and data mining techniques, and help senior managers answer the questions they ask.

1.2. Data warehouse

1.2.1. Data warehouse

A data warehouse is a computer-based collection of information that is critical to the successful execution of business initiatives [1].
A data warehouse, more precisely called an information warehouse, is a subject-oriented database designed to be accessible across every area of the business. It provides tools that meet the information needs of business managers at every level of the organization: not only complex data requests, but also the most convenient way to obtain information quickly and accurately. A data warehouse is designed so that users can recognize the information they want and access it with simple tools [9].
A data warehouse is a blend of many technologies, including multidimensional and relational databases, client/server architecture, graphical user interfaces, and more. Unlike data in operational systems, data in a data warehouse is read-only: it can be read but not modified. Operational systems create, modify, and delete the production data that feeds the data warehouse. The main reason for building a data warehouse is to integrate data from many different sources into a single, dense repository that supports analysis and decision making in the business.
For businesses in which information is a very valuable resource, a data warehouse is much like a goods warehouse. Operational systems produce pieces of data and load them into the warehouse. Some pieces are summarized into information components and stored there. Users of the data warehouse submit requests and are supplied with products assembled from the components and segments kept in the warehouse.
A data warehouse that is properly scoped and runs efficiently can become a highly valuable competitive tool in business.
1.2.2. Purpose of a data warehouse
A data warehouse aims to achieve the following goals:
- Be able to satisfy every information request from users.
- Help the organization's staff do their work well and efficiently.
- Help organizations define, manage, and run projects and business operations effectively and accurately.
- Integrate data and metadata from many different sources.
To achieve these goals, a data warehouse must:
- Improve data quality by cleansing the data and orienting it toward specific subjects.
- Aggregate and link data.
- Synchronize the data sources.
- Delineate and unify the operational database systems.
- Manage metadata.
- Provide information that is integrated, summarized or linked, and organized by subject.
- Be usable in decision support systems.

1.2.3. Benefits of a data warehouse

A major impact on decision making. A data warehouse reduces the human and computing resources needed to serve queries and reports against operational and production databases, which yields significant savings. It also relieves production systems of the drain on scarce resources caused by long-running programs or complex reports and queries.
Smarter business. The quality and flexibility of business analysis increase thanks to the multi-level data structure of the warehouse, which supplies data organized from the most detailed level of the business up to higher, more summarized levels. Data is accurate and reliable because the warehouse holds only high-quality, stable data (trusted data).
Better customer service. An enterprise can maintain better customer relationships because a single data warehouse correlates the data of all its customers.
Business process reengineering. The ability to analyze business information continuously often gives insight into every aspect of how the business operates, which can in turn generate ideas for redesigning those processes. Only when the requirements placed on the data warehouse are defined precisely can limitations and business goals be assessed more accurately.
Information system reengineering. A data warehouse built on the data requirements of every business area provides a cost-effective means of establishing both data standardization and standardized operation of the operational systems, in line with international standards.
1.2.4. Components of a data warehouse
Current detail
The heart of a data warehouse is its current detail, where most of the data is stored. Current detail comes directly from the operational systems and may be stored as raw data or as an aggregation of raw data.

[Diagram labels: Source, Relational Data Store, Data Marts and Cubes, Clients]

Figure 1. Components of a data warehouse

Current detail is the lowest level of data granularity in the data warehouse. Every data entity in current detail is a snapshot taken at a moment in time, representing the instant at which the data is accurate. Current detail typically spans two to five years and is refreshed as often as business requirements demand.
System of record
A system of record is the source of the best, or "rightest", data used to feed the data warehouse. The rightest data is the most timely, most complete, and most accurate data, and the data whose structure conforms best to the data warehouse. It is usually the data closest to the point of entry in the production environment. In other cases, a system of record may be one that holds already summarized data.
1.2.5. Structure of a data warehouse
A data warehouse may have several of the following structural parts:
Physical data warehouse
The physical database in which all of the warehouse data is stored, together with the metadata and the processing logic for filtering, organizing, and packaging the data and for processing detailed data.
Logical data warehouse
This also contains metadata, including business rules and the processing logic for filtering, organizing, packaging, and processing data, but it does not contain the actual data. Instead, it holds the information needed to access the data wherever it resides.
Data mart
A subset of an enterprise-wide data warehouse. It typically serves one major element of the enterprise (a department, a region, a function, and so on). In short, data marts are specialized parts of the data warehouse.
1.2.6. Entity model in a data warehouse
The entity-relationship (ER) model is widely used for OLTP database design. However, the ER model is not suitable for designing a data warehouse, because queries have to touch too many different tables. Most data warehouses use a star schema. This model consists of a single fact table and one dimension table for each dimension. The fact table contains foreign key columns that reference the primary keys of the dimension tables. An example of a star schema:

Fact table: OrderNo, SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalPrice
Orders: OrderNo, OrderDate
Customers: CustomerNo, CustomerName, CustomerAddress, City
Salespersons: SalespersonID, SalespersonName, City, Quota
Products: ProdNo, ProdName, ProdDescr, Category, CategoryDescr, UnitPrice, QOH
Date: DateKey, Date, Month, Year
City: CityName, State, Country

Figure 2. Star schema
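To make the star schema concrete, the following sketch builds a reduced version of Figure 2 with Python's built-in sqlite3 module and runs one analytical query against it. Only the Products and Date dimensions are kept; the table names `products`, `date_dim`, and `fact_sales` and all sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (ProdNo INTEGER PRIMARY KEY, ProdName TEXT, Category TEXT);
CREATE TABLE date_dim (DateKey INTEGER PRIMARY KEY, Month INTEGER, Year INTEGER);
CREATE TABLE fact_sales (
    OrderNo    INTEGER,
    ProdNo     INTEGER REFERENCES products(ProdNo),
    DateKey    INTEGER REFERENCES date_dim(DateKey),
    Quantity   INTEGER,
    TotalPrice REAL
);
INSERT INTO products VALUES
    (1, 'Ring', 'Jewellery'), (2, 'Necklace', 'Jewellery'), (3, 'Lamp', 'Home');
INSERT INTO date_dim VALUES (20090115, 1, 2009), (20100220, 2, 2010);
INSERT INTO fact_sales VALUES
    (100, 1, 20090115, 2, 40.0),
    (101, 2, 20090115, 1, 55.0),
    (102, 3, 20100220, 4, 80.0),
    (103, 1, 20100220, 3, 60.0);
""")

# One join per dimension: the fact table sits at the center of the star.
rows = conn.execute("""
    SELECT p.Category, d.Year, SUM(f.Quantity), SUM(f.TotalPrice)
    FROM fact_sales AS f
    JOIN products AS p ON p.ProdNo = f.ProdNo
    JOIN date_dim AS d ON d.DateKey = f.DateKey
    GROUP BY p.Category, d.Year
    ORDER BY p.Category, d.Year
""").fetchall()

for row in rows:
    print(row)
```

Every analytical query follows the same shape: join the fact table to the dimensions of interest, then group by dimension attributes.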


The star schema does not handle tables with hierarchical attributes well. The snowflake schema offers a solution for star schemas whose tables contain hierarchical attributes.

Fact table: OrderNo, SalespersonID, CustomerNo, DateKey, CityName, ProdNo, Quantity, TotalPrice
Orders: OrderNo, OrderDate
Customers: CustomerNo, CustomerName, CustomerAddress, City
Salesperson: SalespersonID, SalespersonName, City, Quota
Products: ProdNo, ProdName, ProdDescr, Category, UnitPrice, QOH
Category: CategoryName, CategoryDescr
Date: DateKey, Date, Month
Month: Month, Year
City: CityName, State
Hierarchy levels split into their own tables: Products -> Category; Date -> Month -> Year; City -> State

Figure 3. Snowflake schema
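The effect of moving a hierarchy level into its own table can be sketched as follows, again with sqlite3. Only the Products -> Category branch of Figure 3 is modeled; the lowercase table names and the sample rows are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (CategoryName TEXT PRIMARY KEY, CategoryDescr TEXT);
CREATE TABLE products (
    ProdNo   INTEGER PRIMARY KEY,
    ProdName TEXT,
    Category TEXT REFERENCES category(CategoryName)
);
INSERT INTO category VALUES
    ('Jewellery', 'Rings and necklaces'), ('Home', 'Household items');
INSERT INTO products VALUES
    (1, 'Ring', 'Jewellery'), (2, 'Necklace', 'Jewellery'), (3, 'Lamp', 'Home');
""")

# Maintenance touches one row, however many products share the category.
cursor = conn.execute(
    "UPDATE category SET CategoryDescr = 'Fine jewellery' "
    "WHERE CategoryName = 'Jewellery'"
)
updated_rows = cursor.rowcount

# Browsing the dimension now costs an extra join compared with the star schema.
rows = conn.execute("""
    SELECT p.ProdName, c.CategoryDescr
    FROM products AS p
    JOIN category AS c ON c.CategoryName = p.Category
    ORDER BY p.ProdNo
""").fetchall()
print(updated_rows, rows)
```

In the star schema of Figure 2, the same description change would have to be applied to every product row that carries CategoryDescr.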


Normalizing the dimension tables in this way makes them easier to maintain. However, the denormalized dimension tables of a star schema may be more convenient for browsing the dimensions.
A fact constellation is an example of a more complex structure that has more than one fact table. Any star schema can be built out into a fact constellation, for example by splitting the original star schema into several star schemas, each of which describes the facts at a different level of the dimension hierarchies. A fact constellation architecture contains multiple fact tables that share dimension tables.

Figure 4. Fact constellation schema


1.2.7. Application areas of data warehouses
Areas in which data warehouses are currently applied include:
- E-commerce.
- Enterprise resource planning.
- Customer relationship management.
- Health care.
- Telecommunications.

Chapter 2. OLAP analysis techniques

2.1. Introduction to OLAP

OLAP is a technique that uses multidimensional representations of data, called cubes, to provide fast access to the data in a data warehouse. According to Hari Mailvaganam [5], cubes are built over the data in the dimension tables and fact tables of the warehouse and give client applications the ability to run sophisticated queries and analyses.
While data warehouses and data marts store data for analysis, OLAP is the technique that lets client applications access this data efficiently. OLAP offers analysts many benefits, for example:
- An intuitive multidimensional data model that makes it easy to select, navigate, and explore data.
- An analytical query language with the power to explore relationships in complex business data.
- Pre-computed data for frequent queries, so that response times are very short even for ad hoc queries.
- Powerful tools that help users create new views of the data based on a set of special calculation functions.
OLAP was devised to handle queries over very large volumes of data, queries that would either return no result or take a very long time if run on an OLTP system.

2.2. Multidimensional data model

Business managers tend to think "multidimensionally". For example, they tend to describe what their company does as follows:
"We sell products in various markets, and we measure our performance over time."
Data warehouse designers listen carefully to these words and add their own special emphasis:
"We sell PRODUCTS in various MARKETS, and we measure our performance over TIME."
Intuitively, the business can be pictured as a cube of data with a label on each edge (see the figure below). The points inside the cube are the intersections of the edges. For the business description above, the edges of the cube are Product, Market, and Time. Most people can quickly understand and imagine that the points inside the cube are measures of business performance for particular combinations of Product, Market, and Time values [5].
[Cube axes: Product, Time, Market]

Figure 5. Illustration of the dimensions in a business

A data cube does not have to be three-dimensional (3-D); in general it can have N dimensions (N-D). The edges of the cube are called dimensions: they are the perspectives or entities corresponding to the aspects that the organization wants to record. Each dimension can be associated with a dimension table that describes it. For example, a dimension table for Product may contain attributes such as Ma_sanpham (product code), Mo_ta (description), Ten_sanpham (product name), and Loai_SP (product type), which can be specified by administrators or data analysts. For non-categorical dimensions such as Time, the data warehouse system should be able to generate the corresponding dimension table automatically, based on the data type. It is worth adding that the Time dimension has special significance for decision support in trend analysis. It is often desirable for the system to have some built-in knowledge of calendars and other aspects of the time dimension.
Furthermore, a data cube in a data warehouse is mostly built to measure the company's performance. A multidimensional data model is therefore typically organized around a subject that is represented by a fact table of numerical measures (the objects of analysis). For example, a fact table may contain the number of items sold, revenue, inventory, budget, and so on. Each numerical measure depends on a set of dimensions that provide its context. The dimensions taken together are therefore regarded as uniquely determining the measure, which is a value in the multidimensional space. For example, one combination of Product, Time, and Market at a given moment identifies a single measure value, distinct from those of other combinations.
Dimensions are hierarchical by nature. For example, the Time dimension can be described by the attributes Year, Quarter, Month, and Day. Alternatively, the attributes of a dimension can be organized into a lattice that indicates a partial order on the dimension. The same Time dimension can thus be organized into Year, Quarter, Month, Week, and Day. With this arrangement the Time dimension is no longer a strict hierarchy, because some weeks of the year belong to more than one month.
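The claim that a week can belong to several months is easy to check. The sketch below uses the ISO week definition from Python's standard datetime module; the choice of ISO weeks and the helper name `months_in_iso_week` are assumptions for the illustration, since the text does not say how weeks are numbered.

```python
from datetime import date, timedelta

def months_in_iso_week(year, week):
    """Return the sorted month numbers covered by one ISO week."""
    monday = date.fromisocalendar(year, week, 1)
    return sorted({(monday + timedelta(days=i)).month for i in range(7)})

# ISO week 13 of 2010 runs from Monday 29 March to Sunday 4 April.
print(months_in_iso_week(2010, 13))
# ISO week 2 of 2010 lies entirely within January.
print(months_in_iso_week(2010, 2))
```

Because Week does not nest inside Month, totals by week cannot be rolled up into totals by month; both levels roll up from Day instead.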
Thus, if each dimension contains several levels of abstraction, the data can be viewed from many flexible perspectives. Typical data cube operations such as roll-up (increasing the level of abstraction), drill-down (decreasing the level of abstraction, that is, increasing the level of detail), slice and dice (selection and projection), and pivot (re-orienting the multidimensional view of the data) make interactive querying and analysis of the data very convenient. These operations are known as Online Analytical Processing (OLAP).
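The operations named above can be illustrated on a tiny in-memory cube. This is a minimal sketch in plain Python, not how Pentaho or any OLAP engine implements them; the dimension names, the sample cells, and the function names are all assumptions for the example.

```python
from collections import defaultdict

DIMS = ("product", "market", "quarter")

# Each cell is addressed by one member per dimension and holds one measure.
cube = {
    ("gold", "North", "2009-Q1"): 10,
    ("gold", "North", "2009-Q2"): 12,
    ("gold", "South", "2009-Q1"): 7,
    ("oil", "North", "2009-Q1"): 5,
    ("oil", "South", "2009-Q2"): 9,
}

def roll_up(cells, keep):
    """Sum the measure over every dimension not listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[tuple(key[i] for i in idx)] += value
    return dict(totals)

def slice_cube(cells, dim, member):
    """Fix one dimension to a single member."""
    i = DIMS.index(dim)
    return {k: v for k, v in cells.items() if k[i] == member}

def dice(cells, **allowed):
    """Keep cells whose members fall inside the given sets."""
    return {k: v for k, v in cells.items()
            if all(k[DIMS.index(d)] in members for d, members in allowed.items())}

def pivot(cells, row_dim, col_dim):
    """Lay out two dimensions as rows and columns of a table."""
    table = defaultdict(dict)
    for (row, col), value in roll_up(cells, [row_dim, col_dim]).items():
        table[row][col] = value
    return dict(table)

by_product = roll_up(cube, ["product"])                      # roll-up
by_product_market = roll_up(cube, ["product", "market"])     # drill-down from by_product
q1 = slice_cube(cube, "quarter", "2009-Q1")                  # slice
gold_north = dice(cube, product={"gold"}, market={"North"})  # dice
market_by_product = pivot(cube, "market", "product")         # pivot
print(by_product, market_by_product)
```

Drill-down is simply a roll-up that keeps more dimensions, which is why the same function serves both directions here.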
Decision makers often ask questions of the form "compute and rank the total quantity of goods sold by country (or by year)". They also want to compare two numerical measures, such as sales quantity and budget, aggregated over the same dimensions. A distinguishing feature of the multidimensional data model is therefore its emphasis on aggregating measures over one or more dimensions, which is one of the key operations for speeding up query processing.
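The kind of question quoted above, ranking total sales by country and comparing two measures over the same dimension, can be sketched in a few lines. The sample records, the country codes, and the `aggregate` helper are hypothetical and serve only to illustrate the aggregation.

```python
sales = [
    {"country": "VN", "year": 2009, "quantity": 120, "budget": 100},
    {"country": "VN", "year": 2010, "quantity": 80, "budget": 90},
    {"country": "US", "year": 2009, "quantity": 150, "budget": 160},
    {"country": "US", "year": 2010, "quantity": 90, "budget": 100},
    {"country": "JP", "year": 2010, "quantity": 60, "budget": 70},
]

def aggregate(rows, dim, measures):
    """Sum each measure over all rows that share the same member of dim."""
    totals = {}
    for row in rows:
        acc = totals.setdefault(row[dim], {m: 0 for m in measures})
        for m in measures:
            acc[m] += row[m]
    return totals

# Two measures aggregated over the same dimension, so they can be compared.
by_country = aggregate(sales, "country", ["quantity", "budget"])
# Countries ranked by total quantity sold, highest first.
ranking = sorted(by_country, key=lambda c: by_country[c]["quantity"], reverse=True)
# The same measures aggregated over a different dimension.
by_year = aggregate(sales, "year", ["quantity", "budget"])
print(ranking, by_country, by_year)
```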


2.3. Cube architecture of OLAP

The central object of OLAP is the cube, a multidimensional representation of detailed and summarized data. A cube consists of a data source, dimensions, measures, and partitions. Cubes are designed according to the users' analysis requirements. One data warehouse can support many different cubes, such as a Sales cube, an Inventory cube, and so on.
The data source of a cube identifies the data warehouse that supplies the cube's data.
Dimensions are mapped from the information in the warehouse's dimension tables onto hierarchy levels; for example, a Geography dimension includes levels such as Continent, Country, and Province or City. Dimensions can be created independently and shared among cubes, which makes cubes easier to build and ensures that the summarized information used for analysis stays consistent. For example, if a shared dimension holds a product hierarchy and is used in all cubes, the organization of summarized product information will be consistent across the cubes that use that dimension.
A virtual dimension is a special kind of dimension that maps the properties of the members of another dimension so that they can then be used in cubes. For example, a virtual dimension on the product size property lets a cube summarize data such as the number of products sold by size, or the number of shirts sold by style and by size. Virtual dimensions and member properties are evaluated on demand at query time and require no physical cube storage.
Measures identify the numerical values from the fact table that are summarized for analysis, such as sales price, cost, or quantity sold.
Partitions are the multidimensional storage containers that hold the cube's data. Every cube contains at least one partition, and a cube's data can be combined from several partitions. Each partition can draw its data from a different data source and can be stored in a separate location. The data of one partition can be updated independently of the other partitions in the cube. For example, a cube's data can be divided by time, with one partition holding the current year's data, another holding the previous year's data, and a third holding all data from earlier years.
The partitions of a cube can be stored independently, in different ways and with different levels of summarization. Partitions are invisible to users, for whom a cube is a single object, and they provide a variety of options for managing OLAP data.
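The time-based partitioning example above can be sketched as follows. The `PartitionedCube` class, the partition names, and the sample rows are hypothetical; the sketch only shows that one partition can be refreshed without touching the others while queries still see a single cube.

```python
class PartitionedCube:
    """A cube whose rows live in independently refreshable partitions."""

    def __init__(self):
        self.partitions = {}

    def load_partition(self, name, rows):
        """Replace the contents of one partition; the others are untouched."""
        self.partitions[name] = list(rows)

    def total(self, measure):
        """Answer a query across all partitions, as if the cube were one object."""
        return sum(row[measure]
                   for rows in self.partitions.values()
                   for row in rows)

cube = PartitionedCube()
cube.load_partition("current_year", [{"amount": 10}, {"amount": 5}])
cube.load_partition("previous_year", [{"amount": 20}])
cube.load_partition("older_years", [{"amount": 7}])
before = cube.total("amount")

# Only the current year changes day to day, so only it is reloaded.
cube.load_partition("current_year", [{"amount": 10}, {"amount": 5}, {"amount": 8}])
after = cube.total("amount")
print(before, after)
```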
A virtual cube is a logical view of portions of one or more cubes. A virtual cube can be used to join different cubes that share a common dimension; for example, a Sales cube and an Inventory cube can be joined for particular analysis purposes while the cubes themselves are kept separate for simplicity. Dimensions and measures can be chosen from the joined cubes for presentation in the virtual cube.
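A virtual cube over a Sales cube and an Inventory cube can be pictured with the sketch below. The two dictionaries, the shared (product, year) key, and the `virtual_cube` function are assumptions for the illustration; a real OLAP server defines virtual cubes declaratively rather than in application code.

```python
# Two separate cubes that share the dimensions (product, year).
sales_cube = {("gold", "2009"): 29, ("oil", "2009"): 14}
inventory_cube = {("gold", "2009"): 40, ("oil", "2009"): 20, ("silver", "2009"): 5}

def virtual_cube(**cubes):
    """Build a logical view exposing one measure per source cube
    for every cell address the cubes have in common."""
    shared = set.intersection(*(set(cube) for cube in cubes.values()))
    return {key: {name: cube[key] for name, cube in cubes.items()}
            for key in sorted(shared)}

view = virtual_cube(units_sold=sales_cube, units_in_stock=inventory_cube)
print(view)
```

The source cubes are left unchanged; the view is derived from them each time it is requested.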

2.4. Comparing OLAP and OLTP

OLTP (Online Transaction Processing) applications typically automate the clerical data processing tasks of an organization, such as order entry and banking transactions (the day-to-day operations of a commercial organization), which need to read or update a few records accessed by their primary keys [5]. These tasks are structured and repetitive and consist of short, atomic, isolated transactions that require detailed, up-to-date data. Operational databases tend to range from a few hundred megabytes to gigabytes in size and store only current data. Consistency and recoverability of the database are critical, and maximizing transaction throughput is the key metric. The database is therefore designed to minimize concurrency conflicts.
A data warehouse, by contrast, is intended to support decision making by managers. The detail of individual records matters less than historical, summarized, and consolidated data. A data warehouse therefore usually holds consolidated data from one or more operational databases, collected over a long period. As a result, data warehouses tend to be much larger than operational databases, ranging from a few hundred gigabytes to terabytes. A data warehouse must support complex queries with short response times; such queries may access millions of records and perform many scans, joins, and aggregations. For a data warehouse, query throughput and response time matter more than transaction throughput. OLAP is one of the tools that allow these queries to be executed efficiently.
Since operational databases are built to support OLTP workloads well, trying to run complex OLAP queries against them results in unacceptable performance.

2.5. Components of OLAP

The components that OLAP uses to deliver its services include:
- Data sources: OLTP databases and other valid data sources containing data that can be converted into OLAP data in the warehouse.
- Staging area: where the collected data is stored and processed, then sorted, filtered, and converted into useful OLAP data.
- Warehouse servers: the computers that run the relational databases holding the warehouse data, and the servers that manage the OLAP data.
- Business intelligence applications: the tools and applications that query OLAP data and provide reports and information to the enterprise's decision makers.
- Metadata: objects such as the tables in OLTP databases, the cubes in the data warehouse, and the records that applications use to reference the various pieces of data.
2.6. Converting data from OLTP to OLAP

Converting OLTP data into OLAP data in the data warehouse is carried out through the following processes:
- Consolidating data: all data relating to a specific item (a product, a customer, or an employee) must be able to be merged from multiple OLTP systems into a single OLAP system. The consolidation process must resolve differences in encoding between the OLTP systems, match data that is common to both systems (for example by comparing similar fields), and convert data stored under different data types in each OLTP system into the single data type used in the OLAP system. The systems that supply input data to an OLAP system do not have to be traditional OLTP systems; the data may be stored in many valid forms, such as Microsoft Excel records in a shared file.
- Scrubbing data: merging OLTP data into a data warehouse provides an opportunity to scrub the data. Some OLTP systems spell items differently, and the merge process may expose spelling errors. These inconsistencies must be corrected before the data can be loaded into the warehouse that serves the OLAP system.
- Aggregating data: OLTP data records every detail of each transaction. OLAP queries only the summarized data that is needed, or data aggregated according to certain rules. For example, a query for the total monthly revenue of each product over the previous year runs faster if the database holds only rows that summarize each product's revenue by day (or by hour) than if it has to scan every detailed record for the whole year. The degree of aggregation in the warehouse depends on a number of design factors, such as the required query speed and the level of detail needed for analysis.
- Organizing data: when OLTP data is moved into the warehouse, it must be transformed into an arrangement better suited to analysis, so that decisions can be made with less time wasted. Setting up the warehouse includes reorganizing the OLTP data, stored in relational tables, into OLAP data stored in multidimensional cubes. The data is then loaded into the warehouse.
- Accessing and analyzing data: once the data has been loaded into the warehouse, OLAP provides the ability to access, view, and analyze it with great flexibility and efficiency. OLAP presents data through a natural, intuitive data model that helps users see and understand the information in the warehouse as well as possible, and so lets them recognize the value of the data.
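The consolidating, scrubbing, and aggregating steps listed above can be sketched as a small pipeline. Everything in the example is hypothetical: the two source layouts (`pos_rows`, `web_rows`), the field names, the spelling table, and the sample values were invented to show the idea and are not taken from the thesis data.

```python
from collections import defaultdict

# Two hypothetical OLTP sources that encode the same facts differently.
pos_rows = [
    {"prod": "GOLD ", "day": "2009-01-05", "amount": "100.5"},
    {"prod": "gold", "day": "2009-01-20", "amount": "50"},
]
web_rows = [
    {"product_name": "Gould", "date": "2009/02/03", "total": 25.0},
    {"product_name": "Oil", "date": "2009/01/09", "total": 10.0},
]

# Known misspellings and their corrected form (assumed for the example).
SPELLING_FIXES = {"gould": "gold"}

def consolidate():
    """Map both sources onto one record layout and one type per field."""
    for r in pos_rows:
        yield {"product": r["prod"], "date": r["day"], "amount": float(r["amount"])}
    for r in web_rows:
        yield {"product": r["product_name"],
               "date": r["date"].replace("/", "-"),
               "amount": float(r["total"])}

def scrub(rows):
    """Normalize case and whitespace, then correct known misspellings."""
    for r in rows:
        name = r["product"].strip().lower()
        yield {**r, "product": SPELLING_FIXES.get(name, name)}

def aggregate_monthly(rows):
    """Keep only the monthly total per product instead of every transaction."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["product"], r["date"][:7])] += r["amount"]
    return dict(totals)

monthly = aggregate_monthly(scrub(consolidate()))
print(monthly)
```

The result is keyed by (product, month), which is already the shape of a two-dimensional cube ready to be loaded and queried.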

2.7. Storage models supporting OLAP

The OLAP service supports several data storage models. Each has its own
advantages and drawbacks, and the choice depends on how the data will be used.
2.7.1. Multidimensional OLAP (MOLAP) model
The multidimensional OLAP (MOLAP) model stores base data (data from the tables
of the data warehouse or data mart) and aggregated information (measures
computed from those tables) in multidimensional structures called cubes. These
structures are stored outside the data mart or data warehouse database.

Figure 6. MOLAP data model (data in the OLAP environment: MySQL, Oracle, and
other sources feeding the MOLAP data store)


Storing cubes in a MOLAP structure is best for frequent aggregate queries that
need a fast response time, for example the total number of products sold in all
regions by quarter.
Advantages of the MOLAP model:
- Fast execution: MOLAP cubes retrieve data quickly and are optimized for query
operations [15].
- Complex calculations are possible: all calculations are generated in advance,
when the cube is created [15].
Disadvantages of the MOLAP model:
- Limited data volume: because all calculations are performed when the cube is
built, the cube itself cannot hold a large amount of data. This does not mean
that a cube cannot be derived from a large amount of data; it can, but the cube
itself then contains only summary information [15].
- Additional investment: cube technology is often proprietary and usually does
not already exist in the organization, so adopting MOLAP requires extra
investment in capital and staff [15].
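
To make the idea of pre-computed aggregates concrete, the following is a minimal Python sketch of a MOLAP-style cube held in a dictionary. It is an illustration only, not how any particular OLAP server stores its cubes; the dimensions (region, quarter), the ALL marker, and the sample figures are assumptions made for the example.

```python
from collections import defaultdict
from itertools import product

# Hypothetical base facts: (region, quarter, units_sold).
facts = [
    ("North", "Q1", 120),
    ("North", "Q2", 80),
    ("South", "Q1", 50),
    ("South", "Q1", 30),
]

ALL = "*"  # marker meaning "aggregated over this dimension"

def build_cube(rows):
    """Pre-compute the sum of units for every (region, quarter) combination,
    including the roll-ups where a dimension is replaced by ALL."""
    cube = defaultdict(int)
    for region, quarter, units in rows:
        for r, q in product((region, ALL), (quarter, ALL)):
            cube[(r, q)] += units
    return dict(cube)

cube = build_cube(facts)

# Queries are now dictionary lookups; no scan of the base facts is needed.
print(cube[("South", "Q1")])  # units sold in the South in Q1
print(cube[(ALL, "Q1")])      # units sold in all regions in Q1
print(cube[(ALL, ALL)])       # grand total
```

The sketch also shows the cost named in the text: every combination is computed and stored when the cube is built, so the cube grows quickly as dimensions are added.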

2.7.2. Relational OLAP (ROLAP) model

The relational OLAP (ROLAP) model stores base data and aggregated information
in relational tables. These tables are stored in the same database as the
tables of the data mart or data warehouse.

Figure 7. ROLAP data model

Storing cubes in a ROLAP structure is best for data that is queried
infrequently. For example, if 80% of users query only data from the past year,
data older than one year can be moved into a ROLAP structure to reduce the disk
space used and to avoid duplicated data.
Advantages of the ROLAP model:
- Can handle large amounts of data: the size limit of ROLAP is the size limit
of the underlying database. In other words, ROLAP technology itself places no
limit on the amount of data [15].
- Can use the existing functionality of the relational database: relational
databases usually come with a wide range of features, and ROLAP can take
advantage of them, which saves cost [15].
Disadvantages of the ROLAP model:
- Low performance: each ROLAP report usually gathers data from many different
tables, so performance is poor when the data is large and scattered [15].
- Limited by SQL functionality: ROLAP relies mainly on generating SQL
statements to query the database, and in some cases reports based on SQL
queries do not give the desired result. Developers work around this by building
additional tools that let users define their own functions [15].

2.7.3. Hybrid OLAP (HOLAP) model

The hybrid OLAP (HOLAP) model combines MOLAP and ROLAP.

Figure 8. HOLAP data model

Storing cubes in a HOLAP structure is best for frequent aggregate queries over
a large amount of base data. For example, quarterly and yearly sales data would
be stored in a MOLAP structure, and monthly, weekly, and daily data in a ROLAP
structure [15].
The benefits of storing data in a HOLAP structure are:
- Data is retrieved from the cube faster, using the high-speed query processing
of MOLAP.
- Less storage space is consumed than with MOLAP.
- Data duplication is avoided.
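
The split described in the example above (coarse periods in a cube, fine periods in relational tables) can be sketched as follows. This is a minimal Python illustration under assumed names: the sales table, the quarter_of and total_sales helpers, and the sample rows are hypothetical, and a real HOLAP server decides the routing internally.

```python
import sqlite3

# Relational side (ROLAP): hypothetical daily sales rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2009-01-15", 100.0), ("2009-02-10", 50.0), ("2009-04-01", 70.0)],
)

def quarter_of(date_text):
    """Map 'YYYY-MM-DD' to a quarter label such as '2009-Q1'."""
    year, month = date_text[:4], int(date_text[5:7])
    return f"{year}-Q{(month - 1) // 3 + 1}"

# Multidimensional side (MOLAP): quarterly and yearly totals computed once.
cube = {}
for sale_date, amount in conn.execute("SELECT sale_date, amount FROM sales"):
    for key in (quarter_of(sale_date), sale_date[:4]):
        cube[key] = cube.get(key, 0.0) + amount

def total_sales(period):
    """Answer coarse periods ('2009', '2009-Q1') from the cube and
    finer ones ('2009-01', '2009-01-15') from the relational table."""
    if period in cube:
        return cube[period]
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE sale_date LIKE ?",
        (period + "%",),
    ).fetchone()
    return row[0]

print(total_sales("2009-Q1"), total_sales("2009"), total_sales("2009-02"))
```

Only the quarterly and yearly totals are duplicated in the cube; the detailed rows stay in the relational table, which is the storage saving the text attributes to HOLAP.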
2.7.4. Comparison of the models

The following table summarizes the three storage models that support OLAP:

                        MOLAP     ROLAP             HOLAP
Base data storage       Cube      Relational table  Relational table
Aggregate storage       Cube      Relational table  Cube
Query performance       Fastest   Slowest           Fast
Storage consumption     High      Low               Medium
Maintenance cost        High      Low               Medium

Chapter 3. The Pentaho tool suite

3.1 Overview

The Pentaho Open BI suite covers the full range of business intelligence (BI)
capabilities an enterprise needs, including reporting, analysis, dashboards,
and data integration, and is described in [6] as the most popular open-source
BI suite in the world. Pentaho products are used by leading companies such as
MySQL, Motorola, Terra Industries, and DivX [6].
The Pentaho suite includes the following tools:
- Report Designer
- Design Studio
- Aggregation Designer
- Metadata Editor
- Pentaho Data Integration
- Schema Workbench
Architecture of Pentaho:

Figure 9. Pentaho architecture

3.2 BI capabilities of Pentaho

Pentaho supports users in the following areas.

Reporting:
Organizations draw reports from many sources, so reporting is the core of
business intelligence and the first capability to be used. Pentaho Reporting
lets businesses access, format, and distribute information easily to employees,
customers, and partners.
- Flexible deployment, from standalone reports to web-based reports integrated
into the enterprise BI system.
- Support for many data sources, such as OLAP or XML-based data sources.
- Flexible output to PDF, HTML, Microsoft Excel, Rich Text Format, or plain
text.
- A wizard that makes report design quick and easy.
- A professional edition with additional features such as clustering,
subscriptions, directory integration, and auditing.
Analysis:
Pentaho Analysis is a powerful analysis tool that helps users make the most
effective decisions. For example, if a report shows sales trending below
expectations, knowledge workers can readily find the cause by asking questions
such as:
- Does the problem affect one product line or one region?
- What distinguishes this combination from other combinations that do not have
the problem?
- Is the problem related to the retail channels? To marketing campaigns? Or to
something else?
Pentaho Analysis helps answer such business questions by:
- Letting users explore business information easily by dragging and dropping,
drilling down to details, and cross-tabulating data.
- Answering complex analytical queries and questions quickly.
- Supporting advanced capabilities, including integrated reporting, metadata,
and dashboards, through integration with the other products in the Pentaho
suite.
Dashboards:
Pentaho Dashboards gives managers immediate insight into performance at the
individual, departmental, or enterprise level. By presenting metrics in a
visual interface, Pentaho Dashboards gives business users the practical
information they need to understand and improve their work.
Pentaho Dashboards supports this visibility by providing:
- Comprehensive metrics management, for defining and tracking the important
metrics at the individual, departmental, or enterprise level.
- Rich visual displays, so that business users can see at once which metrics
are on track and which need attention.
- Integration with reporting and analysis, so that users can drill through to
the underlying reports and analyses and understand the factors behind success
or failure.
- Portal integration, to deliver the relevant business metrics to a large
number of users directly inside their applications.
- Integrated alerting, to monitor exceptions continuously and notify users.
Data mining:
- Hidden relationships in the data can be used to optimize business processes
and predict future outcomes.
- A full range of advanced data mining algorithms is provided.
- Results are presented to users in an easy-to-understand format.
Workflow:
- Automated, streamlined business processes deliver auditable, efficient
results that can be reported for many purposes.
- Metrics are linked directly to processes, which drives continuous business
process improvement: from reporting on the metrics, through changing the
business, to reporting on the results of the change, and then repeating the
cycle for further optimization.

3.3 Features and benefits

Insight into hidden patterns and relationships in your data
A classic example of data mining is a retailer who discovers a relationship
between sales of diapers and beer on Sunday afternoons. The two products seem
to have nothing to do with each other. Yet husbands sent out to buy diapers who
happened to see beer in the store picked up beer as well. This would not have
been discovered without data mining.
Exploiting correlations to improve performance
Continuing the example above, retailers act on the relationships they have
learned by placing the related items together to stimulate purchases.
Businesses can benefit in the same way, using newly discovered patterns and
correlations as a basis for action that improves efficiency and effectiveness.
Learning lessons for the future
"Those who cannot remember the past are condemned to repeat it" is a famous
quotation from the philosopher George Santayana. Data mining can predict
outcomes from historical data, which significantly improves the quality of
decisions and their results before the decisions are taken. As a simple
example, a good decision maker should take into account the periods in which a
customer paid on time and use that information when deciding.
Embedding recommendations in applications
The results of data mining can be used to present a simple scorecard and to
embed recommendations in operational applications. For example, a payment
screen could add: "Based on past data, there is an 85% chance that this
customer will pay late, so a 50% advance payment is recommended for this
invoice." Reporting on overall results such as days sales outstanding (DSO)
helps measure business improvement according to whether the recommendations
were accepted or not, so that the models and recommendations can be tuned for
the best effect.
Making full use of a range of algorithms
No algorithm is optimal for every situation, so it is worth trying a range of
them to find the one that best fits the data.
If several algorithms are valid, all of them can be used. For example: "Based
on the analysis of 3 predictive models, the probability that this customer pays
late is: Model A: 95% (96% accurate), Model B: 89% (92% accurate), Model C: 76%
(97% accurate)."
Applicable to any BI or business process
Integration with the other components of the Pentaho BI suite makes it easy to
apply data mining to any process in the system (such as the cash cycle) and to
BI processes (such as generating reports, invoices, and exception actions). The
application is very flexible and depends on the events of the BI process being
executed.
Extracting, creating, and deriving data for deeper insight
This can happen when the data is generated or as part of the data preparation
process. For example, when producing a sales report, the rolled-up figures can
be saved for use in data mining later. Data can also be added during the
preparation for mining, such as calculated variables or different units of
measure.
How data mining is carried out
Choosing a model
Analysts can work with a range of visual models, including advanced forms of
data mining such as clustering, segmentation, decision trees, random forests,
neural networks, and principal component analysis.
Adding data
Additional attributes can be added to the data. For example, system variables
can be defined that automatically derive extra columns for analysis.
Fitting
Each model has its own parameters, which are fitted to the sample data.
Analysts can have these parameters set automatically or adjust them by hand
(depending on the model).
Evaluation
The results can be evaluated by running the model on historical data and
comparing its output with the actual outcomes.
Refinement and output
The trained model is applied in the process. Once trained, the model is
expected to give the best results for the specific business purpose it is
applied to.
Technology
Powerful data mining engine
- Provides a comprehensive set of machine learning algorithms from the Weka
project, including clustering, segmentation, decision trees, random forests,
neural networks, and principal component analysis.
- Integrates with the Pentaho BI suite to convert data automatically into the
formats that the data mining engine requires [8].
- Algorithms can be applied directly to the data or called from Java.
- Output can be viewed graphically, used programmatically, or used as a data
source for reports, further analysis, and other processing.
- Filters support discretization, normalization, re-sampling, attribute
selection, and the transformation and combination of attributes.
- Classifiers provide models for predicting nominal or numeric quantities.
Learning schemes include decision trees and lists, support vector machines,
multi-layer perceptrons, logistic regression, Bayesian networks, and other
advanced techniques [9].
- The data mining engine is well suited to developing new machine learning
schemes, which helps customers incorporate their own models.
- Input and output are tightly controlled, so developers can build fully
customized solutions from the components provided.
Visual design tools
- The data mining design and administration tools are graphical, integrated to
Pentaho standards, and supported in Eclipse.
- Graphical user interfaces are provided for data pre-processing,
classification, regression, clustering, association rules, and visualization.
Security and standards compliance
- Role-based security and business rules.
- Support for Java single sign-on and LDAP, integrating with existing
enterprise security.
- Support for auditing: audit data can be output directly to reports and is
integrated with the workflow features of the Pentaho BI suite.
Definitions: Web Services, repositories, XML
- Components with a graphical interface can be used flexibly.
- A central repository holds reports, templates, queries, and other content.
- Definitions and content are stored as XML, so they can be created and edited
in ways other than through the graphical interface, for example by editing the
XML file by hand.
Flexibility and performance
Pentaho is designed for enterprise deployment. Its feature-rich applications
run on J2EE servers, including JBoss, and it also offers scalability features
such as clustering.

Chapter 4. Problem description, implementation on Pentaho, and results

4.1. Problem description
To illustrate the use of the Pentaho tools in building financial reports, I
present the following concrete example:
Examine and assess the influence of the oil price, the USD/VND exchange rate,
and the VNIndex on the gold price.
Environment:
- Operating system: Windows 7
- Database management system: MySQL
- The Pentaho tool suite

4.2. Data collection and processing

USD/VND exchange rate data was taken from:
http://www.oanda.com/currency/historical-rates
The downloaded file is an Excel file of the following form:

Figure 10. Exchange rate data

Gold price data was taken from:
http://www.perthmint.com.au/investment_invest_in_gold_precious_metal_prices.aspx
The downloaded file has the following form:

Figure 11. Gold price data

Prices are quoted in USD per ounce.
Oil price data was taken from:
http://sdw.ecb.europa.eu/browse.do?node=2120782
The downloaded file has the following form:

Figure 12. Oil price data

The unit is USD per barrel.
VnIndex data was downloaded from:
http://www.cophieu68.com/datametastock
The downloaded file has the following form:

Figure 13. VnIndex data

The collected data is not uniform. For example, the exchange rate data has a
value for every day, the gold price data has no values for Saturdays and
Sundays, and the oil price data is available only by month. The solution
chosen here to synchronize the data is to fill each missing value with the
value of the preceding day, and to set the daily oil price equal to the oil
price of the corresponding month. The data is then synchronized.
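
The synchronization rule just described can be sketched as follows. This is a minimal Python illustration only: the thesis performs this step on Excel files, and the function names daterange and fill_daily as well as the sample prices are hypothetical.

```python
from datetime import date, timedelta

def daterange(start, end):
    """Yield every calendar day from start to end inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)

def fill_daily(gold_by_day, oil_by_month, start, end):
    """Return one row per calendar day: (day, gold, oil).

    A missing gold price reuses the price of the previous available day;
    the daily oil price is the price of that day's month."""
    rows = []
    last_gold = None
    for day in daterange(start, end):
        last_gold = gold_by_day.get(day, last_gold)
        oil = oil_by_month[(day.year, day.month)]
        rows.append((day, last_gold, oil))
    return rows

# Hypothetical sample values, for illustration only.
gold = {date(2010, 1, 1): 1100.0, date(2010, 1, 4): 1120.0}  # Friday, Monday
oil = {(2010, 1): 76.0}
rows = fill_daily(gold, oil, date(2010, 1, 1), date(2010, 1, 4))
for row in rows:
    print(row)
```

In this sketch a gap at the very start of the range stays empty (None), because there is no earlier day to copy from.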
Next, the data is combined into a single Excel file containing all the
information, as in the following figure:

Figure 14. Combined data

The timekey field is added by writing the year, month, and day together as one
value.
The data has now been cleaned and redundant data removed.
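
A possible way to compute such a timekey is sketched below in Python. The text only says that year, month, and day are written together; the zero-padding of month and day and the integer type used here are assumptions, and make_timekey is a hypothetical helper name.

```python
from datetime import date

def make_timekey(day):
    """Concatenate year, month, and day into one integer such as 20100104.

    Month and day are zero-padded (an assumption), so keys sort in
    calendar order."""
    return day.year * 10000 + day.month * 100 + day.day

timekey = make_timekey(date(2010, 1, 4))
print(timekey)
```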

4.3. Creating the data warehouse

The data warehouse is created with Spoon, the data integration tool of the
Pentaho suite, as follows.
The warehouse has 3 tables: 2 dimension tables and 1 fact table. The dimension
tables are dim_time, which holds the day, month, quarter, and year, and
dim_factor, which holds the factors to be processed. The fact table,
fact_price, holds the price of each factor at each point in time.
The table structure and the relationship diagram are shown in the following
figure:

Figure 15. Data warehouse model
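
The three-table star schema can be sketched in SQL as follows, run here through Python's sqlite3 module so the example is self-contained (the thesis itself uses MySQL and lets Spoon generate the statements). The columns of dim_time and dim_factor follow the field lists given later in this section; the columns of fact_price and all data types are assumptions, since the text does not list them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_time (
        time_id  INTEGER PRIMARY KEY,
        timekey  INTEGER,   -- year+month+day written together
        month    INTEGER,
        quarter  INTEGER,
        year     INTEGER
    );
    CREATE TABLE dim_factor (
        factor_key INTEGER PRIMARY KEY,
        factor     TEXT     -- name of the factor, e.g. 'exchange'
    );
    -- Assumed layout of the fact table: one price per factor per day.
    CREATE TABLE fact_price (
        time_id    INTEGER REFERENCES dim_time (time_id),
        factor_key INTEGER REFERENCES dim_factor (factor_key),
        price      REAL
    );
    """
)
tables = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)
```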


Open Spoon and choose File -> New -> Transformation.
The input is the Excel file saved in .csv format, which contains all the
normalized data. In the Steps panel, under Input, drag the CSV file input icon
onto the canvas. The result is:

Figure 16. Spoon workspace

Double-click the step to change its properties, such as the step name, the file
name (the path to the .csv data file), and the delimiter (the character that
separates the fields in the .csv file). Then press Get Fields and rename the
fields as needed:

Figure 17. Data input in Spoon

The next step requires an empty database in MySQL. Using MySQL Query Browser,
create a new database with the query CREATE DATABASE data_price, where
data_price is the name of the data warehouse to create.
Back in Spoon, in the Steps panel under the Data Warehouse tab, drag
Combination lookup/update onto the canvas. Then hold Shift and drag with the
left mouse button from the CSV input step to the Combination lookup/update
step.

Figure 18. Combination Lookup/Update

Double-click the Combination lookup/update step to change its properties.

Figure 19. Changing the properties

Under Connection, choose New if no connection exists yet:

Figure 20. Database connection

Choose MySQL as the Connection Type, fill in the database information and the
connection name, and choose Test. If the connection succeeds, choose OK.
Back in the Combination lookup/update window, fill in the parameters. This step
creates the dim_time table.

Figure 21. Creating the dim_time table

Press Get Fields to load the fields of the Excel file, remove the fields that
do not belong in dim_time, set the key field of dim_time, and tick "Remove
lookup fields?" so that these fields do not appear in the later tables.
Press SQL to view the SQL statements that create the table, then press Execute
to create the table Dim_time(time_id,timekey,month,quarter,year).
In the same way, drag in another Combination lookup/update step and connect it
after the step that creates dim_time:

Figure 22. Creating the dim_factor table

This table has only 2 fields: factor_key, which is generated automatically and
is the primary key, and factor, which holds the names of the influencing
factors.
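
Conceptually, a lookup-or-insert step of this kind maps each incoming value to a surrogate key and adds a new dimension row the first time a value is seen. The following minimal Python sketch illustrates that idea with sqlite3; it is not Spoon's implementation, lookup_or_insert is a hypothetical helper, and the factor names other than 'exchange' are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_factor (factor_key INTEGER PRIMARY KEY, factor TEXT UNIQUE)"
)

def lookup_or_insert(factor):
    """Return the surrogate key for a factor name, inserting a new
    dimension row the first time the name is seen."""
    row = conn.execute(
        "SELECT factor_key FROM dim_factor WHERE factor = ?", (factor,)
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute("INSERT INTO dim_factor (factor) VALUES (?)", (factor,))
    return cursor.lastrowid

# Each incoming row is replaced by its key; repeated names reuse the same key.
keys = [lookup_or_insert(name) for name in ["gold", "oil", "gold", "exchange"]]
print(keys)
```

The fact table then stores only these keys, which is why the lookup fields can be removed from the stream once the key has been obtained.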
The next step creates the fact_price table. This is the output table, and it
references the 2 tables above. In the Steps panel, drag Table output onto the
canvas.

Figure 23. Creating the Table output step

Double-click Table output and change its parameters as needed:

Figure 24. Creating the fact_price table

Press SQL to view the SQL statement and press Execute to create the table.
Save the transformation, press the run button, and choose Launch to load the
data into the database that was created.

Figure 25. Loading the data

The data warehouse all_price has now been created successfully with Spoon.

4.4. Processing the data with OLAP

4.4.1. Creating the cube
The cube is created with Schema Workbench, a tool in the Pentaho suite.
First, create a connection to the MySQL database: in the Tools menu choose
Connection, and in the window that appears fill in the MySQL connection
parameters:

Figure 26. Database connection

Create a new schema and a cube with the measures sum and avg on the price, as
in the figure:

Figure 27. Cube structure

Once the cube has been created, save the cube file and publish the schema and
cube to the Pentaho system, supplying the server details and a Pentaho user
account.

Figure 28. Repository Login
4.4.2. Analysis View
Pentaho provides a utility for applying OLAP called Analysis View. OLAP can
also be used through Mondrian, the OLAP engine developed within the Pentaho
project.
This thesis uses the Analysis View utility to apply OLAP.
First, Pentaho must be connected to the MySQL database that is to be analyzed.
To do this, open the Pentaho installation folder, go to the
administration-console folder, and run start-pac.bat to start the
Administration Console. Then open http://localhost:8099 in a browser. A login
form appears; the default administrator account is user: admin / password:
password.
To create the connection to MySQL and to the data warehouse created earlier,
open the Database Connection tab.
This thesis uses the all_price database on MySQL, so the values are entered as
in the following figure:

Figure 29. Database connection

After entering all the values, press Test to check the connection. If it
succeeds, choose OK to save it. Pentaho is now connected to MySQL.
Next, open http://localhost:8080 to reach the Pentaho User Console. A login
form appears; enter a user name and password, or use one of the sample
accounts.
After logging in, the following screen appears:

Figure 30. Pentaho workspace

Use Analysis View and choose the schema and cube created in the previous step.

Figure 31. Choosing the schema and cube

After pressing OK, the following window appears:

Figure 32. Schema and cube data

On the toolbar, choose the button for selecting the measures, columns, rows,
and filter that determine what the analysis displays.

Figure 33. Analysis content

To analyze the USD/VND exchange rate over the 10 years from 2000 to 2010,
choose avg price under Measures:

Figure 34. Choosing the measures

Under factor, choose exchange:

Figure 35. Choosing the factor

Under time, choose the years from 2000 to 2010. Here the average exchange rate
of each year is compared.

Figure 36. Choosing the years to analyze
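
The selection made above (measure avg price, factor exchange, years 2000 to 2010) corresponds roughly to an aggregate query over the star schema. The following Python sketch shows an equivalent SQL query using sqlite3. It is an illustration only: Analysis View issues its own queries through the OLAP engine, the column layout of fact_price is an assumption, and the sample rows are hypothetical values, not the thesis data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, timekey INTEGER,
                           month INTEGER, quarter INTEGER, year INTEGER);
    CREATE TABLE dim_factor (factor_key INTEGER PRIMARY KEY, factor TEXT);
    CREATE TABLE fact_price (time_id INTEGER, factor_key INTEGER, price REAL);

    -- Hypothetical sample rows, for illustration only.
    INSERT INTO dim_time VALUES (1, 20090105, 1, 1, 2009),
                                (2, 20090706, 7, 3, 2009),
                                (3, 20100104, 1, 1, 2010);
    INSERT INTO dim_factor VALUES (1, 'exchange'), (2, 'gold');
    INSERT INTO fact_price VALUES (1, 1, 17000.0), (2, 1, 17800.0),
                                  (3, 1, 18500.0), (3, 2, 1120.0);
    """
)

# Average and total price per year for the 'exchange' factor.
yearly_avg = conn.execute(
    """SELECT t.year, AVG(f.price) AS avg_price, SUM(f.price) AS sum_price
       FROM fact_price AS f
       JOIN dim_time AS t ON t.time_id = f.time_id
       JOIN dim_factor AS d ON d.factor_key = f.factor_key
       WHERE d.factor = 'exchange' AND t.year BETWEEN 2000 AND 2010
       GROUP BY t.year
       ORDER BY t.year"""
).fetchall()
print(yearly_avg)
```

Grouping by t.quarter or t.month instead of t.year gives the finer levels of the time dimension.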


The time can also be chosen in more detail, by quarter, month, or day, by
choosing the corresponding button:

Figure 37. Choosing the date detail

To display a chart of the exchange rate over the past 10 years, choose the
button for selecting the chart type:

Figure 38. Choosing the chart type

After choosing the chart format, choose the button that displays the chart of
the USD/VND exchange rate over the past 10 years:

Figure 39. USD/VND exchange rate chart


The chart shows that the USD/VND exchange rate has changed most from 2008
onward and is currently trending upward.
The gold price chart is obtained in the same way:

Figure 40. Gold price chart

The gold price has risen strongly over the last 10 years. Since 2005 in
particular it has fluctuated while rising continuously, and the chart shows
that it is still trending upward.
Oil price chart:

Figure 41. Oil price chart

The oil price has fluctuated considerably over the past 10 years. It rises and
falls erratically and is very hard to predict. It was highest around the middle
of 2007 and is currently trending upward again.
VnIndex chart:

Figure 42. VnIndex chart

The VnIndex has fluctuated very widely. From 2000 to 2005 it rose very slowly,
but from 2005 to 2007 it rose continuously and strongly. It then fell to its
lowest level at the end of 2008 and the beginning of 2009, and currently shows
signs of recovery and stabilization.
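Chart of the gold price and the oil price:

Figure 43. Gold price and oil price chart

Chart of the gold price and the USD/VND exchange rate:

Figure 44. Exchange rate and gold price chart

The chart shows a relationship between the gold price and the USD/VND exchange
rate: for the most part they rise together and fall together.
Chart of the VNIndex and the gold price:

Figure 45. Gold price and VNIndex chart

The chart shows that the gold price and the VNIndex have little relationship
with each other. It is therefore difficult to infer the trend of the gold price
from the trend of the VNIndex.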

Conclusion
The analysis and the application presented in this thesis suggest that data
warehouses and OLAP techniques will become indispensable and that their
adoption is the trend among enterprises.
The thesis achieved the following results:
- Studied and analyzed data warehouse technology and its application in the
financial domain.
- Studied and analyzed OLAP and the storage models that support OLAP, and
identified the advantages and disadvantages of each storage model.
- Distinguished OLTP from OLAP.
- Introduced Pentaho, a business intelligence tool suite, and applied it.
- Analyzed the fluctuations of the US dollar exchange rate, the gold price,
and the VNIndex.

References

Vietnamese
[1] Kho dữ liệu (Data warehouse).
http://vi.wikipedia.org/wiki/Kho_d%E1%BB%AF_li%E1%BB%87u
[2] Nguyen Thi Quyen. Introduction to the cube architecture of OLAP. Journal of
Information Technology & Communications.
http://www.tapchibcvt.gov.vn/News/PrintView.aspx?ID=15695
English
[3] Djoni Darmawikarta. Dimensional Data Warehousing with MySQL. Brainy
Software Corp, 2007.
[4] Don Jones. Why is OLAP Faster than OLTP.
http://nexus.realtimepublishers.com/tips/Data_Warehousing/Why_Is_OLAP_Faster_Than_OLTP.php
[5] Hari Mailvaganam. Introduction to OLAP.
http://www.dwreview.com/OLAP/Introduction_OLAP.html
[6] Kefa Rabah. Pentaho Business Intelligence BI Suite Training Manual. Global
Open Versity, 2007. pp. 1-23.
[7] Online Analytical Processing. Wikipedia.org.
[8] Pentaho Corporation. Pentaho Training Course 2010 Edition. Pentaho
Corporation, 2007. pp. 1-13.
[9] Pentaho Corporation. Pentaho Analysis Viewer User Guide. Pentaho
Corporation, 2007. pp. 1-23.
[10] Roland Bouman, Jos van Dongen. Pentaho Solutions: Business Intelligence
and Data Warehousing with Pentaho and MySQL. Wiley Publishing, Inc., 2009.
pp. 3-309.
[11] S. Nagabhushana. Data Warehousing, OLAP and Data Mining. New Age
International Publishers, 2006. pp. 24-246.
[12] Seth Grimes. MySQL V5: Ready for Prime Time Business Intelligence. Alta
Plana Corporation, 2006. pp. 2-23.
[13] Surajit Chaudhuri, Umeshwar Dayal. An Overview of Data Warehousing and
OLAP Technology. pp. 2-10.
[14] Thomas C. Hammergren, Alan R. Simon. Data Warehousing for Dummies. Wiley
Publishing, Inc. pp. 9-95.
[15] MOLAP, ROLAP, and HOLAP.
http://www.1keydata.com/datawarehousing/molap-rolap.html
