
ACKNOWLEDGEMENTS


First of all, we would like to thank the lecturers of the Faculty of Information Technology, University of Natural Sciences, who devotedly taught and guided us throughout our four years at university.
We thank Ms. Pham Thi Bach Hue, who wholeheartedly guided, helped and encouraged us to complete this thesis.
Finally, we thank our parents and families, who encouraged, supported and motivated us throughout our study and research and made today's achievements possible.









July 2005
Students
Pham Thi My Phuong, Tu Thi Ngoc Thanh







ADVISOR'S COMMENTS


....


Date ........................ 2005
Signature


Thesis topic: Semantic search applied to the eDoc domain
0112274 Pham Thi My Phuong - 3 - 0112398 Tu Thi Ngoc Thanh
REVIEWER'S COMMENTS



Date ........................ 2005
Signature



TABLE OF CONTENTS

INTRODUCTION ..... 10
Chapter 1: OVERVIEW ..... 11
1.1. Problem statement ..... 11
1.2. The problem addressed ..... 13
1.3. Approach ..... 14
Chapter 2: THEORETICAL BACKGROUND ..... 17
2.1. Information search strategies of search engines ..... 17
2.1.1. Some popular search engines ..... 17
2.1.2. Search strategies ..... 32
Operating principles ..... 34
2.2. Semantic Web ..... 34
2.2.1. Concept ..... 34
2.2.2. Architecture ..... 36
2.2.3. Challenges facing the Semantic Web ..... 37
2.2.4. Comparison of the Web and the Semantic Web ..... 41
2.2.5. Related concepts ..... 42
2.2.6. Ontology ..... 44
2.2.7. RDF ..... 46
2.3. eDoc ..... 55
2.3.1. An overview of eLearning ..... 55
2.3.2. An overview of eLib ..... 61
2.3.3. An overview of eDoc ..... 68
2.4. Some issues in natural language processing ..... 71
2.4.1. Issues in text processing ..... 72
2.4.2. Issues in semantic processing ..... 72
2.4.3. Text classification ..... 82
Chapter 3: MODEL AND ALGORITHMS ..... 84
3.1. Current semantic search technologies worldwide ..... 84
3.2. Steps for building a semantic search engine application ..... 91
3.3.1. Building a Semantic Web architecture ..... 92
3.3.2. Latent semantic indexing ..... 93
3.3. Proposed model for a semantic search application in the eDoc domain ..... 96
3.4. Algorithms used ..... 100
3.4.1. Document processing algorithm ..... 100
3.4.2. Metadata extraction algorithm ..... 102
3.4.3. Domain classification algorithm for documents ..... 104
3.4.4. Query processing algorithm ..... 104
Chapter 4: APPLICATION PROGRAM ..... 105
4.1. Introduction to the application ..... 105
4.2. Application architecture ..... 105
4.3. Application scope ..... 107
4.3.1. Problem description ..... 107
4.3.2. Requirements ..... 107
4.4. Building the application ..... 108
4.4.1. Data design ..... 108
4.4.2. Processing design ..... 110
4.5. Program results ..... 112
4.6. Experiments ..... 114
Chapter 5: CONCLUSION ..... 118
5.1. Evaluation of the research results ..... 118
5.1.1. Strengths ..... 118
5.1.2. Weaknesses ..... 119
5.2. Future work ..... 119
REFERENCES ..... 120
I. Theses and dissertations ..... 120
II. Books and eBooks ..... 120
III. Websites ..... 122
APPENDIX ..... 124
1. RDF syntax ..... 124
2. RDF Gateway ..... 129
2.1. RDF Gateway architecture ..... 130
2.2. Features ..... 132
3. Semantic label system ..... 138
3.1. Basic semantic labels for nouns ..... 139
3.2. Basic semantic labels for verbs ..... 141
3.3. Basic semantic labels for adjectives ..... 142
3.4. The LDOCE semantic label system ..... 142
4. The WordNet lexical-semantic knowledge base ..... 144
4.1. Semantic label system for nouns ..... 144
4.2. Semantic label system for verbs ..... 149























LIST OF TABLES


Table 1: Quick guide to using some popular search engines ..... 28
Table 2: Overview of the features of some common search engines on the Internet ..... 32
Table 3: Classes in RDF ..... 54
Table 4: Properties in RDF ..... 55
Table 5: Senses and constraints of the content words in the sentence ..... 77
Table 6: Database description for the application ..... 110
Table 7: Program modules ..... 110
Table 8: eDocSearch module ..... 111
Table 9: eDocSearch module ..... 111
Table 10: Test queries ..... 115
Table 11: Statistics for the computer science domain ..... 116
Table 12: Statistics for the art domain ..... 116
Table 13: Basic semantic labels for nouns ..... 140
Table 14: Basic semantic labels for verbs ..... 142
Table 15: Basic semantic labels for adjectives ..... 142
Table 16: The LDOCE semantic label system ..... 144
Table 17: Noun classification in WordNet ..... 148

















LIST OF FIGURES

Figure 1: Google's interface ..... 18
Figure 2: Yahoo's interface ..... 19
Figure 3: Ask Jeeves's interface ..... 20
Figure 4: AllTheWeb's interface ..... 21
Figure 5: Teoma's interface ..... 22
Figure 6: HotBot's interface ..... 23
Figure 7: AltaVista's interface ..... 24
Figure 8: Lycos's interface ..... 25
Figure 9: Layered architecture of the Semantic Web ..... 36
Figure 10: A simple ontology ..... 46
Figure 11: The RDF data model ..... 51
Figure 12: Security evaluation criteria for eDoc ..... 71
Figure 13: Syntactic relations and semantic constraints ..... 76
Figure 14: Decision tree for selecting the appropriate sense ..... 78
Figure 15: Basic forms of Web search ..... 91
Figure 16: Proposed model for the semantic search application in the eDoc domain ..... 97
Figure 17: Processing workflow of each search engine ..... 99
Figure 18: Document processing algorithm ..... 100
Figure 19: Metadata extraction algorithm ..... 103
Figure 20: Relational data diagram of the application ..... 108
Figure 21: Main interface of the application ..... 112
Figure 22: Search results interface of the application ..... 113
Figure 23: Resource management interface ..... 113
Figure 24: RDF Gateway architecture ..... 130
Figure 25: RDF Query Analyzer interface ..... 136















LIST OF ABBREVIATIONS
eDoc Electronic document
eLib Electronic library
eLearning Electronic learning
WWW World Wide Web
URI Uniform Resource Identifier
URL Uniform Resource Locator
HTTP Hypertext Transfer Protocol
RDF Resource Description Framework
OIL Ontology Inference Layer
OWL Web Ontology Language
XML eXtensible Markup Language
















LIST OF TERMS (English term and Vietnamese equivalent)
Class  lớp
Property  thuộc tính
Metadata  siêu dữ liệu
Subject  chủ đề, chủ ngữ (topic; grammatical subject)
Title  tiêu đề
Namespace  không gian tên
Predicate  vị ngữ
Triple  bộ ba (subject, predicate, object)












INTRODUCTION
Today, most search systems on the Internet follow the traditional approach of keyword search. With this approach, when the user types in the words to look for, the search system displays the documents that contain those keywords. As a result, the returned list is very long, and many of the documents in it may have nothing to do with what the user is looking for. At the same time, these systems sometimes fail to return all the relevant documents: they return unneeded documents while missing other important ones.
The question, then, is how to build a search system that overcomes this situation.
To solve this problem, the search system must satisfy the information needs the user actually has; that is, it must search by meaning (semantics), based on the information the user provides.
With this in mind, we chose the topic "Semantic search applied to the eDoc domain" (English-language electronic documents), with the aim of studying and building a semantic search tool that retrieves information accurately and completely, and thereby partly overcomes the limitations of the keyword search used by current search engines.
The subjects studied in this thesis include eDoc, the Semantic Web, RDF, OWL and metadata.
Within the scope of this thesis, and because of the short time available, we tested the search program only in two domains: Computer Science and Art. These domains appear unrelated, but in practice some cases must still be told apart. For example, a document about the art of programming must be classified under Computer Science, not Art. In short, the application we built searches only within these domains; however, it can easily be extended to others.
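The "art of programming" example can be illustrated with a minimal keyword-weighting classifier. This is a sketch under stated assumptions, not the classification algorithm used in the thesis: the domain vocabularies and weights in `DOMAIN_TERMS` are hypothetical and hand-written, whereas a real system would derive them from an ontology or training data.

```python
import re
from collections import Counter

# Hypothetical domain vocabularies with illustrative weights (assumptions,
# not taken from the thesis). Strongly domain-specific words weigh more.
DOMAIN_TERMS = {
    "Computer Science": {"programming": 3.0, "algorithm": 3.0, "compiler": 3.0,
                         "software": 2.0, "code": 2.0},
    "Art": {"art": 1.0, "painting": 3.0, "sculpture": 3.0, "museum": 2.0,
            "artist": 2.0},
}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def classify(text: str) -> str:
    """Return the domain with the highest weighted count of its terms."""
    counts = Counter(tokenize(text))
    scores = {
        domain: sum(weight * counts[term] for term, weight in terms.items())
        for domain, terms in DOMAIN_TERMS.items()
    }
    return max(scores, key=scores.get)

print(classify("The Art of Computer Programming"))              # Computer Science
print(classify("Painting and sculpture in the modern museum"))  # Art
```

Because "programming" carries more weight than the generic word "art", the title lands in Computer Science even though it contains "Art".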


Chapter 1: OVERVIEW
1.1. Problem statement
Searching for and acquiring information is an indispensable part of everyone's daily life. As the World Wide Web has come into widespread use, search engines have become a vital and beneficial part of the Web. Search tools have become public utilities for all Internet users, and Google and Yahoo have become household names.
Current search tools are based on one of two forms of Web search technology: human-directed search and automatic search.
Human-directed search tools use a database of keywords, concepts and references. Keyword-based search tools return a list of pages, but this simple method often produces large numbers of irrelevant and unreliable results. A content-based search tool works as follows: it counts the occurrences of the query words (the keywords) among the words of each page stored in its index, and then ranks the pages by that count. A more sophisticated approach gives each keyword a weight that depends on where it appears. For example, keywords that appear in the page's title tag count for more than those in the body. Other human-directed search tools, such as Yahoo, use topic hierarchies to help direct the search and return more relevant results. These topic hierarchies are created by people. For this reason, they are costly to create and to maintain as the meaning of terms changes over time, and they are not updated as often as automatic systems.
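The counting-and-ranking scheme described above, including the extra weight for keywords in the title, can be sketched as follows. The weights `TITLE_WEIGHT = 3.0` and `BODY_WEIGHT = 1.0` and the page dictionaries are hypothetical values chosen for illustration; real engines use far more elaborate scoring.

```python
import re

# Hypothetical weights: a keyword in the title counts more than one in the body.
TITLE_WEIGHT = 3.0
BODY_WEIGHT = 1.0

def tokens(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(query, page):
    """Count query-word occurrences in title and body, weighting title hits higher."""
    q = set(tokens(query))
    title_hits = sum(1 for t in tokens(page["title"]) if t in q)
    body_hits = sum(1 for t in tokens(page["body"]) if t in q)
    return TITLE_WEIGHT * title_hits + BODY_WEIGHT * body_hits

def rank(query, pages):
    """Sort pages by descending score."""
    return sorted(pages, key=lambda p: score(query, p), reverse=True)

pages = [
    {"url": "a.html", "title": "Cooking tips", "body": "semantic search semantic web"},
    {"url": "b.html", "title": "Semantic search", "body": "an introduction"},
]
for p in rank("semantic search", pages):
    print(p["url"], score("semantic search", p))
# b.html 6.0
# a.html 3.0
```

Page a.html contains more raw keyword occurrences (3 against 2), but b.html ranks first because its hits are in the title.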
The keyword search approach still has some limitations that reduce the accuracy of search engines. Examples include homonyms (for instance, "bank" as a financial institution and "bank" as a river bank) and words whose forms vary because of prefixes and suffixes, such as student and students, or small, smaller and smallest. In addition, search engines do not return documents that contain synonyms of the words in the user's query. Keywords are not enough to represent precisely either the user's needs or the content of web pages, and this limitation leads search engines to return documents unrelated to what the user cares about. Because a set of keywords is the most rudimentary representation of content, it amounts to a logical view of the content at the lowest level of information; this is the basic reason why current search engines have a low ratio of useful pages to total pages returned.
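The word-variant and synonym problems can be made concrete with a small matching experiment. The suffix rules in `SUFFIXES` are a deliberately naive, hypothetical stemmer for illustration only (it is not the Porter stemmer): it rescues "students" and "smaller", but no amount of stemming makes "pupil" match "student", and homonyms such as "bank" are not addressed at all.

```python
# Naive suffix rules for illustration only; this is NOT the Porter stemmer.
SUFFIXES = ("est", "er", "s")

def crude_stem(word):
    """Strip the first matching suffix if a stem of at least 3 letters remains."""
    w = word.lower()
    for suf in SUFFIXES:
        if w.endswith(suf) and len(w) - len(suf) >= 3:
            return w[: -len(suf)]
    return w

def matches(query_word, doc_words, stem=False):
    """Exact keyword match, optionally after crude stemming of both sides."""
    norm = crude_stem if stem else str.lower
    return norm(query_word) in {norm(w) for w in doc_words}

doc = ["Students", "prefer", "smaller", "classes"]
print(matches("student", doc))             # False: exact match misses the plural
print(matches("student", doc, stem=True))  # True
print(matches("small", doc, stem=True))    # True: "smaller" -> "small"
print(matches("pupil", doc, stem=True))    # False: synonyms are still missed
```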
Google, with 400 million documents retrieved each day and more than 8 billion web pages indexed, is the most widely used search tool today, yet even Google still has many problems. For example, how do you find just the small amount of data you need in a sea of irrelevant results?
As artificial intelligence (AI) technology advances rapidly, the question arises of how to provide better search methods whose results can truly be trusted. This is the motivation behind semantic search tools and semantic search agents. A semantic search tool looks for documents with similar meanings, not just documents with similar words. For the Web to become a semantic network, it must provide much more metadata about its content, through RDF (Resource Description Framework) and OWL (Web Ontology Language) tags; these tags are what bring the Web into the semantic network. In a semantic network, the meaning of content is expressed more explicitly, and logical links are made between related pieces of information.
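To make the idea of metadata expressed as logical links concrete, here is a minimal in-memory sketch of RDF-style triples queried by pattern matching. It is not an RDF library and not the thesis's implementation: the `ex:` identifiers are hypothetical, while `dc:title`, `dc:creator` and `dc:subject` are Dublin Core element names that are commonly used as RDF predicates.

```python
# Each fact is a (subject, predicate, object) triple; None acts as a wildcard.
# The "ex:" identifiers are hypothetical examples.
TRIPLES = [
    ("ex:doc1", "dc:title", "The Art of Computer Programming"),
    ("ex:doc1", "dc:creator", "ex:Knuth"),
    ("ex:doc1", "dc:subject", "ex:ComputerScience"),
    ("ex:doc2", "dc:title", "The Story of Art"),
    ("ex:doc2", "dc:subject", "ex:Art"),
    ("ex:Knuth", "ex:name", "Donald Knuth"),
]

def match(s=None, p=None, o=None):
    """Return all triples that agree with every non-None position."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Follow the links: which documents are about computer science, and who wrote them?
for doc, _, _ in match(p="dc:subject", o="ex:ComputerScience"):
    for _, _, author in match(s=doc, p="dc:creator"):
        name = match(s=author, p="ex:name")[0][2]
        print(doc, name)  # ex:doc1 Donald Knuth
```

Unlike a keyword index, the query follows explicit relations (subject, then creator, then name) instead of looking for words on a page.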
The semantic search tool discussed here has two major advantages over traditional search tools:
1. It accepts queries expressed in natural language.
2. The result of a search is a piece of information, not a list of documents that may (or may not) contain the requested information.
Indeed, semantic search tools start from the problem of information overload. They take over a task that nobody enjoys in today's information search: opening every document in the result list and scanning it by hand to extract the information. In this way, semantic search tools have the potential to revolutionize and automate electronic information retrieval: they shift the search paradigm from retrieving documents to answering questions.
1.2. The problem addressed
According to 2001 statistics, employees spent an average of 8 hours a week, or 16% of their weekly working hours, searching for and using external information. The salary cost of this for US companies alone was 107 billion dollars a year. Semantic search is therefore a real opportunity for companies to help their employees bring external information into their work more capably and efficiently. Needless to say, information overload is a major problem in the information society.
Similar findings appear in many studies, all pointing to the need for better information search. Search tools have brought huge benefits in recent years by making millions of documents accessible regardless of physical location and language, yet they still have some fundamental limitations. For example, they do not understand the words people type in, and therefore produce a huge number of wrong results. Moreover, they work well when asked about facts, such as "Kerry" or "the king of Spain". However, they perform poorly when the query concerns relationships between concepts, such as "Which countries took part in the Iraq war?" or "Which party does the president of France belong to?"
Three problems must be addressed to improve search results:
(i) search tools must allow more complex queries (for example, in natural language),
(ii) search tools must understand what people ask, and
(iii) search tools must provide answers to queries (possibly backed up by links to the documents from which the answers were obtained).


1.3. Approach
There are two approaches to improving search results through semantic methods:
1. The Semantic Web architecture.
2. Latent Semantic Indexing.
However, most semantics-based search tools suffer from performance problems caused by the very large scale of the semantic network. For semantic search to be effective at finding the desired results, the network must contain a large amount of relevant information. At the same time, a large network makes it difficult to process the many possible paths to a relevant solution.
We combine the strengths of Semantic Web technology tightly with several advanced technologies so that the model can specialize in information search:
• Natural language processing lets users ask the questions they want to ask, rather than having to state the relevant keywords of their question.
• Ontologies define the domain of interest. They are regarded as the brain of the search tool, because the tool tries to understand users' queries in terms of the ontology. Note that our semantic search tool is therefore not general-purpose like Google; it is intended for a specific domain or area (for example law, culture or sport).
• Knowledge parsing. This technology converts unstructured data into structured information. It extracts information from free text, semi-structured text and structured text to populate the ontology with real knowledge.
• Intelligent knowledge access. Answers to queries are obtained through automatically generated ontology queries and are presented in several forms:
o Data about the main entity asked about (for example, in the social domain, data about an artist).
o Semantic navigation. Words in the answers are automatically hyperlinked to the corresponding ontology concepts, allowing navigation by meaning.
o Smart tags and smart links. Answers are always backed by the sources and documents they are based on. When those documents are consulted, tagging and linking software automatically recognizes words that carry domain meaning and links them to the ontology, or attaches smart tags with actions defined in the ontology.
o Intelligent interaction. Answers usually give rise to many related concepts and relationships. Intelligent interaction software lets the user move from concept to concept through this knowledge.
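A small sketch shows how an ontology lets the search tool answer in terms of concepts rather than words: documents annotated with a specific concept are found when the user asks about a broader one. The concept hierarchy in `SUBCLASS_OF` and the annotations in `DOC_SUBJECT` are hypothetical, and the hierarchy is assumed to be a tree without cycles; this is an illustration of the idea, not the thesis's ontology.

```python
# A toy ontology: each concept maps to its direct parent (hypothetical names).
# Assumed acyclic; a cycle would make ancestors() loop forever.
SUBCLASS_OF = {
    "Databases": "ComputerScience",
    "ArtificialIntelligence": "ComputerScience",
    "NaturalLanguageProcessing": "ArtificialIntelligence",
    "Painting": "Art",
}

# Documents annotated with the most specific concept that describes them.
DOC_SUBJECT = {
    "doc1.pdf": "NaturalLanguageProcessing",
    "doc2.pdf": "Databases",
    "doc3.pdf": "Painting",
}

def ancestors(concept):
    """Yield the concept itself and every superclass above it."""
    while concept is not None:
        yield concept
        concept = SUBCLASS_OF.get(concept)

def documents_about(concept):
    """Documents whose subject is the concept or any of its subclasses."""
    return sorted(d for d, c in DOC_SUBJECT.items() if concept in ancestors(c))

print(documents_about("ComputerScience"))         # ['doc1.pdf', 'doc2.pdf']
print(documents_about("ArtificialIntelligence"))  # ['doc1.pdf']
```

No document contains the word "ComputerScience", yet two are returned, because the ontology links their specific subjects to that broader concept.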
In one respect the semantic search tool defined here is still less complete than general-purpose (non-semantic) search tools such as Google: scope. In Google you can search for any keyword in any domain; if the keywords appear in some documents on the Web, Google will find them. A semantic search tool needs more advanced knowledge: it must know meanings, represented in an ontology. In practice, ontologies are still built by hand today, which limits their use for general purposes. Semantic search tools are therefore most valuable in specific domains, where their goal is to complement conventional search tools rather than to compete with them.

























Chapter 2: THEORETICAL BACKGROUND
2.1. Information search strategies of search engines
2.1.1. Some popular search engines
Below is a list of some search engines. Why are they considered the major search engines? Because they are well known and widely used. For web professionals, the major search engines are the most important places to be listed, because they can bring a very large amount of potential traffic. For searchers, popular search engines usually return more reliable results. These search engines are also very likely to be well maintained and upgraded when needed, keeping pace with the growth of the Web.
All of the following search engines are good first choices when searching for information:

















2.1.1.1. Google: http://www.google.com/


Figure 1: Google's interface

Google began as a Stanford University project, called BackRub, by two students, Larry Page and Sergey Brin. By 1998 it had been renamed Google, and the project had left the campus to become a separate company, Google, which still exists today.
Google is a well-known search tool and one of the best choices for searching the web. Its crawler-based (spider-based) service provides comprehensive results with good relevance, and it is currently one of the best tools for finding almost anything you want.
Google mainly offers web page search. Using the search box on the Google home page, however, you can also easily locate images on the web, find messages posted in Usenet discussion groups, locate news and search for products.

2.1.1.2. Yahoo: http://www.yahoo.com/


Figure 2: Yahoo's interface
Launched in 1994, Yahoo is the oldest directory on the web, a place where editors organize web sites into categories. However, in October 2002 Yahoo switched to crawler-based listings for its main results. It used Google's technology until February 2004; since then Yahoo has used its own search technology.
The Yahoo Directory still exists. You will see category links below some of the sites listed in the results of a keyword search. When selected, these links take you to a list of sites that have been reviewed and approved by an editor.
AltaVista and AllTheWeb technology has been combined with that of Inktomi, a crawler-based search engine, to create Yahoo's current crawler.

2.1.1.3. Ask Jeeves: http://www.askjeeves.com/


Figure 3: Ask Jeeves's interface

Ask Jeeves became famous in 1998 and 1999 as a natural language search tool that lets you search by asking questions, and returns results that appear to be the right answers to anything.
In fact, technology was not what made Ask Jeeves perform well. Behind the scenes, the company at one time had about 100 editors who monitored the search logs. They then went out onto the web and located the sites they judged best matched the most popular queries.

2.1.1.4. AllTheWeb: http://www.alltheweb.com/

Figure 4: AllTheWeb's interface

Powered by Yahoo, AllTheWeb offers a lighter, more customizable and more pleasant "pure search" experience than searching at Yahoo itself. Its focus is web search, but news, image, video, MP3 and FTP search are also offered.





2.1.1.5. Teoma: http://www.teoma.com/

Figure 5: Teoma's interface

Teoma is a crawler-based search engine owned by Ask Jeeves. Its index contains fewer pages than those of Google and Yahoo. Teoma appeared in 2000 and quickly became known for returning relevant results. Its Refine feature suggests topics to explore after you run a search.
Ask Jeeves bought Teoma in September 2001, and Teoma also supplies some of the results for the Ask Jeeves site.





2.1.1.6. HotBot: http://www.hotbot.com/


Figure 6: HotBot's interface

HotBot provides easy access to three major crawler-based search engines: Yahoo, Google and Teoma. Unlike a meta search engine, it does not blend the results from these crawlers together. Instead, it offers a quick and easy way to get different web search opinions in one place.
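For contrast with HotBot, a meta search engine does blend several ranked result lists into one. The sketch below uses a simple Borda count (a URL at position r in a list of length n earns n - r points) as one possible blending rule; this rule and the example URLs are assumptions for illustration, not the method of any engine named above.

```python
from collections import defaultdict

def borda_merge(result_lists):
    """Blend ranked URL lists: a URL at rank r in a list of length n earns n - r points."""
    scores = defaultdict(int)
    for results in result_lists:
        n = len(results)
        for rank, url in enumerate(results):
            scores[url] += n - rank
    # Highest total first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda u: (-scores[u], u))

engine_a = ["x.com", "y.com", "z.com"]  # hypothetical results from one engine
engine_b = ["y.com", "w.com"]           # hypothetical results from another
print(borda_merge([engine_a, engine_b]))  # ['y.com', 'x.com', 'w.com', 'z.com']
```

A URL that ranks well in several engines (y.com) rises above one that tops only a single list (x.com).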








2.1.1.7. AltaVista: http://www.altavista.com/


Figure 7: AltaVista's interface

AltaVista was launched in September 1995 and for several years was regarded as the Google of its day: it returned relevant results and won a loyal following of users. After 1998, however, it fell out of favor, because its listings were not kept fresh and the data gathered by its crawler were not updated often.
Today, AltaVista is once again focused on search. Its results come from Yahoo, and it offers links for searching images, MP3/audio and video, human-compiled directory listings and news. If you want a lighter feel than Yahoo while still getting Yahoo's results, AltaVista is a good choice.



2.1.1.8. Lycos: http://www.lycos.com/


Figure 8: Lycos's interface

Lycos, launched in 1994, is one of the oldest search engines on the web. It is described as a web portal, or access hub, where users go to get information in every area, including chat, e-mail and more.







D ti: Tm kim ngu nghia ung dung trn linh vuc eDoc
0112274 Pham Thi M Phuong - 26 - 0112398 Tu Thi Ngoc Thanh


| Search engine | Google | AlltheWeb | AltaVista | Teoma |
|---|---|---|---|---|
| Database | google.com | alltheweb.com | altavista.com | teoma.com |
| Size (# of pages) | About 8 billion (1 billion of them not full-text indexed) | About 3 billion, full-text indexed | About 1 billion | About 1 billion |
| Multimedia | Supported | Supported | Supported | Not supported |
| Operators: default | AND | AND | AND | AND |
| Operators: exclusion | - (minus sign) | - (minus sign) | - (minus sign) | - (minus sign) |
| Operators: phrase | Quotation marks ("...") | Quotation marks ("...") | Quotation marks ("...") | Quotation marks ("...") |
| Operators: truncation | Not supported; * can stand in for words inside quotation marks | Not supported | * | Not supported |
| Operators: Boolean | OR (for proper nouns only) | AND, OR, ANDNOT, RANK, () | AND, OR, ANDNOT, NEAR, () | OR (for proper nouns only) |
| Stop words | Common words are usually ignored; use + or enclose them in quotation marks to search for them | not stated | Use quotation marks in basic search; ignored in advanced search | Common words are usually ignored; use + to search for them |
| Proper nouns | Not supported | Not supported | Supported | Not supported |
| Field limits | intitle:, inurl:, allintitle:, allinurl:, filetype:, link:, site:; in advanced search: cache:, info: | normal.title:, url.all:, link.all:, link.extension: | title:, domain:, link:, image:, text:, url:, host:, anchor:, applet: | intitle:, inurl:, site:, geoloc:, lang:, last:, afterdate: |
| Special features | ~ to search for synonyms; limit by language; many file types (pdf, doc, ...); cached copies of pages as indexed; browsing through URLs | Advanced search: limit by date, domain, IP address | Limit by date, location, language; advanced search: sortby to filter and sort results | Refine to optimize results; Resources to find pages and links focused on the topic searched |
| Main strengths | Very good for highly popular pages; recent news pages | As good as Google; no stop words | Many Boolean operators in search; advanced search can display results by term popularity | Good popularity ranking, based on the number of pages on the same topic that link to the page in question; results are often encouraging |

Table 1: Quick guide to using some popular search engines



| Search engine | Database | Operators | Search options | Miscellaneous |
|---|---|---|---|---|
| Google (http://www.google.com); supports advanced search; subject directory; Open Directory | Full text of web pages and .pdf, .doc, .xls, .ps, .wpd files (4.3B, + 1B partially indexed URLs); News: frequently updated (4,500 sources); image files; Groups: Usenet from 1981 to the present | AND (default); OR (proper nouns); + for common stop words, for URLs, or for specific pages (e.g., +edu); - to exclude | * for truncation; "" for phrase search; fields: intitle:, inurl:, link:, site:; search the topic categories of the web directory; find similar web pages; ~ to search for synonyms | Spell checking; caches indexed pages; good for finding pages that return 404 errors; translation into 5 languages |
| AlltheWeb (http://alltheweb.com); supports advanced search | Full text of web pages, .pdf, Flash (3.1B fully indexed URLs); News: frequently updated (3,000 sources); pictures; video; audio; FTP | AND (default); OR (the terms must be enclosed in parentheses); ANDNOT, RANK; - to exclude | No truncation; "" for phrases; fields: intitle:, inurl:, link:, site:; advanced search: limit by date, language, domain, file format, IP address | Spell checking; advanced search: pictures, video; uses clustering techniques to optimize queries |
| AltaVista (http://altavista.com); supports advanced search; subject directory; Open Directory | Full text of web pages (about 1B) and .pdf files; News (3,000 sources), images, MP3/Audio, Video | AND (default); in advanced search, or for proper nouns in basic search: AND, OR, ANDNOT, NEAR, nested (); - to exclude | * for truncation; "" for phrases; advanced search: limit by date, language | Spell checking; translation: 8 European and Asian languages; AltaVista Prisma: query optimization |
| Teoma (http://teoma.com); supports advanced search | Full text of web pages (about 1B) | AND (default); OR (proper nouns); + or "" for stop words; - to exclude | No truncation; "" for phrases; fields: intitle:, inurl:, site:, geoloc:, lang:, last:, afterdate:, beforedate:, betweendate:; advanced search: limit by date, language, domain, file format, IP address | Spell checking; clusters results; Refine to optimize queries; Resources to find pages or links focused on the topic |
| AskJeeves (www.ask.com) | Gets results from Teoma's database; product search: PriceGrabber.com; picture search: Picsearch.com; news search: Moreover.com | Same as Teoma; for simple questions, a dialog window appears | Same as Teoma | Click Remove Frame to see the URLs of the pages; spell checking |
| AskJeeves for Kids (www.ajkids.com) | Answers simple questions well; games for children; news by age group | Ask in natural language; Boolean operators are not used | Click No frames to see the URL of a result page | Links to study pages: dictionary, physics, science, maps, history, ... |
| Yahoo (http://dir.yahoo.com) | Reviewed web pages (about 13K) | AND (default); OR | Phrases: ""; truncation: *; fields: t: (title), u: (URL) | Many services within Yahoo: news (hourly), sports (scores, ...), maps, weather, shopping |

Table 2: Overview of the features of some common search engines on the
Internet

2.1.2. Search strategies
The term search engine is widely used to describe both crawler-based search
tools and human-powered directories. These two types of search engines compile
their listings in entirely different ways.
A crawler-based search engine consists of three parts:
The information gatherer (robot)
A robot is an automated program that traverses the hyperlink structure of the web to
collect documents, and recursively retrieves every document linked from the documents
it has already collected.
Robots go by many names: spider, web wanderer or web worm, crawler, and so
on. These names are sometimes misleading: "spider" and "wanderer" suggest that the
robot moves around by itself, and "worm" suggests a virus. In essence, a robot is just a
program that browses sites and collects information from them using the web protocols.
Ordinary browsers are not considered robots because they are not autonomous: they
browse only when a person acts on them.
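The robot's collect-and-follow loop can be sketched in a few lines of Python. This is a minimal sketch, not the design of any real crawler: it assumes a breadth-first queue (which covers the same pages as the recursion described above without deep call stacks), and it takes a hypothetical `fetch` callable instead of making real HTTP requests, so the example.org pages below are made up.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of the <a> tags in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl: fetch a page, extract its links, queue unseen ones.

    `fetch` (hypothetical) returns the HTML of a URL, or None if unavailable.
    """
    seen = {start_url}
    queue = deque([start_url])
    collected = {}
    while queue and len(collected) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        collected[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links against the page URL
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return collected


# A tiny in-memory "web" standing in for real HTTP requests (made-up URLs).
FAKE_WEB = {
    "http://example.org/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://example.org/a.html": '<a href="/b.html">B again</a>',
    "http://example.org/b.html": '<a href="http://example.org/">home</a>',
}

pages = crawl("http://example.org/", FAKE_WEB.get)
print(sorted(pages))
```

The `seen` set is what stops the robot from looping forever on pages that link back to each other, such as b.html linking to the home page.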
The indexer (index)
The indexing system, also called the data analysis and processing system,
analyzes the data collected by the robot, extracts the necessary information (usually
single words, compound words, and important phrases), and organizes it into its own
database so that it can be searched quickly and efficiently. The index is a list of
keywords that records which keyword appears on which page and at which address.
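A keyword index of this kind is usually called an inverted index. The sketch below is a minimal illustration, assuming simple lower-case word splitting; `build_index` and the sample pages are hypothetical. Besides recording which page each keyword appears on, it stores the word positions, which an engine could use for phrase search.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def build_index(pages):
    """Map each keyword to {url: [positions where it occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index[word][url].append(position)
    return index


pages = {
    "http://example.org/a": "Semantic web search",
    "http://example.org/b": "Web search engines crawl the web",
}
index = build_index(pages)
print(dict(index["web"]))
```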
The searcher (search engine)
"Search engine" also refers to the whole system, made up of the information
gatherer, the indexer, and the searcher. These components run continuously from the
moment the system starts; they depend on one another for data but operate
independently of one another.
The search engine interacts with users through a web interface; its job is to receive
user requests and return the documents that satisfy them.
Put simply, keyword search means finding the pages in which the words of the
query occur most often, excluding stopwords (very common words such as the articles a,
an, the, ...). The more often a query word occurs in a page, the more likely that page is
to be selected and returned to the user. A page that contains all the words of the query
is better than a page that is missing one or more of them. Today, most search engines
support basic and advanced search: single words, compound words, phrases, proper nouns,
or restricting the search to parts of a page such as the heading, the title, or the
introductory text, and so on.
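The two ranking ideas above (more occurrences of query words is better, and a page with all the query words beats a page missing some) can be combined into a simple sort key. This is a minimal sketch under stated assumptions: the stopword list and sample pages are illustrative, and real engines weigh many more signals than raw term counts.

```python
import re
from collections import Counter

# Illustrative stopword list only; real engines use much longer lists.
STOPWORDS = {"a", "an", "the", "of", "and"}


def terms(text):
    """Lower-cased words of the text with stopwords removed."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS]


def rank(query, pages):
    """Rank pages by (number of distinct query words present, total occurrences)."""
    query_terms = set(terms(query))
    scored = []
    for url, text in pages.items():
        counts = Counter(terms(text))
        matched = sum(1 for t in query_terms if counts[t] > 0)
        frequency = sum(counts[t] for t in query_terms)
        if matched:
            scored.append((matched, frequency, url))
    # Pages matching more distinct words come first; ties go to higher frequency.
    scored.sort(key=lambda item: (-item[0], -item[1], item[2]))
    return [url for _, _, url in scored]


PAGES = {
    "page1": "Semantic search on the semantic web",
    "page2": "Web web web pages",
    "page3": "The semantic web",
    "page4": "Cooking recipes",
}
print(rank("the semantic web", PAGES))
```

Note that page2 mentions "web" three times yet ranks below page3, because page3 contains every (non-stopword) query word.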
Besides exact keyword matching, search engines also try to understand the real
meaning of a query from the words the user supplies. This shows in features such as
spelling correction and searching for the different inflected forms of a word. For
example, the search engine will also look for words such as speaker, speaking, and
spoke when the user enters speak.
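Matching inflected forms can be illustrated with a toy stemmer. This sketch assumes a crude suffix-stripping rule plus a small hand-written table of irregular forms (`IRREGULAR` and `SUFFIXES` are hypothetical); it is not how any particular engine works, and real systems rely on full morphological dictionaries, since suffix rules alone cannot map "spoke" to "speak".

```python
# Hypothetical irregular-form table; a real engine would use a full lexicon.
IRREGULAR = {"spoke": "speak", "spoken": "speak"}
SUFFIXES = ("ing", "er", "s")


def base_form(word):
    """Reduce a word to a crude base form (irregular table first, then suffixes)."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        # Keep at least three letters so short words are not mangled.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matching_words(query_word, vocabulary):
    """Words in the vocabulary that share the query word's base form."""
    target = base_form(query_word)
    return [w for w in vocabulary if base_form(w) == target]


VOCAB = ["speaker", "speaking", "spoke", "speech", "speaks", "peak"]
print(matching_words("speak", VOCAB))
```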

Operating principle
The search engine directs robots to collect information on the network by
following hyperlinks. When a robot discovers a new site, it sends the documents (web
pages) back to the main server, which builds the index database used to answer searches.
Because information on the network changes constantly, robots must keep revisiting
sites they have already seen. How often they do so depends on the search engine. When
the search engine receives a query from a user, it analyzes the query, looks it up in the
index database, and returns the documents that satisfy it.
2.2. Semantic Web
2.2.1. Concept
The semantic web is an extension of the current web that lets us find, share,
combine, reuse, and extract information precisely and easily (Tim Berners-Lee, XML
2000).
The semantic web is a web of information linked in such a way that computers
can process it easily on a global scale. We can think of the semantic web as a globally
linked database.
The semantic web was proposed by Tim Berners-Lee, the inventor of the WWW,
URIs, HTTP, and HTML. A research group at the World Wide Web Consortium (W3C) is
currently improving, extending, and standardizing the system.
Data in HTML files is useful in some situations. However, because most data on
the web is in HTML, it is hard to use on a large scale, since there is no global system
for publishing data.
The semantic web is therefore seen as a technical solution.
The semantic web is built mainly on a syntax that uses URIs to represent data,
usually in structures based on triples (subject, predicate, object). For example, many
triples of URI data can be stored in a database or exchanged over the World Wide Web
using a special syntax developed specifically for that task. This syntax is called the
RDF syntax.
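The triple structure can be illustrated with a small in-memory store in Python. This is a minimal sketch, assuming plain strings for URIs and literals; `TripleStore` is a hypothetical class, and the title property URI is invented for the example, while the page, author, and book URIs are the ones used later in this chapter. Querying with `None` as a wildcard mirrors how triple databases answer pattern queries.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


class TripleStore:
    """An in-memory set of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add(Triple(subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return the triples that fit a pattern; None matches anything."""
        return sorted(
            t for t in self.triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.object == obj)
        )


PAGE = "http://www.cs.bris.ac.uk/home/pw2538/index.html"
AUTHOR = "http://www.cs.bris.ac.uk/home/pw2538/terms/author"
BOOK = "http://www.cs.bris.ac.uk/home/pw2538/book/title#machinelearning"
TITLE = "http://example.org/terms/title"  # hypothetical property URI

store = TripleStore()
store.add(PAGE, AUTHOR, "Peng Wang")
store.add(BOOK, TITLE, "Machine Learning")
print(store.match(predicate=AUTHOR))
```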
The semantic web requires data that machines can not only read but, ideally, also
understand. In the words of Tim Berners-Lee:
The semantic web goal is to be a unifying system which will (like the web for
human communication) be as un-restraining as possible so that the complexity of
reality can be described.
In other words, just as the web unified human communication, the semantic web aims
to be a unifying system that imposes as few restrictions as possible, so that the full
complexity of reality can be described.
With the semantic web, a whole range of tools and applications that are difficult to
build within the framework of the current web should become easier to realize.
Two technologies are important for developing the semantic web: eXtensible
Markup Language (XML) and Resource Description Framework (RDF). XML lets anyone
create their own tags, while RDF expresses meaning: it uses sets of triples to describe
basic concepts.
URI (Uniform Resource Identifier):
Put simply, a URI identifies something such as a web page, like the strings
beginning with http or ftp that you often see on the World Wide Web. Anyone can
create a URI, and ownership of URIs is clearly delegated, which is why they form the
conceptual basis for building a global web. In fact, the World Wide Web can be
defined this way: anything that has a URI is considered to be "on the web".
URIs are character strings that identify resources on the web. By using URIs,
we can apply the same simple naming scheme to refer to resources accessed through
different protocols such as HTTP, FTP, GOPHER, EMAIL, ...
URLs (Uniform Resource Locators) are a widely used form of URI; they are very
common on the web, where they serve as the addresses of resources. Although URIs are
usually encountered as URLs, they can also refer to concepts in the semantic web. For
example, suppose you have a book titled "Machine Learning"; its URI would be:
http://www.cs.bris.ac.uk/home/pw2538/book/title#machinelearning
Note that everything on the web has its own unique URI.
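The parts of a URI (the protocol or scheme, the host, the path, and an optional fragment such as `#machinelearning`) can be inspected with Python's standard `urllib.parse.urlsplit`. This is a small illustration; the FTP address is hypothetical, the other URIs appear in this chapter, and `describe` is a helper written for the example.

```python
from urllib.parse import urlsplit

EXAMPLES = [
    "http://www.hotbot.com/",
    "ftp://ftp.example.org/pub/readme.txt",  # hypothetical FTP address
    "mailto:pw2538@bristol.ac.uk",
    "http://www.cs.bris.ac.uk/home/pw2538/book/title#machinelearning",
]


def describe(uri):
    """Split a URI into scheme (protocol), host, path and fragment."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path, parts.fragment


for uri in EXAMPLES:
    print(describe(uri))
```

The same naming scheme covers all four protocols; only the scheme and the structure after it differ.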
2.2.2. Architecture
The semantic web is built on a layered architecture of seven layers, described
below:

Figure 9: The layered architecture of the Semantic Web.

Unicode + URI layer:
Ensures the use of the international character set and provides the means to
identify objects in the semantic web.
XML + NS + XML Schema layer:
Together with the namespace and schema definitions, ensures that semantic web
definitions can be integrated with other XML-based standards.
RDF + RDF Schema layer:
Uses metadata to describe documents on the web so that machines can understand them.

D ti: Tm kim ngu nghia ung dung trn linh vuc eDoc
0112274 Pham Thi M Phuong - 37 - 0112398 Tu Thi Ngoc Thanh
Ontology layer:
RDF Schema provides tools for defining vocabularies, structures, and constraints
for describing metadata and web resources. However, RDF Schema is not yet expressive
enough for modeling and for supporting reasoning on the Semantic Web. The ontology
language OIL was proposed as an extension of RDF Schema. It can express formal
semantics, which supports automated inference.
Logic layer:
The logic layer acts as a rule base for the Semantic Web. In essence, this rule base
works like an expert system. This layer supports services such as text classification and
data extraction.
Proof layer:
While the logic layer supports reasoning over the rule base, the Proof layer is
used to justify the system's inferences by linking them back to the facts they rely on.
Trust layer:
In the semantic web, information is shared as a global database, so some means
of securing it is needed. This is what gave rise to digital signatures, which make
information on the web more trustworthy. A trust engine is a system being built on top
of digital signatures. The techniques for building such engines are still at the research
and experimental stage.
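The cooperation between the Logic and Proof layers can be sketched as forward chaining: rules derive new facts from existing triples, and each derived fact remembers the facts that justify it, so the system can print a proof. This is a minimal sketch under stated assumptions: the two rules (class-membership inheritance and subclass transitivity) and the facts about "PengWang" are hypothetical examples, and short names such as "type" stand in for full URIs.

```python
def rule_type_inheritance(facts):
    """(x type C) and (C subClassOf D)  =>  (x type D)"""
    for (x, p1, c) in facts:
        if p1 != "type":
            continue
        for (c2, p2, d) in facts:
            if p2 == "subClassOf" and c2 == c:
                yield (x, "type", d), [(x, "type", c), (c, "subClassOf", d)]


def rule_subclass_transitivity(facts):
    """(A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C)"""
    for (a, p1, b) in facts:
        if p1 != "subClassOf":
            continue
        for (b2, p2, c) in facts:
            if p2 == "subClassOf" and b2 == b:
                yield (a, "subClassOf", c), [(a, "subClassOf", b), (b, "subClassOf", c)]


def forward_chain(facts, rules):
    """Apply the rules until nothing new appears; record why each fact was derived."""
    facts = set(facts)
    proof = {}
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # list() finishes the rule's scan before the fact set is modified.
            for new_fact, premises in list(rule(facts)):
                if new_fact not in facts:
                    facts.add(new_fact)
                    proof[new_fact] = premises
                    changed = True
    return facts, proof


def explain(fact, proof, depth=0):
    """Print a proof tree: each fact followed, indented, by the facts supporting it."""
    print("  " * depth + " ".join(fact))
    for premise in proof.get(fact, []):
        explain(premise, proof, depth + 1)


FACTS = {
    ("PengWang", "type", "PhDStudent"),
    ("PhDStudent", "subClassOf", "Student"),
    ("Student", "subClassOf", "Person"),
}
RULES = [rule_type_inheritance, rule_subclass_transitivity]

facts, proof = forward_chain(FACTS, RULES)
explain(("PengWang", "type", "Person"), proof)
```

Facts that were asserted rather than derived have no entry in `proof`, so the printed tree always bottoms out in the original data.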

2.2.3. Challenges facing the Semantic Web
2.2.3.1. Challenge 1: The availability of content
Semantic web content is web content annotated according to particular
ontologies; these ontologies define the meaning of the words or concepts that appear in
the content. A simple extension of HTML is used to annotate web pages with ontology
information. Creating semantic web content is a major challenge, because the semantic
web infrastructure is still under construction (RDF, OIL, DAML+OIL, ... are not yet
complete), and very little semantic web content is currently available.
2.2.3.2. Challenge 2: Ontology availability, development, and evolution
Ontologies are the key to the semantic web because they carry the semantics
contained in it; that is, they provide a vocabulary and the semantics of the annotations.
Three main issues must be addressed for this challenge. Two of them concern problems
in traditional ontology development that remain unsolved to this day, and the third is
tied more closely to the new context of the semantic web:
The first issue is building kernel ontologies that can be used by all domains. The
difficulty in building such kernel ontologies is that they must be applicable across
different domains.
The second issue is providing methodological and technological support for most
activities of the ontology development process, including:
a. Knowledge acquisition, conceptual modeling, and encoding of ontologies in
semantic web languages (RDFS, OIL, DAML+OIL) and in new languages that
may appear in the coming years [Maedche, Staab 2001].
b. Ontology alignment and mapping, ontology integration, ontology translation
tools, and ontology construction tools, if existing ontologies are to be
reused [Fensel et al, 2001], [Noy, Musen 2000].
c. Tools for checking the robustness of reused ontologies
[Gomez-Perez 1996].

The third issue is the evolution of ontologies and their relationship to data that
have already been annotated. Configuration management tools are needed to control the
versions of each ontology, as well as the dependencies among ontologies and between
ontologies and annotations. These issues may seem minor, but they must be solved
before a true semantic web can emerge.
2.2.3.3. Challenge 3: Scalability of semantic web content
Once semantic web content exists, we will have to think about how to manage it:
how to organize it, where to store it, and how to find the right content. There are two
main issues in this challenge:
a. The first concerns the storage and organization of semantic web pages. The
semantic web basically consists of pages annotated with ontologies, and the
link structure of these pages mirrors that of the WWW: pages link to other
pages through hyperlinks. Linking of this kind does not fully exploit the
semantics of semantic web pages. A semantic indexes strategy has been
proposed to group semantic web content by specific topics. Semantic
indexes would be generated automatically using ontology information and
the annotated documents.
b. The second concerns making it easy to find information on the semantic
web, in other words, coordinating the different semantic indexes.
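The idea of a semantic index generated from ontology information and annotated documents can be sketched as follows. This is a minimal illustration, not the strategy from the cited work: it assumes a hypothetical topic hierarchy (`BROADER`) and hypothetical annotations, and files each document under its annotated topic and under every broader topic, so a search on "SemanticWeb" also finds documents annotated only with "RDF".

```python
from collections import defaultdict

# Hypothetical topic hierarchy taken from an ontology: topic -> broader topic.
BROADER = {"SemanticWeb": "Web", "SearchEngine": "Web", "RDF": "SemanticWeb"}

# Hypothetical annotated documents: URL -> topics assigned by annotators.
ANNOTATIONS = {
    "http://example.org/rdf-primer": {"RDF"},
    "http://example.org/crawlers": {"SearchEngine"},
    "http://example.org/owl-intro": {"SemanticWeb"},
}


def ancestors(topic):
    """The topic itself followed by every broader topic above it."""
    chain = [topic]
    while topic in BROADER:
        topic = BROADER[topic]
        chain.append(topic)
    return chain


def build_semantic_index(annotations):
    """Group document URLs under each topic and under all its broader topics."""
    index = defaultdict(set)
    for url, topics in annotations.items():
        for topic in topics:
            for t in ancestors(topic):
                index[t].add(url)
    return index


index = build_semantic_index(ANNOTATIONS)
print(sorted(index["SemanticWeb"]))
```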
2.2.3.4. Challenge 4: Multilinguality
Studies of the distribution of languages in WWW content show that even though
English is the dominant language of documents, resources written in other languages are
also very important: English 68.4%; Japanese 5.9%; German 5.8%; Chinese 3.9%;
French 3.0%; Spanish 2.4%; Russian 1.9%; Italian 1.6%; Portuguese 1.4%; Korean
1.3%; other languages 4.6% [www.vilaweb.com]. Linguistic diversity matters even more
for WWW resources. Multilinguality plays an increasingly important role at three levels:
the ontology level, the annotation level, and the user interface level.
At the ontology level, ontology designers may want to use their local language to
develop the ontologies to which annotations will be attached. Because not all users
build ontologies, this level has the lowest priority. Existing multilingual and linguistic
resources, such as WordNet [wordnet] and EuroWordNet [eurowordnet], ... can be
considered to support multilinguality at this level.
At the annotation level, content can be annotated in many different languages.
Because many users (especially content providers) would rather annotate content than
develop ontologies, proper support is needed so that content providers can annotate
content in their local language. To generate semantic web content as widely as possible,
we cannot require content annotations to be translated from French into German, or
vice versa.
Finally, at the user interface level, the aim is to let people access suitable content
in their local language, regardless of the language in which the annotations are
expressed. Although most content is currently written in English, we expect more
content to be written in other languages. Any approach to the semantic web should
include facilities for accessing information in multiple languages. Internationalization
and localization technologies should be considered carefully for personalized
information access based on the user's local language.
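At the user interface level, one common technique is to attach labels in several languages to the same concept and choose the one matching the user's preferences. The sketch below assumes labels keyed by language tag (the same idea as language-tagged `rdfs:label` values); `LABELS` and `pick_label` are hypothetical, and English is assumed as the fallback language.

```python
# Hypothetical labels for one ontology concept, keyed by language tag.
LABELS = {
    "en": "Search engine",
    "fr": "Moteur de recherche",
    "vi": "Công cụ tìm kiếm",
}


def pick_label(labels, preferred, fallback="en"):
    """Return the label in the first preferred language available.

    A regional tag such as 'fr-CA' falls back to its base language 'fr'.
    """
    for tag in preferred:
        for candidate in (tag, tag.split("-")[0]):
            if candidate in labels:
                return labels[candidate]
    return labels.get(fallback)


print(pick_label(LABELS, ["fr-CA", "en"]))
```

The concept itself (and its annotations) stays the same whatever language is shown; only the label presented to the user changes.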
2.2.3.5. Challenge 5: Visualization
With the explosive growth of information, intuitive visualization of information will
become very important, because users will increasingly demand easy ways to judge
whether content suits their purposes. In addition, the use of semantic indexes and
routers to store, organize, and search information will later require an important step
forward in visualization. Technologies should allow 3D and new visualization techniques
to present semantic web content in any current semantic web language (RDFS, OIL,
DAML+OIL). Through real-time 3D graphics and the exploitation of semantic
relationships, new three-dimensional interfaces can be generated automatically. In this
way, more information can be presented in less space, and users can interact with sites
realistically and conveniently [Van Harmelen et al 2001].
2.2.3.6. Challenge 6: Standardization of semantic web languages
The semantic web is an emerging field, and the WWW Consortium will issue
recommendations on the languages and technologies to be used. To advance the state of
the art of the semantic web, and because tools depend largely on the semantic web
languages they support, standardizing semantic web languages is a necessity.
2.2.4. Comparison of the Web and the Semantic Web
Similarity between the Web and the semantic web: both use URI links, but the
semantic web uses these links much more heavily, and this use of links increases the
precision of the information.
Basic differences between the Web and the semantic web:

| Semantic web | Web |
|---|---|
| The semantic web is an information space in which information is expressed in a language that both machines and people can understand. | The web is an information space whose content is aimed only at expression in natural language, which only people can understand. |
| The semantic web is data linked together semantically and formally. | The web is a collection of information linked together informally. |


2.2.5. Related concepts
2.2.5.1. Metadata
Metadata is structured information that describes, explains, or locates an
information resource, or otherwise makes it easier to retrieve, use, or manage. Metadata
is often called data about data (a data dictionary), or information about information.
As information about information, metadata is widely used in the real world for
searching. For example, suppose you want to borrow a book from a library through a
computer. The library will usually provide a lookup system that lets you list books by
author, by title, by subject, and so on. Such a listing contains important information such
as the author's name, the title, the ISBN, and, most importantly, where the book is
shelved. You need one piece of information (in this case, where the book is shelved),
and you use metadata (in this case, the author's name, the title, and the subject) to get
the book.
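The library example can be expressed directly in code: each catalog record is metadata about a book, and a lookup on some fields returns the field the reader actually needs, the shelf. This is a minimal sketch; `CatalogRecord`, `find_shelves`, and the catalog entries (including the placeholder ISBNs and shelf codes) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CatalogRecord:
    """Metadata about one book; `shelf` is what the reader is looking for."""
    title: str
    author: str
    subject: str
    isbn: str
    shelf: str


# Hypothetical catalog entries with placeholder ISBNs and shelf codes.
CATALOG = [
    CatalogRecord("Earthquakes for lunch", "Nguyen Van A", "Geology",
                  "ISBN-0001", "Shelf G-12"),
    CatalogRecord("Introduction to the Semantic Web", "Tran Thi B",
                  "Computer science", "ISBN-0002", "Shelf C-03"),
]


def find_shelves(catalog, **criteria):
    """Shelves of all records whose metadata fields match every criterion
    (case-insensitive exact match)."""
    return [
        record.shelf
        for record in catalog
        if all(getattr(record, field).lower() == value.lower()
               for field, value in criteria.items())
    ]


print(find_shelves(CATALOG, author="tran thi b"))
```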
There are three kinds of metadata:
a. Descriptive metadata: describes a resource for purposes such as
discovery and identification. It can include elements such as title,
abstract, author, and keywords.
b. Structural metadata: indicates, for example, how compound objects are
put together, or how pages are ordered to form chapters.
c. Administrative metadata: provides information that helps manage a
resource, such as when and how it was created, its file type and other
technical information, and who can access it.
2.2.5.2. Namespace
We can extend our vocabulary through namespaces, which are groups of element
names and attribute names. Suppose you want to include a symbol encoded in some
markup language in an XML document; you can then declare the namespace to which
that symbol belongs. Namespaces also let us avoid the situation in which two XML
objects from different namespaces share the same name but have different meanings.
The solution is to attach to each element or attribute a prefix that identifies the
namespace it belongs to. The namespace syntax is:
ns-prefix:local-name
where ns-prefix is the prefix bound to the namespace (through an xmlns declaration),
and local-name is the name of the element or attribute.
Namespace example:
The XML document below describes a library of books. We start with a root
element named <library>; inside the root element are the book element <book> and
the title element <title>:

<library>
<book>
<title>
Earthquakes for lunch
</title>
</book>
</library>


Local namespace:
We can place the xmlns attribute on the root element or on any other element.
When this attribute is not on the root element, the declaration is called a local namespace.
For example, consider the following XML fragment:
<minhkhai:library xmlns:minhkhai="http://www.minhkhai.com.vn/spec">
  <minhkhai:book>
    <minhkhai:title>
      Earthquakes for lunch.
    </minhkhai:title>
  </minhkhai:book>
  <amazon:book xmlns:amazon="http://www.amazon.com.lib">
    <amazon:title>
      Earthquakes for lunch.
    </amazon:title>
  </amazon:book>
</minhkhai:library>
In this example, the namespace declaration xmlns:amazon="http://www.amazon.com.lib"
is called a local namespace.
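Python's standard `xml.etree.ElementTree` makes the effect of namespaces visible: after parsing, every prefix is replaced by the namespace URI it is bound to, so the two `title` elements, which share the local name "title", end up with different qualified names. This is a small illustration using a well-formed version of the library fragment; the prefix "shop" in the search map is an arbitrary name chosen for the example.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """<minhkhai:library xmlns:minhkhai="http://www.minhkhai.com.vn/spec">
  <minhkhai:book>
    <minhkhai:title>Earthquakes for lunch.</minhkhai:title>
  </minhkhai:book>
  <amazon:book xmlns:amazon="http://www.amazon.com.lib">
    <amazon:title>Earthquakes for lunch.</amazon:title>
  </amazon:book>
</minhkhai:library>"""

root = ET.fromstring(LIBRARY_XML)

# ElementTree writes each qualified name as "{namespace-uri}local-name".
title_tags = [e.tag for e in root.iter() if e.tag.endswith("}title")]
print(title_tags)

# When searching, prefixes are mapped to URIs explicitly; they need not
# match the prefixes written in the document ("shop" is arbitrary here).
NS = {"shop": "http://www.amazon.com.lib"}
print(root.find("shop:book/shop:title", NS).text)
```

This shows why the prefix itself carries no meaning: only the URI it is bound to identifies the namespace.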
2.2.6. Ontology
The term ontology is borrowed from philosophy. Its original meaning is "the
branch of metaphysics that deals with the nature of being" [The American Heritage
Dictionary of the English Language: Fourth Edition (2000)].
Ontology is a key backbone technology because it provides an important
property: an ontology connects the formal semantics that computers can understand with
the real-world semantics that people understand.
Ontologies were developed in artificial intelligence to make knowledge easy to
share and reuse. Since the early 1990s, ontologies have become a popular research topic
for artificial intelligence research groups, including knowledge engineering, natural
language processing, and knowledge representation.
Ontologies not only make knowledge easier to reuse; they are also a foundation for
creating standards, because they make explicit the concepts behind a term or a model.
In practice, the requirement is not for a single concept but for the often ambiguous
interplay among complex, detailed concepts (which may be expressed in many
different languages).
Recently, the concept of ontology has become much more popular in fields such
as intelligent integration, cooperative information systems, information retrieval,
electronic commerce, and knowledge management. Because the purpose of an ontology
is to capture domain knowledge, its development is usually a process that involves
many other factors.
Since its inception, ontology has been given many definitions. However, the core
of an ontology remains: "An ontology is an explicit, formal specification of a shared
conceptualization." Here:
Conceptualization refers to an abstract model of some phenomenon in the world that identifies the concepts relevant to that phenomenon.
Explicit means that the concepts and the constraints on them are defined explicitly.
Formal refers to the fact that an ontology must be machine-understandable.
Shared reflects that an ontology captures consensual knowledge, that is, knowledge not restricted to one individual or one particular group.
Many large ontologies exist today, such as CYC, WordNet, ...
An example of an ontology:




Figure 10: A simple ontology
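A very small ontology can also be written down in code: a hierarchy of concepts plus a relation between concepts, with a check that a statement respects the ontology. This is a minimal sketch; the concepts ("Book", "Document", "Person", ...) and the `hasAuthor` relation are hypothetical and are not taken from the figure above.

```python
# Hypothetical concept hierarchy: concept -> its direct parent concept.
SUBCLASS_OF = {
    "Book": "Document",
    "Article": "Document",
    "Document": "Resource",
    "Student": "Person",
}

# Hypothetical relation between concepts: (subject concept, name, object concept).
RELATIONS = [("Document", "hasAuthor", "Person")]


def is_a(concept, ancestor):
    """True if `concept` is `ancestor` or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False


def allowed(relation, subject_concept, object_concept):
    """Check that a relation is used between concepts the ontology permits."""
    return any(
        name == relation and is_a(subject_concept, dom) and is_a(object_concept, rng)
        for dom, name, rng in RELATIONS
    )


print(allowed("hasAuthor", "Book", "Student"))
```

Because a Book is a Document and a Student is a Person, the ontology accepts "a book has a student as author" without that case being listed explicitly.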
2.2.7. RDF
2.2.7.1 Concept
RDF stands for Resource Description Framework. RDF was recommended by the
W3C as a standard model and language for metadata. RDF is a framework for
describing resources on the web.
RDF provides a data model and a syntax so that independent parties can
exchange and use RDF data.
2.2.7.2 Structure
RDF is a framework for processing metadata: it describes the relationships
between resources in terms of properties and values. RDF is built on the following
rules:

Resource: Anything described by an RDF expression is called a resource. Each
resource has a URI, and it can be an entire web page or part of a web page.
Property: A property is a specific aspect, characteristic, attribute, or relation
used to describe a resource (quoted from W3C, Resource Description Framework
(RDF) Model and Syntax Specification). Note that a property can also be a resource,
because it has properties of its own.
Statements: A statement combines a resource, a property, and the value of that
property. These three parts are known as the subject, predicate, and object. For
example, "The Author of http://www.cs.bris.ac.uk/home/pw2538/index.html is Peng
Wang" is a statement. Note that the value in this statement may be a character string,
but it may also be a resource.
RDF example:
In RDF, a statement can be viewed as a graph. Consider the statement:
The Author of http://www.cs.bris.ac.uk/home/pw2538/index.html is
Peng Wang
This statement breaks down into three parts:
Subject (Resource): http://www.cs.bris.ac.uk/home/pw2538/index.html
Predicate (Property): Author
Object (Literal): Peng Wang

It is represented as the following graph:




The arrow always points from the subject to the object of the statement.
The graph can be read as <subject> HAS <predicate> <object>, for example:
http://www.cs.bris.ac.uk/home/pw2538/index.html has author Peng Wang.
If we assign a URI to the author property, we get:
http://www.cs.bris.ac.uk/home/pw2538/terms/author
To keep things short, we introduce prefixes so that we do not have to write out the
full URI every time. Several prefixes are conventionally associated with widely used URIs:
The prefix rdf: is the namespace for the URI:
http://www.w3.org/1999/02/22-rdf-syntax-ns#
The prefix rdfs: is the namespace for the URI:
http://www.w3.org/2000/01/rdf-schema#
The prefix daml: is the namespace for the URI:
http://www.daml.org/2001/03/daml+oil#
The prefix xsd: is the namespace for the URI:
http://www.w3.org/2001/XMLSchema#
In this example, we use the namespace prefix pwterms to stand for the URI we refer
to: http://www.cs.bris.ac.uk/home/pw2538/terms/ (with a trailing slash, so that
pwterms:author expands to http://www.cs.bris.ac.uk/home/pw2538/terms/author).
The RDF syntax for the statement "The Author of
http://www.cs.bris.ac.uk/home/pw2538/index.html is Peng Wang" is then:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:pwterms="http://www.cs.bris.ac.uk/home/pw2538/terms/">
  <rdf:Description
      rdf:about="http://www.cs.bris.ac.uk/home/pw2538/index.html">
    <pwterms:author>Peng Wang</pwterms:author>
  </rdf:Description>
</rdf:RDF>
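To see that this RDF/XML really encodes the triple described above, it can be read back with Python's standard `xml.etree.ElementTree`. This is a minimal sketch, not a complete RDF/XML parser: `triples_from_rdfxml` is a hypothetical helper that only understands `rdf:Description` elements with `rdf:about`, whose property elements contain either plain text or an `rdf:resource` attribute. A full RDF library would be needed for the rest of the syntax.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:pwterms="http://www.cs.bris.ac.uk/home/pw2538/terms/">
  <rdf:Description rdf:about="http://www.cs.bris.ac.uk/home/pw2538/index.html">
    <pwterms:author>Peng Wang</pwterms:author>
  </rdf:Description>
</rdf:RDF>"""


def expand(tag):
    """Turn ElementTree's '{namespace}local' form into one URI string."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def triples_from_rdfxml(text):
    """Extract (subject, predicate, object) triples from simple rdf:Description elements."""
    root = ET.fromstring(text)
    result = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # Object is either a resource reference or the element's text.
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            result.append((subject, expand(prop.tag), obj))
    return result


print(triples_from_rdfxml(RDF_XML))
```

The predicate comes out as http://www.cs.bris.ac.uk/home/pw2538/terms/author, which only works because the pwterms namespace URI ends with a slash.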


Another statement: a person with student ID pw2538 is named Peng Wang and has
the e-mail address pw2538@bristol.ac.uk. This person is the author of the resource
http://www.cs.bris.ac.uk/home/pw2538/index.html.
The corresponding graph:


The corresponding RDF syntax:


RDF Data Model:
RDF provides a model for describing resources. Resources have properties
(attributes or characteristics). RDF defines a resource as any object that can be
uniquely identified by a URI. The properties associated with resources are identified
by property types, and these property types have corresponding values. Property types
express the relationship between the values and the resources they are associated with.
In RDF, values can be atomic (text strings, numbers, etc.) or other resources.
At its core, RDF is a syntax-independent model for representing resources and
their corresponding descriptions.



Figure 11: The RDF data model

The RDF data model is a directed labeled graph in which the nodes are resources
(entities with URIs) or literals, and the edges are properties. As introduced above, an RDF
statement is a triple (subject, predicate, object). The resource is the subject of the
statement and has a property whose value is the object of the statement. An object can
be a resource or a literal value. A statement can be represented as a graph by drawing an
arc from one node (the subject) to another node (the object).
RDF is a foundation for processing metadata; it provides interoperability between
applications that exchange machine-understandable information on the web. RDF
emphasizes facilities that enable automated processing of web resources.
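The graph view of the data model can be sketched for the second statement about Peng Wang. This is a minimal illustration under stated assumptions: the URI for the person and the `name` and `email` properties are hypothetical (the statement does not fix them), and the `Literal` wrapper is introduced only to tell literal nodes apart from resource nodes.

```python
from collections import defaultdict


class Literal(str):
    """Marks a plain value (text, a number written as text) rather than a URI."""


TERMS = "http://www.cs.bris.ac.uk/home/pw2538/terms/"
PAGE = "http://www.cs.bris.ac.uk/home/pw2538/index.html"
PERSON = "http://www.cs.bris.ac.uk/home/pw2538/people#pw2538"  # hypothetical URI

TRIPLES = [
    (PAGE, TERMS + "author", PERSON),
    (PERSON, TERMS + "name", Literal("Peng Wang")),            # hypothetical property
    (PERSON, TERMS + "email", "mailto:pw2538@bristol.ac.uk"),  # hypothetical property
]


def to_graph(triples):
    """Directed labeled graph: subject node -> list of (edge label, target node)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def node_kinds(triples):
    """Split the graph's nodes into resources (URIs) and literals."""
    resources, literals = set(), set()
    for subject, _, obj in triples:
        resources.add(subject)
        (literals if isinstance(obj, Literal) else resources).add(obj)
    return resources, literals


graph = to_graph(TRIPLES)
for subject, edges in graph.items():
    for predicate, obj in edges:
        print(f"{subject} --{predicate}--> {obj}")
```

Each printed line is one arc from a subject node to an object node; only resource nodes can have outgoing arcs, since literals never appear as subjects.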
2.2.7.3 RDF Schema: a vocabulary description language
The language defined in this specification consists of a set of RDF resources
that can be used to describe properties of other RDF resources (including properties)
that define the RDF vocabulary of a particular application. This core vocabulary is
defined in a namespace called rdfs and identified by the URI reference
http://www.w3.org/2000/01/rdf-schema#. The specification also uses the prefix rdf to
refer to the core RDF namespace: http://www.w3.org/1999/02/22-rdf-syntax-ns#.
The class and property system of RDF Schema is similar to the type systems of object-oriented programming languages such as Java. However, RDF differs from many such systems in that, instead of defining a class in terms of the properties its instances may have, RDF Schema describes properties in terms of the classes of resources to which they apply. This is the role of rdfs:domain and rdfs:range, described in this specification. For example, we could define the property eg:author to have a domain of eg:Document and a range of eg:Person, whereas a classical object-oriented system would typically define a class eg:Book with an attribute called eg:author of type eg:Person.
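The property-centric approach means types can be inferred from use. A minimal Python sketch of the two RDFS entailment rules for domain and range, applied once over a set of triples: if (x, p, y) holds and p has rdfs:domain C, infer x rdf:type C; if p has rdfs:range D, infer y rdf:type D. The resource names eg:doc1 and eg:person1 are hypothetical.

```python
RDF_TYPE = "rdf:type"


def infer_types(triples, domains, ranges):
    """Apply the RDFS domain and range rules once and return only the new triples.

    domains/ranges map a property to a class, mirroring rdfs:domain and rdfs:range.
    """
    inferred = set()
    for s, p, o in triples:
        if p in domains:
            inferred.add((s, RDF_TYPE, domains[p]))   # subject belongs to the domain class
        if p in ranges:
            inferred.add((o, RDF_TYPE, ranges[p]))    # object belongs to the range class
    return inferred - set(triples)


domains = {"eg:author": "eg:Document"}
ranges = {"eg:author": "eg:Person"}
data = [("eg:doc1", "eg:author", "eg:person1")]
print(sorted(infer_types(data, domains, ranges)))
# [('eg:doc1', 'rdf:type', 'eg:Document'), ('eg:person1', 'rdf:type', 'eg:Person')]
```

Nothing here declares a class eg:Document with an author attribute; the types follow from the property's domain and range, which is exactly the contrast with the object-oriented eg:Book definition.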

Domain and Range vocabulary
This specification introduces RDF vocabulary for describing the meaningful use of properties and classes in RDF data. For example, an RDF schema might describe limitations on the types of values that are appropriate for some property.
RDF Schema provides a mechanism for describing this information, but does not say whether or how an application should use it. Different applications will use this information in different ways. For example, data-checking tools could use it to help find errors in datasets, an interactive editor could suggest appropriate values, and a reasoning application could use it to infer additional information from the original data.
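As an illustration of the data-checking use, here is a hypothetical checker in Python. RDF Schema itself only licenses inferences, so treating rdfs:range as a constraint is a choice made by this sketch: it reads the rdf:type statements present in the data as complete (a closed-world assumption) and ignores subclass reasoning.

```python
def check_ranges(triples, ranges):
    """Report values whose declared types do not include the property's range.

    Closed-world reading: a value counts as an instance of a class only if an
    explicit rdf:type triple says so (no subclass reasoning).
    """
    declared = {}
    for s, p, o in triples:
        if p == "rdf:type":
            declared.setdefault(s, set()).add(o)

    problems = []
    for s, p, o in triples:
        expected = ranges.get(p)
        if expected and expected not in declared.get(o, set()):
            problems.append(f"{s} {p} {o}: expected a value of type {expected}")
    return problems


ranges = {"eg:author": "eg:Person"}
data = [
    ("eg:doc1", "eg:author", "eg:person1"),
    ("eg:person1", "rdf:type", "eg:Person"),
    ("eg:doc2", "eg:author", "eg:doc1"),   # a document given as an author
]
for problem in check_ranges(data, ranges):
    print(problem)
# eg:doc2 eg:author eg:doc1: expected a value of type eg:Person
```

A reasoner working from the same schema would instead conclude that eg:doc1 is also a person; which reading is useful depends on the application, as the specification says.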
RDF Schema can describe relationships between vocabulary items from multiple independently developed schemas. Because URI references are used to identify classes and properties on the web, it is possible to create new properties whose domain and range values are classes defined in another namespace.
This specification does not attempt to enumerate all the possible forms of vocabulary description that are useful for representing the meaning of RDF classes and properties. Instead, the RDF vocabulary description strategy acknowledges that there are many techniques through which the meaning of classes and properties can be described, and proposes some conventions for using RDF/XML to describe the characteristics of RDF classes and properties.
Richer schema or ontology languages such as DAML+OIL and the W3C's OWL, rule-based inference languages, and other formalisms will each contribute to our ability to capture meaningful generalizations about data on the web. RDF vocabulary designers can create and deploy semantic web applications using the basic RDF Schema 1.0 facilities, while exploring richer vocabulary description languages that share this general approach.

Overview of RDF Schema

This table gives an overview of the core RDF vocabulary.

Class name                       | Comment
-------------------------------- | -------
rdfs:Resource                    | The class resource, everything.
rdfs:Literal                     | This represents the set of atomic values, e.g. textual strings.
rdf:XMLLiteral                   | The class of XML literals.
rdfs:Class                       | The concept of class.
rdf:Property                     | The concept of a property.
rdfs:Datatype                    | The class of datatypes.
rdf:Statement                    | The class of RDF statements.
rdf:Bag                          | An unordered collection.
rdf:Seq                          | An ordered collection.
rdf:Alt                          | A collection of alternatives.
rdfs:Container                   | This represents the set of containers.
rdfs:ContainerMembershipProperty | The container membership properties, rdf:_1, rdf:_2, ..., all of which are sub-properties of 'member'.
rdf:List                         | The class of RDF lists.
Table 3: Classes in RDF


Property name      | Comment | Domain | Range
------------------ | ------- | ------ | -----
rdf:type           | Indicates membership of a class | rdfs:Resource | rdfs:Class
rdfs:subClassOf    | Indicates specialization of classes | rdfs:Class | rdfs:Class
rdfs:subPropertyOf | Indicates specialization of properties | rdf:Property | rdf:Property
rdfs:domain        | A domain class for a property type | rdf:Property | rdfs:Class
rdfs:range         | A range class for a property type | rdf:Property | rdfs:Class
rdfs:label         | Provides a human-readable version of a resource name | rdfs:Resource | rdfs:Literal
rdfs:comment       | Use this for descriptions | rdfs:Resource | rdfs:Literal
rdfs:member        | A member of a container | rdfs:Container | not specified
rdf:first          | The first item in an RDF list; also often called the head | rdf:List | not specified
rdf:rest           | The rest of an RDF list after the first item; also often called the tail | rdf:List | rdf:List
rdfs:seeAlso       | A resource that provides information about the subject resource | rdfs:Resource | rdfs:Resource
rdfs:isDefinedBy   | Indicates the namespace of a resource | rdfs:Resource | rdfs:Resource
rdf:value          | Identifies the principal value (usually a string) of a property when the property value is a structured resource | rdfs:Resource | not specified
rdf:subject        | The subject of an RDF statement | rdf:Statement | rdfs:Resource
rdf:predicate      | The predicate of an RDF statement | rdf:Statement | rdf:Property
rdf:object         | The object of an RDF statement | rdf:Statement | not specified
Table 4: Properties in RDF
(The RDF vocabulary is described in Appendix [1].)
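The rdf:first/rdf:rest pair encodes an RDF list as a linked list of cells. A minimal Python sketch that converts a Python list to such triples and back; the blank-node labels _:c1, _:c2, ... are generated by this sketch, and the terminating resource rdf:nil (the empty list) belongs to the RDF vocabulary although it is not listed in the tables above.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def to_rdf_list(items):
    """Encode items as an RDF list; returns (head_node, triples).

    Each cell is a blank node: rdf:first points to its item (the head) and
    rdf:rest points to the next cell (the tail); the last rest is rdf:nil.
    """
    if not items:
        return RDF_NIL, []
    cells = [f"_:c{i}" for i in range(1, len(items) + 1)]
    triples = []
    for i, (cell, item) in enumerate(zip(cells, items)):
        nxt = cells[i + 1] if i + 1 < len(cells) else RDF_NIL
        triples.append((cell, RDF_FIRST, item))
        triples.append((cell, RDF_REST, nxt))
    return cells[0], triples


def from_rdf_list(head, triples):
    """Walk the rdf:first/rdf:rest links back into a Python list."""
    first = {s: o for s, p, o in triples if p == RDF_FIRST}
    rest = {s: o for s, p, o in triples if p == RDF_REST}
    items = []
    while head != RDF_NIL:
        items.append(first[head])
        head = rest[head]
    return items


head, triples = to_rdf_list(["eg:ch1", "eg:ch2", "eg:ch3"])
for t in triples:
    print(*t)
# _:c1 rdf:first eg:ch1
# _:c1 rdf:rest _:c2
# _:c2 rdf:first eg:ch2
# _:c2 rdf:rest _:c3
# _:c3 rdf:first eg:ch3
# _:c3 rdf:rest rdf:nil
```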


2.3. eDoc
2.3.1. Understanding eLearning
2.3.1.1. Concept
eLearning, also called online learning, is an umbrella term covering all forms of learning of this kind.
Online learning involves using network technologies (such as the Internet or business networks) to deliver, support, and assess formal and informal teaching and learning.
Where and how does learning happen? In online resources and materials, digital libraries, and documents; and in courses, discussions, chats, email, conferences, and knowledge-sharing applications. An important point is that online learning does not necessarily take place entirely online: technology for learning is often a supplement to classroom and face-to-face learning opportunities.
Some reasons to use online learning:
a. Improved access and flexibility: people can log in from any computer, at home or at work, at any hour of the day or night, to take lessons or consult learning materials.
b. Faster distribution and lower cost: for organizations that need to communicate important information that quickly becomes outdated (for example, the latest version of a product), online delivery is almost always cheaper and much faster than having trainers fly to several countries to meet learners in classrooms for hours.
c. Improved management and standardization: in today's international business environment, many organizations operate globally. Differences in the knowledge and skills of individual instructors can make the quality of learning vary from place to place; for example, learners in New Delhi may receive different training quality from learners in New York. Online learning delivers consistent information to audiences everywhere.
d. Better communication and collaboration: software tools allow learners to communicate with each other, collaborate on projects, and share documents without having to meet face to face.

2.3.1.2. eLearning standards
The eLearning industry keeps growing every day, and the standards needed to create course content are becoming increasingly complex.
Before an eLearning convention becomes a standard, it is called a specification. A specification is approved by a widely recognized organization, such as the IEEE.
Some eLearning standards:
a. Dublin Core metadata element set
The Dublin Core metadata element set is a standard for cross-domain description of information resources. Here, an information resource is defined as anything that has identity. For Dublin Core applications, a resource is typically an electronic document.
Dublin Core metadata is used for searching and indexing web-based metadata. The element set provides semantic vocabulary such as Description, Creator, and Date for describing the key information properties of Internet resources.
The Dublin Core metadata set provides 15 elements:
- Title: the name given to the resource.
- Creator: the entity primarily responsible for making the resource, e.g., a person, an organization, or a service.
- Subject: the topic of the content of the resource.
- Description: an account of the content of the resource.
- Publisher: the entity responsible for making the resource available.
- Contributor: an entity responsible for making contributions to the content of the resource.
- Date: a date associated with the resource, such as the date it was created.
- Type: the nature or genre of the content of the resource.
- Format: the physical or digital storage format of the resource.
- Identifier: an unambiguous reference to the resource within a given context.
- Source: a reference to a resource from which the present resource is derived.
- Language: the language of the content of the resource.
- Relation: a reference to a related resource.
- Coverage: the extent or scope of the content of the resource.
- Rights: information about the rights held in and over the resource.
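A minimal Python sketch of a Dublin Core description serialized as XML, using the element-set namespace http://purl.org/dc/elements/1.1/. Every element is optional and repeatable; the wrapper element name record and the sample values are illustrative assumptions, not part of the standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen elements listed above, in the same order (Dublin Core 1.1 names).
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
)


def dc_record(fields: dict) -> ET.Element:
    """Build a simple Dublin Core record; each element is optional and repeatable."""
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    record = ET.Element("record")   # the wrapper element name is an arbitrary choice
    for name in DC_ELEMENTS:
        values = fields.get(name, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


rec = dc_record({
    "title": "An example electronic document",
    "creator": ["Example Author One", "Example Author Two"],
    "date": "2005-07",
    "type": "Text",
    "format": "application/pdf",
    "language": "en",
})
print(ET.tostring(rec, encoding="unicode"))
```

Rejecting unknown keys (for example "author") keeps the record within the fifteen-element vocabulary, which is what lets different search and indexing tools interpret it the same way.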

b. LOM (Learning Object Metadata)
LOM is an eLearning standard developed by the IEEE. The IEEE Learning Technology Standards Committee developed LOM to facilitate the use and reuse of technology-supported learning resources, such as computer-based training and distance learning.
In an eLearning system, a learning object is anything that can be used, reused, or referenced during technology-supported learning. Many learning objects are still being developed to meet rapidly changing learning needs. The lack of information, or metadata, about learning objects creates many obstacles and limits the ability to manage, discover, and use them.
LOM addresses this problem by defining a structure for describing a learning object. LOM specifies the syntax and semantics of learning object metadata, defining the attributes needed to describe learning objects fully and adequately.
Purposes of LOM:
- Enable learners or instructors to search for and evaluate learning objects.
- Enable sharing and exchange of learning objects across any technology-supported learning system.
- Enable the development of learning objects in units that can be combined and decomposed in meaningful ways.
- Enable computer agents to automatically and dynamically compose lessons for learners.
- Fully support standards concerned with learning objects working together in an open, distributed environment.
- Enable new technologies to be combined with learning objects.
- Provide researchers with standards that support the collection of data on the effectiveness of learning objects.
LOM defines a minimal set of attributes for managing, locating, and evaluating learning objects. The attributes are grouped into nine categories:
- General: information about the object as a whole.
- Lifecycle: metadata about the history and evolution of the object.
- Meta-Metadata: information about the metadata record itself.
- Technical: the technical characteristics and requirements of the object.
- Educational: educational and pedagogical attributes.
- Rights: intellectual property rights and conditions of use.
- Relation: identifies other objects related to this one.
- Annotation: comments on the object, with the date and author of each comment.
- Classification: places the object within a particular classification system.
Each category contains a hierarchical set of data elements whose values are the metadata. For example, the learning-related elements in the Educational category include Typical Age Range, Difficulty, Typical Learning Time, and Interactivity Level.
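The category/element structure is what makes LOM records searchable. A minimal Python sketch that models two LOM-style records as nested dictionaries (category, then element) and filters them on one element; the category and element names follow LOM, but the records, their values, and the find helper are illustrative assumptions rather than a LOM binding.

```python
# LOM-style records: category -> element -> value (values are illustrative).
catalog = [
    {
        "General": {"Title": "Introduction to RDF", "Language": "en"},
        "Educational": {"Difficulty": "easy", "TypicalLearningTime": "PT1H"},
        "Technical": {"Format": "text/html"},
    },
    {
        "General": {"Title": "RDF Schema inference", "Language": "en"},
        "Educational": {"Difficulty": "difficult", "TypicalLearningTime": "PT3H"},
        "Technical": {"Format": "application/pdf"},
    },
]


def find(records, category, element, value):
    """Return the titles of records whose category/element equals value."""
    return [
        r["General"]["Title"]
        for r in records
        if r.get(category, {}).get(element) == value
    ]


print(find(catalog, "Educational", "Difficulty", "easy"))
# ['Introduction to RDF']
```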

c. vCard
vCard is a standard introduced and developed by the IMC (Internet Mail Consortium). Personal information is usually complex and comes in many different forms. Several standards currently propose structures for Personal Data Interchange (PDI). The purpose of this standard is to address the need to collect and exchange personal information across many different channels, such as telephone, email, or face-to-face conversation.
The vCard standard is suitable for exchanging personal data between applications and systems. The vCard format is completely independent of the method used to transport it: transport could be a file system exchange, the public switched telephone network, a wired network, or a wireless network. vCard aims to facilitate the exchange of personal information. In today's business environment, this information is usually exchanged on business cards, and vCard defines it on the model of electronic business card objects.
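A minimal Python sketch that serializes a contact in the vCard 3.0 text layout (RFC 2426): a BEGIN/END block of NAME:value lines separated by CRLF, with N and FN required. Line folding and escaping of commas and semicolons are omitted, so this is only suitable for simple values; the function name and parameters are our own.

```python
def make_vcard(given: str, family: str, email: str, phone: str = "") -> str:
    """Serialize a minimal vCard 3.0 object (no line folding or value escaping)."""
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        f"N:{family};{given};;;",       # family;given;additional;prefixes;suffixes
        f"FN:{given} {family}",         # formatted name shown to people
        f"EMAIL;TYPE=INTERNET:{email}",
    ]
    if phone:
        lines.append(f"TEL;TYPE=WORK,VOICE:{phone}")
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"   # vCard lines end with CRLF


print(make_vcard("Peng", "Wang", "pw2538@bristol.ac.uk"))
```

Because the result is plain text, the same card can travel by email, file transfer, or a wireless link, which is the transport independence described above.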

d. SCORM (Shareable Content Object Reference Model)
SCORM defines a model for combining content and run-time environments for learning objects. It is a reference model that refers to a set of interrelated technical specifications designed to meet the requirements of web-based learning content, including the reusability, accessibility, and interoperability of learning objects.

e. IMS (Instructional Management Systems)
IMS is being developed and promoted as an open standard for eLearning activities, such as using and organizing educational content, and it extends general concepts such as describing learners and tracking and reporting learner progress, so that information can be exchanged between different learning systems.
Purposes of IMS:
- Define technical specifications that improve interoperability between applications and services in today's distributed learning environments.
- Support the incorporation of IMS specifications into products and services worldwide. Broad adoption of the specifications will allow distributed learning environments and content from multiple authors to work together.
2.3.2. Understanding eLib
eLib (electronic library, also called digital library) is a virtual library. The term electronic library refers to a collection of electronic information resources connected by a network, together with the linking technology and management infrastructure. You can access it from any networked PC or laptop, anywhere in the world, at any time.
An eLib stores and indexes tens of thousands of books, newspapers, and journals on every subject, such as physics, astronomy, biology, biotechnology, chemistry and chemical engineering, construction equipment, environmental engineering, food science, and health and safety, as well as biographical and career information, organizations, associations, travel, and so on. Electronic libraries are most widely used in universities and scientific research centers; naturally, their main users are students, graduate researchers, and scientists.
Electronic library programs are built on common standards established by major councils and organizations around the world, such as W3C (World Wide Web Consortium), ISO (International Organization for Standardization), and NISO (National Information Standards Organization). There are standards for many different aspects of storing and accessing electronic information, including information retrieval, interoperability, resource formats, resource identification, and resource description. The following are some standards used in eLib for accessing electronic information:
Information retrieval standards:
Standards of this kind allow information to be exchanged between different systems, making it easier to discover and access electronic information. For example, the information retrieval standard ISO 23950 (equivalent to ANSI Z39.50) defines a standard way for two computers to communicate and share information. It was designed to support the discovery and retrieval of full-text documents, catalog data, images, and multimedia. The standard is based on a client-server architecture, is independent of any particular system, and operates over the Internet.
Z39.50:
Z39.50 is one of a group of standards produced to make it easier to interconnect computer systems. The standard specifies the formats and procedures governing the exchange of messages between a client and a server, allowing users to search remote databases, identify records that meet specified criteria, and retrieve some or all of the identified records. A major advantage of Z39.50 is that it provides uniform access to a large number of diverse information sources.
Z39.50 recognizes that information retrieval consists of two main components, selecting information based on some criteria and retrieving that information, and it provides a common language for both activities. Z39.50 standardizes the way the client and server communicate, and it works even when the computer systems, search engines, and databases differ.
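The two components, selection and retrieval, correspond to the Z39.50 Search and Present services. The Python class below is only a conceptual, in-memory model of that pattern (a search creates a named result set on the server and reports a hit count; a present later fetches records from it by 1-based position). It is not the wire protocol, which exchanges ASN.1/BER-encoded messages, and the class and method names are hypothetical.

```python
class ToyZServer:
    """In-memory model of the Z39.50 search-then-present pattern (not the protocol)."""

    def __init__(self, records):
        self.records = records          # list of dicts, e.g. {"title": ..., "author": ...}
        self.result_sets = {}           # result-set name -> list of matching records

    def search(self, field, term, result_set_name="default"):
        """Select records and keep them server-side; return only the hit count."""
        hits = [r for r in self.records if term.lower() in r.get(field, "").lower()]
        self.result_sets[result_set_name] = hits
        return len(hits)

    def present(self, result_set_name="default", start=1, count=10):
        """Retrieve up to count records from a stored result set (1-based start)."""
        hits = self.result_sets[result_set_name]
        return hits[start - 1:start - 1 + count]


server = ToyZServer([
    {"title": "RDF primer", "author": "A"},
    {"title": "Semantic web services", "author": "B"},
    {"title": "RDF Schema in practice", "author": "C"},
])
print(server.search("title", "rdf", "rs1"))           # 2
print(server.present("rs1", start=2, count=1))        # the second hit only
```

Keeping the result set on the server is what lets a client ask first "how many?" and then fetch records in small batches.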

EDI (Electronic Data Interchange)
EDI is known as a national information technology standard. In EDI, data that would traditionally be transferred on paper documents is instead transmitted or communicated electronically according to established rules and formats. The data associated with each type of functional document, such as a purchase order or an invoice, is transferred as an electronic message. The formatted data can be transmitted from originator to recipient via telecommunications, or physically transported on electronic storage media.
EDI involves a sequence of messages between two parties, e.g., a buyer and a seller, either of whom may act as originator or recipient. Messages from buyer to seller may include, for example, the data needed for a request for quotation (RFQ), purchase orders, shipping notices, and invoices. Implementing EDI requires the use of a family of interrelated standards. This family must include standards for message types (also called transaction sets), for the transmission envelope, for data elements, and for the ordered sequences of data elements called data segments. A message standard, or transaction set standard, defines the sequence of data segments that make up that message or transaction set. The data segment directory lists all data segments and defines the identity and sequence of the data elements that compose each of them. The data element dictionary provides the standards for all data elements. The transmission envelope provides control information and additional messages for the sending and receiving systems. Standardizing the message formats, and the data segments and data elements within those messages, makes it possible to assemble, disassemble, and process messages by computer with predictable results.
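Because segments and data elements are delimited in a standard way, a program can take a message apart mechanically. The text does not name a particular EDI syntax, so this minimal Python sketch assumes an ANSI X12-style layout ("~" ending each segment, "*" separating elements); real X12 interchanges declare their own delimiters in the ISA envelope, which the sketch does not read. The 850 purchase-order message is hypothetical.

```python
def parse_x12(message: str, segment_sep: str = "~", element_sep: str = "*"):
    """Split an X12-style message into (segment_id, [data elements]) pairs."""
    segments = []
    for raw in message.strip().split(segment_sep):
        raw = raw.strip()
        if not raw:
            continue
        seg_id, *elements = raw.split(element_sep)
        segments.append((seg_id, elements))
    return segments


def check_segment_count(segments) -> bool:
    """The SE trailer's first element should equal the number of segments from ST to SE."""
    return segments[-1][0] == "SE" and int(segments[-1][1][0]) == len(segments)


# Hypothetical purchase-order transaction set (850): header, one line item, trailer.
msg = "ST*850*0001~BEG*00*SA*PO12345**20050701~PO1*1*10*EA*9.95**VP*ITEM-42~SE*4*0001~"
segments = parse_x12(msg)
for seg_id, elements in segments:
    print(seg_id, elements)
print(check_segment_count(segments))   # True
```

Empty strings in the element lists (for example in BEG) mark optional data elements that were left out but whose positions must still be kept.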

ILL (Interlibrary Loan)
The ILL protocol (ISO 10160/1) was developed to handle many linked transactions that make up document-request activities involving multiple participants. Conceptually it is similar to EDI: it defines the required data elements, defines a set of messages and their relationships, and provides a syntax for structuring messages.
The ILL protocol appears to have great potential for document-request services, especially as these become more distributed. System-to-system communication of structured messages allows a wide range of processes to be automated, including formerly manual or coordinated procedures for tracking and recalls. Its usefulness for interactive document-request services still needs further study.

Resource encoding standards:
These standards define the different display types of electronic information. They include:
- Page description formats (e.g., PostScript, PDF)
- Graphics formats (e.g., TIFF, GIF, JPEG)
- Structured information (SGML, HTML, XML)
- Moving image and audio formats
- Compression and archiving (e.g., gzip, jar, tar, zip)

Resource identification standards:
These include the following standards:
DOI (Digital Object Identifier)
The Digital Object Identifier is a system developed by Bowker and CNRI (Corporation for National Research Initiatives) in the US, in response to a call for proposals for digital content identification technology issued by the Association of American Publishers. The DOI system has three components: the identifier, the directory, and the database. The system allows identifiers to be assigned at different levels of granularity, and allows other identifier systems (e.g., SICI, ISSN) to be incorporated.
The DOI system can be defined as a unique identifier that is resolvable to multiple pieces of associated typed state data, within an information management framework. The parts of this definition are explained as follows:

a. A unique identifier: a DOI is assigned uniquely to a piece of intellectual property. What that piece is, is specified by key pieces of information about it (metadata), which depend on the type of entity, for example whether it is an article or a video clip. The identifier itself is an opaque string; nothing about the entity can be inferred from its syntax.
b. Resolvable, with associated state data: resolution goes through the Internet from the identifier to one or more pieces of associated data. These pieces represent the current state (value) of some type of data (e.g., a URL). They can be displayed, or can lead to services that use the DOI as an entry point.
c. An information management framework: once a piece of data has been obtained by resolution, the metadata about the identified entity can interoperate with metadata from other sources (e.g., about context) to build automated services and transactions. This interoperability is achieved by managing metadata in a controlled way, consistent with an interoperability architecture that lets the DOI support applications beyond those of a simple persistent identifier.
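A minimal Python sketch of the identifier and resolution ideas: a DOI is split into a prefix (which starts with the directory code "10.") and an opaque suffix, and https://doi.org/ is the public proxy that resolves it. The DOI 10.5555/example.123 and the in-memory registry, which only imitates "one identifier, several pieces of typed state data", are hypothetical; real resolution goes through the Handle System.

```python
from urllib.parse import quote

DOI_PROXY = "https://doi.org/"   # public DOI resolver


def split_doi(doi: str):
    """Split a DOI into (prefix, suffix); the suffix is opaque and not interpreted."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """URL at which the proxy resolves the DOI to its current state data."""
    split_doi(doi)                       # validate first
    return DOI_PROXY + quote(doi, safe="/")


# Hypothetical registry: one identifier, several typed values.
registry = {
    "10.5555/example.123": [("URL", "http://example.org/article/123"),
                            ("EMAIL", "rights@example.org")],
}


def resolve(doi: str, wanted_type: str = "URL"):
    return [value for kind, value in registry.get(doi, []) if kind == wanted_type]


print(resolver_url("10.5555/example.123"))   # https://doi.org/10.5555/example.123
print(resolve("10.5555/example.123"))        # ['http://example.org/article/123']
```

When the article moves, only the registry entry changes; the DOI printed in citations stays the same, which is the point of a persistent identifier.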

SICI
The SICI standard, ANSI/NISO Z39.56-1996, defines the rules and code for uniquely identifying serial items (e.g., journal issues) and each contribution (e.g., an article) contained in a serial. SICI stands for Serial Item and Contribution Identifier, and the term is used in this standard to denote the code itself.
The standard is defined for use with serial publications in all formats. For the purposes of this standard, a serial is defined as a publication issued in successive parts at regular or irregular intervals, bearing numerical and/or chronological designations, and intended to be continued indefinitely.
The SICI is intended to be created and used by members of the bibliographic community involved in functions associated with managing serials and their contents, such as ordering, acquisitions, claims, royalty collection, rights management, online retrieval, database linking, and document delivery.
Identifiers constructed according to this standard are used in applications such as Electronic Data Interchange (EDI), Serial Industry Systems Advisory Committee (SISAC) bar codes, Z39.50 queries, Uniform Resource Names (URNs), email, and human-readable records in print. The standard does not define any particular transport system or means of implementation.
The SICI uses the International Standard Serial Number (ISSN) to identify the serial title. Therefore, to use this standard to construct an identifier for an item or contribution published in a serial, that serial must have been assigned an ISSN.
The SICI is a combination of defined segments, all of which are required. These segments are:
a. Item Segment: the data elements needed to describe the serial item (ISSN, chronology, enumeration).
b. Contribution Segment: the data elements needed to identify a contribution within an item (location, title code, and, in a particular SICI instance, sequence numbering).
c. Control Segment: the data elements needed to record administrative elements that define the derivative part, version, and format of the code representation. This is the most important segment of the SICI: interpretation and processing are governed by this control segment.
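A minimal Python sketch that pulls the three segments out of a SICI string with a regular expression, assuming the Z39.56-1996 layout as we understand it: ISSN(chronology)enumeration, then the contribution segment in angle brackets, then the control segment (code structure, derivative part, and medium/format identifiers, followed by ";version-check character"). The SICI check character (modulus 37) is not verified; the ISSN's own check digit is, since the SICI depends on the serial having a valid ISSN. The sample string is illustrative only.

```python
import re

SICI_PATTERN = re.compile(
    r"^(?P<issn>\d{4}-\d{3}[\dX])"                       # item segment: ISSN
    r"\((?P<chronology>[^)]*)\)"                         #   (chronology)
    r"(?P<enumeration>[^<]*)"                            #   enumeration, e.g. volume:issue
    r"<(?P<contribution>[^>]*)>"                         # contribution segment
    r"(?P<csi>\d)\.(?P<dpi>\d)\.(?P<mfi>[A-Z]{2})"       # control segment identifiers
    r";(?P<version>\d)-(?P<check>[0-9A-Z#])$"            # version and check character
)


def parse_sici(code: str) -> dict:
    """Split a SICI string into named parts (the SICI check character is not verified)."""
    match = SICI_PATTERN.match(code)
    if not match:
        raise ValueError(f"not a SICI: {code!r}")
    return match.groupdict()


def issn_is_valid(issn: str) -> bool:
    """ISSN check digit: weights 8..2 on the first seven digits, modulus 11, 10 -> 'X'."""
    digits = issn.replace("-", "")
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return digits[7] == ("X" if check == 10 else str(check))


parts = parse_sici("0095-4403(199502/03)21:3<12:WATIIB>2.0.TX;2-J")
print(parts)
print(issn_is_valid(parts["issn"]))   # True
```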
Resource description standards:
These standards make resource discovery easier and more effective. They include:
- AACR2: a set of cataloguing rules used for describing library materials.
- Dublin Core: a descriptive metadata standard developed for describing resources on the Internet (described above).
- MARC (Machine-Readable Cataloguing): a descriptive metadata standard developed for cataloguing purposes.
The MARC standards are overseen by the Machine-Readable Bibliographic Information Committee together with the Network Development and MARC Standards Office of the Library of Congress.
MARC formats are standards for the representation and communication of bibliographic and related information in machine-readable form. A MARC record contains a guide to its data, or small "signposts", before each piece of bibliographic information. There are three kinds of MARC content designators: tags, subfield codes, and indicators.
The advantage of using MARC metadata is that a library does not have to develop its own method of identifying the fields of bibliographic information; this saves work and allows catalog data to be shared and exchanged with other libraries. Using the MARC standard prevents duplication of work and allows libraries to share bibliographic resources more effectively. MARC is an industry-wide standard whose main purpose is to communicate information in a standard way, making frequent access to records easier.
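A minimal Python sketch of the three kinds of content designators on one MARC field: a three-digit tag, two one-character indicators, and a list of subfields keyed by one-letter codes. It renders the field in a MARCMaker-like mnemonic line (=TAG, two spaces, indicators with "\" for blank, then $code value), which we assume for readability; real MARC exchange files use the ISO 2709 structure with a leader and directory, which is not shown. The title data is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MarcField:
    """One MARC variable data field with its content designators."""
    tag: str                                  # e.g. "245" = title statement, "100" = personal name
    ind1: str = " "                           # first indicator (blank allowed)
    ind2: str = " "                           # second indicator
    subfields: List[Tuple[str, str]] = field(default_factory=list)   # (code, value)

    def to_mnemonic(self) -> str:
        """Render as a MARCMaker-like line: =TAG  II$aValue$bValue..."""
        indicators = (self.ind1 + self.ind2).replace(" ", "\\")
        data = "".join(f"${code}{value}" for code, value in self.subfields)
        return f"={self.tag}  {indicators}{data}"


title = MarcField("245", "1", "0", [("a", "Semantic search :"),
                                    ("b", "an eDoc case study /"),
                                    ("c", "by A. Author.")])
print(title.to_mnemonic())
# =245  10$aSemantic search :$ban eDoc case study /$cby A. Author.
```

Because every library uses the same tags and subfield codes, software can locate the title or author in any record without knowing who catalogued it.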

- EAD (Encoded Archival Description): used by archivists to encode finding aids.
EAD is a standard for encoding finding aids using SGML and/or XML. The purpose of using EAD is to make archival holdings from many institutions more accessible to users. EAD also encourages the archival community to adopt data structure standards and to work together in forming consortia and union databases.
Currently, the Network Development and MARC Standards Office of the Library of Congress serves as the maintenance agency for EAD and provides the official documentation on its website. The Society of American Archivists (SAA) acts as the owner of EAD, and within SAA the EAD Working Group is responsible for its ongoing oversight and development.
What is a finding aid? Finding aids are detailed guides that describe and inventory collections of unpublished personal papers, organizational records, and images. They help researchers identify and locate the boxes or folders of interest for their research. They also provide background information about the organization, person, or family that created the documents or images, an overview of the collection and its arrangement, and a detailed container list. Finding aids are the tools of archival description.
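A minimal Python sketch of how a finding aid's parts map onto EAD elements: a header identifying the finding aid, a collection-level description (archdesc/did), and a container list (dsc) whose components locate each folder by box and folder number. Only a few core EAD 2002 element names are used; a valid instance needs more (repository, full file description, the DTD or namespace declaration), and the collection data is hypothetical.

```python
import xml.etree.ElementTree as ET


def finding_aid(eadid, collection_title, dates, boxes):
    """Build a skeletal EAD finding aid; boxes maps box number -> list of folder titles."""
    ead = ET.Element("ead")
    header = ET.SubElement(ead, "eadheader")
    ET.SubElement(header, "eadid").text = eadid
    titlestmt = ET.SubElement(ET.SubElement(header, "filedesc"), "titlestmt")
    ET.SubElement(titlestmt, "titleproper").text = f"Guide to the {collection_title}"

    # Collection-level description.
    archdesc = ET.SubElement(ead, "archdesc", level="collection")
    did = ET.SubElement(archdesc, "did")
    ET.SubElement(did, "unittitle").text = collection_title
    ET.SubElement(did, "unitdate").text = dates

    # Container list: one component per folder, located by box and folder number.
    dsc = ET.SubElement(archdesc, "dsc")
    for box, folders in boxes.items():
        for number, folder_title in enumerate(folders, start=1):
            cdid = ET.SubElement(ET.SubElement(dsc, "c01", level="file"), "did")
            ET.SubElement(cdid, "container", type="box").text = str(box)
            ET.SubElement(cdid, "container", type="folder").text = str(number)
            ET.SubElement(cdid, "unittitle").text = folder_title
    return ead


aid = finding_aid("example-001", "Example Family Papers", "1900-1950",
                  {1: ["Correspondence, 1900-1920", "Diaries"], 2: ["Photographs"]})
print(ET.tostring(aid, encoding="unicode"))
```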
2.3.3. Understanding eDoc
2.3.3.1. Concept
eDoc is short for electronic document, also called digital document. It is a general concept referring to all documents on the web, such as news pages, electronic journals, specialized documents, or e-books. eDocs are regarded as the main resource for eLib and eLearning projects. These projects collect and logically organize eDocs around a specific topic so that users can easily find electronic documents among thousands of documents, serving their research needs.
2.3.3.2. Scope of use of eDoc
eDocs are used in all kinds of activities: wherever software and technology devices are used to create, store, transmit, and receive information, eDocs are needed.
2.3.3.3. Requirements for eDocs
- eDocs are created, used, transmitted, and stored with the support of technology devices and software.
- eDocs must be represented in the most meaningful form.
- eDocs must have an appropriate structure, be usable by many people, and have attributes that allow their authenticity to be verified.
2.3.3.4. Structure of an eDoc
- An electronic document consists of two inseparable parts: the general part and the special part.
- The general part contains the information that represents the content of the document. If the document is addressed to a specific person, information about that person is included in the general part.
- The special part contains one or more electronic signatures.
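A minimal Python sketch of the two-part structure: the general part holds the content (and addressee), and the special part holds one tag per signer, computed over a canonical encoding of the general part. As a stand-in for electronic signatures it uses HMAC-SHA256 from the standard library, which is an assumption of this sketch: an HMAC gives integrity and authenticity only between holders of a shared secret key and cannot provide non-repudiation, for which a real eDoc would use public-key digital signatures. The class and field names are hypothetical.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class EDoc:
    general: Dict[str, str]                                  # content and addressee
    special: Dict[str, str] = field(default_factory=dict)    # signer -> tag (signature stand-in)

    def _payload(self) -> bytes:
        # Canonical bytes of the general part so that signer and verifier agree.
        return json.dumps(self.general, sort_keys=True, ensure_ascii=False).encode("utf-8")

    def sign(self, signer: str, key: bytes) -> None:
        self.special[signer] = hmac.new(key, self._payload(), hashlib.sha256).hexdigest()

    def verify(self, signer: str, key: bytes) -> bool:
        expected = hmac.new(key, self._payload(), hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, self.special.get(signer, ""))


doc = EDoc({"title": "Purchase agreement", "to": "Example Buyer",
            "body": "Deliver 10 units by 2005-08-01."})
key = b"shared-secret-for-illustration-only"
doc.sign("seller", key)
print(doc.verify("seller", key))      # True
doc.general["body"] = "Deliver 100 units by 2005-08-01."
print(doc.verify("seller", key))      # False: the general part was altered
```

Because the tag travels inside the document, any later change to the general part is detectable wherever the document goes, not only in storage or in transit.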
2.3.3.5. Security in eDoc
When an organization wants to carry out commercial transactions online, it is essential to ensure the safety and confidentiality of the information used throughout those transactions, and to provide authentication and integrity for that information. Because many automated transactions are based on electronic documents, and these documents contain very sensitive information, organizations must protect them fully. Many information security solutions try to protect electronic documents only in local storage or during transmission. However, these solutions do not protect the whole life cycle of an electronic document. Once a document has been delivered to the recipient, its protection is lost: the recipient can forward or view it, intentionally or unintentionally, with no way to verify whether he or she is authorized to do so.
A much more effective solution is to protect the document by attaching security parameters that travel with it. Six criteria must be met to provide more effective protection for an electronic document throughout its life cycle:
1. Confidentiality
2. Authorization
3. Accountability
4. Integrity
5. Authenticity
6. Non-repudiation




Figure 12: Criteria for evaluating eDoc security

2.3.3.6. Assessment
To date, there is still no real standard for eDoc. eDoc documents on the Internet are extremely rich and diverse and hold a huge amount of information on the web. However, precisely because they are so rich and diverse, it is very difficult to propose a single standard for all eDoc documents to follow.
By contrast, eLearning and eLib, with a more modest number of documents, already follow their own standards, which are widely accepted. With these standards, eLearning and eLib documents can more easily move toward the semantic web.
2.4. Some issues in natural language processing
Natural language processing has been one of the most fascinating and also most difficult problems in computing for more than 50 years. The dream of using computers to process language, of making computers understand natural language as humans do, has run into its biggest obstacle on the language side: the inherent ambiguity of natural language. Nevertheless, over more than half a century, linguists and computer scientists have together gradually overcome many of these obstacles and achieved fairly promising results.
2.4.1. Issues in text processing
Input text, such as HTML pages, arrives unprocessed. A preprocessing layer is needed to clean the input text and then split it into well-defined units (paragraphs, sentences, words, ...) for the system to process. The text preprocessing problem includes the following tasks:
- Preliminary processing of the input text (text cleaning): removing characters, control codes, and parts that are irrelevant to the problem.
- Within each document, preprocessing identifies the titles, annotations, additional information (author, date) if any, and the main content of the text.
- Within each paragraph, preprocessing splits the text into sentences. This is the most difficult stage. Going further, sentences can be analyzed into clauses or phrases to reduce the load on the system and improve both its quality and its processing speed.
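A minimal Python sketch of these preprocessing steps using only the standard library: html.parser extracts visible text (skipping script and style) and marks block boundaries, control codes are removed, whitespace is normalized, and each paragraph is split into sentences. The sentence splitter is deliberately naive (end punctuation followed by whitespace and an uppercase ASCII letter); it is an assumption for illustration and would mishandle abbreviations and uppercase letters outside A-Z, including Vietnamese letters with diacritics.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skip <script>/<style>, and mark block boundaries."""
    BLOCKS = {"p", "div", "br", "h1", "h2", "h3", "li", "title"}

    def __init__(self):
        super().__init__()            # entity references are decoded by default
        self.parts, self._skip = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in self.BLOCKS:
            self.parts.append("\n\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag in self.BLOCKS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def preprocess(html: str):
    """Clean an HTML page and return its paragraphs, each as a list of sentences."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", "".join(extractor.parts))
    paragraphs = [re.sub(r"\s+", " ", p).strip() for p in re.split(r"\n\s*\n", text)]
    return [re.split(r"(?<=[.!?])\s+(?=[A-Z])", p) for p in paragraphs if p]


page = ("<html><head><title>RDF</title><style>p{}</style></head><body>"
        "<p>RDF is a data model. It uses triples!</p><p>Schemas add vocabulary.</p>"
        "</body></html>")
print(preprocess(page))
# [['RDF'], ['RDF is a data model.', 'It uses triples!'], ['Schemas add vocabulary.']]
```

The page title comes out as its own first unit, which is where a later step could recognize it as the document's title rather than body text.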
2.4.2. Semantic processing issues
In natural language processing, sense tagging, also called Word Sense Disambiguation (WSD), is the most difficult problem and also a central one that, to date, has not been solved satisfactorily anywhere in the world. To tackle it, many models with different approaches have been proposed, mainly along the following lines:
- AI-based: this is the earliest approach (1960s), with very interesting theories of semantic networks, semantic frames, and semantic primitives (such as THING, DO, CAUSE, ...) and relations such as IS-A, PART-OF, .... However, because most of the semantic knowledge in this approach was built by hand (making it impossible to build large amounts of real-world knowledge), these models never went beyond demonstrations on a few sentences (demonstrations on toy programs). The main difficulty of this approach is the lack of knowledge.
- Knowledge based: in the early 1980s, researchers turned to extracting knowledge automatically from machine-readable dictionaries (MRD) such as thesauri, LDOCE, LLOCE, ..., in order to partly overcome the lack of knowledge that limited the AI-based approach. This approach produced WordNet, a huge knowledge base of word meanings organized by enumerating senses; CORELEX, which organizes senses systematically; and FrameNet, which describes the case roles of verbs. However, these knowledge bases are only information sources that a sense-selection system can consult; which of the relevant pieces of information to use still has to be decided case by case.

- Corpus based: this approach derives rules for semantic processing (by statistics, machine learning, ...) from large existing corpora and applies those rules to new cases. It was actually proposed very early (1940s), but limited corpora and inadequate hardware kept it from developing. Only in the 1990s, when technology had advanced enough to overcome these difficulties, was the approach revived, and it has grown steadily stronger ever since.
Today, combining corpora with existing knowledge is the approach that attracts the most interest from computational linguists.
2.4.2.1. The concept of word sense tags:
Looking at the lexical meanings of words, we see that a word can carry many different meanings, but in a particular context it carries only one of them. To distinguish lexical meanings, semanticists, lexicographers, and psycholinguists have organized all existing lexical meanings into a system of concepts (a concept tree), and each such concept is treated as a semantic tag of the word.
2.4.2.2. Some semantic tagging systems:
To date, a unified system of semantic tags has not been completed, and many different tag systems coexist (whereas tag systems at the morphosyntactic level, i.e. part-of-speech tags, were unified and clearly defined long ago). The difficulty is that for some words it is unclear which concept they belong to (which sense to assign), and the classification depends on the purpose and the domain of use.
Moreover, if the semantic tag system is too fine-grained, the number of tags becomes very large (tens or hundreds of thousands) and automatic tagging becomes impractical, because it would require a training corpus of billions of words. If the tag system is too coarse (too few tags), it cannot meet some practical needs for sense distinction, for example disambiguating words that share a semantic tag but have different lexical meanings.
Commonly used semantic tag systems include LLOCE (Longman Lexicon of Contemporary English), LDOCE (Longman Dictionary of Contemporary English), CORELEX, and WordNet. This thesis mainly uses the WordNet lexical database in its natural language processing stage.
The WordNet semantic tag system
WordNet is a huge knowledge base of English lexical semantics with more than 100,000 concepts, built by computational linguists, psycholinguists, and cognitive linguists at Princeton University (USA) since the 1980s. WordNet is available online and can be used freely (free of charge) by anyone, anywhere, for research and study.
WordNet is a vast store of lexical semantic knowledge that linguists and computational linguists have used successfully in many semantic processing tasks. Linguists, psychologists, and computer scientists around the world continue to use it and contribute to improving it. WordNet has many clear strengths: it is scientifically grounded, systematic, open, easy to use, widely available, and extensible. That is why it has been localized for the languages of several countries, such as France, Japan, Spain, Korea, ..., and recently Vietnam.
WordNet does not merely group synonyms or semantically related words into classes, as dictionaries such as LDOCE and LLOCE do; it is a system of concepts linked by many kinds of relations, forming a complex network. Its basic goal is to store information about word meanings. Therefore, we must first establish what counts as a word unit in WordNet, and then study the synonym set (synset), the basic building block of WordNet, so that it can be applied to localizing WordNet into our own language.
2.4.2.3. Knowledge sources for semantic processing:
Semantic processing must combine many knowledge sources, ranging from linguistic knowledge (morphology, syntax, semantics) to extra-linguistic knowledge (knowledge of the real world). These sources usually include:
2.4.2.3.1. Part-of-speech knowledge
When a homograph has different meanings for different parts of speech, and each part of speech has exactly one meaning, part-of-speech information is enough to determine the meaning exactly. For example, the word "can" means "be able to" (auxiliary verb), "a metal container" (noun), or "to seal in a can" (verb). In such cases, knowing the exact part of speech lets us fully disambiguate the word. For example:
I/PRO can/AUX can/V a/DET can/NN (I am able to seal a can).
According to statistics on the LLOCE dictionary, up to 88% of entries are of this kind. Another 7% are entries (sets of homographs) with several parts of speech, each possibly with several meanings, but with at least one part of speech that has a single meaning. For these, the meaning can be disambiguated whenever the word's part of speech in context is the one with a single meaning.
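Part-of-speech based disambiguation amounts to a lexicon lookup keyed by (word, tag). In this minimal sketch the lexicon is hypothetical and hand-written; a word is resolved only when its tag maps to exactly one sense, as in the "I can can a can" example, and is left unresolved (None) otherwise, including for function words that are simply absent from the lexicon.

```python
# Hypothetical lexicon: (word, POS tag) -> senses listed for that tag.
LEXICON = {
    ("can", "AUX"): ["be able to"],
    ("can", "V"): ["seal in a can"],
    ("can", "NN"): ["metal container"],
    ("bank", "NN"): ["financial institution", "riverbank", "row"],
}


def disambiguate_by_pos(tagged_words):
    """Return one sense per (word, tag) when the tag leaves a single sense, else None."""
    senses_out = []
    for word, tag in tagged_words:
        senses = LEXICON.get((word.lower(), tag), [])
        senses_out.append(senses[0] if len(senses) == 1 else None)
    return senses_out


tagged_sentence = [("I", "PRO"), ("can", "AUX"), ("can", "V"), ("a", "DET"), ("can", "NN")]
print(disambiguate_by_pos(tagged_sentence))
# [None, 'be able to', 'seal in a can', None, 'metal container']
```

A noun such as bank keeps several senses under the same tag, so this lookup returns None for it; that is the case the next knowledge source addresses.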
2.4.2.3.2. Knowledge of syntactic relations and semantic constraints:
When a word has more than one meaning within the same part of speech, part-of-speech information is not enough for disambiguation. For example, bank can be a verb or a noun, and as a noun it can mean a financial bank, a riverbank, a row, .... In this case we need additional real-world knowledge in the form of selectional restrictions between the syntactic constituents (S, V, O, M) of the sentence. For example, after grammatical tagging, the sentence "I enter an old bank" becomes:
[I/PRO]NP [enter/V [an/DET old/ADJ bank/N]NP]VP
with the syntax tree shown in the figure below:
Figure 13: Syntactic relations and semantic constraints
From this syntax tree we can identify syntactic relations such as S-V (subject-verb), V-O (verb-object), A-N (adjective-noun), and D-N (determiner-noun). Every content word in the sentence, even with its part of speech correctly identified, remains semantically ambiguous: the verb enter (go into / input), the noun bank (financial bank / riverbank / row), and the adjective old (elderly / not new). We therefore need semantic constraints such as the following:
Word                    | Type / semantic tag                      | Constraints
I                       | Type: Person                             |
Enter1 (go into)        |                                          | S: Human; O: Closed SPA (closed space)
Enter2 (input)          |                                          | S: Human; O: Data
Bank1 (financial bank)  | Type: Hou (building, closed space)       |
Bank2 (riverbank)       | Type: Nat (natural feature, open space)  |
Old1 (elderly)          |                                          | N: Ani (living being)
Old2 (not new)          |                                          |
Table 5: Senses and constraints of the content words in the sentence.

Figure 14: Decision tree for choosing the appropriate senses.

By traversing the tree from top to bottom, with the verb (enter) as the root, we finally select the appropriate senses: enter1 (go into), bank1 (financial bank), and old2 (not new). When checking semantic constraints, we must take into account the hierarchy of the semantic tag system (ontology), in which a child concept inherits the semantic features of its parent and adds new features of its own. The semantic feature (type) of each content word, as well as its constraints, is specified in the LDOCE dictionary and in FrameNet.
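The constraint check that the decision tree performs can be sketched as a search over sense combinations. In this minimal sketch the type hierarchy (for example Hou IS-A ClosedSpace and Person IS-A Human IS-A Animate) is an assumption made for illustration, not data taken from LDOCE or FrameNet; the sense labels follow Table 5, with "Closed SPA" written as ClosedSpace. A type satisfies a constraint if it equals the required type or inherits from it, which is the inheritance rule described above.

```python
from itertools import product

# Assumed IS-A hierarchy (child -> parent); illustrative only.
PARENT = {
    "Person": "Human", "Human": "Animate", "Animate": "Entity",
    "Hou": "ClosedSpace", "Nat": "OpenSpace",
    "ClosedSpace": "Space", "OpenSpace": "Space", "Space": "Entity",
    "Data": "Entity",
}


def is_a(concept, required):
    """True if concept equals required or inherits from it."""
    while concept is not None:
        if concept == required:
            return True
        concept = PARENT.get(concept)
    return False


# Senses from Table 5: verb -> (subject type, object type); noun -> type; adjective -> required noun type.
VERB_SENSES = {"enter1": ("Human", "ClosedSpace"), "enter2": ("Human", "Data")}
NOUN_SENSES = {"bank1": "Hou", "bank2": "Nat"}
ADJ_SENSES = {"old1": "Animate", "old2": None}   # old2 places no constraint on the noun


def consistent_readings(subject_type):
    """Return every (verb, noun, adjective) sense triple that satisfies all constraints."""
    readings = []
    for (verb, (subj_req, obj_req)), (noun, noun_type), (adj, adj_req) in product(
            VERB_SENSES.items(), NOUN_SENSES.items(), ADJ_SENSES.items()):
        if not is_a(subject_type, subj_req):
            continue  # S-V constraint fails
        if not is_a(noun_type, obj_req):
            continue  # V-O constraint fails
        if adj_req is not None and not is_a(noun_type, adj_req):
            continue  # A-N constraint fails
        readings.append((verb, noun, adj))
    return readings


print(consistent_readings("Person"))  # [('enter1', 'bank1', 'old2')]
```

Enumerating all combinations is fine for one short sentence; the decision tree in Figure 14 prunes the same space by fixing the verb sense first.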
2.4.2.3.3. Collocation knowledge
Semantic constraints between syntactic constituents cannot always resolve every ambiguity, because some relations are implicit (logical or semantic) or are even a matter of usage habit, and recognizing them requires real-world knowledge that so far cannot be fully encoded in dictionaries or other machine knowledge bases.
For example, what does the noun bank mean in the sentence "I go to the bank"? Should we choose financial bank, riverbank, or row? Does the noun way mean a road or a manner? Does the noun letter mean a piece of mail or a character? If we rely only on semantic constraints (which are not always fully available), it is hard to determine the exact meaning of these ambiguous words.
To disambiguate in such cases, one usually examines the form and meaning of the neighboring words, i.e. the collocations. For example: "bank ... river" -> riverbank; "bank account/money" -> financial bank; "way to" -> road; "way of" -> manner; "write letter to" -> mail; "letter A" -> character; "letters, digits, symbols" -> character; "write papers, letters, messages" -> mail; ....
The neighborhood of the word to be disambiguated can span 1, 2, or n words to the left and 1, 2, or n words to the right. The window size depends on the case and is chosen individually.
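The collocation rules above can be encoded as cue words per target word, checked within a window of n words on each side. In this sketch the nearest matching cue wins; the cue table, the default window of 2, and the function name are assumptions. The cue "a" for "letter A" is deliberately left out, because a lowercasing tokenizer would confuse it with the article in "write a letter".

```python
import re

# Hypothetical cue table built from the examples above: target -> [(cue word, sense), ...].
COLLOCATIONS = {
    "bank": [("river", "riverbank"), ("account", "financial bank"), ("money", "financial bank")],
    "way": [("to", "road"), ("of", "manner")],
    "letter": [("write", "mail"), ("messages", "mail"), ("papers", "mail"),
               ("digits", "character"), ("symbols", "character")],
}


def disambiguate_by_collocation(sentence, target, window=2):
    """Pick the sense whose cue word is nearest to target within +/- window tokens."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    if target not in tokens:
        return None
    i = tokens.index(target)                       # first occurrence only
    best = None                                    # (distance, rule order, sense)
    for rank, (cue, sense) in enumerate(COLLOCATIONS.get(target, [])):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i and tokens[j] == cue:
                candidate = (abs(j - i), rank, sense)
                if best is None or candidate < best:
                    best = candidate
    return best[2] if best else None


print(disambiguate_by_collocation("Is this the way to the station", "way"))  # road
print(disambiguate_by_collocation("I go to the bank", "bank"))              # None
```

The second call returns None: "I go to the bank" has no cue in its window, which is exactly the kind of sentence that needs the subject or frequency knowledge described next.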
2.4.2.3.4. Subject knowledge
In some ambiguous cases, we can determine the correct meaning of a word if we know the subject of the text. For example, bank usually means a financial bank when the text is about finance; driver means a device driver when the subject is computing; sentence means a grammatical sentence in a text about linguistics or grammar, but a judicial sentence in a text about law; element means a chemical element in chemistry but an element (of a set) in mathematics or computing; ....
To determine the subject of the text being processed, we look for the technical terms of each field. For example, if the text contains words such as ellipsis, bilingual, anaphora, phrase, we can guess that it is about linguistics; similarly, words such as computer, memory, peripherals, CPU, ... indicate computing, ....
This is why LDOCE/LLOCE assign subject codes to such technical terms. The subject can then be determined automatically by examining the technical terms near the word to be disambiguated.
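Detecting the subject from technical terms can be sketched as counting domain cue words in the text and then looking up the sense associated with the winning domain. The cue lists and the domain-to-sense table below are hypothetical (they loosely follow the examples above); in a real system the LDOCE/LLOCE subject codes would play this role. Ties between domains are broken by the order in which domains are first counted.

```python
import re
from collections import Counter

# Hypothetical cue words per domain (partly from the examples above).
DOMAIN_CUES = {
    "linguistics": {"ellipsis", "bilingual", "anaphora", "phrase", "grammar"},
    "computing": {"computer", "memory", "peripherals", "cpu"},
    "finance": {"money", "loan", "account", "interest"},
    "law": {"court", "judge", "crime"},
}
# Hypothetical domain-specific senses.
DOMAIN_SENSES = {
    "driver": {"computing": "device driver"},
    "sentence": {"linguistics": "grammatical sentence", "law": "judicial sentence"},
    "bank": {"finance": "financial bank"},
}


def guess_domain(text):
    """Return the domain whose cue words occur most often in text, or None."""
    counts = Counter()
    for token in re.findall(r"[a-z]+", text.lower()):
        for domain, cues in DOMAIN_CUES.items():
            if token in cues:
                counts[domain] += 1
    return counts.most_common(1)[0][0] if counts else None


def sense_by_domain(word, text):
    return DOMAIN_SENSES.get(word, {}).get(guess_domain(text))


print(sense_by_domain("sentence", "The judge read the sentence in court after the crime trial."))
# judicial sentence
```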
2.4.2.3.5. Sense frequency knowledge
A word does not always belong to a particular subject (in LDOCE, more than 56% of words are of this kind), so how common a sense is must be judged from the frequency with which the word occurs in that sense. For example, the most common meaning of the noun pen is a writing instrument (alongside less common meanings such as an animal enclosure or a feather); ball usually means a round ball or a marble rather than a formal dance, ....
The frequency of each sense of each word is counted over very large corpora covering many types of text. That is why WordNet and LDOCE list senses in decreasing order of frequency (the most common sense is listed first).
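Because WordNet and LDOCE list senses in decreasing order of frequency, the most-frequent-sense rule is simply "take the first listed sense". The sketch below uses it as a fallback after other knowledge sources (passed in as functions that return a sense or None) have been consulted; the sense inventory and the name `choose_sense` are illustrative assumptions.

```python
# Hypothetical inventory, senses ordered from most to least frequent.
SENSES_BY_FREQUENCY = {
    "pen": ["writing instrument", "animal enclosure", "feather"],
    "ball": ["round object", "formal dance"],
}


def choose_sense(word, *knowledge_sources):
    """Ask each knowledge source in turn; fall back to the most frequent sense."""
    for source in knowledge_sources:
        sense = source(word)
        if sense is not None:
            return sense
    senses = SENSES_BY_FREQUENCY.get(word)
    return senses[0] if senses else None


print(choose_sense("pen"))                             # writing instrument
print(choose_sense("ball", lambda w: "formal dance"))  # formal dance
```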
2.4.2.3.6. Knowledge in sense definitions:
In LDOCE and WordNet, each sense comes with a definition and examples. For example, the senses of bank in LDOCE include the following definitions:
- land along the side of a river, lake, etc.
- a place where money is kept and paid ...
- a row, a line of ...
By comparing the information in these definitions with the information in the context, we can determine the appropriate sense of the word in that context. To do this, Wilks et al. computed the overlap between the context and the English words used to define each sense, over all combinations of senses of the content words in the sentence.
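Definition overlap is the idea behind the Lesk family of algorithms. The sketch below is a simplified Lesk variant, not Wilks et al.'s exact procedure: it disambiguates one word at a time by counting the content words shared by the context and each sense definition, instead of scoring every sense combination in the sentence. The definitions are the LDOCE glosses quoted above (truncated as in the text), and the stopword list is a small assumption.

```python
import re

# Small assumed stopword list.
LESK_STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "and", "or", "is", "are",
                  "where", "etc", "along"}

# LDOCE-style definitions of "bank" as quoted above (truncated in the same way).
BANK_SENSES = {
    "bank/river": "land along the side of a river, lake, etc.",
    "bank/money": "a place where money is kept and paid",
    "bank/row": "a row, a line of",
}


def content_words(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in LESK_STOPWORDS}


def simplified_lesk(context, senses):
    """Return the sense whose definition shares the most content words with context (None if no overlap)."""
    context_words = content_words(context)
    best_sense, best_overlap = None, 0
    for sense, definition in senses.items():
        overlap = len(context_words & content_words(definition))
        if overlap > best_overlap:          # ties keep the earlier-listed sense
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("We had a picnic on the side of the river", BANK_SENSES))  # bank/river
```

Exact word matching misses inflections ("keep" vs "kept"); stemming the context and the definitions is the usual refinement.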


2.4.2.4. Semantic tagging
Word sense disambiguation is the defining task of semantic tagging: the meaning of a polysemous word is determined as soon as its semantic tag is known. For example, the noun bank means a financial bank if it is tagged HOU and a riverbank if it is tagged NAT, ....
Semantic tagging models based on the knowledge sources above use tag sets of different granularity. The finer the tag set (as detailed as WordNet, with hundreds of thousands of tags), the lower the tagging accuracy but the greater the power to disambiguate senses (since no two different senses share a tag). Conversely, a coarse tag set (only 36 tags, as in LLOCE) gives higher tagging accuracy but naturally less disambiguation power (many different senses share a tag).
Semantic tagging also differs in scale: either a small number of typical words are tagged (such as Hwee Ng and Hian Lee for the single word interest, or David Yarowsky for 12 words, ...), or nearly all content words are tagged (such as Mark Stevenson and Yorick Wilks, or Mona Diab and Philip Resnik).
Which knowledge source to use in each situation is decided by the system through supervised learning on a corpus whose semantic tags are known to be correct (the training corpus, also called the gold-standard corpus). The learning algorithm can be a neural network, a decision tree, MBL (memory-based learning), TBL (transformation-based learning), ..., and among these, symbolic learning algorithms have proved more accurate.
2.4.2.5. Levels of ambiguity in semantic processing:
2.4.2.5.1. Lexical ambiguity:
In the example "I enter the bank" above, once syntactic analysis has established the relation between the verb enter (go into) and its object bank (a financial bank or a riverbank?), the computer must still analyze the meanings of the verb enter and the noun bank. Here it applies concepts from cognitive linguistics: enter is the action of going into a closed space; the noun bank in the sense of riverbank has the attribute of open space and therefore does not satisfy this requirement, while only bank in the sense of financial bank satisfies the closed-space condition. The computer therefore chooses the financial bank sense.
2.4.2.5.2. Structural ambiguity:
Consider the phrase "Old man and woman": it has 2 analyses, [Old man] and [woman] versus Old [man and woman], and the computer chooses the second (because of the balance inherent in the parallel structure of the conjunction and). For "Old man and child", however, there are also 2 analyses, [Old man] and [child] versus Old [man and child], and the computer chooses the first, because it judges the second illogical (the attribute young carried by child contradicts the modifier old).
2.4.2.5.3. Inter-clause ambiguity:
Consider the sentence "The monkey ate the banana because it was hungry". In some cases, today's computers can determine which noun the pronoun it stands for: monkey or banana. To resolve this ambiguity, the computer must look back at the preceding clause and apply real-world knowledge, such as that in WordNet, to know that only the monkey can be hungry, so it takes it to stand for monkey. In the sentence "The monkey ate the banana because it was ripe", the computer knows that only the banana can be ripe, so it takes it to stand for banana.
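The monkey/banana example reduces to checking which candidate antecedent can bear the predicate of the second clause. The sketch below hard-codes that world knowledge in a hypothetical `CAN_BE` table; in the approach described above it would come from a resource such as WordNet (for example, that a monkey is an animal and a banana is a fruit), which is not modeled here.

```python
# Hypothetical world knowledge: predicates each noun can plausibly satisfy.
CAN_BE = {
    "monkey": {"hungry", "tired", "angry"},
    "banana": {"ripe", "rotten", "yellow"},
}


def resolve_pronoun(candidates, predicate):
    """Return the only candidate antecedent that can bear the predicate, or None if zero or several can."""
    matches = [noun for noun in candidates if predicate in CAN_BE.get(noun, set())]
    return matches[0] if len(matches) == 1 else None


print(resolve_pronoun(["monkey", "banana"], "hungry"))  # monkey
print(resolve_pronoun(["monkey", "banana"], "ripe"))    # banana
```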
2.4.3. Text Classification
In today's information age, the volume of text keeps growing, and documents need to be classified into subject groups, for example by discipline (Mathematics, Physics, Chemistry, Literature, History, ...) or by field (Science, Culture, Society, Politics, ...). The volume is far too large to classify by hand, so an automatic classification program is required. Many approaches have been used to build such programs: keyword-based methods, methods based on the semantic fields of high-frequency words, the Maximum Entropy model, rough set theory, ....
For English, results in this area are very encouraging. For Vietnamese, some recent studies have produced initial results, but these remain limited because morphological analysis (word segmentation) and concept dictionaries (semantic classification) for Vietnamese are still immature. Besides classification, there is also interest in text clustering, which groups documents with similar content (according to document features).
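Of the approaches listed, the keyword-based one is the simplest to sketch: each category has a list of keywords, and a document is assigned to the category whose keywords occur most often in it. The category names and keyword lists below are hypothetical; methods such as Maximum Entropy would instead learn weighted features from a labeled corpus.

```python
import re
from collections import Counter

# Hypothetical keyword lists per category.
CATEGORY_KEYWORDS = {
    "Mathematics": {"theorem", "equation", "integral", "matrix"},
    "Physics": {"energy", "force", "quantum", "velocity"},
    "Chemistry": {"molecule", "reaction", "acid", "element"},
}


def classify(text):
    """Return the category whose keywords occur most often, or None if none occur."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: sum(counts[k] for k in kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify("Kinetic energy depends on velocity and force"))  # Physics
```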
Chapter 3: MODELS AND ALGORITHMS
3.1. Semantic search technology in the world today:
Most recent advances in semantic search tools rely heavily on natural language processing to analyze and understand the query. One of the first and most popular of these tools is Ask Jeeves (http://www.askjeeves.com/). It combines the strengths of natural language parsing software, data mining, and knowledge base construction with empirical analysis. Users can type queries in natural language and receive satisfying answers.
Another semantics-based example is Albert (http://www.albert.com/). Its biggest strength is support for several languages besides English, such as French, Spanish, and German. Search engines of this type need many people to build a very large semantic network so that they can reason effectively.
A further advanced kind of Internet search tool comes from Cycorp (http://www.cyc.com/). Cyc links what Cycorp describes as the world's largest knowledge base with the Internet. Cyc (from en-cyc-lopedia) is a vast, multi-contextual knowledge base. The Cyc Knowledge Server lets Internet sites add common-sense knowledge and distinguish between the different meanings of ambiguous concepts.

3.1.1. The current state of semantic search
As AI technology on the Web matures, RDF and OWL tags will open up semantic opportunities for search. However, the sheer size of the network being searched leaves little room for complex solutions, which strongly limits the chances of successful results.
Many large companies are now seriously tackling semantic search. Microsoft's growth on the Web may depend on its ability to build a search tool that can rival the market leader, Google. As a result, Microsoft introduced a new search program called MSNBot, which crawls the Web to build an index of HTML links and documents. MSNBot was intended as a technology to be integrated with applications for the Windows operating system: Microsoft planned to connect its search tool to the MSN portal in the next version of Windows, making it easy to search e-mail, spreadsheets, and documents on PCs (personal computers), corporate networks, and the Web.

3.1.2. Search technology
Semantic search deals with concepts and logical relationships. Looking at the practical problems of semantic search, we find that the search tree runs into fundamental logical limits: the Incompleteness Problem and the Halting Problem.
Consider the incompleteness problem first. A conclusion can be seen as the result of a chain of logical inferences linked together. At each point there may be many directions leading to a new inference. So, in effect, there is a set of branching possibilities that somehow leads to a correct solution, and this set of branches can spread out in unexpected directions.
For example, you might want to determine whom Kevin Bacon knows based on information about his family relationships, his movies, or his business contacts. There is therefore more than one path to some conclusions, and these conclusions lie within a branching set of possibilities. Inference in our system is thus a kind of search problem, represented as a search tree.
We can start at the top of the tree (the root) or from the branches. The top of the tree can be the query being asked. Each step down to a child node can be viewed as a potential logical inference that moves toward proving the original query. The fan-out of possibilities can be viewed as this branching tree, which grows bushier and deeper. Each of these approaches ends by becoming one of the sub-steps, that is, by reaching a child node.
Imagine that each node in this tree represents a statement to be proved, and each link from a higher parent node to a child node represents a logical statement (an inference step). The problem is that we now have a huge tree of possibilities.
In a complex logical system there is a huge number of potential proofs. Some of them are long, and it may not even be clear whether a proof exists. It was proved in the 1930s that any consistent logical system rich enough to express arithmetic is inherently incomplete: it contains statements that can be neither proved nor disproved within the system. The argument behind this result is related to another problem, the Halting Problem.
The halting problem implies that some search procedures may never terminate with an answer, and that in general we cannot decide in advance which ones will. On the Web we are dealing with millions of facts and tens of thousands of rules that can be chained together in complex ways, so the space of potential proofs is unbounded and the logical tree becomes infinite. We therefore run into inherent incompleteness: for example, we cannot examine every possible proof and collect every answer.
We run into incompleteness because the search tree is too large. Our approach must therefore be to search only parts of the tree. There are well-known strategies for tackling search problems of this kind; one of them is depth-first search.
Depth-first search starts at the top of the tree and goes as deep as possible along some path, expanding nodes as it goes, until it reaches a dead end. A dead end is either a goal (success) or a node from which no new children can be generated, so the system cannot prove anything beyond that point.
Let us trace depth-first search through the tree. We start at the top node and go as deep as possible:
1) Start at the top node.
2) Go as deep as possible in one direction.
3) On reaching a dead end, back up to the last node we turned away from. If there is a path from it that we have not yet taken, follow it, and keep going until we reach a dead end or a goal.
4) If this path leads to another dead end, back up a node and try another branch.
5) If the path leads to a goal, that last node is a positive result for the query, so we have an answer. Look for other answers by going back up a few nodes and then down a path we have not yet tried.
6) Continue until all dead ends have been reached and the search possibilities are exhausted.
The advantage of depth-first search is that it is an algorithmically efficient way to search trees of this form. It limits the amount of memory needed to remember what we have not yet looked at: all we have to store is the path back up. Its disadvantage is that once we start down one direction, we follow every path beneath it all the way to the end before trying anything else.
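The depth-first walk in steps 1) to 6) can be sketched as a recursive generator. In this minimal sketch the search tree is a hypothetical dictionary from each node (a statement to prove) to its children (inference steps); a node without children is a dead end, and `is_goal` marks success. Only the current path is kept in memory, which is the advantage noted above.

```python
def depth_first_search(tree, node, is_goal, path=()):
    """Yield the path to every goal, exploring each branch to its end before backing up."""
    path = path + (node,)                     # only the current branch is kept
    if is_goal(node):
        yield path                            # success: this branch ends at a goal
        return
    for child in tree.get(node, []):          # a node with no children is a dead end
        yield from depth_first_search(tree, child, is_goal, path)


PROOF_TREE = {"query": ["a", "b"], "a": ["a1", "a2"], "a2": ["goal1"],
              "b": ["b1"], "b1": ["goal2"]}
for found in depth_first_search(PROOF_TREE, "query", lambda n: n.startswith("goal")):
    print(found)
# ('query', 'a', 'a2', 'goal1')
# ('query', 'b', 'b1', 'goal2')
```

On an infinite tree this search may never return, which is the incompleteness problem discussed earlier; a depth limit (depth-limited or iterative deepening search) is the usual guard.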
Another search strategy is breadth-first search, which searches the tree level by level. We first try all proofs that take 0 steps, then all proofs that take 1 step, and so on. The advantage of breadth-first search is that it is guaranteed to find the simplest proofs before more complicated ones, in line with Occam's Razor: if there is a proof at step n, we will find it before we look at step n+1. The disadvantage is that with very deep trees, and with very bushy trees in which a node can have thousands or tens of thousands of children, reaching the deeper levels takes a very long time. Another disadvantage is the amount of memory needed to store every node of a level (for example, all results at level 3) before examining them: with breadth-first search, the deeper we go into the tree, the more memory is required.
We thus see that the two classic search algorithms, depth-first and breadth-first, both run into problems on large systems.
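A breadth-first version of the same idea, again a hypothetical sketch over a dictionary tree, examines every node at depth n before any node at depth n+1. It therefore returns the shallowest proof first (Occam's Razor), at the price of keeping the whole frontier, roughly an entire level of the tree, in memory.

```python
from collections import deque


def breadth_first_search(tree, root, is_goal):
    """Return the shallowest path from root to a goal node, or None if there is none."""
    frontier = deque([(root,)])      # FIFO queue of paths; holds about one level at a time
    while frontier:
        path = frontier.popleft()
        if is_goal(path[-1]):
            return path
        for child in tree.get(path[-1], []):
            frontier.append(path + (child,))
    return None


# The deep goal sits under "a", the shallow one under "b".
LEVEL_TREE = {"q": ["a", "b"], "a": ["a1"], "a1": ["deep_goal"], "b": ["goal"]}
print(breadth_first_search(LEVEL_TREE, "q", lambda n: n.endswith("goal")))  # ('q', 'b', 'goal')
```

A depth-first search over the same tree would reach `deep_goal` first, because it finishes the "a" branch before looking at "b".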
Two basic classes of search algorithms are used to cope with the limits imposed by incompleteness and the halting problem: uninformed and informed. Uninformed (blind) searches have no information about the number of steps or the path cost from the current state to the goal. They include depth-first, breadth-first, uniform-cost, depth-limited, and iterative deepening search. Informed (heuristic) searches do have information about the goal, usually an estimated path cost to it or an estimate of the number of steps from the current state. This estimate is called a heuristic, and agents that use it are heuristic search agents. It allows informed searches to perform better than uninformed ones and to behave more rationally. These searches include best-first, hill-climbing, beam, A*, and IDA* (iterative deepening A*) search.
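To make the difference between uninformed and informed search concrete, here is a sketch of A*, one of the informed strategies listed above. It assumes a weighted graph given as a hypothetical adjacency dictionary and a heuristic h(n) that never overestimates the remaining cost (an admissible heuristic); under that assumption A* returns a cheapest path. The graph, costs, and heuristic values are made up for illustration.

```python
import heapq


def a_star(graph, start, goal, h):
    """A* search. graph: node -> [(neighbor, step_cost)]; h: admissible estimate of remaining cost.

    Returns (path, cost) for a cheapest path, or None if the goal is unreachable.
    """
    frontier = [(h(start), 0, start, (start,))]     # entries are (f = g + h, g, node, path)
    best_g = {start: 0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if g > best_g.get(node, float("inf")):
            continue                                  # stale entry: a cheaper route was found later
        for neighbor, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(neighbor, float("inf")):
                best_g[neighbor] = new_g
                heapq.heappush(frontier, (new_g + h(neighbor), new_g, neighbor, path + (neighbor,)))
    return None


GRAPH = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)], "B": [("G", 1)]}
H = {"S": 3, "A": 2, "B": 1, "G": 0}
print(a_star(GRAPH, "S", "G", H.get))  # (('S', 'A', 'B', 'G'), 4)
```

With h(n) = 0 for every node this reduces to uniform-cost search, one of the uninformed methods listed above; the heuristic is what lets A* skip unpromising branches.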

3.1.3. Web search agents
While search engines are powerful and important for the future of the Web, another form of search also plays a decisive role: Web search agents. A Web search agent does not work like a commercial search engine. Commercial search engines answer queries by looking them up in a database built in advance (a knowledge base).
In the case of a Web search agent, the Web pages themselves are searched, and the computer provides an interface to the user. The agent's percepts are documents connected through the Internet via HTTP. Its actions are defined as follows: if the goal of finding a Web page that contains a specified target (such as a keyword or phrase) has been reached, stop; otherwise, find another location to visit. It operates in an environment and uses output methods to keep the user informed of the state of the search or of the final results.
What makes the agent intelligent is its ability to make a rational decision when faced with a choice. In other words, given a goal, it decides to follow actions that lead to that goal in a timely way.
An agent can usually generate all possible outcomes of an event, but it then has to search through those outcomes to find the desired goal and execute the path (sequence of steps) from the initial or current state to the desired goal state. An intelligent Web search agent similarly has to use search to navigate the Web toward its goal.
Building an intelligent Web search agent requires techniques for searching with multiple keywords, exception handling, and the ability to self-seed when it has completely exhausted a search space. Given a target, the Web search agent searches along a number of paths. The agent relies on keywords. The preferred method is to start from a seed location (provided by the user) and find all other locations, linked in a tree rooted at the seed, that contain the target.
The search agent needs to know the target (such as a keyword or phrase), where to start, how many occurrences of the target to find, how long to look (a time constraint), and which method should define the criteria for choosing paths (the search method). These issues are handled in the software.
The implementation requires some programming knowledge: working with sockets, HTTP, HTML, sorting, and searching.
Many languages offer Web-oriented implementations, advanced application programming interfaces (APIs), and good text-analysis capabilities that can be used to write a Web agent. Using advanced, efficient sorting algorithms will help improve the performance of the Web search agent.
Designing a Web search agent involves four stages: initialization, perception, action, and effect. In the initialization stage, the Web search agent should set up all its variables, structures, and arrays, and obtain the basic information needed to conduct the hunt for the target: the goal, a starting location, and the search method. The perception stage uses the knowledge provided to contact a page and retrieve information from that location; it should detect whether the target is present and identify paths to other URLs. The action stage takes everything the system knows and determines whether the goal has been reached (the target has been found and the hunt is over).
If the hunt is still active, the agent must decide where to go next. This is where the agent's intelligence lies, and the search method determines how intelligent the Web agent will be. If no further link is found, the hunt ends and the agent reports its output to the user.
The Web search agent moves from the initialization stage into a loop of perception, action, and effect stages, which repeats until the goal is reached or no further progress is possible.
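The four-stage loop (initialization, perception, action, effect) can be sketched as a breadth-first hunt over a tiny in-memory "Web", so that it runs without network access. Everything here is hypothetical: the `WEB` dictionary stands in for pages fetched over HTTP and parsed for links, `max_pages` stands in for the time constraint, and `max_hits` for the number of occurrences of the target to find. Replacing the FIFO queue with a priority queue ordered by a heuristic would turn it into the informed search discussed in the previous section.

```python
from collections import deque

# Hypothetical offline "Web": url -> (page text, outgoing links).
WEB = {
    "seed": ("welcome page", ["a", "b"]),
    "a": ("news about sports", ["c"]),
    "b": ("semantic search with WordNet", ["d"]),
    "c": ("weather report", []),
    "d": ("more on semantic search", []),
}


def search_agent(web, seed, target, max_hits=1, max_pages=100):
    # Initialization: goal, start location, search method (FIFO = breadth-first), bookkeeping.
    frontier = deque([seed])
    visited, hits = set(), []
    while frontier and len(visited) < max_pages:   # max_pages stands in for the time constraint
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        # Perception: "fetch" the page and extract its text and links.
        text, links = web.get(url, ("", []))
        # Action: is the target present? Stop once enough hits are found.
        if target.lower() in text.lower():
            hits.append(url)
            if len(hits) >= max_hits:
                break
        # Effect: decide where to go next by queueing unvisited links.
        frontier.extend(link for link in links if link not in visited)
    return hits, visited


hits, visited = search_agent(WEB, "seed", "semantic search")
print(hits, sorted(visited))  # ['b'] ['a', 'b', 'seed']
```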

Figure 15: Basic Web search flow

3.2. Steps in building a semantic search engine application:
One example of semantic search technology is TAP, a distributed project involving researchers from Stanford, IBM, and the W3C. TAP leverages automated and semi-automated techniques to extract knowledge bases from unstructured and semi-structured text. The system can use what it has already learned to learn new information, and it can be used for information retrieval.
In TAP, existing documents are analyzed with semantic technologies and converted into Semantic Web documents, automatically or manually, with increasingly deep structured knowledge values. Traditional information retrieval is enhanced with this deeply structured knowledge to deliver more accurate results. Both the automated and the guided analysis use intelligent reasoning systems and agents.
The solutions are built on a core technology called Semantic Web Templates, which make knowledge representation, creation, consumption, and maintenance transparent to the user. The RDF data model is the foundation of Semantic Web knowledge representation, and TAP uses RDF Schema and OWL.
Creating knowledge automatically is difficult: it requires a knowledge engine that can translate documents into the required symbolic and logical languages. Ontologies use the core vocabulary of the required knowledge to define concepts and the relationships that hold between instances of those concepts.
3.3.1. Building the Semantic Web architecture:
The Semantic Web architecture was developed around the idea of annotating Web pages with RDF and OWL tags that represent semantic ontologies in detail. The limitation of these systems, however, is that they can only process Web pages that have been annotated with specific semantic tags.
An ontology describes concepts and relationships using a set of representative vocabulary. The purpose of building an ontology is to share and reuse knowledge. Because the Semantic Web is a distributed network, different ontologies describe semantically equivalent things, so their elements must be mapped to each other if we want to process information at Web scale. One approach to semantic search is to use text classification for ontology mapping: compare each element of one ontology with each element of the other, define an equivalence measure on every pair, and link the items whose equivalence value exceeds some threshold.
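The pairwise comparison described above can be sketched as follows. The text does not specify the classifier used to score pairs, so this sketch substitutes a plain Jaccard similarity between the words of each element's description; the two toy ontologies, the function names, and the threshold of 0.5 are assumptions.

```python
import re


def word_set(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """|A intersect B| / |A union B|, defined as 0.0 for two empty sets."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def map_ontologies(onto_a, onto_b, threshold=0.5):
    """Compare every element of onto_a with every element of onto_b; keep pairs above threshold."""
    links = []
    for name_a, desc_a in onto_a.items():
        for name_b, desc_b in onto_b.items():
            score = jaccard(word_set(desc_a), word_set(desc_b))
            if score > threshold:
                links.append((name_a, name_b, round(score, 2)))
    return links


ONTO_A = {"Car": "motor vehicle with four wheels", "Person": "human being"}
ONTO_B = {"Automobile": "motor vehicle with four wheels and engine", "Human": "human being, person"}
print(map_ontologies(ONTO_A, ONTO_B))
# [('Car', 'Automobile', 0.71), ('Person', 'Human', 0.67)]
```

Comparing every pair costs |A| x |B| similarity computations, which is manageable for small ontologies but is one reason mapping at Web scale is hard.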
3.3.2. Latent semantic indexing:
We now turn to Latent Semantic Indexing (LSI), which can improve today's search capabilities without the serious limitations of a large-scale Semantic Web.
Meeting the criteria of precision, quality, and recall takes more than brute force. Attaching descriptive and classification tools to text provides an important advantage: they can return relevant documents that do not contain a character-by-character match for the query. Well-described data sets can give a picture of the scope and distribution of a document collection as a whole. This can be done by studying the structure of its categories and subcategories (called a taxonomy).
Mt tro ngai nghim trong cho su tip cn dn vic phn loai du liu ny l vn
d vn c trong bt cu kiu cua taxonomy trn th gioi di khi chng lai su phn
loai. V du, c chua l tri cy hay rau qua?
And what happens when we merge two document collections that were indexed in different ways? The solutions to these problems are known as ontology taxonomies.
Conventional keyword search treats a document collection in binary terms: a document either contains a given word or it does not.
Latent Semantic Indexing (LSI) adds an important step to document indexing. In addition to recording which keywords a document contains, the method examines the whole document collection to find other documents that contain some of the same words. LSI was first developed at Bellcore in the late 1980s. It considers documents that have many words in common to be semantically close, and documents with few words in common to be semantically distant. The LSI algorithm does not understand anything about what the words mean, but it does notice patterns.
When you search an LSI-indexed database, the search engine looks at the similarity values it has computed for every content word. It then returns the documents it judges to fit the query best. Two documents may be semantically very close even if they do not share a particular keyword, so LSI does not require an exact match to return useful results. A plain keyword search fails when there is no exact match. In the same situation, LSI will often return relevant documents that do not contain all of the keywords.
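In its standard formulation, LSI factors the term-by-document matrix A with a truncated singular value decomposition (SVD) and compares documents in the resulting low-rank space. The dependency-free sketch below applies that idea to an invented matrix of four terms and five documents. To avoid a linear-algebra library, it obtains the top-k right singular vectors as eigenvectors of A^T A using power iteration with deflation. That shortcut is adequate for a toy example but not for a real index.

```python
import math

# Invented term-by-document counts (rows: car, automobile, flower, petal; columns d0..d4).
A = [
    [1, 0, 1, 0, 0],   # car
    [0, 1, 1, 0, 0],   # automobile
    [0, 0, 0, 1, 1],   # flower
    [0, 0, 0, 1, 0],   # petal
]


def gram_matrix(a):
    """A^T A for a term-by-document matrix A given as a list of term rows."""
    n_docs = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n_docs)]
            for i in range(n_docs)]


def top_eigenpairs(g, k, iters=500):
    """Top-k eigenpairs of a symmetric positive semi-definite matrix,
    found by power iteration followed by deflation."""
    n = len(g)
    g = [row[:] for row in g]              # copy: deflation modifies the matrix
    pairs = []
    for _ in range(k):
        v = [1.0] * n
        lam = 0.0
        for _step in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:               # nothing left in this subspace
                lam = 0.0
                break
            v = [x / norm for x in w]
            lam = norm
        pairs.append((lam, v))
        for i in range(n):                 # remove the component just found
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs


def lsi_document_vectors(a, k):
    """Rank-k LSI coordinates: document j maps to sqrt(lambda_i) * v_i[j],
    i.e. column j of Sigma_k V_k^T in the SVD A = U Sigma V^T."""
    pairs = top_eigenpairs(gram_matrix(a), k)
    return [[math.sqrt(lam) * v[j] for lam, v in pairs] for j in range(len(a[0]))]


def cosine(x, y):
    dot = sum(p * q for p, q in zip(x, y))
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    return dot / (nx * ny) if nx and ny else 0.0


if __name__ == "__main__":
    column = lambda j: [row[j] for row in A]
    docs = lsi_document_vectors(A, 2)
    print("raw cosine d0/d1:", cosine(column(0), column(1)))
    print("LSI cosine d0/d1:", round(cosine(docs[0], docs[1]), 3))
```

Documents d0 ("car") and d1 ("automobile") share no term, so their raw cosine is 0. Both terms co-occur in d2, however, so in the rank-2 space the vectors for d0 and d1 point in the same direction, while d3 (about flowers) remains orthogonal to them.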

3.3.2.1. Extracting content words
Latent semantic indexing looks at patterns of words within a set of documents. Natural language contains many redundant words, and not every word that appears in a document carries meaning. The most frequently used words in English often carry no content. Examples are function words, conjunctions, prepositions and common verbs. The first step in performing LSI is therefore to cull these extraneous words from each document. To obtain the semantic content of a document:
1. Make a complete list of all the words that appear in the collection.
2. Discard articles, prepositions and conjunctions.
3. Discard common verbs (know, see, do, be).
4. Discard pronouns.
5. Discard common adjectives (big, late, high).
6. Discard frilly words (therefore, thus, however, albeit, ...).
7. Discard any words that appear in every document.
8. Discard any words that appear in only one document.
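Steps 1 to 8 can be sketched as a single filtering pass. The stop lists below are tiny hypothetical samples built from the examples in the list; a real system would use full lists. Documents are tokenised simply by splitting on whitespace.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop lists (steps 2-6).
ARTICLES_PREPS_CONJ = {"a", "an", "the", "of", "in", "on", "and", "or", "but"}
COMMON_VERBS = {"know", "see", "do", "be", "is", "are"}
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}
COMMON_ADJECTIVES = {"big", "late", "high"}
FRILLY_WORDS = {"therefore", "thus", "however", "albeit"}
STOP_WORDS = (ARTICLES_PREPS_CONJ | COMMON_VERBS | PRONOUNS
              | COMMON_ADJECTIVES | FRILLY_WORDS)


def content_vocabulary(docs):
    """Build the vocabulary (step 1), drop stop words (steps 2-6), then drop
    words found in every document (step 7) or in only one document (step 8)."""
    doc_words = [set(doc.lower().split()) for doc in docs]
    doc_freq = Counter(w for words in doc_words for w in words)  # documents containing each word
    vocab = set(doc_freq) - STOP_WORDS
    n_docs = len(docs)
    return {w for w in vocab if 1 < doc_freq[w] < n_docs}


if __name__ == "__main__":
    docs = ["the computer runs a program",
            "a program is data and the computer stores data",
            "however the painting hangs in a museum",
            "the museum shows a painting and a computer"]
    print(sorted(content_vocabulary(docs)))
```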

3.3.2.2. Stemming (lemmatize)
A semantic search engine is a remarkably effective solution. It can detect that two documents are similar even when they have no words in common. It can also set aside documents whose only shared words are commonplace ones.
Some of the preliminary work needed to make documents ready for indexing is highly language-specific, for example stemming (lemmatization). For English documents we use an algorithm called the Porter Stemmer. It removes common word endings to return the root form of a word (for example: writing → write, writes → write, ...).
The first step applies to each document individually: every word receives a local weight. Words that appear several times in a document get a larger weight than words that appear only once.
We present an algorithm that builds a web (graph) of documents and words, linking every document to the words it contains. Given this model of words and documents, we can assign values to a document according to how it differs from the other documents. The value of any document relative to another can be defined as a function of the number of links that must be traversed to connect the two. If two documents are connected by many paths, they are likely to be strongly correlated.
Term weighting formalizes two common-sense insights:
- Content words that appear several times in a document are likely to be more meaningful than words that appear only once.
- Infrequently used words are likely to be more interesting than common words.

Description of the algorithm:
For each document:
1. Stem all of the words (strip prefixes and suffixes) and discard the common noise words.
2. For each remaining word:
   a. Visit and record every document that has a direct relationship to this word.
   b. Score each of these documents using a function of its distance from the starting document.
3. For each newly related document that has not yet been visited, start tracking it.
Repeat the operations above recursively.
The weighting is computed as follows:
1. For each increase in distance, divide the baseline score by 2.
2. The score of each document equals the baseline divided by the square root of the popularity of the word.
Overall, the algorithm provides a cheap semantic lookup based on walking the graph from a document through its words.
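The graph walk and weighting rules above can be sketched as follows. Several points are assumptions, because the text does not fix them. Documents are given as sets of already stemmed, noise-free words. The popularity of a word is the number of documents that contain it. Scores reached along several paths are summed. The recursion is written as a breadth-first walk limited to a hypothetical `max_depth` of 3.

```python
import math
from collections import defaultdict


def word_index(docs):
    """word -> set of documents containing it (the document/word graph)."""
    index = defaultdict(set)
    for doc_id, words in docs.items():
        for w in words:
            index[w].add(doc_id)
    return index


def related_documents(source, docs, baseline=1.0, max_depth=3):
    """Score documents reachable from `source` by walking the document/word graph.

    docs: hypothetical dict doc_id -> set of stemmed, noise-free words.
    A hop at distance d contributes (baseline / 2**(d-1)) / sqrt(popularity),
    where popularity is the number of documents containing the linking word.
    Contributions from different paths at the same distance are summed.
    """
    index = word_index(docs)
    scores = defaultdict(float)
    visited = {source}
    frontier = {source}
    for depth in range(1, max_depth + 1):
        level_score = baseline / 2 ** (depth - 1)   # halve for each extra hop
        reached = set()
        for doc in frontier:
            for w in docs[doc]:
                popularity = len(index[w])
                for other in index[w]:
                    if other not in visited:
                        scores[other] += level_score / math.sqrt(popularity)
                        reached.add(other)
        if not reached:
            break
        visited |= reached
        frontier = reached
    return dict(scores)


if __name__ == "__main__":
    docs = {"d1": {"semantic", "search", "ontology"},
            "d2": {"ontology", "rdf"},
            "d3": {"rdf", "owl"},
            "d4": {"painting", "museum"}}
    for doc, score in sorted(related_documents("d1", docs).items(), key=lambda kv: -kv[1]):
        print(doc, round(score, 3))
```

Here d2 is reached directly through "ontology", d3 is reached one hop further through "rdf" at half the baseline, and d4 is never reached because it shares no word with the others.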
The method presented here is the simplest case, and it can be improved in many ways. Many other scoring algorithms could be used. In addition, a thesaurus could be applied to help overcome semantic problems. One interesting challenge is to make the algorithm incremental, so that newly added documents are scored immediately. Another challenge is to find a way to distribute the algorithm across several machines.

3.3. Proposed model for semantic search in the eDoc domain
Building on the theoretical foundations studied above, we propose the following model for a semantic search application in the eDoc domain.

Figure 16: Proposed model for semantic search in the eDoc domain (components: Web Browser, Search engine, Ontology, Corpora, Metadata, eDoc)

Web Browser:
Serves as the user interface. It receives the user's query and displays the query results.

Search engine:
This is the core of the program. The search engine performs all of the system's essential processing:
- It acts as a web robot, collecting electronic documents from the network.
- It acts as a filter. The search engine collects and processes documents and extracts metadata for them. It tokenizes the text and discards unnecessary words so that only nouns and verbs remain. It then counts how often each domain occurs in the document. Finally, it stores metadata for the document's content using the Dublin Core metadata standard.
- It organizes and stores the ontologies that capture semantic relationships between real-world objects. These are stored as RDF files.
- It organizes and stores the corpora. A corpus is also regarded as an ontology that represents part-whole relationships between objects. The corpora also make it possible to identify synonyms through the synset concept (the corpora are described in detail below). They are stored as tables in SQL Server because this data is queried heavily.
- It defines the metadata describing the relationships between resources (eDoc documents) and objects in the ontology. This metadata is also stored in a relational database.
- It analyzes the user's query and picks out the important words. From these words it uses WordNet and the ontologies to analyze the query's meaning. At the same time it queries the metadata so that it can return to the Web Browser the documents that match the meaning of the user's query.

eDoc
All electronic documents on the network, specifically files in formats such as HTML, PDF, CHM, ASP and PHP.

Processing flow of the search engine layer:


Figure 17: Processing flow of the search engine layer (elements shown: user query, returned documents; receive query, process query, display results; collect documents from the Internet, process documents, store in database; data stores: eDoc documents, document information, Metadata, Ontology)
3.4. Algorithms used
3.4.1. Document processing algorithm:
After a document has been collected, it is processed by the filter. Algorithm diagram:

Figure 18: Document processing algorithm (steps shown: eDoc → convert to text → remove unimportant words, keeping nouns and verbs → lemmatize using the corpus → nouns and verbs in base form → count word frequencies and the document's domain using the corpus → keywords and document information)
Algorithm for the lemmatize step:
The corpus used for stemming is WordNet. It is fairly large (over 100,000 nouns and 11,000 verbs), and its words are stored in their base forms. In addition, the WordNet dictionary includes two files, noun.exc and verb.exc. The first maps irregular plural nouns to their singular forms, and the second maps irregular past and progressive verb forms to their base forms.
The simple stemming steps are:
B1: Check each word. If it appears in noun.exc or verb.exc, take its base form from there.

B2: Otherwise:
- If the word ends in s, strip the s according to these rules:
  - If the word ends in ss, chs, shs, xs, is or zs, it is not a plural.
  - If the word ends in 's, it is a possessive, so remove those two characters.
  - Remove the final s.
  - Look the result up in the noun and verb corpus. If it is there, it is the base form.
  - If it is not there (the word is not yet in its base form):
    o If it ends in se, che, she, xe or ze, remove the final e.
    o If it ends in ie, replace ie with y.

- If the word ends in ed:
  - Remove ed.
  - Look the result up in the verb corpus. If it is there, it is the base form.
  - If it is not there:
    o If the last two characters are the same, remove one of them.
    o If it ends in i, replace the i with y.
    o In all other cases, append e.

- If the word ends in ing:
  - Remove ing.
  - Look the result up in the verb corpus. If it is there, it is the base form.
  - If it is not there:
    o If the last two characters are the same, remove one of them.
    o If it ends in y, replace the y with ie.
    o In all other cases, append e.
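The stemming steps above can be sketched in Python. The dictionaries are tiny hypothetical stand-ins for WordNet's noun.exc and verb.exc files and its noun and verb lists; reading the real WordNet files is not shown. Two choices are assumptions rather than part of the steps. First, a word already in the lexicon is returned unchanged; without this guard, a word such as bus would lose its final s. Second, when no -s rule matches, the word without its s is returned.

```python
# Tiny hypothetical stand-ins for WordNet data (not the real files).
NOUN_EXC = {"children": "child", "mice": "mouse"}            # like noun.exc
VERB_EXC = {"went": "go", "ran": "run", "written": "write"}   # like verb.exc
NOUNS = {"box", "watch", "city", "dog", "class", "bus"}
VERBS = {"stop", "try", "make", "lie", "walk", "run"}

NOT_PLURAL_ENDINGS = ("ss", "chs", "shs", "xs", "is", "zs")


def in_lexicon(word):
    return word in NOUNS or word in VERBS


def _repair_verb_stem(stem, old_end, new_end):
    """Fix-ups shared by the -ed and -ing rules."""
    if len(stem) >= 2 and stem[-1] == stem[-2]:
        return stem[:-1]                          # stopped -> stopp -> stop
    if stem.endswith(old_end):
        return stem[:-len(old_end)] + new_end     # tried -> tri -> try; lying -> ly -> lie
    return stem + "e"                             # hoped -> hop -> hope


def lemmatize(word):
    w = word.lower()
    if in_lexicon(w):                 # added guard (assumption): already a base form
        return w
    # B1: irregular forms from the exception lists.
    if w in NOUN_EXC:
        return NOUN_EXC[w]
    if w in VERB_EXC:
        return VERB_EXC[w]
    # B2: rule-based suffix stripping.
    if w.endswith("s"):
        if w.endswith(NOT_PLURAL_ENDINGS):
            return w                              # not a plural
        if w.endswith("'s"):
            return w[:-2]                         # possessive
        stem = w[:-1]
        if in_lexicon(stem):
            return stem
        if stem.endswith(("se", "che", "she", "xe", "ze")):
            return stem[:-1]                      # boxes -> boxe -> box
        if stem.endswith("ie"):
            return stem[:-2] + "y"                # cities -> citie -> city
        return stem                               # assumption: no rule matched
    if w.endswith("ed"):
        stem = w[:-2]
        return stem if stem in VERBS else _repair_verb_stem(stem, "i", "y")
    if w.endswith("ing"):
        stem = w[:-3]
        return stem if stem in VERBS else _repair_verb_stem(stem, "y", "ie")
    return w


if __name__ == "__main__":
    for w in ["children", "boxes", "cities", "dog's", "stopped", "tried", "lying", "making"]:
        print(w, "->", lemmatize(w))
```

Like the steps it follows, the sketch can still produce wrong forms for words that happen to look like inflections, so the lexicon lookup carries most of the weight.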
3.4.2. Metadata extraction algorithm:
After the document has been processed to obtain its information, the program builds metadata that describes the document. The metadata uses the Dublin Core standard and is stored in RDF format.

Figure 19: Metadata extraction algorithm
The main tags used are:
- title: the name of the document
- identifier: the URI of the document
- language: the language of the document
- description: descriptive information about the document
- subject: keywords for the document (some HTML pages contain this meta tag; it is combined with words counted in the document's content).
(Figure 19 elements: the inputs are the keywords and document information, namely title, author, keywords, document address and language; the outputs are the DC:Title, DC:Creator, DC:Subject, DC:Description and DC:Language elements of an RDF file.)
The content of these tags is taken mainly from the HEAD section of the HTML file. There are two exceptions. The identifier tag is filled from the resource identification recorded by the robot, and the subject tag is supplemented with the keyword statistics.
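A sketch of this metadata step for one HTML page, using only the Python standard library, is shown below. The HEAD supplies the title, language and description. The URI recorded by the robot becomes both rdf:about and dc:identifier. dc:subject merges the keywords meta tag with keywords counted from the text. The RDF/XML layout (one rdf:Description per document), the function names and the example URI are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


class HeadParser(HTMLParser):
    """Collects <html lang>, <title> and <meta name=... content=...> values."""

    def __init__(self):
        super().__init__()
        self.language = ""
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html":
            self.language = attrs.get("lang") or ""
        elif tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def dublin_core_rdf(html, uri, counted_keywords):
    """Build an RDF/XML Dublin Core record for one HTML document (hypothetical helper)."""
    head = HeadParser()
    head.feed(html)
    subjects = [k.strip() for k in head.meta.get("keywords", "").split(",") if k.strip()]
    subjects += [k for k in counted_keywords if k not in subjects]

    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": uri})
    fields = [("title", head.title.strip()), ("identifier", uri),
              ("language", head.language),
              ("description", head.meta.get("description", ""))]
    for name, value in fields:
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    for keyword in subjects:
        ET.SubElement(desc, f"{{{DC_NS}}}subject").text = keyword
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    page = ('<html lang="en"><head><title>Intro to OOP</title>'
            '<meta name="keywords" content="oop, classes"></head><body></body></html>')
    print(dublin_core_rdf(page, "http://localhost/eDocSearch/DataTest/oop.html", ["programming"]))
```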

3.4.3. Domain classification algorithm for documents:
After its header information has been extracted, the content of a document is processed to classify it into a domain. The domains used for classification are the subclasses of the ontology. Classification relies on a set of words for each subclass, consisting of the subclass's synonyms and its more specific terms. These are called the specialized terms. The dictionary of these terms is built from the WordNet corpus and from Tropes (a text classification tool).
For example, the computer science domain has subclasses such as computer and programming. The computer subclass in turn contains its own terms, such as computing machine, hardware, CPU, ...
Domain classification steps:
B1: Using the list of specialized terms, search the document and count how many times each term occurs. This count is taken as the term's weight in the document.
B2: Add up the weights of the terms in each subclass to obtain a weight for each subclass.
B3: The subclass with the highest weight is taken as the best subclass, and the document is assigned to it.
The relationships between documents and subclasses are stored in the Doc_Onto index.
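Steps B1 to B3 can be sketched as follows. The term dictionary is a hypothetical sample in the spirit of the computer example above; it is not the dictionary built from WordNet and Tropes. Multi-word terms such as computing machine are matched as consecutive tokens. Returning None when no term occurs is an added assumption.

```python
import re

# Hypothetical specialised-term dictionary: subclass -> synonyms and narrower terms.
SUBCLASS_TERMS = {
    "computer": {"computer", "computing machine", "hardware", "cpu"},
    "programming": {"programming", "java", "oop", "algorithm"},
    "painting": {"painting", "canvas", "portrait", "landscape"},
}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def count_term(tokens, term):
    """B1: occurrences of a (possibly multi-word) term in the token list."""
    parts = term.split()
    k = len(parts)
    return sum(1 for i in range(len(tokens) - k + 1) if tokens[i:i + k] == parts)


def classify(text, subclass_terms=SUBCLASS_TERMS):
    """B2-B3: sum the term weights per subclass and pick the heaviest subclass.

    Returns (best subclass or None if no term occurs, weights per subclass).
    """
    tokens = tokenize(text)
    weights = {sub: sum(count_term(tokens, t) for t in terms)
               for sub, terms in subclass_terms.items()}
    best = max(weights, key=weights.get)
    return (best if weights[best] > 0 else None), weights


if __name__ == "__main__":
    text = ("The CPU and other hardware make a computing machine; "
            "programming it in Java is fun.")
    print(classify(text))
```

With this sample text, the computer subclass collects three hits (CPU, hardware, computing machine) against two for programming, so the document would be filed under computer.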
3.4.4. Query processing algorithm:
The domain of a query is analyzed with the same steps as the domain classification algorithm for documents. The documents that belong to the query's best domain are then returned to the user as the result.
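A sketch of query handling is shown below. It assumes that the query is classified with the same term-counting rule as documents; a compact copy of that rule is included so the block stands alone. The union with keyword matches follows the remarks on the experiments in section 4.6, which explain that results combine documents from the query's subclass with documents whose keywords match the query. All identifiers and data are hypothetical.

```python
import re


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def query_subclass(query, subclass_terms):
    """Classify the query like a document: the subclass whose specialised
    terms occur most often in it, or None if none occurs."""
    tokens = tokenize(query)

    def hits(term):
        parts = term.split()
        return sum(tokens[i:i + len(parts)] == parts
                   for i in range(len(tokens) - len(parts) + 1))

    weights = {sub: sum(hits(t) for t in terms) for sub, terms in subclass_terms.items()}
    if not weights or max(weights.values()) == 0:
        return None
    return max(weights, key=weights.get)


def search(query, subclass_terms, doc_subclass, doc_keywords):
    """Documents filed under the query's subclass, plus documents whose
    stored keywords contain a query word or the whole query phrase."""
    tokens = tokenize(query)
    phrase = " ".join(tokens)
    sub = query_subclass(query, subclass_terms)
    by_subclass = {d for d, s in doc_subclass.items() if sub is not None and s == sub}
    by_keyword = {d for d, kws in doc_keywords.items()
                  if any(k.lower() in tokens or k.lower() == phrase for k in kws)}
    return sorted(by_subclass | by_keyword)


if __name__ == "__main__":
    terms = {"programming": {"programming", "java", "oop"},
             "painting": {"painting", "portrait"}}
    doc_subclass = {"doc1": "programming", "doc2": "programming", "doc3": "painting"}
    doc_keywords = {"doc1": ["java"], "doc2": ["c#"], "doc3": ["portrait", "java"]}
    print(search("Java", terms, doc_subclass, doc_keywords))
```

Because of the union, a query can return documents from outside its own subclass (doc3 above), which is one reason why different words of the same subclass return different numbers of documents.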

Chapter 4 : THE APPLICATION
4.1. Introduction to the application:
In this chapter we build a search tool that demonstrates Web search on the Internet combined with semantics. The model is implemented by applying and extending the Semantic Web models presented in the previous chapters.
The application performs semantic search using existing Semantic Web technologies together with the solutions we have proposed:
- The program uses the RDF Gateway tool.
- It runs on Internet Explorer 5 (I.E5).
- The program uses the RDF editor tool.
4.2. Application architecture:
To design the semantic search tool for eDoc, we propose an architecture that supports search on both the Internet and intranets. It consists of the following stages:
Stage 1: Designing the ontology.
Ontologies are usually stored in files with the extensions .rdf, .rdfs, .owl, .daml, .xml, ...
An ontology describes the relationships between real-world objects. Ontologies already created by domain experts are available on the Internet. Anyone can share, create, read and write these ontologies, so we can extend them as needed.
Ontologies can also be created from structured sources such as HTML, RDF, images, Excel, WinWord, SQL Server, Oracle, ... These ontologies are created with an editing tool and then saved as files with the extensions .rdf, .rdfs, .owl, .daml, ...
Tools that can be used to edit ontologies include:
- HTML Parser
- Protégé
- RDF Editor
- ...
Stage 2: Building the application.
The main steps in building the application are:
o Step 1: Use software such as crawlers and spiders, ... acting as robots that collect information from the Internet, including ontologies.
o Step 2: Use the RDF Query Analyzer utility in RDF Gateway to load the ontology files collected in Step 1 into the RDF Gateway database.
o Step 3: Build the application:
  - Classify the collected ontologies according to the domains to be searched.
  - For the documents collected in Step 1, extract metadata with the elements of interest: title, author, keyword, subject, description, ... Then classify the documents by domain.
  - Store the extracted metadata in the SQL Server database, and at the same time build the relationships between ontology objects and the extracted metadata.
  - For each query the user enters, query the database and return the results to the user.
4.3. Application scope
4.3.1. Problem description:

In this application we gathered the ontologies (taken from the Internet) into a folder on the local machine to make the demonstration convenient. The ontologies could equally be fetched directly from the Internet. They are stored on localhost at:
http://localhost/eDocSearch/Library/RDF/
Each ontology used here covers a single specific domain. If a domain has several ontologies, or one ontology covers several domains, the ontologies must be classified by domain (this is a direction for extending the thesis).
The application is built to demonstrate semantic search in the eDoc domain, and its scope is limited to the following domains:
- Computer science.
- Art.
4.3.2. Requirements:
Storage requirements:
Store the semantic information to be searched (the objects) from the ontologies in the database, together with information describing equivalent terms, to support searching.
Lookup requirements:
Find the documents related to the term the user types in.
Effectiveness:
Search results must be relevant, accurate and fast, in line with Semantic Web technology.
Extensibility:
Support more documents, more domains, ...
Compatibility:
Users need only a web browser and a connection to the server.
Usability:
The interface is friendly and easy to use: the user simply types a search term and clicks the Search button.
Security:
Users can only view search results as static pages (htm/html).
Maintainability:
New ontologies can easily be developed or added.
4.4. Building the application:
4.4.1. Data design:
The data is stored in SQL Server 2000 in the following tables:


Figure 20: Relational data diagram of the application

Table | Columns | Description
DOCUMENTS | DocID varchar(12), Title text, Descript text, URI varchar(200), Author varchar(200), Datacreate varchar(12), Keywords text, Version varchar(50), ScenID char(3) | Stores information about each document together with the domain it belongs to.
ONTOLOGIES | OntoID varchar(12), Word varchar(50), ScenID char(3) | Stores the objects of the ontology.
DOC_ONTO | DocID varchar(12), OntoID varchar(12) | Relationship between documents and ontology objects.
WORDS | WordID varchar(10), Word varchar(50), ScenID char(3) | Can be viewed as the list of words that may occur in a domain.
WORD_ONTO | WordID varchar(10), OntoID varchar(12) | The words that refer to an ontology object.
STATISTIC | OntoID varchar(12), NumWords int, ScenID char(3) | Temporary table storing the number of words found in a document for each ontology object; used to classify the document by domain.
WORD_TEMP | Word varchar(50), Numwords int | Also a temporary table; holds the words of a document so that its keywords can be extracted later.
Table 6: Database description for the application
Notably, the ONTOLOGIES table is built from the RDF documents. RDF Gateway is used to query the RDF data and cache it in this table, which makes searching faster and easier.
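The tables of Table 6 can be tried out with the sketch below. It uses SQLite from the Python standard library instead of SQL Server 2000. The column names and types come from Table 6, but the primary keys and REFERENCES clauses are assumptions that the table does not specify. The query shows one plausible way to reach documents from a specialized word through WORD_ONTO and DOC_ONTO. The sample rows in the usage are hypothetical.

```python
import sqlite3

# Column names and types follow Table 6; keys and REFERENCES clauses are assumptions.
SCHEMA = """
CREATE TABLE DOCUMENTS (DocID VARCHAR(12) PRIMARY KEY, Title TEXT, Descript TEXT,
    URI VARCHAR(200), Author VARCHAR(200), Datacreate VARCHAR(12), Keywords TEXT,
    Version VARCHAR(50), ScenID CHAR(3));
CREATE TABLE ONTOLOGIES (OntoID VARCHAR(12) PRIMARY KEY, Word VARCHAR(50), ScenID CHAR(3));
CREATE TABLE DOC_ONTO (DocID VARCHAR(12) REFERENCES DOCUMENTS(DocID),
    OntoID VARCHAR(12) REFERENCES ONTOLOGIES(OntoID), PRIMARY KEY (DocID, OntoID));
CREATE TABLE WORDS (WordID VARCHAR(10) PRIMARY KEY, Word VARCHAR(50), ScenID CHAR(3));
CREATE TABLE WORD_ONTO (WordID VARCHAR(10) REFERENCES WORDS(WordID),
    OntoID VARCHAR(12) REFERENCES ONTOLOGIES(OntoID), PRIMARY KEY (WordID, OntoID));
CREATE TABLE STATISTIC (OntoID VARCHAR(12), NumWords INT, ScenID CHAR(3));
CREATE TABLE WORD_TEMP (Word VARCHAR(50), Numwords INT);
"""


def create_database(path=":memory:"):
    """Open a SQLite database and create the Table 6 schema in it."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def documents_for_word(conn, word):
    """Documents linked, via WORD_ONTO and DOC_ONTO, to the ontology object a word refers to."""
    sql = """
        SELECT DISTINCT d.DocID, d.Title
        FROM WORDS w
        JOIN WORD_ONTO wo ON wo.WordID = w.WordID
        JOIN DOC_ONTO link ON link.OntoID = wo.OntoID
        JOIN DOCUMENTS d ON d.DocID = link.DocID
        WHERE lower(w.Word) = lower(?)
        ORDER BY d.DocID
    """
    return conn.execute(sql, (word,)).fetchall()


if __name__ == "__main__":
    conn = create_database()
    conn.execute("INSERT INTO DOCUMENTS (DocID, Title, ScenID) VALUES (?, ?, ?)",
                 ("D1", "Java tutorial", "CS1"))
    conn.execute("INSERT INTO ONTOLOGIES VALUES (?, ?, ?)", ("O1", "programming", "CS1"))
    conn.execute("INSERT INTO WORDS VALUES (?, ?, ?)", ("W1", "java", "CS1"))
    conn.execute("INSERT INTO WORD_ONTO VALUES (?, ?)", ("W1", "O1"))
    conn.execute("INSERT INTO DOC_ONTO VALUES (?, ?)", ("D1", "O1"))
    print(documents_for_word(conn, "Java"))
```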
4.4.2. Processing design:

The program is written in C# with ASP.NET and uses SQL Server 2000 to store its data.

The program has 2 modules:
No. | Module | Purpose
1 | eDocSearch | Interacts with the user: receives the query, processes it and displays the results.
2 | eDocSearchAdministrator | Manages the databases of words, ontologies and documents; collects documents from the Internet and processes them.

Table 7: Modules of the program

Object classes for each module:
Module eDocSearch:
No. | Class | Purpose
1 | UserQuery.cs | Processes the user's query and returns the results.
Table 8: Module eDocSearch

Module eDocSearchAdministrator:
No. | Class | Purpose
1 | Database.cs | Connects to the SQL Server database and to RDF Gateway.
2 | Spider.cs | Collects documents from the Internet.
3 | DocumentProcess.cs | Manages the document database (extracts metadata for documents and classifies them by domain).
4 | TextProcess.cs | Processes text (removes unimportant words and performs lemmatization).
5 | Word_database.cs | Manages the database of specialized terms for each domain.
6 | ManageOntology.cs | Manages the ontology database.
7 | DatabaseProcess.cs | Processes ontologies, converting them from RDF storage into the SQL Server relational database.
Table 9: Module eDocSearchAdministrator
4.5. Results
The documents for the search experiments were downloaded and stored on the server in the folder http://localhost/eDocSearch/DataTest/. There are about 500 documents covering both domains.
Test environment: Celeron machine, 256 MB RAM, 1.2 GB, Windows XP.
Text processing time: about 2 s per document.
Query processing time: fast.
Classification of documents by domain: 91%.

The program lets users query topics of interest in natural language.
Main interface of the program:

Figure 21: Main interface of the application

Figure 22: Search results interface of the application

Resource management interface:

Figure 23: Resource management interface

4.6. Experiments
List of test queries:
No. | Query term | Documents returned | Irrelevant documents
1 | Programming | 14 | 3
2 | Oop | 10 | 1
3 | Asp | 10 | 1
4 | Assembly | 9 | 2
5 | Java | 12 | 3
6 | Visual basic | 3 | 0
7 | C# | 10 | 1
8 | Data | 7 | 3
9 | Database | 76 | 33
10 | Metadata | 32 | 14
11 | Register | 0 | 0
12 | Security | 5 | 1
13 | Computer science | 63 | 25
14 | Computing | 47 | 17
15 | Algorithm | 45 | 9
16 | Machine translation | 52 | 17
17 | Computer vision | 62 | 27
18 | Internet | 46 | 6
19 | www | 43 | 18
20 | Site | 43 | 18
21 | Server | 57 | 22
22 | Computer | 29 | 24
23 | Hardware | 11 | 7
24 | Information processing | 9 | 7
25 | Natural language processing | 10 | 8
26 | Software | 12 | 6
27 | Freeware | 7 | 2
28 | Shareware | 7 | 2
29 | Virus | 6 | 0
30 | Norton antivirus | 5 | 0
31 | Graphic | 5 | 3
32 | Picture | 9 | 7
33 | Artwork | 15 | 7
34 | Art school | 100 | 90
35 | Artist | 12 | 3
36 | Gallery | 19 | 17
37 | Museum | 19 | 8
38 | Clip art | 100 | 90
39 | Painting | 36 | 27
40 | Landscape | 11 | 6
41 | Portrait | 10 | 7
Table 10: Test queries


Query statistics by domain:
Formula:
Precision of a domain = arithmetic mean of the precision percentages of the individual words in that domain.

Computer & information science:

No. | Domain | Precision
1 | Programming | 87%
2 | Data | 57%
3 | Security | 93%
4 | Computer science | 65%
5 | Internet | 67%
6 | Computer | 26%
7 | Information science | 21%
8 | Software | 64%
9 | Virus | 100%
Table 11: Statistics for the computer science domain


Art:
No. | Domain | Precision
1 | Art and artwork | 10%
2 | Artist | 75%
3 | Gallery | 11%
4 | Museum | 58%
5 | Art school | 10%
6 | Painting | 25%
8 | Music | 70%
9 | Music style | 65%
Table 12: Statistics for the art domain
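For each query, Table 10 gives the number of documents returned and the number that were off-topic. A word's precision can therefore be taken as the percentage of returned documents that are relevant. This reading is an assumption, but it reproduces the Data row of Table 11 exactly (4 relevant out of 7, or 57%). The text does not say which query words were grouped under each domain. The grouping of Security, Virus and Norton antivirus used below is a guess, although it happens to give the 93% shown for Security in Table 11. Scoring a query that returns nothing as 0 is a further assumption.

```python
# Rows copied from Table 10: query term -> (documents returned, irrelevant documents).
TABLE_10 = {
    "Data": (7, 3),
    "Security": (5, 1),
    "Virus": (6, 0),
    "Norton antivirus": (5, 0),
}


def word_precision(returned, irrelevant):
    """Percentage of returned documents that are relevant.
    Assumption: a query that returns nothing scores 0."""
    return 100.0 * (returned - irrelevant) / returned if returned else 0.0


def domain_precision(words, table=TABLE_10):
    """Arithmetic mean of the per-word precisions of the words in one domain."""
    values = [word_precision(*table[w]) for w in words]
    return sum(values) / len(values)


if __name__ == "__main__":
    print(round(domain_precision(["Data"])))
    print(round(domain_precision(["Security", "Virus", "Norton antivirus"])))
```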




Remarks:
- The application covers only two domains, art and computer science, so every document added is forced into one of these two domains. This lowers precision.
- Words in the same ontology subclass return different numbers of documents because of the query method: it returns the documents in the query's ontology subclass together with the documents whose keywords contain the query keyword.
- Precision in classifying documents by subclass is not yet high. The ontology subclasses are incomplete and do not cover all the concepts of a domain, and the word lists for each domain are still small and incomplete.
- Classification precision also suffers when a document contains little text (for example, documents consisting mostly of hyperlinks and images).
- The art domain has low precision because the words in its ontology subclasses are few and not clearly separated: a word may belong to several subclasses.
In summary, the application classifies documents well at the level of the broad domains, but less well at the level of the individual subclasses within each domain. An administrator can improve the program by building all real-world domains and by adding to each subclass as many words characteristic of that subclass as possible (the more independent the subclasses, the better).

Chapter 5 : CONCLUSION
5.1. Evaluation of the research results
5.1.1. Strengths
Overall, the thesis has accomplished its stated goals and achieved the following results:
o It presents the theoretical foundations and operating principles of a search engine system, together with its strengths and weaknesses.
o It presents the Semantic Web model in detail, together with its components such as RDF, OWL, ...
o It presents the problems of semantics and the approaches used in natural language processing to help computers understand users' questions.
o Building on this theoretical research, it proposes a model for building a semantic search tool. It also implements a tool that searches for electronic documents matching the meaning of the user's query.
o It can determine, with reasonable accuracy, the domain a document belongs to, and to some extent the domain of a user's query.
Practical significance:
Studying the model and mastering semantic search technology so that it can be applied to Vietnamese.

Scientific significance:
The tool serves the need to classify texts and study materials.

5.1.2. Weaknesses:

However, semantics is a complex and broad problem. The thesis therefore only outlines some current research directions in a limited number of domains, and it cannot cover all the concepts of human language.
The proposals in the thesis are meant as a reference approach, so many aspects may not yet be optimal and need further refinement.
In the application, the thesis uses a database of words characteristic of each domain. This database is built mainly from WordNet, but it still contains few terms specific to each field. If the user's query asks about words that are not in the database, no results may be found. Moreover, assigning words to domains is subjective and may not be optimal.
Classifying documents by domain works fairly well because documents contain many words. A user's query, however, contains very few words, so some queries return no results.
In addition, the thesis only uses a database of documents stored in advance on the server, so the number of documents is still small.

5.2. Future work
The application is built on basic foundations, but it can be developed further to become more complete and better optimized. Possible directions for the thesis are:
- Extend search to all domains.
- Search across multiple ontologies, and classify ontologies.
- Perform truly online search.
- Support Vietnamese.


APPENDIX
1. RDF syntax:

rdfs:Resource
All things described by RDF are called resources and are members of the class rdfs:Resource.
rdfs:Literal
The class rdfs:Literal represents the class of literal values such as strings and integers, for example property values that are text strings.
rdf:XMLLiteral
The class rdf:XMLLiteral represents the class of XML literal values.
rdfs:Class
This class corresponds to the generic concept of a type or category of resources. RDF class membership is used to represent the types and categories of resources. Two classes may have the same members.
rdf:Property
rdf:Property represents the resources that are RDF properties.
rdfs:Datatype
rdfs:Datatype represents the resources that are RDF datatypes.
rdf:type
The property rdf:type indicates which class a resource is a member of.
When a resource has an rdf:type property whose value is a specific class, we say that the resource is an instance of that class.
The value of an rdf:type property is always a resource that is an instance of rdfs:Class. The resource known as rdfs:Class is itself a resource whose rdf:type is rdfs:Class (it is itself the type of a class).
rdfs:subClassOf
The property rdfs:subClassOf represents a specialization relationship between classes of resources. The rdfs:subClassOf property is transitive.
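Because rdfs:subClassOf is transitive, an RDFS reasoner can derive new facts from it. These derivations correspond to entailment rules rdfs11 (transitivity) and rdfs9 (type propagation) in the W3C RDF Semantics recommendation. The sketch below shows both inferences over plain Python tuples. The ex: names are hypothetical, and a real application would use an RDF toolkit rather than this quadratic loop.

```python
def subclass_closure(pairs):
    """Transitive closure of rdfs:subClassOf, given as (subclass, superclass) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def infer_types(type_facts, subclass_pairs):
    """x rdf:type A and A rdfs:subClassOf B  =>  x rdf:type B."""
    closure = subclass_closure(subclass_pairs)
    inferred = set(type_facts)
    for resource, cls in type_facts:
        inferred |= {(resource, sup) for sub, sup in closure if sub == cls}
    return inferred


if __name__ == "__main__":
    sub = {("ex:Laptop", "ex:Computer"), ("ex:Computer", "ex:Machine")}
    print(sorted(subclass_closure(sub)))
    print(sorted(infer_types({("ex:myLaptop", "ex:Laptop")}, sub)))
```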
rdfs:subPropertyOf
The property rdfs:subPropertyOf is an instance of rdf:Property. It is used to state that one property is a specialization of another.
A hierarchy of subproperties can be used to express a hierarchy of range and domain constraints.
Note: the term super-property is sometimes used for the relationship between a property and other, more general properties, as in the rdfs:subPropertyOf relationship.
rdfs:range
Mt th hin cua rdf:Property duoc su dung d cho bit cc class no m gi tri cua
mt thuc tnh s l thnh vin cua n.
Gi tri cua mt thuc tnh rdfs:range lun lun l mt Class. Thuc tnh rdfs:range ban
thn n c th duoc su dung d biu din diu ny: The rdfs:range of rdfs:range is the
class rdfs:Class. Diu ny cho thy rng bt ky mt ti nguyn no l gi tri cua thuc
tnh range s l mt class.
Thuc tnh rdfs:range chi duoc p dung di voi cc thuc tnh. Diu ny cung duoc
miu ta trong RDF thng qua vic su dung thuc tnh rdfs:domain. The rdfs:Domain
of rdfs:range is the class rdf:Property. Diu ny cho thy rng thuc tnh range p
dung di voi cc ti nguyn m ban thn n cung l cc thuc tnh (property).
rdfs:domain
Mt th hin cua rdf:Property duoc su dung d cho bit class no s c thnh vin l
bt ky mt ti nguyn no sao cho thuc tnh cua n duoc chi dinh.

D ti: Tm kim ngu nghia ung dung trn linh vuc eDoc
0112274 Pham Thi M Phuong - 126 - 0112398 Tu Thi Ngoc Thanh
The rdfs:domain of rdfs:domain is the class rdf:Property. Diu ny cho thy rng
thuc tnh domain duoc su dung trn cc ti nguyn l cc thuc tnh.
The rdfs:range of rdfs:domain is the class rdfs:Class. Diu ny cho thy rng bt ky
mt ti nguyn no m l gi tri cua mt thuc tnh domain s l mt class.
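Domain and range declarations license new rdf:type facts. A minimal Python sketch of the two corresponding inference rules (called rdfs2 and rdfs3 in the RDF Semantics specification), assuming the same tuple representation and hypothetical ex: names:

```python
# Minimal sketch of RDFS domain/range inference (rules rdfs2 and rdfs3).
# Triples are (subject, predicate, object) tuples; the ex: names are hypothetical.
TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"

def infer_domain_range(triples):
    """Return the new rdf:type triples implied by rdfs:domain and rdfs:range declarations."""
    domains = {(prop, cls) for prop, q, cls in triples if q == DOMAIN}
    ranges = {(prop, cls) for prop, q, cls in triples if q == RANGE}
    inferred = set()
    for s, p, o in triples:
        for prop, cls in domains:
            if p == prop:
                inferred.add((s, TYPE, cls))  # the subject belongs to the domain class
        for prop, cls in ranges:
            if p == prop:
                inferred.add((o, TYPE, cls))  # the object belongs to the range class
    return inferred - set(triples)

data = [
    ("ex:author", DOMAIN, "ex:Document"),
    ("ex:author", RANGE, "ex:Person"),
    ("ex:thesis1", "ex:author", "ex:alice"),
]
for triple in sorted(infer_domain_range(data)):
    print(triple)
# ('ex:alice', 'rdf:type', 'ex:Person')
# ('ex:thesis1', 'rdf:type', 'ex:Document')
```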
rdfs:label
The property rdfs:label is used to provide a human-readable version of a resource's name.
rdfs:comment
The property rdfs:comment is used to provide a human-readable description of a resource.
A textual comment helps clarify the meaning of RDF classes and properties.

RDF Utility and Container classes and properties
RDF defines a number of additional classes and properties, including constructs for
representing containers and RDF statements, and for extending RDF vocabulary
descriptions on the World Wide Web.

RDF Container classes and properties
rdfs:Container
The class rdfs:Container is the superclass of the RDF container classes, i.e. rdf:Bag,
rdf:Seq and rdf:Alt.

rdf:Bag
The class rdf:Bag represents the RDF Bag container structure and is a subclass of
rdfs:Container.

rdf:Seq
The class rdf:Seq represents the RDF Sequence container structure and is a subclass of
rdfs:Container.

rdf:Alt
The class rdf:Alt represents the RDF Alt (alternative) container structure and is a
subclass of rdfs:Container.

rdfs:ContainerMembershipProperty
The instances of the class rdfs:ContainerMembershipProperty are the properties rdf:_1,
rdf:_2, rdf:_3, ..., which are used to state that a resource is a member of a Bag, Seq or
Alt container. rdfs:ContainerMembershipProperty is a subclass of rdf:Property. Each
container membership property is an rdfs:subPropertyOf the property rdfs:member.

rdfs:member
The property rdfs:member is the super-property of all container membership properties.
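As an illustration of container membership properties, the sketch below builds a hypothetical rdf:Bag with rdf:_1, rdf:_2, ... and then recovers its members, which is what an RDFS reasoner would also derive as rdfs:member values. Sorting by the numeric index only makes the result deterministic; order is meaningful for rdf:Seq, not for rdf:Bag.

```python
import re

# Minimal sketch: a container encoded with membership properties rdf:_1, rdf:_2, ...
# Every rdf:_n is an rdfs:subPropertyOf rdfs:member, so each item is also an rdfs:member.
MEMBERSHIP = re.compile(r"^rdf:_([1-9][0-9]*)$")

def make_container(node, kind, items):
    """Build triples for a container node of the given type (e.g. 'rdf:Bag')."""
    triples = [(node, "rdf:type", kind)]
    triples += [(node, f"rdf:_{i}", item) for i, item in enumerate(items, start=1)]
    return triples

def members(triples, node):
    """Return the rdfs:member values of node, ordered by their rdf:_n index."""
    found = []
    for s, p, o in triples:
        m = MEMBERSHIP.match(p)
        if s == node and m:
            found.append((int(m.group(1)), o))
    return [o for _, o in sorted(found)]

bag = make_container("_:b0", "rdf:Bag", ["ex:paper1", "ex:paper2", "ex:paper3"])
print(members(bag, "_:b0"))  # ['ex:paper1', 'ex:paper2', 'ex:paper3']
```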

rdf:List
The class rdf:List represents the class of RDF lists. It is used with the constructs
rdf:first, rdf:rest and rdf:nil, and it is supported in the RDF/XML syntax.

rdf:first
The property rdf:first represents the relationship between an rdf:List and its first item.

rdf:rest
The property rdf:rest represents the relationship between an rdf:List and the list of its
remaining items, or rdf:nil when there are no further items.

rdf:nil
The resource rdf:nil represents the empty rdf:List.
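The following minimal Python sketch shows how a list is spelled out with rdf:first, rdf:rest and rdf:nil, and how it is read back. The blank-node names _:l0, _:l1, ... are generated for illustration only.

```python
import itertools

# Minimal sketch: encode a Python list as an RDF collection and decode it again.
def encode_list(items):
    counter = itertools.count()
    triples, head = [], "rdf:nil"           # the empty list is simply rdf:nil
    for item in reversed(items):            # build from the tail towards the head
        node = f"_:l{next(counter)}"
        triples += [(node, "rdf:first", item), (node, "rdf:rest", head)]
        head = node
    return head, triples

def decode_list(head, triples):
    index = {(s, p): o for s, p, o in triples}
    items = []
    while head != "rdf:nil":                # follow rdf:rest until the empty list
        items.append(index[(head, "rdf:first")])
        head = index[(head, "rdf:rest")]
    return items

head, triples = encode_list(["ex:a", "ex:b", "ex:c"])
print(head, len(triples))          # _:l2 6
print(decode_list(head, triples))  # ['ex:a', 'ex:b', 'ex:c']
```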

RDF Utility classes and properties
rdfs:seeAlso
The property rdfs:seeAlso is used to indicate a resource that might provide additional RDF
information about the subject resource.

rdfs:isDefinedBy
The property rdfs:isDefinedBy is a sub-property of rdfs:seeAlso and indicates the resource
that defines the subject resource.

rdf:value
The property rdf:value identifies the principal value (usually a string) of a property
when the property value is a structured resource.

rdf:Statement
The class rdf:Statement represents statements about the properties of resources.
rdf:Statement is the domain of the properties rdf:predicate, rdf:subject and rdf:object.
Distinct rdf:Statement instances may have the same values for their predicate, subject
and object properties.

rdf:subject
The subject of an RDF statement.
The property rdf:subject states that a resource is the subject of an RDF statement.
The rdfs:domain of rdf:subject is rdf:Statement and the rdfs:range is rdfs:Resource.
This property is used to identify the resource described by an RDF statement.

rdf:predicate
The predicate of an RDF statement.
The rdfs:domain of rdf:predicate is rdf:Statement and the rdfs:range is rdfs:Resource.
This property is used to identify the predicate used in an RDF statement.

rdf:object
The object of an RDF statement.
The rdfs:domain of rdf:object is rdf:Statement. No rdfs:range is defined for this
property, because the values of rdf:object may be either literals or resources. This
property is used to identify the object of an RDF statement.
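A short sketch of how rdf:Statement and its three properties describe a triple (reification), assuming tuples for triples, a hypothetical statement node _:st1 and a hypothetical ex:source property. Note that the four reification triples describe the original statement without asserting it.

```python
# Minimal sketch of RDF reification: describing one triple with an rdf:Statement resource.
def reify(node, triple):
    s, p, o = triple
    return [
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    ]

def unreify(node, triples):
    """Recover the (subject, predicate, object) tuple described by a statement node."""
    props = {p: o for s, p, o in triples if s == node}
    return props["rdf:subject"], props["rdf:predicate"], props["rdf:object"]

original = ("ex:abraham.jpg", "ex:technique", "Oil on canvas")
st = reify("_:st1", original)
# Extra facts can now be attached to the statement itself, e.g. where it came from.
st.append(("_:st1", "ex:source", "http://www.artchive.com/"))
print(unreify("_:st1", st) == original)  # True
```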


2. RDF Gateway:

Intellidimension, a company based in Windsor, Vermont (USA), created a commercial RDF
platform called RDF Gateway. The strengths of this tool are its ease of use and
portability. RDF Gateway runs only on Microsoft Windows; at present there is no plan to
release a version for Linux or any other operating system.
RDF Gateway appeared at the same time as the company Intellidimension was founded, in
June 2000. A beta version was released in 2001. The developers proposed and discussed
the system's features on the W3C public discussion forum. The commercial version 1.0 was
finally released on 3 March 2003.

Because it is commercial software, it requires a license. However, it remains free of
charge for learning and development purposes.
2.1. RDF Gateway architecture:

RDF Gateway is a lightweight, fast server that combines the capabilities of a database
management system and a web server. It is designed as a platform for collecting,
querying, transforming and distributing RDF data.

Figure 24: RDF Gateway architecture

o RDFQL Script Processor
The RDFQL Script Processor is a preemptive virtual machine that can compile, store and
execute RDFQL scripts. RDFQL is a server-side scripting language based on ECMAScript
(JavaScript). RDFQL integrates SQL-like query extensions that give easy access to the
deductive database engine of RDF Gateway. The RDFQL script processor allows pages that
combine script with static content, similar to Microsoft Active Server Pages (ASP). The
server is exposed to RDFQL through a library of built-in objects (Server, Session,
Request, Response, ...).
o Database Engine
RDF Gateway has a deductive database engine designed from the ground up to support RDF.
It evaluates queries with a bottom-up strategy and federates them across all defined data
sources. The logical inference capability of the engine supports the declarative rule
syntax of RDFQL. The database engine does not depend on any external data management
system.
o Data Service Interface
The data service interface allows external data sources to be integrated with RDF
Gateway. A data service provider is a module that implements this interface and exposes
the contents of a specific type of data source as RDF data. RDFQL allows queries to be
federated across multiple data services. Because the interface is open, any existing data
service provider can be used, or a custom provider can be developed for a particular data
source.
o Authentication/Security
RDF Gateway has a security model based on rights and permissions that controls access to
the server and to database resources. RDF Gateway supports its own users and roles as
well as NT users and groups. An NT user is always authenticated with the NT credentials
of the account. Support for NT users and groups allows security to be administered
externally.
o Network IO
The network interface supports both HTTP and TCP/IP-based protocols. The network IO
layer supports secure network authentication schemes such as NT Challenge/Response
(NTLM). A client connects to the server through an interface.
o Package Management
RDF Gateway can run applications that have been developed and deployed as packages. A
package consists of RDF server pages, HTML pages, images or any other type of file.
o Component Management
RDFQL supports COM in its server-side script. This allows the functionality of RDF
Gateway to be extended, or RDF Gateway to be integrated with other applications.
o Session Management
The session manager allows user state to be preserved on the server.

2.2. Features

o Representing RDF triples in data tables:
The RDBMS paradigm of storing data in tables is adopted for storing RDF triples. The data
model of a table is a triple consisting of predicate, subject and object. The table's
columns have no names, but they always hold the three components of the triple in this
order. Note that the predicate is the first component. An optional fourth column stores
metadata about the triple; this metadata is called the triple's context. The context field
can store a resource identifier, which can be used to address security concerns, to
identify the source of the triple, or for any other familiar purpose.
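The storage layout just described can be approximated in an ordinary relational database. The sketch below uses SQLite from the Python standard library, not RDF Gateway's own proprietary database files. SQL requires column names, so the names used here are assumptions; only the column order (predicate, subject, object, optional context) comes from the text, and the second row is hypothetical.

```python
import sqlite3

# Minimal sketch of a triple table with an optional context column, built in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE museum (
        predicate TEXT NOT NULL,   -- stored first, as described for RDF Gateway
        subject   TEXT NOT NULL,
        object    TEXT NOT NULL,
        context   TEXT             -- optional fourth column: metadata about the triple
    )""")

rows = [
    ("http://www.icom.com/schema.rdf#technique",
     "http://www.artchive.com/rembrandt/abraham.jpg",
     "Oil on canvas",
     "http://www.artchive.com/"),           # context records where the triple came from
    ("urn:example:format",                  # hypothetical predicate
     "http://www.artchive.com/rembrandt/abraham.jpg",
     "JPEG image",
     None),                                 # a triple stored without any context
]
conn.executemany("INSERT INTO museum VALUES (?, ?, ?, ?)", rows)

# Select only the triples whose context identifies one particular source.
from_artchive = conn.execute(
    "SELECT predicate, subject, object FROM museum WHERE context = ?",
    ("http://www.artchive.com/",),
).fetchall()
print(from_artchive)
```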
o Other data sources
External data sources and live databases are accessed from the server through data
source objects that wrap them. A data source object (datasource object) has a table-like
structure that holds triples in its rows. In-memory tables are supported, and wrappers can
be created for external data.
o Databases
Tables are stored in partitions called databases. A server can contain several different
databases, and each table is created within a database. The database format is a
proprietary file format, and each database is stored in a single file.
o RDFQL script language:
A scripting language based on ECMAScript, commonly known as JavaScript. The following
constructs are provided:
- Functions
- Variables and arrays
- Loop and if statements
- Exception handling
- Importing other script files
- Comments
- Statements of RDF Gateway.
The statements of RDF Gateway cover every aspect of the server and give the programmer
access to all of its features. One example is the server configuration tool: it is a web
page written in RDFQL, interpreted by the integrated web server, and it gives access to
all server objects such as tables, databases, users and packages.
To explore datasets of RDF triples, an RDF node object is provided; it gathers all the
predicates and values of a given node and makes it possible to change those values.
To run queries on the server, a set of database statements must be available. Database
statements are embedded in RDFQL script, much like SQL statements embedded in C source
files and processed by a precompiler.
Access to ActiveX and COM objects is supported through the ActiveXObject constructor of
the language.
If an RDFQL script is evaluated in the context of the web server, objects holding session,
request and response data are provided.
o Adding and retrieving data
The data manipulation commands are similar to SQL command syntax, with functionality
extended for the specific needs of RDF. There are commands such as INSERT, SELECT and
DELETE. These commands use variables to bind data, similar to the RQL language used by
RDFSuite.
INSERT {
[http://www.artchive.com/]
[http://www.icom.com/schema.rdf#technique]
[http://www.artchive.com/rembrandt/abraham.jpg]
'Oil on canvas'
} INTO museum;

This example shows how to insert a triple into the table museum. The triple is written
between two curly braces ({ and }) and contains four values:
- Context
- Predicate
- Subject
- Object or literal
The meaning of this triple is: the picture abraham.jpg was made with the technique
"Oil on canvas", and this information was obtained from www.artchive.com.

SELECT ?a, ?b, ?c USING museum
WHERE {?a ?b ?c} AND ?c LIKE "Oil";

The SELECT statement is used to query triples from a table. This example retrieves all
triples whose literal object value contains the word "oil". Note that the triple between
the two curly braces contains only three values; the context is omitted.
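A minimal Python sketch of what the SELECT above does conceptually: each pattern term beginning with ? is a variable that is bound to the matching value, and a filter is then applied to the bindings. The LIKE filter is modelled here as a case-insensitive substring test, which is an assumption; the exact LIKE semantics of RDFQL may differ. The data is hypothetical, and the tuples follow RDF Gateway's (predicate, subject, object) order with the context omitted.

```python
# Minimal sketch of triple-pattern matching with variables plus a LIKE-style filter.
def match(pattern, triple, bindings):
    """Unify one pattern (terms starting with '?' are variables) with a triple."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None      # variable already bound to a different value
        elif term != value:
            return None          # constant term does not match
    return result

def select(table, pattern, keep=lambda b: True):
    rows = (match(pattern, t, {}) for t in table)
    return [b for b in rows if b is not None and keep(b)]

museum = [  # (predicate, subject, object), hypothetical sample data
    ("ex:technique", "ex:abraham.jpg", "Oil on canvas"),
    ("ex:format", "ex:abraham.jpg", "JPEG image"),
]
hits = select(museum, ("?a", "?b", "?c"), keep=lambda b: "oil" in b["?c"].lower())
print(hits)  # [{'?a': 'ex:technique', '?b': 'ex:abraham.jpg', '?c': 'Oil on canvas'}]
```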
Data can be retrieved from external data sources or transferred from one table to
another.
var doc = new DataSource(
"inet?url=file://c:/Museum.xml&parsetype=rdf");
SELECT ?a, ?b, ?c USING #doc WHERE {?a ?b ?c};
INSERT {?p ?s ?o} INTO museum USING #doc
WHERE {?p ?s ?o};
In this example, RDF data is read from a text file and inserted into the table museum.
Note that in RDFQL, JavaScript code is mixed with SQL-like code: the JavaScript variable
doc is used in the database statements as #doc.
o Built-in Webserver
RDF Gateway has a built-in web server. The configuration and management interface is
published as web pages. Application developers can create web pages with this web server
using the RDFQL script language. This feature can be used for debugging and development,
but it can also be used to build complete web applications on RDF Gateway. Because
ActiveX objects can be used through RDFQL, the web server is considered very powerful.
o RDF Query Analyzer
RDFQL statements and queries can be created using this visual tool (RDF Query Analyzer).

Figure 25: The RDF Query Analyzer interface.

The Query Analyzer is similar to the query analyzer tools of popular SQL servers. Complex
scripts can be created here and then used in web pages or other applications. Queries
can be evaluated against a local or remote RDF Gateway; the text editor highlights syntax
and can save and open queries.
o Inference Engine
The RDF Gateway database engine includes an inference engine. New RDF triple statements
can be generated automatically from inference rules and existing triples. Functions can
be defined that extract data from the database based on rules. These rules are defined in
the RDFQL script language and can be used in database manipulation statements.
RULEBASE schema
{
    INFER {[rdf:type] ?s ?class} FROM
        {[rdf:type] ?s ?subclass} AND
        {[rdfs:subClassOf] ?subclass ?class};
};

SELECT ?p ?s ?o USING #ds RULEBASE schema WHERE
    {[rdf:type] ?s ?o} AND {?p ?s ?o};
This example defines a rule for RDF Schema subclasses: if a subject is of type X and X is
defined as a subclass of Y, then the subject is also of type Y. The rule is then used in
the SELECT statement to query all classes, including the inherited classes, of all
subjects.
RDF Schema is not supported natively by RDF Gateway; it has to be described through
inference rules.
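The rule above can be evaluated bottom-up, in line with the database engine description, by applying it repeatedly until no new triple appears. A minimal Python sketch under that assumption, with hypothetical ex: data and tuples in (predicate, subject, object) order; it is not RDF Gateway's implementation.

```python
# Minimal sketch of bottom-up (forward-chaining) evaluation of the rule:
#   INFER {[rdf:type] ?s ?class} FROM {[rdf:type] ?s ?subclass}
#         AND {[rdfs:subClassOf] ?subclass ?class}
def apply_subclass_rule(triples):
    facts = set(triples)
    while True:
        new = {
            ("rdf:type", s, cls)
            for (p1, s, sub) in facts if p1 == "rdf:type"
            for (p2, sub2, cls) in facts if p2 == "rdfs:subClassOf" and sub2 == sub
        } - facts
        if not new:            # fixpoint: applying the rule adds nothing new
            return facts
        facts |= new

data = {
    ("rdf:type", "ex:myOak", "ex:Oak"),
    ("rdfs:subClassOf", "ex:Oak", "ex:Tree"),
    ("rdfs:subClassOf", "ex:Tree", "ex:Plant"),
}
types = sorted(o for p, s, o in apply_subclass_rule(data) if p == "rdf:type" and s == "ex:myOak")
print(types)  # ['ex:Oak', 'ex:Plant', 'ex:Tree']
```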
o Client Libraries
RDF Gateway provides client drivers for Microsoft ADO and Sun Microsystems JDBC. This
allows RDF Gateway to support a wide range of clients, such as web browsers, Windows
applications, Java applications, and XML- or RDF-based clients.
o Security
When accessing RDF Gateway through HTTP, ADO or other protocols, users must identify
themselves with a username and password. An anonymous user account is provided for
public access (anyone can log in with this account).
The security system uses two sources: the Windows security database, to authenticate
Windows users, and an internal user database. As with Internet Explorer, NT
Authentication (authentication at the NT level) can be used over HTTP.
Every item managed by RDF Gateway can be restricted to defined users; these items
include packages, tables, data sources and components. At the table level, read, write
and delete rights can easily be changed for individual users.
A row-based security concept for the RDF statements in tables is based on the context
column, the fourth field added to subject, predicate and object. A user can be granted
read, write and delete rights for an individual context.
There is no support for user groups (RDF Gateway has no such concept).
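Context-based (row-level) permissions can be pictured as a lookup keyed by user and context. The sketch below is purely illustrative: the user names, contexts and rights table are hypothetical, and it does not show RDF Gateway's real API.

```python
# Minimal sketch of per-context rights; users, contexts and rights are hypothetical.
RIGHTS = {
    ("alice", "http://www.artchive.com/"): {"read", "write"},
    ("bob", "http://www.artchive.com/"): {"read"},
}

def visible_rows(user, rows, right="read"):
    """Return only the (predicate, subject, object, context) rows the user may access."""
    return [r for r in rows if right in RIGHTS.get((user, r[3]), set())]

rows = [
    ("ex:technique", "ex:abraham.jpg", "Oil on canvas", "http://www.artchive.com/"),
    ("ex:note", "ex:abraham.jpg", "internal remark", "urn:example:private"),
]
print(len(visible_rows("bob", rows)))                 # 1
print(len(visible_rows("bob", rows, right="write")))  # 0
print(len(visible_rows("carol", rows)))               # 0 (no rights at all)
```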
o Configuration and Management
Detailed configuration settings are accessed through a web interface built into the
integrated web server. Users must log in with an account that has the Windows
administrator role. This web application is called RDF Gateway Management Utility and
provides access to databases, tables, users, contexts, ActiveX components, data services,
roles, packages, MIME types and timers. For most of these components, security options
and permissions can be set.
The management utility is implemented as an RDF Gateway web package.

The above is an overview of RDF Gateway. The RDF Gateway syntax is also described in
considerable detail in the help section of the RDF Query Analyzer utility.


3. Semantic label systems:
The semantic label system presented here consists of 3 subsystems corresponding to 3
parts of speech: nouns, verbs and adjectives. Each subsystem is further divided into 2
levels: a basic level containing a small number of the most general and most frequently
used labels, written as easy-to-remember abbreviations (mnemonics), and an advanced level
consisting of labels that follow the LLOCE system. This section also lists some other
semantic label systems, such as WordNet and CoreLex.
3.1. Basic semantic labels for nouns:

No. | Label | Name | Meaning
1 | ABS | Abstraction | Abstract things
2 | ACT | Act | Action
3 | AGT | Agent | Agent
4 | ANM | Animal | Animal
5 | ART | Artifact | Man-made object
6 | ATR | Attribute | Attribute
7 | BDY | Body | Human body
8 | CEL | Cell | Cell
9 | CHM | Chemical | Chemical
10 | COM | Communication | Communication
11 | CON | Consequence | Consequence
12 | ENT | Entity | Entity
13 | EVT | Event | Event
14 | FEL | Feel | Feeling, sensation
15 | FEM | Female | Female (animal or woman)
16 | FOD | Food | Food
17 | FRM | Shape, form | Shape
18 | GAS | Gas | Gas
19 | GRB | Group biology | Biological group
20 | GRP | Group | Group in general
21 | GRS | Group social | Social group
22 | HOU | House | Building, construction
23 | HUM | Human | Human
24 | LFR | Life form | Life
25 | LIN | Line | Line, stroke, trace
26 | LIQ | Liquid | Liquid
27 | LME | Linear measure | Measurement
28 | LOC | Location | Position
29 | LOG | Location geography | Geographical region
30 | MAL | Male | Male (animal or man)
31 | MEA | Measure | Quantity
32 | MIC | Microorganism | Microorganism
33 | MOT | Motion | Movement
34 | NAT | Natural object | Natural object
35 | PHM | Phenomenon | Phenomenon
36 | PHO | Physical object | Physical object
37 | PLT | Plant | Plant
38 | POS | Possession | Possession
39 | PRO | Process | Process
40 | PRT | Part, piece | Part
41 | PSY | Psychological | Psychological attribute
42 | QUD | Definite quantity | Finite quantity
43 | QUI | Indefinite quantity | Unbounded quantity
44 | REL | Relation | Relation
45 | SOL | Solid | Solid
46 | SPC | Space | Space
47 | STA | State | State
48 | SUB | Substance | Material, substance
49 | TME | Time | Time
50 | UNT | Unit | Unit
Table 13: Basic semantic labels for nouns

3.2. Basic semantic labels for verbs:

No. | Label | Name | Meaning
1 | VBDY | Body | Verbs of the body: eat, wear, ...
2 | VCHG | Change | Verbs of change: increase, change, ...
3 | VCOG | Cognition | Cognitive verbs: think, judge, ...
4 | VCOM | Communication | Verbs of communication: tell, ask, order, ...
5 | VCMP | Competition | Verbs of competition: fight, compete, ...
6 | VCSM | Consumption | Verbs of consumption: eat, drink, ...
7 | VCON | Contact | Verbs of contact: hit, block, ...
8 | VCRE | Creation | Verbs of creation: compose, sew, perform, ...
9 | VEMO | Emotion | Verbs of emotion: love, hate, ...
10 | VMOT | Motion | Verbs of motion: walk, fly, swim, ...
11 | VPER | Perception | Verbs of perception: hear, see, feel, ...
12 | VPOS | Possession | Verbs of possession: buy, sell, own, ...
13 | VSOC | Social | Verbs of social activity: vote, hold office, ...
14 | VSTA | Stative | Verbs of state and spatial relations
15 | VWEA | Weather | Verbs of weather: rain, snow, thunder, ...
Table 14: Basic semantic labels for verbs

3.3. Basic semantic labels for adjectives:

No. | Label | Name | Meaning
1 | ACOL | Color | Adjectives of color: red, green, ...
2 | ASIZ | Size | Adjectives of size: round, flat, ...
3 | ATME | Time | Adjectives of time: long-lasting, quick, ...
4 | ASPC | Space | Adjectives of space: big, small, long, ...
5 | ASTR | Strength | Adjectives of strength: strong, weak, ...
6 | ADEG | Degree | Adjectives of degree: much, little, ...
7 | AFEA | Feature | Adjectives of characteristics or content: difficult, interesting, ...
8 | AREF | Reference | Adjectives with a referential meaning: former (president)
9 | AREL | Relation | Relational adjectives: Vietnamese (war)
Table 15: Basic semantic labels for adjectives
3.4. The LDOCE semantic label system

No. | Basic semantic code | Derived semantic code
1 | A Animal | E Solid/liquid (S + L)
2 | B Female animal | K Male human/animal (D + M)
3 | C Concrete | O Human/animal (A + H)
4 | D Male animal | R Female human/animal (B + F)
5 | F Female human | U Collective human/animal (Col. + O)
6 | G Gas | V Plant/animal (P + A)
7 | H Human | W Abstract/concrete (T + I)
8 | I Inanimate concrete | X Abstract/human (T + H)
9 | J Movable solid | Y Abstract/animate (T + Q)
10 | L Liquid | 1 Human/solid (H + S)
11 | M Male human | 2 Abstract/solid (T + S)
12 | N Non-movable solid | 6 Liquid/abstract (L + T)
13 | P Plant | 7 Gas/liquid (G + L)
14 | Q Animate |
15 | S Solid |
16 | T Abstract |
17 | Z Unmarked |
18 | 4 Abstract physical |
19 | 5 Organic material |
Table 16: The LDOCE semantic label system
4. The WordNet lexical-semantic knowledge base
4.1. Semantic label system for nouns:

First, we examine the limitations of the way ordinary dictionaries store semantic
information about nouns; from there we can see the advantages of WordNet in storing,
retrieving and updating this information.
4.1.1. Organization of nouns in an ordinary dictionary:
When we look up a noun in an ordinary dictionary, we get an explanation that seems quite
complete. For example, looking up the word "tree", we get the definition "tree is a plant
that is large, woody, perennial and has a distinct trunk". For people with general
knowledge, this definition is acceptable. But if we want to know more, for example that a
tree has roots, has cells containing cellulose and is a living organism, we must look up
the meaning of the word "plant"; however, when we look up "plant", we get two completely
different explanations: one for the sense "factory" and one for the sense "flora". The
question is: when looking the word up automatically, which sense should the computer
choose? This is a limitation of ordinary dictionaries.
Ordinary dictionaries mainly lack structural information, and their definitions carry only
factual information. Because they are organized alphabetically, they cannot include in
each entry all of the information related to its definition; doing so would duplicate
information and make the dictionary enormous and uneconomical.
Finally, the biggest drawback shared by almost all ordinary dictionaries is circular
definition: word $W_a$ is used to define word $W_b$, and then $W_b$ is in turn used to
define $W_a$.
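Circular definitions can be detected mechanically if a dictionary is modelled as a graph in which each headword points to the words used in its definition. The sketch below is a standard depth-first cycle search under that modelling assumption; the entries are hypothetical.

```python
# Minimal sketch: find a circular definition with a depth-first search.
def find_cycle(defined_by):
    """Return one cycle of words (first word repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {w: WHITE for w in defined_by}
    stack = []

    def visit(word):
        color[word] = GREY
        stack.append(word)
        for used in defined_by.get(word, []):
            if color.get(used, WHITE) == GREY:   # back edge: a cycle has been found
                return stack[stack.index(used):] + [used]
            if color.get(used, WHITE) == WHITE:
                found = visit(used)
                if found:
                    return found
        stack.pop()
        color[word] = BLACK
        return None

    for w in defined_by:
        if color[w] == WHITE:
            found = visit(w)
            if found:
                return found
    return None

dictionary = {"W_a": ["W_b"], "W_b": ["W_a"], "tree": ["plant"], "plant": []}
print(find_cycle(dictionary))  # ['W_a', 'W_b', 'W_a']
```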
4.1.2. Organization of noun data in WordNet
Having recognized the drawbacks of ordinary dictionaries, WordNet stores nouns as a
hierarchical tree based on the hyponymy and hypernymy relations. Starting from a root
that is a very general concept, the tree branches, following the hypernymy relation,
into more specific child concepts; each of these child concepts is in turn subdivided into
more detailed concepts, and so on until no further division is needed (on average around
ten levels), and the terminal nodes (leaf nodes) are the nouns.
For example, an oak is a kind of tree, a tree is a kind of plant, and a plant is a kind of
organism. In WordNet this is expressed as: oak @ tree @ plant @ organism, where the
symbol @ points to the parent node and expresses the hyponymy relation, also called the
ISA relation. The inverse of hyponymy is hypernymy, which WordNet denotes with the
symbol ~ pointing to a child node, for example: organism ~ plant ~ tree ~ oak. (Because
WordNet is stored electronically, it only needs to store the hyponymy relation
explicitly; the hypernymy relation is derived automatically from it.)
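The idea of storing only one direction of the relation and deriving the other can be sketched in a few lines of Python. This is an illustration of the principle, not WordNet's actual file format.

```python
from collections import defaultdict

# Minimal sketch: store only the "@" (is-a, child -> parent) pointers and derive
# the inverse "~" (parent -> children) pointers automatically.
parent = {"oak": "tree", "tree": "plant", "plant": "organism"}

def chain_up(word):
    """Follow @ pointers from a concept to the root: oak @ tree @ plant @ organism."""
    path = [word]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def derive_children(parent_map):
    """Invert the stored pointers to obtain the ~ relation."""
    children = defaultdict(list)
    for child, par in parent_map.items():
        children[par].append(child)
    return children

print(" @ ".join(chain_up("oak")))       # oak @ tree @ plant @ organism
print(derive_children(parent)["plant"])  # ['tree']
```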
With this hierarchical organization, WordNet does not need to store every property of each
concept (node); it stores only the distinctive features of that concept, while its other
properties are inferred automatically from the general features inherited from the parent
concept together with the concept's own features. This helps WordNet overcome the
drawbacks of ordinary dictionaries (no duplicated information, yet complete information
and savings in storage space).
Moreover, with this inheritance-based hierarchy, WordNet avoids circular definitions: it
never happens that word $W_a$ defines word $W_b$ and then $W_b$ in turn defines $W_a$.
Because of the tree organization, each kind of relation has only one fixed direction: the
hypernymy relation runs only from top to bottom, from the whole to the details
(specialization), while the hyponymy relation runs the opposite way, from bottom to top,
from the details to the whole (generalization).
However, not all real-world information is stored in the concepts of WordNet, so in
practice WordNet cannot hold complete real-world knowledge about trees the way a person
does. For example, WordNet does not store information such as: trees give shade, dry wood
can be used as firewood, .... WordNet currently does not link "doctor" with "hospital",
nor can it link "racket", "ball" and "net" with "tennis court".
4.1.3. Primitive concepts (primitive semantic components)
In WordNet, the lineage of the word oak is: {oak} @ {tree} @ {plant, flora} @
{organism, living thing} @ {thing, entity}. Thus {thing, entity} is a root concept, the
highest and most general concept; for this reason it carries almost no meaning (because
it is something extremely general), and every concept in WordNet leads up to this root
(they are all its descendants). However, organizing the noun concept tree under a single
root would make the tree very large, and the labels of the concepts would have to be more
detailed to avoid collisions. For example, to distinguish "plant" in the sense of flora
from "plant" in the sense of factory, WordNet would have to use 2 different labels (word
forms); moreover, grouping everything under one large root does not allow any useful
information to be inherited (because root concepts are very general and carry little
information).
For this reason WordNet is divided into 25 main roots, as described in the table below.
These roots are called primitive concepts, and each such tree is stored in a separate
file. As a result, when the label "plant" (flora) is encountered as above, the computer
does not confuse it with "plant" meaning factory, because the concept tree containing
"tree" is the tree whose primitive concept is {plant} (flora), which is stored separately
from the concept tree containing "plant" in the sense of factory (that concept is stored
in another tree, whose primitive concept is {artifact}).
Looking at these 25 primitive concepts, we see that some of them share semantic features
(for example, {animal}, {person} and {plant} are all living things); therefore, in
WordNet, concepts that share such features are grouped together as children of a higher
concept. After this grouping, WordNet needs only 11 primitive concepts (the concepts in
italics in the table below).
Entity (tangible entity)
    Organism (living thing)
        Animal
        Person
        Plant
    Object (non-living thing)
        Artifact (man-made object)
        Natural object
            Body
        Substance
            Food
Abstraction
    Attribute
    Quantity
    Relation
    Time
Psychological feature
    Cognition
    Feeling
    Motivation
Natural phenomenon
    Process
Activity
Event
Group (group of people)
Location
Possession
Shape
State
Table 17: Classification of nouns in WordNet
The concepts in the table above are called primitive concepts (primitive semantic
components). From these primitive concepts, WordNet builds the classification trees for
nouns according to the hyponymy and hypernymy relations.
With this arrangement, in practical use of WordNet the authors found that the WordNet tree
is quite shallow (about 10 to 12 levels), and that nearly half of the concepts along these
paths have a mainly technical meaning.
4.1.4. Distinctive features of each concept in the hierarchy:
In the organization of WordNet, child concepts that inherit from the same parent concept
must have some distinctive features that distinguish them from the parent and from their
sibling concepts. These distinguishing features are of 3 kinds; for example, the concept
{robin} has the following 3 kinds of features:
- Attributes (expressed by adjectives): [color = red, size = small]
- Parts (expressed by nouns): [beak, feathers, wings]
- Functions (expressed by verbs): [sing, fly]
Similarly, the concept {canary} is also a child of the concept {bird}; it has the
attributes [color = yellow, size = small], the parts [beak, feathers, wings] and the
abilities [sing, fly, lay eggs]. So although {robin} and {canary} are both birds, they
differ in color. Thus the information of a concept is the information inherited from its
parent plus its own distinctive features. We can therefore say that synset {A} is a child
of synset {B} if all the features of synset {B} are present in synset {A}. Consequently, a
word belonging to a child synset can serve as the antecedent of a word belonging to the
parent synset, or can replace the object of a verb provided that this object belongs to
the parent synset. For example:
- In the sentence "I gave him a good novel, but the book made him sad", "novel" is a
child concept of "book", so it can serve as the antecedent of "book".
- In the sentence "I drink water", the object "water" of the verb "drink" can be replaced
by any object belonging to one of its child concepts, such as soft drinks, tea, mineral
water, ...
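The inheritance of features and the "child has all features of the parent" test can be sketched as follows, using the robin/canary example. The feature strings are illustrative, shared parts are placed on {bird} to show inheritance, and WordNet does not store features in this form.

```python
# Minimal sketch of feature inheritance between synsets (illustrative data only).
OWN = {
    "bird":   {"parts": {"beak", "feathers", "wings"}, "functions": {"fly"}},
    "robin":  {"attributes": {"color=red", "size=small"}, "functions": {"sing"}},
    "canary": {"attributes": {"color=yellow", "size=small"}, "functions": {"sing", "lay eggs"}},
}
PARENT = {"robin": "bird", "canary": "bird"}

def features(synset):
    """Own features plus everything inherited along the parent chain."""
    merged = {}
    while synset is not None:
        for kind, values in OWN.get(synset, {}).items():
            merged.setdefault(kind, set()).update(values)
        synset = PARENT.get(synset)
    return merged

def can_be_child(a, b):
    """A may be a child of B if every feature of B is also a feature of A."""
    fa, fb = features(a), features(b)
    return all(values <= fa.get(kind, set()) for kind, values in fb.items())

print(sorted(features("canary")["functions"]))                 # ['fly', 'lay eggs', 'sing']
print(can_be_child("robin", "bird"), can_be_child("bird", "robin"))  # True False
```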
4.2. H thng nhn ng nghia cua dng t:

The verb is the most important part of speech, and every English sentence must contain one. The properties of the verb determine the structure of the sentence (A.S. Hornby), and the verb also determines the roles in the sentence (Fillmore). English has only about one third as many verbs as nouns, but verbs are more polysemous: on average a verb has 2.11 senses, while a noun has 1.74. The meaning of a verb is highly flexible and shifts with the nouns it occurs with. As mentioned above, WordNet divides verbs into 15 groups according to their semantic domain. Each group denotes a different kind of event, action or state. Examples include verbs of bodily functions and care, verbs of perception and verbs of social relations.
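The 15 groups correspond to WordNet's 15 verb lexicographer files. The sketch below simply lists their names as given in the WordNet lexnames documentation. It maps the three example groups from the text onto those files and computes the polysemy ratio implied by the figures above. The constant names are illustrative only.

```python
# Names of WordNet's verb lexicographer files (one file per semantic group).
VERB_LEXFILES = (
    "verb.body", "verb.change", "verb.cognition", "verb.communication",
    "verb.competition", "verb.consumption", "verb.contact", "verb.creation",
    "verb.emotion", "verb.motion", "verb.perception", "verb.possession",
    "verb.social", "verb.stative", "verb.weather",
)

# The three example groups named in the text, mapped to their files.
EXAMPLE_GROUPS = {
    "bodily functions and care": "verb.body",
    "perception": "verb.perception",
    "social relations": "verb.social",
}

# Average senses per verb divided by average senses per noun (figures from the text).
POLYSEMY_RATIO = 2.11 / 1.74

print(len(VERB_LEXFILES))        # 15
print(round(POLYSEMY_RATIO, 2))  # 1.21
```

The ratio shows that a verb has roughly 21% more senses than a noun on average, even though there are far fewer verbs.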
Building synonym sets (synsets) for verbs is also harder than for nouns, because verb synonyms are hard to identify. English does have some synonymous verb pairs, such as begin/commence, end/terminate, buy/purchase and hide/conceal, but these synonyms are not always interchangeable. For example, people usually say "Where have you hidden Dad's slippers?" rather than "Where have you concealed Dad's slippers?".
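The following minimal sketch shows why synset membership alone does not license substitution. Its names (`SYNSETS`, `PAST_PARTICIPLE`, `substitute`) are hypothetical, and its data uses only the verb pairs from the text. A purely set-based swap happily produces the unidiomatic sentence, because nothing in a synset records which member sounds natural in a given context.

```python
# Hypothetical verb synsets built from the pairs named in the text.
SYNSETS = [
    {"begin", "commence"},
    {"end", "terminate"},
    {"buy", "purchase"},
    {"hide", "conceal"},
]

# Past participles needed for the example sentence (hand-written, not derived).
PAST_PARTICIPLE = {"hide": "hidden", "conceal": "concealed"}


def synonyms(lemma: str) -> set:
    """Other members of every synset that contains `lemma`."""
    result = set()
    for synset in SYNSETS:
        if lemma in synset:
            result |= synset - {lemma}
    return result


def substitute(sentence: str, lemma: str) -> list:
    """Naively swap the past participle of `lemma` for each synonym's.
    The output is grammatical, but idiomaticity is not checked at all."""
    old = PAST_PARTICIPLE[lemma]
    return [sentence.replace(old, PAST_PARTICIPLE[s])
            for s in sorted(synonyms(lemma)) if s in PAST_PARTICIPLE]


print(substitute("Where have you hidden Dad's slippers?", "hide"))
# ["Where have you concealed Dad's slippers?"]
```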
Representing the meaning of verbs and organizing them is harder than for any other part of speech. There are many approaches to representing verb meaning, and most of them decompose that meaning in one form or another. Some approaches to decomposing verb meaning are described below.

4.2.1. Semantic decomposition of verbs:
Most approaches to verb semantics try to decompose the meaning of a verb into a finite set of universal semantic conceptual components. These are also called primitive, atomic or basic concepts, micro-primitives, or noun markers. For example, the verb kill = {CAUSE TO BECOME NOT ALIVE}. This approach has drawn differing opinions. Some endorse it (Katz, Lakoff, Jackendoff, Schank, Miller), while others reject it as inappropriate (Chomsky and others).
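Under the decomposition approach, a verb meaning is a nested term built from primitives. The sketch below uses a hypothetical `Prim` class to encode kill = {CAUSE TO BECOME NOT ALIVE} as CAUSE(BECOME(NOT(ALIVE))). The extra term `die` = BECOME(NOT(ALIVE)) is added only for illustration. It shows how a shared sub-term lets one verb's meaning be found inside another's.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Prim:
    """Hypothetical semantic primitive, optionally applied to a sub-term."""
    name: str
    arg: Optional["Prim"] = None

    def __str__(self) -> str:
        return self.name if self.arg is None else f"{self.name}({self.arg})"

    def primitives(self) -> List[str]:
        """Primitive names from the outermost to the innermost."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.arg
        return names


def contains(term: Prim, sub: Prim) -> bool:
    """True if `sub` equals `term` or occurs as one of its sub-terms."""
    node: Optional[Prim] = term
    while node is not None:
        if node == sub:
            return True
        node = node.arg
    return False


ALIVE = Prim("ALIVE")
die = Prim("BECOME", Prim("NOT", ALIVE))  # illustrative: die = BECOME NOT ALIVE
kill = Prim("CAUSE", die)                 # kill = CAUSE TO BECOME NOT ALIVE

print(kill)                 # CAUSE(BECOME(NOT(ALIVE)))
print(contains(kill, die))  # True: the meaning of "die" lies inside that of "kill"
```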
Relational semantic analysis of verbs differs from semantic decomposition. Decomposition relies mainly on basic concepts (the smallest units of meaning), whereas relational analysis relies on elementary concepts already formed in the human mind. For example, the CAUSE relation links the verb pairs teach/learn and show/see. This relation also helps us distinguish transitive verbs from intransitive verbs systematically.
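A relational analysis stores links between whole verbs instead of primitives. The minimal sketch below uses a hypothetical `CAUSES` table that holds only the two pairs from the text. It also makes a simplifying assumption: the object of the causative verb becomes the subject of the caused verb, so "Mary teaches Tom" brings about "Tom learns". This is one way the transitive/intransitive contrast can show up.

```python
from typing import List, Optional, Tuple

# Hypothetical CAUSE links: causative verb -> caused verb (pairs from the text).
CAUSES = {"teach": "learn", "show": "see"}


def caused_clause(verb: str, obj: str) -> Optional[Tuple[str, str]]:
    """For 'X <verb> <obj>', return the (subject, verb) of the clause it brings
    about: the object of the causative becomes the subject of the effect."""
    effect = CAUSES.get(verb)
    return None if effect is None else (obj, effect)


def causatives_of(verb: str) -> List[str]:
    """Verbs whose CAUSE link points at `verb`."""
    return [cause for cause, effect in CAUSES.items() if effect == verb]


print(caused_clause("teach", "Tom"))  # ('Tom', 'learn'), as in "Mary teaches Tom"
print(causatives_of("see"))           # ['show']
```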
4.2.2. The entailment relation of verbs:
In WordNet, each part of speech is organized around one main relation: nouns around hyponymy, adjectives around antonymy and verbs around entailment.
Entailment is somewhat similar to the part-whole relation (meronymy). However, the reading "V1 is a part of V2" does not fit verbs as well as it fits nouns. For example, consider whether thinking is a part of planning. Many people argue that verbs cannot be divided into parts the way nouns can. Nouns and their parts have concrete, distinct referents, whereas verbs do not. In addition, the relation between two verbs depends on when the actions or events take place, whereas for nouns the part-whole relation does not depend on time. An action or event counts as a part of another action or event only when it is a part, or a stage, of the process by which that other action is carried out.
In summary, the cases above lead to the following conclusion: V1 and V2 are in a part-whole relation if V1 entails V2 and the time during which V1 takes place is included in, or includes, the time during which V2 takes place.
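The temporal condition in this conclusion can be written as a small predicate. In the sketch below, `Interval`, `part_whole` and the numeric times are hypothetical. Whether V1 entails V2 is supplied as input rather than computed.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical time span of an action or event."""
    start: float
    end: float

    def includes(self, other: "Interval") -> bool:
        return self.start <= other.start and other.end <= self.end


def part_whole(v1_entails_v2: bool, t1: Interval, t2: Interval) -> bool:
    """Rule from the text: V1 entails V2 and one time span contains the other."""
    return v1_entails_v2 and (t1.includes(t2) or t2.includes(t1))


sleep = Interval(0, 8)  # assumed times, in hours
snore = Interval(2, 3)
print(part_whole(True, snore, sleep))            # True
print(part_whole(True, snore, Interval(9, 10)))  # False: neither span contains the other
```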
4.2.3. The manner relation (troponymy) of verbs:
In WordNet, hyponymy plays the main role in organizing nouns. For example, canary is a kind of (a hyponym of) bird. For verbs, however, it sounds wrong to say that limp is a kind of walk. The reason is that the semantic difference between two verbs is not the same as the distinguishing features that separate two nouns in a hyponymy relation.
When researchers examined hyponymy among verbs, they found it is not as simple as for nouns, because it involves fine-grained semantic distinctions across different semantic fields. For example, an analysis of the motion verbs slide and pull showed that they are different combinations of the semantic component MOVE with the semantic component MANNER. For this reason, WordNet uses a new relation, called troponymy, to express that V1 is V2 done in a particular manner. For example, limp is a troponym of walk, because limping is a particular manner of walking. "Particular manner" covers more than the way an action is performed. It can also be the intention, motive or setting in which an action is performed, an event occurs or a state comes about.
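Troponymy can be sketched by giving each verb sense an optional base verb and manner component. The `VerbSense` class and the manner strings below are hypothetical glosses chosen for illustration. WordNet itself records troponymy as links between verb synsets, not as explicit MOVE/MANNER components.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VerbSense:
    lemma: str
    base: Optional[str] = None    # more general verb this sense elaborates
    manner: Optional[str] = None  # the particular manner (hypothetical gloss)


def is_troponym(v1: VerbSense, v2: VerbSense) -> bool:
    """V1 is a troponym of V2 if V1 means 'V2 done in a particular manner'."""
    return v1.base == v2.lemma and v1.manner is not None


move = VerbSense("move")
walk = VerbSense("walk")
slide = VerbSense("slide", base="move", manner="smoothly over a surface")
pull = VerbSense("pull", base="move", manner="by drawing toward the agent")
limp = VerbSense("limp", base="walk", manner="unevenly, favoring one leg")

print(is_troponym(limp, walk))      # True
print(is_troponym(walk, limp))      # False
print(slide.manner != pull.manner)  # True: same MOVE, different MANNER
```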
In every troponymy relation between a verb V1 and a more general verb V2, V1 also entails V2. For example, anyone who is limping is necessarily also walking at the same time.
Troponymy is therefore a special case of entailment: an entailment in which the two actions take place over the same time span. By contrast, buy/pay and snore/sleep are related only by entailment, not by troponymy, because the two actions in each pair do not take place over the same time span.
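The difference between troponymy and plain entailment then comes down to comparing time spans. The minimal sketch below uses hypothetical names (`Interval`, `classify`) and made-up times. The entailment itself is supplied as input.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical time span of an action."""
    start: float
    end: float


def classify(v1_entails_v2: bool, t1: Interval, t2: Interval) -> str:
    """Troponymy = an entailment whose two actions occupy the same time span."""
    if not v1_entails_v2:
        return "no entailment"
    if t1 == t2:
        return "troponymy"        # e.g. limp / walk
    return "entailment only"      # e.g. buy / pay, snore / sleep


print(classify(True, Interval(0, 5), Interval(0, 5)))  # troponymy (limp, walk)
print(classify(True, Interval(0, 5), Interval(3, 4)))  # entailment only (buy, pay)
```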
