
A Toolkit for Creating Semantic Web Contents Using Information Extraction Techniques
Từ Minh Phương, Trịnh Hữu Kiên
Abstract. The Semantic Web is an extension of the current Web in which information is given formal and explicit meaning. The Semantic Web enables computer programs to understand information contents and thus facilitates more efficient discovery, automation, integration and sharing of data. To create Semantic Web contents one needs appropriate tools. In this paper, we describe such a toolkit we have constructed. The most important feature of the toolkit is that it makes use of information extraction techniques for automatically annotating Webpage contents. Experiments with a real life application show promising results and demonstrate the usefulness of the toolkit.

I. INTRODUCTION

With billions of Web pages distributed across almost every country, the World Wide Web (WWW) is a good environment for representing and accessing digital information. However, this enormous volume of information also makes searching for and sharing information on the WWW very difficult. At present, information on the WWW is represented mainly in natural language (Web pages written in HTML). This representation suits people but causes many difficulties for programs that support searching, sharing and exchanging information: computer programs do not understand information and data represented in a form intended for humans. To solve this problem, many research and business organizations have cooperated to research and develop the Semantic Web. According to the definition of Tim Berners-Lee, director of the World Wide Web Consortium (http://www.w3c.org) and inventor of the WWW, the Semantic Web is an extension of the current WWW in which descriptions of the meaning (semantics) of information are added in a form that computer programs can understand, thereby enabling more efficient information processing [1]. The Semantic Web thus consists of information (Web pages) represented in the traditional way together with an explicit representation of the semantics of that information. The added semantics give programs (agents) additional knowledge, which helps improve the quality of classification, search and information exchange.

Building the Semantic Web requires supporting tools. In this paper, we describe a toolkit that we have built for this purpose, together with the technical solutions chosen and used. The key part of the toolkit is automatic information extraction, which shortens the time needed to create the semantic part of a Web page. To illustrate the use of the toolkit and to test it, the paper also presents an information search application that works on Web pages whose semantics were created by the toolkit.

II. COMPONENTS OF THE SEMANTIC WEB

To make the description of the toolkit's functions easier, this section briefly presents the components of the Semantic Web. They are divided into three main groups:
- Ontologies and the languages used to represent the semantics of information.
- Tools that create the semantic content as well as the infrastructure of the Semantic Web.

- Applications that use the Semantic Web.
The function of each group is described below.

1. Languages for the Semantic Web

The longest-known and longest-used mechanism for sharing and exchanging the semantics of information is the ontology. An ontology is an explicit description of the concepts in an application domain together with the relations among those concepts. Ontologies provide a common vocabulary for exchanging information between Web applications and services. The semantic part of the Semantic Web itself comprises ontologies and the concrete values of the concepts defined in them. Representing ontologies and data requires suitable languages. During the formation of the Semantic Web many such languages have been proposed and developed, the best known being RDF and RDFS [2] and DAML+OIL [8,9].

RDF and RDF Schema. RDF (Resource Description Framework) is a mechanism for describing data about data (metadata). RDF treats objects on the Web (Web pages, passages of text, people, other objects, etc.) as resources. Each resource is described by object-attribute-value triples. For example, the statement "Phương is the author of the paper at some Web page" is described by the triple: http://www, author, Phương. RDF Schema (RDFS) is a simple extension built on the RDF mechanism. RDFS makes it possible to describe application-specific attributes and to define classes of objects that share those attributes. Defining object classes with attributes and relations is essential for building ontologies.

DAML+OIL. RDF and RDF Schema can express semantics only at a simple level. Representing semantics that involve many objects with complex logical relations requires more powerful means of representation. DAML (DARPA Agent Markup Language) and OIL (Ontology Inference Layer) are such means. DAML+OIL is an extension of RDFS. In DAML+OIL, semantics are described through description logic, which allows Boolean logic to be used when describing relations between objects and offers more primitive relation types than RDFS.

2. Tools for the Semantic Web

Creating and using the Semantic Web requires the support of the following kinds of tools.

Ontology creation and linking tools. These tools allow the creation of concepts, attributes of concepts, and relations and hierarchies among concepts. Tools of this kind usually have a graphical interface and follow Web application standards. A typical example is Protégé [11].

Annotation tools. Annotation tools create the semantic content, i.e. the concrete values of concepts, attributes and relations, from ordinary data in accordance with a given ontology. The values created can be represented in the languages mentioned above. At present most tools support only manual annotation, so annotation usually takes a great deal of time [6].

Repositories. Once created, ontologies and semantic content must be stored in a repository. These repositories are essentially databases that store descriptions written in RDFS or DAML+OIL and translate queries in those languages into SQL queries. A typical repository is Sesame [7].

Inference services. Inference services find the concrete values of concepts or attributes corresponding to an ontology held in the repository. An example of such an inference system is Ontobroker [5].

3. Applications

The Semantic Web makes it possible to enhance the functionality, intelligence and automation of many existing applications. Particularly promising application areas are Web services, knowledge management and e-commerce [3].

Web services are programs and devices that can be accessed through the WWW infrastructure. The Semantic Web provides the information and knowledge needed to find, interact with, share and combine Web services.

Knowledge management concerns the collection, storage, search, access and delivery of information and knowledge within organizations, with the aim of exploiting the organization's own intellectual assets. This requires functions more complete than those of ordinary document or data management systems, such as intelligent search, automatic information extraction from text, database linking and automatic document summarization. These functions can be realized on the infrastructure that the Semantic Web provides.

The strong growth of e-commerce has led to an enormous number of online transactions. To automate these transactions, supporting software needs to be able to convert between the document formats used in electronic transactions and to support ontologies that describe goods and services, so that agents can search for, classify and negotiate over goods.

III. OVERVIEW OF THE TOOLKIT

The goal of the toolkit is to support the whole process of creating, storing and querying the semantic part of Web pages. This process requires the support of many separate tools. Although many of these tools already exist, we believe that combining them in a single system (with some modifications) is necessary to support the creation and querying of the Semantic Web completely and consistently. Besides the existing tools, the system has several components that we built ourselves. The most important is the module that annotates Web pages automatically using information extraction from text; details are given in Section IV. To illustrate the operation of the semantics-creation tools, several modules for searching information based on the newly created semantics were also added to the system.

The components of the whole system are shown in Figure 1. In Figure 1, rectangles are functional blocks and ellipses represent information or data produced by those blocks. Rectangles with a bold outline and grey background are components we built ourselves; shapes with a light outline are existing components integrated into the system. The existing components are the ontology editor Protégé [11], the RDF description repository Sesame [7], the RDF and RDFS loaders, and part of the indexer of a traditional keyword-based HTML search engine.

[Figure 1. The toolkit for creating the Semantic Web and the accompanying application. Labels in the figure: Ontology editor, Ontology, RDF Schema loader, Web page creation tool, Web page, Create semantics with RDF, Information Extractor, RDF annotation generator, Annotations, RDF loader, Sesame repository, Traditional indexing, HTML index, RDF & HTML search engine, User interface, Search.]

The semantics are created by the modules inside the dashed rectangle in the upper left corner of the figure; this is also the main part of the system. The semantics produced are used by an intelligent information search application whose search engine and interface are shown in the lower part of the figure. To keep search compatible with the traditional Web (without semantics), the system also includes a keyword-based HTML indexing module (on the right of Figure 1).

The system works as follows. First, the user creates an ontology for a specific application domain with the ontology editor. The ontology is then converted into an RDFS description and stored in the Sesame repository. Once the ontology exists, the next step is to annotate Web pages, i.e. to add semantics to a page by filling in values for the concepts and attributes of the ontology with information taken from the page. Annotation is usually done by hand. With a large number of Web pages this step takes a great deal of time and is prone to errors such as missing or inaccurate annotations. Our toolkit addresses this problem with a module that extracts information from Web pages and creates annotations automatically.

To annotate a Web page, the page is passed to the automatic information extraction module. Guided by the structure of the ontology, this module extracts from the page the concrete values of the concepts and attributes contained in the ontology. The extracted information is passed to the annotation generator, which creates RDF triples describing it and sends the resulting description to the Sesame repository. In parallel, the Web page is also indexed by keyword in the traditional way.

Finally, the semantics are used in the search engine. The search engine queries the repository in the RQL language and combines this with a semantics-based inference mechanism to return intelligent search results. Queries may be given in natural language; in that case the semantic part of the query is extracted with a technique similar to the information extraction used for annotation.
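To make the role of the annotation generator more concrete, the following is a minimal sketch of how extracted values might be turned into RDF triples serialized one statement per line in N-Triples style. It is written in Python for brevity (the system itself is written in Java) and is not the authors' code: the function name `to_ntriples`, the namespace `http://example.org/skills#`, the page URL and the property names are all assumptions made for illustration, and loading the result into Sesame is not shown.

```python
def to_ntriples(page_url, values, ns="http://example.org/skills#"):
    """Serialize (property, value) pairs about one page as N-Triples lines.

    page_url is the subject of every triple; ns is a hypothetical
    namespace standing in for the ontology's real one.
    """
    lines = []
    for prop, value in values:
        # Escape backslashes and double quotes inside the literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{page_url}> <{ns}{prop}> "{escaped}" .')
    return lines


# Hypothetical page and extracted values.
PAGE = "http://example.org/cv/1.html"
VALUES = [
    ("experience", "four years"),
    ("operatingSystem", "Windows"),
    ("operatingSystem", "Unix"),
]

triples = to_ntriples(PAGE, VALUES)
for line in triples:
    print(line)
```

Each line is one subject-property-value statement, which mirrors the triple model described in Section II; a multi-valued attribute such as the operating system simply yields several triples with the same subject and property.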

IV. INFORMATION EXTRACTION FROM TEXT AND AUTOMATIC ANNOTATION

The task of the information extraction block is to detect information and data that correspond to concepts in the ontology, extract them, and pass them to the annotation generator. For example, consider the following passage taken from a Web page that posts job advertisements. (For ease of presentation the example was written in Vietnamese and is rendered in English here. In practice the extraction method was proposed for English; studying its application to Vietnamese is outside the scope of this paper and may be presented in other work.)

Programmers wanted for an e-commerce project. Candidates need at least four years of experience and must be able to work with the Windows and Unix operating systems. Candidates must be fluent in the programming languages Java and Javascript, and in particular must have no less than three years of Java programming experience. Preference is given to candidates skilled in working with Oracle databases.

Suppose the ontology has the concepts, attributes and relations shown in Figure 2. The extraction process must then produce the following result:

occupation: programmer
programmer: experience: four years
skill:
    operating system: Windows, Unix
    language: Javascript
              Java: experience: three years

Many information extraction techniques are discussed in the literature [4,10,12]. Because the text to be annotated is weakly structured (written in natural language), while the extracted information must have the structure prescribed by the ontology, we chose the technique described in [4], which satisfies these two requirements best. We also made some modifications so that the extraction process better fits the requirements of the toolkit [13].

[Figure 2. An example ontology (incomplete). Node labels: Occupation, Programmer, Skill, Experience, OS, Language, Experience; edge labels: rdfs:subClassOf, rdfs:domain.]

The extraction process consists of the following steps.

Step 1: Recognize constants and keywords. A constant is a concrete value of a concept or attribute in the ontology. A keyword is a word or phrase that determines which concept or attribute a constant belongs to. In the example above, Java is a constant, while "programming language" is a keyword indicating that the constant belongs to the language attribute of the skill concept. Constants and keywords are identified by rules. The rules are patterns written as regular expressions (as in Perl), extended with lexicons. For example, the pattern that recognizes a period of experience is given as follows:

Programmer: Experience case insensitive
    constant { extract S, [a-zA-Z\s]*\s+years };
    lexicon { S case insensitive, filename number.dat };
    keyword { \bexperience\b }
end;

This pattern states that the experience attribute of a programmer is recognized by an expression that begins with an S and ends with "years"; S is a lexicon stored in the file number.dat (this lexicon lists the strings one, two, three, etc.); the accompanying keyword is \bexperience\b. The patterns for recognizing constants and keywords are stored in the ontology together with the descriptions of concepts and attributes; in other words, we extend an ordinary ontology to hold this additional information. When extraction starts, all patterns are applied in turn to find the constants and keywords present in the text. The recognized constants and keywords are stored in a table as described in Step 2.

Step 2: Build the Name|Value|Position table. The constants and keywords recognized in the previous step are stored in a table. Each row contains the name of the concept or attribute corresponding to the constant or keyword found, the value found, and its start and end positions in the text. Keywords are distinguished from constants by the prefix KEYWORD. For example, the passage above gives the following table (only part of it is shown; positions are character offsets in the original Vietnamese passage):

programmer:experience|four years|80|86
language:experience|four years|80|86
KEYWORD programmer:experience|experience|88|98
KEYWORD skill:operating system|operating system|126|137
skill:operating system|Windows|139|145
skill:operating system|Unix|151|154
KEYWORD skill:language|programming language|196|213
skill:language|Java|212|215
skill:language|Javascript|218|227
skill:language|Java|270|273
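Steps 1 and 2 can be sketched as follows. This is only an illustration under stated assumptions, written in Python for brevity rather than in the Java of the actual system: the rules are plain `(name, kind, regex)` tuples instead of the pattern syntax shown above, `NUMBER_LEXICON` is a made-up stand-in for the contents of number.dat, the rule set is an invented subset, and the text is the English rendering of the job advertisement, so the offsets it prints will not match the table in the paper.

```python
import re

# English rendering of the job advertisement used as the running example.
TEXT = (
    "Programmers wanted for an e-commerce project. Candidates need at least "
    "four years of experience and must be able to work with the Windows and "
    "Unix operating systems. Candidates must be fluent in the programming "
    "languages Java and Javascript, and in particular must have no less than "
    "three years of Java programming experience."
)

# Hypothetical stand-in for the number.dat lexicon.
NUMBER_LEXICON = ["one", "two", "three", "four", "five"]
NUMBER = "(?:" + "|".join(NUMBER_LEXICON) + ")"

# Illustrative rules: (concept:attribute name, kind, regular expression).
RULES = [
    ("programmer:experience", "constant", rf"\b{NUMBER}\s+years\b"),
    ("programmer:experience", "keyword", r"\bexperience\b"),
    ("skill:language", "constant", r"\b(?:Javascript|Java)\b"),
    ("skill:language", "keyword", r"\bprogramming languages?\b"),
    ("skill:os", "constant", r"\b(?:Windows|Unix)\b"),
    ("skill:os", "keyword", r"\boperating systems?\b"),
]


def build_table(text, rules):
    """Apply every rule in turn and return Name|Value|Start|End rows.

    Keyword rows get the KEYWORD prefix; End is inclusive, as in Step 2.
    """
    rows = []
    for name, kind, pattern in rules:
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            label = f"KEYWORD {name}" if kind == "keyword" else name
            rows.append((label, m.group(0), m.start(), m.end() - 1))
    # Sort by position so the table reads in document order.
    return sorted(rows, key=lambda r: (r[2], r[0]))


table = build_table(TEXT, RULES)
for label, value, start, end in table:
    print(f"{label}|{value}|{start}|{end}")
```

As in the paper's table, the same constant can appear more than once (Java is found twice here); resolving such duplicates is left to the next step.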

Step 3: Create the information corresponding to the ontology from the table. In this step, the Name|Value|Position table is used to produce values for the concepts and attributes of the ontology. In essence, this step resolves conflicts and ambiguities in the table by applying a number of heuristic rules. For example, in the table above "four years" was recognized in Step 1 both as general programming experience and as experience with a programming language, because it matches the patterns of both attributes. Likewise, Java was recognized twice, although only one value may be placed in the repository. We use the following heuristics:
- If a concept or attribute may have only one value but the table contains several, keep only the value nearest to the corresponding keyword. For example, the table contains two constants for the attribute programmer:experience, three years and four years. This heuristic discards three years because it lies far from the keyword of programmer:experience.
- If several identical constants coincide, keep only the one corresponding to the nearest keyword. For example, the table contains two constants four years; only the one for programmer:experience is kept because it lies nearer to that keyword.
- If several constants or keywords are nested, keep only the longer one. For example, the keyword "programming experience" contains the keyword "experience" but is longer, so only "programming experience" is kept for that position.
- If a concept can have only one value, choose the first constant that appears in the table.
- One-to-many relations are usually expressed by constants whose positions in the text are nested.
In the rules above, the distance used for comparison is computed from the positions at which constants and keywords appear in the text. After these heuristics have been applied, the remaining constants are passed to the annotation generator to be converted into RDF.

V. IMPLEMENTATION AND EXPERIMENTS

1. System implementation

The system is implemented as a Web application; every interface uses Web forms and is displayed in a browser. This choice allows the semantics to be built and stored centrally on a server or directly on a local machine. We used the following languages and tools to implement the system.

The programming language is Java. Java was chosen for its many advantages: it is well suited to Web application programming, specifically through its support for Servlet/JSP; it is a fully object-oriented language; it is independent of hardware and operating system; and it can connect to databases through JDBC. In addition, the Java standard library (from version 1.4) supports regular expressions, which are needed for information extraction.

The system has two databases, one for the Sesame repository and one holding the system's management information. Both are built on the MySQL database management system, a free DBMS with advantages such as speed and low resource requirements.

The Web server software is Tomcat 4.1 (http://jakarta.apache.org/tomcat), free software that supports Servlet/JSP. Web page indexing and keyword search are built on the Jakarta Lucene search engine (http://jakarta.apache.org/lucene), an open-source search engine written in Java that supports many extended keyword search features.

2. Demonstration application and experiments

For demonstration and testing, the system was used to annotate Web pages containing personal information about programmers and their skills; the search component then allows information about these people to be found by semantics or by keyword. First, an ontology of the programming occupation and the related skills and experience was created. This ontology needs to be created only once for all Web pages. With the ontology in place, the user enters the Web page to be annotated through the system's interface. A new Web page can be created and annotated at once, or an existing page can be annotated by uploading it. The interface for entering the Web page to be annotated is shown in Figure 3.
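Before the annotation results, it may help to see how two of the Step 3 heuristics from Section IV (keep the longer of nested matches; for a single-valued attribute keep the constant nearest to its keyword) could behave on table rows. The sketch below is an assumption-laden illustration in Python, not the authors' Java implementation: the function names are invented, distance is taken as the character gap between a constant and the closest keyword of the same name, and the offsets given for "three years" and "programming experience" are made up, since the paper shows only part of the table.

```python
# Rows use the Name|Value|Start|End layout of Step 2 (End inclusive).
# The last three rows use invented offsets, for illustration only.
ROWS = [
    ("programmer:experience", "four years", 80, 86),
    ("KEYWORD programmer:experience", "experience", 88, 98),
    ("programmer:experience", "three years", 300, 310),
    ("KEYWORD language:experience", "programming experience", 270, 291),
    ("KEYWORD programmer:experience", "experience", 282, 291),
]


def is_keyword(row):
    return row[0].startswith("KEYWORD ")


def drop_nested(rows):
    """Drop a match that lies inside a strictly longer match of the same kind."""
    kept = []
    for r in rows:
        contained = any(
            is_keyword(o) == is_keyword(r)
            and o[2] <= r[2] and r[3] <= o[3]
            and (o[3] - o[2]) > (r[3] - r[2])
            for o in rows
        )
        if not contained:
            kept.append(r)
    return kept


def gap(row, keyword_rows):
    """Smallest character gap between a constant and any of its keywords."""
    _, _, start, end = row
    return min(
        max(k_start - end, start - k_end, 0)
        for _, _, k_start, k_end in keyword_rows
    )


def resolve_single_valued(rows, name):
    """Keep the constant nearest to a keyword of the same attribute."""
    constants = [r for r in rows if r[0] == name]
    keywords = [r for r in rows if r[0] == f"KEYWORD {name}"]
    if not constants:
        return None
    if not keywords:
        return constants[0]  # fall back to the first constant in the table
    return min(constants, key=lambda r: gap(r, keywords))


kept = drop_nested(ROWS)
best = resolve_single_valued(kept, "programmer:experience")
print(best)
```

With these rows, the short keyword nested inside "programming experience" is discarded first, after which "four years" wins over "three years" because it sits right next to the remaining keyword. The other heuristics, and the order in which all of them are applied, are not specified in enough detail here to sketch without guessing.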

Figure 3. Entering the Web page to be annotated

After the user has specified the Web page for which semantics are to be created and pressed the Create button, the information extractor and annotation generator produce the annotations. The user can view the annotations and edit them as desired. Figure 4 illustrates the annotations about a programmer's skills created for an example Web page.

Figure 4. Skill annotations extracted from a Web page

Once the Web pages have been annotated, the user can search for information by keyword and/or by semantics, as in the example in Figure 5.

Figure 5. Search results combining keywords and semantics

To test the system, we used 30 personal information pages of programmers working at the FPT software export centre (Fsoft), together with some pages taken from the Internet. These pages were annotated automatically by the toolkit and then annotated by hand, and the results were compared. The results were evaluated with two measures: recall (the proportion of the information present in the text that was extracted) and precision (the proportion of the extracted information that was correct). The experiments showed that the quality of the ontology has the greatest influence on the quality of annotation. After the ontology had been tuned, recall and precision on the 30 personal Web pages were 88% and 95% respectively. These values are relatively high and are consistent with the characteristics of the chosen extraction method. The automatic annotations can subsequently be corrected by hand to obtain the best result.

Because the paper is mainly about the toolkit, the experiments above serve only to illustrate how the system works; the number of test samples is therefore neither large nor diverse. More complete experimental results on the text extraction algorithm are presented in [4] and related papers.

VI. CONCLUSION

This paper has presented the design and construction of a toolkit that supports the creation of the Semantic Web, together with a demonstration application. The results show that using information extraction from text considerably reduces the time needed to annotate information on Web pages, which is the most time-consuming part of creating the Semantic Web. The automatic annotation component, based on the information extraction algorithm, achieves fairly high accuracy. Our experience in building the toolkit also shows that adapting and integrating existing tools reduces development time, adds features to the toolkit, and is more convenient for users than using each tool separately. However, the toolkit still lacks some other automatic functions, such as automatic generation of ontologies from text.

Automatic annotation of Web pages plays an important role in producing semantics for existing Web pages, and annotation quality depends heavily on the extraction process. To our knowledge, no study has yet addressed information extraction from Vietnamese text. The extraction method presented above was proposed only for English and has not been studied for Vietnamese text. However, the extraction component relies on the distance between words, and both English and Vietnamese grammar impose a strict word order within sentences, so the algorithm presented above could be used for Vietnamese text with modest modifications. This hypothesis needs further study and may be the subject of future work.

Acknowledgement: This research was carried out with financial support from the Council for Natural Sciences.

REFERENCES
[1] T. Berners-Lee, J. Hendler, O. Lassila, The Semantic Web, Scientific American, May 2001.
[2] D. Brickley, R.V. Guha, Resource Description Framework (RDF) Schema Specification, World Wide Web Consortium, Proposed Recommendation, 2001.
[3] Y. Ding, D. Fensel, M. Klein, B. Omelayenko, The semantic Web: yet another hip?, Data & Knowledge Engineering 41, Elsevier, 2002, pp. 205-227.
[4] D.W. Embley, D.M. Campbell, R.D. Smith, S.W. Liddle, Ontology-Based Extraction and Structuring of Information from Data-Rich Unstructured Documents, Proc. of 1998 ACM Inter. Conf. on Inform. and Knowledge Man., CIKM 1998, USA, pp. 52-59.
[5] D. Fensel, S. Decker, M. Erdmann, H.-P. Schnurr, R. Studer, A. Witt, Lessons learned from applying AI to the Web, Journal of Cooperative Information Systems 9 (4), 2000.
[6] S. Handschuh, S. Staab, CREAM: Creating metadata for the semantic Web, Computer Networks, vol. 42, Elsevier, 2003, pp. 557-571.
[7] http://sesame.aidministrator.nl/.
[8] http://www.ontoknowledge.org/oil.
[9] http://www.daml.org.
[10] N. Kushmerick, Wrapper induction: efficiency and expressiveness, Artificial Intelligence, vol. 118, 2000.
[11] N. F. Noy, M. Sintek, S. Decker, M. Crubézy, R. W. Fergerson, M. A. Musen, Creating semantic Web contents with Protégé-2000, IEEE Intelligent Systems, 3-4/2001, pp. 60-71.
[12] S. Soderland, Learning information extraction rules for semi-structured and free text, Machine Learning, 34, Kluwer Academic Publishers, 1999.
[13] Tu Minh Phuong, Information Extraction and Evaluation of Candidates with Fuzzy Set techniques, Proc. of Inter. Conf. on Fuzzy Syst. and Knowl. Discovery, FSKD 2002, Singapore, 2002, pp. 481-485.

APPENDIX

Example of part of the ontology created for the example in Section V.2. The ontology is shown as it appears in the Protégé interface.

A. Entities in the Skills ontology

B. Class definitions in the Skills ontology

Received 26/4/2004

ABOUT THE AUTHORS

TỪ MINH PHƯƠNG
Born in 1971 in Hanoi. Graduated from Tashkent Polytechnic University in 1993 and defended his doctorate at the Academy of Sciences of Uzbekistan, Tashkent, in 1995. Currently a lecturer at the Faculty of Information Technology 1, Posts and Telecommunications Institute of Technology. Research interests: artificial intelligence, agent systems, fuzzy logic, bioinformatics. Email: phuongtm@fpt.com.vn

TRỊNH HỮU KIÊN
Born in 1982. Graduated from the Posts and Telecommunications Institute of Technology in 2003. Currently working at FPT Software (Fsoft). Research interests: development of applications that use artificial intelligence. Email: trinhhuukien@yahoo.com
