
MINISTRY OF EDUCATION AND TRAINING
Hanoi University of Agriculture

Bioinformatics Lecture Notes

MSc. Phan Trong Nhat, Department of Biotechnology

CHAPTER 1: INTRODUCTION TO THE INTERNET AND THE EMERGENCE OF BIOINFORMATICS

1.1. Introduction to the Internet

Concept: The Internet is a global computer network that links organizations, centers, research institutes, schools, and so on. For computers to work together effectively, they must share a common means of communication, collectively called TCP/IP.

How do computers communicate with each other? TCP/IP (Transmission Control Protocol/Internet Protocol) is the data transmission protocol/Internet protocol. It lets computers on the network exchange data in a uniform way, much like an international language that everyone uses in order to understand one another. Each computer on the Internet is assigned a unique name, its IP address.

Example: the IP address 203.162.8.82. A host can also be referred to by a domain name, as in http://www.hau1.edu.vn, which is resolved to an IP address.
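The difference between a numeric IP address and a domain name can be checked programmatically. The following is a minimal sketch using Python's standard library; the function name `describe_host` is illustrative and not part of the lecture.

```python
import ipaddress
from urllib.parse import urlparse


def describe_host(text):
    """Classify a string as an IPv4 address or as a URL carrying a host name."""
    try:
        # IPv4Address raises AddressValueError for anything that is not
        # a dotted-quad address such as 203.162.8.82.
        address = ipaddress.IPv4Address(text)
        return ("ip", str(address))
    except ipaddress.AddressValueError:
        # Otherwise treat the text as a URL and pull out its host name.
        return ("hostname", urlparse(text).hostname)


print(describe_host("203.162.8.82"))
print(describe_host("http://www.hau1.edu.vn"))
```

Turning the host name into an address requires a DNS lookup, which is left out here so that the sketch runs without a network connection.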

1.1.1. History of the Internet

1969: ARPANET came into being, funded by ARPA (Advanced Research Projects Agency), the research and development project agency of the US Department of Defense. It started with 4 network nodes located at US universities and research institutes:
- University of California, Los Angeles (UCLA)
- University of California, Santa Barbara (UCSB)
- University of Utah
- Stanford Research Institute (SRI)

This was the first wide area network (WAN: Wide Area Network) ever built, and it marks the birth of today's Internet.

The Xerox Corporation research center in Palo Alto developed the Ethernet connection standard. In the 1980s, TCP/IP over Ethernet became the common protocol on local networks.

1983: the US Department of Defense split ARPANET into two subnetworks:
- MILNET: for military activities.
- The new ARPANET: for non-military activities, universities, and research institutes.

1986: the National Science Foundation (NSF) established the NSFNET network. Many businesses moved from ARPANET to NSFNET.

1990: ARPANET ceased operation after nearly 20 years.

1991: the WWW (World Wide Web) appeared, laying the foundation for delivering multimedia information through hyperlinks, which made the Internet very convenient to use. The W3C (World Wide Web Consortium) was formed to work on common standards for the Web.

1995: NSFNET was scaled back to a research network, while the Internet continued to grow.

Late 1992: the first commercial information provider, Delphi, appeared.
June 1993: about 130 websites.
1994: about 3,000 websites.
Today: several hundred million websites.

No individual or organization has full control over the Internet; each administrator manages only the part of the network that belongs to their own organization. So that the Internet develops in a consistent direction, the Internet Society and the W3C are responsible for developing the common communication protocols of the Internet and for maintaining web standards.

Number of hosts:
- 1981: about 200
- 1985: about 2,000
- Today: more than 9,000,000

The Internet has become the largest network in the world, a network of networks, and is present in every field: politics, the military, commerce, research, education, culture, society, and so on.

1.1.2. The formation of the Internet in Vietnam

In 1993 the VARENET network (Vietnam Academic Research Education Network) was established, paving the way for the Internet in Vietnam. VARENET grew out of a scientific research cooperation program for deploying network technology at the Institute of Information Technology, part of the Vietnam Academy of Science and Technology, in scientific cooperation with the Australian National University (ANU). The VARENET server was located at ANU.

In 1993 VARENET had a single function: providing electronic mail (e-mail) for foreign representative offices and for joint-venture or wholly foreign-owned companies, because the service was new and expensive in Vietnam.

On 19 November 1997, when the Government of Vietnam decided to connect officially to the Internet, the Australian side handed the .vn domain over to the Vietnam General Department of Posts. The subsequent emergence of a series of Internet service providers diminished the role of VARENET.

After VARENET, the second wide area network in Vietnam was VINANET (Vietnam Network), run by the Trade Information Center of the Ministry of Trade. VINANET provided domestic and international market price information, business addresses, legal documents, and so on. Access speed in this period was 2.4 kbps over telephone lines.

In 1997 a series of Internet service providers (ISP) and Internet content providers (ICP) appeared, such as VNN, FPT, Saigonnet, Netnam, and CINET.
- VNN (Vietnam Network): the computer network of the Vietnam Datacommunication Company (VDC), part of the Vietnam Posts and Telecommunications Corporation; formed in 1997.
- FPT (Company for Financing and Promoting Technology): established in 1997.
- Saigonnet: belongs to the Saigon Post and Telecommunication Service Corporation (SPT); established in 1997.
- Netnam: belongs to the Institute of Information Technology; established in 1998.
- CINET (Culture and Information Net): belongs to the Ministry of Culture and Information; established in 1997.

Among these ISPs, VNN leads the list, being at once an IAP (Internet access provider), an ISP (Internet Service Provider), and an ICP (Internet content provider).

1.2. Structure of the Internet

1.2.1. Types of network
- Local area network, LAN (Local Area Network)
- Metropolitan area network, MAN (Metropolitan Area Network)
- Wide area network, WAN (Wide Area Network)

Local area network (LAN)

A LAN is the smallest kind of network, spanning at most a few kilometers. Except for computers connected directly to the Internet, every networked computer is attached to a LAN. A LAN typically serves one building, school, library, or hospital.
- A characteristic of a LAN is that when one computer transmits data, every computer on the network can receive it. This property is called broadcasting.
- Computers on a LAN use a technique called Carrier Sense Multiple Access/Collision Detect (CSMA/CD): a computer does not send while another computer is sending, and it checks whether what it sent collided with a transmission from another computer.
- The newest LAN technology is the wireless network, which uses infrared or radio waves instead of cable to carry the network signal. Transfer rates range from 1 to 11 Mbps. It suits people who move around often and places where cable cannot be laid.

Metropolitan area network (MAN)

Unlike a LAN, where the machines share one transmission medium so that many computers can attach to the same wire, a MAN uses point-to-point links with a single computer at the end of each link. The computers at the ends of MAN links can in turn connect to other LANs, MANs, and WANs.

Wide area network (WAN)

A WAN may cover a country or even a continent. Like most MANs, a WAN uses physical point-to-point links, often over twisted-pair cable. WAN technology usually derives from systems built to serve telephone companies.
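The CSMA/CD rule described for LANs above (listen before sending, and retry after a collision) can be illustrated with a small slot-based simulation. This is a toy sketch under stated assumptions: time is divided into slots, a frame occupies the medium for a fixed number of slots, and stations retry after a random delay whose range doubles with each collision. The randomized backoff, the function name, and all parameters are assumptions made for illustration; the lecture only states the carrier-sense and collision-detect behavior.

```python
import random


def simulate_csma_cd(ready_slots, frame_length=2, max_slots=1000, seed=0):
    """Toy CSMA/CD model.

    ready_slots maps a station name to the slot in which it first wants to send.
    Returns a list of (start_slot, station) for each frame that got through.
    """
    rng = random.Random(seed)
    next_try = dict(ready_slots)            # earliest slot each station may send
    attempts = {station: 0 for station in ready_slots}
    busy_until = 0                          # first slot at which the medium is idle
    delivered = []
    for slot in range(max_slots):
        if not next_try:
            break
        if slot < busy_until:
            continue                        # carrier sense: medium busy, wait
        senders = [s for s, t in next_try.items() if t <= slot]
        if len(senders) == 1:
            station = senders[0]
            delivered.append((slot, station))
            busy_until = slot + frame_length
            del next_try[station]
        elif len(senders) > 1:
            # Collision detected: every sender backs off for a random delay.
            for station in senders:
                attempts[station] += 1
                next_try[station] = slot + rng.randint(1, 2 ** attempts[station])
    return delivered


for start, station in simulate_csma_cd({"A": 0, "B": 0, "C": 1}):
    print(f"slot {start}: station {station} transmits")
```

Stations A and B both want the medium in slot 0, so they collide and retry later; no two frames ever overlap because a station only starts sending when the medium is idle.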

1.2.2. Connecting to the Internet

- Physical connection: hardware devices such as modems and cables link a computer to the Internet through a network provider.
- Once the physical connection exists, the Internet connection itself is made in one of two ways: directly or indirectly.

Direct connection: requires a high-speed modem attached to the V.35 port of a router, which connects straight to the Internet over a dedicated leased channel. The connection services offered by Internet service providers include:
- Leased line
- ADSL, asymmetric digital subscriber line (Asymmetric Digital Subscriber Line)

Indirect connection: requires only a modem and a telephone line to dial into the network. The services are:
- Dial-up over the telephone network
- ISDN, integrated services digital network (Integrated Services Digital Network)

Compared with an indirect connection, a direct connection has many advantages: wide bandwidth, high and stable speed, and continuous operation (online 24/24). Naturally, a direct connection also costs far more than an indirect one.

After choosing how to connect, the next question is which service lets us browse the Internet. There are 2 main groups:
- Online services such as America Online (AOL) and CompuServe, which usually offer a large range of integrated digital services, including information retrieval, e-mail, bulletin boards, and chat rooms, so that online users can follow several areas of interest at once.
- Internet service providers, ISP (Internet Service Provider), which give users an Internet access account and also provide Internet services.

1.3. An overview of the WWW (World Wide Web) and web browsers

1.3.1. The WWW and how it works

What is the WWW? The WWW is a service that provides information over the Internet/Intranet. The information is stored as hypertext files and accessed with a web browser (Web Browser).
- Hypertext consists of documents containing text, still images, moving images, sound, and video that are linked to one another through hyperlinks. Through hyperlinks, users can quickly and easily consult related documents.
- To access information on a web server, web clients must use a program that can browse hypertext information, called a web browser. There are many web browsers, such as Internet Explorer, Netscape Navigator, Opera, and Neoplanet. Of these, the 2 most widely used are Internet Explorer and Netscape Navigator.

1.3.2. Basic features of web browsers

The potential of the Internet was only truly realized once web browsers appeared. They give access to information sources in many different locations. Browsers are workstations that can process information or request information and application programs from a network server. The home page is the intermediary between browser and server. Commonly used web browsers include Lynx for Unix or VMS; Mosaic for Apple Mac and X-Windows; and Internet Explorer and Netscape Navigator for Windows machines.

Advantages:
- Easy to use; little knowledge of computing is required.
- The user does not need to know the exact location of a page on the Internet, only to select the text or images of interest by clicking the links that lead to the desired content.

Principles of web browsing:
- Decide clearly what information you want to find on the web.
- Identify which websites are suitable for finding it.
- Several windows can be opened, one per website, by choosing File/New Window or pressing Ctrl + N.
- To open a link in a new window, right-click it and choose Open link in new window.

Browser functions:
- Viewing web pages.
- Saving the URLs of web pages.
- Using web-based e-mail programs (Webmail).
- Using the FTP service through the browser (Web FTP).
- All browsers use the hard disk to temporarily record the pages the user has just visited (the cache); this can be adjusted to each user's needs.
- Other supporting functions include blocking web pages with harmful content and changing the font and font size.

1.4. Services, resources, and conventions on the Internet

1.4.1. Services
- E-mail (Electronic mail): electronic mail.
- WWW: the global information network, presenting information as hypertext.
- FTP (File Transfer Protocol): the protocol for transferring files over the network.
- Chat: live conversation over the Internet.
- VoIP (Voice over Internet Protocol): a technique for carrying voice over the Internet protocol, also called Internet telephony.
- Video Conference: conferencing with video.
- WAP (Wireless Application Protocol): a protocol for wireless technology.

1.4.2. Rules, conventions, and regulations for using the Internet
- Do not gain unauthorized access to systems that require a username and password.
- Do not sabotage or disrupt traffic on the Internet (for example by spreading viruses).
- Do not waste resources (do not download very large files during peak hours).
- Do not delete other people's files.
- Do not intrude on or distribute other people's private information.

1.5. The emergence and role of bioinformatics

1.5.1. The dawn of sequencing
- Protein sequences
- Nucleic acid sequences

1.5.2. The emergence of bioinformatics

The appearance of information on the structure, function, and sequence of proteins and DNA created a need to manage and compare it and to predict the structure and function of organisms. Together with the development of other sciences, especially information technology and computing, this need gave rise to bioinformatics.

1.5.3. The concept of bioinformatics

Bioinformatics is a science that analyzes biological data with the help of computers and statistical tools.

Branches of bioinformatics include:
- Genome bioinformatics
- Protein bioinformatics
- Evolutionary bioinformatics
- Agricultural bioinformatics
- Medical bioinformatics
- Development of tools and underlying infrastructure

1.5.4. Role and development trends of bioinformatics

a/ Role of bioinformatics:
- Collecting, storing, organizing, retrieving, and sharing databases.
- Supporting the search for, analysis of, processing of, and prediction of research results.
- Supporting studies of the three-dimensional structure of molecules.
- Supporting studies of the diversity and evolution of organisms.

b/ Development trends of bioinformatics

Areas of bioinformatics currently receiving the most research attention:
- Database management
- Data analysis and interpretation
- Algorithm development
- Database structures
- Interface design and visualization

c/ Where to find further material on bioinformatics:
- http://www.iscb.org
- http://www.ncbi.nlm.nih.gov
- http://www.bioinformatics.org
- Bioinformatics journals
- Search engines (Google, Yahoo, and others)
- Conferences and workshops
- Libraries

CHAPTER 2: SEARCHING FOR INFORMATION ON THE INTERNET

2.1. The concept of information

1. What is information?
Information is data and knowledge used in practice to solve a problem or carry out a task.

2. Properties of information
- The value of information depends on the quality of the information and on the level of the user.
- Information is of good quality when it is accurate and reliable, timely, and specific and oriented toward the user.

General concept of information retrieval: information retrieval is a general term for the task of finding documents or the sources of documents, as well as the information, data, and facts that those documents provide.

2.2. Information search tools

Search expression: a search expression is a set of keywords linked together by logical operators.

Logical operators commonly used in searching:
- Conjunction of two keywords, AND (and, +, &). Example: rice + crops, or rice & crops.
- Alternative, OR (or). Example: rice or crops.
- Exclusion (-). Example: rice - crops.
- Negation (not, !). Example: Internet &! Computer.
- Parentheses. Example: (PCR or RAPD) and not (AFLP or SSR).
- Proximity operators: NEAR (close to), ADJ (adjacent to), SAME (within the same passage), FBY (followed by).

Notes on choosing keywords:
- Preferably use only nouns as keywords.
- When searching foreign-language documents, leave out articles and prepositions.
- Put the most important keywords first, in order of priority.
- Use at least two keywords (usually three) and combine keywords into phrases.
- Avoid very common words (those repeated many times in most documents).
- To find specific information, choose keywords that are likely to appear in the title of the document or the name of the web page.
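How a search expression built from logical operators selects documents can be shown with a few lines of code. This is a minimal sketch, not how any particular search engine works: the function `matches`, its parameters, and the sample titles are all made up for illustration. It evaluates the example expression (PCR or RAPD) and not (AFLP or SSR).

```python
def matches(text, all_of=(), any_of=(), none_of=()):
    """Return True if the text satisfies an AND / OR / NOT keyword query."""
    words = set(text.lower().split())
    has_all = all(term.lower() in words for term in all_of)          # AND
    has_any = not any_of or any(term.lower() in words for term in any_of)  # OR
    has_none = not any(term.lower() in words for term in none_of)    # NOT
    return has_all and has_any and has_none


# Made-up document titles used only to exercise the query.
titles = [
    "PCR detection of rice pathogens",
    "RAPD and AFLP markers in maize",
    "SSR analysis of soybean",
]

# (PCR or RAPD) and not (AFLP or SSR)
hits = [t for t in titles
        if matches(t, any_of=["PCR", "RAPD"], none_of=["AFLP", "SSR"])]
print(hits)
```

Only the first title is returned: the second contains RAPD but is excluded by AFLP, and the third contains neither PCR nor RAPD.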

2.3. How to search for information

General principles:
- Open several browser windows while searching to speed up the work (Ctrl + N).
- Do not open a hyperlink directly in the main page; open it separately in a new window (Open in new window).

Ways of searching:

a/ Searching by subject directory

A subject directory is a collection of documents related to the information we are looking for.

Advantages and limitations of subject directories: they contain specific, accurate information that seldom shows up in search results.

When to use subject directories: when you want to see, in a short time, what information is already available on the web within a specific field or scope related to the problem you are interested in.

Typical subject directories:
- Yahoo! (http://www.yahoo.com)
- Excite (http://www.excite.com/)
- LookSmart (http://www.looksmart.com)
- Magellan (http://magellan.excite.com/)
- Open Directory Project (http://www.dmoz.org)
- Snap (http://www.snap.com/)

b/ Searching by keyword

To search by keyword, besides deciding on the keywords and the search expression, we also need to choose a search tool, known as a search engine.

Searching with a search engine has many advantages:
- First, the information found is more specific and detailed.
- Second, there are many criteria for filtering information, such as date, language, and file format.

c/ Searching by field

- Searching by title: Title: keyword. The result lists all web pages whose title matches the chosen keyword. This is much faster than searching for the keyword in the full text of documents.
- Searching by domain: the domain is a 3-letter abbreviation for the field that the website's information relates to. Example: for www.hau1.edu.vn the domain is edu.
- Searching for images: image: bones.gif
- There are many other search fields as well, such as object, text, sound, pictures, date, anchor, applet, and language.
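The domain field mentioned above can be read off a web address mechanically. The sketch below is illustrative only: the function name `domain_category` and the short list of 3-letter categories are assumptions chosen for the example, not an exhaustive list of domains.

```python
from urllib.parse import urlparse

# A few common 3-letter categories; an assumption for illustration only.
KNOWN_CATEGORIES = {"edu", "com", "org", "gov", "net", "mil"}


def domain_category(address):
    """Return the 3-letter category label in a host name, or None."""
    if "://" not in address:
        address = "http://" + address      # urlparse needs a scheme to find the host
    host = urlparse(address).hostname
    # Walk the labels from right to left so country codes such as "vn" are skipped.
    for label in reversed(host.split(".")):
        if label in KNOWN_CATEGORIES:
            return label
    return None


print(domain_category("www.hau1.edu.vn"))
print(domain_category("http://www.ncbi.nlm.nih.gov"))
```

For www.hau1.edu.vn the function skips the country code vn and returns edu, matching the example in the text.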

2.4. How to select information and assess its reliability

2.4.1. How to select information

2.4.2. Assessing the reliability of information

CHAPTER 3: BIOLOGICAL DATABASES AND DATABASE BANKS

3.1. Overview of biological databases

- The biological databases covered in this chapter mainly hold information on nucleic acid sequences (DNA, RNA), the amino acid sequences of proteins, the structure and anatomy of certain genomes, and three-dimensional structural models of macromolecules.
- This information is organized and stored on the very powerful servers of the 3 largest gene banks in the world: NCBI, EMBL, and DDBJ.

3.2. Analysis of DNA and protein data

- DNA and protein data consist mainly of nucleotide sequences and amino acid sequences.
- A gene bank can be thought of as a library in which each book is a DNA nucleotide sequence or a protein amino acid sequence, and every book is numbered.
- One way or another, we can find the sequence we are interested in. The real issue, however, is not how to find it but understanding what the book is about and how to use it.

a/ What are DNA and protein data?

- Data on the nucleotide sequence of DNA and the amino acid sequence of proteins are biological information at the molecular level.
- For DNA, the data are the number, composition, and order of the nucleotides (or ribonucleotides) in a DNA or mRNA molecule.
- DNA records usually state which gene product the sequence encodes, in which organism it occurs, and where it is distributed. They also indicate which research question the sequence relates to and who the authors are.
- For proteins, the data are the number and order of the amino acids in a protein molecule. The records also report the properties and role of the protein and its location in the cell, tissue, or organ, and may even propose hypotheses about the structure of the molecule.

b/ Genomics and proteomics

Genomics: all the data on the genetic information of a given species. That is, the approach that starts from DNA.

Proteomics: all the gene products (proteins) of a cell, tissue, or organ of an organism at a given physiological stage. In a narrow sense, it is the set of translation products of all the mRNAs present in the organism's cells at the time of study. That is, the approach that starts from protein.

The central dogma:

DNA --(transcription)--> RNA --(translation)--> Protein
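The transcription step of the central dogma can be written down in one line. This is a minimal sketch that assumes the input is the coding (sense) strand of the DNA, read 5' to 3', so that the mRNA has the same sequence with U in place of T; the function name is illustrative.

```python
def transcribe(coding_strand):
    """Return the mRNA corresponding to a DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGGCCAAATAA"))
```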

c/ What are DNA and protein data used for?

For nucleotide sequences:
- Comparing any DNA fragment with the data in a gene bank can tell us which organism the fragment comes from (practical exercise: searching for homologous sequences).
- Knowing the order of the nucleotides in a DNA fragment, we can deduce the order of the corresponding amino acids in the polypeptide chain, if that fragment is coding (practical exercise: translating a DNA molecule into an amino acid sequence).
- Identifying mutations and differences in nucleotide sequence for the same gene product (isozymes, allozymes, and so on) is valuable in evolutionary studies and in practical applications.
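Two of the uses listed above, translating a coding DNA fragment and spotting a point mutation, can be sketched together. The example assumes the standard genetic code, written compactly with the bases in the order T, C, A, G; the sequences and function names are made up for illustration.

```python
BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding DNA sequence, reading frame 1, up to the first stop codon."""
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[start:start + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def point_differences(seq_a, seq_b):
    """List (position, base_in_a, base_in_b) for each mismatch; positions start at 1."""
    return [(pos, a, b)
            for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
            if a != b]


normal = "ATGGCCAAATAA"    # toy sequence
variant = "ATGGACAAATAA"   # same sequence with one base changed
print(translate(normal), translate(variant))
print(point_differences(normal, variant))
```

Here the single base change at position 5 turns the codon GCC (alanine) into GAC (aspartate), so the two peptides differ in their second residue.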

- Some genes are highly conserved and species-specific, for example the genes encoding ribosomal RNA (rRNA). The DNA sequences of these genes from different organisms can be compared to determine how much their nucleotide sequences differ, and from that the relationships between species and subspecies can be modeled for the purposes of biological classification (practical exercise: determining genetic relationships by comparing nucleotide sequences).
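The degree of difference between nucleotide sequences can be expressed as a simple proportion of mismatched positions (often called the p-distance). This is a minimal sketch that assumes the sequences are already aligned and of equal length; the three toy sequences and their names are invented for illustration.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


# Made-up, pre-aligned fragments standing in for a conserved gene.
sequences = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTACGTTC",
    "species_C": "ACGAACCTTC",
}

for (name_1, seq_1), (name_2, seq_2) in combinations(sequences.items(), 2):
    print(name_1, name_2, p_distance(seq_1, seq_2))
```

In this toy data species_A and species_B differ least, so they would be grouped together first when building a tree from these distances.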

[Figure: organisms with similar morphology; how does their genetic material compare? An early globin gene undergoes gene duplication, giving two globin chain genes, each shown for mouse, human, and cattle.]

- Knowing the sequence of a gene (for example a cancer gene), or of a dangerous virus (for example H5N1, or the white spot virus of shrimp), makes early detection possible with techniques such as PCR and DNA hybridization, so that the disease can be prevented or treated.
- Primer pairs can be designed to amplify these fragments for various research purposes, such as studying differences between samples: determining whether a gene is present in an organism, detecting disease-resistance genes, determining sex, or diagnosing hereditary diseases. In addition, microarray and DNA chip techniques are used to determine the presence and level of activity of genes under given conditions.
- From the nucleotide sequence of a DNA molecule, the recognition sites of restriction enzymes can be identified. This is particularly important in recombinant DNA technology (practical exercise: determining the restriction map of a genome).
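Locating restriction enzyme recognition sites in a known sequence is a plain substring search. The sketch below uses EcoRI, whose recognition sequence is GAATTC; the test sequence and the function name are made up for illustration, and only one strand is scanned, which is sufficient here because the EcoRI site is palindromic.

```python
def find_sites(dna, site):
    """Return the 1-based start positions of every occurrence of site in dna."""
    dna, site = dna.upper(), site.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start + 1)        # report positions counted from 1
        start = dna.find(site, start + 1)  # continue after the previous hit
    return positions


ECORI = "GAATTC"
print(find_sites("AAGAATTCGGGAATTCTT", ECORI))
```

The list of positions is the raw material for a restriction map: the distances between consecutive sites give the expected fragment lengths.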

- One gene therapy approach uses the ribonucleotide sequence of an mRNA molecule to synthesize a complementary (antisense) strand that blocks the activity of the corresponding gene.
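The antisense strand for a given mRNA is its reverse complement. This is a minimal sketch, assuming an RNA antisense strand (A pairs with U, G with C) written 5' to 3'; the function name is illustrative.

```python
# Base-pairing rule for RNA: A<->U and G<->C.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(mrna):
    """Return the antisense (reverse-complement) RNA strand, written 5' to 3'."""
    return mrna.upper().translate(RNA_COMPLEMENT)[::-1]


print(antisense_rna("AUGGCC"))
```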

- One important application is gene transfer, to create new organisms with desired traits, or to introduce genes into bacterial or yeast cells in order to produce gene products by the recombinant route (proteins, enzymes, vaccines, and biologically active compounds).

For amino acid sequences:
- If we know the composition and order of the amino acids in a given protein or enzyme, we can assess the differences between proteins or enzymes with the same function in different species, and so learn which amino acids play an important role.
- From the amino acid sequence of a protein, the nucleotide sequence of the coding gene can be inferred, although usually not uniquely, because the genetic code is degenerate.
- From the amino acid sequence, the three-dimensional structure and the active regions (domains) of the protein or enzyme can be predicted.
- Today the amount or presence of a protein can be detected with modern techniques such as mass spectrometry. Determining the amino acid sequence nevertheless remains indispensable.
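Inferring a gene sequence from a protein sequence is ambiguous because most amino acids are encoded by more than one codon. A minimal sketch, assuming the standard genetic code, counts how many different DNA sequences could encode a given peptide; the function name and the example peptides are illustrative.

```python
from collections import Counter
from math import prod

# Standard genetic code, one letter per codon in TCAG order ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of codons that encode each amino acid, e.g. 6 for L and 1 for M.
CODONS_PER_RESIDUE = Counter(AMINO_ACIDS)


def coding_sequence_count(peptide):
    """Number of distinct DNA sequences that translate to the given peptide."""
    return prod(CODONS_PER_RESIDUE[residue] for residue in peptide.upper())


print(coding_sequence_count("MW"))
print(coding_sequence_count("MAK"))
```

Methionine and tryptophan each have a single codon, so MW has exactly one coding sequence, while MAK already has 1 x 4 x 2 = 8.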

3.2. Databases of the gene banks

There are 3 largest databases (NCBI, EMBL, and DDBJ). Basic information about them is given here. Each bank, however, classifies and organizes its data in its own way. Within the scope of these lecture notes, this chapter covers only the main, frequently used databases.

3.2.1. EMBL/EBI databases

a/ Literature databases (Literature Databases)
- Medline: covers all areas of medicine, patient care, dentistry, veterinary medicine, the health care system, and the preclinical sciences.
- OMIM: Online Mendelian Inheritance in Man, a catalog of human genes and genetic disorders.
- Patent Abstracts: biotechnology-related abstracts of patent applications, derived from data products of the European Patent Office (EPO).
- Taxonomy: the taxonomy database of the International Sequence Database Collaboration, containing the names of the organisms represented in the sequence databases.

b/ Microarray databases (Microarray Databases)
- ArrayExpress: a database for microarray-based gene expression data.
- MIAME: Minimum Information About a Microarray Experiment.

Microarray technology takes advantage of the sequence resources generated by genome sequencing projects to answer the question of which genes are being expressed in a given cell type of an organism, at a given time, under given conditions.

c/ Nucleotide databases (Nucleotide Databases)

The EMBL nucleotide sequence database is the European member of the three largest databases in the world. Hundreds of complete genome sequences, together with their translated protein products, can be accessed through the EBI servers.
- ASD: the Alternative Splicing Database holds data on alternatively spliced exons together with accompanying annotation. The ASD project aims to give a better understanding of the mechanism of alternative splicing on a genome-wide scale.
- ATD: the Alternate Transcript Diversity Database holds data on transcripts, each of which represents a form of alternative splicing and alternative polyadenylation.
- EMBL-Align database: a database of multiple sequence alignments.
- EMBL-Bank: also called the EMBL nucleotide sequence database; Europe's primary nucleotide sequence resource.
- EMBL CDS: a database of the nucleotide sequences of coding sequences (CDS, coding sequence).
- Ensembl: an automated annotation system for eukaryotic genomes.
- Genomes Server: an overview of the complete genomes at EBI. These pages give access to a large number of complete genomes.
- Genome Reviews: a curated genome database containing corrected versions of the complete genome entries from the EMBL/GenBank/DDBJ nucleotide sequence databases.
- Karyn's Genomes: a collection covering a number of sequenced genomes.
- IMGT/HLA: an immunogenetics database; the IMGT/HLA database covers the major histocompatibility complex (MHC).
- IMGT/LIGM: an immunogenetics database; the IMGT/LIGM database covers immunoglobulins (Ig) and T cell receptors.
- IPD: the Immuno Polymorphism Database, covering polymorphic genes of the immune system, such as KIR, HPA, and non-human MHC.
- LGICdb: the Ligand Gated Ion Channel Database.
- Mutations: the sequence variation database project.
- Parasites: parasite genome databases.

d/ Protein databases (Protein Databases)

EBI develops and maintains a number of interrelated protein databases. The projects and databases are:
- CluSTr: offers an automatic classification of UniProtKB/Swiss-Prot + UniProtKB/TrEMBL.
- CSA: the Catalytic Site Atlas, a resource of catalytic sites and residues identified in enzymes using structural data.
- GO: the EBI pages of the Gene Ontology consortium.
- GOA: provides annotations of gene products to the GO resource.
- HPI: the Human Proteomics Initiative, launched by SIB and EBI to annotate all known human sequences to the quality standards of UniProtKB/Swiss-Prot.
- IntAct: a database with an accompanying analysis system; it provides a query interface and a module for analyzing the data.
- IntEnz: the Integrated relational Enzyme database holds enzyme data approved by the Nomenclature Committee, with the aim of creating a single relational database of enzymes.
- InterPro: an integrated documentation resource for protein families, domains, and functional sites.
- IPI: the International Protein Index, a non-redundant proteome set built from UniProtKB/Swiss-Prot, UniProtKB/TrEMBL, Ensembl, and RefSeq.
- PANDIT: a collection of sequence alignments and phylogenetic trees.
- Proteome Analysis: comparative and statistical analysis of the proteomes of organisms.
- UniProt: the universal protein resource for protein sequences, and the central hub for protein data from other database banks.
- UniProt Archive: a protein sequence archive extracted from public databases; it contains protein sequences only.
- UniProt/UniRef: clusters similar sequences to yield a representative subset of sequences, which makes searches very fast.
- UniProtKB/Swiss-Prot: an annotated protein sequence database; part of UniProtKB.
- UniProtKB/TrEMBL: a computer-generated protein database; part of UniProtKB.

e/ Proteomic databases (Proteomic Databases)

EBI develops and maintains a large number of proteome-related databases. Some of them are listed below.
- ChEBI (Chemical Entities of Biological Interest): a dictionary of small molecular entities.
- IntAct: provides an open, freely available database system and analysis tools for protein interaction data.
- IntEnz (Integrated relational Enzyme database): holds enzyme data approved by the international Nomenclature Committee. The goal is to create a single relational database of enzymes.
- IUPHAR receptor database: IUPHAR represents all fields of pharmacology in the broadest sense, from theory to the clinic, worldwide.
- PRIDE (PRoteomics IDEntifications database): allows protein data to be submitted in PRIDE XML format.

f/ Structure databases (Structure Databases)

EBI develops and maintains a number of databases concerned with the structure of macromolecules. The most important of these is the Macromolecular Structure Database (MSD).
- DALI: a directory of the domain structures of proteins.
- MSD: the macromolecular structure database, including tools for searching the PDB.
- MSDchem: a chemical library of the chemical components found in the PDB.
- MSDlite: provides tools for easy access to the PDB.
- MSDtarget: a server providing search tools.
- RESID: a database of protein modifications; a collection of structures and annotations for modifications, including amino-terminal and carboxyl-terminal modifications.
- Reactome: a curated database of biological processes in humans. Reactome is useful not only to biologists in general, as an online biology textbook, but also to bioinformaticians looking for new biological pathways.
- BioModels: a database of biological models that lets biologists store, search, and publish mathematical models in the field of biology.

3.2.2. NCBI databases

a/ Literature databases (Literature Database)
- Bookshelf: free searching of basic information and new research; part of it is available through PubMed.
- PubMed: open to anyone; contains the abstracts of more than 15,000,000 research publications in biomedicine.
- PubMed Central: an archive of life science journals. Integrated with the Entrez system, PMC gives free and unrestricted access to more than 160 life science journals.
- Online Mendelian Inheritance in Man (OMIM): with more than 15,000 entries, OMIM (maintained by Dr. Victor A. McKusick and colleagues at Johns Hopkins University) aims to be a catalog of hereditary diseases and is updated continuously.
- Online Mendelian Inheritance in Animals (OMIA): a database of the genes, genetic disorders, and traits of animal species.
- Journals: searches the journals database and links journals to the Entrez system, including the genetic databases.

b/ Nucleotide databases (Nucleotide databases)

GenBank: a collection of all available nucleotide and amino acid sequences.

GenBank is the genetic sequence database of the NIH. As of 8/2005 there were approximately 51,674,486,881 bases in 46,947,388 sequence records in the GenBank divisions, and 53,346,605,784 bases in 10,276,161 sequence records in the WGS division. For example, the record for a Saccharomyces cerevisiae gene can be viewed in full in GenBank. A new release is issued every 2 months. GenBank is part of the International Nucleotide Sequence Database Collaboration (INSDC), which comprises DDBJ, EMBL, and NCBI. The three organizations exchange data with one another daily.

In its most recent announcement, INSDC reported that the DNA sequence databases had passed 100 gigabases (Gb). GenBank is an important contributor to this milestone, which is the result of contributions from a great many scientists around the world.
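The record counts quoted above give a feel for how the two parts of GenBank differ. As a small worked example using only the 8/2005 figures from the text, the average number of bases per record can be computed for each; the function name is illustrative.

```python
def mean_record_length(bases, records):
    """Average number of bases per sequence record."""
    return bases / records


# Figures quoted in the text for 8/2005.
traditional = mean_record_length(51_674_486_881, 46_947_388)
wgs = mean_record_length(53_346_605_784, 10_276_161)

print(f"GenBank divisions: about {traditional:.0f} bases per record")
print(f"WGS division:      about {wgs:.0f} bases per record")
```

The WGS records are several times longer on average, which is consistent with their origin in whole-genome sequencing projects rather than in individual gene submissions.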

dbEST (data base of Expressed Sequence Tags): Theo Nature Genetics 4:332-3; 1993 th dbEST l m t t p h p c a cc trnh t nh t l y t c a GenBank. eo th ho c cc trnh t ng n, duy mRNA (cDNA). dbEST c ng l m t nhnh

dbGSS (data base of Genome Survey Sequences): c ng l m t nhnh c a GenBank nh ng khc v i dbEST l h u h t cc trnh t u c ngu n g c t genomic ch khng ph i l cDNA (mRNA). Nhnh dbGSS ch a cc d ng d li u sau: Single - pass genom sequence Cc trnh t t n cng c a cosmid/BAC/YAC Cc trnh t Alu PCR Cc trnh t tagged). transposon c eo th (transposon

dbSNP (data base of Single-base Nucleotide Polymorphism): l CSDL cc a hnh do s thay th ho c thm, b t m t nucleotide.

RefSeq: CSDL c a cc trnh t tra c u khng c s d th a (non-redundant reference sequence) bao g m: cc o n contig DNA genom, cc mRNA, cc protein c a cc gen bi t. dbSTS (data base of sequence tagged sites): CSDL c a cc v tr trnh t c eo th ho c cc trnh t ng n th ng ch c m t m t l n duy nh t trong genom. UniSTS: l m t c s d li u ton di n c a cc STS (cc v tr nh d u trnh t ) c l y t cc b n STS v cc th nghi m khc. UniGene: T p h p c a cc trnh t EST v cc trnh t mRNA c chi u di y c nhm vo cc c m v m i c m i di n cho m t gene duy nh t c bi t ho c gene ng i c m t cng v i b n v nh ng thng tin v qu trnh bi u hi n gen.

dbHTG (data base of high-throughput genom sequence): t p h p c a cc trnh t genom thu c t cc trung tm xc nh trnh t genom. HomoloGene: S d ng so snh trnh t nucleotide gi a hai sinh v t nh gi m c ortholog gi nh. MGC: (Mamalian Gene Collection) cung c p cc dng y chi u di cc khung c m (fulllength open reading frame FL-ORF) cho ng i, chu t nh t v chu t c ng. PopSet: PopSet l m t h th ng cc trnh t DNA c thu th p phn tch m i quan h ti n ha c a m t qu n th .

RefSeq: Cung c p h th ng cc trnh t : DNA, cc lo i RNA v s n ph m protein nghin c u cc sinh v t. c tr th c nghi m v h nh

TPA (Third Party Annotation) Sequence: designed to capture results that support submitter-provided descriptions and annotation of sequences that the submitter did not determine directly but that can be derived from GenBank primary data.


RHdb: a database of radiation hybrid data. It includes STS data, scores, the experimental conditions used in constructing the maps, and cross-references.

c/ Protein databases

3D Domains: contains the sequences and three-dimensional structures of the domains in protein molecules.




Proteins: a collection of protein sequence databases.

RefSeq: provides a non-redundant database covering DNA, RNA and protein.

PROW: Protein Reviews on the Web (PROW), a database of protein reviews ...

d/ Structure databases

3D Domains

MMDB (Molecular Modeling Database): a database of 3D molecular structure models, including proteins and polynucleotides. MMDB contains more than 28,000 structures and is linked to the rest of the NCBI databases, including sequences, citations, taxonomy, and neighboring sequences and structures.




Conserved Domains: a collection of databases of conserved domains (conserved regions) of proteins and protein families.

e/ Taxonomy databases

The taxonomy database contains the names of all organisms that are represented in the genetic databases by at least one nucleotide or protein sequence. NCBI provides a classification system together with its taxonomic units (taxa).

f/ Genome databases

Cancer Chromosomes: three databases (NCI/NCBI SKY/M-FISH and CGH).

COGs (Clusters of Orthologous Groups of proteins): groups of orthologous proteins derived by comparing the protein sequences encoded in complete genomes that represent the major phylogenetic lineages.


Gene: the gene database. Genes are stored in a single system and can be accessed with the Entrez Gene tool.

Genome Project: the database of sequencing projects. Sequences that are complete, being assembled, or in progress are stored in a single system and can be accessed with the Entrez Genome Project tool.

Genomes: genome resources for specific organisms. Contains the whole genomes of more than 1000 organisms, both complete and in progress: Aspergillus, Bacteria, Bee, Cat, Chicken, Cow, Dog, eukaryotic organelles, Frog, Fruit fly, Human, Mosquito, Mouse, Pig, plant genomes, Rat, Retrovirus, Sheep, Viral Genomes, Yeast, Zebrafish...

g/ Taxonomy databases

The taxonomy database contains the names of all organisms that are represented in the genetic databases by at least one nucleotide or protein sequence.

NCBI provides a classification system together with its taxonomic units (taxa).

h/ Structure databases

MMDB (Molecular Modeling Database): a molecular modeling database containing the 3D structures of macromolecules, including proteins and polynucleotides. MMDB contains more than 28,000 structures and is linked to the rest of NCBI, including sequences, citations, taxonomy, and related sequences and structures.




GEO DataSets: this database stores gene expression data (Gene Expression Omnibus, GEO).

SAGE: a set of gene expression data from serial analysis of gene expression (SAGE). To support use of this tool, NCBI recently revised this website.

SAGEmap: a SAGE data resource for querying, retrieving and analyzing SAGE data from any organism. All the data on this website can be accessed from the GEO (Gene Expression Omnibus) repositories.

k/ Chemical databases

- Reactions, tests, ...
- List of chemical substances
- List of compounds

3.3. Tools for searching and analyzing the databases

3.3.1. EMBL/EBI tools

a/ Similarity & Homology: tools for analyzing the degree of similarity and homology between sequences.

They include Fasta, Blast, MPsrch and Scanps. A search mode that returns results by email has also been developed.

The two programs that can be used to search for and compare sequence similarity and inferred homology are BLAST and Fasta.

General DNA and protein search tools

Tool: application and description
Blast2-WU Protein: protein database search (Blast 2.0 with gaps) from Washington University.
Blast2-WU Nucleotide: nucleotide database search (Blast 2.0 with gaps) from Washington University.
Blast2-NCBI Protein: NCBI's protein database search program (blastall).
Blast2-NCBI Nucleotide: NCBI's nucleotide database search program (blastall).
Blast2-NCBI EVEC: a program for detecting sequences contaminated with vector sequences.
Fasta Nucleotide: uses FASTA to find sequences in the database similar to a nucleotide sequence.
Fasta Protein: uses FASTA to find sequences in the database similar to a protein sequence.
Fasta-Proteome server: Fasta search on the Proteome server.
Fasta-Genome server: Fasta search on the Genome server.
Fasta-WGS server: Fasta search on the WGS server (genomes obtained by the whole genome shotgun method, WGS).

Specialized search tools for proteins

Tool: application and description
Blitz: an email-based database search service. In essence, it returns the sequences similar to the query sequence by email. EBI developed two different methods for it, called MPsrch and Scanps.

Specialized search tools for DNA

Tool: application and description
Blast2-ASD: similarity search using the Blast2-ASD server.
Blast2-Parasite: similarity search using the Parasite Genomes blast server.
Fasta-ASD: uses Fasta to find protein sequences similar to those in the ASD database.
Fasta-LGIC Protein server: uses Fasta to find protein sequences similar to those in the Ligand Gated Ion Channel database.
Fasta-LGIC Nucleotide server: uses Fasta to find nucleotide sequences similar to those in the Ligand Gated Ion Channel database.
Fasta-SNP server: Fasta search for homologous sequences in the European SNP database (HGBASE).

b/ Protein Functional Analysis

One of the main tasks in analyzing protein function is detecting characteristic functional regions (motifs) in protein sequences. This section provides tools for determining protein function with a variety of methods and databases. The most important service here is InterProScan, which combines many other methods behind a single, very easy-to-use interface.

Tool: application and description
CluSTr Search: searches UniProtKB (UniProtKB/Swiss-Prot + UniProtKB/TrEMBL) by accession number.
FingerPRINTScan: searches the PRINTS protein fingerprints.
GeneQuiz: automated analysis of biological sequences.
Inquisitor: provides a simple query interface for identifying similar protein sequences across proteomes. Unknown sequences are analyzed with FASTA and InterProScan.
InterProScan: searches protein sequences against the InterPro member databases.
PPSearch: searches for protein motifs.
Pratt: discovers patterns in unaligned protein sequences.
Radar: detects repeats in proteins.

c/ Proteomic Services

This section covers the ways of accessing the proteomic services provided by EBI. The most important is the UniProt DAS server, which lets researchers present their results as annotations on UniProtKB/Swiss-Prot.

Tool: application and description
Dasty: a tool for displaying information about protein sequence features in an easy-to-read form.
UniProt DAS: the UniProt DAS server lets researchers present their results, for example identified peptides or signal sequences, on the UniProt server as annotations on UniProtKB/Swiss-Prot.

d/ Sequence Analysis

Sequence analysis uses many bioinformatics methods to determine the biological function and structure of genes and of the proteins they encode.

Tools such as Transeq can help identify the protein-coding regions of a DNA sequence. ClustalW is used to align DNA or protein sequences and to shed light on their evolutionary relationships.
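As a rough illustration of the kind of translation a tool like Transeq performs, the sketch below translates a DNA sequence in six reading frames using the standard genetic code. The function names (`translate`, `six_frames`) and the frame numbering (+1..+3 forward, -1..-3 on the reverse complement) are assumptions made for this example; this is not the EMBOSS implementation.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame=0):
    """Translate from offset `frame` (0, 1 or 2); unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Map frame number (+1..+3, -1..-3) to its translation."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna, offset)
        frames[-(offset + 1)] = translate(rc, offset)
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frames("ATGGCCTAA").items()):
        print(frame, protein)
```

A stop codon is written as `*`, so a long stretch without `*` in one frame is a first hint of a protein-coding region.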

Analysis tools provided by EBI:

Tool: application and description
Align: pairwise sequence alignment, in both global and local modes.
ClustalW: multiple sequence alignment.
CpG Plot/CpGreport: finds and plots CpG islands.
Dna Block Aligner Form: aligns two DNA sequences as colinear blocks; well suited to promoters.
GeneMark: gene prediction service.
Genetic Code Viewer: summarizes the differences between genetic codes.
GeneWise: compares a protein sequence or a protein profile HMM with a DNA sequence.
Muscle: multiple sequence alignment with higher accuracy and speed than ClustalW or T-Coffee, depending on the options chosen.
Mutation Checker: assesses the accuracy of a sequence and detects mutations.
Pepstats/Pepwindow/Pepinfo: protein sequence analysis programs.
PromoterWise: compares two DNA sequences while allowing for inversions and translocations; ideal for promoters.
Reverse Translator: checks the reverse and complementary sequence.
SAPS: statistical analysis of protein sequences.
T-Coffee: an alignment program that lets the user combine the results obtained with several different alignment methods.
Transeq: translates DNA sequences.

e/ Structural Analysis

Determining the 2D/3D structure of a protein is one of the most important tasks in studying its function. Users will find many services provided by EBI to help with structural analysis, among them DALI and MSDfold. MSDfold and DALI make it possible to take the structure of the protein under study and compare it with the structures in the PDB (Protein Data Bank).

Tool: application and description
DALI: compares protein structures in 3D.
DaliLite: a pairwise structure comparison program. It compares the structure of interest (the first structure) with a reference structure (the second structure).
MSD Services: a summary table and list of all the tools and services of the Macromolecular Structure Database (MSD).
MSDfold: compares chains/structures and searches for similar chains/structures in the PDB or among the SCOP domains.
MSDpro: an application for building complex relational queries against MSD without needing to know how the data are organized in the database or the query language used.
MSDsite: finds active sites from a ligand (for example ATP) or from active-site information (CYS CYS CYS CYS).
NMR Representatives: searches the PDB for structures obtained by NMR.
PQS: determines quaternary structure.
PQS-Quick: quickly retrieves quaternary structure information from a PDB ID.

f/ Other tools

Tool: application and description
BioLayout: displays and visualizes graphs and biological networks, for example similarities between protein sequences and protein interaction networks.
CAST: filters and detects compositionally biased regions in protein sequences; from the Computational Genomics Group.
EBIMed (NEW): a Web application that combines retrieval and extraction of information from Medline.
EMBL Computational Services: a collection of tools provided by EMBL Heidelberg for DNA/protein sequence analysis.
Expression Profiler: a set of tools for analyzing, clustering and visualizing gene expression and genomic data.
NEWT: a taxonomy database that combines NCBI taxonomy data with the UniProtKB/Swiss-Prot data.
Protein Colourer: a tool for colouring amino acid sequences.
Protein Corral (NEW): a Web application that combines retrieval and extraction of information from Medline.
Readseq: converts sequences between different formats.
Webservices: provides programs for accessing various biological databases.
Whatizit: tells the user the meaning of words found in a text, depending on which kind of information the user wants to see highlighted.

3.3.2. NCBI tools

a/ Sequence analysis tools

Clusters of Orthologous Groups (COGs): a system of gene families from complete genomes.

Gene Expression Omnibus (GEO): a gene expression data repository and an online resource for retrieving gene expression data.

HomoloGene: compares nucleotide sequences between pairs of organisms to identify putative orthologs, that is, genes in different species that evolved from a common ancestral gene through speciation and that usually keep the same function in the course of evolution.

Conserved Domain Database (CDD): a collection of sequence alignments and profiles of the protein domains conserved during molecular evolution.


Mammalian Gene Collection (MGC): a new NIH effort to provide full-length cDNA resources.

Clone Registry: a database used by the participating human and mouse genome sequencing centers to record which clones have been selected for sequencing, which are currently being sequenced, and which are finished and stored in GenBank.

Trace Archive: developed to store the raw sequence data generated by sequencing projects.

ORF Finder: a graphical analysis tool that finds the open reading frames in a user-supplied sequence or in a sequence already in the database.

VecScreen: a tool for identifying segments of a nucleotide sequence that may come from a vector, a linker or an origin of replication, to be run before using sequence analysis tools or submitting the sequence.

Electronic-PCR (e-PCR): can be used to compare a query sequence against sequence-tagged sites in order to find a possible map location for the query sequence.
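The idea behind an ORF search can be sketched in a few lines. The example below scans only the three forward reading frames for the pattern ATG ... stop codon; the function name `find_orfs` and the `min_codons` threshold are assumptions made for illustration, and NCBI's ORF Finder does considerably more (both strands, alternative genetic codes, graphical output).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for ORFs on the forward strand.

    start/end are 0-based, end-exclusive positions covering ATG through
    the stop codon; frame is 1, 2 or 3. min_codons counts the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame + 1))
                start = None                   # look for the next ORF
    return sorted(orfs)


if __name__ == "__main__":
    seq = "CCATGAAATGACC"
    for start, end, frame in find_orfs(seq):
        print(frame, start, end, seq[start:end])
```

To cover the reverse strand as well, the same function can be run on the reverse complement of the sequence.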

b/ Sequence Similarity Searching

BLAST Homepage: gives access to the BLAST (Basic Local Alignment Search Tool) programs, tools and help pages.

BLink: displays the BLAST search results for every protein sequence in the Entrez protein database.

Network-Client BLAST: gives access to NCBI's BLAST search tools. Blastcl3 can search with all the sequences in a FASTA file and produce one or more sequence alignments as text or HTML.

Stand-alone BLAST: a version of the program that can be used after downloading and installing it on a personal computer.

c/ Taxonomy

Taxonomy Browser: a tool for searching NCBI's taxonomy database.

Taxonomy BLAST: groups BLAST hits by source organism according to the NCBI Taxonomy database.

TaxTable: a table that summarizes the BLAST taxon data and the relationships between organisms in colour graphic form.

ProtTable: provides a summary table of the protein-coding regions in a genome.

TaxPlot: provides a three-way view of genome similarities.

d/ Sequence Submission

Sequin: a sequence submission tool that includes an ORF finder and a sequence editor and viewer.

BankIt: submits one or several sequences at a time over the WWW.

e/ Text Term Searching

- Entrez: gives access to protein and DNA sequence data from more than 100,000 organisms, together with 3D protein structures, gene mapping information and PubMed MEDLINE.
- LinkOut: a registration service that creates links from articles, journals or biological data in Entrez to external Web resources.
- Citation Matcher: finds the PubMed ID or MEDLINE UID of any article in the PubMed database.
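Besides the web interface, Entrez can be queried programmatically through NCBI's E-utilities, which these notes do not cover. The sketch below assumes the `esearch.fcgi` endpoint with its `db`, `term` and `retmax` parameters; the helper name `esearch_url` is hypothetical, and the code only builds the query URL, it does not send any request.

```python
from urllib.parse import urlencode

# Base address of the E-utilities search service (an assumption of this sketch).
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(database, term, max_results=20):
    """Build an Entrez text-term search URL for the given database."""
    params = {"db": database, "term": term, "retmax": max_results}
    return ESEARCH_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    # Search the protein database for human insulin records.
    print(esearch_url("protein", "insulin AND human[Organism]", 5))
```

Field tags such as `[Organism]` restrict a term to one index, in the same way as in the Entrez search box.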

f/ Tools for 3D structure display and similarity searching

CD-Search: the Conserved Domain Search Service (CD-Search) can be used to identify the conserved domains present in protein sequences.

Cn3D: a tool for displaying 3D structures and sequences from the NCBI databases.

Domain Architecture Retrieval Tool: displays the functional domains that make up a protein and lists proteins with similar domain architectures.

VAST Search: a structural similarity search service that compares a newly determined protein structure with the structures in the MMDB/PDB databases.

g/ Map databases (MAPS)

Give access to various kinds of physical and genetic maps.

Map Viewer: a tool for viewing the chromosome maps of more than 17 organisms. Map Viewer displays one or more maps aligned with one another on the basis of shared markers and genes and, for sequence maps, on the basis of sequence similarity. Maps are currently available for Arabidopsis, fruit fly, human, the human-mouse homology map, the malaria parasite, mosquito, mouse, nematode, rat and zebrafish.

3.4. ExPASy

3.4.1. ExPASy databases

- Swiss-Prot and TrEMBL - databases of known proteins
- PROSITE - protein families and domains
- SWISS-2DPAGE - protein database (two-dimensional polyacrylamide gel electrophoresis)
- ENZYME - enzyme nomenclature
- SWISS-MODEL Repository - automatically generated protein models
- GermOnLine - knowledge base on germ cell differentiation
- Ashbya Genome Database
- Links to other databases

SWISS-PROT

Swiss-Prot is a protein database started in 1986 as a collaboration between the Department of Medical Biochemistry of the University of Geneva and EMBL. After 1994, the EMBL side moved to an EMBL outstation in the United Kingdom called the EBI. In April 1998 the database moved to the Swiss Institute of Bioinformatics (SIB), so it is now maintained jointly by SIB and EBI/EMBL. The database aims to provide high-level information, including descriptions of the function of each protein and the structure of its domains, post-translational modifications, variants and other information. SWISS-PROT aims to keep redundancy to a minimum and is linked to many other resources. In 1996 a computer-annotated supplement to SWISS-PROT, called TrEMBL, was created (described in detail below). First, let us look more closely at the structure of SWISS-PROT.

Structure of SWISS-PROT

The structure of the database and the extent of its annotations set SWISS-PROT apart from other protein sequence resources and have made it one of the databases of choice for most research purposes. In mid-1998 the database contained 70,000 entries from more than 5,000 different species, mostly Homo sapiens, Saccharomyces cerevisiae, Escherichia coli, Mus musculus and Rattus norvegicus.

3.4.2. Analysis tools

3.4.2.1. Protein identification and characterization

a/ Identifying and characterizing proteins from peptide data obtained by mass spectrometry

Aldente - identifies proteins from peptide mass fingerprinting data. It is a new advance in the field of protein identification.

FindMod - predicts potential post-translational modifications and potential amino acid substitutions in peptides. Experimentally measured peptide masses are compared with the theoretical peptide masses calculated from the Swiss-Prot database or from sequences entered by the user. Comparing the differences in peptide mass is also one of the effective ways of identifying a protein.

FindPept - identifies peptides from experimental mass spectrometry results, taking into account chemical modifications, post-translational modifications and autolytic cleavage.

GlycoMod - predicts the oligosaccharide structures that may occur on a protein from experimentally determined masses.

PepMAPPER - a peptide mass fingerprinting tool from UMIST, UK.

ProFound - searches known protein sequences with peptide mass information; from Rockefeller and NY Universities.

b/ Identifying and characterizing proteins from MS/MS data

Popitam - an identification and characterization tool for peptides with unexpected modifications, such as mutations or post-translational modifications, based on tandem mass spectrometry.

Phenyx - identifies and characterizes proteins and peptides from MS/MS data; from GeneBio, Switzerland.

OMSSA - identifies MS/MS peptide spectra by searching libraries of known proteins.

PepFrag - searches known protein sequences with peptide fragment mass information; from Rockefeller and NY Universities or from Genomic Solutions.

ProteinProspector - UCSF tools for fragment-ion masses data (MS-Tag, MS-Seq, MS-Product, etc.)

SearchXLinks - analyzes the mass spectra of modified, cross-linked and digested proteins of known amino acid sequence; from Caesar, Germany.

c/ Identifying proteins from amino acid composition, pI and molecular weight

AACompIdent - identifies a protein from its amino acid composition.

AACompSim - compares the amino acid composition of a UniProtKB/Swiss-Prot entry with that of the other entries.

TagIdent - identifies proteins from pI, Mw and a sequence tag, or produces a list of the proteins whose pI and Mw are closest to those of the query protein.

MultiIdent - identifies proteins from amino acid composition, pI, Mw, sequence tag and peptide mass fingerprinting data.
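Composition-based identification can be illustrated with a small sketch: compute the percentage of each amino acid in two sequences and score how far apart the two profiles are. The sum of squared differences used here and the function names are assumptions chosen for clarity, not the exact scoring used by AACompIdent or AACompSim.

```python
from collections import Counter


def composition_percent(protein):
    """Return {amino acid: percentage of the sequence} for a protein string."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def composition_distance(protein_a, protein_b):
    """Sum of squared differences between two composition profiles.

    0 means identical composition; larger values mean less similar.
    """
    comp_a = composition_percent(protein_a)
    comp_b = composition_percent(protein_b)
    residues = set(comp_a) | set(comp_b)
    return sum((comp_a.get(aa, 0.0) - comp_b.get(aa, 0.0)) ** 2 for aa in residues)


if __name__ == "__main__":
    query = "MKTAYIAKQR"
    candidates = {"cand1": "MKTAYIAKQR", "cand2": "GGGGSSSSPP"}
    # Rank candidate sequences from most to least similar in composition.
    for name in sorted(candidates, key=lambda n: composition_distance(query, candidates[n])):
        print(name, round(composition_distance(query, candidates[name]), 1))
```

Because composition ignores residue order, two unrelated proteins can score as similar; this is why such tools combine composition with pI, Mw or sequence tags.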

d/ Other prediction tools

GlycanMass - calculates the mass of an oligosaccharide structure.

PeptideCutter - predicts the sites at which a given sequence is cleaved by proteases or by chemicals.

PeptideMass - calculates the masses of peptides and their post-translational modifications for a UniProtKB/Swiss-Prot or UniProtKB/TrEMBL entry or for any sequence entered by the user.

IsotopIdent - predicts the theoretical isotopic distribution of a peptide, protein, polynucleotide or chemical compound.
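What cleavage prediction and peptide mass calculation involve can be sketched as follows. The example assumes the usual simplified trypsin rule (cut after K or R unless the next residue is P) and uses standard monoisotopic residue masses plus one water per peptide; the function names are hypothetical, and modifications, missed cleavages and charge states, which tools such as PeptideCutter and PeptideMass handle, are ignored.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056


def tryptic_peptides(protein):
    """Cut after K or R, except when the next residue is P."""
    protein = protein.upper()
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])       # C-terminal peptide
    return peptides


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide.upper()) + WATER_MASS


if __name__ == "__main__":
    for peptide in tryptic_peptides("MKRPAKG"):
        print(peptide, round(monoisotopic_mass(peptide), 4))
```

Matching a list of such theoretical masses against measured masses is the basic idea of peptide mass fingerprinting.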

3.4.2.2. DNA -> Protein tools

Translate - translates a nucleotide sequence into a protein sequence.

Transeq - translates a nucleotide sequence into protein; from the EMBOSS package.

Graphical Codon Usage Analyser - displays codon bias in graphical form.

Codon bias refers to the fact that, in a given organism, one of the triplets coding for a particular amino acid is used more frequently than the other triplets coding for the same amino acid. Each species has its own pattern of codon bias.
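Codon bias can be made concrete by counting, for each amino acid, how often each of its synonymous codons is used in a coding sequence. The sketch below is a minimal illustration under stated assumptions: the function name `codon_fractions` is hypothetical, the standard genetic code is used, and the input is taken to be an in-frame coding sequence.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_fractions(cds):
    """Return {codon: fraction of its amino acid's codons} for a coding sequence."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)  # skip ambiguous codons
    per_amino_acid = Counter()
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {codon: n / per_amino_acid[CODON_TABLE[codon]] for codon, n in counts.items()}


if __name__ == "__main__":
    # Leucine is encoded three times by CTG and once by TTA in this toy sequence.
    for codon, fraction in sorted(codon_fractions("CTGCTGCTGTTAAAA").items()):
        print(codon, CODON_TABLE[codon], fraction)
```

A fraction well above 1/k for an amino acid with k synonymous codons indicates a preferred codon in that sequence.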

BCM search launcher - six-frame translation of a nucleotide sequence.

Backtranslation - translates a protein sequence back into a nucleotide sequence.

Reverse Translate - translates a protein sequence back into a nucleotide sequence.

Genewise - compares a protein sequence with a genomic DNA sequence, allowing for introns and frameshift errors.

FSED - frameshift error detection.

List of gene identification software sites

3.4.2.3. Similarity searches

BLAST and WU-BLAST - available in many versions of BLAST (Basic Local Alignment Search Tool):
- BLAST - the ExPASy network service
- BLAST - EMBnet-CH/SIB (Switzerland)
- BLAST - NCBI
- WU-BLAST - EMBL (Heidelberg)
- WU-BLAST and BLAST - EBI (Hinxton)
- BLAST - PBIL (Lyon)
- Fasta3 - FASTA version 3 at EBI
- MPsrch - Smith/Waterman sequence comparison at EBI
- PropSearch - structural homolog search
- Scanps - similarity search using Barton's algorithm
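The Smith/Waterman comparison behind tools such as MPsrch is a dynamic-programming local alignment. The sketch below computes only the best local alignment score with a simple linear gap penalty; the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) and the function name are assumptions for illustration, whereas real tools use substitution matrices such as BLOSUM and affine gap penalties.

```python
def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between two sequences."""
    previous = [0] * (len(seq_b) + 1)
    best = 0
    for i in range(1, len(seq_a) + 1):
        current = [0]
        for j in range(1, len(seq_b) + 1):
            substitution = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score = max(
                0,                            # start a new local alignment
                previous[j - 1] + substitution,  # align the two residues
                previous[j] + gap,            # gap in seq_b
                current[j - 1] + gap,         # gap in seq_a
            )
            current.append(score)
            best = max(best, score)
        previous = current
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Resetting negative scores to zero is what makes the alignment local: the best-scoring matching region is found even when the flanking regions are unrelated.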
