
BIOINFORMATICS PRACTICAL COURSE (The website system http://www.ncbi.nlm.nih.gov/)
-----oOo-----

1. Main contents of NCBI:

1.1. Introduction to NCBI

The NCBI website (National Center for Biotechnology Information, http://www.ncbi.nlm.nih.gov/) is one of the leading websites for the life sciences and medicine. NCBI belongs to the U.S. National Library of Medicine (NLM), which is in turn part of the U.S. National Institutes of Health (NIH). Most parts of the NCBI system are open databases that can be accessed free of charge over the Internet.

NCBI was built to give users access to a huge body of data (documents, reports, studies, and the sequences and structures of biological molecules) for research and study. It is equipped with the powerful Entrez search system, which retrieves results from the databases very quickly. Most of NCBI's search and comparison tools are based on Entrez. The NCBI databases keep growing thanks to contributions from scientists, academies, and research institutes around the world.

1.2. The NCBI home page

The NCBI home page is available at http://www.ncbi.nlm.nih.gov/.

Figure 1: The NCBI home page (screenshot taken 24/11/2007)

The NCBI home page provides links to the main content on NCBI. Its components can be explored at http://www.ncbi.nlm.nih.gov/ or on the CD that accompanies this document.

Note: Any browser can be used to access http://www.ncbi.nlm.nih.gov/. However, only Internet Explorer (IE) fully displays the web pages on the accompanying CD. Move the mouse over a component on a page to see its explanation. The links on the CD simulate online access to NCBI and add an explanation of each interface component. The Vietnamese explanations of the components work only on the accompanying CD. If IE is not the default browser, right-click the file NCBI HomePage, choose Open with, then choose Internet Explorer.

Figure 2: Using the IE browser to browse the documents on the CD



From the NCBI home page, every important part of the NCBI system can be reached.

1.3. Main page of the PubMed database (http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed)

To reach the PubMed page, choose PubMed on the NCBI home page.

Figure 3: The PubMed page (screenshot taken 29/11/2007)

The PubMed page offers several ways to search PubMed's huge database for the documents, journals, reports, and publications in the life sciences and medicine that you need. The components of the PubMed page can be explored at http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed.

1.4. Main page of the BLAST tool (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi)

To reach the BLAST page, choose BLAST on the NCBI home page.

Figure 4: The BLAST page (screenshot taken 29/11/2007)

The BLAST page provides a fairly simple but powerful toolset. It lets users search for and compare sequences of biological molecules held in the databases (nucleotide, protein, ...), using sequences that the user enters as the search condition. The toolset comprises many different tools for different search and comparison purposes and for different user levels, from basic (Basic BLAST) to advanced (Specialized BLAST). It is built on the huge sequence database and the powerful Entrez search system that NCBI maintains. The components of the BLAST page can be explored at http://www.ncbi.nlm.nih.gov/blast/Blast.cgi or on the CD that accompanies this document.

Note: BLAST uses the FASTA format (Rapid and sensitive protein similarity searches) for the query sequence that is entered as the condition for searching or comparing sequences. The structure of the FASTA format is as follows:

>Name (or ID)| (other descriptions may be added)
Sequence as plain, unformatted text characters
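The FASTA layout described above can be sketched in a few lines of Python. This is only an illustration, not part of the NCBI tools: the function names `format_fasta` and `parse_fasta`, the record name `query1`, and the 60-character line width are assumptions chosen for the example. The sequence used is the short protein fragment from exercise 2.

```python
def format_fasta(name, sequence, description="", width=60):
    """Build one FASTA record: a '>' header line, then the sequence in fixed-width lines."""
    # Remove spaces and line breaks left over from copy and paste.
    cleaned = "".join(sequence.split()).upper()
    header = ">" + name + ("|" + description if description else "")
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return "\n".join([header] + lines)


def parse_fasta(text):
    """Return a list of (header, sequence) pairs read from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append("".join(line.split()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical record name and description; the sequence is the exercise 2 protein.
record = format_fasta("query1", "QGQTPLFPRI FGHEAAGIVESIGEGV",
                      "example protein fragment", width=10)
print(record)
```

The text printed by this sketch can be pasted into the BLAST query box as is; `parse_fasta` goes the other way and recovers the header and the unbroken sequence from such text.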

Figure 5: FASTA format of a protein sequence

1.5. Main page of the Structure database (database of the structures of biological molecules)

To reach the Structure page, choose Structure on the NCBI home page or go to http://www.ncbi.nlm.nih.gov/sites/entrez?db=Structure. The Structure database holds a large amount of information and tools that support the study of the structure of biological molecules such as proteins. Built by combining the database with the Entrez system, Structure provides fast searching of the structures of biological molecules.

Figure 6: Main page of the Structure database

The components of the Structure page can be explored at http://www.ncbi.nlm.nih.gov/sites/entrez?db=Structure or on the CD that accompanies this document.

1.6. Other databases

Besides the databases above, the NCBI home page links directly to other databases, including: PubMed, Protein, 3D Domains, UniGene, UniSTS, SNP, Conserved Domains, Journals, PMC, NCBI Web Site, MeSH, GEO Profiles, Nucleotide, GEO Datasets, Gene, HomoloGene, CancerChromosomes, NLM Catalog, PubChem BioAssay, PubChem Compound, PubChem Substance, GENSAT, Probe, Genome Project, OMIA, dbGaP, Protein Clusters, CoreNucleotide, EST, GSS, Genome, PopSet, All Databases, OMIM, Taxonomy, Books. They serve different research purposes.

Content can be searched directly from the home page or from any other page by entering a keyword and choosing the corresponding database. Alternatively, go directly to the main page of a database by choosing it from the drop-down menu and clicking Go.

Figure 7: Accessing the other databases on NCBI

2. Using the tools on NCBI:

This part gives brief instructions for using parts of the NCBI system for research and study. It covers the tools for searching documents and for searching and comparing the sequences and structures of biological molecules.

2.1. Searching in NCBI

Apart from BLAST searches, almost all NCBI databases are searched by keyword. A keyword query can combine:
+ words or phrases that describe the content sought;
+ ID numbers (the accession number of the document or sequence sought);
+ logical operators (AND, OR, NOT, ...);
+ special tags that restrict the search to a given field (for example, [AB] searches the abstract; see the NCBI handbooks for more on field tags).

Example: the query (("drought tolerance") AND (Wilson SD[Auth])) AND (ecology[Jour]) finds all documents that contain the phrase drought tolerance, whose author is Wilson SD, and that were published in the journal Ecology.

How to enter keywords and find results can be learned on the NCBI site or through the simulation on the accompanying CD (the default test keyword on the CD is drought tolerance). In practice, other keywords on NCBI are searched and return results in the same way.
+ Simulated search in Entrez
+ Simulated search in PubMed
+ Simulated search in Nucleotide
+ Simulated search in Structure
+ Simulated search in All Databases
+ Simulated search in Protein

2.2. Searching with the PubMed database:

The PubMed database is used as follows:

2.2.1. Method 1: Using PubMed through the NCBI home page

Step 1: Select PubMed as the search area.
Step 2: Enter the search condition (keywords) in the query box and click Go. For example, the keyword is drought tolerant.

Figure 8: Searching directly from the home page for the documents on drought-tolerance research currently held on NCBI

Step 3: Browse the results and choose how the results page presents its content. By default, each document found is shown as a summary (Summary). The content shown for each document, and the number of documents shown per page, can be changed with the corresponding drop-down menus in the Display area. (Note: the components of the results page can be explored on the CD.)
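The same kind of keyword search can also be written down as a query string. The sketch below assembles a term with the [Auth] and [Jour] field tags from section 2.1 and places it in a URL for NCBI's E-utilities `esearch` service. E-utilities is not covered in this course, so its use here is an assumption made for illustration; the helper names `build_term` and `esearch_url` are hypothetical, and the code only builds the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Base address of the E-utilities search service (an assumption of this sketch;
# the course itself only uses the web forms).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(phrase, author=None, journal=None):
    """Combine a quoted phrase with optional author and journal field tags using AND."""
    parts = ['"' + phrase + '"']
    if author:
        parts.append(author + "[Auth]")
    if journal:
        parts.append(journal + "[Jour]")
    return " AND ".join("(" + part + ")" for part in parts)


def esearch_url(db, term, retmax=20):
    """Return a search URL; retmax limits how many record IDs are returned."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


term = build_term("drought tolerance", author="Wilson SD", journal="ecology")
print(term)
print(esearch_url("pubmed", "drought tolerant", retmax=5))
```

Here `retmax` plays a role similar to the number of documents per page chosen in the Display area; the URL can be opened in a browser to inspect the raw result.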

Figure 9: Part of the results found in PubMed

In addition, the Limits option can make the document search more precise.

2.2.2. Method 2: Working directly in the PubMed database

Step 2: From the home page, click the link to the PubMed page.
Step 3: Click the Limits tab.
Step 4: Enter the keyword for the content sought (for example, drought tolerant).

Figure 10: The page for limiting search conditions in PubMed

Step 5: Narrow the limits to find exactly the documents needed.
+ Click Add Author to search for documents by one author or a group of authors.
+ Click Add Journal to search for documents published in a given journal or publication.
+ Tick the corresponding boxes under Full Text, Free Full Text, and Abstracts to find documents with full text (Full Text), with full text that is free to access (Free Full Text), or with an abstract (Abstracts).

Figure 11: Three important selection areas that narrow the results found

+ Many other limits also narrow the scope of the keyword search, including publication date, subject area, language, ...
Step 6: Click Go to start the search.
Step 7: Choose how the results page presents its content.

2.3. Using the Entrez search tool

The Entrez search tool is the main, system-wide search engine for finding information in the databases of the NCBI system. To search with Entrez:

Step 1: Choose All Databases on the home page.
Step 2: Enter the keyword for the content sought in the search box and click Go.

Figure 12: Results of a keyword search on the All Databases page

Step 3: Choose among the results by clicking the database to be studied.

2.4. Using the BLAST search tool:

The BLAST page offers many different comparison tools, each serving a different purpose when searching for sequences in the gene bank and the sequence bank. Most of these tools work through the same steps:

Step 1: Go to the BLAST page.
Step 2: Click the tool to use (the interfaces and principles of the tools are quite similar; they differ only in function).

Figure 13: The nucleotide BLAST interface

Step 3: Enter an accession number or a sequence in the input area, or upload a prepared sequence file from the computer.
Step 4: Set the requirements, limits, or mode of operation for the tool, then click the BLAST button to run the search and comparison. The system then performs the search, and the user waits until the results are returned.
Step 5: If results are found, choose the presentation of the results that suits the purpose.

Specifying and limiting the conditions for a sequence comparison is very useful. Within the scope of this document, however, we cannot simulate in detail how the BLAST results change under different limits and conditions.

2.5. Exercises on searching with BLAST

2.5.1. Exercise 1:

Given the following sequence, use BLAST to search and compare against the databases currently held by NCBI (this is the sequence of the Cys2/His2 protein encoded by the ZPT2-3 gene, which plays an important role in drought tolerance; source: PubMed).

2.5.1.1. Protein query, protein results
MERHRCKLCSRSFMNGRALGGHMRSHLATLPLPLKKQKTPGNSNFQLGGGTESDSSSTR SEDENNNNNNNNNKLSSYELRDNPRKSVKALDPEFMDAGSIVVQDRESETESTQNPTRR RSKRASQRTSRQLEFEVPKKCKWVGSESAAESTPVSSVSDPSQDEEVALCLMMLSRDAW ERVEKEKSVEDTNESATELKTGLITRRPATRVAAKFKCLGCKKVFRTGRALAGHKASNK QCCHENSTSDDHVNVVGVKIFECPFCYKVFGSGQALGGHKRSHLLGLSSANNNNNNNN NNANVVASNNADRVGETTTTTTTTNTSFILDLNLPAPFEDDDEDDHI

Step 1: Go to the BLAST page.
Step 2: Choose protein blast.



Step 3: Copy the sequence above into the search box, leave the other options unchanged, and click the BLAST button to run the search.
Step 4: View the results and comment on them.

2.5.1.2. Protein query, translated nucleotide results

Step 1: Go to the BLAST page.
Step 2: Choose tblastn.
Step 3: Copy the protein sequence above into the search box, leave the other options unchanged, and click the BLAST button to run the search.
Step 4: View the results, click one of the sequences (the first one), and comment on the results.

2.5.1.3. Nucleotide query, nucleotide results:

ATGGAGAGACACAGATGCAAACTTTGTTCTAGGAGCTTTATGAATGGTAGAG
CATTGGGTGGTCATATGAGGTCTCATTTAGCTACTTTACCTCTTCCTCTTAAGA
AGCAAAAAACTCCTGGAAATTCAAATTTCCAACTCGGTGGTGGGACCGAGTC
CGACTCGTCCTCAACTCGTTCAGAAGACGAGAATAATAATAATAATAATAAT
AATAATAAACTGAGTTCGTACGAGTTGAGGGATAACCCAAGGAAGAGTGTTA
AGGCATTAGATCCCGAGTTTATGGATGCAGGGTCAATCGTTGTGCAAGACAG
GGAAAGCGAGACCGAGTCAACTCAGAACCCAACTCGGAGACGATCTAAGAG
GGCGAGTCAGAGGACGAGCCGGCAACTCGAGTTTGAAGTGCCGAAGAAATGT
AAATGGGTTGGGTCGGAGTCAGCCGCTGAATCGACCCCGGTCAGTTCCGTGTC
TGACCCGAGTCAGGATGAAGAGGTTGCACTTTGTCTTATGATGCTGTCTAGGG
ATGCTTGGGAGAGAGTTGAGAAGGAGAAGTCTGTTGAGGATACTAATGAGTC
GGCGACCGAGTTGAAGACGGGTTTAATAACACGTCGTCCTGCAACTCGTGTG
GCCGCAAAATTCAAGTGTTTGGGATGTAAAAAAGTGTTCAGGACAGGCAGGG
CACTAGCTGGGCATAAGGCGTCTAATAAACAATGTTGCCATGAAAATTCGAC
AAGTGATGATCATGTTAATGTGGTGGGAGTAAAAATATTTGAATGCCCGTTTT
GTTATAAGGTTTTTGGGTCGGGTCAAGCTTTGGGAGGTCATAAAAGATCACAC
CTTTTAGGGTTGTCATCGGCTAACAACAACAACAACAACAACAATAATAATG
CTAATGTTGTTGCATCTAACAATGCTGATAGAGTTGGTGAAACTACCACTACT
ACGACTACTACTAATACTAGCTTTATTTTGGATCTCAACTTGCCTGCACCGTTT
GAAGATGATGATGAGGACGATCATATATAG

Step 1: Go to the BLAST page.
Step 2: Choose nucleotide blast.
Step 3: Copy the sequence above into the search box. In the Choose Search Set area, select the database to compare against. Leave the other options unchanged and click the BLAST button to run the search.

Step 4: View the results and comment on them.

2.5.1.4. Translated nucleotide query, protein results

Step 1: Go to the BLAST page.
Step 2: Choose blastx.
Step 3: Copy the sequence above into the search box. Leave the other options unchanged and click the BLAST button to run the search.
Step 4: View the results and comment on them.

2.5.1.5. Translated nucleotide query, translated nucleotide results

Step 1: Go to the BLAST page.
Step 2: Choose tblastx.
Step 3: Copy the sequence above into the search box. Leave the other options unchanged and click the BLAST button to run the search.
Step 4: View the results and comment on them.

2.5.1.6. Searching the genome databases of one organism or a group of organisms

Step 1: Choose one of the genome databases for the organism to be studied (the example on the accompanying CD simulates the results for the Microbes group).
Step 2: Copy the protein sequence above into the search box, and specify whether the query is DNA or protein (protein in this example).
Step 3: Specify which organisms' genome databases to compare against. To select all of them, click Select All.
Step 4: Click BLAST to run the comparison.
Step 5: On the next page, click the button provided to view the results.
Step 6: View the comparison and search results, and comment on them.
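Sections 2.5.1.1 to 2.5.1.5 differ only in the kind of query and the kind of data it is compared with. The lookup below restates that choice of tool as a small table in Python. It is a summary sketch of the exercise, and the name `choose_program` and the three labels for the kinds of data are assumptions made for this example.

```python
# (kind of query, kind of database entries) -> BLAST tool used in exercise 1
BLAST_PROGRAMS = {
    ("protein", "protein"): "blastp",                                   # 2.5.1.1
    ("protein", "translated nucleotide"): "tblastn",                    # 2.5.1.2
    ("nucleotide", "nucleotide"): "blastn",                             # 2.5.1.3
    ("translated nucleotide", "protein"): "blastx",                     # 2.5.1.4
    ("translated nucleotide", "translated nucleotide"): "tblastx",      # 2.5.1.5
}


def choose_program(query_kind, database_kind):
    """Return the BLAST tool for a query/database pairing, or raise ValueError."""
    key = (query_kind.strip().lower(), database_kind.strip().lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError("no BLAST tool listed for %s against %s" % key)
    return BLAST_PROGRAMS[key]


print(choose_program("protein", "translated nucleotide"))
```

"Translated nucleotide" means that the nucleotide sequence is first translated into protein and the comparison is then made at the protein level.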

2.5.2. Exercise 2:

Given the following sequences, use BLAST to search and compare against the databases currently held by NCBI (these are sequence data for the alcohol dehydrogenase protein, which is related to drought tolerance; source: PubMed).

Protein

QGQTPLFPRIFGHEAAGIVESIGEGV

Nucleotide

GGTCTCGGAGTGGATCGATTTGGGATTCTGTTCGAAGATTTGCGG
AGGGGGGCAATGGCGACCGCGGGGAAGGTGATC

Follow the same steps as in exercise 1.

2.5.3. Exercise 3:

Given the following sequences, use BLAST to search and compare against the databases currently held by NCBI (these are sequence data related to drought tolerance in Arabidopsis thaliana and Oryza sativa; source: PubMed).

Protein

MEVEASYSYGFLPSGRHQPYAPPPPHPAEEGELWEYFPCPFCYIEVEVP
FICNHLQEEHCFDTRNAVCPLCADNIGRDMGAHFRVQHSHLLKRRKP
SRPSSSWPTPSNNSDPYFEGPPQYMMNNRTYQDPAPDPLLSQFICSMA
QTDTNSDNTNTEIAVSAVSHDQRLSQRVTLTDDASKLELKERLQRIEF
VKEIIMSTIL

Nucleotide

ATCTCTTCTATGTCTTTCACTCTCTCTCTCTCTCTATATCCAAAGAA
TTAAAAACCATAATAAAAAAAGAAGATAAAAAGACGAATTCCAGA
AAAAAAAGACGCAAAATCGTCGTCGTCGTCTTCGTCTTTGCATTTC
GTCGGAATCTTTTGATTCTTGAATCGGAATATCTCTGTTTTTGTTTT
ATCCGGACTCAAGATCAATTCGGATTCTTGGAATTTATTTGATTTTT
TGTTGTTGTTGTTGAAAAAGTGGATTCTTTGGTTTCGATTTGTAATA
ATCTTCGTAGAAAAAATGGATTCTGATTCTTGGAGTGATCGTCTCG
CATCGGCTTCAAGAAGATATCAGCTCGATTTCTTGTCTCGATCTGA
CAATTTCTTGGGGTTTGAGGAGATAGAAGGAGAAGATGACTTCAG
GGAGGAGTATGCTTGCCCGTTCTGTTCAGACTATTTTGATATCGTCT
CTCTATGCTGTCACATTGATGAAGATCATCCTATGGATGCAAAAAA
TGGGGTATGTCCCATTTGTGCGGTGAAAGTGAGCTCTGATATGATT
GCTCATATAACCCTACAACATGCAAATATGTTCAAGGTGACGCGG
AAAAGGAAATCAAGAAGAGGCGGGGCTCAATCCATGCTATCGATC
TTGAAGAGAGAGTTTCCTGATGGAAATTTTCAGAGCCTATTTGAAG
GAACATCACGTGCTGTATCCTCTTCTTCTGCCAGTATAGCTGCTGAT
CCTTTACTGTCTTCATTCATTTCACCAATGGCTGATGACTTTTTCAT
TTCTGAGTCAAGTCTATGTGCAGACACAAGTTCTGCCAAGAAAACA
TTGAATCAGAGTTTGCCTGAAAGGAATGTTGAGAAGCAGTCTCTTT
CAGCAGAGGATCACAGAGAAAAGTTGAAACAGAGCGAGTTCGTTC
AAGGGATATTGAGCTCTATGATTCTTGAAGACGGCTTATAAAGGA
GAAACATATTCCGGTAACTTGTCTACGGTTAGATGATTCAGCGATT
GGAAAATGTAACGGCTTTGTTGTGTTAAACGCAACAATTGATTTGG
GAAAGGGTTTGGGAATAAGCAGAATAATGTAAAAGAGAGACATG
ATAATGAGTATTATTTTTAACTTATGAACTACATATTTGCTTTAATG
AACACTCGAATTGTCTGTACACCGTAGGTCTTACAAAAAGAAACC
AAAAAAAGGTATGTATTTGATCATATATTTGCACTGAGTTTTCTGG
TCT

Follow the same steps as in exercise 1.
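The translated searches in the exercises (tblastn, blastx, tblastx) rest on translating nucleotide sequences into protein. The sketch below shows a single-frame translation with the standard genetic code, applied to the first 30 bases and the last 15 bases of the exercise 1 nucleotide sequence. It is a simplified illustration only: BLAST itself examines all six reading frames, and the function name `translate` and the use of X for unknown codons are assumptions of this example.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate one reading frame (0, 1 or 2) of a DNA string; '*' marks a stop codon."""
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        # Codons containing characters other than T, C, A, G become 'X'.
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATGGAGAGACACAGATGCAAACTTTGTTCT"))  # start of the exercise 1 sequence
print(translate("GACGATCATATATAG"))                 # end of the exercise 1 sequence
```

The first call gives MERHRCKLCS, which matches the start of the exercise 1 protein, and the second gives DDHI followed by a stop codon, which matches its end.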
