A. Gene annotation
Step 1. Accessing the EMBL database to retrieve the gene
Go to the EMBL database
Press Go button
Have a look at the different entry fields: detect the mRNA and CDS exons
Click on the Text Entry link to see the plain-text formatted output
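The exon coordinates of an mRNA or CDS feature can also be read programmatically from its location string in the plain-text entry. The following is a minimal Python sketch and not part of the original exercise: the function names and the example coordinates are hypothetical, and it only handles simple forward-strand `join(...)` locations (no `complement(...)`, single-base positions or remote references).

```python
import re


def parse_location(location):
    """Return (start, end) pairs (1-based, inclusive) from a simple
    EMBL location string such as 'join(120..245,980..1104)'.
    Partial markers ('<' and '>') are accepted and ignored."""
    pairs = re.findall(r"<?(\d+)\.\.>?(\d+)", location)
    return [(int(start), int(end)) for start, end in pairs]


def exon_and_intron_lengths(segments):
    """Return two lists: the length of every exon and of every intron
    lying between consecutive exons."""
    exons = [end - start + 1 for start, end in segments]
    introns = [nxt[0] - cur[1] - 1 for cur, nxt in zip(segments, segments[1:])]
    return exons, introns


# Hypothetical CDS location, NOT taken from the real EMBL entry.
example_cds = "join(120..245,980..1104,2300..2456)"

if __name__ == "__main__":
    segments = parse_location(example_cds)
    exon_lengths, intron_lengths = exon_and_intron_lengths(segments)
    print("exons:", segments)
    print("exon lengths:", exon_lengths)
    print("intron lengths:", intron_lengths)
```

Running the same functions on the mRNA and CDS locations of the entry makes it easy to see which exons are coding and which are untranslated.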
Figure 1. Signal, exons and genes predicted by geneid in the sequence HS307871
Step 3. Running other genefinders
Since several alternative programs are available to analyze a DNA sequence, we can run each of them and
look for the parts that their predictions have in common.
1. GENSCAN
2. FGENESH
3. GRAIL
NOTE: The first exon is missed in all the predictions, and the programs also have trouble detecting the donor
site of exon 5. Detecting start codons is a serious weakness of current gene-finding programs (see Figure 2).
However, this problem can be overcome by using homology information to complete the gene prediction.
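The comparison of the predictions from this step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exon coordinates below are invented (they are not the real output of GENSCAN, FGENESH or GRAIL for this sequence), the helper names are hypothetical, and two exons count as "the same" only when both boundaries match exactly.

```python
from collections import Counter


def exon_support(predictions):
    """Count how many programs predict each exact (start, end) exon."""
    counts = Counter()
    for exons in predictions.values():
        counts.update(set(exons))  # set(): count each program once per exon
    return counts


def shared_exons(predictions):
    """Return the exons predicted identically by every program."""
    n_programs = len(predictions)
    support = exon_support(predictions)
    return sorted(exon for exon, n in support.items() if n == n_programs)


# Hypothetical predictions (1-based, inclusive coordinates), for illustration only.
predictions = {
    "genscan": [(1200, 1350), (2100, 2260), (3400, 3525)],
    "fgenesh": [(1200, 1350), (2100, 2260), (3410, 3525)],
    "grail": [(1200, 1350), (2095, 2260), (3400, 3525)],
}

if __name__ == "__main__":
    print("predicted by all programs:", shared_exons(predictions))
    for exon, n in sorted(exon_support(predictions).items()):
        print(exon, "supported by", n, "program(s)")
```

Exons supported by only some of the programs usually differ at a single boundary, which is where the donor and acceptor sites deserve a closer look.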
Compare the annotations, the ab initio GRAIL prediction and the five predicted alternatively spliced variants
Connect to the FGENESH-C server (in the Gene finding with similarity menu)
Notice that the predicted gene will necessarily be supported by homology information, so it will likely be
mapped only in the genomic region overlapping your EST query.
Paste the genomic sequence and press the Blast! and then the Format! buttons
Select the first protein and display its FASTA sequence. As expected, it is the real protein annotated in
the genomic sequence.
Open the genewise web server to predict the best gene structure using this protein
Paste both the protein and the genomic sequences and run the program
Compare the predicted gene (end of the file) with the annotations: look for splice sites within the introns to
check that the exon boundaries are correct
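The splice-site check can also be done programmatically once the genomic sequence and the exon coordinates are at hand. The sketch below is an assumption-laden illustration, not the method used by genewise: it works on the forward strand only, the function names and the toy sequence are hypothetical, and it simply reports whether each intron starts with GT and ends with AG, the dinucleotides found at the ends of most introns.

```python
def intron_splice_sites(sequence, exons):
    """For each intron between consecutive exons, return the pair
    (donor, acceptor): the first two and the last two intron bases.
    Exon coordinates are 1-based and inclusive, forward strand only."""
    seq = sequence.upper()
    sites = []
    for (_, exon_end), (next_start, _) in zip(exons, exons[1:]):
        donor = seq[exon_end:exon_end + 2]             # bases right after the exon
        acceptor = seq[next_start - 3:next_start - 1]  # bases right before the next exon
        sites.append((donor, acceptor))
    return sites


def is_canonical(donor, acceptor):
    """True when the intron follows the common GT...AG pattern."""
    return donor == "GT" and acceptor == "AG"


# Toy sequence built for illustration only: exon, intron, exon.
toy_sequence = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
toy_exons = [(1, 6), (19, 24)]

if __name__ == "__main__":
    for donor, acceptor in intron_splice_sites(toy_sequence, toy_exons):
        print(donor, acceptor, "canonical" if is_canonical(donor, acceptor) else "check boundary")
```

An intron that does not show the expected dinucleotides is a hint that one of the flanking exon boundaries may be shifted by a few bases.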
Homo sapiens
Ovis aries
Mus musculus
Rattus norvegicus
Danio rerio
Drosophila melanogaster
Drosophila virilis
Saccharomyces cerevisiae
Schizosaccharomyces pombe
Check the results. It seems that the first exon has not been detected even when using homology information.
This is because BLAST programs require a minimal word length to seed an alignment, which a very short exon may
not provide.
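The effect of the word length can be illustrated with a small calculation. This is a hedged sketch, not a description of BLAST internals: the function name is hypothetical, the word lengths used in the example are illustrative values rather than the defaults of any particular BLAST program or version, and the exon lengths are invented. It only counts how many complete words fit inside an exon; an exon with no complete word cannot provide a seed on its own.

```python
def max_full_words(exon_length_nt, word_length, protein=True):
    """Upper bound on the number of complete words inside an exon.

    For protein-level searches the exon is measured in complete codons
    (exon_length_nt // 3); for nucleotide searches it is measured in bases.
    """
    units = exon_length_nt // 3 if protein else exon_length_nt
    return max(0, units - word_length + 1)


if __name__ == "__main__":
    # Hypothetical exon lengths (in nucleotides) and an illustrative word length.
    for length in (7, 12, 150):
        words = max_full_words(length, word_length=3, protein=True)
        status = "can seed a hit" if words > 0 else "too short to seed a hit"
        print(f"exon of {length} nt: {words} protein word(s), {status}")
```

Even an exon long enough to contain a word can still be missed if the resulting alignment scores below the reporting threshold, so this check is a lower bar rather than a guarantee.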
Select the blat link to locate the genomic coordinates of our sequence
Compare the graphical annotation with the EMBL entry of the gene
F. Results
Here you can find the solutions to every exercise:
EMBL annotation
EMBL annotation (plain text)
FASTA sequence
geneid results: signals
geneid results: exons
F. Bibliography
1. J.F. Abril and R. Guigó. gff2ps: visualizing genomic annotations. Bioinformatics 16:743-744 (2000).
2. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. Basic local alignment search tool. J. Mol.
Biol. 215:403-410 (1990).
3. Burge, C. and Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol.
268, 78-94 (1997).
4. E. Blanco, G. Parra and R. Guigó. Using geneid to Identify Genes. In A. D. Baxevanis and D. B. Davison,
chief editors: Current Protocols in Bioinformatics. Volume 1, Unit 4.3. John Wiley & Sons Inc., New York.
ISBN: 0-471-25093-7 (2002).
5. G. Parra, E. Blanco, and R. Guigó. Geneid in Drosophila. Genome Research 10:511-515 (2000).
6. Asaf A. Salamov and Victor V. Solovyev. Ab initio Gene Finding in Drosophila Genomic DNA. Genome
Res. 10:516-522 (2000).
7. Yeh, R.-F., Lim, L. P. and Burge, C. B. Computational inference of homologous gene structures in the
human genome. Genome Res. 11: 803-816 (2001).
8. D. Hyatt, J. Snoddy, D. Schmoyer, G. Chen, K. Fischer, M. Parang, I. Vokler, S. Petrov, P. Locascio, V. Olman,
Miriam Land, M. Shah, and E. Uberbacher. Improved Analysis and Annotation Tools for Whole-Genome
Computational Annotation and Analysis: GRAIL-EXP Genome Analysis Toolkit and Related Analysis