Faculty of Science
Institute of Biological Science
Bioinformatics SHGB 6111
Assignment
FEBRUARY, 2010
Q1) Below is the protein sequence of the DNA-directed RNA polymerase alpha subunit/40 kD subunit
in FASTA format. The data are for the following organisms:
Archaeoglobus fulgidus
Thermoplasma acidophilum
Thermoplasma volcanium
Mycobacterium tuberculosis
Mycobacterium leprae
Haemophilus influenzae
SEQUENCE RETRIEVAL
>sp|O28002|RPOD_ARCFU DNA-directed RNA polymerase subunit D OS=Archaeoglobus fulgidus
GN=rpoD PE=3 SV=1
MMPEIEILEEKDFKIKFILKNASPALANSFRRAMKAEVPAMAVDYVDIYLNSSYFYDEVI
AHRLAMLPIKTYLDRFNMQSECSCGGEGCPNCQISFRLNVEGPKVVYSGDFISDDPDVVF
AIDNIPVLELFEGQQLMLEAVARLGTGREHAKFQPVSVCVYKIIPEIVVNENCNGCGDCI
EACPRNVFEKDGDKVRVKNVMACSMCGECVEVCEMNAISVNETNNFLFTVEGTGALPVRE
VMKKALEILRSKAEEMNKIIEEIQ
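Retrieved FASTA records like the one above are usually read into memory before alignment. The following is only a minimal sketch in Python, assuming each header fits on a single line; the function name parse_fasta is hypothetical, and the example uses just the first two sequence lines of the A. fulgidus record.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence lines are joined later
    return {h: "".join(parts) for h, parts in records.items()}


# Excerpt only: header shortened to one line, first two sequence lines kept.
example = """>sp|O28002|RPOD_ARCFU DNA-directed RNA polymerase subunit D
MMPEIEILEEKDFKIKFILKNASPALANSFRRAMKAEVPAMAVDYVDIYLNSSYFYDEVI
AHRLAMLPIKTYLDRFNMQSECSCGGEGCPNCQISFRLNVEGPKVVYSGDFISDDPDVVF
"""

records = parse_fasta(example)
for header, sequence in records.items():
    print(header.split()[0], len(sequence))
```

A dictionary keyed by header keeps each sequence tied to its organism, which helps when sequences are later truncated or compared.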
1a) Use ClustalW to align these sequences; choose PHYLIP format for the output.
ALIGNMENT
6 356
tuberculos -----MLISQ RPTLSEDVLT DNRSQFVIEP LEPGFGYTLG NSLRRTLLSS
leprae -----MLISQ RPTLSEDILT DNRSQFVIEP LEPGFGYTLG NSLRRTLLSS
influenzae MQGSVTEFLK PRLVDIEQIS STHAKVILEP LERGFGHTLG NALRRILLSS
acidophilu ---------- -MDVKIFKLS DKYIRFEIDG ITP----SQA NALRRTLIND
volcanium ---------- -MEVKIFKLS DKYMSFEIDG ITP----SQA NALRRTLIND
falgidus ---------M MPEIEILEEK DFKIKFILKN ASP----ALA NSFRRAMKAE
AETEQL
AETEQL
------
------
------
------
---------- ----
---------- ----
---------- ----
ILDVETKNSI SPRD
ILDVETKSSI TP--
VIELETN--- ----
Table II: Score table of multiple alignment in ClustalW2 using half sequence of
Figure II: Jalview NJ tree and BLOSUM distance of the RNA polymerase alpha subunit/40 kD half sequences
Figure III: NJ tree of the mixed sequences showing BLOSUM62 calculated distances in Jalview
Tables I and II above present the multiple alignment scores for the full sequences and the half sequences
respectively. M. tuberculosis and M. leprae share the highest score, 95, in both tables. For the other microbes,
the relative positions remain the same in both tables, but the alignment scores differ slightly. This shows that
uniform deletion from related sequences gives nearly the same scores. However, when the deletion is random, or
when sequences are mixed between one species and another as in table III above, both the scores and the
lineage relationships are clearly reduced.
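To make the idea of an alignment score concrete, the following Python sketch computes percent identity between two aligned sequences, skipping any column that contains a gap. It is an illustration only: the function name percent_identity is hypothetical, the input is just the first 50 alignment columns of M. tuberculosis and M. leprae shown earlier, and ClustalW derives its own scores from pairwise alignments, so the values here are not expected to reproduce the score tables.

```python
def percent_identity(a, b):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # gap columns are not counted
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# First 50 columns of the PHYLIP alignment above (spaces removed).
tuberculosis = "-----MLISQRPTLSEDVLTDNRSQFVIEPLEPGFGYTLGNSLRRTLLSS"
leprae = "-----MLISQRPTLSEDILTDNRSQFVIEPLEPGFGYTLGNSLRRTLLSS"

full_score = percent_identity(tuberculosis, leprae)

# Uniform truncation: keep the same leading columns of both sequences.
half_score = percent_identity(tuberculosis[:25], leprae[:25])

print(round(full_score, 2), round(half_score, 2))
```

Truncating both sequences at the same column keeps the remaining columns comparable, which is why a uniform deletion tends to shift the score only slightly.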
When we consider the NJ trees constructed from the sequence alignments, i.e. figures I, II and III, we
notice that uniform deletion of half of each sequence does not destroy the lineage relationships in the NJ
tree when figures I and II are compared. However, the BLOSUM62 calculated distances vary considerably,
which repositions the lineage group from the top of the tree in figure I to the bottom in figure II.
The NJ tree in figure I shows that A. fulgidus is more closely related to the Thermoplasma strains than to
the Mycobacterium strains or H. influenzae. This agrees with what is known about A. fulgidus,
T. acidophilum and T. volcanium, which are all thermophilic archaea. Likewise, the Mycobacterium strains
are more closely related to H. influenzae; all of these are human pathogens, a trait that may have evolved
along their evolutionary course.
Comparing figures I and II carefully, we see that the lineage group is repositioned as a result of the
half-sequence deletion: the BLOSUM62 distance values change, and with them the position of the lineage
group in the NJ tree. For example, in figure I, where the full retrieved sequences were used, A. fulgidus
joins the Thermoplasma strains at a BLOSUM62 distance of 763.00. After half of each sequence was deleted
(figure II), the distance changed to 422.94. The taxa are still grouped together, but at a smaller distance
than in figure I, and this lower BLOSUM62 value moves the lineage group from the top of the NJ tree in
figure I to the bottom in figure II.
In figure III, where the retrieved sequences were mixed among related species in no particular order, both
the NJ tree and the BLOSUM62 distances differ completely from those in figures I and II. M. tuberculosis is
now placed in the lineage of Archaeoglobus and Thermoplasma, while M. leprae, the species most closely
related to M. tuberculosis, is placed as an outgroup. Hence, random changes in sequences can easily change
the derived phylogenetic information and lead to wrong conclusions.
In conclusion:
Uniform sequence deletion (a deletion made in order, such as deleting the last two or three lines of
each sequence) among related phylogenies may not change the lineage grouping in the NJ tree, but the
alignment scores and BLOSUM62 distance values may differ, which may change the position of the
lineage groups in the NJ tree, as depicted in figures I and II.
Random deletion or random mixing of the sequences gives entirely different information, which may lead to
wrong conclusions, as seen in figure III.
The NJ tree is quite useful for depicting phylogenetic relationships: the tree successfully depicted the
relationships I had expected, between Thermoplasma and Archaeoglobus and between the Mycobacterium
strains and H. influenzae.
2.
Sulfolobus solfataricus gene for 16S ribosomal RNA
>gi|47609|emb|X03235.1| Sulfolobus solfataricus gene for 16S ribosomal RNA
ATTCCGGTTGATCCTGCCGGACCCGACCGCTATCGGGGTAGGGATAAGCCATGGGAGTCTTACACTCCCG
GGTAAGGGAGTGTGGCGGACGGCTGAGTAACACGTGGCTAACCTACCCTCGGGACGGGGATAACCCCGGG
AAACTGGGGATAATCCCCGATAGGGAAGGAGTCCTGGAATGGTTCCTTCCCTAAAGGGCTATAGGCTATT
TCCCGTTTGTAGCCGCCCGAGGATGGGGCTACGGCCCATCAGGCTGTCGGTGGGGTAAAGGCCCACCGAA
CCTATAACGGGTAGGGGCCGTGGAAGCGGGAGCCTCCAGTTGGGCACTGAGACAAGGGCCCAGGCCCTAC
GGGGCGCACCAGGCGCGAAACGTCCCCAATGCGCGAAAGCGTGAGGGCGCTACCCCGAGTGCCTCCGCAA
GGAGGCTTTTCCCCGCTCTAAAAAGGCGGGGGAATAAGCGGGGGGCAAGTCTGGTGTCAGCCGCCGCGGT
AATACCAGCTCCGCGAGTGGTCGGGGTGATTACTGGGCCTAAAGCGCCTGTAGCCGGCCCACCAAGTCGC
CCCTTAAAGTCCCCGGCTCAACCGGGGAACTGGGGGCGATACTGGTGGGCTAGGGGGCGGGAGAGGCGGG
GGGTACTCCCGGAGTAGGGGCGAAATCCTTAGATACCGGGAGGACCACCAGTGGCGGAAGCGCCCCGCTA
GAACGCGCCCGACGGTGAGAGGCGAAAGCCGGGGCAGCAAACGGGATTAGATACCCCGGTAGTCCCGGCT
GTAAACGATGCGGGCTAGGTGTCGAGTAGGCTTAGAGCCTACTCGGTGCCGCAGGGAAGCCGTTAAGCCC
GCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTTAAAGGAATTGGCGGGGGAGCACCACAAGGGGTGGA
ACCTGCGGCTCAATTGGAGTCAACGCCTGGAATCTTACCGGGGGAGACCGCAGTATGACGGCCAGGCTAA
CGACCTTGCCTGACTCGCGGAGAGGAGGTGCATGGCCGTCGCCAGCTCGTGTTGTGAAATGTCCGGTTAA
GTCCGGCAACGAGCGAGACCCCCACCCCTAGTTGGTATTCTGGACTCCGGTCCAGAACCACACTAGGGGG
ACTGCCGGCGTAAGCCGGAGGAAGGAGGGGGCCACGGCAGGTCAGCATGCCCCGAAACTCCCGGGCCGCA
CGCGGGTTACAATGGCAGGGACAACGGGATGCTACCTCGAAAGGGGGAGCCAATCCTTAAACCCTGCCGC
AGTTGGGATCGAGGGCTGAAACCCGCCCTCGTGAACGAGGAATCCCTAGTAACCGCGGGTCAACAACCCG
CGGTGAATACGTCCCTGCTCCTTGCACACACCGCCCGTCGCTCCACCCGAGCGCGAAAGGGGTGAGGTCC
CTTGCGATAAGTGGGGGATCGAACTCCTTTCCCGCGAGGGGGGAGAAGTCGTAACAAGGTAGCCGTAGGG
GAACCTGCGGCTGGATCACCTCATATATTTACTCCCCCGCTAATTGGGTGGGAGGGCTTCACTAAAACTC
GTAATCTTCCCTTTTATAGATGCAGTTCTCCTCTTGGGCCAGAGGGGAATGAAGTGCCTAGGGCCCATTT
GGCAGAGACATACAAATATGTCTCTGCCAAGTTAGGGCTCAATGAGGCTAGTACTAGGTAGCCACATTAT
AGCCGTCTAGGAGTTCTACCCAGGGGCCGAAGCCTCCCGGTGGATGGCT
GAGGCGCGAAAGTGTGGGGAGCAAACCGGATTAGATACCCGGGTAGTCCACACCCTAAACGATGTACGTT
GGCTAAGCGCAGGATGCTGTGCTTGGCGAAGCTAACGCGATAAACGTACCGCCTGGGAAGTACGGCCGCA
AGGTTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAAC
GCGAAGAACCTTACCAGGTCTTGACATGCTAGGAACTTTGCAGAGATGCAGAGGTGCCCTTCGGGGAACC
TAGACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCG
CAACCCTTGCCTTTAGTTGTCAGCATTCAGTTGGACACTCTAGAGGGACTGCCTATGAAAGTAGGAGGAA
GGCGGGGATGACGTCTAGTCAGCATGGTCCTTACGTCCTGGGCGACACACGTGCTACAATGGGTAGGACA
ACGCGCAGCAAACCCGCGAGGGTAAGCGAATCGCTAAAACCTATCCCCAGTTCAGATCGGAGTCTGCAAC
TCGACTCCGTGAAGTTGGAATCGCTAGTAATCGCGGGTCAGCATACCGCGGTGAATACGTTCCCGGGCCT
TGTACACACCGCCCGTCACACCATGGGAGTAGATTGCAGTTGAAACCGCCGGGAGCTTTGCGGCAGGCGT
CTAGACTGTGGTTTATGACTGGGGTGAAGTCGTAACAAGGTAACTGTACCGGAAGGTGCGGCTGGA
ATCATGCCCCTTATGACCTGGGCTACACACGTGCTACAATGGACAGAACAAAGGGCAGCGAAACCGCGAG
GTTAAGCCAATCCCACAAATCTGTTCTCAGTTCGGATCGCAGTCTGCAACTCGACTGCGTGAAGCTGGAA
TCGCTAGTAATCGCGGATCAGCATGCCGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACA
CCACGAGAGTTTGTAACACCCGAAGTCGGTGAGGTAACCTTTTAGGAGCCAGCCGCCGAAGGTGGGACAG
ATGATTGGGGTGAAGTCGTAACAAGGTAGCCGTATCGGAAGGTGCGG
Sulfolobus -----ATTCCGGTTGATCCTGCCGGACCCG-ACCGCTATCGGGGTAGGGATAAGCCATGG 54
Thermococcus -----ATTCCGGTTGATCCTGCCGGAGGCGCACTGCTATGGGGGTCCGACTAAGCCATGC 55
radiodurans -------------------------AGGGTGAACGCTGGCGGCGT--GCTTAAGACATGC 33
Bacillus -----------------CCTGGCTCAGGACGAACGCTGGCGGCGT--GCCTAATACATGC 41
Thermus NTGGTGGAGAGTTTGATCCTGGCTCAGGGTGAACGCTGGCGGCGT--GCCTAAGACATGC 58
* * *** ** ** * *** ****
Table iv: ClustalW2 multiple alignment score table of ssrRNA gene sequences
2b)
Figure iv: NJ tree showing bootstrap values, drawn using MEGA4.1 phylogenetic analysis software
ORIGINAL NJ-TREE
Tree figure (image not reproduced). Tip labels: radiodurans, Bacillus subtilis, Thermus aquaticus,
Sulfolobus solfataricus, Thermococcus marinus. Bootstrap values shown: 67 and 100. Scale bar: 0.05.
BOOTSTRAP CONSENSUS TREE (after bootstrapping, the evolutionary distances computed using the
Maximum Composite Likelihood method appeared to decrease between radiodurans and B. subtilis
compared with their positions in the original NJ tree above)
Tree figure (image not reproduced). Tip labels: radiodurans, Bacillus subtilis, Thermus aquaticus,
Sulfolobus solfataricus, Thermococcus marinus. Bootstrap values shown: 67 and 100. Scale bar: 0.05.
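Bootstrap values such as those on the trees above come from resampling the alignment columns with replacement and rebuilding the tree from each replicate. The following Python sketch shows one resampling step only. It is an illustration under stated assumptions, not MEGA's implementation: the function name bootstrap_alignment is hypothetical, and the toy alignment is just the last 10 columns of the 16S alignment excerpt shown earlier.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns resampled with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all aligned sequences must have equal length")
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(length) for _ in range(length)]
    # The same drawn columns are applied to every sequence.
    return {n: "".join(alignment[n][c] for c in columns) for n in names}


# Toy data: last 10 columns of the alignment excerpt above.
alignment = {
    "Sulfolobus": "TAAGCCATGG",
    "Thermococcus": "TAAGCCATGC",
    "radiodurans": "TAAGACATGC",
    "Bacillus": "TAATACATGC",
    "Thermus": "TAAGACATGC",
}

replicate = bootstrap_alignment(alignment, random.Random(2010))
for name, sequence in replicate.items():
    print(name, sequence)
```

Repeating this many times, building a tree from each replicate, and counting how often a group reappears gives the percentage reported at that node.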
ORIGINAL MP TREE
(c) Discuss the biological relevance of your findings. Which of the resulting trees
appears to best reflect the actual evolution? Give a justification for your choice.
From the academic literature, we know that Sulfolobus solfataricus and Thermococcus
marinus are grouped as extremely thermophilic archaea sharing closely related features,
which may have arisen from the same evolutionary lineage. Bacillus subtilis is a
common rod-shaped Gram-positive bacterium. Deinococcus is a Gram-positive coccus, and Thermus
aquaticus is a non-motile, rod-shaped to filamentous microbe. Deinococcus has ornithine as
the diamino acid in its cell wall peptidoglycan. From the alignment results, both the NJ tree
(figure iv) and the MP tree (figure v) were generated using MEGA4.1 phylogenetic analysis
software. NJ is a distance-based method for tree building. For each pair of sequences it
counts the differences between them; this number is referred to as the evolutionary
distance. The actual tree is then computed from the matrix of distance values by a clustering
algorithm that repeatedly joins the pair of taxa whose grouping gives the shortest total
tree length, which is usually, but not always, the most similar pair. Maximum Parsimony, in
contrast, searches for the tree that requires the smallest number of changes to explain the
differences observed among the taxa under study.
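The distance-based procedure described above can be sketched in Python in two parts: a pairwise distance (proportion of differing sites) and the neighbour-joining selection of the first pair to join. This is a minimal sketch, not MEGA's implementation. The function names p_distance and nj_first_join are hypothetical, the two sequences are a 10-column excerpt of the alignment above, and the five-taxon distance matrix uses made-up values for taxa a to e that are not taken from this assignment. Neighbour joining picks the pair that minimises Q(i,j) = (n-2)*d(i,j) - r(i) - r(j), where r is the sum of a taxon's distances to all others, so the pair chosen is not necessarily the pair with the smallest distance.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, ignoring columns that contain a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def nj_first_join(dist):
    """Return the pair of taxa minimising the neighbour-joining Q criterion."""
    taxa = list(dist)
    n = len(taxa)
    row_sum = {t: sum(dist[t][u] for u in taxa if u != t) for t in taxa}

    def q(i, j):
        return (n - 2) * dist[i][j] - row_sum[i] - row_sum[j]

    return min(combinations(taxa, 2), key=lambda pair: q(*pair))


# Pairwise distance on a 10-column excerpt of the 16S alignment.
d_excerpt = p_distance("TAAGCCATGG", "TAAGACATGC")

# Hypothetical distances (illustration only, not results from this assignment).
pair_distances = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
dist = {t: {} for t in "abcde"}
for (i, j), d in pair_distances.items():
    dist[i][j] = d
    dist[j][i] = d

first_pair = nj_first_join(dist)
print(d_excerpt, first_pair)
```

In this made-up matrix the smallest distance is between d and e, yet the Q criterion selects a and b first, because it also accounts for how far each taxon is from all the others.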
Based on the findings in figures iv and v above, the MP tree appears to resemble the
cladogram generated by ClustalW more closely. In the MP tree, Thermus aquaticus,
Thermococcus marinus and Sulfolobus solfataricus, which are all extreme thermophiles, are
rooted on the same branch. Bacillus subtilis and Deinococcus radiodurans, which are both
Gram-positive bacteria and have little in common with the other three organisms, are thus
chosen as the outgroup.
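The parsimony criterion used for the MP tree, the smallest number of changes needed to explain the data, can be illustrated with Fitch's algorithm, which counts the minimum changes for a given rooted binary tree. The sketch below is an illustration only, not MEGA's tree search. The function names fitch_score and parsimony_length are hypothetical, the data are a 10-column excerpt of the 16S alignment above, and the second topology is deliberately scrambled for comparison. A score on ten columns says nothing definite about the full alignment.

```python
def fitch_score(tree, states):
    """Minimum number of changes at one site on a rooted binary tree.

    tree: nested 2-tuples with leaf names (strings) at the tips.
    states: dict mapping each leaf name to its character at this site.
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        # No shared state: one change is needed at this node.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def parsimony_length(tree, alignment):
    """Sum of Fitch scores over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(length)
    )


alignment = {
    "Sulfolobus": "TAAGCCATGG",
    "Thermococcus": "TAAGCCATGC",
    "radiodurans": "TAAGACATGC",
    "Bacillus": "TAATACATGC",
    "Thermus": "TAAGACATGC",
}

# Topology resembling the grouping described for the MP tree.
tree_grouped = ((("Sulfolobus", "Thermococcus"), "Thermus"), ("radiodurans", "Bacillus"))
# Deliberately scrambled topology, for comparison only.
tree_scrambled = ((("Sulfolobus", "Bacillus"), "Thermus"), ("radiodurans", "Thermococcus"))

grouped_length = parsimony_length(tree_grouped, alignment)
scrambled_length = parsimony_length(tree_scrambled, alignment)
print(grouped_length, scrambled_length)
```

A full maximum parsimony analysis repeats this count over many candidate topologies and keeps the tree or trees with the lowest total.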