Faculty of Science
Institute of Biological Science
Bioinformatics SHGB 6111
Assignment

Name : AHMAD MOHAMED GUMEL

Matric no. : SGF 090001

Lecturer : Dr. Geok Yuam Annie Tan

FEBRUARY, 2010

Q1) Below are the protein sequences of the DNA-directed RNA polymerase alpha subunit/40 kD subunit
in FASTA format. The data are for the following organisms:

Archaeoglobus fulgidus
Thermoplasma acidophilum
Thermoplasma volcanium
Mycobacterium tuberculosis
Mycobacterium leprae
Haemophilus influenzae
SEQUENCE RETRIEVAL
>sp|O28002|RPOD_ARCFU DNA-directed RNA polymerase subunit D OS=Archaeoglobus fulgidus GN=rpoD PE=3 SV=1
MMPEIEILEEKDFKIKFILKNASPALANSFRRAMKAEVPAMAVDYVDIYLNSSYFYDEVI
AHRLAMLPIKTYLDRFNMQSECSCGGEGCPNCQISFRLNVEGPKVVYSGDFISDDPDVVF
AIDNIPVLELFEGQQLMLEAVARLGTGREHAKFQPVSVCVYKIIPEIVVNENCNGCGDCI
EACPRNVFEKDGDKVRVKNVMACSMCGECVEVCEMNAISVNETNNFLFTVEGTGALPVRE
VMKKALEILRSKAEEMNKIIEEIQ

>sp|Q9HJD9|RPOD_THEAC DNA-directed RNA polymerase subunit D OS=Thermoplasma acidophilum GN=rpoD PE=3 SV=1
MDVKIFKLSDKYIRFEIDGITPSQANALRRTLINDIPKLAIENVTFHHGEIRDSEGNVYD
SSLPLFDEMVAHRLGLIPLKTDLTMNFRDQCSCGGKGCSLCTVTYSINKLGPSTVFSSDL
QAVSHPDLIPVDGEIPIVKLGPKQAILITAEAILGTAKEHAKWQVTSGVSYKYHREFHVS
KKDFEDWQKLKGACPKSVMSETDTEIVFTDDFGCNDLNVLFESDGVNMIEDDSRFIFQFE
TDGSLTAKETLLYALNRLKDRWDILVESLSE

>sp|Q97B93|RPOD_THEVO DNA-directed RNA polymerase subunit D OS=Thermoplasma volcanium GN=rpoD PE=3 SV=1
MEVKIFKLSDKYMSFEIDGITPSQANALRRTLINDIPKLAIENVTFHHGEIRDAEGNVYD
SSLPLFDEMVAHRLGLIPLKTDLSLNFRDQCSCGGKGCSLCTVTYSINKIGPASVMSGDI
QAISHPDLVPVDPDIPIVKLGAKQAILITAEAILGTAKEHAKWQVTSGVAYKYHREFEVN
KKLFEDWAKIKERCPKSVLSEDENTIVFTDDYGCNDLSILFESDGVQIKEDDSRFIFHFE
TDGSLTAEETLSYALNRLMDRWGILVESLSE

>sp|P66701|RPOA_MYCTU DNA-directed RNA polymerase subunit alpha OS=Mycobacterium tuberculosis GN=rpoA PE=3 SV=1
MLISQRPTLSEDVLTDNRSQFVIEPLEPGFGYTLGNSLRRTLLSSIPGAAVTSIRIDGVL
HEFTTVPGVKEDVTEIILNLKSLVVSSEEDEPVTMYLRKQGPGEVTAGDIVPPAGVTVHN
PGMHIATLNDKGKLEVELVVERGRGYVPAVQNRASGAEIGRIPVDSIYSPVLKVTYKVDA
TRVEQRTDFDKLILDVETKNSISPRDALASAGKTLVELFGLARELNVEAEGIEIGPSPAE
ADHIASFALPIDDLDLTVRSYNCLKREGVHTVGELVARTESDLLDIRNFGQKSIDEVKIK
LHQLGLSLKDSPPSFDPSEVAGYDVATGTWSTEGAYDEQDYAETEQL

>sp|Q9X798|RPOA_MYCLE DNA-directed RNA polymerase subunit alpha OS=Mycobacterium leprae GN=rpoA PE=3 SV=1
MLISQRPTLSEDILTDNRSQFVIEPLEPGFGYTLGNSLRRTLLSSIPGAAVTSIRIDGVL
HEFTTVPGVKEDVTEIILNLKGLVVSSEEDEPVTMYLRKQGPGEVTAGDIVPPAGVTLHN
PGMRIATLNDKGKIEAELVVERGRGYVPAVQNRALGAEIGRIPVDSIYSPVLKVTYKVDA
TRVEQRTDFDKLILDVETKSSITPRDALASAGKTLVELFGLARELNVEAEGIEIGPSPAE
ADHIASFALPIDDLDLTVRSYNCLKREGVHTVGELVSRTESDLLDIRNFGQKSIDEVKVK
LHQLGLSLKDSPDSFDPSEVAGYDVTTGTWSTDGAYDSQDYAETEQL

>sp|P43737|RPOA_HAEIN DNA-directed RNA polymerase subunit alpha OS=Haemophilus influenzae GN=rpoA PE=3 SV=1
MQGSVTEFLKPRLVDIEQISSTHAKVILEPLERGFGHTLGNALRRILLSSMPGCAVTEVE
IDGVLHEYSSKEGVQEDILEVLLNLKGLAVKVQNKDDVILTLNKSGIGPVVAADITYDGD
VEIVNPDHVICHLTDENASISMRIRVQRGRGYVPASSRTHTQEERPIGRLLVDACYSPVE
RIAYNVEAARVEQRTDLDKLVIELETNGALEPEEAIRRAATILAEQLDAFVDLRDVRQPE
IKEEKPEFXPILLRPVDDLELTVRSANCLKAETIHYIGDLVQRTEVELLKTPNLGKKSLT
EIKDVLASRGLSLGMRLENWPPASIAED
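
The retrieved records can also be read programmatically before alignment. The following is a minimal sketch, not part of the original workflow: it assumes each FASTA header occupies a single line beginning with ">", and the helper name `organism` is hypothetical (it simply reads the UniProt-style `OS=` field). The A. fulgidus record from above is used as input.

```python
from io import StringIO

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # blank lines carry no sequence information
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def organism(header):
    """Hypothetical helper: return the OS= field of a UniProt-style header."""
    if "OS=" not in header:
        return None
    return header.split("OS=", 1)[1].split(" GN=", 1)[0]

example = """>sp|O28002|RPOD_ARCFU DNA-directed RNA polymerase subunit D OS=Archaeoglobus fulgidus GN=rpoD PE=3 SV=1
MMPEIEILEEKDFKIKFILKNASPALANSFRRAMKAEVPAMAVDYVDIYLNSSYFYDEVI
AHRLAMLPIKTYLDRFNMQSECSCGGEGCPNCQISFRLNVEGPKVVYSGDFISDDPDVVF
AIDNIPVLELFEGQQLMLEAVARLGTGREHAKFQPVSVCVYKIIPEIVVNENCNGCGDCI
EACPRNVFEKDGDKVRVKNVMACSMCGECVEVCEMNAISVNETNNFLFTVEGTGALPVRE
VMKKALEILRSKAEEMNKIIEEIQ
"""

records = list(read_fasta(StringIO(example)))
for header, seq in records:
    print(organism(header), len(seq))
```

Joining the wrapped sequence lines into one string per record makes later steps, such as truncating or exchanging segments, simple string operations.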

1a) Use ClustalW to align these sequences; choose PHYLIP format for the output
ALIGNMENT

ClustalW2 multiple sequence alignment in PHYLIP output format

6 356
tuberculos -----MLISQ RPTLSEDVLT DNRSQFVIEP LEPGFGYTLG NSLRRTLLSS
leprae -----MLISQ RPTLSEDILT DNRSQFVIEP LEPGFGYTLG NSLRRTLLSS
influenzae MQGSVTEFLK PRLVDIEQIS STHAKVILEP LERGFGHTLG NALRRILLSS
acidophilu ---------- -MDVKIFKLS DKYIRFEIDG ITP----SQA NALRRTLIND
volcanium ---------- -MEVKIFKLS DKYMSFEIDG ITP----SQA NALRRTLIND
falgidus ---------M MPEIEILEEK DFKIKFILKN ASP----ALA NSFRRAMKAE

IPGAAVTSIR IDGVLHEFTT VPGVKEDVTE IILNLKSLVV SSEEDEPVTM
IPGAAVTSIR IDGVLHEFTT VPGVKEDVTE IILNLKGLVV SSEEDEPVTM
MPGCAVTEVE IDGVLHEYSS KEGVQEDILE VLLNLKGLAV KVQNKDDVIL
IPKLAIENVT FH--HGEIRD SEGNVYDSSL PLFDEMVAHR LGLIPLKTDL
IPKLAIENVT FH--HGEIRD AEGNVYDSSL PLFDEMVAHR LGLIPLKTDL
VPAMAVD--- ---------- -YVDIYLNSS YFYDEVIAHR LAMLPIKTYL

Y-LRKQGPGE VTAGDIVPPA GVTVHNPGMH IATLNDK-GK LEVELVVERG
Y-LRKQGPGE VTAGDIVPPA GVTLHNPGMR IATLNDK-GK IEAELVVERG
T-LNKSGIGP VVAADITYDG DVEIVNPDHV ICHLTDENAS ISMRIRVQRG
T-MNFRDQCS CGGKGCSLCT VTYSINKLGP STVFSSDLQA VSHPDLIP-V
S-LNFRDQCS CGGKGCSLCT VTYSINKIGP ASVMSGDIQA ISHPDLVP-V
DRFNMQSECS CGGEGCPNCQ ISFRLNVEGP KVVYSGD--F ISDDPDVVFA

RGYVPAVQNR ASG--AEIGR IPVDSIYSPV LKVTYKVDAT RVEQRTDFDK
RGYVPAVQNR ALG--AEIGR IPVDSIYSPV LKVTYKVDAT RVEQRTDFDK
RGYVPASSRT HTQEERPIGR LLVDACYSPV ERIAYNVEAA RVEQRTDLDK
DGEIPIVKLG PKQ----AIL ITAEAILGTA KEHAKWQVTS GVSYKYHREF
DPDIPIVKLG AKQ----AIL ITAEAILGTA KEHAKWQVTS GVAYKYHREF
IDNIPVLELF EGQ----QLM LEAVARLGTG REHAKFQPVS VCVYKIIPEI

LILDVETKNS ISPRDALASA GKTLVELFGL ARELNVEAEG IEIGPSPAEA
LILDVETKSS ITPRDALASA GKTLVELFGL ARELNVEAEG IEIGPSPAEA
LVIELETNGA LEPEEAIRRA ATILAEQLDA FVDLRDVRQP EIKEEKPEFX
HVSKKDFEDW QKLKGACPKS VMSETDTEIV FTDDFGCNDL NVLFES----
EVNKKLFEDW AKIKERCPKS VLSEDENTIV FTDDYGCNDL SILFES----
VVNEN-CNGC GDCIEACPRN VFEKDGDKVR VKNVMACSMC GECVEVCEMN

DHIASFALPI DDLDLTVRSY NCLKREGVHT VGELVARTES DLLDIRNFGQ
DHIASFALPI DDLDLTVRSY NCLKREGVHT VGELVSRTES DLLDIRNFGQ
---PILLRPV DDLELTVRSA NCLKAETIHY IGDLVQRTEV ELLKTPNLGK
-------DGV NMIEDDSRFI FQFETDGSLT AKETLLYALN RLKDRWDILV
-------DGV QIKEDDSRFI FHFETDGSLT AEETLSYALN RLMDRWGILV
---------A ISVNETNNFL FTVEGTGALP VREVMKKALE ILRSKAEEMN

KSIDEVKIKL HQLGLSLKDS PPSFDPSEVA GYDVATGTWS TEGAYDEQDY
KSIDEVKVKL HQLGLSLKDS PDSFDPSEVA GYDVTTGTWS TDGAYDSQDY
KSLTEIKDVL ASRGLSLGMR LENWPPASIA ED-------- ----------
ESLSE----- ---------- ---------- ---------- ----------
ESLSE----- ---------- ---------- ---------- ----------
KIIEEIQ--- ---------- ---------- ---------- ----------

AETEQL
AETEQL
------
------
------
------
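
The layout of the interleaved PHYLIP output above (a header line with the number of sequences and the alignment length, names cut to 10 characters in the first block, residues in groups of 10) can be reproduced with a short sketch. This is an illustration only, not ClustalW2's own code: the function name `to_phylip` is hypothetical, exact spacing conventions differ between tools, and the input is just the first 50 columns of the M. tuberculosis and M. leprae rows shown above.

```python
def to_phylip(alignment, width=50, group=10):
    """Format {name: aligned_sequence} as interleaved PHYLIP-style text."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all aligned sequences must have the same length")
    lines = [f"{len(names):>5} {length:>5}"]
    for start in range(0, length, width):
        for name in names:
            chunk = alignment[name][start:start + width]
            groups = " ".join(chunk[i:i + group] for i in range(0, len(chunk), group))
            # Names appear only in the first block, padded or cut to 10 characters.
            prefix = f"{name[:10]:<10} " if start == 0 else ""
            lines.append(prefix + groups)
        lines.append("")  # blank line between blocks
    return "\n".join(lines).rstrip() + "\n"

# First 50 alignment columns of two rows from the alignment above.
tb = "-----MLISQ RPTLSEDVLT DNRSQFVIEP LEPGFGYTLG NSLRRTLLSS".replace(" ", "")
lp = "-----MLISQ RPTLSEDILT DNRSQFVIEP LEPGFGYTLG NSLRRTLLSS".replace(" ", "")

text = to_phylip({"tuberculosis": tb, "leprae": lp}, width=30)
print(text)
```

The 10-character name limit explains why labels such as "tuberculos" and "acidophilu" appear truncated in the output.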

Table I: Score table of the multiple alignment using the full RNA polymerase alpha subunit/40 kD sequences

1b) Calculate a distance matrix using BLOSUM62 and draw an NJ tree


Figure I: NJ tree from Jalview (integrated with ClustalW2) showing BLOSUM62 distances
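
To make the distance-matrix and tree-building steps concrete, here is a rough sketch of neighbour joining in plain Python. It is not Jalview's implementation: as a stated substitution, it uses a simple p-distance (fraction of differing residues over columns where neither row has a gap) in place of the BLOSUM62-based distance, it uses only the first 50 columns of the full-length alignment, and it returns the tree topology without branch lengths. The function names are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of differing residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)

def neighbor_joining(names, dist):
    """Return a Newick string (topology only) built by neighbour joining.

    dist maps frozenset({a, b}) to the distance between taxa a and b.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 2:
        n = len(nodes)
        total = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}

        def q(pair):
            # NJ selection criterion: smaller values mark better pairs to join.
            i, j = pair
            return (n - 2) * d[frozenset(pair)] - total[i] - total[j]

        i, j = min(combinations(nodes, 2), key=q)
        joined = f"({i},{j})"
        for k in nodes:
            if k not in (i, j):
                # Distance from the new internal node to every remaining node.
                d[frozenset((joined, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - d[frozenset((i, j))]
                )
        nodes = [k for k in nodes if k not in (i, j)] + [joined]
    return f"({nodes[0]},{nodes[1]});"

# First block (50 columns) of the full-length alignment.
block = {
    "tuberculos": "-----MLISQ RPTLSEDVLT DNRSQFVIEP LEPGFGYTLG NSLRRTLLSS",
    "leprae":     "-----MLISQ RPTLSEDILT DNRSQFVIEP LEPGFGYTLG NSLRRTLLSS",
    "influenzae": "MQGSVTEFLK PRLVDIEQIS STHAKVILEP LERGFGHTLG NALRRILLSS",
    "acidophilu": "---------- -MDVKIFKLS DKYIRFEIDG ITP----SQA NALRRTLIND",
    "volcanium":  "---------- -MEVKIFKLS DKYMSFEIDG ITP----SQA NALRRTLIND",
    "fulgidus":   "---------M MPEIEILEEK DFKIKFILKN ASP----ALA NSFRRAMKAE",
}
rows = {name: row.replace(" ", "") for name, row in block.items()}
dist = {frozenset(p): p_distance(rows[p[0]], rows[p[1]]) for p in combinations(rows, 2)}
tree = neighbor_joining(list(rows), dist)
print(tree)
```

Because the distance measure and the input length differ from those used in Jalview, the numerical distances will not match the figure; only the general procedure is illustrated.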

1c) ALIGNMENT WITH DELETED SEQUENCES

PHYLIP output of the ClustalW2 multiple alignment after deletion of half of each sequence.
6 214
acidophilu ---------- -MDVKIFKLS DKYIRFEIDG ITP----SQA NALRRTLIND
volcanium ---------- -MEVKIFKLS DKYMSFEIDG ITP----SQA NALRRTLIND
falgidus ---------M MPEIEILEEK DFKIKFILKN ASP----ALA NSFRRAMKAE
tuberculos -----MLISQ RPTLSEDVLT DNRSQFVIEP LEPGFGYTLG NSLRRTLLSS
leprae -----MLISQ RPTLSEDILT DNRSQFVIEP LEPGFGYTLG NSLRRTLLSS
influenzae MQGSVTEFLK PRLVDIEQIS STHAKVILEP LERGFGHTLG NALRRILLSS

IPKLAIENVT FHHGEIRDSE GNVYDSSLPL FDEMVAHRLG LIPLKTDLT-
IPKLAIENVT FHHGEIRDAE GNVYDSSLPL FDEMVAHRLG LIPLKTDLS-
VPAMAVDYV- ---------- -DIYLNSSYF YDEVIAHRLA MLPIKTYLDR
IPGAAVTSIR IDGVLHEFTT VPGVKEDVTE IILNLKSLVV SSEEDEPVTM
IPGAAVTSIR IDGVLHEFTT VPGVKEDVTE IILNLKGLVV SSEEDEPVTM
MPGCAVTEVE IDGVLHEYSS KEGVQEDILE VLLNLKGLAV KVQNKDDVIL

MNFRDQCSCG GKGCSLCTVT YSINKLGPST VFSSDLQAVS HPDLIPVDGE
LNFRDQCSCG GKGCSLCTVT YSINKIGPAS VMSGDIQAIS HPDLVPVDPD
FNMQSECSCG GEGCPNCQIS FRLNVEGPKV VYSGDFISD- DPDVVFAIDN
YLRKQGPGEV TAGDIVPPAG VTVHNPGMHI ATLNDK-GKL EVELVVERGR
YLRKQGPGEV TAGDIVPPAG VTLHNPGMRI ATLNDK-GKI EAELVVERGR
TLNKSGIGPV VAADITYDGD VEIVNPDHVI CHLTDENASI SMRIRVQRGR

IPIVKLGPKQ ---------- ---------- ---------- ----------
IPIVKLGAK- ---------- ---------- ---------- ----------
IPVLELFEGQ QLMLEAVARL GT-------- ---------- ----------
GYVPAVQNRA SG--AEIGRI PVDSIYSPVL KVTYKVDATR VEQRTDFDKL
GYVPAVQNRA LG--AEIGRI PVDSIYSPVL KVTYKVDATR VEQRTDFDKL
GYVPASSRTH TQEERPIGRL LVDACYSPVE RIAYNVEAAR VEQRTDLDKL

---------- ----
---------- ----
---------- ----
ILDVETKNSI SPRD
ILDVETKSSI TP--
VIELETN--- ----
Table II: Score table of the ClustalW2 multiple alignment using half of each RNA polymerase alpha subunit/40 kD sequence


Figure II: Jalview NJ tree and BLOSUM62 distances for the half-length RNA polymerase alpha subunit/40 kD sequences

Multiple alignment after exchanging parts of the sequences between A. fulgidus and H. influenzae, T. acidophilum and T. volcanium, and M. tuberculosis and M. leprae

Table III: Score table of the ClustalW2 multiple alignment of the mixed sequences

ClustalW2 PHYLIP output of the multiple alignment of the mixed sequences.
5 751
sp|O28002| ---------- ---------- ---------- ---------- ----------
sp|P66701| ---------- ---------- ---------- ---------- ----------
sp|Q9X798| MLISQRPTLS EDILTDNRSQ FVIEPLEPGF GYTLGNSLRR TLLSSIPGAA
sp|Q9HJD9| ---------- ---------- ---------- ---------- ----------
sp|Q97B93| ---------- ---------- ---------- ---------- ----------

---------- ---------- ---------- ---------- ----------


---------- ---------- ---------- ---------- ----------
VTSIRIDGVL HEFTTVPGVK EDVTEIILNL KGLVVSSEED EPVTMYLRKQ


---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------

---------- ---------- ---------- ---------- ----------


---------- ---------- ---------- ---------- ----------
GPGEVTAGDI VPPAGVTLHN PGMLISQRPT LSEDVLTDNR SQFVIEPLEP
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------

---------- ---------- ---------- ---------- ----------


---------- ---------- ---------- ---------- ----------
GFGYTLGNSL RRTLLSSIPG AAVTSIRIDG VLHEFTTVPG VKEDVTEIIL
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------

---------- ---------- ---------- ---------- ---MMPEIEI


---------- ---------- ---------- ---------- ----MRIATL
NLKSLVVSSE EDEPVTMYLR KQGPGEVTAG DIVPPAGVTV HNPGMHIATL
---------- ---------- ---------- ---------- -----MDVKI
---------- ---------- ---------- ---------M NFRDQCSCGG

LEEKDFKIKF ILKNASPALA NSFRRAMKAE VPAMAVD--- ----------


NDKGKIEAEL VVERGRGYVP AVQNRALGAE IGRIPVDSIY SPVLKVTYKV
NDKGKLEVEL VVERGRGYVP AVQNRASGAE IGRIPVDSIY SPVLKVTYKV
FKLSDKYIRF EIDGITPSQA NALRRTLIND IPKLAIEN-- ----------
KGCSLCTVTY SINKLGPSTV FSSDLQAVSH PDLIPVDG-- ----------

---------Y VDIYLNSSYF YDEVIAHRLA MLPIKTYLDR FN--------


DATRVEQRTD FDKLILDVET KSSITPRDAL ASAGKTLVEL FG--------
DATRVEQRTD FDKLILDVET KNSISPRDAL ASAGKTLVEL FGSPPRPOAH
---------- VTFHHGEIRD SEGNVYDSSL PLFDEMVAHR LG--------
---------E IPIVKLGPKQ AILITAEAIL GTAKEHAKWQ VTSG------

--------MQ SECSCGGR-- ---GYVPASS RTHTQEERPI G---------


--------LA RELNVEAEGI -EIGPSPAEA DHIASFALPI DDLDLTVRSY
AEINDNA-DI RECTEDRNAP OLYMERASES UBUNITALPH AOSHAEMOPH
---------L IPLKTDLT-- ---MEVKIFK LSDKYMSFEI D---------
------VSYK YHREFHVSKK DFEDWQKLKG ACPKSVMSET DT--------

---------- ---------- ---------- ---------- ----------


NC-------- ---------- ---------- ---------- ----------
ILUSINFLUE NZAEGNRPOA PESVMQGSVT EFLKPRLVDI EQISSTHAKV
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------

------RLLV DACYSPVERI AYNVEAARVE QRTDLDKLVI ELETN-----


----LKREGV HTVGELVSRT ESDLLDIRNF GQKSIDEVKV KLHQLGLSLK
ILEPLERGFG HTLGNALRRI LLSSMPGCAV TEVEIDGVLH EYSSKEGVQE
-------GIT PSQANALRRT LINDIPKLAI ENVTFHHGEI RDAEG-----
-----EIVFT DDFGCNDLNV LFESDGVNMI EDDSRFIFQF ETDGSLTAKE

---GALEPEE -----AIRRA ATILAEQLDA FVDLRDVRQP E--------I


DSPDSFDPSE -----VAGYD VTTGTWSTDG AYDSQDYAET EQLLARELNV
DILEVLLNLK GLAVKVQNKD DVILTLNKSG IGPVVAADIT YDGDVEIVNP
---NVYDSSL PLFDEMVAHR LGLIPLKTDL SLNFRDQCSC G---------
TLLYALNRLK DRWDILVESL SEITAEAILG TAKEHAKWQV TSG-------

KEEKPEFXP- --------IL LRPVDDLELT VRSANCLKAE TIHYIGDLVQ


EAEGIEIGPS PAEADHIASF ALPIDDLDLT VRSYNCLKRE GVHTVGELVA
DHVICHLTDE NASISMRIRV QRGEGCPNCQ ISFRLNVEGP KVVYSGDFIS
---------- ---------- --GKGCSLCT VTYSINKIGP ASVMSGDIQA


---------- ---VAYKYHR EFEVNKKLFE DWAKIKERCP KSVLSEDENT

RTEVELLKTP NLGKKSLTEI KDVLASRGLS LGM--RLENW PPASIAED--


RTESDLLDIR NFGQKSIDEV KIKLHQLGLS LKD--SPPSF DPSEVAGYDV
DDPDVVFAID NIPVLELFEG QQLMLEAVAR LGTGREHAKF QPVSVCVYKI
ISHPDLVPVD PDIPIVKLGA KQAIL----- ---------- ----------
IVFTDDYGCN DLSILFESDG VQIKEDDSRF IFHFETDGSL TAEETLSYAL

---------- ---------- ---------- ---------- ----------


ATGTWSTEGA YDEQDYAETE QL-------- ---------- ----------
IPEIVVNENC NGCGDCIEAC PRNVFEKDGD KVRVKNVMAC SMCGECVEVC
---------- ---------- ---------- ---------- ----------
NRLMDRWGIL VESLSE---- ---------- ---------- ----------

---------- ---------- ---------- ---------- ----------


---------- ---------- ---------- ---------- ----------
EMNAISVNET NNFLFTVEGT GALPVREVMK KALEILRSKA EEMNKIIEEI
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------

-
-
Q
-
-
Figure III: NJ tree of the mixed sequences showing BLOSUM62 distances calculated in Jalview

Tables I and II above present the multiple-alignment scores for the full-length and half-length sequences,
respectively. M. tuberculosis and M. leprae share the same highest score of 95 in both tables. For the other
organisms the ranking is the same in both tables, although the alignment scores differ slightly. This shows that
uniform deletion from related sequences gives closely similar scores. However, when the deletion is random, or
when sequence segments are mixed between species as in Table III above, both the scores and the inferred
lineage relationships are clearly degraded.

Comparing the NJ trees built from the three alignments (Figures I, II and III), uniform deletion of half of each
sequence does not disrupt the lineage relationships: Figures I and II show the same groupings. The BLOSUM62
distances, however, change considerably, and the lineage groups move from the top of the tree in Figure I to the
bottom in Figure II. The tree in Figure I places A. fulgidus closer to the Thermoplasma strains than to the
Mycobacterium strains or H. influenzae. This agrees with what is expected, since A. fulgidus, T. acidophilum and
T. volcanium are all thermophilic archaea. Likewise, the Mycobacterium strains are more closely related to
H. influenzae; all three are bacteria and human pathogens, a trait that may have arisen during their evolution.

For example, in Figure I, where the full retrieved sequences were used, A. fulgidus joins the Thermoplasma
strains at a BLOSUM62 distance of 763.00. After half of each sequence was deleted (Figure II), this distance
falls to 422.94. The organisms remain grouped together, but the smaller distance value moves the group from the
top of the tree in Figure I to the bottom in Figure II.

In Figure III, where segments of the retrieved sequences were exchanged between related species without
preserving their order, both the NJ tree and the BLOSUM62 distances differ completely from those in Figures I
and II. M. tuberculosis is now grouped with the Archaeoglobus and Thermoplasma lineage, while M. leprae, its
closest relative, is placed as an outgroup. Random changes to the sequences can therefore easily alter the
derived phylogenetic information and lead to wrong conclusions.

In conclusion:

• Uniform sequence deletion (deletion in a consistent order, such as removing the last two or three lines of
each sequence) among related organisms may not change the lineage grouping in the NJ tree. The alignment
scores and BLOSUM62 distances may still differ, which can change where the lineage groups are positioned
in the tree, as shown in Figures I and II.
• Random deletion or random mixing of the sequences gives entirely different information, which may lead to
wrong conclusions, as seen in Figure III.
• The NJ tree is useful for depicting phylogenetic relationships: the relationships I expected, between
Thermoplasma and Archaeoglobus and between the Mycobacterium strains and H. influenzae, were
successfully depicted.
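
The two kinds of manipulation compared above can be sketched as simple string operations on the unaligned sequences. This is only an illustration under assumptions: the report does not state exactly where the sequences were cut or which segments were exchanged, so the midpoint cut and the function names `truncate_uniform` and `swap_tails` are hypothetical. The inputs are the first 60 residues of the A. fulgidus and H. influenzae records.

```python
def truncate_uniform(seqs, fraction=0.5):
    """Keep the same leading fraction of every sequence (uniform deletion)."""
    return {name: s[: max(1, int(len(s) * fraction))] for name, s in seqs.items()}

def swap_tails(seqs, a, b, fraction=0.5):
    """Exchange the trailing parts of sequences a and b (one way of 'mixing')."""
    cut_a = int(len(seqs[a]) * fraction)
    cut_b = int(len(seqs[b]) * fraction)
    mixed = dict(seqs)
    mixed[a] = seqs[a][:cut_a] + seqs[b][cut_b:]
    mixed[b] = seqs[b][:cut_b] + seqs[a][cut_a:]
    return mixed

# First 60 residues of two of the retrieved sequences (excerpts only).
seqs = {
    "A_fulgidus": "MMPEIEILEEKDFKIKFILKNASPALANSFRRAMKAEVPAMAVDYVDIYLNSSYFYDEVI",
    "H_influenzae": "MQGSVTEFLKPRLVDIEQISSTHAKVILEPLERGFGHTLGNALRRILLSSMPGCAVTEVE",
}

half = truncate_uniform(seqs)
mixed = swap_tails(seqs, "A_fulgidus", "H_influenzae")
print(half["A_fulgidus"])
print(mixed["A_fulgidus"])
```

Uniform truncation removes the same relative region from every sequence, so homologous positions still face each other in the alignment. Swapping tails places residues from one organism under the other organism's label, which is consistent with the disrupted grouping seen in Figure III.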

2.
Sulfolobus solfataricus gene for 16S ribosomal RNA
>gi|47609|emb|X03235.1| Sulfolobus solfataricus gene for 16S ribosomal RNA
ATTCCGGTTGATCCTGCCGGACCCGACCGCTATCGGGGTAGGGATAAGCCATGGGAGTCTTACACTCCCG
GGTAAGGGAGTGTGGCGGACGGCTGAGTAACACGTGGCTAACCTACCCTCGGGACGGGGATAACCCCGGG
AAACTGGGGATAATCCCCGATAGGGAAGGAGTCCTGGAATGGTTCCTTCCCTAAAGGGCTATAGGCTATT
TCCCGTTTGTAGCCGCCCGAGGATGGGGCTACGGCCCATCAGGCTGTCGGTGGGGTAAAGGCCCACCGAA
CCTATAACGGGTAGGGGCCGTGGAAGCGGGAGCCTCCAGTTGGGCACTGAGACAAGGGCCCAGGCCCTAC
GGGGCGCACCAGGCGCGAAACGTCCCCAATGCGCGAAAGCGTGAGGGCGCTACCCCGAGTGCCTCCGCAA
GGAGGCTTTTCCCCGCTCTAAAAAGGCGGGGGAATAAGCGGGGGGCAAGTCTGGTGTCAGCCGCCGCGGT
AATACCAGCTCCGCGAGTGGTCGGGGTGATTACTGGGCCTAAAGCGCCTGTAGCCGGCCCACCAAGTCGC
CCCTTAAAGTCCCCGGCTCAACCGGGGAACTGGGGGCGATACTGGTGGGCTAGGGGGCGGGAGAGGCGGG
GGGTACTCCCGGAGTAGGGGCGAAATCCTTAGATACCGGGAGGACCACCAGTGGCGGAAGCGCCCCGCTA
GAACGCGCCCGACGGTGAGAGGCGAAAGCCGGGGCAGCAAACGGGATTAGATACCCCGGTAGTCCCGGCT
GTAAACGATGCGGGCTAGGTGTCGAGTAGGCTTAGAGCCTACTCGGTGCCGCAGGGAAGCCGTTAAGCCC
GCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTTAAAGGAATTGGCGGGGGAGCACCACAAGGGGTGGA
ACCTGCGGCTCAATTGGAGTCAACGCCTGGAATCTTACCGGGGGAGACCGCAGTATGACGGCCAGGCTAA
CGACCTTGCCTGACTCGCGGAGAGGAGGTGCATGGCCGTCGCCAGCTCGTGTTGTGAAATGTCCGGTTAA
GTCCGGCAACGAGCGAGACCCCCACCCCTAGTTGGTATTCTGGACTCCGGTCCAGAACCACACTAGGGGG
ACTGCCGGCGTAAGCCGGAGGAAGGAGGGGGCCACGGCAGGTCAGCATGCCCCGAAACTCCCGGGCCGCA
CGCGGGTTACAATGGCAGGGACAACGGGATGCTACCTCGAAAGGGGGAGCCAATCCTTAAACCCTGCCGC
AGTTGGGATCGAGGGCTGAAACCCGCCCTCGTGAACGAGGAATCCCTAGTAACCGCGGGTCAACAACCCG
CGGTGAATACGTCCCTGCTCCTTGCACACACCGCCCGTCGCTCCACCCGAGCGCGAAAGGGGTGAGGTCC
CTTGCGATAAGTGGGGGATCGAACTCCTTTCCCGCGAGGGGGGAGAAGTCGTAACAAGGTAGCCGTAGGG
GAACCTGCGGCTGGATCACCTCATATATTTACTCCCCCGCTAATTGGGTGGGAGGGCTTCACTAAAACTC
GTAATCTTCCCTTTTATAGATGCAGTTCTCCTCTTGGGCCAGAGGGGAATGAAGTGCCTAGGGCCCATTT
GGCAGAGACATACAAATATGTCTCTGCCAAGTTAGGGCTCAATGAGGCTAGTACTAGGTAGCCACATTAT
AGCCGTCTAGGAGTTCTACCCAGGGGCCGAAGCCTCCCGGTGGATGGCT

Thermococcus marinus 16S ribosomal RNA gene, partial sequence


>gi|19110964|gb|AF479012.1| Thermococcus marinus 16S ribosomal RNA gene, partial sequence
ATTCCGGTTGATCCTGCCGGAGGCGCACTGCTATGGGGGTCCGACTAAGCCATGCGAGTCATGGGGCGCG
CTCTGCGCGCACCGGCGGACGGCTCAGTAACACGTCGGTAACCTACCCTCGGGAGGGGGATAACCCCGGG
AAACTGGGGCTAATCCCCCATAGGCCTGGGGTACTGGAAGGTCCCCAGGCCGAAAGGGGCTCTGCCCGCC
CGAGGATGGGCCGGCGGCCGATTAGGTAGTTGGTGGGGTAACGGCCCACCAAGCCGAAGATCGGTACGGG
CCATGAGAGTGGGAGCCCGGAGATGGACACTGAGACACGGGTCCAGGCCCTACGGGGCGCAGCAGGCGCG
AAACCTCCGCAATGCGGGCAACCGCGACGGGGGGACCCCCAGTGCCGTGGCTCAGGCCACGGCTTTTCCG
GAGTGTAAAAAGCTCCGGGAATAAGGGCTGGGCAAGGCCGGTGGCAGCCGCCGCGGTAATACCGGCGGCC
CAAGTGGTGGCCGCTATTATTGGGCCTAAAGCGTCCGTAGCCGGGCCCGTAAGTCCCTGGCGAAATCCCA
CGGCTCAACCGTGGGGCTTGCTGGGGATACTGCGGGCCTTGGGACCGGGAGAGGCCGGGGGTACCCCTGG
GGTAGGGGTGAAATCCTATAATCCCGGGGGGACCGCCAGTGGCGAAGGCGCCCGGCTGGAACGGGTCCGA
CGGTGAGGGACGAAGGCCAGGGGAGCGAACCGGATTAGATACCCGGGTAGTCCTGGCTGTAAAGGATGCG
GGCTAGGTGTCGGGCGAGCTTCGAGCTCGGCCCGGTGCCGGAGGGCAAGCCGTTAAGCCCGCCGCCTGGG
GAGTACGGCCGCAAGGCTGAAACTTAAAGGAATTGGCGGGGGAGCACTACAAGGGGTGGAGCGTGCGGTT
TAATTGGATTCAACGCCGGGAACCTCACCGGGGGCGACGGCAGGATGAAGGCCAGGCTGAAGGTCTTGCC
GGACACGCCGAGAGGAGGTGCATGGCCGCCGTCAGCTCGTACCGTGAGGCGTCCACTTAAGTGTGGTAAC
GAGCGAGACCCGCGCCCCCAGTTGCCAAGTCCTCCCCGCTGGGGAGGAGGCACTCTGGGGGGACCACCGG
CGATAAGCCGGAGGAAGGAGCGGGCGACGGTAGGTCAGTATGCCCCGAAACCCCCGGGCTACACGCGCGC
TACAATGGGCGGGACAATGGGATCCGACCCCGAAAGGGGAAGGGAATCCCCTAAACCCGCCCTCAGTTCG
GATCGCGGGCTGCAACTCGCCCGCGTGAAGCTGGAATCCCTAGTACCCGCGTGTCATCATCGCGCGGCGA
ATACGTCCCTGCTCCTTGCACACACCGCCCGTCACTCCACCCGAGCGGGGTCTGGGTGAGGCCTGGTCTC
CCTTCGGGGAGGCCGGGTCGTTCTGGGCTCCGTGAGGGGGAGAATCGTACAAGGTAGCGTAGGGAACTAC
GTCSGAATCACTCTATCGCGGGA

D.radiodurans 16S rRNA gene


>gi|2108294|emb|Y11332.1| D.radiodurans 16S rRNA gene
AGGGTGAACGCTGGCGGCGTGCTTAAGACATGCAAGTCGAACGCGGTCTTCGGACCGAGTGGCGCACGGG
TGAGTAACACGTAACTGACCTACCCAGAAGTCACGAATAACTGGCCGAAAGGTCCGCTAATACGTGATGT
GGTGATGCACCGTGGTGCATCACTAAAGATTTATCGCTTCTGGATGGGGTTGCGTTCCATCAGCTGGTTG
GTGGGGTAAAGGCCTACCAAGGCGACGACGGATAGCCGGCCTGAGAGGGTGGCCGGCCACAGGGGCACTG
AGACACGGGTCCCACTCCTACGGGAGGCAGCAGTTAGGAATCTTCCACAATGGGCGCAAGCCTGATGGAG
CGACGCCGCGTGAGGGATGAAGGTTTTCGGATCGTAAACCTCTGAATCTGGGACGAAAGAGCCTTAGGGC
AGATGACGGTACCAGAGTAATAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAA
GCGTTACCCGGAATCACTGGGCGTAAAGGGCGTGTACGCGGAAATTTAAGTCTGGTTTTAAAGACCGGGG
CTCAACCTCGGGGATGGACTGGATACTGGATTTCTTGACCTCTGGAGAGGTAACTGGAATTCCTGGTGTA
GCGGTGGAATGCGTAGATACCAGGAGGAACACCAATGGCGAAGGCAAGTTACTGGACAGAAGGTGACGCT
GAGGCGCGAAAGTGTGGGGAGCAAACCGGATTAGATACCCGGGTAGTCCACACCCTAAACGATGTACGTT
GGCTAAGCGCAGGATGCTGTGCTTGGCGAAGCTAACGCGATAAACGTACCGCCTGGGAAGTACGGCCGCA
AGGTTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAAC
GCGAAGAACCTTACCAGGTCTTGACATGCTAGGAACTTTGCAGAGATGCAGAGGTGCCCTTCGGGGAACC
TAGACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCG
CAACCCTTGCCTTTAGTTGTCAGCATTCAGTTGGACACTCTAGAGGGACTGCCTATGAAAGTAGGAGGAA
GGCGGGGATGACGTCTAGTCAGCATGGTCCTTACGTCCTGGGCGACACACGTGCTACAATGGGTAGGACA
ACGCGCAGCAAACCCGCGAGGGTAAGCGAATCGCTAAAACCTATCCCCAGTTCAGATCGGAGTCTGCAAC
TCGACTCCGTGAAGTTGGAATCGCTAGTAATCGCGGGTCAGCATACCGCGGTGAATACGTTCCCGGGCCT
TGTACACACCGCCCGTCACACCATGGGAGTAGATTGCAGTTGAAACCGCCGGGAGCTTTGCGGCAGGCGT
CTAGACTGTGGTTTATGACTGGGGTGAAGTCGTAACAAGGTAACTGTACCGGAAGGTGCGGCTGGA

T.aquaticus (1) 16S ribosomal RNA, part


>gi|48084|emb|X58340.1| T.aquaticus (1) 16S ribosomal RNA, part
NTGGTGGAGAGTTTGATCCTGGCTCAGGGTGAACGCTGGCGGCGTGCCTAAGACATGCAAGNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNGCTAATCCCCCATGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTNGCNNGCTTCCGGAT
GGGCCCGCGTCCCATCAGCTAGTTGGTGGGGTAAAGGCCCACCAAGGCNACGACGGNNNGCCNGTCTGAG
AGGATGGCCGGCCACAGGGGCACTGAGACACGGGCCCCACTCCTACGGGAGGCAGCAGTTAGGAATCTTC
CGCAATGGGCGCAAGCCTGACGGAGCGACGCTNCNTGGAGGAGGAANNCCTTCGGGGTGTAAACTCCTGA
ACCCGGGACGAAACCCCCGATTAGGGGACTGACGGTACCGGGGTAATAGCGCCGGCCAACTCCGTGCCAG
CAGCCNCNGTAATACGGAGNGCNCNANCGTTACCCGGATTTACTGGGCGTAAANNNCGCGCAGGCGGCTT
GGGNNGTCCCATGTGAAATNNCNCGGCTCAACCGGGGAGAGGCNNGGGATACGCTCAGGCTAGACGGTGG
NNGAGGNNNGNNGAATTCCCGGAGTAGCGGTGAAATGCGCAGATACCGGGAGGAACGCCGATGGCGAAGG
CAGCCACCTGGTCCACTCGTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACCGGATTAGATACCCGGGT
AGTCCACGCCCTAAACGATGCGCGCTAGGTCTCTGGGTTATCTGNGGGGCCGAAGCTAACGGGTTAAGCG
CGCCGCCTGGGGAGTACGCCNNCNNNNNNGAAACTCAAAGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNGTTTAATTCGNNNNNACGCGAAGAACCTTACCAGGCCTTGACNTGCTAGGGAACCTGGGTGAA
AGCCTGGGNTGCCCGCGAGGGGAGCCCTAGCACTGGTGCTGCATGGCCGTCGTCAGCTCGTGTCGTGAGA
TGTTGGGTTAAGTCCCGCAACGAGCGCAACCCCTGCCGTTAGTTGCCAGCGGGTGAAGCCGGGCACTCTA
ACGNNACTGCCTGCGAAAGCAGGAGGAAGGCGGGGACGACGTCTGGTCATCATNGCCCTTACGGCCTGGT
CGACACACGTGCTACAATGCCCACTACAGAGCGAGTCGACCTGGCAACAGGGAGCGAATCGCAAAAAGGT
GGGNGTAGTTCGGATTGGGGTCTGCAACCCGACCCCATGAAGCCGGAATCGCTAGTAATCGCGGATCAGC
CATGCCGCGGTGAATACGTNCCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGTAACAAGNN
NNNNNNNNNNNNNNNNNNNNNNNGATCACCTCCTTTCT

Bacillus subtilis 16S rRNA gene, strain DSM10


>gi|8980302|emb|AJ276351.1| Bacillus subtilis 16S rRNA gene, strain DSM10
CCTGGCTCAGGACGAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAGCGGACAGATGGGAGCTTGCTC
CCTGATGTTAGCGGCGGACGGGTGAGTAACACGTGGGTAACCTGCCTGTAAGACTGGGATAACTCCGGGA
AACCGGGGCTAATACCGGATGGTTGTTTGAACCGCATGGTTCAAACATAAAAGGTGGCTTCGGCTACCAC
TTACAGATGGACCCGCGGCGCATTAGCTAGTTGGTGAGGTAACGGCTCACCAAGGCAACGATGCGTAGCC
GACCTGAGAGGGTGATCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTAGG
GAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGAGTGATGAAGGTTTTCGGATCGTAA
AGCTCTGTTGTTAGGGAAGAACAAGTACCGTTCGAATAGGGCGGTACCTTGACGGTACCTAACCAGAAAG
CCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCG
TAAAGGGCTCGCAGGCGGTTTCTTAAGTCTGATGTGAAAGCCCCCGGCTCAACCGGGGAGGGTCATTGGA
AACTGGGGAACTTGAGTGCAGAAGAGGAGAGTGGAATTCCACGTGTAGCGGTGAAATGCGTAGAGATGTG
GAGGAACACCAGTGGCGAAGGCGACTCTCTGGTCTGTAACTGACGCTGAGGAGCGAAAGCGTGGGGAGCG
AACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGGTTTCCGCCCC
TTAGTGCTGCAGCTAACGCATTAAGCACTCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTCAAAGGAA
TTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGT
CTTGACATCCTCTGACAATCCTAGAGATAGGACGTCCCCTTCGGGGGCAGAGTGACAGGTGGTGCATGGT
TGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGATCTTAGTTGCC
AGCATTCAGTTGGGCACTCTAAGGTGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATC
ATCATGCCCCTTATGACCTGGGCTACACACGTGCTACAATGGACAGAACAAAGGGCAGCGAAACCGCGAG
GTTAAGCCAATCCCACAAATCTGTTCTCAGTTCGGATCGCAGTCTGCAACTCGACTGCGTGAAGCTGGAA
TCGCTAGTAATCGCGGATCAGCATGCCGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACA
CCACGAGAGTTTGTAACACCCGAAGTCGGTGAGGTAACCTTTTAGGAGCCAGCCGCCGAAGGTGGGACAG
ATGATTGGGGTGAAGTCGTAACAAGGTAGCCGTATCGGAAGGTGCGG

CLUSTAL 2.0.12 multiple sequence alignment with numbers

Sulfolobus -----ATTCCGGTTGATCCTGCCGGACCCG-ACCGCTATCGGGGTAGGGATAAGCCATGG 54
Thermococcus -----ATTCCGGTTGATCCTGCCGGAGGCGCACTGCTATGGGGGTCCGACTAAGCCATGC 55
radiodurans -------------------------AGGGTGAACGCTGGCGGCGT--GCTTAAGACATGC 33
Bacillus -----------------CCTGGCTCAGGACGAACGCTGGCGGCGT--GCCTAATACATGC 41
Thermus NTGGTGGAGAGTTTGATCCTGGCTCAGGGTGAACGCTGGCGGCGT--GCCTAAGACATGC 58
* * *** ** ** * *** ****

Sulfolobus GAGTCTTACAC------------TCCCGGGTAAGGGAGTGTGGCGGACGGCTGAGTAACA 102


Thermococcus GAGTCATGGGG------------CGCGCTCTGCGCGCAC-CGGCGGACGGCTCAGTAACA 102
radiodurans AAGTCGAACG--------------CGGTCTTCGGACCGAGTGGCGCACGGGTGAGTAACA 79
Bacillus AAGTCGAGCGGACAGATGGGAGCTTGCTCCCTGATGTTAGCGGCGGACGGGTGAGTAACA 101
Thermus AAGNNNNNNN--------------NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN 104
**

Sulfolobus CGTGGCTAACCTACCCTCGGGACGGGGATAACCCCGGGAAACTGGGGATAATCCCCGATA 162


Thermococcus CGTCGGTAACCTACCCTCGGGAGGGGGATAACCCCGGGAAACTGGGGCTAATCCCCCATA 162
radiodurans CGTAACTGACCTACCCAGAAGTCACGAATAACTGGCCGAAAGGTCCGCTAATACGTGATG 139
Bacillus CGTGGGTAACCTGCCTGTAAGACTGGGATAACTCCGGGAAACCGGGGCTAATACCGGATG 161
Thermus NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGCTAATCCCCCATG 164
* **** * **

Sulfolobus GGGAAGGAGTCCTGGAATGGTTCCTTCCCTAAAGGGCTATAGGCTATTTCCCGTTTGTAG 222


Thermococcus GGCCTGGGGTACTGGAAGGTCCCCAGGCCGAAAGGGG-----------------CTCTGC 205
radiodurans ----TGGTGATGCACCGTGGTGCAT-CACTAAAGAT---------------------TTA 173
Bacillus G--TTGTTTGAACCGCATGGTTCAAACATAAAAGGTGGCT-------------TCGGCTA 206
Thermus -----NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNT---------------------NGC 198

Sulfolobus CCGCCCGAGGATGGGGCTACGGCCCATCAGGCTGTCGGTGGGGTAAAGGCCCACCGAACC 282


Thermococcus CCGCCCGAGGATGGGCCGGCGGCCGATTAGGTAGTTGGTGGGGTAACGGCCCACCAAGCC 265
radiodurans TCGCTTCTGGATGGGGTTGCGTTCCATCAGCTGGTTGGTGGGGTAAAGGCCTACCAAGGC 233
Bacillus CCACTTACAGATGGACCCGCGGCGCATTAGCTAGTTGGTGAGGTAACGGCTCACCAAGGC 266
Thermus NNGCTTCCGGATGGGCCCGCGTCCCATCAGCTAGTTGGTGGGGTAAAGGCCCACCAAGGC 258
* ***** ** ** ** ** **** ***** *** *** * *

Sulfolobus TATAACGGGTAGGGGCCGTGGAAGCGGGAGCCTCCAGTTGGGCACTGAGACAAGGGCCCA 342


Thermococcus GAAGATCGGTACGGGCCATGAGAGTGGGAGCCCGGAGATGGACACTGAGACACGGGTCCA 325
radiodurans GACGACGGATAGCCGGCCTGAGAGGGTGGCCGGCCACAGGGGCACTGAGACACGGGTCCC 293
Bacillus AACGATGCGTAGCCGACCTGAGAGGGTGATCGGCCACACTGGGACTGAGACACGGCCCAG 326
Thermus NACGACGGNNNGCCNGTCTGAGAGGATGGCCGGCCACAGGGGCACTGAGACACGGGCCCC 318
* * ** ** * * * * ********* ** *

Sulfolobus GGCCCTACGGGGCGCACCAGGCGCGAAACGTCCCCAATGCGCGAAAGCGTGAGGGCGCTA 402


Thermococcus GGCCCTACGGGGCGCAGCAGGCGCGAAACCTCCGCAATGCGGGCAACCGCGACGGGGGGA 385
radiodurans ACTCCTACGGGAGGCAGCAGTTAGGAATCTTCCACAATGGGCGCAAGCCTGATGGAGCGA 353
Bacillus ACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAA 386
Thermus ACTCCTACGGGAGGCAGCAGTTAGGAATCTTCCGCAATGGGCGCAAGCCTGACGGAGCGA 378
******** *** *** *** * *** ***** * ** ** ** * *

Sulfolobus CCCCGAGTGCCTCCGCAAGGA----GGCTTTT----CCCCGC--------TCTAAAAAGG 446


Thermococcus CCCCCAGTGCCGTGGCTCAGGCCACGGCTTTT----CCGGAG--------TGTAAAAAGC 433
radiodurans CGCCGCGTGAGGGATGAAGGTTTTCGGATCGTAAACCTCTGA-ATCTGGGACGAAAGAGC 412
Bacillus CGCCGCGTGAGTGATGAAGGTTTTCGGATCGTAAAGCTCTGTTGTTAGGGAAGAACAAGT 446


Thermus CGCTNCNTGGAGGAGGAANNCCTTCGGGGTGTAAACTCCTGA----ACCCGGGACGAAAC 434
* * ** ** * * *

Sulfolobus CG----------GGGGAAT---------------------AAGCGGGGGGCAAGTCTGGT 475


Thermococcus TC----------CGGGAAT---------------------AAGGGCTGGGCAAGGCCGGT 462
radiodurans CT--------TAGGGCAGA----TGACGGTACCAGAGTA-ATAGCACCGGCTAACTCCGT 459
Bacillus ACCGTTCGAATAGGGCGGTACCTTGACGGTACCTAACCAGAAAGCCACGGCTAACTACGT 506
Thermus CCC---CGATTAGGGGACT-----GACGGTACCGGGGTA-ATAGCGCCGGCCAACTCCGT 485
** * *** * **

Sulfolobus GTCAGCCGCCGCGGTAATACCAGCTCCGCGAGTGGTCGGGGTGATTACTGGGCCTAAAGC 535


Thermococcus GGCAGCCGCCGCGGTAATACCGGCGGCCCAAGTGGTGGCCGCTATTATTGGGCCTAAAGC 522
radiodurans GCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTACCCGGAATCACTGGGCGTAAAGG 519
Bacillus GCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGG 566
Thermus GCCAGCAGCCNCNGTAATACGGAGNGCNCNANCGTTACCCGGATTTACTGGGCGTAAANN 545
* **** *** * ******* * * * * * * * ***** ****

Sulfolobus GCCTGTAGCCGGCCCACCAAGTCGCCCCTTAAAGTCCCCGGCTCAACCGGGGAACTGGG- 594


Thermococcus GTCCGTAGCCGGGCCCGTAAGTCCCTGGCGAAATCCCACGGCTCAACCGTGGGGCTTGCT 582
radiodurans GCGTGTACGCGGAAATTTAAGTCTGGTTTTAAAGACCGGGGCTCAACCTCGGGGATGGAC 579
Bacillus GCTCGCAGGCGGTTTCTTAAGTCTGATGTGAAAGCCCCCGGCTCAACCGGGGAGGGTCAT 626
Thermus NCGCGCAGGCGGCTTGGGNNGTCCCATGTGAAATNNCNCGGCTCAACCGGGGAGAGGCNN 605
* * *** *** *** * ********* **

Sulfolobus GGCGATACTGGTGGGCTAGGGGGCGGGAGAGGCGGGGGGTACTCCCGGAGTAGGGGCGAA 654


Thermococcus GGGGATACTGCGGGCCTTGGGACCGGGAGAGGCCGGGGGTACCCCTGGGGTAGGGGTGAA 642
radiodurans TG-GATACTGGATTTCTTGACCTCTGGAGAGGTAACTGGAATTCCTGGTGTAGCGGTGGA 638
Bacillus TG-GAAACTGGGGAACTTGAGTGCAGAAGAGGAGAGTGGAATTCCACGTGTAGCGGTGAA 685
Thermus GG-GATACGCTCAGGCTAGACGGTGGNNGAGGNNNGNNGAATTCCCGGAGTAGCGGTGAA 664
* ** ** ** * * **** * * ** * **** ** * *

Sulfolobus ATCCTTAGATACCGGGAGGACCACCAGTGGCGGAAGCGCCCCGCTAGAACGCGCCCGACG 714


Thermococcus ATCCTATAATCCCGGGGGGACCGCCAGTGGCGAAGGCGCCCGGCTGGAACGGGTCCGACG 702
radiodurans ATGCGTAGATACCAGGAGGAACACCAATGGCGAAGGCAAGTTACTGGACAGAAGGTGACG 698
Bacillus ATGCGTAGAGATGTGGAGGAACACCAGTGGCGAAGGCGACTCTCTGGTCTGTAACTGACG 745
Thermus ATGCGCAGATACCGGGAGGAACGCCGATGGCGAAGGCAGCCACCTGGTCCACTCGTGACG 724
** * * ** *** * ** ***** * ** ** * ****

Sulfolobus GTGAGAGGCGAAAGCCGGGGCAGCAAACGGGATTAGATACCCCGGTAGTCCCGGCTGTAA 774


Thermococcus GTGAGGGACGAAGGCCAGGGGAGCGAACCGGATTAGATACCCGGGTAGTCCTGGCTGTAA 762
radiodurans CTGAGGCGCGAAAGTGTGGGGAGCAAACCGGATTAGATACCCGGGTAGTCCACACCCTAA 758
Bacillus CTGAGGAGCGAAAGCGTGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCACGCCGTAA 805
Thermus CTGAGGCGCGAAAGCGTGGGGAGCAAACCGGATTAGATACCCGGGTAGTCCACGCCCTAA 784
**** **** * *** *** *** ************* ******** * ***

Sulfolobus ACGATGCGGGCTAGGTGTCGAGTAGGCTTAGAGCCTA-CTCGGTGCCGCAGGG-AAGCCG 832


Thermococcus AGGATGCGGGCTAGGTGTCGGGCGAGCTTCGAGCTCGGCCCGGTGCCGGAGGGCAAGCCG 822
radiodurans ACGATGTACGTTGGCTAAGCGCAGGATGCTGTGCTT--------GGCGAAGCT-AACGCG 809
Bacillus ACGATGAGTGCTAAGTGTTAGGGGGTTTCCGCCCCTT----AGTGCTGCAGCT-AACGCA 860
Thermus ACGATGCGCGCTAGGTCTCTGGGTTATCTGNGGG----------GCCGAAGCT-AACGGG 833
* **** * * * * * ** **

Sulfolobus TTAAGCCCGCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTTAAAGGAATTGGCGGGGG 892


Thermococcus TTAAGCCCGCCGCCTGGGGAGTACGGCCGCAAGGCTGAAACTTAAAGGAATTGGCGGGGG 882
radiodurans ATAAACGTACCGCCTGGGAAGTACGGCCGCAAGGTTGAAACTCAAAGGAATTGACGGGGG 869
Bacillus TTAAGCACTCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTCAAAGGAATTGACGGGGG 920
Thermus TTAAGCGCGCCGCCTGGGGAGTACGCCNNCNNNNNNGAAACTCAAAGNNNNNNNNNNNNN 893
*** * ********* ****** * ****** ****

Sulfolobus AGCACCACAAGGGGTGGAACCTGCGGCTCAATTGGAGTCAACGCCTGGAATCTTACCGGG 952


Thermococcus AGCACTACAAGGGGTGGAGCGTGCGGTTTAATTGGATTCAACGCCGGGAACCTCACCGGG 942


radiodurans CCCGC-ACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGG 928
Bacillus CCCGC-ACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGG 979
Thermus NNNNN-NNNNNNNNNNNNNNNNNNNGTTTAATTCGNNNNNACGCGAAGAACCTTACCAGG 952
* * **** * **** *** ** *** **

Sulfolobus G---GAGACCGCAGT--ATGACGGCCAGGCTAACGA------CCTTGCCTGACTCGCGGA 1001


Thermococcus G---GCGACGGCAGG--ATGAAGGCCAGGCTGAAGG------TCTTGCCGGACACGCCGA 991
radiodurans TCTTGACATGCTAGG--AACTTTGCAGAGATGCAGAGGTGCCCTTCGGGGAACCTAGACA 986
Bacillus TCTTGACATCCTCTG--ACAATCCTAGAGATA-GGACGTCCCCTTCGGGGG-CAGAGTGA 1035
Thermus CCTTGACNTGCTAGGGAACCTGGGTGAAAGCCTGGGNTGCCCGCGAGGGGAGCCCTAGCA 1012
* * * * * *

Sulfolobus GAGGAGGTGCATGGCCGTCGCCAGCTCGTGTTGTGAAATGTCCGGTTAAGTCCGGCAACG 1061
Thermococcus GAGGAGGTGCATGGCCGCCGTCAGCTCGTACCGTGAGGCGTCCACTTAAGTGTGGTAACG 1051
radiodurans CAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACG 1046
Bacillus CAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACG 1095
Thermus CTGGTGCTGCATGGCCGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACG 1072
** * ******* * ** ******** **** ** ****** * ****

Sulfolobus AGCGAGACCCCCACCCCTAGTTGGTA--TTCTGGACTCCGGTCCAGAACCACACTAGGGG 1119
Thermococcus AGCGAGACCCGCGCCCCCAGTTGCCAAGTCCTCCCCGCTGGGGAGGAGGCACTCTGGGGG 1111
radiodurans AGCGCAACCCTTGCCTTTAGTTGTCA-------GCATTCAGTTGGA---CACTCTAGAGG 1096
Bacillus AGCGCAACCCTTGATCTTAGTTGCCA-------GCATTCAGTTGGG---CACTCTAAGGT 1145
Thermus AGCGCAACCCCTGCCGTTAGTTGCCAG-----CGGGTGAAGCCGGG---CACTCTAACGN 1124
**** **** ***** * * *** ** *

Sulfolobus GACTGCCGGCG-TAAGCCGGAGGAAGGAGGGGGCCACGGCAGGTCAGCATGCCCCGAAAC 1178
Thermococcus GACCACCGGCGATAAGCCGGAGGAAGGAGCGGGCGACGGTAGGTCAGTATGCCCCGAAAC 1171
radiodurans GACTGCCTATGA-AAGTAGGAGGAAGGCGGGGATGACGTCTAGTCAGCATGGTCCTTACG 1155
Bacillus GACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATG 1205
Thermus NACTGCCTGCGA-AAGCAGGAGGAAGGCGGGGACGACGTCTGGTCATCATNGCCCTTACG 1183
** ** * ** ********* * ** *** *** ** ** *

Sulfolobus TCCCGGGCCGCACGCGGGTTACAATGGCAGGGACAACGGGATGCTACCTCGAAAGGGGGA 1238
Thermococcus CCCCGGGCTACACGCGCGCTACAATGGGCGGGACAATGGGATCCGACCCCGAAAGGGGAA 1231
radiodurans TCCTGGGCGACACACGTGCTACAATGGGTAGGACAACGCGCAGCAAACCCGCGAGGGTAA 1215
Bacillus ACCTGGGCTACACACGTGCTACAATGGACAGAACAAAGGGCAGCGAAACCGCGAGGTTAA 1265
Thermus GCCTGGTCGACACACGTGCTACAATGCCCACTACAGAGCGAGTCGACCTGGCAACAGGGA 1243
** ** * *** ** * ******* *** * * * * * * *

Sulfolobus GCCAATCCT-TAAACCCTGCCGCAGTTGGGATCGAGGGCTGAAACCCGCCCTCGTGAACG 1297
Thermococcus GGGAATCCCCTAAACCCGCCCTCAGTTCGGATCGCGGGCTGCAACTCGCCCGCGTGAAGC 1291
radiodurans GCGAATCGCTAAAACCTATCCCCAGTTCAGATCGGAGTCTGCAACTCGACTCCGTGAAGT 1275
Bacillus GCCAATCCCACAAATCTGTTCTCAGTTCGGATCGCAGTCTGCAACTCGACTGCGTGAAGC 1325
Thermus GCGAATCGCAAAAAGGTGGGNGTAGTTCGGATTGGGGTCTGCAACCCGACCCCATGAAGC 1303
* **** *** **** *** * * *** *** ** * * ****

Sulfolobus AGGAATCCCTAGTAACCGCGGGTCAAC-AACCCGCGGTGAATACGTCCCTGCTCCTTGCA 1356
Thermococcus TGGAATCCCTAGTACCCGCGTGTCATC-ATCGCGCGGCGAATACGTCCCTGCTCCTTGCA 1350
radiodurans TGGAATCGCTAGTAATCGCGGGTCAGC-ATACCGCGGTGAATACGTTCCCGGGCCTTGTA 1334
Bacillus TGGAATCGCTAGTAATCGCGGATCAGC-ATGCCGCGGTGAATACGTTCCCGGGCCTTGTA 1384
Thermus CGGAATCGCTAGTAATCGCGGATCAGCCATGCCGCGGTGAATACGTNCCNNNNNNNNNNN 1363
****** ****** **** *** * * ***** ******** **

Sulfolobus CACACCGCCCGTCGCTCCACCCGAGCGCGAAAGGGGTGAGGTC--CCTTGCGATAAGTGG 1414
Thermococcus CACACCGCCCGTCACTCCACCCGAGCGGGGTCTGGGTGAGGCC--TGGTCTCCCTTCGGG 1408
radiodurans CACACCGCCCGTCACACCATGGGAGTAGATTGCAGTTGAAACCGCCGGG--AGCTTTGCG 1392
Bacillus CACACCGCCCGTCACACCACGAGAGTTTGTAACACCCGAAGTCGGTGAGGTAACCTTTTA 1444
Thermus NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN---NNNNNNNNNNNNNN 1420

Sulfolobus GGGATCGAACTCCT---TTC--CCGCGAGG--GGGGAGAAGTCGTAACAAGGTAGCCGTA 1467
Thermococcus GAGGCCGGGTCGTT---CTGGGCTCCGTGA--GGGGGAGAATCGTA-CAAGGTAGC-GTA 1461
radiodurans G----CAGGCGTCTAGACTGTGGTTTATGACTGGGGTGAAGTCGTAACAAGGTAACTGTA 1448
Bacillus GGAGCCAGCCGCCGAAGGTGGGACAGATGATTGGGGTGAAGTCGTAACAAGGTAGCCGTA 1504
Thermus NNNNNNNNNNNNNN---NNNNNNNNNNNNNNNNNNNNNNNNNNGTAACAAGNNNNNNNNN 1477
*** ****

Sulfolobus GGGGAACCTGCGGCTGGATCACCTCATATATTTACTCCCCCGCTAATTGGGTGGGAGGGC 1527
Thermococcus GGGAA--CTACGTCSGAATCACTCTATCGCGGGA-------------------------- 1493
radiodurans CCGGAAGGTGCGGCTGGA------------------------------------------ 1466
Bacillus TCGGAAGGTGCGG----------------------------------------------- 1517
Thermus NNNNNNNNNNNNNNNNGATCACCTCCTTTCT----------------------------- 1508

Sulfolobus TTCACTAAAACTCGTAATCTTCCCTTTTATAGATGCAGTTCTCCTCTTGGGCCAGAGGGG 1587
Thermococcus ------------------------------------------------------------
radiodurans ------------------------------------------------------------
Bacillus ------------------------------------------------------------
Thermus ------------------------------------------------------------

Sulfolobus AATGAAGTGCCTAGGGCCCATTTGGCAGAGACATACAAATATGTCTCTGCCAAGTTAGGG 1647
Thermococcus ------------------------------------------------------------
radiodurans ------------------------------------------------------------
Bacillus ------------------------------------------------------------
Thermus ------------------------------------------------------------

Sulfolobus CTCAATGAGGCTAGTACTAGGTAGCCACATTATAGCCGTCTAGGAGTTCTACCCAGGGGC 1707
Thermococcus ------------------------------------------------------------
radiodurans ------------------------------------------------------------
Bacillus ------------------------------------------------------------
Thermus ------------------------------------------------------------

Sulfolobus CGAAGCCTCCCGGTGGATGGCT 1729
Thermococcus ----------------------
radiodurans ----------------------
Bacillus ----------------------
Thermus ----------------------

Table iv: ClustalW multiple alignment score table of ssrRNA gene sequences

2b)

Figure iv: NJ tree showing bootstrap values, drawn using MEGA4.1 phylogenetic analysis software

ORIGINAL NJ-TREE

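[Tree diagram not recoverable from the extracted text. Taxa as listed: radiodurans,
Bacillus subtilis, Thermus aquaticus, Sulfolobus solfataricus, Thermococcus marinus.
Bootstrap values shown: 67, 100. Scale bar: 0.05.]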

BOOTSTRAP CONSENSUS TREE (after bootstrapping, the evolutionary distance between radiodurans
and B. subtilis, computed using the Maximum Composite Likelihood method, appeared smaller
than in the original NJ tree above)

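[Tree diagram not recoverable from the extracted text. Taxa as listed: radiodurans,
Bacillus subtilis, Thermus aquaticus, Sulfolobus solfataricus, Thermococcus marinus.
Bootstrap values shown: 67, 100. Scale bar: 0.05.]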

Figure V: MAXIMUM PARSIMONY TREE USING MEGA4 PHYLOGENETIC SOFTWARE

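ORIGINAL MP-TREE

[Tree diagram not recoverable from the extracted text. Taxa as listed: Sulfolobus
solfataricus, Thermococcus marinus, Thermus aquaticus, radiodurans, Bacillus subtilis.
Bootstrap values shown: 100, 76.]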

BOOTSTRAP CONSENSUS MP TREE

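[Tree diagram not recoverable from the extracted text. Taxa as listed: Sulfolobus
solfataricus, Thermococcus marinus, Thermus aquaticus, radiodurans, Bacillus subtilis.
Bootstrap values shown: 100, 76.]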

(c) Discuss the biological relevance of your findings. Which of the resulting trees
appears to best reflect the actual evolution? Give a justification for your choice.

From the literature, Sulfolobus solfataricus and Thermococcus marinus are both extremely
thermophilic archaea that share closely related features, which may reflect descent from
the same evolutionary lineage. Bacillus subtilis is a common rod-shaped, Gram-positive
bacterium. Deinococcus radiodurans is a Gram-positive coccus, and Thermus aquaticus is a
non-motile, rod-shaped to filamentous bacterium. Deinococcus has ornithine as the diamino
acid in its cell wall peptidoglycan. Both the NJ tree (figure iv) and the MP tree
(figure v) were generated from the alignment using MEGA4.1 phylogenetic analysis software.
NJ (neighbour-joining) is a distance-based method for tree building. In its simplest form,
the evolutionary distance between two sequences is estimated from the number of differences
between them; here the distances were computed using the Maximum Composite Likelihood
method. The tree is then computed from the matrix of pairwise distances by a clustering
algorithm that joins one pair of neighbours at a time, at each step choosing the pair that
keeps the total branch length of the tree smallest. Maximum Parsimony, by contrast,
searches for the tree that requires the smallest number of changes to explain the
differences observed among the taxa under study.
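
As a rough illustration of the two criteria described above, the Python sketch below
computes a simple proportion-of-differences distance between aligned sequences (the kind
of value a distance matrix is built from) and counts the minimum number of changes a
given tree requires using Fitch's method (the quantity Maximum Parsimony minimises). The
toy alignment, the taxon names A to D and all function names are hypothetical and are not
taken from the assignment data. The distance used here is deliberately simpler than the
Maximum Composite Likelihood distances reported by MEGA, and the sketch does not search
tree space; it only scores trees that are supplied to it.

```python
# Illustrative sketch only: toy data and function names are hypothetical,
# and this is not how MEGA computes its distances or searches tree space.

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped, so they count neither as matches nor as differences.
    """
    compared = 0
    differences = 0
    for x, y in zip(a, b):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else 0.0


def fitch_changes(tree, column):
    """Minimum number of changes for one alignment column (Fitch's method).

    `tree` is a rooted binary tree written as nested 2-tuples of taxon names;
    `column` maps each taxon name to its character in that column.
    """
    def walk(node):
        if isinstance(node, str):          # leaf: its own state, no changes
            return {column[node]}, 0
        left_states, left_cost = walk(node[0])
        right_states, right_cost = walk(node[1])
        shared = left_states & right_states
        if shared:                         # children agree: no extra change
            return shared, left_cost + right_cost
        return left_states | right_states, left_cost + right_cost + 1

    return walk(tree)[1]


def parsimony_score(tree, alignment):
    """Sum of Fitch change counts over every column of the alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_changes(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(length)
    )


# Hypothetical four-taxon alignment, chosen only to keep the arithmetic small.
toy = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACGT",
    "D": "TCGAACGT",
}

# Pairwise distances of the kind a distance-based method starts from.
for first, second in [("A", "B"), ("A", "C"), ("A", "D")]:
    print(first, second, p_distance(toy[first], toy[second]))

# Parsimony compares candidate topologies by their total number of changes.
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_1, toy), parsimony_score(tree_2, toy))
```

On this toy alignment tree_1 requires 3 changes and tree_2 requires 4, so the parsimony
criterion would prefer tree_1.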

Based on figures iv and v above, the MP tree more closely resembles the cladogram
generated by ClustalW. In the MP tree, Thermus aquaticus, Thermococcus marinus and
Sulfolobus solfataricus, which are all extreme thermophiles, are placed on the same
branch. Bacillus subtilis and Deinococcus radiodurans, which are both Gram-positive
bacteria and have little in common with the other three organisms, are thus chosen as
the outgroup.
