
Perspectives and Current Trends in Bioinformatics Dr. Lalitha Guruprasad

Lessons Learnt From Comparative Genomics


Dr. Lalitha Guruprasad
Email: lgpsc@uohyd.ernet.in

Perspectives and Current Trends in Bioinformatics


CCMB, 13th September, 2007

What is comparative genomics?

What are the goals of comparative genomics?

Why carry out comparative genomics?

• Analyse and compare genomic information from different species
• Understand the similarities and differences between species
• Understand what makes a species unique
• Compare species across evolution
• In silico analysis yields results faster than experimental methods

What is compared?
At the genome level
- number of chromosomes
- number of genes
- position of genes
- SNP positions
- EST database searches
- codon usage
- splice sites
- number of exons
- number of introns
- alternate splicing
- sequence similarity

At the proteome level
- number and sequence length of proteins
- extent of similarity
- number and nature of domains
- sequence motifs and patterns
- post translational modifications
- secondary structure similarity
- tertiary structure similarity
- functional similarity

What is the reference?


Depends upon the quest

- all representative genome comparison (the evolution of organisms)
- human versus chimpanzee (speech, recognition, movement)
- virulent strain vs avirulent strain (same organism, for disease-specific genes)
- protein in the disease state vs normal protein (mutations, loss of structure and function?)
- isoforms in the same species (duplication events)

Of mice and men



Analysis of genes in microbial genomes



Comparison of M. tuberculosis complex members

[Figure labels: recent deletion; ancient deletion]

PE, PPE protein family analysis

- The pairwise sequence identity is greater than 18%.
- 15 conserved amino acid residues are distributed over the whole length of the domain.
- α-helices and β-strands are arranged in the order H1-E1-H2-E2-E3-E4-E5-H3-E6-E7-H4 at aligned positions in the multiple sequence alignment.
- Therefore this domain is likely to have a compact fold.

Pfam accession number PF08237, 'PE-PPE domain'
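
The pairwise identity criterion above can be illustrated with a short sketch. It assumes identity is counted only over alignment columns where neither sequence has a gap, which is one common convention; the slides do not say which convention was used. The function name and the toy sequences are hypothetical and are not PE-PPE domain sequences.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two rows of an alignment ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Assumed convention: skip columns where either sequence has a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Toy alignment, made up for illustration only.
alignment = {
    "seqA": "MKT-LLVAGG",
    "seqB": "MRTALLV-GG",
    "seqC": "MKSALIVAGA",
}

for (name1, seq1), (name2, seq2) in combinations(alignment.items(), 2):
    print(f"{name1} vs {name2}: {percent_identity(seq1, seq2):.1f}%")
```

Applied to every pair of rows in a real multiple sequence alignment, the minimum of these values is the figure to compare against a threshold such as the 18% quoted above.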
PE and PPE proteins:
Rv2608, MT2683, Rv1430, MT1474, Rv1800, MT1849, Rv0151, MT0160, Rv0152, MT0162, Rv0159, MT0168, Rv0160, MT0169

Hypothetical proteins:
Rv3539, MT3643, Rv3822, MT3930, ML1232, Q49633, Rv1184, MT1121

Propose a cell surface antigenic role for the PE family proteins
- The genome sequence of the archaeon M. acetivorans C2A was reported by Galagan et al., 2002 and shown to comprise 5,751,492 bp encoding 4,524 proteins.
- Cell surface proteins (CSPs) make up the surface layer (S-layer), the outermost envelope of most archaeal and eubacterial organisms.
- These are multi-domain proteins.
- They are responsible for various functions.
- They are responsible for the cell shape and contribute to virulence when present in pathogens.
RIVW 42 aa repeat.

DNRLRE 200 aa domain

LGxL 42 aa repeat

PEGA 70 aa domain

LGFP 54 aa repeat
LVIVD 42 aa repeat
Systematic analysis of repeat identification (flowchart)

- Repeats identified by TRUST program
- SMART database searches: SMART entry present / No SMART entry
- PSI-BLAST submission: No repeats / Repeats
- Offline InterPro and Pfam searches: Yes / No
- Online InterPro and Pfam searches: Yes / No

Outcomes: Discarded, or Novel repeats or domains
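
One way to read the flowchart above is as a chain of filters. The sketch below assumes that a TRUST candidate is discarded as soon as a search matches a known entry, or when PSI-BLAST does not confirm repeats, and is called novel only if it passes every stage. The branch directions are an interpretation of the slide, not something it states, and the record fields are hypothetical stand-ins for results collected by running the actual tools.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of search results for one TRUST-predicted repeat."""
    name: str
    smart_entry: bool                 # SMART search found an existing entry
    psiblast_repeats: bool            # PSI-BLAST search showed repeats
    offline_interpro_pfam_hit: bool   # offline InterPro/Pfam search matched
    online_interpro_pfam_hit: bool    # online InterPro/Pfam search matched


def classify(c: Candidate) -> str:
    """Apply the stages in the order listed on the slide (assumed branching)."""
    if c.smart_entry:
        return "discarded: SMART entry present"
    if not c.psiblast_repeats:
        return "discarded: no repeats in PSI-BLAST"
    if c.offline_interpro_pfam_hit:
        return "discarded: offline InterPro/Pfam hit"
    if c.online_interpro_pfam_hit:
        return "discarded: online InterPro/Pfam hit"
    return "novel repeat or domain"


# Made-up candidates for illustration.
candidates = [
    Candidate("cand1", False, True, False, False),
    Candidate("cand2", True, True, False, False),
    Candidate("cand3", False, True, False, True),
]
for cand in candidates:
    print(cand.name, "->", classify(cand))
```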

Protein structure modeling

Comparative modeling: to predict the probable three-dimensional structure of a protein, given its amino acid sequence and the 3-D structure coordinates of another protein with which it shares sequence homology.
- Ag85A, B and C structures (PDB IDs: 1SVR, 1F0N, 1DQZ) were used as templates for building the models. They comprise an α/β hydrolase fold and have a catalytic triad similar to that of serine proteases (Ser 124, Glu 228 and His 260).
- Two trehalose substrate binding pockets are present, one closer to the active site.
- Pairwise sequence identity among mycobacteria is 67-82% and among corynebacteria is 29-44%.

[Figure: Overall fold of mycolyltransferases]
Shorter loop length proteins: CGL2878, CGL2875, CE2709, CE2710

Long loop length proteins: CGL1031, CE1488, CGL2181, CGL0922, CE0984 and CGL0343, CE0356

We observed mutations in key trehalose substrate binding residues in some Corynebacterium proteins.
Protein  D40*  R43*  S126* N223* H262# S263* W264* D154# Q157# M159# N231# F232# S235* S236* K239*
1F0N     D     R     S     N     H     S     W     D     Q     M     N     F     S     S     K
ML2028   D     R     S     N     H     S     W     D     Q     I     N     F     G     S     K
1DQZ     D     R     S     N     H     S     W     N     E     W     G     L     R     T     T
ML2655   D     R     S     N     H     S     W     N     E     W     S     L     S     T     I
RV3804   D     R     S     N     H     S     W     D     Q     M     G     F     T     S     K
ML0097   D     R     S     N     H     S     W     D     N     M     G     L     T     S     K
CGL2875  D     R     S     N     H     S     W     D     S     G     V     I     M     T     T
CE2709   D     R     S     N     H     A     W     D     S     G     L     I     M     T     T
CGL2878  D     R     S     R     H     G     W     N     A     G     F     V     T     S     I
CE2710   D     R     S     R     H     S     W     T     A     G     A     V     A     T     A
CGL1031  D     G     S     L     H     N     W     S     L     P     R     G     S     C     A
CE1488   D     G     S     L     H     N     W     S     L     P     R     G     S     C     A
CGL0922  G     D     S     V     H     A     W     E     N     W     A     M     T     C     N
CE0984   G     D     S     I     H     A     W     E     N     W     A     L     T     C     N
CGL2181  G     D     S     V     H     A     W     E     N     W     A     M     T     C     N
CGL0343  N     D     S     V     H     S     W     A     S     L     A     A     K     C     D
CE0356   N     D     S     I     H     S     W     S     S     P     A     A     K     C     D

(Column headings are the residues named in the slide header; * and # are the slide's own markers.)
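
A simple way to summarise the rows above is to count, for each protein, how many of the 15 listed positions carry the same residue as the 1F0N row. This is only an illustrative tally, not the analysis used in the talk; the residue strings are copied row by row from the slide, and the function name is hypothetical.

```python
# Residues at the 15 listed positions, copied from the rows above.
BINDING_RESIDUES = {
    "1F0N":    "DRSNHSWDQMNFSSK",
    "ML2028":  "DRSNHSWDQINFGSK",
    "1DQZ":    "DRSNHSWNEWGLRTT",
    "ML2655":  "DRSNHSWNEWSLSTI",
    "RV3804":  "DRSNHSWDQMGFTSK",
    "ML0097":  "DRSNHSWDNMGLTSK",
    "CGL2875": "DRSNHSWDSGVIMTT",
    "CE2709":  "DRSNHAWDSGLIMTT",
    "CGL2878": "DRSRHGWNAGFVTSI",
    "CE2710":  "DRSRHSWTAGAVATA",
    "CGL1031": "DGSLHNWSLPRGSCA",
    "CE1488":  "DGSLHNWSLPRGSCA",
    "CGL0922": "GDSVHAWENWAMTCN",
    "CE0984":  "GDSIHAWENWALTCN",
    "CGL2181": "GDSVHAWENWAMTCN",
    "CGL0343": "NDSVHSWASLAAKCD",
    "CE0356":  "NDSIHSWSSPAAKCD",
}


def identical_to_reference(name: str, reference: str = "1F0N") -> int:
    """Number of listed positions where `name` has the same residue as `reference`."""
    ref = BINDING_RESIDUES[reference]
    return sum(a == b for a, b in zip(BINDING_RESIDUES[name], ref))


for protein in BINDING_RESIDUES:
    print(f"{protein:8s} {identical_to_reference(protein):2d}/15")
```

Such a tally shows which rows diverge most from the reference at these positions; on its own it does not predict substrate affinity.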

From the binding pocket specificities and loop length, we infer that
- CGL2875, CGL2878, CE2709, CE2710 exhibit high affinity for trehalose
- CGL1031, CE1488, CGL0922, CE0984, CGL2181 have low affinity for trehalose
- CGL0343, CE0356 have no affinity for trehalose

[Figure panels: 1F0P, Cgl1031, Cgl0343]



Validation of hypothesis

Our predictions                   Experimental reports
Short loop: binds TDCM            CGL2875 deletion mutant: low TDCM concentration
Long loop: does not bind TDCM     CGL0343 deletion mutant: no effect on TDCM concentration

- Varying number of isozymes in each genus; what is the evolutionary relationship? 4 each in M. tuberculosis and C. diphtheriae, 6 in C. glutamicum, 5 in C. efficiens and 13 in N. farcinica.
- Genomic context analysis of Ag85 proteins from related species of Mycobacteria, Corynebacteria and Nocardia.

Genomic context analysis

- Gene fusion
- Gene neighborhood (operons)
- Co-occurrence of genes (similar phyletic patterns)

Taken from Bork et al., 2004 Nat. Biotech. 22: 911-917
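
Co-occurrence can be made concrete with phyletic profiles: for each gene, record the genomes in which an ortholog is present, then compare the profiles. The sketch below uses the Jaccard index as the similarity measure, which is one common choice and an assumption here, not necessarily the measure used in the cited work. The gene and genome names are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Shared genomes divided by all genomes in which either gene occurs."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical presence data: gene -> genomes containing an ortholog.
profiles = {
    "geneA": {"genome1", "genome2", "genome4"},
    "geneB": {"genome1", "genome2", "genome4"},
    "geneC": {"genome2", "genome3"},
}

for g1, g2 in combinations(sorted(profiles), 2):
    print(f"{g1} vs {g2}: {jaccard(profiles[g1], profiles[g2]):.2f}")
```

Genes with near-identical profiles, like geneA and geneB in this toy data, are the ones a co-occurrence analysis would flag as candidates for a functional link.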


The role of conserved noncoding sequences (CNSs) in DNA
Comparing long genomic DNA sequences between two species reveals CNSs. For example, a one-million-base-pair region of human chromosomal region 5q31 encodes 5 interleukins and 18 other proteins. Alignment with the orthologous mouse sequence revealed 90 CNSs; the longest, CNS-1, lies between IL-4 and IL-13. In transgenic mice, deleting CNS-1 reduced the production of IL-4, IL-13 and IL-5 (120 kb away) in type 2 helper T cells (TH2). RAD50, a gene between IL-13 and IL-5, was unaffected. That is, CNS-1 acts over a long distance and in a highly selective manner, regulating only the expression of cytokines in specific cells.
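
A minimal sketch of how conserved stretches can be picked out of a pairwise alignment: slide a fixed-size window along the aligned sequences and keep the windows whose identity reaches a threshold, merging windows that overlap or touch. The window size, the threshold and the toy sequences are assumptions chosen so the example stays small; real screens use much longer windows and also exclude annotated coding regions, which this sketch does not attempt.

```python
def conserved_regions(a: str, b: str, window: int = 10, min_identity: float = 0.85):
    """Merged (start, end) column intervals, end exclusive, covered by
    windows whose fraction of identical, non-gap columns is >= min_identity."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # A column counts as a match only if both residues are equal and not a gap.
    match = [x == y and x != "-" for x, y in zip(a, b)]
    regions = []
    for start in range(len(a) - window + 1):
        identity = sum(match[start:start + window]) / window
        if identity >= min_identity:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Toy aligned sequences: two conserved blocks around a divergent middle.
seq1 = "ACGTACGTAC" + "TTTTTTTTTT" + "GGCATGCAAT"
seq2 = "ACGTACGTAC" + "AAAAAAAAAA" + "GGCATGCAAT"

print(conserved_regions(seq1, seq2))
```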

Thank you
