org/cgi/content/full/319/5868/1387/DC1
DOI: 10.1126/science.1152692
Other Supporting Online Material for this manuscript includes the following:
(available at www.sciencemag.org/cgi/content/full/319/5868/1387/DC1)
Table of contents
V. References
I. Kinetic parameters for designed retro-aldolases
Table S1. Rate enhancements for active designs
Table S2. Enaminone formation rates with 2,4-pentanedione
Design Name    Additional Mutations    T1/2 (h)
Jelly rolls
RA61                                   1.5
RA60                                   1
RA59                                   3
RA28                                   8
TIM barrels
RA17                                   10
RA58                                   2.5
RA31                                   3
RA32                                   6
RA33                                   6
RA22                                   10
               S210A                   8
RA34                                   2
               Y51V                    1.5
RA35                                   3
RA36                                   12
RA47                                   none
RA39                                   1
RA41                                   1
RA63                                   2
RA6                                    6
               T159V                   12
RA42                                   none
RA44                                   6
RA45                                   2
               E233T                   none
RA46                                   1
RA48                                   6
RA55                                   none
RA56                                   none
RA57                                   none
RA49                                   0.5
RA26                                   none
RA40                                   none
RA43                                   3
RA53                                   none
RA68                                   6
RA5*                                   4
RA54*                                  0.5
none = no 316 nm absorbance increase due to enaminone formation was detected
* denotes designs which had no detectable aldolase activity
II. Computational enzyme design methodology for multi-step reactions
Figure 1 in the main text provides an overview of the new computational enzyme design
methodology for multi-step reactions. In the following sections, we describe the key
Given a schematic representation of a particular catalytic motif as in Figure 2C, the first
site. The chance of being able to perfectly recreate any single three dimensional
arrangement of transition state and associated protein functional groups within a given
scaffold set and subsequently achieve high affinity interactions with surrounding residues
in the protein is small; therefore, it is desirable to generate as large and diverse a set of
three dimensional realizations as possible for a given catalytic motif. We accomplish this
by fine sampling along each of the degrees of freedom of the transition state-functional
group complex. For a multi-step reaction such as the retro-aldol considered in this
paper, it is necessary to ensure that the designed active site not only strongly stabilizes
the highest energy transition state, but also interacts favorably with the reaction
intermediates and other transition states along the reaction pathway (if it destabilizes any
of these to the point that they become higher in energy than the original highest energy
transition state, catalysis will be impaired). We thus must construct diversified structural
ensembles representative of the key intermediates and transition states in the reaction.
The active site ensembles for catalytic motif I were created using quantum chemistry
methods while the ensembles for catalytic motifs II-IV were created using a hybrid
For catalytic site motif I, the geometry of the TS and the catalytic functional groups (two
lysines and an aspartate) was fully optimized at the B3LYP/6-31G(d) level in the gas
phase using Gaussian 03(1), as described previously for an aldol reaction(2). The 43
lowest energy carbon-carbon bond-breaking transition states, shown in Figure S1A, were
used in the design calculations. For each of these 43 active sites, the other enantiomer of
the TS was created by reflecting across the plane of the acetone moiety (enamine) and the
Ensembles of transition state models were generated for motifs II, III, and IV by
sampling around the lowest energy QM optimized structures. The C-C bond adjacent to
the breaking C-C bond is the most constrained. As the retro-aldol process proceeds the
breaking C-C σ bond is being transformed into a C=C π bond in the acetone moiety. The
torsion around the forming C=C bond is constrained to be close to ± 90º (between the
breaking C-C bond and the C-N bond of the imine, see Figure S1B). With the two
enantiomers at the chiral center and the two possibilities for the χ1 dihedral, four
structures were generated from the lowest energy C-C bond-breaking TS found in the
These four base states were diversified by sampling the degrees of freedom of the TS as
indicated in Figure S1B to generate an ensemble of 1296 models. For each degree of
freedom independently, the deviation from the optimal structure producing a 0.5-1.0 kcal
increase in energy after allowing the remaining degrees of freedom to relax was
which each degree of freedom took on the ideal value or the ideal value ± the deviation
identified as above. This procedure is not optimal in that it neglects energetic couplings
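The enumeration pattern described above can be sketched as follows. This is an illustration only, not the code used in this work: the degree-of-freedom names, ideal values, and deviations are hypothetical placeholders rather than the values in Figure S1B. Only the rule that each degree of freedom takes its ideal value or the ideal value ± its deviation is taken from the text.

```python
from itertools import product

# Hypothetical degrees of freedom: name -> (ideal value, deviation), in degrees.
# Placeholder numbers, not the values from Figure S1B.
dofs = {
    "torsion_a": (90.0, 10.0),
    "torsion_b": (180.0, 15.0),
    "angle_c": (109.5, 5.0),
}

def diversify(dofs):
    """Enumerate every combination of (ideal - dev, ideal, ideal + dev) per degree of freedom."""
    names = list(dofs)
    choices = [(ideal - dev, ideal, ideal + dev) for ideal, dev in dofs.values()]
    return [dict(zip(names, combo)) for combo in product(*choices)]

ensemble = diversify(dofs)
print(len(ensemble))  # 3 ** 3 = 27 models for three degrees of freedom
```

Because every degree of freedom is varied independently, the ensemble size grows as 3 to the power of the number of degrees of freedom, and combinations that are energetically coupled are not treated specially.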
The retro-aldolase reaction pathway shown in Figure 2B has multiple intermediates and
transition states. A superposition of models based on the imine for all of these states
this set of models can be largely represented by just two structures: the carbinolamine
intermediate and the transition state for the carbon-carbon bond-breaking step. The
carbinolamine mimics four distinct steps along the reaction pathway: the initial lysine
attack and the dehydration step as well as the associated proton shuttling steps and also
the reverse reactions required for enzyme recycling shown in Figure 2B. For
by a superposition of the carbinolamine and the C-C bond-breaking step (Figure S2B). In
the superposition, the naphthyl moiety is very close in space and for simplicity we use the
naphthyl placement in the C-C bond-breaking TS as its geometry and orientation are
more restricted. We simplify further by building the carbinolamine alcohol and the shift
in the terminal methyl position onto the C-C bond-breaking transition state to create the
For motif IV, a water molecule is placed to coordinate the two alcohol groups on the
and basic residues) are then incorporated to hydrogen bond to and position the water
molecule; the two OH groups on the composite TS and the two hydrogen bonding
For all motifs, the conformations of the imine-forming Lys residue (Figure S3A) and the
other catalytic sidechains were diversified as described in Figure S3B-F. For example,
motif II begins with the addition of the methyl and the alcohol to the C-C bond-breaking
TS. The composite TS is then diversified using the degrees of freedom outlined in
Figure S1B, giving 1,296 conformations. These are combined with the 2,268 rotamers
and 108 functional group placements of Lys shown in Figure S3A. For the Tyr residue
responsible for proton abstraction, the dihedral angles χ1, χ2 and χ3 are sampled at 60˚
sidechain rotamers (Figure S3C). Finally the π-stacking residues (Phe, Tyr, Trp) are
placed in 1296 distinct positions for each of 189 rotamers (Figure S3B). In total, the
diversification of the composite TS, the orientation of the TS relative to the key
sidechains, and the internal geometry of the sidechains results in a total of 1,296 x 2,268
x 108 x 216 x 54 x 1,296 x 189 = 9.1 x 10¹⁷ (Table S3) possible three dimensional
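The size of this combinatorial space can be checked directly. The short sketch below simply multiplies the seven factors quoted in the text; the factors are left unlabeled because the mapping of each factor to a specific sidechain is given in Table S3 rather than here.

```python
from math import prod

# The seven factors quoted in the text for motif II.
factors = [1296, 2268, 108, 216, 54, 1296, 189]

total = prod(factors)
print(f"{total:.1e}")  # 9.1e+17
```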
The selection criteria for the scaffolds used in the designs were 1) high-resolution crystal
structure available, 2) expression in Escherichia coli (E. coli) possible, 3) stable protein,
4) contain a preexisting pocket, and 5) span a variety of protein folds. The list of input
protein scaffolds used in this study is provided in Table S4. For each scaffold, 3-
dimensional grids representing the structural space available for possible transition state
bound ligand in the PDB, or centered on the catalytic residues if using an apo structure,
were generated using an in-house PyMOL plugin; an example is shown in Figure S4. This
first grid allows very rapid pruning of TS placements falling outside the desired pocket.
A second grid representing backbone location was also generated for each structure to
allow very rapid lookup-based rejection of sidechain rotamer and transition state
placements overlapping with the scaffold backbone. For each scaffold, positions within 6
Å of the first grid and with a Cα-Cβ vector pointing toward this grid were considered as
possible sites for catalytic residue placement by RosettaMatch. This list of positions was
further pruned in some cases for specific residues, for example to ensure a highly buried
catalytic lysine.
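A minimal sketch of this position-selection rule is given below. It is not the plugin used in this work, and two details are assumptions made for illustration: the 6 Å distance is measured from the Cα atom to the nearest grid point, and "pointing toward the grid" is taken to mean a positive projection of the Cα-Cβ vector onto the vector from Cα to the grid centroid. The residue names and coordinates are hypothetical.

```python
from math import dist

def candidate_positions(residues, grid_points, max_dist=6.0):
    """Select residues near the pocket grid whose CA->CB vector points toward it.

    residues: list of (name, ca_xyz, cb_xyz) tuples, coordinates in Angstrom.
    grid_points: list of xyz tuples describing the pocket grid.
    """
    n = len(grid_points)
    centroid = tuple(sum(p[i] for p in grid_points) / n for i in range(3))
    selected = []
    for name, ca, cb in residues:
        # Assumed distance criterion: CA to the nearest grid point.
        near = min(dist(ca, p) for p in grid_points) <= max_dist
        # Assumed direction criterion: positive dot product with CA->centroid.
        ca_cb = [cb[i] - ca[i] for i in range(3)]
        ca_center = [centroid[i] - ca[i] for i in range(3)]
        pointing = sum(a * b for a, b in zip(ca_cb, ca_center)) > 0.0
        if near and pointing:
            selected.append(name)
    return selected

grid = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
residues = [
    ("A10", (5.0, 0.0, 0.0), (3.5, 0.0, 0.0)),    # close, pointing toward the grid
    ("A11", (5.0, 0.0, 0.0), (6.5, 0.0, 0.0)),    # close, pointing away
    ("A12", (20.0, 0.0, 0.0), (18.5, 0.0, 0.0)),  # pointing toward the grid, too far
]
print(candidate_positions(residues, grid))  # ['A10']
```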
Identification of sites in scaffolds where composite active site can be accurately created
catalytic residues. For each composite active site description (the superimposed transition
states and the associated catalytic residues), candidate catalytic sites were generated in a
scaffold set (Table S4). At each position in the active site pockets on each scaffold (see
previous section), each rotamer of each catalytic sidechain was placed and the ensuing
position of the composite TS if there were no clashes with the scaffold backbone was
recorded in the six-dimensional (6D) hash. To maximize the number of solutions, large
sets of sidechain rotamers were generated as detailed in Figure S3A-G. Following the
filling out of the hash table, which scales linearly with the number of scaffold positions
and number of sidechain rotamers, the hash was examined for TS positions compatible
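The hashing idea can be illustrated with a simplified sketch. This is not RosettaMatch itself: the pose representation (three translations plus three Euler angles in degrees), the bin widths, and the function names are assumptions chosen for illustration, and a real implementation would also require the matched residues to sit at distinct scaffold positions and would check neighboring bins.

```python
from collections import defaultdict

def pose_key(pose, xyz_bin=1.0, angle_bin=15.0):
    """Discretize a 6D pose (x, y, z, three angles in degrees) into a hashable bin key."""
    x, y, z, a, b, c = pose
    return (
        int(x // xyz_bin), int(y // xyz_bin), int(z // xyz_bin),
        int((a % 360.0) // angle_bin),
        int((b % 360.0) // angle_bin),
        int((c % 360.0) // angle_bin),
    )

def find_matches(placements, n_catalytic):
    """Return bins in which every catalytic residue places the TS.

    placements: iterable of (catalytic_residue_index, scaffold_position, rotamer_id, ts_pose).
    """
    table = defaultdict(list)
    # Filling the table is a single pass, so it is linear in the number of placements.
    for res_idx, position, rotamer, pose in placements:
        table[pose_key(pose)].append((res_idx, position, rotamer))
    matches = {}
    for key, hits in table.items():
        if len({hit[0] for hit in hits}) == n_catalytic:
            matches[key] = hits
    return matches

# Hypothetical placements for two catalytic residues.
placements = [
    (0, 12, 3, (1.2, 2.7, -0.4, 10.0, 95.0, 200.0)),
    (1, 47, 8, (1.6, 2.1, -0.9, 14.0, 91.0, 205.0)),  # same bin as the first
    (1, 60, 2, (7.0, 2.1, -0.9, 14.0, 91.0, 205.0)),  # different bin
]
matches = find_matches(placements, n_catalytic=2)
print(matches)
```

The design choice worth noting is that each sidechain placement is processed once and independently, so no pairwise comparison of placements is needed to find TS positions that several catalytic residues agree on.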
Many of the possibilities for sidechain and composite active site placement were pruned,
or trimmed from the hash. We used both general and reaction specific pruning criteria.
The former criteria for elimination include sidechain or transition state clashes with the
backbone grid, or key atoms of the transition state (for the retro-aldol: the imine C, Cαs,
Cβ, β-OH, the C6 on the naphthyl, and their associated protons) not fitting inside the grid
representing the potential binding site on a scaffold. Reaction specific criteria include
restriction to TS orientations with the imine forming lysine at the bottom of the pocket
and allowing only extended conformations of the imine forming lysine, which enable
extensive packing with other residues to lower the pKa and keep the lysine fixed in place.
This pruning reduces the search space and results in only rotamers and composite active
To benefit from the binding energy provided by a π-stacking interaction with the
treated the aromatic sidechains (Trp, Phe, Tyr) as catalytic residues in order to obtain
active sites which also incorporated a key interaction for good binding affinity (Figure
S3B). We searched for both parallel-displaced and T-shaped π-π interactions. This
addition to the composite active site matching process greatly slowed down the search
when conducted in parallel with the other catalytic residues due to the considerable
increase in site complexity. In order to overcome the slowdown due to this diversity, we
often did an initial RosettaMatch run with only the catalytic residues shown in Figure 2C.
This was followed by a second matching stage for the sole purpose of incorporating the
aromatic π-stacking residue into the previously matched doublets or triplets of residues
Optimization of transition state rigid body orientation and the conformations of the
catalytic sidechains
The site identification method described in the previous section produces initial active
site configurations that are often suboptimal due to slightly strained catalytic residue
geometry and/or small clashes between the composite transition state and the protein
backbone. To eliminate these imperfections and to ensure a near ideal geometry for the
subsequent full site design calculations, the rigid body degrees of freedom of the
composite transition state and the chi angles of the catalytic sidechains were subjected to
a constraint function representing the relevant catalytic constraints (see next section) and
a sidechain torsional potential (4). Attractive interactions were omitted during this
optimization as the amino acid composition of the site at this stage is still polyalanine or
polyglycine. During the catalytic site optimization, the internal torsions of the composite
transition state were also allowed to vary within specified tolerances (Figure S3) in order
to maximize the possibility of the simultaneous satisfaction of all the catalytic constraints
Constraint function
The constraint function consists of a flat bottom well with a width equal to the deviations
listed in Figure S3 with a harmonic penalty for violations greater than these values.
indicated in the figures, all degrees of freedom varied in the RosettaMatch search are
optimized during the minimization along with a subset of the degrees of freedom kept
fixed during RosettaMatch. The force constants for the distance, angle, and torsional
deviations are scaled so the penalties are equal for deviations of 0.15 Å, 10˚, and 10˚,
importance to catalysis, in particular the constraints on the catalytic lysine were weighted
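A flat-bottom well with a harmonic penalty can be sketched as follows. This is an illustrative form rather than the Rosetta implementation: it assumes the harmonic term acts on the excess deviation beyond the tolerance, uses an arbitrary energy unit in which each reference deviation (0.15 Å, 10˚, 10˚) costs 1.0, and ignores the periodicity of torsions. The example distances and angles are hypothetical.

```python
def flat_bottom_penalty(value, ideal, tolerance, force_constant):
    """Zero within ideal +/- tolerance; harmonic in the excess deviation outside it."""
    excess = abs(value - ideal) - tolerance
    return 0.0 if excess <= 0.0 else force_constant * excess ** 2

# Reference deviations from the text that should give equal penalties.
REFERENCE = {"distance": 0.15, "angle": 10.0, "torsion": 10.0}

# Force constants scaled so each reference deviation costs 1.0 (arbitrary unit).
FORCE_CONSTANTS = {kind: 1.0 / ref ** 2 for kind, ref in REFERENCE.items()}

# Hypothetical examples: a distance 0.15 A and an angle 10 deg outside their wells.
d_pen = flat_bottom_penalty(3.25, 2.8, 0.3, FORCE_CONSTANTS["distance"])
a_pen = flat_bottom_penalty(130.0, 110.0, 10.0, FORCE_CONSTANTS["angle"])
inside = flat_bottom_penalty(3.0, 2.8, 0.3, FORCE_CONSTANTS["distance"])
print(round(d_pen, 6), round(a_pen, 6), inside)
```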
We carried out full sequence optimization of the remaining (not included in the minimal
catalytic site description) positions surrounding the docked composite TS model. We use
the standard RosettaDesign Monte Carlo optimization using an extended version of the
Dunbrack rotamer library (4) as carried out in the design of TOP7 (5), with protein-ligand
space, and a subset of residues were redesigned (varying the amino acid identity and
sidechain conformation) and a second subset repacked (keeping the amino acid identity
but varying the sidechain conformation). The first subset includes amino acid positions
whose Cα atoms are within 6.0 Å of the TS model and positions with Cα atoms between
6.0 Å and 8.0 Å and with the Cα-Cβ vector pointing toward the TS model. The second
subset includes positions with Cα atoms within 10.0 Å of the TS model and positions
whose Cα atoms are between 10.0 Å and 12.0 Å and with the Cα-Cβ vector pointing toward the TS
The design calculations employ a discrete rotamer representation of the sidechains, and
after design the full energy function can be used in the refinement of the site. Hence, we
next carry out simultaneous quasi-Newton optimization of the composite TS rigid body
orientation, the sidechain torsion angles, and in some cases, the torsion angles of the
composite TS and the backbone torsion angles in the site using the complete Rosetta
energy function supplemented with the catalytic constraint function(7). This step does not
change the designed sequence, but is essential to the subsequent assessment of the
To optimize interactions (H-bonding or packing) that are still missing at the end of the
design process, we next carried out small random perturbations to the TS rigid-body
degrees of freedom (0.2 Å for translational degrees of freedom; 10˚ for rotational degrees
catalytic sidechains to ensure they are still within the geometric constraints. After
catalytic sidechain minimization, the remaining pocket is redesigned with the pre-
perturbed design as the starting point. Another round of energy-based refinement of the
active site provides slight variations on the initial design which may have optimized
properties. Usually several iterative runs of small dock perturbation, pocket design and
Ranking of designs
1) Designs with a transition state-protein van der Waals attractive energy > -5.0
kcal/mol were removed.
2) Designs with fewer than 35 Cβ atoms within 10 Å of the TS were eliminated (too
exposed) as were designs with >85 Cβ atoms (too buried).
3) Designs in which the solvent accessible surface (SASA) of the TS was less than
10 Å² were eliminated (no access to binding pocket).
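The three elimination criteria above translate directly into a filter. The sketch below is illustrative only: the dictionary keys and the example designs are hypothetical, while the thresholds are the ones stated in the list.

```python
def passes_filters(design):
    """Apply the three elimination criteria to one design.

    design: dict with 'vdw_attractive' (kcal/mol), 'n_cbeta_10A' (count),
    and 'ts_sasa' (square Angstrom). Key names are hypothetical.
    """
    if design["vdw_attractive"] > -5.0:  # too little TS-protein attraction
        return False
    if design["n_cbeta_10A"] < 35:       # too exposed
        return False
    if design["n_cbeta_10A"] > 85:       # too buried
        return False
    if design["ts_sasa"] < 10.0:         # no access to binding pocket
        return False
    return True

# Hypothetical designs for illustration.
designs = [
    {"name": "d1", "vdw_attractive": -7.2, "n_cbeta_10A": 60, "ts_sasa": 25.0},
    {"name": "d2", "vdw_attractive": -3.1, "n_cbeta_10A": 60, "ts_sasa": 25.0},
    {"name": "d3", "vdw_attractive": -8.0, "n_cbeta_10A": 90, "ts_sasa": 25.0},
    {"name": "d4", "vdw_attractive": -8.0, "n_cbeta_10A": 50, "ts_sasa": 4.0},
]
kept = [d["name"] for d in designs if passes_filters(d)]
print(kept)  # ['d1']
```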
The remaining designs were then ranked according to each of the following metrics
independently:
1) Energy of binding of composite TS. Designs were ranked not only on the total
binding energy, but also on each of the components separately (Lennard-Jones
interactions, solvation, hydrogen bonding, and electrostatics)(5, 6, 8).
3) Packing of the catalytic lysine sidechain assessed through its Lennard-Jones and
solvation energy.
4) Atomic packing around transition state as assessed with a new algorithm for
detecting packing imperfections and cavities in proteins (Sheffler, W. and Baker, D.,
submitted).
Designs which ranked in the top 40% according to each of these measures were further
screened as follows:
1) The consistency of side chain conformations after repacking in the presence and
absence of the TS model was determined, and designs in which there were large
differences were discarded (we seek designs with a largely preformed active site).
2) The total folding free energy of the designed enzyme was compared to that of the
original scaffold and designs predicted to be largely destabilized were eliminated.
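Requiring a design to rank in the top 40% for each metric independently amounts to intersecting the per-metric top sets. The sketch below illustrates this under the assumption that lower scores are better for every metric (as for energies); the metric names and scores are hypothetical.

```python
def top_fraction_in_all(designs, metrics, fraction=0.40):
    """Return designs ranking in the top `fraction` for every metric (lower is better).

    designs: dict mapping design name -> dict of metric name -> score.
    """
    cutoff = max(1, int(len(designs) * fraction))
    keep = set(designs)
    for metric in metrics:
        ranked = sorted(designs, key=lambda name: designs[name][metric])
        keep &= set(ranked[:cutoff])
    return sorted(keep)

# Hypothetical scores for five designs and two metrics.
scores = {
    "d1": {"binding": -12.0, "lys_packing": -6.0},
    "d2": {"binding": -11.0, "lys_packing": -2.0},
    "d3": {"binding": -4.0, "lys_packing": -7.0},
    "d4": {"binding": -3.0, "lys_packing": -1.0},
    "d5": {"binding": -2.0, "lys_packing": -0.5},
}
selected = top_fraction_in_all(scores, ["binding", "lys_packing"])
print(selected)  # ['d1']
```

Ranking each metric separately, rather than on a weighted sum, avoids having to choose relative weights, at the cost of discarding designs that are excellent on one measure but mediocre on another.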
Evaluation of top-ranking designs
About 100 top-ranking designs remained after these filtering and ranking steps. Two
calculations were performed on these designs: substrate rigid-body dock perturbation
experiments (6), to check whether the substrate can fit into the pocket in the
orientation appropriate for catalysis, and the DDG calculation (9), which estimates the
free energy change upon binding of the transition state to the protein. Designs satisfying
both the substrate docking and the DDG criteria were inspected visually to confirm routes
for substrate entrance and product release, and the enzymatic activity of selected designs
An overview of the active site search and design results is shown in Table S5. In a
typical active site search using catalytic motif III, a total of 181,555 matches were found
body orientation and the identities and conformations of the surrounding residues, 343 of
these had low TS binding energy and satisfied the catalytic constraints. Interestingly, a
large fraction of the low energy matches were found in TIM barrel and jelly roll folds
(Table S5).
A total of 72 designs with 10-20 amino acid identity changes in 10 different scaffolds
were selected for experimental characterization. Soluble purified protein was obtained for
70 of 72 of the expressed designs. An overview of the rate enhancements of the 70 designs is
provided in Table S6. The sequences of the enzymatically active designs, whose rate
enhancements are provided in Table S1, are shown in Table S7.
Experimental methods
Genes were synthesized by Codon Devices (Cambridge, MA) and inserted between NdeI
and XhoI in pET29b+ (Novagen, Madison, WI) with a C-terminal Gly-Ser as a spacer
between the protein and the XhoI-6xHis tag. Proteins were expressed in an auto-
induction media(10). The cells were spun down and sonicated to expose soluble protein.
The proteins were purified over 10 mL of Ni-NTA His•Bind Resin (Qiagen, Valencia,
CA). After elution, the proteins were dialyzed three times against 100-fold larger volume
into 25 mM HEPES, 100 mM NaCl, pH 7.5. The purity of the proteins was monitored by
Coomassie staining, which showed them to be >90% pure in all cases and generally
considerably purer, especially for the designs fully characterized in the
paper.
The protein concentrations were determined using the absorbance at 280 nm measured
molar extinction coefficient was calculated using the amino acid sequence, with each tryptophan
residue contributing 5500 cm⁻¹ M⁻¹ to the extinction coefficient, each tyrosine
1490 cm⁻¹ M⁻¹ and each cysteine 125 cm⁻¹ M⁻¹ (11). Additionally, amino acid analysis was done
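The sequence-based extinction coefficient and the resulting concentration estimate can be sketched as below. The per-residue contributions are those stated in the text (counted per cysteine, as written there); the example sequence, the absorbance value, and the 1 cm path length are hypothetical.

```python
# Per-residue contributions at 280 nm in M^-1 cm^-1, as stated in the text.
CONTRIBUTION = {"W": 5500, "Y": 1490, "C": 125}

def molar_extinction(sequence):
    """Sum the Trp, Tyr, and Cys contributions over a one-letter amino acid sequence."""
    return sum(CONTRIBUTION.get(aa, 0) for aa in sequence.upper())

def concentration_uM(a280, sequence, path_cm=1.0):
    """Beer-Lambert estimate of protein concentration in micromolar."""
    eps = molar_extinction(sequence)
    if eps == 0:
        raise ValueError("sequence has no Trp, Tyr, or Cys; A280 cannot be used")
    return a280 / (eps * path_cm) * 1e6

# Hypothetical sequence: one Trp, two Tyr, one Cys.
eps = molar_extinction("MWYYCAG")
conc = concentration_uM(0.1721, "MWYYCAG")
print(eps, round(conc, 2))  # 8605 20.0
```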
Site-directed mutagenesis was performed following the Kunkel protocol(12, 13), using
(http://www.stratagene.com/tradeshows/feature.aspx?fpId=118) (14).
and protein with 5 µL small molecule in acetonitrile; normally, 6-16 µM protein and 270
96 Well Black Flat Bottom Polystyrene Non Binding Surface Microplates (Corning,
Lowell, MA) using λex of 330 nm (9 nm bandwidth) and λem of 452 nm (15 nm bandwidth)
with an additional filter at 435 nm for increased noise reduction. Quartz cuvette
measurements were used to verify the plate measurements. The reactions were controlled
for evaporation and were generally stable up to 4-5 hours with minimal evaporation,
though most kinetic measurements were done within the first hour. Known
concentrations of the product were used to quantitate the fluorescence (which is linear at
low concentrations). HPLC was done to verify that the product of the reaction coeluted
with the expected product. Using an Agilent 1200 Series HPLC monitoring at 254 nm,
Agilent Zorbax ODS 5 µm 4.6 x 250 mm column, and eluting 0.5 mL/min 50% CH3CN
8.6 minutes.
Product curves
Representative progress curves are shown in Figure S5. As noted in the main text, the
progress curves associated with different designs can be quite different; many clearly do
not follow the simple activated complex model used for Michaelis-Menten calculations.
RA60 and RA61 (Figure S5A and S5B) demonstrate nearly linear behavior for the length
of the reaction. There is a slight lag or delay at the beginning for several minutes before
they reach their maximal velocity. Both continue for multiple turnovers or until substrate
is consumed.
RA45 and RA46 (Figure S5C and S5D) show a very distinct lag phase of many minutes
where no product is formed; our hypothesis is that this represents a very slow imine
formation rate. After this lag phase, the enzymes quickly attain their maximal velocity
RA22 and RA34 (Figure S5E and S5F) display even more complex behavior. They
begin with a similar short lag phase and take off similarly to RA60 and RA61, but after
several minutes, slow down (almost like one would expect for burst kinetics); however,
the enzymes slow down well before one equivalent of aldehyde product has formed. The
rate after the slowdown is ~7-fold lower than the initial rate for RA22 and ~8-fold lower
for RA34, and it continues indefinitely; we report both the burst phase and
the steady state rates in Table 2A. The amount of product formed during the burst phase
between independently purified preparations of designed enzyme. In experiments to
examine the cause of the burst and steady state rate differences, pre-incubation of the
and subsequent addition of substrate resulted in only the steady state rate being observed.
Pre-incubation with acetone had no similar effect. The interaction between the product
and RA22 and RA34 results in a significant decrease in the fluorescent signal, which at
least partially explains the observed non-stoichiometry of the burst phase. This also
indicates that the true enzymatic rate is even faster than the burst phase, but is inhibited
1a53 and 1m4w (Figure S5G), which are the original unmutated scaffold proteins used
for many of the best designs, show no retro-aldolase activity. The product slope actually
appears to be slightly downward and a much longer calculation is required to get a steady
state rate which is calculated to be on the same order as the uncatalyzed rate.
mM HEPES, 100 mM NaCl, pH 7.5. (180 µL buffer and protein with 5 µL small
plates either from a single reaction run (if fast enough) or by taking hourly point
measurements. After the absorbance was saturated, the total absorbance change was
determined and the time at half (T1/2) of the absorbance increase was interpolated and
displayed in Table S2. Using the extinction coefficient of 15,000 M⁻¹ cm⁻¹ for the
enaminone(15), the active site lysine concentration appears to be less by 3-fold than the
enzyme concentration we estimate based on tryptophan absorbance for RA60 and RA61;
if the former is a more correct measure our reported kcat values would be underestimates
by 3-fold.
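The half-time interpolation and the enaminone-based estimate of active site concentration can be sketched as follows. The sketch assumes a monotonically increasing absorbance trace whose last point is on the plateau, linear interpolation between neighboring time points, and a 1 cm path length; the time points and absorbance values are hypothetical, and only the 15,000 M⁻¹ cm⁻¹ extinction coefficient comes from the text.

```python
def half_time(times, absorbances):
    """Interpolate the time at which half of the total absorbance change is reached."""
    a0, a_final = absorbances[0], absorbances[-1]
    half = a0 + 0.5 * (a_final - a0)
    points = list(zip(times, absorbances))
    for (t1, a1), (t2, a2) in zip(points, points[1:]):
        if a1 <= half <= a2 and a2 != a1:
            # Linear interpolation within the bracketing interval.
            return t1 + (half - a1) * (t2 - t1) / (a2 - a1)
    raise ValueError("half-maximal absorbance is not bracketed by the data")

def enaminone_uM(delta_absorbance, epsilon=15000.0, path_cm=1.0):
    """Enaminone (labeled lysine) concentration in micromolar from the absorbance change."""
    return delta_absorbance / (epsilon * path_cm) * 1e6

# Hypothetical hourly readings.
hours = [0, 1, 2, 3, 4]
a316 = [0.10, 0.20, 0.40, 0.48, 0.50]
t_half = half_time(hours, a316)
lys_uM = enaminone_uM(0.15)
print(round(t_half, 2), round(lys_uM, 2))  # 1.5 10.0
```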
Crystallographic Methods.
RA22 S210A and RA61 M48K were both overexpressed from E. coli strain BL21 (RIL)
as C-terminal and N-terminal his-tagged protein constructs, respectively, and purified via
RA22 S210A that was crystallized was expressed using the pET-29b vector (Novagen)
and contained a non-cleavable C-terminal 6XHis-tag after the translated protein sequence
of the designed enzyme. In contrast, the RA61 M48K construct that was crystallized was
expressed using the pET-15b vector (Novagen), allowing removal of the N-terminal his-
tag after purification (resulting in three exogenous amino acid residues, G-S-H, prior to
against low molecular weight cut-off sieves (Centricon) and then screened for
crystallization conditions using a commercial sparse matrix screen (Nextal Classic Suite;
Qiagen). Crystals of RA22 S210A were grown by equilibration of the protein against a
reservoir containing 0.2 M ammonium sulfate, 0.1 M sodium acetate pH 4.6 and 27%
PEG4000, followed by further equilibration after the initial crystallization over a well
containing 0.2 M ammonium sulfate, 0.1 M sodium acetate pH 4.6 and 30% PEG3350.
The crystals were frozen in an artificial mother liquor matching the final crystallization
reservoir, augmented with 20% glycerol v/v. Crystals of RA61 M48K were grown by
equilibration of the protein against reservoirs containing 1.9 to 2.1 M ammonium sulfate,
0.1 M MES pH 6.5 and 5% v/v PEG 400. The crystals were frozen in an artificial mother
liquor matching the final crystallization reservoir, augmented with 25% sucrose w/v.
X-ray data were collected on an R-AXIS IV/++ area detector (Rigaku) using X-rays
corresponding to λ = 1.54 Å provided from the Cu-Kα line of a Micromax 007 rotating
anode generator (Rigaku). A complete redundant data set was collected for each
specimen, to 2.2 Å and 1.8 Å resolution, respectively. Data were processed and scaled
During scaling and subsequent molecular replacement and refinement, the space groups
for the diffraction data sets were determined to be P2₁2₁2 and P2₁2₁2₁ for RA22 S210A
and RA61 M48K, respectively. Molecular replacement was performed using the original
PDB files (1LBF and 1M4W for RA22 and RA61, respectively) from the RCSB database,
corresponding to the parental proteins prior to computational redesign, with all
sidechains and extended peptide regions that were subjected to redesign deleted. All
stages of molecular replacement, model building, and refinement were performed using
programs from the CCP4 computational suite(16) (PHASER, COOT and REFMAC).
Data, model building and refinement statistics are shown in Table S8.
Verification of Enzyme Activity and Characterization
Amino acid analysis was performed at the Texas A&M Protein Chemistry Laboratory on
samples of RA22, RA22 K159M, RA34, RA60, RA61, and RA61 K176M, which had
been purified over a Ni-NTA column and dialyzed to pH 7.5, 25 mM HEPES, and
100 mM NaCl. The concentrations observed corresponded quite well with the expected
Mass spectrometry was also performed on RA22, RA22 K159M, RA34, RA45, RA46,
RA60, RA60 K48M, RA61, and RA61 K176M. In the cases of RA22, RA22 K159M,
RA34, RA45, and RA46, masses were determined that corresponded to within 6 Daltons
of the expected average mass. In the cases of RA60, RA60 K48M, RA61, and RA61
K176M, a mass was obtained that corresponded to within 6 Daltons of the calculated
protein mass with the formyl-methionine remaining. These proteins are on the same
An imidazole gradient was run to elute the proteins from the Ni-NTA column (Figure
the proteins. The elution was monitored at 280 nm. 1 mL fractions were collected and
assayed for retro-aldolase activity. Fractions with significant 280 nm absorbance were
run on 16.5% Tris-Tricine/Peptide gels (BIO-RAD) and stained with GelCode Blue Stain
Reagent (PIERCE). Separation is incomplete; however, the peak in catalytic activity for all
designs tested (RA22, RA34, RA45, RA60 and RA61) corresponds to the peak
at 280 nm and to the most intense protein bands on the stained gel (only results for RA61
are shown for brevity, but they are representative of the remainder). The major bands on the gel
correspond to the approximate molecular weight expected for the designed enzymes as
well.
Size exclusion chromatography using an ÄKTA FPLC eluting at 0.8 mL/min and loading
1.5 mL of a Ni-NTA purified enzyme sample (10-20 µM) on a Superdex 200 column,
which had been equilibrated and eluted with 150 mM NaCl, 25 mM HEPES, pH 7.5,
were assayed for retro-aldolase activity and fractions with significant 280 nm absorbance
were run on PAGE gels. The results are displayed in Figure S8. Once again, the
retro-aldolase activity corresponds to the major protein fractions, which show single bands
on a gel. RA34 is an interesting case in that it has a non-protein impurity which
elutes with a significant absorbance at 280 nm. The impurity has no catalytic activity,
however.
Our results demonstrate that novel enzyme catalysts for non-natural reactions can be
created using computational enzyme design. Our success with the retro-aldol reaction is
notable because of the complexity and large number of steps in the reaction: the enzyme
While the designs are less active than aldolase catalytic antibodies(17), they should be
excellent starting points for generating improved catalysts using directed evolution due to
their relatively small size and the robustness of the scaffolds which should allow for
increased expression, easier purification and library synthesis, etc. The more constrained
designed active sites are also likely to have different substrate selectivities: for example,
the nucleophilic lysine in the catalytic antibody combines rapidly with both the retro-
aldol substrate and the diketone probe despite their very different structures, whereas
many of our designs with considerable retro-aldolase activity interact very slowly or not
The success in computational design of enzyme catalysts for the retro-aldol reaction is
due not only to the enzyme design methodology described, but also to the availability of
sufficiently powerful computers (an average of 30,000 CPU hours per active site motif)
and of low-cost, rapid gene synthesis, which made possible
the experimental testing of many designs for each of four different enzyme active site
enzymes and primordial enzymes that arose early in evolution resemble one another more
than they resemble highly refined and sophisticated modern day enzymes. Whether or
not this analogy is correct in detail, we look forward to developing increasingly powerful
aldol catalysts by improving on the robust and stable designs described here both by
incorporating additional backbone flexibility into the design process, particularly in loop
regions, to increase the reactivity of the imine forming lysine and to lower KM by making
possible tighter substrate and transition state binding, and by using directed evolution for
V. References
15. J. Wagner, R. A. Lerner, C. F. Barbas, 3rd, Science 270, 1797 (1995).
16. Acta Crystallogr D Biol Crystallogr 50, 760 (Sep 1, 1994).
17. G. Zhong et al., Angew Chem Int Ed Engl 37, 2481 (1998).
18. F. Tanaka, R. Fuller, C. F. Barbas, 3rd, Biochemistry 44, 7583 (2005).
19. H. M. Berman et al., Acta Crystallogr D Biol Crystallogr 58, 899 (Jun, 2002).
VI. Figures and Tables
A. A graphical depiction of the active site ensemble for the carbon-carbon bond-
breaking step in motif I. The lowest energy conformation appears as thicker sticks
and the other conformations in the thinner lines. The xyz coordinates of the
ensemble are provided in the online support. Only the terminal atoms of the
catalytic sidechains are shown.
B. Diversification of the degrees of freedom of the C-C bond-breaking transition state for
motifs II, III and IV.
C. The four base transition states for the carbon-carbon bond-breaking step (R
and S enantiomers; χ1 = ± 90˚). The Nζ (blue), Cε of the catalytic lysine, and the
water in motif IV are shown.
Figure S2. Generation of the composite TS
B. Composite transition state with water molecule for motif IV. The methyl group from
the carbinolamine intermediate (purple) and carbon-carbon breaking step (yellow) are
superimposed on the key atoms of the imine (Nζ of lysine and its Cα and Cβ atoms),
which are highlighted with the additional spheres. Nζ (blue) of the catalytic lysine is also
shown. The yellow dashed lines indicate hydrogen bonds; the X’s are positions of
sidechain hydrogen bonding groups sought during matching.
Figure S3. Geometric parameters for catalytic residue placements used in motifs II, III
and IV.
a. this degree of freedom is not diversified during matching, but is allowed to vary
during the subsequent optimization steps
b. planarity of the imine/enamine
c. freely rotatable bond
d. values for the highest frequency Lys rotamer from the Dunbrack backbone dependent
library are shown for illustration; the 28 highest frequency rotamers were used in the
matching calculations
e. for each of the 28 high frequency Lys rotamers from the Dunbrack library, 81 variants
were generated with each chi angle perturbed by ± 1 standard deviation (3rd column)
(3x3x3x3=81)
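The variant counts quoted in these footnotes follow from taking three values (mean - 1 SD, mean, mean + 1 SD) for each chi angle and forming every combination. The Python sketch below illustrates that counting only; the chi means and standard deviations are hypothetical placeholders, not values from the Dunbrack library, and the function name `chi_variants` is ours.

```python
from itertools import product

def chi_variants(chi_means, chi_sds):
    """Return every combination of (mean - sd, mean, mean + sd) per chi angle."""
    per_angle = [(m - s, m, m + s) for m, s in zip(chi_means, chi_sds)]
    return list(product(*per_angle))

# Hypothetical chi angles and standard deviations in degrees (illustration only).
lys_chi_means = (-67.0, 180.0, 180.0, 180.0)
lys_chi_sds = (10.0, 12.0, 12.0, 12.0)

variants = chi_variants(lys_chi_means, lys_chi_sds)
n_variants = len(variants)      # 3 x 3 x 3 x 3 for the four Lys chi angles
n_total = 28 * n_variants       # 28 base rotamers, each expanded the same way

print(n_variants, n_total)
```

The same function gives 3x3 = 9 variants for a residue with two chi angles, which matches the counts quoted for the aromatic residues in the later panels.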
B. Geometric parameters for the π-stacking residue (Phe,Tyr,Trp) used in motifs II and
III.
a. idealized π-stacking geometries (4)
b. this degree of freedom is not diversified during matching, but is allowed to vary
during the subsequent optimization steps
c. the different choices for Cβ atom extension
d. values for a high frequency Phe rotamer are shown for illustration; a total of six
(Phe,Tyr) or nine (Trp) rotamers are used, with nine variants for each rotamer
generated by perturbing each chi angle by ± 1 standard deviation (3rd column) (3x3)
C. Geometric parameters and diversification for placing the sidechain of Tyr in motif II
a. this degree of freedom is not diversified during matching, but is allowed to vary
during the subsequent optimization steps
b. values from a high frequency Tyr rotamer are shown for illustration; a total of six
rotamers are used, with nine variants for each rotamer generated by perturbing each chi
angle by ± 1 standard deviation (3rd column) (3x3)
D. Geometric parameters and diversification for placing His-Asp dyad in motif III
a. To place the His-Asp dyad, the placement of the His rotamer is constrained by
identifying, for each His rotamer in a scaffold, the set of Asp rotamers which can provide
the supporting hydrogen bond
b. this degree of freedom is not diversified during matching, but is allowed to vary
during the subsequent optimization steps
c. the value indicates the atoms are in the plane of the imidazole ring
d. values from a popular rotamer are shown for illustration; a total of nine rotamers are
used, with nine variants for each rotamer generated by perturbing each chi angle by
± 1 standard deviation (3rd column) (3x3)
E. Geometric parameters and diversification for placing basic residues (Tyr,Asp,Glu,His)
in motif IV
a. this degree of freedom is not diversified during matching, but is allowed to vary
during the subsequent optimization steps
b. the value indicates tetrahedral geometry at the water molecule
c. values from a popular Tyr rotamer are shown for illustration; a total of six (Tyr), nine
(Asp,His) or twenty-one (Glu) rotamers are used, with nine (3x3; Tyr,Asp,His) or
twenty-seven (3x3x3; Glu) variants for each rotamer generated by perturbing each chi
angle by ± 1 standard deviation (3rd column)
d. Asp,Glu,His are placed by the geometric parameters following the simple rule for
hydrogen bonding interactions described in our previous paper
F. Geometric parameters and diversification for placing an H-bonding group
(Ser,Thr,Tyr) in motif IV
a. this degree of freedom is not diversified during matching, but is allowed to vary
during the subsequent optimization steps
b. the value indicates tetrahedral geometry at the water molecule
c. values from a popular Ser rotamer are shown for illustration; a total of three (Ser) or
six (Thr,Tyr) rotamers are used, with three (Ser,Thr) or nine (Tyr) variants for each
rotamer generated by perturbing each chi angle by ± 1 standard deviation (3rd column)
Figure S4. A pictorial representation of the grid used in RosettaMatch. A TIM barrel
scaffold protein, 1a53, is shown as gray ribbons. The green grid is the three dimensional
space allowed for localization of the substrate or composite active site. It has 0.25 Å
resolution and is trimmed to remove clashes with the backbone and to encompass the
pocket where enzyme active sites would generally occur within a given scaffold. A
similar backbone grid, which is a precomputed three-dimensional representation of the
backbone, is used to quickly look up backbone clashes without the need to cycle through
all atoms in the protein during the clash check.
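The idea of a precomputed backbone grid can be illustrated with a short Python sketch. This is not the RosettaMatch implementation: the voxel-set data structure, the function names, the 3.0 Å clash radius and the two backbone coordinates are all assumptions chosen for illustration; only the 0.25 Å spacing comes from the caption.

```python
import math

GRID_SPACING = 0.25  # Å, the resolution quoted in the caption

def voxel(point, spacing=GRID_SPACING):
    """Integer voxel index containing a 3D point."""
    return tuple(math.floor(c / spacing) for c in point)

def build_backbone_grid(backbone_atoms, clash_radius, spacing=GRID_SPACING):
    """Mark every voxel whose center lies within clash_radius of a backbone atom."""
    occupied = set()
    reach = math.ceil(clash_radius / spacing)
    for atom in backbone_atoms:
        ci, cj, ck = voxel(atom, spacing)
        for i in range(ci - reach, ci + reach + 1):
            for j in range(cj - reach, cj + reach + 1):
                for k in range(ck - reach, ck + reach + 1):
                    center = ((i + 0.5) * spacing,
                              (j + 0.5) * spacing,
                              (k + 0.5) * spacing)
                    if math.dist(center, atom) <= clash_radius:
                        occupied.add((i, j, k))
    return occupied

def clashes(point, occupied, spacing=GRID_SPACING):
    """Constant-time clash check: one set lookup, no loop over protein atoms."""
    return voxel(point, spacing) in occupied

# Hypothetical backbone atom coordinates (Å) and clash radius, for illustration only.
backbone = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
grid = build_backbone_grid(backbone, clash_radius=3.0)
print(clashes((1.0, 0.0, 0.0), grid), clashes((0.0, 8.0, 0.0), grid))
```

The cost of looping over atoms is paid once when the grid is built; each later clash check is a single lookup, which is the point the caption makes.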
Figure S5. Progress curves for selected designs
A. Progress curve for design RA60 incubated with 540 µM racemic substrate. RA60 was
8.6 µM in 25 mM HEPES, 100 mM NaCl, pH 7.5.
B. Progress curve for design RA61 incubated with 540 µM racemic substrate. RA61 was
13.8 µM in 25 mM HEPES, 100 mM NaCl, pH 7.5.
C. Progress curve for design RA45 incubated with 540 µM racemic substrate. RA45 was
18.0 µM in 25 mM HEPES, 100 mM NaCl, pH 7.5.
D. Progress curve for design RA46 incubated with 540 µM racemic substrate. RA46 was
11.0 µM in 25 mM HEPES, 100 mM NaCl, pH 7.5.
E. Progress curve for design RA22 incubated with 540 and 108 µM racemic substrate.
RA22 was 16.0 µM in 25 mM HEPES, 100 mM NaCl, pH 7.5.
F. Progress curve for design RA34 incubated with 540 and 108 µM racemic substrate.
RA34 was 12.3 µM in 25 mM HEPES, 100 mM NaCl, pH 7.5.
G. Progress curve for wild type scaffold proteins 1a53 and 1m4w incubated with 540 µM
racemic substrate. 1a53 and 1m4w were 18.1 and 22.5 µM, respectively, in 25 mM
HEPES, 100 mM NaCl, pH 7.5.
Figure S6. Uncatalyzed reaction for 560 µM substrate in 25 mM HEPES, 100 mM NaCl,
pH 7.5, 2.7% acetonitrile. The uncatalyzed rate (the slope divided by the initial substrate
concentration: 2.26 × 10⁻¹⁰ M/min / 5.60 × 10⁻⁴ M) is thus 4.1 ± 0.2 × 10⁻⁷ min⁻¹. We use
3.9 × 10⁻⁷ min⁻¹ to be consistent with previous literature (18).
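The arithmetic in this caption can be reproduced in a few lines of Python. The slope and substrate concentration are the values quoted above; the helper `rate_enhancement` and the catalyzed rate constant passed to it are hypothetical and only show how an uncatalyzed rate constant would typically be used.

```python
slope_M_per_min = 2.26e-10     # initial slope of the uncatalyzed progress curve
substrate_M = 560e-6           # initial substrate concentration (560 µM)

# First-order uncatalyzed rate constant: slope / initial substrate concentration.
k_uncat = slope_M_per_min / substrate_M        # min^-1

K_UNCAT_LITERATURE = 3.9e-7    # min^-1, the literature value adopted in the caption

def rate_enhancement(k_cat_per_min, k_uncat_per_min=K_UNCAT_LITERATURE):
    """Ratio of a catalyzed rate constant to the uncatalyzed rate constant."""
    return k_cat_per_min / k_uncat_per_min

print(f"k_uncat = {k_uncat:.2e} min^-1")
# Hypothetical catalyzed rate constant, for illustration only.
print(f"enhancement = {rate_enhancement(3.9e-3):.1e}")
```

The computed value agrees with the quoted 4.1 ± 0.2 × 10⁻⁷ min⁻¹ within the stated uncertainty.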
Gel lane labels (left to right): 2, 3, 4, 5, 6, 7, MW std, 8, 9, 10, 11, 12, 13, 14, 15
Figure S7. RA61 eluted with an imidazole gradient
RA61 was eluted from a Ni-NTA column with an imidazole gradient from 20 to
150 mM over 15 minutes at 1 mL/min. The gradient is shown in red, and 1 mL fractions
were collected (top panel). The absorbance at 280 nm for each fraction is shown in green
and the relative retro-aldolase activity of each fraction, when assayed with 270 µM
substrate, is shown in blue. The fractions were then analyzed by SDS PAGE (bottom
panel). The mass of RA61 is 22.9 kDa.
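For orientation, the nominal imidazole concentration of each fraction can be estimated from the gradient parameters in this caption. The Python sketch below assumes a linear gradient, that fraction 1 starts when the gradient starts, and that there is no dead volume between mixer and fraction collector; none of these details is stated in the caption, and the function name is ours.

```python
def fraction_midpoint_mM(fraction, start_mM=20.0, end_mM=150.0,
                         gradient_min=15.0, flow_mL_per_min=1.0,
                         fraction_mL=1.0):
    """Nominal imidazole concentration at the midpoint of a 1-based fraction."""
    minutes_per_fraction = fraction_mL / flow_mL_per_min
    t_mid = (fraction - 0.5) * minutes_per_fraction
    # Clamp to the gradient window so fractions outside it stay at the end points.
    t_mid = min(max(t_mid, 0.0), gradient_min)
    return start_mM + (end_mM - start_mM) * t_mid / gradient_min

for n in (1, 8, 15):
    print(n, round(fraction_midpoint_mM(n), 1))
```

With 1 mL fractions at 1 mL/min, the 15-minute gradient spans 15 fractions, which matches the number of lanes on the gel.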
Gel lane labels (left to right): MW std, 6, 7, 8, 9, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23
Figure S8A. RA22 size exclusion chromatography
1 mL fractions were collected and tested for retro-aldolase activity. The absorbance at
280 nm of the eluent (red) and the relative retro-aldolase activity (green) of each fraction
are shown versus the elution volume. Fractions with significant absorbance at 280 nm
were subjected to SDS PAGE (lower panel). The mass of RA22 is 29.5 kDa.
Gel lane labels (left to right): 13, 14, MW std, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
Figure S8B. RA34 size exclusion chromatography
1 mL fractions were collected and tested for retro-aldolase activity. The absorbance at
280 nm of the eluent (red) and the relative retro-aldolase activity (green) of each fraction
are shown versus the elution volume. Fractions with significant absorbance at 280 nm
were subjected to SDS PAGE (lower panel). The mass of RA34 is 29.6 kDa. The
additional absorbance peak at 22 mL is not protein (see the gel) and has no catalytic
activity.
Gel lane labels (left to right): 7, 8, 9, 20, MW std, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
Figure S8C. RA61 size exclusion chromatography
1 mL fractions were collected and tested for retro-aldolase activity. The absorbance at
280 nm of the eluent (red) and the relative retro-aldolase activity (green) of each fraction
are shown versus the elution volume. Fractions with significant absorbance at 280 nm
were subjected to SDS PAGE (lower panel). The mass of RA61 is 22.9 kDa.
Table S3. Combinatorics of transition state and catalytic sidechain placement
Table S5. Protein fold distribution of in silico designs after active site identification
and pocket design
Fold         # scaffolds   # matches per scaffold   # low energy designs per scaffold*
Lipocalin    8             4498                     4

1~10¹        38
10¹~10²      6
10²~10³      12
10³~10⁴      12
10⁴~10⁵      2
ALL          70
Table S7. Sequence information for active retro-aldolase designs
Design   PDB scaffold   Amino Acid Positions Changed in the Design
Jelly roll
16 43 45 81 90 180
1f5j E N L Y E E
RA28 6 A F A F T K
TIM
9 11 48 78 101 103 126 128 130 169 171 176 201 222 224
1thf C D V T S N V A D L T D S L A
RA17 14 V L A I A L K - F A V L H A D
RA58 10 S Y T - - H - S W K V W H - -
7 47 49 79 81 83 87 106 108 110 129 131 155 157 177 179 183 209 210 229
1i4n I E K S L E F L K F L I L E G N L E S L
RA31 14 A Q S L D A - - H I V L - L - K - L W -
RA32 13 - Q S T D - - V H - V - - V K A A F - S
RA33 15 - Q S T D A A V H - - - V V K A A L - S
9 51 53 56 58 81 83 89 110 112 131 135 159 161 178 180 182 184 187 210 211 231 233
1lbf L E K S S S L F K F L K E N G N R L L E S L G
RA22 13 - G D - - - T - A - A - K - - V A W - S F S H
RA34 13 - Y M - W - T A A - A - K - - V - W - S Y S -
RA35 14 - G I - - A T G A - A - K - - I - W - S Y S Y
RA36 17 - G V G W A A W T - A - K - - V A W - S F A Y
RA39 14 - G S - T - - - T - A - V T K L - W G S - S Y
RA41 17 A V S - - T Y W A L V P V V K K - W - S - I -
RA47 15 - G M - W A T A A - A - K - - V - W - S Y S Y
8 51 53 56 58 81 83 85 89 110 112 131 133 135 157 159 161 178 180 182 184 187 210 211 231 233 234
1lbl W E K S S S L E F K F L I K L E N G N R L L E S L G S
RA6 15 - M S - W V - - A V W V V - - V - K S E - - A H - - -
RA42 15 - L S - W V A - T W - - - - - V - K S M Y - S - V Y -
RA45 13 - L S H - - W - V V - - F - - V - - K - W - M - - E Y
RA46 14 - V S - - - Y - W V L V - S V V - L K - - - S W - - -
RA48 16 - M S - L A F - W L - K S L - S Y L V - P - L - - - -
RA49 17 - M S - W V - G A V W V V - - T - K S - H - A L I - -
RA55 12 - Y T - W - - - - T - A - - - K - - V - W - S - S V I
RA56 16 R G I H - A Y - T I - V - - - K - - H - W F T - S Y -
RA57 13 - G M - - - W - S L - V - - - K - - V - W - S H T Y -
9 51 53 56 58 81 83 85 89 110 112 131 133 135 159 161 178 180 182 184 187 189 210 211 231 233
1a53 L E K S S S L E F K F L I K E N G N R L L I E S L G
RA26 10 - - S - W L - - - H - K - - A H - S - P - - L - - -
RA40 17 V V M - - T Y - - V - V F P V - K L S V - - S H S -
RA43 16 - V S - - L W G G V W - V - S Y K A - P - - S - V -
RA53 16 - V S H - - Y - T V V V - - K - V S Y W - - S G S -
RA68 10 - - - - T - S - H - - - - - L - - - - Y G T I Y - S
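In each design row of Table S7, the number after the design name appears to be the count of positions that differ from the scaffold, with a dash marking an unchanged position. The Python sketch below checks this reading against the 1thf block of the table. The parsing function is ours and assumes whitespace-separated columns exactly as they are printed here.

```python
def substitutions(positions, wild_type, design_row):
    """Pair each non-dash design residue with its position and wild-type residue."""
    name, n_listed, *residues = design_row.split()
    changed = [(pos, wt, new)
               for pos, wt, new in zip(positions, wild_type, residues)
               if new != "-"]
    return name, int(n_listed), changed

# Rows copied from the 1thf block of Table S7.
positions = [int(p) for p in
             "9 11 48 78 101 103 126 128 130 169 171 176 201 222 224".split()]
scaffold, *wild_type = "1thf C D V T S N V A D L T D S L A".split()

results = {}
for row in ("RA17 14 V L A I A L K - F A V L H A D",
            "RA58 10 S Y T - - H - S W K V W H - -"):
    name, n_listed, changed = substitutions(positions, wild_type, row)
    results[name] = (n_listed, changed)
    print(scaffold, name, "listed:", n_listed, "counted:", len(changed))
```

For both rows the counted substitutions equal the listed number, which supports reading that column as the number of mutated positions.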
Table S8. Crystallographic Data and Refinement Statistics
Unit Cell Dimensions (Å)                  a=72.05, b=73.58, c=47.40    a=39.19, b=66.06, c=66.07
Resolution Limits, Total (final shell)    36.0 - 2.20 (2.28 - 2.20)    33.7 - 1.80 (1.86 - 1.80)
Ramachandran Distribution                 90.3, 9.3, 0.0, 0.4          88.5, 11.5, 0.0, 0.0
(% most favored, additional allowed, generously allowed, disallowed)