AND
YINAN WEI1
Abstract
Combinatorial libraries of de novo amino acid sequences can provide a rich source of diversity for the
discovery of novel proteins with interesting and important activities. Randomly generated sequences,
however, rarely fold into well-ordered proteinlike structures. To enhance the quality of a library, features of
rational design must be used to focus sequence diversity into those regions of sequence space that are most
likely to yield folded structures. This review describes how focused libraries can be constructed by designing
the binary pattern of polar and nonpolar amino acids to favor proteins that contain abundant secondary
structure, while simultaneously burying hydrophobic side chains and exposing hydrophilic side chains to
solvent. The binary code for protein design was used to construct several libraries of de novo proteins,
including both α-helical and β-sheet structures. The recently determined solution structure of a binary
patterned four-helix bundle is well ordered, thereby demonstrating that sequences that have neither been
selected by evolution (in vivo or in vitro) nor designed by computer can form nativelike proteins. Examples
are presented demonstrating how binary patterned libraries have successfully produced well-ordered structures, cofactor binding, catalytic activity, self-assembled monolayers, amyloid-like nanofibrils, and protein-based biomaterials.
Keywords: artificial proteins; binary patterning; combinatorial libraries; de novo protein design
Protein Science (2004), 13:1711–1723. Published by Cold Spring Harbor Laboratory Press. Copyright © 2004 The Protein Society
Hecht et al.
The binary code strategy is based on the premise that appropriate patterning of polar and nonpolar residues can
drive a polypeptide chain to fold into segments of amphiphilic secondary structure that anneal together to form a
desired tertiary structure (Kamtekar et al. 1993; West and
Hecht 1995; Xiong et al. 1995). However, because of its
combinatorial nature, the binary code strategy does not allow explicit design of specific interresidue interactions. Because unique packing cannot be designed a priori, it is reasonable to question whether nativelike structures can
nonetheless be isolated from binary code libraries a posteriori.
Well-folded structures versus molten globules
We developed several methods to rapidly screen libraries
for sequences that successfully recapitulate the features of
well-folded native proteins. These screens were based either
Figure 3. The genetic code. Codons for polar residues are highlighted in
red. Those for nonpolar residues are highlighted in yellow.
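A binary pattern can be written at the DNA level with degenerate codons. The sketch below expands two such codons against the standard genetic code. The codons VAN and NTC are assumptions chosen for illustration and are not specified in this section, and the names `encoded_residues`, `CODON_TABLE`, and `IUPAC` are hypothetical. Under these assumptions, VAN encodes only His, Gln, Asn, Lys, Asp, and Glu, and NTC encodes only Phe, Leu, Ile, and Val.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC degenerate nucleotide symbols used in this sketch.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "V": "ACG"}


def encoded_residues(degenerate_codon):
    """Return the set of one-letter residues encoded by a degenerate codon."""
    choices = [IUPAC[symbol] for symbol in degenerate_codon]
    return {CODON_TABLE["".join(c)] for c in product(*choices)}


polar = encoded_residues("VAN")     # assumed polar codon (illustrative)
nonpolar = encoded_residues("NTC")  # assumed nonpolar codon (illustrative)
print(sorted(polar))
print(sorted(nonpolar))
```

Excluding T from the first position of the polar codon avoids the TAA and TAG stop codons, which is why V rather than N is used there in this sketch.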
Figure 4. Design of a second-generation library of binary patterned four-helix bundles. Helices were designed to be 50% longer than in the first
library.
www.proteinscience.org
Figure 5. Natural abundance 13C,1H-HSQC NMR for five proteins from the unselected second-generation library. The δ and γ methyl
resonances of Ile side chains appear in the 13C dimension at 11 ppm and 15 ppm, respectively. Methyl groups from the side chains
of Val, Met, Leu, and Thr appear between 18 ppm and 24 ppm. Spectra for four of the five proteins (S-213, S-285, S-824, S-836)
display resolved peaks and good dispersion, indicating well-ordered structures. (Reprinted from Wei et al. 2003b.)
the actual structure. Attempts to determine the three-dimensional structures of our first-generation α-helical proteins
were unsuccessful. This was not surprising given that structural flexibility is known both to disfavor crystal growth and
to yield poorly resolved NMR spectra. In contrast with these
first-generation sequences, proteins from the second-generation library yield high quality NMR spectra indicative of
well-folded (nonfluctuating) structures (Wei et al. 2003b).
Therefore the three-dimensional structures of these new
proteins are accessible by NMR spectroscopy.
Recently, we solved the solution structure of a protein
(S-824) from the second-generation library (Wei et al.
2003a). As shown in Figure 6, the structure is indeed a
four-helix bundle. In accordance with the binary code design, the polar and nonpolar residues segregate on the sur-
Figure 6. Solution structure of protein S-824 from the second-generation binary patterned library. (Reprinted from Wei et al. 2003a;
© 2003 National Academy of Sciences, U.S.A.) (Left) Ribbon diagram of the four-helix bundle. (Center) Head-on view of the
four-helix bundle with polar residues colored red and nonpolar residues colored yellow. (Right) Same as center image in space-filling
representation.
Figure 7. Overlay of 10 lowest energy structures calculated from the NMR data. (Reprinted with permission from Wei et al. 2003a;
© 2003 National Academy of Sciences, U.S.A.) (A) Backbone. (B) Nonpolar side chains of α-helices 1 and 2. (C) Nonpolar side chains
of α-helices 3 and 4.
Figure 9. Red color associated with heme binding by proteins from the
initial 74-residue α-helical library. Proteins 86 and F bind heme. Protein B does not. The other cuvettes show negative controls. (Reprinted
from Rojas et al. 1997.)
have neither been selected by evolution nor explicitly designed for redox activity. We measured the midpoint reduction potentials for five first-generation and three second-generation binary patterned proteins. The potentials ranged
from −112 mV to −176 mV (Moffet et al. 2003). These
default reduction potentials can be compared with the reduction potentials of (1) heme alone, (2) naturally evolved
heme proteins, and (3) rationally designed heme proteins.
The midpoint reduction potential for unbound heme is −220
mV. Nature has altered this potential for particular redox
functions by evolving heme proteins spanning a wide range
of potentials from approximately −400 mV to +400 mV.
Attempts to engineer the potentials of either natural proteins
(Springs et al. 2000, 2002) or novel protein maquettes (Shifman et al. 2000) have produced narrower ranges.
using rational design and/or automated computational methods (Broo et al. 1998; Bolon and Mayo 2001). Moreover,
the observed activity rivals those of the first catalytic antibodies (Pollack et al. 1986; Tramontano et al. 1986).
Because hydrolysis of p-nitrophenyl esters by protein
S-824 is presumed to involve a histidine nucleophile (Wei
and Hecht 2004), it was important to establish that the rate
catalyzed by protein S-824 is above that catalyzed by free
imidazole. At pH 7, protein S-824 hydrolyzed p-nitrophenyl
acetate 100-fold faster than does 4-methylimidazole (Wei
and Hecht 2004). Even after correcting for the 11 histidines
in the sequence of S-824, it is clear that the de novo protein
catalyzes ester hydrolysis more effectively than a simple
imidazole-based catalyst. Like natural enzymes, S-824 catalyzed multiple turnovers and remained active after prolonged exposure to excess substrate.
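The correction for histidine content mentioned above can be made explicit with a short calculation. This is a sketch under a deliberately conservative assumption that is not stated in the text: every one of the 11 histidines is treated as an equally active nucleophile. The variable names are illustrative.

```python
# Values quoted in the text for hydrolysis of p-nitrophenyl acetate at pH 7.
fold_over_4_methylimidazole = 100.0  # S-824 rate relative to 4-methylimidazole
histidines_in_s824 = 11              # histidines in the S-824 sequence

# Assumed worst case: every histidine is an equally active nucleophile.
fold_per_histidine = fold_over_4_methylimidazole / histidines_in_s824
print(f"Enhancement per histidine: about {fold_per_histidine:.1f}-fold")
```

Even under this assumption the per-histidine enhancement remains well above 1, which is consistent with the statement that the protein outperforms a simple imidazole-based catalyst.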
To assess whether the activity of S-824 is representative
of other proteins in our binary patterned libraries, we measured the esterase activity of six additional binary patterned
proteins. These proteins were from two libraries: the original library of 74-residue sequences and the second-generation library of 102-residue sequences described above. Both
libraries were naïve in that they were neither designed to
bind substrate nor subjected to high throughput screens for
activity. All six of the additional proteins displayed esterase
activity significantly above background (Wei and Hecht
2004).
These findings suggest that although the exquisite levels
of activity and specificity typical of natural enzymes may
have required eons of evolutionary selection, proteins with
moderate levels of activity are surprisingly common in libraries of de novo sequences designed by binary patterning.
The activities of these unselected proteins provide a reference state for the levels of activity that have been reported
for proteins obtained by selection and/or computational
design.
the various amyloidogenic sequences, coupled with the unavailability of a high-resolution structure, has limited understanding of the sequence determinants of amyloidogenesis (Wood et al. 1995; Selkoe and Podlisny 2002; Wurth et
al. 2002; Williams et al. 2004).
What are the molecular determinants of amyloidogenesis? Which features of an amino acid sequence cause it to
assemble into amyloid? These questions can be addressed
by two very different approaches: (1) One can probe the
sequence determinants of natural amyloid proteins by
screening randomly generated amino acid substitutions
(mutations) to identify those that prevent amyloidogenesis
or (2) one can test hypotheses about possible sequence determinants by using them as the basis for the design of de
novo amyloid-like proteins.
We have used both approaches. Our work probing the
sequence determinants of amyloidogenesis in a natural system (the Alzheimer's Aβ peptide) has been described previously (Wurth et al. 2002) and will not be reviewed here.
Our design of libraries of de novo amyloid-like fibrils is
described in this section.
To probe the sequence determinants of amyloidogenesis
by de novo protein design, we once again used the binary
code strategy. We designed a binary patterned combinatorial library of de novo proteins (West et al. 1999) using the
β-strand pattern shown in Figure 1. All sequences in the
library were constrained by the ○●○●○●○ alternating
pattern (○, polar; ●, nonpolar) consistent with amphiphilic β-structure. The precise
identities of the side chains were not constrained and were
varied combinatorially. Polar sites were allowed to be His,
Lys, Asn, Asp, Gln, or Glu and nonpolar residues were
allowed to be Leu, Ile, Val, or Phe. A schematic diagram of
the binary pattern for proteins containing six β-strands
punctuated by turns is shown in Figure 10. The amino acid
sequences of proteins from this library are shown in Figure 11.
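The combinatorial rule described above can be sketched in a few lines: each polar site draws from the six listed polar residues, and each nonpolar site from the four listed nonpolar residues. The P/N lettering, the seven-residue strand length (inferred from 42 strand residues in six strands), the uniform sampling, and the function names are assumptions made for illustration. Turns and fixed terminal residues are omitted, and residue frequencies in a real library would depend on the codons used.

```python
import random

POLAR = "HKNDQE"            # His, Lys, Asn, Asp, Gln, Glu
NONPOLAR = "LIVF"           # Leu, Ile, Val, Phe
STRAND_PATTERN = "PNPNPNP"  # assumed lettering: P = polar site, N = nonpolar site


def random_strand(pattern=STRAND_PATTERN, rng=random):
    """Draw one strand sequence that obeys the binary pattern."""
    return "".join(
        rng.choice(POLAR if site == "P" else NONPOLAR) for site in pattern
    )


def strand_diversity(pattern=STRAND_PATTERN):
    """Count the distinct sequences compatible with the pattern."""
    count = 1
    for site in pattern:
        count *= len(POLAR) if site == "P" else len(NONPOLAR)
    return count


rng = random.Random(0)  # fixed seed so the sketch is reproducible
strands = [random_strand(rng=rng) for _ in range(6)]
print(strands)
print(strand_diversity())  # 6**4 * 4**3 = 82944 sequences per strand
```

The count illustrates why such a library is diverse yet focused: a single seven-residue strand admits tens of thousands of sequences, all sharing the same polar and nonpolar periodicity.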
Figure 11. (A) Schematic illustration of the design of a combinatorial library of de novo proteins containing six β-strands (arrows)
punctuated by turns. (B) Designed binary sequence pattern. Alternating pattern in the β-strands is indicated with polar residues (○) as
open circles on a black background and nonpolar residues (●) as closed circles on a gray background. Combinatorial diversity is
incorporated at positions marked ○, ●, and t (turn). Fixed residues are incorporated at the termini and in some of the turns. (C) Amino
acid sequences (single letter code) of 17 de novo proteins from the combinatorial library. (Reprinted from West et al. 1999; © 1999
National Academy of Sciences, U.S.A.)
Figure 12. Transmission electron micrograph of a 63-residue binary patterned de novo protein designed to form amphiphilic β-strands. The observed fibrillar structures resemble those seen for natural amyloid proteins.
(Reprinted from West et al. 1999; © 1999 National Academy of Sciences, U.S.A.)
These libraries of de novo amyloid proteins can be compared with our earlier work designing α-helical proteins
(see above). In both cases, combinatorial libraries were
designed by specifying the binary pattern of polar and nonpolar residues. Yet the resulting proteins display dramatically different properties: In the previous work, the sequences formed α-helices that folded intramolecularly into
small globular domains. In contrast, the sequences described here form β-strands and self-assemble intermolecularly into high-order oligomers that assume fibrillar structures. What causes these dramatically different structures?
The lengths of the sequences are not dramatically different,
nor are their overall compositions. We propose that the
determining difference is the binary patterning itself. In the
earlier work, the library was constrained by the pattern
○●○○●●○○●○○●●○, consistent with the periodicity
of α-helical structure. In contrast, this library is constrained
by the pattern ○●○●○●○, consistent with the periodicity
of amphiphilic β-strands.
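The link between each binary pattern and its preferred secondary structure can be illustrated numerically. The sketch below scores a pattern by a simplified, binary version of a hydrophobic moment: nonpolar sites count +1, polar sites count −1, and successive residues are rotated by 100° (about 3.6 residues per helical turn) or by 180° (side chains alternating between the two faces of a strand). This metric, the P/N lettering, the 14-residue rendering of the helical pattern, and the function name are assumptions introduced here for illustration. The metric is not the authors' analysis.

```python
import cmath
import math

# Assumed lettering: P = polar site, N = nonpolar site.
HELIX_PATTERN = "PNPPNNPPNPPNNP"  # 14-residue pattern with helical periodicity
STRAND_PATTERN = "PNPNPNP"        # 7-residue alternating pattern


def binary_moment(pattern, degrees_per_residue):
    """Normalized amphiphilic moment for a binary pattern.

    Each nonpolar site scores +1 and each polar site -1; successive residues
    are rotated by a fixed angle about the axis of the secondary structure.
    A value near 1 means polar and nonpolar sites fall on opposite faces.
    """
    step = math.radians(degrees_per_residue)
    total = sum(
        (1 if site == "N" else -1) * cmath.exp(1j * step * index)
        for index, site in enumerate(pattern)
    )
    return abs(total) / len(pattern)


for name, pattern in (("helix", HELIX_PATTERN), ("strand", STRAND_PATTERN)):
    as_helix = binary_moment(pattern, 100.0)   # about 3.6 residues per turn
    as_strand = binary_moment(pattern, 180.0)  # side chains alternate faces
    print(f"{name} pattern: as helix {as_helix:.2f}, as strand {as_strand:.2f}")
```

Under these assumptions the helical pattern is strongly amphiphilic only when wound as a helix, and the alternating pattern is fully amphiphilic only when extended as a strand, which mirrors the argument that the patterning itself distinguishes the two libraries.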
If it is indeed true that alternating patterns of polar and
nonpolar residues have an inherent propensity to form amyloid fibrils, then one might expect that these binary patterns
would be disfavored by natural selection. We tested this
possibility with a bioinformatics study: We analyzed a database of 250,514 natural protein sequences comprising
79,708,024 residues and calculated the frequencies of alternating patterns relative to other patterns with similar com-
Protein-based biomaterials
Self-assembled monolayers
The amyloid-like fibrils described above and shown in Figure 12 are self-assembled one-dimensional structures. To
Figure 13. (A) Schematic representation of a fibril formed by open-ended oligomerization of a six-stranded binary patterned β-sheet
protein. β-Strands are shown in green and turns in silver. Polar side chains are shown in red, and nonpolar side chains, in yellow. (B)
Model of a monomeric six-stranded β-sandwich (rotated 90° relative to A). In the monomer, the hydrophobic side chains (yellow) of
the edge strands are accessible to water. For simplicity, flat β-sheets are depicted. In reality, a six-stranded β-sandwich would be
twisted. (C) Model of a monomeric six-stranded β-sandwich in which lysine side chains (blue) are substituted in place of Ile 5 in the
N-terminal β-strand and Val 60 in the C-terminal β-strand of sequence 17 (see Fig. 3). In the monomeric structure, the charged ends
of the lysine side chains on the edge strands would be exposed to solvent. (D) Same as panel C, rotated by 90°. (Reprinted from Wang
and Hecht 2002; © 2002 National Academy of Sciences, U.S.A.)
explore the possibility of fabricating two-dimensional protein layers, we studied the ability of our de novo proteins to
self-assemble into monolayers at an air/water interface (Xu
et al. 2001).
Several proteins were probed for their abilities to form
amphiphilic monolayers. The proteins were chosen from the
binary patterned library described above, which was designed to form six β-strands punctuated by reverse turns
(Figs. 10, 11). We used Langmuir–Blodgett techniques to
study the properties of the monolayers at the air/water interface, and spectroscopic methods to assess secondary
structure. The results of these studies demonstrated that (1)
the proteins self-assembled into monolayers at an air/water
interface; (2) the monolayers were dominated by β-sheet
secondary structure, as shown by both circular dichroism
and infrared spectroscopies; and (3) the measured area per protein molecule was approximately 500–600 Å². This matches
the area expected for a model of an amphiphilic β-sheet
shown in Figure 14. If the polar turns project down into the
aqueous solvent, as shown in the figure, then the measured
area would comprise only the 42 residues in the six
β-strands, which would indicate an area of approximately
12–14 Å² per residue (Xu et al. 2001).
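The per-residue area follows directly from the measured area per molecule and the number of strand residues. The short calculation below reproduces that arithmetic; the split of 42 residues into six strands of seven is inferred from the text, and the variable names are illustrative.

```python
# Values quoted in the text (Xu et al. 2001).
area_per_molecule = (500.0, 600.0)  # measured range, square angstroms
strands = 6
residues_per_strand = 7             # inferred: 42 strand residues in total

strand_residues = strands * residues_per_strand
low, high = (area / strand_residues for area in area_per_molecule)
print(f"{strand_residues} residues: {low:.1f} to {high:.1f} square angstroms each")
```

Rounding the two bounds gives the approximately 12 to 14 square angstroms per residue quoted above.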
Our finding that distinctly different sequences from our
library assemble into very similar structures suggests that
β-sheet monolayers can be encoded by the designed pattern
of polar and nonpolar amino acids. Moreover, because the
designed pattern is compatible with a wide variety of different sequences, it may be possible to fabricate β-sheet
monolayers using combinations of side chains that are explicitly designed for particular applications of novel biomaterials.
Template-directed assembly of de novo
designed proteins
A number of biological materials owe their unusual structural characteristics and mechanical properties to long-range
Figure 15. (A) A binary patterned de novo protein modeled as a flat six-stranded amphiphilic β-sheet with polar residues (red)
projecting up and nonpolar residues (green) projecting down toward the hydrophobic graphite surface. (B) Schematic representation
of the six-stranded β-sheet protein assembled on a graphite surface. β-Strands are shown as blue arrows. The threefold symmetry of
the graphite template is recapitulated in the assembly of the protein. The long axis of the fibers is shown perpendicular to the β-strands
and is indicated with green arrows. (C) AFM image of a binary patterned protein deposited on graphite. (Inset) A Fourier transform
of this image. The threefold symmetry is apparent both in the AFM image and in its Fourier transform. (Reprinted from Brown et al.
2002 with permission from J. Am. Chem. Soc.; © 2002 Am. Chem. Soc.)
signed without using either residue-by-residue rational design or computational methods. Moreover, neither genetic
selections in vivo nor high throughput methods in vitro were
used to screen the libraries. Nonetheless, appropriately designed binary patterned libraries produced α-helical proteins, β-sheet proteins, well-ordered structures, cofactor
binding proteins, catalytically active enzymes, self-assembled monolayers, amyloid-like nanofibrils, and template-directed assemblies of two-dimensional biomaterials.
These initial successes notwithstanding, thus far the potential of the binary code for protein design has been explored only cursorily. To date, we have completed detailed
studies of only several representative proteins from each
collection. Although characterization of these few proteins
has provided a proof-of-principle demonstration that the binary code strategy is an effective method for discovering
novel proteins with interesting properties (e.g., well-ordered
structures, cofactor binding, enzyme-like activity, etc.), the
range of properties that might ultimately be uncovered will
require detailed studies of many more proteins. For example, the first binary patterned protein whose three-dimensional structure was solved formed a four-helix bundle that
is well ordered and displays nativelike interior packing
(Figs. 6, 7). Will this be true for all members of this library?
Or will there be a range of behaviors with some structures
being less well packed? If it turns out that the majority of
sequences from this library form highly ordered structures
with nativelike packing (as suggested by Fig. 5 and described by Wei et al. 2003b), then this would imply that
even when packing is not explicitly designed, protein sequences nonetheless find a way to achieve good packing.
Such a result would have significant implications both for
References
Adams, P.A. 1990. The peroxidasic activity of the haem octapeptide microperoxidase-8 (MP-8): The kinetic mechanism of the catalytic reduction of H2O2
by MP-8 using 2,2′-azinobis-(3-ethylbenzothiazoline-6-sulphonate) (ABTS)
as reducing substrate. J. Chem. Soc. Perkin Trans. 2 8: 1407–1414.
Aksay, I.A., Trau, M., Manne, S., Honma, I., Yao, N., Zhou, L., Fenter, P.,
Eisenberger, P.M., and Gruner, S.M. 1996. Biomimetic pathways for assembling inorganic thin films. Science 273: 892–898.
Axe, D.D., Foster, N.W., and Fersht, A.R. 1996. Active barnase variants with
completely random hydrophobic cores. Proc. Natl. Acad. Sci. 93: 5590–5594.
Beasley, J.R. and Hecht, M.H. 1997. Protein design: The choice of de novo
sequences. J. Biol. Chem. 272: 2031–2034.
Belcher, A.M., Wu, X.H., Christensen, R.J., Hansma, P.K., Stucky, G.D., and
Morse, D.E. 1996. Control of crystal phase and orientation by soluble mollusk-shell proteins. Nature 381: 56–58.
Benson, D.E., Wisz, M.S., and Hellinga, H.W. 2000. Rational design of nascent
metalloenzymes. Proc. Natl. Acad. Sci. 97: 6292–6297.
Betz, S.F. and DeGrado, W.F. 1996. Controlling topology and native-like behavior of de novo designed peptides: Design and characterization of antiparallel four-stranded coiled coils. Biochemistry 35: 6955–6962.
Bolon, D.N. and Mayo, S.L. 2001. Enzyme-like proteins by computational
design. Proc. Natl. Acad. Sci. 98: 14274–14279.
Bowie, J.U., Reidhaar-Olson, J.F., Lim, W.A., and Sauer, R.T. 1990. Deciphering the message in protein sequences: Tolerance to amino acid substitutions.
Science 247: 1306–1310.
Bromberg, S. and Dill, K.A. 1994. Side-chain entropy and packing in proteins.
Protein Sci. 3: 997–1009.
Broo, K.S., Nilsson, H., Nilsson, J., and Baltzer, L. 1998. Substrate recognition
and saturation kinetics in de novo designed histidine-based four-helix
bundle catalysts. J. Am. Chem. Soc. 120: 10287–10295.
Broome, B.M. and Hecht, M.H. 2000. Nature disfavors sequences of alternating
polar and nonpolar amino acids: Implications for amyloidogenesis. J. Mol.
Biol. 296: 961–968.
Brown, C.L., Aksay, I.A., Saville, D.A., and Hecht, M.H. 2002. Template-directed assembly of a de novo designed protein. J. Am. Chem. Soc. 124:
6846–6848.
Chen, G., Hayhurst, A., Thomas, J.G., Harvey, B.R., Iverson, B.L., and Georgiou, G. 2001. Isolation of high-affinity ligand-binding proteins by periplasmic expression with cytometric screening (PECS). Nature Biotechnol. 19:
537–542.
Chothia, C. and Lesk, A.M. 1987. The evolution of protein structures. Cold
Spring Harbor Symp. Quant. Biol. 52: 399.
Davidson, A.R. and Sauer, R.T. 1994. Folded proteins occur frequently in
libraries of random amino acid sequences. Proc. Natl. Acad. Sci. 91: 2146–2150.
Davidson, A.R., Lumb, K.J., and Sauer, R.T. 1995. Cooperatively folded proteins in random sequence libraries. Nat. Struct. Biol. 2: 856–864.
Dill, K.A. 1985. Theory for the folding and stability of globular proteins. Biochemistry 24: 1501–1509.
Dobson, C.M. 2002. Getting out of shape. Nature 418: 729–730.
Fairman, R.T., Chao, H.-G., Mueller, L., Lavoie, T.B., Shen, L., Novotny, J.,
and Matsueda, G.R. 1995. Characterization of a new four-chain coiled coil:
Influence of chain length on stability. Protein Sci. 4: 1457–1469.
Falini, G., Albeck, S., Weiner, S., and Addadi, L. 1996. Control of aragonite or
calcite polymorphism by mollusk shell macromolecules. Science 271: 67–69.
Gassner, N.C., Baase, W.A., and Matthews, B.W. 1996. A test of the jigsaw
puzzle model for protein folding by multiple methionine substitutions
within the core of T4 lysozyme. Proc. Natl. Acad. Sci. 93: 12155–12158.
Hanes, J. and Plückthun, A. 1997. In vitro selection and evolution of functional
proteins by using ribosome display. Proc. Natl. Acad. Sci. 94: 4937–4942.
Heuer, A.H., Fink, D.J., Laraia, V.J., Arias, J.L., Calvert, P.D., Kendall, K.,
Messing, G.L., Blackwell, J., Rieke, P.C., Thompson, D.H., et al. 1992.
amino acids displays native-like properties. J. Am. Chem. Soc. 119: 5302–5306.
Sarikaya, M. and Aksay, I.A. 1995. Biomimetics: Design and processing of
materials. AIP Press, Woodbury, NY.
Selkoe, D.J. 2001. Alzheimer's disease: Genes, proteins, and therapy. Physiol.
Rev. 81: 741–766.
Selkoe, D.J. and Podlisny, M.B. 2002. Deciphering the genetic basis of Alzheimer disease. Annu. Rev. Genom. Hum. Genet. 3: 67–99.
Shifman, J.M., Gibney, B.R., Sharp, R.E., and Dutton, P.L. 2000. Heme redox
potential control in de novo designed four α-helical bundle proteins. Biochemistry 39: 14813–14821.
Smith, G.P. 1985. Filamentous fusion phage: Novel expression vectors that
display cloned antigens on the virion surface. Science 228: 1315–1317.
Springs, S.L., Bass, S.E., and McLendon, G.L. 2000. Cytochrome b562 variants:
A library for examining redox potential evolution. Biochemistry 39: 6075–6082.
Springs, S.L., Bass, S.E., Bowman, G., Nodelman, I., Schutt, C.E., and McLendon, G.L. 2002. A multigeneration analysis of cytochrome b562 redox variants: Evolutionary strategies for modulating redox potential revealed using
a library approach. Biochemistry 41: 4321–4328.
Stewart, J.D., Krebs, J.F., Siuzdak, G., Berdis, A.J., Smithrud, D.B., and Benkovic, S.J. 1994. Dissection of an antibody-catalyzed reaction. Proc. Natl.
Acad. Sci. 91: 7404–7409.
Tramontano, A., Janda, K.D., and Lerner, R.A. 1986. Catalytic antibodies.
Science 234: 1566–1570.
Wang, W. and Hecht, M.H. 2002. Rationally designed mutations convert de
novo amyloid-like fibrils into soluble monomeric β-sheet proteins. Proc.
Natl. Acad. Sci. 99: 2760–2765.
Wei, Y. and Hecht, M.H. 2004. Enzyme-like proteins from an unselected library
of designed amino acid sequences. Protein Eng. Des. Sel. 17: 67–75.
Wei, Y., Kim, S., Fela, D., Baum, J., and Hecht, M.H. 2003a. Solution structure
of a de novo protein from a designed combinatorial library. Proc. Natl.
Acad. Sci. 100: 13270–13273.
Wei, Y., Liu, T., Sazinsky, S.L., Moffet, D.A., Pelczer, I., and Hecht, M.H.
2003b. Stably folded de novo proteins from a designed combinatorial library. Protein Sci. 12: 92–102.
Weiner, S. and Addadi, L. 1997. Design strategies in mineralized biological
materials. J. Mater. Chem. 7: 689–702.
West, M.W. and Hecht, M.H. 1995. Binary patterning of polar and nonpolar
amino acids in the sequences and structures of native proteins. Protein Sci.
4: 2032–2039.
West, M.W., Wang, W., Patterson, J., Mancias, J.D., Beasley, J.R., and Hecht,
M.H. 1999. De novo amyloid proteins from designed combinatorial libraries. Proc. Natl. Acad. Sci. 96: 11211–11216.
Williams, A.D., Portelius, E., Kheterpal, I., Guo, J.T., Cook, K.D., Xu, Y., and
Wetzel, R. 2004. Mapping Aβ amyloid fibril secondary structure using
scanning proline mutagenesis. J. Mol. Biol. 335: 833–842.
Wood, J., Wetzel, R., Martin, J.D., and Hurle, M.R. 1995. Prolines and amyloidogenicity in fragments of the Alzheimer's peptide β/A4. Biochemistry
34: 724–730.
Wurth, C., Guimard, N.K., and Hecht, M.H. 2002. Mutations that reduce aggregation of the Alzheimer's Aβ42 peptide: An unbiased search for the
sequence determinants of Aβ amyloidogenesis. J. Mol. Biol. 319: 1279–1290.
Xiong, H., Buckwalter, B.L., Shieh, H.M., and Hecht, M.H. 1995. Periodicity of
polar and non-polar amino acids is the major determinant of secondary
structure in self-assembling oligomeric peptides. Proc. Natl. Acad. Sci. 92:
6349–6353.
Xu, G., Wang, W., Groves, J.T., and Hecht, M.H. 2001. Self-assembled monolayers from a designed combinatorial library of de novo β-sheet proteins.
Proc. Natl. Acad. Sci. 98: 3652–3657.
Yamauchi, A., Nakashima, T., Tokuriki, N., Hosokawa, M., Nogami, H., Arioka, S., Urabe, I., and Yomo, T. 2002. Evolvability of random polypeptides
through functional selection within a small library. Protein Eng. 15: 619–626.