Folding of small disulfide-rich proteins: clarifying the puzzle

The process by which small proteins fold to their native conformations has been intensively studied over the last few decades. In this field, the particular chemistry of disulfide bond formation has facilitated the characterization of the oxidative folding of numerous small, disulfide-rich proteins, with results that reveal a high diversity of folding mechanisms differing in the heterogeneity of their intermediates and in the nativeness of their disulfide pairings. In this review, we combine information on the folding of different protein models with recent structural determinations of major intermediates to provide new molecular clues into oxidative folding. We also analyze the role of disulfide bonds in misfolding and protein aggregation and their implications for amyloidosis and conformational diseases.


Introduction
In the absence of the cellular folding machinery, small proteins spontaneously fold to their native states under suitable conditions, implying that most of the information needed to specify a protein's 3-D structure is contained within its amino acid sequence. Over the past 30 years, much experimental work aimed at understanding protein folding has focused on the identification and characterization of partially folded intermediates that occur along the folding reaction [1,2]. However, this process is usually very fast; therefore, the commonly used spectroscopic techniques (e.g., fluorescence and circular dichroism) provide limited information about the nature of intermediates that accumulate only transiently. Small, disulfide-rich proteins, whose folding intermediates can be chemically trapped in a time-course manner, purified and structurally characterized, have proved very valuable for folding studies [3][4][5].
A large number of protein models with divergent and physiologically relevant functions (e.g., protease inhibitors, proteases, nucleases, growth factors and venom toxins) have been analyzed in detail to date, and they reveal an unexpected diversity of folding mechanisms.
Intramolecular disulfide bonds (S-S) are important not only for the folding of secreted proteins, but also for their stability and function. In the last several years, there have been significant advances in the understanding of disulfide bond formation and rearrangement in the cell, for instance, concerning the role of protein disulfide isomerases (PDI) and their relatives (Dsb proteins), which catalyze both the formation and isomerization of disulfide bonds in the eukaryotic endoplasmic reticulum and the prokaryotic periplasm, respectively (for reviews, see [6,7]). Because of the complex environment of the cell, however, more basic research is necessary to fully understand the interplay between protein folding and disulfide bond formation. Failure of these processes is likely to cause protein degradation by proteases or misfolding and subsequent aggregation (e.g., prion disorders) [8,9].
In this review, the folding behavior of the most-studied small, disulfide-rich proteins in the literature is described. The determinants that account for the underlying diversity of folding landscapes and patterns of disulfide bond formation are addressed. Recent developments in the field, such as the structural determination of disulfide intermediates and studies of protein aggregation, which may have important implications for protein folding and biomedicine, are also discussed.

Disulfide bonds and oxidative folding
The covalent link of cysteine residues by disulfide bonds is thought to serve distinct functions.
First, they significantly influence the thermodynamics of protein folding: disulfide bonds stabilize the native conformation of a protein by lowering the entropy of the unfolded state, making it less favorable, as compared with the folded form. Second, they maintain protein integrity: oxidants and proteolytic enzymes in the extracellular environment can inactivate proteins. By stabilizing their structure, disulfide bonds protect proteins from damage, thus increasing their half-life [10]. Finally, although the disulfide bonds present in mature proteins have been considered chemically inert, now in some cases they appear to be cleaved or rearranged with significant consequences for their function (see review [11]).
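The entropic argument in the first point can be made semi-quantitative. The sketch below assumes a commonly cited empirical form derived from Flory's polymer theory, ΔS = -2.1 - (3/2)R ln n (in cal/(mol·K), with n the number of residues in the loop closed by the disulfide), and further assumes that the disulfide leaves the entropy of the native state unchanged (no strain). Both are simplifications, and the function names are hypothetical.

```python
import math

R_CAL = 1.987  # gas constant in cal/(mol*K)


def loop_entropy_change(n_loop: int) -> float:
    """Assumed empirical estimate of the entropy change (cal/(mol*K)) of the
    unfolded state when a disulfide closes a loop of n_loop residues:
    dS = -2.1 - (3/2) * R * ln(n_loop)."""
    if n_loop < 2:
        raise ValueError("a disulfide loop must span at least two residues")
    return -2.1 - 1.5 * R_CAL * math.log(n_loop)


def native_stabilization(n_loop: int, temp_k: float = 298.15) -> float:
    """Approximate stabilization of the native state, -T*dS in kcal/mol,
    assuming the disulfide lowers only the entropy of the unfolded state."""
    return -temp_k * loop_entropy_change(n_loop) / 1000.0
```

Under these assumptions, loops of 10, 20 and 40 residues give roughly 2.7, 3.3 and 3.9 kcal/mol of stabilization at 25 °C; the logarithmic dependence means that longer loops stabilize more, but with diminishing returns.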
The native set of disulfide bonds is the end result of an often complicated process involving covalent reactions such as oxidation (S-S formation), reduction (S-S breaking) and isomerization/reshuffling (S-S rearrangement) [12]. The term "oxidative folding" describes the composite process by which a reduced, unfolded protein gains both its native disulfide bonds (disulfide bond generation) and its native structure (conformational folding) (for experimental details of the oxidative folding technique, see Box 1). In vitro folding studies have identified three structural factors that affect the course of oxidative folding: the proximity, reactivity and accessibility of thiol groups and disulfide bonds [13]. The proximity of two reactive groups, defined as their effective intramolecular concentration, is determined by the propensity of the protein backbone to juxtapose the two groups; in unfolded species, it is largely determined by loop entropy and enthalpic interactions. The reactivity of the groups depends on the mechanism of the disulfide reactions, which occur through thiol-disulfide exchange; because only the deprotonated thiolate group can perform the nucleophilic attack, changes in the local electrostatic environment (which shift the thiol pKa) may affect this parameter. Nevertheless, the most critical factor seems to be the accessibility of thiol groups and disulfide bonds. Because thiol-disulfide exchange reactions occur only when a thiol and a disulfide bond come into contact, burial of either group prevents contact and hence blocks the reaction.
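The proximity and reactivity factors can be given rough numbers. The sketch below is a minimal illustration under loudly stated assumptions: proximity is approximated by the effective concentration of an ideal Gaussian chain (ignoring excluded volume and side-chain geometry), with an assumed Cα-Cα virtual bond length of 3.8 Å and an assumed characteristic ratio of 9 for the unfolded chain; reactivity is approximated by the Henderson-Hasselbalch fraction of thiolate, with an assumed typical cysteine pKa of 8.3. The function names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol
A3_PER_LITER = 1e27       # cubic angstroms per liter


def effective_concentration_m(n_residues: int,
                              bond_length_a: float = 3.8,
                              char_ratio: float = 9.0) -> float:
    """Effective molarity of one thiol relative to another n_residues apart,
    taken as the ideal Gaussian-chain probability density at zero distance."""
    mean_sq = n_residues * char_ratio * bond_length_a ** 2    # <r^2> in A^2
    density_per_a3 = (3.0 / (2.0 * math.pi * mean_sq)) ** 1.5  # per A^3
    return density_per_a3 * A3_PER_LITER / AVOGADRO            # mol/L


def thiolate_fraction(ph: float, pka: float = 8.3) -> float:
    """Fraction of a cysteine present as the reactive thiolate (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))
```

With these assumptions, two cysteines 20 residues apart have an effective concentration of roughly 4 mM, and the estimate falls as n^(-3/2); at pH 7.3, only about 9% of a cysteine with pKa 8.3 is in the thiolate form, so a local environment that lowers the pKa can markedly raise its reactivity.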
Accordingly, the formation of a stable tertiary structure is a key event in the oxidative folding of proteins because it protects native disulfide bonds from reduction and reshuffling by making them inaccessible to protein thiols and redox agents [14]. The burial of both thiol groups and disulfide bonds in a disulfide intermediate may hinder any further progress in the folding reaction. In this regard, dead-end intermediates tend to be "disulfide-insecure", in that their structural fluctuations expose their disulfide bonds in concert with their thiol groups, leading to reshuffling rather than oxidation. These intermediates are normally long-lived, "metastable" species that constitute rate-limiting steps in oxidative folding. In contrast, productive intermediates (leading to the native state) tend to be "disulfide-secure", meaning that their structural fluctuations preferentially expose their thiol groups while keeping their disulfide bonds buried. This distinction helps to explain the oxidative folding of many of the disulfide-rich proteins discussed in the following sections.
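The secure/insecure distinction can be read as a kinetic partition between two competing reactions of the same intermediate: oxidation onward to the native state and reshuffling back into less-folded species. The sketch below assumes both are (pseudo-)first-order, with oxidation rate implicitly including the oxidant concentration; the class name and all rate values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Intermediate:
    name: str
    k_ox: float         # pseudo-first-order oxidation rate toward native (1/min)
    k_reshuffle: float  # reshuffling rate back to less-folded species (1/min)

    def _total_rate(self) -> float:
        total = self.k_ox + self.k_reshuffle
        if total <= 0:
            raise ValueError("at least one pathway must have a positive rate")
        return total

    def productive_fraction(self) -> float:
        """Fraction that oxidizes onward, for two competing first-order paths."""
        return self.k_ox / self._total_rate()

    def mean_lifetime(self) -> float:
        """Mean lifetime (min) before either reaction occurs."""
        return 1.0 / self._total_rate()

    def label(self) -> str:
        # Secure: fluctuations expose thiols, so oxidation wins.
        # Insecure: disulfides are exposed together with thiols, so reshuffling wins.
        if self.k_ox > self.k_reshuffle:
            return "disulfide-secure"
        return "disulfide-insecure"
```

For example, Intermediate("secure", 1.0, 0.01) partitions about 99% toward the native state, while Intermediate("insecure", 0.001, 0.01) is productive only about 9% of the time and has a mean lifetime near 91 min, consistent with long-lived, metastable behavior; the rates are invented for illustration.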

Oxidative folding of BPTI and hirudin: two opposed models
Extensive studies on small, disulfide-rich proteins have shown divergent folding mechanisms that differ in: (a) the extent of heterogeneity of folding intermediates, (b) the predominance of intermediates containing native disulfide bonds, and (c) the accumulation of scrambled isomers (fully oxidized species that contain at least two non-native disulfide bonds) as intermediates. On the basis of these criteria, bovine pancreatic trypsin inhibitor (BPTI) and hirudin represent two notable models with very different folding characteristics. The original studies on BPTI (58 residues; three disulfide bonds), conducted by Creighton et al. and later revised by Weissman and Kim, made it one of the most extensively studied models of oxidative folding [15][16][17][18]. They showed that the folding of BPTI is characterized by the predominance of a limited number of 1- and 2-disulfide intermediates (only five of the 75 possible 1-, 2- and 3-disulfide species) that adopt native disulfide pairings and native-like substructures (Figure 1). These intermediates seem to funnel protein conformations toward the native state and prevent the accumulation of 3-disulfide scrambled isomers. The rate-limiting step of the process is then the conversion of 2-disulfide intermediates into the immediate precursor that rapidly forms the third and final native disulfide. Venom neurotoxins like α62 (structured in a "three-finger fold"; four disulfide bonds), the squash trypsin inhibitor EETI-II and the cyclotide protein kalata B1 (28 and 29 residues, respectively; three disulfide bonds each) fold by mechanisms similar to that of BPTI [19][20][21].
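The size of the search space behind the figure of 75 species follows from simple combinatorics: choose the 2k cysteines that are paired, then count the ways to pair them into k disulfides, (2k)!/(2^k k!). The helper name below is hypothetical.

```python
from math import comb, factorial


def n_disulfide_species(n_cys: int, n_ss: int) -> int:
    """Number of distinct species with n_ss intramolecular disulfides
    formed from n_cys cysteines (remaining cysteines stay as free thiols)."""
    if n_ss < 0 or 2 * n_ss > n_cys:
        return 0
    # Choose which 2*n_ss cysteines are paired, then count the pairings.
    pairings = factorial(2 * n_ss) // (2 ** n_ss * factorial(n_ss))
    return comb(n_cys, 2 * n_ss) * pairings
```

For a protein with six cysteines such as BPTI or hirudin, this gives 15, 45 and 15 one-, two- and three-disulfide species (75 in total), of which 14 fully oxidized species are scrambled isomers; for eight cysteines (as in LCI or αLA), there are 105 fully oxidized species, 104 of them scrambled. The count ignores mixed disulfides with redox reagents and intermolecular species.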
Likewise, the folding of the 3-disulfide insulin-like growth factor (IGF-1; 70 residues), which folds into two isomers (native and swap) with different disulfide linkages but similar thermodynamic stability, bears resemblance to that of BPTI [22]. Interestingly, this kind of folding is in line with the "framework model", which stresses the importance of local interactions in reducing conformational search and in guiding efficient protein folding through the hierarchic condensation of native-like elements.
The oxidative folding of hirudin (core domain, 49 residues; three disulfide bonds) differs from that of BPTI in all three of the folding features mentioned above: (a) folding intermediates are far more heterogeneous (at least 30 fractions of 1- and 2-disulfide intermediates have been identified), (b) predominant intermediates adopting native disulfides are absent, and (c) 3-disulfide scrambled isomers strongly accumulate [23,24]. Taken together, the folding of hirudin can be dissected into two stages: an initial stage of non-specific disulfide bond formation (packing) leading to the formation of scrambled species, followed by a final stage of disulfide reshuffling (consolidation) in which a heterogeneous scrambled population converts into the native structure (Figure 1). Notably, the oxidative folding of many other small 3-disulfide proteins is similar to that of hirudin; among them are tick anticoagulant peptide (TAP; 60 residues), various venom neurotoxins and proteins stabilized by "cystine-knot" disulfide bonds, such as potato carboxypeptidase inhibitor (PCI; 39 residues) and Amaranthus α-amylase inhibitor (AAI; 32 residues) [25-30]. Hirudin folding is consistent with the "collapse model", which depicts protein folding as an initial stage of rapid hydrophobic collapse followed by slower annealing, in which specific interactions refine the structure rather than dominate the folding code. Importantly, these studies have challenged the conventional view that scrambled isomers are abortive, "off-pathway" folding intermediates. Productive, "on-pathway" scrambled isomers seem to be frequent in the oxidative folding of small, disulfide-rich proteins.
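The two-stage behavior of hirudin can be caricatured by lumping all species into reduced (R), scrambled (X) and native (N) pools connected by first-order steps, R -> X -> N, whose populations have a standard closed-form solution for consecutive reactions. This is a deliberate simplification, not a model from the cited studies: it ignores the heterogeneity within each pool, and the rate constants k_pack and k_consolidate are hypothetical parameters.

```python
import math


def two_stage_populations(t: float, k_pack: float, k_consolidate: float):
    """Fractions of reduced (R), scrambled (X) and native (N) protein at time t
    for the lumped first-order scheme R -> X -> N."""
    if math.isclose(k_pack, k_consolidate):
        raise ValueError("this closed form requires k_pack != k_consolidate")
    r = math.exp(-k_pack * t)
    x = k_pack / (k_consolidate - k_pack) * (
        math.exp(-k_pack * t) - math.exp(-k_consolidate * t)
    )
    return r, x, 1.0 - r - x


def time_of_max_scrambled(k_pack: float, k_consolidate: float) -> float:
    """Time at which the scrambled pool X reaches its maximum."""
    return math.log(k_pack / k_consolidate) / (k_pack - k_consolidate)
```

With k_pack = 1.0 and k_consolidate = 0.05 (arbitrary time units), scrambled isomers peak at about 85% of the protein near t = 3.15 before slowly converting to the native form, mirroring a strong but productive, on-pathway accumulation of scrambled species.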

Mixing folding mechanisms: the cases of EGF and LCI
The oxidative folding of epidermal growth factor (EGF) and leech carboxypeptidase inhibitor (LCI) further illustrates the diversity of disulfide folding landscapes, because it displays both similarities to and differences from the folding of BPTI and hirudin. EGF (53 residues; three disulfide bonds) folds through several 1-disulfide intermediates that rapidly form a single […] (Figure 2). As with EGF, the oxidative folding of LCI (67 residues; four disulfide bonds), determined recently, proceeds through a sequential flow of 1- and 2-disulfide intermediates that rapidly accumulate as two predominant 3-disulfide intermediates with native disulfide pairings [33,34]. These two species act as major kinetic traps of the process and require structural rearrangements through the formation of 4-disulfide scrambled isomers to attain the native structure. Thus, EGF and LCI display an analogous folding mechanism that resembles that of BPTI in the presence of a few predominant native-like intermediates. At the same time, it is also similar to that of hirudin in the initial formation of heterogeneous populations of intermediates and in the final rate-limiting conversion of scrambled isomers into the native protein.

Diversity in the oxidative folding of α-lactalbumin
Folding diversity of small, disulfide-rich proteins seems to depend crucially on the presence (or absence) of localized, stable domain structures. An outstanding example to illustrate this hypothesis is the mechanism of oxidative folding of α-lactalbumin (αLA) elucidated in the absence or presence of calcium (Figure 3) [35,36]. αLA (122 residues; four disulfide bonds) comprises an α-helical domain and a β-sheet domain that is considerably stabilized upon binding to calcium [37]. In the absence of calcium, the oxidative folding of αLA resembles that of hirudin, proceeding through the formation of heterogeneous populations of 1-, 2- and 3-disulfide intermediates and with the final accumulation of 4-disulfide scrambled isomers.
Binding of calcium changes the folding process of αLA significantly because it thermodynamically stabilizes the β-sheet domain. Consequently, the complexity of folding intermediates diminishes drastically and the process now involves the accumulation of two predominant intermediates that adopt native disulfide-bond pairings and native-like structures of the β-sheet domain. Thus, in the presence of calcium, the folding of αLA bears close resemblance to what is observed in BPTI folding.
To provide more insight into the role played by stable-structure elements in oxidative folding, the reductive unfolding process of several model proteins has been examined (for details of this technique, see Box 1) [38,39]. A striking correlation is observed between the mechanisms of oxidative folding and reductive unfolding. Proteins whose native disulfide bonds are reduced collectively in an "all-or-none" mechanism, without significant accumulation of partially reduced species, display both a high degree of heterogeneity of folding intermediates and the formation of scrambled isomers during oxidative folding (e.g., hirudin, TAP, PCI, AAI and calcium-free αLA). For these proteins, it is only at the final stage of folding (consolidation) that the attainment of the native disulfide bonds is guided by specific non-covalent interactions, which results in the cooperative and concerted stabilization of disulfide bonds [40]. In contrast, a sequential reduction of the native disulfide bonds is associated with the presence of predominant intermediates with native-like structures during folding (e.g., BPTI, EETI-II and calcium-bound αLA).

The final and rate-limiting stage in the attainment of native RNase A is the formation of two intermediates, des[40-95] and des[65-72] (i.e., intermediates lacking the 40-95 and 65-72 S-S, respectively), with native disulfide-bond pairings and native-like structures. These species are formed mainly by the reshuffling of 3-disulfide intermediates, although a small fraction (up to 5%) may be formed by oxidation from the 2S ensemble. Upon formation of a stable tertiary structure, their three native disulfide bonds become protected from reduction and reshuffling ("locked in"), causing these "des species" to accumulate at high levels. However, their thiol groups remain solvent-accessible and, hence, these disulfide-secure intermediates oxidize relatively rapidly to the native protein.
The protective structure of these two species is a critical factor in promoting the oxidative folding of RNase A [14]. The other two des species of the 3S ensemble, des[26-84] and des[58-110], are metastable dead-end intermediates that reshuffle preferentially to the 3S ensemble rather than oxidize to the native protein. Presumably, these two disulfide-insecure intermediates bury both their thiol groups and their disulfide bonds in hydrophobic cores of a native-like structure, thus inhibiting oxidation as well as reduction and reshuffling. The specific reasons behind the accumulation of metastable intermediates have recently been determined from several structural characterizations of intermediates (see Box 2).
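The "des" nomenclature names a 3-disulfide intermediate by the native disulfide it lacks. The sketch below, a hypothetical helper rather than any published tool, enumerates des species from a list of native pairings (RNase A: 26-84, 40-95, 58-110 and 65-72; BPTI: 5-55, 14-38 and 30-51) and records the secure/insecure roles that this review assigns to the four RNase A species.

```python
# Native disulfide pairings (residue numbers) of two model proteins.
RNASE_A_NATIVE = [(26, 84), (40, 95), (58, 110), (65, 72)]
BPTI_NATIVE = [(5, 55), (14, 38), (30, 51)]

# Roles of the RNase A des species as described in the text.
RNASE_A_ROLES = {
    "des[40-95]": "disulfide-secure",    # productive, oxidizes to native
    "des[65-72]": "disulfide-secure",
    "des[26-84]": "disulfide-insecure",  # metastable dead end, reshuffles
    "des[58-110]": "disulfide-insecure",
}


def des_species(native):
    """Map each des[x-y] label to the native disulfides that the species retains."""
    return {
        f"des[{a}-{b}]": [pair for pair in native if pair != (a, b)]
        for a, b in native
    }


def free_cysteines(label: str):
    """Residue numbers of the two cysteines left as free thiols in a des species."""
    a, b = label[len("des["):-1].split("-")
    return int(a), int(b)
```

For instance, des_species(BPTI_NATIVE)["des[14-38]"] returns the two retained native pairings, 5-55 and 30-51, and free_cysteines("des[14-38]") identifies Cys14 and Cys38 as the thiols that must become accessible for the final oxidation.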
The effects of "locking in" native disulfide bonds are also illustrated by the BPTI, LCI and lysozyme models. Despite its small size, BPTI has two hydrophobic cores that may fold semi-independently. Specifically, the formation of the 5-55 S-S induces global folding, while the formation of the 30-51 S-S causes only one core to fold. In intermediates containing either the 5-55 or the 30-51 S-S, the remaining native S-S (14-38) forms quickly, producing the des[30-51] and des[5-55] species, respectively [45]. However, these des species appear to be disulfide-insecure intermediates that preferentially reshuffle rather than oxidize. Therefore, the productive precursor of native BPTI is the disulfide-secure des[14-38] intermediate, and the rate-limiting step of the folding process is either a reshuffling to form des[14-38] or the escape from the dead-end, metastable des[30-51] and des[5-55] species. In the oxidative folding of LCI, the des[…] and des[…] species, lacking the native disulfide bond that connects the α-helix to the β-sheet and the one that stabilizes the β-sheet core, respectively, also seem to behave as metastable, disulfide-insecure intermediates with a high content of native-like structure [46]. Lysozyme (129 residues; four disulfide bonds) is a more complex protein that comprises two folding domains, called the α- and β-domains. A heterogeneous ensemble of relatively unstructured intermediates (containing 1S and 2S) is rapidly formed from the reduced protein after initiation of folding [47][48][49]. In these early stages, and similarly to BPTI and RNase A

Disulfide bonds and aggregation
The presence of insoluble protein deposits in human tissues correlates with the development of many debilitating disorders including amyloidosis and several neurodegenerative diseases. In a number of proteins related to conformational diseases, improper disulfide bond formation may result in structural rearrangements of the polypeptide chain leading to increased aggregation.
One of the first pieces of evidence for this came from pioneering studies by Dobson et al. on lysozyme, a protein later shown to form amyloid fibrils in individuals suffering from non-neuropathic systemic amyloidosis [50,51]. Lysozyme aggregation takes place during the early stages of the folding reaction and is strongly prevented by the presence of PDI, which facilitates the attainment of the native state [52]. This reflects the importance of folding catalysts in physiological folding and suggests that they play an important role in preventing the aggregation of partially folded molecules in the intracellular environment.
The reduction of disulfide bonds has been shown to critically affect amyloid formation. For Bence Jones proteins, which are the major component of the amyloid fibrils in patients with systemic AL amyloidosis, the reduction of native disulfide bonds leads to non-native protein association and the formation of amyloid-like aggregates [53]. In the case of amylin, found in the islet amyloid deposits of type II diabetes, disulfide bonds play a central role in the assembly mechanism and kinetics of fibril formation [54]. The disulfide linkage of β2-microglobulin, whose aggregation into amyloid deposits is common in patients on long-term hemodialysis, seems to protect against deposition by reducing conformational fluctuations and maintaining the global native-like topology [55]. Accordingly, disulfide bond reduction both destabilizes the native state and enhances the conformational flexibility of the polypeptide, resulting in increased formation of oligomeric structures. A special case is the prion protein, which causes transmissible spongiform encephalopathies. The oxidative folding of the reduced, monomeric, non-infective form PrP^C has been shown to result in the formation of an oligomeric, protease-resistant species with a high content of β-sheet structure joined by intermolecular disulfide bonds [56,57]. More importantly, it has been demonstrated that the aberrant "scrapie" isoform (PrP^Sc) is able to convert native PrP^C into oligomers, suggesting a mechanism for prion self-propagation (Figure 4).
Finally, the formation of new, non-native intramolecular disulfide bonds may also result in aggregation. This is the case for Cu/Zn superoxide dismutase (SOD1), a protein implicated in the familial form of the neurodegenerative disease amyotrophic lateral sclerosis [58]. SOD1 has four cysteines: two are linked by a disulfide bond and two are free in the native structure. Formation of a new disulfide bond between the originally free cysteines traps SOD1 in aggregation-prone conformations and accelerates protein deposition.

Concluding remarks and future perspectives
In the last few years, the results obtained from many studies on small, disulfide-rich proteins have substantially clarified the understanding of oxidative folding mechanisms. The folding of proteins like hirudin, PCI or AAI has shown the interdependence between conformational folding and assembly of native disulfide bonds. Because of the large heterogeneity of species at the start of the folding process, disulfide bonds are necessary to restrict the search in conformational space by cross-linking the protein in its unfolded state. In the cell this is most likely achieved with the help of folding catalysts, for instance, PDI, which may reduce and reshuffle non-native disulfide bonds at the early stages of folding to avoid misfolding.
Other works have highlighted the critical role of accessibility of both disulfide bonds and thiol groups, as observed in the oxidative folding of RNase A, BPTI or LCI. For these proteins, the critical step seems to be the formation of a stable tertiary structure that sequesters the native disulfide bonds (preventing subsequent rearrangements) but leaves the thiol groups of intermediates relatively exposed (or exposable) for subsequent oxidation. The "locking in" of native disulfide bonds by conformational folding could also assist the oxidative folding of larger disulfide-rich proteins. For such proteins, the absence of either structured intermediates or a strong native bias would hinder the folding reaction significantly. However, the independent folding of (sub)domains could help in overcoming this entropic barrier by

Box 2. Structures of intermediates: shedding light on protein folding
Although some intermediates have been directly purified from the folding reaction (see below), most studies have relied on the construction of analogues in which the free cysteine residues are replaced with alanine or serine residues. The first intermediate analogues to be analyzed were those of BPTI. NMR studies on analogues […] However, small structural differences spatially adjacent to the mutation sites accounted for their lower stability relative to the native protein (Figure I). An analogue of the MFI scrambled isomer of AAI was recently characterized by NMR, providing clues about the role of structural constraints in directing the folding process: the compact fold brings the cysteine residues into close proximity, thus facilitating reshuffling to native disulfide bonds [67]. Also, scrambled isomers of the peptide α-conotoxin GI were synthesized and characterized, with implications for its structure and stability [68].
"Real" intermediates of other small, disulfide-rich proteins (with reactive cysteines, unlike the analogues) have been trapped, isolated by RP-HPLC and further characterized by NMR. Besides the accessibility of thiol groups, other factors such as proximity may affect the formation of disulfide bonds. Thus, for kalata B1, the recently determined structure of its major intermediate, des[1-15], revealed a native-like structure that keeps the two remaining cysteine residues distant from each other, thereby preventing direct oxidation to the final disulfide bond [19]. The overall results presented here emphasize the need for high-resolution structure determinations of real intermediates to fully understand disulfide folding.