HOME    »    SCIENTIFIC RESOURCES    »    Volumes
SCIENTIFIC RESOURCES

Abstracts and Talk Materials
Protein Folding
January 14 - 18, 2008

Using Motion Planning to Study Molecular Motions
January 15, 2008

Protein motions, ranging from molecular flexibility to large-scale conformational change, play an essential role in many biochemical processes. For example, some devastating diseases such as Alzheimer's and bovine spongiform encephalopathy (Mad Cow) are associated with the misfolding of proteins. Despite the explosion in our knowledge of structural and functional data, our understanding of protein movement is still very limited because it is difficult to measure experimentally and computationally expensive to simulate. In this talk we describe a method we have developed for modeling protein motions that is based on probabilistic roadmap methods (PRM) for motion planning. Our technique yields an approximate map of a protein's potential energy landscape and can be used to generate transitional motions of a protein to the native state from unstructured conformations or between specified conformations. We also describe new analysis tools that enable us to extract kinetics information, such as folding rates or to identify and study the folding core. For example, we show how our map-based tools for modeling and analyzing folding landscapes can capture subtle folding differences between protein G and its mutants, NuG1 and NuG2. More information regarding our work, including an archive of protein motions generated with our technique, are available from our protein folding server: http://parasol.tamu.edu/foldingserver/

Chemical reaction systems with a low to moderate number of molecules are typically modeled as continuous time Markov chains. More explicitly, the state of the system is modeled as a vector giving the number of molecules of each species present with each reaction modeled as a possible transition for the state. The model for the kth reaction is determined by a vector of inputs specifying the number of molecules of each chemical species that are consumed in the reaction, a vector of outputs specifying the number of molecules of each species that are created in the reaction and a function of the state that gives the rate at which the reaction occurs. To understand how the probability distribution of the system changes in time one could attempt to solve the Chemical Master Equation (CME), however this is typically an extremely difficult task. Therefore, simulation methods such as the Stochastic Simulation Algorithm (Gillespie Algorithm) and tau-leaping have been developed so as to approximate the probability distribution of the system via Monte Carlo methods. I will demonstrate how using a random time change representation for these models leads naturally to simulation methods that achieve greater efficiency and stability than existing methods.

Due to the inherent complexity of the associated problems, investigations of the basic principles of protein folding and evolution are usually restricted to simplified protein models.

Our group has developed methods and programs for exact and complete solving of problems typical for studies using HP-like 3D lattice protein models.

Addressed tasks are the prediction of globally optimal and listing of suboptimal structures, sequence design, neutral network exploration, and degeneracy computation. The used methods are based on fast and non-heuristic techniques (constraint programming) instead of following stochastic approaches, which are not capable of answering many fundamental questions. Thus, we are able to find optimal structure for HP-sequences of length greater than 200, including a proof of optimality. We have used these methods to find unique folding sequences, to investigate neutral nets and to design low-degenerated sequences for given structures.

Two groups of studies recently proved to provide insights into such intrinsic, structure-induced effects: elastic network models that permit us to visualize the cooperative changes in conformation that are most readily accessible near native state conditions, and information-theoretic approaches that elucidate the most efficient pathways of signal transmission favored by the overall architecture. Using a combination of these two approaches, we highlight, by way of application to the bacterial chaperonin complex GroEL-GroES, how the most cooperative modes of motion play a role in mediating the propagation of allosteric signals. A functional coupling between the global dynamics sampled under equilibrium conditions and the signal transduction pathways inherently favored by network topology appears to control allosteric effects.

Continuum electrostatics methods have become increasingly popular due to their ability to provide approximate descriptions of solvation energies and forces without expensive sampling required by explicit solvent models. In particular, the Poisson-Boltzmann equation (PBE) provides electrostatic potentials, solvation energies, and forces by modeling the solvent as a featureless, dielectric material, and the mobile ions as a continuous distribution of charge. In this talk, I will provide a review of PBE-based and new apolar continuum solvation methods as well as approaches for assessing their performance by comparison with explicit solvent simulations. In particular, I will focus on the ability of these continuum solvent models to describe solvation forces on proteins and nucleic acids and will comment on strengths and weaknesses of these implicit solvent approaches.

Many small single-domain proteins undergo cooperative, switch-like folding/unfolding transitions with very low populations of intermediate, i.e., partially folded, conformations. The phenomenon of cooperative folding is not readily accounted for by common notions about driving forces for folding. I will discuss how common protein chain models with pairwise additive interactions are insufficient to account for the folding cooperativity of natural proteins, and how models with nonadditive local-nonlocal coupling may rationalize cooperative folding rates that are well correlated with native topology. The traditional formulation of folding transition states entails a macroscopic folding free energy barrier with both enthalpic and entropic components. I will explore the microscopic origins of these thermodynamic signatures in terms of conformational entropy as well as desolvation (dewetting) effects. Notably, the existence of significant enthalpic folding barriers raises fundamental questions about the validity of the funnel picture of protein folding, because such enthalpic barriers appear to imply that there are substantial uphill moves along a microscopic folding trajectory. Using results from extensive atomic simulations, I will show how the paradox can be resolved by a dramatic entropy-enthalpy compensation at the rate-limiting step of folding. In this perspective, the height of the enthalpic barrier is seen as related to the degree of cooperativity of the folding process.

Multivalent ions (Mg2+) in RNA tertiary structure folding can be strongly correlated and thus cannot be treated by mean-field theories such as the Poisson-Boltzmann equation. We recently developed a statistical mechanical model (TBI) to account for ion correlation by considering ensemble of discrete ion distributions. Experimental tests show that the TBI model gives improved predictions for nucleic folding folding stability over the Poisson-Boltzmann equation, which generally underestimates the (multivalent) ion-dependent folding stability due to ignoring the ion correlation. Using the TBI theory, we investigate the folding energy landscape for a simple system with loop-tethered short DNA helices and find that Na+ and Mg2+ play contrasting roles in helix–helix assembly. High [Na+] (>0.3 M) causes a reduced helix–helix electrostatic repulsion and a subsequent disordered packing of helices, while Mg2+ of concentration > 1 mM is predicted to induce a more compact and ordered helix–helix packing. Mg2+ is much more efficient in causing nucleic acid compaction and is predicted to induce a collapse transition around 1mM of [Mg2+].

Lead generation is a major hurdle in small-molecule drug discovery, with an estimated 60% of projects failing from lack of lead matter or difficulty in optimizing leads for drug-like properties. It would be valuable to identify these less-druggable targets before incurring substantial expenditure and effort. We discovered that a model-based approach using basic biophysical principles yields good prediction of druggability based solely on the crystal structure of the target binding site. We quantitatively estimate the maximal affinity achievable by a drug-like molecule, and we show that these calculated values correlate with drug discovery outcomes. We experimentally test two predictions using high-throughput screening of a diverse compound collection. The collective results highlight the utility of our approach as well as strategies for tacking difficult targets. I will also discuss our approach to calculating protein curvature and some potential computational approaches for difficult targets.

This talk (and a related poster) describes Lie-group-theoretic techniques that can be applied in the analysis and modeling of protein conformations. Three topics are covered: (1) Conformational transitions between two known end states; (2) proper normalization of helix-helix crossing angle data in the PDB; (3) models of the conformational entropy of the ensemble of unfolded polypeptide conformations. Using the concept of convolution on the group of rigid-body motions, the probability density of position and orientation of the distal end of a polypeptide chain is obtained by convolving the distributions for shorter segments that make up the chain. This methodology can also be used in the analysis of loop entropy in folded proteins as well as the ensemble of unfolded conformations.

The geometrical problem of protein folding, especially in its later stages, is composed of two types of freedom, the full torsional flexibility of loops connecting nearly rigid structural pieces (helices, beta-sheets etc), and the relative placing of such pieces. We present a method for sampling the feasible conformations of protein loops, based on Triaxial Loop Closure (TLC), a simple and highly efficient inverse kinematic (IK) method for solving the loop closure problem. TLC is easily extended to incorporate additional (i.e. position, orientation) constraints, or more general geometrical conditions. Due to its relative simplicity TLC compares favorably to more general IK robotics algorithms, both in robustness and in speed. We consider two applications: (i) An algorithm for the rapid sampling of the conformations of protein loops including three or more residues which uses quasirandom Sobol sampling of the Ramachandran regions. Ideas akin to Delauney triangulation may be employed to ensure sampling loop shapespace at a desired density. (ii) An efficient method for the sequential assembly of helical proteins via maximal hydrophobic packing. The geometrical problem of considering all possible mutual arrangements of a system of helices that are compatible with closing the corresponding loops is already too large to sample directly. We introduced a measure of hydrophobic packing by seeking to minimize the radius of gyration of the hydrophobic residues. Thus, we sequentially assemble the helices, by sampling relative orientations of pairs of them that bring specified hydrophobic residues in proximity. For the best candidates, in terms of energy and hydrophobig radius of gyration, the loops are closed using the algorithm in (i) and another helix is added to the assembly, always seeking to maximizing hydrophobic contact. We tested this iterative assembly method on 26 helical proteins each containing up to 5 helices. The method heavily samples native-like conformations. The average RMSD-to-native of the best conformations for the 18 helix bundle proteins that have 2 or 3 helices is less than 2 Angstroms with slightly worse errors for proteins containing more helices.

Sequence-structure relationships in proteins are highly asymmetric since many sequences fold into relatively few structures. What is the number of sequences that fold into a particular protein structure? Is it possible to switch between stable protein folds by point mutations? To address these questions we compute a directed graph of sequences and structures of proteins, which is based on experimentally determined protein shapes. Two thousand and sixty experimental structures from the Protein Data Bank were considered, providing a good coverage of fold families. The graph is computed using an energy function that measures stability of a sequence in a fold. A node in the graph is an experimental structure (and the computationally matching sequences). A directed and weighted edge between nodes A and B is the number of sequences of A that switch to B because the energy of B is lower. The directed graph is highly connected at native energies with ³sinks² that attract many sequences from other folds. The sinks are rich in beta sheets. The in-degrees of a particular protein shape correlates with the number of sequences that matches this shape in empirically determined genomes. Properties of strongly connected components of the graph are correlated with protein length and secondary structure. Joint work with Leonid Meyerguz and Jon Kleinberg

Joint work with S. R. McAllister.

The protein folding question has developed over the past four decades as one of the most challenging and potentially rewarding problems in computational biology. Three general classes of algorithms have emerged, based on the techniques of comparative modeling, fold recognition, and first principles methods. For a detailed summary of protein structure prediction methods, the reader is directed to two recent reviews [1,2]. Within the field of protein structure prediction, the packing of alpha-helices has been one of the more difficult problems. The use of distance constraints and topology predictions can be highly useful for reducing the conformational space that must be searched by deterministic algorithms to find a protein structure of minimum conformational energy.

We present a novel first principles framework to predict the structure of alpha-helical proteins. Given the location of the alpha-helical regions, a mixed-integer linear optimization model maximizes the interhelical residue contact probabilities to generate distance restraints between alpha-helices [3]. Two levels of this formulation allow the prediction of both primary'' contacts between a helical pair as well as the prediction of wheel'' contacts, one helical turn beyond the primary contacts. These predictions are subject to a number of mathematical constraints to disallow sets of contacts that cannot be achieved by a folded protein. The interhelical contact prediction for alpha-helical proteins was evaluated on 26 proteins, where it identified an average contact distance below 11.0 Angstroms for the entire set.

A related optimization-based approach is proposed for the prediction of alpha-helical contacts in mixed alpha/beta proteins [4]. This contact prediction is based on the maximization of the number and hydrophobicity of hydrophobic interactions. The allowable sets of contacts is restricted based on knowledge or prediction of the beta-sheet topology and a number of distance geometry rules and constraints. The interhelical contact prediction for alpha/beta proteins was evaluated on 12 test proteins, where it identified an average contact distance below 11.0 Angstroms for 11 of these proteins.

The distance restraints from the interhelical contacts are then used to restrict the feasible space of the protein during the prediction of the tertiary structure using a hybrid optimization algorithm [5,6]. This tertiary structure prediction approach combines torsion angle dynamics and rotamer optimization with a deterministic global optimization technique (alphaBB) and a stochastic optimization technique (conformational space annealing) to minimize a detailed atomistic-level energy function. The tertiary structure prediction results are promising and are highlighted by the exciting, near-native blind prediction of a de novo designed 4-helix bundle protein.

[1] Floudas CA, Fung HK, McAllister SR, Monningmann M, and Rajgaria R. Advances in Protein Structure Prediction and De Novo Protein Design: A Review. Chem Eng Sci. 2006;61: 966-988.

[2] Floudas CA. Computational Methods in Protein Structure Prediction. Biotechnol Bioeng. 2007;97:207-213.

[3] McAllister SR, Mickus BE, Klepeis JL, and Floudas CA. A Novel Approach for Alpha-Helical Topology Prediction in Globular Proteins: Generation of Interhelical Restraints. Prot Struct Funct Bioinf. 2006;65:930-952.

[4] McAllister SR and Floudas CA. Alpha-helical Residue Contact Prediction in Mixed Alpha/Beta Proteins Using Mixed-Integer Linear Programming. In preparation, 2007.

[5] Klepeis JL and Floudas CA. ASTRO-FOLD: A Combinatorial and Global Optimization Framework for Ab Initio Prediction of Three-dimensional Structures of Proteins from the Amino Acid Sequence. Biophys J, 2003;85:2119-2146.

[6] McAllister SR and Floudas CA. An Improved Hybrid Global Optimization Method for Protein Tertiary Structure Prediction. In preparation, 2007.

Coarse master equations and diffusion models provide powerful tools to study the equilibrium and non-equilibrium properties of molecular systems. Maximum likelihood and Bayesian approaches have been used successfully to construct such models from the observed dynamics projected onto discrete and continuous low-dimensional sub-spaces. By using a Green's-function based formalism, issues arising from fast non-Markovian dynamics can be circumvented. The general formalism for the construction of coarse master equations and diffusion models will be illustrated with applications to peptide and protein folding.

While equilibrium free energy differences can be obtained from simulations of non-equilibrium systems, such estimates are often hampered by convergence difficulties similar to those that plague traditional perturbation methods. In the nonequilibrium context, dissipation is the culprit behind poor convergence. I will discuss a general strategy for addressing these difficulties, in which dissipation is reduced by adding non-physical terms to the equations of motion. When these terms are chosen appropriately, the efficiency of the free energy estimate can improve dramatically. After sketching the general features of this strategy, I will illustrate its application using specific examples, and will discuss its relation to other strategies that are similar in spirit.

Are protein motions limited because of a higher level of cooperativity than indicated by usual potentials?

Recently derived four-body coarse-grained potentials show improved performance in threading over pairwise potentials. Their apparently increased extent of cooperativity is consistent with the high level of control of motions manifested in the elastic network model computations. The elastic network models are providing strong evidence that proteins control their functional motions through their most important slowest domain motions. A major strength of these models appears to be their ability to represent the structures as highly cohesive rubbery materials, and much evidence supporting them has now accumulated. Such models exhibit strong control over their motions, arising principally from the shape, sometimes even including control of the motions of surface loops by domain motions and the motion of reactive atoms at enzyme active sites. These highly coordinated atom motions may be relevant to enzyme mechanisms. There is accumulating evidence that the behavior of protein machines can be understood with these models, and the important large domain motions can be obtained readily. For the ribosome, the results clearly indicate that its motions relate strongly to many aspects of its function. Already we have seen that the large ribosomal ratchet motion simultaneously causes the t-RNAs and mRNA to translate in the processing direction. The control of the mRNA at the anti-codon binding site is extremely strong, to ensure fidelity of copying, with the mRNA being moved translationally as a fully rigid body, with no internal motions.

I will present and discuss a number of computational experiments associated with the coarse-graining of atomistic/agent-based simulations. In particular, I will discuss coarse reverse integration, as well as the use of diffusion maps (a manifold-learning technique) to suggest the selection of certain coarse-grained observables ("reduction coordinates") for the atomistic simulations. The illustrations will come from molecular dynamics, Monte Carlo as well as agent-based models.

Topological Analysis of DNA-binding Protein Complexes
December 31, 1969

Difference topology is a methodology to derive the number of DNA crossings trapped in an unknown protein complex. By this method, Pathania, Jayaram, and Harshey revealed the topological structure within the Mu protein complex which consisted of three DNA segments containing five nodes [1]. In their experiments, they used a member of the site-specific recombinases which is known as Cre. Cre mediates DNA exchange by rearranging target sites of the DNA segments. During this DNA recombination, there are no extra DNA crossings introduced. The initial DNA conformation is unknotted. After Cre recombination, the products are knots or catenanes. Recently, Darcy, Luecke, and Vazquez analyzed these experimental results and proved that the five-noded conformation is the only biologically reasonable structure of the Mu protein DNA complex [2]. We address the possibility of protein complexes that binds four DNA segments. By the useful property of Cre, we can make the assumption that after Cre recombination, the topology of a DNA-protein complex would be a knot or catenane. The latest results of the topological tangle model for this case and very basic biological and mathematical backgrounds will be discussed.

Reference:

[1] S. Pathania, M. Jayaram, and R. Harshey, Path of DNA within the Mu transpososome: Transposase interaction bridging two Mu ends and the enhancer trap five DNA supercoils, Cell 109 (2002), 425-436.

[2] I. K. Darcy, J. Luecke, and M. Vazquez, A tangle analysis of the Mu transpososome protein complex which binds three DNA segments, Preprint.

Efficient exploration of the configuration space of a protein is essential for its structure prediction. In this talk we will consider two recent Monte Carlo developments for such a task: (i) equi-energy (EE) sampler and (ii) fragment regrowth via energy-guided sequential sampling (FRESS). The EE sampler provides accurate estimation of the density of states of the energy landscape, which then allows detailed study of the thermodynamics of lattice protein folding. The FRESS algorithm provides an efficient means to sequentially simulate a protein structure. As an illustration we will consider 2D and 3D HP models. For the benchmark sequences, we not only found new lower energies for all the 3D sequences longer than 80 residues with little computing effort, but also were able to accurately estimate the density of states that characterizes the global energy landscape.

I discuss two ongoing research programs concerning stochastic microphysical models in biology. First, in joint work with Grigorios Pavliotis (Imperial College) and Juan Latorre, we apply the methods of homogenization theory to compute transport properties for mathematical models (Brownian motors) of molecular motors in cells. The objective is to ascertain how the design properties of the motor affect its function. Secondly, in joint work with Shekhar Garde and Adnan Khan, we explore a simple stochastic model for the behavior of water molecules near a solute surface which has the potential for improving substantially upon Brownian dynamics models more conventionally used in engineering applications. We use exactly solvable mathematical models as a testbed for addressing some basic data-driven parameterization issues.

We report on replica-exchange simulations of the folding of a 21-residue alpha-helical peptide in explicit solvent. Using eight replicas over a 280-450 K temperature range, we were able to simulate both the folding equilibrium and the helix formation process. While the melting temperature was exaggerated by about 50 K, the folding enthalpy and entropy were in good agreement with experimental data. Simulations of the helix formation process showed a sequential mechanism, with helical structures formed within ca. 3 ns of simulation, and confirmed that the alpha-helical state was a global free energy minimum of the peptide at low temperatures. We also present long-term conventional MD simulations of several simple peptide model systems for which we compare simulation results to experimental data, in order to test current peptide and water models.

RNA tertiary motifs play an important role in RNA folding. To understand the complex organization of RNA tertiary interactions, we compiled a dataset containing 54 high-resolution RNA crystal structures. Seven RNA tertiary motifs (coaxial helix, A-minor, ribose zipper, pseudoknot, kissing hairpin, tRNA D-loop:T-loop and tetraloop-tetraloop receptor) were searched by different computer programs. For the non-redundant RNA dataset, 601 RNA tertiary interactions were found. Most of these 3D interactions occur in the 16S and 23S rRNAs. Exhaustive search of these motifs revealed diversity of A-minor interactions, and other loop-loop receptor interactions similar to the tetraloop-tetraloop receptor. Correlation between motifs, such as pseudoknot or coaxial helix with A-minor, shows that they can form composite motifs. These findings may lead to tertiary structure constraints for RNA 3D prediction.

Can genomics provide a new level of strategic intelligence about rapidly evolving pathogens? We have developed a new approach to measure the rates of all possible evolutionary pathways in a genome, using conditional Ka/Ks to estimate their “evolutionary velocity”, and have applied this to several datasets, including clinical sequencing of 50,000 HIV-1 samples. Conditional Ka/Ks predicts the preferred order and relative rates of competing evolutionary pathways. We recently tested this approach using independent data generously provided by Shafer and coworkers (Stanford HIV Database), in which multiple samples collected at different times from each patient make it possible to track which mutations occurred first during this time-course. Out of 35 such mutation pairs in protease and RT, conditional Ka/Ks correctly predicted the experimentally observed order in 28 cases (p=0.00025). Conditional Ka/Ks data reveal specific accessory mutations that greatly accelerate the evolution of multi-drug resistance. Our analysis was highly reproducible in four independent datasets, and can decipher a pathogen’s evolutionary pathways to multi-drug resistance even while such mutants are still rare. Analysis of samples from untreated patients shows that these rapid evolutionary pathways are specifically associated with drug treatment, and vanish in its absence.

Replica exchange (RE) is a generalized ensemble simulation method for accelerating the exploration of free-energy landscapes which define many challenging problems in computational biophysics, including protein folding and binding. Although replica exchange is a parallel simulation technique whose implementation is relatively straightforward, kinetics and the approach to equilibrium in the replica exchange ensemble are complicated; there is much to learn about how to best employ RE to protein folding and binding problems. Protein folding rates often slow down as the temperature is raised above a critical value and this “anti-Arrhenius” behavior represents a challenge for RE. However, it is far from straightforward to systematically explore the impact of this on RE by brute force molecular simulations, since RE simulations of protein folding are very difficult to converge. In studies over the past two years using both atomistic and simplified models, we have clarified some of the obstacles to obtaining converged thermodynamic information from RE simulations. In my talk I will describe some simple continuous and discrete models we have constructed to explore the behavior of replica exchange sampling under a variety of conditions.

References

Andrec, M., A.K. Felts, E. Gallicchio, and R.M. Levy. Protein folding pathways from replica exchange simulations and a kinetic network model. Proceedings Natl. Acad. Sci. USA, 102, 6801-6806 (2005).

K.P. Ravindranathan, E. Gallicchio, R.A. Friesner, A.E. McDermott, and R.M. Levy. Conformational equilibrium of a cytochrome P450 – substrate complex, a replica exchange MD study. J. Am. Chem. Soc., 128, 5786-5791 (2006)

Zheng, W., M. Andrec, E. Gallicchio, and R.M. Levy. Simulating replica exchange simulations of protein folding with a kinetic network model. Proceedings Natl. Acad. Sci. USA, 104, 15340-15345 (2007).

Zheng, W., M. Andrec, E. Gallicchio, and R.M. Levy. Simple continuous and discrete models for simulating replica exchange simulations of protein folding. J. Phys. Chem., in press.

For a protein molecule in vacuo, the net overall rotation due to flexibility is expressed in internal coordinates by using Eckart's decomposition of the total rotational angular momentum. Regardless of whether the total (rotational) angular momentum vanishes, the condition for zero overall rotation is zero orbital angular momentum. Previous approaches toward the elimination of overall rotation included (i) using normal modes, (ii) minimizing the root-mean-squared deviation (RMSD through finite rotations with respect to an initial configuration, and (iii) setting the total angular momentum to zero. These three approaches neglected the contribution of nonzero internal angular momentum. While this approach [1, 2] is motivated by results in geometric mechanics [3], the results agree with an experimental observation of a rotation of 20 degrees in triatomic photodissociation [4] and a computational observation of an overall rotation of 42 degrees in the dynamics of protein molecules [5].

References: [1] F. J. Lin, Hamiltonian dynamics of atom-diatomic molecule complexes and collisions, Discrete and Continuous Dynamical Systems, Supplement 2007, 655 – 666 (2007). [2] F. J. Lin, Separation of overall rotation and internal motion in the N-body dynamics of protein molecules, 2007a. [3] J. E. Marsden, R. Montgomery, and T. Ratiu, Reduction, symmetry, and phases in mechanics, Memoirs of the American Mathematical Society, Vol. 88, No. 436, American Mathematical Society, Providence, RI, 1990. [4] A. V. Demyanenko, V. Dribinski, H. Reisler, H. Meyer, and C. X. W. Qian, Quantum-product state-dependent anisotropies in photoinitiated unimolecular decomposition, Journal of Chemical Physics 111, 7383 – 7396 (1999). [5] Y. Zhou, M. Cook and M. Karplus, Protein motions at zero-total angular momentum: The importance of long-range correlations, Biophysical Journal 79, 2902 – 2908 (2000).

Recent progress in obtaining docked protein complexes will be discussed. The combination of exhaustive search, clustering and localized global optimization can reliably find energy minima to highly nonconvex biomolecular energy functions. Using an energy function that adds desolvation and screened electrostatics to classical molecular mechanics potentials, the global minimum is found very near to the observed native state. This is demonstrated across a large number of benchmark examples.

Simulating Protein Conformations by a Geometric Model
December 31, 1969

Protein molecules are usually studied through mathematical models which consider the physicochemical forces among the atoms forming the molecule. Recently, however, the interest on the geometric properties of protein conformations has been growing, and models mainly based on such properties have been developed. In this work, a global optimization problem is formulated for simulating protein conformations, that is based on the so-called "tube model", in which a protein is modeled as a thickened tube [1,2]. The optimization problem is solved by the meta-heuristic method Monkey Search [3], which is inspired by the behavior of a monkey climbing trees in its search for food. Computational experiences proved that conformations having the typical geometric properties of proteins can be generated and that some of them can be close to conformations that proteins in Nature actually have.

[1] J.R. Banavar, A. Maritan, C. Micheletti and A. Trovato, "Geometry and Physics of Proteins", Proteins: Structure, Function, and Genetics 47 (3): 315–322, 2002

[2] G. Ceci, A. Mucherino, M. D’Apuzzo, D. di Serafino, S. Costantini, A. Facchiano, G. Colonna, "Computational Methods for Protein Fold Prediction: an Ab-Initio Topological Approach", Data Mining in Biomedicine, Springer Optimization and Its Applications, Panos Pardalos et al (Eds.), vol.7, 2007

[3] A. Mucherino and O. Seref, "Monkey Search: A Novel Meta-Heuristic Search for Global Optimization", AIP Conference Proceedings 953, Data Mining, System Analysis and Optimization in Biomedicine, 162–173, 2007

Globally the energy landscape of a folding protein resembles a partially rough funnel. The local roughness of the funnel reflects transient trapping of the protein configurations in local free energy minima. The overall funnel shape of the landscape, superimposed on this roughness, arises because the interactions present in the native structure of natural proteins conflict with each other much less than expected if there were no constraints of evolutionary design to achieve reliable and relatively fast folding (minimal energetic frustration).

A consequence of minimizing energetic frustration is that the topology of the native fold also plays a major role in the folding mechanism. Some folding motifs are easier to design than others suggesting the possibility that evolution not only selected sequences with sufficiently small energetic frustration but also selected more easily designable native structures. The overall structure of the on-route and off-route (traps) intermediates for the folding of more complex proteins is also strongly influenced by topology. In this context, folding of larger and more complex proteins will be discussed.

Many cellular functions rely on interactions among proteins and between proteins and nucleic acids. The limited success of binding predictions may suggest that the physical and chemical principles of protein binding have to be revisited to correctly capture the essence of protein recognition. Going beyond folding, the power of reduced models to study the physics of protein assembly will be discussed. Since energetic frustration is sufficiently small, native topology-based models, which correspond to perfectly unfrustrated energy landscapes, have shown that binding mechanisms are robust and governed primarily by the protein’s native topology. These models impressively capture many of the binding characteristics found in experiments and highlight the fundamental role of flexibility in binding. Deciphering and quantifying the key ingredients for biological self-assembly is invaluable to reading out genomic sequences and understanding cellular interaction networks. Going even beyond binding, we will be discussing the energy landscape for the molecular motor kinesin.

Biological networks that arise in signal transduction, metabolism, gene control or other cellular functions frequently involve many steps and many levels of control, but are remarkably reliable in producing the desired output in response to inputs. In this talk we will discuss general characteristics such as sensitivity and adaptation of networks, give several examples that illustrate how robustness is achieved, and discuss the mathematical techniques that can contribute to understanding complex networks.

Protein Folding by ZAM & FRODA
December 31, 1969

Protein folding by ZAM & FRODA

Protein folding problem’ stemming from Levinthal’s paradox (i.e. how proteins fold fast even though they have vast conformational space) has been answered by the zipping and assembly mechanism (ZA). According to ZA mechanism, an unfolded chain first explores locally favorable structures at multiple independent positions along the chain. Then, these local structures engage neighboring amino acids in the chain sequence to form additional contacts, growing individual local structures by zipping or assembling. Using ZA principle, an all-atom structure prediction method has been developed, called zipping and assembly method (ZAM). ZAM has successfully predicted protein structures of small single domain proteins. It uses replica exchange molecular dynamics for conformational sampling which creates a bottleneck in the assembly stage. We modify ZAM assembly stage by introducing FRODA which is a Monte Carlo based geometric simulation. Since FRODA can explore the large-amplitude motions of larger systems so much faster than molecular dynamics, we can speed up the assembly stage and generate the complete enumeration of all topologies quickly. The results show that the native structure of proteins can be sampled during the FRODA-assembly stage within a RMSD

Computing Conformational Free Energy by Deactivated Morphing
December 31, 1969

"What is the free energy difference between two different conformations of a protein?" This simple question is apparently not so simple to answer. Despite the significant advances in free energy computations, there has been relatively little success in computing conformational free energies. To compute conformational free energy differences, we need a transformation path that connects different conformations. The free energy difference is the same no matter what path is taken, but not all paths are equally useful. Finding a path that allows an efficient computation of free energy is a crucial step.

Tremendous recent efforts to find physical paths of conformational changes have motivated us to take a stab at using nonphysical paths. This poster introduces a method we call 'deactivated morphing' and presents applications to two test systems: alanine dipeptide (AlaD) and deca-alanine (Ala10), both in explicit water. In this method, the internal interaction of a protein is completely turned off before a transformation is carried out along a nonphysical path.

The replica exchange/parallel tempering method and its variations offer the hope of improved sampling for many challenging problems in molecular simulation. Like all sampling methods, however, the effectiveness of replica exchange is highly dependent on the specific physical system being studied. We have carried out large scale temperature replica exchange (T-REMD) simulations of peptides and proteins in explicit solvent and encountered a number of issues that limit the effectiveness of the method in sampling biomolecular conformations. In particular, our results suggest the need to either replace or augment the temperature variable with an alternative extended variable, such as Hamiltonian scaling.

The task of molecular biology is to identify and define cellular states and functions in terms of molecular structures, dynamics, and chemical reactions. At macromolecular level, functional states and dynamics of individual proteins and enzymes are determined by Newton's Law and/or statistical thermodynamics. Computational approach thus follows molecular dynamic simulations and statistical thermodynamics. At cellular level, "the structures" of biochemical reaction systems, i.e., metabolic networks, genetic regulatory modules and signaling pathways, have been the central focus of current reserch. We introduce the chemical master equation (CME) as the theoretical foundation of their dynamics. Dynamic models based on the CME represent open chemical systems that follow nonequilibrium statistical thermodynamics - a key ingredient for living cell but not in usual macromolecular models. The CME supersedes the traditional deterministic models based on the law of mass action; it provides reaction kinetics and concentration (or copy number) fluctuations; and it allows a rigorous definition of "cellular state(s)" in terms of the concentrations and copy numbers of biomolecules in a biochemical reaction system in open chemical environment. We discuss this computational approach to cellular biochemistry, its relation to open-system thermodynamics. In particular we outline its similarities to and distinctions from computational macromolecular dynamics.

In many cases the native structures of proteins encode information about their folding pathways. The degree to which this is true may be related to the similarity of various structural solutions, i.e. multiple NMR structures or independently solved X-ray structures. We explore both the robustness of the native state and its impact upon putative folding pathways for these structures by examining simulated thermal denaturation pathways for ensembles of “native” structures. Previous rigidity analysis of proteins using the FIRST software[1] has demonstrated that such simulated unfolding results correlate well with experimental hydrogen-deuterium exchange data[2] and mutational results[3]. We introduce an unfolding rigidity profile to characterize unfolding pathways and indicate which protein residues are most likely to adopt native-like conformations. This rigidity profile is also used to discriminate conformational sub-states of the protein native state which are the results of different folding pathways.

[1] D. J. Jacobs, A.J. Rader, Leslie A. Kuhn, and M.F. Thorpe Protein Flexibility Prediction Using Graph Theory. Proteins, 44, 150-165, 2001.

[2] B.M. Hespenheide, A.J. Rader, M.F. Thorpe, and L.K. Kuhn J. Molec. Graph. & Model., 21, 195-207, 2002.

[3] A.J. Rader, G. Andersen, B. Isin, H.G. Khorana, I. Bahar and J. Klein-Seetharaman Identification of Core Amino Acids Stabilizing Rhodopsin. Proc. Natl. Acad. Sci., 101, 7246-7251, 2004.

We consider the protein folding problem as a global optimization problem on the free energy surface of the all-atom OPLS force field. Starting from amino acid sequences arranged in a linear chain configuration we use the Minima Hopping algorithm to find low energy configurations . The algorithm samples the potential energy surface following the Bell-Evans-Polanyi principle. In this way our moves are completely unbiased but have nevertheless a strong tendency to lead into other low energy configurations. Some small peptides like polyalanine were studied in vacuo. For an Ac(Ala)nLysH+ in vacuum we obtained a helical conformation while other (Ala)nH+ systems are found to be not helical.

Investigation of the Unfolding Pathway of Cyt2Aa2 Toxin
December 31, 1969

The expressed 29-kDa Cyt2Aa2 protoxin is produced from Bacillus thuringiensis subsp. darmstadiensis during sporulation. This toxin is proteolytically processed into the 25-kDa active form which is lethal to Dipteran (Stegomyia and Culex) larvae. It has been proposed that the mechanism of action required a conformational change during the interaction with lipid membrane. We have demonstrated previously that the toxin can adopt a stable intermediate state between the transitions from native to unfolded state. In this study, we aim to investigate the conformational state of each secondary structure elements of the toxin in intermediate state. The Φ values analysis was employed as a tool to reveal the conformation state on various amino acid positions of toxin. Non-disruptive mutant toxins were designed and constructed as conformational probes using site-directed mutagenesis. The effect of mutation at different position is characterized in terms of protein expression level, solubility, mosquito-larvicidal toxicity and hemolytic activity compared to those of wild type. The results showed that an intermediate state of this toxin was found in the aggregation/oligomerization form. Transitional free energy and activation energy of these toxins were obtained and derived for the Φ values. The data revealed helices A, B and C of the intermediate are quite deviated from the native state while helix D is still maintained similar to native state conformation. The relocation of these helices was proposed to be related to the conformational changes and contributes to the functional mechanism of this toxin.

Protein design opens new ways to probe the determinants of folding, to facilitate the study of proteins, and to arrive at novel molecules, materials and nanostructures. Recent theoretical methods for identifying the properties of amino acid sequences consistent with a desired structure and function will be discussed. Such methods address the structural complexity of proteins and their many possible amino acid sequences. Several computationally designed protein-based molecular systems will be presented that have been experimentally realized, including novel proteins tailored to accommodate nonbiological cofactors.

Useful insight into the folding behavior of proteins has been gained by studying the free energy landscapes of model peptides and very small proteins. These can be particularly useful in exploring the denatured state, which is difficult to characterize directly through experiments. Several key challenges remain in obtaining accurate and precise computational data for peptides in solution, including the accuracy of the biomolecular force field and solvent model along with difficulties in obtaining converged ensembles. In this talk these issues will be explored, including a comparison of the properties of several Amber parameter sets and the effects of simulation with different explicit and implicit water models. Methods that improve convergence will be discussed, such as modified replica exchange approaches that permit application to larger systems at reduced computational cost.

What is a Transition Path?
January 14, 2008

Calculating transition paths of conformation change is not amenable to computer solution unless the problem is defined precisely. Inspired by the work of others, we offer a precise definition of the problem without invoking unmotivated stochastic forces. A weakness of this--and other--approaches is the need for the user to identify an appropriate set of collective variables. In principal at least, the quality of the result can be checked a posteriori by calculating committor values from dynamics trajectories. The approach we advocate leads to a nonstandard minimum free energy path that is more reasonable physically.

Using the BlueGene computer at IBM Research we have performed extensive simulations on a mutant construct of lambda repressor. The protein consists of 80 amino acids stuctured as a five helix bundle in the folded state. The mutant was designed in the Gruebele lab at UIUC to exhibit sub-10 microsecond folding times in laser temperature jump experiments. Our simulations employed replica exchange and traditional molecular dynamics at a number of different temperatures. Although it is very difficult to thoroughly equilibrate and sample molecular systems of this size and complexity, our simulations clearly reveal very complex folding behavior. In particular, different structural elements exhibit different degrees of thermodynamic stability and melt at different temperatures. Some of the structural transitions appear to be relatively cooperative, whereas others are quite diffuse.

We focus on the problem of determining the structure of complexes formed by the association of two proteins by searching for the global minimum of a function approximating the free energy of the complex. Solving this problem requires the combination of a number of different optimization methods. First we explore the conformational space by systematic global search based on the Fast Fourier Transform (FFT) correlation approach that evaluates the energies of billions of docked conformations on a grid. We show that the method can be efficiently used with pairwise interactions potentials that substantially improve the docking results. A new 5D FFT algorithm is also discussed. The 1000 best energy conformations are clustered, and the 30 largest clusters are retained for refinement. The conformations are refined by a new medium-range optimization method that has been developed to locate the global minima within well defined regions of the conformational space. In each cluster, the energy of the complex is a very noisy funnel-like function on the space of rigid body motions, the Euclidean group SE(3). The Semi-Definite programming based Underestimation (SDU) method constructs a convex quadratic under-estimator to the energy funnel based on a sample of the local mimima of the energy function, and uses the quadratic under-estimator to guide future sampling. We show that the parameterization of SE(3) has a significant impact on the effectiveness of SDU and introduce a parameterization that dramatically reduces the number of very costly energy function evaluations. We also discuss the application of the combined method to recent targets in CAPRI (Critical Assessment of Protein Interactions), the first community-wide docking experiment.

Transmembrane beta-barrel (TMB) proteins are embedded in the outer membrane of Gram-negative bacteria, mitochondria, and chloroplasts. Despite their importance, very few nonhomologous TMB structures have been determined by X-ray diffraction because of the experimental difficulty encountered in crystallizing transmembrane proteins. We introduce the program partiFold to investigate the folding landscape of TMBs. By computing the Boltzmann partition function, partiFold estimates inter--strand residue interaction probabilities, predicts contacts and per-residue X-ray crystal structure B-values, and samples conformations from the Boltzmann low energy ensemble. This broad range of predictive capabilities is achieved using a single, parameterizable grammatical model to describe potential beta-barrel supersecondary structures, combined with a novel energy function of stacked amino acid pair statistical potentials. PartiFold outperforms existing programs for inter--strand residue contact prediction on TMB proteins, offering both higher average predictive accuracy as well as more consistent results. Moreover, the integration of these contact probabilities inside a stochastic contact map can be used to infer a more meaningful picture of the TMB folding landscape, which cannot be achieved with other methods. Partifold's predictions of B-values are competitive with recent methods specifically designed for this problem. Finally, we show that sampling TMBs from the Boltzmann ensemble matches the X-ray crystal structure better than single structure prediction methods. A webserver running partiFold is available at http://partiFold.csail.mit.edu/.

Joint work with: Charles W. O'Donnell, Srini Devadas, Peter Clote and Bonnie Berger.

References: [1] J. Waldispühl*, C.W. O'Donnel*, S. Devadas, P. Clote and B. Berger, Modeling Ensembles of Transmembrane beta-barrel Proteins, PROTEINS: Structure, Function and Bioinformatics, published online 14 Nov. 2007. doi:10.1002/prot.21788 (* authors equally contributed) [2]J. Waldispühl, B. Berger, P. Clote and J.-M. Steyaert, Predicting Transmembrane beta-barrels and Inter-strand Residue Interactions from Sequence, PROTEINS: Structure, Function and Bioinformatics, vol. 65, issue 1, p.61-74, 2006. doi:10.1002/prot.21046

We show that diffusion can play an important role in proteinfolding kinetics. We explicitly calculate the diffusion coefficient of protein folding in a lattice model. We found that diffusion typically is configuration- or reaction coordinate-dependent. The diffusion coefficient is found to be decreasing with respect to the progression of folding toward the native state, which is caused by the collapse to a compact state constraining the configurational space for exploration. The configuration- or position-dependent diffusion coefficient has a significant contribution to the kinetics in addition to the thermodynamic free-energy barrier. It effectively changes (increases in this case) the kinetic barrier height as well as the position of the corresponding transition state and therefore modifies the folding kinetic rates as well as the kinetic routes. The resulting folding time, by considering both kinetic diffusion and the thermodynamic folding free-energy profile, thus is slower than the estimation from the thermodynamic free-energy barrier with constant diffusion but is consistent with the results from kinetic simulations. The configuration- or coordinate-dependent diffusion is especially important with respect to fast folding, when there is a small or no free-energy barrier and kinetics is controlled by diffusion. Including the configurational dependence will challenge the transition state theory of protein folding. The classical transition state theory will have to be modified to be consistent. The more detailed folding mechanistic studies involving phi value analysis based on the classical transition state theory also will have to be modified quantitatively.

Mathematical Methods for Implicit Solvent Models
December 31, 1969

Rigorous, quantitative, and atomic scale description of complex biological systems is a grand challenge. Explicit description of biomolecules and their aqueous environment, including solvent, co-solutes, and mobile ions, is prohibitively expensive although a variety of methods, including Ewald summations, Euler summations, periodic images and reaction field theory, have been developed in the past few decades. Therefore, multiscale analysis is an attractive and sometimes indispensable approach. Implicit solvent models which treat the solvent as a macroscopic continuum while admitting a microscopic atomic description for the biomolecule, are efficient multiscale approaches to complex, large scale biological systems. We summarize recent advances in mathematical methods for the Poisson-Boltzmann (PB) equation based implicit solvent theory. These include the rigorous mathematical treatments of molecular surface interfaces, surface geometric singularities, charge singularities and associated force evaluation for molecular dynamics.

Small single-domain proteins often exhibit only a single free-energy barrier, or transition state, between the denatured and the native state. The folding kinetics of these proteins is usually explored via mutational analysis. A central question is which structural information on the transition state can be derived from the mutational data. To interpret these data, we have developed models that are based (a) on the substructural cooperativity of helices and hairpins, and (b) on splitting up mutation-induced stability changes of a protein into components for its substructures. We obtain a consistent structural interpretation of mutational Phi-values by fitting few parameters that describe the degrees of structure formation of helices and hairpins in the transition state. Our models explain how mutations at a given site can lead to different Phi-values, and capture non-classical Phi-values smaller than 0 or larger than 1, which have been difficult to interpret. Non-classical Phi-values simply arise, e.g., if mutations stabilize a helix or hairpin, but destabilize its tertiary interactions.

References: [1] C. Merlo, K. A. Dill, and T. R. Weikl, PNAS 102, 10171 (2005). [2] T. R. Weikl and K. A. Dill, J. Mol. Biol. 365, 1578 (2007). [3] T. R. Weikl, Biophys. J., in press (2008).

In order to understand protein folding, we need to understand both folded and unfolded state structure. One of the key systems for these studies is the 36 residue villin headpiece helical subdomain (HP36) because of its simple topology, small size and fast folding properties. Structures of HP36 have been determined using X-ray crystallography and NMR spectroscopy, with the resulting structures exhibiting clear structural differences. We complement the existing data by using molecular dynamics simulations and experimental double mutant cycles to show that the x-ray structure is the better representation in solution at neutral conditions. Denatured state studies using fragment analysis coupled with relatively low resolution spectroscopic techniques show a small tendency to form locally stabilized structure. Using standard Replica Exchange Molecular Dynamics, our simulations show that the first helix contains the most native-like helical structure of all three helices. Overall, our analysis shows how theoretical and experimental collaborative efforts can help aid in the understanding of the dynamic nature of the folding pathway.

We examine the temperature dependence of the folding time in the Trp-cage mini-protein, using all-atom replica exchange molecular dynamics (REMD) simulations. This is done by using an "equation free" approach. The central idea is to use REMD simulations to generate appropriately initialized bursts of atomistic simulation trajectories to obtain the drift and diffusion coefficients of coarse variables of interest to capture quantitatively the coupling of fast with slow moving degrees of freedom. Then we employ a stochastic (Langevin) dynamic approach to follow the evolution of coarse variables and compute a distribution of folding times that is consistent with the drift and diffusion coefficients obtained from all-atom simulations. We use physically motivated order parameters as coarse variables. We describe the distribution of folding times as a function of temperature for the Trp-cage mini-protein.

A Novel Method for Protein Folding Shape Description
December 31, 1969

A novel method has been developed to describe the protein folding shape structures. A set of 27 vectors is generated from an enclosed geometric space to describe the protein backbone folding shapes. This algorithm has mathematically reserved all possible folding shapes in space, and it is capable of making the complete assignment of folding shapes along the protein backbone without any gap. All possible types of folding can be uniquely described, including the regular protein secondary structures, irregular turn and loop structures and even rare possible folding structures. This method offers a simple one-dimensional description for the complicated three-dimensional folding structures which is able to align with the protein sequence for structural comparison. The results are compared with the protein data bank (PDB) and the structural assignments of other methods. This method has the ability to reveal the protein structural similarity and dissimilarity with the accurate and consistent meaning.

 Connect With Us: Go
 © 2014 Regents of the University of Minnesota. All rights reserved. The University of Minnesota is an equal opportunity educator and employer Last modified on October 06, 2011 Twin Cities Campus:   Parking & Transportation   Maps & Directions Directories   Contact U of M   Privacy