January 14 - 18, 2008
Protein motions, ranging from molecular flexibility to large-scale
conformational change, play an essential role in many biochemical
processes. For example, some devastating diseases such as Alzheimer's
and bovine spongiform encephalopathy (Mad Cow) are associated with
the misfolding of proteins. Despite the explosion in our knowledge of
structural and functional data, our understanding of protein movement
is still very limited because it is difficult to measure experimentally
and computationally expensive to simulate.
In this talk we describe a method we have developed for modeling protein
motions that is based on probabilistic roadmap methods (PRM) for motion
planning. Our technique yields an approximate map of a protein's potential
energy landscape and can be used to generate transitional motions of a
protein to the native state from unstructured conformations or between
specified conformations. We also describe new analysis tools that enable
us to extract kinetics information, such as folding rates or to identify
and study the folding core. For example, we show how our map-based tools
for modeling and analyzing folding landscapes can capture subtle folding
differences between protein G and its mutants, NuG1 and NuG2. More
information regarding our work, including an archive of protein motions
generated with our technique, are available from our protein folding server:
http://parasol.tamu.edu/foldingserver/
Chemical reaction systems with a low to moderate number of molecules are typically modeled as continuous time Markov chains. More explicitly, the state of the system is modeled as a vector giving the number of molecules of each species present with each reaction modeled as a possible transition for the state. The model for the kth reaction is determined by a vector of inputs specifying the number of molecules of each chemical species that are consumed in the reaction, a vector of outputs specifying the number of molecules of each species that are created in the reaction and a function of the state that gives the rate at which the reaction occurs. To understand how the probability distribution of the system changes in time one could attempt to solve the Chemical Master Equation (CME), however this is typically an extremely difficult task. Therefore, simulation methods such as the Stochastic Simulation Algorithm (Gillespie Algorithm) and tau-leaping have been developed so as to approximate the probability distribution of the system via Monte Carlo methods. I will demonstrate how using a random time change representation for these models leads naturally to simulation methods that achieve greater efficiency and stability than existing methods.
Due to the inherent complexity of the associated problems,
investigations of the basic principles of protein folding and evolution
are usually restricted to simplified protein models.
Our group has developed methods and programs for exact and complete solving of
problems typical for studies using HP-like 3D lattice protein models.
Addressed tasks are the prediction of globally optimal and listing of
suboptimal structures, sequence design, neutral network exploration, and
degeneracy computation. The used methods are based on fast and
non-heuristic techniques (constraint programming) instead of following
stochastic approaches, which are not capable of answering many fundamental
questions. Thus, we are able to find optimal structure for HP-sequences of length
greater than 200, including a proof of optimality. We have used these methods to
find unique folding sequences, to investigate neutral nets and to design low-degenerated
sequences for given structures.
Two groups of studies recently proved to provide insights into such intrinsic, structure-induced effects: elastic network models that permit us to visualize the cooperative changes in conformation that are most readily accessible near native state conditions, and information-theoretic approaches that elucidate the most efficient pathways of signal transmission favored by the overall architecture. Using a combination of these two approaches, we highlight, by way of application to the bacterial chaperonin complex GroEL-GroES, how the most cooperative modes of motion play a role in mediating the propagation of allosteric signals. A functional coupling between the global dynamics sampled under equilibrium conditions and the signal transduction pathways inherently favored by network topology appears to control allosteric effects.
Continuum electrostatics methods have become increasingly popular due
to their ability to provide approximate descriptions of solvation
energies and forces without expensive sampling required by explicit
solvent models. In particular, the Poisson-Boltzmann equation (PBE)
provides electrostatic potentials, solvation energies, and forces by
modeling the solvent as a featureless, dielectric material, and the
mobile ions as a continuous distribution of charge. In this talk, I
will provide a review of PBE-based and new apolar continuum solvation
methods as well as approaches for assessing their performance by
comparison with explicit solvent simulations. In particular, I will
focus on the ability of these continuum solvent models to describe
solvation forces on proteins and nucleic acids and will comment on
strengths and weaknesses of these implicit solvent approaches.
Many small single-domain proteins undergo cooperative, switch-like
folding/unfolding transitions with very low populations of intermediate,
i.e., partially folded, conformations. The phenomenon of cooperative folding
is not readily accounted for by common notions about driving forces for
folding. I will discuss how common protein chain models with pairwise
additive interactions are insufficient to account for the folding
cooperativity of natural proteins, and how models with nonadditive
local-nonlocal coupling may rationalize cooperative folding rates that
are well correlated with native topology. The traditional formulation
of folding transition states entails a macroscopic folding free energy
barrier with both enthalpic and entropic components. I will explore the
microscopic origins of these thermodynamic signatures in terms of
conformational entropy as well as desolvation (dewetting) effects.
Notably, the existence of significant enthalpic folding barriers
raises fundamental questions about the validity of the funnel picture of
protein folding, because such enthalpic barriers appear to imply that
there are substantial uphill moves along a microscopic folding trajectory.
Using results from extensive atomic simulations, I will show how the
paradox can be resolved by a dramatic entropy-enthalpy compensation
at the rate-limiting step of folding. In this perspective, the height
of the enthalpic barrier is seen as related to the degree of cooperativity
of the folding process.
Multivalent ions (Mg2+) in RNA tertiary structure folding can be strongly correlated and thus cannot be treated by mean-field theories such as the Poisson-Boltzmann equation. We recently developed a statistical mechanical model (TBI) to account for ion correlation by considering ensemble of discrete ion distributions. Experimental tests show that the TBI model gives improved predictions for nucleic folding folding stability over the Poisson-Boltzmann equation, which generally underestimates the (multivalent) ion-dependent folding stability due to ignoring the ion correlation. Using the TBI theory, we investigate the folding energy landscape for a simple system with loop-tethered short DNA helices and find that Na+ and Mg2+ play contrasting roles in helix–helix assembly. High [Na+] (>0.3 M) causes a reduced helix–helix electrostatic repulsion and a subsequent disordered packing of helices, while Mg2+ of concentration > 1 mM is predicted to induce a more compact and ordered helix–helix packing. Mg2+ is much more efficient in causing nucleic acid compaction and is predicted to induce a collapse transition around 1mM of [Mg2+].
Lead generation is a major hurdle in small-molecule drug discovery, with an estimated 60% of projects failing from lack of lead matter or difficulty in optimizing leads for drug-like properties. It would be valuable to identify these less-druggable targets before incurring substantial expenditure and effort. We discovered that a model-based approach using basic biophysical principles yields good prediction of druggability based solely on the crystal structure of the target binding site. We quantitatively estimate the maximal affinity achievable by a drug-like molecule, and we show that these calculated values correlate with drug discovery outcomes. We experimentally test two predictions using high-throughput screening of a diverse compound collection. The collective results highlight the utility of our approach as well as strategies for tacking difficult targets. I will also discuss our approach to calculating protein curvature and some potential computational approaches for difficult targets.
This talk (and a related poster) describes Lie-group-theoretic techniques that can be applied in the analysis and modeling of protein conformations. Three topics are covered: (1) Conformational transitions between two known end states; (2) proper normalization of helix-helix crossing angle data in the PDB; (3) models of the conformational entropy of the ensemble of unfolded polypeptide conformations. Using the concept of convolution on the group of rigid-body motions, the probability density of position and orientation of the distal end of a polypeptide chain is obtained by convolving the distributions for shorter segments that make up the chain. This methodology can also be used in the analysis of loop entropy in folded proteins as well as the ensemble of unfolded conformations.
The geometrical problem of protein folding, especially in its later stages, is composed of two types of freedom, the full torsional flexibility of loops connecting nearly rigid structural pieces (helices, beta-sheets etc), and the relative placing of such pieces. We present a method for sampling the feasible conformations of protein loops, based on Triaxial Loop Closure (TLC), a simple and highly efficient inverse kinematic (IK) method for solving the loop closure problem. TLC is easily extended to incorporate additional (i.e. position, orientation) constraints, or more general geometrical conditions. Due to its relative simplicity TLC compares favorably to more general IK robotics algorithms, both in robustness and in speed. We consider two applications: (i) An algorithm for the rapid sampling of the conformations of protein loops including three or more residues which uses quasirandom Sobol sampling of the Ramachandran regions. Ideas akin to Delauney triangulation may be employed to ensure sampling loop shapespace at a desired density. (ii) An efficient method for the sequential assembly of helical proteins via
maximal hydrophobic packing. The geometrical problem of considering all
possible mutual arrangements of a system of helices that are compatible with closing the corresponding loops is already too large to sample directly. We introduced a measure of hydrophobic packing by seeking to minimize the radius of gyration of the hydrophobic residues. Thus, we sequentially assemble the helices, by sampling relative orientations of pairs of them that bring specified hydrophobic residues in proximity. For the best candidates, in terms of energy and hydrophobig radius of gyration, the loops are closed using the algorithm in (i) and another helix is added to the assembly, always seeking to maximizing hydrophobic contact. We tested this iterative assembly method on 26 helical proteins each containing up to 5 helices. The method heavily samples native-like conformations. The average RMSD-to-native of the best conformations for the 18 helix bundle proteins that have 2 or 3 helices is less than 2 Angstroms with slightly worse errors for proteins containing more helices.
Sequence-structure relationships in proteins are highly asymmetric since
many sequences fold into relatively few structures. What is the number of
sequences that fold into a particular protein structure? Is it possible to
switch between stable protein folds by point mutations? To address these
questions we compute a directed graph of sequences and structures of
proteins, which is based on experimentally determined protein shapes. Two
thousand and sixty experimental structures from the Protein Data Bank were
considered, providing a good coverage of fold families. The graph is
computed using an energy function that measures stability of a sequence in a
fold. A node in the graph is an experimental structure (and the
computationally matching sequences). A directed and weighted edge between
nodes A and B is the number of sequences of A that switch to B because the
energy of B is lower. The directed graph is highly connected at native
energies with ³sinks² that attract many sequences from other folds. The
sinks are rich in beta sheets. The in-degrees of a particular protein shape
correlates with the number of sequences that matches this shape in
empirically determined genomes. Properties of strongly connected components
of the graph are correlated with protein length and secondary structure.
Joint work with Leonid Meyerguz and Jon Kleinberg
Joint work with S. R. McAllister.
The protein folding question has developed over the past four decades as one of the most challenging and potentially rewarding problems in computational biology. Three general classes of algorithms have emerged, based on the techniques of comparative
modeling, fold recognition, and first principles methods. For a detailed summary of protein structure prediction methods, the reader
is directed to two recent reviews [1,2]. Within the field of protein structure prediction, the packing of alpha-helices has been one of the more difficult problems. The use of distance constraints and topology predictions can be highly useful for reducing the conformational space that must be searched by deterministic algorithms to find a protein structure of minimum conformational energy.
We present a novel first principles framework to predict the structure of alpha-helical proteins. Given the location of the alpha-helical regions, a mixed-integer linear optimization model maximizes the interhelical residue contact probabilities to generate distance restraints
between alpha-helices [3]. Two levels of this formulation allow the prediction of both ``primary'' contacts between a helical pair
as well as the prediction of ``wheel'' contacts, one helical turn beyond the primary contacts. These predictions are subject to a
number of mathematical constraints to disallow sets of contacts that cannot be achieved by a folded protein. The interhelical contact prediction for alpha-helical proteins was evaluated on 26 proteins, where it identified an average contact distance below 11.0 Angstroms for the entire set.
A related optimization-based approach is proposed for the prediction of alpha-helical contacts in mixed alpha/beta proteins [4]. This contact prediction is based on the maximization of the number and hydrophobicity of hydrophobic interactions. The allowable sets of contacts is restricted based on knowledge or prediction of the beta-sheet topology and a number of distance geometry rules and constraints. The interhelical contact prediction for alpha/beta proteins was evaluated on 12 test proteins, where it identified an average contact distance below 11.0 Angstroms for 11 of these proteins.
The distance restraints from the interhelical contacts are then used to restrict the feasible space of the protein during the prediction of the tertiary structure using a hybrid optimization algorithm [5,6]. This tertiary structure prediction approach combines torsion angle dynamics and rotamer optimization with a deterministic global optimization technique (alphaBB) and a stochastic optimization technique
(conformational space annealing) to minimize a detailed atomistic-level energy function. The tertiary structure prediction results are promising and are highlighted by the exciting, near-native blind prediction of a de novo designed 4-helix bundle protein.
[1] Floudas CA, Fung HK, McAllister SR, Monningmann M, and Rajgaria R. Advances in Protein Structure Prediction and De Novo Protein Design: A Review. Chem Eng Sci. 2006;61: 966-988.
[2] Floudas CA. Computational Methods in Protein Structure Prediction. Biotechnol Bioeng. 2007;97:207-213.
[3] McAllister SR, Mickus BE, Klepeis JL, and Floudas CA. A Novel Approach for Alpha-Helical Topology Prediction in Globular
Proteins: Generation of Interhelical Restraints. Prot Struct Funct Bioinf. 2006;65:930-952.
[4] McAllister SR and Floudas CA. Alpha-helical Residue Contact Prediction in Mixed Alpha/Beta Proteins Using Mixed-Integer Linear Programming. In preparation, 2007.
[5] Klepeis JL and Floudas CA. ASTRO-FOLD: A Combinatorial and Global Optimization Framework for Ab Initio Prediction of
Three-dimensional Structures of Proteins from the Amino Acid Sequence. Biophys J, 2003;85:2119-2146.
[6] McAllister SR and Floudas CA. An Improved Hybrid Global Optimization Method for Protein Tertiary Structure Prediction. In preparation, 2007.
Coarse master equations and diffusion models provide powerful tools to study the equilibrium and non-equilibrium properties of molecular systems. Maximum likelihood and Bayesian approaches have been used successfully to construct such models from the observed dynamics projected onto discrete and continuous low-dimensional sub-spaces. By using a Green's-function based formalism, issues arising from fast non-Markovian dynamics can be circumvented. The general formalism for the construction of coarse master equations and diffusion models will be illustrated with applications to peptide and protein folding.
While equilibrium free energy differences can be obtained from simulations of non-equilibrium systems,
such estimates are often hampered by convergence difficulties similar to those that plague traditional
perturbation methods. In the nonequilibrium context, dissipation is the culprit behind poor convergence.
I will discuss a general strategy for addressing these difficulties, in which dissipation is reduced
by adding non-physical terms to the equations of motion. When these terms are chosen appropriately,
the efficiency of the free energy estimate can improve dramatically. After sketching the general features
of this strategy, I will illustrate its application using specific examples, and will discuss its relation
to other strategies that are similar in spirit.
Are protein motions limited because of a higher level of cooperativity than indicated by usual potentials?
Recently derived four-body coarse-grained potentials show improved performance in threading over pairwise potentials. Their apparently increased extent of cooperativity is consistent with the high level of control of motions manifested in the elastic network model computations. The elastic network models are providing strong evidence that proteins control their functional motions through their most important slowest domain motions. A major strength of these models appears to be their ability to represent the structures as highly cohesive rubbery materials, and much evidence supporting them has now accumulated. Such models exhibit strong control over their motions, arising principally from the shape, sometimes even including control of the motions of surface loops by domain motions and the motion of reactive atoms at enzyme active sites. These highly coordinated atom motions may be relevant to enzyme mechanisms. There is accumulating evidence that the behavior of protein machines can be understood with these models, and the important large domain motions can be obtained readily. For the ribosome, the results clearly indicate that its motions relate strongly to many aspects of its function. Already we have seen that the large ribosomal ratchet motion simultaneously causes the t-RNAs and mRNA to translate in the processing direction. The control of the mRNA at the anti-codon binding site is extremely strong, to ensure fidelity of copying, with the mRNA being moved translationally as a fully rigid body, with no internal motions.
I will present and discuss a number of computational experiments associated
with the coarse-graining of atomistic/agent-based simulations. In particular, I
will discuss coarse reverse integration, as well as the use of diffusion maps
(a manifold-learning technique) to suggest the selection of certain coarse-grained observables
("reduction coordinates") for the atomistic simulations. The illustrations will come
from molecular dynamics, Monte Carlo as well as agent-based models.
Difference topology is a methodology to derive the number of
DNA crossings trapped in an unknown protein complex. By this
method, Pathania, Jayaram, and Harshey revealed the
topological structure within the Mu protein complex which
consisted of three DNA segments containing five nodes [1]. In
their experiments, they used a member of the site-specific
recombinases which is known as Cre. Cre mediates DNA exchange
by rearranging target sites of the DNA segments. During this
DNA recombination, there are no extra DNA crossings
introduced. The initial DNA conformation is unknotted. After
Cre recombination, the products are knots or catenanes.
Recently, Darcy, Luecke, and Vazquez analyzed these
experimental results and proved that the five-noded
conformation is the only biologically reasonable structure of
the Mu protein DNA complex [2]. We address the possibility of
protein complexes that binds four DNA segments. By the useful
property of Cre, we can make the assumption that after Cre
recombination, the topology of a DNA-protein complex would be
a knot or catenane. The latest results of the topological
tangle model for this case and very basic biological and
mathematical backgrounds will be discussed.
Reference:
[1] S. Pathania, M. Jayaram, and R. Harshey, Path of DNA
within the Mu transpososome: Transposase interaction bridging
two Mu ends and the enhancer trap five DNA supercoils, Cell
109 (2002), 425-436.
[2] I. K. Darcy, J. Luecke, and M. Vazquez, A tangle analysis
of the Mu transpososome protein complex which binds three DNA
segments, Preprint.
Efficient exploration of the configuration space of a protein is essential for its structure prediction. In this talk we will consider two recent Monte Carlo developments for such a task: (i) equi-energy (EE) sampler and (ii) fragment regrowth via energy-guided sequential sampling (FRESS). The EE sampler provides accurate estimation of the density of states of the energy landscape, which then allows detailed study of the thermodynamics of lattice protein folding. The FRESS algorithm provides an efficient means to sequentially simulate a protein structure. As an illustration we will consider 2D and 3D HP models. For the benchmark sequences, we not only found new lower energies for all the 3D sequences longer than 80 residues with little computing effort, but also were able to accurately estimate the density of states that characterizes the global energy landscape.
I discuss two ongoing research programs concerning stochastic
microphysical models in biology. First, in joint work with Grigorios
Pavliotis (Imperial College) and Juan Latorre, we apply the methods of
homogenization theory to compute transport properties for mathematical
models (Brownian motors) of molecular motors in cells. The objective
is to ascertain how the design properties of the motor affect its
function. Secondly, in joint work with Shekhar Garde and Adnan Khan,
we explore a simple stochastic model for the behavior of water
molecules near a solute surface which has the potential for improving
substantially upon Brownian dynamics models more conventionally used
in engineering applications. We use exactly solvable mathematical
models as a testbed for addressing some basic data-driven
parameterization issues.
We report on replica-exchange simulations of the folding of a 21-residue alpha-helical peptide in explicit solvent. Using eight replicas over a 280-450 K temperature range, we were able to simulate both the folding equilibrium and the helix formation process. While the melting temperature was exaggerated by about 50 K, the folding enthalpy and entropy were in good agreement with experimental data. Simulations of the helix formation process showed a sequential mechanism, with helical structures formed within ca. 3 ns of simulation, and confirmed that the alpha-helical state was a global free energy minimum of the peptide at low temperatures. We also present long-term conventional MD simulations of several simple peptide model systems for which we compare simulation results to experimental data, in order to test current peptide and water models.
RNA tertiary motifs play an important role in RNA folding. To understand the complex organization of RNA tertiary interactions, we compiled a dataset containing 54 high-resolution RNA crystal structures. Seven RNA tertiary motifs (coaxial helix, A-minor, ribose zipper, pseudoknot, kissing hairpin, tRNA D-loop:T-loop and tetraloop-tetraloop receptor) were searched by different computer programs. For the non-redundant RNA dataset, 601 RNA tertiary interactions were found. Most of these 3D interactions occur in the 16S and 23S rRNAs. Exhaustive search of these motifs revealed diversity of A-minor interactions, and other loop-loop receptor interactions similar to the tetraloop-tetraloop receptor. Correlation between motifs, such as pseudoknot or coaxial helix with A-minor, shows that they can form composite motifs. These findings may lead to tertiary structure constraints for RNA 3D prediction.
Can genomics provide a new level of strategic intelligence about rapidly evolving pathogens? We have developed a new approach to measure the rates of all possible evolutionary pathways in a genome, using conditional Ka/Ks to estimate their “evolutionary velocity”, and have applied this to several datasets, including clinical sequencing of 50,000 HIV-1 samples. Conditional Ka/Ks predicts the preferred order and relative rates of competing evolutionary pathways. We recently tested this approach using independent data generously provided by Shafer and coworkers (Stanford HIV Database), in which multiple samples collected at different times from each patient make it possible to track which mutations occurred first during this time-course. Out of 35 such mutation pairs in protease and RT, conditional Ka/Ks correctly predicted the experimentally observed order in 28 cases (p=0.00025). Conditional Ka/Ks data reveal specific accessory mutations that greatly accelerate the evolution of multi-drug resistance. Our analysis was highly reproducible in four independent datasets, and can decipher a pathogen’s evolutionary pathways to multi-drug resistance even while such mutants are still rare. Analysis of samples from untreated patients shows that these rapid evolutionary pathways are specifically associated with drug treatment, and vanish in its absence.
Replica exchange (RE) is a generalized ensemble simulation method for accelerating the exploration of free-energy landscapes which define many challenging problems in computational biophysics, including protein folding and binding. Although replica exchange is a parallel simulation technique whose implementation is relatively straightforward, kinetics and the approach to equilibrium in the replica exchange ensemble are complicated; there is much to learn about how to best employ RE to protein folding and binding problems. Protein folding rates often slow down as the temperature is raised above a critical value and this “anti-Arrhenius” behavior represents a challenge for RE. However, it is far from straightforward to systematically explore the impact of this on RE by brute force molecular simulations, since RE simulations of protein folding are very difficult to converge. In studies over the past two years using both atomistic and simplified models, we have clarified some of the obstacles to obtaining converged thermodynamic information from RE simulations. In my talk I will describe some simple continuous and discrete models we have constructed to explore the behavior of replica exchange sampling under a variety of conditions.
References
Andrec, M., A.K. Felts, E. Gallicchio, and R.M. Levy. Protein folding pathways from replica exchange simulations and a kinetic network model. Proceedings Natl. Acad. Sci. USA, 102, 6801-6806 (2005).
K.P. Ravindranathan, E. Gallicchio, R.A. Friesner, A.E. McDermott, and R.M. Levy. Conformational equilibrium of a cytochrome P450 – substrate complex, a replica exchange MD study. J. Am. Chem. Soc., 128, 5786-5791 (2006)
Zheng, W., M. Andrec, E. Gallicchio, and R.M. Levy. Simulating replica exchange simulations of protein folding with a kinetic network model. Proceedings Natl. Acad. Sci. USA, 104, 15340-15345 (2007).
Zheng, W., M. Andrec, E. Gallicchio, and R.M. Levy. Simple continuous and discrete models for simulating replica exchange simulations of protein folding. J. Phys. Chem., in press.
For a protein molecule in vacuo, the net overall rotation due to flexibility is expressed in internal coordinates by using Eckart's decomposition of the total rotational angular momentum. Regardless of whether the total (rotational) angular momentum vanishes, the condition for zero overall rotation is zero orbital angular momentum. Previous approaches toward the elimination of overall rotation included (i) using normal modes, (ii) minimizing the root-mean-squared deviation (RMSD through finite rotations with respect to an initial configuration, and (iii) setting the total angular momentum to zero. These three approaches neglected the contribution of nonzero internal angular momentum. While this approach [1, 2] is motivated by results in geometric mechanics [3], the results agree with an experimental observation of a rotation of 20 degrees in triatomic photodissociation [4] and a computational observation of an overall rotation of 42 degrees in the dynamics of protein
molecules [5].
References:
[1] F. J. Lin, Hamiltonian dynamics of atom-diatomic molecule complexes and collisions, Discrete and Continuous Dynamical Systems, Supplement 2007, 655 – 666 (2007).
[2] F. J. Lin, Separation of overall rotation and internal motion in the N-body dynamics of protein molecules, 2007a.
[3] J. E. Marsden, R. Montgomery, and T. Ratiu, Reduction, symmetry, and phases in mechanics, Memoirs of the American Mathematical Society, Vol. 88, No. 436, American Mathematical Society, Providence, RI, 1990.
[4] A. V. Demyanenko, V. Dribinski, H. Reisler, H. Meyer, and C. X. W. Qian, Quantum-product state-dependent anisotropies in photoinitiated unimolecular decomposition, Journal of Chemical Physics 111, 7383 – 7396 (1999).
[5] Y. Zhou, M. Cook and M. Karplus, Protein motions at zero-total angular momentum: The importance of long-range correlations, Biophysical Journal 79, 2902 – 2908 (2000).
Recent progress in obtaining docked protein complexes will be discussed.
The combination of exhaustive search, clustering and localized global
optimization can reliably find energy minima to highly nonconvex biomolecular
energy functions. Using an energy function that adds desolvation and
screened electrostatics to classical molecular mechanics potentials, the
global minimum is found very near to the observed native state. This is
demonstrated across a large number of benchmark examples.
Protein molecules are usually studied through mathematical models which consider the physicochemical forces among the atoms forming the molecule. Recently, however, the interest on the geometric properties of protein conformations has been growing, and models mainly based on such properties have been developed. In this work, a global optimization problem is formulated for simulating protein conformations, that is based on the so-called "tube model", in which a protein is modeled as a thickened tube [1,2]. The optimization problem is solved by the meta-heuristic method Monkey Search [3], which is inspired by the behavior of a monkey climbing trees in its search for food. Computational experiences proved that conformations having the typical geometric properties of proteins can be generated and that some of them can be close to conformations that proteins in Nature actually have.
[1] J.R. Banavar, A. Maritan, C. Micheletti and A. Trovato, "Geometry and Physics of Proteins", Proteins: Structure, Function, and Genetics 47 (3): 315–322, 2002
[2] G. Ceci, A. Mucherino, M. D’Apuzzo, D. di Serafino, S. Costantini, A. Facchiano, G. Colonna, "Computational Methods for Protein Fold Prediction: an Ab-Initio Topological Approach", Data Mining in Biomedicine, Springer Optimization and Its Applications, Panos Pardalos et al (Eds.), vol.7, 2007
[3] A. Mucherino and O. Seref, "Monkey Search: A Novel Meta-Heuristic Search for Global Optimization", AIP Conference Proceedings 953, Data Mining, System Analysis and Optimization in Biomedicine, 162–173, 2007
Globally the energy landscape of a folding protein resembles a partially rough funnel. The local roughness of the funnel reflects transient trapping of the protein configurations in local free energy minima. The overall funnel shape of the landscape, superimposed on this roughness, arises because the interactions present in the native structure of natural proteins conflict with each other much less than expected if there were no constraints of evolutionary design to achieve reliable and relatively fast folding (minimal energetic frustration).
A consequence of minimizing energetic frustration is that the topology of the native fold also plays a major role in the folding mechanism. Some folding motifs are easier to design than others suggesting the possibility that evolution not only selected sequences with sufficiently small energetic frustration but also selected more easily designable native structures. The overall structure of the on-route and off-route (traps) intermediates for the folding of more complex proteins is also strongly influenced by topology. In this context, folding of larger and more complex proteins will be discussed.
Many cellular functions rely on interactions among proteins and between proteins and nucleic acids. The limited success of binding predictions may suggest that the physical and chemical principles of protein binding have to be revisited to correctly capture the essence of protein recognition. Going beyond folding, the power of reduced models to study the physics of protein assembly will be discussed. Since energetic frustration is sufficiently small, native topology-based models, which correspond to perfectly unfrustrated energy landscapes, have shown that binding mechanisms are robust and governed primarily by the protein’s native topology. These models impressively capture many of the binding characteristics found in experiments and highlight the fundamental role of flexibility in binding. Deciphering and quantifying the key ingredients for biological self-assembly is invaluable to reading out genomic sequences and understanding cellular interaction networks. Going even beyond binding, we will be discussing the energy landscape for the molecular motor kinesin.
Biological networks that arise in signal transduction, metabolism, gene
control or other cellular functions frequently involve many steps and
many levels of control, but are remarkably reliable in producing the
desired output in response to inputs. In this talk we will discuss
general characteristics such as sensitivity and adaptation of networks,
give several examples that illustrate how robustness is achieved, and
discuss the mathematical techniques that can contribute to understanding
complex networks.
Protein folding by ZAM & FRODA
Protein folding problem’ stemming from Levinthal’s paradox (i.e. how proteins fold fast even though they have vast conformational space) has been answered by the zipping and assembly mechanism (ZA). According to ZA mechanism, an unfolded chain first explores locally favorable structures at multiple independent positions along the chain. Then, these local structures engage neighboring amino acids in the chain sequence to form additional contacts, growing individual local structures by zipping or assembling. Using ZA principle, an all-atom structure prediction method has been developed, called zipping and assembly method (ZAM). ZAM has successfully predicted protein structures of small single domain proteins. It uses replica exchange molecular dynamics for conformational sampling which creates a bottleneck in the assembly stage.
We modify ZAM assembly stage by introducing FRODA which is a Monte Carlo based geometric simulation. Since FRODA can explore the large-amplitude motions of larger systems so much faster than molecular dynamics, we can speed up the assembly stage and generate the complete enumeration of all topologies quickly. The results show that the native structure of proteins can be sampled during the FRODA-assembly stage within a RMSD
"What is the free energy difference between two different
conformations of a protein?" This simple question is apparently not
so simple to answer. Despite the significant advances in free energy
computations, there has been relatively little success in computing
conformational free energies. To compute conformational free energy
differences, we need a transformation path that connects different
conformations. The free energy difference is the same no matter what
path is taken, but not all paths are equally useful. Finding a path
that allows an efficient computation of free energy is a crucial step.
Tremendous recent efforts to find physical paths of conformational
changes have motivated us to take a stab at using nonphysical paths.
This poster introduces a method we call 'deactivated morphing' and
presents applications to two test systems: alanine dipeptide (AlaD)
and deca-alanine (Ala10), both in explicit water. In this method, the
internal interaction of a protein is completely turned off before a
transformation is carried out along a nonphysical path.
The replica exchange/parallel tempering method and its variations offer the
hope of
improved sampling for many challenging problems in molecular simulation.
Like all
sampling methods, however, the effectiveness of replica exchange is highly
dependent
on the specific physical system being studied. We have carried out large
scale
temperature replica exchange (T-REMD) simulations of peptides and proteins
in explicit solvent
and encountered a number of issues that limit the effectiveness of the
method in sampling
biomolecular conformations. In particular, our results suggest the need
to either
replace or augment the temperature variable with an alternative extended
variable,
such as Hamiltonian scaling.
The task of molecular biology is to identify and define cellular
states and functions in terms of molecular structures, dynamics, and chemical reactions. At macromolecular level, functional states and dynamics of individual proteins and enzymes are determined by Newton's Law and/or statistical thermodynamics. Computational approach thus follows molecular dynamic simulations and statistical thermodynamics. At cellular level, "the structures" of biochemical reaction systems, i.e., metabolic
networks, genetic regulatory modules and signaling pathways, have been the central focus of current reserch. We introduce the chemical master equation (CME) as the theoretical foundation of their dynamics. Dynamic models based on the CME represent
open chemical systems that follow nonequilibrium statistical
thermodynamics - a key ingredient for living cell but not in
usual macromolecular models. The CME supersedes the traditional deterministic models based on the law of mass action; it provides reaction kinetics and concentration (or copy number) fluctuations; and it allows a rigorous definition of "cellular state(s)" in terms of the concentrations and copy numbers of biomolecules in a biochemical reaction system in open chemical environment. We discuss this computational approach to cellular biochemistry, its relation to open-system thermodynamics. In particular we outline its similarities to and distinctions from computational macromolecular dynamics.
In many cases the native structures of proteins encode information about their folding pathways. The degree to which this is true may be related to the similarity of various structural solutions, i.e. multiple NMR structures or independently solved X-ray structures. We explore both the robustness of the native state and its impact upon putative folding pathways for these structures by examining simulated thermal denaturation pathways for ensembles of “native” structures. Previous rigidity analysis of proteins using the FIRST software[1] has demonstrated that such simulated unfolding results correlate well with experimental hydrogen-deuterium exchange data[2] and mutational results[3]. We introduce an unfolding rigidity profile to characterize unfolding pathways and indicate which protein residues are most likely to adopt native-like conformations. This rigidity profile is also used to discriminate conformational sub-states of the protein native state which are the results of different folding pathways.
[1] D. J. Jacobs, A.J. Rader, Leslie A. Kuhn, and M.F. Thorpe Protein Flexibility Prediction Using Graph Theory. Proteins, 44, 150-165, 2001.
[2] B.M. Hespenheide, A.J. Rader, M.F. Thorpe, and L.K. Kuhn J. Molec. Graph. & Model., 21, 195-207, 2002.
[3] A.J. Rader, G. Andersen, B. Isin, H.G. Khorana, I. Bahar and J. Klein-Seetharaman Identification of Core Amino Acids Stabilizing Rhodopsin. Proc. Natl. Acad. Sci., 101, 7246-7251, 2004.
We consider the protein folding problem as a global optimization problem
on the free energy surface of the all-atom OPLS force field. Starting from
amino acid sequences arranged in a linear chain configuration we use the Minima
Hopping algorithm to find low energy configurations . The algorithm samples
the potential energy surface following the Bell-Evans-Polanyi principle.
In this way our moves are completely unbiased but have nevertheless a strong
tendency to lead into other low energy configurations. Some small peptides like
polyalanine were studied in vacuo. For an Ac(Ala)nLysH+ in vacuum
we obtained a helical conformation while other (Ala)nH+ systems are found to
be not helical.
The expressed 29-kDa Cyt2Aa2 protoxin is produced
from Bacillus thuringiensis subsp.
darmstadiensis during sporulation. This
toxin is proteolytically
processed into the 25-kDa active form which is lethal to
Dipteran (Stegomyia and
Culex) larvae. It has been proposed that the mechanism of
action required a
conformational change during the interaction with lipid
membrane. We have
demonstrated previously that the toxin can adopt a stable
intermediate state between
the transitions from native to unfolded state. In this study,
we aim to investigate the
conformational state of each secondary structure elements of
the toxin in intermediate
state. The Φ values analysis was employed as a tool to reveal
the conformation state
on various amino acid positions of toxin. Non-disruptive mutant
toxins were
designed and constructed as conformational probes using
site-directed mutagenesis.
The effect of mutation at different position is characterized
in terms of protein
expression level, solubility, mosquito-larvicidal toxicity and
hemolytic activity
compared to those of wild type. The results showed that an
intermediate state of this
toxin was found in the aggregation/oligomerization form.
Transitional free energy and
activation energy of these toxins were obtained and derived for
the Φ values. The data
revealed helices A, B and C of the intermediate are quite
deviated from the native
state while helix D is still maintained similar to native state
conformation. The
relocation of these helices was proposed to be related to the
conformational changes
and contributes to the functional mechanism of this toxin.
Protein design opens new ways to probe the determinants of folding, to facilitate the study of proteins, and to arrive at novel molecules, materials and nanostructures. Recent theoretical methods for identifying the properties of amino acid sequences consistent with a desired structure and function will be discussed. Such methods address the structural complexity of proteins and their many possible amino acid sequences. Several computationally designed protein-based molecular systems will be presented that have been experimentally realized, including novel proteins tailored to accommodate nonbiological cofactors.
Useful insight into the folding behavior of proteins has been gained by studying the free energy landscapes
of model peptides and very small proteins. These can be particularly useful in exploring the denatured state,
which is difficult to characterize directly through experiments. Several key challenges remain in obtaining
accurate and precise computational data for peptides in solution, including the accuracy of the
biomolecular force field and solvent model along with difficulties in obtaining converged ensembles.
In this talk these issues will be explored, including a comparison of the properties of several
Amber parameter sets and the effects of simulation with different explicit and implicit water
models. Methods that improve convergence will be discussed, such as modified replica exchange
approaches that permit application to larger systems at reduced computational cost.
Calculating transition paths of conformation change
is not amenable to computer solution unless the problem is defined precisely.
Inspired by the work of others, we offer a precise definition of the
problem without invoking unmotivated stochastic forces.
A weakness of this--and other--approaches is the need for the user
to identify an appropriate set of collective variables.
In principal at least, the quality of the result can be checked
a posteriori by calculating committor values from dynamics trajectories.
The approach we advocate leads to a nonstandard minimum free energy
path that is more reasonable physically.
Using the BlueGene computer at IBM Research we have performed extensive simulations on a mutant construct of lambda repressor. The protein consists of 80 amino acids stuctured as a five helix bundle in the folded state. The mutant was designed in the Gruebele lab at UIUC to exhibit sub-10 microsecond folding times in laser temperature jump experiments. Our simulations employed replica exchange and traditional molecular dynamics at a number of different temperatures. Although it is very difficult to thoroughly equilibrate and sample molecular systems of this size and complexity, our simulations clearly reveal very complex folding behavior. In particular, different structural elements exhibit different degrees of thermodynamic stability and melt at different temperatures. Some of the structural transitions appear to be relatively cooperative, whereas others are quite diffuse.
We focus on the problem of determining the structure of complexes formed by the association of two proteins by searching for the global minimum of a function approximating the free energy of the complex. Solving this problem requires the combination of a number of different optimization methods. First we explore the conformational space by systematic global search based on the Fast Fourier Transform (FFT) correlation approach that evaluates the energies of billions of docked conformations on a grid. We show that the method can be efficiently used with pairwise interactions potentials that substantially improve the docking results. A new 5D FFT algorithm is also discussed. The 1000 best energy conformations are clustered, and the 30 largest clusters are retained for refinement. The conformations are refined by a new medium-range optimization method that has been developed to locate the global minima within well defined regions of the conformational space. In each cluster, the energy of the complex is a very noisy funnel-like function on the space of rigid body motions, the Euclidean group SE(3). The Semi-Definite programming based Underestimation (SDU) method constructs a convex quadratic under-estimator to the energy funnel based on a sample of the local mimima of the energy function, and uses the quadratic under-estimator to guide future sampling. We show that the parameterization of SE(3) has a significant impact on the effectiveness of SDU and introduce a parameterization that dramatically reduces the number of very costly energy function evaluations. We also discuss the application of the combined method to recent targets in CAPRI (Critical Assessment of Protein Interactions), the first community-wide docking experiment.
Transmembrane beta-barrel (TMB) proteins are embedded in the outer membrane of
Gram-negative bacteria, mitochondria, and chloroplasts. Despite their
importance, very few nonhomologous TMB structures have been determined by X-ray
diffraction because of the experimental difficulty encountered in crystallizing
transmembrane proteins. We introduce the program partiFold to investigate the
folding landscape of TMBs. By computing the Boltzmann partition function,
partiFold estimates inter--strand residue interaction probabilities, predicts
contacts and per-residue X-ray crystal structure B-values, and samples
conformations from the Boltzmann low energy ensemble. This broad range of
predictive capabilities is achieved using a single, parameterizable grammatical
model to describe potential beta-barrel supersecondary structures, combined with
a novel energy function of stacked amino acid pair statistical potentials.
PartiFold outperforms existing programs for inter--strand residue contact
prediction on TMB proteins, offering both higher average predictive accuracy as
well as more consistent results. Moreover, the integration of these contact
probabilities inside a stochastic contact map can be used to infer a more
meaningful picture of the TMB folding landscape, which cannot be achieved with
other methods. Partifold's predictions of B-values are competitive with recent
methods specifically designed for this problem. Finally, we show that sampling
TMBs from the Boltzmann ensemble matches the X-ray crystal structure better
than single structure prediction methods. A webserver running partiFold is
available at http://partiFold.csail.mit.edu/.
Joint work with: Charles W. O'Donnell, Srini Devadas, Peter Clote and Bonnie
Berger.
References:
[1] J. Waldispühl*, C.W. O'Donnel*, S. Devadas, P. Clote and B. Berger,
Modeling Ensembles of Transmembrane beta-barrel Proteins,
PROTEINS: Structure, Function and Bioinformatics, published online 14 Nov. 2007.
doi:10.1002/prot.21788
(* authors equally contributed)
[2]J. Waldispühl, B. Berger, P. Clote and J.-M. Steyaert,
Predicting Transmembrane beta-barrels and Inter-strand Residue Interactions from
Sequence,
PROTEINS: Structure, Function and Bioinformatics, vol. 65, issue 1, p.61-74,
2006.
doi:10.1002/prot.21046
We show that diffusion can play an important role in proteinfolding
kinetics. We explicitly calculate the diffusion coefficient of
protein folding in a lattice model. We found that diffusion typically
is configuration- or reaction coordinate-dependent. The diffusion
coefficient is found to be decreasing with respect to the progression
of folding toward the native state, which is caused by the
collapse to a compact state constraining the configurational space
for exploration. The configuration- or position-dependent diffusion
coefficient has a significant contribution to the kinetics in
addition to the thermodynamic free-energy barrier. It effectively
changes (increases in this case) the kinetic barrier height as well as
the position of the corresponding transition state and therefore
modifies the folding kinetic rates as well as the kinetic routes. The
resulting folding time, by considering both kinetic diffusion and
the thermodynamic folding free-energy profile, thus is slower than
the estimation from the thermodynamic free-energy barrier with
constant diffusion but is consistent with the results from kinetic
simulations. The configuration- or coordinate-dependent diffusion
is especially important with respect to fast folding, when there is
a small or no free-energy barrier and kinetics is controlled by
diffusion. Including the configurational dependence will challenge
the transition state theory of protein folding. The classical transition
state theory will have to be modified to be consistent. The
more detailed folding mechanistic studies involving phi value
analysis based on the classical transition state theory also will have
to be modified quantitatively.
Rigorous, quantitative, and atomic scale description of complex biological systems is a grand challenge. Explicit description of biomolecules and their aqueous environment, including solvent, co-solutes, and mobile ions, is prohibitively expensive although a variety of methods, including Ewald summations, Euler summations, periodic images and reaction field theory, have been developed in the past few decades. Therefore, multiscale analysis is an attractive and sometimes indispensable approach. Implicit solvent models which treat the solvent as a macroscopic continuum while admitting a microscopic atomic description for the biomolecule, are efficient multiscale approaches to complex, large scale biological systems. We summarize recent advances in mathematical methods for the Poisson-Boltzmann (PB) equation based implicit solvent theory. These include the rigorous mathematical treatments of molecular surface interfaces, surface geometric singularities, charge singularities and associated force evaluation for molecular dynamics.
Small single-domain proteins often exhibit only a single free-energy
barrier, or transition state, between the denatured and the native state.
The folding kinetics of these proteins is usually explored via mutational
analysis. A central question is which structural information on the
transition state can be derived from the mutational data. To interpret these
data, we have developed models that are based (a) on the substructural
cooperativity of helices and hairpins, and (b) on splitting up
mutation-induced stability changes of a protein into components for its
substructures. We obtain a consistent structural interpretation of
mutational Phi-values by fitting few parameters that describe the degrees of
structure formation of helices and hairpins in the transition state. Our
models explain how mutations at a given site can lead to different
Phi-values, and capture non-classical Phi-values smaller than 0 or larger
than 1, which have been difficult to interpret. Non-classical Phi-values
simply arise, e.g., if mutations stabilize a helix or hairpin, but
destabilize its tertiary interactions.
References:
[1] C. Merlo, K. A. Dill, and T. R. Weikl, PNAS 102, 10171 (2005).
[2] T. R. Weikl and K. A. Dill, J. Mol. Biol. 365, 1578 (2007).
[3] T. R. Weikl, Biophys. J., in press (2008).
In order to understand protein folding, we need to understand both folded
and unfolded state structure. One of the key systems for these studies is
the 36 residue villin headpiece helical subdomain (HP36) because of its
simple topology, small size and fast folding properties. Structures of HP36
have been determined using X-ray crystallography and NMR spectroscopy, with
the resulting structures exhibiting clear structural differences. We
complement the existing data by using molecular dynamics simulations and
experimental double mutant cycles to show that the x-ray structure is the
better representation in solution at neutral conditions. Denatured state
studies using fragment analysis coupled with relatively low resolution
spectroscopic techniques show a small tendency to form locally stabilized
structure. Using standard Replica Exchange Molecular Dynamics, our
simulations show that the first helix contains the most native-like helical
structure of all three helices. Overall, our analysis shows how theoretical
and experimental collaborative efforts can help aid in the understanding of
the dynamic nature of the folding pathway.
We examine the temperature dependence of the folding time in
the Trp-cage mini-protein, using all-atom replica exchange
molecular dynamics (REMD) simulations.
This is done by using an "equation free" approach.
The central idea is to use REMD simulations to generate
appropriately initialized bursts of atomistic simulation
trajectories to obtain the drift and diffusion coefficients of
coarse variables of interest to capture quantitatively the
coupling of fast with slow moving degrees of freedom.
Then we employ a stochastic (Langevin) dynamic approach to
follow the evolution of coarse variables and compute a
distribution of folding times that is consistent with the
drift and diffusion coefficients obtained from all-atom
simulations. We use physically motivated order parameters as
coarse variables. We describe the distribution of folding
times as a function of temperature for the Trp-cage
mini-protein.
A novel method has been developed to describe the protein folding shape
structures. A set of 27 vectors is generated from an enclosed geometric
space to describe the protein backbone folding shapes. This algorithm has
mathematically reserved all possible folding shapes in space, and it is
capable of making the complete assignment of folding shapes along the
protein backbone without any gap. All possible types of folding can be
uniquely described, including the regular protein secondary structures,
irregular turn and loop structures and even rare possible folding
structures. This method offers a simple one-dimensional description for the
complicated three-dimensional folding structures which is able to align with
the protein sequence for structural comparison. The results are compared
with the protein data bank (PDB) and the structural assignments of other
methods. This method has the ability to reveal the protein structural
similarity and dissimilarity with the accurate and consistent meaning.