Accounting for the instantaneous disorder in the enzyme-substrate Michaelis complex to calculate the Gibbs free energy barrier of an enzyme reaction

Many enzyme reactions present instantaneous disorder. These dynamic fluctuations in the enzyme-substrate Michaelis complexes generate a wide spread of energy barriers that cannot be experimentally observed, but that determines the measured kinetics of the reaction. These individual energy barriers can be calculated using QM/MM methods, but then the problem is how to deal with this dispersion of energy barriers to provide kinetic information. So far, the most usual procedure implies the so-called exponential average of the energy barriers. In this paper we discuss the foundations of this method, and we use the free energy perturbation theory to derive an alternative equation to get the Gibbs free energy barrier of the enzyme reaction. In addition, we propose a practical way to implement it. We have chosen four enzyme reactions as examples. In particular, we have studied the hydrolysis of a glycosidic bond catalyzed by the enzyme Thermus thermophilus β-glycosidase, and the mutant Y284P Ttb-gly, and the hydrogen abstraction reactions from C 13 and C 7 of arachidonic acid catalyzed by the enzyme rabbit 15-lipoxygenase-1.


Introduction
2][3] This huge acceleration makes possible that the vital processes for living beings take place in a time scale compatible with life.
Enzymes are very far from being rigid structures.They are highly flexible molecules.
Although the origin of the catalytic power of enzymes (electrostatic based catalysis or dynamical contributions) is still under discussion, 4 enzyme dynamics is relevant for the rate constants, no matter by what force enzyme catalysis is driven.Once the enzymesubstrate Michaelis complex is formed, the set of nuclei of the system samples a huge number of configurations on a highly dimensional potential energy surface which can include one or many reactant valleys. 5,68][9] Thus, catalytic rates of enzymes often display very large dynamic fluctuations over timescales much longer than kcat , mostly associated with slow conformational changes that connect the different reactant valleys.1][12][13][14][15][16] In these cases the interconversion rates among the corresponding reactant valleys might be much slower than the catalytic rate constants, and very long molecular dynamics simulations would be required to produce transitions among them and, especially, to reach statistical equilibrium.Thus, ergodicity (that is, that all the states of the ensemble will be populated) does not apply in practice on the full configurational ensemble of the potential energy surface corresponding to the enzyme-substrate Michaelis complex.The population of each reactant valley is not governed by thermodynamic control, but by kinetic control and each reactant valley acts as an independent enzyme with a different rate constant (the overall rate of product formation is determined by an experimentally detected distribution of rate constants).Dynamic disorder is an experimentally observed phenomenon.When only a reactant valley exists (or in the case of fast interconversion among the valleys), dynamic disorder does not longer appear, and a single rate constant can describe the enzyme reaction.In this case, another dynamic, but entirely different concept, instantaneous disorder, might still take place. 17Instantaneous disorder can be defined as instantaneous dynamic fluctuations in the enzyme-substrate Michaelis complex with timescales much smaller than kcat, leading to different energy barriers that cannot be individually detected by the experiment.
A reactant valley includes an ensemble of nuclear configurations of the enzyme-substrate Michaelis complex that can be extensively sampled by means of a long-time molecular dynamics simulation.The interconversions among all the low-energy configurations within the same reactant valley usually consist in relatively easy local conformational changes and they are quite fast at physiological temperatures.These interconversions occur with timescales much smaller than kcat.In these conditions those nuclear configurations of the reactant valley are in local equilibrium (they are populated according to a Boltzmann distribution) with each other, and the whole valley contributes to the reaction flux with a single kcat.The landscape within this reaction valley is highly ridged, with a huge number of minimum-energy structures.Starting from different initial nuclear configurations (snapshots) generated by equilibrium molecular dynamics simulations, quantum mechanics/molecular mechanics (QM/MM) methods can build multiple minimum-energy paths (MEPs), each of them connecting a potential energy minimum inside the reactant valley with a potential energy minimum in the associated product valley, crossing through the corresponding transition state structure.8][19][20][21][22][23][24][25][26][27][28][29][30][31][32] Thus, for instance, free energy barriers ranging from 14.5 kcal/mol to 51.3 kcal/mol have been calculated 17 for the first step of the reaction catalyzed by HIV-1 protease.This dispersion was attributed to different mechanisms, different conformations of the active center, and to variations in the electrostatic environment of the active site.
The point now is how to take advantage of the calculation of the distribution of the potential energy barriers to obtain information about the experimental rate constant kcat, or, at least, the Gibbs free energy barrier ΔGº ‡ of the catalytic reaction when only a reactant valley exists.In other words, how does instantaneous disorder determine the kinetics of the reaction?A discussion about this matter is the goal of this paper.We derive, based on free energy perturbation theory, an equation to calculate this free energy barrier that presents some important conceptual differences with respect to the common approaches, and propose a practical way to implement it.We apply all that to four enzyme reactions.

Theoretical basis
Let us assume that an enough lengthy molecular dynamics simulation that covers a single reaction valley has been carried out for a given enzyme-substrate Michaelis complex.Then, a wide set of snapshots (that is, instantaneous nuclear configurations) are randomly chosen from the generated molecular dynamics trajectory.The energy optimization of these snapshots leads to a set of minimum energy structures that can be used as starting structures to build minimum potential energy paths that lead to the corresponding products.As mentioned above, a wide dispersion of QM/MM potential energy barriers ΔV ‡ (or energy barriers ΔE ‡ if the thermal and zero-point energy corrections are included) are expected to be obtained because of the instantaneous disorder.From here on we assume that ΔV ‡ and ΔE ‡ do not significantly differ.Several different procedures to average these barriers have been proposed. 33The so-called exponential average has been the most used: 24,[27][28][29][30][31][32][33][34][35] where R is the gas constant, T is the temperature, ∆  ‡ is the energy barrier for each snapshot, and N is the number of snapshots selected.
Equation 1 is obtained if several assumptions are fulfilled: a) The reaction valley can be decomposed into a huge number of strictly non-overlapping reactant sub-valleys.b) The ensemble of structures inside each reactant sub-valley α evolves to the corresponding product sub-valley with a rate constant   that can be calculated through the conventional transition state theory as a function of the Gibbs free energy barrier associated to that subvalley α, Δ  ° ‡: where   is the Boltzmann constant and ℎ is the Planck constant; c) Because of all the reactant sub-valleys composing the whole reactant valley are also in local equilibrium among them, the overall rate constant   can be obtained by arithmetic average of a representative number of   weighed according to the population   of their corresponding reactant sub-valleys α (that is, according to their relative free energies): d) Each snapshot i chosen along the molecular dynamics trajectory can be classified as belonging to a particular reactant sub-valley α.After energy optimization of this snapshot, the corresponding minimum potential energy path is calculated and the value of the Anyway, the main problem when equation 1 is applied lies in the way how the considered snapshots are chosen.Because in the sum in equation 1 the smaller the value of ∆  ‡ the bigger the contribution, it has been suggested that the best result should be obtained by selecting or filtering in some way, e.g. based on catalytic parameters, a few snapshots with expected low energy barriers. 24The limit case would consist of just picking up the snapshot with the smallest ∆  ‡ .Both approaches are commonly used in QM/MM studies of enzyme reactions.However, it has been accepted 33 that the high-energy barrier snapshots do not significantly contribute to that sum, but to the number N of snapshots used.So, if those high-energy barrier snapshots are neglected, ∆  ‡ will be underestimated.
The accuracy of ∆  ‡ as an estimation of the Gibbs free energy barrier ΔGº ‡ is usually 33 assessed taking as a reference a calculation of the potential of mean force (PMF) 36,37 from umbrella sampling simulations. 38At this point it would be interesting to think about the meaning of the free energy.Free energy is used to account for the contribution of not only a unique structure, but of an ensemble of structures.If only a unique structure is considered, the logarithmic transformation of the unique Boltzmann term leads to an energy.If an ensemble of structures is considered, the sum of the corresponding Boltzmann terms gives the partition function, and its logarithmic transformation leads to the free energy.However, we cannot in practice obtain the global partition function for the whole reactant valley.Then, if only one reactive minimum is chosen, with only one associated MEP and only one PMF, only an estimate of the value Δ  ° ‡ corresponding to a reactant sub-valley α will be obtained.That is, in building a PMF just a quite limited region of the configuration space around the MEP is actually sampled, very far from the whole reactant valley.Both, the reduced time length (up to tens of picoseconds in QM/MM calculations) per each window and the biasing potential to sample the window associated to each value of the reaction coordinate, confine the sampling just to the neighborhood of the selected MEP.By the way, what MEP? Depending on the chosen MEP, a high or a low free energy barrier will be obtained.Even in the case of a low free energy barrier, the probability of appearance of the corresponding starting reactant structure (and, hence, of its low barrier) remains unknown.The free energy perturbation method as usually applied 38 as an alternative to umbrella sampling presents a similar problem.In a nutshell, in cases where instantaneous disorder is relevant, with the appearance of a huge dispersion of nuclear configurations and barriers throughout the reactant valley, none of the particular values of Δ  ° ‡ might be actually representative of the free energy barrier ΔGº ‡ of the catalytic reaction.So, the comparison of ∆  ‡ obtained by means of eq. 1 (where the snapshots i will be obtained from, e.g., a 100 ns long molecular dynamics trajectory) with a Δ  ° ‡ value calculated from umbrella sampling simulations (with per window samplings up to tens of picoseconds) makes not much sense.
We propose here a more reliable way to deduce and apply an equation quite similar to equation 1.This deduction does not make any of the assumptions a) to f) detailed above.
Free energy perturbation (FEP) 38 theory can be used to calculate the free energy difference between a reference system (0) with a Hamiltonian H0 and a perturbed system (1) characterized by a Hamiltonian H1.For a molecular dynamics simulation in an isothermal-isobaric ensemble (NPT) the Gibbs free energy difference between the perturbed 1 and the reference 0 systems can be classically written as where  (𝑁) and  (𝑁) denotes the set of 3N atomic Cartesian coordinates and conjugate momenta, respectively, H is the classical nuclear Hamiltonian, V0 and V1 are the volume of the systems 0 and 1, respectively, at each configuration (the additive term PV in the exponential of the integral is absent in a canonic (NVT) ensemble), R is the gas constant, and 〈 〉 0 denotes an ensemble average over the isothermal-isobaric ensemble sampled from the equilibrated reference state.Because the molecular dynamics simulations only sample one reaction valley, there are not significant differences between V0 and V1 nor significant changes of the volume V0 along the simulation.Then the volumes can be ignored in equation 4, and the simulation can be even carried out in the canonical ensemble (NVT) without appreciable error.Thus, to get the free energy difference between state 1 and 0, at each configuration of the equilibrated reference system the corresponding Hamiltonian is switched to the one corresponding to the perturbed system, and the differences Δ =  1 −   are collected to calculate the ensemble average.
Using Cartesian coordinates, the kinetic term is separable from the potential term and cancels out when the difference is done.
To study the instantaneous disorder in an enzyme reaction we propose the following protocol.We define the reactant ensemble as the reference system 0, and the transition state (TS) ensemble as the perturbed system 1.A molecular dynamics simulation of the enzyme-substrate Michaelis complex is run to generate a lot of nuclear configurations.
Many snapshots must be selected at random (that is, populating many different regions of the reaction valley).For each selected snapshot, a potential energy minimum is located, and the corresponding MEP is built using a QM/MM method.Thus, each selected snapshot of the reference system 0 is associated to a transition state structure that belongs to the perturbed system 1.In this case, for each snapshot i Δ = ∆  ‡ (or even ∆  ‡ ), the QM/MM potential energy barrier of the corresponding MEP.Taking into account that the set of snapshots along the molecular dynamics simulation appears according to a Boltzmann distribution in the reference ensemble, the average in the second member of At this point it is crucial to understand that any attempt to select only a few "suitable" snapshots with "convenient" configurations that lead to low energy barriers to be included in equation 1 is an inadequate procedure, which only fortuitously will provide a good estimation of the experimental free energy barrier.Conversely, as many selected snapshots as possible, belonging to as many different regions of the reaction valley as possible, should be included in the exponential average.Evidently, this fact could imply the calculation of a too huge number of QM/MM MEPs to be feasible with a reasonable computational cost.To circumvent this problem, a tested criterion (depending on each particular enzyme reaction) should be used to identify a priori those snapshots corresponding to precatalytic structures, that is, belonging to the limited part of the configurational space sampled by molecular dynamics simulations that might be adequate for catalysis (with a not foreseen excessively high energy barrier).As explained above, the high-energy barrier snapshots do not significantly contribute to the sum in the second member of equation 1 (they just add roughly zeros), but they do increase the value of N in the denominator and do have to be accounted.
Taking into account all that, if N is the high number of randomly selected snapshots along the molecular dynamics simulation, and n is the number of precatalytic structures identified after the corresponding filter according to the adopted criterion, we can reach the following equation This equation reflects the fact that the Gibbs free energy barrier of the enzyme reaction is due to two different contributions.The first term of the second member of the equation indicates how difficult it is for a precatalytic structure to appear throughout the sampling.
The second term comes from the exponentially averaged contribution of the precatalytic structures (note that this average can be calculated using just a representative set of the precatalytic structures, in such a way that n in this second term becomes the number of structures included in that representative set).As a consequence, both the non-precatalytic and the precatalytic structures contribute to the Gibbs free energy barrier of the enzyme reaction, and the weight of each part is unknown a priori.Even more, in general, the precatalytic structures will present a considerable dispersion of energy barriers, and the existence of a number of high energy barriers will also provoke an increase of the Gibbs free energy.Then, it must be emphasized that the free energy barrier according to equation 5 is the result of a long sampling over the reactant valley.Conversely, the free energy barrier resulting from a PMF calculation just includes a quite short sampling in the neighborhood of only one MEP, and therefore it just explores a quite narrow region of the reactant valley.
To illustrate all those concepts, we will focus on four enzyme reactions.Firstly, the hydrolysis of a glycosidic bond catalyzed by the enzyme Thermus thermophilus βglycosidase (Ttb-gly), a retaining glycosyl hydrolase, 39 and the mutant Y284P Ttb-gly.

Hydrolysis of a glycosidic bond catalyzed by the enzyme Thermus thermophilus βglycosidase
We started with an enzyme reaction where both the molecular dynamics simulations of the enzyme-substrate Michaelis complex and the potential energy profiles corresponding to each selected snapshot were calculated at a QM/MM level, specifically, at the QM(SCC-DFTB)/AMBER level.In a previous paper 39 we studied the reaction mechanism of those glycosylation and deglycosylation steps (both for hydrolysis and transglycosylation).Now we take advantage of those simulations to build the setup for the study of the deglycosylation step corresponding to the hydrolysis of pNP-Fuc.We will present here the results corresponding to both wild-type Ttb-gly and the mutant Y284P Ttb-gly.

Methodological details for the enzyme Thermus thermophilus β-glycosidase
The starting structure to study the hydrolysis reaction catalyzed by wild-type Ttb-gly in this paper was the covalent glycosyl-enzyme intermediate obtained in our previous work 39 as a result of the glycosylation process.The corresponding intermediate located in the case of the mutant Y284P Ttb-gly (result not published) was used for the hydrolysis reaction catalyzed by that mutant.The intermediate was modified by deleting all the pNP coordinates except those of the just formed hydroxyl group (due to the proton transfer from Glu164) that was converted into a water molecule using the PyMol program. 40The system was solvated in a cubic box of preequilibrated TIP3P 41 water molecules and three sodium ions were added to neutralize it.More than 70,200 atoms were included in all.
The coordinates of the system were relaxed at the MM level by restraining all protein atoms and the just built water molecule (sugar acceptor).Then, a short-restrained MD simulation (210 ps) at the MM level to allow the rearrangement of water molecules in the active site was performed under periodic boundary conditions (PBC) in the NVT canonical ensemble at 300 K.The temperature was controlled by Langevin dynamics. 42l the MM calculations were done using the ff14SB 43 Amber force field for the protein, and the GLYCAM06j 44 atom types and parameters were employed for the sugar moiety.
Along the MD simulations the covalent bonds containing hydrogen were constrained by means of the SHAKE algorithm 45 , and the long-range electrostatic interactions were treated using the particle-mesh Ewald method. 46A 1fs time step was used. in all the MD trajectories.All simulations were carried out employing the AMBER16 software (GPU (CUDA) version of the PMEMD 47,48 package.
As will be explained the section 3.2, the final snapshot of the MD simulation at the MM level was chosen to initiate a QM(SCC-DFTB)/AMBER MD simulation, from which several snapshots were chosen to generate a lot of potential energy profiles corresponding to the hydrolysis reaction (see below).The potential energy profiles were calculated at the QM(SCC-DFTB)/AMBER level with the modular program package ChemShell 49,50 employing TURBOMOLE 51 to obtain the QM energies and gradients at the SCC-DFTB 52 (Self-Consistent Charge Density Functional Tight-Binding) level.MM energies and gradients were evaluated using DL_POLY 53 , implemented in the ChemShell package, using the AMBER force field.The electrostatic embedding scheme 54 was employed to let the MM point charges to polarize the electronic density of the QM region.No cutoffs were introduced for the nonbonding MM and QM/MM interactions.The cubic box of water molecules was reduced to a 30 Å sphere surrounding the full protein for all the QM/MM calculations.All residues and water molecules within 15 Å from the anomeric carbon C1 were allowed to move during the optimization process (active region, around 2100 atoms), while the remaining atoms were kept fixed.The QM region includes the fucose ring, eight nearby waters, and the sidechains of Glu164, Glu338, and Tyr284 (67 atoms in all).Three link atoms were used to define the QM/MM boundary with the charge-shift approach. 36The total charge of the QM regions is -1 a.u.
The QM/MM optimizations were performed using the limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) 55 algorithm combined with the Hybrid Delocalized Internal Coordinate Scheme. 56The reaction paths were scanned by performing harmonically restrained optimizations along a suitable reaction coordinate (see below) in steps of 0.1 Å.For each potential energy profile, the highest potential energy structure was taken as an approximation to the corresponding transition state structure.

Results for wild-type Ttb-gly
As explained in section 3.1.,the final snapshot of the MD simulation at the MM level was chosen as starting point for a QM(SCC-DFTB)/AMBER DM simulation without any restraints.Firstly, the system was heated in the NVT canonical ensemble increasing the temperature along 60 steps, 100 fs each, up to 300 K.Then, an equilibration period of 20 ps was performed.A final production run of 100 ps was done.
In order to apply equation 5 we selected N = 2000 snapshots uniformly distributed at equal intervals along that MD simulation.It is not possible in practice to calculate the hydrolysis potential energy paths corresponding to these N snapshots.For this reason, we looked for the precatalytic structures.It is evident that only those snapshots that involve a water molecule placed in the right place for the hydrolysis reaction to be possible are precatalytic, in the sense that only those will imply a reasonable potential energy barrier for the hydrolysis.Thus, we searched precatalytic snapshots where (see Figure 1) there is a water molecule with one of its hydrogen atoms closer than 2 Å to the proton acceptor oxygen atom (OE2GLU164) of Glu164, and with its oxygen atom closer to the anomeric carbon atom C1 of the fucose ring than any of its hydrogen atoms.According to this filter, only n = 146 precatalytic snapshots were identified.The energy of each one was minimized at the QM(SCC-DFTB)/AMBER level to obtain the associated reactant minimum.From these structures the corresponding QM(SCC-DFTB)/AMBER hydrolysis potential energy paths were calculated using the difference between the distance HWAT-OWAT and the distance C1-OWAT (OWAT is the oxygen atom of the water molecule that is in the right place to produce the hydrolysis reaction, and HWAT is its hydrogen atom that is going to protonate Glu164) to define the reaction coordinate R2.
As expected, a wide dispersion of potential energy barriers was obtained (they range from 20.9 kcal/mol up to 31.8 kcal/mol), as seen in Figure 2. The arithmetic mean is 26.2 kcal/mol with a standard deviation of 2.4 kcal/mol.Although these values can be indicative of the size of the barriers and their dispersion, they cannot be used to estimate the free energy.Although it is not truly relevant four our purpose in this paper, we can mention that two kinds of potential energy paths are obtained.Some of them reach the minima corresponding to the products at a value of R2 about -0.5 Å, and the others already at the positive region of R2.This delay is caused by the fact that the proton transfer to OE2GLU164 is not included in R2.Consequently, some hysteresis appears in the region of products of some energy paths, delaying the hydrogen transfer and reaching the product minima at more advanced values of R2 (that is, allowing that Glu338 gets further away from the anomeric carbon atom C1 of the fucose ring).
It is interesting to observe that the filter we have used leads to 146 precatalytic snapshots where the distribution of distances C1-OWAT (that is not included as a criterium for the filter) is relatively concentrated near 3.2 Å (see Figure 2a), with some structures with distances C1-OWAT slightly shorter (by 0.2 Å -0.3 Å).In addition, we can see in Figure 3b that there is no correlation at all between these distances C1-OWAT and the potential energy barriers.Likewise, there is no correlation either between the distances corresponding to the proton to be transferred -oxygen atom (OE2GLU164) of Glu164 (also very grouped) and the potential energy barriers, although in this case a very slight increasing trend of the barriers as the distance augments can be glimpsed (see Figure 3c).kcal/mol.In all, the Gibbs free energy barrier amounts to 24.4 kcal/mol.It can be clearly seen that the contribution of the non-precatalytic structures is not negligible at all, and that it appears even in the case of short molecular dynamics simulations (100 ps).

Results
for the mutant Y284P Ttb-gly.
Then, we studied the same reaction enzyme but with quite smaller energy barriers.To this aim we choose the mutant Y284P Ttb-gly.We proceeded with this mutant in the same way as for wild-type Ttb-gly.Again, we selected N = 2000 snapshots uniformly distributed at equal intervals along the QM(SCC-DFTB)/AMBER 100 ps molecular dynamics simulation without any restraints.Using the same filter, we found this time n = 202 precatalytic snapshots.The energy of each one was minimized at the QM(SCC-DFTB)/AMBER level to obtain the associated reactant minimum.From these structures the corresponding QM(SCC-DFTB)/AMBER hydrolysis potential energy paths were calculated using the difference between the distance HWAT-OWAT and the distance C1-OWAT to define the reaction coordinate R2.Now, even a wider dispersion of potential energy barriers was obtained (they range from 12.4 kcal/mol up to 33.6 kcal/mol), as seen in Figure 4.The arithmetic mean is 23.1 kcal/mol with a standard deviation of 4.7 kcal/mol.Some delay of the hydrogen transfer also appears in some cases, so causing again the existence of two kinds of potential energy paths differing in the product's region.
According  Finally, we studied two enzyme reactions where the molecular dynamics simulations of the enzyme-substrate Michaelis complex were run at MM level and the potential energy profiles corresponding to the selected snapshots were calculated at QM(DFT)/MM level.
The MM level for the molecular dynamics simulations makes it possible to run trajectories a thousand times longer than in the case of Ttb-gly.
Lipoxygenases (LOXs) are non-heme iron dioxygenases that catalyze the oxidation of polyunsaturated fatty acids, producing conjugated hydroperoxide fatty acids.These hydroperoxidations turns out to be very highly regio-and stereospecific.The rate determining step of the overall hydroperoxidation process is the initial hydrogen atom ΔV (kcal/mol)

R2 (Å)
8][59][60][61] In mammals, the main substrate of LOXs is arachidonic acid, whose hydroperoxidation initiates the biosynthesis of signaling lipid mediators that are crucial for human health and are implicated in inflammatory processes. 62e crystallographic structure for rabbit reticulocyte-type 15-lipoxygenase-1 (15-rLOX-1) dimer was the first one resolved for a mammalian LOX (PDB code 2P0M). 6315-rLOX-1 converts AA into hydroperoxyeicosatetraenoic acids (HPETEs) with the following distribution of products: 97 % of 15S-HpETE (after abstracting one hydrogen atom at C13 of AA), 3 % of 12S-HpETE (after abstracting one hydrogen atom at C10 of AA).No signal of 9S-HpETE (after abstracting one hydrogen atom at C7 of AA) was detected. 64 the present paper we will study the hydrogen transfer reactions from C13 and C7 of AA catalyzed by the enzyme rabbit 15-lipoxygenase-1.The case of the hydrogen abstractions from C7 was chosen because their energy barriers are very high.To our purpose, no matter that hydrogen abstraction from C7 is actually too slow to be competitive.

Methodological details for the enzyme rabbit 15-lipoxygenase-1.
The monomer A and the ligand bound at the active site of monomer B in the x-ray structure of rabbit 15-LOX-1 dimer 63 were removed.The monomer B was protonated with the H++ web-server. 65,66A pH = 6.5 for the titratable residues was employed.The protonation state for the iron coordination sphere was corrected by hand to ensure a correct description.
The program GOLD5.8.0 67 was employed to carry out docking calculations for AA within the pocket of the monomer B. The conformational space was explored using the genetic algorithm.The binding site was defined as a 20 Å radius sphere centered on the iron atom.
Binding free energies were estimated by the ChemScore fitness function.
The best pose was chosen to run an MD simulation.The ff14SB 43 Amber force field was used for the protein atoms, whereas the specific parameters for AA were taken from a previous work. 68The GAFF2 69,70 library was employed as the source for these parameters.Employing the MCPB.pyprocedure 71 within the bonded model and the Seminario method for the force constants calculations, 72 specific MM parameters were developed for the iron atom and its first coordination sphere (His361, His366, His541, His545, Ile663 and the OH -group). 73This first coordination sphere was optimized employing the B3LYP/6-31G(d) level of theory and their atomic charges were assigned using the Merz-Kollman RESP procedure. 74e tLeap program was used to combine enzyme and substrate files into the proteinsubstrate complex, and to solvate it with an orthorhombic box of pre-equilibrated TIP3P 41 waters as well as to neutralize its total charge by addition of sodium cations.The final system contains nearly 92150 atoms of which about 10600 belong to the protein.The rest of atoms correspond to water molecules and salt ions.All molecular dynamics simulations were run using the AMBER 14 GPU (CUDA) version of the PMEMD package 47,48 as mentioned above.To begin with, the Michaelis complex was submitted to 22000 energy minimization steps combining the steepest descent and conjugate gradient methods to remove bad contacts.In the first 6000 steps, harmonic restraints were applied to the protein and substrate atoms with a force constant of 5.0 kcal mol -1 Å -2 , so that only the solvent and ions were relaxed.In the following 6000 steps, harmonic restraints were applied to the protein backbone and the substrate heavy atoms with the same force constant as before.Nonetheless, the whole complex was kept free during the last 10000 steps.After that, MD simulations using periodic boundary conditions were carried out.
The system was gradually heated from 0 K to 300 K for a period of 200 ps.The following step consisted of running a MD simulation of 1 ns within the NTP ensemble (300 K, 1 bar) to adjust the volume of the orthorhombic box so that a density of around 1 g cm -3 was achieved.During the heating and the compressing, harmonic restraints were applied to the protein backbone and the substrate heavy atoms with a force constant of 5.0 kcal mol -1 Å -2 , whereas the rest of the system was kept without restraints.The temperature was controlled by Langevin dynamics, 42 while the pressure was adjusted by the Berendsen barostat. 75Next, an equilibration stage of 10 ns at constant temperature (300 K) and volume was performed.Finally, a production period with a length of 100 ns was run within the NVT ensemble.A time step of 2 fs was used along the whole MD trajectory.
All bonds and bends containing hydrogen atoms were also constrained by the SHAKE algorithm. 45The non-bonding interactions have been calculated with a cutoff of 9 Å.
The QMM/MM potential energy profiles were calculated at the QM(DFT)/AMBER level following the procedure described in section 3.1.To carry out the initial QM/MM optimizations for each selected snapshot all water molecules outside a 17 Å radius volume centered on AA were removed.The active region includes all residues and water molecules inside a 15 Å radius sphere centered on either C13 of AA when the H13-proS hydrogen abstraction is considered, or C7 when the H7-proS hydrogen abstraction is studied.energy barrier.Furthermore, the distance from the corresponding carbon atom, C13 or C7, to that oxygen atom had to be greater than the distance from H13-proS or H7-proS, respectively, so ensuring that the corresponding C−H bond is properly oriented for the hydrogen abstraction (see Figure 5).Applying this filter, only n = 3383 or n = 33 precatalytic snapshots were identified for the H13-proS or H7-proS abstractions, respectively.
25 structures were randomly selected as representative of each set of precatalytic snapshots.The energy of each one was minimized at the QM(B3LYP)/AMBER level to obtain the associated reactant minimum.From these structures the corresponding QM(B3LYP)/AMBER potential energy paths were calculated using as reaction coordinate the difference between the distance of the breaking bond (C13−H13-proS or C7− H7-proS, for the H13-proS or H7-proS abstractions, respectively) and the forming bond (H13-proS −O or H7-proS −O, respectively).The corresponding potential energy profiles are displayed in Figures 6 and 7. Again, a wide dispersion of potential energy barriers was obtained.
They range from 12.8 kcal/mol up to 26.5 kcal/mol in Figure 6, and from 39.4 kcal/mol up to 66 kcal/mol in Figure 7.The arithmetic mean is 18.9 kcal/mol with a standard deviation of 4.2 kcal/mol for the H13-proS hydrogen abstraction, whereas the arithmetic mean is 49.6 kcal/mol with a standard deviation of 6.3 kcal/mol for the H7-proS hydrogen abstraction.As expected, the potential energy barriers corresponding to the H13-proS abstraction are quite smaller than the ones corresponding to the H7-proS abstraction.There is a quite considerable dispersion of values of relevant geometrical parameters along the 100 ns molecular dynamics simulation.This can be seen both following the evolution on time of the distances C13−OH -and C7−OH -(see Figure 8) and watching the histograms of those distances (see Figure 9).Indeed, the filtering leading to the precatalytic snapshots largely reduce this geometrical dispersion, although a wide dispersion of potential energy barriers holds and no correlation with the distances C13− OH -or C7−OH -can be found (Figure 10).

Conclusions
Instantaneous disorder consists of instantaneous dynamic fluctuations in the enzymesubstrate Michaelis complex with timescales much smaller than kcat.This leads to different energy barriers that cannot be experimentally detected.They are the consequence of a huge number of nuclear configurations in local equilibrium with each other within a reaction valley.Quantum mechanics/molecular mechanics methods can build multiple minimum-energy paths connecting a potential energy minimum inside the reactant valley with a potential energy minimum in the associated product valley, so providing a wide dispersion of potential energy barriers.An exponential average has been the most used method to estimate the Gibbs free energy barrier of the enzyme reaction.
However, the way how this exponential average has been justified and applied in practice is questionable.In this paper we propose a protocol based on the free energy perturbation theory that more properly justify the suitable use of the exponential average and that provides a practical way to determine the Gibbs free energy barrier of the reaction.This value includes the contribution of both the non-precatalytic and the precatalytic structures as a result of a long sampling over the reaction valley.This procedure does not require the arbitrary choice of a particular structure in whose quite narrow neighborhood a potential of mean force is calculated.
We have used the hydrolysis of a glycosidic bond catalyzed by the enzyme Thermus thermophilus β-glycosidase, and the mutant Y284P Ttb-gly, and the hydrogen abstraction reactions from C13 and C7 of arachidonic acid catalyzed by the enzyme rabbit 15lipoxygenase-1 to exemplify our procedure.

Electronic Supplementary Information
Test of convergence of the Gibbs free energy barrier and its contributions appearing in equation 5; Results using a slightly more restrictive filter to identify the precatalytic snapshots.

Table of Contents Entry
A protocol based on the free energy perturbation theory justifies the suitable use of the exponential average and provides a practical way to determine the Gibbs free energy barrier of an enzyme reaction.
equation 1 is just a discretization of the integral in the last member of equation 4, provided that a big enough number of selected snapshots are included in equation 1.On the other hand, the minimization in the orthogonal directions to the MEP permits to assume that the low-energy configurations in the transition state ensemble are sampled enough by means of the simulation in the reactant ensemble, in such a way that the FEP calculation converges reasonably.Thus, the exponential average in the second member of equation 1 directly corresponds to the Gibbs free energy barrier ΔGº ‡ of the catalytic reaction when only one reactant valley exists.At this point it is worth mentioning that the free energy barrier depends (see equation 4) on the quotient between the partition functions of the transition state and the reactants.The greater the accessible fluctuations in a given system, the bigger the value of the corresponding partition functions.The adequate generation of those fluctuations in reactants is warranted by a long enough sampling in the reactant ensemble.On the other hand, the fluctuations in the transition state ensemble (usually less pronounced than in the reactants) are a consequence of the dispersion of the reactants that translates to the dispersion of the corresponding MEPs and the multiplicity of maxima of these MEPs.
-gly catalyzes the hydrolysis of terminal, non-reducing β-D-glucosyl residues, for example from 4-nitrophenyl β-D-fucopyranoside (pNP-Fuc).The first step of the reaction (glycosylation) consists of the nucleophilic attack of the residue Glu338 to the anomeric carbon C1 of the fucose (Fuc) ring and the cleavage of the glycosidic bond of the pNP-Fuc substrate.A covalent glycosyl-enzyme intermediate with Glu338 is formed.A protonated residue Glu164 makes this step easier by protonation of the glycosidic oxygen of the leaving group pNP.In the second step (deglycosylation) Glu164 now deprotonates the incoming water molecule, that attacks as a nucleophile on the carbon C1 of the fucose ring in a hydrolysis reaction that breaks the glycosidic bond between fucose and Glu338 and produces fucose (see Figure1).In the presence of an appropriate acceptor substrate, like N-methyl-O-benzyl-N-(β-D-glucopyranosyl)-hydroxylamine (BnON(Me)-Glc), transglycosylation can compete with hydrolysis in the deglycosylation step and lead to the formation of the N-methyl-O-benzyl-N-(β-D-fucopyranosyl(1-4)β-Dglucopyranosyl)-hydroxylamine product.

Figure 1 .
Figure 1.Catalytic mechanism of the glycosylation and deglycosylation (hydrolysis) of 4-nitrophenyl β-D-fucopyranoside catalyzed by the enzyme Ttb-gly.The fucose ring and the pNP leaving group are pictured in green and blue, respectively.A water molecule needs to be placed in the right place for the hydrolysis reaction to be possible.

Figure 2 .
Figure 2. Potential energy paths calculated starting from each one of the 146 precatalytic snapshots identified for the hydrolysis reaction catalyzed by wild-type Ttb-gly.The energies are given in kcal/mol and the reaction coordinate R2 in Å.

Figure 3 .
Figure 3. Histogram of the distances C1-OWAT for the 146 precatalytic snapshots identified for the hydrolysis reaction catalyzed by wild-type Ttb-gly (a); potential energy barriers as a function of the distances C1-OWAT (b) or the distances proton to be transferred (HWAT) -oxygen atom (OE2GLU164) of Glu164 (c) in these precatalytic snapshots.Energies are given in kcal/mol and distances in Å.
to equation 5, with n = 202 and N = 2000 the contribution of the nonprecatalytic structures (first term of the second member of the equation) to the Gibbs free energy barrier is 1.4 kcal/mol.The contribution of the 202 precatalytic snapshots is the exponential average (second term of the second member of the equation) leading to 15 kcal/mol.In all, the Gibbs free energy barrier amounts to 16.4 kcal/mol.

Figure 4 . 4 .
Figure 4. Potential energy paths calculated starting from each one of the 202 precatalytic snapshots identified for the hydrolysis reaction catalyzed by mutant Y284P Ttb-gly.The energies are given in kcal/mol and the reaction coordinate R2 in Å.

Figure 6 .
Figure 6.Potential energy paths calculated starting from each one of the 25 randomly selected precatalytic snapshots for the H13-proS abstraction reaction catalyzed by the enzyme rabbit 15-lipoxygenase-1.

Figure 7 .
Figure 7. Potential energy paths calculated starting from each one of the 25 randomly selected precatalytic snapshots for the H7-proS abstraction reaction catalyzed by the enzyme rabbit 15-lipoxygenase-1.

Figure 11 .
Figure 11.Distances C13−OH (in Å, green line) and C7−OH (in Å, blue line) along the 3 ps MM(AMBER) molecular dynamics simulations for the 15-LOX-1:AA Michaelis complex starting from the selected precatalytic structure that originates the smallest potential energy barrier for each hydrogen atom abstraction.OH stands for the OH -group of the Fe(III)−OH− cofactor of 15-LOX-1.

Figure 12 .
Figure 12.Histogram of the distance C13−OH (in Å, green lines) and C7−OH (in Å, blue lines) corresponding to the 3 ps MM(AMBER) molecular dynamics simulations for the 15-LOX-1:AA Michaelis complex starting from the selected precatalytic structure that originates the smallest potential energy barrier for each hydrogen atom abstraction.OH stands for the OH -group of the Fe(III)−OH− cofactor of 15-LOX-1.
energy barrier ∆  ‡ is obtained.e) All the snapshots belonging to the same reactant subvalley α have roughly the same value of ∆  ‡ , that can be taken as a good estimate of Δ  ° ‡ .f) The population   is given by the number of times a snapshot i with ∆  ‡