Simulation and prediction of protein production in fed‐batch E. coli cultures: An engineering approach

An overall model describing the dynamic behavior of fed‐batch E. coli processes for protein production has been built, calibrated and validated. Using a macroscopic approach, the model consists of three interconnected blocks allowing simulation of biomass, inducer and protein concentration profiles with time. The model incorporates calculation of the extra and intracellular inducer concentration, as well as repressor–inducer dynamics leading to a successful prediction of the product concentration. The parameters of the model were estimated using experimental data of a rhamnulose‐1‐phosphate aldolase‐producer strain, grown under a wide range of experimental conditions. After validation, the model has successfully predicted the behavior of different strains producing two different proteins: fructose‐6‐phosphate aldolase and ω‐transaminase. In summary, the presented approach represents a powerful tool for E. coli production process simulation and control. Biotechnol. Bioeng. 2016;113: 772–782. © 2015 Wiley Periodicals, Inc.


Introduction
Escherichia coli still remains as one of the preferred expression systems in industrial biotechnology for relatively simple proteins not requiring post-translational modifications. Efficient production processes have been developed using fed-batch operation to reach high cell concentration cultures and, for lac-operon derived expression systems, pulse induction of an inducer (typically isopropyl-b-D-thiolgalactopyranoside, IPTG) is a well established procedure (Huang et al., 2012;Ruiz et al., 2011;Tripathi et al., 2009).
From an engineering point of view, modeling of such processes is an essential tool to simulate, predict, optimize and, eventually, design model-based control systems. Several modeling approaches have been published at different levels: i) fed-batch operation models, describing biomass and substrate concentration evolution and using different optimization algorithms (Chen et al., 2004;Lim, 1987, 1992;Sarkar and Modak, 2003); ii) lac operon dynamics models, including IPTG transport (Noel et al., 2009;Santill an and Mackey, 2004;Vilar et al., 2003); iii) nonsegregated (Kavanagh and Barton, 2008;Lee and Ram ırez, 1992;Ruiz et al., 2011) or segregated models (Fan et al., 2007;Zheng et al., 2005) describing the effect of protein production on microbial growth.
Nevertheless, until now, there is not a model of the whole process able to predict protein productivity as well as simulate the time behavior of all macroscopic process variables. The main limitations of the previously mentioned approaches come from the difficulty in quantifying two main aspects. First, the determination of intracellular IPTG concentrations has been mostly modeled without direct inducer measurements, and the transport mechanisms have been derived employing indirect measurements. Second, although the bistability behavior of the lac operon has been realized (Laurent et al., 2005), the relationship between inducer concentration and protein expression rate is still contentious affecting proper protein expression prediction as well as quantification of the deleterious effect of induction and protein production on growth rates.
Most of the published works on lac operon are focused on the different elements (operator, repressor, lacZ, lacY, lacA and inducers) separately or proposing theoretical models with little direct experimental data. In addition, few works are concerned with lac operon derived systems with plasmids potentially carrying additional copies of its constituents-i.e. lacI (repressor) gene or operator site sequences. On the other hand, the relationship between IPTG concentration (extracellular or intracellular) and protein production is quite complex (Nadri et al., 2006) and IPTG dosage is a determinant factor in the overexpression of recombinant proteins (Donovan et al., 1996;Durany et al., 2004), as intracellular IPTG concentration contributes to the probability of IPTG to bind the repressor molecules (Vilar et al., 2003).
With regards to inducer transport, our research group has recently developed a model to describe and simulate intracellular IPTG concentrations with time (Calleja et al., 2014), this model was calibrated with previously published experimental IPTG measurements (Fern andez et al., 2010).
On the other hand, it is worth to mention a model in literature (Ceroni et al., 2010) describing the transcription of the lacI gene, producing repressor molecules. It is an example of unsegregated and unstructured model: cellular population differences are not considered, and it is supposed that all the reactions, production and degradation of the different components take place in the same block. Ceroni's approach includes equations for the gene transcription, as well as expressions for the production of free repressor (not bound to IPTG) and repressor-IPTG molecules, as a function of the initial IPTG concentration added to the culture. This model allows the calculation of a time evolution of the repressor-IPTG molecules that are going to be used in the protein production model. Moreover, the production of a green fluorescence protein is described, making a relationship between the IPTGrepressor binding and the production of recombinant proteins. However, Ceroni's model employs the initial IPTG concentration as variable, which will likely limit the utility of the model, instead of using the intracellular IPTG concentration evolution along time.
The aim of this paper is to report an overall model able to describe the whole behavior of a fed-batch IPTG-induced protein production process by simulating the variation of the major macroscopic properties: biomass and substrate concentration, volume, medium and intracellular inducer concentration, and recombinant protein concentration (and activity in case of enzymes), with time. Such a model was built by coupling IPTG uptake model of Calleja to a biomass growth model (Ruiz et al., 2011) and to a newly proposed protein production model. The protein production model was built by modifying Ceroni et al's approach to include the calculated intracellular IPTG concentration.
The experimental system for calibration and validation of the overall model is the exponential fed-batch production of rhamnulose-1-phosphate aldolase (RhuA) by a K-12 derived E. coli strain, under different induction and growth conditions. The validity of the proposed protein production model will be extended to simulate the behavior of E. coli BL21 strains (one of the most employed hosts for protein expression) producing v-transaminases and fructose-6-phosphate aldolase (FSA).

Culture Media and Fermentation Conditions
Lysogeny Broth (LB), containing 10 g Á L À1 of peptone, 5 g Á L À1 of yeast extract and 10 g Á L À1 of NaCl, was used for all pre-inoculum growth. It was sterilized by autoclaving (120 C, 30 min). When necessary, it was supplemented with antibiotic: 0.1 g Á L À1 ampicillin for M15 and BL21 (DE3) FSA and 0.1 and 0.05 g Á L À1 kanamycin for M15 and BL21 (DE3) ATA, respectively. Stock solution of kanamycin was prepared with a concentration of 100 g Á L À1 and stored at À20 C. Ampicillin at 100 g Á L À1 concentration in 50% (v/v) ethanol-water stock was prepared and stored at À20 C. Kanamycin and ampicillin stocks were sterilized by filtration (0.20 mm, Minisart 1 NY25. Sartorius Stedim).
Feeding medium composition for fed-batch growth is also shown in Table I. In order to avoid co-precipitation with magnesium salts, 5 mL of phosphates solution (500 gL À1 K 2 HPO 4 and 100 gL À1 KH 2 PO 4 ) were added manually to the culture every 30 OD 600 increment. The solution was autoclaved (120 C, 30 min) for its sterilization. IPTG stock was prepared at 100 mM, sterilized by filtration (0.20 mm, Minisart 1 NY25. Sartorius Stedim) and stored at À20 C.
From cryostock (À80 C) in commercial Cryobilles (AES Chemunex), strains were grown in Falcon tubes containing 10 mL of LB medium and the supplementary antibiotic. Growths were performed at 37 C and 150 rpm, reaching 1.8 to 2.1 units of optical density (OD 600 ) after around 16 hours of incubation. Then, 5 mL of pre-inoculum were transferred to shake flasks, containing 100 mL of DM and with same cultivation conditions as preinoculum cultures, until reaching 1.2 OD 600 , that ensures the culture to be in the exponential growth phase.
For the batch phase, 80 mL of inoculum were added to 720 mL of DM. Bacterial growth was carried out in a Biostat 1 B bioreactor (at UAB) with 2 L jar for E. coli M15 strain and in a New Brunswick BioFlo 1 fermenter (at the University of Sydney) with 3 L vessel for E. coli DE3 strains. The bioreactors were equipped with stirring, temperature, dissolved oxygen and pH controllers. Temperature was maintained at 37 C and pH was kept at 7.00 AE 0.05 by adding 15% (v/v) NH 4 OH solution. Dissolved oxygen value was maintained at 60% of saturation by adapting the stirring speed between 350 and 1120 rpm and supplying 1.5 L Á h À1 of air (enriched with pure oxygen when necessary, only at UAB). A reduction in oxygen consumption and an increase in pH can be used as identification for the batch phase ending. For the glucose limited feeding, a microburette at UAB, and a peristaltic pump at the University of Sydney, was used. Feeding rate was performed according to a predefined exponential feeding profile based on mass balances and substrate uptake (Pinsach et al., 2006), keeping the specific growth rate at a fixed value, m fix . When culture reached the desired biomass concentration, a pulse of IPTG was added, in order to have the desired inducer concentration into the reactor.

Biomass
Cellular concentration was determined by optical density OD 600 , using a spectrophotometer (Uvicon 941 Plus, Kontrol at UAB and a Cary 50 at the University of Sydney). OD 600 values were converted to biomass concentration expressed as Dry Cell Weigh (DCW), with 1 OD 600 equivalent to 0.3 gDCW L À1 (Pinsach et al., 2008a).

Glucose
Biomass was removed from a 1.5 mL sample of broth culture by centrifugation (13400 rpm, 5 min) and filtration (0.45 mm). Glucose concentration in culture supernatant was determined enzymatically using a YSI 2700 (Yellow Spring Instruments).

Protein
Samples from bioreactor were diluted with milliQ water to an OD 600 of 4. One milliliter of diluted sample was centrifugated (13400 rpm, 5 min) and the pellet resuspended with 1 mL of lysis buffer (100 mM Tris-HCl; pH ¼ 7.5; OD 600 ¼ 4). Cell suspension was kept in ice, and sonicated in order to disrupt the cells. Sonication consisted in 4 cycles of 15 s pulses at 50 W (with 2 min between pulses), using a VC50 (Vibracell 1 , Sonics & Material) with microtip probe. Cellular debris was separated from supernatant by centrifugation (14000 rpm, 10 min, 4 C).
Total protein content in cells was determined by Bradford method, using Coomasie 1 Protein Assay Reagent Kit (Pierce, Thermo Fisher Scientific, Rockford, IL), in a microplate system (Microtiter Plate Flat, SUDELAB 900011). The percentage of RhuA amongst the rest of intracellular soluble proteins was determined by using NuPAGE 1 12% Bis-Tris gels, following the manufacturer's manual (Invitrogen) and quantified by using Image Lab© (Bio-Rad Laboratories). RhuA, FSA and ATA activity assays were carried out as previously described (Casablancas et al., 2013;S anchez-Moreno et al., 2012;Vidal et al., 2003). UV/Vis detector operating at 210 nm wavelength, coupled to a mass Spectrometer, equipped with ESI (Electro Spray Ionization) (Shimadzu) interface and a single quadrupole. Sample preparation, chromatographic and Mass Spectrometer conditions are described elsewhere (Fern andez et al., 2010).

Modeling and Parameter Estimation
Model parameters' estimation and simulation were carried out using PSE gPROMS 1 ModelBuilder. MathWorks TM Matlab 1 2012b was used to adjust experimental data by splines when needed.

Results and Discussion
Model Development The proposed model is an unsteady, unsegregated, unstructured and based on first principles model composed by three blocks as depicted in the scheme shown in Figure 1. In the last block, specific protein production profile-expressed either in mass units (P) or activity units (U) per gram of cells-is calculated as a function of the intracellular inducer concentration ([IPTG] i ) and the repressorinducer dynamics. The inducer uptake model block calculates these dynamics from biomass concentration and total volume that are outputs of the growth model block. Differential equations of the model are presented in Table II and the values of the parameters in  Table III. These parameter values have been sequentially determined in each block and kept constant in the next blocks.

Biomass Growth Model
For non-induced growth phase, the equations in this block need, as input, the initial batch biomass concentration (X 0 ), the initial volume (V 0 ), the initial substrate concentration (S 0 ), the fixed specific growth rate for the fed-batch (m fix ) and the glucose concentration in the feeding solution (S f ) (Ruiz et al., 2011).

Biomass growth model
ks2þ ½IPTG e;0 X ind À0:4 Á g s ; g S ð0Þ ¼ 1 Protein production model Equations (1), (4-a) and (5-a) correspond to volume (V), biomass (X) and substrate (S) balances and equation (6) to substrateinhibited growth kinetics for specific growth rate (m). The model includes equation (2-a) to determine the exponential fed-batch addition flow rate of feeding solution (F S ) and equation (3) to calculate the alkali addition flow rate (F B ) for pH control. Using these equations, a prediction of biomass, volume, and substrate dynamics is obtained for batch and fed-batch phases of the noninduced culture. Non-induced fed-batch phase starts when the initial glucose of the batch phase has been completely consumed. Induced phase starts when the culture reaches the desired biomass concentration (X ind ) and IPTG pulse addition is done. For induced growth, equations (2-a) and (5-a) are substituted with (2-b) and (5-b) and equation (4-a) is replaced by (4-b) that needs (4-b.1) as complementary equation to take into account the decrease of specific growth rate due to the metabolic burden caused by overexpression of recombinant protein using a shock function, g s (Lee and Ramirez, 1992). Initial conditions for induced growth are the values obtained in the non-induced growth phase when biomass reaches the desired concentration (X ind ).

IPTG Uptake Model
The IPTG uptake model (Calleja et al., 2014) is employed here with some improvements, which slightly modify the values of the parameters. It requires as inputs the biomass and IPTG L Á (gDCW Á L À1 ) À1 Fed-batch: 1.8 Á 10 À3 Induced phase: 3.9 Á 10 À6 k c a b 0.213 AE 2 Á 10 À3 h À1    (7) and (8) predict the time evolution of total volume of cells, V cel (L cell ) and medium volume, V m (L m ), using a specific cell volume of 0.0023 L cell Á g À1 DCW (Bennett et al., 2008). Equations (9) and (10) are the material balances for IPTG (e:extracellular; i:intracellular, mM) as a function of IPTG transport rate, r (mM Á h À1 ) and a dilution term. Finally, equation (11) represents the IPTG transport rate described by a diffusion term plus a non-specific active transport and a specific active transport.
Using equations (7) to (11) it is possible to calculate the intracellular IPTG concentration evolution along time, which is a key variable in protein production. Intracellular IPTG molecules can bind the repressor, making RNA polymerase able to transcribe the target gene and producing the target protein, RhuA in this case.

Protein Production Model
Using a modification of Ceroni's model (Ceroni et al., 2010) it is possible to estimate the concentration of free and IPTG-bound repressor molecules with time and, from this information, the specific protein amount, either in mass (P) or activity (U) units, can be predicted.
Equation (12) calculates the number of LacI mRNA molecules (M, molecules Á cell À1 ) into the cell along time. For its calculation, the transcription rate (a mRNA , h À1 ), number of lacI gene copies-pREP4 plasmid copy number (N, plasmid Á cell À1 ) in this case-and the degradation rate of LacI mRNA molecules (l mRNA , h À1 ) are needed. N has been assumed almost constant along growth according to experimental determinations (Pinsach et al., 2008b).
The amount of free repressor (R free , molecule Á cell À1 ) and repressor bound to IPTG (R IPTG , molecule Á cell À1 ) are calculated using equations (13) and (14), respectively. The original equations used by Ceroni et al. have been modified to introduce modelcalculated dynamics of intracellular concentration of IPTG from equation (10). In these equations, the values found in literature (Ceroni et al., 2010) for the IPTG-LacI equilibrium binding constant (K R-IPTG , mM), the protein degradation rate (l R , h À1 ), the protein transcription rate (a R , h À1 ) and the cooperativity of the binding LacI-IPTG (n) have been used (Table III), because they depend on the interaction between IPTG and repressor. Otherwise, the value for the time constant of LacI binding to the operator (t R-O , h À1 ) will be a model-fitting parameter because it depends on the operator used in the particular expression system and has to be fitted to experimental data.
Equation (15) is a mass balance for the specific protein amount in mass units (P, mgRhuA Á g À1 DCW), which is a function of the specific protein production rate (q p , mgRhuA Á g À1 DCW Á h À1 ) and a dilution term. The observed exponential decay for the specific production rate is described by equation (16) taking as initial condition q p0 that is calculated by equation (17) that takes into account the experimentally observed dependencies on m fix and [IPTG] e,0 /X ind (Ruiz et al., 2011).
The production of protein in activity units-equation (18)-is related to equation (15) through the protein quality (K q , U Á mg À1 RhuA). Experimental values of K q ranged from 4.5 to 7.5 U Á mg À1 RhuA and did not show a clear dependency on any other variable, instead they show a Gauss-like distribution around a value of 6. Because of that K q was considered as a constant parameter with a value to be found in the model fitting step.

Model Fitting
A set of 15 E. coli fed-batch experiments has been employed to get experimental data for the recombinant rhamnulose-1-phosphate aldolase production process, monitoring their performance by determining specific protein in mass and in activity along time. The target aldolase was intracellularly overexpressed, representing up to 30% of the total protein content. To give the model the capability to predict the dynamic behavior of a specific protein, a wide range of experimental conditions must be used in the fitting process, expecting any experiment located into the defined space to follow the prediction of the model. The selected experimental variables were inducer and biomass concentration at induction time and the specific growth rate in the fed-batch phase of the culture because they are key in protein production (Ruiz et al., 2011). The inducer range was from 8 mM to 1 mM, 4 mM being the minimum concentration for RhuA overexpression (Pinsach et al., 2008a). Biomass concentration at induction time ranged from 8 to 47 gDCW Á L À1 , being the later the maximum allowing a final biomass concentration compatible with the aeration capacity of the fermentor. Finally, specific growth rate ranged from 0.1 to 0.22 h À1 , avoiding too slow processes and excessive acetic acid production (Ruiz et al., 2009).
The numerical values for the parameters corresponding to the two first blocks of the model have been determined previously and are listed in Table III. The protein production block-equations (12) to (18)-has only four unknown parameters to be fitted (N is known). K P1 and K P2 are related to specific protein production rate in mass units, K q is the protein quality, and t R-O is the time constant for the linkage between the repressor and the operator, which depends directly on the expression system, or, more specifically the operator-repressor bond and so is operator specific. The values for all other parameters in the protein block are also listed in Table III. The mathematical model was fitted to the experimental data of aldolase concentration and activity evolution with time for the mentioned 15 fed-batch experiments. The values found for the four   Table III. The estimated value for specific activity per unit mass of aldolase, K q , was 6.10 U Á mg À1 in agreement with the experimental values.
The predictions of the model reproduced properly the experimental results. Figure 2 shows the parity plots for specific protein in mass and in activity units in order to give an overall idea of the goodness of the fitting.

Model Validation
Once the model was calibrated, a validation of the whole model was conducted from the beginning of the batch to the end of induction phase. For that purpose, a new experiment-using different conditions at the beginning of induction phase than the ones of the fitting experiments-was performed. The experimental conditions were selected to be within the experimental space used and are depicted in Table IV, which also shows the initial values for the different input variables. In the induction phase, the initial values of M and R free were calculated from equations (12) and (13) at equilibrium, when degradation rate equals formation rate and time derivative is zero. On the other hand, the initial values for P ind and U ind are the mean experimental values of the basal specific protein in mass or activity units, respectively. Figure 3A-E show the model prediction and experimental data of the different process variables. The model is able to predict the experimental data well, confirming the usefulness of the developed simulation model. The prediction capability of the built model has been tested by extending the protein production simulation to two different E. coli BL21 strains expressing v-transaminases (ATA) and fructose-6phosphate aldolase (FSA). In these cases, experimental biomass concentration and total volume data have been used as inputs of the IPTG uptake and protein production models.
E. coli BL21 (DE3) strain carries LacI repression system in its genome (Jeong et al., 2009;Lebedeva et al., 1994;Xu et al., 2012). Moreover, in the target strains, the inserted plasmids also contain additional copies of the lacI gene. As both BL21 based expression systems use the same repressor as the M15 strain producing RhuA, the parameters describing the interaction between IPTG and repressor molecules (K R-IPTG , n) and the ones describing LacI production and degradation (a mRNA , l mRNA , a R , l R ) were kept at their former values.
For each case, five parameters of the model had to be estimated because they are depending on the specific expression system or protein: i) N, the number of copies of lacI gene for both BL21 (DE3) strains; ii) t R-O , the time constant for the linkage between the repressor and the operator; iii) K P1 and K P2 , the parameters related to specific protein production, and iv) K q , the specific activity of the protein produced.
For both strains, three fed-batch production experiments at different operational conditions were conducted, detailed in Table V. It should be noted that all values of variables ([IPTG] e,0 , X ind and m fix ) are inside the experimental space employed for model calibration.
The model estimation of the different parameters is also shown in Table V. As can be seen, the obtained N value for both strains is much higher than was the case for the M15, meaning that a large number of copies of lacI gene is present into cells (it can be related to a large number of copies of the corresponding plasmid). This fact, taking into account that the number of copies of the gene present in the genome is the same for both BL21 (DE3) strains, means that the number of copies of plasmid in both strains is similar. The values of the remaining parameters are harder to explain, because they are depending on the particular expression system. Figures 4 and 5 present the results of the fitting using the value of the parameters presented in Table V. It can be seen that the model is again able to properly predict specific protein production both in mass and activity for the two E. coli BL21 (DE3) strains.

Conclusions
A macroscopic model for E. coli production of protein in fed-batch cultures has been developed. The model is built with three blocks of equations describing growth, IPTG and repressor behavior with time. The parameters of the model have been fitted and validated for RhuA production in E. coli M15 DglyA.
The model has subsequently been tested for the production of FSA and ATA in E. coli BL21(DE3). In these cases only the five parameters that are strain dependent have been tuned.
In all cases the agreement between experimental and predicted values was good and the mathematical model can be considered a powerful tool for simulation of such processes with potential exploitation in process control. plasmid copy number (plasmid Á cell À1 ) n cooperativity of the binding lacI-IPTG P specific protein in mass units (mgRhuA Á g À1 DCW) P ind basal specific protein in mass units (mgRhuA Á g À1 DCW) q p specific protein production rate (mgRhuA Á g À1 DCW Á h À1 ) q p0 initial specific protein production rate (mgRhuA Á g À1 DCW Á h À1 ) r net transport rate (mmol IPTG Á h À1 Á L À1 medium ) R free free repressor (molecule Á cell À1 ) R IPTG repressor bond to IPTG (molecule Á cell À1 ) S glucose concentration (g Á L À1 ) S 0 initial batch glucose concentration (g Á L À1 ) S f glucose concentration in the feeding solution (g Á L À1 ) t time (h) t ind time since induction (h) U specific protein in activity units (U Á g À1 DCW) U ind basal specific protein in activity units (U Á mg À1 RhuA) V total volume (L) V 0 initial batch total volume (L) V cel volume of cells (L cell ) V m culture medium volume (L medium ) X biomass concentration (g DCW Á L À1 ) X 0 initial batch biomass concentration (gDCW Á L À1 ) X ind biomass concentration at induction (gDCW Á L À1 ) Y XS glucose biomass-substrate yield (gDCW Á g À1 ) (Y XS ) ap glucose apparent biomass-substrate yield (gDCW Á g À1 ) a mRNA rate of formation of M (h À1 ) a R repressor protein synthesis rate (h À1 ) l mRNA mRNA degradation rate (h À1 ) l R repressor protein degradation rate (h À1 ) m specific growth rate (h À1 ) m fix fixed specific growth rate during the fed-batch (h À1 ) m max maximum specific growth rate for M15 strain (h À1 ) t R-O time constant for the repressor-IPTG bond (h À1 ) This work was supported by the Spanish MICINN, project number CTQ2011-28398-CO2-01. The UAB authors are members of the Research group 2014SGR452 and Biochemical Engineering Unit of Reference Network in Biotechnology (XRB), Generalitat de Catalunya.