
WCCI 2010 IEEE World Congress on Computational Intelligence

July, 18-23, 2010 - CCIB, Barcelona, Spain CEC IEEE

Parameter Estimation with Term-wise Decomposition in Biochemical Network GMA Models by
Hybrid Regularized Least Squares-Particle Swarm Optimization
Prospero C. Naval, Jr., Luis G. Sison, and Eduardo R. Mendoza

Prospero C. Naval, Jr. is with the Department of Computer Science, University of the Philippines, Diliman, Quezon City, Philippines (email: pcnaval@dcs.upd.edu.ph).
Luis G. Sison is with the Electrical and Electronics Engineering Institute, University of the Philippines, Diliman, Quezon City, Philippines (email: sison@eee.upd.edu.ph).
Eduardo R. Mendoza is with the Physics Department and Center for NanoScience, Ludwig Maximillians University, Munich, Germany (email: mendoza@lmu.de).

Abstract: High-throughput analytical techniques such as nuclear magnetic resonance, protein kinase phosphorylation, and mass spectroscopic methods generate time-dense profiles of metabolites or proteins that are replete with structural and kinetic information about the underlying system that produced them. Experimentalists are in urgent need of computational tools that will allow efficient extraction of this information from these time series data.
A new parameter estimation method is described for biochemical systems formulated as Generalized Mass Action (GMA) models, which are known to capture the nonlinear dynamics of complex biological systems such as gene regulatory, signal transduction and metabolic networks. For such models, it is known that parameter estimation algorithm performance deteriorates rapidly with increasing network size. We propose a decomposition strategy that breaks up the system equations into terms whose rate constants and kinetic order parameters are estimated one term at a time, resulting in dramatic reductions in parameter space dimensionality. This approach is demonstrated in a hybrid algorithm based on Regularized Least Squares Regression and Multi-objective Particle Swarm Optimization. We validate our proposed strategy through the efficient and accurate extraction of GMA model parameter values from noise-free and noisy simulated data for Saccharomyces cerevisiae and actual Nuclear Magnetic Resonance (NMR) data for Lactococcus lactis.

I. INTRODUCTION
Biochemical systems parameter estimation has risen to prominence in recent years due to its fundamental role in biological network reconstruction and modeling. The challenge of extracting the parameter values of a nonlinear biochemical model from data becomes even more pressing as high-throughput methods begin to deliver their promise as high resolution tools for biological experimentation.
Biochemical Systems Theory (BST) has been advanced as a convenient mathematical framework for modeling, analysis, optimization and manipulation of complex biological systems. BST views a biochemical system as a set of processes representable as products of power-laws in their inputs whose dynamics can account for all observed biochemical responses. The differential equations describing these processes have two formats in BST: the Generalized Mass Action (GMA) and S-System formulations. GMA systems, which include S-systems as special cases, have parameters that map one-to-one onto the network's topological and regulatory features [34].
A GMA system is described by the following set of coupled differential equations:

\[ \dot{X}_i = \sum_{m=1}^{k} \left( \pm\gamma_{im} \prod_{j \in r_{im}} X_j(t)^{f_{ijm}} \right), \quad t \in [t_0, t_N], \quad i = 1, \ldots, n \]

where the positive rate constants $\gamma_{i1}, \ldots, \gamma_{im}, \ldots, \gamma_{ik}$ quantify the magnitudes of fluxes of the k production/consumption reactions, and $f_{ij1}, \ldots, f_{ijm}, \ldots, f_{ijk}$ are the kinetic orders describing the inhibitory/activating influence of species j on species i in reaction m. Sets $r_{i1}, \ldots, r_{im}, \ldots, r_{ik}$ represent the indices of the reacting species involved in each reaction m.
With appropriate algorithms (e.g. [12], [31], [32]), the parameter values of a biochemical network may be extracted from time course measurements, which can be considered as perturbations from some mean state of the network. For certain problems such as metabolic network modeling, the parameter estimation task is simplified since the complete interaction structure is known a priori. On the other hand, reconstruction of gene regulatory networks and signal transduction networks is often a network inference problem where determination of the interaction structure has to be done in concert with parameter estimation.
A variety of BST parameter estimation methods have been proposed in recent years, with the majority of authors preferring stochastic search over deterministic techniques. Stochastic search approaches are highly robust but sacrifice execution speed, in contrast with faster deterministic methods which, however, frequently encounter great difficulty in arriving at suitable solutions for systems with a large number of variables [21]. Stochastic search methods include approaches originally devised for discrete-valued problems such as Genetic Algorithms [33], [15], [14], [31], Genetic Programming [5], [16], Memetic Algorithms [30], Co-evolutionary Algorithms [17] and for continuous-valued parameter spaces such as Particle Swarm Optimization [31] and Simulated Annealing [12].
Parameter estimation has been achieved with varying levels of success for the following deterministic methods: Nelder-Mead [29], Jacobian Linearization [18], Regression and extensions [20], [6], [36], Branch and Bound [25], Newton-Flow [19], and Constraint Propagation [32].
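To make the GMA form from the Introduction concrete, the right-hand side of each equation can be evaluated as a signed sum of power-law products. The following is a minimal Python sketch under assumed conventions: the function name `gma_rhs`, the `(sign, gamma, orders)` term layout, and the two-species system with its numbers are hypothetical illustrations and are not taken from the paper.

```python
def gma_rhs(x, terms):
    """Evaluate dX_i/dt for every equation of a GMA system.

    x     : list of current concentrations X_j (assumed positive)
    terms : terms[i] is a list of (sign, gamma, orders) tuples for equation i,
            where sign is +1 or -1, gamma is the rate constant gamma_im, and
            orders maps a species index j to its kinetic order f_ijm
    """
    dxdt = []
    for eq_terms in terms:
        total = 0.0
        for sign, gamma, orders in eq_terms:
            flux = gamma
            for j, f in orders.items():
                flux *= x[j] ** f          # power-law factor X_j^f_ijm
            total += sign * flux           # signed production/consumption term
        dxdt.append(total)
    return dxdt


# Hypothetical two-species system (values invented for illustration only):
#   dX0/dt = 2.0 * X1^0.5 - 1.0 * X0
#   dX1/dt = 1.0 * X0     - 0.5 * X1^2
terms = [
    [(+1, 2.0, {1: 0.5}), (-1, 1.0, {0: 1.0})],
    [(+1, 1.0, {0: 1.0}), (-1, 0.5, {1: 2.0})],
]
rates = gma_rhs([4.0, 9.0], terms)
print(rates)
```

Keeping each term as its own tuple mirrors the term-wise view of the model: one term's rate constant and kinetic orders can be changed without touching the others.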

978-1-4244-8126-2/10/$26.00 © 2010 IEEE
In this paper, we describe a term-wise decomposition strategy that reduces the number of variables to be estimated by the estimator at any one time. We illustrate this strategy with a parameter estimation method based on a swarm intelligence algorithm hybridized with least squares regression. Although developed for GMA models, this algorithm is equally applicable to S-systems. We successfully validate our method on datasets from two metabolic systems: a 5th-order GMA model for the Glycolysis Pathway in Saccharomyces cerevisiae [34] and the very challenging 7th-order GMA Model for the Glycolysis Pathway in Lactococcus lactis [9], [39].

II. PARAMETER ESTIMATION FOR BST MODELS
BST parameter estimation is often cast as an optimization problem over a continuous variable space where the solution is obtained through minimization of the difference between computed model outputs and experimentally derived data. Even for noise-free data, it is an inherently challenging task saddled with the following inter-related difficulties: local minima trapping, objective function multi-modality, a large number of variables to estimate and excessive computational effort. The difficulty of the BST parameter estimation task increases exponentially with network size owing to the geometric property of space that hypervolume grows exponentially with increasing dimensionality [2], and is further aggravated by the kn(n + 1) unknown parameters of an n-differential equation system that the function optimizer has to consider. We propose a term-wise decomposition strategy that circumvents this problem by breaking up the system equations into terms and estimating the unknown variables one term at a time.
We provide in this section a general description of Multi-objective Particle Swarm Optimization, which will be combined with Regularized Least Squares Regression to produce our proposed hybrid method. Multi-objective Particle Swarm Optimization is used to minimize two objective functions: the sum of GMA term error residual and the sum of slope error residual.
For GMA models, nonlinear regression can be achieved by fitting a linear regressor in a transformed input space. In our method, data points are first mapped from the original input space into a logarithmic space where the input-output dependence of data becomes linear. Regularized Least Squares Regression is then used to obtain robust parameter estimates from noisy slopes and time-course data. For a description of Regularized Least Squares Regression, the reader is referred to excellent reviews such as [3].
A description of the Hybrid Regularized Least Squares-Particle Swarm Optimization (HRLS-PSO) Algorithm is also presented.

A. Multi-Objective Particle Swarm Optimization
Multi-objective optimization is the systematic procedure of simultaneously optimizing a collection of objective functions. It deals with interacting and even conflicting objective functions to generate a set of non-dominated solutions which constitute the Pareto front. A solution is said to be non-dominated when any improvement on one of its objectives will only worsen at least one other objective.
Particle swarm optimization (PSO) is a high performance population-based optimizer with the following desirable properties: algorithmic simplicity, computational efficiency, and small memory footprint. It has its origins in studies of synchronous bird flocking and fish schooling when investigators realized that their simulation algorithms exhibited optimization properties [10]. In PSO, a potential solution is represented as a particle. A population consists of a swarm of particles that fly through search space, probing it through the objective function. Positional changes of the individual particles are controlled by three factors: the particle's current motion, its memory influence, and swarm influence as determined by its topological neighborhood. Particles collaborate with their neighbors through communication of good positions and adjust their positions based on these desirable states. When a particle discovers a promising new solution, the region around that potential solution is explored further by the swarm. PSO has been extended to accommodate several objectives, thus enriching our repertoire of multi-objective optimization algorithms.
A multi-objective algorithm has to achieve three important goals [40]: approximate the Pareto front as closely as possible, maximize the number of elements in the Pareto set found, and maximize the spread of the solutions found. Although a variety of multi-objective particle swarm optimization algorithms exist [28], we chose the Multi-Objective Particle Swarm Optimization with Crowding Distance Algorithm (MOPSO-CD) [27] because it specifically aims to improve solution spread through minimization of crowding density among the non-dominated solutions.

B. Hybrid Regularized Least Squares-Particle Swarm Optimization
For parameter estimation, the GMA equations are decoupled through the replacement of the derivatives on the left hand side of a differential equation model with slopes derived from the time-course data profiles [38]. This transforms the differential equation model into a set of algebraic equations of sums of products of power-law functions on which non-linear regression may be performed [37]. Slope estimation procedures for time series data include the three-point method, splines [4], and artificial neural networks [1]. A GMA system model with n differential equations decouples into n smaller optimization problems which may still encounter difficulties for equations with several terms.
We propose a term-wise decomposition approach where the equations are broken down into individual terms whose rate constant and kinetic order parameters are estimated one term at a time. Thus, the unknown parameters to be determined at one time are much reduced in number. The term parameters are computed using Regularized Least Squares Regression after logarithmic linearization. Multi-Objective Particle Swarm Optimization computes the unknown parameters in the other terms. Regularization improves performance

of parameter estimation when there is noise in the slope values.

The Hybrid Regularized Least Squares-Particle Swarm Optimization (HRLS-PSO) Algorithm
For the Generalized Mass Action Model described as

\[ \dot{X}_i(t) = \sum_{m=1}^{k} \left( \pm\gamma_{im} \prod_{j \in r_{im}} X_j(t)^{f_{ijm}} \right), \quad t \in [t_0, t_N], \quad i = 1, \ldots, n \]

and given data (profiles) $[X_1(t), \ldots, X_j(t), \ldots, X_n(t)]$ and slopes $S_i(t) \approx \dot{X}_i(t)$ for $t \in [t_0, t_N]$, $i = 1, \ldots, n$, the HRLS-PSO Algorithm estimates the parameters $[\gamma_{im}, f_{ij_1m}, \cdots, f_{ij_pm}]$ of the m-th GMA term $\gamma_{im} X_{j_1}^{f_{ij_1m}} \cdots X_{j_p}^{f_{ij_pm}}$ in the i-th equation as follows:

1) Form the logarithm of the matrix:

\[ G_{im} = \begin{bmatrix} 1 & \log(X_{j_1}(t_0)) & \cdots & \log(X_{j_p}(t_0)) \\ 1 & \log(X_{j_1}(t_1)) & \cdots & \log(X_{j_p}(t_1)) \\ \vdots & \vdots & & \vdots \\ 1 & \log(X_{j_1}(t_N)) & \cdots & \log(X_{j_p}(t_N)) \end{bmatrix} \]

where $r_{im} = \{j_1, \cdots, j_p\}$. Here, the columns are the logarithms of the time courses of the inputs involved in the m-th GMA term. The first column of this matrix is set to unity.

2) Form the regularized matrix $A_{im}(\lambda_{im})$:

\[ A_{im}(\lambda_{im}) = (G_{im}^T G_{im} + \lambda_{im} I)^{-1} G_{im}^T \]

where $\lambda_{im}$ is a term-specific regularization parameter whose value will be computed by the MOPSO algorithm and I is the identity matrix.

3) Perform Least Squares Regression in Logarithmic Space
Define

\[ y_{i(m)}(t) = \log\left( S_i(t) - \sum_{\substack{w=1 \\ w \neq m}}^{k} \pm\gamma_{iw} \prod_{j \in r_{iw}} X_j(t)^{f_{ijw}} \right), \quad t \in [t_0, t_N] \]

Note that the m-th term is not included in the equation above. For noise-free data and exact slopes ($S_i(t) = \dot{X}_i(t)$), we can write

\[ y_{i(m)}(t) = \log(\gamma_{im}) + \sum_{j \in r_{im}} f_{ijm} \log(X_j(t)) = G_{im} g_{im} \]

where $g_{im} = [\log(\gamma_{im}), \cdots, f_{ijm}, \cdots]^T$ ($j \in r_{im}$) is the vector we wish to estimate. This quantity can be obtained by regression over N time points using the regularized matrix $A_{im}$:

\[ g_{im} = (G_{im}^T G_{im} + \lambda_{im} I)^{-1} G_{im}^T y_{i(m)} = A_{im}(\lambda_{im})\, y_{i(m)} \]

The Least-Squares Error for the GMA term is

\[ e_{im}(\lambda_{im}) = \| G_{im} g_{im} - y_{i(m)} \|^2 = \| G_{im} A_{im}(\lambda_{im})\, y_{i(m)} - y_{i(m)} \|^2 \]

4) Multi-Objective Logarithmic Space Particle Swarm Optimization
The regression equation above assumes that the values of the rate constants and kinetic orders of the other GMA terms are known. Unless they are available through a previous calculation, the values of these terms can be solved for using Multi-Objective Particle Swarm Optimization.
We thus have the following multi-objective optimization problem:
Find the values for the rate parameters $\gamma_{i1}, \cdots, \gamma_{i,m-1}, \gamma_{i,m+1}, \cdots, \gamma_{ik}$, the kinetic orders $f_{ij1}, \cdots, f_{ij,m-1}, f_{ij,m+1}, \cdots, f_{ijk}$ and the regularization parameter $\lambda_{im}$ that simultaneously optimize the following objective functions:

Objective Function 1 (Minimize GMA Term Error Residual):

\[ f_{obj1}(\lambda_i) = \log\left( \sum_{p=1}^{k} e_{ip}(\lambda_{ip}) \right), \quad t \in [t_0, t_N] \]

The regularization parameters $\lambda_{ip}$ for the GMA terms are computed with the aim of minimizing the sum of term errors within the i-th equation.

Objective Function 2 (Minimize Slope Error Residual):

\[ f_{obj2}(\gamma_i, f_i) = \log\left( S_i(t) - \sum_{\substack{w=1 \\ w \neq m}}^{k} \pm\gamma_{iw} \prod_{j \in r_{iw}} X_j(t)^{f_{ijw}} \right), \quad t \in [t_0, t_N] \]

subject to the following constraints:

\[ \lambda_{im}^{L} \le \lambda_{im} \le \lambda_{im}^{U} \]
\[ \gamma_{iw}^{L} \le \gamma_{iw} \le \gamma_{iw}^{U}, \quad w = 1, \ldots, m-1, m+1, \ldots, k \]
\[ f_{ijw}^{L} \le f_{ijw} \le f_{ijw}^{U}, \quad w = 1, \ldots, m-1, m+1, \ldots, k, \quad j \in r_{iw} \]

5) Particle Swarm Vector Modification
For the GMA term being computed, the estimated parameter vector $g_{im} = [\hat{\gamma}, \hat{f}]_i = [\log(\hat{\gamma}_{im}), \cdots, \hat{f}_{ijm}, \cdots]^T$ ($j \in r_{im}$) modifies the corresponding variables in the particle vector.

Steps 1 to 5 will produce the parameter estimates for the i-th equation. For the GMA system to be solved completely, multiple swarms corresponding to the different equations are run independently, and once all the swarms have converged the GMA model parameters are reported.

The HRLS-PSO Algorithm is easily implemented with MOPSO-CD for its function optimizer.
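Steps 1 to 3 of the algorithm can be sketched in a few lines of code. The Python example below is an illustration under stated assumptions, not the authors' implementation: the helper names `solve` and `fit_term`, the single-input test term, and all numeric values are hypothetical. Instead of forming the matrix A_im explicitly, it solves the equivalent regularized normal equations (G^T G + λI) g = G^T y with a small Gaussian elimination routine, so that only the standard library is needed.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            k = m[r][c] / m[c][c]
            for q in range(c, n + 1):
                m[r][q] -= k * m[c][q]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        s = sum(m[r][q] * x[q] for q in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_term(inputs, y, lam):
    """Regularized least squares for one GMA term in logarithmic space.

    inputs : list of time courses, one per species in r_im (all values > 0)
    y      : y_i(m)(t), the log of the slope minus the other terms
    lam    : regularization parameter lambda_im
    Returns (gamma_hat, kinetic_orders_hat, squared_error).
    """
    n_t = len(y)
    # Step 1: first column is unity, remaining columns are log time courses.
    G = [[1.0] + [math.log(series[t]) for series in inputs] for t in range(n_t)]
    p = len(G[0])
    # Steps 2-3: solve (G^T G + lam I) g = G^T y.
    gtg = [[sum(G[t][a] * G[t][b] for t in range(n_t)) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    gty = [sum(G[t][a] * y[t] for t in range(n_t)) for a in range(p)]
    g = solve(gtg, gty)
    # Term error e_im = ||G g - y||^2.
    resid = [sum(G[t][a] * g[a] for a in range(p)) - y[t] for t in range(n_t)]
    err = sum(r * r for r in resid)
    return math.exp(g[0]), g[1:], err


# Synthetic noise-free check with invented values: the term 3.0 * X^0.7.
xs = [0.5 + 0.25 * t for t in range(10)]
y = [math.log(3.0 * x ** 0.7) for x in xs]
gamma_hat, f_hat, err = fit_term([xs], y, lam=0.0)
_, _, err_reg = fit_term([xs], y, lam=1.0)
print(gamma_hat, f_hat, err, err_reg)
```

With λ = 0 the fit reduces to ordinary least squares and recovers the synthetic term; a positive λ trades a larger residual for a better conditioned solve, which is the quantity the swarm is left to tune in the algorithm described above.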

C. Algorithmic Properties
Regression analysis seeks to find a functional relationship for measurement data that will make minimal prediction errors for the function at any given arbitrary point. In our method, non-linear regression is achieved by first mapping the data points from the original non-linear input space into a logarithmic space where the data input-output dependence becomes linear, and then fitting a linear regressor in the transformed input space.
The parameter vector $q = [\gamma_{im}, f_{ij_1m}, \cdots, f_{ijm}, \cdots, f_{ij_pm}]$ of the m-th GMA term in the i-th equation is computed following a regularized least squares approach. In Tikhonov Regularization, the computation of this vector involves the minimization of the regularized residual of the linearized GMA term with noisy data. The Tikhonov Regularized solution is $(G_{im}^T G_{im} + \lambda_{im} I)^{-1} G_{im}^T y_{i(m)}$, which has a regularization parameter $\lambda$. Several methods are available for the selection of this regularization parameter. Among them are the L-curve technique [13], the Morozov Discrepancy Principle [22], and Generalized Cross Validation. In our parameter estimation method, the Particle Swarm Optimizer automatically determines the value of the regularization parameter $\lambda$.

III. NUMERICAL EXPERIMENTS
We evaluate the performance of HRLS-PSO on noise-free and noisy simulated data from a well-studied metabolic network and on in-vivo metabolic data taken from Nuclear Magnetic Resonance experiments on Lactococcus lactis.

The Glycolysis Pathway of Saccharomyces cerevisiae
We test our proposed algorithm on data generated by the GMA Model of the Yeast Glycolysis Pathway found in [34] and based on work by several groups [11], [7]. The differential equation model has the following dependent variables: X1 (Internal Glucose), X2 (Glucose-6-Phosphate), X3 (Fructose-1,6-diphosphate), X4 (Phosphoenolpyruvate), X5 (ATP). The GMA Model equations for this pathway are as follows:

\[ \dot{X}_1 = \gamma_{11} X_2^{f_{121}} X_6 - \gamma_{12} X_1^{f_{112}} X_5^{f_{152}} \]
\[ \dot{X}_2 = \gamma_{12} X_1^{f_{112}} X_5^{f_{152}} - \gamma_{22} X_2^{f_{222}} X_5^{f_{252}} - \gamma_{23} X_2^{f_{223}} \]
\[ \dot{X}_3 = \gamma_{22} X_2^{f_{222}} X_5^{f_{252}} - \gamma_{32} X_3^{f_{332}} X_5^{f_{352}} - \gamma_{33} X_3^{f_{333}} X_4^{f_{343}} X_5^{f_{353}} \]
\[ \dot{X}_4 = 2\gamma_{32} X_3^{f_{332}} X_5^{f_{352}} - \gamma_{42} X_3^{f_{432}} X_4^{f_{442}} X_5^{f_{452}} \]
\[ \dot{X}_5 = 2\gamma_{32} X_3^{f_{332}} X_5^{f_{352}} + \gamma_{42} X_3^{f_{432}} X_4^{f_{442}} X_5^{f_{452}} - \gamma_{12} X_1^{f_{112}} X_5^{f_{152}} - \gamma_{23} X_2^{f_{223}} - \gamma_{22} X_2^{f_{222}} X_5^{f_{252}} - \gamma_{51} X_5^{f_{551}} \]

Simulated biochemical profiles consisting of 50 points per profile were generated using the same parameter values as in the model of [34] (see the True Value column of Table I for these values). We chose the following initial condition and ten different inputs ($X_6$) to generate ten sets of interesting profiles:

X(t0) = [0.022, 1.3, 9.4, 0.0086, 0.80];
X6 ∈ {15.0, 12.0, 10.0, 8.0, 6.0, 4.0, 16.0, 17.0, 18.0, 19.0}

The GMA system was decoupled and the derivatives were replaced with the values of slopes for each time point. The HRLS-PSO Algorithm was subsequently applied to the decoupled system of equations. The algorithm settings used were the following: mutation probability = 0.5, population size = 1000, number of generations = 100, archive size = 500. For this system, the first and fourth decoupled equations were processed independently since they do not have any common parameters. The second and fifth equations depend on the computational results of the first and fourth equations respectively to satisfy precursor-product constraints. The third equation was processed after the results of the second and fourth equations had been obtained.
The algorithm was provided the following a priori constraint information which defined the feasibility region of the parameters:
• Rate Parameters: (γ11, γ12, γ22, γ23, γ33, γ32, γ42, γ51) ∈ [0.0, 1000.0]
• Kinetic Parameters: (f112, f152, f222, f332, f352, f432, f442) ∈ [0.0, 1.0]; (f121, f252, f452) ∈ [−1.0, 0.0]; f551 ∈ [0.0, 2.0]; f223 ∈ [0.0, 50.0]
Most kinetic rate parameters were constrained to assume values within the recommended range [0.0, 1.0] for Michaelis-Menten processes following the suggestions of [34].

IV. DISCUSSION
Term-wise decomposition reduces the number of variables to be simultaneously estimated from the original 20 parameters to 2 and 3 parameters for the first and second terms in the first equation, and 3 and 4 parameters for the first and second terms in the fourth equation. Due to precursor-product constraints, estimates from the first and fourth equations are propagated to subsequent equations, thus simplifying the latter. Thus, for the second equation, the first term parameters are already known from a previous application of the algorithm on the first equation. Consequently, only 3 and 2 parameters for the second and third terms in the second equation are to be estimated. Similarly, only 1 parameter for the third term in the third equation needs to be computed. For the fifth equation, the first five terms have previously been computed and only the remaining 2 parameters in the sixth term need to be estimated.
In solving for the parameters of the first decoupled equation, the algorithm iterates on the equation

\[ S_1(t) = \gamma_{11} X_2(t)^{f_{121}} X_6 - \gamma_{12} X_1(t)^{f_{112}} X_5(t)^{f_{152}}, \quad t \in [t_0, t_N] \]

The parameters for the first term, namely $\gamma_{11}$ and $f_{121}$, are estimated first using regularized least squares regression while parameters of the second term are guessed using particle swarm optimization. The regression estimates are saved while the PSO values are eventually discarded. The second (and last) term parameters are computed next using least squares regression without any need for PSO computation

for the parameters. Thus, all final HRLS-PSO values are least squares regression estimates.
After convergence of the swarms, the parameter values obtained were very close to the original values (see Col. 3 of Table I). Recovered parameters fit the simulated time course data as shown in Fig. 1. To check for the consistency of results, we performed 50 trials on the same data, differing only in the particle swarm optimizer random number generator initialization values. For input X6 = 15.0, the errors for the estimates were negligible except for four values (γ23, γ33, γ51 and f551) which nevertheless yielded small errors (3.85%, -3.89%, -2.6% and 2.81% respectively). The largest parameter percentage errors were observed for input X6 = 6.0, which produced parameter domain errors of 167.39%, -40.44%, -14.77%, -2.62% and 2.32% for γ23, f223, γ33, γ51 and f551 respectively. Despite these large percentage values, the corresponding time domain errors for these parameters were negligible: 0.0086%, 0.00298%, 0.26%, 1.07%, and 0.5% respectively. The same trend was also observed for other input values. Thus, the profiles are insensitive to the values of these parameters. Convergence was always achieved and the standard deviations of the parameters from their mean values were very low.

TABLE I
AVERAGE PARAMETER ESTIMATE PERCENTAGE ERRORS FOR 50 RUNS OF PROPOSED HRLS-PSO ALGORITHM ON SIMULATED YEAST GLYCOLYSIS PATHWAY DATA.

Param   True Value   Noise-free   Noise-free   Noisy Data
                     X6 = 15.0    X6 = 4.0     σ = 0.10
γ11     0.8122         0.00        -0.01        -16.46
γ12     196.129        0.01        -0.01         38.57
f121    -0.2344        0.00         0.00         17.43
f112    0.7464         0.00        -0.01         18.21
f152    0.0243         0.00         0.00         17.16
γ22     16.5854        0.00        -0.05        -12.49
f222    0.7318        -0.01        -0.05         17.08
f252    -0.3941        0.00        -0.06         13.60
γ23     0.012879       3.85       167.39        336.26
f223    8.6107        -0.43       -40.44        -67.62
γ33     9.59175       -3.89       -14.77        -97.95
γ32     3.78146        0.02        -0.01        -65.63
f332    0.6159         0.00         0.00         45.12
f352    0.1308         0.00         0.00         21.79
γ42     325.08        -0.04         0.03        201.43
f432    0.05           0.17        -0.12        -98.34
f442    0.533         -0.02         0.01         57.68
f452    -0.0822       -0.04         0.03        114.92
γ51     25.1          -2.60        -2.62        -70.22
f551    1.0            2.81         2.32        162.50

The computation times for the five equations are listed in Table II. These values were obtained for a 3.0 GHz 64-bit Intel Xeon Mac XServe. Overall computation time can still be reduced by exploiting the natural parallelism in the processing. Computations for the following pairs of equations can proceed independently of each other: equations 1 and 4; equations 3 and 5. With the availability of multicore processors, these independent computations can run in parallel as separate execution threads. With this scheme, it is possible to cut further the computation time in half.

TABLE II
COMPUTATION TIME

Equation   min:secs
1          4:20
2          7:18
3          6:47
4          7:43
5          9:58
Total      36:06

Fig. 1. Yeast Glycolysis Pathway Model time course profiles with parameters obtained from our proposed Hybrid Regularized Least Squares-Particle Swarm Optimization Algorithm exhibit close fitting with the data points used to train the algorithm.

One possible disadvantage of the use of the decoupling strategy is that it may be overly sensitive to noise in the derivatives. For this reason, the HRLS-PSO Algorithm was tested on noisy slopes. In this new set of experiments, the algorithmic settings were the same as for the noise-free case except for the population size, which was doubled to 2000 particles.
Gaussian noise was added to the concentrations and slopes at each time point:

\[ X_i(t)_{noisy} = X_i(t)\,(1 + N(\mu, \sigma^2)), \quad t \in [t_0, t_N] \]
\[ S_i(t)_{noisy} = S_i(t)\,(1 + N(\mu, (2\sigma)^2)), \quad t \in [t_0, t_N] \]

where $N(\mu, b^2)$ denotes the normal distribution with mean $\mu$ and standard deviation b. Noise variances for slopes were four times higher than those for concentrations since slope estimates could deviate as much as 2σ away from their true values [19]. Noisy datasets were generated with μ = 0.0 and noise levels σ = 0.02, 0.04, 0.06, 0.08, 0.10. HRLS-PSO results show that terms with one and two kinetic rate parameters tend to manifest quadratic dependence of parameter error

with noise level. We observe that terms with three kinetic rate parameters are much more sensitive to noise than those with fewer than three.

The Glycolysis Pathway in Lactococcus lactis
We now test the usefulness of the proposed algorithm on the extraction of rate constant and kinetic order parameters from actual experimental data. Neves et al. [24] used nuclear magnetic resonance spectroscopy to study the sugar metabolism in Lactococcus lactis, and their data, which we use in our numerical experiments here, was made available through [39]. The GMA Model equations for the simplified Glycolysis Pathway in L. lactis are as follows [39]:

\[ \dot{X}_1 = -\beta_1 X_1^{h_{11}} X_2^{h_{12}} X_5^{h_{25}} \]
\[ \dot{X}_2 = \alpha_2 X_1^{h_{11}} X_2^{h_{12}} X_5^{h_{25}} - \beta_2 X_2^{h_{22}} ATP^{h_{2,ATP}} \]
\[ \dot{X}_3 = \beta_2 X_2^{h_{22}} ATP^{h_{2,ATP}} - \beta_3 X_3^{h_{33}} P_i^{h_{3,P_i}} NAD^{h_{3,NAD}} \]
\[ \dot{X}_4 = 2\beta_3 X_3^{h_{33}} P_i^{h_{3,P_i}} NAD^{h_{3,NAD}} + \alpha_4 X_5^{g_{45}} - \beta_4 X_4^{h_{44}} \]
\[ \dot{X}_5 = \beta_4 X_4^{h_{44}} - \alpha_2 X_1^{h_{11}} X_2^{h_{12}} X_5^{h_{25}} - \alpha_4 X_5^{g_{45}} - \beta_{51} X_3^{h_{513}} X_5^{h_{515}} P_i^{h_{51,P_i}} - \beta_{52} X_5^{h_{525}} \]
\[ \dot{X}_6 = \alpha_2 X_1^{h_{11}} X_2^{h_{12}} X_5^{h_{25}} + \beta_{51} X_3^{h_{513}} X_5^{h_{515}} P_i^{h_{51,P_i}} - \beta_{61} X_6^{h_{616}} X_3^{h_{613}} NAD^{h_{61,NAD}} - \beta_{62} X_6^{h_{626}} \]
\[ \dot{X}_7 = \beta_{61} X_6^{h_{616}} X_3^{h_{613}} NAD^{h_{61,NAD}} \]

In this model, the key metabolites and enzymes are: Glucose (X1), Glucose-6-Phosphate (X2), Fructose Bi-Phosphate (X3), 3-Phosphoglycerate (X4), Phosphoenolpyruvate (X5), Pyruvate (X6), and Lactate (X7), together with ATP, NADH, and inorganic Phosphate (Pi). Time series data from in vivo NMR experiments of 13C-labeled glucose in L. lactis were previously filtered using an artificial neural network and cubic splines. The slopes were subsequently computed using cubic splines in Matlab. The GMA differential equations above were decoupled, their left-hand side derivatives replaced with computed slopes, and processed using the proposed algorithm with the following constraints:
• Rate Parameters: (α2, β2, β3, β52) ∈ [0.0, 100.0]; (β51, β52, β62) ∈ [0.0, 10.0]; (α4, β4) ∈ [0.0, 5.0]
• Kinetic Parameters: (h33, h3,Pi) ∈ [−1.0, 0.0]; (h22, h2,ATP, h44, h525, h626, h513, h515, g45) ∈ [0.0, 5.0]; h51,Pi ∈ [−5.0, 0.0]
The HRLS-PSO settings used were the following: mutation probability = 0.5, population size = 1000, number of generations = 200, archive size = 500. Since the right hand sides of the first and seventh differential equations are monomials with constant coefficients, they only require standard least squares computation and will therefore always converge to the same unique solution (see Table III values for β1, h11, h12, h25 and β61, h616, h613). These parameter values were then used in equations 2 and 6, which were computed next. The third and fifth equations both depend on the computational results of the second and sixth equations. Calculations for the fourth equation wait for the results of the third and fifth equations before they can commence. Total computation time for this system was 41 min 8 secs.
The swarms produced the parameter estimates listed in Table III. These values were close to those of [39], which were obtained manually and with considerable effort using a software tool called WebMetabol together with the user's extensive knowledge of the domain.
The logarithms of errors for the second to the sixth equations are (-9.914045, -5.915990, +2.41424, +1.531676, +3.025575), indicating good fits for the profiles of G6P and FBP and poor fits for the 3PGA, PEP and Pyruvate profiles. Time course data fits with the GMA model for the six metabolites are shown in Fig. 2. Previous results obtained by Voit et al. [39] are shown for comparison purposes.
Substitution of the parameter values into the Glycolysis GMA Model with either kinetic orders h513, h51,Pi or both equated to zero, with the corresponding rescaling of the rate constant β51, yielded very similar predictions to the values of [39], thus validating our results.

TABLE III
PARAMETER ESTIMATES FOR THE LACTOCOCCUS LACTIS GMA MODEL (PARAMETERS IN BOLDFACE WERE OBTAINED USING STANDARD LEAST SQUARES REGRESSION)

Param    Voit et al   HRLS-PSO     Param    Voit et al   HRLS-PSO
α2       0.3592       0.287057     β51      0.9375       0.94035
h11      1.1287       1.1287       h513     0.8744       0.868567
h12      -1.2906      -1.2906      h515     0.0991       0.093747
h25      0.2168       0.2168       h51,Pi   -0.0005      -0.000487
β2       0.3115       0.43412      β52      0.2087       0.204364
h22      2.1700       1.973495     h525     0.0002       0.000201
h2,ATP   0.8152       0.814113     β61      0.0417       0.0417
β3       0.4698       0.475351     h616     0.6202       0.6202
h33      1.0297       0.9918895    h613     0.9263       0.9263
h3,Pi    0.2377       0.338727     β62      1.3258       1.0792
α4       1.1452       1.015756     h626     1.5255       1.51905
g45      3.5453       3.513723     β4       2.1670       2.547663
h44      2.1649       2.087733

V. CONCLUSION
In this paper, we have described a new hybrid parameter estimation algorithm for Generalized Mass Action (GMA) systems based on regularized least squares regression and multi-objective particle swarm optimization methods. Through a term-wise decomposition strategy in which the term parameters are estimated one term at a time, the algorithm circumvents the curse of dimensionality problem frequently faced by parameter estimation algorithms. Numerical experiments on simulated and actual experimental data show the effectiveness and accuracy of the algorithm.

VI. ACKNOWLEDGEMENTS
The authors would like to thank Prof. Eberhard O. Voit (Georgia Tech) for insightful comments and for providing us the Lactococcus lactis dataset, and Dr. Ricardo del Rosario (Max Planck Biochem and UP Diliman) for helpful discussions and suggestions.

Fig. 2. Time-course plots for Lactococcus lactis GMA model with parameters obtained from our proposed Hybrid Regularized Least Squares-Particle Swarm Optimization Algorithm. Previous estimates obtained by Voit et al. [39] are plotted as dashed lines. Data are shown as dots.

REFERENCES

[1] Almeida, J., and Voit, E.O. (2003). Neural-network based parameter estimation in complex biomedical systems. Genome Inform., 14, 114–23.
[2] Bellman, R.E. (1961). Adaptive Control Processes. Princeton University Press, Princeton, NJ.
[3] Björkström, A. (2001). Ridge regression and inverse problems. Research Report in Mathematical Statistics, Stockholm University, 2000:5.
[4] Chen, L., Bernard, O., Bastin, G., and Angelov, P. (2000). Hybrid modeling of biotechnological processes using neural networks. Control Eng. Pract., 8, 821–27.
[5] Cho, DY., Kwang-Hyun, C., and Byoung-Tak Z. (2006). Identification of biochemical networks by s-tree based genetic programming. Bioinformatics, 22(13), 1631–1640.
[6] Chou, IC., Martens, H., and Voit, E.O. (2006). Parameter estimation in biochemical system models with alternating regression. Theor. Biol. Med. Model., 3(25).
[7] Curto, R., Sorribas, A., and Cascante, M. (1995). Comparative characterization of the fermentation pathway of Saccharomyces cerevisiae using biochemical systems theory and metabolic control analysis: model definition and nomenclature. Math. Biosci., 130, 25–50.
[8] Deb, K., Pratap, A., Agarwal, S., and Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comp., 6(2), 182–197.
[9] del Rosario, R.C.H., Mendoza, E.R., and Voit, E.O. (2008). Challenges in lin-log modeling of glycolysis in Lactococcus lactis. IET Systems Biology, 2(3), 136–149.
[10] Eberhart, R., and Kennedy, J. (1995). A new optimizer using particle swarm theory. Proc. 6th Int. Symp. Micro Machine and Human Science (MHS '95), 39–43.
[11] Galazzo, J.L., and Bailey, J.E. (1990). Fermentation pathway kinetics and metabolic flux control in suspended and immobilized Saccharomyces cerevisiae. Enzyme Microb. Technol., 12, 162–72.
[12] Gonzalez, O.R., Küper, C., Jung, K., Naval, P.C., and Mendoza, E. (2007). Parameter estimation using simulated annealing for s-system models of biochemical networks. Bioinformatics, 23(4), 480–486.
[13] Hansen, P.C. (1992). Analysis of ill-posed problems by means of the L-curve. SIAM Review, 34, 561–80.
[14] Ho, SY., Hsieh, CH., and Yu, FC. (2005). Inference of s-system models for large-scale genetic networks. In Proc. 21st Int. Conference Data Engineering Workshops 2005, 1155.
[15] Kikuchi, S., Tominaga, D., Arita, M., Takahashi, K., and Tomita, M. (2003). Dynamic modeling of genetic networks using genetic algorithm and S-system. Bioinformatics, 19, 643–50.
[16] Kim, KY., Cho, DY., and Byoung-Tak, Z. (2006). Multi-stage evolutionary algorithms for efficient identification of gene regulatory networks. LNCS 3907, Springer Verlag, 45–56.
[17] Kimura, S., Ide, K., Kashihara, A., Kano, M., Hatakeyama, M., Masui, R., and Nakagawa, N. (2005). Inference of s-system models of genetic networks using a cooperative coevolutionary algorithm. Bioinformatics, 21, 1154–1163.
[18] Kitayama, (2006). A simplified method for power-law modelling of metabolic pathways from time-course data and steady-state flux profiles. Theor. Biol. Med. Model., 3(24).
[19] Kutalik, Z., Tucker, W., and Moulton, V. (2007). S-system parameter estimation for noisy metabolic profiles using Newton-flow analysis. IET Syst. Biol., 1(3), 174–180.
[20] Lall, R., and Voit, E.O. (2005). Parameter estimation in modulated, unbranched reaction chains within biochemical systems. Comput. Biol. Chem., 29, 309–318.
[21] Moles, C.G., Mendes, P., and Banga, J.R. (2003). Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Res., 13, 2467–2474.
[22] Morosov, V.A. (1966). On the solution of functional equations by the method of regularization. Soviet Math. Dokl., 7, 414–17.
[23] Noman, N., and Iba, H. (2005). Inference of gene regulatory networks using s-systems and differential evolution. In Proc. of Genetic and Evolutionary Computation Conference (GECCO 2005), 439–46, ACM Press.
[24] Neves, A.R., Ramos, A., Nunes, M.C., Kleerebezem, M., Hugenholtz, J., de Vos, W.M., Almeida, J.S., and Santos, H. (1999). In vivo nuclear magnetic resonance studies of glycolytic kinetics in Lactococcus lactis. Biotechnol. Bioeng., 64, 200–12.
[25] Polisetty, P.K., Voit, E.O., and Gatzke, E.P. (2006). Identification of metabolic system parameters using global optimization methods. Theor. Biol. Med. Model., 3(4).
[26] Ramos, A., Neves, A.R., and Santos, H. (2002). Metabolism of lactic acid bacteria studied by nuclear magnetic resonance. Antonie Van Leeuwenhoek, 82(1-4), 249–261.
[27] Raquel, C.R., and Naval, P.C. (2005). An effective use of crowding distance in multiobjective particle swarm optimization. In Proc. of Genetic and Evolutionary Computation Conference (GECCO 2005), 257–64, ACM Press.
[28] Reyes-Sierra, M., and Coello Coello, C.A. (2006). Multi-objective particle swarm optimizers: A survey of the state-of-the-art. Int. J. Comp. Intel. Res., 2(3), 287–308.
[29] Seatzu, C. (2000). A fitting based method for parameter estimation in s-systems. Dynamic Systems and Applications, 9(1), 77–98.
[30] Spieth, C., Streichert, F., Speer, N., and Zell, A. (2006). A memetic inference method for gene regulatory networks based on s-systems. In Proc. of IEEE Congress on Evolutionary Computation (CEC 2004), (1), 152–157.
[31] Spieth, C., Worzischek, R., and Streichert, F. (2006). Comparing evolutionary algorithms on the problem of network inference. In Proc. of Genetic and Evolutionary Computation Conference (GECCO 2006), 279–286.
[32] Tucker, W., Kutalik, Z., and Moulton, V. (2007). Estimating parameters for generalized mass action models using constraint propagation. Math. Biosciences, 208(2), 607–620.
[33] Ueda, T., Koga, N., and Okamoto, M. (2001). Efficient numerical optimization technique based on real-coded genetic algorithm. Genome Inform., 12, 451–453.
[34] Voit, E.O. (2000). Computational analysis of biochemical systems. Cambridge University Press, Cambridge, UK.
[35] Vilela, M., Borges, C., Vasconcelos, A.T., Santos, H., Voit, E.O., and Almeida, J. (2007). Automated smoother for numeric decoupling of dynamic models. BMC Bioinformatics, 8:305.
[36] Vilela, M., Chou, I-C., Vinga, S., Vasconcelos, A.T., Voit, E.O., and Almeida, J. (2008). Parameter optimization in S-system models. BMC Syst. Biol., 2:35.
[37] Voit, E.O., and Almeida, J. (2004). Decoupling dynamical systems for pathway identification from metabolic profiles. Bioinformatics, 20, 1670–81.
[38] Voit, E.O., and Savageau, M.A. (1982). Power-law approach to modeling biological systems: III. Methods of analysis. J. Ferment. Technol., 60(3), 233–241.
[39] Voit, E.O., Almeida, J., Marino, S., Lall, R., Goel, G., Neves, A.R., and Santos, H. (2006). Regulation of glycolysis in Lactococcus lactis: an unfinished system biological case study. IEE Proc. Syst. Biol., 153(4), 286–98.
[40] Zitzler, E., Deb, K., and Thiele, L. (2000). Comparison of multiobjective evolutionary algorithms: empirical results. Evolutionary Computation, 8(2), 173–95.
