Machine learning approach to reconstructing signalling pathways and interaction networks in biology
View/Open
.svn.zip (38.16Mb)
Figures.zip (15.63Mb)
associated files.zip (2.085Mb)
Date
02/07/2013
Author
Dondelinger, Frank
Abstract
In this doctoral thesis, I present my research into applying machine learning techniques
for reconstructing species interaction networks in ecology, reconstructing molecular
signalling pathways and gene regulatory networks in systems biology, and inferring
parameters in ordinary differential equation (ODE) models of signalling pathways.
Together, the methods I have developed for these applications demonstrate the usefulness
of machine learning for reconstructing networks and inferring network parameters
from data.
The thesis consists of three parts. The first part is a detailed comparison of applying
static Bayesian networks, relevance vector machines, and linear regression with L1
regularisation (LASSO) to the problem of reconstructing species interaction networks
from species presence/absence data in ecology (Faisal et al., 2010). I describe how I
generated data from a stochastic population model to test the different methods and
how the simulation study led us to introduce spatial autocorrelation as an important
covariate. I also show how we used the results of the simulation study to apply the
methods to presence/absence data of bird species from the European Bird Atlas.
The second part of the thesis describes a time-varying, non-homogeneous dynamic
Bayesian network model for reconstructing signalling pathways and gene regulatory
networks, based on Lèbre et al. (2010). I show how my work has extended this model
to incorporate different types of hierarchical Bayesian information sharing priors and
different coupling strategies among nodes in the network. The introduction of these
priors reduces the inference uncertainty by putting a penalty on the number of structure
changes among network segments separated by inferred changepoints (Dondelinger
et al., 2010; Husmeier et al., 2010; Dondelinger et al., 2012b). Using both synthetic
and real data, I demonstrate that using information sharing priors improves the reconstruction
accuracy of the underlying gene regulatory networks, and I compare the
different priors and coupling strategies. I show the results of applying the model to
gene expression datasets from Drosophila melanogaster and Arabidopsis thaliana, as
well as to a synthetic biology gene expression dataset from Saccharomyces cerevisiae.
In each case, the underlying network is time-varying; for Drosophila melanogaster, as
a consequence of measuring gene expression during different developmental stages;
for Arabidopsis thaliana, as a consequence of measuring gene expression for circadian
clock genes under different conditions; and for the synthetic biology dataset, as
a consequence of changing the growth environment. I show that in addition to inferring
sensible network structures, the model also successfully predicts the locations of changepoints.
The third and final part of this thesis is concerned with parameter inference in
ODE models of biological systems. This problem is of interest to systems biology
researchers, as kinetic reaction parameters often cannot be measured, or can only be
estimated imprecisely from experimental data. Due to the cost of numerically solving
the ODE system after each parameter adaptation, this is a computationally challenging
problem. Gradient matching techniques circumvent this problem by directly fitting the
derivatives of the ODE to the slope of an interpolant. I present an inference procedure
for a model using nonparametric Bayesian statistics with Gaussian processes, based
on Calderhead et al. (2008). I show that the new inference procedure improves on
the original formulation in Calderhead et al. (2008) and I present the results of applying
it to ODE models of predator-prey interactions, a circadian clock gene, a signal
transduction pathway, and the JAK/STAT pathway.
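
The gradient matching idea mentioned above can be illustrated with a minimal sketch. This is not the procedure from the thesis: the Gaussian process interpolant is swapped for plain central differences, the ODE is a hypothetical one-parameter decay model dx/dt = -theta * x, and the data are noise-free. All function names are assumptions made for illustration. The point is only that the parameter is estimated by matching the ODE's derivative to the slope of an interpolant, without numerically solving the ODE during fitting.

```python
import math

def simulate_decay(theta, x0, times):
    """Exact solution of dx/dt = -theta * x, used as noise-free toy data."""
    return [x0 * math.exp(-theta * t) for t in times]

def interpolant_slopes(times, values):
    """Central-difference slopes at interior points.

    Assumption: this stands in for the derivative of a smooth interpolant
    (the thesis uses Gaussian processes instead).
    """
    return [
        (values[i + 1] - values[i - 1]) / (times[i + 1] - times[i - 1])
        for i in range(1, len(values) - 1)
    ]

def fit_decay_rate(times, values):
    """Least-squares match of the ODE right-hand side to the slopes.

    Minimising sum_i (slope_i + theta * x_i)^2 over theta gives
    theta = -sum(x_i * slope_i) / sum(x_i^2).
    """
    slopes = interpolant_slopes(times, values)
    interior = values[1:-1]  # states at the points where slopes are defined
    numerator = sum(x * s for x, s in zip(interior, slopes))
    denominator = sum(x * x for x in interior)
    return -numerator / denominator

times = [0.1 * i for i in range(51)]
data = simulate_decay(theta=0.5, x0=2.0, times=times)
theta_hat = fit_decay_rate(times, data)
print(theta_hat)  # an estimate close to the true rate 0.5
```

Because the model is linear in theta, the fit has a closed form here. With noisy data or nonlinear parameters, the slope estimate and the optimisation both become harder, which is where a probabilistic interpolant is useful.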