Extended stochastic dynamics: theory, algorithms, and applications in multiscale modelling and data science
Date: 29/06/2016
Author: Shang, Xiaocheng
Abstract
This thesis addresses the sampling problem in a high-dimensional space, i.e., the
computation of averages with respect to a defined probability density that is a
function of many variables. Such sampling problems arise in many application
areas, including molecular dynamics, multiscale models, and Bayesian sampling
techniques used in emerging machine learning applications. Of particular interest
are thermostat techniques, in the setting of a stochastic-dynamical system,
that preserve the canonical Gibbs ensemble defined by an exponentiated energy
function. In this thesis we explore theory, algorithms, and numerous applications
in this setting.
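
As a concrete illustration of the sampling problem described above, the following minimal sketch estimates a canonical average for a one-dimensional harmonic energy U(q) = q^2/2 using a Langevin thermostat. It is not taken from the thesis: the BAOAB splitting order, the function name `baoab_average`, the unit mass, and all parameter values are assumptions chosen purely for illustration.

```python
import math
import random


def baoab_average(grad_u, observable, q0=0.0, p0=0.0, h=0.1,
                  gamma=1.0, kT=1.0, n_steps=200_000, seed=0):
    """Estimate the canonical average of observable(q) for density exp(-U(q)/kT).

    Illustrative sketch only (unit mass, one degree of freedom).
    """
    rng = random.Random(seed)
    c1 = math.exp(-gamma * h)             # exact decay of the momentum in the O step
    c2 = math.sqrt(kT * (1.0 - c1 * c1))  # matching noise amplitude for the O step
    q, p = q0, p0
    force = -grad_u(q)
    total = 0.0
    for _ in range(n_steps):
        p += 0.5 * h * force                   # B: half kick from the force
        q += 0.5 * h * p                       # A: half drift
        p = c1 * p + c2 * rng.gauss(0.0, 1.0)  # O: exact Ornstein-Uhlenbeck solve
        q += 0.5 * h * p                       # A: half drift
        force = -grad_u(q)
        p += 0.5 * h * force                   # B: half kick from the force
        total += observable(q)
    return total / n_steps


# Harmonic energy U(q) = q^2 / 2 has gradient q; at kT = 1 the exact average of q^2 is 1.
mean_q2 = baoab_average(lambda q: q, lambda q: q * q)
print(mean_q2)
```

The estimate carries both a statistical error, which shrinks as the trajectory gets longer, and a discretization error that depends on the stepsize `h`. Thermodynamic accuracy, as used in the abstract, refers to the latter.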
We begin by comparing numerical methods for particle-based models. The
class of methods considered includes dissipative particle dynamics (DPD) as well
as a newly proposed stochastic pairwise Nosé-Hoover-Langevin (PNHL) method.
Splitting methods are developed and studied in terms of their thermodynamic
accuracy, two-point correlation functions, and convergence. When computational
efficiency is measured by the ratio of thermodynamic accuracy to CPU time, we
report significant advantages in simulation for the PNHL method compared to
popular alternative schemes in the low-friction regime, without degradation of
convergence rate.
We propose a pairwise adaptive Langevin (PAdL) thermostat that fully captures
the dynamics of DPD and thus can be directly applied in the setting of
momentum-conserving simulation. These methods are potentially valuable for
nonequilibrium simulation of physical systems. We again report substantial improvements
in both equilibrium and nonequilibrium simulations compared to popular
schemes in the literature. We also discuss the proper treatment of the Lees-Edwards boundary conditions, an essential part of modelling shear flow.
We also study numerical methods for sampling probability measures in high
dimensions where the underlying model is only approximately identified with a
gradient system. These methods are important in multiscale modelling and in
the design of new machine learning algorithms for inference and parameterization
for large datasets, challenges which are increasingly important in "big data"
applications. In addition to providing a more comprehensive discussion of
the foundations of these methods, we propose a new numerical method for the
adaptive Langevin/stochastic gradient Nosé-Hoover thermostat that achieves a
dramatic improvement in numerical efficiency over the most popular stochastic
gradient methods reported in the literature. We demonstrate that the newly established
method inherits a superconvergence property (fourth order convergence
to the invariant measure for configurational quantities) recently demonstrated in
the setting of Langevin dynamics.
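
To make the idea of an adaptive thermostat for noisy gradients concrete, here is a small sketch under stated assumptions. It is not claimed to be the scheme proposed in the thesis: the symmetric ordering of the sub-steps, the function name `adaptive_langevin`, the harmonic test energy U(q) = q^2/2, the artificial Gaussian gradient noise, and all parameter values are hypothetical choices for illustration. The auxiliary variable `xi` acts as a friction that adjusts itself until the average kinetic energy matches the target temperature, even though the sampler is never told the gradient-noise strength `sigma_g`.

```python
import math
import random


def adaptive_langevin(n_steps=200_000, h=0.05, kT=1.0, mu=1.0,
                      sigma_a=1.0, sigma_g=1.0, seed=1):
    """Return the time average of q^2 under an adaptive Langevin thermostat.

    Illustrative sketch only (unit mass, one degree of freedom).
    """
    rng = random.Random(seed)

    def noisy_force(q):
        # Exact force -q plus Gaussian noise standing in for a stochastic gradient.
        return -q + sigma_g * rng.gauss(0.0, 1.0)

    q, p, xi = 0.0, 0.0, 0.0
    force = noisy_force(q)
    sum_q2 = 0.0
    for _ in range(n_steps):
        p += 0.5 * h * force                   # half kick with the noisy force
        q += 0.5 * h * p                       # half drift
        xi += 0.5 * h * (p * p - kT) / mu      # half update of the adaptive friction
        # Exact Ornstein-Uhlenbeck solve for p with xi held fixed (xi may be negative).
        if abs(xi) > 1e-8:
            decay = math.exp(-xi * h)
            spread = math.sqrt(-math.expm1(-2.0 * xi * h) / (2.0 * xi))
        else:
            decay, spread = 1.0, math.sqrt(h)
        p = decay * p + sigma_a * spread * rng.gauss(0.0, 1.0)
        xi += 0.5 * h * (p * p - kT) / mu      # half update of the adaptive friction
        q += 0.5 * h * p                       # half drift
        force = noisy_force(q)
        p += 0.5 * h * force                   # half kick with the noisy force
        sum_q2 += q * q
    return sum_q2 / n_steps


# For U(q) = q^2 / 2 at kT = 1 the target average of q^2 is 1.
mean_q2_adaptive = adaptive_langevin()
print(mean_q2_adaptive)
```

With a fixed friction, the extra gradient noise would raise the effective temperature. Here the feedback on `xi` is what compensates for it.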
Furthermore, we propose a covariance-controlled adaptive Langevin (CCAdL)
thermostat that can effectively dissipate parameter-dependent noise while maintaining
a desired target distribution. The proposed method achieves a substantial
speedup over popular alternative schemes for large-scale machine learning applications.