Sparse Identification of Epidemiological Models from Empirical Data
Date
2018-09-20
Authors
Horrocks, Jonathan H.
Advisor
Bauch, Chris
Publisher
University of Waterloo
Abstract
Current modelling practices in mathematical epidemiology are predicated on mechanisms
stemming from theoretical assumptions, such as mass action incidence. Deterministic disease
models can describe many patterns observed in empirical incidence data but challenges remain
in creating accurate, parsimonious models that offer predictive value. Recent advances
in data-driven techniques give rise to new model discovery methods that forego theoretical
assumptions and attempt to create sparse, dynamic models directly from real-world data.
Our goal is to apply these techniques to empirical case notification data of epidemiological
systems, to either confirm current practices or give new insight not accessible by human
intuition.
We adapt a recently developed technique called Sparse Identification of Nonlinear Dynamics
(SINDy), which has demonstrated ability to recover governing equations of complex
dynamical systems. To lend insight into this process, the SINDy algorithm was first applied
to simulated data from various forms of the SIR model, a standard compartmental model
of epidemics. Several conversion processes were then utilized to recover both the susceptible
and infectious classes from raw incidence data. Finally, the SINDy algorithm was applied
to empirical data from measles, varicella, and rubella datasets, three diseases that offer
contrasting dynamic behaviour, and the resulting time-series and model coefficients were
analysed.
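
As a rough illustration of the first step described above (applying SINDy to simulated SIR data), the following sketch generates trajectories from a basic SIR model and fits a sparse model over a quadratic library using sequentially thresholded least squares, the regression commonly associated with SINDy. It is not the thesis's implementation: the parameter values `BETA` and `GAMMA`, the initial conditions, the library of terms, the threshold, and all function names are assumptions chosen for illustration.

```python
import numpy as np

# Assumed illustrative parameters (not taken from the thesis).
BETA, GAMMA = 1.5, 0.5


def sir_rhs(x):
    """Basic SIR right-hand side for proportions (S, I) with mass action incidence."""
    s, i = x
    return np.array([-BETA * s * i, BETA * s * i - GAMMA * i])


def simulate(x0, dt, n_steps):
    """Integrate the SIR model with a fixed-step fourth-order Runge-Kutta scheme."""
    traj = np.empty((n_steps + 1, 2))
    traj[0] = x0
    for k in range(n_steps):
        x = traj[k]
        k1 = sir_rhs(x)
        k2 = sir_rhs(x + 0.5 * dt * k1)
        k3 = sir_rhs(x + 0.5 * dt * k2)
        k4 = sir_rhs(x + dt * k3)
        traj[k + 1] = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return traj


# Candidate terms: all monomials in S and I up to degree 2.
TERMS = ["1", "S", "I", "S^2", "S*I", "I^2"]


def library(x):
    """Evaluate the candidate library on an (n_samples, 2) array of states."""
    s, i = x[:, 0], x[:, 1]
    return np.column_stack([np.ones_like(s), s, i, s**2, s * i, i**2])


def stlsq(theta, dxdt, threshold, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for j in range(dxdt.shape[1]):
            keep = ~small[:, j]
            if keep.any():
                xi[keep, j] = np.linalg.lstsq(
                    theta[:, keep], dxdt[:, j], rcond=None
                )[0]
    return xi


# Several assumed initial conditions, so the data cover more of the (S, I) plane.
dt, n_steps = 0.01, 2000
starts = [(0.99, 0.01), (0.80, 0.20), (0.60, 0.10), (0.50, 0.40)]

states, derivs = [], []
for x0 in starts:
    traj = simulate(np.array(x0), dt, n_steps)
    states.append(traj)
    # Derivatives are estimated from the time series by finite differences.
    derivs.append(np.gradient(traj, dt, axis=0, edge_order=2))

X = np.vstack(states)
dX = np.vstack(derivs)

xi = stlsq(library(X), dX, threshold=0.1)

for name, col in zip(["dS/dt", "dI/dt"], xi.T):
    active = " + ".join(f"({c:.3f})*{t}" for c, t in zip(col, TERMS) if c != 0.0)
    print(name, "=", active)
```

On clean simulated data such as this, the surviving terms should be the `S*I` cross-term in both equations and the linear `I` term in the infectious equation. Empirical incidence data are noisier and only partially observed, which is why the conversion steps and threshold choice matter far more in practice than this sketch suggests.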
The resulting models closely mimic the dynamics of the empirical data, most notably
the frequency of epidemics, for all three diseases considered. The coefficients discovered exhibit
sparsity, though not to the extent that current compartmental models do. Similarities
between the discovered model equations and fitted SIR models can be noted, including a
strong dependence on the cross-term corresponding with the mass action incidence mechanism.
These encouraging results indicate this data-driven technique may be of use in verifying
and improving current theoretical models in mathematical epidemiology.
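
For reference, the cross-term mentioned above can be seen in a common textbook form of the basic SIR model, written here for the susceptible and infectious proportions. The symbols \beta (transmission rate) and \gamma (recovery rate) are assumptions for illustration and are not defined in the abstract; the fitted models in the thesis may include additional terms such as births and deaths.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta S I, \\
\frac{dI}{dt} &= \beta S I - \gamma I.
\end{aligned}
```

The product $SI$ is the mass action incidence term: new infections occur at a rate proportional to contacts between susceptible and infectious individuals.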
Keywords
Epidemiology, Dynamical Systems, Statistical Modelling