Research And Application Of Parallel Computing Algorithms For Statistical Phylogenetic Inference

Date

2017

Abstract

Estimating the evolutionary history of organisms, known as phylogenetic inference, is a critical step in many analyses involving biological sequence data such as DNA. The likelihood calculations at the heart of the most effective methods for statistical phylogenetic analysis are extremely computationally intensive, and hence these analyses become a bottleneck in many studies. Recent progress in computer hardware, specifically the increasing pervasiveness of highly parallel, many-core processors, has created opportunities for new approaches to computationally intensive methods, such as those in phylogenetic inference.

We have developed an open source library, BEAGLE, which uses parallel computing methods to greatly accelerate statistical phylogenetic inference, for both maximum likelihood and Bayesian approaches. BEAGLE defines a uniform application programming interface and includes a collection of efficient implementations that use NVIDIA CUDA, OpenCL, and C++ threading frameworks for evaluating likelihoods under a wide variety of evolutionary models, on GPUs as well as on multi-core CPUs. BEAGLE employs a number of different parallelization techniques for phylogenetic inference, at different granularity levels and for distinct processor architectures. On CUDA and OpenCL devices, the library enables concurrent computation of site likelihoods, data subsets, and independent subtrees. The general design features of the library also provide a model for software development using parallel computing frameworks that is applicable to other domains.
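The site-level parallelism mentioned above rests on the fact that, under standard models, alignment columns are treated as independent, so each site likelihood can be evaluated separately. The following is a minimal illustrative sketch of that idea only, not BEAGLE's API or implementation. It assumes a two-leaf tree, the Jukes-Cantor model, and a uniform root distribution; the function names `jc_prob`, `site_likelihood`, and `log_likelihood` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

STATES = "ACGT"  # nucleotide states; assumption: no gaps or ambiguity codes


def jc_prob(x, y, t):
    """Jukes-Cantor probability that state x becomes y along branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


def site_likelihood(a, b, t1, t2):
    """Likelihood of one alignment column on a two-leaf tree.

    Sums over the unobserved root state x, weighted by a uniform prior (0.25).
    """
    return sum(0.25 * jc_prob(x, a, t1) * jc_prob(x, b, t2) for x in STATES)


def log_likelihood(seq1, seq2, t1, t2, workers=4):
    """Total log-likelihood; each site is an independent task for the pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_site = list(
            pool.map(lambda ab: site_likelihood(ab[0], ab[1], t1, t2),
                     zip(seq1, seq2))
        )
    # Independence across sites turns the product into a sum of logs.
    return sum(math.log(value) for value in per_site)
```

Python threads are used here only to show the independence structure; they will not yield a real speedup for this pure-Python arithmetic. On a GPU, the same per-site independence is what would allow many columns to be processed concurrently.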

BEAGLE has been integrated with some of the leading programs in the field, such as MrBayes and BEAST, and is used in a diverse range of evolutionary studies, including those of disease-causing viruses. The library can provide significant performance gains, with the exact increase in performance depending on the specific properties of the data set, evolutionary model, and hardware. In general, nucleotide analyses are accelerated on the order of 10-fold and codon analyses on the order of 100-fold.
