A CUDA fast multipole method with highly efficient M2L far field evaluation

Kohnke, B.; Kutzner, C.; Beckmann, A.; Lube, G.; Kabadshow, I.; Dachsel, H.; Grubmüller, H.

doi:10.1177/1094342020964857


Released

Journal Article


MPG Authors

/persons/resource/persons206214

Kohnke, B.
Department of Theoretical and Computational Biophysics, MPI for Biophysical Chemistry, Max Planck Society;

/persons/resource/persons15407

Kutzner, C.
Department of Theoretical and Computational Biophysics, MPI for Biophysical Chemistry, Max Planck Society;

/persons/resource/persons15155

Grubmüller, H.
Department of Theoretical and Computational Biophysics, MPI for Biophysical Chemistry, Max Planck Society;

External Resources

No external resources are available

Full texts (restricted access)

No full texts are currently released for your IP range.

Full texts (freely accessible)

3260951.pdf
(Publisher version), 3MB

Supplementary Material (freely accessible)

No freely accessible supplementary materials are available

Citation

Kohnke, B., Kutzner, C., Beckmann, A., Lube, G., Kabadshow, I., Dachsel, H., et al. (2021). A CUDA fast multipole method with highly efficient M2L far field evaluation. The International Journal of High Performance Computing Applications, 35(1), 97-117. doi:10.1177/1094342020964857.

Cite as: https://hdl.handle.net/21.11116/0000-0007-4C69-F

Abstract

Solving an N-body problem, electrostatic or gravitational, is a crucial task and the main computational bottleneck in many scientific applications. Its direct solution is a ubiquitous showcase example for the compute power of graphics processing units (GPUs). However, the naive pairwise summation has $\mathcal{O}(N^2)$ computational complexity. The fast multipole method (FMM) can reduce runtime and complexity to $\mathcal{O}(N)$ for any specified precision. Here, we present a CUDA-accelerated, C++ FMM implementation for multi-particle systems with $r^{-1}$ potential that are found, e.g., in biomolecular simulations. The algorithm involves several operators to exchange information in an octree data structure. We focus on the Multipole-to-Local (M2L) operator, as its runtime is limiting for the overall performance. We propose, implement and benchmark three different M2L parallelization approaches. Approach (1) utilizes Unified Memory to minimize programming and porting efforts. It achieves decent speedups for only little implementation work. Approach (2) employs CUDA Dynamic Parallelism to significantly improve performance for high approximation accuracies. The presorted list-based approach (3) fits periodic boundary conditions particularly well. It exploits FMM operator symmetries to minimize both memory access and the number of complex multiplications. The result is a compute-bound implementation, i.e. performance is limited by arithmetic operations rather than by memory accesses. The complete CUDA parallelized FMM is incorporated within the GROMACS molecular dynamics package as an alternative Coulomb solver.
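For orientation, the naive pairwise summation that the abstract names as the quadratic-cost baseline can be sketched in a few lines of C++. This is only an illustrative sketch, not code from the paper or from GROMACS: the `Particle` struct and the `direct_potential` function are hypothetical names, the loop runs serially on the CPU, and physical prefactors and periodic boundary conditions are left out.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical particle type for illustration: position (x, y, z) and charge q.
struct Particle {
    double x, y, z, q;
};

// Direct pairwise evaluation of the 1/r potential at every particle position.
// Each of the N particles visits the other N - 1 particles, so the work
// grows quadratically with N. Prefactors such as 1 / (4 pi eps0) are omitted.
std::vector<double> direct_potential(const std::vector<Particle>& p) {
    std::vector<double> phi(p.size(), 0.0);
    for (std::size_t i = 0; i < p.size(); ++i) {
        for (std::size_t j = 0; j < p.size(); ++j) {
            if (i == j) continue;  // skip the self-interaction
            const double dx = p[i].x - p[j].x;
            const double dy = p[i].y - p[j].y;
            const double dz = p[i].z - p[j].z;
            phi[i] += p[j].q / std::sqrt(dx * dx + dy * dy + dz * dz);
        }
    }
    return phi;
}
```

The doubly nested loop is what makes the cost quadratic. According to the abstract, the FMM replaces the far-field part of this sum with operators acting on an octree, which is where the M2L operator enters.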