Unified tasks and conduits for programming on heterogeneous computing platforms
Permanent URL:
http://hdl.handle.net/2047/D20263339
Mi, Ningfang (Committee member)
Basagni, Stefano (Committee member)
To reduce development effort and improve the portability of applications, we need a programming method that hides low-level platform hardware features, easing the programming of parallel applications while maintaining good performance. In this research, we propose a lightweight and flexible parallel programming framework, Unified Tasks and Conduits (UTC), for heterogeneous computing platforms. The framework provides high-level program components, tasks and conduits, with which a user can easily construct parallel applications. In a program, computational workloads are abstracted as task objects, and tasks communicate with each other through conduit objects. Multiple tasks can run in parallel on different devices, and each task can launch a group of threads for execution. In this way, we separate an application's high-level structure from its low-level task implementations. When such a parallel application is ported to use different computing resources on different platforms, its main structure can remain unchanged and only the task implementations need to be adapted, which eases the development effort. The explicit task components also make task and pipeline parallelism easy to implement. In addition, the multiple threads of each task can efficiently implement data parallelism and overlap computation with communication.
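The task and conduit structure described above can be illustrated with a small sketch. This is not the UTC API: `Conduit`, `Task`, `Producer`, `SquareSum` and `run_pipeline` are hypothetical names, and Python threads stand in for tasks that would run on different devices. Two tasks joined by a bounded conduit form a pipeline (task and pipeline parallelism), and the consumer task splits each chunk across its own worker threads (data parallelism). Because of CPython's global interpreter lock, the sketch shows program structure rather than real speedup.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_END = object()  # sentinel marking the end of a conduit's data stream


class Conduit:
    """Hypothetical conduit: a one-way, bounded, blocking channel between two tasks."""

    def __init__(self, capacity=4):
        self._q = queue.Queue(maxsize=capacity)

    def write(self, item):
        self._q.put(item)  # blocks when the consumer falls behind

    def read(self):
        return self._q.get()  # blocks until data is available

    def close(self):
        self._q.put(_END)


class Task(threading.Thread):
    """Hypothetical task: runs concurrently with other tasks and may use
    `nthreads` worker threads internally."""

    def __init__(self, nthreads=1):
        super().__init__()
        self.nthreads = nthreads


class Producer(Task):
    """First pipeline stage: streams chunks of input into a conduit."""

    def __init__(self, out, chunks):
        super().__init__()
        self.out, self.chunks = out, chunks

    def run(self):
        for chunk in self.chunks:
            self.out.write(chunk)
        self.out.close()


class SquareSum(Task):
    """Second stage: sums squares of each chunk, splitting the chunk
    across `nthreads` workers (data parallelism inside one task)."""

    def __init__(self, inp, nthreads):
        super().__init__(nthreads)
        self.inp = inp
        self.results = []

    def run(self):
        with ThreadPoolExecutor(max_workers=self.nthreads) as pool:
            while True:
                chunk = self.inp.read()
                if chunk is _END:
                    break
                step = max(1, len(chunk) // self.nthreads)
                parts = [chunk[i:i + step] for i in range(0, len(chunk), step)]
                partial = pool.map(lambda p: sum(x * x for x in p), parts)
                self.results.append(sum(partial))


def run_pipeline(chunks, nthreads=4):
    """High-level program structure: create tasks, link them, run them."""
    link = Conduit()
    producer = Producer(link, chunks)
    consumer = SquareSum(link, nthreads)
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return consumer.results
```

For example, `run_pipeline([[1, 2, 3], [4, 5]], nthreads=2)` returns `[14, 41]`. Replacing `SquareSum` with another implementation of the same stage (say, one that offloads its work to a GPU) would leave `run_pipeline` unchanged, which is the kind of separation between structure and task implementation the framework aims for.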
We have implemented a runtime system prototype of the Tasks and Conduits framework on a cluster platform, supporting multicore CPUs and GPUs for task execution. To facilitate multi-threaded tasks, we implement a task-based global shared data object that allows a task to create threads across multiple nodes and share data sets through a one-sided remote memory access mechanism. For GPU tasks, we provide concise interfaces that let users choose the appropriate type of memory for host/device data transfer. To demonstrate and analyze our framework, we have adapted a set of benchmark applications to it. Experiments on real clusters show that applications built with our framework achieve performance similar to or better than traditional parallel implementations based on OpenMP or MPI. We are also able to use the GPUs on the platform for acceleration through GPU tasks. Based on our high-level tasks and conduits design, we can maintain a well-organized program structure for improved portability and maintainability.
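The global shared data object can be pictured as an array partitioned among owners, where any thread may read or write any element without the owner's participation. The sketch below is a single-process stand-in under stated assumptions: block partitioning across `nnodes` owners, hypothetical names `GlobalArray` and `fill_squares`, and per-partition locks in place of the synchronization a real cluster runtime (for instance one built on MPI one-sided `MPI_Put`/`MPI_Get`) would provide.

```python
import threading


class GlobalArray:
    """Hypothetical task-wide shared array, block-partitioned over `nnodes`
    owners. get/put reach any partition directly, mimicking one-sided
    remote memory access; here all partitions live in one process."""

    def __init__(self, size, nnodes):
        self.size = size
        self.block = -(-size // nnodes)  # ceiling division: elements per owner
        self.parts = [[0] * max(0, min(self.block, size - r * self.block))
                      for r in range(nnodes)]
        self.locks = [threading.Lock() for _ in range(nnodes)]

    def owner(self, i):
        return i // self.block

    def _locate(self, i):
        if not 0 <= i < self.size:
            raise IndexError(i)
        r = self.owner(i)
        return r, i - r * self.block  # (owning partition, local offset)

    def get(self, i):
        r, off = self._locate(i)
        with self.locks[r]:
            return self.parts[r][off]

    def put(self, i, value):
        r, off = self._locate(i)
        with self.locks[r]:
            self.parts[r][off] = value


def fill_squares(size, nnodes, nthreads):
    """Threads of one task write into partitions they do not own."""
    ga = GlobalArray(size, nnodes)

    def worker(tid):
        # Cyclic index assignment: most writes target other owners' blocks.
        for i in range(tid, size, nthreads):
            ga.put(i, i * i)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [ga.get(i) for i in range(size)]
```

With `GlobalArray(10, 4)` the partitions hold 3, 3, 3 and 1 elements, and element 9 is owned by partition 3; a thread can still `put` into it directly, which is the one-sided access pattern the abstract refers to.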
GPU
heterogeneous
multi-threads
parallel computing
task
Copyright restrictions may apply.