Scalar-vector GPU architectures

Title:
Scalar-vector GPU architectures
Creator:
Chen, Zhongliang (Author)
Contributor:
Kaeli, David (Advisor)
Rubin, Norman (Committee member)
Schirner, Gunar (Committee member)
Language:
English
Publisher:
Boston, Massachusetts : Northeastern University, 2016
Date Accepted:
December 2016
Date Awarded:
December 2016
Type of resource:
Text
Genre:
Dissertations
Format:
electronic
Digital origin:
born digital
Abstract/Description:
Graphics Processing Units (GPUs) have evolved to become high-throughput processors for general-purpose data-parallel applications. Most GPU execution exploits a Single Instruction Multiple Data (SIMD) model, where a single operation is performed on multiple data elements at a time. However, neither the runtime nor the hardware checks whether the data on the SIMD lanes are the same or different. When a SIMD unit operates on multiple copies of the same data, it performs redundant computations. This inefficient execution can degrade performance and reduce power efficiency.

A significant number of SIMD instructions in GPU compute programs demonstrate scalar characteristics, i.e., they operate on the same data across their active lanes. Treating them as normal SIMD instructions results in inefficient GPU execution. To better serve both scalar and vector operations, we propose a heterogeneous scalar-vector GPU architecture. In this thesis we propose the design of a specialized scalar pipeline that handles scalar instructions efficiently with only a single copy of the data, freeing the SIMD pipeline for normal vector execution. The proposed architecture provides an opportunity to save power by performing a single computation and broadcasting its result to multiple outputs. To balance the scalar and vector units, we propose novel schemes to efficiently resolve scalar-vector data dependencies, schedule warps, and dispatch instructions. We also consider the impact of varying warp sizes on our scalar-vector architecture and explore subwarp execution for power efficiency. Finally, we demonstrate that the interconnect and memory subsystem can become the new limiting factor for scalar-vector execution.
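
The scalar property described in the abstract (every active lane holds the same operand values) can be illustrated with a small software model. The following is a minimal sketch, not the dissertation's hardware design: the names `is_scalar` and `execute`, the list-based lane representation, and the operation count are all assumptions introduced here for illustration only.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def is_scalar(operands: Sequence[Sequence[int]], active: Sequence[bool]) -> bool:
    """Return True if every source operand holds one value across all active lanes."""
    lanes = [i for i, on in enumerate(active) if on]
    if not lanes:
        return False  # no active lanes, so there is nothing to classify
    first = lanes[0]
    return all(src[i] == src[first] for src in operands for i in lanes)


def execute(
    op: Callable[..., int],
    operands: Sequence[Sequence[int]],
    active: Sequence[bool],
) -> Tuple[List[Optional[int]], int]:
    """Model one SIMD instruction.

    Returns the per-lane results (None for inactive lanes) and the number of
    ALU operations this hypothetical model performs.
    """
    lanes = [i for i, on in enumerate(active) if on]
    result: List[Optional[int]] = [None] * len(active)
    if is_scalar(operands, active):
        # Scalar path (assumed): compute once, then broadcast to active lanes.
        value = op(*(src[lanes[0]] for src in operands))
        for i in lanes:
            result[i] = value
        return result, 1
    # Vector path: one computation per active lane.
    for i in lanes:
        result[i] = op(*(src[i] for src in operands))
    return result, len(lanes)
```

In this model, an add whose operands are uniform across the active lanes costs one operation instead of one per lane, which is the kind of redundancy the abstract says a scalar pipeline could remove. Note that an inactive lane with a different value does not prevent an instruction from being classified as scalar.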
Subjects and keywords:
graphics processing units
scalar
subwarp execution
DOI:
https://doi.org/10.17760/D20251481
Permanent Link:
http://hdl.handle.net/2047/D20251481
Use and reproduction:
In Copyright: This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the right-holder(s). (http://rightsstatements.org/vocab/InC/1.0/)
Copyright restrictions may apply.
