Scalar-vector GPU architectures
Permanent URL:
http://hdl.handle.net/2047/D20251481
Rubin, Norman (Committee member)
Schirner, Gunar (Committee member)
A significant number of SIMD instructions in GPU compute programs exhibit scalar characteristics: they operate on the same data across all of their active lanes. Treating them as ordinary SIMD instructions results in inefficient GPU execution. To better serve both scalar and vector operations, we propose a heterogeneous scalar-vector GPU architecture. In this thesis, we design a specialized scalar pipeline that handles scalar instructions efficiently using only a single copy of the data, freeing the SIMD pipeline for normal vector execution. The proposed architecture offers an opportunity to save power by performing a single computation and broadcasting its result to multiple outputs. To balance the scalar and vector units, we propose novel schemes to efficiently resolve scalar-vector data dependencies, schedule warps, and dispatch instructions. We also consider the impact of varying warp sizes on our scalar-vector architecture and explore subwarp execution for power efficiency. Finally, we demonstrate that the interconnect and memory subsystem can become the new limiting factor for scalar-vector execution.
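The notion of a scalar instruction described in the abstract can be illustrated with a small software model. The sketch below is only an illustration under stated assumptions, not the detection or dispatch mechanism of the thesis: the function names `is_scalar_operand` and `execute_add`, the list-based warp representation, and the choice to count lane computations as a rough proxy for work are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def is_scalar_operand(values: Sequence[int], active_mask: Sequence[bool]) -> bool:
    """Return True if every active lane holds the same value (hypothetical helper)."""
    active = [v for v, on in zip(values, active_mask) if on]
    # With no active lanes there is nothing to execute; treat this as not scalar.
    if not active:
        return False
    return all(v == active[0] for v in active)


def execute_add(
    a: Sequence[int], b: Sequence[int], active_mask: Sequence[bool]
) -> Tuple[List[Optional[int]], int]:
    """Model a warp-wide add; return (per-lane results, number of computations).

    Inactive lanes produce None. This is an assumed toy model, not real hardware.
    """
    if is_scalar_operand(a, active_mask) and is_scalar_operand(b, active_mask):
        # Scalar case: compute once using the first active lane's operands,
        # then broadcast the single result to every active lane.
        first = list(active_mask).index(True)
        result = a[first] + b[first]
        return [result if on else None for on in active_mask], 1
    # Vector case: each active lane performs its own computation.
    out = [x + y if on else None for x, y, on in zip(a, b, active_mask)]
    return out, sum(1 for on in active_mask if on)


if __name__ == "__main__":
    mask = [True, True, False, True]
    # Lane 2 is inactive, so its differing value does not affect the check.
    print(execute_add([7, 7, 99, 7], [1, 1, 1, 1], mask))
    print(execute_add([1, 2, 3, 4], [10, 10, 10, 10], [True] * 4))
```

In this model, an instruction whose operands are uniform across the active lanes needs one computation plus a broadcast, while a true vector instruction needs one computation per active lane. Values in inactive lanes are ignored when deciding whether an operand is uniform.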
scalar
subwarp execution
Copyright restrictions may apply.