Improving GPU SIMD Control Flow Efficiency via Hybrid Warp Size Mechanism

Jin, Xingxing

Files

JIN-THESIS.pdf (1.38 MB)

Date

2012-08-17

Authors

Jin, Xingxing

Degree Level

Masters

Abstract

High single-instruction, multiple-data (SIMD) efficiency and low power consumption have made graphics processing units (GPUs) an ideal platform for many complex computational applications. Programmers can create thousands of threads, which are grouped into fixed-size SIMD batches known as warps. High throughput is then achieved by executing these warps concurrently with minimal control overhead. However, when a branch instruction assigns different paths to different threads, the warp is broken into multiple warps that must be executed serially, which reduces the efficiency advantage of SIMD. In this thesis, the contemporary fixed-size warp design is abandoned and a hybrid warp size (HWS) mechanism is proposed. Mixed-size warps are generated according to HWS and are scheduled and issued flexibly. When a branch divergence occurs, the split warps are squeezed according to the proposed algorithm, and warp sizes are downscaled wherever applicable. Based on the updated warp sizes, the warp schedulers calculate the number of cycles the current warp needs and issue the next warp accordingly. As a result, hybrid warps are pushed into the pipelines as soon as possible and more pipeline stages are overlapped. Simulation results show that this mechanism yields an average speedup of 1.20 over the baseline architecture for a wide variety of general-purpose GPU applications. This work also integrates HWS with dynamic warp formation (DWF), a well-known branch handling mechanism that aims to improve SIMD utilization by forming new warps out of split warps in real time. The warp forming policy is modified to better tolerate warp conflicts, and squeeze operations are added before a warp merges with other warps. Simulation shows that the combination of DWF and HWS yields an average speedup of 1.27 over the DWF-only platform for the same set of GPU benchmarks.
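The effect of squeezing on issue cycles can be illustrated with a small sketch. It is not the thesis's algorithm: the warp size of 32, the SIMD width of 8, the rule that a squeezed warp occupies `ceil(active / SIMD_WIDTH)` issue cycles, and the function names are all assumptions made for illustration.

```python
from math import ceil

# Assumed parameters, not taken from the thesis.
WARP_SIZE = 32   # threads per full-size warp
SIMD_WIDTH = 8   # lanes issued per cycle


def issue_cycles(active_mask, squeeze):
    """Cycles one warp occupies the issue stage for one instruction.

    Without squeezing, a warp always takes WARP_SIZE / SIMD_WIDTH cycles,
    even if most lanes are masked off. With squeezing (hypothetical model),
    active threads are packed together and only the needed cycles are used.
    """
    if not squeeze:
        return WARP_SIZE // SIMD_WIDTH
    active = sum(1 for lane in active_mask if lane)
    return max(1, ceil(active / SIMD_WIDTH))


def divergent_branch_cycles(taken_mask, squeeze):
    """Total issue cycles when a warp splits at a branch and both
    non-empty sides are executed serially."""
    not_taken_mask = [not lane for lane in taken_mask]
    sides = [m for m in (taken_mask, not_taken_mask) if any(m)]
    return sum(issue_cycles(m, squeeze) for m in sides)


if __name__ == "__main__":
    # 5 threads take the branch, 27 do not.
    mask = [True] * 5 + [False] * 27
    print(divergent_branch_cycles(mask, squeeze=False))  # 8
    print(divergent_branch_cycles(mask, squeeze=True))   # 5
```

Under these assumptions, the fixed-size design spends 4 cycles on each side of the branch, while the squeezed warps need 1 and 4 cycles, so the scheduler can issue the next warp three cycles earlier.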

Keywords

SIMD, GPU, Warp, Branch Divergence

Degree

Master of Science (M.Sc.)

Department

Electrical and Computer Engineering

Program

Electrical Engineering

Advisor

Ko, Seok-Bum
Daku, Brian

Committee

Wahid, Khan; Karki, Rajesh; Ikechukwuka, Oguocha

URI

http://hdl.handle.net/10388/ETD-2012-06-527

Collections

Graduate Theses and Dissertations
