Efficient Deep Neural Network Computation on Processors
Yu, Jiecao
2019
Abstract
Deep neural networks (DNNs) have become a fundamental component of many applications. They are trained on large amounts of data to make accurate predictions. However, conventional DNN models carry high computation and storage costs, which make it challenging to deploy DNN-based algorithms on existing processors. Various algorithms have been proposed to reduce DNN computation; they fall into two main categories: network compression and domain transformation. Network compression removes redundancy in DNN models by either pruning unimportant parameters or lowering parameter precision, reducing both the required computation and the storage space. Domain transformation converts convolution operations into a different domain where they can be evaluated with fewer operations. Nevertheless, these algorithms are designed without considering the characteristics of the underlying processors, which can degrade computation performance and increase model size. This thesis addresses this challenge by customizing the computation reduction algorithms for the processor architecture and by augmenting the hardware to better support DNN computation.

The first part of this thesis customizes DNN pruning techniques for the underlying processors by matching the pruned network structure to the parallel hardware organization. Two techniques are introduced: SIMD-aware weight pruning and node pruning. SIMD-aware weight pruning keeps weights in aligned, fixed-size groups to fully utilize the SIMD support. Node pruning removes redundant nodes instead of individual weights, reducing computation without sacrificing the dense matrix format. The two techniques are combined based on the hardware parallelism and the layer types.

Beyond pruning, I investigate deploying sub-byte DNN models on microcontrollers. Because sub-byte data formats are incompatible with the byte-addressable memory hierarchy, using sub-byte weights and inputs causes significant performance degradation. To address this issue, a new convolution algorithm is proposed that performs multiply-accumulate computations with bitwise logic operations, and instruction set architecture (ISA) extensions are introduced to accelerate the computation further.

The last part focuses on accelerating DNN computation by combining pruning techniques with Winograd convolution. The two cannot be combined directly because the Winograd transformation fills in the sparsity produced by spatial-domain pruning. To achieve higher Winograd-domain sparsity, I propose a new pruning method, spatial-Winograd pruning. First, spatial-domain weights are pruned in a structured way that transfers the spatial-domain sparsity efficiently into the Winograd domain. Next, pruning and retraining are performed directly in the Winograd domain to increase the sparsity further.

With these proposed techniques, this thesis resolves the conflicts between existing processor architectures and computation reduction algorithms, enabling efficient DNN computation. Short illustrative sketches of the three techniques follow.
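To make the first part's SIMD-aware weight pruning concrete, here is a minimal NumPy sketch of group-wise magnitude pruning. The group size, the RMS saliency measure, and the fixed threshold are illustrative assumptions, not the thesis's exact procedure; the key property from the abstract is that whole aligned groups are kept or removed together, so surviving weights stay contiguous and map directly onto SIMD lanes.

```python
import numpy as np

def simd_aware_prune(weights, group_size=4, threshold=0.5):
    """Zero out whole aligned groups of weights with small magnitude.

    group_size would match the SIMD width of the target processor
    (e.g., 4 lanes of 32-bit data on a 128-bit SIMD unit). The RMS
    saliency measure and the threshold are illustrative choices.
    """
    flat = weights.reshape(-1, group_size).copy()  # aligned groups
    group_rms = np.sqrt((flat ** 2).mean(axis=1))  # per-group saliency
    keep = group_rms >= threshold
    flat[~keep] = 0.0                              # prune whole groups
    return flat.reshape(weights.shape), keep

# Toy usage: an 8x8 fully connected layer pruned in groups of 4.
w = np.random.randn(8, 8).astype(np.float32)
pruned, mask = simd_aware_prune(w, group_size=4, threshold=0.5)
print(f"groups kept: {mask.sum()} / {mask.size}")
```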
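The bitwise multiply-accumulate in the second part can be illustrated with its simplest special case, the 1-bit dot product used by binary networks: with elements in {-1, +1} encoded one per bit (1 for +1, 0 for -1), XNOR marks agreements, so a dot product reduces to a XOR plus a population count. The sketch below shows only this well-known 1-bit identity; how the thesis generalizes it to multi-bit sub-byte operands, and what the proposed ISA extensions look like, is not specified in the abstract.

```python
def binary_dot(a_bits, b_bits, n):
    """Dot product of two n-element {-1, +1} vectors, each packed one
    element per bit (bit value 1 encodes +1, bit value 0 encodes -1).

    XNOR(a, b) has a 1 bit exactly where the elements agree, so
        dot = (#agreements) - (#disagreements) = n - 2 * popcount(a ^ b).
    """
    disagreements = bin((a_bits ^ b_bits) & ((1 << n) - 1)).count("1")
    return n - 2 * disagreements

# a = [+1, -1, +1, +1] -> 0b1101 (bit i holds element i)
# b = [+1, +1, -1, +1] -> 0b1011
print(binary_dot(0b1101, 0b1011, 4))  # 2 agreements - 2 disagreements = 0
```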
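The obstacle motivating the third part, that the Winograd transformation fills in spatial-domain sparsity, is easy to check numerically. The sketch below applies the standard weight transform G g Gᵀ of Winograd F(2x2, 3x3) (the matrix G comes from Lavin and Gray's fast convolution algorithm) to a made-up 3x3 kernel pruned to about 78% sparsity; the transformed 4x4 weights are only 50% sparse.

```python
import numpy as np

# Weight-transform matrix G for Winograd F(2x2, 3x3).
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])

# A 3x3 kernel pruned to 7/9 (~78%) sparsity in the spatial domain.
g = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])

winograd_g = G @ g @ G.T  # 4x4 Winograd-domain weights

print(f"spatial zeros:  {np.sum(g == 0)} / {g.size}")                    # 7 / 9
print(f"Winograd zeros: {np.sum(winograd_g == 0)} / {winograd_g.size}")  # 8 / 16
```

Spatial-Winograd pruning counters this fill-in by first pruning the spatial weights in a structured way that survives the transform, and then pruning and retraining directly in the Winograd domain.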
Subjects
Deep neural networks; Processors
Types
Thesis