Algorithms and low-power hardware for image processing applications

Ji, Alex.

Other Contributors

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.

Advisor

Anantha P. Chandrakasan.

Terms of use

MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract

Image processing has become more important with the ever-increasing amount of available image data. This has been accompanied by the development of new algorithms and new hardware. However, dedicated hardware is often required to run these algorithms efficiently; conversely, algorithms need to be developed to exploit the benefits of new hardware. For example, depth cameras have been created to add a new dimension to human-computer interaction, and they can also benefit applications that operate directly on the raw depth data, such as breath monitoring. On the algorithm side, convolutional neural networks (CNNs) have become the standard for difficult image processing tasks because of their high accuracy, but executing them efficiently requires new hardware that fully exploits the parallelism inherent in their computations.

The first part of the thesis presents an algorithm for breath monitoring using a low-resolution time-of-flight camera. It consists of automatic region-of-interest detection followed by frequency estimation, and it can be accurate to within 1 breath per minute when compared against a respiratory belt used as the reference.

The second part presents a processing element (PE) for a neural network accelerator that supports compressed weights and uses a new technique called factored computation. The PE consists of an accumulator array, a row decoder, and an output combination block. Modifications to the row decoder allow the compressed weight bit-widths to be reconfigured. Several common CNN layer operations are described and mapped onto the proposed hardware. An energy model of the design is formulated and verified by synthesizing and simulating a basic PE containing an 8 × 20 accumulator array. Simulations show that the proposed design achieves up to a 4.5× reduction in energy per MAC compared with a baseline 16-bit fixed-point MAC unit.
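
The two technical steps summarized in the abstract can be illustrated with a short Python sketch. It is not the thesis's implementation, only an illustration under stated assumptions: `breaths_per_minute`, `factored_dot`, and `direct_dot` are hypothetical names; the region of interest is assumed to be already detected and averaged to one depth value per frame; the 5 to 40 breaths-per-minute search band is an assumed parameter; and "factored computation" is read here as summing the inputs that share a compressed weight index into one accumulator row (selected by the row decoder), then multiplying each row sum by its shared weight value once in the output combination step.

```python
import math
from typing import Sequence


# Sketch 1: breathing rate as the dominant frequency of a depth time series.
# ASSUMPTION: `depth` is already the ROI-averaged depth, one value per frame;
# ROI detection itself is not shown. Band limits are hypothetical defaults.
def breaths_per_minute(depth: Sequence[float], fps: float,
                       min_bpm: float = 5.0, max_bpm: float = 40.0) -> float:
    n = len(depth)
    mean = sum(depth) / n
    x = [d - mean for d in depth]  # remove the static (DC) distance component
    best_bpm, best_power = 0.0, -1.0
    for k in range(1, n // 2 + 1):  # naive DFT, bins up to Nyquist
        bpm = 60.0 * k * fps / n  # bin k is k*fps/n Hz, times 60 for per-minute
        if not (min_bpm <= bpm <= max_bpm):
            continue
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
        im = -sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


# Sketch 2: factored computation of one dot product with compressed weights.
# ASSUMPTION: each weight is stored as a small index into a codebook of shared
# values, so multiplications can be factored out of the accumulation.
def factored_dot(activations: Sequence[int], weight_idx: Sequence[int],
                 codebook: Sequence[int]) -> int:
    # Accumulator array (a single column here): one running sum per codebook
    # entry; the weight index acts as the row-decoder select signal.
    acc = [0] * len(codebook)
    for a, idx in zip(activations, weight_idx):
        acc[idx] += a  # additions only, no multiply per input
    # Output combination: one multiply per codebook entry, then a final sum.
    return sum(w * s for w, s in zip(codebook, acc))


def direct_dot(activations: Sequence[int], weight_idx: Sequence[int],
               codebook: Sequence[int]) -> int:
    # Baseline: one multiply-accumulate per input, for comparison.
    return sum(a * codebook[i] for a, i in zip(activations, weight_idx))
```

For example, `factored_dot([3, 1, 4, 1, 5, 9, 2, 6], [0, 2, 1, 0, 3, 2, 1, 3], [-2, 0, 5, 7])` returns 119, the same as `direct_dot`, while using 4 multiplications instead of 8. In the breathing sketch, the frequency-bin spacing is `60 * fps / n` breaths per minute, so this naive DFT needs at least 60 s of frames to resolve 1 breath per minute.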

Description

Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018

Cataloged from PDF version of thesis.

Includes bibliographical references (pages 69-71).

Date issued

2018

URI

https://hdl.handle.net/1721.1/121835

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Keywords

Electrical Engineering and Computer Science.

Collections

Graduate Theses