Algorithms and low-power hardware for image processing applications
Author(s)
Ji, Alex.
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Anantha P. Chandrakasan.
Abstract
Image processing has become more important with the ever-increasing amount of available image data. This growth has been accompanied by the development of new algorithms and hardware. However, dedicated hardware is often required to run these algorithms efficiently, and conversely, algorithms need to be developed to exploit the benefits of the new hardware. For example, depth cameras have been created to add a new dimension to human-computer interaction. They can benefit applications that operate on the raw depth data directly, such as breath monitoring. As for new algorithms, convolutional neural networks (CNNs) have become the standard for difficult image processing tasks due to their high accuracy. But to execute them efficiently, we need new hardware that fully exploits the parallelism inherent in these computations. The first part of the thesis presents an algorithm for breath monitoring using a low-resolution time-of-flight camera. It consists of automatic region-of-interest detection followed by frequency estimation. It can be accurate to within 1 breath per minute when compared with a respiratory belt used as a reference. The second part presents a processing element (PE) for a neural network accelerator that supports compressed weights and uses a new technique called factored computation. The PE consists of an accumulator array, a row decoder, and an output combination block. Modifications to the row decoder allow the bit-widths of the compressed weights to be reconfigured. Several common layer operations in CNNs are described and mapped onto the proposed hardware. An energy model of the design is formulated and verified by synthesizing and simulating a basic processing element containing an 8 x 20 accumulator array. Simulations show that the proposed design achieves up to a 4.5x reduction in the energy per MAC compared to a baseline 16-bit fixed-point MAC unit.
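To illustrate the frequency-estimation step mentioned in the abstract, the following is a minimal sketch and not the thesis's algorithm. It assumes the region of interest has already been found and reduced to one mean depth value per frame, and it picks the rate with the largest spectral magnitude inside an assumed breathing band. The function name, the band limits (6 to 40 breaths per minute), the frame rate, and the synthetic signal are all hypothetical choices made for illustration.

```python
import cmath
import math


def estimate_breaths_per_minute(signal, fps, min_bpm=6.0, max_bpm=40.0, step_bpm=0.1):
    """Return the rate (breaths/min) in [min_bpm, max_bpm] with the largest
    spectral magnitude. All parameter defaults are illustrative assumptions."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the constant depth offset

    best_bpm, best_power = min_bpm, -1.0
    steps = int(round((max_bpm - min_bpm) / step_bpm))
    for k in range(steps + 1):
        bpm = min_bpm + k * step_bpm
        freq_hz = bpm / 60.0
        # Correlate the signal with a complex exponential at this frequency
        # (a direct evaluation of the discrete-time Fourier transform).
        acc = sum(
            x * cmath.exp(-2j * math.pi * freq_hz * i / fps)
            for i, x in enumerate(centered)
        )
        power = abs(acc) ** 2
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


# Synthetic stand-in for the per-frame mean depth of the region of interest:
# 60 s at 10 frames/s, chest motion of 5 mm at 15 breaths/min (assumed values).
fps = 10.0
true_bpm = 15.0
depth = [
    1.0 + 0.005 * math.sin(2 * math.pi * (true_bpm / 60.0) * i / fps)
    for i in range(600)
]
estimate = estimate_breaths_per_minute(depth, fps)
print(f"estimated rate: {estimate:.1f} breaths/min")
```

Restricting the search to a plausible breathing band is one simple way to keep slow drift and higher-frequency noise from dominating the estimate; the thesis itself may use a different estimator.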
Description
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. Cataloged from PDF version of thesis. Includes bibliographical references (pages 69-71).
Date issued
2018
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.