Graduate Project

Compression of a deep neural network to run on resource-limited devices

A neural network comprises fully connected layers and typically has one or two hidden layers; a deep neural network (DNN) has many hidden layers, each made up of a number of neurons. Neural networks and DNNs are trained via feedforward and backpropagation passes. During training, a parameter called the number of epochs sets how many times the feedforward and backpropagation algorithms run over the entire dataset. Depending on the number of hidden layers and the neuron count in each layer, a DNN can have millions of parameters, which makes training and inference very resource consuming.

Running a DNN application on a resource-limited device such as a mobile phone can therefore run into battery, memory, and computation constraints. In this project we aim to compress a deep neural network so that it has several times fewer parameters than the uncompressed network while maintaining similar accuracy. If the parameter count can be reduced by a large factor, DNN applications become practical on mobile devices, because the compressed model requires less memory, energy, and computation than the original uncompressed network.

Pruning and quantization can compress a deep neural network significantly. This project focuses mostly on pruning, a highly effective and popular technique for compressing a DNN while retaining accuracy similar to that of the original uncompressed network.
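As a rough illustration of the pruning idea described above, here is a minimal sketch of magnitude-based weight pruning on a single weight matrix, using NumPy. The function name, the sparsity level, and the random weights are illustrative assumptions, not part of the project's actual implementation; real pruning pipelines typically prune trained weights layer by layer and then fine-tune the network to recover accuracy.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude entries of a weight matrix.

    sparsity=0.9 means roughly 90% of the weights are set to zero,
    leaving about 10x fewer nonzero parameters to store and compute.
    """
    flat = np.abs(weights).ravel()
    k = int(flat.size * sparsity)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Illustrative example: prune a random 256x256 weight matrix
rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256))
pruned = magnitude_prune(w, sparsity=0.9)
kept = np.count_nonzero(pruned)
print(f"original params: {w.size}, nonzero after pruning: {kept}")
```

Because the surviving weights form a sparse matrix, the pruned model can be stored in a compressed sparse format and multiplied more cheaply, which is the source of the memory and computation savings discussed above.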
