No document available.
Abstract :
[en] Deep Learning DL is becoming more and more interesting in many areas, such as genomics, security, data analysis, image, and video processing. However, DL requires more and more powerful and parallel computing. The type of calculation performed by super-machines, equipped with powerful processors, such as the latest Intel i9. We also use GPUs that are very powerful in terms of parallelism. Despite their power, these computing units consume a lot of energy, which makes their use very difficult in small embedded systems and edge computing. To overcome the problem for which we must keep the maximum performance and satisfy the power constraint, it is necessary a heterogeneous strategy. Some solutions are promising when using less energy-consuming electronic circuits, such as FPGAs associated with less expensive topologies such as Stacked Sparse Autoencoders. Our target architecture is the Xilinx ZYNQ 7020 SoC, which combines a dual-core ARM processor and an FPGA in the same chip. In the interest of flexibility, we decided to leverage the performance of Xilinx's high-level synthesis tools, evaluate and choose the best solution in terms of size and performance of the data exchange, synchronization and pipeline processing. The results show that our implementation gives high performance at very low energy consumption. Indeed, the evaluation of our accelerator shows that it can classify 1160 MNIST images per second, consuming only 0.443 W; 2.4 W for the entire system. More than the lowest energy consumption and high performance, the platform used cost just $ 125.