Calculation Optimization for Convolutional Neural Networks and FPGA-based Accelerator Design Exploiting Parameter Sparsity
doi: 10.11999/JEIT170819
Funds:
The National Science and Technology Major Project of the Ministry of Science and Technology of China (2016ZX01012101); the National Natural Science Foundation of China (61572520, 61521003)
Abstract: The application of Convolutional Neural Networks (CNN) on embedded devices is constrained by real-time requirements, while CNN convolution computations exhibit a large degree of sparsity. This paper proposes an FPGA-based CNN accelerator that exploits this sparsity to improve computation speed. First, the sparsity characteristics of CNN convolution computation are identified. Second, to exploit parameter sparsity, the CNN convolutions are converted into matrix multiplications. Finally, an implementation of a parallel matrix multiplier on FPGA is proposed. Simulation results on a Virtex-7 VC707 FPGA show that the design reduces computation time by 19% compared with a traditional CNN accelerator. This sparsity-based simplification of the CNN computation can be implemented not only on FPGAs but also migrated to other embedded platforms.
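The pipeline the abstract describes can be illustrated in software: unroll the convolution windows into a matrix (the standard im2col transformation), then perform the convolution as one matrix product in which rows containing zero weights are skipped. The sketch below is an illustrative NumPy model of that general technique, not the paper's FPGA design; all function names are hypothetical.

```python
import numpy as np

def im2col(x, kh, kw):
    # Unroll each kh-by-kw sliding window of a single-channel input
    # into one column (stride 1, no padding).
    H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((kh * kw, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def sparse_gemm(w, cols):
    # Multiply only by the nonzero weights of each row: this is where
    # parameter sparsity saves multiply-accumulate operations.
    out = np.zeros((w.shape[0], cols.shape[1]))
    for r in range(w.shape[0]):
        nz = np.nonzero(w[r])[0]          # indices of nonzero weights
        out[r] = w[r, nz] @ cols[nz, :]   # skip the zero terms entirely
    return out

def conv_as_gemm(x, kernels):
    # kernels: (n_k, kh, kw). The whole convolution layer collapses into
    # a single (n_k, kh*kw) x (kh*kw, out_h*out_w) matrix product.
    n_k, kh, kw = kernels.shape
    cols = im2col(x, kh, kw)
    w = kernels.reshape(n_k, kh * kw)
    out = sparse_gemm(w, cols)
    return out.reshape(n_k, x.shape[0] - kh + 1, x.shape[1] - kw + 1)
```

In hardware, the same reformulation lets a single parallel matrix multiplier serve every convolution layer, and pruned (zero) weights translate directly into skipped multiply-accumulate cycles.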
Key words:
- Convolutional Neural Network (CNN) /
- Sparsity /
- Computational optimization /
- Matrix multiplier /
- FPGA