Calculation Optimization for Convolutional Neural Networks and FPGA-based Accelerator Design Using the Parameters Sparsity

LIU Qinrang, LIU Chongyang

Citation: LIU Qinrang, LIU Chongyang. Calculation Optimization for Convolutional Neural Networks and FPGA-based Accelerator Design Using the Parameters Sparsity[J]. Journal of Electronics & Information Technology, 2018, 40(6): 1368-1374. doi: 10.11999/JEIT170819


doi: 10.11999/JEIT170819

Funds: 

The National Science and Technology Major Project of the Ministry of Science and Technology of China (2016ZX01012101), The National Natural Science Foundation of China (61572520, 61521003)

  • Abstract: Considering that real-time constraints limit the application of Convolutional Neural Networks (CNN) on embedded platforms, and that CNN convolution computations exhibit a considerable degree of sparsity, this paper proposes an FPGA-based CNN accelerator implementation to improve computation speed. First, the sparsity characteristic of CNN convolution computation is analyzed; second, to exploit parameter sparsity, the CNN convolution computation is converted into matrix multiplication; finally, an FPGA-based parallel matrix multiplier implementation is proposed. Simulation results on a Virtex-7 VC707 FPGA show that the design reduces computation time by 19% compared with a conventional CNN accelerator. This way of simplifying CNN computation through sparsity can be implemented not only on FPGAs but also ported to other embedded platforms.
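    The abstract describes converting convolution into matrix multiplication so that zero-valued parameters can be skipped. The following Python/NumPy sketch illustrates that general idea under stated assumptions: the im2col unrolling, the function names, and the simple skip-zero-weights scheme are illustrative only and are not taken from the paper's actual FPGA design.

    import numpy as np

    def im2col(x, k):
        # Unroll every k x k patch of a single-channel feature map x into a column.
        h, w = x.shape
        oh, ow = h - k + 1, w - k + 1
        cols = np.empty((k * k, oh * ow), dtype=x.dtype)
        idx = 0
        for i in range(oh):
            for j in range(ow):
                cols[:, idx] = x[i:i + k, j:j + k].ravel()
                idx += 1
        return cols

    def sparse_conv_as_gemm(x, kernels):
        # Convolution expressed as matrix multiplication; zero weights are skipped,
        # so only nonzero parameters contribute multiply-accumulate operations.
        # kernels: (num_filters, k, k); returns (num_filters, oh, ow).
        num_filters, k, _ = kernels.shape
        oh, ow = x.shape[0] - k + 1, x.shape[1] - k + 1
        cols = im2col(x, k)                        # (k*k, oh*ow)
        w = kernels.reshape(num_filters, k * k)    # (num_filters, k*k)
        out = np.zeros((num_filters, oh * ow), dtype=x.dtype)
        for f in range(num_filters):
            nz = np.nonzero(w[f])[0]               # indices of nonzero weights
            if nz.size:
                out[f] = w[f, nz] @ cols[nz, :]    # skip the zero rows entirely
        return out.reshape(num_filters, oh, ow)

    if __name__ == "__main__":
        x = np.arange(36, dtype=np.float32).reshape(6, 6)
        kernels = np.zeros((2, 3, 3), dtype=np.float32)
        kernels[0, 1, 1] = 1.0   # 8 of 9 weights are zero
        kernels[1, 0, 0] = 0.5
        print(sparse_conv_as_gemm(x, kernels).shape)   # -> (2, 4, 4)

    In the sketch, each filter multiplies only its nonzero weights against the corresponding rows of the unrolled input, which is a software analogue of the multiply-accumulate savings the paper targets in hardware.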
History
  • Received Date: 2017-08-21
  • Revised Date: 2018-01-05
  • Published Date: 2018-06-19
