

Design of Convolutional Neural Networks Hardware Acceleration Based on FPGA

Huabiao QIN, Qinping CAO

秦華標(biāo), 曹欽平. 基于FPGA的卷積神經(jīng)網(wǎng)絡(luò)硬件加速器設(shè)計[J]. 電子與信息學(xué)報, 2019, 41(11): 2599-2605. doi: 10.11999/JEIT190058
引用本文: 秦華標(biāo), 曹欽平. 基于FPGA的卷積神經(jīng)網(wǎng)絡(luò)硬件加速器設(shè)計[J]. 電子與信息學(xué)報, 2019, 41(11): 2599-2605. doi: 10.11999/JEIT190058
Huabiao QIN, Qinping CAO. Design of Convolutional Neural Networks Hardware Acceleration Based on FPGA[J]. Journal of Electronics & Information Technology, 2019, 41(11): 2599-2605. doi: 10.11999/JEIT190058
Citation: Huabiao QIN, Qinping CAO. Design of Convolutional Neural Networks Hardware Acceleration Based on FPGA[J]. Journal of Electronics & Information Technology, 2019, 41(11): 2599-2605. doi: 10.11999/JEIT190058


doi: 10.11999/JEIT190058
Funds: Science and Technology Program of Guangdong Province (2014B090910002)
Detailed information
    Author biographies:

    Huabiao QIN: Male, born in 1967, professor. Research interests: intelligent information processing, wireless communication networks, embedded systems, FPGA design

    Qinping CAO: Male, born in 1995, master's student. Research interest: integrated circuit design

    Corresponding author:

    Huabiao QIN, eehbqin@scut.edu.cn

  • CLC number: TP331

Design of Convolutional Neural Networks Hardware Acceleration Based on FPGA

Funds: The Science and Technology Program of Guangdong Province (2014B090910002)
  • Abstract: To address the heavy computational load and long computation time of Convolutional Neural Networks (CNNs), this paper proposes a CNN hardware accelerator based on a Field Programmable Gate Array (FPGA). First, by analyzing the forward computation of the convolutional layer in depth and exploring the parallelism of convolutional-layer operations, a hardware architecture with input-channel parallelism, output-channel parallelism, and deep pipelining of convolution windows is designed. Then, within this architecture, a fully parallel multiply-add tree module is designed to accelerate the convolution, together with an efficient window buffer module to pipeline the convolution windows. Finally, experimental results show that the proposed accelerator achieves an energy efficiency of 32.73 GOPS/W, 34% higher than existing solutions, while reaching a performance of 317.86 GOPS.
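The fully parallel multiply-add tree can be modeled in software. The sketch below (function name and list interface are illustrative, not from the paper) forms all K×K products of a convolution window in one stage and then sums them through a balanced binary adder tree, so a window finishes in one multiply stage plus ⌈log2(K×K)⌉ addition stages rather than K×K sequential accumulations:

```python
def mult_add_tree(window, kernel):
    """Software model of a fully parallel multiply-add tree:
    one multiply stage followed by log2-depth addition stages."""
    assert len(window) == len(kernel)
    # Stage 1: in hardware, all products are computed in parallel
    sums = [w * k for w, k in zip(window, kernel)]
    # Zero-pad to the next power of two so the binary tree is balanced
    while len(sums) & (len(sums) - 1):
        sums.append(0)
    # Each loop iteration models one adder-tree stage
    while len(sums) > 1:
        sums = [sums[i] + sums[i + 1] for i in range(0, len(sums), 2)]
    return sums[0]

# A 3x3 window against an all-ones kernel simply sums the window
print(mult_add_tree(list(range(1, 10)), [1] * 9))  # 45
```

Each `while` iteration of the reduction corresponds to one pipeline stage of adders in the hardware tree.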
  • Fig. 1  Convolutional layer computation process

    Fig. 2  Convolution over a single input channel

    Fig. 3  Parallel computation of convolution windows over N input channels

    Fig. 4  Parallel accumulator operation

    Fig. 5  Classical adder tree

    Fig. 6  Adder tree designed in this paper

    Fig. 7  Multiply-add tree module

    Fig. 8  Convolution window data reuse

    Fig. 9  Window buffer structure

    Fig. 10  Window buffer timing

    Fig. 11  Output-channel parallel module

    Fig. 12  Structure of the parallel acceleration scheme

    Fig. 13  Convolution window pipeline

    Fig. 14  Performance comparison of FPGA, CPU and GPU

    Table 1  Convolutional neural network structure parameters

    Layer               Layer structure                         Parameters
    Conv layer 1        3×3 kernels, 15 kernels, stride 1       150
    Activation layer 1  –                                       0
    Pooling layer 1     2×2 pooling, stride 2                   0
    Conv layer 2        6×6 kernels, 20 kernels, stride 1       10820
    Activation layer 2  –                                       0
    Pooling layer 2     2×2 pooling, stride 2                   0
    Fully connected     10 output neurons                       3210
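The parameter counts in Table 1 can be reproduced from the layer shapes. A quick check in Python, assuming a single-channel 28×28 input (the input size is not stated in this excerpt; 28×28 is an assumption that makes the fully connected layer's 3210 parameters consistent):

```python
def conv_params(k, c_in, c_out):
    """Weights (k*k*c_in per filter) plus one bias per filter."""
    return k * k * c_in * c_out + c_out

print(conv_params(3, 1, 15))    # conv layer 1 -> 150
print(conv_params(6, 15, 20))   # conv layer 2 -> 10820

# Feature-map sizes, assuming a 28x28 single-channel input:
# 28 -> conv 3x3 stride 1 -> 26 -> pool 2x2 -> 13
# 13 -> conv 6x6 stride 1 -> 8  -> pool 2x2 -> 4
fc_in = 4 * 4 * 20              # 320 inputs to the fully connected layer
print(fc_in * 10 + 10)          # fully connected layer -> 3210
```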

    Table 2  FPGA resource utilization

    Resource        Used/Total          Ratio (%)
    ALMs            89423/113560        79
    Block Memory    730151/12492800     6
    DSPs            342/342             100

    Table 3  Comparison with FPGA hardware accelerators in the literature

                          Ref [7]         Ref [11]        Ref [12]          This work
    FPGA                  Zynq XC7Z045    Zynq XC7Z045    Virtex-7 VX690T   Cyclone V 5CGXF
    Frequency (MHz)       150             100             150               100
    DSP usage             780 (86.7%)     824 (91.6%)     1376 (38%)        342 (100%)
    Quantization          16-bit fixed    16-bit fixed    16-bit fixed      16-bit fixed
    Power (W)             9.630           9.400           25.000            9.711
    Performance (GOPS)    136.97          229.50          570.00            317.86
    Efficiency (GOPS/W)   14.22           24.42           22.80             32.73
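The energy-efficiency row of Table 3 is performance divided by power, and the "34% higher" claim in the abstract compares this work against the best prior entry. A quick sketch (variable names are illustrative):

```python
perf_gops = 317.86   # performance of this work, from Table 3
power_w = 9.711      # measured power, from Table 3

efficiency = round(perf_gops / power_w, 2)
print(efficiency)                           # 32.73 GOPS/W

best_prior = 24.42                          # Ref [11], from Table 3
print(round(efficiency / best_prior - 1, 2))  # ~0.34, i.e. 34% higher
```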
  • LIU Weibo, WANG Zidong, LIU Xiaohui, et al. A survey of deep neural network architectures and their applications[J]. Neurocomputing, 2017, 234: 11–26. doi: 10.1016/j.neucom.2016.12.038
    HAN Song, MAO Huizi, and DALLY W J. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding[J]. arXiv preprint arXiv: 1510.00149, 2015.
    COATES A, HUVAL B, WANG Tao, et al. Deep learning with COTS HPC systems[C]. Proceedings of the 30th International Conference on International Conference on Machine Learning, Atlanta, USA, 2013: III-1337–III-1345.
    JOUPPI N P, YOUNG C, PATIL N, et al. In-datacenter performance analysis of a tensor processing unit[C]. Proceedings of the 44th Annual International Symposium on Computer Architecture, Toronto, Canada, 2017: 1–12. doi: 10.1145/3079856.3080246.
    MOTAMEDI M, GYSEL P, AKELLA V, et al. Design space exploration of FPGA-based deep convolutional neural networks[C]. Proceedings of the 21st Asia and South Pacific Design Automation Conference, Macau, China, 2016: 575–580. doi: 10.1109/ASPDAC.2016.7428073.
    ZHANG Jialiang and LI Jing. Improving the performance of OpenCL-based FPGA accelerator for convolutional neural network[C]. Proceedings of 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, USA, 2017: 25–34. doi: 10.1145/3020078.3021698.
    QIU Jiantao, WANG Jie, YAO Song, et al. Going deeper with embedded FPGA platform for convolutional neural network[C]. Proceedings of 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, USA, 2016: 26–35. doi: 10.1145/2847263.2847265.
    YU Qi. Deep learning accelerator design and implementation based on FPGA[D]. [Master dissertation], University of Science and Technology of China, 2016: 30–38.
    LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278–2324. doi: 10.1109/5.726791
    ABADI M, BARHAM P, CHEN Jianmin, et al. Tensorflow: A system for large-scale machine learning[C]. Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, Savannah, USA, 2016: 265–283.
    XIAO Qingcheng, LIANG Yun, LU Liqiang, et al. Exploring heterogeneous algorithms for accelerating deep convolutional neural networks on FPGAs[C]. Proceedings of the 54th Annual Design Automation Conference, Austin, USA, 2017: 62. doi: 10.1145/3061639.3062244.
    SHEN Junzhong, HUANG You, WANG Zelong, et al. Towards a uniform template-based architecture for accelerating 2D and 3D CNNs on FPGA[C]. Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, USA, 2018: 97–106. doi: 10.1145/3174243.3174257.
Publication history
  • Received: 2019-01-22
  • Revised: 2019-06-10
  • Published online: 2019-06-20
  • Issue date: 2019-11-01
