

Efficient Hardware Optimization Strategies for Deep Neural Networks Acceleration Chip

Meng ZHANG, Jingwei ZHANG, Guoqing LI, Ruixia WU, Xiaoyang ZENG

Citation: Meng ZHANG, Jingwei ZHANG, Guoqing LI, Ruixia WU, Xiaoyang ZENG. Efficient Hardware Optimization Strategies for Deep Neural Networks Acceleration Chip[J]. Journal of Electronics & Information Technology, 2021, 43(6): 1510-1517. doi: 10.11999/JEIT210002

doi: 10.11999/JEIT210002
Funding: National Key R&D Program of China (2018YFB2202703), Natural Science Foundation of Jiangsu Province (BK20201145)
    About the authors:

    Meng ZHANG: male, born in 1964, researcher. Research interests: digital signal processing, deep learning algorithms, and hardware acceleration

    Jingwei ZHANG: male, born in 1997, master's student. Research interest: deep learning hardware accelerator design

    Guoqing LI: male, born in 1991, Ph.D. student. Research interests: computer vision and deep learning hardware accelerator design

    Ruixia WU: female, born in 1996, master's student. Research interest: deep learning algorithms

    Xiaoyang ZENG: male, born in 1972, professor. Research interest: energy-efficient system-on-chip (SoC)

    Corresponding author:

    Jingwei ZHANG zhangjingwei@seu.edu.cn

  • CLC number: TN79.1
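  • Abstract: Deploying lightweight neural networks on low-power platforms is an effective solution for artificial intelligence (AI) and Internet of Things (IoT) applications such as unmanned aerial vehicle (UAV) detection and autonomous driving. However, building a deep neural network (DNN) accelerator that delivers both high accuracy and low latency under limited resources is very challenging. To address this problem, this paper proposes a series of efficient hardware optimization strategies. A stackable shared processing engine (PE) is built to balance the inconsistent data reuse and memory access patterns of different convolutions. An adjustable loop count and a channel augmentation method are proposed, which effectively expand the access bandwidth between the accelerator and external memory and improve the computational efficiency of the shallow layers of the DNN. The pre-load workflow is optimized to raise the overall parallelism of the heterogeneous system. Verified on the Xilinx Ultra96 V2 board, the proposed strategies effectively improve DNN acceleration chip designs of the iSmart3-SkyNet and SkrSkr-SkyNet type. The results show that the optimized accelerator processes 78.576 frames per second, with an energy consumption of 0.068 J per image.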
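  • Figure 1  Roofline model analysis of SkyNet on the iSmart3-SkyNet accelerator

    Figure 2  Schematic of the system, computation module, and line buffer structure

    Figure 3  Illustration of the channel augmentation flow

    Figure 4  Comparison of three workflows

    Figure 5  Roofline model analysis of SkyNet on the optimized accelerator

    Figure 6  Performance of the iSmart3 and SkrSkr accelerators before and after optimization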
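Figures 1 and 5 refer to a Roofline analysis. As a reminder of what that model computes, the sketch below applies the standard Roofline bound, where attainable performance is the smaller of the compute peak and bandwidth times operational intensity. The platform numbers and the function name are hypothetical placeholders chosen for illustration; they are not values reported by the authors.

```python
def attainable_gops(peak_gops: float, bandwidth_gb_s: float,
                    intensity_ops_per_byte: float) -> float:
    """Standard Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_gops, bandwidth_gb_s * intensity_ops_per_byte)


# Hypothetical platform numbers (assumptions, not measurements from the paper).
PEAK_GOPS = 100.0
BANDWIDTH_GB_S = 4.0

# Operational intensity at which the memory roof meets the compute roof.
ridge = PEAK_GOPS / BANDWIDTH_GB_S

# A low-intensity layer sits under the memory roof, a high-intensity one under the compute roof.
low = attainable_gops(PEAK_GOPS, BANDWIDTH_GB_S, 5.0)
high = attainable_gops(PEAK_GOPS, BANDWIDTH_GB_S, 50.0)
```

Under this model, raising the effective bandwidth lifts the memory roof, which is the kind of effect the abstract attributes to the adjustable loop count and channel augmentation.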

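    Table 1  SkyNet architecture and inference speed of each bundle

    Bundle | Layer | Input size  | Operation                   | Computation, share (%) | Latency share (%)
    #1     | 1     | 3×160×320   | DW-Conv3                    | 119.61M, 20.6          | 33.90
           | 2     | 3×160×320   | PW-Conv1                    |                        |
           | 3     | 48×160×320  | POOLING                     |                        |
    #2     | 4     | 48×80×160   | DW-Conv3                    | 86.02M, 14.42          | 16.54
           | 5     | 48×80×160   | PW-Conv1                    |                        |
           | 6     | 96×80×160   | POOLING                     |                        |
    #3     | 7     | 96×40×80    | DW-Conv3                    | 61.75M, 10.36          | 6.23
           | 8     | 96×40×80    | PW-Conv1                    |                        |
           | 9     | 192×40×80   | POOLING                     |                        |
    #4     | 10    | 192×20×40   | DW-Conv3                    | 60.36M, 10.13          | 4.92
           | 11    | 192×20×40   | PW-Conv1                    |                        |
    #5     | 12    | 384×20×40   | DW-Conv3                    | 160.05M, 26.85         | 12.43
           | 13    | 384×20×40   | PW-Conv1                    |                        |
    #6     |       |             | Merge with layer 9 output   | 107.52M, 18.04         | 20.08
           | 14    | 1280×20×40  | [Bypass] DW-Conv3           |                        |
           | 15    | 1280×20×40  | PW-Conv1                    |                        |
    #7     | 16    | 96×20×40    | PW-Conv1                    | 0.77M, 0.14            | 0.10
           | 17    | 10×20×40    | Bounding-box regression     |                        | 0.16
    CPU    |       |             |                             |                        | 5.64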
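The imbalance that Table 1 exposes can be checked with a few lines of arithmetic. The sketch below copies the computation and latency shares from the table and divides one by the other for each bundle. The ratio is an illustrative metric introduced here, not one defined by the authors, and the variable and function names are hypothetical.

```python
# (bundle, computation share %, latency share %), copied from Table 1.
BUNDLES = [
    ("#1", 20.6, 33.90),
    ("#2", 14.42, 16.54),
    ("#3", 10.36, 6.23),
    ("#4", 10.13, 4.92),
    ("#5", 26.85, 12.43),
    ("#6", 18.04, 20.08),
    ("#7", 0.14, 0.10),
]
# Latency rows in Table 1 that carry no computation share.
OTHER_LATENCY = {"bounding-box regression": 0.16, "CPU": 5.64}


def latency_per_compute(rows):
    """Latency share divided by computation share; above 1 means slower than its share of work."""
    return {name: lat / comp for name, comp, lat in rows}


ratios = latency_per_compute(BUNDLES)
worst = max(ratios, key=ratios.get)

# Sanity check: all latency shares in the table should add up to about 100 %.
total_latency = round(
    sum(lat for _, _, lat in BUNDLES) + sum(OTHER_LATENCY.values()), 2
)

for name, ratio in ratios.items():
    print(f"bundle {name}: latency/compute = {ratio:.2f}")
```

By this measure bundle #1, the shallowest one, has the highest ratio, which is consistent with the abstract's focus on improving the computational efficiency of the shallow layers.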

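    Table 2  Comparison of the effect of the optimization strategies

    Accelerator             | iSmart3 [9] | SEUer A   | SkrSkr [10] | SEUer B
    Network model           | SkyNet      | SkyNet    | SkyNet      | SkyNet
    Quantization precision  | A9/W11      | A9/W11    | A8/W6       | A8/W6
    Hardware platform       | Ultra96V2   | Ultra96V2 | Ultra96V2   | Ultra96V2
    Accuracy (DJI)          | 0.716       | 0.724     | 0.731       | 0.731
    Clock frequency (MHz)   | 215         | 215       | 300         | 300
    DSP count               | 329         | 287       | 360         | 360
    LUT count (k)           | 54          | 54        | 56          | 46
    FF count (k)            | 60          | 70        | 68          | 51
    Frame rate (fps)        | 25.05       | 37.393    | 52.429      | 78.576
    GOPS/W                  | 3.21        | 5.95      | 7.22        | 11.19
    Energy/Pic. (J)         | 0.289       | 0.135     | 0.129       | 0.068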
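The gains in Table 2 can be made explicit with a short calculation. The sketch below rests on two assumptions that the page does not state outright: that SEUer A is the optimized counterpart of iSmart3 and SEUer B that of SkrSkr (inferred from the matching quantization and clock settings), and that energy per picture equals average power divided by frame rate, so that multiplying the two recovers an implied average power. The function names are hypothetical.

```python
# name: (frame rate in fps, energy per picture in J), copied from Table 2.
RESULTS = {
    "iSmart3": (25.05, 0.289),
    "SEUer A": (37.393, 0.135),
    "SkrSkr": (52.429, 0.129),
    "SEUer B": (78.576, 0.068),
}

# Assumed baseline -> optimized pairing (inferred, not stated in the table).
PAIRS = [("iSmart3", "SEUer A"), ("SkrSkr", "SEUer B")]


def speedup(baseline: str, optimized: str) -> float:
    """Frame-rate ratio of the optimized design over its baseline."""
    return RESULTS[optimized][0] / RESULTS[baseline][0]


def implied_power_w(name: str) -> float:
    """Average power implied by fps * J/picture (assumes energy/pic = power / fps)."""
    fps, joules_per_picture = RESULTS[name]
    return fps * joules_per_picture


for base, opt in PAIRS:
    print(
        f"{base} -> {opt}: {speedup(base, opt):.2f}x frame rate, "
        f"implied power {implied_power_w(base):.2f} W -> {implied_power_w(opt):.2f} W"
    )
```

Under these assumptions both optimized designs run at roughly 1.5 times the frame rate of their baselines while drawing less implied power, which is why the energy per picture falls by about half.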
  • [1] WANG Wei, ZHOU Kaili, WANG Yichang, et al. Design of convolutional neural networks accelerator based on fast filter algorithm[J]. Journal of Electronics & Information Technology, 2019, 41(11): 2578–2584. doi: 10.11999/JEIT190037
    [2] ZHANG Xiaofan, WANG Junsong, ZHU Chao, et al. DNNBuilder: An automated tool for building high-performance DNN hardware accelerators for FPGAs[C]. 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Diego, USA, 2018: 1–8.
    [3] LI Huimin, FAN Xitian, JIAO Li, et al. A high performance FPGA-based accelerator for large-scale convolutional neural networks[C]. The 26th International Conference on Field Programmable Logic and Applications (FPL), Lausanne, Switzerland, 2016: 1–9.
    [4] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: Unified, real-time object detection[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 779–788.
    [5] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. doi: 10.1109/TPAMI.2016.2577031
    [6] TAN Mingxing, PANG Ruoming, and LE Q V. EfficientDet: Scalable and efficient object detection[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 10781–10790.
    [7] YU Yunxuan, WU Chen, ZHAO Tiandong, et al. OPU: An FPGA-based overlay processor for convolutional neural networks[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2020, 28(1): 35–47. doi: 10.1109/TVLSI.2019.2939726
    [8] YU Yunxuan, ZHAO Tiandong, WANG Kun, et al. Light-OPU: An FPGA-based overlay processor for lightweight convolutional neural networks[C]. 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Seaside, USA, 2020: 122–132.
    [9] ZHANG Xiaofan, LU Haoming, HAO Cong, et al. SkyNet: A hardware-efficient method for object detection and tracking on embedded systems[J]. arXiv: 1909.09709, 2019.
    [10] JIANG W, LIU X, SUN H, et al. SkrSkr: DAC-SDC 2020 2nd place winner in FPGA track[EB/OL]. https://github.com/jiangwx/SkrSkr/, 2020.
    [11] ZHANG Chen, LI Peng, SUN Guangyu, et al. Optimizing FPGA-based accelerator design for deep convolutional neural networks[C]. 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, USA, 2015: 161–170.
    [12] HAO Cong, ZHANG Xiaofan, LI Yuhong, et al. FPGA/DNN Co-Design: An efficient design methodology for IoT intelligence on the edge[C]. The 56th ACM/IEEE Design Automation Conference (DAC), Las Vegas, USA, 2019: 1–6.
    [13] MOTAMEDI M, GYSEL P, AKELLA V, et al. Design space exploration of FPGA-based deep convolutional neural networks[C]. The 21st Asia and South Pacific Design Automation Conference (ASP-DAC), Macao, China, 2016: 575–580.
    [14] FAN Hongxiang, LIU Shuanglong, FERIANC M, et al. A real-time object detection accelerator with compressed SSDLite on FPGA[C]. 2018 International Conference on Field-Programmable Technology (FPT), Naha, Japan, 2018: 14–21.
    [15] LI Fanrong, MO Zitao, WANG Peisong, et al. A system-level solution for low-power object detection[C]. 2019 IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Korea (South), 2019: 2461–2468.
    [16] DONG Zhen, WANG Dequan, HUANG Qijing, et al. CoDeNet: Efficient deployment of input-adaptive object detection on embedded FPGAs[J]. arXiv: 2006.08357, 2020.
    [17] WU Di, ZHANG Yu, JIA Xijie, et al. A high-performance CNN processor based on FPGA for MobileNets[C]. The 29th International Conference on Field Programmable Logic and Applications (FPL), Barcelona, Spain, 2019: 136–143.
Publication history
  • Received: 2021-01-04
  • Revised: 2021-04-21
  • Published online: 2021-04-29
  • Issue date: 2021-06-18
