
Design of Convolutional Neural Networks Accelerator Based on Fast Filter Algorithm

Wei WANG, Kaili ZHOU, Yichang WANG, Guang WANG, Jun YUAN

Citation: Wei WANG, Kaili ZHOU, Yichang WANG, Guang WANG, Jun YUAN. Design of Convolutional Neural Networks Accelerator Based on Fast Filter Algorithm[J]. Journal of Electronics & Information Technology, 2019, 41(11): 2578-2584. doi: 10.11999/JEIT190037


doi: 10.11999/JEIT190037
基金項(xiàng)目: 國(guó)家自然科學(xué)基金(61404019),重慶市集成電路產(chǎn)業(yè)重大主題專(zhuān)項(xiàng)(cstc2018jszx-cyztzx0211, cstc2018jszx-cyztzx0217)
詳細(xì)信息
    作者簡(jiǎn)介:

    Wei WANG: Male, born in 1967. Postdoctoral researcher and professor. Research interests: integrated circuit design

    Kaili ZHOU: Female, born in 1991. Master's student. Research interests: digital integrated circuit design

    Yichang WANG: Male, born in 1996. Master's student. Research interests: analog integrated circuit design

    Guang WANG: Male, born in 1994. Master's student. Research interests: semiconductor optoelectronic device design

    Jun YUAN: Male, born in 1984. Ph.D. and associate professor. Research interests: mixed-signal integrated circuit design

    Corresponding author:

    Kaili ZHOU, 2508005354@qq.com

  • CLC number: TN432

  • Abstract: To reduce the computational cost of Convolutional Neural Networks (CNN), a 2-D fast filtering algorithm is introduced into the CNN, and a hardware architecture that accelerates the CNN layer by layer on an FPGA is proposed. First, a line-buffer loop control unit is designed using a loop transformation method to manage efficiently the input feature map data across different convolution windows and different layers; a flag signal then starts the convolution computation acceleration unit so that the layers are accelerated one after another. Second, a convolution computation acceleration unit based on a 4-parallel fast filtering algorithm is designed, implemented as a low-complexity parallel filtering structure composed of several small filters. The designed CNN accelerator circuit is tested on the MNIST handwritten digit set. The results show that, on a Xilinx Kintex-7 platform with a 100 MHz input clock, the circuit achieves a computing performance of 20.49 GOPS with a recognition rate of 98.68%. Thus, reducing the amount of computation in the CNN improves the computing performance of the circuit.
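
    To make the computation-reduction idea concrete: a fast filtering (Winograd-type) transform computes a block of convolution outputs with fewer multiplications than the direct sliding-window sum, at the cost of a few extra additions. The sketch below (Python/NumPy) uses the standard Winograd F(2x2, 3x3) form popularized by Lavin and Gray as an illustration only; it is not the 4-parallel filter decomposition used in the accelerator, and the matrices B, G and A are the textbook transform matrices, not values taken from this paper.

    import numpy as np

    # Winograd F(2x2, 3x3) transform matrices (textbook values).
    B_T = np.array([[1,  0, -1,  0],
                    [0,  1,  1,  0],
                    [0, -1,  1,  0],
                    [0,  1,  0, -1]], dtype=float)
    G = np.array([[1.0,  0.0, 0.0],
                  [0.5,  0.5, 0.5],
                  [0.5, -0.5, 0.5],
                  [0.0,  0.0, 1.0]])
    A_T = np.array([[1, 1,  1,  0],
                    [0, 1, -1, -1]], dtype=float)

    def winograd_f2x2_3x3(tile, kernel):
        # One 2x2 output block from a 4x4 input tile and a 3x3 kernel:
        # 16 element-wise multiplications instead of 36 for direct convolution.
        U = G @ kernel @ G.T      # transformed kernel (can be precomputed)
        V = B_T @ tile @ B_T.T    # transformed input tile
        return A_T @ (U * V) @ A_T.T

    # Check against the direct sliding-window (correlation) result.
    rng = np.random.default_rng(0)
    d, g = rng.standard_normal((4, 4)), rng.standard_normal((3, 3))
    direct = np.array([[np.sum(d[i:i + 3, j:j + 3] * g) for j in range(2)]
                       for i in range(2)])
    assert np.allclose(winograd_f2x2_3x3(d, g), direct)

    Each 2x2 output block then needs 16 multiplications rather than 36, a 2.25x reduction, which is the kind of saving the hardware unit exploits with its parallel small-filter structure.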
  • Figure 2  Convolution computation process in a convolutional layer

    Figure 1  Structure of the convolutional neural network

    Figure 3  Layer-by-layer acceleration hardware architecture for the CNN

    Figure 4  Line-buffer loop control unit

    Figure 5  Structure of the convolution computation acceleration unit

    Figure 6  Detailed circuits of each part

    Table 1  Comparison between the MATLAB and FPGA implementations

    Type       Time (ms/frame)   Error (per 10000 frames)   Data type
    .m file    0.7854            1.19%                      Double precision
    .v file    0.01986           1.32%                      16-bit fixed point
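
    The .v figures above use 16-bit fixed-point arithmetic in place of MATLAB's double precision. A minimal software sketch (Python/NumPy) of such a quantization is shown below; the number of fractional bits is an assumption for illustration, since the paper states only the 16-bit word length.

    import numpy as np

    def to_fixed16(x, frac_bits=8):
        # Round to a signed 16-bit fixed-point grid; frac_bits is assumed here.
        scale = float(1 << frac_bits)
        return np.clip(np.round(x * scale), -32768, 32767).astype(np.int16)

    def from_fixed16(q, frac_bits=8):
        # Back to floating point, to measure the quantization error.
        return q.astype(np.float64) / float(1 << frac_bits)

    rng = np.random.default_rng(1)
    w = 0.1 * rng.standard_normal((3, 3, 6, 16))          # toy convolution weights
    err = np.max(np.abs(w - from_fixed16(to_fixed16(w))))
    print(f"max absolute quantization error: {err:.6f}")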

    Table 2  Performance comparison of FPGA implementations of convolutional neural networks

    Parameter             Ref. [4]              Ref. [6]     Ref. [7]              This work
    FPGA                  Virtex-7 xc7vx485t    Zynq zc702   Virtex-7 xc7vx485t    Kintex-7 xc7k325t
    Frequency (MHz)       100                   166          150                   100
    Time (ms)             2.6368                0.1510       0.0254                0.0199
    BRAM                  2796030
    DSP                   2095638284
    FF                    54075276646634636973
    LUT                   14832388365112551748
    Recognition rate (%)  98.62                 99.01        96.80                 98.68
    GOPS                  1.58                  2.70         15.87                 20.49
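
    The GOPS and per-frame time columns of Table 2 are linked by the identity operations per frame = throughput x time per frame; the short check below (Python) computes the workload implied by each column. The networks benchmarked in the cited works differ, so the implied workloads need not agree across columns.

    # Implied per-frame workload from Table 2: GOPS * time per frame.
    designs = {
        "Ref. [4]":  (1.58,  2.6368),   # (GOPS, time in ms)
        "Ref. [6]":  (2.70,  0.1510),
        "Ref. [7]":  (15.87, 0.0254),
        "This work": (20.49, 0.0199),
    }
    for name, (gops, time_ms) in designs.items():
        mop = gops * 1e9 * time_ms * 1e-3 / 1e6   # millions of operations per frame
        print(f"{name}: about {mop:.2f} MOP per frame")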
  • [1] ZHANG Chen, LI Peng, SUN Guangyu, et al. Optimizing FPGA-based accelerator design for deep convolutional neural networks[C]. 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, USA, 2015: 161–170.
    [2] KRIZHEVSKY A, SUTSKEVER I, and HINTON G E. ImageNet classification with deep convolutional neural networks[C]. The 25th International Conference on Neural Information Processing Systems, Lake Tahoe, USA, 2012: 1097–1105.
    [3] DONG Han, LI Tao, LENG Jiabing, et al. GCN: GPU-based cube CNN framework for hyperspectral image classification[C]. The 2017 46th International Conference on Parallel Processing, Bristol, UK, 2017: 41–49.
    [4] GHAFFARI S and SHARIFIAN S. FPGA-based convolutional neural network accelerator design using high level synthesize[C]. The 2016 2nd International Conference of Signal Processing and Intelligent Systems, Tehran, Iran, 2016: 1–6.
    [5] CHEN Y H, KRISHNA T, EMER J S, et al. Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks[J]. IEEE Journal of Solid-State Circuits, 2017, 52(1): 127–138. doi: 10.1109/JSSC.2016.2616357
    [6] FENG Gan, HU Zuyi, CHEN Song, et al. Energy-efficient and high-throughput FPGA-based accelerator for convolutional neural networks[C]. The 2016 13th IEEE International Conference on Solid-State and Integrated Circuit Technology, Hangzhou, China, 2016: 624–626.
    [7] ZHOU Yongmei and JIANG Jingfei. An FPGA-based accelerator implementation for deep convolutional neural networks[C]. The 2015 4th International Conference on Computer Science and Network Technology, Harbin, China, 2015: 829–832.
    [8] HOSEINI F, SHAHBAHRAMI A, and BAYAT P. An efficient implementation of deep convolutional neural networks for MRI segmentation[J]. Journal of Digital Imaging, 2018, 31(5): 738–747. doi: 10.1007/s10278-018-0062-2
    [9] HUANG Jiahao, WANG Tiejun, ZHU Xuhui, et al. A parallel optimization of the fast algorithm of convolution neural network on CPU[C]. The 2018 10th International Conference on Measuring Technology and Mechatronics Automation, Changsha, China, 2018: 5–9.
    [10] LAVIN A and GRAY S. Fast algorithms for convolutional neural networks[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 4013–4021.
    [11] VINCHURKAR P P, RATHKANTHIWAR S V, and KAKDE S M. HDL implementation of DFT architectures using Winograd fast Fourier transform algorithm[C]. The 2015 5th International Conference on Communication Systems and Network Technologies, Gwalior, India, 2015: 397–401.
    [12] WANG Xuan, WANG Chao, and ZHOU Xuehai. Work-in-progress: WinoNN: Optimising FPGA-based neural network accelerators using fast Winograd algorithm[C]. 2018 International Conference on Hardware/Software Codesign and System Synthesis, Turin, Italy, 2018: 1–2.
    [13] NAITO Y, MIYAZAKI T, and KURODA I. A fast full-search motion estimation method for programmable processors with a multiply-accumulator[C]. 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, Atlanta, USA, 1996: 3221–3224.
    [14] JIANG Jingfei, HU Rongdong, and LUJÁN M. A flexible memory controller supporting deep belief networks with fixed-point arithmetic[C]. The 2013 IEEE 27th International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, Cambridge, USA, 2013: 144–152.
    [15] LI Sicheng, WEN Wei, WANG Yu, et al. An FPGA design framework for CNN sparsification and acceleration[C]. The 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines, Napa, USA, 2017: 28.
Publication history
  • Received: 2019-01-15
  • Revised: 2019-03-20
  • Published online: 2019-05-23
  • Issue published: 2019-11-01
