Design of Convolutional Neural Networks Accelerator Based on Fast Filter Algorithm
doi: 10.11999/JEIT190037
-
College of Electronics Engineering / International Semiconductor College, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
-
Abstract: To reduce the computational cost of Convolutional Neural Networks (CNN), this paper introduces the two-dimensional fast filtering algorithm into the CNN and proposes a hardware architecture that accelerates the CNN layer by layer on an FPGA. First, a line-buffer loop control unit is designed with the loop transformation method to manage efficiently the input feature-map data across different convolution windows and between different layers; a flag signal starts the convolution computation acceleration unit so that the layers are accelerated one after another. Second, a convolution computation acceleration unit based on a 4-parallel fast filtering algorithm is designed; it is implemented as a low-complexity parallel filtering structure composed of several small sub-filters. The designed CNN accelerator circuit is tested with the MNIST handwritten-digit set. The results show that on a Xilinx Kintex-7 platform with a 100 MHz input clock, the circuit achieves a computational performance of 20.49 GOPS and a recognition rate of 98.68%, confirming that reducing the amount of CNN computation improves the computational performance of the circuit.
-
Key words:
- Convolutional Neural Network (CNN) /
- Fast filtering algorithm /
- FPGA /
- Parallel structure
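The abstract describes a convolution unit built from a parallel filtering structure of several small sub-filters. As an illustrative sketch of the underlying idea (not the authors' hardware design), the classic 2-parallel fast FIR algorithm computes two polyphase output streams with three sub-filter convolutions instead of four; a 4-parallel unit of the kind the paper uses can be obtained by applying the same decomposition recursively. The function names below are hypothetical:

```python
def conv(a, b):
    """Direct full convolution: len(a) + len(b) - 1 output samples."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def fir_2parallel_fast(x, h):
    """2-parallel fast FIR: 3 sub-filter convolutions instead of 4.

    Polyphase split: x0/h0 hold even-indexed samples, x1/h1 odd-indexed.
        Y0 = H0*X0 + z^-1 * H1*X1
        Y1 = (H0 + H1)*(X0 + X1) - H0*X0 - H1*X1
    """
    n_out = len(x) + len(h) - 1          # true output length, before padding
    if len(x) % 2:                       # pad sequences to even length
        x = x + [0]
    if len(h) % 2:
        h = h + [0]
    x0, x1 = x[0::2], x[1::2]
    h0, h1 = h[0::2], h[1::2]
    p0 = conv(h0, x0)                    # sub-filter 1
    p1 = conv(h1, x1)                    # sub-filter 2
    p2 = conv([a + b for a, b in zip(h0, h1)],
              [a + b for a, b in zip(x0, x1)])    # sub-filter 3 (shared)
    y1 = [c - a - b for a, b, c in zip(p0, p1, p2)]
    y0 = [a + b for a, b in zip(p0 + [0], [0] + p1)]  # p0 plus delayed p1
    # Interleave the two output phases back into one stream.
    y = []
    for e, o in zip(y0, y1 + [0]):
        y.extend([e, o])
    return y[:n_out]

x = [1, 2, 3, 4, 5, 6]
h = [1, -1, 2, 1]
assert fir_2parallel_fast(x, h) == conv(x, h)
```

The saving is in the multipliers: each sub-filter is half the original length, so three half-length filters replace one full-length filter per output phase, which is what makes the structure attractive for a DSP-limited FPGA implementation.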
-
Table 1  Comparison of the MATLAB and FPGA implementations
Type     Time (ms/frame)   Error (per 10000 frames)   Data type
.m file  0.7854            1.19%                      Double precision
.v file  0.01986           1.32%                      16-bit fixed point
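Table 1 compares the double-precision MATLAB model against the 16-bit fixed-point Verilog implementation. As a hedged sketch of how such a conversion works (the paper states only that 16-bit fixed-point numbers are used; the fractional width `frac_bits` below is an assumption), a real-valued weight is scaled, rounded, and saturated to the signed 16-bit range:

```python
def float_to_fixed(value, frac_bits=12, word_bits=16):
    """Quantize a real value to a signed two's-complement fixed-point word.

    frac_bits is an assumed fractional width for illustration; the paper
    does not specify its Q-format.
    """
    scale = 1 << frac_bits
    q = round(value * scale)
    # Saturate to the representable signed 16-bit range.
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, q))

def fixed_to_float(q, frac_bits=12):
    """Recover the real value represented by a fixed-point word."""
    return q / (1 << frac_bits)

w = 0.78                      # hypothetical weight value
q = float_to_fixed(w)
err = abs(fixed_to_float(q) - w)
# Rounding error is bounded by half an LSB: 2**-(frac_bits + 1).
assert err <= 2 ** -(12 + 1)
```

The small accuracy gap in Table 1 (1.19% versus 1.32% error) is consistent with this kind of bounded per-weight quantization error accumulating through the network.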
-