Multi-scale Pedestrian Detection in Infrared Images with Salient Background-awareness
doi: 10.11999/JEIT190761
-
Department of Electronic and Optical Engineering, Shijiazhuang Campus of Army Engineering University, Shijiazhuang 050003, China
-
Abstract: Ultrawide Field Of View (U-FOV) infrared imaging systems offer a large detection range and are not limited by illumination, but their images contain objects at diverse scales, many of them small. This paper therefore proposes a multi-scale infrared pedestrian detection method with background-awareness, which improves detection performance on small objects while reducing redundant computation. First, a four-scale feature pyramid network is constructed in which each scale predicts objects independently, supplementing high-resolution detail features. Second, an attention module is integrated into the lateral connections of the feature pyramid to generate salient features, suppressing feature responses in irrelevant regions and highlighting local object features. Finally, an anchor-mask generation subnetwork is built on top of the saliency coefficients to constrain anchor locations, exclude flat background regions, and improve processing efficiency. Experimental results show that the saliency generation subnetwork increases processing time by only 5.94%, so it is lightweight; average precision on the U-FOV infrared pedestrian dataset reaches 93.20%, 26.49% higher than that of YOLOv3; and the anchor-constraint strategy saves 18.05% of processing time. The reconstructed model is both lightweight and accurate, making it well suited to detecting multi-scale infrared targets in ultrawide fields of view.
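The attention-gated lateral connection described above can be illustrated with a minimal PyTorch sketch, assuming a CBAM-style spatial attention block (CBAM appears among the methods the paper builds on); the module names (`SpatialAttention`, `SalientLateral`) and the exact gating design are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention: a per-pixel saliency map in [0, 1]."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # Pool across channels, then predict a saliency coefficient per pixel.
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class SalientLateral(nn.Module):
    """Lateral connection of the feature pyramid, gated by spatial attention."""
    def __init__(self, in_ch, out_ch=256):
        super().__init__()
        self.lateral = nn.Conv2d(in_ch, out_ch, 1)  # 1x1 lateral conv
        self.attn = SpatialAttention()

    def forward(self, c, p_top=None):
        l = self.lateral(c)
        s = self.attn(l)          # saliency coefficients
        l = l * s                 # suppress responses from flat background
        if p_top is not None:     # top-down pathway of the pyramid
            l = l + F.interpolate(p_top, size=l.shape[-2:], mode="nearest")
        return l, s               # s is reused downstream for anchor masking
```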
-
Keywords:
- infrared pedestrian detection /
- ultrawide field of view /
- convolutional neural network /
- background-awareness /
- multi-scale
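The abstract's final step, constraining anchors with the saliency coefficients, can be sketched as thresholding the saliency map into a binary mask and keeping only anchors whose grid cells are salient. The threshold value, tensor shapes, and helper name below are illustrative assumptions.

```python
import torch

def anchor_mask_from_saliency(s, threshold=0.5):
    """Binarize a saliency map (N, 1, H, W) into a per-cell anchor mask.

    Cells whose saliency coefficient falls below the threshold are treated
    as flat background and their anchors are skipped, which is where the
    processing-time saving comes from.
    """
    return s > threshold  # bool mask, broadcast over the anchors of a cell

# Usage sketch: evaluate only anchors at salient locations.
s = torch.rand(1, 1, 13, 13)                     # saliency map from a lateral block
scores = torch.rand(1, 3, 13, 13)                # e.g. 3 anchors per grid cell
mask = anchor_mask_from_saliency(s)              # (1, 1, 13, 13)
active_scores = scores[mask.expand_as(scores)]   # anchors kept for decoding
```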
Table 1 Average precision (AP) of pedestrian detection under different IoU thresholds

| Method | Backbone | Training set | IoU=0.3 | IoU=0.45 | IoU=0.5 | IoU=0.7 |
|---|---|---|---|---|---|---|
| Faster R-CNN | ResNet101 | U-FOV | – | – | 0.5932 | – |
| SSD | MobileNet_v1 | U-FOV | – | – | 0.5584 | – |
| R-FCN | ResNet101 | U-FOV | – | – | 0.6312 | – |
| CSP | ResNet50 | U-FOV | – | – | 0.8414 | – |
| YOLOv3 | Darknet53 | U-FOV | 0.6595 | 0.6671 | 0.6628 | 0.6461 |
| YOLOv3+FS | Darknet53 | U-FOV | 0.8880 | 0.8870 | 0.8828 | 0.8511 |
| YOLOv3+FS | Darknet53 | Caltech+U-FOV | 0.9057 | 0.9078 | 0.9084 | 0.8961 |
| Proposed | Darknet53 | Caltech+U-FOV | 0.9201 | 0.9320 | 0.9315 | 0.9107 |
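For reference, the IoU thresholds in Table 1 set how tightly a predicted box must overlap a ground-truth box to count as a true positive. A minimal computation of IoU between two axis-aligned boxes, assuming the common (x1, y1, x2, y2) corner convention:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A detection matching a ground-truth box with iou(...) >= 0.5 counts as
# correct in the IoU=0.5 column of Table 1.
```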
Table 2 Comparison of parameter counts

| Method | Total parameters | Trainable | Non-trainable |
|---|---|---|---|
| YOLOv3 | 61,576,342 | 61,523,734 | 52,608 |
| Proposed | 64,861,976 | 64,806,296 | 55,680 |
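Counts like those in Table 2 can be reproduced by iterating over a model's parameters. A minimal PyTorch-style sketch, where `model` is assumed to be the detector; note that which parameters count as "non-trainable" depends on framework conventions (e.g., Keras also counts batch-norm moving statistics), so the exact split may differ.

```python
def count_parameters(model):
    """Return (total, trainable, non_trainable) parameter counts."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    return trainable + frozen, trainable, frozen
```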
Table 3 Total processing time and frame rate on the U-FOV test set

| Method | YOLOv3 | YOLOv3+Attention | FS+Attention | Proposed |
|---|---|---|---|---|
| Total time (s) | 90.35 | 95.72 | 125.39 | 107.25 |
| Frame rate (fps) | 7.32 | 6.91 | 5.27 | 6.16 |
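As a quick cross-check (an inference from the published numbers, not a figure reported in the paper), total time multiplied by frame rate is essentially constant across the four methods in Table 3, which is consistent with a test set of roughly 661 images:

```python
# (total_time_s, fps) pairs taken from Table 3
rows = {
    "YOLOv3": (90.35, 7.32),
    "YOLOv3+Attention": (95.72, 6.91),
    "FS+Attention": (125.39, 5.27),
    "Proposed": (107.25, 6.16),
}
for name, (t, fps) in rows.items():
    print(f"{name}: {t * fps:.0f} images")  # each prints ≈ 661
```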