Person Re-identification Based on Multi-scale Network Attention Fusion
doi: 10.11999/JEIT190998
1. School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
2. Institute of Artificial Intelligence, University of Science and Technology Beijing, Beijing 100083, China
3. Beijing Engineering Research Center of Industrial Spectrum Imaging, Beijing 100083, China
Abstract: Person re-identification depends critically on extracting discriminative pedestrian features, and convolutional neural networks offer strong feature extraction and representation capabilities. Because different features can be observed at different scales, this paper proposes a person re-identification method based on a Multi-Scale Attention Network (MSAN). The method samples features at different depths of the network and fuses the sampled features to predict pedestrian identity. Feature maps at different depths have different expressive power, which lets the network learn finer-grained pedestrian features. In addition, an attention module is embedded in the residual network so that the network focuses on key information, which strengthens its feature learning ability. The proposed method reaches Rank-1 accuracy of 95.3%, 89.8% and 82.2% on the Market1501, DukeMTMC-reID and MSMT17_V1 datasets, respectively. Experiments show that the method makes full use of information from different network depths together with the attended key information, giving the model strong discriminative ability, and that the mean average precision (mAP) of the proposed model exceeds that of most state-of-the-art algorithms.
Key words:
- Person re-identification /
- Multiple scale /
- Attention /
- Residual network /
- Metric learning
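The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which pooling or fusion operator MSAN uses, so global average pooling followed by concatenation is an assumption here, and the names `global_average_pool` and `fuse_multi_scale` are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # channels x height x width


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Collapse each channel of a C x H x W map to its mean value."""
    pooled = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def fuse_multi_scale(fmaps: List[FeatureMap]) -> List[float]:
    """Pool every depth separately, then concatenate into one descriptor.

    Concatenation is an assumed fusion operator, not confirmed by the source.
    """
    descriptor: List[float] = []
    for fmap in fmaps:
        descriptor.extend(global_average_pool(fmap))
    return descriptor


# Toy maps standing in for a shallow stage (2 channels, 4x4)
# and a deep stage (3 channels, 2x2).
shallow = [[[1.0] * 4 for _ in range(4)], [[2.0] * 4 for _ in range(4)]]
deep = [
    [[0.5, 1.5], [2.5, 3.5]],
    [[4.0] * 2 for _ in range(2)],
    [[0.0] * 2 for _ in range(2)],
]

descriptor = fuse_multi_scale([shallow, deep])
print(descriptor)  # [1.0, 2.0, 2.0, 4.0, 0.0]
```

Pooling before concatenation lets stages with different spatial resolutions contribute to one fixed-length descriptor, which is one plausible reading of "sampling features at different depths".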
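Table 1 Accuracy validation results for the multi-scale fusion model (%)
| Method | Market1501 Rank-1 | Market1501 mAP | DukeMTMC-reID Rank-1 | DukeMTMC-reID mAP | MSMT17_V1 Rank-1 | MSMT17_V1 mAP |
| --- | --- | --- | --- | --- | --- | --- |
| SSAN | 94.9 | 87.9 | 86.1 | 67.7 | 81.4 | 66.3 |
| SSAN(+RK) | 95.3 | 93.7 | 86.0 | 75.6 | 84.6 | 73.8 |
| MSAN | 95.3 | 87.9 | 89.8 | 78.8 | 82.2 | 60.6 |
| MSAN(+RK) | 95.9 | 93.9 | 92.3 | 89.7 | 85.0 | 74.6 |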
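The two metrics reported in the tables, Rank-1 and mAP, can be sketched as follows. This is a simplified illustration: each query is assumed to arrive as a gallery ranking already converted to match flags, and benchmark-specific filtering rules (for example, which gallery images are excluded per query) are not modelled. All function names are hypothetical.

```python
from typing import List


def rank1(ranked_matches: List[List[bool]]) -> float:
    """Fraction of queries whose top-ranked gallery item is a correct match."""
    hits = sum(1 for matches in ranked_matches if matches and matches[0])
    return hits / len(ranked_matches)


def average_precision(matches: List[bool]) -> float:
    """Mean of the precision values taken at each correct-match position."""
    hits = 0
    precisions = []
    for position, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / position)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(ranked_matches: List[List[bool]]) -> float:
    """Average of the per-query average precision."""
    total = sum(average_precision(m) for m in ranked_matches)
    return total / len(ranked_matches)


# Two toy queries: True marks a gallery item with the query's identity.
toy = [
    [True, False, True],   # AP = (1/1 + 2/3) / 2
    [False, True, False],  # AP = 1/2
]
print(rank1(toy))
print(mean_average_precision(toy))
```

Rank-1 only looks at the first returned item, while mAP rewards placing every correct match early, which is why a method can lead on one metric and trail on the other.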
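Table 2 Accuracy validation results for the CBAM module (%)
| Method | Market1501 Rank-1 | Market1501 mAP | DukeMTMC-reID Rank-1 | DukeMTMC-reID mAP | MSMT17_V1 Rank-1 | MSMT17_V1 mAP |
| --- | --- | --- | --- | --- | --- | --- |
| MSN | 94.4 | 86.2 | 87.5 | 77.2 | 79.6 | 56.0 |
| MSN(+CBAM) | 95.3 | 87.9 | 89.8 | 78.8 | 82.2 | 60.6 |
| MSN(+RK) | 95.3 | 93.1 | 90.9 | 89.2 | 83.2 | 72.0 |
| MSN(+CBAM+RK) | 95.9 | 93.9 | 92.3 | 89.7 | 85.0 | 74.6 |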
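For readers unfamiliar with CBAM, the channel-attention half of the module can be sketched in a few lines. This follows the general CBAM formulation (a shared MLP applied to average-pooled and max-pooled channel descriptors, summed and passed through a sigmoid); the weights `w1` and `w2` below are made-up toy values, the spatial-attention half is omitted, and how the module is wired into MSAN's residual blocks is not shown in this excerpt.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # channels x height x width
Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def shared_mlp(v: List[float], w1: Matrix, w2: Matrix) -> List[float]:
    """One hidden layer with ReLU, shared by both pooled descriptors."""
    hidden = [max(0.0, h) for h in matvec(w1, v)]
    return matvec(w2, hidden)


def channel_attention(fmap: FeatureMap, w1: Matrix, w2: Matrix) -> List[float]:
    """Per-channel weights: sigmoid(MLP(avg-pool) + MLP(max-pool))."""
    avg = [sum(v for row in ch for v in row) / sum(len(row) for row in ch)
           for ch in fmap]
    mx = [max(v for row in ch for v in row) for ch in fmap]
    logits = [a + b for a, b in
              zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def apply_channel_attention(fmap: FeatureMap, weights: List[float]) -> FeatureMap:
    """Rescale every channel by its attention weight."""
    return [[[v * w for v in row] for row in ch] for ch, w in zip(fmap, weights)]


# Toy input: 2 channels of size 2x2, with hypothetical MLP weights.
fmap = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [0.0, 2.0]]]
w1 = [[1.0, 1.0]]      # 2 channels -> 1 hidden unit
w2 = [[1.0], [-1.0]]   # 1 hidden unit -> 2 channels

weights = channel_attention(fmap, w1, w2)
refined = apply_channel_attention(fmap, weights)
print(weights)
```

With these toy weights the first channel is kept almost unchanged and the second is strongly suppressed, which is the "focus on key information" behaviour the attention module is meant to provide.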
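Table 3 Accuracy comparison between the proposed MSAN and other state-of-the-art algorithms (%)
| Method | Market1501 Rank-1 | Market1501 mAP | DukeMTMC-reID Rank-1 | DukeMTMC-reID mAP | MSMT17_V1 Rank-1 | MSMT17_V1 mAP |
| --- | --- | --- | --- | --- | --- | --- |
| SVDNet[21] | 82.3 | 62.1 | 76.7 | 56.8 | – | – |
| DPFL[22] | 88.6 | 72.6 | 79.2 | 60.0 | – | – |
| SVDNet+Era[23] | 87.1 | 71.3 | 79.3 | 62.4 | – | – |
| TriNET+Era[23] | 83.9 | 68.7 | 73.0 | 56.6 | – | – |
| DaRe[24] | 89.0 | 76.0 | 80.2 | 64.5 | – | – |
| GP-reid[25] | 92.2 | 81.2 | 85.2 | 72.8 | – | – |
| PCB[4] | 92.3 | 77.4 | 81.9 | 65.3 | 68.2 | 40.4 |
| Aligned-ReID[5] | 92.6 | 82.3 | – | – | – | – |
| PCB+RPP[4] | 93.8 | 81.6 | 83.3 | 69.2 | – | – |
| MGN[6] | 95.7 | 86.9 | 88.7 | 78.4 | – | – |
| BFENET[8] | 94.2 | 84.3 | 86.8 | 72.1 | – | – |
| IANet[18] | 94.4 | 83.1 | 87.1 | 73.4 | 75.5 | 46.8 |
| DGNet[19] | 94.8 | 86.0 | 86.6 | 74.8 | 77.2 | 52.3 |
| OSNet[20] | 94.8 | 84.9 | 88.6 | 73.5 | 78.7 | 52.9 |
| MSAN | 95.3 | 87.9 | 89.8 | 78.8 | 82.2 | 60.6 |
| MSAN(+RK) | 95.9 | 93.9 | 92.3 | 89.7 | 85.0 | 74.6 |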
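The claim that MSAN's mAP exceeds most compared methods can be checked directly against the Market1501 columns of Table 3. The short script below only restates values from that table (MSAN without re-ranking); the variable names are illustrative.

```python
# (Rank-1, mAP) on Market1501, copied from Table 3.
market1501 = {
    "SVDNet": (82.3, 62.1),
    "DPFL": (88.6, 72.6),
    "SVDNet+Era": (87.1, 71.3),
    "TriNET+Era": (83.9, 68.7),
    "DaRe": (89.0, 76.0),
    "GP-reid": (92.2, 81.2),
    "PCB": (92.3, 77.4),
    "Aligned-ReID": (92.6, 82.3),
    "PCB+RPP": (93.8, 81.6),
    "MGN": (95.7, 86.9),
    "BFENET": (94.2, 84.3),
    "IANet": (94.4, 83.1),
    "DGNet": (94.8, 86.0),
    "OSNet": (94.8, 84.9),
}
msan = (95.3, 87.9)  # MSAN row, no re-ranking

# Methods that still beat MSAN on each metric.
better_rank1 = [name for name, (r1, _) in market1501.items() if r1 > msan[0]]
better_map = [name for name, (_, m) in market1501.items() if m > msan[1]]

# Margin over the best compared mAP.
best_prior_map = max(m for _, m in market1501.values())
margin = round(msan[1] - best_prior_map, 1)

print(better_rank1)  # ['MGN']
print(better_map)    # []
print(margin)        # 1.0
```

On this dataset only MGN has a higher Rank-1 score, and no compared method has a higher mAP.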
[1] FARENZENA M, BAZZANI L, PERINA A, et al. Person re-identification by symmetry-driven accumulation of local features[C]. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, USA, 2010: 2360–2367.
[2] ZHOU Zhiheng, LIU Kaiyi, HUANG Junchu, et al. Improved metric learning algorithm for person re-identification based on equidistance[J]. Journal of Electronics & Information Technology, 2019, 41(2): 477–483. doi: 10.11999/JEIT180336
[3] HIRZER M, ROTH P M, KÖSTINGER M, et al. Relaxed pairwise learned metric for person re-identification[C]. The 12th European Conference on Computer Vision, Florence, Italy, 2012: 780–793.
[4] SUN Yifan, ZHENG Liang, YANG Yi, et al. Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline)[C]. The 15th European Conference on Computer Vision (ECCV), Munich, Germany, 2018: 480–496.
[5] LUO Hao, JIANG Wei, ZHANG Xuan, et al. AlignedReID++: Dynamically matching local information for person re-identification[J]. Pattern Recognition, 2019, 94: 53–61. doi: 10.1016/j.patcog.2019.05.028
[6] WANG Guanshuo, YUAN Yufeng, CHEN Xiong, et al. Learning discriminative features with multiple granularities for person re-identification[C]. 2018 ACM Multimedia Conference on Multimedia Conference, Seoul, Korea, 2018: 274–282.
[7] CHEN Hongchang, WU Yancheng, LI Shaomei, et al. Person re-identification based on attribute hierarchy recognition[J]. Journal of Electronics & Information Technology, 2019, 41(9): 2239–2246. doi: 10.11999/JEIT180740
[8] DAI Zuozhuo, CHEN Mingqiang, GU Xiaodong, et al. Batch DropBlock network for person re-identification and beyond[C]. 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 2019: 3691–3701.
[9] WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]. The 15th European Conference on Computer Vision (ECCV), Munich, Germany, 2018: 3–19.
[10] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 2117–2125.
[11] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778.
[12] HERMANS A, BEYER L, and LEIBE B. In defense of the triplet loss for person re-identification[EB/OL]. https://arxiv.org/abs/1703.07737, 2017.
[13] ZHENG Liang, SHEN Liyue, TIAN Lu, et al. Scalable person re-identification: A benchmark[C]. 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 1116–1124.
[14] RISTANI E, SOLERA F, ZOU R, et al. Performance measures and a data set for multi-target, multi-camera tracking[C]. 2016 European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 17–35.
[15] WEI Longhui, ZHANG Shiliang, GAO Wen, et al. Person transfer GAN to bridge domain gap for person re-identification[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 79–88.
[16] ZHONG Zhun, ZHENG Liang, CAO Donglin, et al. Re-ranking person re-identification with k-reciprocal encoding[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 1318–1327.
[17] SALLEH S S, AZIZ N A A, MOHAMAD D, et al. Combining mahalanobis and jaccard distance to overcome similarity measurement constriction on geometrical shapes[J]. International Journal of Computer Science Issues, 2012, 9(4): 124–132.
[18] ZHENG Zhedong, YANG Xiaodong, YU Zhiding, et al. Joint discriminative and generative learning for person re-identification[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 2138–2147.
[19] HOU Ruibing, MA Bingpeng, CHANG Hong, et al. Interaction-and-aggregation network for person re-identification[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 9317–9326.
[20] ZHOU Kaiyang, YANG Yongxin, CAVALLARO A, et al. Omni-Scale feature learning for person re-identification[C]. 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 2019: 3702–3712.
[21] SUN Yifan, ZHENG Liang, DENG Weijian, et al. SVDNet for pedestrian retrieval[C]. 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 3800–3808.
[22] CHEN Yanbei, ZHU Xiatian, and GONG Shaogang. Person re-identification by deep learning multi-scale representations[C]. 2017 IEEE International Conference on Computer Vision Workshops, Venice, Italy, 2017: 2590–2600.
[23] ZHONG Zhun, ZHENG Liang, KANG Guoliang, et al. Random erasing data augmentation[EB/OL]. https://arxiv.org/abs/1708.04896, 2017.
[24] WANG Yan, WANG Lequn, YOU Yurong, et al. Resource aware person re-identification across multiple resolutions[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 8042–8051.
[25] ALMAZAN J, GAJIC B, MURRAY N, et al. Re-ID done right: towards good practices for person re-identification[EB/OL]. https://arxiv.org/abs/1801.05339, 2018.