A Context-Aware Multiple Receptive Field Fusion Network for Oriented Object Detection in Remote Sensing Images

doi: 10.11999/JEIT240560

Information Science and Technology College, Dalian Maritime University, Dalian 116026, China
Abstract: Remote sensing images, captured from a wide-range bird's-eye view, typically feature diverse object categories, large scale variation, and rich background information, which poses significant challenges for object detection. Targeting these imaging characteristics, this paper designs a context-aware multiple receptive field fusion network that fully mines the contextual correlations contained in the multi-scale feature descriptions of remote sensing images within a deep network, strengthening feature representation and thereby improving detection accuracy. First, a receptive field expansion module is built on the first four levels of the feature pyramid; by enlarging the network's receptive field on feature maps of different scales, it enhances the network's ability to perceive remote sensing objects of different sizes. Second, a high-level feature aggregation module is constructed to aggregate high-level semantic information from the feature pyramid network into low-level features, effectively fusing the multi-scale contextual information contained in the feature maps. Finally, a feature refinement region proposal network is designed within a two-stage oriented object detection framework: by refining the first-stage proposals, it improves proposal accuracy and, in turn, the detection performance obtained by the second-stage region-of-interest alignment network for remote sensing objects under different imaging orientations. Qualitative and quantitative comparison results on the public DIOR-R and HRSC2016 datasets demonstrate that the proposed method detects remote sensing objects of various categories and scales more accurately.
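The pipeline just described can be summarized in the following minimal PyTorch sketch. Only the backbone call is real torchvision API; `rfe`, `hla`, `rpn`, and `head` are hypothetical placeholders standing in for the three proposed modules and the Oriented R-CNN head, not the authors' implementation.

```python
# Minimal sketch of the detection pipeline described in the abstract.
# rfe/hla/rpn/head are hypothetical placeholders for the proposed modules.
import torch.nn as nn
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

class CAMRFNet(nn.Module):
    def __init__(self, rfe=None, hla=None, rpn=None, head=None):
        super().__init__()
        # ResNet-50 backbone with a feature pyramid network, as stated in the paper.
        self.backbone = resnet_fpn_backbone(backbone_name="resnet50", weights=None)
        self.rfe = rfe or nn.Identity()                   # receptive field expansion (placeholder)
        self.hla = hla or (lambda feats: feats)           # high-level feature aggregation (placeholder)
        self.rpn = rpn or (lambda feats: [])              # feature refinement region proposal network (placeholder)
        self.head = head or (lambda feats, props: props)  # Oriented R-CNN detection head (placeholder)

    def forward(self, images):
        feats = list(self.backbone(images).values())   # pyramid levels, finest to coarsest
        feats[:4] = [self.rfe(f) for f in feats[:4]]   # expand receptive fields on the first four levels
        feats = self.hla(feats)                        # fuse high-level semantics into lower levels
        proposals = self.rpn(feats)                    # refined oriented proposals
        return self.head(feats, proposals)             # final oriented detections
```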
Abstract:

Objective: Recent advances in remote sensing imaging technology have made oriented object detection in remote sensing images a prominent research area in computer vision. Unlike traditional object detection tasks, remote sensing images, captured from a wide-range bird's-eye view, often contain a variety of objects with diverse scales and complex backgrounds, posing significant challenges for oriented object detection. Although current approaches have made substantial progress, existing networks do not fully exploit the contextual information across multi-scale features, resulting in classification and localization errors during detection. To address this, a context-aware multiple receptive field fusion network is proposed, which leverages the contextual correlation in multi-scale features. By enhancing the feature representation capabilities of deep networks, the accuracy of oriented object detection in remote sensing images can be improved.

Methods: For input remote sensing images, ResNet-50 and a feature pyramid network are first employed to extract features at different scales. The features from the first four layers are then enhanced using a receptive field expansion module, and the resulting features are processed through a high-level feature aggregation module to effectively fuse multi-scale contextual information. After the enhanced features at different scales are obtained, a feature refinement region proposal network is designed to revise object detection proposals using refined feature representations, producing more accurate candidate proposals. These multi-scale features and candidate proposals are then input into the Oriented R-CNN detection head to obtain the final detection results. The receptive field expansion module consists of two submodules operating in parallel: a large selective kernel convolution attention submodule and a shifted-window self-attention enhancement submodule. The large selective kernel convolution submodule introduces multiple convolution operations with different kernel sizes to capture contextual information under various receptive fields, improving the network's ability to perceive multi-scale objects. The shifted-window self-attention enhancement submodule divides the feature map into patches according to predefined window and step sizes and computes a self-attention-enhanced feature representation of each patch, extracting more global information from the image. The high-level feature aggregation module integrates rich semantic information from the feature pyramid network with low-level features, improving detection accuracy for multi-scale objects. Finally, the feature refinement region proposal network reduces the location deviation between generated region proposals and the actual rotated objects in remote sensing images: deformable convolution is employed to capture geometric and contextual information, the initial proposals are refined, and the final oriented detection results are produced through a two-stage region-of-interest alignment network.

Results and Discussions: The effectiveness and robustness of the proposed network are demonstrated on two public datasets, DIOR-R and HRSC2016. For the DIOR-R dataset, the AP50, AP75, and AP50:95 metrics are used for evaluation. Quantitative and qualitative comparisons (Fig. 7, Table 1) demonstrate that the proposed network significantly enhances feature representation for different remote sensing objects, distinguishing objects with similar appearances and localizing objects at various scales more accurately. For the HRSC2016 dataset, the mean Average Precision (mAP) is used, and both mAP(07) and mAP(12) are computed for quantitative comparison. The results (Fig. 7, Table 2) further highlight the network's effectiveness in improving ship detection accuracy in remote sensing images. Additionally, ablation studies (Table 3) demonstrate that each module in the proposed network contributes to improved detection performance for oriented objects in remote sensing images.

Conclusions: This paper proposes a context-aware multiple receptive field fusion network for oriented object detection in remote sensing images. The receptive field expansion module enhances the perception of remote sensing objects of different sizes. The high-level feature aggregation module fully utilizes high-level semantic information, further improving localization and classification accuracy. The feature refinement region proposal network refines the first-stage proposals, resulting in more accurate detection. Qualitative and quantitative results on the DIOR-R and HRSC2016 datasets demonstrate that the proposed network outperforms existing approaches, providing superior detection results for remote sensing objects of varying scales.
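To make the two parallel submodules of the receptive field expansion module concrete, the sketch below pairs a large-selective-kernel convolution branch with a window self-attention branch, following the mechanism described in Methods. The kernel sizes, window size, head count, and residual fusion rule are illustrative assumptions rather than the paper's reported settings, and the shifted-window offset of the Swin-style branch is omitted for brevity.

```python
import torch
import torch.nn as nn

class LargeSelectiveKernelBranch(nn.Module):
    """Parallel depthwise convolutions with different kernel sizes; a 1x1
    convolution predicts per-branch spatial weights used to mix them."""
    def __init__(self, channels, kernel_sizes=(3, 7, 11)):  # kernel sizes are assumptions
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes)
        self.select = nn.Conv2d(channels, len(kernel_sizes), 1)

    def forward(self, x):
        outs = torch.stack([b(x) for b in self.branches], dim=1)  # B,K,C,H,W
        w = self.select(x).softmax(dim=1).unsqueeze(2)            # B,K,1,H,W
        return (outs * w).sum(dim=1)                              # spatially selective mix

class WindowAttentionBranch(nn.Module):
    """Self-attention computed inside non-overlapping windows (Swin-style,
    without the shift step)."""
    def __init__(self, channels, window=8, heads=4):  # window/heads are assumptions
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        s = self.window
        # Partition the map into s x s windows (assumes h and w divisible by s).
        t = x.view(b, c, h // s, s, w // s, s).permute(0, 2, 4, 3, 5, 1)
        t = t.reshape(-1, s * s, c)
        t, _ = self.attn(t, t, t)                     # per-window self-attention
        t = t.view(b, h // s, w // s, s, s, c).permute(0, 5, 1, 3, 2, 4)
        return t.reshape(b, c, h, w)                  # undo the window partition

class ReceptiveFieldExpansion(nn.Module):
    """The two branches run in parallel; residual fusion is an assumption."""
    def __init__(self, channels):
        super().__init__()
        self.conv_branch = LargeSelectiveKernelBranch(channels)
        self.attn_branch = WindowAttentionBranch(channels)

    def forward(self, x):
        return x + self.conv_branch(x) + self.attn_branch(x)
```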
Key words:
- Remote sensing image
- Deep learning
- Object detection
- Multiple receptive field fusion
Table 1 Quantitative comparison of different algorithms on the DIOR-R dataset (%)
| Class | Gliding Vertex[18] | Rotated Faster RCNN[3] | S2ANet[19] | R3Det[20] | EDA[21] | QPDet[22] | ABFL[23] | Ours |
|---|---|---|---|---|---|---|---|---|
| APL | 62.67 | 62.92 | 62.32 | 62.60 | 63.01 | 71.52 | 62.04 | 72.00 |
| APO | 38.56 | 39.94 | 43.38 | 42.98 | 36.87 | 42.01 | 42.54 | 49.49 |
| BF | 71.94 | 71.95 | 71.90 | 71.42 | 72.05 | 77.99 | 76.40 | 72.11 |
| BC | 81.20 | 81.48 | 81.32 | 81.42 | 81.42 | 81.47 | 85.33 | 81.60 |
| BR | 37.73 | 36.71 | 40.24 | 38.45 | 40.22 | 40.80 | 37.75 | 45.81 |
| CH | 72.48 | 72.54 | 75.37 | 72.63 | 72.26 | 72.64 | 74.34 | 80.51 |
| ESA | 78.62 | 77.35 | 78.17 | 78.81 | 78.04 | 77.36 | 77.97 | 80.67 |
| ETS | 69.04 | 68.75 | 69.63 | 67.60 | 69.98 | 66.69 | 69.29 | 70.14 |
| DAM | 22.81 | 25.31 | 26.47 | 27.51 | 28.63 | 31.84 | 26.78 | 29.94 |
| GF | 77.89 | 76.36 | 73.75 | 70.91 | 65.38 | 69.16 | 73.88 | 78.16 |
| GTF | 82.13 | 76.57 | 78.41 | 77.11 | 82.35 | 82.24 | 77.78 | 83.10 |
| HA | 46.22 | 45.39 | 41.82 | 39.69 | 44.86 | 42.78 | 43.15 | 46.61 |
| OP | 54.76 | 50.10 | 56.34 | 54.94 | 55.58 | 54.67 | 54.13 | 58.66 |
| SH | 81.03 | 80.93 | 80.99 | 80.26 | 81.03 | 80.90 | 84.97 | 81.19 |
| STA | 74.88 | 75.27 | 63.25 | 72.88 | 73.99 | 77.15 | 67.88 | 74.59 |
| STO | 62.54 | 62.12 | 69.72 | 61.30 | 62.57 | 62.73 | 70.04 | 62.46 |
| TC | 81.41 | 81.46 | 81.47 | 81.51 | 81.49 | 81.56 | 81.39 | 81.54 |
| TS | 54.25 | 50.25 | 52.40 | 55.72 | 59.83 | 47.77 | 54.63 | 55.88 |
| VE | 43.22 | 42.81 | 47.64 | 44.81 | 43.29 | 47.39 | 45.35 | 43.55 |
| WM | 65.13 | 63.02 | 64.42 | 64.15 | 64.79 | 64.12 | 65.01 | 66.11 |
| $\mathrm{AP}_{50}$ | 62.91 | 62.06 | 62.95 | 62.34 | 62.88 | 63.64 | 63.53 | 65.71 |
| $\mathrm{AP}_{75}$ | 40.00 | 39.55 | 35.85 | 38.82 | 40.02 | 36.79 | 42.68 | 46.72 |
| $\mathrm{AP}_{50:95}$ | 38.34 | 38.22 | 36.25 | 37.84 | 38.36 | 37.51 | 40.94 | 43.17 |
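For reference, the headline metrics in Table 1 relate as follows: $\mathrm{AP}_{50}$ and $\mathrm{AP}_{75}$ are average precision at single IoU thresholds of 0.5 and 0.75, while $\mathrm{AP}_{50:95}$ averages AP over thresholds from 0.50 to 0.95 in steps of 0.05. The snippet below shows this averaging together with the standard VOC-style AP computations behind the mAP(07) and mAP(12) figures reported for HRSC2016; `ap_at_iou` is a hypothetical stand-in for a full rotated-box evaluation routine.

```python
import numpy as np

def ap50_95(ap_at_iou):
    """Average AP over IoU thresholds 0.50, 0.55, ..., 0.95.
    `ap_at_iou` is a hypothetical callable returning AP at one threshold."""
    return float(np.mean([ap_at_iou(t) for t in np.arange(0.50, 1.00, 0.05)]))

def voc_ap(recall, precision, use_07_metric=False):
    """VOC-style AP from a precision-recall curve: 11-point interpolation
    for mAP(07), area under the precision envelope for mAP(12)."""
    recall, precision = np.asarray(recall), np.asarray(precision)
    if use_07_metric:
        return float(np.mean([precision[recall >= t].max() if np.any(recall >= t) else 0.0
                              for t in np.arange(0.0, 1.1, 0.1)]))
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]   # monotone precision envelope
    idx = np.where(r[1:] != r[:-1])[0]         # points where recall increases
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```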
Table 3 Ablation experiments on different modules (%)

| Receptive field expansion module | High-level feature aggregation module | Feature refinement region proposal network | $\mathrm{AP}_{50}$ | $\mathrm{AP}_{75}$ | $\mathrm{AP}_{50:95}$ |
|---|---|---|---|---|---|
|  |  |  | 64.06 | 43.96 | 41.10 |
| √ |  |  | 64.86 | 46.05 | 42.93 |
|  | √ |  | 64.61 | 44.80 | 41.87 |
|  |  | √ | 64.17 | 44.68 | 41.69 |
| √ | √ | √ | 65.71 | 46.72 | 43.17 |
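The ablation rows above correspond to toggling each proposed module independently and retraining. A configuration sketch of how such variants might be enumerated is given below; all flag names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AblationConfig:
    use_rfe: bool = False    # receptive field expansion module
    use_hla: bool = False    # high-level feature aggregation module
    use_frpn: bool = False   # feature refinement region proposal network

# One config per row of Table 3: baseline, each module alone, full model.
variants = [
    AblationConfig(),
    AblationConfig(use_rfe=True),
    AblationConfig(use_hla=True),
    AblationConfig(use_frpn=True),
    AblationConfig(use_rfe=True, use_hla=True, use_frpn=True),
]
```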
[1] RAO Chaofan, WANG Jiabao, CHENG Gong, et al. Learning orientation-aware distances for oriented object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5610911. doi: 10.1109/TGRS.2023.3278933.
[2] XIE Xingxing, CHENG Gong, WANG Jiabao, et al. Oriented R-CNN for object detection[C]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021: 3520–3529. doi: 10.1109/ICCV48922.2021.00350.
[3] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. doi: 10.1109/TPAMI.2016.2577031.
[4] YANG Xue, YANG Jirui, YAN Junchi, et al. SCRDet: Towards more robust detection for small, cluttered and rotated objects[C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), 2019: 8232–8241. doi: 10.1109/ICCV.2019.00832.
[5] LI Zhonghua, HOU Biao, WU Zitong, et al. FCOSR: A simple anchor-free rotated detector for aerial object detection[J]. Remote Sensing, 2023, 15(23): 5499. doi: 10.3390/rs15235499.
[6] TIAN Yang, ZHANG Mengmeng, LI Jinyu, et al. FPNFormer: Rethink the method of processing the rotation-invariance and rotation-equivariance on arbitrary-oriented object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5605610. doi: 10.1109/TGRS.2024.3351156.
[7] MING Qi, MIAO Lingjuan, ZHOU Zhiqiang, et al. CFC-Net: A critical feature capturing network for arbitrary-oriented object detection in remote-sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5605814. doi: 10.1109/TGRS.2021.3095186.
[8] REN Zhida, TANG Yongqiang, HE Zewen, et al. Ship detection in high-resolution optical remote sensing images aided by saliency information[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5623616. doi: 10.1109/TGRS.2022.3173610.
[9] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 770–778. doi: 10.1109/CVPR.2016.90.
[10] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, 2017: 936–944. doi: 10.1109/CVPR.2017.106.
[11] LUO Wenjie, LI Yujia, URTASUN R, et al. Understanding the effective receptive field in deep convolutional neural networks[C]. The 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 2016: 4905–4913.
[12] LI Yuxuan, HOU Qibin, ZHENG Zhaohui, et al. Large selective kernel network for remote sensing object detection[C]. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023: 16748–16759. doi: 10.1109/ICCV51070.2023.01540.
[13] LIU Ze, LIN Yutong, CAO Yue, et al. Swin Transformer: Hierarchical vision transformer using shifted windows[C]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021: 9992–10002. doi: 10.1109/ICCV48922.2021.00986.
[14] CHENG Gong, WANG Jiabao, LI Ke, et al. Anchor-free oriented proposal generator for object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5625411. doi: 10.1109/TGRS.2022.3183022.
[15] LIU Zikun, WANG Hongzhen, WENG Lubin, et al. Ship rotated bounding box space for ship extraction from high-resolution optical satellite images with complex backgrounds[J]. IEEE Geoscience and Remote Sensing Letters, 2016, 13(8): 1074–1078. doi: 10.1109/LGRS.2016.2565705.
[16] ZENG Ying, CHEN Yushi, YANG Xue, et al. ARS-DETR: Aspect ratio-sensitive detection transformer for aerial oriented object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5610315. doi: 10.1109/TGRS.2024.3364713.
[17] EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL Visual Object Classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303–338. doi: 10.1007/s11263-009-0275-4.
[18] XU Yongchao, FU Mingtao, WANG Qimeng, et al. Gliding vertex on the horizontal bounding box for multi-oriented object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(4): 1452–1459. doi: 10.1109/TPAMI.2020.2974745.
[19] HAN Jiaming, DING Jian, LI Jie, et al. Align deep features for oriented object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5602511. doi: 10.1109/TGRS.2021.3062048.
[20] YANG Xue, YAN Junchi, FENG Ziming, et al. R3Det: Refined single-stage detector with feature refinement for rotating object[C]. Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021: 3163–3171. doi: 10.1609/aaai.v35i4.16426.
[21] CHEN Weining, MIAO Shencheng, WANG Guangxing, et al. Recalibrating features and regression for oriented object detection[J]. Remote Sensing, 2023, 15(8): 2134. doi: 10.3390/rs15082134.
[22] YAO Yanqing, CHENG Gong, WANG Guangxing, et al. On improving bounding box representations for oriented object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5600111. doi: 10.1109/TGRS.2022.3231340.
[23] ZHAO Zifei and LI Shengyang. ABFL: Angular boundary discontinuity free loss for arbitrary oriented object detection in aerial images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5611411. doi: 10.1109/TGRS.2024.3368630.
[24] XIE Xingxing, CHENG Gong, RAO Chaofan, et al. Oriented object detection via contextual dependence mining and penalty-incentive allocation[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5618010. doi: 10.1109/TGRS.2024.3385985.