
Saliency Object Detection Utilizing Adaptive Convolutional Attention and Mask Structure

ZHU Lei, YUAN Jinyao, WANG Wenwu, CAI Xiaoman

Citation: ZHU Lei, YUAN Jinyao, WANG Wenwu, CAI Xiaoman. Saliency Object Detection Utilizing Adaptive Convolutional Attention and Mask Structure[J]. Journal of Electronics & Information Technology, 2025, 47(1): 260-270. doi: 10.11999/JEIT240431


doi: 10.11999/JEIT240431
Article Information
    About the authors:

    ZHU Lei: Male, Associate Professor. Research interests include object detection and recognition, semantic segmentation, and scene parsing

    YUAN Jinyao: Female, Master's student. Research interests include deep learning and semantic segmentation

    WANG Wenwu: Male, Associate Professor. Research interests include object detection and recognition, semantic segmentation, and scene parsing

    CAI Xiaoman: Female, Master's student. Research interests include deep learning and semantic segmentation

    Corresponding author: YUAN Jinyao, jyyuan202209@163.com

  • CLC number: TN911.7; TP391


  • Abstract: Salient Object Detection (SOD) aims to imitate the attention and cognitive mechanisms of the human visual system to automatically extract salient objects from a scene. Although existing models based on Convolutional Neural Networks (CNNs) or Transformers continue to raise the performance bar in this field, two issues have received little attention: (1) Most methods adopt per-pixel dense prediction to obtain pixel-level saliency values. This does not match the scene-parsing mechanism of the human visual system, which analyzes semantic regions as a whole rather than attending to pixel-level information. (2) Enhancing contextual associations has attracted broad interest in SOD, but long-range features obtained through a Transformer backbone are not necessarily advantageous: SOD should focus on the center-surround differences of an object within an appropriate region rather than on global long-range dependencies. To address these issues, this paper proposes a new salient object detection model that integrates CNN-style adaptive attention and masked attention into the network to improve detection performance. A mask-aware decoder module is designed that perceives image features by restricting cross-attention to the predicted mask regions, which helps the network focus on the whole region of a salient object. In addition, a context feature enhancement module based on convolutional attention is designed; unlike a Transformer, which builds long-range relations layer by layer, this module captures only the appropriate contextual associations in the highest-level features and avoids introducing irrelevant global information. Experimental evaluation on four widely used datasets shows that the proposed method achieves significant performance gains across different scenes, with good generalization ability and stability.
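To make the mask-restricted cross-attention idea concrete, below is a minimal single-head sketch in the spirit of the masked attention of Mask2Former [21]. The tensor shapes, the 0.5 foreground threshold, and all names are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def masked_cross_attention(queries, features, mask_logits):
    """Single-head cross-attention restricted to a predicted mask region.

    Illustrative sketch only (cf. masked attention in Mask2Former [21]);
    the shapes, the 0.5 threshold, and the single-head form are
    assumptions, not the paper's exact design.
      queries:     (B, Nq, C)  decoder queries
      features:    (B, HW, C)  flattened image features
      mask_logits: (B, Nq, HW) mask predicted by the previous layer
    """
    scale = queries.shape[-1] ** -0.5
    # Positions outside the predicted foreground mask get -inf, so the
    # softmax assigns them zero attention weight.
    outside = mask_logits.sigmoid() < 0.5
    bias = torch.zeros_like(mask_logits).masked_fill(outside, float("-inf"))
    attn = (queries @ features.transpose(-2, -1)) * scale + bias
    attn = torch.nan_to_num(attn.softmax(dim=-1))  # guard fully-masked rows
    return attn @ features                         # (B, Nq, C)
```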
  • Figure 1. Overall network architecture of the proposed method

    Figure 2. Feature enhancement module based on a convolutional vision transformer (CTFE)

    Figure 3. Comparison of the two feature fusion methods

    Figure 4. Mask-aware vision transformer module

    Figure 5. Qualitative comparison between the proposed method and several other methods

    Figure 6. Feature visualization results
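The CTFE module of Figure 2 replaces global query-key products with convolutional operations applied only to the highest-level feature map, per the abstract. As a hedged illustration of such CNN-style attention (in the spirit of ConvFormer-style designs [20]), the sketch below generates spatial attention weights with a depthwise large-kernel convolution; the kernel size, gating form, and layer layout are assumptions, not the paper's exact CTFE structure.

```python
import torch
import torch.nn as nn

class ConvAttentionBlock(nn.Module):
    """CNN-style attention applied only to the highest-level feature map.

    A minimal sketch of convolutional attention (cf. ConvFormer [20]);
    kernel size, gating form, and layer layout are assumptions.
    """
    def __init__(self, channels, kernel_size=7):
        super().__init__()
        self.proj_in = nn.Conv2d(channels, channels, 1)
        # Depthwise large-kernel convolution: derives attention weights
        # from an appropriately sized local neighborhood instead of a
        # global query-key dot product.
        self.dw = nn.Conv2d(channels, channels, kernel_size,
                            padding=kernel_size // 2, groups=channels)
        self.proj_out = nn.Conv2d(channels, channels, 1)

    def forward(self, x):  # x: (B, C, H, W), top-level encoder feature
        gate = torch.sigmoid(self.dw(self.proj_in(x)))
        return x + self.proj_out(gate * x)  # gated residual refinement
```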

    Table 1. Quantitative comparison of all evaluated methods on the four datasets; each dataset cell reports MAE↓ / $F_\beta^{\max}$↑

    Method (year)     Speed (fps)   SOD               ECSSD             DUTS-TE           DUT-OMRON
    EGNet (2019)      30.5          0.0969 / 0.8778   0.0374 / 0.9474   0.0386 / 0.8880   0.0528 / 0.8155
    PoolNet (2019)    32.0          0.1000 / 0.8690   0.0390 / 0.9440   0.0400 / 0.8860   0.0560 / 0.8300
    MINet (2020)      86.1          0.0920 / 0.8680   0.0342 / 0.9475   0.0373 / 0.8833   0.0559 / 0.8098
    AADFNet (2020)    15.0          0.0903 / 0.8677   0.0280 / 0.9543   0.0314 / 0.8993   0.0488 / 0.8143
    SACNet (2021)     11.2          0.0934 / 0.8804   0.0309 / 0.9512   0.0339 / 0.8944   0.0523 / 0.8287
    ICON (2022)       58.5          0.0841 / 0.8790   0.0318 / 0.9503   0.0370 / 0.8917   0.0569 / 0.8254
    MENet (2023)      45.0          0.0874 / 0.8780   0.0307 / 0.9549   0.0281 / 0.9123   0.0380 / 0.8337
    VSCode (2024)     39.8          0.0602 / 0.8817   0.0245 / 0.9560   0.0262 / 0.9150   0.0473 / 0.8315
    Ours              46.0          0.0567 / 0.8872   0.0230 / 0.9508   0.0243 / 0.8966   0.0352 / 0.8290
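For reference, the two measures in Table 1 follow their standard SOD definitions. Below is a minimal NumPy sketch; $\beta^2 = 0.3$ is the conventional weighting, and the 255-step threshold sweep is a common but assumed evaluation choice.

```python
import numpy as np

def mae(pred, gt):
    """Mean Absolute Error between a saliency map and ground truth in [0, 1]."""
    return np.abs(pred - gt).mean()

def max_f_measure(pred, gt, beta2=0.3, steps=255):
    """Max F-measure over a sweep of binarization thresholds.

    beta^2 = 0.3 is the conventional SOD weighting; the 255-step
    sweep is a common but assumed evaluation choice.
    """
    best = 0.0
    fg = gt > 0.5
    for t in np.linspace(0.0, 1.0, steps):
        binary = pred >= t
        tp = np.logical_and(binary, fg).sum()
        precision = tp / max(binary.sum(), 1)
        recall = tp / max(fg.sum(), 1)
        f = (1 + beta2) * precision * recall / max(beta2 * precision + recall, 1e-8)
        best = max(best, f)
    return best
```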

    Table 2. Quantitative ablation results for the different modules (SOD dataset)

    Experiment   Method                                  MAE↓     $F_\beta^{\max}$↑
    a            Baseline                                0.1091   0.8696
    b            Baseline + CTFE                         0.1020   0.8755
    c            Baseline + CTFE + MAT                   0.0567   0.8872
    d            Baseline + CTFE + MAT + Canny Loss      0.0580   0.8853
    e            Baseline + CTFE + MAT + IOU_BCE Loss    0.0567   0.8872
    f            Attention-Fusion                        0.0647   0.8761
    g            Simple-Fusion                           0.0567   0.8872
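Rows (d) and (e) of Table 2 compare a Canny edge loss against an IoU + BCE loss. A hedged sketch of the IoU + BCE combination commonly used in SOD supervision is given below; the equal weighting of the two terms is an assumption, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def iou_bce_loss(pred_logits, gt):
    """BCE plus soft-IoU loss on the predicted saliency map.

    A common SOD formulation, given as an assumed sketch of the
    IOU_BCE loss of row (e) in Table 2; the equal weighting of the
    two terms is an assumption.
    """
    bce = F.binary_cross_entropy_with_logits(pred_logits, gt)
    pred = torch.sigmoid(pred_logits)
    inter = (pred * gt).sum(dim=(-2, -1))
    union = (pred + gt - pred * gt).sum(dim=(-2, -1))
    soft_iou = 1.0 - (inter + 1.0) / (union + 1.0)  # smoothed soft IoU
    return bce + soft_iou.mean()
```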

    Table 3. Results with different weights on the three loss terms (SOD dataset)

    $L_{\text{mask}}$   $L_{\text{rank}}$   $L_{\text{edge}}$   MAE↓     $F_\beta^{\max}$↑
    1                   0.5                 0.5                 0.0600   0.8833
    0.5                 1                   0.5                 0.0589   0.8735
    0.5                 0.5                 1                   0.0735   0.8714
    1                   1                   1                   0.0567   0.8872
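Read literally, Table 3 sweeps the weights of the three loss terms, with equal weights performing best. The implied total loss (the $\lambda$ symbols are our notation, assumed rather than taken from the paper) is

$$ L_{\text{total}} = \lambda_{\text{mask}} L_{\text{mask}} + \lambda_{\text{rank}} L_{\text{rank}} + \lambda_{\text{edge}} L_{\text{edge}}, \qquad (\lambda_{\text{mask}}, \lambda_{\text{rank}}, \lambda_{\text{edge}}) = (1, 1, 1) \text{ in the best setting.} $$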
  • [1] ZHOU Huajun, XIE Xiaohua, LAI Jianhuang, et al. Interactive two-stream decoder for accurate and fast saliency detection[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 9138–9147. doi: 10.1109/CVPR42600.2020.00916.
    [2] LIANG Pengpeng, PANG Yu, LIAO Chunyuan, et al. Adaptive objectness for object tracking[J]. IEEE Signal Processing Letters, 2016, 23(7): 949–953. doi: 10.1109/LSP.2016.2556706.
    [3] RUTISHAUSER U, WALTHER D, KOCH C, et al. Is bottom-up attention useful for object recognition?[C]. 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, USA, 2004: II-II. doi: 10.1109/CVPR.2004.1315142.
    [4] ZHANG Jing, FAN Dengping, DAI Yuchao, et al. RGB-D saliency detection via cascaded mutual information minimization[C]. 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 4318–4327. doi: 10.1109/ICCV48922.2021.00430.
    [5] LI Aixuan, MAO Yuxin, ZHANG Jing, et al. Mutual information regularization for weakly-supervised RGB-D salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(1): 397–410. doi: 10.1109/TCSVT.2023.3285249.
    [6] LIAO Guibiao, GAO Wei, LI Ge, et al. Cross-collaborative fusion-encoder network for robust RGB-thermal salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(11): 7646–7661. doi: 10.1109/TCSVT.2022.3184840.
    [7] CHEN Yilei, LI Gongyang, AN Ping, et al. Light field salient object detection with sparse views via complementary and discriminative interaction network[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(2): 1070–1085. doi: 10.1109/TCSVT.2023.3290600.
    [8] ITTI L, KOCH C, and NIEBUR E. A model of saliency-based visual attention for rapid scene analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(11): 1254–1259. doi: 10.1109/34.730558.
    [9] JIANG Huaizu, WANG Jingdong, YUAN Zejian, et al. Salient object detection: A discriminative regional feature integration approach[C]. 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, USA, 2013: 2083–2090. doi: 10.1109/CVPR.2013.271.
    [10] LI Guanbin and YU Yizhou. Visual saliency based on multiscale deep features[C]. 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 5455–5463. doi: 10.1109/CVPR.2015.7299184.
    [11] LEE G, TAI Y W, and KIM J. Deep saliency with encoded low level distance map and high level features[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 660–668. doi: 10.1109/CVPR.2016.78.
    [12] WANG Linzhao, WANG Lijun, LU Huchuan, et al. Salient object detection with recurrent fully convolutional networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(7): 1734–1746. doi: 10.1109/TPAMI.2018.2846598.
    [13] LIU Nian, ZHANG Ni, WAN Kaiyuan, et al. Visual saliency transformer[C]. 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 4702–4712. doi: 10.1109/ICCV48922.2021.00468.
    [14] YUN Yike and LIN Weisi. SelfReformer: Self-refined network with transformer for salient object detection[J]. arXiv: 2205.11283, 2022.
    [15] ZHU Lei, CHEN Jiaxing, HU Xiaowei, et al. Aggregating attentional dilated features for salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(10): 3358–3371. doi: 10.1109/TCSVT.2019.2941017.
    [16] XIE Enze, WANG Wenhai, YU Zhiding, et al. SegFormer: Simple and efficient design for semantic segmentation with transformers[C]. The 35th International Conference on Neural Information Processing Systems, 2021: 924.
    [17] WANG Libo, LI Rui, ZHANG Ce, et al. UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2022, 190: 196–214. doi: 10.1016/j.isprsjprs.2022.06.008.
    [18] ZHOU Daquan, KANG Bingyi, JIN Xiaojie, et al. DeepViT: Towards deeper vision transformer[J]. arXiv: 2103.11886, 2021.
    [19] GAO Shanghua, CHENG Mingming, ZHAO Kai, et al. Res2Net: A new multi-scale backbone architecture[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(2): 652–662. doi: 10.1109/TPAMI.2019.2938758.
    [20] LIN Xian, YAN Zengqiang, DENG Xianbo, et al. ConvFormer: Plug-and-play CNN-style transformers for improving medical image segmentation[C]. The 26th International Conference on Medical Image Computing and Computer-Assisted Intervention, Vancouver, Canada, 2023: 642–651. doi: 10.1007/978-3-031-43901-8_61.
    [21] CHENG Bowen, MISRA I, SCHWING A G, et al. Masked-attention mask transformer for universal image segmentation[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 1280–1289. doi: 10.1109/CVPR52688.2022.00135.
    [22] ZHAO Jiaxing, LIU Jiangjiang, FAN Dengping, et al. EGNet: Edge guidance network for salient object detection[C]. 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 8778–8787. doi: 10.1109/ICCV.2019.00887.
    [23] LIU Jiangjiang, HOU Qibin, CHENG Mingming, et al. A simple pooling-based design for real-time salient object detection[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 3912–3921. doi: 10.1109/CVPR.2019.00404.
    [24] PANG Youwei, ZHAO Xiaoqi, ZHANG Lihe, et al. Multi-scale interactive network for salient object detection[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 9410–9419. doi: 10.1109/CVPR42600.2020.00943.
    [25] HU Xiaowei, FU Chiwing, ZHU Lei, et al. SAC-Net: Spatial attenuation context for salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(3): 1079–1090. doi: 10.1109/TCSVT.2020.2995220.
    [26] ZHUGE Mingchen, FAN Dengping, LIU Nian, et al. Salient object detection via integrity learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3738–3752. doi: 10.1109/TPAMI.2022.3179526.
    [27] WANG Yi, WANG Ruili, FAN Xin, et al. Pixels, regions, and objects: Multiple enhancement for salient object detection[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 10031–10040. doi: 10.1109/CVPR52729.2023.00967.
    [28] LUO Ziyang, LIU Nian, ZHAO Wangbo, et al. VSCode: General visual salient and camouflaged object detection with 2D prompt learning[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 17169–17180. doi: 10.1109/CVPR52733.2024.01625.
Publication History
  • Received: 2024-05-13
  • Revised: 2024-09-18
  • Published online: 2024-09-24
  • Issue date: 2025-01-31
