RGB-D Image Saliency Detection Based on Multi-modal Feature-fused Supervision
doi: 10.11999/JEIT190297
-
School of Computer Science and Technology, Anhui University, Hefei 230601, China
-
Abstract:
RGB-D image saliency detection identifies the most visually salient target regions in a pair of RGB and Depth images. Existing two-stream networks treat the multi-modal RGB and Depth data equally and extract their features in almost identical ways. However, low-level Depth features carry considerable noise and characterize the image poorly. This paper therefore proposes an RGB-D image saliency detection network with multi-modal feature-fused supervision. Two independent streams learn from the RGB and Depth data respectively; a two-stream side-supervision module obtains a saliency map from the RGB and Depth features at each network layer; a multi-modal feature fusion module then fuses the high-dimensional RGB and Depth information of the last three layers to generate the high-level saliency prediction. The network generates the RGB and Depth modality features progressively from layer 1 to layer 5; from layer 5 down to layer 3 it produces multi-modal fused features, with higher layers guiding lower ones; from layer 2 down to layer 1, the fused features produced at layer 3 progressively refine the RGB features of the first two layers. The final output is a saliency map that contains both low-level RGB information and fused high-level RGB-D multi-modal information. Experiments on three public datasets show that, owing to the two-stream side-supervision module and the multi-modal feature fusion module, the proposed network outperforms current mainstream RGB-D saliency detection models and is highly robust.
-
Keywords:
- RGB-D saliency detection /
- Convolutional Neural Network (CNN) /
- Multi-modal /
- Supervision
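The high-to-low fusion order described in the abstract can be sketched with plain arrays. This is a minimal NumPy illustration of the data flow only: nearest-neighbour upsampling and an elementwise sum-plus-ReLU stand in for the network's learned deconvolution and fusion modules, and all shapes and helper names are hypothetical.

```python
import numpy as np

def upsample2x(x):
    # nearest-neighbour upsampling, a stand-in for a learned deconvolution
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse(rgb, depth):
    # placeholder multi-modal fusion: elementwise sum followed by ReLU;
    # the paper's fusion module is a learned operation, not this
    return np.maximum(rgb + depth, 0.0)

# five-level feature pyramids (spatial size halves per level), random stand-ins
rng = np.random.default_rng(0)
rgb_feats = [rng.standard_normal((32 >> i, 32 >> i)) for i in range(5)]
depth_feats = [rng.standard_normal((32 >> i, 32 >> i)) for i in range(5)]

# layers 5 -> 3: fuse RGB and Depth, letting higher layers guide lower ones
fused = fuse(rgb_feats[4], depth_feats[4])
for i in (3, 2):
    fused = fuse(rgb_feats[i], depth_feats[i]) + upsample2x(fused)

# layers 2 -> 1: refine the low-level RGB features with the layer-3 fused features
out = fused
for i in (1, 0):
    out = np.maximum(rgb_feats[i] + upsample2x(out), 0.0)

# `out` has the resolution of layer 1 and mixes low-level RGB detail
# with high-level multi-modal context
```

The point of this ordering is that noisy low-level Depth features never enter the result: Depth contributes only through the fused layers 5 to 3, while layers 2 and 1 contribute RGB detail alone.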
-
Table 1  Comparison with other models on F-measure, MAE, S-measure, and E-measure
Algorithm     NLPR1000                      NJU2000                       STEREO
              F      MAE    S      E        F      MAE    S      E        F      MAE    S      E
TAN           0.7956 0.0410 0.8861 0.9161   0.8442 0.0605 0.8785 0.8932   0.8489 0.0591 0.8775 0.9108
PCFN          0.7948 0.0437 0.8736 0.9163   0.8440 0.0591 0.8770 0.8966   0.8450 0.0606 0.8800 0.9054
MMCI          0.7299 0.0591 0.8557 0.8717   0.8122 0.0790 0.8581 0.8775   0.8120 0.0796 0.8599 0.8896
DF            0.7348 0.0891 0.7909 0.8600   0.7703 0.1406 0.7596 0.8383   0.7650 0.1395 0.7664 0.8438
Ours          0.8629 0.0318 0.9117 0.9464   0.8578 0.0541 0.8852 0.8956   0.8622 0.0519 0.8894 0.9130
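Two of Table 1's metrics can be computed for a single prediction/ground-truth pair along these lines. This is a rough sketch assuming maps normalized to [0, 1], using the adaptive threshold (twice the prediction mean) and beta-squared = 0.3 that are common in saliency work; the paper's exact evaluation protocol may differ.

```python
import numpy as np

def mae(pred, gt):
    # Mean Absolute Error between a predicted saliency map and its
    # ground truth, both assumed to lie in [0, 1]
    return float(np.mean(np.abs(pred - gt)))

def f_measure(pred, gt, beta2=0.3):
    # F-measure with an adaptive threshold (twice the prediction mean),
    # beta^2 = 0.3 as is conventional in saliency detection
    thresh = min(2.0 * float(pred.mean()), 1.0)
    binary = pred >= thresh
    gt_bin = gt >= 0.5
    if binary.sum() == 0 or gt_bin.sum() == 0:
        return 0.0
    tp = np.logical_and(binary, gt_bin).sum()
    precision = tp / binary.sum()
    recall = tp / gt_bin.sum()
    if precision + recall == 0:
        return 0.0
    return float((1 + beta2) * precision * recall / (beta2 * precision + recall))
```

S-measure and E-measure (structure and enhanced-alignment measures) involve region- and object-level comparisons and are omitted here for brevity.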
Table 2  Ablation results for the two-stream side-supervision module
Algorithm     NLPR1000                      NJU2000                       STEREO
              F      MAE    S      E        F      MAE    S      E        F      MAE    S      E
NDS           0.8358 0.0340 0.9085 0.9336   0.8502 0.0568 0.8848 0.8902   0.8524 0.0552 0.8879 0.9066
Ours (DS)     0.8629 0.0318 0.9117 0.9464   0.8578 0.0541 0.8852 0.8956   0.8622 0.0519 0.8894 0.9130
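The side-supervision (DS) idea ablated above — attaching a loss to the saliency map produced at every layer of each stream — can be illustrated with a deep-supervision style objective. A hedged sketch only: the function names, equal per-layer weighting, and per-resolution ground truths are assumptions, not the paper's implementation.

```python
import numpy as np

def bce(pred, gt, eps=1e-7):
    # pixel-wise binary cross-entropy between one side-output
    # saliency map and its ground truth, both in [0, 1]
    pred = np.clip(pred, eps, 1 - eps)
    return float(np.mean(-(gt * np.log(pred) + (1 - gt) * np.log(1 - pred))))

def side_supervised_loss(side_outputs, gts):
    # deep-supervision objective: every side output is compared against
    # the ground truth at its own resolution and the losses are summed,
    # so gradients reach every layer of both streams directly
    return sum(bce(p, g) for p, g in zip(side_outputs, gts))
```

Supervising every layer this way is what lets the network read off per-layer RGB and Depth saliency maps before fusion, rather than only supervising the final output.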
Table 3  Ablation results for the multi-scale module
Algorithm     NLPR1000                      NJU2000                       STEREO
              F      MAE    S      E        F      MAE    S      E        F      MAE    S      E
BN            0.8488 0.0340 0.9059 0.9398   0.8504 0.0566 0.8814 0.8928   0.8573 0.0547 0.8848 0.9093
Ours          0.8629 0.0318 0.9117 0.9464   0.8578 0.0541 0.8852 0.8956   0.8622 0.0519 0.8894 0.9130
Table 4  Comparison results for low-level Depth features
Algorithm     NLPR1000                      NJU2000                       STEREO
              F      MAE    S      E        F      MAE    S      E        F      MAE    S      E
DY            0.8715 0.1087 0.8187 0.9479   0.8250 0.1310 0.8414 0.8785   0.8355 0.1277 0.8541 0.8984
Ours          0.8629 0.0318 0.9117 0.9464   0.8578 0.0541 0.8852 0.8956   0.8622 0.0519 0.8894 0.9130
-
SHAO Ling and BRADY M. Specific object retrieval based on salient regions[J]. Pattern Recognition, 2006, 39(10): 1932–1948. doi: 10.1016/j.patcog.2006.04.010
GUO Chenlei and ZHANG Liming. A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression[J]. IEEE Transactions on Image Processing, 2010, 19(1): 185–198. doi: 10.1109/TIP.2009.2030969
MAHADEVAN V and VASCONCELOS N. Biologically inspired object tracking using center-surround saliency mechanisms[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(3): 541–554. doi: 10.1109/TPAMI.2012.98
QU Liangqiong, HE Shengfeng, ZHANG Jiawei, et al. RGBD salient object detection via deep fusion[J]. IEEE Transactions on Image Processing, 2017, 26(5): 2274–2285. doi: 10.1109/TIP.2017.2682981
CHEN Hao, LI Youfu, and SU Dan. Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for RGB-D salient object detection[J]. Pattern Recognition, 2019, 86: 376–385. doi: 10.1016/j.patcog.2018.08.007
HAN Junwei, CHEN Hao, LIU Nian, et al. CNNs-based RGB-D saliency detection via cross-view transfer and multiview fusion[J]. IEEE Transactions on Cybernetics, 2018, 48(11): 3171–3183. doi: 10.1109/TCYB.2017.2761775
CHEN Hao, LI Youfu, and SU Dan. RGB-D saliency detection by multi-stream late fusion network[C]. The 11th International Conference on Computer Vision Systems, Shenzhen, China, 2017: 459–468. doi: 10.1007/978-3-319-68345-4_41
CHEN Hao and LI Youfu. Progressively complementarity-aware fusion network for RGB-D salient object detection[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 3051–3060.
SIMONYAN K and ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]. 2015 International Conference on Learning Representations, San Diego, USA, 2015: 1150–1210.
LEE C Y, XIE Saining, GALLAGHER P, et al. Deeply-supervised nets[C]. The 18th International Conference on Artificial Intelligence and Statistics, San Diego, USA, 2015: 562–570.
XIE Saining and TU Zhuowen. Holistically-nested edge detection[J]. International Journal of Computer Vision, 2017, 125(1/3): 3–18. doi: 10.1007/s11263-017-1004-z
HOU Qibin, CHENG Mingming, HU Xiaowei, et al. Deeply supervised salient object detection with short connections[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(4): 815–828. doi: 10.1109/TPAMI.2018.2815688
DU Dapeng, XU Xiangyang, REN Tongwei, et al. Depth images could tell us more: Enhancing depth discriminability for RGB-D scene recognition[C]. 2018 IEEE International Conference on Multimedia and Expo, San Diego, USA, 2018: 1–6. doi: 10.1109/ICME.2018.8486573
SONG Xinhang, HERRANZ L, and JIANG Shuqiang. Depth CNNs for RGB-D scene recognition: Learning from scratch better than transferring from RGB-CNNs[C]. The 31st AAAI Conference on Artificial Intelligence, San Francisco, USA, 2017: 4271–4277.
LIU Nian and HAN Junwei. DHSNet: Deep hierarchical saliency network for salient object detection[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 678–686. doi: 10.1109/CVPR.2016.80
KIM H J, DUNN E, and FRAHM J M. Learned contextual feature reweighting for image geo-localization[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 3251–3260. doi: 10.1109/CVPR.2017.346
PENG Houwen, LI Bing, XIONG Weihua, et al. RGBD salient object detection: A benchmark and algorithms[C]. The 13th European Conference on Computer Vision, Zurich, Switzerland, 2014: 92–109. doi: 10.1007/978-3-319-10578-9_7
JU Ran, GE Ling, GENG Wenjing, et al. Depth saliency based on anisotropic center-surround difference[C]. 2014 IEEE International Conference on Image Processing, Paris, France, 2014: 1115–1119. doi: 10.1109/ICIP.2014.7025222
NIU Yuzhen, GENG Yujie, LI Xueqing, et al. Leveraging stereopsis for saliency analysis[C]. 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, USA, 2012: 454–461. doi: 10.1109/CVPR.2012.6247708
MARTIN D R, FOWLKES C C, and MALIK J. Learning to detect natural image boundaries using local brightness, color, and texture cues[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(5): 530–549. doi: 10.1109/TPAMI.2004.1273918
FAN Dengping, CHENG Mingming, LIU Yun, et al. Structure-measure: A new way to evaluate foreground maps[C]. 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 4558–4567.
FAN Dengping, GONG Cheng, CAO Yang, et al. Enhanced-alignment measure for binary foreground map evaluation[C]. The 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 2018: 698–704.
FAN Dengping, CHENG Mingming, LIU Jiangjiang, et al. Salient objects in clutter: Bringing salient object detection to the foreground[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 186–202.
JIA Yangqing, SHELHAMER E, DONAHUE J, et al. Caffe: Convolutional architecture for fast feature embedding[C]. The 22nd ACM International Conference on Multimedia, Orlando, USA, 2014: 675–678. doi: 10.1145/2647868.2654889
CHEN Hao and LI Youfu. Three-stream attention-aware network for RGB-D salient object detection[J]. IEEE Transactions on Image Processing, 2019, 28(6): 2825–2835. doi: 10.1109/TIP.2019.2891104
-