RGB-D Image Saliency Detection Based on Multi-modal Feature-fused Supervision
doi: 10.11999/JEIT190297
-
School of Computer Science and Technology, Anhui University, Hefei 230601, China
-
Abstract:
RGB-D image saliency detection identifies the most visually salient target regions in a pair of RGB and Depth images. Existing two-stream networks treat the multi-modal RGB and Depth data equally and extract their features in almost identical ways. However, low-level Depth features carry considerable noise and characterize the image poorly. This paper therefore proposes an RGB-D image saliency detection network with multi-modal feature-fused supervision. Two independent streams learn from the RGB and Depth data respectively; a two-stream side-supervision module obtains a saliency map from the RGB and Depth features at each network layer; a multi-modal feature fusion module then fuses the high-dimensional RGB and Depth information of the last three layers to generate the high-level saliency prediction. The network generates the RGB and Depth modality features progressively from layer 1 to layer 5; from layer 5 down to layer 3 it produces multi-modal fused features, with higher layers guiding lower ones; from layer 2 down to layer 1, the fused features produced at layer 3 progressively refine the RGB features of the first two layers. The final output is a saliency map that contains both low-level RGB information and fused high-level RGB-D multi-modal information. Experiments on three public datasets show that, owing to the two-stream side-supervision module and the multi-modal feature fusion module, the proposed network outperforms current mainstream RGB-D saliency detection models and is highly robust.
-
Keywords:
- RGB-D saliency detection /
- Convolutional Neural Network (CNN) /
- Multi-modal /
- Supervision
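The high-to-low fusion order described in the abstract can be sketched with plain arrays. This is a minimal NumPy illustration of the data flow only: nearest-neighbour upsampling and an elementwise sum-plus-ReLU stand in for the network's learned deconvolution and fusion modules, and all shapes and helper names are hypothetical.

```python
import numpy as np

def upsample2x(x):
    # nearest-neighbour upsampling, a stand-in for a learned deconvolution
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse(rgb, depth):
    # placeholder multi-modal fusion: elementwise sum followed by ReLU;
    # the paper's fusion module is a learned operation, not this
    return np.maximum(rgb + depth, 0.0)

# five-level feature pyramids (spatial size halves per level), random stand-ins
rng = np.random.default_rng(0)
rgb_feats = [rng.standard_normal((32 >> i, 32 >> i)) for i in range(5)]
depth_feats = [rng.standard_normal((32 >> i, 32 >> i)) for i in range(5)]

# layers 5 -> 3: fuse RGB and Depth, letting higher layers guide lower ones
fused = fuse(rgb_feats[4], depth_feats[4])
for i in (3, 2):
    fused = fuse(rgb_feats[i], depth_feats[i]) + upsample2x(fused)

# layers 2 -> 1: refine the low-level RGB features with the layer-3 fused features
out = fused
for i in (1, 0):
    out = np.maximum(rgb_feats[i] + upsample2x(out), 0.0)

# `out` has the resolution of layer 1 and mixes low-level RGB detail
# with high-level multi-modal context
```

The point of this ordering is that noisy low-level Depth features never enter the result: Depth contributes only through the fused layers 5 to 3, while layers 2 and 1 contribute RGB detail alone.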
-
Table 1  Comparison with other models on F-measure, MAE, S-measure, and E-measure
Algorithm     NLPR1000                      NJU2000                       STEREO
              F      MAE    S      E        F      MAE    S      E        F      MAE    S      E
TAN           0.7956 0.0410 0.8861 0.9161   0.8442 0.0605 0.8785 0.8932   0.8489 0.0591 0.8775 0.9108
PCFN          0.7948 0.0437 0.8736 0.9163   0.8440 0.0591 0.8770 0.8966   0.8450 0.0606 0.8800 0.9054
MMCI          0.7299 0.0591 0.8557 0.8717   0.8122 0.0790 0.8581 0.8775   0.8120 0.0796 0.8599 0.8896
DF            0.7348 0.0891 0.7909 0.8600   0.7703 0.1406 0.7596 0.8383   0.7650 0.1395 0.7664 0.8438
Ours          0.8629 0.0318 0.9117 0.9464   0.8578 0.0541 0.8852 0.8956   0.8622 0.0519 0.8894 0.9130
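Two of Table 1's metrics can be computed for a single prediction/ground-truth pair along these lines. This is a rough sketch assuming maps normalized to [0, 1], using the adaptive threshold (twice the prediction mean) and beta-squared = 0.3 that are common in saliency work; the paper's exact evaluation protocol may differ.

```python
import numpy as np

def mae(pred, gt):
    # Mean Absolute Error between a predicted saliency map and its
    # ground truth, both assumed to lie in [0, 1]
    return float(np.mean(np.abs(pred - gt)))

def f_measure(pred, gt, beta2=0.3):
    # F-measure with an adaptive threshold (twice the prediction mean),
    # beta^2 = 0.3 as is conventional in saliency detection
    thresh = min(2.0 * float(pred.mean()), 1.0)
    binary = pred >= thresh
    gt_bin = gt >= 0.5
    if binary.sum() == 0 or gt_bin.sum() == 0:
        return 0.0
    tp = np.logical_and(binary, gt_bin).sum()
    precision = tp / binary.sum()
    recall = tp / gt_bin.sum()
    if precision + recall == 0:
        return 0.0
    return float((1 + beta2) * precision * recall / (beta2 * precision + recall))
```

S-measure and E-measure (structure and enhanced-alignment measures) involve region- and object-level comparisons and are omitted here for brevity.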
Table 2  Ablation results for the two-stream side-supervision module
Algorithm     NLPR1000                      NJU2000                       STEREO
              F      MAE    S      E        F      MAE    S      E        F      MAE    S      E
NDS           0.8358 0.0340 0.9085 0.9336   0.8502 0.0568 0.8848 0.8902   0.8524 0.0552 0.8879 0.9066
Ours (DS)     0.8629 0.0318 0.9117 0.9464   0.8578 0.0541 0.8852 0.8956   0.8622 0.0519 0.8894 0.9130
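The side-supervision (DS) idea ablated above — attaching a loss to the saliency map produced at every layer of each stream — can be illustrated with a deep-supervision style objective. A hedged sketch only: the function names, equal per-layer weighting, and per-resolution ground truths are assumptions, not the paper's implementation.

```python
import numpy as np

def bce(pred, gt, eps=1e-7):
    # pixel-wise binary cross-entropy between one side-output
    # saliency map and its ground truth, both in [0, 1]
    pred = np.clip(pred, eps, 1 - eps)
    return float(np.mean(-(gt * np.log(pred) + (1 - gt) * np.log(1 - pred))))

def side_supervised_loss(side_outputs, gts):
    # deep-supervision objective: every side output is compared against
    # the ground truth at its own resolution and the losses are summed,
    # so gradients reach every layer of both streams directly
    return sum(bce(p, g) for p, g in zip(side_outputs, gts))
```

Supervising every layer this way is what lets the network read off per-layer RGB and Depth saliency maps before fusion, rather than only supervising the final output.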
Table 3  Ablation results for the multi-scale module
Algorithm     NLPR1000                      NJU2000                       STEREO
              F      MAE    S      E        F      MAE    S      E        F      MAE    S      E
BN            0.8488 0.0340 0.9059 0.9398   0.8504 0.0566 0.8814 0.8928   0.8573 0.0547 0.8848 0.9093
Ours          0.8629 0.0318 0.9117 0.9464   0.8578 0.0541 0.8852 0.8956   0.8622 0.0519 0.8894 0.9130
Table 4  Comparison results for low-level Depth features
Algorithm     NLPR1000                      NJU2000                       STEREO
              F      MAE    S      E        F      MAE    S      E        F      MAE    S      E
DY            0.8715 0.1087 0.8187 0.9479   0.8250 0.1310 0.8414 0.8785   0.8355 0.1277 0.8541 0.8984
Ours          0.8629 0.0318 0.9117 0.9464   0.8578 0.0541 0.8852 0.8956   0.8622 0.0519 0.8894 0.9130
-
SHAO Ling and BRADY M. Specific object retrieval based on salient regions[J]. Pattern Recognition, 2006, 39(10): 1932–1948. doi: 10.1016/j.patcog.2006.04.010
GUO Chenlei and ZHANG Liming. A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression[J]. IEEE Transactions on Image Processing, 2010, 19(1): 185–198. doi: 10.1109/TIP.2009.2030969
MAHADEVAN V and VASCONCELOS N. Biologically inspired object tracking using center-surround saliency mechanisms[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(3): 541–554. doi: 10.1109/TPAMI.2012.98
QU Liangqiong, HE Shengfeng, ZHANG Jiawei, et al. RGBD salient object detection via deep fusion[J]. IEEE Transactions on Image Processing, 2017, 26(5): 2274–2285. doi: 10.1109/TIP.2017.2682981
CHEN Hao, LI Youfu, and SU Dan. Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for RGB-D salient object detection[J]. Pattern Recognition, 2019, 86: 376–385. doi: 10.1016/j.patcog.2018.08.007
HAN Junwei, CHEN Hao, LIU Nian, et al. CNNs-based RGB-D saliency detection via cross-view transfer and multiview fusion[J]. IEEE Transactions on Cybernetics, 2018, 48(11): 3171–3183. doi: 10.1109/TCYB.2017.2761775
CHEN Hao, LI Youfu, and SU Dan. RGB-D saliency detection by multi-stream late fusion network[C]. The 11th International Conference on Computer Vision Systems, Shenzhen, China, 2017: 459–468. doi: 10.1007/978-3-319-68345-4_41
CHEN Hao and LI Youfu. Progressively complementarity-aware fusion network for RGB-D salient object detection[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 3051–3060.
SIMONYAN K and ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]. 2015 International Conference on Learning Representations, San Diego, USA, 2015: 1150–1210.
LEE C Y, XIE Saining, GALLAGHER P, et al. Deeply-supervised nets[C]. The 18th International Conference on Artificial Intelligence and Statistics, San Diego, USA, 2015: 562–570.
XIE Saining and TU Zhuowen. Holistically-nested edge detection[J]. International Journal of Computer Vision, 2017, 125(1/3): 3–18. doi: 10.1007/s11263-017-1004-z
HOU Qibin, CHENG Mingming, HU Xiaowei, et al. Deeply supervised salient object detection with short connections[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(4): 815–828. doi: 10.1109/TPAMI.2018.2815688
DU Dapeng, XU Xiangyang, REN Tongwei, et al. Depth images could tell us more: Enhancing depth discriminability for RGB-D scene recognition[C]. 2018 IEEE International Conference on Multimedia and Expo, San Diego, USA, 2018: 1–6. doi: 10.1109/ICME.2018.8486573
SONG Xinhang, HERRANZ L, and JIANG Shuqiang. Depth CNNs for RGB-D scene recognition: Learning from scratch better than transferring from RGB-CNNs[C]. The 31st AAAI Conference on Artificial Intelligence, San Francisco, USA, 2017: 4271–4277.
LIU Nian and HAN Junwei. DHSNet: Deep hierarchical saliency network for salient object detection[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 678–686. doi: 10.1109/CVPR.2016.80
KIM H J, DUNN E, and FRAHM J M. Learned contextual feature reweighting for image geo-localization[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 3251–3260. doi: 10.1109/CVPR.2017.346
PENG Houwen, LI Bing, XIONG Weihua, et al. RGBD salient object detection: A benchmark and algorithms[C]. The 13th European Conference on Computer Vision, Zurich, Switzerland, 2014: 92–109. doi: 10.1007/978-3-319-10578-9_7
JU Ran, GE Ling, GENG Wenjing, et al. Depth saliency based on anisotropic center-surround difference[C]. 2014 IEEE International Conference on Image Processing, Paris, France, 2014: 1115–1119. doi: 10.1109/ICIP.2014.7025222
NIU Yuzhen, GENG Yujie, LI Xueqing, et al. Leveraging stereopsis for saliency analysis[C]. 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, USA, 2012: 454–461. doi: 10.1109/CVPR.2012.6247708
MARTIN D R, FOWLKES C C, and MALIK J. Learning to detect natural image boundaries using local brightness, color, and texture cues[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(5): 530–549. doi: 10.1109/TPAMI.2004.1273918
FAN Dengping, CHENG Mingming, LIU Yun, et al. Structure-measure: A new way to evaluate foreground maps[C]. 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 4558–4567.
FAN Dengping, GONG Cheng, CAO Yang, et al. Enhanced-alignment measure for binary foreground map evaluation[C]. The 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 2018: 698–704.
FAN Dengping, CHENG Mingming, LIU Jiangjiang, et al. Salient objects in clutter: Bringing salient object detection to the foreground[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 186–202.
JIA Yangqing, SHELHAMER E, DONAHUE J, et al. Caffe: Convolutional architecture for fast feature embedding[C]. The 22nd ACM International Conference on Multimedia, Orlando, USA, 2014: 675–678. doi: 10.1145/2647868.2654889
CHEN Hao and LI Youfu. Three-stream attention-aware network for RGB-D salient object detection[J]. IEEE Transactions on Image Processing, 2019, 28(6): 2825–2835. doi: 10.1109/TIP.2019.2891104
-