Single-View 3D Reconstruction Algorithm Based on View Awareness
doi: 10.11999/JEIT190986
doi: 10.11999/JEIT190986
1. School of Electronic Information Engineering, Anhui University, Hefei 230031, China
2. Inception Institute of Artificial Intelligence, Abu Dhabi 51133, United Arab Emirates
Abstract: Although projecting a 3D shape onto a 2D view appears irreversible, since a dimension is discarded during projection, interest in 3D reconstruction techniques is growing rapidly across vertical industries, from visualization to computer-aided geometric design. Traditional 3D reconstruction algorithms based on depth maps or RGB images can produce visually satisfactory results, but they still suffer from several problems: (1) the mapping from 2D views to 3D shapes is learned by brute force; (2) they cannot handle the appearance differences of an object observed from different viewpoints; (3) they require images of the object from multiple viewpoints. This paper proposes an end-to-end View-Aware 3D (VA3D) reconstruction network that addresses these problems. Specifically, VA3D consists of a multi-neighbor-view synthesis sub-network and a 3D reconstruction sub-network. The multi-neighbor-view synthesis sub-network generates several neighboring-viewpoint images from the object's source view, with an adaptive fusion module introduced to resolve the blurring and distortion that arise during viewpoint translation. The 3D reconstruction sub-network uses a recurrent neural network to recover the object's 3D shape from the synthesized multi-view sequence. Extensive qualitative and quantitative experiments on the ShapeNet dataset show that VA3D effectively improves single-view 3D reconstruction results.
Key words: View-aware / 3D reconstruction / Viewpoint translation / End-to-end neural network / Adaptive fusion
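The abstract describes the VA3D architecture only at a high level. As a rough illustration of the two-stage design (multi-neighbor-view synthesis followed by recurrent 3D reconstruction), a minimal PyTorch-style sketch follows; the module names, layer widths, viewpoint encoding, and the use of 4 synthesized views (per Table 4) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the VA3D two-stage pipeline. All architectural
# details below are assumptions made for demonstration purposes.
import torch
import torch.nn as nn

class MultiViewSynthesis(nn.Module):
    """Synthesizes N neighboring-viewpoint images from one source view."""
    def __init__(self, num_views=4):
        super().__init__()
        self.num_views = num_views
        # One shared encoder-decoder; the target viewpoint enters as an
        # extra input channel (a simplification of the paper's design).
        self.net = nn.Sequential(
            nn.Conv2d(3 + 1, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, src):                       # src: (B, 3, H, W)
        views = []
        for i in range(self.num_views):
            # Broadcast a scalar viewpoint code over the image plane.
            code = torch.full_like(src[:, :1], float(i) / self.num_views)
            views.append(self.net(torch.cat([src, code], dim=1)))
        return views                              # list of (B, 3, H, W)

class RecurrentReconstruction(nn.Module):
    """Fuses the view sequence with a GRU and decodes voxel occupancy logits."""
    def __init__(self, feat=256, voxel=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat),
        )
        self.gru = nn.GRU(feat, feat, batch_first=True)
        self.decoder = nn.Linear(feat, voxel ** 3)
        self.voxel = voxel

    def forward(self, views):                     # views: list of (B, 3, H, W)
        seq = torch.stack([self.encoder(v) for v in views], dim=1)
        _, h = self.gru(seq)                      # h: (1, B, feat)
        logits = self.decoder(h[-1])
        return logits.view(-1, self.voxel, self.voxel, self.voxel)

src = torch.randn(2, 3, 64, 64)
synth = MultiViewSynthesis()
recon = RecurrentReconstruction()
voxels = recon([src] + synth(src))                # source view + 4 synthesized
print(voxels.shape)                               # torch.Size([2, 32, 32, 32])
```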
表 1 定量比較結(jié)果
類別 IoU F-score 3D-R2N2_1 3D-R2N2_5 VA3D 3D-R2N2_1 3D-R2N2_5 VA3D 柜子 0.7299 0.7839 0.7915 0.8267 0.8651 0.8694 汽車 0.8123 0.8551 0.8530 0.8923 0.9190 0.9178 椅子 0.4958 0.5802 0.5643 0.6404 0.7155 0.6995 飛機(jī) 0.5560 0.6228 0.6385 0.7006 0.7561 0.7641 桌子 0.5297 0.6061 0.6128 0.6717 0.7362 0.7386 長凳 0.4621 0.5566 0.5533 0.6115 0.6991 0.6936 平均 0.5976 0.6674 0.6689 0.7238 0.7818 0.7805 下載: 導(dǎo)出CSV
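For reference, the IoU and F-score in Table 1 are the standard voxel-overlap metrics. A minimal sketch of how they are typically computed over a predicted occupancy grid follows; the 0.5 binarization threshold is an assumption, not taken from the paper.

```python
# Illustrative computation of voxel IoU and F-score as reported in Table 1.
import numpy as np

def voxel_iou(pred_prob, gt, thresh=0.5):
    pred = pred_prob > thresh                 # binarize predicted occupancies
    inter = np.logical_and(pred, gt).sum()    # intersection of occupied voxels
    union = np.logical_or(pred, gt).sum()
    return inter / union

def voxel_fscore(pred_prob, gt, thresh=0.5):
    pred = pred_prob > thresh
    tp = np.logical_and(pred, gt).sum()       # true-positive voxels
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    return 2 * precision * recall / max(precision + recall, 1e-8)

gt = np.zeros((32, 32, 32), dtype=bool)
gt[8:24, 8:24, 8:24] = True                   # a toy ground-truth cube
pred = gt.astype(float) * 0.9                 # a confident, perfect prediction
print(voxel_iou(pred, gt), voxel_fscore(pred, gt))   # 1.0 1.0
```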
Table 3 Effect of different output strategies in the MSN (multi-neighbor-view synthesis sub-network)

| Model | SSIM | PSNR (dB) | IoU | F-score |
|---|---|---|---|---|
| Only $\{\tilde{I}_r\}^{\mathrm{C}}$ | 0.8035 | 19.8042 | 0.6525 | 0.7649 |
| Only $\{\tilde{I}_f\}^{\mathrm{C}}$ | 0.8435 | 20.5273 | 0.6530 | 0.7646 |
| Adaptive fusion | 0.8488 | 20.6203 | 0.6554 | 0.7672 |
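Table 3 suggests that the adaptive fusion module combines the two candidate outputs $\{\tilde{I}_r\}^{\mathrm{C}}$ and $\{\tilde{I}_f\}^{\mathrm{C}}$ per pixel rather than committing to either. A minimal sketch of one such fusion step follows, assuming a learned mask $m$ that forms $m \odot \tilde{I}_r + (1-m) \odot \tilde{I}_f$; the mask head and channel sizes are illustrative, not the authors' architecture.

```python
# Sketch of an adaptive fusion step: a predicted per-pixel mask blends two
# candidate views. The mask head below is an assumption for illustration.
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self, in_ch=6):
        super().__init__()
        self.mask_head = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),   # m in (0, 1)
        )

    def forward(self, i_r, i_f):               # both: (B, 3, H, W)
        m = self.mask_head(torch.cat([i_r, i_f], dim=1))
        return m * i_r + (1 - m) * i_f         # per-pixel convex combination

fuse = AdaptiveFusion()
i_r = torch.randn(1, 3, 64, 64)                # first candidate view
i_f = torch.randn(1, 3, 64, 64)                # second candidate view
print(fuse(i_r, i_f).shape)                    # torch.Size([1, 3, 64, 64])
```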
Table 4 Variance of the reconstruction results

| Model | $\sigma^2_{\mathrm{IoU}}$ | $\sigma^2_{F\text{-score}}$ |
|---|---|---|
| Number of synthesized views = 0 | 0.0057 | 0.0061 |
| Number of synthesized views = 4 | 0.0051 | 0.0054 |
Table 5 Combinations of different loss functions

| Model | SSIM | PSNR (dB) | IoU | F-score |
|---|---|---|---|---|
| Without reconstruction loss $\mathcal{L}_{\mathrm{rec}}$ | 0.8462 | 20.2693 | 0.6540 | 0.7658 |
| Without adversarial loss $\mathcal{L}_{\mathrm{adv}}$ | 0.8516 | 21.4385 | 0.6539 | 0.7651 |
| Without perceptual loss $\mathcal{L}_{\mathrm{per}}$ | 0.8416 | 20.3141 | 0.6525 | 0.7645 |
| All losses | 0.8488 | 20.6203 | 0.6554 | 0.7672 |
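Table 5 ablates three loss terms. A hedged sketch of the corresponding generator-side objective $\mathcal{L} = \mathcal{L}_{\mathrm{rec}} + \mathcal{L}_{\mathrm{adv}} + \mathcal{L}_{\mathrm{per}}$ follows; the use of L1 for reconstruction, a least-squares adversarial term (the paper cites LSGAN), VGG-19 features for the perceptual term (VGG is among the references), and unit weights are all assumptions for illustration.

```python
# Sketch of a combined generator loss of the kind ablated in Table 5.
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Fixed VGG-19 feature extractor for the perceptual term (ImageNet input
# normalization is omitted here for brevity).
vgg_feat = vgg19(pretrained=True).features[:16].eval()
for p in vgg_feat.parameters():
    p.requires_grad_(False)

def total_loss(fake, real, d_fake, w_rec=1.0, w_adv=1.0, w_per=1.0):
    l_rec = F.l1_loss(fake, real)                        # L_rec: pixel reconstruction
    l_adv = F.mse_loss(d_fake, torch.ones_like(d_fake))  # L_adv: least-squares GAN
    l_per = F.mse_loss(vgg_feat(fake), vgg_feat(real))   # L_per: perceptual (VGG)
    return w_rec * l_rec + w_adv * l_adv + w_per * l_per

fake = torch.rand(1, 3, 64, 64)      # synthesized view
real = torch.rand(1, 3, 64, 64)      # ground-truth view
d_fake = torch.rand(1, 1)            # discriminator score on the fake view
print(total_loss(fake, real, d_fake).item())
```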
References

EIGEN D, PUHRSCH C, and FERGUS R. Depth map prediction from a single image using a multi-scale deep network[C]. The 27th International Conference on Neural Information Processing Systems, Montreal, Canada, 2014: 2366–2374.
WU Jiajun, WANG Yifan, XUE Tianfan, et al. MarrNet: 3D shape reconstruction via 2.5D sketches[C]. The 31st Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 540–550.
WANG Nanyang, ZHANG Yinda, LI Zhuwen, et al. Pixel2Mesh: Generating 3D mesh models from single RGB images[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 55–71. doi: 10.1007/978-3-030-01252-6_4.
TANG Jiapeng, HAN Xiaoguang, PAN Junyi, et al. A skeleton-bridged deep learning approach for generating meshes of complex topologies from single RGB images[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 4536–4545. doi: 10.1109/cvpr.2019.00467.
CHOY C B, XU Danfei, GWAK J Y, et al. 3D-R2N2: A unified approach for single and multi-view 3D object reconstruction[C]. The 14th European Conference on Computer Vision, Amsterdam, the Netherlands, 2016: 628–644. doi: 10.1007/978-3-319-46484-8_38.
HU Xuyang, ZHU Fan, LIU Li, et al. Structure-aware 3D shape synthesis from single-view images[C]. 2018 British Machine Vision Conference, Newcastle, UK, 2018.
GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]. The 27th International Conference on Neural Information Processing Systems, Montreal, Canada, 2014: 2672–2680.
ZHANG Jinglei and HOU Yawei. Image-to-image translation based on improved cycle-consistent generative adversarial network[J]. Journal of Electronics & Information Technology, 2020, 42(5): 1216–1222. doi: 10.11999/JEIT190407.
CHEN Ying and CHEN Huangkang. Speaker recognition based on multimodal generative adversarial nets with triplet-loss[J]. Journal of Electronics & Information Technology, 2020, 42(2): 379–385. doi: 10.11999/JEIT190154.
WANG Tingchun, LIU Mingyu, ZHU Junyan, et al. High-resolution image synthesis and semantic manipulation with conditional GANs[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 8798–8807. doi: 10.1109/cvpr.2018.00917.
ULYANOV D, VEDALDI A, and LEMPITSKY V. Instance normalization: The missing ingredient for fast stylization[EB/OL]. https://arxiv.org/abs/1607.08022, 2016.
XU Bing, WANG Naiyan, CHEN Tianqi, et al. Empirical evaluation of rectified activations in convolutional network[EB/OL]. https://arxiv.org/abs/1505.00853, 2015.
GOKASLAN A, RAMANUJAN V, RITCHIE D, et al. Improving shape deformation in unsupervised image-to-image translation[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 662–678. doi: 10.1007/978-3-030-01258-8_40.
MAO Xudong, LI Qing, XIE Haoran, et al. Least squares generative adversarial networks[C]. 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 2813–2821. doi: 10.1109/iccv.2017.304.
GULRAJANI I, AHMED F, ARJOVSKY M, et al. Improved training of Wasserstein GANs[C]. The 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 5767–5777.
LEDIG C, THEIS L, HUSZÁR F, et al. Photo-realistic single image super-resolution using a generative adversarial network[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 105–114. doi: 10.1109/CVPR.2017.19.
SIMONYAN K and ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. https://arxiv.org/abs/1409.1556, 2014.
KINGMA D P and BA J. Adam: A method for stochastic optimization[EB/OL]. https://arxiv.org/abs/1412.6980, 2014.
CHANG A X, FUNKHOUSER T, GUIBAS L, et al. ShapeNet: An information-rich 3D model repository[EB/OL]. https://arxiv.org/abs/1512.03012, 2015.
GRABNER A, ROTH P M, and LEPETIT V. 3D pose estimation and 3D model retrieval for objects in the wild[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 3022–3031. doi: 10.1109/cvpr.2018.00319.
HE Xinwei, ZHOU Yang, ZHOU Zhichao, et al. Triplet-center loss for multi-view 3D object retrieval[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 1945–1954. doi: 10.1109/cvpr.2018.00208.