融合神經(jīng)輻射場和視覺同時(shí)定位與地圖構(gòu)建的混合場景表示方法
doi: 10.11999/JEIT240316
-
重慶郵電大學(xué)通信與信息工程學(xué)院 重慶 400065
Hybrid Scene Representation Method Integrating Neural Radiance Fields and Visual Simultaneous Localization and Mapping
-
School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
-
摘要: 目前,采用傳統(tǒng)顯式場景表示的同時(shí)定位與地圖構(gòu)建(SLAM)系統(tǒng)對(duì)場景進(jìn)行離散化,不適用于連續(xù)性場景重建。該文提出一種基于神經(jīng)輻射場(NeRF)混合場景表示的深度相機(jī)(RGB-D)SLAM系統(tǒng):利用擴(kuò)展的顯式八叉樹符號(hào)距離函數(shù)(SDF)先驗(yàn)粗略表示場景,并通過多分辨率哈希編碼以不同細(xì)節(jié)級(jí)別表示場景,實(shí)現(xiàn)場景幾何的快速初始化,并使場景幾何更易于學(xué)習(xí)。此外,運(yùn)用外觀顏色分解法,結(jié)合視圖方向?qū)㈩伾纸鉃槁瓷漕伾顽R面反射顏色,實(shí)現(xiàn)光照一致性的重建,使重建結(jié)果更加真實(shí)。在Replica和TUM RGB-D數(shù)據(jù)集上的實(shí)驗(yàn)表明,該系統(tǒng)在Replica數(shù)據(jù)集上的場景重建完成率達(dá)到93.65%;定位精度相較于Vox-Fusion,在Replica數(shù)據(jù)集上平均領(lǐng)先87.50%,在TUM RGB-D數(shù)據(jù)集上平均領(lǐng)先81.99%。
-
關(guān)鍵詞:
- 同時(shí)定位與地圖構(gòu)建系統(tǒng) /
- 神經(jīng)輻射場 /
- 混合場景表示 /
- 鏡面反射
Abstract: Currently, traditional Simultaneous Localization And Mapping (SLAM) systems with explicit scene representations discretize the scene and are not suitable for continuous scene reconstruction. An RGB-D SLAM system based on a hybrid scene representation with Neural Radiance Fields (NeRF) is proposed in this paper. An extended explicit octree Signed Distance Function (SDF) prior is used to coarsely represent the scene, and multi-resolution hash encoding is used to represent the scene at different levels of detail, enabling fast initialization of the scene geometry and making the geometry easier to learn. In addition, an appearance color decomposition method is used to decompose the color into a diffuse color and a view-dependent specular color, achieving illumination-consistent reconstruction and making the reconstruction results more realistic. In experiments on the Replica and TUM RGB-D datasets, the scene reconstruction completion ratio on the Replica dataset reaches 93.65%, and the localization accuracy outperforms Vox-Fusion by 87.50% on average on the Replica dataset and by 81.99% on average on the TUM RGB-D dataset.
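As a minimal illustration of the appearance decomposition described above, the sketch below separates a view-independent diffuse color from a view-dependent specular color and sums them into the rendered color. The module name, layer sizes, and the use of two small MLP heads are assumptions for illustration only, not the authors' exact network.

```python
import torch
import torch.nn as nn

class AppearanceDecomposition(nn.Module):
    """Illustrative sketch (not the paper's exact network): the diffuse color
    depends only on the per-point feature, while the specular color is also
    conditioned on the viewing direction; the final color is their sum."""

    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        # View-independent diffuse branch
        self.diffuse_head = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )
        # View-dependent specular branch (feature + normalized view direction)
        self.specular_head = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, feat, view_dir):
        view_dir = nn.functional.normalize(view_dir, dim=-1)
        c_diffuse = self.diffuse_head(feat)
        c_specular = self.specular_head(torch.cat([feat, view_dir], dim=-1))
        c = (c_diffuse + c_specular).clamp(0.0, 1.0)
        return c, c_diffuse, c_specular
```

Keeping the specular branch separate is what makes a dedicated regularization on the specular component (see the loss ablation in Table 7 below) possible.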
表 1 超參設(shè)定值
| 超參 | 設(shè)定值 | 超參 | 設(shè)定值 | 超參 | 設(shè)定值 | 超參 | 設(shè)定值 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| $L$ | 16 | ${F_2}$ | 2 | ${M_f}$ | 11 | ${\alpha _3}$ | 0.000 01 |
| $T$ | $2^{16}$ | ${N_t}$ | 1024 | ${\alpha _1}$ | 5.0 | ${\alpha _4}$ | 1000 |
| ${F_1}$ | 1 | $M$ | 32 | ${\alpha _2}$ | 0.1 | ${\alpha _5}$ | 10 |
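下面給出表1中與多分辨率哈希編碼相關(guān)的超參的一種示意性解讀(按類 Instant-NGP 哈希網(wǎng)格的常見字段組織);其中 base_resolution、finest_resolution 的取值以及 $L$、$F$、$T$ 與各字段的對(duì)應(yīng)關(guān)系均為假設(shè),僅用于說明各超參的含義。

```python
import math

# 示意性配置(假設(shè)):把表1中的 L、F、T 對(duì)應(yīng)到類 Instant-NGP 哈希網(wǎng)格的常見字段
hash_grid_config = {
    "n_levels": 16,             # L:多分辨率層級(jí)數(shù)
    "n_features_per_level": 2,  # F:每個(gè)層級(jí)存儲(chǔ)的特征維數(shù)
    "log2_hashmap_size": 16,    # T = 2^16:每層哈希表條目數(shù)(對(duì)應(yīng)關(guān)系為假設(shè))
    "base_resolution": 16,      # 最粗層級(jí)的網(wǎng)格分辨率(假設(shè)值)
    "finest_resolution": 2048,  # 最細(xì)層級(jí)的網(wǎng)格分辨率(假設(shè)值)
}

def level_resolutions(cfg):
    """按 Instant-NGP 的幾何級(jí)數(shù)規(guī)則,由最粗、最細(xì)分辨率推出每個(gè)層級(jí)的網(wǎng)格分辨率。"""
    n = cfg["n_levels"]
    growth = math.exp(
        (math.log(cfg["finest_resolution"]) - math.log(cfg["base_resolution"])) / (n - 1)
    )
    return [int(cfg["base_resolution"] * growth ** i) for i in range(n)]

print(level_resolutions(hash_grid_config))  # 分辨率從 16 逐級(jí)增長到 2048
```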
表 2 Replica數(shù)據(jù)集重建質(zhì)量對(duì)比
| 方法 | Depth L1(cm)↓ | Acc.(cm)↓ | Comp.(cm)↓ | Comp. Ratio(%)↑ |
| --- | --- | --- | --- | --- |
| iMAP | 4.64 | 3.62 | 4.93 | 80.50 |
| NICE-SLAM | 3.53 | 2.85 | 3.00 | 89.33 |
| Vox-Fusion | 2.91 | 2.37 | 2.28 | 92.86 |
| vMAP | 3.33 | 3.20 | 2.39 | 92.99 |
| DNS SLAM | 3.16 | 2.76 | 2.74 | 91.73 |
| 本文 | 1.76 | 2.29 | 2.11 | 93.65 |
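表2中的 Acc.、Comp. 與 Comp. Ratio 是網(wǎng)格重建評(píng)測的常用指標(biāo):分別為重建點(diǎn)到真值點(diǎn)的平均最近距離、真值點(diǎn)到重建點(diǎn)的平均最近距離,以及最近距離小于某一閾值的真值點(diǎn)比例。下面是計(jì)算這三項(xiàng)指標(biāo)的一個(gè)最小示例;其中 5 cm 閾值是此類評(píng)測的常見取值,具體數(shù)值為假設(shè)。

```python
import numpy as np
from scipy.spatial import cKDTree

def reconstruction_metrics(rec_pts, gt_pts, thresh=0.05):
    """rec_pts、gt_pts:從重建網(wǎng)格與真值網(wǎng)格上采樣得到的 (N, 3) 點(diǎn)云,單位為米。
    返回 Acc.(m)、Comp.(m)與 Comp. Ratio(%)。"""
    d_rec_to_gt, _ = cKDTree(gt_pts).query(rec_pts)   # 每個(gè)重建點(diǎn)到真值的最近距離
    d_gt_to_rec, _ = cKDTree(rec_pts).query(gt_pts)   # 每個(gè)真值點(diǎn)到重建的最近距離
    acc = float(d_rec_to_gt.mean())
    comp = float(d_gt_to_rec.mean())
    comp_ratio = float((d_gt_to_rec < thresh).mean() * 100.0)
    return acc, comp, comp_ratio
```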
表 3 Replica數(shù)據(jù)集軌跡誤差
| 方法 | room0 | room1 | office0 | office1 | office3 | office4 | 平均值 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| iMAP | 70.00 | 4.53 | 2.32 | 1.74 | 58.40 | 2.62 | 23.27 |
| NICE-SLAM | 1.69 | 2.04 | 0.99 | 0.90 | 3.97 | 3.08 | 2.11 |
| Vox-Fusion | 1.37 | 4.70 | 8.48 | 2.04 | 1.11 | 2.94 | 3.44 |
| vMAP | / | / | / | / | / | / | / |
| DNS SLAM | 0.49 | 0.46 | 0.34 | 0.35 | 0.62 | 0.60 | 0.48 |
| 本文 | 0.41 | 0.52 | 0.31 | 0.37 | 0.46 | 0.53 | 0.43 |
表 4 TUM RGB-D數(shù)據(jù)集軌跡誤差

| 方法 | fr1/desk | fr2/xyz | fr3/office | 平均值 |
| --- | --- | --- | --- | --- |
| iMAP | 4.9 | 2.0 | 5.8 | 4.23 |
| NICE-SLAM | 2.7 | 1.8 | 3.0 | 2.50 |
| Vox-Fusion | 3.5 | 1.5 | 26.0 | 10.33 |
| vMAP | 2.6 | 1.6 | 3.0 | 2.40 |
| DNS SLAM | / | / | / | / |
| 本文 | 2.0 | 1.5 | 2.1 | 1.86 |
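表3與表4中的軌跡誤差通常指估計(jì)軌跡與真值軌跡對(duì)齊后的絕對(duì)軌跡誤差(ATE)RMSE。下面給出該指標(biāo)的一個(gè)最小計(jì)算示例,假設(shè)兩條軌跡已按時(shí)間戳關(guān)聯(lián),并采用無尺度的 Umeyama/Kabsch 剛體對(duì)齊;這只是通用做法的示意,并非各方法原文的評(píng)測代碼。

```python
import numpy as np

def ate_rmse(est, gt):
    """est、gt:按時(shí)間戳對(duì)應(yīng)好的 (N, 3) 相機(jī)位置序列。
    先求使 R @ est + t 最接近 gt 的剛體變換,再計(jì)算殘差的 RMSE。"""
    est_mean, gt_mean = est.mean(axis=0), gt.mean(axis=0)
    est_c, gt_c = est - est_mean, gt - gt_mean
    # Kabsch/Umeyama:對(duì)互協(xié)方差矩陣做 SVD 求最優(yōu)旋轉(zhuǎn),并處理反射情形
    U, _, Vt = np.linalg.svd(gt_c.T @ est_c)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt
    t = gt_mean - R @ est_mean
    aligned = est @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))
```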
表 5 Replica數(shù)據(jù)集消融實(shí)驗(yàn)的定量分析
| 方法 | Acc.(cm)↓ | Comp.(cm)↓ | Comp. Ratio(%)↑ |
| --- | --- | --- | --- |
| w/o 八叉樹SDF先驗(yàn) | 2.99 | 2.20 | 93.88 |
| w/o 擴(kuò)展體素分配 | 2.88 | 2.10 | 95.05 |
| w/o 外觀顏色分解 | 2.36 | 1.95 | 95.36 |
| 本文 | 2.27 | 1.92 | 95.75 |
表 6 添加體素的點(diǎn)的閾值分析
| 點(diǎn)數(shù)量閾值 | Acc.(cm)↓ | Comp.(cm)↓ | Comp. Ratio(%)↑ |
| --- | --- | --- | --- |
| 5 | 5.00 | 2.08 | 93.11 |
| 10/本文 | 2.27 | 1.92 | 95.75 |
| 15 | 2.37 | 1.94 | 95.67 |
| 20 | 2.29 | 1.93 | 95.63 |
表 7 Replica數(shù)據(jù)集損失函數(shù)消融實(shí)驗(yàn)
| 方法 | Acc.(cm)↓ | Comp.(cm)↓ | Comp. Ratio(%)↑ |
| --- | --- | --- | --- |
| w/o $ {L_{{\text{rgb}}}} $ | 2.47 | 1.94 | 95.68 |
| w/o $ {L_{{\text{depth}}}} $ | 2.48 | 1.93 | 95.54 |
| w/o $ {L_{{\text{specular}}}} $ | 2.45 | 1.95 | 95.64 |
| w/o $ {L_{{\text{sdf}}}} $ | 2.68 | 2.04 | 94.68 |
| w/o $ {L_{{f_{\text{s}}}}} $ | 2.30 | 1.95 | 95.46 |
| 本文 | 2.27 | 1.92 | 95.75 |
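結(jié)合表7的各消融項(xiàng),系統(tǒng)的總損失可以理解為上述各項(xiàng)損失的加權(quán)和;下式僅為示意性寫法,權(quán)重記號(hào) $\lambda$ 與各損失項(xiàng)的具體對(duì)應(yīng)關(guān)系和取值均為假設(shè)。

$ {L_{{\text{total}}}} = {\lambda _{{\text{rgb}}}}{L_{{\text{rgb}}}} + {\lambda _{{\text{depth}}}}{L_{{\text{depth}}}} + {\lambda _{{\text{specular}}}}{L_{{\text{specular}}}} + {\lambda _{{\text{sdf}}}}{L_{{\text{sdf}}}} + {\lambda _{{f_{\text{s}}}}}{L_{{f_{\text{s}}}}} $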
表 8 Replica數(shù)據(jù)集性能對(duì)比
| 方法 | Avg. FPS↑ | GPU Mem.(GB)↓ | Param.(M)↓ |
| --- | --- | --- | --- |
| iMAP | 0.13 | 6.44 | 0.32 |
| NICE-SLAM | 0.61 | 4.70 | 17.4 |
| Vox-Fusion | 0.74 | 21.22 | 0.87 |
| vMAP | 4.03 | \ | 0.66 |
| DNS SLAM | 0.13 | \ | \ |
| 本文 | 4.93 | 2.93 | 0.34 |
-
[1] HORNUNG A, WURM K M, BENNEWITZ M, et al. OctoMap: An efficient probabilistic 3D mapping framework based on octrees[J]. Autonomous Robots, 2013, 34(3): 189–206. doi: 10.1007/s10514-012-9321-0.
[2] OLEYNIKOVA H, TAYLOR Z, FEHR M, et al. Voxblox: Incremental 3D Euclidean signed distance fields for on-board MAV planning[C]. 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, Canada, 2017: 1366–1373. doi: 10.1109/IROS.2017.8202315.
[3] NEWCOMBE R A, IZADI S, HILLIGES O, et al. KinectFusion: Real-time dense surface mapping and tracking[C]. 2011 10th IEEE International Symposium on Mixed and Augmented Reality, Basel, Switzerland, 2011: 127–136. doi: 10.1109/ISMAR.2011.6092378.
[4] FEHR M, FURRER F, DRYANOVSKI I, et al. TSDF-based change detection for consistent long-term dense reconstruction and dynamic object discovery[C]. 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 2017: 5237–5244. doi: 10.1109/ICRA.2017.7989614.
[5] DAI A, NIESSNER M, ZOLLHÖFER M, et al. BundleFusion: Real-time globally consistent 3D reconstruction using on-the-fly surface reintegration[J]. ACM Transactions on Graphics (ToG), 2017, 36(4): 76a. doi: 10.1145/3072959.3054739.
[6] NIESSNER M, ZOLLHÖFER M, IZADI S, et al. Real-time 3D reconstruction at scale using voxel hashing[J]. ACM Transactions on Graphics (ToG), 2013, 32(6): 169. doi: 10.1145/2508363.2508374.
[7] KÄHLER O, PRISACARIU V A, REN C Y, et al. Very high frame rate volumetric integration of depth images on mobile devices[J]. IEEE Transactions on Visualization and Computer Graphics, 2015, 21(11): 1241–1250. doi: 10.1109/TVCG.2015.2459891.
[8] WANG Kaixuan, GAO Fei, and SHEN Shaojie. Real-time scalable dense surfel mapping[C]. 2019 International Conference on Robotics and Automation (ICRA), Montreal, Canada, 2019: 6919–6925. doi: 10.1109/ICRA.2019.8794101.
[9] WHELAN T, SALAS-MORENO R F, GLOCKER B, et al. ElasticFusion: Real-time dense SLAM and light source estimation[J]. The International Journal of Robotics Research, 2016, 35(14): 1697–1716. doi: 10.1177/0278364916669237.
[10] RUETZ F, HERNÁNDEZ E, PFEIFFER M, et al. OVPC mesh: 3D free-space representation for local ground vehicle navigation[C]. 2019 International Conference on Robotics and Automation (ICRA), Montreal, Canada, 2019: 8648–8654. doi: 10.1109/ICRA.2019.8793503.
[11] ZHONG Xingguang, PAN Yue, BEHLEY J, et al. SHINE-Mapping: Large-scale 3D mapping using sparse hierarchical implicit neural representations[C]. 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 2023: 8371–8377. doi: 10.1109/ICRA48891.2023.10160907.
[12] MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: Representing scenes as neural radiance fields for view synthesis[J]. Communications of the ACM, 2021, 65(1): 99–106. doi: 10.1145/3503250.
[13] SUCAR E, LIU Shikun, ORTIZ J, et al. iMAP: Implicit mapping and positioning in real-time[C]. The 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 6209–6218. doi: 10.1109/ICCV48922.2021.00617.
[14] ZHU Zihan, PENG Songyou, LARSSON V, et al. NICE-SLAM: Neural implicit scalable encoding for SLAM[C]. The 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 12776–12786. doi: 10.1109/CVPR52688.2022.01245.
[15] YANG Xingrui, LI Hai, ZHAI Hongjia, et al. Vox-Fusion: Dense tracking and mapping with voxel-based neural implicit representation[C]. 2022 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Singapore, 2022: 499–507. doi: 10.1109/ISMAR55827.2022.00066.
[16] KONG Xin, LIU Shikun, TAHER M, et al. vMAP: Vectorised object mapping for neural field SLAM[C]. The 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 952–961. doi: 10.1109/CVPR52729.2023.00098.
[17] LI Kunyi, NIEMEYER M, NAVAB N, et al. DNS SLAM: Dense neural semantic-informed SLAM[J]. arXiv preprint arXiv:2312.00204, 2023. doi: 10.48550/arXiv.2312.00204.
[18] WU Xingming, LIU Zimeng, TIAN Yuxin, et al. KN-SLAM: Keypoints and neural implicit encoding SLAM[J]. IEEE Transactions on Instrumentation and Measurement, 2024, 73: 2512712. doi: 10.1109/TIM.2024.3378264.
[19] WANG Haocheng, CAO Yanlong, WEI Xiaoyao, et al. Structerf-SLAM: Neural implicit representation SLAM for structural environments[J]. Computers & Graphics, 2024, 119: 103893. doi: 10.1016/j.cag.2024.103893.
[20] MÜLLER T, EVANS A, SCHIED C, et al. Instant neural graphics primitives with a multiresolution hash encoding[J]. ACM Transactions on Graphics (ToG), 2022, 41(4): 102. doi: 10.1145/3528223.3530127.
[21] TANG Jiaxiang, ZHOU Hang, CHEN Xiaokang, et al. Delicate textured mesh recovery from NeRF via adaptive surface refinement[C]. The 2023 IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 17693–17703. doi: 10.1109/ICCV51070.2023.01626.
[22] ZHANG Xiuming, SRINIVASAN P P, DENG Boyang, et al. NeRFactor: Neural factorization of shape and reflectance under an unknown illumination[J]. ACM Transactions on Graphics (ToG), 2021, 40(6): 237. doi: 10.1145/3478513.3480496.
[23] WANG Peng, LIU Lingjie, LIU Yuan, et al. NeuS: Learning neural implicit surfaces by volume rendering for multi-view reconstruction[C]. The 35th International Conference on Neural Information Processing Systems, 2021: 2081.
[24] YARIV L, GU Jiatao, KASTEN Y, et al. Volume rendering of neural implicit surfaces[C]. Proceedings of the 35th International Conference on Neural Information Processing Systems, 2021: 367.
[25] AZINOVIĆ D, MARTIN-BRUALLA R, GOLDMAN D B, et al. Neural RGB-D surface reconstruction[C]. The 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 6280–6291. doi: 10.1109/CVPR52688.2022.00619.
[26] STRAUB J, WHELAN T, MA Lingni, et al. The Replica dataset: A digital replica of indoor spaces[J]. arXiv preprint arXiv:1906.05797, 2019. doi: 10.48550/arXiv.1906.05797.
[27] STURM J, ENGELHARD N, ENDRES F, et al. A benchmark for the evaluation of RGB-D SLAM systems[C]. 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 2012: 573–580. doi: 10.1109/IROS.2012.6385773.