6G無線多模態(tài)通信技術(shù)
doi: 10.11999/JEIT231201
-
北京科技大學(xué) 北京 10083
基金項(xiàng)目: 國家自然科學(xué)基金(62201034, U22B2003, 62341103),北京市自然科學(xué)基金(L212004, L212004-03)
Wireless Multimodal Communications for 6G
-
Beijing University of Science and Technology, Beijing 100083, China
Funds: The National Natural Science Foundation of China (62201034, U22B2003, 62341103), Beijing Municipal Natural Science Foundation (L212004-03, L212004)
-
摘要: 該文綜述了多模態(tài)通信作為一種能夠同時(shí)交互多種模態(tài)形式的信息轉(zhuǎn)移方式在不同應(yīng)用場景下的重要性及其未來在6G無線通信技術(shù)中的發(fā)展前景。首先,將多模態(tài)通信分為3類,并探討了其在這些領(lǐng)域中的關(guān)鍵作用。隨后,針對6G無線通信系統(tǒng)可能面臨的通信、感知、計(jì)算和存儲資源限制以及跨域資源管理問題進(jìn)行了深入剖析,指出未來的6G無線多模態(tài)通信將實(shí)現(xiàn)通感算存的深度融合和通信能力的提升。在多模態(tài)通信實(shí)現(xiàn)過程中,必須考慮多個(gè)環(huán)節(jié),包括多發(fā)送端處理、傳輸技術(shù)和接收端處理等,以解決多模態(tài)語料庫構(gòu)建、多模態(tài)信息壓縮、傳輸、干擾處理、降噪、對齊、融合和擴(kuò)充等方面的挑戰(zhàn),以及資源管理問題。最后,強(qiáng)調(diào)了6G網(wǎng)絡(luò)的跨域多模態(tài)信息轉(zhuǎn)移、互補(bǔ)和協(xié)同的重要性,這將更好地整合和應(yīng)用海量異構(gòu)信息,以滿足未來高速、低延遲、智能互聯(lián)的通信需求。
-
關(guān)鍵詞:
- 多模態(tài)通信 /
- 多媒體信息 /
- 無線通信技術(shù)
Abstract: An overview of multimodal communication as an important information transfer mode that can simultaneously interact with multiple modal forms in different application scenarios is proposed in this paper. The future development prospects of multimodal communication in 6G wireless communication technology is also discussed. Firstly, multimodal communication is classified into three categories, and its key roles in these fields are explored. Furthermore, a deep analysis is conducted on the communication, sensation, computation, and storage resource limitations, as well as cross-domain resource management issues that 6G wireless communication systems may face. It points out that future 6G wireless multimodal communication will achieve deep integration of communication perception, computation, and storage, as well as enhance communication capabilities. In the process of implementing multimodal communication, various aspects must be considered, including multi-transmitter processing, transmission technology, and receiver processing, in order to address challenges in multimodal corpus construction, multimodal information compression, transmission, interference handling, noise reduction, alignment, fusion, and expansion, as well as resource management issues. Finally, the importance of cross-domain multimodal information transfer, complementarity, and collaboration in the 6G network is emphasized. This will better integrate and apply a massive amount of heterogeneous information to meet the future communication demands of high-speed, low-latency, and intelligent interconnection. -
[1] CHEN Guanghui and ZENG Xiaoping. Multi-modal emotion recognition by fusing correlation features of speech-visual[J]. IEEE Signal Processing Letters, 2021, 28: 533–537. doi: 10.1109/LSP.2021.3055755. [2] LIU Shuang, DUAN Linlin, ZHANG Zhong, et al. Multimodal ground-based remote sensing cloud classification via learning heterogeneous deep features[J]. IEEE Transactions on Geoscience and Remote Sensing, 2020, 58(11): 7790–7800. doi: 10.1109/TGRS.2020.2984265. [3] CHEN Haifeng, JIANG Dongmei, and SAHLI H. Transformer encoder with multi-modal multi-head attention for continuous affect recognition[J]. IEEE Transactions on Multimedia, 2021, 23: 4171–4183. doi: 10.1109/TMM.2020.3037496. [4] DENG Li, DU Xi and SHEN Ji zhong. Web page classification based on heterogeneous features and a combination of multiple classifiers[J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(217): 995–1004. doi: 10.1631/FITEE.1900240. [5] FINOMORE V JR, POPIK D JR, CASTLE C JR, et al. Effects of a network-centric multi-modal communication tool on a communication monitoring task[J]. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 2010, 54(25): 2125–2129. doi: 10.1177/154193121005402501. [6] PORWOL L and OJO A. VR-participation: The feasibility of the virtual reality-driven multi-modal communication technology facilitating e-Participation[C]. The 11th International Conference on Theory and Practice of Electronic Governance, Galway, Ireland, 2018: 269–278. doi: 10.1145/3209415.3209515. [7] 徐建博, 魏昕, 周亮. 面向跨模態(tài)通信的信息恢復(fù)技術(shù)[J]. 電子學(xué)報(bào), 2022, 50(7): 1631–1642. doi: 10.12263/DZXB.20210945.XU Jianbo, WEI Xin, and ZHOU Liang. Information recovery technology for cross-modal communications[J]. Acta Electronica Sinica, 2022, 50(7): 1631–1642. doi: 10.12263/DZXB.20210945. [8] XIE Jiayu, JIN Xin, and CAO Hongkun. SMRD: A local feature descriptor for multi-modal image registration[C]. 2021 International Conference on Visual Communications and Image Processing (VCIP), Munich, Germany, 2021: 1–5. doi: 10.1109/VCIP53242.2021.9675401. [9] GERMINIAN J F and TRICYA ESTERINA WIDAGDO ST. Utilizing postGIS extension to process spatial data stored in Neo4j database[C]. IEEE International Conference on Data and Software Engineering (ICoDSE), Toba, Indonesia, 2023: 250–255. doi: 10.1109/ICoDSE59534.2023.10291400. [10] 王佩瑾, 閆志遠(yuǎn), 容雪娥, 等. 數(shù)據(jù)受限條件下的多模態(tài)處理技術(shù)綜述[J]. 中國圖象圖形學(xué)報(bào), 2022, 27(10): 2803–2834.WANG Peijin, YAN Zhiyuan, RONG Xuee, et al. Review of multimodal data processing techniques with limited data[J]. Journal of Image and Graphics, 2022, 27(10): 2803–2834. [11] BHANDARI D and PAUL S. Perception net: A multimodal deep neural network for machine perception[C]. 2018 IEEE Symposium Series on Computational Intelligence (SSCI), Bangalore, India, 2018: 1419–1426. doi: 10.1109/SSCI.2018.8628711. [12] LACKEY S, BARBER D, REINERMAN L, et al. Defining next-generation multi-modal communication in human robot interaction[J]. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 2011, 55(1): 461–464. doi: 10.1177/1071181311551095. [13] AI Tianfang. Application of human-computer interaction technology in mobile interface design for digital media[C]. 2022 International Conference on Electronics and Devices, Computational Science (ICEDCS), Marseille, France, 2022: 152–155. doi: 10.1109/ICEDCS57360.2022.00040. [14] SHU Beibei, SZIEBIG G, and PIETERS R. Architecture for safe human-robot collaboration: Multi-modal communication in virtual reality for efficient task execution[C]. 2019 IEEE 28th International Symposium on Industrial Electronics (ISIE), Vancouver, Canada, 2019: 2297–2302. doi: 10.1109/ISIE.2019.8781372. [15] TODA Y and KUBOTA N. Computational intelligence for human-friendly robot partners based on multi-modal communication[C]. The 1st IEEE Global Conference on Consumer Electronics 2012, Tokyo, Japan, 2012: 309–313. doi: 10.1109/GCCE.2012.6379611. [16] CHU Ziyan, WANG Jiacheng, JIANG Xinyue, et al. mind-vr: a utility approach of human-computer interaction in virtual space based on autonomous consciousness[C]. 2022 International Conference on Virtual Reality, Human-Computer Interaction and Artificial Intelligence (VRHCIAI), Changsha, China, 2022: 134–138. doi: 10.1109/VRHCIAI57205.2022.00030. [17] WANG Dayan, QU Jue, WANG Wei, et al. The verification system for interface intelligent perception of human-computer interaction[C]. 2020 International Conference on Intelligent Computing and Human-Computer Interaction (ICHCI), Sanya, China, 2020: 43–48. doi: 10.1109/ICHCI51889.2020.00017. [18] LIU Weiwei, PANG Yalong and LUAN Shenshen, et al. Building a Spaceborne integrated high-performance processing and computing platform based on SpaceVPX[C]. 2022 International Conference on Computing, Communication, Perception and Quantum Technology (CCPQT), Xiamen, China, 2022: 287-293. doi: 10.1109/CCPQT56151.2022.00056. [19] REN Chao and LIU Luchuan. Toward full passive internet of things: Symbiotic localization and ambient backscatter communication[J]. IEEE Internet of Things Journal, 2023, 10(22): 19495–19506. doi: 10.1109/JIOT.2023.3262779. [20] REN Chao, HE Zongrui, LI Yingqi, et al. Priority aggregation network with integrated computation and sensation for ultra dense artificial intelligence of things[J]. IEEE Wireless Communications Letters, 2024, 13(2): 270–273. doi: 10.1109/LWC.2023.3323421. [21] REN Chao, LIU Luchuan, and ZHANG Haijun. Multimodal interference compatible passive UAV network based on location-aware flexibility[J]. IEEE Wireless Communications Letters, 2023, 12(4): 640–643. doi: 10.1109/LWC.2023.3237637. [22] REN Chao, GONG Chao, and LIU Luchuan. Task-oriented multimodal communication based on cloud-edge-UAV collaboration[J]. IEEE Internet of Things Journal, 2024, 11(1): 125–136. doi: 10.1109/JIOT.2023.3295650. [23] REN Chao, GONG Chao, CAO Difei, et al. Enhancing reliability in multimodal UAV communication based on opportunistic task space[J]. IEEE Wireless Communications Letters, 2024, 13(2): 284–287. doi: 10.1109/LWC.2023.3326130. [24] KASHEVNIK A, LASHKOV I, AXYONOV A, et al. Multimodal corpus design for audio-visual speech recognition in vehicle cabin[J]. IEEE Access, 2021, 9: 34986–35003. doi: 10.1109/ACCESS.2021.3062752. [25] KARPOV A, RONZHIN A, and KIPYATKOVA I. Designing a multimodal corpus of audio-visual speech using a high-speed camera[C]. 2012 IEEE 11th International Conference on Signal Processing, Beijing, China, 2012: 519–522. doi: 10.1109/ICoSP.2012.6491539. [26] MONTORSI G and BENEDETTO S. Design of spatially coupled turbo product codes for optical communications[C]. 2021 11th International Symposium on Topics in Coding (ISTC), Montreal, Canada, 2021: 1–5. doi: 10.1109/ISTC49272.2021.9594120. [27] ?ALI?KAN E K, GüLBAZ A, TENGIZLER B, et al. NDC-O-OFDM with channel coding for IM/DD communication systems[C]. 2022 30th Signal Processing and Communications Applications Conference (SIU), Safranbolu, Turkey, 2022: 1–4. doi: 10.1109/SIU55565.2022.9864710. [28] KHAN M H and ZHANG Gongxuan. Evaluation of channel coding techniques for massive machine-type communication in 5G cellular network[C]. 2020 IEEE 3rd International Conference on Information Communication and Signal Processing (ICICSP), Shanghai, China, 2020: 375–379. doi: 10.1109/ICICSP50920.2020.9232037. [29] WANG Yue, SHI Qingwen, XU Li, et al. A study on the construction of traditional Chinese medicine multimodal corpus under the view of self-media[C]. 2022 International Conference on Computers, Information Processing and Advanced Education (CIPAE), Ottawa, Canada, 2022: 373–375. doi: 10.1109/CIPAE55637.2022.00084. [30] LI Guangyan, FANG Tian, LIANG Li, et al. Current situation of multimodal corpora and the method of Tibetan corpus construction[C]. 2018 IEEE International Conference of Safety Produce Informatization (IICSPI), Chongqing, China, 2018: 449–453. doi: 10.1109/IICSPI.2018.8690378. [31] YEH C H, LIN Y L, LI C C, et al. Compressed-and-forward: Compressive sensing for cooperative communication[C]. 2012 International Symposium on Intelligent Signal Processing and Communications Systems, Tamsui, China, 2012: 319–322. doi: 10.1109/ISPACS.2012.6473503. [32] HANECHE H, BOUDRAA B, and OUAHABI A. Compressed sensing investigation in an end-to-end rayleigh communication system: Speech compression[C]. 2018 International Conference on Smart Communications in Network Technologies (SaCoNeT), El Oued, Algeria, 2018: 73–77. doi: 10.1109/SaCoNeT.2018.8585702. [33] CHEN Mingkai, LIU Minghao, WANG Wenjun, et al. Cross-modal semantic communications in 6G[C]. 2023 IEEE/CIC International Conference on Communications in China (ICCC), Dalian, China, 2023: 1–6. doi: 10.1109/ICCC57788.2023.10233481. [34] YANG Lianxin, WU Dan, and ZHOU Liang. Heterogeneous stream scheduling for cross-modal transmission[J]. IEEE Transactions on Communications, 2021, 69(9): 6037–6049. doi: 10.1109/TCOMM.2021.3086522. [35] LU Kangkang, LIANG Meiyu, XUE Zhe, et al. Adversarial guided gradient estimation hashing for cross-modal retrieval[C]. 2022 IEEE 8th International Conference on Cloud Computing and Intelligent Systems (CCIS), Chengdu, China, 2022: 109–113. doi: 10.1109/CCIS57298.2022.10016424. [36] ZHAO Wenying, XU Qian, WANG Haoyu, et al. Cross-modal hashing for material surface properties fusion[C]. 2023 International Wireless Communications and Mobile Computing (IWCMC), Marrakesh, Morocco, 2023: 194–198. doi: 10.1109/IWCMC58020.2023.10183090. [37] HUANG Yingying, WANG Quan, ZHANG Yipeng, et al. A unified perspective of multi-level cross-modal similarity for cross-modal retrieval[C]. 2022 5th International Conference on Information Communication and Signal Processing (ICICSP), Shenzhen, China, 2022: 466–471. doi: 10.1109/ICICSP55539.2022.10050678. [38] WEI Xin, SHI Yingying, and ZHOU Liang. Haptic signal reconstruction for cross-modal communications[J]. IEEE Transactions on Multimedia, 2022, 24: 4514–4525. doi: 10.1109/TMM.2021.3119860. [39] 王惠琴, 高大慶, 何永強(qiáng), 等. GPR圖像的數(shù)據(jù)集構(gòu)建及其DRDU-Net去噪算法[J/OL]. 湖南大學(xué)學(xué)報(bào): 自然科學(xué)版,http://kns.cnki.net/kcms/detail/43.1061.N.20231017.1626.004.html, 2023.WANG Huiqin, GAO Daqing, HE Yongqiang, et al. DRDU-net based denoising algorithm for GPR image dataset[J/OL]. Journal of Hunan University: Natural Sciences,http://kns.cnki.net/kcms/detail/43.1061.N.20231017.1626.004.html, 2023. [40] AKIYAMA H, TANAKA M, and OKUTOMI M. Pseudo four-channel image denoising for noisy CFA raw data[C]. 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, Canada, 2015: 4778–4782. doi: 10.1109/ICIP.2015.7351714. [41] LI Jun, YU Menghong, and YUAN Wei. Application of wavelet new threshold denoising in ship data preprocessing and modeling[C]. 2020 Chinese Automation Congress (CAC), Shanghai, China, 2020: 1263–1267. doi: 10.1109/CAC51589.2020.9326954. [42] WANG Feng, YANG Bo, WANG Yuqing, et al. Learning from noisy data: An unsupervised random denoising method for seismic data using model-based deep learning[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5913314. doi: 10.1109/TGRS.2022.3165037. [43] TIAN Chongpeng, HONG Mei, LI Dongying, et al. Deep recurrent neural network for ground-penetrating radar signal denoising[C]. 2022 4th International Conference on Intelligent Information Processing (IIP), Guangzhou, China, 2022: 85?88. doi: 10.1109/IIP57348.2022.00024. [44] GONG Jianhu. Denoising control method of abnormal signals in communication networks based on big data analysis[C]. 2021 13th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), Beihai, China, 2021: 329–334. doi: 10.1109/ICMTMA52658.2021.00077. [45] LIU Yuchi, YAO Yue, WANG Zhengjie, et al. Generalized alignment for multimodal physiological signal learning[C]. 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 2019: 1–10. doi: 10.1109/IJCNN.2019.8852216. [46] CHEN Liyi, LI Zhi and WANG Yijun, et al. MMEA: Entity alignment for multi-modal knowledge graph[C]. In Knowledge Science, Engineering and Management: 13th International Conference, KSEM 2020, Hangzhou, China, 2020: 134–147. doi: https://doi.org/10.1007/978-3-030-55130-8_12. [47] 王歡, 宋麗娟, 杜方. 基于多模態(tài)知識圖譜的中文跨模態(tài)實(shí)體對齊方法[J]. 計(jì)算機(jī)工程, 2023, 49(12): 88–95. doi: 10.19678/j.issn.1000-3428.0066938.WANG Huan, SONG Lijuan, and DU Fang. Chinese cross-modal entity alignment method based on multi-modal knowledge graph[J]. Computer Engineering, 2023, 49(12): 88–95. doi: 10.19678/j.issn.1000-3428.0066938. [48] 羅俊豪, 朱焱. 用于未對齊多模態(tài)語言序列情感分析的多交互感知網(wǎng)絡(luò)[J]. 計(jì)算機(jī)應(yīng)用, 2024, 44(1): 79–85. doi: 10.11772/j.issn.1001-9081.2023060815.LUO Junhao and ZHU Yan. Multi-dynamic aware network for unaligned multimodal language sequence sentiment analysis[J]. Journal of Computer Applications, 2024, 44(1): 79–85. doi: 10.11772/j.issn.1001-9081.2023060815. [49] LIU Ye, QIAO Lingfeng, LU Changchong, et al. OSAN: A one-stage alignment network to unify multimodal alignment and unsupervised domain adaptation[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, 2023: 3551–3560. doi: 10.1109/CVPR52729.2023.00346. [50] GAO Lei and GUAN Ling. A discriminative vectorial framework for multi-modal feature representation[J]. IEEE Transactions on Multimedia, 2022, 24: 1503–1514. doi: 10.1109/TMM.2021.3066118. [51] XUE Zhixiang, YU Xuchu, ZHANG Pengqiang, et al. Self-supervised feature learning and few-shot land cover classification for cross-modal remote sensing images[C]. IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 2022: 3600–3603. doi: 10.1109/IGARSS46834.2022.9884265. [52] AL-HMOUZ R, DAQROUQ K, MORFEQ A, et al. Multimodal biometrics using multiple feature representations to speaker identification system[C]. 2015 International Conference on Information and Communication Technology Research (ICTRC), Abu Dhabi, United Arab Emirates, 2015: 314–317. doi: 10.1109/ICTRC.2015.7156485. [53] GAO Lei and GUAN Ling. Interpretable learning-based multi-modal hashing analysis for multi-view feature representation learning[C]. 2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR), CA, USA, 2022: 47–52. doi: 10.1109/MIPR54900.2022.00016. [54] WANG Shiping and GUO Wenzhong. Sparse multigraph embedding for multimodal feature representation[J]. IEEE Transactions on Multimedia, 2017, 19(7): 1454–1466. doi: 10.1109/TMM.2017.2663324. [55] SHEKHAR S, PATEL V M, NASRABADI N M, et al. Joint sparse representation for robust multimodal biometrics recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(1): 113–126. doi: 10.1109/TPAMI.2013.109. [56] HOU Xiaofeng, XU Cheng, LIU Jiacheng, et al. Characterizing and understanding end-to-end multi-modal neural networks on GPUs[J]. IEEE Computer Architecture Letters, 2022, 21(2): 125–128. doi: 10.1109/LCA.2022.3215718. [57] LI Shuzhen, ZHANG Tong, CHEN Bianna, et al. MIA-Net: Multi-modal interactive attention network for multi-modal affective analysis[J]. IEEE Transactions on Affective Computing, 2023, 14(4): 2796–2809. doi: 10.1109/TAFFC.2023.3259010. [58] LIU Bin. Robust dynamic multi-modal data fusion: A model uncertainty perspective[J]. IEEE Signal Processing Letters, 2021, 28: 2107–2111. doi: 10.1109/LSP.2021.3117731. [59] TIAN Pinzhuo, LI Wenbin, and GAO Yang. Consistent meta-regularization for better meta-knowledge in few-shot learning[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(12): 7277–7288. doi: 10.1109/TNNLS.2021.3084733. [60] WANG Ruiqi, ZHANG Xuyao, and LIU Chenglin. Meta-prototypical learning for domain-agnostic few-shot recognition[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022, 33(11): 6990–6996. doi: 10.1109/TNNLS.2021.3083650. [61] YAN Minghao. Adaptive learning knowledge networks for few-shot learning[J]. IEEE Access, 2019, 7: 119041–119051. doi: 10.1109/ACCESS.2019.2934694. [62] WANG Kai. An overview of deep learning based small sample medical imaging classification[C]. 2021 International Conference on Signal Processing and Machine Learning (CONF-SPML), Stanford, USA, 2021: 278–281. doi: 10.1109/CONF-SPML54095.2021.00060. [63] LIANG Yong, CHEN Zetao, LIN Daoqian, et al. Three-dimension attention mechanism and self-supervised pretext task for augmenting few-shot learning[J]. IEEE Access, 2023, 11: 59428–59437. doi: 10.1109/ACCESS.2023.3285721. -