Joint Task Allocation, Communication Base Station Association and Flight Strategy Optimization Design for Distributed Sensing Unmanned Aerial Vehicles
doi: 10.11999/JEIT240738
1. Southwest China Institute of Electronic Technology, Chengdu 610036, China
2. School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
Abstract: This paper studies distributed sensing with multiple Unmanned Aerial Vehicles (UAVs). To coordinate the behavior of the individual UAVs, a task sensing and data backhaul protocol is designed, and a Mixed-Integer Non-Linear Programming (MINLP) problem is formulated for the joint optimization of UAV task allocation, data-backhaul base station association, and flight strategy. In view of the complex mathematical structure of this problem, and the high computational complexity and heavy information-exchange overhead faced by centralized optimization algorithms, the problem is reformulated as a cooperative Markov Game (MG) with a composite cost-utility reward function. Considering the complex coupling between the continuous and discrete action spaces of the MG, an Independent Learner (IL) based Multi-Agent Compound-Action Actor-Critic (MA-IL-CA2C) algorithm is designed to solve the MG problem. Simulation results show that, compared with the baseline algorithms, the proposed algorithm significantly increases system revenue and reduces network energy consumption.
Keywords:
- Unmanned Aerial Vehicle (UAV)
- Distributed sensing
- Joint optimization
- Reinforcement learning
- Markov Game (MG)
Abstract:

Objective: The demand for Unmanned Aerial Vehicles (UAVs) in distributed sensing applications has increased significantly due to their low cost, flexibility, mobility, and ease of deployment. In these applications, the coordination of multi-UAV sensing tasks, communication strategies, and flight trajectory optimization presents a significant challenge. Although there have been preliminary studies on the joint optimization of UAV communication strategies and flight trajectories, most existing work overlooks the impact of the randomly distributed and dynamically updated task airspace model on the optimal design of UAV communication and flight strategies. Furthermore, accurate UAV energy consumption modeling is often lacking when system design goals are established. Energy consumption during flight, sensing, and data transmission is a critical issue, especially given the UAV's limited payload capacity and energy supply, and an accurate energy consumption model is essential for extending UAV operational time. To address the requirements of multiple UAVs performing distributed sensing, particularly when tasks are dynamically updated and data must be transmitted to ground base stations, this paper explores the optimal joint design of UAV sensing task allocation, base station association for data backhaul, flight strategy planning, and transmit power control.

Methods: To coordinate the relationships among UAVs, base stations, and sensing tasks, a protocol framework for multi-UAV distributed task sensing applications is first proposed. This framework divides the UAVs' behavior during distributed sensing into four stages: cooperation, movement, sensing, and transmission, and it coordinates the UAVs' movement to the task area, task sensing, and backhaul transmission of the sensed data. A dynamically updated sensing task model, a UAV movement model, a UAV sensing behavior model, and a data backhaul transmission model are then established. A revenue function combining task sensing utility and task execution costs is designed, leading to a joint optimization problem over UAV task allocation, communication base station association, and flight strategy; the objective is to maximize the long-term weighted utility-cost. Given that the optimization problem involves high-dimensional decision variables in both discrete and continuous forms, and the objective function is non-convex with respect to these variables, the problem is a typical non-convex Mixed-Integer Non-Linear Programming (MINLP) problem and falls within the NP-hard complexity class. Centralized optimization algorithms for this formulation require a central node with high computational capacity and the collection of substantial side information, such as channel state and UAV location data, which results in high information-interaction overhead and poor scalability. To overcome these challenges, the problem is reformulated as a Markov Game (MG), and an effective algorithm is designed by leveraging the distributed coordination concept of Multi-Agent (MA) systems and the exploration capability of deep Reinforcement Learning (RL) within the optimization solution space. Specifically, due to the complex coupling between the continuous and discrete action spaces of the MG problem, a novel solution algorithm, Multi-Agent Independent-Learning Compound-Action Actor-Critic (MA-IL-CA2C), based on Independent Learning (IL), is proposed. The core idea is as follows: first, the independent-learning algorithm extends single-agent RL to the MA environment; then, deep learning represents the high-dimensional action and state spaces; finally, to handle the combined discrete and continuous action spaces, the UAV action space is decomposed into discrete and continuous components, with the DQN algorithm applied to the discrete space and the DDPG algorithm to the continuous space.

Results and Discussions: The computational complexity of action selection and training for the proposed MA-IL-CA2C algorithm is analyzed theoretically. The results show that its complexity is almost equivalent to that of the two benchmark algorithms, DQN and DDPG. The performance of the proposed algorithm is also simulated and analyzed. Compared with the DQN, DDPG, and greedy algorithms, the MA-IL-CA2C algorithm achieves lower network energy consumption throughout the network operation (Fig. 6), higher system revenue (Fig. 5, Fig. 8, and Fig. 9), and better-optimized UAV flight strategies (Fig. 7).

Conclusions: This paper formulates and solves the optimal joint design of UAV sensing task allocation, data backhaul base station association, flight strategy planning, and transmit power control for multi-UAV distributed task sensing, and proposes a new IL-based MA-IL-CA2C algorithm. The simulation results show that the proposed algorithm achieves higher system revenue while minimizing UAV energy consumption.
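To make the compound-action design concrete, the sketch below (Python/PyTorch) shows how one independent learner could select its action: an $\varepsilon$-greedy DQN head picks the discrete (sensing task, base station) pair, and a DDPG actor, conditioned on the state together with that discrete choice, outputs the continuous flight and power variables. The network shapes and the one-hot conditioning are illustrative assumptions, not the paper's exact architecture; the paper's Eq. (28) defines the actual discrete policy.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_DIS, CON_DIM = 16, 20, 4   # hypothetical sizes; 20 = M x K pairs

q_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                      nn.Linear(128, N_DIS))                 # DQN head
actor = nn.Sequential(nn.Linear(STATE_DIM + N_DIS, 128), nn.ReLU(),
                      nn.Linear(128, CON_DIM), nn.Tanh())    # DDPG actor

def select_compound_action(state, eps=0.1):
    """Discrete part: epsilon-greedy over Q-values (task m, BS k).
    Continuous part: deterministic actor conditioned on (s, a_dis),
    producing the direction angle, speed, and transmit power."""
    s = torch.as_tensor(state, dtype=torch.float32)
    if random.random() < eps:                 # exploration
        a_dis = random.randrange(N_DIS)
    else:
        with torch.no_grad():
            a_dis = int(q_net(s).argmax())
    onehot = F.one_hot(torch.tensor(a_dis), N_DIS).float()
    with torch.no_grad():
        a_con = actor(torch.cat([s, onehot]))  # in [-1, 1]; rescale to limits
    return a_dis, a_con

a_dis, a_con = select_compound_action(torch.zeros(STATE_DIM))
```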
圖 3 ${\text{UA}}{{\text{V}}_n}$飛行方向角${\boldsymbol{\delta}} _n^t = \left( {\alpha _n^t,\beta _n^t} \right)$
Algorithm 1 The MA-IL-CA2C algorithm
(1) Initialization: set $t = 0$ and the maximum number of decision periods $T$; choose the experience replay buffer capacity $ {N_{\mathrm{c}}} $, batch size ${N_{\mathrm{b}}}$, network learning rates ${\alpha _{{\boldsymbol{\theta}} _n^t}}$ and $ {\alpha _{{\boldsymbol{\omega}} _n^t}} $, and soft-update parameter $ \rho $;
(2) For each agent $n \in \mathcal{N}$: randomly initialize the network parameters $ {{\boldsymbol{\theta}} }_n^t $, $ {\hat {\boldsymbol{\theta}} }_n^t $, $ {{\boldsymbol{\omega}} }_n^t $, $ {\hat {\boldsymbol{\omega}} }_n^t $, and set the initial state ${{\boldsymbol s}^0}$;
# Main loop
(3) If $t \le T$:
  (a) For each agent $n \in \mathcal{N}$:
    Select the discrete action $ {\boldsymbol a}_n^{{\text{dis}},t} $ at ${\boldsymbol{s}}_n^t$ according to Eq. (28), i.e., choose sensing task $m$ and $ {\text{B}}{{\text{S}}_k} $;
    # Cooperation stage
    Feed back the decision $D_n^{\mathrm{c}} = \left\{ {n,{\boldsymbol a}_n^{{\mathrm{dis}},t}} \right\}$ on the control channel and receive the decision information of the other UAVs;
    Determine the continuous action ${\boldsymbol a}_n^{{\text{con}},t}{ = v}_n^t\left( {{{\boldsymbol s}^t},{\boldsymbol a}_n^{{\mathrm{dis}},t}} \right)$ from the discrete action $ {\boldsymbol a}_n^{{\mathrm{dis}},t} $, i.e., determine the flight direction angle $ {\boldsymbol{\delta}} _n^t $, speed $ v_n^t $, and transmit power $ P_n^t $;
    # Movement stage
    Fly to the sensing position $ {\boldsymbol{x}}_n^{{\mathrm{s}},t} $ according to the flight direction angle $ {\boldsymbol{\delta}} _n^t $ and speed $ v_n^t $;
    # Sensing stage
    Execute the sensing task and collect the task data $D_n^{{\mathrm{s}},t}$;
    # Transmission stage
    Transmit the task data back to $ {\text{B}}{{\text{S}}_k} $ with transmit power $ P_n^t $;
    Obtain the reward $ r_n^{t + 1} $ according to Eq. (23) and observe ${{\boldsymbol s}^{t + 1}}$;
    Store the experience tuple $ \left( {{{\boldsymbol s}^t},{\boldsymbol a}_n^t,r_n^{t + 1},{{\boldsymbol s}^{t + 1}}} \right) $ in the replay buffer ${\mathcal{D}_n}$;
    If $ t > {N_{\mathrm{c}}} $: remove the oldest experience tuples from ${\mathcal{D}_n}$;
    # Network training
    Randomly sample a batch of ${N_{\mathrm{b}}}$ experience tuples $ \left( {{{\boldsymbol s}^t},{\boldsymbol a}_n^t,r_n^{t + 1},{{\boldsymbol s}^{t + 1}}} \right) $ from ${\mathcal{D}_n}$;
    Update the online network parameters $ {{\boldsymbol{\theta}} }_n^t $ and $ {{\boldsymbol{\omega}} }_n^t $ according to Eqs. (29)–(34);
    Update the target network parameters $ {\hat {\boldsymbol{\theta}} }_n^t $ and $ {\hat {\boldsymbol{\omega}} }_n^t $ according to Eqs. (36) and (37);
  (b) Set $t = t + 1$ and ${{\boldsymbol s}^t} \leftarrow {{\boldsymbol s}^{t + 1}}$;
(4) Repeat step (3) until the algorithm terminates.
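The training steps in the listing follow the standard DQN and DDPG updates. Since Eqs. (29)–(37) are not reproduced on this page, the sketch below shows only the generic one-step TD targets and the Polyak soft target update that such steps typically compute; the discount factor `gamma` and the callables passed in are assumptions, not the paper's exact formulas.

```python
import torch

def soft_update(target, online, rho=0.01):
    """Polyak soft update of target-network parameters (the listing's
    last step; rho is the soft-update weight from Table 2)."""
    with torch.no_grad():
        for p_t, p in zip(target.parameters(), online.parameters()):
            p_t.mul_(1.0 - rho).add_(rho * p)

def one_step_targets(r, s2, q_target, critic_target, actor_target, gamma=0.99):
    """Generic one-step TD targets for the two heads: a DQN-style max
    target for the discrete head and a DDPG-style bootstrapped target
    for the continuous head (gamma is an assumed discount factor)."""
    with torch.no_grad():
        y_dqn = r + gamma * q_target(s2).max(dim=-1).values
        y_ddpg = r + gamma * critic_target(s2, actor_target(s2)).squeeze(-1)
    return y_dqn, y_ddpg
```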
表 1 仿真參數(shù)
參數(shù) 數(shù)值 UAV數(shù)目$N$,感知任務(wù)數(shù)目$M$,BS數(shù)目$K$ 3, 10, 2 網(wǎng)絡(luò)范圍半徑${r_{\text{c}}}$ 500 m 信道帶寬$ W $ 1 MHz BS高度$ {H_0} $ 25 m UAV最大與最低高度${h_{\min }},{h_{\max }}$ 50 m, 100 m UAV最大飛行速度$ {v_{\max }} $ 15 m/s UAV最大發(fā)射功率$ {P_{\max }} $ 30 dBm 感知參數(shù)$\lambda $ 0.01 環(huán)境參數(shù)$a,b$ 9.61, 0.16 LoS和NLoS額外路徑損耗${\eta ^{{\text{LoS}}}},{\eta ^{{\text{NLoS}}}}$ 1dB, 20 dB 載波頻率${f_{\text{c}}}$ 2 GHz 噪聲功率${N_0}$ –96 dBm 下載: 導(dǎo)出CSV
表 2 模型超參數(shù)
超參數(shù) 數(shù)值 Actor網(wǎng)絡(luò)與Critic網(wǎng)絡(luò)初始學(xué)習(xí)率$ {\alpha _{{\boldsymbol{\theta}} _n^t}} $,$ {\alpha _{{\boldsymbol{\omega}} _n^t}} $ 0.001, 0.002 軟更新權(quán)重$\rho $ 0.01 貪婪率$\varepsilon $ 0.1 激活函數(shù) ReLu 批量大小${N_{\text}}$ 64 經(jīng)驗(yàn)回放模塊大小${N_{\text{c}}}$ 20 000 DQN網(wǎng)絡(luò)初始學(xué)習(xí)率 0.01 DQN目標(biāo)網(wǎng)絡(luò)更新周期 100 Actor網(wǎng)絡(luò)和Critic網(wǎng)絡(luò)層數(shù) 4,4 隱層神經(jīng)元數(shù) 128 下載: 導(dǎo)出CSV