Energy-Efficient UAV Trajectory Planning Algorithm for AoI-Constrained Data Collection
doi: 10.11999/JEIT240075
1. Institute of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
2. Department of Computer Science, North China Electric Power University (Baoding), Baoding 071066, China
3. Hebei Key Laboratory of Knowledge Computing for Energy & Power, Baoding 071066, China
Keywords:
- Wireless Sensor Networks /
- Age of Information constraint /
- Cooperation Hybrid Proximal Policy Optimization /
- UAV trajectory planning /
- Deep reinforcement learning
Abstract: Age of Information (AoI) measures the information freshness of each sensor in Wireless Sensor Networks (WSN). In UAV-assisted WSN data collection, the UAV optimizes its flight trajectory and flight speed to guarantee that the data offloaded to the base station meet the AoI limit of each sensor. However, inappropriate flight strategies cause non-essential energy consumption through excessive flight distance and speed, which may result in failure of the data collection mission. In this paper, a mathematical model is first developed for the energy-consumption-optimal UAV trajectory planning problem under AoI-constrained data collection. Then, a deep reinforcement learning algorithm, named the Cooperation Hybrid Proximal Policy Optimization (CH-PPO) algorithm, is proposed to simultaneously schedule the UAV's access sequence to the sensor nodes or the base station, its hovering positions, and its flight speed, so as to minimize the UAV's energy consumption under the data timeliness constraint of each sensor node. Meanwhile, a loss function that integrates the discrete and continuous policies is designed to increase the rationality of the hybrid actions and improve the training effectiveness of the proposed algorithm. Numerical results demonstrate that the CH-PPO algorithm outperforms the three reinforcement learning algorithms in the comparison group in UAV energy consumption and its influencing factors. Furthermore, the convergence, stability, and robustness of the proposed algorithm are well verified.
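As an illustration of the AoI constraint described above, the following sketch (our own code with hypothetical names, using the standard definition of AoI on delivery as delivery time minus sampling time) checks whether a candidate schedule keeps every node's data fresh enough:

```python
def aoi_feasible(sample_times, delivery_times, aoi_limits):
    """Check the AoI constraint for every sensor node.

    sample_times[n]  : time at which the UAV collects node n's data
    delivery_times[n]: time at which that data is offloaded to the base station
    aoi_limits[n]    : node n's AoI threshold

    The AoI of node n's data on delivery is delivery_times[n] - sample_times[n];
    the mission fails if any node exceeds its threshold.
    """
    return all(d - s <= limit
               for s, d, limit in zip(sample_times, delivery_times, aoi_limits))
```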
Algorithm 1 The CH-PPO algorithm
(1) Input: number of training episodes ${\text{EP}}$, number of parameter updates $M$, learning rate $\eta$, clipping coefficient $\varepsilon$;
(2) Initialize the network parameters $\theta$, ${\theta _{{\text{old}}}}$, and $\omega$;
(3) For each training episode $i = 1,2, \cdots ,{\text{EP}}$:
(4)  While ${\text{done}} \ne 0$:
(5)   Compute the discrete action ${\chi _{\mathrm{d}}}\left( {{{\boldsymbol{s}}_k};{\theta _{{\mathrm{d}},{\mathrm{old}}}}} \right)$;
(6)   Compute the continuous action ${\chi _{\mathrm{c}}}\left( {{{\boldsymbol{s}}_k};{\theta _{{\mathrm{c}},{\mathrm{old}}}}} \right)$;
(7)   Form the hybrid action ${{\boldsymbol{a}}_k} = \left\{ {i,\left( {l\left( k \right),\theta \left( k \right),{\boldsymbol{v}}\left( k \right)} \right)} \right\}$;
(8)   The agent executes action ${{\boldsymbol{a}}_k}$ in state ${{\boldsymbol{s}}_k}$, receives reward ${r_k}$, and moves to the next state ${{\boldsymbol{s}}_{k + 1}}$;
(9)   Store $\left( {{{\boldsymbol{s}}_k},{{\boldsymbol{a}}_k},{r_k},{{\boldsymbol{s}}_{k + 1}}} \right)$ in the experience buffer;
(10)  Until ${\text{done}} = 0$;
(11)  For each parameter update $j = 1,2, \cdots ,M$:
(12)   Fetch all experiences $ {\left( {{{\boldsymbol{s}}_k},{{\boldsymbol{a}}_k},{r_k},{{\boldsymbol{s}}_{k + 1}}} \right)_{k \in \{ 1,2, \cdots ,K\} }} $ from the experience buffer;
(13)   Compute the state values ${\text{val}}_1, \cdots ,{\text{val}}_K$ of all buffered states;
(14)   Compute the advantage estimates $ {\hat A_1}, \cdots ,{\hat A_K} $;
(15)   Compute the new-to-old policy ratios of the discrete and continuous policies, respectively: $ r_k^{\mathrm{d}}\left( {{\theta _{\mathrm{d}}}} \right) = \dfrac{{{\pi _{{\theta _{\mathrm{d}}}}}({\boldsymbol{a}}_k^{\mathrm{d}}|{{\boldsymbol{s}}_k})}}{{{\pi _{{\theta _{{\mathrm{d}},{\text{old}}}}}}({\boldsymbol{a}}_k^{\mathrm{d}}|{{\boldsymbol{s}}_k})}} $, $r_k^{\mathrm{c}}\left( {{\theta _{\mathrm{c}}}} \right) = \dfrac{{{\pi _{{\theta _{\mathrm{c}}}}}({\boldsymbol{a}}_k^{\mathrm{c}}|{{\boldsymbol{s}}_k})}}{{{\pi _{{\theta _{{\mathrm{c}},{\text{old}}}}}}({\boldsymbol{a}}_k^{\mathrm{c}}|{{\boldsymbol{s}}_k})}}$;
(16)   Compute the Actor and Critic losses $ {\text{L}}{{\text{A}}_k}\left( \theta \right) $ and ${\text{L}}{{\text{C}}_k}(\omega )$;
(17)   Compute the Actor and Critic gradients ${\nabla _\theta }l_k^\chi \left( \theta \right)$ and ${\nabla _\omega }l_k^V\left( \omega \right)$;
(18)   Update the parameters: $ \theta = \theta - \eta {\nabla _\theta }l_k^\chi \left( \theta \right) $, $\omega = \omega - \eta {\nabla _\omega }l_k^V\left( \omega \right)$;
(19)  Until $j = M$;
(20)  Update the old Actor network parameters: $ {\theta _{\text{old}}} = \theta $;
(21)  Clear the experience buffer;
(22) Until $i = {\text{EP}}$; training ends.
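Steps (15), (16), and (18) are the core of CH-PPO: a clipped PPO surrogate is built per policy head, and the two heads share the same advantage estimates. The sketch below is a minimal PyTorch illustration of that coupling, assuming a categorical discrete head, an independent-Gaussian continuous head, and a mean-squared-error critic loss; the variable names are ours, and the paper's exact loss ${\text{LA}}_k(\theta)$ may weight the two terms differently.

```python
import torch
from torch.distributions import Categorical, Normal

def ch_ppo_losses(disc_logits, disc_logits_old, mu, std, mu_old, std_old,
                  a_d, a_c, advantages, values, returns, eps=0.2):
    """Clipped PPO surrogate combining a discrete head and a continuous head.

    a_d: sampled discrete actions (visit target), shape [K]
    a_c: sampled continuous actions (hover position, speed), shape [K, dim_c]
    advantages: advantage estimates A_hat_k, shape [K]
    """
    # New-to-old ratio of the discrete policy, r_k^d
    logp_d = Categorical(logits=disc_logits).log_prob(a_d)
    logp_d_old = Categorical(logits=disc_logits_old).log_prob(a_d)
    ratio_d = torch.exp(logp_d - logp_d_old.detach())

    # New-to-old ratio of the continuous policy, r_k^c (independent Gaussians)
    logp_c = Normal(mu, std).log_prob(a_c).sum(-1)
    logp_c_old = Normal(mu_old, std_old).log_prob(a_c).sum(-1)
    ratio_c = torch.exp(logp_c - logp_c_old.detach())

    # Clipped surrogate for each head, sharing the same advantage estimates
    def clipped(ratio):
        return torch.min(ratio * advantages,
                         torch.clamp(ratio, 1 - eps, 1 + eps) * advantages)

    actor_loss = -(clipped(ratio_d) + clipped(ratio_c)).mean()   # LA_k(theta)
    critic_loss = torch.nn.functional.mse_loss(values, returns)  # LC_k(omega)
    return actor_loss, critic_loss
```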
Table 1 Simulation parameters
Parameter / Value:
- UAV initial energy $ {E_{{\text{init}}}} $: $1 \times {10^5}{\text{ J}}$
- UAV maximum flight speed ${v_{\max}}$: $30{\text{ m/s}}$
- UAV flight altitude $H$: $10{\text{ m}}$
- Data volume of each sensor node $D$: $5 \times {10^4}{\text{ Byte}}$
- UAV maximum connectivity radius $R$: $30{\text{ m}}$
- LoS/NLoS environment-dependent constants $a,b$: $ 10,0.6 $
- Bandwidth $W$: $ 1{\text{ MHz}} $
- Channel transmit power ${P_{\mathrm{d}}}$: $ - 20{\text{ dBm}} $
- Additional attenuation coefficient of the NLoS channel $\mu$: $ 0.2 $
- Channel power gain at unit distance $\zeta$: $ - 30{\text{ dB}} $
- Noise power ${\sigma ^2}$: $ - 90{\text{ dBm}} $
- Path loss exponent $\alpha$: $ 2.3 $
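For readers who want to reproduce the link budget, the sketch below shows one way the constants in Table 1 can enter an expected air-to-ground rate. It assumes the commonly used probabilistic LoS model with elevation-angle-dependent LoS probability $1/(1 + a\,{\mathrm{e}}^{-b(\theta - a)})$ and a Shannon-rate uplink; this is our reading of the parameter names, not a formula quoted from the paper.

```python
import math

def db_to_lin(db: float) -> float:
    """Convert a dB value to a linear ratio."""
    return 10.0 ** (db / 10.0)

def dbm_to_watt(dbm: float) -> float:
    """Convert dBm to Watts."""
    return 10.0 ** ((dbm - 30.0) / 10.0)

def link_rate(horizontal_dist_m: float, height_m: float = 10.0) -> float:
    """Expected uplink rate (bit/s) between a sensor and the hovering UAV."""
    a, b = 10.0, 0.6            # LoS/NLoS environment constants (Table 1)
    mu = 0.2                    # additional NLoS attenuation coefficient
    zeta = db_to_lin(-30.0)     # channel power gain at unit distance
    alpha = 2.3                 # path loss exponent
    p_tx = dbm_to_watt(-20.0)   # channel transmit power
    noise = dbm_to_watt(-90.0)  # noise power
    w_hz = 1e6                  # bandwidth

    d = math.hypot(horizontal_dist_m, height_m)           # 3D distance
    elev_deg = math.degrees(math.atan2(height_m, horizontal_dist_m))
    p_los = 1.0 / (1.0 + a * math.exp(-b * (elev_deg - a)))
    gain = zeta * d ** (-alpha)                            # average LoS gain
    gain = p_los * gain + (1.0 - p_los) * mu * gain        # LoS/NLoS expectation
    return w_hz * math.log2(1.0 + p_tx * gain / noise)

# e.g. time to drain one node's 5e4-Byte buffer from 20 m away:
# t = 5e4 * 8 / link_rate(20.0)
```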
Table 2 Network parameters
Parameter / Value:
- Number of training episodes ${\text{EP}}$: 20000
- Learning rate $\eta$: $1 \times {10^{ - 4}}$
- Reward discount factor $\gamma$: 0.99
- Clipping coefficient $ \varepsilon $: 0.2
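These four values plug directly into Algorithm 1. A minimal configuration holder (names ours; the update count $M$ is not listed in Table 2 and is assumed here) might be:

```python
from dataclasses import dataclass

@dataclass
class CHPPOConfig:
    episodes: int = 20_000  # EP, outer loop of Algorithm 1
    updates: int = 10       # M, inner update loop (assumed; not given in Table 2)
    lr: float = 1e-4        # learning rate eta
    gamma: float = 0.99     # reward discount factor
    clip_eps: float = 0.2   # PPO clipping coefficient epsilon
```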
Table 3 UAV energy consumption under different task scales ($1 \times {10^4}\;{\text{J}}$)
Area side length | Network scale | CH-PPO | H-PPO | DQN | PPO | VLC-GA
200 | 20 | 1.71 | 1.75 | 2.32 | 2.82 | 2.18
200 | 40 | 4.01 | 4.55 | 4.91 | 4.73 | 4.95
200 | 60 | 6.10 | 6.72 | 6.82 | 7.08 | 6.93
300 | 20 | 1.96 | 2.07 | 2.61 | 4.02 | 2.54
300 | 40 | 4.89 | 5.21 | 6.84 | 9.28 | 6.91
300 | 60 | 9.18 | 9.45 | 11.43 | 12.59 | 11.51
400 | 20 | 2.02 | 2.11 | 2.92 | 4.46 | 2.84
400 | 40 | 6.37 | 7.60 | 7.62 | 12.76 | 7.64
400 | 60 | 10.03 | 10.70 | 12.04 | 14.78 | 12.13
Table 4 UAV flight distance under different task scales (m)
Area side length | Network scale | CH-PPO | H-PPO | DQN | PPO | VLC-GA
200 | 20 | 1488 | 1560 | 2398 | 2805 | 2249
200 | 40 | 3600 | 4308 | 5102 | 4596 | 5155
200 | 60 | 5922 | 6566 | 7025 | 6818 | 7175
300 | 20 | 1785 | 1837 | 2734 | 4115 | 2654
300 | 40 | 4471 | 4802 | 7270 | 9641 | 7356
300 | 60 | 8483 | 8785 | 12173 | 12526 | 12322
400 | 20 | 1898 | 1921 | 3066 | 4623 | 2991
400 | 40 | 6001 | 7242 | 8121 | 10767 | 8177
400 | 60 | 9367 | 9964 | 12880 | 13135 | 13019
Table 5 Task time under different task scales (s)
Area side length | Network scale | CH-PPO | H-PPO | DQN | PPO | VLC-GA
200 | 20 | 100 | 105 | 138 | 172 | 130
200 | 40 | 241 | 259 | 291 | 293 | 293
200 | 60 | 323 | 345 | 406 | 440 | 412
300 | 20 | 121 | 125 | 154 | 241 | 150
300 | 40 | 297 | 290 | 399 | 550 | 403
300 | 60 | 550 | 489 | 669 | 684 | 673
400 | 20 | 126 | 131 | 171 | 264 | 167
400 | 40 | 355 | 424 | 443 | 583 | 445
400 | 60 | 584 | 549 | 699 | 762 | 704
表 6 不同任務(wù)規(guī)模和AoI閾值下的無(wú)人機(jī)能量消耗($1 \times {10^4}\;{\text{J}}$)
AoI閾值 [60,80] [90,110] [120,140] 區(qū)域邊長(zhǎng) 網(wǎng)絡(luò)規(guī)模
20020 1.90 1.71 1.45 40 5.77 4.01 3.83 60 8.24 6.10 5.89
30020 2.19 1.96 1.55 40 7.79 4.89 4.29 60 10.47 9.18 7.96
40020 2.29 2.02 1.95 40 8.36 6.37 5.80 60 11.34 10.03 9.57 下載: 導(dǎo)出CSV