Energy-Efficient UAV Trajectory Planning Algorithm for AoI-Constrained Data Collection
doi: 10.11999/JEIT240075
1. Institute of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China
2. Department of Computer Science, North China Electric Power University (Baoding), Baoding 071066, China
3. Hebei Key Laboratory of Knowledge Computing for Energy & Power, Baoding 071066, China
Keywords:
- Wireless Sensor Networks /
- Age of Information constraint /
- Cooperation Hybrid Proximal Policy Optimization /
- UAV trajectory planning /
- Deep reinforcement learning
Abstract: Age of Information (AoI) measures the information freshness of each sensor in Wireless Sensor Networks (WSN). In UAV-assisted WSN data collection, the UAV optimizes its flight trajectory and flight speed to guarantee that the data offloaded to the base station meet the AoI limit of each sensor. However, inappropriate flight strategies cause non-essential energy consumption through excessive flight distance and speed, which may result in failure of the data collection mission. In this paper, a mathematical model is first developed for the energy-consumption-optimal UAV trajectory planning problem under AoI-constrained data collection. Then, a deep reinforcement learning algorithm, named the Cooperation Hybrid Proximal Policy Optimization (CH-PPO) algorithm, is proposed to simultaneously schedule the UAV's access sequence to the sensor nodes or the base station, its hovering positions, and its flight speed, so as to minimize the UAV's energy consumption under the data timeliness constraint of each sensor node. Meanwhile, a loss function that integrates the discrete and continuous policies is designed to increase the rationality of the hybrid actions and improve the training effectiveness of the proposed algorithm. Numerical results demonstrate that the CH-PPO algorithm outperforms the three reinforcement learning algorithms in the comparison group in UAV energy consumption and its influencing factors. Furthermore, the convergence, stability, and robustness of the proposed algorithm are well verified.
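As an illustration of the AoI constraint described above, the following sketch (our own code with hypothetical names, using the standard definition of AoI on delivery as delivery time minus sampling time) checks whether a candidate schedule keeps every node's data fresh enough:

```python
def aoi_feasible(sample_times, delivery_times, aoi_limits):
    """Check the AoI constraint for every sensor node.

    sample_times[n]  : time at which the UAV collects node n's data
    delivery_times[n]: time at which that data is offloaded to the base station
    aoi_limits[n]    : node n's AoI threshold

    The AoI of node n's data on delivery is delivery_times[n] - sample_times[n];
    the mission fails if any node exceeds its threshold.
    """
    return all(d - s <= limit
               for s, d, limit in zip(sample_times, delivery_times, aoi_limits))
```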
Algorithm 1 The CH-PPO algorithm
(1) Input: number of training episodes ${\text{EP}}$, number of parameter updates $M$, learning rate $\eta$, clipping coefficient $\varepsilon$;
(2) Initialize the network parameters $\theta$, ${\theta _{{\text{old}}}}$, and $\omega$;
(3) For each training episode $i = 1,2, \cdots ,{\text{EP}}$:
(4)  While ${\text{done}} \ne 0$:
(5)   Compute the discrete action ${\chi _{\mathrm{d}}}\left( {{{\boldsymbol{s}}_k};{\theta _{{\mathrm{d}},{\mathrm{old}}}}} \right)$;
(6)   Compute the continuous action ${\chi _{\mathrm{c}}}\left( {{{\boldsymbol{s}}_k};{\theta _{{\mathrm{c}},{\mathrm{old}}}}} \right)$;
(7)   Form the hybrid action ${{\boldsymbol{a}}_k} = \left\{ {i,\left( {l\left( k \right),\theta \left( k \right),{\boldsymbol{v}}\left( k \right)} \right)} \right\}$;
(8)   The agent executes action ${{\boldsymbol{a}}_k}$ in state ${{\boldsymbol{s}}_k}$, receives reward ${r_k}$, and moves to the next state ${{\boldsymbol{s}}_{k + 1}}$;
(9)   Store $\left( {{{\boldsymbol{s}}_k},{{\boldsymbol{a}}_k},{r_k},{{\boldsymbol{s}}_{k + 1}}} \right)$ in the experience buffer;
(10)  Until ${\text{done}} = 0$;
(11)  For each parameter update $j = 1,2, \cdots ,M$:
(12)   Fetch all experiences $ {\left( {{{\boldsymbol{s}}_k},{{\boldsymbol{a}}_k},{r_k},{{\boldsymbol{s}}_{k + 1}}} \right)_{k \in \{ 1,2, \cdots ,K\} }} $ from the experience buffer;
(13)   Compute the state values ${\text{val}}_1, \cdots ,{\text{val}}_K$ of all buffered states;
(14)   Compute the advantage estimates $ {\hat A_1}, \cdots ,{\hat A_K} $;
(15)   Compute the new-to-old policy ratios of the discrete and continuous policies, respectively: $ r_k^{\mathrm{d}}\left( {{\theta _{\mathrm{d}}}} \right) = \dfrac{{{\pi _{{\theta _{\mathrm{d}}}}}({\boldsymbol{a}}_k^{\mathrm{d}}|{{\boldsymbol{s}}_k})}}{{{\pi _{{\theta _{{\mathrm{d}},{\text{old}}}}}}({\boldsymbol{a}}_k^{\mathrm{d}}|{{\boldsymbol{s}}_k})}} $, $r_k^{\mathrm{c}}\left( {{\theta _{\mathrm{c}}}} \right) = \dfrac{{{\pi _{{\theta _{\mathrm{c}}}}}({\boldsymbol{a}}_k^{\mathrm{c}}|{{\boldsymbol{s}}_k})}}{{{\pi _{{\theta _{{\mathrm{c}},{\text{old}}}}}}({\boldsymbol{a}}_k^{\mathrm{c}}|{{\boldsymbol{s}}_k})}}$;
(16)   Compute the Actor and Critic losses $ {\text{L}}{{\text{A}}_k}\left( \theta \right) $ and ${\text{L}}{{\text{C}}_k}(\omega )$;
(17)   Compute the Actor and Critic gradients ${\nabla _\theta }l_k^\chi \left( \theta \right)$ and ${\nabla _\omega }l_k^V\left( \omega \right)$;
(18)   Update the parameters: $ \theta = \theta - \eta {\nabla _\theta }l_k^\chi \left( \theta \right) $, $\omega = \omega - \eta {\nabla _\omega }l_k^V\left( \omega \right)$;
(19)  Until $j = M$;
(20)  Update the old Actor network parameters: $ {\theta _{\text{old}}} = \theta $;
(21)  Clear the experience buffer;
(22) Until $i = {\text{EP}}$; training ends.
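Steps (15), (16), and (18) are the core of CH-PPO: a clipped PPO surrogate is built per policy head, and the two heads share the same advantage estimates. The sketch below is a minimal PyTorch illustration of that coupling, assuming a categorical discrete head, an independent-Gaussian continuous head, and a mean-squared-error critic loss; the variable names are ours, and the paper's exact loss ${\text{LA}}_k(\theta)$ may weight the two terms differently.

```python
import torch
from torch.distributions import Categorical, Normal

def ch_ppo_losses(disc_logits, disc_logits_old, mu, std, mu_old, std_old,
                  a_d, a_c, advantages, values, returns, eps=0.2):
    """Clipped PPO surrogate combining a discrete head and a continuous head.

    a_d: sampled discrete actions (visit target), shape [K]
    a_c: sampled continuous actions (hover position, speed), shape [K, dim_c]
    advantages: advantage estimates A_hat_k, shape [K]
    """
    # New-to-old ratio of the discrete policy, r_k^d
    logp_d = Categorical(logits=disc_logits).log_prob(a_d)
    logp_d_old = Categorical(logits=disc_logits_old).log_prob(a_d)
    ratio_d = torch.exp(logp_d - logp_d_old.detach())

    # New-to-old ratio of the continuous policy, r_k^c (independent Gaussians)
    logp_c = Normal(mu, std).log_prob(a_c).sum(-1)
    logp_c_old = Normal(mu_old, std_old).log_prob(a_c).sum(-1)
    ratio_c = torch.exp(logp_c - logp_c_old.detach())

    # Clipped surrogate for each head, sharing the same advantage estimates
    def clipped(ratio):
        return torch.min(ratio * advantages,
                         torch.clamp(ratio, 1 - eps, 1 + eps) * advantages)

    actor_loss = -(clipped(ratio_d) + clipped(ratio_c)).mean()   # LA_k(theta)
    critic_loss = torch.nn.functional.mse_loss(values, returns)  # LC_k(omega)
    return actor_loss, critic_loss
```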
Table 1 Simulation parameters
Parameter / Value:
- UAV initial energy $ {E_{{\text{init}}}} $: $1 \times {10^5}{\text{ J}}$
- UAV maximum flight speed ${v_{\max}}$: $30{\text{ m/s}}$
- UAV flight altitude $H$: $10{\text{ m}}$
- Data volume of each sensor node $D$: $5 \times {10^4}{\text{ Byte}}$
- UAV maximum connectivity radius $R$: $30{\text{ m}}$
- LoS/NLoS environment-dependent constants $a,b$: $ 10,0.6 $
- Bandwidth $W$: $ 1{\text{ MHz}} $
- Channel transmit power ${P_{\mathrm{d}}}$: $ - 20{\text{ dBm}} $
- Additional attenuation coefficient of the NLoS channel $\mu$: $ 0.2 $
- Channel power gain at unit distance $\zeta$: $ - 30{\text{ dB}} $
- Noise power ${\sigma ^2}$: $ - 90{\text{ dBm}} $
- Path loss exponent $\alpha$: $ 2.3 $
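For readers who want to reproduce the link budget, the sketch below shows one way the constants in Table 1 can enter an expected air-to-ground rate. It assumes the commonly used probabilistic LoS model with elevation-angle-dependent LoS probability $1/(1 + a\,{\mathrm{e}}^{-b(\theta - a)})$ and a Shannon-rate uplink; this is our reading of the parameter names, not a formula quoted from the paper.

```python
import math

def db_to_lin(db: float) -> float:
    """Convert a dB value to a linear ratio."""
    return 10.0 ** (db / 10.0)

def dbm_to_watt(dbm: float) -> float:
    """Convert dBm to Watts."""
    return 10.0 ** ((dbm - 30.0) / 10.0)

def link_rate(horizontal_dist_m: float, height_m: float = 10.0) -> float:
    """Expected uplink rate (bit/s) between a sensor and the hovering UAV."""
    a, b = 10.0, 0.6            # LoS/NLoS environment constants (Table 1)
    mu = 0.2                    # additional NLoS attenuation coefficient
    zeta = db_to_lin(-30.0)     # channel power gain at unit distance
    alpha = 2.3                 # path loss exponent
    p_tx = dbm_to_watt(-20.0)   # channel transmit power
    noise = dbm_to_watt(-90.0)  # noise power
    w_hz = 1e6                  # bandwidth

    d = math.hypot(horizontal_dist_m, height_m)           # 3D distance
    elev_deg = math.degrees(math.atan2(height_m, horizontal_dist_m))
    p_los = 1.0 / (1.0 + a * math.exp(-b * (elev_deg - a)))
    gain = zeta * d ** (-alpha)                            # average LoS gain
    gain = p_los * gain + (1.0 - p_los) * mu * gain        # LoS/NLoS expectation
    return w_hz * math.log2(1.0 + p_tx * gain / noise)

# e.g. time to drain one node's 5e4-Byte buffer from 20 m away:
# t = 5e4 * 8 / link_rate(20.0)
```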
Table 2 Network parameters
Parameter / Value:
- Number of training episodes ${\text{EP}}$: 20000
- Learning rate $\eta$: $1 \times {10^{ - 4}}$
- Reward discount factor $\gamma$: 0.99
- Clipping coefficient $ \varepsilon $: 0.2
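These four values plug directly into Algorithm 1. A minimal configuration holder (names ours; the update count $M$ is not listed in Table 2 and is assumed here) might be:

```python
from dataclasses import dataclass

@dataclass
class CHPPOConfig:
    episodes: int = 20_000  # EP, outer loop of Algorithm 1
    updates: int = 10       # M, inner update loop (assumed; not given in Table 2)
    lr: float = 1e-4        # learning rate eta
    gamma: float = 0.99     # reward discount factor
    clip_eps: float = 0.2   # PPO clipping coefficient epsilon
```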
Table 3 UAV energy consumption under different task scales ($1 \times {10^4}\;{\text{J}}$)
Area side length | Network scale | CH-PPO | H-PPO | DQN | PPO | VLC-GA
200 | 20 | 1.71 | 1.75 | 2.32 | 2.82 | 2.18
200 | 40 | 4.01 | 4.55 | 4.91 | 4.73 | 4.95
200 | 60 | 6.10 | 6.72 | 6.82 | 7.08 | 6.93
300 | 20 | 1.96 | 2.07 | 2.61 | 4.02 | 2.54
300 | 40 | 4.89 | 5.21 | 6.84 | 9.28 | 6.91
300 | 60 | 9.18 | 9.45 | 11.43 | 12.59 | 11.51
400 | 20 | 2.02 | 2.11 | 2.92 | 4.46 | 2.84
400 | 40 | 6.37 | 7.60 | 7.62 | 12.76 | 7.64
400 | 60 | 10.03 | 10.70 | 12.04 | 14.78 | 12.13
Table 4 UAV flight distance under different task scales (m)
Area side length | Network scale | CH-PPO | H-PPO | DQN | PPO | VLC-GA
200 | 20 | 1488 | 1560 | 2398 | 2805 | 2249
200 | 40 | 3600 | 4308 | 5102 | 4596 | 5155
200 | 60 | 5922 | 6566 | 7025 | 6818 | 7175
300 | 20 | 1785 | 1837 | 2734 | 4115 | 2654
300 | 40 | 4471 | 4802 | 7270 | 9641 | 7356
300 | 60 | 8483 | 8785 | 12173 | 12526 | 12322
400 | 20 | 1898 | 1921 | 3066 | 4623 | 2991
400 | 40 | 6001 | 7242 | 8121 | 10767 | 8177
400 | 60 | 9367 | 9964 | 12880 | 13135 | 13019
Table 5 Task time under different task scales (s)
Area side length | Network scale | CH-PPO | H-PPO | DQN | PPO | VLC-GA
200 | 20 | 100 | 105 | 138 | 172 | 130
200 | 40 | 241 | 259 | 291 | 293 | 293
200 | 60 | 323 | 345 | 406 | 440 | 412
300 | 20 | 121 | 125 | 154 | 241 | 150
300 | 40 | 297 | 290 | 399 | 550 | 403
300 | 60 | 550 | 489 | 669 | 684 | 673
400 | 20 | 126 | 131 | 171 | 264 | 167
400 | 40 | 355 | 424 | 443 | 583 | 445
400 | 60 | 584 | 549 | 699 | 762 | 704
表 6 不同任務(wù)規(guī)模和AoI閾值下的無(wú)人機(jī)能量消耗($1 \times {10^4}\;{\text{J}}$)
AoI閾值 [60,80] [90,110] [120,140] 區(qū)域邊長(zhǎng) 網(wǎng)絡(luò)規(guī)模
20020 1.90 1.71 1.45 40 5.77 4.01 3.83 60 8.24 6.10 5.89
30020 2.19 1.96 1.55 40 7.79 4.89 4.29 60 10.47 9.18 7.96
40020 2.29 2.02 1.95 40 8.36 6.37 5.80 60 11.34 10.03 9.57 下載: 導(dǎo)出CSV