Intelligent Decision-making for Selection of Communication Jamming Channel and Power
doi: 10.11999/JEIT240100
-
College of Electronic Engineering, Naval University of Engineering, Wuhan 430033, China
-
Abstract: Intelligent jamming is a technique in which a jammer uses environmental feedback to learn jamming strategies autonomously and thereby disrupt an adversary's communication links effectively. However, most existing work on intelligent jamming assumes that the jammer can directly obtain communication-quality feedback, such as the bit error rate or packet loss rate. This assumption is difficult to satisfy in real adversarial environments and limits the applicability of intelligent jamming. To address this issue, the communication jamming problem is modeled as a Markov Decision Process (MDP); a jamming-effectiveness metric is formulated from both the basic principles of jamming and the observed behavioral changes of the communication target; and an Improved Policy Hill-Climbing (IPHC) algorithm is proposed. The algorithm follows the Observe-Orient-Decide-Act (OODA) loop: it observes changes in the communication target in real time, adjusts its jamming strategy accordingly, and executes jamming through mixed-strategy decisions. Simulation results show that when the communication target adopts a deterministic evasion strategy, the proposed algorithm quickly converges to the optimal jamming strategy, and its convergence time is at least two-thirds shorter than that of Q-learning; when the target switches its evasion strategy, the algorithm adapts and re-converges to the optimal jamming strategy. When the target adopts a mixed evasion strategy, the proposed algorithm also converges quickly and achieves good jamming performance.
-
1 IPHC-based intelligent decision-making algorithm for communication jamming channel and power
Parameter settings: $Q(\boldsymbol{s},\boldsymbol{a}) = 0$, $\pi(\boldsymbol{s},\boldsymbol{a}) = 1/|A|$; set the update step size $\alpha$ and the learning rate $\eta$.

Learning process: let $t = 0$. In state $\boldsymbol{s}_t$, select action $\boldsymbol{a}_t$ according to $\pi(\boldsymbol{s}_t,\boldsymbol{a})$ and transition to the next state $\boldsymbol{s}_{t+1}$.

while $t < T$:

(1) Evaluate the reward from the relationship between $\boldsymbol{s}_t$ and $\boldsymbol{s}_{t+1}$:
$$ r_t = w_1\varphi_1\left(\text{JNSR} - T_{\text{h}}\right) + w_2\mu\left(f_{\text{c},t+1} - f_{\text{c},t}\right) + w_3\varphi_2\left(p_{\text{c},t+1} - p_{\text{c},t}\right) - w_4 p_{\text{j},t+1}/P_{\text{jMax}} $$

(2) Update the Q table with the reward $r_t$:
$$ Q(\boldsymbol{s}_t,\boldsymbol{a}_t) \leftarrow Q(\boldsymbol{s}_t,\boldsymbol{a}_t) + \alpha\left[r_t + \gamma\max_{\boldsymbol{a}} Q(\boldsymbol{s}_{t+1},\boldsymbol{a}) - Q(\boldsymbol{s}_t,\boldsymbol{a}_t)\right] $$

(3) Adjust the policy according to the Q table, then normalize:
$$ \pi(\boldsymbol{s},\boldsymbol{a}) \leftarrow \pi(\boldsymbol{s},\boldsymbol{a}) + \eta,\quad \boldsymbol{a} = \arg\max_{\boldsymbol{a}'} Q(\boldsymbol{s},\boldsymbol{a}') $$
$$ \pi(\boldsymbol{s},\boldsymbol{a}_i) \leftarrow \pi(\boldsymbol{s},\boldsymbol{a}_i) \Big/ \sum_{l=1}^{M\times K}\pi(\boldsymbol{s},\boldsymbol{a}_l) $$

(4) Move to the next time step, $t \leftarrow t + 1$; in state $\boldsymbol{s}_t$, select action $\boldsymbol{a}_t$ according to $\pi(\boldsymbol{s}_t,\boldsymbol{a})$ and transition to the next state $\boldsymbol{s}_{t+1}$.
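The per-step updates of the algorithm above (reward evaluation, Q-table update, and the add-$\eta$-then-normalize policy adjustment) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: states are assumed to be hashable tuples $(f_{\text{j},t}, p_{\text{j},t}, f_{\text{c},t}, p_{\text{c},t})$ with numeric channel indices, actions are assumed to be integer indices into the $M\times K$ (channel, power) choices, and `phi1`, `mu` and `phi2` are hypothetical sign/indicator stand-ins for $\varphi_1$, $\mu$ and $\varphi_2$, whose definitions are not given in this section. Default parameters follow Table 1. Roughly, `update` plays the "Orient" role (adjusting the strategy) and `act` the "Decide" role (mixed-strategy decision) of the OODA loop.

```python
import random
from collections import defaultdict


# Hypothetical shaping functions. The paper defines phi_1, mu and phi_2 elsewhere;
# these sign/indicator forms are placeholder assumptions, not the authors' definitions.
def phi1(x):
    return 1.0 if x >= 0 else -1.0   # assumed: +1 once JNSR reaches the threshold T_h


def mu(df):
    return 1.0 if df != 0 else 0.0   # assumed: 1 if the target changed its channel


def phi2(dp):
    return 1.0 if dp > 0 else 0.0    # assumed: 1 if the target raised its power


def reward(jnsr, f_c, f_c_next, p_c, p_c_next, p_j_next, p_j_max,
           th=0.3, w=(1.0, 0.5, 0.5, 1.0)):
    """r_t = w1*phi1(JNSR - Th) + w2*mu(df_c) + w3*phi2(dp_c) - w4*p_j,t+1 / P_jMax."""
    w1, w2, w3, w4 = w
    return (w1 * phi1(jnsr - th)
            + w2 * mu(f_c_next - f_c)
            + w3 * phi2(p_c_next - p_c)
            - w4 * p_j_next / p_j_max)


class IPHCAgent:
    """Tabular Q-learning plus the add-eta-then-normalize policy step."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.5, eta=0.001, seed=0):
        self.n = n_actions
        self.alpha, self.gamma, self.eta = alpha, gamma, eta
        self.q = defaultdict(lambda: [0.0] * n_actions)              # Q(s, a) = 0
        self.pi = defaultdict(lambda: [1.0 / n_actions] * n_actions)  # pi(s, a) = 1/|A|
        self.rng = random.Random(seed)

    def act(self, s):
        """Sample an action index from the mixed strategy pi(s, .)."""
        return self.rng.choices(range(self.n), weights=self.pi[s])[0]

    def update(self, s, a, r, s_next):
        """Q-table update, then hill-climb the policy toward the greedy action."""
        target = r + self.gamma * max(self.q[s_next])   # uses Q(s', .) before the update
        q = self.q[s]
        q[a] += self.alpha * (target - q[a])
        greedy = max(range(self.n), key=lambda i: q[i])
        p = self.pi[s]
        p[greedy] += self.eta                            # pi(s, a*) + eta
        total = sum(p)
        self.pi[s] = [x / total for x in p]              # renormalize to sum to 1
```

A driving loop would call `act`, observe the target's next channel and power, compute `reward`, and pass the result to `update`; how the communication target responds is outside this sketch.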
Table 1 Simulation parameter settings

| Parameter | Value |
|---|---|
| $\gamma$ | 0.5 |
| $\alpha$ | 0.1 |
| $\eta$ | 0.001 |
| $T_{\text{h}}$ | 0.3 |
| $w_1$ | 1 |
| $w_2$ | 0.5 |
| $w_3$ | 0.5 |
| $w_4$ | 1 |
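With the learning rate $\eta = 0.001$ from Table 1, the policy rule "add $\eta$ to the greedy action, then normalize" has a simple closed form. If the same action remains greedy in a state at every update, one update maps $\pi \mapsto (\pi + \eta)/(1+\eta)$, so $1 - \pi_n = (1 - \pi_0)(1+\eta)^{-n}$ with $\pi_0 = 1/|A|$. The sketch below evaluates this. The action count $|A| = 20$ is an assumption (for example, 5 channels times 4 power levels, as the state grid in Tables 3 and 4 suggests), and the calculation ignores how often the state is actually visited.

```python
import math


def greedy_prob_after(n, n_actions=20, eta=0.001):
    """Probability of an action that stays greedy for n consecutive policy updates.

    Each update maps pi -> (pi + eta) / (1 + eta), hence
    1 - pi_n = (1 - pi_0) / (1 + eta)**n with pi_0 = 1 / n_actions.
    n_actions = 20 is an assumption, not a value stated in Table 1.
    """
    return 1.0 - (1.0 - 1.0 / n_actions) / (1.0 + eta) ** n


def steps_to_reach(target, n_actions=20, eta=0.001):
    """Smallest n such that greedy_prob_after(n) >= target."""
    ratio = (1.0 - 1.0 / n_actions) / (1.0 - target)
    return math.ceil(math.log(ratio) / math.log(1.0 + eta))


print(steps_to_reach(0.9))  # 2253
```

Under these assumptions, one state needs about 2,253 consecutive greedy updates before that action's probability reaches 0.9, which makes explicit how the small $\eta$ sets the pace of the policy adjustment.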
Table 3 Number of times each strategy appears among the two largest Q values

| No. | Jamming state $(f_{\text{j},t},\ p_{\text{j},t},\ f_{\text{c},t},\ p_{\text{c},t})$ | Increase power | Switch channel |
|---|---|---|---|
| 1 | ($F_1$, 2 mW, $F_1$, 7 mW) | 1 | 1 |
| 2 | ($F_1$, 4 mW, $F_1$, 14 mW) | 1 | 1 |
| 3 | ($F_1$, 6 mW, $F_1$, 21 mW) | 1 | 1 |
| 4 | ($F_1$, 8 mW, $F_1$, 28 mW) | 2 | 0 |
| 5 | ($F_2$, 2 mW, $F_2$, 7 mW) | 0 | 2 |
| 6 | ($F_2$, 4 mW, $F_2$, 14 mW) | 0 | 2 |
| 7 | ($F_2$, 6 mW, $F_2$, 21 mW) | 1 | 1 |
| 8 | ($F_2$, 8 mW, $F_2$, 28 mW) | 2 | 0 |
| 9 | ($F_3$, 2 mW, $F_3$, 7 mW) | 0 | 2 |
| 10 | ($F_3$, 4 mW, $F_3$, 14 mW) | 1 | 1 |
| 11 | ($F_3$, 6 mW, $F_3$, 21 mW) | 1 | 1 |
| 12 | ($F_3$, 8 mW, $F_3$, 28 mW) | 2 | 0 |
| 13 | ($F_4$, 2 mW, $F_4$, 7 mW) | 1 | 1 |
| 14 | ($F_4$, 4 mW, $F_4$, 14 mW) | 1 | 1 |
| 15 | ($F_4$, 6 mW, $F_4$, 21 mW) | 1 | 1 |
| 16 | ($F_4$, 8 mW, $F_4$, 28 mW) | 2 | 0 |
| 17 | ($F_5$, 2 mW, $F_5$, 7 mW) | 1 | 1 |
| 18 | ($F_5$, 4 mW, $F_5$, 14 mW) | 1 | 1 |
| 19 | ($F_5$, 6 mW, $F_5$, 21 mW) | 1 | 1 |
| 20 | ($F_5$, 8 mW, $F_5$, 28 mW) | 1 | 1 |
| Total | | 21 | 19 |
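Table 3 records, for each state, how many of the two largest-Q actions fall into each strategy type. A minimal sketch of that tally follows. The `label` function that classifies an action index as "increase power" or "switch channel" is hypothetical, since the action encoding is not given in this section; the per-state counts are transcribed from Table 3 to check its "Total" row.

```python
from collections import Counter


def top2_strategy_counts(q_row, label):
    """Tally strategy labels of the two actions with the largest Q values in one state.

    q_row: list of Q(s, a) values for a single state s.
    label: hypothetical function mapping an action index to
           "increase_power" or "switch_channel".
    """
    top2 = sorted(range(len(q_row)), key=lambda a: q_row[a], reverse=True)[:2]
    return Counter(label(a) for a in top2)


# (increase power, switch channel) counts for states 1-20, transcribed from Table 3.
TABLE3 = [(1, 1), (1, 1), (1, 1), (2, 0), (0, 2), (0, 2), (1, 1), (2, 0), (0, 2), (1, 1),
          (1, 1), (2, 0), (1, 1), (1, 1), (1, 1), (2, 0), (1, 1), (1, 1), (1, 1), (1, 1)]

totals = (sum(i for i, _ in TABLE3), sum(s for _, s in TABLE3))
print(totals)  # (21, 19), matching the "Total" row
```

Each row sums to 2 because exactly two actions are taken per state; the column totals therefore sum to 40 for the 20 states.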
Table 4 Selection probabilities of the different strategies

| No. | Jamming state $(f_{\text{j},t},\ p_{\text{j},t},\ f_{\text{c},t},\ p_{\text{c},t})$ | Increase power | Switch channel |
|---|---|---|---|
| 1 | ($F_1$, 2 mW, $F_1$, 7 mW) | 1 | 0 |
| 2 | ($F_1$, 4 mW, $F_1$, 14 mW) | 1 | 0 |
| 3 | ($F_1$, 6 mW, $F_1$, 21 mW) | 0.89 | 0.11 |
| 4 | ($F_1$, 8 mW, $F_1$, 28 mW) | 0.77 | 0.23 |
| 5 | ($F_2$, 2 mW, $F_2$, 7 mW) | 1 | 0 |
| 6 | ($F_2$, 4 mW, $F_2$, 14 mW) | 1 | 0 |
| 7 | ($F_2$, 6 mW, $F_2$, 21 mW) | 0.98 | 0.02 |
| 8 | ($F_2$, 8 mW, $F_2$, 28 mW) | 0.80 | 0.20 |
| 9 | ($F_3$, 2 mW, $F_3$, 7 mW) | 1 | 0 |
| 10 | ($F_3$, 4 mW, $F_3$, 14 mW) | 1 | 0 |
| 11 | ($F_3$, 6 mW, $F_3$, 21 mW) | 0.76 | 0.24 |
| 12 | ($F_3$, 8 mW, $F_3$, 28 mW) | 1 | 0 |
| 13 | ($F_4$, 2 mW, $F_4$, 7 mW) | 1 | 0 |
| 14 | ($F_4$, 4 mW, $F_4$, 14 mW) | 1 | 0 |
| 15 | ($F_4$, 6 mW, $F_4$, 21 mW) | 0.93 | 0.07 |
| 16 | ($F_4$, 8 mW, $F_4$, 28 mW) | 1 | 0 |
| 17 | ($F_5$, 2 mW, $F_5$, 7 mW) | 1 | 0 |
| 18 | ($F_5$, 4 mW, $F_5$, 14 mW) | 1 | 0 |
| 19 | ($F_5$, 6 mW, $F_5$, 21 mW) | 0.87 | 0.13 |
| 20 | ($F_5$, 8 mW, $F_5$, 28 mW) | 0.76 | 0.24 |
| Average probability | | 0.94 | 0.06 |

Note: Some entries in the table are shown as 0; their actual values are below $10^{-3}$ and have a negligible effect on the results, so they are neglected for ease of presentation.
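Table 4 lists, per state, the learned probabilities of the two strategy types, and the jammer's mixed-strategy decision draws from such probabilities. The sketch below recomputes the "Average probability" row from the transcribed values and shows how a single draw could be made. Reducing the decision to a two-way choice in `sample_strategy` is an illustrative simplification: the learned policy is defined over all $M\times K$ actions, not just two categories.

```python
import random

# (increase power, switch channel) probabilities for states 1-20, transcribed from Table 4.
# Entries shown as 0 are actually below 1e-3 (see the table note).
TABLE4 = [(1, 0), (1, 0), (0.89, 0.11), (0.77, 0.23), (1, 0),
          (1, 0), (0.98, 0.02), (0.80, 0.20), (1, 0), (1, 0),
          (0.76, 0.24), (1, 0), (1, 0), (1, 0), (0.93, 0.07),
          (1, 0), (1, 0), (1, 0), (0.87, 0.13), (0.76, 0.24)]


def average_probs(rows):
    """Column-wise mean over states, as in the "Average probability" row."""
    n = len(rows)
    return (sum(p for p, _ in rows) / n, sum(q for _, q in rows) / n)


def sample_strategy(probs, rng):
    """Mixed-strategy decision: draw one strategy type with the given probabilities."""
    return rng.choices(["increase_power", "switch_channel"], weights=probs)[0]


inc, sw = average_probs(TABLE4)
print(round(inc, 2), round(sw, 2))  # 0.94 0.06

rng = random.Random(1)
draws = [sample_strategy(TABLE4[3], rng) for _ in range(10_000)]  # state 4: (0.77, 0.23)
print(draws.count("switch_channel") / len(draws))  # close to 0.23
```

The exact averages are 0.938 and 0.062, which round to the 0.94 and 0.06 reported in the table.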
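Table 5 Time consumed by each algorithm (ms)

| Algorithm | Simulation experiment 1 | Simulation experiment 2 | Simulation experiment 3 |
|---|---|---|---|
| IPHC | 12.0 | 11.1 | 10.8 |
| PHC | 14.5 | 13.8 | 14.0 |
| Q-learning | 7.4 | 5.4 | 4.8 |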