Federated Slicing Resource Management in Edge Computing Networks Based on GAN-Assisted Multi-Agent Reinforcement Learning
doi: 10.11999/JEIT240773
School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
Funds: The National Natural Science Foundation of China (62001225, 62071236)
Abstract: To meet users' differentiated service requirements in dynamic edge computing network scenarios, this paper proposes a federated slicing resource management scheme based on Generative Adversarial Network (GAN)-assisted multi-agent Reinforcement Learning (RL). First, considering unknown time-varying channels and random user traffic arrivals, a joint bandwidth and computing slicing resource management optimization problem is formulated to jointly optimize the long-term average service waiting delay and service satisfaction rate, and is further modeled as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP). Then, a multi-agent Dueling Double Deep Q-Network (D3QN) method is adopted, combined with the advantage of GAN in multi-modal learning of state-value distributions, and a federated learning framework is leveraged to drive cooperative learning among agents, so that collaborative slicing resource management decisions are reached by sharing only each agent's generator network weights. Simulation results show that, compared with baseline schemes, the proposed scheme reduces the average user service waiting delay by more than 28% and simultaneously improves the average user service satisfaction rate by more than 8% while preserving user privacy.
Keywords:
- Edge computing
- Network slicing
- Multi-agent reinforcement learning
- Federated learning
- Generative adversarial network
Abstract: Objective To meet the differentiated service requirements of users in dynamic Edge Computing (EC) network scenarios, network slicing has become a crucial enabling technology for offering differentiated edge services: it supports flexible allocation and customized management of communication and computation resources by dividing network resources into multiple independent sub-slices. However, traditional slicing resource management methods cannot handle the time-varying wireless channel conditions and the randomness of service arrivals in EC networks, and existing intelligent slicing resource management schemes based on deep reinforcement learning face challenges including extensive information sharing, privacy leakage, and unstable training convergence. To address these challenges, the integration of Multi-Agent Reinforcement Learning (MARL) and Federated Learning (FL) allows experience sharing among agents while protecting users' privacy. Furthermore, a Generative Adversarial Network (GAN) is used to generate state-action value distributions, improving the ability of traditional MARL methods to learn state-value information. By modeling the joint bandwidth and computing slicing resource management optimization problem as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP), collaborative decision-making for slicing resource management is achieved by sharing only the generator network parameters of each agent through the combination of FL and GAN. This study provides a federated collaborative decision-making framework for the slicing resource management problem in EC scenarios and offers theoretical support for improving the utilization efficiency of edge slicing resources while preserving users' privacy.

Methods The core concept of the proposed federated slicing resource management scheme is to first employ GAN technology together with the D3QN algorithm for local training within a multi-agent framework; the FL architecture is then used to share the generator network parameters of each agent, enabling collaborative decision-making for joint bandwidth and computing slicing resource management. In this approach, each Access Point (AP) agent collects the total number of tasks to be transmitted and the number of Central Processing Unit (CPU) cycles required by the computing tasks in each associated slice as its local observation in each training time slot. Each agent then selects the optimal local bandwidth and computing resource management action and obtains the system reward, which consists of the average service waiting delay and the service satisfaction rate, together with the observation for the next time slot, to train its local network. During training, each AP agent maintains its own main generator network, target generator network, and discriminator network. In each training episode, the D3QN algorithm is applied to decompose the state-action values, and the GAN performs multi-modal learning of the state-value distribution, completing local training. After each training episode, the AP agents upload their main generator network parameters for federated aggregation and receive the global main generator network parameters for the next training episode.
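To make the local decision step concrete, the following is a minimal PyTorch sketch of how a GAN generator could map a local observation and sampled noise to state-value particles and per-action advantages, then combine them into Q-values in the dueling (D3QN) style. The layer widths, `OBS_DIM`, `N_ACTIONS`, and the aggregation form (mean of the value particles plus mean-centered advantages) are illustrative assumptions rather than the authors' exact architecture; only the particle count ($N = 30$) and the 10-dimensional input noise follow Table 1.

```python
# A minimal sketch (PyTorch) of GAN-assisted dueling Q-value computation.
# OBS_DIM and N_ACTIONS are illustrative assumptions; N_PARTICLES = 30 and
# NOISE_DIM = 10 follow Table 1.
import torch
import torch.nn as nn

OBS_DIM, NOISE_DIM, N_PARTICLES, N_ACTIONS = 6, 10, 30, 18

class Generator(nn.Module):
    """Maps (local observation, noise) to state-value particles and
    per-action advantages, following the dueling (D3QN) decomposition."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(OBS_DIM + NOISE_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        self.value_head = nn.Linear(128, N_PARTICLES)  # particles of V(o)
        self.adv_head = nn.Linear(128, N_ACTIONS)      # A(o, a)

    def forward(self, obs, noise):
        h = self.trunk(torch.cat([obs, noise], dim=-1))
        v_particles = self.value_head(h)               # (batch, N_PARTICLES)
        adv = self.adv_head(h)                         # (batch, N_ACTIONS)
        # One common dueling aggregation (assumed here for Eq. (10)):
        # Q(o, a) = E[V(o)] + A(o, a) - mean_a A(o, a)
        v = v_particles.mean(dim=-1, keepdim=True)
        return v + adv - adv.mean(dim=-1, keepdim=True)

# Greedy action selection, as in step (11) of Algorithm 1 below:
gen = Generator()
obs, noise = torch.rand(1, OBS_DIM), torch.rand(1, NOISE_DIM)  # tau ~ U(0,1)
action = gen(obs, noise).argmax(dim=-1)
```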
Results and Discussions By employing the D3QN algorithm and integrating the advantages of GAN within the MARL framework, and by leveraging FL to share learning experiences among agents while protecting users' privacy, the proposed scheme reduces the long-term service waiting delay and improves the long-term average service satisfaction rate. Simulation results demonstrate that the proposed scheme achieves the highest average cumulative reward after approximately 500 episodes (Fig. 3), with an improvement of at least 10% in convergence performance compared to the baselines. The scheme also strikes a better balance between average service waiting delay and average service satisfaction rate (Fig. 4). In addition, it delivers a superior user average service satisfaction rate, with at least an 8% improvement under varying user numbers (Fig. 5), highlighting its effectiveness in resource management under different task loads. Moreover, the proposed scheme reduces the average service waiting delay by at least 28% under varying numbers of agents (Fig. 6).

Conclusions This paper investigates the joint bandwidth and computing slicing resource management problem in dynamic, unknown EC network scenarios and proposes a federated slicing resource management scheme based on GAN-assisted MARL. The proposed scheme enhances the agents' ability to learn state-value information and promotes collaborative learning by sharing the agents' training network parameters, which ultimately reduces long-term service waiting delays and improves long-term average service satisfaction rates while protecting users' privacy. Simulation results show that: (1) the cumulative reward convergence performance of the proposed scheme improves by at least 10% compared to the baselines; (2) the average service satisfaction rate of the proposed scheme is more than 8% higher than that of the baselines under varying user numbers; (3) the average service waiting delay of the proposed scheme is reduced by at least 28% compared to the baselines under varying agent numbers. However, this study only considers ideal, static user scenarios and interference-free communication conditions; future work should incorporate more real-world dynamics, such as time-varying user mobility and complex multi-user interference.
Algorithm 1 Federated slicing resource management algorithm for edge computing networks based on GAN-assisted multi-agent reinforcement learning
(1) Each AP agent initializes its generator network ${G_b}$ and discriminator network ${D_b}$;
(2) Each AP agent initializes its target generator network ${\hat G_b}$, local experience replay buffer ${\mathcal{M}_b}$, and particle number $N$;
(3) ${T^{{\text{train}}}} \leftarrow 0$;
(4) for Episode $v = 1,2, \cdots ,V$ do:
(5)  Reset the environment;
(6)  for TS $t = 1,2, \cdots ,T$ do:
(7)   for AP agent $b = 1,2, \cdots ,B$ do:
(8)    Sample noise ${{\boldsymbol{\tau}} _{b,t}} \sim U\left( {0,1} \right)$, obtain the local observation ${{\boldsymbol{o}}_{b,t}}$, and feed both into the generator network ${G_b}$;
(9)    Obtain the state-value particles $\left\{ {G_{b,t}^{\text{V}}\left( {{{\boldsymbol{o}}_{b,t}},{{\boldsymbol{\tau}} _{b,t}}} \right)} \right\}$ and the action advantage values $G_{b,t,{{\boldsymbol{a}}_{b,t}}}^{\text{A}}\left( {{{\boldsymbol{o}}_{b,t}},{{\boldsymbol{\tau}} _{b,t}}} \right)$;
(10)    Compute the state-action value function ${Q_{b,t}}\left( {{{\boldsymbol{o}}_{b,t}},{{\boldsymbol{a}}_{b,t}}} \right)$ according to Eq. (10);
(11)    Execute the action ${\boldsymbol{a}}_{b,t}^* \leftarrow {\text{argmax}}\,{Q_{b,t}}\left( {{{\boldsymbol{o}}_{b,t}},{{\boldsymbol{a}}_{b,t}}} \right)$;
(12)    Obtain the environment reward ${r_t}$ and the next-slot observation ${{\boldsymbol{o}}_{b,t + 1}}$;
(13)    Store the training tuple $\left\{ {{{\boldsymbol{o}}_{b,t}},{{\boldsymbol{a}}_{b,t}},{{\boldsymbol{o}}_{b,t + 1}},{r_t}} \right\}$ in the local experience replay buffer ${\mathcal{M}_b}$;
(14)   end for;
(15)  if ${T^{{\text{train}}}} \ge {T^{{\text{update}}}}$:
(16)   for AP agent $b = 1,2, \cdots ,B$ do:
(17)    Randomly sample $\left\{ {{{\boldsymbol o}_{b,k}},{a_{b,k}},{{\boldsymbol o}_{b,k + 1}},{r_k}} \right\}_{k = 1}^K \sim {\mathcal{M}_b}$, and sample noise $\left\{ {{{\boldsymbol{\tau}} _{b,k}}} \right\}_{k = 1}^K$ and $\left\{ {{\varepsilon _{b,k}}} \right\}_{k = 1}^K$;
(18)    Compute the loss function $J_{b,k}^D$ according to Eqs. (14)–(16), and update the discriminator network ${D_b}$ via $\theta _{b,t + 1}^D \leftarrow \theta _{b,t}^D - {\eta ^D}{\nabla _\theta }J_{b,t}^D$;
(19)    Compute ${Q_{b,k}}\left( {{{\boldsymbol o}_{b,k}},{a_{b,k}}} \right)$ and ${\hat Q_{b,k}}\left( {{{\boldsymbol o}_{b,k}},{a_{b,k}}} \right)$;
(20)    Compute the loss function $J_{b,k}^G$ according to Eqs. (12)–(13) using the networks ${G_b}$ and ${\hat G_b}$;
(21)    Update the main network ${G_b}$ via $\theta _{b,t + 1}^G \leftarrow \theta _{b,t}^G - {\eta ^G}{\nabla _\theta }J_{b,t}^G$, and update the target network ${\hat G_b}$ via $\hat \theta _{b,t}^G \leftarrow \theta _{b,t}^G$;
(22)   end for;
(23)  end if;
(24) end for;
(25) Perform federated aggregation according to Eqs. (17)–(18) and broadcast the generator network parameters $\theta _{b,1,v + 1}^G$ to all agents;
(26) ${T^{{\text{train}}}} \leftarrow {T^{{\text{train}}}} + 1$;
(27) end for;
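The per-episode federated aggregation in step (25) can be sketched as a FedAvg-style average of the agents' main generator parameters. The exact weighting of Eqs. (17)–(18) is not reproduced here, so the uniform weights in this minimal sketch are an assumption.

```python
# A minimal sketch of the federated aggregation step (step (25)), assuming a
# FedAvg-style weighted average of the B agents' main generator parameters.
from typing import Dict, List, Optional
import torch

def federated_aggregate(
        local_params: List[Dict[str, torch.Tensor]],
        weights: Optional[List[float]] = None) -> Dict[str, torch.Tensor]:
    """Aggregate per-agent generator parameters into a global model."""
    b = len(local_params)
    weights = weights or [1.0 / b] * b  # uniform weighting: an assumption
    return {name: sum(w * p[name] for w, p in zip(weights, local_params))
            for name in local_params[0]}

# After each episode: agents upload only their generator state_dicts
# (no raw observations leave the AP), then load the broadcast result.
# global_sd = federated_aggregate([g.state_dict() for g in generators])
# for g in generators:
#     g.load_state_dict(global_sd)
```

Note that only generator parameters cross the network, which is what allows the scheme to share learning experience among agents without exchanging raw user observations.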
Table 1 Simulation parameter settings
System parameters:
- AP transmission power ${P^{\text{A}}}$: 46 dBm
- User transmission power ${P^{\text{U}}}$: 23 dBm
- Time slot duration $\tau$: 10 ms
- Maximum tolerable delay $l_i^{{\text{max}}}$: {5, 8, 9} ms
- Uplink task packet size ${x_{{u_b},t}}$: {2.4, 12, 30} kbit
- Ratio of processed to original packet size $\beta$: 0.25
- Computing task load ${s_{{u_b},t}}$: {0.1, 0.2, 1} kMc
- Number of users ${U_b}$: 20
- Number of slices $I$: 3
- AP coverage radius: 40 m
- Number of APs $B$: 4
- Total AP bandwidth ${W_b}$: 36 MHz
- Total AP computing resources ${C_b}$: 900 kMc/s
- Bandwidth resource block size ${\rho ^{\text{B}}}$: 2 MHz
- Computing resource block size ${\rho ^{\text{C}}}$: 50 kMc

Training parameters:
- Generator network learning rate ${\eta ^G}$: 1e-3
- Discriminator network learning rate ${\eta ^D}$: 1e-3
- Reward discount factor $\gamma$: 0.8
- Steps per episode: 100
- Number of state-value particles $N$: 30
- Weight coefficient $\alpha$: 0.5
- Batch size: 32
- Experience replay buffer size: 50 000
- Target network update interval: 10$\tau$
- Input noise dimension: 10
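For convenience, Table 1 can be collected into a plain Python configuration for a simulation script; the key names below are illustrative, and the values are copied from the table.

```python
# Table 1 as a Python configuration sketch; key names are illustrative.
SYSTEM_PARAMS = {
    "ap_tx_power_dbm": 46,
    "user_tx_power_dbm": 23,
    "slot_duration_ms": 10,
    "max_tolerable_delay_ms": (5, 8, 9),       # per slice
    "uplink_packet_size_kbit": (2.4, 12, 30),  # per slice
    "processed_to_original_ratio": 0.25,
    "task_compute_load_kmc": (0.1, 0.2, 1),    # per slice
    "users_per_ap": 20,
    "num_slices": 3,
    "ap_coverage_radius_m": 40,
    "num_aps": 4,
    "ap_total_bandwidth_mhz": 36,
    "ap_total_compute_kmc_per_s": 900,
    "bandwidth_block_mhz": 2,
    "compute_block_kmc": 50,
}
TRAINING_PARAMS = {
    "generator_lr": 1e-3,
    "discriminator_lr": 1e-3,
    "reward_discount_gamma": 0.8,
    "steps_per_episode": 100,
    "num_value_particles": 30,
    "weight_alpha": 0.5,
    "batch_size": 32,
    "replay_buffer_size": 50_000,
    "target_update_interval_slots": 10,
    "noise_dim": 10,
}
```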