Adaptively Sparse Federated Learning Optimization Algorithm Based on Edge-assisted Server
doi: 10.11999/JEIT240741
School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China
摘要 (Abstract): In federated learning, wireless devices with high model contribution rates often become stragglers because of insufficient computing power and limited energy, which increases model aggregation delay and degrades global model accuracy. To address this problem, this paper designs a federated learning architecture that combines edge-server-assisted training with adaptive model sparsification, and proposes an adaptively sparse federated learning optimization algorithm based on edge-assisted training. First, an edge server is introduced to provide auxiliary training for devices with insufficient computing power or limited energy. An optimization model for auxiliary training and for communication and computing resource allocation is constructed, and several deep reinforcement learning methods are employed to solve for the optimized auxiliary training decision. Second, based on the auxiliary training decision, unstructured pruning is adaptively applied to the global model in each communication round, further reducing device delay and energy overhead. Experimental results show that the proposed algorithm greatly reduces the number of stragglers and achieves higher test accuracy than classical federated learning; the variant that uses Deep Deterministic Policy Gradient (DDPG) to optimize auxiliary resource allocation effectively reduces system training delay and improves model training efficiency.
Keywords:
- Federated learning
- Edge server
- Adaptive sparsity
- Deep reinforcement learning
- Unstructured pruning
Abstract:
Objective: Federated Learning (FL) is a distributed learning framework with significant potential, allowing users to collaboratively train a shared model while retaining data on their devices. However, the substantial differences in computing, storage, and communication capacities across FL devices in complex networks result in notable disparities in model training and transmission latency. As communication rounds increase, a growing number of heterogeneous devices become stragglers because of constraints such as limited energy and computing power, changes in user intentions, and dynamic channel fluctuations, which degrades system convergence. This study addresses these challenges by jointly introducing an assistance mechanism and reducing device overhead, thereby mitigating the impact of stragglers on model accuracy and training latency.
Methods: This paper designs an FL architecture that integrates edge-assisted training and adaptive sparsity, and proposes an adaptively sparse FL optimization algorithm based on edge-assisted training. First, an edge server is introduced to provide auxiliary training for devices with limited computing power or energy. This reduces the training delay of the FL system, enables stragglers to continue participating in the training process, and helps maintain model accuracy. Specifically, an optimization model for auxiliary training, communication, and computing resource allocation is constructed, and several deep reinforcement learning methods are applied to obtain the optimized auxiliary training decision. Second, based on the auxiliary training decision, unstructured pruning is adaptively performed on the global model in each communication round to further reduce device delay and energy consumption.
Results and Discussions: The proposed framework and algorithm are evaluated through extensive simulations. The results demonstrate the effectiveness and efficiency of the proposed method in terms of model accuracy and training delay. The proposed algorithm achieves an accuracy approximately 5% higher than that of the FL algorithm on both the MNIST and CIFAR-10 datasets; this improvement arises because, without assistance, low-computing-power and low-energy devices fail to transmit their local models to the central server in multiple communication rounds, which reduces the global model's accuracy (Table 3). When the data on each device follow a non-IID distribution, the proposed algorithm achieves an accuracy 18% higher than that of the FL algorithm on the MNIST dataset; statistical heterogeneity exacerbates the model degradation caused by stragglers, whereas the proposed algorithm significantly improves model accuracy under such conditions (Table 4). The reward curves of the different algorithms are presented in Fig. 7. The reward of FL remains constant, while the reward of EAFL_RANDOM fluctuates randomly. ASEAFL_DDPG and EAFL_DQN show more stable reward curves once training episodes exceed 120, owing to the strong learning and decision-making capabilities of DDPG and DQN; however, EAFL_DQN converges more slowly and maintains a lower reward than the proposed algorithm, mainly because DDPG makes more precise decisions in the continuous action space and its exploration mechanism broadens action selection (Fig. 7). When the computing power of the edge server increases, the training delay of the FL algorithm remains constant since it does not involve auxiliary training; the training delay of EAFL_RANDOM fluctuates randomly, while the delays of ASEAFL_DDPG and EAFL_DQN decrease, and ASEAFL_DDPG consistently achieves a lower system training delay than EAFL_DQN under the same MEC computing power (Fig. 9). When the communication bandwidth between the edge server and the devices increases, the training delay of the FL algorithm again remains unchanged, the delay of EAFL_RANDOM fluctuates randomly, and the delays of ASEAFL_DDPG and EAFL_DQN decrease, with ASEAFL_DDPG consistently achieving a lower system training delay than EAFL_DQN under the same bandwidth (Fig. 10).
Conclusions: The proposed adaptively sparse FL architecture based on an edge-assisted server mitigates the straggler problem caused by system heterogeneity from two perspectives. By reducing the number of stragglers, the proposed algorithm achieves higher model accuracy than the traditional FL algorithm, effectively decreases system training delay, and improves model training efficiency. The framework holds practical value, particularly for FL deployments in which aggregation devices are selected based on statistical characteristics such as model contribution rates. Straggler issues are common in such FL scenarios, and the proposed architecture effectively reduces their occurrence. At the same time, devices with high model contribution rates can continue participating in multiple rounds of federated training, lowering the central server's overhead from frequent device selection. Additionally, in resource-constrained FL environments, edge servers can take on more diverse and flexible tasks, such as partial auxiliary training and partitioned model training.
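To make the role of the DDPG agent concrete, the following minimal Python sketch shows one way an actor's continuous output could be mapped to the per-device auxiliary-training decision, i.e., the flag $\alpha_{m,z,n}$, the allocated bandwidth units $B_{m,z,n}$, and the MEC computing frequency $F_{m,z,n}$. The 0.5 threshold, the proportional splitting rule, and all function names are illustrative assumptions; this is not the paper's optimization model.

```python
# Hedged sketch: mapping a DDPG actor output in [0, 1] to the auxiliary-training
# decision (alpha, bandwidth units B, MEC CPU frequency F). Thresholds, the
# proportional split, and names are assumptions made for illustration only.
import numpy as np

def map_action(raw_action, n_devices, total_bw_units, f_mec_total):
    """raw_action: actor output of shape (3 * n_devices,), squashed into [0, 1]."""
    a = np.asarray(raw_action).reshape(3, n_devices)
    alpha = (a[0] > 0.5).astype(int)        # alpha_{m,z,n}: 1 -> edge-assisted
    bw_score = a[1] * alpha                 # share resources only among the
    f_score = a[2] * alpha                  # devices selected for assistance
    bw = np.floor(total_bw_units * bw_score / max(bw_score.sum(), 1e-9)).astype(int)
    f_mec = f_mec_total * f_score / max(f_score.sum(), 1e-9)
    return alpha, bw, f_mec

# Toy usage: 5 devices, 8 bandwidth units, a 4 GHz edge server.
alpha, bw, f_mec = map_action(np.random.rand(15), n_devices=5,
                              total_bw_units=8, f_mec_total=4e9)
print(alpha, bw, f_mec)
```

In a DDPG loop, such a deterministic mapping is applied after exploration noise is added to the actor output, which is one reason a continuous-action agent can allocate bandwidth and CPU frequency at a finer granularity than a discrete DQN.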
Table 1 Description of the main symbols
$Z, N$: Number of edge servers; number of devices selected to participate in aggregation within each edge server's coverage
$p_{z,n}$: Transmission power of device n under edge server z
$M$: Total number of FL communication rounds
$k$: Effective switched capacitance determined by the chip architecture
$D_n$: Volume of local training sample data per round
$t_{m,z,n}^{\mathrm{up\_z}}$: Delay for device n to upload sample data to edge server z in communication round m
$D_\omega$: Data size of the FL model parameters
$t_{z,n}^{\mathrm{IOT}}$: Local training delay of device n under edge server z
$r$: Number of CPU cycles required to process one bit of data
$t_{m,z,n}^{\mathrm{MEC}}$: Delay for edge server z to train on the sample data uploaded by device n in communication round m
$s_m$: Model sparsity rate in communication round m
$t_{m,z,n}^{\mathrm{up\_c}}$: Delay for device n under edge server z to upload its model to the aggregation server in communication round m
$\alpha_{m,z,n}$: Whether device n under edge server z receives auxiliary training in communication round m
$e_{m,z,n}^{\mathrm{up\_z}}$: Energy consumed by device n to upload sample data to edge server z in communication round m
$r_{m,z,n}^{\mathrm{up\_z}}$: Data transmission rate from device n to edge server z in communication round m
$e_{z,n}^{\mathrm{IOT}}$: Local training energy of device n under edge server z
$r_{m,z,n}^{\mathrm{up\_c}}$: Data transmission rate from device n under edge server z to the aggregation server in communication round m
$e_{m,z,n}^{\mathrm{up\_c}}$: Energy consumed by device n under edge server z to upload its model to the aggregation server in communication round m
$B_{m,z,n}$: Number of bandwidth units allocated to device n under edge server z in communication round m
$E_{z,n}^{\mathrm{start}}$: Initial energy of device n under edge server z
$B_z$: Bandwidth of the auxiliary training system
$E_{m,z,n}$: Total energy consumption of device n under edge server z in communication round m
$ _{z} $: Number of bandwidth units of the auxiliary training system
$F_{m,z,n}$: Computing frequency allocated by edge server z to train device n's samples in communication round m
$h_{z,n}$: Channel gain between device n and edge server z
$f_{z,n}^{\mathrm{IOT}}$: Computing frequency of device n under edge server z
$\delta^2$: Noise power
$F_z^{\mathrm{MEC}}$: Computing frequency of edge server z
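The notation above follows the usual mobile edge computing cost model. As a hedged illustration (the paper's exact equations are not reproduced in this excerpt), the sketch below computes a device's per-round delay and energy under the common assumptions that computation delay is $rD/f$, computation energy is $kf^2rD$, and uplink transmission follows the Shannon rate; all numeric values and function names are placeholders.

```python
# Hedged sketch of a per-round cost model consistent with the symbols in
# Table 1, assuming the standard formulas t = r*D/f, e = k*f^2*r*D, and a
# Shannon-rate uplink. The paper's exact equations may differ.
import math

def uplink_rate(bw_hz, p_tx, h_gain, noise_power):
    """Shannon rate: r^{up} = B * log2(1 + p * h / delta^2)."""
    return bw_hz * math.log2(1.0 + p_tx * h_gain / noise_power)

def local_cost(D_bits, D_model, r_cpb, f_dev, k_cap, rate_up_c, p_tx):
    """Local path: on-device training (t^IOT, e^IOT) plus model upload to the
    aggregation server (t^up_c, e^up_c)."""
    t_cmp = r_cpb * D_bits / f_dev
    e_cmp = k_cap * (f_dev ** 2) * r_cpb * D_bits
    t_up_c = D_model / rate_up_c
    e_up_c = p_tx * t_up_c
    return t_cmp + t_up_c, e_cmp + e_up_c

def assisted_cost(D_bits, r_cpb, F_mec, rate_up_z, p_tx):
    """Edge-assisted path seen by the device: upload samples (t^up_z, e^up_z),
    after which the edge server trains at frequency F_mec (t^MEC) and forwards
    the model to the aggregation server itself."""
    t_up_z = D_bits / rate_up_z
    t_mec = r_cpb * D_bits / F_mec
    e_up_z = p_tx * t_up_z
    return t_up_z + t_mec, e_up_z

# Placeholder numbers loosely matching Table 2 (0.1 W uplink power, -30 dB gain,
# -100 dBm noise, r = 1000 cycles/bit, k = 1e-25, F_MEC = 4 GHz).
rate = uplink_rate(bw_hz=1e6, p_tx=0.1, h_gain=1e-3, noise_power=1e-13)
print(local_cost(D_bits=8e6, D_model=1e6, r_cpb=1000, f_dev=1e9,
                 k_cap=1e-25, rate_up_c=rate, p_tx=0.1))
print(assisted_cost(D_bits=8e6, r_cpb=1000, F_mec=4e9, rate_up_z=rate, p_tx=0.1))
```

Comparing the two paths in this way is what the auxiliary-training decision trades off: offloading removes the device's computation delay and energy but adds the sample-upload cost and consumes shared MEC frequency and bandwidth.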
Algorithm 1 Adaptively sparse federated learning based on edge-assisted training
Input: initial sparsity $s_0$, final sparsity $s_M$, sparsification frequency $U$, sparsity-rate control exponent $c$, starting sparsification round $m_0$, initial mask matrix $\mathit{X}^{m_0}$, set $\varPsi_z$ of devices participating in aggregation under edge server z, total number of communication rounds M under the current device selection, initial model $\omega$, each device's initial energy $E_{z,n}^{\mathrm{start}}$ and computing frequency $f_{z,n}^{\mathrm{IOT}}$, transmission power $p_{z,n}$, edge server computing frequency $F_z^{\mathrm{MEC}}$, communication bandwidth $B_z$ between the edge server and the devices, per-round training data volume $D_n$, and the effective switched capacitance $k$ determined by the chip architecture
Output: federated learning model $\omega^M$
(1) Deploy a deep reinforcement learning agent on edge server z. The agent collects the total number of communication rounds M under the current device selection and the state information of all devices within its coverage that participate in model aggregation, and initializes its policy and value networks. The agent's reward function is Eq. (14); by continually interacting with the environment, the agent learns the optimal auxiliary training decision.
(2) The agent issues the auxiliary training decisions for the M communication rounds, including the auxiliary training flag $\alpha_{m,z,n}$, the transmission bandwidth $B_{m,z,n}$, and the CPU frequency $F_{m,z,n}$.
(3) For m = 1 to M do
    For n = 1 to N do (in parallel)
        If $\alpha_{m,z,n} == 1$:
            Device n uploads its sample data to the edge server using the transmission bandwidth $B_{m,z,n}$; the edge server completes the auxiliary training with the allocated computing power $F_{m,z,n}$, updating the model according to Eq. (21); the edge server uploads the auxiliary-trained model $\omega_n$ to the aggregation server
        Else:
            Device n completes local training, updating the model according to Eq. (21); device n uploads the locally trained model $\omega_n$ to the aggregation server
    End For
(4) Perform global model aggregation: $ {\omega}^{m+1}=\displaystyle\sum\nolimits_{n=1}^{N}\dfrac{D_n}{D}{\omega}_n^m $
(5) Compute the global model sparsity $s_m$ according to Eq. (20), and then compute the global model's mask matrix $\mathit{X}^m$ from $s_m$
(6) Take the Hadamard product of the global model and the mask matrix, $ {\omega}^{m+1}={\omega}^{m+1}\odot\mathit{X}^m $, and use the resulting unstructured-pruned model $ {\omega}^{m+1} $ as the new global model
(7) Distribute the global sparse model $ {\omega}^{m+1} $ and the mask matrix $\mathit{X}^m$ to the participating devices and edge servers
End For
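Steps (4)–(6) of the algorithm above (weighted aggregation, mask computation at sparsity $s_m$, and the Hadamard product) can be illustrated with the short NumPy sketch below. Magnitude-based thresholding is assumed when building the mask $\mathit{X}^m$, since the exact rule and Eq. (20) are not reproduced in this excerpt.

```python
# Sketch of steps (4)-(6): weighted FedAvg aggregation, a global mask X^m at
# sparsity s_m, and the Hadamard product yielding the pruned global model.
# Magnitude-based thresholding is an assumption made for illustration.
import numpy as np

def aggregate(local_models, data_sizes):
    """omega^{m+1} = sum_n (D_n / D) * omega_n^m, per parameter tensor."""
    D = float(sum(data_sizes))
    return [sum(D_n / D * w[i] for w, D_n in zip(local_models, data_sizes))
            for i in range(len(local_models[0]))]

def global_mask(weights, sparsity):
    """Unstructured mask X^m: zero out the smallest-magnitude fraction s_m."""
    flat = np.concatenate([np.abs(w).ravel() for w in weights])
    k = int(sparsity * flat.size)
    thr = np.partition(flat, k)[k] if k > 0 else -np.inf
    return [(np.abs(w) > thr).astype(w.dtype) for w in weights]

def prune(weights, mask):
    """omega^{m+1} <- omega^{m+1} (Hadamard product) X^m."""
    return [w * x for w, x in zip(weights, mask)]

# Toy usage: two devices, each holding two parameter tensors.
models = [[np.random.randn(4, 4), np.random.randn(4)] for _ in range(2)]
global_w = aggregate(models, data_sizes=[600, 400])
mask = global_mask(global_w, sparsity=0.6)
global_w = prune(global_w, mask)
```

Because the mask is recomputed from the freshly aggregated model each round, the pruned positions can change as $s_m$ grows, which is what makes the sparsification adaptive rather than fixed.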
Table 2 Parameter settings
Number of devices N participating in aggregation under edge server z: 5
Number of system bandwidth units $ _{z} $: 8
Number of CPU cycles required to process one bit of data r: 1 000
Reference channel gain h at a distance of 1 m: –30 dB
Uplink transmission power $p_{z,n}$: 0.1 W
Noise power $\delta^2$: –100 dBm
Effective switched capacitance $k$ determined by the chip architecture: $10^{-25}$
Computing frequency $F_z^{\mathrm{MEC}}$ of edge server z: 4 GHz
Initial model sparsity coefficient $s_0$: 0
Final model sparsity coefficient $s_M$: 0.6
Sparsification frequency $U$: 1
Initial sparsification round $m_0$: 1
Total communication rounds M under the current device selection: 10
Sparsity-rate control exponent $c$: 2
Learning rate on the MNIST dataset (Lr_mnist): 0.001
Learning rate on the CIFAR10 dataset (Lr_cifar): 0.01
Batch size on the MNIST dataset (Bs_mnist): 32
Batch size on the CIFAR10 dataset (Bs_cifar): 64
Number of local update batches (Local_eps): 4
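The sparsity-related settings above ($s_0=0$, $s_M=0.6$, $U=1$, $m_0=1$, $c=2$, $M=10$) parameterize the per-round sparsity $s_m$ of Eq. (20). Since that equation is not reproduced in this excerpt, the sketch below assumes a polynomial gradual-pruning style schedule controlled by the exponent $c$; it is an illustration, not the paper's formula.

```python
# Hedged illustration of a per-round sparsity schedule using the Table 2
# settings. The polynomial form below is assumed (in the spirit of gradual
# magnitude pruning); the paper's actual Eq. (20) may differ.
def sparsity_at_round(m, s0=0.0, sM=0.6, m0=1, M=10, U=1, c=2):
    if m < m0:
        return s0
    # Update only every U rounds; otherwise hold the last scheduled value.
    m_eff = m0 + ((m - m0) // U) * U
    progress = (m_eff - m0) / max(M - m0, 1)
    return sM + (s0 - sM) * (1.0 - progress) ** c

print([round(sparsity_at_round(m), 3) for m in range(1, 11)])
```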
Table 3 Comparison of model test accuracy of different algorithms (%)
Algorithm / Dataset: accuracy over communication rounds 1–10
FL / MNIST: 62.65, 73.41, 75.24, 76.18, 77.00, 77.45, 78.52, 79.18, 79.78, 80.38
FL / CIFAR10: 15.03, 47.06, 55.79, 58.75, 59.65, 61.10, 61.21, 62.05, 61.83, 62.39
ASEAFL_DDPG / MNIST: 62.65, 73.13, 77.51, 80.04, 82.20, 82.73, 84.17, 84.71, 85.27, 85.65
ASEAFL_DDPG / CIFAR10: 15.03, 41.01, 55.90, 60.75, 63.23, 64.60, 65.80, 66.33, 67.10, 67.26
Table 4 Model test accuracy on the MNIST dataset under the non-IID setting (%)
Algorithm: accuracy over communication rounds 1–10
FL: 16.25, 15.97, 19.53, 21.76, 26.33, 32.98, 37.62, 40.18, 42.69, 45.84
ASEAFL_DDPG: 16.25, 25.72, 35.02, 42.51, 48.81, 54.31, 57.32, 60.17, 62.75, 63.97