Virtual Network Function Migration Algorithm Based on Reinforcement Learning for 5G Network Slicing
doi: 10.11999/JEIT190290
1. School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2. Key Laboratory of Mobile Communication Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
Abstract: To address the Virtual Network Function (VNF) migration optimization problem caused by dynamic service requests in the 5G network slicing architecture, this paper first establishes a stochastic optimization model based on a Constrained Markov Decision Process (CMDP) to realize the dynamic deployment of multiple types of Service Function Chains (SFCs). The model minimizes the average operating energy consumption of general-purpose servers, subject to an average delay constraint for each slice and to constraints on average cache and bandwidth resource consumption. Second, because the state transition probabilities of the system are difficult to obtain accurately and the state space is too large, an intelligent VNF migration learning algorithm based on the reinforcement learning framework is proposed. The algorithm approximates the action-value function with a Convolutional Neural Network (CNN) and, in each discrete time slot, formulates a suitable VNF migration strategy and CPU resource allocation scheme for each network slice according to the current system state. Simulation results show that the proposed algorithm effectively meets the QoS requirements of each slice while reducing the average energy consumption of the infrastructure.
Key words:
- 5G network slicing /
- Virtual network function migration /
- Reinforcement learning /
- Resource allocation
Table 1 DQN-based value function approximation
(1) Initialize the Q-network with Xavier[14] weight initialization, i.e., draw the weights from the uniform distribution $W \sim U\left[ { - \dfrac{\sqrt 6 }{\sqrt {\upsilon _l + \upsilon _{l + 1}} },\dfrac{\sqrt 6 }{\sqrt {\upsilon _l + \upsilon _{l + 1}} }} \right]$, and initialize the target Q-network with weights ${w^ - } = w$, where $l$ indexes the network layer and $\upsilon $ is the number of neurons
(2) Initialize the Lagrange multipliers $\beta _i^d \leftarrow 0,\beta _h^q \leftarrow 0,\beta _{h,l}^x \leftarrow 0$, $\forall i \in I,\forall h,l \in H$, and initialize the experience replay pool
(3) for episode $k = 1,2, \cdots ,K$ do
(4)   Randomly select a state to initialize ${r_1}$
(5)   for $t = 1,2, \cdots ,T$ do
(6)     Draw a random probability $p$; if $p \ge \varepsilon $
(7)       compute the VNF migration and CPU resource allocation policy $a_t^* = \arg \mathop {\min }\limits_{a \in A} Q({r_t},a,w)$
(8)     else select a random action ${a_t} \ne a_t^*$
(9)     Execute action ${a_t}$, obtain the Lagrangian return ${g^\beta }({r_t},{a_t})$, and observe the next state ${r_{t + 1}}$
(10)    Store the experience sample $\left( {{r_t},{a_t},{g^\beta }({r_t},{a_t}),{r_{t + 1}}} \right)$ in the experience replay pool
(11)    Randomly draw a mini-batch of experience samples $\left( {{r_k},{a_k},{g^\beta }({r_k},{a_k}),{r_{k + 1}}} \right)$ from the replay pool
(12)    Use the target Q-network to obtain $\mathop {\min }\limits_{a' \in A} Q({r_{k + 1}},a',{w^ - })$ and compute ${y_k} = {g^\beta }({r_k},{a_k}) + \gamma \mathop {\min }\limits_{a' \in A} Q({r_{k + 1}},a',{w^ - })$
(13)    Update $w$ by gradient descent on ${\left( {{y_k} - Q({r_k},{a_k},w)} \right)^2}$
(14)    Every ${T_q}$ time steps, update the target Q-network, i.e., ${w^ - } = w$
(15)    Update the Lagrange multipliers $\beta :\beta \ge 0$ with the stochastic subgradient method
(16)  end for
(17) end for
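The core steps of Table 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: plain lists of Q-values stand in for the CNN output, and the function names, the additive form of the Lagrangian cost, and the step size of the multiplier update are hypothetical choices made only for this example. Because the objective is an energy cost, the greedy action is the arg min of Q, as in the table.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Exploration rule of Table 1: draw p; if p >= epsilon take the
    greedy (minimum-cost) action, otherwise a random non-greedy action."""
    greedy = min(range(len(q_values)), key=q_values.__getitem__)
    p = rng.random()
    if p >= epsilon or len(q_values) == 1:
        return greedy
    others = [a for a in range(len(q_values)) if a != greedy]
    return rng.choice(others)


def lagrangian_cost(energy, usages, bounds, betas):
    """ASSUMED form of g^beta: energy plus multiplier-weighted
    constraint slack (usage minus bound) for each constraint."""
    return energy + sum(b * (u - c) for b, u, c in zip(betas, usages, bounds))


def td_target(cost, next_q_target, gamma):
    """Target value: y = g^beta(r, a) + gamma * min_a' Q(r', a', w^-)."""
    return cost + gamma * min(next_q_target)


def update_multipliers(betas, usages, bounds, step):
    """ASSUMED projected subgradient step; the max() keeps beta >= 0."""
    return [max(0.0, b + step * (u - c)) for b, u, c in zip(betas, usages, bounds)]
```

In a full training loop the target would feed a squared-error loss on the online network, and the target weights would be copied from the online weights every $T_q$ steps.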
Table 2 DQN-based online VNF migration algorithm
(1) for $t = 1,2, \cdots ,T$ do
(2)   /* Network state monitoring */
(3)   Monitor the global state $r(t)$ in the current time slot $t$, including the global queue state $Q(t)$, the global node state $\zeta (t)$, and the global link state $\eta (t)$
(4)   if ${\zeta _h}(t) = 0$ or ${\eta _{h,l}}(t) = 0$
(5)     After migrating every $f \in F$ that satisfies $B(h,f) = 1$ or $P({f_p}|{f_j})B({f_j},h)B({f_p},l) \ne 0$ to other nodes, compute the optimal VNF migration and CPU resource allocation policy $a_t^* = \arg \mathop {\min }\limits_{a \in A} Q({r_t},a,w)$
(6)   else
(7)     Directly compute the optimal VNF migration and CPU resource allocation policy $a_t^* = \arg \mathop {\min }\limits_{a \in A} Q({r_t},a,w)$
(8)   Execute the VNF migration based on the optimal action $a_t^*$ and allocate resources
(9)   $t = t + 1$
(10) end for
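The failure check and action selection in Table 2 can be illustrated as follows. This is a sketch under stated assumptions: `placement` is a hypothetical dict mapping each VNF to its host node, `chains` lists each SFC as an ordered sequence of VNFs (standing in for the adjacency relation $P(f_p|f_j)$), and `q_values` stands in for the trained Q-network output in the current state. Flagging both endpoints of a chain hop that crosses a failed link is one reading of the migration condition, not something the source spells out.

```python
def vnfs_to_evacuate(placement, chains, node_up, link_up):
    """Return the VNFs that must leave their current node.

    placement: dict vnf -> node (the B(h, f) = 1 placement relation)
    chains:    list of VNF sequences; consecutive VNFs are SFC neighbours
    node_up:   dict node -> bool (False means zeta_h(t) = 0)
    link_up:   dict (node, node) -> bool (False means eta_{h,l}(t) = 0)
    """
    # VNFs hosted on a failed node.
    forced = {f for f, h in placement.items() if not node_up[h]}
    # VNF pairs whose chain hop traverses a failed link (assumed reading).
    for chain in chains:
        for f_j, f_p in zip(chain, chain[1:]):
            h, l = placement[f_j], placement[f_p]
            if h != l and not link_up.get((h, l), True):
                forced.update((f_j, f_p))
    return forced


def online_step(q_values, feasible_actions):
    """Pick the feasible action with the smallest Q-value (arg min)."""
    return min(feasible_actions, key=lambda a: q_values[a])
```

When `vnfs_to_evacuate` returns a non-empty set, the feasible action list would be restricted to actions that move those VNFs elsewhere before `online_step` is called.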
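Table 3 Simulation parameters
| Parameter | Value | Parameter | Value |
| --- | --- | --- | --- |
| Number of network slice services $I$ | 3 | Total number of servers $H$ | 8 |
| Number of VNF types $J$ | 10 | Node failure rate | Uniformly distributed with mean in [0.01, 0.02] |
| Time slot length ${T_s}$ | 10 s | Link failure rate | Uniformly distributed with mean in [0.02, 0.04] |
| Packet arrival process | Independent and identically distributed Poisson process | Link transmission delay $\delta $ | 0.5 ms |
| Average packet size $\overline P$ | 500 kbit/packet | Maximum server power $P_h$ | 800 W |
| Node cache space $\chi $ | 300 MB | Server power consumption percentage $u_h$ | 0.3 |
| Number of CPUs per node $\kappa $ | 8 | Maximum number of iteration rounds | 2000 |
| Maximum service rate of a single CPU $\xi $ | 25 MB/s | Total training steps | 200000 |
| Link bandwidth capacity $\Delta $ | 640 Mbps | Learning rate $\alpha $ | 0.0001 |
| Discount factor $\gamma $ | 0.9 | Mini-batch | 8 |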
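As a rough sanity check on the simulation parameters, the per-node, per-link, and per-buffer capacities can be expressed in packets. The short calculation below uses only values from Table 3; the unit conventions (1 MB = 10^6 bytes, 1 kbit = 10^3 bits, 1 Mbps = 10^6 bit/s) and the variable names are assumptions of this example, and the derived figures do not appear in the source.

```python
# Values from Table 3; unit conventions are ASSUMED (decimal prefixes).
CPUS_PER_NODE = 8                      # kappa
CPU_RATE_BYTES_PER_S = 25 * 10**6      # xi = 25 MB/s
PACKET_BITS = 500 * 10**3              # average packet size, 500 kbit
LINK_BITS_PER_S = 640 * 10**6          # link bandwidth, 640 Mbps
BUFFER_BYTES = 300 * 10**6             # node cache space, 300 MB

# Packets per second a node can serve when all CPUs run at full rate.
node_rate_pkts = CPUS_PER_NODE * CPU_RATE_BYTES_PER_S * 8 // PACKET_BITS
# Packets per second a single link can carry.
link_rate_pkts = LINK_BITS_PER_S // PACKET_BITS
# Average-size packets that fit in one node's cache.
buffer_pkts = BUFFER_BYTES * 8 // PACKET_BITS

print(node_rate_pkts, link_rate_pkts, buffer_pkts)  # 3200 1280 4800
```

Under these conventions a single link carries fewer packets per second than a fully provisioned node can process, which suggests why the bandwidth constraint appears alongside the cache constraint in the model.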
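Table 4 CNN parameters
| Layer | Kernel size | Stride | Number of kernels | Activation function |
| --- | --- | --- | --- | --- |
| Convolutional layer 1 | $7 \times 7$ | 2 | 32 | ReLU |
| Convolutional layer 2 | $5 \times 5$ | 2 | 64 | ReLU |
| Convolutional layer 3 | $3 \times 3$ | 1 | 64 | ReLU |
| Fully connected layer 1 | – | – | 512 | ReLU |
| Fully connected layer 2 | – | – | 122 | Linear |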
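To see how the three convolutional layers in Table 4 shrink a state map before it reaches the fully connected layers, the standard output-size formula can be applied layer by layer. The sketch below assumes square inputs and no padding, neither of which the table specifies, and the 64 x 64 input used in the usage note is purely hypothetical.

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of one convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, number of kernels) for the three layers in Table 4.
CONV_LAYERS = [(7, 2, 32), (5, 2, 64), (3, 1, 64)]


def flattened_features(input_size):
    """Number of features fed to the first fully connected layer,
    ASSUMING a square input and zero padding in every layer."""
    size = input_size
    for kernel, stride, _ in CONV_LAYERS:
        size = conv_out(size, kernel, stride)
    return size * size * CONV_LAYERS[-1][2]
```

For a hypothetical 64 x 64 input the spatial size goes 64, 29, 13, 11, so `flattened_features(64)` gives 11 * 11 * 64 = 7744 features entering the 512-unit layer. The 122 linear outputs presumably correspond to one Q-value per action, although the table does not state this.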
[1] GE Xiaohu, TU Song, MAO Guoqiang, et al. 5G ultra-dense cellular networks[J]. IEEE Wireless Communications, 2016, 23(1): 72–79. doi: 10.1109/mwc.2016.7422408.
[2] SUGISONO K, FUKUOKA A, and YAMAZAKI H. Migration for VNF instances forming service chain[C]. The 7th IEEE International Conference on Cloud Networking, Tokyo, Japan, 2018: 1–3. doi: 10.1109/CloudNet.2018.8549194.
[3] ZHENG Qinghua, LI Rui, LI Xiuqi, et al. Virtual machine consolidated placement based on multi-objective biogeography-based optimization[J]. Future Generation Computer Systems, 2016, 54: 95–122. doi: 10.1016/j.future.2015.02.010.
[4] ZHANG Xiaoqing, YUE Qiang, and HE Zhongtang. Dynamic Energy-efficient Virtual Machine Placement Optimization for Virtualized Clouds[M]. JIA Limin, LIU Zhigang, QIN Yong, et al. Proceedings of the 2013 International Conference on Electrical and Information Technologies for Rail Transportation (EITRT2013)-Volume II. Berlin, Heidelberg: Springer, 2014, 288: 439–448. doi: 10.1007/978-3-642-53751-6_47.
[5] ERAMO V, AMMAR M, and LAVACCA F G. Migration energy aware reconfigurations of virtual network function instances in NFV architectures[J]. IEEE Access, 2017, 5: 4927–4938. doi: 10.1109/ACCESS.2017.2685437.
[6] ERAMO V, MIUCCI E, AMMAR M, et al. An approach for service function chain routing and virtual function network instance migration in network function virtualization architectures[J]. IEEE/ACM Transactions on Networking, 2017, 25(4): 2008–2025. doi: 10.1109/TNET.2017.2668470.
[7] WEN Tao, YU Hongfang, SUN Gang, et al. Network function consolidation in service function chaining orchestration[C]. 2016 IEEE International Conference on Communications, Kuala Lumpur, Malaysia, 2016: 1–6. doi: 10.1109/ICC.2016.7510679.
[8] YANG Jian, ZHANG Shuben, WU Xiaomin, et al. Online learning-based server provisioning for electricity cost reduction in data center[J]. IEEE Transactions on Control Systems Technology, 2017, 25(3): 1044–1051. doi: 10.1109/TCST.2016.2575801.
[9] CHENG Aolin, LI Jian, YU Yuling, et al. Delay-sensitive user scheduling and power control in heterogeneous networks[J]. IET Networks, 2015, 4(3): 175–184. doi: 10.1049/iet-net.2014.0026.
[10] LI Rongpeng, ZHAO Zhifeng, CHEN Xianfu, et al. TACT: A transfer actor-critic learning framework for energy saving in cellular radio access networks[J]. IEEE Transactions on Wireless Communications, 2014, 13(4): 2000–2011. doi: 10.1109/TWC.2014.022014.130840.
[11] WANG Shangxing, LIU Hanpeng, GOMES P H, et al. Deep reinforcement learning for dynamic multichannel access in wireless networks[J]. IEEE Transactions on Cognitive Communications and Networking, 2018, 4(2): 257–265. doi: 10.1109/TCCN.2018.2809722.
[12] HUANG Xiaohong, YUAN Tingting, QIAO Guanghua, et al. Deep reinforcement learning for multimedia traffic control in software defined networking[J]. IEEE Network, 2018, 32(6): 35–41. doi: 10.1109/MNET.2018.1800097.
[13] HE Ying, ZHANG Zheng, YU F R, et al. Deep-reinforcement-learning-based optimization for cache-enabled opportunistic interference alignment wireless networks[J]. IEEE Transactions on Vehicular Technology, 2017, 66(11): 10433–10445. doi: 10.1109/TVT.2017.2751641.
[14] GLOROT X and BENGIO Y. Understanding the difficulty of training deep feedforward neural networks[C]. The International Conference on Artificial Intelligence and Statistics, Sardinia, 2010: 249–256.
[15] PERUMAL V and SUBBIAH S. Power-conservative server consolidation based resource management in cloud[J]. International Journal of Network Management, 2014, 24(6): 415–432. doi: 10.1002/nem.1873.
[16] QU Long, ASSI C, SHABAN K, et al. Delay-aware scheduling and resource optimization with network function virtualization[J]. IEEE Transactions on Communications, 2016, 64(9): 3746–3758. doi: 10.1109/TCOMM.2016.2580150.