Virtual Network Function Migration Algorithm Based on Reinforcement Learning for 5G Network Slicing

TANG Lun; ZHOU Yu; TAN Qi; WEI Yannan; CHEN Qianbin

doi:10.11999/JEIT190290

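1.
School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2.
Key Laboratory of Mobile Communication Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Funds: The National Natural Science Foundation of China (61571073), The Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M201800601)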

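Author biographies:
TANG Lun: male, born in 1973, professor and doctoral supervisor. Research interests: next-generation wireless communication networks, heterogeneous cellular networks, and software-defined wireless networks.

ZHOU Yu: male, born in 1993, master's student. Research interests: resource allocation for 5G network slicing and deep learning.

TAN Qi: female, born in 1995, master's student. Research interests: 5G network slicing, resource allocation, and stochastic optimization theory.

WEI Yannan: male, born in 1995, master's student. Research interests: 5G network slicing, virtual resource allocation, and reliability.

CHEN Qianbin: male, born in 1967, professor and doctoral supervisor. Research interests: personal communications, multimedia information processing and transmission, and next-generation mobile communication networks.

Corresponding author:
ZHOU Yu　137068966@qq.com

CLC number: TN929.5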
Publication history
- Received: 2019-04-25
- Revised: 2019-09-11
- Published online: 2019-09-19
- Issue date: 2020-03-19

Virtual Network Function Migration Algorithm Based on Reinforcement Learning for 5G Network Slicing

1.
School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2.
Key Laboratory of Mobile Communication Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Funds: The National Natural Science Foundation of China (61571073), The Science and Technology Research Program of Chongqing Municipal Education Commission (KJZD-M201800601)

Abstract:
To address the Virtual Network Function (VNF) migration optimization problem caused by the dynamics of service requests in the 5G network slicing architecture, a stochastic optimization model based on a Constrained Markov Decision Process (CMDP) is first established to realize the dynamic deployment of multiple types of Service Function Chains (SFCs). The model aims to minimize the average total operating energy consumption of general-purpose servers, subject to an average delay constraint for each slice and to constraints on the average cache and bandwidth resource consumption. Second, because the state transition probabilities of the system are difficult to obtain accurately and the state space of the optimization model is excessively large, an intelligent VNF migration learning algorithm based on the reinforcement learning framework is proposed. The algorithm approximates the action-value function with a Convolutional Neural Network (CNN), and in each discrete time slot it formulates a suitable VNF migration strategy and CPU resource allocation scheme for each network slice according to the current system state. Simulation results show that the proposed algorithm effectively meets the QoS requirements of each slice while reducing the average energy consumption of the infrastructure.
Key words:
- 5G network slicing /
- Virtual Network Function (VNF) migration /
- Reinforcement learning /
- Resource allocation

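Figure 1 VNF migration system scenario under the 5G network slicing architecture

Figure 2 Architecture of the DQN-based intelligent VNF migration learning framework

Figure 3 Average total packet delay of each slice

Figure 4 Average utilization of cache resources and link bandwidth resources

Figure 5 Average total power consumption of the general-purpose servers

Figure 6 Average total slice delay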

Table 1 DQN-based value function approximation

(1) Initialize the Q-network with Xavier^[14] weight initialization, i.e., draw the weights from the uniform distribution $W \sim U\left[ - \dfrac{\sqrt 6}{\sqrt{\upsilon_l + \upsilon_{l+1}}}, \dfrac{\sqrt 6}{\sqrt{\upsilon_l + \upsilon_{l+1}}} \right]$, where $l$ is the layer index and $\upsilon_l$ is the number of neurons in layer $l$; initialize the target Q-network with weights $w^- = w$
(2) Initialize the Lagrange multipliers $\beta_i^d \leftarrow 0, \beta_h^q \leftarrow 0, \beta_{h,l}^x \leftarrow 0$, $\forall i \in I, \forall h,l \in H$, and initialize the experience replay pool
(3)　for episode $k = 1,2,\cdots,K$ do
(4)　　　Initialize $r_1$ with a randomly selected state
(5)　　for $t = 1,2,\cdots,T$ do
(6)　　　Draw a random probability $p$; if $p \ge \varepsilon$
(7)　　　　　compute the VNF migration and CPU resource allocation policy $a_t^* = \arg\min\limits_{a \in A} Q(r_t,a,w)$ and set $a_t = a_t^*$
(8)　　　　 else select a random action $a_t \ne a_t^*$
(9)　　　　Execute action $a_t$, obtain the Lagrangian reward $g^\beta(r_t,a_t)$, and observe the next state $r_{t+1}$
(10)　　　　Store the experience sample $\left( r_t, a_t, g^\beta(r_t,a_t), r_{t+1} \right)$ in the experience replay pool
(11)　　　　Randomly sample a mini-batch of experience samples $\left( r_k, a_k, g^\beta(r_k,a_k), r_{k+1} \right)$ from the replay pool
(12)　　　　Use the target Q-network to obtain $\min\limits_{a' \in A} Q(r_{k+1},a',w^-)$ and compute $y_k = g^\beta(r_k,a_k) + \gamma \min\limits_{a' \in A} Q(r_{k+1},a',w^-)$
(13)　　　　Update $w$ by gradient descent on $\left( y_k - Q(r_k,a_k,w) \right)^2$
(14)　　　　Every $T_q$ time steps, update the target Q-network, i.e., $w^- = w$
(15)　　　　Update the Lagrange multipliers $\beta$, subject to $\beta \ge 0$, by the stochastic subgradient method
(16)　　　end for
(17)　end for
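The control flow of Table 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a lookup table stands in for the CNN Q-network, a single multiplier `beta` stands in for the three families of Lagrange multipliers, the Lagrangian cost is assumed to have the form energy + beta × (delay − bound), and `toy_env_step`, `delay_bound`, `beta_lr`, and `target_period` are hypothetical names introduced only for this example.

```python
import random
from collections import deque


def epsilon_greedy(q_row, epsilon, rng):
    """Steps (6)-(8): minimum-Q action if p >= epsilon, else a random other action."""
    greedy = min(range(len(q_row)), key=lambda a: q_row[a])
    if rng.random() >= epsilon or len(q_row) == 1:
        return greedy
    return rng.choice([a for a in range(len(q_row)) if a != greedy])


def td_target(cost, next_q_row, gamma):
    """Step (12): y = g + gamma * min over a' of the target network's Q(r', a')."""
    return cost + gamma * min(next_q_row)


def toy_env_step(state, action, rng):
    """Hypothetical 2-state environment returning (energy, delay, next_state)."""
    energy = 1.0 if action == 0 else 2.0  # action 1 draws more power ...
    delay = 1.5 if action == 0 else 0.5   # ... but yields a lower delay
    return energy, delay, rng.randrange(2)


def train(env_step, n_states, n_actions, episodes=50, horizon=20, gamma=0.9,
          alpha=0.1, epsilon=0.1, batch_size=8, target_period=10,
          beta_lr=0.01, delay_bound=1.0, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]  # tabular stand-in for Q(r, a, w)
    q_target = [row[:] for row in q]                  # step (1): w^- = w
    beta = 0.0                                        # step (2): multiplier starts at 0
    replay = deque(maxlen=1000)                       # step (2): replay pool
    step = 0
    for _ in range(episodes):                         # step (3)
        r = rng.randrange(n_states)                   # step (4)
        for _ in range(horizon):                      # step (5)
            a = epsilon_greedy(q[r], epsilon, rng)
            energy, delay, r_next = env_step(r, a, rng)       # step (9)
            g = energy + beta * (delay - delay_bound)         # assumed Lagrangian cost
            replay.append((r, a, g, r_next))                  # step (10)
            batch = rng.sample(list(replay), min(batch_size, len(replay)))
            for rk, ak, gk, rk1 in batch:                     # steps (11)-(13)
                y = td_target(gk, q_target[rk1], gamma)
                q[rk][ak] += alpha * (y - q[rk][ak])  # gradient step, factor 2 folded into alpha
            step += 1
            if step % target_period == 0:                     # step (14)
                q_target = [row[:] for row in q]
            # step (15): stochastic subgradient update, projected onto beta >= 0
            beta = max(0.0, beta + beta_lr * (delay - delay_bound))
            r = r_next
    return q, beta
```

Calling `train(toy_env_step, n_states=2, n_actions=2)` returns the learned table and the final multiplier. Because the objective is a cost, both the greedy action and the bootstrap target use a minimum rather than the maximum found in reward-maximizing DQN.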

Table 2 DQN-based online VNF migration algorithm

(1)　for $t = 1,2,\cdots,T$ do
(2)　\\ Network state monitoring \\
(3)　Monitor the global state $r(t)$ in the current time slot $t$, including the global queue state $Q(t)$, the global node state $\zeta(t)$, and the global link state $\eta(t)$
(4)　if $\zeta_h(t) = 0$ or $\eta_{h,l}(t) = 0$
(5)　　　After migrating every $f \in F$ that satisfies $B(h,f) = 1$ or $P(f_p\|f_j)B(f_j,h)B(f_p,l) \ne 0$ to other nodes, compute the optimal VNF migration and CPU resource allocation policy $a_t^* = \arg\min\limits_{a \in A} Q(r_t,a,w)$
(6)　　　else
(7)　　　directly compute the optimal VNF migration and CPU resource allocation policy $a_t^* = \arg\min\limits_{a \in A} Q(r_t,a,w)$
(8)　Perform the VNF migration and allocate resources according to the optimal action $a_t^*$
(9)　 $t = t + 1$
(10)　end for
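One time slot of the online procedure in Table 2 might look as follows. This is a hedged sketch, not the paper's code: it assumes each action is described by the tuple of host nodes it assigns to the VNFs of one chain, that links are directed pairs `(h, l)`, and that "migrating affected VNFs away" can be modeled by discarding every action that still touches a failed node or link. The names `placements`, `node_up`, and `link_up` are hypothetical.

```python
def failed_resources(node_up, link_up):
    """Collect failed nodes (zeta_h = 0) and failed links (eta_{h,l} = 0)."""
    bad_nodes = {h for h, up in enumerate(node_up) if not up}
    bad_links = {hl for hl, up in link_up.items() if not up}
    return bad_nodes, bad_links


def uses_failed(hosts, bad_nodes, bad_links):
    """True if a chain placement touches a failed node or a failed link."""
    if bad_nodes.intersection(hosts):
        return True
    hops = zip(hosts, hosts[1:])  # links between consecutive VNFs of the chain
    return any(h != l and (h, l) in bad_links for h, l in hops)


def online_migration_step(q_values, placements, node_up, link_up):
    """Pick the minimum-Q action, restricted to failure-free placements if needed."""
    bad_nodes, bad_links = failed_resources(node_up, link_up)
    candidates = list(range(len(placements)))
    if bad_nodes or bad_links:  # step (4): some node or link is down
        candidates = [a for a in candidates
                      if not uses_failed(placements[a], bad_nodes, bad_links)]
        if not candidates:
            raise ValueError("no placement avoids the failed nodes and links")
    # steps (5)/(7): a* = argmin over the admissible actions of Q(r_t, a, w)
    return min(candidates, key=lambda a: q_values[a])
```

For example, with `q_values = [0.2, 0.1, 0.5]` and `placements = [(0, 1), (1, 2), (0, 2)]`, the step selects action 1 while everything is up, and falls back to action 2 once node 1 fails.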

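Table 3 Simulation parameters

Parameter	Value	Parameter	Value
Number of network slice services $I$	3	Total number of servers $H$	8
Number of VNF types $J$	10	Node failure rate	Uniformly distributed, mean [0.01, 0.02]
Time slot length ${T_s}$	10 s	Link failure rate	Uniformly distributed, mean [0.02, 0.04]
Packet arrival process	Independent and identically distributed Poisson process	Link transmission delay $\delta$	0.5 ms
Average packet size $\overline P$	500 kbit/packet	Maximum server power $P_h$	800 W
Node cache space $\chi$	300 MB	Server power consumption percentage $u_h$	0.3
Number of CPUs per node $\kappa$	8	Maximum number of training episodes	2000
Maximum service rate of a single CPU $\xi$	25 MB/s	Total training steps	200000
Link bandwidth capacity $\Delta$	640 Mbps	Learning rate $\alpha$	0.0001
Discount factor $\gamma$	0.9	Mini-batch size	8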
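A few derived quantities help put the values in Table 3 in proportion. The sketch below is illustrative arithmetic only; it assumes decimal units (1 kbit = 1000 bit, 1 MB = 10^6 bytes, 1 Mbps = 10^6 bit/s) and equal-length training episodes, neither of which is stated in the table. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Subset of the Table 3 values needed for the derived quantities below."""
    cpus_per_node: int = 8
    cpu_rate_mb_s: float = 25.0
    packet_kbit: float = 500.0
    link_mbps: float = 640.0
    max_episodes: int = 2000
    total_steps: int = 200000

    def node_rate_mb_s(self) -> float:
        # Peak processing rate of one node with all CPUs busy
        return self.cpus_per_node * self.cpu_rate_mb_s

    def packet_mb(self) -> float:
        # kbit -> bit -> byte -> MB (decimal units assumed)
        return self.packet_kbit * 1000 / 8 / 1e6

    def node_packets_per_s(self) -> float:
        return self.node_rate_mb_s() / self.packet_mb()

    def link_packets_per_s(self) -> float:
        return self.link_mbps * 1e6 / (self.packet_kbit * 1e3)

    def steps_per_episode(self) -> int:
        # Assumes the total step budget is split evenly across episodes
        return self.total_steps // self.max_episodes
```

Under these assumptions a fully loaded node can process more average-size packets per second than a single link can carry, so with these parameters the link bandwidth constraint is the tighter of the two.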

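Table 4 CNN parameters

Layer	Kernel size	Stride	Number of kernels	Activation function
Convolutional layer 1	$7 \times 7$	2	32	ReLU
Convolutional layer 2	$5 \times 5$	2	64	ReLU
Convolutional layer 3	$3 \times 3$	1	64	ReLU
Fully connected layer 1	–	–	512	ReLU
Fully connected layer 2	–	–	122	Linear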
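The layer sizes in Table 4 determine how large the feature maps are at each stage and how wide the Xavier initialization range is. The sketch below works both out in plain Python. The table does not give the input size or the padding scheme, so the 84 × 84 input and the unpadded ("valid") convolutions used here are assumptions for illustration only.

```python
import math

# (kernel size, stride, number of kernels) for the three convolutional layers in Table 4
CONV_LAYERS = [(7, 2, 32), (5, 2, 64), (3, 1, 64)]


def conv_out(size, kernel, stride):
    """Output length of an unpadded convolution along one dimension."""
    return (size - kernel) // stride + 1


def feature_shapes(height, width):
    """Return (channels, height, width) after each convolutional layer."""
    shapes = []
    for kernel, stride, filters in CONV_LAYERS:
        height = conv_out(height, kernel, stride)
        width = conv_out(width, kernel, stride)
        if height < 1 or width < 1:
            raise ValueError("input is too small for the convolution stack")
        shapes.append((filters, height, width))
    return shapes


def xavier_bound(fan_in, fan_out):
    """Half-width of the uniform range: sqrt(6) / sqrt(fan_in + fan_out)."""
    return math.sqrt(6.0) / math.sqrt(fan_in + fan_out)


shapes = feature_shapes(84, 84)           # hypothetical 84 x 84 state image
channels, h, w = shapes[-1]
flattened = channels * h * w              # input width of fully connected layer 1
fc2_bound = xavier_bound(512, 122)        # last layer: 512 neurons -> 122 outputs
```

With the assumed input, the three convolutions produce 39 × 39, 18 × 18, and 16 × 16 maps, so fully connected layer 1 receives 64 × 16 × 16 = 16384 features. The 122 outputs of the last layer give one Q-value per action.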

References (16)

[1] GE Xiaohu, TU Song, MAO Guoqiang, et al. 5G ultra-dense cellular networks[J]. IEEE Wireless Communications, 2016, 23(1): 72–79. doi: 10.1109/mwc.2016.7422408

[2] SUGISONO K, FUKUOKA A, and YAMAZAKI H. Migration for VNF instances forming service chain[C]. The 7th IEEE International Conference on Cloud Networking, Tokyo, Japan, 2018: 1–3. doi: 10.1109/CloudNet.2018.8549194

[3] ZHENG Qinghua, LI Rui, LI Xiuqi, et al. Virtual machine consolidated placement based on multi-objective biogeography-based optimization[J]. Future Generation Computer Systems, 2016, 54: 95–122. doi: 10.1016/j.future.2015.02.010

[4] ZHANG Xiaoqing, YUE Qiang, and HE Zhongtang. Dynamic Energy-efficient Virtual Machine Placement Optimization for Virtualized Clouds[M]. JIA Limin, LIU Zhigang, QIN Yong, et al. Proceedings of the 2013 International Conference on Electrical and Information Technologies for Rail Transportation (EITRT2013)-Volume II. Berlin, Heidelberg: Springer, 2014, 288: 439–448. doi: 10.1007/978-3-642-53751-6_47

[5] ERAMO V, AMMAR M, and LAVACCA F G. Migration energy aware reconfigurations of virtual network function instances in NFV architectures[J]. IEEE Access, 2017, 5: 4927–4938. doi: 10.1109/ACCESS.2017.2685437

[6] ERAMO V, MIUCCI E, AMMAR M, et al. An approach for service function chain routing and virtual function network instance migration in network function virtualization architectures[J]. IEEE/ACM Transactions on Networking, 2017, 25(4): 2008–2025. doi: 10.1109/TNET.2017.2668470

[7] WEN Tao, YU Hongfang, SUN Gang, et al. Network function consolidation in service function chaining orchestration[C]. 2016 IEEE International Conference on Communications, Kuala Lumpur, Malaysia, 2016: 1–6. doi: 10.1109/ICC.2016.7510679

[8] YANG Jian, ZHANG Shuben, WU Xiaomin, et al. Online learning-based server provisioning for electricity cost reduction in data center[J]. IEEE Transactions on Control Systems Technology, 2017, 25(3): 1044–1051. doi: 10.1109/TCST.2016.2575801

[9] CHENG Aolin, LI Jian, YU Yuling, et al. Delay-sensitive user scheduling and power control in heterogeneous networks[J]. IET Networks, 2015, 4(3): 175–184. doi: 10.1049/iet-net.2014.0026

[10] LI Rongpeng, ZHAO Zhifeng, CHEN Xianfu, et al. TACT: A transfer actor-critic learning framework for energy saving in cellular radio access networks[J]. IEEE Transactions on Wireless Communications, 2014, 13(4): 2000–2011. doi: 10.1109/TWC.2014.022014.130840

[11] WANG Shangxing, LIU Hanpeng, GOMES P H, et al. Deep reinforcement learning for dynamic multichannel access in wireless networks[J]. IEEE Transactions on Cognitive Communications and Networking, 2018, 4(2): 257–265. doi: 10.1109/TCCN.2018.2809722

[12] HUANG Xiaohong, YUAN Tingting, QIAO Guanghua, et al. Deep reinforcement learning for multimedia traffic control in software defined networking[J]. IEEE Network, 2018, 32(6): 35–41. doi: 10.1109/MNET.2018.1800097

[13] HE Ying, ZHANG Zheng, YU F R, et al. Deep-reinforcement-learning-based optimization for cache-enabled opportunistic interference alignment wireless networks[J]. IEEE Transactions on Vehicular Technology, 2017, 66(11): 10433–10445. doi: 10.1109/TVT.2017.2751641

[14] GLOROT X and BENGIO Y. Understanding the difficulty of training deep feedforward neural networks[C]. The International Conference on Artificial Intelligence and Statistics, Sardinia, 2010: 249–256.

[15] PERUMAL V and SUBBIAH S. Power-conservative server consolidation based resource management in cloud[J]. International Journal of Network Management, 2014, 24(6): 415–432. doi: 10.1002/nem.1873

[16] QU Long, ASSI C, SHABAN K, et al. Delay-aware scheduling and resource optimization with network function virtualization[J]. IEEE Transactions on Communications, 2016, 64(9): 3746–3758. doi: 10.1109/TCOMM.2016.2580150
