doi: 10.11999/JEIT170347
Cloud Reasoning Model-Based Exploration for Deep Reinforcement Learning
1. Institute of Command Information System, PLA University of Science and Technology, Nanjing 210007, China
2. College of Mechanical Engineering, Zhejiang University, Hangzhou 310027, China
Funding: The Advanced Research Fund of China Electronics Technology Group Corporation (6141B08010101); China Postdoctoral Science Foundation (2015T81081, 2016M602974); Jiangsu Natural Science Foundation for Youths (BK20140075)
Abstract: Reinforcement learning, which has self-learning and online learning properties, obtains a task's decision policy through interaction with the environment. However, its trial-and-error mechanism often results in low learning efficiency and slow convergence. Knowledge encodes human experience and cognitive regularities about the world, and using it to guide the agent's learning is an effective remedy for these problems. This paper introduces qualitative rule knowledge into reinforcement learning: the rules are represented by a cloud reasoning model and serve as an exploration strategy that guides the agent's action selection, reducing blind exploration of the state-action space. Empirical evaluation in a customized CartPole-v2 task in the OpenAI Gym environment shows that the proposed cloud reasoning model-based exploration strategy improves learning efficiency and accelerates convergence.
Key words:
- Cloud reasoning /
- Deep reinforcement learning /
- Knowledge /
- Exploration strategy
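The core idea described above, representing a qualitative rule (e.g., "if the pole leans right, push right") as a cloud concept and using the resulting certainty degree to decide between the rule's action and the learned greedy action, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names and all numeric cloud parameters (Ex, En, He) are assumptions chosen for the CartPole setting.

```python
import math
import random

def cloud_certainty(x, ex, en, he, rng=random):
    """X-condition normal cloud generator (Li Deyi's cloud model):
    draw a random entropy En' ~ N(En, He^2), then return the certainty
    degree of input x for the qualitative concept (Ex, En, He)."""
    en_prime = abs(rng.gauss(en, he)) or 1e-9  # guard against a zero entropy draw
    return math.exp(-(x - ex) ** 2 / (2 * en_prime ** 2))

def cloud_guided_action(pole_angle, greedy_action, rng=random):
    """Cloud-reasoning exploration sketch: activate the rule
    'IF the pole leans right THEN push right' with a probability equal
    to the cloud certainty degree of the current pole angle; otherwise
    fall back to the greedy action from the learned Q-function."""
    # Antecedent concept "pole angle is positive-large" (parameters assumed).
    mu = cloud_certainty(pole_angle, ex=0.10, en=0.04, he=0.005, rng=rng)
    rule_action = 1 if pole_angle > 0 else 0  # CartPole: 0 = push left, 1 = push right
    return rule_action if rng.random() < mu else greedy_action
```

Because the certainty degree is sampled through a random entropy, the rule fires softly: near the concept's expectation the rule almost always overrides the greedy action, while far from it the agent defers to its learned policy, which is what reduces blind exploration of the state-action space.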