Cloud Reasoning Model-Based Exploration for Deep Reinforcement Learning

LI Chenxi, CAO Lei, CHEN Xiliang, ZHANG Yongliang, XU Zhixiong, PENG Hui, DUAN Liwen

Citation: LI Chenxi, CAO Lei, CHEN Xiliang, ZHANG Yongliang, XU Zhixiong, PENG Hui, DUAN Liwen. Cloud Reasoning Model-Based Exploration for Deep Reinforcement Learning[J]. Journal of Electronics & Information Technology, 2018, 40(1): 244-248. doi: 10.11999/JEIT170347

doi: 10.11999/JEIT170347

Funds: The Advanced Research of China Electronics Technology Group Corporation (6141B08010101), China Postdoctoral Science Foundation (2015T81081, 2016M602974), The Jiangsu Natural Science Foundation for Youths (BK20140075)

  • Abstract: Reinforcement learning acquires a decision policy for a task through interaction with the environment, and is characterized by self-learning and online learning. However, its trial-and-error mechanism often leads to low running efficiency and slow convergence. Knowledge captures human experience and the regularities by which people understand things, and using knowledge to guide the agent's learning is an effective way to address these problems. This paper introduces qualitative rule knowledge into reinforcement learning: qualitative rules are represented with a cloud reasoning model and used as an exploration strategy to guide the agent's action selection, reducing blind exploration of the state-action space. With OpenAI Gym as the test environment, experiments in a customized CartPole-v2 task verify the effectiveness of the proposed cloud reasoning model-based exploration strategy, which improves the learning efficiency of reinforcement learning and accelerates convergence.
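The abstract describes the mechanism only at a high level. The sketch below gives one plausible Python reading of it, assuming Li Deyi's normal cloud model (Ex, En, He) and an epsilon-greedy-style policy whose exploratory branch is steered by qualitative rules instead of a uniform random draw. The names NormalCloud and cloud_guided_action, the CartPole feature index, and all rule parameters are illustrative assumptions, not the paper's actual rule base.

    import math
    import random

    class NormalCloud:
        """One-dimensional normal cloud model (Ex, En, He) for a qualitative concept."""
        def __init__(self, ex, en, he):
            self.ex, self.en, self.he = ex, en, he

        def membership(self, x):
            # X-conditional cloud generator: perturb the entropy, then compute
            # the certainty degree of x as a single cloud drop.
            en_prime = random.gauss(self.en, self.he)
            if en_prime == 0:
                return 1.0 if x == self.ex else 0.0
            return math.exp(-((x - self.ex) ** 2) / (2.0 * en_prime ** 2))

    def cloud_guided_action(state, q_values, rules, epsilon):
        """Exploration step: with probability epsilon, consult the qualitative
        rules instead of exploring uniformly at random."""
        if random.random() < epsilon:
            # Each rule = (cloud over one state feature, feature index, suggested action).
            certainty, action = max(
                (cloud.membership(state[idx]), act) for cloud, idx, act in rules
            )
            if random.random() < certainty:          # accept the rule's advice
                return action                         # in proportion to its activation
            return random.randrange(len(q_values))   # otherwise fall back to random
        return max(range(len(q_values)), key=lambda a: q_values[a])  # greedy action

    # Hypothetical CartPole rules: feature 2 is the pole angle; "leaning right ->
    # push right (action 1)" plus the mirrored rule for leaning left (action 0).
    RULES = [
        (NormalCloud(ex=0.10, en=0.05, he=0.01), 2, 1),
        (NormalCloud(ex=-0.10, en=0.05, he=0.01), 2, 0),
    ]

In this reading, a rule's certainty degree both selects which rule fires and gates whether its advice is accepted, so noisy cloud drops occasionally yield random actions and exploration is never fully suppressed.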
Publication History
  • Received:  2017-04-18
  • Revised:  2017-09-30
  • Published:  2018-01-19
