{"id":126782,"date":"2021-12-19T21:17:55","date_gmt":"2021-12-19T12:17:55","guid":{"rendered":"http:\/\/175.125.95.178\/ai-in-communication\/22860\/"},"modified":"2026-04-13T00:30:24","modified_gmt":"2026-04-12T15:30:24","slug":"22860","status":"publish","type":"ai-in-communication","link":"http:\/\/ee.presscat.kr\/en\/ai-in-communication\/22860\/","title":{"rendered":"Seungyul Han and Youngchul Sung, &quot;A max-min entropy framework for reinforcement learning,&quot; accepted to Conference on Neural Information Processing Systems (NeurIPS) 2021"},"content":{"rendered":"<p style=\"text-align:justify;margin-bottom:11px\"><span style=\"font-size:10pt\"><span style=\"line-height:107%\"><span>Abstract<\/span><\/span><\/span><\/p>\n<p style=\"text-align:justify;margin-bottom:11px\"><span style=\"font-size:10pt\"><span style=\"line-height:107%\"><span>In this paper, we propose a max-min entropy framework for reinforcement learning (RL) to overcome a limitation of the soft actor-critic (SAC) algorithm, which implements maximum entropy RL in model-free, sample-based learning. Whereas maximum entropy RL guides policies toward states with high entropy in the future, the proposed max-min entropy framework aims to learn to visit states with low entropy and to maximize the entropy of those low-entropy states, thereby promoting better exploration. For general Markov decision processes (MDPs), an efficient algorithm is constructed under the proposed max-min entropy framework based on the disentanglement of exploration and exploitation.
Numerical results show that the proposed algorithm yields a substantial performance improvement over current state-of-the-art RL algorithms.<\/span><\/span><\/span><\/p>\n<p style=\"text-align:justify;margin-bottom:11px\">&nbsp;<\/p>\n<p style=\"text-align:justify;margin-bottom:11px\"><span style=\"font-size:10pt\"><span style=\"line-height:107%\"><span><\/p>\n<div class=\"\"><img decoding=\"async\" class=\"\" src=\"\/wp-content\/uploads\/drupal\/\uc131\uc601\ucca01.png\" alt=\"\" title=\"\"><\/div>\n<p><\/span><\/span><\/span><\/p>\n<p style=\"text-align:justify;margin-bottom:11px\"><span style=\"font-size:10pt\"><span style=\"line-height:107%\"><span><\/p>\n<div class=\"\"><img decoding=\"async\" class=\"\" src=\"\/wp-content\/uploads\/drupal\/\uc131\uc601\ucca02.png\" alt=\"\" title=\"\"><\/div>\n<p><\/span><\/span><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>1709<\/p>\n","protected":false},"featured_media":0,"template":"","class_list":["post-126782","ai-in-communication","type-ai-in-communication","status-publish","hentry"],"acf":[],"_links":{"self":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/ai-in-communication\/126782","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/ai-in-communication"}],"about":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/types\/ai-in-communication"}],"wp:attachment":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/media?parent=126782"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}