{"id":118474,"date":"2021-10-11T22:01:10","date_gmt":"2021-10-11T13:01:10","guid":{"rendered":"http:\/\/175.125.95.178\/ai-in-communication\/18474\/"},"modified":"2026-07-23T05:57:39","modified_gmt":"2026-07-22T20:57:39","slug":"18474","status":"publish","type":"ai-in-communication","link":"http:\/\/ee.presscat.kr\/en\/ai-in-communication\/18474\/","title":{"rendered":"Seungyul Han and Youngchul Sung, &quot;Diversity actor-critic: Sample-aware entropy regularization for sample-efficient exploration,&quot; to be presented at International Conference on Machine Learning (ICML) 2021, Jul. 2021"},"content":{"rendered":"<p style=\"text-align:justify;margin-bottom:11px\"><span style=\"font-size:10pt\"><span style=\"line-height:107%\"><span>In this paper, sample-aware policy entropy regularization is proposed to enhance the conventional policy entropy regularization for better exploration. Exploiting the sample distribution obtainable from the replay buffer, the proposed sample-aware entropy regularization maximizes the entropy of the weighted sum of the policy action distribution and the sample action distribution from the replay buffer for sample-efficient exploration. A practical algorithm named diversity actor-critic (DAC) is developed by applying policy iteration to the objective function with the proposed sample-aware entropy regularization. 
Numerical results show that DAC significantly outperforms recent reinforcement learning algorithms.<\/span><\/span><\/span><\/p>\n<p style=\"text-align:justify;margin-bottom:11px\">&nbsp;<\/p>\n<p style=\"text-align:justify;margin-bottom:11px\"><span style=\"font-size:10pt\"><span style=\"line-height:107%\"><span><\/p>\n<div class=\"\"><img decoding=\"async\" class=\"\" src=\"\/wp-content\/uploads\/drupal\/\uc131\uc601\ucca0\uad50\uc218\ub2d81.png\" alt=\"\" title=\"\"><\/div>\n<p><\/span><\/span><\/span><\/p>\n<p style=\"text-align:justify;margin-bottom:11px\"><span style=\"font-size:10pt\"><span style=\"line-height:107%\"><span><\/p>\n<div class=\"\"><img decoding=\"async\" class=\"\" src=\"\/wp-content\/uploads\/drupal\/\uc131\uc601\ucca0\uad50\uc218\ub2d82.png\" alt=\"\" title=\"\"><\/div>\n<p><\/span><\/span><\/span><\/p>\n<p style=\"text-align:justify;margin-bottom:11px\"><span style=\"font-size:10pt\"><span style=\"line-height:107%\"><span><\/p>\n<div class=\"\"><img decoding=\"async\" class=\"\" src=\"\/wp-content\/uploads\/drupal\/\uc131\uc601\ucca0\uad50\uc218\ub2d83.png\" alt=\"\" title=\"\"><\/div>\n<p><\/span><\/span><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>768<\/p>\n","protected":false},"featured_media":0,"template":"","class_list":["post-118474","ai-in-communication","type-ai-in-communication","status-publish","hentry"],"acf":[],"_links":{"self":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/ai-in-communication\/118474","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/ai-in-communication"}],"about":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/types\/ai-in-communication"}],"wp:attachment":[{"href":"http:\/\/ee.presscat.kr\/en\/wp-json\/wp\/v2\/media?parent=118474"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}