Devlin, Sam Michael orcid.org/0000-0002-7769-3090 and Kudenko, Daniel orcid.org/0000-0003-3359-3255 (2012) Dynamic Potential-Based Reward Shaping. In: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2012), 04-08 Jun 2012, Spain. IFAAMAS, pp. 433-440.
Abstract
Potential-based reward shaping can significantly improve the time needed to learn an optimal policy and, in multi-agent systems, the performance of the final joint-policy. It has been proven to not alter the optimal policy of an agent learning alone or the Nash equilibria of multiple agents learning together. However, a limitation of existing proofs is the assumption that the potential of a state does not change dynamically during the learning. This assumption often is broken, especially if the reward-shaping function is generated automatically. In this paper we prove and demonstrate a method of extending potential-based reward shaping to allow dynamic shaping and maintain the guarantees of policy invariance in the single-agent case and consistent Nash equilibria in the multi-agent case.
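To illustrate the idea the abstract describes, the sketch below applies dynamic potential-based shaping in tabular Q-learning on a small chain MDP. The shaping term takes the time-indexed form F = γΦ(s', t') − Φ(s, t), i.e. the potential is allowed to change during learning. The environment, the particular decaying potential function, and all hyperparameters are illustrative assumptions, not from the paper:

```python
import random

GAMMA = 0.9

def shaped_reward(r, phi, s, t, s_next, t_next):
    """Dynamic potential-based shaping: add F = gamma*Phi(s',t') - Phi(s,t)
    to the environment reward r. The potential Phi may vary with time t."""
    return r + GAMMA * phi(s_next, t_next) - phi(s, t)

def phi(s, t):
    # Hypothetical time-varying potential: a distance-based hint toward the
    # goal state that decays as learning proceeds (chosen for illustration).
    return s * max(0.0, 1.0 - t / 5000.0)

def q_learning(episodes=500, alpha=0.5, eps=0.1, n=5):
    """Tabular Q-learning on an n-state chain; goal is the rightmost state."""
    random.seed(0)
    Q = {(s, a): 0.0 for s in range(n) for a in (0, 1)}  # 0 = left, 1 = right
    t = 0  # global step counter indexing the dynamic potential
    for _ in range(episodes):
        s = 0
        while s != n - 1:
            if random.random() < eps:
                a = random.choice((0, 1))
            else:
                a = max((0, 1), key=lambda x: Q[(s, x)])
            s2 = min(n - 1, s + 1) if a == 1 else max(0, s - 1)
            r = 1.0 if s2 == n - 1 else 0.0
            rs = shaped_reward(r, phi, s, t, s2, t + 1)
            target = rs if s2 == n - 1 else rs + GAMMA * max(Q[(s2, 0)], Q[(s2, 1)])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, t = s2, t + 1
    return Q

Q = q_learning()
policy = [max((0, 1), key=lambda x: Q[(s, x)]) for s in range(4)]
print(policy)  # greedy action in each non-terminal state
```

Because the added term is a difference of potentials (discounted next-state potential minus current-state potential), the shaping telescopes along trajectories, which is the mechanism behind the policy-invariance guarantee the paper extends to time-varying Φ.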
Metadata
Item Type: | Proceedings Paper
---|---
Authors/Creators: | Devlin, Sam Michael; Kudenko, Daniel
Dates: | Published: 2012
Institution: | The University of York
Academic Units: | The University of York > Faculty of Sciences (York) > Computer Science (York)
Depositing User: | Pure (York)
Date Deposited: | 14 Sep 2013 00:40
Last Modified: | 02 Apr 2025 23:31
Status: | Published
Publisher: | IFAAMAS
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:75121
Downloads
- aamas2012.pdf
- p433_devlin.pdf (p433-devlin)