Devlin, Sam orcid.org/0000-0002-7769-3090 and Kudenko, Daniel orcid.org/0000-0003-3359-3255 (2011) Theoretical Considerations of Potential-Based Reward Shaping for Multi-Agent Systems. In: The 10th International Conference on Autonomous Agents and Multiagent Systems. Tenth International Conference on Autonomous Agents and Multi-Agent Systems, 02 May 2011 ACM, TWN, pp. 225-232.
Abstract
Potential-based reward shaping has previously been proven to both be equivalent to Q-table initialisation and guarantee policy invariance in single-agent reinforcement learning. The method has since been used in multi-agent reinforcement learning without consideration of whether the theoretical equivalence and guarantees hold. This paper extends the existing proofs to similar results in multi-agent systems, providing the theoretical background to explain the success of previous empirical studies. Specifically, it is proven that the equivalence to Q-table initialisation remains and the Nash Equilibria of the underlying stochastic game are not modified. Furthermore, we demonstrate empirically that potential-based reward shaping affects exploration and, consequentially, can alter the joint policy converged upon.
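As an illustration of the single-agent equivalence that the abstract refers to, the following minimal sketch trains two tabular Q-learners on the same experience: one receives the shaping term F(s, s') = γΦ(s') − Φ(s) and starts from a zero Q-table, the other receives no shaping but starts from Q(s, a) = Φ(s). The chain environment, the potential function, the learning parameters and all function names are assumptions made for this example and are not taken from the paper, and the sketch does not cover the multi-agent setting the paper addresses.

```python
import random

GAMMA = 0.9          # discount factor (assumed value)
ALPHA = 0.5          # learning rate (assumed value)
N_STATES = 5         # hypothetical chain of states 0..4
ACTIONS = (0, 1)     # 0 = move left, 1 = move right


def potential(s):
    # Hypothetical potential: states closer to the goal score higher.
    return float(s)


def shaping(s, s_next):
    # Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s).
    return GAMMA * potential(s_next) - potential(s)


def step(s, a):
    # Toy chain dynamics: reward 1 on reaching the last state, else 0.
    s_next = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward


def q_update(q, s, a, r, s_next):
    # Standard tabular Q-learning update. To keep the sketch short,
    # the goal state is bootstrapped from like any other state.
    target = r + GAMMA * max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += ALPHA * (target - q[(s, a)])


def run(n_steps=200, seed=0):
    rng = random.Random(seed)
    keys = [(s, a) for s in range(N_STATES) for a in ACTIONS]
    q_shaped = {k: 0.0 for k in keys}            # zero init, shaped reward
    q_init = {k: potential(k[0]) for k in keys}  # Phi init, plain reward
    s = 0
    for _ in range(n_steps):
        a = rng.choice(ACTIONS)                  # same experience for both
        s_next, r = step(s, a)
        q_update(q_shaped, s, a, r + shaping(s, s_next), s_next)
        q_update(q_init, s, a, r, s_next)
        # Restart the episode after reaching the goal.
        s = 0 if s_next == N_STATES - 1 else s_next
    return q_shaped, q_init


if __name__ == "__main__":
    q_shaped, q_init = run()
    gap = max(abs(q_init[k] - (q_shaped[k] + potential(k[0]))) for k in q_shaped)
    print("largest deviation from Q_init = Q_shaped + Phi:", gap)
```

Under these assumptions the two tables differ by exactly Φ(s) after every update, up to floating-point error, so any action-selection rule that depends only on differences between action values in a state would choose identically for both learners.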
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Devlin, Sam; Kudenko, Daniel |
Institution: | The University of York |
Academic Units: | The University of York > Faculty of Sciences (York) > Computer Science (York) |
Depositing User: | Pure (York) |
Date Deposited: | 19 Feb 2013 16:23 |
Last Modified: | 02 Apr 2025 23:31 |
Status: | Published |
Publisher: | ACM |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:75111 |
Downloads
Filename: aamas11.pdf
Filename: p225_devlin.pdf
Description: p225-devlin