Devlin, Sam Michael orcid.org/0000-0002-7769-3090 and Kudenko, Daniel orcid.org/0000-0003-3359-3255 (2012) Dynamic Potential-Based Reward Shaping. In: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2012), 04-08 Jun 2012, Spain. IFAAMAS, pp. 433-440.
Abstract
Potential-based reward shaping can significantly improve the time needed to learn an optimal policy and, in multi-agent systems, the performance of the final joint-policy. It has been proven to not alter the optimal policy of an agent learning alone or the Nash equilibria of multiple agents learning together. However, a limitation of existing proofs is the assumption that the potential of a state does not change dynamically during the learning. This assumption often is broken, especially if the reward-shaping function is generated automatically. In this paper we prove and demonstrate a method of extending potential-based reward shaping to allow dynamic shaping and maintain the guarantees of policy invariance in the single-agent case and consistent Nash equilibria in the multi-agent case.
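To illustrate the idea the abstract describes, the sketch below applies dynamic potential-based shaping in tabular Q-learning on a small chain MDP. The shaping term takes the time-indexed form F = γΦ(s', t') − Φ(s, t), i.e. the potential is allowed to change during learning. The environment, the particular decaying potential function, and all hyperparameters are illustrative assumptions, not from the paper:

```python
import random

GAMMA = 0.9

def shaped_reward(r, phi, s, t, s_next, t_next):
    """Dynamic potential-based shaping: add F = gamma*Phi(s',t') - Phi(s,t)
    to the environment reward r. The potential Phi may vary with time t."""
    return r + GAMMA * phi(s_next, t_next) - phi(s, t)

def phi(s, t):
    # Hypothetical time-varying potential: a distance-based hint toward the
    # goal state that decays as learning proceeds (chosen for illustration).
    return s * max(0.0, 1.0 - t / 5000.0)

def q_learning(episodes=500, alpha=0.5, eps=0.1, n=5):
    """Tabular Q-learning on an n-state chain; goal is the rightmost state."""
    random.seed(0)
    Q = {(s, a): 0.0 for s in range(n) for a in (0, 1)}  # 0 = left, 1 = right
    t = 0  # global step counter indexing the dynamic potential
    for _ in range(episodes):
        s = 0
        while s != n - 1:
            if random.random() < eps:
                a = random.choice((0, 1))
            else:
                a = max((0, 1), key=lambda x: Q[(s, x)])
            s2 = min(n - 1, s + 1) if a == 1 else max(0, s - 1)
            r = 1.0 if s2 == n - 1 else 0.0
            rs = shaped_reward(r, phi, s, t, s2, t + 1)
            target = rs if s2 == n - 1 else rs + GAMMA * max(Q[(s2, 0)], Q[(s2, 1)])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, t = s2, t + 1
    return Q

Q = q_learning()
policy = [max((0, 1), key=lambda x: Q[(s, x)]) for s in range(4)]
print(policy)  # greedy action in each non-terminal state
```

Because the added term is a difference of potentials (discounted next-state potential minus current-state potential), the shaping telescopes along trajectories, which is the mechanism behind the policy-invariance guarantee the paper extends to time-varying Φ.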
Metadata
Item Type: | Proceedings Paper
---|---
Authors/Creators: | Devlin, Sam Michael; Kudenko, Daniel
Dates: | Published: 2012
Institution: | The University of York
Academic Units: | The University of York > Faculty of Sciences (York) > Computer Science (York)
Depositing User: | Pure (York)
Date Deposited: | 14 Sep 2013 00:40
Last Modified: | 02 Apr 2025 23:31
Status: | Published
Publisher: | IFAAMAS
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:75121
Downloads
- aamas2012.pdf
- p433_devlin.pdf (p433-devlin)