Online learning of shaping rewards in reinforcement learning

Abstract

Potential-based reward shaping has been shown to be a powerful method for improving the convergence rate of reinforcement learning agents. It is a flexible technique for incorporating background knowledge into temporal-difference learning in a principled way. However, the question remains of how to compute the potential function that is used to shape the reward given to the learning agent. In this paper, we show how, in the absence of knowledge to define the potential function manually, this function can be learned online in parallel with the actual reinforcement learning process. Two cases are considered. The first solution, which is based on multi-grid discretisation, is designed for model-free reinforcement learning. In the second case, an approach for the prototypical model-based R-max algorithm is proposed. It learns the potential function using the free space assumption about the transitions in the environment. Two novel algorithms are presented and evaluated. © 2010 Elsevier Ltd. All rights reserved.

Metadata

Item Type:	Article
Authors/Creators:	Grzes, Marek; Kudenko, Daniel https://orcid.org/0000-0003-3359-3255
Copyright, Publisher and Additional Information:	The 18th International Conference on Artificial Neural Networks, ICANN 2008
Keywords:	Potential-based reward shaping, Reinforcement learning, Learning heuristic, TIME
Dates:	Published: May 2010
Institution:	The University of York
Academic Units:	The University of York > Faculty of Sciences (York) > Computer Science (York)
Depositing User:	Pure (York)
Date Deposited:	07 Jun 2012 12:21
Last Modified:	19 Sep 2025 23:29
Published Version:	https://doi.org/10.1016/j.neunet.2010.01.001
Status:	Published
Refereed:	Yes
Identification Number:	10.1016/j.neunet.2010.01.001
Related URLs:	http://www.scopus.com/inward/record.url?... http://www.sciencedirect.com/science/art...
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:47293
