Policy Invariance under Reward Transformations for Multi-Objective Reinforcement Learning

Abstract

Reinforcement Learning (RL) is a powerful and well-studied Machine Learning paradigm, where an agent learns to improve its performance in an environment by maximising a reward signal. In multi-objective Reinforcement Learning (MORL) the reward signal is a vector, where each component represents the performance on a different objective. Reward shaping is a well-established family of techniques that have been successfully used to improve the performance and learning speed of RL agents in single-objective problems. The basic premise of reward shaping is to add a shaping reward to the reward naturally received from the environment, to incorporate domain knowledge and guide an agent’s exploration. Potential-Based Reward Shaping (PBRS) is a specific form of reward shaping that offers additional guarantees. In this paper, we extend the theoretical guarantees of PBRS to MORL problems. Specifically, we provide a theoretical proof that PBRS does not alter the true Pareto front in both single- and multi-agent MORL. We also contribute the first published empirical studies of the effect of PBRS in single- and multi-agent MORL problems.
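
The shaping idea described in the abstract can be illustrated with a minimal sketch. It assumes the standard potential-based form from the single-objective literature, F(s, s') = gamma * Phi(s') - Phi(s), applied independently to each component of the reward vector; this record does not reproduce the paper's exact formulation, so the per-objective application, the integer state type, and all function names below are hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence

State = int  # hypothetical state type: a position on a 1-D line
Potential = Callable[[State], float]


def shaping_term(phi: Potential, s: State, s_next: State, gamma: float) -> float:
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * phi(s_next) - phi(s)


def shaped_reward(
    reward: Sequence[float],
    s: State,
    s_next: State,
    potentials: Sequence[Potential],
    gamma: float,
) -> List[float]:
    """Add one shaping term per objective to the environment reward vector.

    Assumption: each objective i has its own potential function Phi_i.
    """
    if len(reward) != len(potentials):
        raise ValueError("need exactly one potential function per objective")
    return [
        r + shaping_term(phi, s, s_next, gamma)
        for r, phi in zip(reward, potentials)
    ]


def discounted_shaping_sum(
    states: Sequence[State], phi: Potential, gamma: float
) -> float:
    """Discounted sum of the shaping terms along a trajectory of states."""
    return sum(
        gamma ** t * shaping_term(phi, states[t], states[t + 1], gamma)
        for t in range(len(states) - 1)
    )
```

Along any trajectory s_0, ..., s_T the discounted shaping terms telescope to gamma^T * Phi(s_T) - Phi(s_0), so two trajectories of equal length with the same start and end states receive the same total shaping bonus whatever path lies between them. This is the intuition behind the invariance guarantees the abstract refers to; it is not a substitute for the paper's proof.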

Metadata

Item Type:	Article
Authors/Creators:	Mannion, Patrick; Devlin, Sam (https://orcid.org/0000-0002-7769-3090); Mason, Karl; Duggan, Jim; Howley, Enda
Copyright, Publisher and Additional Information:	© 2017 Elsevier B.V. This is an author-produced version of the published paper. Uploaded in accordance with the publisher’s self-archiving policy.
Keywords:	Reinforcement Learning, Multi-Objective, Reward Shaping
Dates:	Accepted: 15 May 2017; Published (online): 16 June 2017; Published: 8 November 2017
Institution:	The University of York
Academic Units:	The University of York > Faculty of Sciences (York) > Computer Science (York)
Date Deposited:	21 Jun 2017 16:45
Last Modified:	12 Dec 2025 14:05
Published Version:	https://doi.org/10.1016/j.neucom.2017.05.090
Status:	Published
Refereed:	Yes
Identification Number:	10.1016/j.neucom.2017.05.090
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:118049

Download

Accepted Version

Filename: MO_PBRS_R3.pdf

Description: MO_PBRS_R3

Licence: CC-BY-NC-ND 2.5
