Srinivasan, AR orcid.org/0000-0001-9280-7837 and Chakraborty, S (2016) Path planning with user route preference - A reward surface approximation approach using orthogonal Legendre polynomials. In: Proceedings of 2016 IEEE International Conference on Automation Science and Engineering (CASE), 21-25 Aug 2016, Fort Worth, Texas, USA. IEEE. ISBN 978-1-5090-2409-4
Abstract
As self-driving cars become more ubiquitous, users will look for natural ways of informing the car's AI about their personal choice of routes. This choice is not always dictated by straightforward logic such as shortest distance or shortest time; it can also be influenced by hidden factors such as comfort and familiarity. This paper presents a path learning algorithm for such applications, in which, from limited positive demonstrations, an autonomous agent learns the user's path preference and honors that choice in its route planning, while retaining the capability to adopt alternate routes if the original choice(s) become impractical. The learning problem is modeled as a Markov decision process. The states (way-points) and actions (moves from one way-point to another) are pre-defined according to the existing network of paths between the origin and destination, and the user's demonstration is assumed to be a sample of the preferred path. The underlying reward function, which captures the essence of the demonstration, is computed using an inverse reinforcement learning algorithm, and from it the entire path mirroring the expert's demonstration is extracted. To alleviate the problem of state-space explosion when dealing with a large state space, the reward function is approximated using a set of orthogonal polynomial basis functions with a fixed number of coefficients regardless of the size of the state space. A sixfold reduction in total learning time is achieved compared to using simple basis functions, whose dimensionality equals the number of distinct states.
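The core idea of the approximation step can be illustrated with a small sketch. The snippet below is not the paper's implementation: the reward surface, the state-to-domain mapping, and the least-squares fit are all illustrative stand-ins (in the paper the reward values would come from the inverse reinforcement learning step). It only demonstrates the dimensionality claim: a Legendre expansion represents a reward surface over many states with a fixed, small number of coefficients.

```python
import numpy as np
from numpy.polynomial import legendre as L

# Hypothetical setup: a 1-D chain of way-points standing in for the
# path network's state space.
n_states = 500        # large state space
degree = 8            # fixed basis size: degree + 1 = 9 coefficients

# Map state indices onto [-1, 1], the natural domain of Legendre polynomials.
s = np.linspace(-1.0, 1.0, n_states)

# Stand-in "reward surface" (in the paper this comes from IRL, not a formula).
reward = np.exp(-4.0 * (s - 0.3) ** 2)

# Least-squares projection onto the orthogonal Legendre basis.
coeffs = L.legfit(s, reward, deg=degree)

# Reconstruct the surface at every state from just the 9 coefficients.
approx = L.legval(s, coeffs)

print(len(coeffs))    # 9 parameters, regardless of n_states
```

With a simple (indicator) basis, the fit would need one parameter per state, i.e. 500 here; the Legendre representation keeps the parameter count constant as the state space grows, which is the source of the reported speed-up.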
Metadata
Item Type: Proceedings Paper
Authors/Creators: Srinivasan, AR and Chakraborty, S
Copyright, Publisher and Additional Information: © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Keywords: Learning (artificial intelligence), Path planning, Databases, Markov processes, Planning, Real-time systems, Automobiles
Institution: The University of Leeds
Academic Units: The University of Leeds > Faculty of Environment (Leeds) > Institute for Transport Studies (Leeds) > ITS: Safety and Technology (Leeds)
Depositing User: Symplectic Publications
Date Deposited: 22 Jun 2020 12:24
Last Modified: 22 Jun 2020 12:24
Status: Published
Publisher: IEEE
Identification Number: 10.1109/coase.2016.7743527
Open Archives Initiative ID (OAI ID): oai:eprints.whiterose.ac.uk:162141