Towards safe reinforcement learning-based traffic control via safety-layer action correction

Liu, X. and Mihaylova, L. orcid.org/0000-0001-5856-2223 (Accepted: 2026) Towards safe reinforcement learning-based traffic control via safety-layer action correction. In: Proceedings of the 29th International Conference on Information Fusion (FUSION). 2026 29th International Conference on Information Fusion (FUSION), 23-26 Jun 2026, Trondheim, Norway. Institute of Electrical and Electronics Engineers (IEEE). (In Press)

Abstract

Autonomous vehicles (AVs) are expected to significantly transform the operation of modern traffic systems, where their coexistence with human-driven vehicles (HDVs) gives rise to mixed-autonomy traffic. Moreover, advances in machine learning techniques have greatly improved the control capabilities of autonomous vehicles. In particular, reinforcement learning (RL) has been widely studied for autonomous vehicle control in traffic systems to improve efficiency in mixed-autonomy settings. However, the intrinsic need for trial-and-error in standard RL exploration makes it extremely difficult to deploy in real-world traffic environments, as learned policies may select unsafe actions that could lead to severe consequences. Conventional methods commonly incorporate large penalty terms into the reward function when safety constraints are violated; nevertheless, such approaches cannot guarantee the complete avoidance of violations during training. Inspired by the safety shield mechanism, we propose a safe exploration strategy integrated with the Proximal Policy Optimisation (PPO) algorithm for the AV to prevent constraint violations during RL training. We further evaluate the effectiveness of the proposed method in the Simulation of Urban Mobility (SUMO) simulator and compare it with a reward-shaping baseline in terms of cumulative reward and constraint violations. The simulation results demonstrate that our method achieves zero constraint violations while maintaining competitive training performance.

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Liu, X.; Mihaylova, L. (ORCID: https://orcid.org/0000-0001-5856-2223)
Copyright, Publisher and Additional Information:	© 2026 The Author(s).
Keywords:	Safe Exploration; Reinforcement Learning; Autonomous Vehicle; PPO
Dates:	Accepted: 15 April 2026
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > School of Electrical and Electronic Engineering
Date Deposited:	15 May 2026 08:54
Last Modified:	15 May 2026 08:55
Status:	In Press
Publisher:	Institute of Electrical and Electronics Engineers (IEEE)
Refereed:	Yes
Related URLs:	Conference
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:241146

Download

Accepted Version

Under temporary embargo

Filename: FUSION202-Towards Safe RL Based Traffic Control6.pdf
