Liu, X. and Mihaylova, L. orcid.org/0000-0001-5856-2223 (Accepted: 2026) Towards safe reinforcement learning-based traffic control via safety-layer action correction. In: Proceedings of the 29th International Conference on Information Fusion (FUSION). 2026 29th International Conference on Information Fusion (FUSION), 23-26 Jun 2026, Trondheim, Norway. Institute of Electrical and Electronics Engineers (IEEE). (In Press)
Abstract
Autonomous vehicles (AVs) are expected to significantly transform the operation of modern traffic systems, where their coexistence with human-driven vehicles (HDVs) gives rise to mixed-autonomy traffic. Moreover, advances in machine learning techniques have greatly improved the control capabilities of autonomous vehicles. In particular, reinforcement learning (RL) has been widely studied for autonomous vehicle control in traffic systems to improve efficiency in mixed-autonomy settings. However, the intrinsic need for trial-and-error in standard RL exploration makes it extremely difficult to deploy in real-world traffic environments, as learned policies may select unsafe actions that could lead to severe consequences. Conventional methods commonly incorporate large penalty terms into the reward function when safety constraints are violated; nevertheless, such approaches cannot guarantee the complete avoidance of violations during training. Inspired by the safety shield mechanism, we propose a safe exploration strategy integrated with the Proximal Policy Optimisation (PPO) algorithm for the AV to prevent constraint violations during RL training. We further evaluate the effectiveness of the proposed method in the Simulation of Urban Mobility (SUMO) simulator and compare it with a reward-shaping baseline in terms of cumulative reward and constraint violations. The simulation results demonstrate that our method achieves zero constraint violations while maintaining competitive training performance.
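The contrast drawn in the abstract, between penalising a violation in the reward and correcting an action before it is executed, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The constraint (a minimum gap to the leading vehicle), the one-step kinematic model, and every name and parameter value below are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CarFollowingState:
    """Hypothetical AV state while following a leader (not from the paper)."""
    gap: float         # distance to the leading vehicle [m]
    ego_speed: float   # AV speed [m/s]
    lead_speed: float  # leader speed [m/s]


def next_gap(state: CarFollowingState, accel: float, dt: float) -> float:
    """Predict the gap after one step, assuming the leader holds its speed."""
    new_speed = max(0.0, state.ego_speed + accel * dt)
    # Average of old and new speed approximates the AV displacement.
    ego_move = 0.5 * (state.ego_speed + new_speed) * dt
    return state.gap + state.lead_speed * dt - ego_move


def shaped_reward(base_reward, state, accel, dt, min_gap=5.0, penalty=100.0):
    """Reward-shaping baseline: the unsafe action still runs, then is penalised."""
    violated = next_gap(state, accel, dt) < min_gap
    return base_reward - penalty if violated else base_reward


def safety_layer(state, proposed_accel, dt, min_gap=5.0,
                 a_min=-4.5, a_max=2.5, resolution=0.1):
    """Action correction: lower the proposed acceleration until the predicted
    gap satisfies the constraint. Falls back to a_min (hardest braking) if no
    candidate is feasible, in which case the constraint may still be violated."""
    accel = min(max(proposed_accel, a_min), a_max)
    while accel > a_min and next_gap(state, accel, dt) < min_gap:
        accel = max(a_min, accel - resolution)
    return accel


if __name__ == "__main__":
    s = CarFollowingState(gap=6.0, ego_speed=10.0, lead_speed=8.0)
    print("shaped reward for unsafe action:", shaped_reward(1.0, s, 2.0, 1.0))
    print("corrected acceleration:", safety_layer(s, 2.0, 1.0))
```

In this sketch the policy's proposed action passes through `safety_layer` before it reaches the environment, so an unsafe acceleration is replaced rather than merely discouraged. With `shaped_reward`, the unsafe step is still taken during training.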
Metadata
| Item Type: | Proceedings Paper |
|---|---|
| Authors/Creators: | Liu, X.; Mihaylova, L. (ORCID: 0000-0001-5856-2223) |
| Copyright, Publisher and Additional Information: | © 2026 The Author(s). |
| Keywords: | Safe Exploration; Reinforcement Learning; Autonomous Vehicle; PPO |
| Dates: | Accepted: 2026 |
| Institution: | The University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > School of Electrical and Electronic Engineering |
| Date Deposited: | 15 May 2026 08:54 |
| Last Modified: | 15 May 2026 08:55 |
| Status: | In Press |
| Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
| Refereed: | Yes |
| Related URLs: | |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:241146 |
Download
Filename: FUSION202-Towards Safe RL Based Traffic Control6.pdf

CORE (COnnecting REpositories)