Bethell, Daniel, Gerasimou, Simos orcid.org/0000-0002-2706-5272, Calinescu, Radu orcid.org/0000-0002-2678-9260 et al. (1 more author) (Accepted: 2025) Safe Reinforcement Learning in Black-Box Environments via Adaptive Shielding. In: 28th European Conference on Artificial Intelligence. (In Press)
Abstract
Safe exploration by reinforcement learning (RL) agents is critical to their deployment in many real-world scenarios. When prior knowledge of the target domain or task is unavailable, training RL agents in unknown, black-box environments unavoidably carries significant safety risks. ADVICE (Adaptive Shielding with a Contrastive Autoencoder), our novel post-shielding approach, operates in continuous state and action spaces. It learns to distinguish safe from unsafe features of state-action pairs during training and uses this knowledge to prevent the RL agent from executing actions that are likely to yield hazardous outcomes. Our comprehensive experimental evaluation shows that ADVICE significantly reduces safety violations (≈50%) compared to state-of-the-art safe RL exploration approaches, while maintaining a competitive outcome reward for the synthesised safe policy.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | |
Copyright, Publisher and Additional Information: | This is an author-produced version of the published paper. Uploaded in accordance with the University’s Research Publications and Open Access policy. |
Dates: | |
Institution: | The University of York |
Academic Units: | The University of York > Faculty of Sciences (York) > Computer Science (York) |
Depositing User: | Pure (York) |
Date Deposited: | 02 Sep 2025 10:40 |
Last Modified: | 02 Sep 2025 10:40 |
Status: | In Press |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:230836 |
