Riley, Joshua orcid.org/0000-0002-9403-3705 (Accepted: 2020) Assured Multi-Agent Reinforcement Learning Using Quantitative Verification. In: DCAART 2021, 04-06 Feb 2021, Online. (Submitted)
Abstract
In multi-agent reinforcement learning, several agents converge together towards optimal policies that solve complex decision-making problems. This convergence process is inherently stochastic, meaning that its use in safety-critical domains can be problematic. To address this issue, we introduce a new approach that combines multi-agent reinforcement learning with a formal verification technique termed quantitative verification. Our assured multi-agent reinforcement learning approach constrains agent behaviours in ways that ensure the satisfaction of requirements associated with the safety, reliability, and other non-functional aspects of the decision-making problem being solved. The approach comprises three stages. First, it models the problem as an abstract Markov decision process, allowing quantitative verification to be applied. Next, this abstract model is used to synthesise a policy which satisfies safety, reliability, and performance constraints. Finally, the synthesised policy is used to constrain agent behaviour within the low-level problem with a greatly lowered risk of constraint violations.
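The policy-synthesis stage described above can be illustrated at toy scale. The sketch below is not the paper's implementation (which uses quantitative verification tools over abstract Markov decision processes); it is a minimal, hypothetical example in which a small abstract MDP is defined explicitly, the probability of reaching an unsafe state is computed for each deterministic memoryless policy by a value-iteration fixpoint, and the synthesiser keeps only policies meeting a safety threshold before maximising the probability of reaching the goal. All state, action, and threshold names are illustrative assumptions.

```python
from itertools import product

# Hypothetical toy abstract MDP: two controllable states ("s0", "s1"),
# an absorbing goal state, and an absorbing unsafe state.
STATES = ["s0", "s1"]
ACTIONS = ["fast", "safe"]
# T[(state, action)] -> list of (next_state, probability) pairs
T = {
    ("s0", "fast"): [("goal", 0.8), ("unsafe", 0.2)],
    ("s0", "safe"): [("s1", 1.0)],
    ("s1", "fast"): [("goal", 0.9), ("unsafe", 0.1)],
    ("s1", "safe"): [("goal", 0.95), ("s0", 0.05)],
}

def reach_prob(policy, target, iters=2000):
    """Probability of eventually reaching `target` from each state under a
    deterministic memoryless policy (iterative reachability fixpoint)."""
    p = {s: 0.0 for s in STATES}
    for _ in range(iters):
        for s in STATES:
            p[s] = sum(
                prob * (1.0 if nxt == target else p.get(nxt, 0.0))
                for nxt, prob in T[(s, policy[s])]
            )
    return p

def synthesise(max_unsafe=0.15):
    """Enumerate all deterministic policies, discard those violating the
    safety constraint P(reach unsafe) <= max_unsafe from s0, and return
    the surviving policy that maximises P(reach goal)."""
    best, best_goal = None, -1.0
    for choice in product(ACTIONS, repeat=len(STATES)):
        policy = dict(zip(STATES, choice))
        if reach_prob(policy, "unsafe")["s0"] <= max_unsafe:
            goal = reach_prob(policy, "goal")["s0"]
            if goal > best_goal:
                best, best_goal = policy, goal
    return best, best_goal

if __name__ == "__main__":
    policy, goal_prob = synthesise()
    print(policy, goal_prob)
```

In the full approach, the synthesised policy would then constrain the agents' exploration in the concrete low-level problem; exhaustive enumeration as above is only feasible for small abstract models, which is why the paper's abstraction step matters.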
Metadata
| Item Type: | Conference or Workshop Item |
| --- | --- |
| Authors/Creators: | Riley, Joshua |
| Keywords: | Reinforcement Learning, Multi-Agent System, Quantitative Verification, Assurance, Multi-Agent Reinforcement Learning |
| Dates: | Accepted: 2020 |
| Institution: | The University of York |
| Academic Units: | The University of York > Faculty of Sciences (York) > Computer Science (York) > Artificial Intelligence (York) |
| Funding Information: | Funder: DSTL; Grant number: UNSPECIFIED |
| Depositing User: | Mr Joshua Riley |
| Date Deposited: | 12 Jan 2021 12:20 |
| Last Modified: | 12 Jan 2021 12:20 |
| Status: | Submitted |
| Refereed: | Yes |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:169945 |
Download
Filename: DCAART_2021_4_CR (4).pdf
Description: A conference paper submitted to DCAART