Cohn, A (orcid.org/0000-0002-7652-8907), Hernández-Orallo, J, Mboli, JS et al. (3 more authors) (2022) A Framework for Categorising AI Evaluation Instruments. In: Proceedings of the Workshop on AI Evaluation Beyond Metrics co-located with the 31st International Joint Conference on Artificial Intelligence (IJCAI-ECAI 2022). Workshop on AI Evaluation Beyond Metrics (EBeM 2022), 25 Jul 2022, Vienna, Austria. CEUR Workshop Proceedings.
Abstract
The current and future capabilities of Artificial Intelligence (AI) are typically assessed with an ever-increasing number of benchmarks, competitions, tests and evaluation standards, which are meant to work as AI evaluation instruments (EIs). These EIs are increasing not only in number but also in complexity and diversity, making it hard to understand the evaluation landscape in a meaningful way. In this paper we present an approach for categorising EIs using a set of 18 facets, accompanied by a rubric that allows anyone to apply the framework to any existing or new EI. We apply the rubric to 23 EIs in different domains through a team of raters, and analyse how consistent the rubric is and how well it distinguishes between EIs and maps the evaluation landscape in AI.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Cohn, A; Hernández-Orallo, J; Mboli, JS; et al. (3 more authors) |
Copyright, Publisher and Additional Information: | © 2022 Copyright for this paper by its authors. This is an open access article under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/) |
Dates: | |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Funding Information: | Funder: Alan Turing Institute; Grant number: No ref given |
Depositing User: | Symplectic Publications |
Date Deposited: | 09 Aug 2022 12:37 |
Last Modified: | 09 Aug 2022 12:37 |
Published Version: | http://ceur-ws.org/Vol-3169/ |
Status: | Published |
Publisher: | CEUR Workshop Proceedings |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:189769 |