This is a preprint and may not have undergone formal peer review.
Blackwell, R.E., Barry, J. and Cohn, A.G. orcid.org/0000-0002-7652-8907 (2024) Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores. [Preprint - arXiv]
Metadata
| Item Type: | Preprint |
|---|---|
| Authors/Creators: | Blackwell, R.E.; Barry, J.; Cohn, A.G. |
| Dates: | |
| Institution: | The University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
| Funding Information: | Funder: Alan Turing Institute; Grant number: Not Known |
| Date Deposited: | 16 Feb 2026 16:36 |
| Last Modified: | 16 Feb 2026 16:36 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:237881 |
