Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores.

This is a preprint and may not have undergone formal peer review.

Metadata

Item Type:	Preprint
Authors/Creators:	Blackwell, R.E.; Barry, J.; Cohn, A.G. https://orcid.org/0000-0002-7652-8907
Dates:	Published: 4 October 2024
Institution:	The University of Leeds
Academic Units:	The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds)
Funding Information:	Funder: Alan Turing Institute; Grant number: Not Known
Date Deposited:	16 Feb 2026 16:36
Last Modified:	16 Feb 2026 16:36
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:237881

Filename: 2410.03492v2.pdf

Licence: CC-BY 4.0

