Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores.

This is a preprint and may not have undergone formal peer review.

Blackwell, R.E., Barry, J. and Cohn, A.G. orcid.org/0000-0002-7652-8907 (2024) Towards Reproducible LLM Evaluation: Quantifying Uncertainty in LLM Benchmark Scores. [Preprint - arXiv]

Metadata

Item Type: Preprint
Authors/Creators:
  • Blackwell, R.E.
  • Barry, J.
  • Cohn, A.G.
Dates:
  • Published: 4 October 2024
Institution: The University of Leeds
Academic Units: The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds)
Funding Information:
  • Funder: Alan Turing Institute
  • Grant number: Not Known
Date Deposited: 16 Feb 2026 16:36
Last Modified: 16 Feb 2026 16:36
Open Archives Initiative ID (OAI ID):
