Stone, J. (2025) Why bigger training sets must yield diminishing returns: An information-theoretic speed limit for AI. [Preprint submitted to Neural Computation]
Abstract
It is now well established that AI systems suffer diminishing performance returns as the number of training items is increased. Given that a fundamental limit on the performance of any system is the amount of Shannon information in its training set, we prove that adding one item to a training set of n items cannot provide more than ∆H = 0.5 log2 ((n + 1)/n) bits of additional information. Because ∆H shrinks rapidly as n increases, it is inevitable that the performance of any AI system suffers from diminishing returns as n is increased.
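To illustrate how quickly this bound shrinks, the following minimal sketch evaluates the quantity quoted in the abstract, ∆H = 0.5 log2((n + 1)/n), for a few training-set sizes. It assumes only that formula; the function name `delta_H` and the chosen values of n are illustrative and do not come from the paper.

```python
import math

def delta_H(n: int) -> float:
    """Upper bound (in bits) on the extra Shannon information gained by
    adding one item to a training set of n items, as stated in the abstract:
    Delta_H = 0.5 * log2((n + 1) / n)."""
    return 0.5 * math.log2((n + 1) / n)

# Illustration: the bound shrinks rapidly as n grows.
for n in [1, 10, 100, 1_000, 1_000_000]:
    print(f"n = {n:>9,d}  ->  Delta_H <= {delta_H(n):.3e} bits")
```

For example, the bound is 0.5 bits when going from 1 to 2 items, but falls below a thousandth of a bit once n exceeds a few hundred, which is the diminishing-returns behaviour the abstract describes.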
Metadata
| Field | Value |
|---|---|
| Item Type | Preprint |
| Authors/Creators | Stone, J. |
| Copyright, Publisher and Additional Information | © 2025 The Author(s). This is an author-produced version of a paper submitted for publication in Neural Computation. Uploaded in accordance with the publisher's self-archiving policy. |
| Dates | Submitted: 2025 |
| Institution | The University of Sheffield |
| Academic Units | The University of Sheffield > Faculty of Science (Sheffield) > Department of Psychology (Sheffield) |
| Depositing User | Symplectic Sheffield |
| Date Deposited | 09 May 2025 15:21 |
| Last Modified | 09 May 2025 15:21 |
| Status | Submitted |
| Publisher | Massachusetts Institute of Technology Press |
| Open Archives Initiative ID (OAI ID) | oai:eprints.whiterose.ac.uk:226467 |