Zou, X., Perlaza, S.M., Esnaola, J. (orcid.org/0000-0001-5597-1718), et al. (2024) The worst-case data-generating probability measure in statistical learning. IEEE Journal on Selected Areas in Information Theory, 5, pp. 175–189. ISSN 2641-8770
Abstract
The worst-case data-generating (WCDG) probability measure is introduced as a tool for characterizing the generalization capabilities of machine learning algorithms. The WCDG probability measure is shown to be the unique solution to two different optimization problems: (a) the maximization of the expected loss over the set of probability measures on the datasets whose relative entropy with respect to a reference measure is not larger than a given threshold; and (b) the maximization of the expected loss regularized by the relative entropy with respect to the reference measure. Such a reference measure can be interpreted as a prior on the datasets. The cumulants of the WCDG measure are finite and bounded in terms of the cumulants of the reference measure. To analyze the concentration of the expected empirical risk induced by the WCDG probability measure, the notion of (ϵ, δ)-robustness of models is introduced. Closed-form expressions are presented for the sensitivity of the expected loss for a fixed model. These tools yield a novel exact expression for the generalization error of arbitrary machine learning algorithms. This expression is stated in terms of the WCDG probability measure and leads to an upper bound equal, up to a constant factor, to the sum of the mutual information and the lautum information between the models and the datasets. This upper bound is achieved by a Gibbs algorithm. This finding reveals that analyzing the generalization error of the Gibbs algorithm yields overarching insights applicable to any machine learning algorithm.
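A plausible formalization of problem (b) as described above is the relative-entropy-regularized expected-loss maximization; the notation used here (loss ℓ, model θ, regularization parameter β > 0, reference measure Q) is assumed for illustration and is not taken from the paper:

```latex
% Problem (b): maximize the expected loss with regularization by the
% relative entropy D(P || Q) with respect to the reference measure Q.
% Symbols \ell, \theta, \beta, Q are assumed notation, not the paper's.
\[
  P^{\star} \in \arg\max_{P} \;
  \mathbb{E}_{Z \sim P}\!\bigl[\ell(\theta, Z)\bigr]
  - \frac{1}{\beta}\, D\bigl(P \,\|\, Q\bigr).
\]
% By the Gibbs (Donsker--Varadhan) variational principle, when the
% normalizing expectation below is finite, the unique maximizer is the
% exponentially tilted measure with density (w.r.t. Q):
\[
  \frac{\mathrm{d}P^{\star}}{\mathrm{d}Q}(z)
  = \frac{\exp\bigl(\beta\,\ell(\theta, z)\bigr)}
         {\mathbb{E}_{Z \sim Q}\!\bigl[\exp\bigl(\beta\,\ell(\theta, Z)\bigr)\bigr]}.
\]
```

This sketch is consistent with the abstract's statement that the WCDG cumulants are finite and bounded in terms of those of the reference measure, which is what makes the normalizing expectation well defined.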
Metadata
| Field | Value |
| --- | --- |
| Item Type | Article |
| Authors/Creators | |
| Copyright, Publisher and Additional Information | © 2024 The Authors. Except as otherwise noted, this author-accepted version of a journal article published in IEEE Journal on Selected Areas in Information Theory is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
| Keywords | Supervised Machine Learning; Worst-Case; Generalization Gap; Relative Entropy; Gibbs Algorithm; Sensitivity |
| Dates | |
| Institution | The University of Sheffield |
| Academic Units | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Automatic Control and Systems Engineering (Sheffield) |
| Depositing User | Symplectic Sheffield |
| Date Deposited | 03 Apr 2024 15:15 |
| Last Modified | 08 Jan 2025 16:23 |
| Status | Published |
| Publisher | Institute of Electrical and Electronics Engineers |
| Refereed | Yes |
| Identification Number | 10.1109/JSAIT.2024.3383281 |
| Open Archives Initiative ID (OAI ID) | oai:eprints.whiterose.ac.uk:210922 |