Zou, X., Perlaza, S.M., Esnaola, J. (orcid.org/0000-0001-5597-1718), et al. (2024) The worst-case data-generating probability measure in statistical learning. IEEE Journal on Selected Areas in Information Theory, 5, pp. 175–189. ISSN 2641-8770
Abstract
The worst-case data-generating (WCDG) probability measure is introduced as a tool for characterizing the generalization capabilities of machine learning algorithms. The WCDG probability measure is shown to be the unique solution to two different optimization problems: (a) the maximization of the expected loss over the set of probability measures on the datasets whose relative entropy with respect to a reference measure is not larger than a given threshold; and (b) the maximization of the expected loss regularized by the relative entropy with respect to the reference measure. Such a reference measure can be interpreted as a prior on the datasets. The cumulants of the WCDG measure are finite and bounded in terms of the cumulants of the reference measure. To analyze the concentration of the expected empirical risk induced by the WCDG probability measure, the notion of (ϵ, δ)-robustness of models is introduced. Closed-form expressions are presented for the sensitivity of the expected loss for a fixed model. These tools yield a novel exact expression for the generalization error of arbitrary machine learning algorithms. This expression is stated in terms of the WCDG probability measure and leads to an upper bound equal, up to a constant factor, to the sum of the mutual information and the lautum information between the models and the datasets. This upper bound is achieved by a Gibbs algorithm. This finding reveals that analyzing the generalization error of the Gibbs algorithm yields overarching insights applicable to any machine learning algorithm.
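A plausible formalization of problem (b) as described above is the relative-entropy-regularized expected-loss maximization; the notation used here (loss ℓ, model θ, regularization parameter β > 0, reference measure Q) is assumed for illustration and is not taken from the paper:

```latex
% Problem (b): maximize the expected loss with regularization by the
% relative entropy D(P || Q) with respect to the reference measure Q.
% Symbols \ell, \theta, \beta, Q are assumed notation, not the paper's.
\[
  P^{\star} \in \arg\max_{P} \;
  \mathbb{E}_{Z \sim P}\!\bigl[\ell(\theta, Z)\bigr]
  - \frac{1}{\beta}\, D\bigl(P \,\|\, Q\bigr).
\]
% By the Gibbs (Donsker--Varadhan) variational principle, when the
% normalizing expectation below is finite, the unique maximizer is the
% exponentially tilted measure with density (w.r.t. Q):
\[
  \frac{\mathrm{d}P^{\star}}{\mathrm{d}Q}(z)
  = \frac{\exp\bigl(\beta\,\ell(\theta, z)\bigr)}
         {\mathbb{E}_{Z \sim Q}\!\bigl[\exp\bigl(\beta\,\ell(\theta, Z)\bigr)\bigr]}.
\]
```

This sketch is consistent with the abstract's statement that the WCDG cumulants are finite and bounded in terms of those of the reference measure, which is what makes the normalizing expectation well defined.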
Metadata
| Field | Value |
| --- | --- |
| Item Type | Article |
| Authors/Creators | |
| Copyright, Publisher and Additional Information | © 2024 The Authors. Except as otherwise noted, this author-accepted version of a journal article published in IEEE Journal on Selected Areas in Information Theory is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
| Keywords | Supervised Machine Learning; Worst-Case; Generalization Gap; Relative Entropy; Gibbs Algorithm; Sensitivity |
| Dates | |
| Institution | The University of Sheffield |
| Academic Units | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Automatic Control and Systems Engineering (Sheffield) |
| Depositing User | Symplectic Sheffield |
| Date Deposited | 03 Apr 2024 15:15 |
| Last Modified | 08 Jan 2025 16:23 |
| Status | Published |
| Publisher | Institute of Electrical and Electronics Engineers |
| Refereed | Yes |
| Identification Number | 10.1109/JSAIT.2024.3383281 |
| Open Archives Initiative ID (OAI ID) | oai:eprints.whiterose.ac.uk:210922 |