Nafis, N., Esnaola, I., Martinez-Perez, A. orcid.org/0000-0002-8831-6346 et al. (2 more authors) (Submitted: 2025) Critical challenges and guidelines in evaluating synthetic tabular data: a systematic review. [Preprint - arXiv] (Submitted)
Abstract
Generating synthetic tabular data can be challenging; however, evaluating their quality is just as challenging, if not more so. This systematic review sheds light on the critical importance of rigorous evaluation of synthetic health data to ensure their reliability, relevance, and appropriate use. Based on a screening of 1766 papers and a detailed review of 101 papers, we identified key challenges, including lack of consensus on evaluation methods, improper use of evaluation metrics, limited input from domain experts, inadequate reporting of dataset characteristics, and limited reproducibility of results. In response, we provide several guidelines on the generation and evaluation of synthetic data, to allow the community to unlock and fully harness the transformative potential of synthetic data and accelerate innovation.
Metadata
Item Type: | Preprint |
---|---|
Copyright, Publisher and Additional Information: | © 2025 The Author(s). This preprint is made available under a Creative Commons Attribution 4.0 International License. (https://creativecommons.org/licenses/by/4.0/) |
Keywords: | Information and Computing Sciences; Public Health; Health Sciences; Generic health relevance |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Department of Sociological Studies (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 08 Aug 2025 10:11 |
Last Modified: | 08 Aug 2025 10:11 |
Status: | Submitted |
Identification Number: | 10.48550/arxiv.2504.18544 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:230011 |