Bullward, A, Aljebreen, A orcid.org/0000-0002-4746-3446, Coles, A orcid.org/0000-0002-2657-0090 et al. (2 more authors) (2023) Research Paper: Process Mining and Synthetic Health Data: Reflections and Lessons Learnt. In: Lecture Notes in Business Information Processing. 4th International Conference on Process Mining, 23-28 Oct 2022, Bozen-Bolzano, Italy. Springer Nature, pp. 341-353. ISBN 978-3-031-27814-3
Abstract
Analysing the treatment pathways in real-world health data can provide valuable insight for clinicians and decision-makers. However, the procedures for acquiring real-world data for research can be restrictive and time-consuming, and risk disclosing identifiable information. Synthetic data might enable representative analysis without direct access to sensitive data. In the first part of our paper, we propose an approach for grading synthetic data for process analysis based on its fidelity to relationships found in real-world data. In the second part, we apply our grading approach by assessing cancer patient pathways in a synthetic healthcare dataset (The Simulacrum provided by the English National Cancer Registration and Analysis Service) using process mining. Visualisations of the patient pathways within the synthetic data appear plausible, showing relationships between events confirmed in the underlying non-synthetic data. Data quality issues are also present within the synthetic data, which reflect real-world problems and artefacts from the synthetic dataset’s creation. Process mining of synthetic data in healthcare is an emerging field with novel challenges. We conclude that researchers should be aware of the risks when extrapolating results produced from research on synthetic data to real-world scenarios and assess findings with analysts who are able to view the underlying data.
Metadata
Item Type: | Proceedings Paper |
---|---|
Copyright, Publisher and Additional Information: | © 2023 The Author(s). This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. |
Keywords: | Data grading; Process mining; Simulacrum; Synthetic data; Taxonomy |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 05 Apr 2023 09:11 |
Last Modified: | 18 Apr 2023 22:54 |
Status: | Published |
Publisher: | Springer Nature |
Identification Number: | 10.1007/978-3-031-27815-0_25 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:197760 |