von Asmuth, E.G.J. orcid.org/0000-0001-6256-7457, Halkes, C.J.M., Versluis, J. orcid.org/0000-0003-2372-1663 et al. (16 more authors) (2026) An extraction pipeline for analysis of hematopoietic stem cell transplantation data. Bone Marrow Transplantation. ISSN: 0268-3369
Abstract
Many clinical studies are based on registry analyses, but the exact data extraction and pre-processing approaches are rarely reported, even though they are critical for the reliability and reproducibility of results. We aimed to develop an open-source data extraction pipeline that generates a ready-to-analyze dataset focused on relevant determinants of outcomes after hematopoietic stem cell transplantation (HSCT). This pipeline was developed using EBMT registry data, including 54,457 allogeneic and 63,651 autologous HSCT procedures. The pipeline determines HLA matching from molecular data, assesses cytogenetic risk for acute myeloid leukemia and myelodysplastic syndrome, processes molecular markers, assigns the hematopoietic cell transplantation comorbidity index (HCT-CI) based on comorbidities, and maps disease states to simplified categories. We prospectively assessed the recently developed disease risk stratification system (DRSS), showing that the pipeline produces results consistent with previous studies. The hazard ratio correlation between our cohort and the original DRSS derivation cohort was 0.92, and the 2-year AUC was 0.616, indicating similar effects and predictive performance. We aim to establish a new standard by promoting transparent, standardized and uniform extraction of registry data, enhancing reproducibility in registry studies.
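The abstract does not spell out how HLA matching is derived from molecular typing. As a rough illustration only, the sketch below counts allele-level matches at HLA-A, -B, -C and -DRB1 at two-field resolution, the familiar 8/8 convention. The input format, the locus set and the function names (`two_field`, `locus_matches`, `match_score`) are hypothetical and are not taken from the pipeline itself. Real typing data also involve ambiguous or low-resolution results, null alleles and direction-specific mismatches, all of which this sketch ignores.

```python
from typing import Dict, Tuple

# Hypothetical input format: locus -> (allele1, allele2), e.g. "A*02:01:01".
Typing = Dict[str, Tuple[str, str]]

# Loci counted for a conventional 8/8 match score (an assumption for this sketch).
LOCI = ("A", "B", "C", "DRB1")


def two_field(allele: str) -> str:
    """Truncate an allele name to two-field resolution, e.g. 'A*02:01:01' -> 'A*02:01'."""
    return ":".join(allele.split(":")[:2])


def locus_matches(donor: Tuple[str, str], recipient: Tuple[str, str]) -> int:
    """Number of matched alleles (0-2) at one locus, trying both allele pairings."""
    d = [two_field(a) for a in donor]
    r = [two_field(a) for a in recipient]
    straight = (d[0] == r[0]) + (d[1] == r[1])
    crossed = (d[0] == r[1]) + (d[1] == r[0])
    return max(straight, crossed)


def match_score(donor: Typing, recipient: Typing, loci=LOCI) -> int:
    """Sum allele matches over the selected loci (8 when fully matched at 4 loci)."""
    return sum(locus_matches(donor[locus], recipient[locus]) for locus in loci)


# Illustrative (made-up) typings: one DRB1 mismatch, A alleles listed in swapped order.
donor = {
    "A": ("A*02:01", "A*01:01"),
    "B": ("B*07:02", "B*08:01"),
    "C": ("C*07:01", "C*07:02"),
    "DRB1": ("DRB1*15:01", "DRB1*03:01"),
}
recipient = {
    "A": ("A*01:01:01", "A*02:01"),
    "B": ("B*07:02", "B*08:01"),
    "C": ("C*07:01", "C*07:02"),
    "DRB1": ("DRB1*15:01", "DRB1*04:01"),
}

if __name__ == "__main__":
    print(f"{match_score(donor, recipient)}/8")
```

Trying both pairings at each locus keeps the score independent of the order in which the two alleles happen to be recorded, and truncating to two fields means that differences beyond the protein-level field are ignored.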
Metadata
| Item Type: | Article |
|---|---|
| Authors/Creators: | |
| Copyright, Publisher and Additional Information: | © 2026 The Authors. Except as otherwise noted, this author-accepted version of a journal article published in Bone Marrow Transplant is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
| Keywords: | Medical research; Clinical trial design |
| Dates: | |
| Institution: | The University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Medicine, Dentistry and Health (Sheffield) > School of Medicine and Population Health |
| Date Deposited: | 20 Mar 2026 09:52 |
| Last Modified: | 20 Mar 2026 09:52 |
| Status: | Published online |
| Publisher: | Springer Science and Business Media LLC |
| Refereed: | Yes |
| Identification Number: | 10.1038/s41409-026-02818-z |
| Related URLs: | |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:239337 |
Downloads
Filename: Methods_paper_v6_final.pdf
Licence: CC-BY 4.0
Filename: Methods_supplemental_figures.pdf
Licence: CC-BY 4.0
Filename: suppl_file_1_disease_status_mapping.pdf
Licence: CC-BY 4.0
Filename: Data_extraction_supp_file_2.pdf
Licence: CC-BY 4.0
