Moglia, V. (orcid.org/0009-0001-9124-8030), Smith, L., Cook, G. et al. (2 more authors) (2025) Machine Learning Approaches to the Early Detection of Pancreatic Cancer from Time-Series Primary Care Data. In: Artificial Intelligence in Medicine. 23rd International Conference, AIME 2025, 23-26 Jun 2025, Pavia, Italy. Lecture Notes in Computer Science, 15734. Springer, pp. 313-322. ISBN 978-3-031-95837-3
Abstract
Pancreatic cancer is notoriously difficult to detect, and diagnosis often relies on symptoms that only develop at advanced stages of the disease. Routine blood tests may signal a developing cancer before these symptoms appear, yet little research has investigated the use of time-varying information from laboratory tests recorded before diagnosis. This study used UK primary care data to compare machine learning approaches for detecting pancreatic cancer at various time intervals before diagnosis. The machine learning challenge is that such real-world data are irregular and sparse, and therefore difficult to use for model development.
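The abstract evaluates detection at several lead times before diagnosis and contrasts a feature engineering approach with sequence models. A minimal sketch of what lead-time-censored feature engineering over an irregular, sparse blood-test history could look like is shown here. It assumes a hypothetical record layout (`LabResult` with a test name, days before diagnosis, and a value) and an illustrative feature set (count, last value, mean, slope); the study's actual features and data schema are not described on this page.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class LabResult:
    # Hypothetical record layout: one blood test result for one patient.
    test: str             # test name (illustrative only)
    days_before_dx: int   # days between the test and the diagnosis (index) date
    value: float


def summarise_before_lead_time(
    results: List[LabResult], test: str, lead_days: int
) -> Dict[str, Optional[float]]:
    """Illustrative summary features for one test, using only results recorded
    at least `lead_days` before diagnosis (i.e. predicting at that lead time)."""
    # Keep the chosen test and drop anything after the prediction point.
    obs = sorted(
        (r for r in results if r.test == test and r.days_before_dx >= lead_days),
        key=lambda r: r.days_before_dx,
        reverse=True,  # largest days-before first, i.e. oldest result first
    )
    if not obs:
        # Sparse data: the test may never have been done, so keep missingness explicit.
        return {"count": 0.0, "last": None, "mean": None, "slope_per_year": None}

    values = [r.value for r in obs]
    feats: Dict[str, Optional[float]] = {
        "count": float(len(obs)),
        "last": values[-1],        # most recent result before the cut-off
        "mean": mean(values),
        "slope_per_year": None,
    }
    if len(obs) >= 2:
        # Least-squares slope against the actual test times, so irregular
        # spacing between tests is respected rather than assuming fixed intervals.
        t = [-r.days_before_dx / 365.25 for r in obs]  # years relative to diagnosis
        t_bar, v_bar = mean(t), mean(values)
        denom = sum((ti - t_bar) ** 2 for ti in t)
        if denom > 0:
            feats["slope_per_year"] = sum(
                (ti - t_bar) * (vi - v_bar) for ti, vi in zip(t, values)
            ) / denom
    return feats
```

Calling the function with `lead_days=548` (roughly 18 months) uses only results recorded at least that long before diagnosis, so the same patient history yields fewer observations, and more missing features, as the lead time grows. Sequence models such as GRU-D instead consume the irregular observations and their timing directly.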
In this study, deep learning time-series models (LSTM and GRU-D) were compared with a feature engineering approach. We found that while predictive performance was strongest at the diagnosis date (maximum AUROC of 0.85), cases could be detected 18 months before diagnosis, with GRU-D achieving an AUROC of 0.57. Closer to the diagnosis date, where diagnostic signals are stronger, feature engineering approaches outperformed the deep learning models. Further from diagnosis, however, the deep learning models, particularly GRU-D, maintained marginally better performance. Calibration was good for all models at the diagnosis date but poor for all models at lead times greater than 6 months. This study demonstrates that routine blood tests show some predictive capacity for earlier detection of pancreatic cancer. However, this capacity decreases quickly with distance from the diagnosis date, with poor discrimination beyond 6 months. These results should be of interest to researchers using machine learning and electronic health records to support earlier diagnosis of cancer.
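The abstract reports discrimination as AUROC and separately assesses calibration. As a minimal sketch, and not the authors' evaluation code, both can be computed from predicted risks as below. The function names are hypothetical, and the equal-width binning with an expected calibration error summary is an assumed, common choice; the paper may have used other calibration measures such as calibration plots or slope.

```python
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen case scores higher than a
    randomly chosen control, with ties counting half (the Mann-Whitney U form).
    Quadratic in sample size; fine for a sketch, slow for large cohorts."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


def calibration_table(
    labels: Sequence[int], probs: Sequence[float], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Group predictions into equal-width probability bins and return
    (mean predicted risk, observed event rate, count) for each non-empty bin."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the top bin
        bins[idx].append((y, p))
    table = []
    for b in bins:
        if b:
            mean_pred = sum(p for _, p in b) / len(b)
            obs_rate = sum(y for y, _ in b) / len(b)
            table.append((mean_pred, obs_rate, len(b)))
    return table


def expected_calibration_error(
    labels: Sequence[int], probs: Sequence[float], n_bins: int = 5
) -> float:
    """Count-weighted mean absolute gap between predicted and observed risk."""
    n = len(labels)
    return sum(
        cnt / n * abs(mean_pred - obs_rate)
        for mean_pred, obs_rate, cnt in calibration_table(labels, probs, n_bins)
    )
```

For reference, an AUROC of 0.5 corresponds to random ranking of cases and controls, which is why the 0.57 reported at 18 months indicates only weak discrimination. A model can also rank patients reasonably well while its predicted risks are systematically too high or too low, which is what a calibration check like the one above is meant to expose.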
Metadata
Item Type: | Proceedings Paper |
---|---|
Copyright, Publisher and Additional Information: | © 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG. This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use (https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms), but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/978-3-031-95838-0_31. |
Keywords: | Machine Learning, Electronic Health Records, Time-Series, Pancreatic Cancer |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 25 Jun 2025 10:12 |
Last Modified: | 25 Jun 2025 13:48 |
Status: | Published |
Publisher: | Springer |
Series Name: | Lecture Notes in Computer Science |
Identification Number: | 10.1007/978-3-031-95838-0_31 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:228286 |
