Smalley, A.L., Douterelo, I. orcid.org/0000-0002-3410-8576, Chipps, M. et al. (1 more author) (2025) Data-driven prediction of daily Cryptosporidium river concentrations for water resource management: use of catchment-averaged vs spatially distributed features in a Bagging-XGBoost model. Science of The Total Environment, 991. 179794. ISSN 0048-9697
Abstract
Cryptosporidium is a waterborne pathogen that poses a major challenge to water utilities because of its resistance to chlorination and its infectivity at very low concentrations. The ability to predict Cryptosporidium concentrations in rivers would significantly aid abstraction-based risk management of water resources, but current models are not suited to making predictions at the temporal resolutions required to inform abstraction decision-making. This study uses Cryptosporidium data collected over 7 years at a major river abstraction site in South East England, alongside publicly available remote sensing data, to train a Bagging-XGBoost model for predicting Cryptosporidium at daily timescales. Different combinations of catchment-averaged and spatially distributed datasets were trialled as model inputs. The highest-performing models predicted 69–75 % of exceedances of >1 oocysts L⁻¹, and they also predicted the timing of 78–89 % of higher (>2 oocysts L⁻¹) exceedances. Interpretation of predictions using SHapley Additive exPlanations (SHAP) analysis indicated that sources close to the intake (<30 km) were the most important, and identified catchment-averaged rainfall at 1- and 2-day lag times and antecedent Cryptosporidium measurements as significant inputs. The study demonstrates the potential of such models when an unparsimonious approach to feature selection is taken, because of their ability to discern non-linear trends and their resistance to multicollinearity and redundancy in the input data. Such models could improve the ability of water utilities to predict Cryptosporidium peaks and aid abstraction decision-making, thereby reducing the loadings of this pathogen to reservoirs and water treatment works.
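As a rough illustration of two ideas named in the abstract (lagged rainfall and antecedent Cryptosporidium inputs, and the share of threshold exceedances that a model predicts), the following Python sketch may help. It is not the authors' pipeline and does not train a Bagging-XGBoost model. The function names `lag`, `build_features` and `exceedance_hit_rate` are hypothetical, the numbers are made-up toy values, and the hit-rate definition (an observed exceedance counts as predicted when the prediction for the same day also exceeds the threshold) is an assumption, since the abstract does not define its scoring.

```python
from typing import List, Optional, Sequence, Tuple

Row = Optional[Tuple[float, float, float]]


def lag(series: Sequence[float], k: int) -> List[Optional[float]]:
    """Shift a daily series back by k days; the first k days have no history."""
    if k < 0:
        raise ValueError("k must be non-negative")
    n = len(series)
    return [None] * min(k, n) + list(series[: max(n - k, 0)])


def build_features(rain: Sequence[float], crypto: Sequence[float]) -> List[Row]:
    """Per-day inputs: (rain at 1-day lag, rain at 2-day lag, previous crypto value).

    Days without a full history are returned as None (assumed handling).
    """
    rain_1, rain_2, crypto_1 = lag(rain, 1), lag(rain, 2), lag(crypto, 1)
    rows: List[Row] = []
    for day in range(len(rain)):
        row = (rain_1[day], rain_2[day], crypto_1[day])
        rows.append(None if any(v is None for v in row) else row)
    return rows


def exceedance_hit_rate(
    observed: Sequence[float], predicted: Sequence[float], threshold: float
) -> float:
    """Fraction of observed exceedances (> threshold) also predicted on that day."""
    events = [i for i, y in enumerate(observed) if y > threshold]
    if not events:
        return float("nan")
    hits = sum(1 for i in events if predicted[i] > threshold)
    return hits / len(events)


# Toy daily values, for illustration only (not data from the study).
rain_mm = [0.0, 12.0, 3.0, 0.0, 8.0]
crypto_oocysts_per_l = [0.2, 0.4, 1.5, 2.3, 0.6]
toy_predictions = [0.1, 0.3, 1.2, 1.8, 0.9]

features = build_features(rain_mm, crypto_oocysts_per_l)
rate_gt1 = exceedance_hit_rate(crypto_oocysts_per_l, toy_predictions, 1.0)
rate_gt2 = exceedance_hit_rate(crypto_oocysts_per_l, toy_predictions, 2.0)
```

In a fuller workflow, the non-empty feature rows would be passed to a regressor of the reader's choice, and the hit rate would be evaluated on held-out days only.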
Metadata
Item Type: | Article |
---|---|
Copyright, Publisher and Additional Information: | © 2025 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
Keywords: | Cryptosporidium; Water quality; Catchment modelling; Machine learning; Surface water; Abstraction management; Public health |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > School of Mechanical, Aerospace and Civil Engineering |
Funding Information: | Engineering and Physical Sciences Research Council, grant number EP/S023666/1 |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 30 Jun 2025 15:23 |
Last Modified: | 30 Jun 2025 15:23 |
Status: | Published |
Publisher: | Elsevier BV |
Refereed: | Yes |
Identification Number: | 10.1016/j.scitotenv.2025.179794 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:228504 |