Mounce, S.R. orcid.org/0000-0003-0742-0908, Ellis, K., Edwards, J.M. et al. (3 more authors) (2017) Ensemble decision tree models using RUSBoost for estimating risk of iron failure in drinking water distribution systems. Water Resources Management, 31 (5). pp. 1575-1589. ISSN 0920-4741
Abstract
Safe, trusted drinking water is fundamental to society. Discolouration is a key aesthetic indicator visible to customers. Investigations to understand discolouration and iron failures in water supply systems require assessment of large quantities of disparate, inconsistent, multidimensional data from multiple corporate systems. A comprehensive data matrix was assembled for a seven-year period across the whole of a UK water company (serving three million people). From this, a novel data-driven tool for assessment of iron risk was developed based on a yearly update and ranking procedure, for a subset of the best-quality data. To avoid a ‘black box’ output, and provide an element of explanatory (human-readable) interpretation, classification decision trees were utilised. Due to the very limited number of iron failures, results from many weak learners were melded into one high-quality ensemble predictor using the RUSBoost algorithm, which is designed for class imbalance. Results, exploring simplicity vs predictive power, indicate enough discrimination between variable relationships in the matrix to produce ensemble decision tree classification models with good accuracy for iron failure estimation at District Management Area (DMA) scale. Two model variants were explored: ‘Nowcast’ (situation at end of calendar year) and ‘Futurecast’ (predict the end-of-next-year situation from this year’s data). The Nowcast 2014 model achieved 100% True Positive Rate (TPR) and 95.3% True Negative Rate (TNR), with 3.3% of DMAs classified High Risk for un-sampled instances. The Futurecast 2014 model achieved 60.5% TPR and 75.9% TNR, with 25.7% of DMAs classified High Risk for un-sampled instances. The output can be used to focus preventive measures to improve iron compliance.
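For readers unfamiliar with the terms used in the abstract, the following is a minimal Python sketch of two of its ingredients: the random undersampling step (the "RUS" in RUSBoost) that balances rare failures against the many non-failures, and the TPR and TNR metrics used to report results. It is not the authors' implementation. The function names `undersample` and `tpr_tnr` and the toy labels are hypothetical, and the boosting loop that RUSBoost wraps around the undersampling step is omitted.

```python
import random


def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes have equal size.

    Hypothetical helper: labels are assumed to be 1 (failure) or 0 (no failure).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # The smaller class is kept whole; the larger one is sampled down to match.
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in keep], [labels[i] for i in keep]


def tpr_tnr(y_true, y_pred):
    """Return (True Positive Rate, True Negative Rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    return tpr, tnr


if __name__ == "__main__":
    # Toy data invented for illustration: 2 failures among 8 areas.
    labels = [1, 0, 0, 0, 0, 1, 0, 0]
    rows = list(range(len(labels)))
    _, balanced_labels = undersample(rows, labels)
    print(balanced_labels.count(1), balanced_labels.count(0))
    print(tpr_tnr([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```

In this reading, TPR is the share of actual iron failures that the model flags, and TNR is the share of non-failures that it correctly leaves unflagged.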
Metadata
Item Type: | Article |
---|---|
Copyright, Publisher and Additional Information: | © The Author(s) 2017. This is an author produced version of a paper subsequently published in Water Resources Management. Uploaded in accordance with the publisher's self-archiving policy. |
Keywords: | Water distribution systems; Water Quality; Iron; Machine Learning; Ensemble Decision Trees; RUSBoost |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Civil and Structural Engineering (Sheffield) |
Funding Information: | Funder: DWYR CYMRU WELSH WATER, grant number: NONE; Funder: ENGINEERING AND PHYSICAL SCIENCE RESEARCH COUNCIL (EPSRC), grant number: EP/I029346/1 |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 22 Feb 2017 10:15 |
Last Modified: | 03 Nov 2017 02:49 |
Published Version: | https://doi.org/10.1007/s11269-017-1595-8 |
Status: | Published |
Publisher: | Springer Verlag |
Refereed: | Yes |
Identification Number: | 10.1007/s11269-017-1595-8 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:112511 |
Downloads
Filename: Mounce Iron risk model_submitted_revision2.pdf
Licence: CC-BY 4.0
Filename: Supplementary material 1.pdf
Licence: CC-BY 4.0
Filename: Supplementary material 2.pdf
Licence: CC-BY 4.0