Arnold, KF orcid.org/0000-0002-0911-5029, Davies, V, de Kamps, M orcid.org/0000-0001-7162-4425 et al. (3 more authors) (2020) Reflection on modern methods: generalized linear models for prognosis and intervention—theory, practice and implications for machine learning. International Journal of Epidemiology, 49 (6). pp. 2074-2082. ISSN 0300-5771
Abstract
Prediction and causal explanation are fundamentally distinct tasks of data analysis. In health applications, this difference can be understood in terms of the difference between prognosis (prediction) and prevention/treatment (causal explanation). Nevertheless, these two concepts are often conflated in practice. We use the framework of generalized linear models (GLMs) to illustrate that predictive and causal queries require distinct processes for their application and subsequent interpretation of results. In particular, we identify five primary ways in which GLMs for prediction differ from GLMs for causal inference: (i) the covariates that should be considered for inclusion in (and possibly exclusion from) the model; (ii) how a suitable set of covariates to include in the model is determined; (iii) which covariates are ultimately selected and what functional form (i.e. parameterization) they take; (iv) how the model is evaluated; and (v) how the model is interpreted. We outline some of the potential consequences of failing to acknowledge and respect these differences, and additionally consider the implications for machine learning (ML) methods. We then conclude with three recommendations that we hope will help ensure that both prediction and causal modelling are used appropriately and to greatest effect in health research.
Metadata
Item Type: | Article |
---|---|
Copyright, Publisher and Additional Information: | © The Author(s) 2020. Published by Oxford University Press on behalf of the International Epidemiological Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
Keywords: | Prediction, causal inference, generalized linear models, directed acyclic graphs, machine learning, artificial intelligence |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds); The University of Leeds > Faculty of Environment (Leeds) > School of Geography (Leeds) > Centre for Spatial Analysis & Policy (Leeds); The University of Leeds > Faculty of Medicine and Health (Leeds) > Medicine & Health Faculty Office (Leeds) > Faculty Office Functions (FOMH) (Leeds) > Dean's Office (FOMH) (Leeds); The University of Leeds > Faculty of Medicine and Health (Leeds) > School of Medicine (Leeds) > Leeds Institute of Cardiovascular and Metabolic Medicine (LICAMM) > Clinical & Population Science Dept (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 03 Mar 2020 12:06 |
Last Modified: | 12 Jun 2024 09:43 |
Status: | Published |
Publisher: | Oxford University Press |
Identification Number: | 10.1093/ije/dyaa049 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:157767 |
Available Versions of this Item
- Generalised linear models for prognosis and intervention: Theory, practice, and implications for machine learning. (deposited 12 Jun 2024 09:44)
- Reflection on modern methods: generalized linear models for prognosis and intervention—theory, practice and implications for machine learning. (deposited 03 Mar 2020 12:06) [Currently Displayed]