Thelwall, M. (ORCID: 0000-0001-6065-205X), Jiang, X. and Bath, P.A. (2025) Estimating the quality of published medical research with ChatGPT. Information Processing & Management, 62 (4). 104123. ISSN 0306-4573
Abstract
Estimating the quality of published research is important for evaluations of departments, researchers, and job candidates. Citation-based indicators sometimes support these tasks, but they do not work for new articles and have low or moderate accuracy. Previous research has shown that ChatGPT can estimate the quality of research articles. In all fields its scores correlated positively with an expert-score proxy, and usually more strongly than citation-based indicators did, the exception being clinical medicine. ChatGPT scores may therefore replace citation-based indicators for some applications. This article investigates the clinical medicine anomaly with the largest dataset yet and a more detailed analysis. The results showed that ChatGPT 4o-mini scores for articles submitted to the UK's Research Excellence Framework (REF) 2021 Unit of Assessment (UoA) 1 Clinical Medicine correlated positively (r = 0.134, n = 9872) with departmental mean REF scores, against a theoretical maximum correlation of r = 0.226. ChatGPT 4o and 3.5 turbo also gave positive correlations. At the departmental level, mean ChatGPT scores correlated more strongly with departmental mean REF scores (r = 0.395, n = 31). For the 100 journals with the most articles in UoA 1, their mean ChatGPT score correlated strongly with their departmental mean REF score (r = 0.495) but negatively with their citation rate (r = -0.148). Journal and departmental anomalies in these results suggest that ChatGPT is ineffective at assessing the quality of research in prestigious medical journals, or of research that directly affects human health, or both. Nevertheless, the results give evidence of ChatGPT's ability to assess research quality overall for Clinical Medicine, where it might replace citation-based indicators for new research.
Metadata
| Field | Value |
|---|---|
| Item Type | Article |
| Authors/Creators | Thelwall, M. (ORCID: 0000-0001-6065-205X); Jiang, X.; Bath, P.A. |
| Copyright, Publisher and Additional Information | © 2025 The Authors. Except as otherwise noted, this author-accepted version of a journal article published in Information Processing & Management is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
| Keywords | Research evaluation; Medical research evaluation; ChatGPT; Large Language Models; AI research evaluation |
| Institution | The University of Sheffield |
| Academic Units | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
| Funding Information | Funder: UK Research and Innovation; grant number: APP43146 |
| Depositing User | Symplectic Sheffield |
| Date Deposited | 12 Mar 2025 09:17 |
| Last Modified | 12 Mar 2025 09:23 |
| Status | Published |
| Publisher | Elsevier |
| Refereed | Yes |
| Identification Number | 10.1016/j.ipm.2025.104123 |
| Open Archives Initiative ID (OAI ID) | oai:eprints.whiterose.ac.uk:223931 |
Download
Filename: Evaluating the quality of medical research ChatGPT3_R2wo_preprint.pdf
Licence: CC-BY 4.0