Thelwall, M. orcid.org/0000-0001-6065-205X (2025) Evaluating research quality with large language models: an analysis of ChatGPT’s effectiveness with different settings and inputs. Journal of Data and Information Science, 10 (1). ISSN 2096-157X
Abstract
Purpose
Evaluating the quality of academic journal articles is a time-consuming but critical task for national research evaluation exercises, appointments, and promotions. It is therefore important to investigate whether Large Language Models (LLMs) can play a role in this process.
Design/methodology/approach
This article assesses which ChatGPT inputs (full text without tables, figures, and references; title and abstract; or title only) produce the most accurate quality score estimates, and the extent to which the scores are affected by the choice of ChatGPT model and system prompt.
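As a rough illustration of this setup, the sketch below queries one ChatGPT model with one input variant through the OpenAI chat completions API. The system prompt and function name are hypothetical placeholders standing in for the paper's more complex system instructions.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical placeholder for the paper's research quality instructions.
SYSTEM_PROMPT = (
    "You are an expert research assessor. Score the following academic work "
    "for research quality on a 1-4 scale and reply with the score only."
)

def score_input(text: str, model: str = "gpt-4o") -> str:
    """Ask one ChatGPT model for a quality score on one input variant
    (full text, title and abstract, or title only)."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content
```

Repeating this call across input variants, models, and system prompts yields the score sets that the study compares.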
Findings
The optimal input is the article title and abstract, with average ChatGPT scores based on these (30 iterations on a dataset of 51 papers) correlating at 0.67 with human scores, the highest correlation yet reported. ChatGPT 4o is slightly better than 3.5-turbo (0.66) and 4o-mini (0.66).
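To make the aggregation concrete, here is a minimal sketch of averaging 30 score iterations per paper and correlating the means with human scores. The arrays are randomly generated stand-ins, not the study's data.

```python
import numpy as np
from scipy.stats import pearsonr

# One row per paper, one column per iteration (51 x 30), filled with
# illustrative random values rather than the paper's actual scores.
rng = np.random.default_rng(0)
chatgpt_scores = rng.uniform(1, 4, size=(51, 30))
human_scores = rng.uniform(1, 4, size=51)

mean_scores = chatgpt_scores.mean(axis=1)  # average over 30 iterations
r, p = pearsonr(mean_scores, human_scores)
print(f"correlation r = {r:.2f} (p = {p:.3f})")
```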
Research limitations
The data is a convenience sample of a single author's work, it covers only one field, and the quality scores are self-evaluations.
Practical implications
The results suggest that article full texts might confuse LLM research quality evaluations, even though complex system instructions for the task are more effective than simple ones. Thus, whilst abstracts contain insufficient information for a thorough assessment of rigour, they may contain strong pointers about originality and significance. Finally, linear regression can be used to convert the model scores into human-scale scores, making them 31% more accurate than guessing.
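The score conversion step could be sketched as below: a simple least-squares fit mapping averaged ChatGPT scores onto the human scale, compared against a guess-the-mean baseline. The data and error metric here are illustrative assumptions, not the paper's reported procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative stand-ins for per-paper averaged ChatGPT scores and
# human quality scores; not the paper's data.
chatgpt_means = rng.uniform(1, 4, size=51)
human_scores = np.clip(chatgpt_means + rng.normal(0, 0.5, size=51), 1, 4)

# Fit human ≈ a * chatgpt + b by least squares to map model scores
# onto the human scale.
a, b = np.polyfit(chatgpt_means, human_scores, deg=1)
predicted = a * chatgpt_means + b

# Compare against the naive baseline of guessing the mean human score.
mae_model = np.mean(np.abs(predicted - human_scores))
mae_guess = np.mean(np.abs(human_scores - human_scores.mean()))
print(f"regression MAE {mae_model:.2f} vs guessing MAE {mae_guess:.2f}")
```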
Originality/value
This is the first systematic comparison of the impact of different prompts, parameters and inputs for ChatGPT research quality evaluations.
Metadata
| Item Type: | Article |
| --- | --- |
| Authors/Creators: | Thelwall, M. (ORCID: 0000-0001-6065-205X) |
| Copyright, Publisher and Additional Information: | © 2024 Mike Thelwall, published by Sciendo. This work is licensed under the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). |
| Keywords: | ChatGPT; Large Language Models; LLMs; Scientometrics; Research Assessment |
| Dates: | |
| Institution: | The University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
| Depositing User: | Symplectic Sheffield |
| Date Deposited: | 02 Jan 2025 11:50 |
| Last Modified: | 03 Mar 2025 11:43 |
| Status: | Published |
| Publisher: | Sciendo |
| Refereed: | Yes |
| Identification Number: | 10.2478/jdis-2025-0011 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:220733 |