Is Google Gemini better than ChatGPT at evaluating research quality?

Abstract

Scores from Google Gemini 1.5 Flash were compared with those from ChatGPT 4o-mini on evaluations of (a) 51 of the author’s journal articles and (b) up to 200 articles in each of 34 field-based Units of Assessment (UoAs) from the UK Research Excellence Framework (REF) 2021. From (a), the results suggest that Gemini 1.5 Flash, unlike ChatGPT 4o-mini, may work better when given a PDF or the article full text rather than just the title and abstract. From (b), Gemini 1.5 Flash seems to be marginally less able than ChatGPT 4o-mini to predict an article’s research quality (using a departmental quality proxy indicator), although the differences are small and both have similar disciplinary variations in this ability. Averaging multiple runs of Gemini 1.5 Flash improves the scores.
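
As a rough illustration of the averaging step mentioned in the abstract, the sketch below averages per-article scores over repeated runs and compares them with a quality proxy. It is only a sketch under stated assumptions: the scores and proxy values are invented toy numbers (not data from the study), the function names are hypothetical, and the use of a Spearman rank correlation as the agreement measure is an assumption rather than something stated in this record.

```python
from statistics import mean

def average_runs(runs):
    """Average per-article scores across repeated runs (one list per run)."""
    return [mean(scores) for scores in zip(*runs)]

def ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every item tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical scores: three runs over four articles (toy numbers only).
runs = [
    [3, 2, 4, 1],
    [2, 3, 4, 2],
    [3, 2, 3, 1],
]
# Hypothetical departmental quality proxy values for the same four articles.
proxy = [2.9, 2.5, 3.6, 1.8]

averaged = average_runs(runs)
print("single run:", spearman(runs[1], proxy))
print("averaged:  ", spearman(averaged, proxy))
```

In this toy case the averaged scores track the proxy more closely than the second run alone, because averaging breaks ties and smooths run-to-run variation; whether and by how much this holds for real evaluations is a matter for the article itself.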

Metadata

Item Type:	Article
Authors/Creators:	Thelwall, M. https://orcid.org/0000-0001-6065-205X
Copyright, Publisher and Additional Information:	© 2024 Mike Thelwall, published by Sciendo. This work is licensed under the Creative Commons Attribution 4.0 International License. (https://creativecommons.org/licenses/by/4.0)
Keywords:	Research evaluation; Google Gemini API; ChatGPT; Large Language Models; AI research evaluation
Dates:	Published: 18 January 2025; Published (online): 18 January 2025; Accepted: 25 December 2024
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)
Depositing User:	Symplectic Sheffield
Date Deposited:	09 Jan 2025 12:12
Last Modified:	03 Feb 2025 11:05
Status:	Published online
Publisher:	Sciendo
Refereed:	Yes
Identification Number:	10.2478/jdis-2025-0014
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:221121
