Do large language models know basic facts about journal articles?

Abstract

Purpose

Large language models (LLMs) are increasingly used in information science, including to evaluate academic journal articles. It is unclear, however, whether they “know” about articles, in the sense of being able to answer simple questions about individual papers without web searches.

Design/methodology/approach

In this study, ChatGPT 4o-mini was asked four questions about 64,055 academic journal articles (excluding reviews) from 2021, each identified by its title and abstract. Uncited and highly cited articles were also assessed by ChatGPT 4.1 and five open-weight LLMs.

Findings

Most answers were incorrect, even for the most cited articles from that year. In particular, ChatGPT 4o-mini and the open-weight LLMs had almost no knowledge of an article’s first author affiliation, rarely knew the publishing journal and usually gave the wrong publication year, although ChatGPT 4o-mini was 42% correct for Physical Review B articles. Even ChatGPT 4.1 could identify the journal for only a small majority of the year’s most cited papers.

Practical implications

Smaller LLMs’ lack of basic knowledge about articles suggests that, when asked to evaluate articles without web searches, they will rarely “cheat” by recalling citation information or journal reputation. Instead, they will answer based on the article text, because they may not associate online criticisms with individual articles.

Originality/value

This is the first investigation of the ability of LLMs to recall basic facts about journal articles.

Metadata

Item Type:	Article
Authors/Creators:	Thelwall, M. https://orcid.org/0000-0001-6065-205X
Copyright, Publisher and Additional Information:	© 2026 The Authors. Except as otherwise noted, this author-accepted version of a journal article published in Journal of Documentation is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/
Keywords:	Scientometrics; bibliometrics; ChatGPT 4o-mini; research evaluation; LLM
Dates:	Submitted: 3 November 2025; Accepted: 21 December 2025; Published (online): 20 January 2026; Published: 17 February 2026
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Social Sciences (Sheffield) > School of Information, Journalism and Communication
Funding Information:	Funder: UK Research and Innovation; Grant number: UKRI1079
Date Deposited:	08 Jan 2026 14:55
Last Modified:	12 Mar 2026 16:31
Status:	Published
Publisher:	Emerald
Refereed:	Yes
Identification Number:	10.1108/JD-11-2025-0330
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:235849