Thelwall, M. orcid.org/0000-0001-6065-205X (2026) Do large language models know basic facts about journal articles? Journal of Documentation. ISSN: 0022-0418
Abstract
Purpose
Large language models (LLMs) are increasingly used in information science, including for evaluating academic journal articles. Despite this, it is unclear whether they “know” about articles in the sense of being able to answer simple questions about individual papers without web searches.
Design/methodology/approach
In this study, four questions were asked of ChatGPT 4o-mini about 64,055 academic journal articles (excluding reviews) from 2021, identified by their titles and abstracts. Uncited and highly cited articles were also assessed by ChatGPT 4.1 and five open-weight LLMs.
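As a rough illustration of how answers of this kind could be scored, the sketch below compares each model answer with the known metadata value and reports the share of exact matches per question. It is not the study's scoring procedure: the field names, the normalisation rule, the exact-match criterion and the sample records are all assumptions made for the example, and no LLM is called.

```python
from collections import defaultdict


def normalise(text: str) -> str:
    """Lower-case the text and keep only letters, digits and single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def accuracy_by_field(records):
    """Return the fraction of exact (normalised) matches for each field.

    records: iterable of dicts with 'field', 'truth' and 'answer' keys
    (a hypothetical layout chosen for this sketch).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["field"]] += 1
        if normalise(rec["truth"]) == normalise(rec["answer"]):
            correct[rec["field"]] += 1
    return {field: correct[field] / total[field] for field in total}


# Hypothetical responses, invented for illustration only (not study data).
sample = [
    {"field": "journal", "truth": "Physical Review B", "answer": "physical review b."},
    {"field": "journal", "truth": "Journal of Documentation", "answer": "Scientometrics"},
    {"field": "year", "truth": "2021", "answer": "2019"},
    {"field": "year", "truth": "2021", "answer": "2021"},
]

print(accuracy_by_field(sample))
```

Exact matching after normalisation is deliberately strict. A real evaluation would probably need to handle journal abbreviations and variant affiliation names, which this sketch ignores.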
Findings
The results were mostly incorrect, even for the most cited articles from that year. In particular, ChatGPT 4o-mini and the open-weight LLMs had almost no knowledge of an article’s first author affiliation, rarely knew the publishing journal and usually guessed the wrong publication year, although ChatGPT 4o-mini was 42% correct for Physical Review B. Even ChatGPT 4.1 could identify only a small majority of the journals for the top cited papers of the year.
Practical implications
Smaller LLMs’ lack of basic knowledge about articles suggests that when they are asked to evaluate them without web searches, they will rarely cheat by eliciting citation information or journal reputation but will instead answer based on the article text because they may not associate online criticisms with individual articles.
Originality/value
This is the first investigation of the ability of LLMs to recall basic facts about journal articles.
Metadata
| Item Type: | Article |
|---|---|
| Authors/Creators: | Thelwall, M. (ORCID: orcid.org/0000-0001-6065-205X) |
| Copyright, Publisher and Additional Information: | © 2026 The Authors. Except as otherwise noted, this author-accepted version of a journal article published in Journal of Documentation is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
| Keywords: | Scientometrics; bibliometrics; ChatGPT 4o-mini; research evaluation; LLM |
| Dates: | |
| Institution: | The University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > School of Information, Journalism and Communication |
| Funding Information: | Funder: UK Research and Innovation; Grant number: UKRI1079 |
| Date Deposited: | 08 Jan 2026 14:55 |
| Last Modified: | 02 Feb 2026 15:37 |
| Status: | Published online |
| Publisher: | Emerald |
| Refereed: | Yes |
| Identification Number: | 10.1108/JD-11-2025-0330 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:235849 |
