Brodeur, Abel, Valenta, David, Marcoci, Alexandru et al. (269 more authors) (2026) AI-assisted teams outperform AI-led teams but not human-only teams in assessing research reproducibility in quantitative social science. Proceedings of the National Academy of Sciences of the United States of America. e2524747123. ISSN: 1091-6490
Significance
Verifying results of published social sciences research is essential but expensive, costing hundreds of dollars per study. With AI tools like ChatGPT becoming widespread, we tested whether they could help scientists check if research findings can be reproduced. We assigned 288 researchers to 103 teams working with no AI, with AI as an assistant, or AI leading the work with minimal human input. Human teams and AI-assisted teams performed similarly on most tasks, but humans caught more critical errors. AI working autonomously achieved a 37% reproduction rate, making it potentially useful for automated screening when human review is cost-prohibitive. These results nonetheless show that human expertise remains essential for reliable scientific validation.
Abstract
Large Language Models (LLMs) such as ChatGPT are transforming how scientists conduct and validate research, offering promise as tools to improve scientific reproducibility. However, computational reproducibility and error detection remain expensive and labor-intensive. We experimentally test how collaboration between researchers and LLM assistants influences the reproduction of quantitative social science findings across different levels of AI autonomy. We randomly assigned 288 researchers to 103 teams working under three conditions: human-only, AI-assisted (using ChatGPT as a collaborative tool), or AI-led (ChatGPT operating with minimal human oversight). Teams reproduced published results from leading social science journals, detected coding errors, and proposed robustness checks. Human-only and AI-assisted teams achieved comparable reproduction rates (94% vs. 91%) and performed similarly on most outcomes, except human-only teams identified significantly more major coding errors. Both substantially outperformed AI-led teams, which achieved only a 37% reproduction rate, detected fewer errors across all categories, proposed weaker robustness checks, and required more time. This autonomous approach, however, likely represents only a lower bound of AI capabilities. Despite rapid model advances, expert human judgment currently remains indispensable for reliable empirical verification. While AI assistance did not degrade most outcomes, it provided no measurable advantages and was associated with reduced detection of major errors. However, the 37% autonomous reproduction rate indicates that AI could provide value in settings where scale or cost constraints preclude human review of papers, even though general-purpose LLMs offer no immediate advantages for human-supervised verification.
Metadata
| Item Type: | Article |
|---|---|
| Authors/Creators: | |
| Copyright, Publisher and Additional Information: | We make our i) AI training materials and recording, ii) data and code, iii) preanalysis plan and iv) template form available here: https://github.com/I4Replication/AI-Games (50). We declare no restrictions on sharing or reuse. © 2026 the Author(s) |
| Keywords: | Humans, Reproducibility of Results, Large Language Models, Social Sciences/methods, Generative Artificial Intelligence, Artificial Intelligence, Intelligent Systems, Cooperative Behavior |
| Dates: | |
| Institution: | The University of York |
| Academic Units: | The University of York > Faculty of Social Sciences (York) > Education (York); The University of York > Faculty of Social Sciences (York) > Economics and Related Studies (York); The University of York > Faculty of Sciences (York) > Psychology (York); The University of York > Faculty of Social Sciences (York) > Centre for Health Economics (York); The University of York > Faculty of Sciences (York) > Health Sciences (York); The University of York > Faculty of Social Sciences (York) > Social Policy and Social Work (York) |
| Date Deposited: | 29 May 2026 11:00 |
| Last Modified: | 16 Jun 2026 15:00 |
| Published Version: | https://doi.org/10.1073/pnas.2524747123 |
| Status: | Published |
| Refereed: | Yes |
| Identification Number: | 10.1073/pnas.2524747123 |
| Related URLs: | |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:241548 |
Download
Description: brodeur-et-al-2026-ai-assisted-teams-outperform-ai-led-teams-but-not-human-only-teams-in-assessing-research-1
Licence: CC-BY-NC-ND 2.5
