Akinseloyin, O., Jiang, X. orcid.org/0000-0003-4255-5445 and Paladel, V. (Submitted: 2025) Weakly supervised active learning for abstract screening leveraging LLM-based pseudo-labeling. [Preprint - medRxiv] (Submitted)
Abstract
Abstract screening is a notoriously labour-intensive step in systematic reviews. AI-aided abstract screening faces several grand challenges, such as the strict requirement of near-total recall of relevant studies, the lack of initial annotation, and extreme data imbalance. Active learning is the predominant solution for this challenging task; however, it is remarkably time-consuming and tedious. To address these challenges, this paper introduces a weakly supervised learning framework leveraging large language models (LLMs). The proposed approach employs LLMs to score and rank candidate studies based on their adherence to the inclusion criteria for relevant studies specified in the review protocol. Pseudo-labels are generated by treating the top T% and bottom B% of the ranked studies as positive and negative samples, respectively, for training an initial classifier without manual annotation. Experimental results on 28 systematic reviews from a well-established benchmark demonstrate a breakthrough in automated abstract screening: manual annotation can be eliminated while safely reducing the screening workload by 42-43% on average and maintaining near-perfect recall, making this the first approach to achieve this strict requirement for abstract screening. Additionally, LLM-based pseudo-labelling significantly improves the efficiency and utility of the active learning regime for abstract screening.
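The pseudo-labelling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `pseudo_label`, the score scale, and the rounding rule (flooring the cut-off counts) are assumptions made for illustration, and the relevance scores are taken as given rather than obtained from an LLM.

```python
from typing import List, Tuple


def pseudo_label(
    scores: List[float], top_pct: float, bottom_pct: float
) -> Tuple[List[int], List[int]]:
    """Split candidate studies into pseudo-positive and pseudo-negative sets.

    scores     -- hypothetical relevance scores, one per study
                  (higher = closer match to the inclusion criteria)
    top_pct    -- T, the percentage of top-ranked studies treated as positive
    bottom_pct -- B, the percentage of bottom-ranked studies treated as negative
    """
    if top_pct < 0 or bottom_pct < 0 or top_pct + bottom_pct > 100:
        raise ValueError("T and B must be non-negative and sum to at most 100")

    n = len(scores)
    # Rank study indices from highest to lowest score.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)

    # Assumed rounding rule: floor the number of studies in each band.
    n_pos = int(n * top_pct / 100)
    n_neg = int(n * bottom_pct / 100)

    positives = ranked[:n_pos]
    negatives = ranked[n - n_neg:] if n_neg else []
    return positives, negatives


# Ten studies with made-up scores; T = 20, B = 30.
example_scores = [0.9, 0.1, 0.5, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4, 0.05]
pos, neg = pseudo_label(example_scores, top_pct=20, bottom_pct=30)
print(pos)  # [0, 3]
print(neg)  # [4, 1, 9]
```

The two index lists would then serve as the training set for an initial classifier, with the unlabelled middle band left for the classifier (or a later active learning loop) to resolve.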
Metadata
Item Type: | Preprint |
---|---|
Authors/Creators: | Akinseloyin, O.; Jiang, X. (orcid.org/0000-0003-4255-5445); Paladel, V. |
Copyright, Publisher and Additional Information: | © 2025 The Author(s). This preprint is made available under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
Keywords: | Information and Computing Sciences; Machine Learning |
Dates: | Submitted: 2025 |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Department of Journalism Studies (Sheffield); Sheffield.IJC; The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 18 Sep 2025 09:35 |
Last Modified: | 18 Sep 2025 09:35 |
Status: | Submitted |
Identification Number: | 10.1101/2025.08.24.25334314 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:231720 |