
There is a more recent version of this eprint available.
Jiang, X. orcid.org/0000-0003-4255-5445, Akinseloyin, O. and Palade, V. orcid.org/0000-0002-6768-8394 (Submitted: 2026) Efficient citation screening by weak classifier ensemble*. [Preprint - medRxiv] (Submitted)
Abstract
Citation screening in systematic reviews is time-consuming. Machine learning can help semi-automate it but faces obstacles. Each systematic review is a new dataset without initial annotations. Extreme class imbalance, with irrelevant studies far outnumbering relevant ones, makes it difficult to select a good subset of samples to train a classifier. The rigid requirement of (near) total recall of relevant studies demands a careful trade-off between accuracy and recall. This paper pilots a weak classifier ensemble approach to tackle both challenges. The idea of ensembling is employed in two ways. First, multiple cost-effective large language models are applied, and their outputs averaged, to score and rank candidate studies and so create a balanced pseudo-labelled training set. Second, different sets of pseudo-negative samples are bootstrapped from low-ranked documents, and multiple classifiers are trained and combined to make screening decisions. Experiments on 28 systematic reviews demonstrate significant performance improvements from the weakly supervised classifier ensemble, which also meets the rigid recall requirement for safe use in practice.
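The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the language models, features, classifiers, or thresholds, so everything below is an assumption made for illustration. The toy score lists stand in for LLM relevance scores, a nearest-centroid rule stands in for the weak classifier, and the function names and the parameters `n_pos`, `pool_size`, `n_members`, and `min_votes` are hypothetical.

```python
import random
from statistics import mean


def average_scores(scores_per_model):
    """Average relevance scores across scorers (one list of scores per model)."""
    return [mean(doc_scores) for doc_scores in zip(*scores_per_model)]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [mean(col) for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_weak_classifier(pos, neg):
    """Stand-in weak classifier (assumption): nearest centroid, returns 0 or 1."""
    c_pos, c_neg = centroid(pos), centroid(neg)
    return lambda v: 1 if sq_dist(v, c_pos) <= sq_dist(v, c_neg) else 0


def screen(features, scores_per_model, n_pos=2, pool_size=4,
           n_members=5, min_votes=1, seed=0):
    """Pseudo-label by averaged rank, then vote with a bootstrapped ensemble."""
    rng = random.Random(seed)
    avg = average_scores(scores_per_model)
    order = sorted(range(len(avg)), key=lambda i: avg[i], reverse=True)
    pos = [features[i] for i in order[:n_pos]]   # top-ranked: pseudo-positives
    neg_pool = order[-pool_size:]                # low-ranked: pseudo-negative pool
    members = []
    for _ in range(n_members):
        # Bootstrap (sample with replacement) as many negatives as positives,
        # so each member trains on a balanced pseudo-labelled set.
        neg = [features[rng.choice(neg_pool)] for _ in range(n_pos)]
        members.append(train_weak_classifier(pos, neg))
    votes = [sum(m(v) for m in members) for v in features]
    # Recall-oriented rule (assumption): include if at least min_votes members agree.
    return [1 if v >= min_votes else 0 for v in votes]


# Toy data: documents 0-2 are "relevant" (near (1, 1)), 3-7 are not (near (0, 0)).
features = [[1.0, 1.0], [0.9, 1.1], [1.1, 0.9], [0.0, 0.0],
            [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05]]
scores_a = [0.9, 0.8, 0.6, 0.2, 0.1, 0.3, 0.2, 0.1]  # made-up scorer outputs
scores_b = [0.8, 0.9, 0.5, 0.1, 0.2, 0.2, 0.1, 0.3]
decisions = screen(features, [scores_a, scores_b])
print(decisions)  # [1, 1, 1, 0, 0, 0, 0, 0]
```

In this toy run, document 2 is never pseudo-labelled as positive but is still recovered by the ensemble. Setting `min_votes` low favours recall over precision, which is one plausible way to respect a near-total-recall requirement; the preprint may combine its classifiers differently.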
Metadata
| Item Type: | Preprint |
|---|---|
| Authors/Creators: | Jiang, X.; Akinseloyin, O.; Palade, V. |
| Copyright, Publisher and Additional Information: | © 2026 The Author(s). This preprint is made available under a Creative Commons Attribution-NonCommercial 4.0 International License. (http://creativecommons.org/licenses/by-nc/4.0/) |
| Keywords: | Automated systematic review; Citation screening; Large language model; Ensemble; Weakly supervised learning |
| Dates: | Submitted: 2026 |
| Institution: | The University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > School of Information, Journalism and Communication |
| Date Deposited: | 13 Feb 2026 13:22 |
| Last Modified: | 13 Feb 2026 15:49 |
| Status: | Submitted |
| Identification Number: | 10.64898/2026.01.07.26343635 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:237903 |
Available Versions of this Item
- Efficient citation screening by weak classifier ensemble*. (deposited 13 Feb 2026 13:22) [Currently Displayed]
Download
Filename: 2026.01.07.26343635v1.full.pdf
Licence: CC-BY-NC 4.0

CORE (COnnecting REpositories)