This is the latest version of this eprint.
Jiang, X. orcid.org/0000-0003-4255-5445, Akinseloyin, O. and Palade, V. (2026) Efficient citation screening by weak classifier ensemble*. In: 2025 ACM/IEEE Joint Conference on Digital Libraries (JCDL). 2025 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 15-19 Dec 2025, Dekalb, IL, USA. Institute of Electrical and Electronics Engineers (IEEE), pp. 265-268. ISBN: 9798331568047.
Abstract
Citation screening in systematic reviews is time-consuming. Machine learning can help semi-automate it but faces obstacles. Each systematic review is a new dataset without initial annotations. Extreme class imbalance against irrelevant studies makes it difficult to select a good subset of samples to train a classifier. The rigid requirement of a (near) total recall of relevant studies demands a careful trade-off between accuracy and recall. This paper pilots a weak classifier ensemble approach to tackle both challenges. The idea of ensembling is employed in two ways. First, multiple cost-effective large language models are applied and averaged to score and rank candidate studies to create a balanced pseudo-labelled training set. Second, different sets of pseudo-negative samples are bootstrapped from low-ranked documents and multiple classifiers are trained and combined to make screening decisions. Experiments on 28 systematic reviews demonstrate significant performance improvements brought by the weakly supervised classifier ensemble, which also meets the rigid recall requirement for it to be safely used in practice.
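The two-stage idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names, the toy 2-D features, the hard-coded scores standing in for large language model outputs, and the centroid-difference scorer used as a stand-in weak classifier are all hypothetical. The abstract does not specify the classifiers, features, or decision thresholds used in the paper.

```python
import random
from statistics import mean

def average_scores(score_lists):
    """Average per-document relevance scores from several scorers (stand-ins for LLMs)."""
    return [mean(doc_scores) for doc_scores in zip(*score_lists)]

def pseudo_label(avg_scores, n_pos, neg_pool_frac):
    """Top-ranked documents become pseudo-positives; the bottom fraction forms the pseudo-negative pool."""
    order = sorted(range(len(avg_scores)), key=lambda i: avg_scores[i], reverse=True)
    pool_size = max(n_pos, int(len(order) * neg_pool_frac))
    return order[:n_pos], order[-pool_size:]

def train_centroid(features, pos_idx, neg_idx):
    """Toy weak classifier (assumption): a linear scorer from the difference of class centroids."""
    dim = len(features[0])
    pos_c = [mean(features[i][d] for i in pos_idx) for d in range(dim)]
    neg_c = [mean(features[i][d] for i in neg_idx) for d in range(dim)]
    w = [p - n for p, n in zip(pos_c, neg_c)]
    mid = [(p + n) / 2 for p, n in zip(pos_c, neg_c)]
    return lambda x: sum(wd * (xd - md) for wd, xd, md in zip(w, x, mid))

def ensemble_scores(features, llm_scores, n_pos=2, neg_pool_frac=0.5, n_models=5, seed=0):
    """Train one weak classifier per bootstrapped pseudo-negative set and average their scores."""
    rng = random.Random(seed)
    avg = average_scores(llm_scores)
    pos_idx, neg_pool = pseudo_label(avg, n_pos, neg_pool_frac)
    models = []
    for _ in range(n_models):
        # Draw as many pseudo-negatives as pseudo-positives to keep each training set balanced.
        neg_idx = rng.sample(neg_pool, n_pos)
        models.append(train_centroid(features, pos_idx, neg_idx))
    return [mean(m(x) for m in models) for x in features]

# Toy data (hypothetical): documents 0-2 are relevant, documents 3-7 are irrelevant.
features = [[0.9, 0.8], [1.0, 0.9], [0.8, 1.0],
            [0.1, 0.2], [0.2, 0.1], [0.0, 0.1], [0.1, 0.0], [0.2, 0.2]]
llm_scores = [
    [0.9, 0.8, 0.6, 0.2, 0.1, 0.3, 0.1, 0.2],  # scorer A
    [0.8, 0.9, 0.5, 0.1, 0.3, 0.2, 0.2, 0.1],  # scorer B
]
scores = ensemble_scores(features, llm_scores)
included = [i for i, s in enumerate(scores) if s > 0]
print(included)
```

In this toy setting, document 2 is not among the pseudo-positives selected from the averaged scores, yet the ensemble still scores it as relevant because its features resemble those of the pseudo-positives. In practice the inclusion threshold would need to be tuned towards the (near) total recall requirement the abstract describes; the zero threshold here is only a placeholder.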
Metadata
| Item Type: | Proceedings Paper |
|---|---|
| Authors/Creators: | Jiang, X. (orcid.org/0000-0003-4255-5445); Akinseloyin, O.; Palade, V. |
| Copyright, Publisher and Additional Information: | © 2026 The Author(s). Except as otherwise noted, this author-accepted version of a conference paper published in 2025 ACM/IEEE Joint Conference on Digital Libraries (JCDL) is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
| Keywords: | Automated systematic review; Citation screening; Large language model; Ensemble; Weakly supervised learning |
| Dates: | |
| Institution: | The University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > School of Information, Journalism and Communication |
| Date Deposited: | 13 Feb 2026 15:46 |
| Last Modified: | 13 Feb 2026 16:11 |
| Status: | Published |
| Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
| Refereed: | Yes |
| Identification Number: | 10.1109/jcdl67857.2025.00042 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:237967 |
Available Versions of this Item
- Efficient citation screening by weak classifier ensemble*. (deposited 13 Feb 2026 13:22)
- Efficient citation screening by weak classifier ensemble*. (deposited 13 Feb 2026 15:46) [Currently Displayed]
Download
Filename: 2026.01.07.26343635v1.full.pdf
Licence: CC-BY 4.0
