Khandeparker, A. and Lu, P. orcid.org/0000-0002-0199-3783 (2026) Transform(AI)ng Radiology with CheXSBT: Integrating Dual-Attention Swin Transformer with BERT for Seamless Chest X-Ray Report Generation. In: Ali, S., Hogg, D.C. and Peckham, M., (eds.) Medical Image Understanding and Analysis. Medical Image Understanding and Analysis (MIUA) 2025, 15-17 Jul 2025, Leeds, UK. Lecture Notes in Computer Science, 15916. Springer Nature, Cham, Switzerland, pp. 159-173. ISBN: 978-3-031-98687-1 ISSN: 0302-9743 EISSN: 1611-3349
Abstract
Radiology reports are crucial for diagnosing diseases, yet generating them is time-consuming, places a significant workload on medical professionals, and is subject to inter-expert variability, as different radiologists may interpret the same X-ray differently. This paper presents a novel hybrid AI model called CheXSBT, which combines our custom-designed Dual-Attention Swin Transformer (DAST) for vision processing with BERT for natural language understanding to automate the generation of chest X-ray (CXR) reports. Leveraging the MIMIC-CXR dataset, which includes over 370,000 X-ray images and their corresponding reports, CheXSBT learns to interpret chest X-ray images and convert them into structured, meaningful text. Our study focuses on two main objectives: (1) automating report generation to accelerate the diagnostic process and (2) improving model interpretability to foster trust among radiologists. The approach involves preprocessing chest X-ray images and their corresponding text reports using the pre-trained BLIP processor, training the novel hybrid vision-language model on paired data, and fine-tuning it for clinical relevance and coherence. The performance of CheXSBT is rigorously evaluated using established metrics such as BLEU, ROUGE, and METEOR, achieving scores of 0.232 for BLEU-4 and 0.392 for ROUGE-L, outperforming other state-of-the-art models and ensuring high-quality report generation. By reducing radiologists' workload and providing quick, accurate information, CheXSBT aims to transform the intersection between AI and clinical practice, making radiology reporting more efficient, consistent, and accessible.
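The abstract reports BLEU-4 and ROUGE-L scores for the generated reports. As a rough illustration of how the BLEU-4 metric works, here is a minimal pure-Python sketch of sentence-level BLEU-4 (clipped n-gram precision, brevity penalty, geometric mean). This is not the authors' evaluation code, and the `bleu4` function name is hypothetical; published results are typically computed with standard toolkits (e.g. NLTK or pycocoevalcap), whose smoothing schemes differ.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(reference, candidate):
    """Sentence-level BLEU-4 sketch: clipped modified precisions for
    n = 1..4, a crude smoothing term for zero counts, and the brevity
    penalty applied to the geometric mean."""
    ref, cand = reference.split(), candidate.split()
    precisions = []
    for n in range(1, 5):
        ref_counts = Counter(ngrams(ref, n))
        cand_counts = Counter(ngrams(cand, n))
        total = max(len(cand) - n + 1, 0)
        if total == 0:
            return 0.0  # candidate too short to form any n-grams
        # clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        # crude add-half smoothing so a zero match does not zero the score
        precisions.append(overlap / total if overlap > 0 else 0.5 / total)
    # brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)
```

A perfect match scores 1.0; dropping or altering words lowers the clipped precisions and, for shorter candidates, triggers the brevity penalty, so scores such as the reported 0.232 reflect partial n-gram overlap with the reference reports.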
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Khandeparker, A. and Lu, P. |
Editors: | Ali, S., Hogg, D.C. and Peckham, M. |
Copyright, Publisher and Additional Information: | This is an author produced version of a conference paper published in Medical Image Understanding and Analysis made available under the terms of the Creative Commons Attribution License (CC-BY), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. |
Keywords: | Vision-language models, Chest X-ray, Radiology report generation, Transformer, Swin transformer, BERT |
Dates: | Published: 2026 |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 04 Jul 2025 14:28 |
Last Modified: | 20 Aug 2025 15:10 |
Published Version: | https://link.springer.com/chapter/10.1007/978-3-03... |
Status: | Published |
Publisher: | Springer Nature |
Series Name: | Lecture Notes in Computer Science |
Identification Number: | 10.1007/978-3-031-98688-8_12 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:228680 |
Download
Filename: Transform_AI_ng_Radiology_Camera_Ready 2_Aradhya Khandeparker and Ping Lu.pdf
Licence: CC-BY 4.0