Khandeparker, A. and Lu, P. orcid.org/0000-0002-0199-3783 (2026) Transform(AI)ng Radiology with CheXSBT: Integrating Dual-Attention Swin Transformer with BERT for Seamless Chest X-Ray Report Generation. In: Ali, S., Hogg, D.C. and Peckham, M., (eds.) Medical Image Understanding and Analysis. Medical Image Understanding and Analysis (MIUA) 2025, 15-17 Jul 2025, Leeds, UK. Lecture Notes in Computer Science, 15916. Springer Nature, Cham, Switzerland, pp. 159-173. ISBN: 978-3-031-98687-1 ISSN: 0302-9743 EISSN: 1611-3349
Abstract
Radiology reports are crucial for diagnosing diseases, yet generating them is time-consuming, places a significant workload on medical professionals, and is subject to inter-expert variability, as different radiologists may interpret the same X-ray differently. This paper presents a novel hybrid AI model called CheXSBT, which combines our custom-designed Dual-Attention Swin Transformer (DAST) for vision processing with BERT for natural language understanding to automate the generation of chest X-ray (CXR) reports. Leveraging the MIMIC-CXR dataset, which includes over 370,000 X-ray images and their corresponding reports, CheXSBT learns to interpret chest X-ray images and convert them into structured, meaningful text. Our study focuses on two main objectives: (1) automating report generation to accelerate the diagnostic process and (2) improving model interpretability to foster trust among radiologists. The approach involves preprocessing chest X-ray images and their corresponding text reports using the pre-trained BLIP processor, training the novel hybrid vision-language model on paired data, and fine-tuning it for clinical relevance and coherence. The performance of CheXSBT is rigorously evaluated using established metrics such as BLEU, ROUGE, and METEOR, achieving scores of 0.232 for BLEU-4 and 0.392 for ROUGE-L, outperforming other state-of-the-art models and ensuring high-quality report generation. By reducing radiologists' workload and providing quick, accurate information, CheXSBT aims to transform the intersection between AI and clinical practice, making radiology reporting more efficient, consistent, and accessible.
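The abstract reports BLEU-4 and ROUGE-L scores for the generated reports. As a rough illustration of how the BLEU-4 metric works, here is a minimal pure-Python sketch of sentence-level BLEU-4 (clipped n-gram precision, brevity penalty, geometric mean). This is not the authors' evaluation code, and the `bleu4` function name is hypothetical; published results are typically computed with standard toolkits (e.g. NLTK or pycocoevalcap), whose smoothing schemes differ.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(reference, candidate):
    """Sentence-level BLEU-4 sketch: clipped modified precisions for
    n = 1..4, a crude smoothing term for zero counts, and the brevity
    penalty applied to the geometric mean."""
    ref, cand = reference.split(), candidate.split()
    precisions = []
    for n in range(1, 5):
        ref_counts = Counter(ngrams(ref, n))
        cand_counts = Counter(ngrams(cand, n))
        total = max(len(cand) - n + 1, 0)
        if total == 0:
            return 0.0  # candidate too short to form any n-grams
        # clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        # crude add-half smoothing so a zero match does not zero the score
        precisions.append(overlap / total if overlap > 0 else 0.5 / total)
    # brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)
```

A perfect match scores 1.0; dropping or altering words lowers the clipped precisions and, for shorter candidates, triggers the brevity penalty, so scores such as the reported 0.232 reflect partial n-gram overlap with the reference reports.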
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Khandeparker, A. and Lu, P. |
Editors: | Ali, S., Hogg, D.C. and Peckham, M. |
Copyright, Publisher and Additional Information: | This is an author produced version of a conference paper published in Medical Image Understanding and Analysis made available under the terms of the Creative Commons Attribution License (CC-BY), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. |
Keywords: | Vision-language models, Chest X-ray, Radiology report generation, Transformer, Swin transformer, BERT |
Dates: | Published: 2026 |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 04 Jul 2025 14:28 |
Last Modified: | 20 Aug 2025 15:10 |
Published Version: | https://link.springer.com/chapter/10.1007/978-3-03... |
Status: | Published |
Publisher: | Springer Nature |
Series Name: | Lecture Notes in Computer Science |
Identification Number: | 10.1007/978-3-031-98688-8_12 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:228680 |
Download
Filename: Transform_AI_ng_Radiology_Camera_Ready 2_Aradhya Khandeparker and Ping Lu.pdf
Licence: CC-BY 4.0