DCUFormer: Enhancing pavement crack segmentation in complex environments with dual-cross/upsampling attention

Abstract

Efficient road inspection and maintenance are essential to extend pavement lifespan and enhance safety. However, automated crack detection remains challenging due to varied environmental conditions and differences in image collection equipment, making robust algorithm development a critical need. Vision Transformers, with their capacity to capture long-range dependencies, offer significant advantages for crack detection in complex scenarios by effectively extracting global features. Nevertheless, existing Transformer-based methods encounter difficulties in boundary delineation due to decoder design limitations, which lead to suboptimal fusion of low-level and high-level features. To address this issue, we propose a comprehensive approach that integrates semantic preservation, detail refinement, and detail delineation. These concepts are realized through our novel Dual-Cross Attention Module (DCA) and Upsampling Attention Module (UA). The DCA module progressively filters redundant details from low-level feature layers using high-level semantic information, while preserving boundary details to refine high-level feature boundaries. In addition, the UA module employs progressive local cross-attention in upsampling, facilitating more precise boundary definitions and surpassing conventional dynamic upsampling methods. Our approach, utilizing both lightweight (MiT-B0, LVT) and middleweight (Swin-T) backbones, demonstrates state-of-the-art performance on three diverse datasets—Crack500, CrackSC, and UAV-Crack500—highlighting its robustness across varied conditions. This work contributes to advancing Transformer-based architectures for defect segmentation in complex engineering contexts, underscoring the critical role of improved feature fusion in crack detection. The code is available at: https://github.com/SHAN-JH/DCUFormer.

Metadata

Item Type:	Article
Authors/Creators:	Shan, J.; Huang, Y. (ORCID: https://orcid.org/0000-0002-1220-6896); Jiang, W.
Copyright, Publisher and Additional Information:	This is an author produced version of an article published in Expert Systems with Applications, made available under the terms of the Creative Commons Attribution License (CC-BY), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited.
Keywords:	Pavement crack; Vision transformer; Semantic segmentation; Feature upsampling
Dates:	Published: 10 March 2025; Published (online): 23 November 2024; Accepted: 20 November 2024
Institution:	The University of Leeds
Academic Units:	The University of Leeds > Faculty of Environment (Leeds) > Institute for Transport Studies (Leeds) > ITS: Spatial Modelling and Dynamics (Leeds)
Depositing User:	Symplectic Publications
Date Deposited:	16 Dec 2024 18:47
Last Modified:	20 Dec 2024 12:15
Status:	Published
Publisher:	Elsevier
Identification Number:	10.1016/j.eswa.2024.125891
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:220258
