Shan, J., Huang, Y. (ORCID: 0000-0002-1220-6896) and Jiang, W. (2025) DCUFormer: Enhancing pavement crack segmentation in complex environments with dual-cross/upsampling attention. Expert Systems with Applications, 264, 125891. ISSN 0957-4174
Abstract
Efficient road inspection and maintenance are essential to extend pavement lifespan and enhance safety. However, automated crack detection remains challenging due to varied environmental conditions and differences in image collection equipment, making robust algorithm development a critical need. Vision Transformers, with their capacity to capture long-range dependencies, offer significant advantages for crack detection in complex scenarios by effectively extracting global features. Nevertheless, existing Transformer-based methods encounter difficulties in boundary delineation due to decoder design limitations, which lead to suboptimal fusion of low-level and high-level features. To address this issue, we propose a comprehensive approach that integrates semantic preservation, detail refinement, and detail delineation. These concepts are realized through our novel Dual-Cross Attention Module (DCA) and Upsampling Attention Module (UA). The DCA module progressively filters redundant details from low-level feature layers using high-level semantic information, while preserving boundary details to refine high-level feature boundaries. In addition, the UA module employs progressive local cross-attention in upsampling, facilitating more precise boundary definitions and surpassing conventional dynamic upsampling methods. Our approach, utilizing both lightweight (MiT-B0, LVT) and middleweight (Swin-T) backbones, demonstrates state-of-the-art performance on three diverse datasets—Crack500, CrackSC, and UAV-Crack500—highlighting its robustness across varied conditions. This work contributes to advancing Transformer-based architectures for defect segmentation in complex engineering contexts, underscoring the critical role of improved feature fusion in crack detection. The code is available at: https://github.com/SHAN-JH/DCUFormer.
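The abstract describes the UA module only at a high level ("progressive local cross-attention in upsampling"). The sketch below illustrates the general idea of local cross-attention upsampling under loudly stated assumptions. It is not the authors' UA module; the published code at the repository above is the reference. The sketch assumes 1D feature sequences, a fixed 2x upsampling factor, no learned query/key/value projections, and a single attention head. The function name `local_cross_attention_upsample` and the `window` parameter are hypothetical. Each high-resolution position uses its low-level (fine) feature as the query and attends only to a small window of nearby high-level (coarse) features, which serve as both keys and values.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max score before exp)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_cross_attention_upsample(
    coarse: List[Vector], fine: List[Vector], window: int = 3
) -> List[Vector]:
    """Hypothetical 2x upsampler: coarse (length n) -> length 2n.

    Assumption: fine[i] (low-level feature) is the query for output position i;
    coarse vectors near position i // 2 are the keys and values. No learned
    projections are used, so this only illustrates the attention pattern.
    """
    n = len(coarse)
    if len(fine) != 2 * n:
        raise ValueError("fine must be exactly twice as long as coarse")
    dim = len(coarse[0])
    scale = 1.0 / math.sqrt(dim)  # scaled dot-product attention
    half = window // 2
    out: List[Vector] = []
    for i, query in enumerate(fine):
        center = i // 2  # coarse position this fine position falls into
        lo, hi = max(0, center - half), min(n, center + half + 1)
        neighbours = coarse[lo:hi]  # local window of keys/values
        weights = softmax([dot(query, k) * scale for k in neighbours])
        # Output is a convex combination of the neighbouring coarse features.
        out.append([sum(w * v[d] for w, v in zip(weights, neighbours)) for d in range(dim)])
    return out
```

With `window=1` each output position sees exactly one coarse feature, so the sketch reduces to nearest-neighbour upsampling; a larger window lets the fine-level query choose, per position, which side of a boundary to copy from, which is the intuition behind content-aware upsampling for sharper crack edges:

```python
coarse = [[0.0], [10.0]]
fine = [[-5.0], [-5.0], [5.0], [5.0]]
print(local_cross_attention_upsample(coarse, fine, window=1))  # nearest neighbour
print(local_cross_attention_upsample(coarse, fine, window=3))  # query-dependent mix
```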
Metadata
Item Type: | Article |
---|---|
Copyright, Publisher and Additional Information: | This is an author produced version of an article published in Expert Systems with Applications, made available under the terms of the Creative Commons Attribution License (CC-BY), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. |
Keywords: | Pavement crack; Vision transformer; Semantic segmentation; Feature upsampling |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Environment (Leeds) > Institute for Transport Studies (Leeds) > ITS: Spatial Modelling and Dynamics (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 16 Dec 2024 18:47 |
Last Modified: | 20 Dec 2024 12:15 |
Status: | Published |
Publisher: | Elsevier |
Identification Number: | 10.1016/j.eswa.2024.125891 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:220258 |