Han, J., Shenlu, Z., Liu, Y. et al. (2 more authors) (2023) Mitigating modality discrepancies for RGB-T semantic segmentation. IEEE Transactions on Neural Networks and Learning Systems, 35 (7). pp. 9380-9394. ISSN 2162-237X
Abstract
Semantic segmentation models gain robustness against adverse illumination conditions by taking advantage of complementary information from visible and thermal infrared (RGB-T) images. Despite this, most existing RGB-T semantic segmentation models directly adopt primitive fusion strategies, such as elementwise summation, to integrate multimodal features. Such strategies, unfortunately, overlook the modality discrepancies caused by inconsistent unimodal features obtained by two independent feature extractors, thus hindering the exploitation of cross-modal complementary information within the multimodal data. To this end, we propose a novel network for RGB-T semantic segmentation, i.e., MDRNet+, which is an improved version of our previous work ABMDRNet. The core of MDRNet+ is a new strategy termed bridging-then-fusing, which mitigates modality discrepancies before cross-modal feature fusion. Concretely, an improved Modality Discrepancy Reduction (MDR+) subnetwork is designed, which first extracts unimodal features and then reduces their modality discrepancies. Afterward, discriminative multimodal features for RGB-T semantic segmentation are adaptively selected and integrated via several channel-weighted fusion (CWF) modules. Furthermore, a multiscale spatial context (MSC) module and a multiscale channel context (MCC) module are presented to effectively capture the contextual information. Finally, we elaborately assemble a challenging RGB-T semantic segmentation dataset, i.e., RTSS, for urban scene understanding to mitigate the lack of well-annotated training data. Comprehensive experiments demonstrate that our proposed model remarkably surpasses other state-of-the-art models on the MFNet, PST900, and RTSS datasets.
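The abstract describes channel-weighted fusion as adaptively selecting and integrating features from the two modalities. The following is a minimal, hypothetical NumPy sketch of that general idea (not the paper's actual CWF module): per-channel statistics of each unimodal feature map are converted into softmax weights, and the fused feature is a channel-wise weighted sum of the RGB and thermal features. The function name and weighting scheme are illustrative assumptions.

```python
import numpy as np

def channel_weighted_fusion(f_rgb, f_thermal):
    """Illustrative sketch (not the paper's CWF module): fuse two
    unimodal feature maps of shape (C, H, W) by a per-channel
    softmax-weighted sum derived from global average pooling."""
    # Global average pooling per channel -> (C,)
    s_rgb = f_rgb.mean(axis=(1, 2))
    s_thermal = f_thermal.mean(axis=(1, 2))
    # Softmax over the two modalities for each channel -> (2, C),
    # so the two modality weights sum to 1 per channel.
    logits = np.stack([s_rgb, s_thermal])
    e = np.exp(logits - logits.max(axis=0))
    w = e / e.sum(axis=0)
    # Channel-wise weighted sum of the two modalities -> (C, H, W)
    fused = (w[0][:, None, None] * f_rgb
             + w[1][:, None, None] * f_thermal)
    return fused, w
```

In the actual model the weights would be learned (e.g., by a small subnetwork) rather than computed from fixed pooled statistics; the sketch only shows the data flow of selecting between modalities per channel.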
Metadata
| Field | Value |
|---|---|
| Item Type | Article |
| Authors/Creators | Han, J.; Shenlu, Z.; Liu, Y.; et al. (2 more authors) |
| Copyright, Publisher and Additional Information | © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Reproduced in accordance with the publisher's self-archiving policy. |
| Keywords | Bridging-then-fusing; contextual information; dataset; modality discrepancy reduction; RGB-T semantic segmentation |
| Dates | |
| Institution | The University of Sheffield |
| Academic Units | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
| Depositing User | Symplectic Sheffield |
| Date Deposited | 02 Feb 2023 12:23 |
| Last Modified | 26 Sep 2024 14:34 |
| Status | Published |
| Publisher | Institute of Electrical and Electronics Engineers |
| Refereed | Yes |
| Identification Number | 10.1109/TNNLS.2022.3233089 |
| Open Archives Initiative ID (OAI ID) | oai:eprints.whiterose.ac.uk:195979 |