Solano, P.E.C. orcid.org/0000-0001-7689-052X, Bulpitt, A. orcid.org/0000-0002-7905-4540, Subramanian, V. orcid.org/0000-0003-3603-0861 et al. (1 more author) (2024) Multi-task learning with cross-task consistency for improved depth estimation in colonoscopy. Medical Image Analysis. 103379. ISSN 1361-8415
Abstract
Colonoscopy screening is the gold standard procedure for assessing abnormalities in the colon and rectum, such as ulcers and cancerous polyps. Measuring the abnormal mucosal area and its 3D reconstruction can help quantify the surveyed area and objectively evaluate disease burden. However, due to the complex topology of these organs and variable physical conditions (for example, lighting, large homogeneous texture, and image modality), estimating distance from the camera (aka depth) is highly challenging. Moreover, most colonoscopic video acquisition is monocular, making depth estimation a non-trivial problem. While depth estimation methods in computer vision have been proposed and advanced on natural scene datasets, their efficacy has not been widely quantified on colonoscopy datasets. As the colonic mucosa has several low-texture regions that are not well pronounced, learning representations from an auxiliary task can improve salient feature extraction, allowing accurate estimation of camera depth. In this work, we propose a novel multi-task learning (MTL) approach with a shared encoder and two decoders, namely a surface normal decoder and a depth estimator decoder. Our depth estimator incorporates attention mechanisms to enhance global context awareness. We leverage the surface normal prediction to improve geometric feature extraction. We also apply a cross-task consistency loss between the two geometrically related tasks, surface normal and camera depth. We demonstrate a 15.75% improvement in relative error and a 10.7% improvement in δ1.25 accuracy over the most accurate state-of-the-art baseline, the Big-to-Small (BTS) approach. All experiments are conducted on the recently released C3VD dataset, and thus we provide a first benchmark of state-of-the-art methods on this dataset.
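The two reported metrics and the idea of cross-task consistency can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact loss, so the cosine-based consistency term, the finite-difference normal estimate (which ignores camera intrinsics), and all function names below are assumptions chosen for illustration. The relative error and threshold accuracy follow the definitions commonly used in monocular depth estimation.

```python
import numpy as np


def abs_rel_error(pred, gt, eps=1e-8):
    """Mean absolute relative error: mean(|pred - gt| / gt) over valid pixels."""
    valid = gt > eps
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))


def delta_accuracy(pred, gt, threshold=1.25, eps=1e-8):
    """Fraction of valid pixels with max(pred/gt, gt/pred) below the threshold."""
    valid = (gt > eps) & (pred > eps)
    ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return float(np.mean(ratio < threshold))


def normals_from_depth(depth):
    """Approximate unit surface normals from a depth map.

    Assumption: simple finite differences on the pixel grid, with no
    camera intrinsics, so this is only a rough geometric proxy.
    """
    dz_dy, dz_dx = np.gradient(depth)  # gradients along rows, then columns
    normals = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return normals / np.linalg.norm(normals, axis=-1, keepdims=True)


def cross_task_consistency(pred_normals, pred_depth):
    """Hypothetical consistency term: mean (1 - cosine similarity) between
    the predicted normals and the normals derived from the predicted depth."""
    derived = normals_from_depth(pred_depth)
    cosine = np.sum(pred_normals * derived, axis=-1)
    return float(np.mean(1.0 - cosine))
```

In a training setup, a term like `cross_task_consistency` would be added to the per-task depth and surface normal losses so that the two decoders are penalized when their outputs disagree geometrically; the weighting between the terms is a design choice not stated in the abstract.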
Metadata
Item Type: | Article |
---|---|
Keywords: | Deep learning; Monocular depth estimation; Surface normal prediction; Multi-task learning; Cross-task consistency; 3D colonoscopy |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence; The University of Leeds > Faculty of Medicine and Health (Leeds) > School of Medicine (Leeds) > Leeds Institute of Medical Research (LIMR) > Division of Gastroenterology and Surgery |
Funding Information: | Funder: Crohns and Colitis UK; Grant number: M2023-5 SAUBRAMANIAN |
Depositing User: | Symplectic Publications |
Date Deposited: | 07 Nov 2024 16:42 |
Last Modified: | 07 Nov 2024 16:42 |
Status: | Published |
Publisher: | Elsevier |
Identification Number: | 10.1016/j.media.2024.103379 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:219302 |