Xu, Z. orcid.org/0000-0002-3883-3716, Li, B., Hu, Y. orcid.org/0000-0002-4856-5014 et al. (4 more authors) (2026) Self-supervised Monocular Depth and Pose Estimation for Endoscopy with Latent Priors. IEEE Transactions on Medical Imaging. ISSN: 0278-0062
Abstract
Accurate 3D reconstruction in endoscopy enables quantitative and holistic lesion characterization within the gastrointestinal (GI) tract, and this requires reliable depth and pose estimation. However, endoscopy systems are monocular, and existing methods that rely on synthetic datasets or complex models often generalize poorly under challenging endoscopic conditions. We propose a robust self-supervised monocular depth and pose estimation framework that incorporates a StyleGAN-based generator and a Variational Autoencoder (VAE). The StyleGAN generator leverages extensive depth scenes from natural images to condition the depth network, enhancing the realism and robustness of depth predictions through latent feature priors. We reformulate pose estimation within a VAE framework, treating pose transitions as latent variables to regularize scale, stabilize z-axis prominence, and improve x-y sensitivity. To further enhance pose stability and generalizability, we introduce a prior transfer module that distills motion knowledge from natural-scene SLAM systems. Specifically, pose priors from a pretrained SLAM model, supervised on large-scale natural-scene datasets, guide the latent pose distribution through a KL-divergence reparameterization. This mechanism transfers structural motion priors into the endoscopic domain, improving trajectory consistency under challenging conditions. Together, this dual refinement pipeline yields accurate depth and pose predictions that cope with the complex textures and lighting of the GI tract. Extensive evaluations on the SimCol, C3VD, and EndoSLAM datasets confirm that our framework outperforms published self-supervised methods in endoscopic depth and pose estimation. All data descriptions and code are available at https://github.com/EricXuziang/Self-supervised-with-Latent-Priors.git.
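The abstract's prior transfer step, in which a latent pose distribution is guided toward a SLAM-derived prior through a KL divergence with reparameterization, can be illustrated with a minimal sketch. It assumes, as illustrative simplifications not taken from the paper, that both the pose-network posterior and the SLAM prior are diagonal Gaussians over a 6-DoF pose vector. The function names `reparameterize` and `gaussian_kl` and all numeric values are hypothetical; the authors' actual implementation is in their linked repository.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1) per dimension.
    # In an autodiff framework this keeps z differentiable w.r.t. mu and log_var;
    # plain floats are used here only to show the arithmetic.
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def gaussian_kl(mu_q: Sequence[float], log_var_q: Sequence[float],
                mu_p: Sequence[float], log_var_p: Sequence[float]) -> float:
    # Closed-form KL(q || p) for diagonal Gaussians, summed over dimensions:
    # 0.5 * (log var_p - log var_q + (var_q + (mu_q - mu_p)^2) / var_p - 1)
    kl = 0.0
    for mq, lvq, mp, lvp in zip(mu_q, log_var_q, mu_p, log_var_p):
        var_q, var_p = math.exp(lvq), math.exp(lvp)
        kl += 0.5 * (lvp - lvq + (var_q + (mq - mp) ** 2) / var_p - 1.0)
    return kl


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 6-DoF pose (tx, ty, tz, rx, ry, rz) posterior from a pose network.
    mu_q = [0.01, -0.02, 0.10, 0.0, 0.005, 0.0]
    log_var_q = [-4.0] * 6
    # Hypothetical prior centred on a pose estimate from a pretrained SLAM model.
    mu_p = [0.0, -0.02, 0.12, 0.0, 0.0, 0.0]
    log_var_p = [-3.0] * 6
    z = reparameterize(mu_q, log_var_q, rng)            # sampled pose used downstream
    kl = gaussian_kl(mu_q, log_var_q, mu_p, log_var_p)  # regularizer toward the prior
    print("sampled pose:", [round(v, 4) for v in z])
    print("KL(q || p):", round(kl, 4))
```

Minimizing the KL term pulls the predicted pose distribution toward the prior, while the sampled `z` would feed whatever downstream loss the framework uses. In practice one would write this in a framework such as PyTorch so that gradients flow through `mu` and `log_var`.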
Metadata
| Item Type: | Article |
|---|---|
| Authors/Creators: | Xu, Z., Li, B., Hu, Y. et al. (4 more authors) |
| Copyright, Publisher and Additional Information: | This is an author produced version of an article published in IEEE Transactions on Medical Imaging, made available via the University of Leeds Research Outputs Policy under the terms of the Creative Commons Attribution License (CC-BY), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. |
| Keywords: | Self-supervised learning, deep learning, endoscopy, monocular depth and pose estimation |
| Dates: | Published: 2026 |
| Institution: | The University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
| Funding Information: | EPSRC Accounts Payable (grant UKRI914); Academy of Medical Sciences (grant SBF0010\1191) |
| Date Deposited: | 16 Mar 2026 12:29 |
| Last Modified: | 16 Mar 2026 12:29 |
| Status: | Published online |
| Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
| Identification Number: | 10.1109/tmi.2026.3671423 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:238937 |
Download
Filename: AcceptedCopy.pdf
Licence: CC-BY 4.0
