Real‐time surgical tool detection with multi‐scale positional encoding and contrastive learning

Abstract

Real-time detection of surgical tools in laparoscopic data plays a vital role in understanding surgical procedures, evaluating the performance of trainees, facilitating learning, and ultimately supporting the autonomy of robotic systems. Existing detection methods for surgical data still need to improve in both processing speed and prediction accuracy. Most methods rely on anchors or region proposals, which limits their adaptability to variations in tool appearance and leads to sub-optimal detection results. Anchor-free detectors have been only partially explored as a remedy, without remarkable results. This work introduces an anchor-free, transformer-based architecture that allows real-time tool detection. The proposal uses multi-scale features both within the feature extraction layer and, through positional encoding, in the transformer-based detection architecture, in order to refine and capture context-aware and structural information for tools of different sizes. Furthermore, a supervised contrastive loss is introduced to optimize the representations of object embeddings, which improves the performance of the feed-forward network that classifies the localized bounding boxes. The strategy outperforms state-of-the-art (SOTA) methods. Compared to the most accurate existing SOTA method (DSSS), the approach improves mAP50 by nearly 4% and reduces inference time by 113%. It also achieves a 7% higher mAP50 than the baseline model.
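
The abstract does not state the exact form of the supervised contrastive loss, so the following is only a minimal sketch of one widely used formulation, in which embeddings that share a class label are pulled together and all others are pushed apart. The function name, the `temperature` value, and the toy two-dimensional embeddings are assumptions made for illustration and are not taken from the paper.

```python
import math


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors with at least one positive.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    # L2-normalise each embedding so dot products become cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        normed.append([v / norm for v in vec])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives are the other samples carrying the same class label.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled similarity of anchor i to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        # Average negative log-probability assigned to the positives.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy object embeddings: two tight clusters in 2-D (assumed data).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

In this toy setup the loss is lower when the labels agree with the clusters than when they cut across them, which is the property such a loss relies on to make embeddings easier for a downstream classifier to separate.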

Metadata

Item Type:	Article
Authors/Creators:	Loza, G.; Valdastri, P. (https://orcid.org/0000-0002-2280-5438); Ali, S. (https://orcid.org/0000-0003-1313-3542)
Copyright, Publisher and Additional Information:	© 2023 The Authors. This is an open access article under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited.
Keywords:	computer vision; medical image processing; object detection; surgery
Dates:	Published: April 2024; Published (online): 7 December 2023; Accepted: 22 November 2023
Institution:	The University of Leeds
Academic Units:	The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence; The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Electronic & Electrical Engineering (Leeds) > Robotics, Autonomous Systems & Sensing (Leeds)
Depositing User:	Symplectic Publications
Date Deposited:	12 Jun 2024 10:23
Last Modified:	12 Jun 2024 10:23
Status:	Published
Publisher:	Wiley
Identification Number:	10.1049/htl2.12060
Related URLs:	PubMed URL
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:213420
