Loza, G., Valdastri, P. orcid.org/0000-0002-2280-5438 and Ali, S. orcid.org/0000-0003-1313-3542 (2024) Real‐time surgical tool detection with multi‐scale positional encoding and contrastive learning. Healthcare Technology Letters, 11 (2-3). pp. 48-58. ISSN 2053-3713
Abstract
Real-time detection of surgical tools in laparoscopic data plays a vital role in understanding surgical procedures, evaluating the performance of trainees, facilitating learning, and ultimately supporting the autonomy of robotic systems. Existing detection methods for surgical data need to improve both processing speed and prediction accuracy. Most methods rely on anchors or region proposals, which limits their adaptability to variations in tool appearance and leads to sub-optimal detection results. Anchor-free detectors, which could alleviate this problem, have been only partially explored and without remarkable results. An anchor-free architecture based on a transformer that allows real-time tool detection is introduced. The proposed approach uses multi-scale features both in the feature extraction layer and in the transformer-based detection architecture, through positional encoding that can refine and capture context-aware and structural information for tools of different sizes. Furthermore, a supervised contrastive loss is introduced to optimize the representations of object embeddings, resulting in improved feed-forward network performance when classifying localized bounding boxes. The strategy demonstrates superiority to state-of-the-art (SOTA) methods. Compared to the most accurate existing SOTA (DSSS) method, the approach has an improvement of nearly 4% on mAP50 and a reduction in the inference time by 113%. It also achieves a 7% higher mAP50 than the baseline model.
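To illustrate the supervised contrastive loss mentioned in the abstract, the following is a minimal sketch of a generic SupCon-style loss over labelled embeddings. It is not taken from the article: the function name `supervised_contrastive_loss`, the default `temperature` of 0.1, and the use of plain Python lists in place of tensors are all assumptions made for illustration, and the authors' exact formulation may differ.

```python
import math


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic SupCon-style loss (illustrative; not the article's exact loss).

    embeddings: list of equal-length float vectors, one per detected object.
    labels: list of class ids, one per embedding.
    temperature: assumed scaling factor; 0.1 is a hypothetical default.
    """
    # L2-normalise so that dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, counted = 0.0, 0
    for i in range(n):
        # Positives share the anchor's class label (the anchor itself is excluded).
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without positives do not contribute

        # Temperature-scaled similarity of the anchor to every other embedding.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        # Mean negative log-probability of the positives for this anchor.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1

    return total / counted if counted else 0.0
```

Under this sketch, embeddings that cluster by class yield a lower loss than the same embeddings with labels that cut across the clusters, which is the property the classifier head would benefit from.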
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Loza, G.; Valdastri, P. (orcid.org/0000-0002-2280-5438); Ali, S. (orcid.org/0000-0003-1313-3542) |
Copyright, Publisher and Additional Information: | © 2023 The Authors. This is an open access article under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. |
Keywords: | computer vision; medical image processing; object detection; surgery |
Dates: |
|
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence; The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Electronic & Electrical Engineering (Leeds) > Robotics, Autonomous Systems & Sensing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 12 Jun 2024 10:23 |
Last Modified: | 12 Jun 2024 10:23 |
Status: | Published |
Publisher: | Wiley |
Identification Number: | 10.1049/htl2.12060 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:213420 |