VANICA, GEORGE and BORS, ADRIAN GHEORGHE orcid.org/0000-0001-7838-0021 (2026) OV-SGT: Open Vocabulary Semantic Graph Transformer for Scene Graph Generation. In: IEEE/CVF WACV workshop on Scene Graph for Structured Intelligence. IEEE, Tucson, AZ, USA, pp. 1685-1694.
Abstract
Scene graph generation bridges visual perception and semantic understanding, but existing approaches face two challenges: closed vocabularies, which limit real-world applicability, and long-tail predicate distributions, in which common relationships dominate the training data. We introduce the Open Vocabulary Semantic Graph Transformer (OV-SGT), which addresses both challenges through CLIP-aligned representation learning. Our method learns relationship embeddings within CLIP’s semantic space, enabling zero-shot generalization to unseen predicates. Key contributions include: (1) a node-edge fusion strategy that preserves relationship directionality; (2) a positional encoding based on graph Laplacian eigenvectors that captures structural context; and (3) a multi-component loss combining contrastive, semantic, triplet, and focal objectives, which supports zero-shot transfer while handling class imbalance. Experiments on Visual Genome demonstrate state-of-the-art performance, with significant gains in mean Recall@K (mR@K), reflecting improved recognition of rare predicates.
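The abstract does not say how the Laplacian eigenvector positional encoding is computed. The sketch below shows the construction commonly used in graph transformers: take the eigenvectors of the symmetric normalized Laplacian with the smallest non-trivial eigenvalues as per-node features. It is not the authors' implementation. The function names, symmetrizing the directed subject-object edges, dropping only the first (trivial) eigenvector, and zero-padding small graphs are all assumptions made for illustration. A small Jacobi eigensolver keeps the example dependency-free; in practice one would normally call `numpy.linalg.eigh`.

```python
import math


def normalized_laplacian(num_nodes, edges):
    """Return L = I - D^{-1/2} A D^{-1/2} as a list of rows.

    Assumption (not from the paper): directed (subject, object) edges are
    symmetrized and self-loops ignored, because the eigensolver below needs
    a symmetric matrix. Isolated nodes get D^{-1/2} = 0 (diagonal entry 1).
    """
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for s, o in edges:
        if s != o:
            adj[s][o] = adj[o][s] = 1.0
    inv_sqrt = [1.0 / math.sqrt(sum(row)) if sum(row) > 0 else 0.0 for row in adj]
    return [
        [(1.0 if i == j else 0.0) - inv_sqrt[i] * adj[i][j] * inv_sqrt[j]
         for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def jacobi_eigh(mat, tol=1e-12, max_sweeps=100):
    """Eigenvalues and eigenvectors (as columns of V) of a symmetric matrix,
    using cyclic Jacobi rotations A <- P^T A P, V <- V P."""
    n = len(mat)
    a = [row[:] for row in mat]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = max((abs(a[i][j]) for i in range(n) for j in range(n) if i != j), default=0.0)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P and V <- V P (columns p, q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # A <- P^T A (rows p, q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v


def laplacian_positional_encoding(num_nodes, edges, k):
    """Hypothetical helper: k-dim encoding per node from the eigenvectors
    with the smallest non-trivial eigenvalues of the normalized Laplacian."""
    eigvals, eigvecs = jacobi_eigh(normalized_laplacian(num_nodes, edges))
    order = sorted(range(num_nodes), key=lambda i: eigvals[i])
    chosen = order[1:k + 1]  # skip the trivial eigenvector (eigenvalue 0)
    pe = [[eigvecs[node][c] for c in chosen] for node in range(num_nodes)]
    # Zero-pad when the graph has fewer than k + 1 nodes.
    return [row + [0.0] * (k - len(row)) for row in pe]


# Usage: a 4-node chain of (subject, object) relations.
pe = laplacian_positional_encoding(4, [(0, 1), (1, 2), (2, 3)], k=2)
```

Eigenvectors are defined only up to sign, so graph-transformer training pipelines commonly flip their signs at random. For a disconnected scene graph, more than one eigenvalue is zero, and this sketch drops only the first of them.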
Metadata
| Item Type: | Proceedings Paper |
|---|---|
| Authors/Creators: | Vanica, George; Bors, Adrian Gheorghe (orcid.org/0000-0001-7838-0021) |
| Copyright, Publisher and Additional Information: | This is an author-produced version of the published paper. Uploaded in accordance with the University’s Research Publications and Open Access policy. |
| Institution: | The University of York |
| Academic Units: | The University of York > Faculty of Sciences (York) > Computer Science (York) |
| Date Deposited: | 23 Mar 2026 13:10 |
| Last Modified: | 02 Jun 2026 23:23 |
| Status: | Published |
| Publisher: | IEEE |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:239390 |
