Li, F., Hogg, D.C. orcid.org/0000-0002-6125-9564 and Cohn, A.G. orcid.org/0000-0002-7652-8907 (2024) Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning. In: Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 03-09 Aug 2024, Jeju, Korea. International Joint Conferences on Artificial Intelligence, pp. 6342-6349. ISBN 978-1-956792-04-1
Abstract
Spatial reasoning plays a vital role in both human cognition and machine intelligence, prompting new research into language models' (LMs) capabilities in this regard. However, existing benchmarks reveal shortcomings in evaluating qualitative spatial reasoning (QSR). These benchmarks typically present oversimplified scenarios or unclear natural language descriptions, hindering effective evaluation. We present a novel benchmark for assessing QSR in LMs, which is grounded in realistic 3D simulation data, offering a series of diverse room layouts with various objects and their spatial relationships. This approach provides a more detailed and context-rich narrative for spatial reasoning evaluation, diverging from traditional, toy-task-oriented scenarios. Our benchmark encompasses a broad spectrum of qualitative spatial relationships, including topological, directional, and distance relations. These are presented with different viewing points, varied granularities, and density of relation constraints to mimic real-world complexities. A key contribution is our logic-based consistency-checking tool, which enables the assessment of multiple plausible solutions, aligning with real-world scenarios where spatial relationships are often open to interpretation. Our benchmark evaluation of advanced LMs reveals their strengths and limitations in spatial reasoning. They face difficulties with multi-hop spatial reasoning and interpreting a mix of different view descriptions, pointing to areas for future improvement.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Li, F.; Hogg, D.C. (ORCID: 0000-0002-6125-9564); Cohn, A.G. (ORCID: 0000-0002-7652-8907) |
Keywords: | Natural Language Processing; NLP; Resources and evaluation; Knowledge Representation and Reasoning; KRR; Qualitative, geometric, spatial, and temporal reasoning; Constraint Satisfaction and Optimization; CSO; Applications; Knowledge Representation and Reasoning; KRR; Learning and reasoning |
Dates: | |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 10 Mar 2025 11:22 |
Last Modified: | 10 Mar 2025 11:22 |
Status: | Published |
Publisher: | International Joint Conferences on Artificial Intelligence |
Identification Number: | 10.24963/ijcai.2024/701 |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:224232 |