Henrickson, M., Atwell, E. orcid.org/0000-0001-9395-3764, Stell, J. et al. (3 more authors) (2026) Retrieval-augmented generation for natural language art provenance searches in the Getty Provenance Index. Academia AI and Applications, 2 (1).
Abstract
This study presents a prototype Retrieval-Augmented Generation (RAG) framework for art provenance research, focusing on the Getty Provenance Index German Sales dataset. The prototype addresses challenges posed by fragmented and multilingual archival data, as well as the limitations of traditional metadata-based search tools. By enabling flexible, natural language queries in multiple languages, the framework facilitates searches of the Getty Provenance Index without knowledge of specific object metadata. Using a sample of 10,000 records to test the concept and later an extended 100,000-record sample, we explore a RAG prototype that aims to improve both the efficiency and accessibility of provenance searches and find encouraging results for specific and exploratory research scenarios. The framework emphasises transparency, suggesting a scalable and practically oriented approach for historians and cultural heritage professionals working with complex art market archives.
Metadata
| Item Type: | Article |
|---|---|
| Authors/Creators: | Henrickson, M.; Atwell, E.; Stell, J.; et al. (3 more authors) |
| Copyright, Publisher and Additional Information: | © 2026 copyright by the authors. This is an open access article under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. |
| Keywords: | retrieval-augmented generation, art provenance research, Getty Provenance Index, multilingual semantic search, explainable AI |
| Dates: | Published: 2026 |
| Institution: | The University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Arts, Humanities and Cultures (Leeds) > Fine Art, History of Art & Cultural Studies (Leeds); The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
| Date Deposited: | 18 Feb 2026 14:14 |
| Last Modified: | 18 Feb 2026 14:15 |
| Status: | Published |
| Publisher: | Academia.edu |
| Identification Number: | 10.20935/acadai8122 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:238115 |
Download
Filename: Retrieval_augmented_generation_for_natur.pdf
Licence: CC-BY 4.0

CORE (COnnecting REpositories)