Zero-shot urban function inference with street view images through prompting a pretrained vision-language model

Huang, W., Wang, J. and Cong, G. (2024) Zero-shot urban function inference with street view images through prompting a pretrained vision-language model. International Journal of Geographical Information Science, 38 (7). pp. 1414-1442. ISSN 1365-8816

Metadata

Item Type: Article
Authors/Creators:
  • Huang, W.
  • Wang, J.
  • Cong, G.
Copyright, Publisher and Additional Information: © 2024 Informa UK Limited, trading as Taylor & Francis Group. This is an author produced version of an article published in International Journal of Geographical Information Science. Uploaded in accordance with the publisher's self-archiving policy.

Keywords: Urban land use, prompt engineering, CLIP, foundation model, street view image
Dates:
  • Published: 2 July 2024
  • Published (online): 22 May 2024
  • Accepted: 21 April 2024
Institution: The University of Leeds
Academic Units: The University of Leeds > Faculty of Environment (Leeds) > School of Geography (Leeds)
Depositing User: Symplectic Publications
Date Deposited: 13 Mar 2025 13:26
Last Modified: 13 Mar 2025 13:26
Status: Published
Publisher: Taylor & Francis
Identification Number: 10.1080/13658816.2024.2347322

Download

Accepted Version


Embargoed until: 22 May 2025

Filename: UrbanCLIP_IJGIS.pdf

