Cai, Z., Karvonen, A., Cong, C. et al. (1 more author) (2026) Fine-grained urban land use simulation: Integrating spatial dynamic modeling with a pre-trained vision-language model. Computers, Environment and Urban Systems, 126. 102416. ISSN: 0198-9715
Abstract
Accurate prediction of urban land use changes at fine spatial scales is essential for developing healthy and sustainable cities, yet traditional simulation models struggle to capture local dynamics due to limited availability of fine-grained data and insufficient complexity in modeling urban systems. To address these limitations, we propose a novel approach that leverages advances in pre-trained vision-language foundation models combined with spatial dynamic modeling to forecast detailed urban land use patterns. Specifically, we assembled a spatially dense collection of street view images (SVIs) throughout Shenzhen, China, and applied UrbanCLIP, a specialized vision-language prompting framework, to perform zero-shot inference of urban land use directly from images without labeled datasets or model retraining. The resulting fine-grained classifications delineate eight distinct urban land use types, producing a detailed urban functional map. These high-resolution patterns were then integrated into a spatial dynamic model enhanced by polynomial regression to simulate urban evolution toward 2035. This approach effectively captures neighborhood influences, socioeconomic drivers, and urban planning policies. Our simulation provides actionable insights for sustainable development in Shenzhen by identifying areas for balanced growth, targeted infrastructure investments, and ecological preservation. Compared to conventional methods, our methodology significantly improves predictive accuracy and spatial granularity. By incorporating foundation models, our approach addresses traditional data constraints, offering scalable and robust tools for informed urban governance and decision-making.
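As a rough illustration of the zero-shot inference step mentioned in the abstract, the sketch below scores one image embedding against one text-prompt embedding per land use label and picks the closest match. It is not UrbanCLIP's code or API. The function names, the three label names, the toy 3-dimensional vectors, and the temperature value are all hypothetical placeholders. In practice the embeddings would come from a pre-trained vision-language model, and the label set would be the eight land use types used in the paper, which this record does not list.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_vec, prompt_vecs, temperature=0.01):
    """Compare one image embedding with one text embedding per label.

    Returns the best-matching label and a softmax distribution over labels.
    The temperature is an assumed value, not taken from the paper.
    """
    labels = list(prompt_vecs)
    logits = [cosine(image_vec, prompt_vecs[k]) / temperature for k in labels]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = {k: e / total for k, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs


# Hypothetical labels and toy embeddings, for illustration only.
prompt_vecs = {
    "residential": [1.0, 0.0, 0.0],
    "commercial": [0.0, 1.0, 0.0],
    "green space": [0.0, 0.0, 1.0],
}
image_vec = [0.9, 0.1, 0.2]

label, probs = zero_shot_classify(image_vec, prompt_vecs)
print(label)
```

No labeled training data enters this procedure: the only task-specific inputs are the text prompts, which is what makes the inference zero-shot.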
Metadata
| Item Type: | Article |
|---|---|
| Authors/Creators: | Cai, Z., Karvonen, A., Cong, C. et al. (1 more author) |
| Copyright, Publisher and Additional Information: | © 2026 The Authors. This is an open access article under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. |
| Keywords: | Land use change, Vision-language models, Foundation models, Spatial dynamic modeling, Street view images |
| Dates: | |
| Institution: | The University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Environment (Leeds) > School of Geography (Leeds) |
| Date Deposited: | 03 Mar 2026 11:07 |
| Last Modified: | 03 Mar 2026 11:07 |
| Status: | Published |
| Publisher: | Elsevier |
| Identification Number: | 10.1016/j.compenvurbsys.2026.102416 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:238583 |
Download
Filename: Fine-grained urban land use simulation.pdf
Licence: CC-BY 4.0
