Zhang, Z., Xie, M., Balsebre, P. et al. (3 more authors) (2026) UrbanMFM: Spatial Graph-Based Multiscale Foundation Models for Learning Generalized Urban Representation. IEEE Transactions on Knowledge and Data Engineering, 38 (3). pp. 2064-2078. ISSN: 1041-4347
Abstract
As geospatial data from web platforms becomes increasingly accessible and regularly updated, urban representation learning has emerged as a critical research area for advancing urban planning. Recent studies have developed foundation model-based algorithms to leverage this data for various urban-related downstream tasks. However, current research has inadequately explored deep integration strategies for multiscale, multimodal urban data in the context of urban foundation models. This gap arises primarily because the relationships between micro-scale (e.g., individual points of interest and street view imagery) and macro-scale (e.g., region-wide satellite imagery) urban features are inherently implicit and highly complex, making traditional interaction modeling insufficient. This paper introduces a novel research problem – how to learn multiscale urban representations by integrating diverse geographic data modalities and modeling complex multimodal relationships across different spatial scales. To address this significant challenge, we propose UrbanMFM, a spatial graph-based multiscale foundation model framework explicitly designed to capture and leverage these intricate relationships. UrbanMFM utilizes a self-supervised learning paradigm that integrates diverse geographic data modalities, including POI data and urban imagery, through novel contrastive learning objectives and advanced sampling techniques. By explicitly modeling spatial graphs to represent complex multiscale urban relationships, UrbanMFM effectively facilitates deep interactions between multimodal data sources. Extensive experiments on datasets from Singapore, New York, and Beijing demonstrate that UrbanMFM outperforms the strongest baselines significantly in four representative downstream tasks. By effectively modeling spatial hierarchies with diverse data, UrbanMFM provides a more comprehensive and adaptable representation of urban environments.
Metadata
| Item Type: | Article |
|---|---|
| Authors/Creators: | Zhang, Z., Xie, M., Balsebre, P. et al. (3 more authors) |
| Copyright, Publisher and Additional Information: | This is an author produced version of an article published in IEEE Transactions on Knowledge and Data Engineering, made available under the terms of the Creative Commons Attribution License (CC-BY), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. |
| Keywords: | Foundation model, geospatial data mining, representation learning, urban regions |
| Dates: | |
| Institution: | The University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Environment (Leeds) > School of Geography (Leeds) |
| Date Deposited: | 16 Feb 2026 15:23 |
| Last Modified: | 20 Feb 2026 21:47 |
| Published Version: | https://ieeexplore.ieee.org/document/11359527 |
| Status: | Published |
| Publisher: | Institute of Electrical and Electronics Engineers |
| Identification Number: | 10.1109/tkde.2026.3656202 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:237956 |
Download
Filename: UrbanMFM.pdf
Licence: CC-BY 4.0
