Tsoleridis, P., Choudhury, C.F. orcid.org/0000-0002-8886-8976 and Hess, S. orcid.org/0000-0002-3650-2518 (2025) Using probabilistic clustering techniques as a specification tool for capturing heterogeneity in choice models. Transportation Research Part C: Emerging Technologies, 179. 105289. ISSN: 0968-090X
Abstract
In the era of big data, data-driven methods have emerged as strong competitors to traditional econometric models for analysing choice behaviour. In particular, data-driven models offer flexible classification methods that are well suited to capturing heterogeneity among decision makers and improving model fit. A key limitation of purely data-driven models, however, is the difficulty of calculating welfare measures, such as the value of travel time (VTT) estimates that are essential for cost–benefit analyses. This motivates the current study, which combines the data-mining-based segmentation approaches used in machine learning (ML) with traditional discrete choice models (DCM) to get the best of both: a clustering-based component to capture the heterogeneity among travellers and a utility-based choice component that is suitable for quantifying policy-relevant measures, such as VTT estimates. In the proposed hybrid framework, travellers are probabilistically allocated to clusters based on their degree of similarity to each cluster, and cluster-specific random-utility-based mode choice models are estimated simultaneously. The proposed hybrid framework is tested on two revealed preference (RP) datasets (a GPS diary and a traditional household survey) and in three different choice contexts, providing a range of sample sizes and data complexity. The performance of the proposed hybrid model (H-LCCM) is compared with that of traditional latent class choice models (LCCM), in which both the class membership and mode choice components are utility-based, and with two other state-of-the-art ML-assisted LCCM frameworks. Results indicate that H-LCCM outperforms the other specifications in the majority of the contexts examined, while offering a more scalable approach for contexts with a large number of observations (as is the case for big data sources) and/or with large choice sets (as is typical in spatial choice contexts).
The proposed framework is practically applicable for policy-making because it allows the calculation of VTT estimates and therefore does not sacrifice the microeconomic interpretability of traditional DCMs. The results are promising, especially in the current era of big data, and are expected to contribute to the emerging literature on synergies between traditional econometric approaches and new data-driven methods.
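The structure described in the abstract (soft allocation of travellers to clusters, cluster-specific utility-based choice models, and VTT derived from the utility coefficients) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the clustering technique, so the distance-based softmax membership used here is an assumption, as are all function names, the linear-in-parameters utility, and every numeric value. The sketch only evaluates probabilities for fixed hypothetical parameters, whereas the paper estimates the clustering and choice components simultaneously.

```python
import numpy as np


def soft_memberships(x, centroids, scale=1.0):
    """Assumed membership rule: softmax over negative squared distances.

    x: (N, D) traveller characteristics; centroids: (K, D).
    Returns an (N, K) matrix whose rows sum to 1.
    """
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / scale
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def mnl_probs(time, cost, beta_time, beta_cost):
    """Multinomial logit probabilities with utility V = b_t * time + b_c * cost.

    time, cost: (N, J) attributes of J alternatives for N travellers.
    """
    v = beta_time * time + beta_cost * cost
    v = v - v.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(v)
    return e / e.sum(axis=1, keepdims=True)


def mixture_choice_probs(x, time, cost, centroids, betas):
    """Membership-weighted mixture of cluster-specific choice probabilities."""
    pi = soft_memberships(x, centroids)
    probs = np.zeros_like(time, dtype=float)
    for k, (beta_time, beta_cost) in enumerate(betas):
        probs += pi[:, [k]] * mnl_probs(time, cost, beta_time, beta_cost)
    return probs


def vtt(beta_time, beta_cost):
    """Value of travel time as the ratio of time and cost coefficients."""
    return beta_time / beta_cost


# Hypothetical inputs: 2 travellers, 1 characteristic, 2 clusters, 3 modes.
x = np.array([[0.0], [2.0]])
centroids = np.array([[0.0], [2.0]])
time = np.array([[10.0, 20.0, 30.0], [15.0, 25.0, 10.0]])  # minutes
cost = np.array([[3.0, 1.5, 1.0], [2.0, 1.0, 4.0]])        # currency units
betas = [(-0.06, -0.30), (-0.02, -0.40)]  # (beta_time, beta_cost) per cluster

pi = soft_memberships(x, centroids)
probs = mixture_choice_probs(x, time, cost, centroids, betas)
cluster_vtt = [vtt(bt, bc) for bt, bc in betas]  # currency units per minute
```

Because each cluster keeps its own time and cost coefficients, a cluster-specific VTT follows directly from their ratio, which is the interpretability property the abstract emphasises.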
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Tsoleridis, P., Choudhury, C.F. and Hess, S. |
Copyright, Publisher and Additional Information: | © 2025 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
Keywords: | Latent class choice models, Individual heterogeneity, Probabilistic clustering, Data mining, Mode choice, Destination choice |
Dates: | |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Environment (Leeds) > Institute for Transport Studies (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 11 Sep 2025 08:46 |
Last Modified: | 11 Sep 2025 08:46 |
Published Version: | https://www.sciencedirect.com/science/article/pii/... |
Status: | Published |
Publisher: | Elsevier |
Identification Number: | 10.1016/j.trc.2025.105289 |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:231386 |
Download
Filename: Using probabilistic clustering techniques as a specification tool for.pdf
Licence: CC-BY 4.0