Vouros, A. (orcid.org/0000-0002-3383-6133), Langdell, S., Croucher, M. et al. (1 more author) (2021) An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations. Machine Learning, 110 (8). pp. 1975-2003. ISSN 0885-6125
Abstract
K-Means is one of the most widely used algorithms for data clustering and the usual clustering method used for benchmarking. Despite its wide application, it is well known to suffer from a series of disadvantages; for example, the positions of the initial cluster centres (centroids) can greatly affect the clustering solution. Over the years, many K-Means variations and initialisation techniques have been proposed, with different degrees of complexity. In this study we focus on common K-Means variations and deterministic initialisation techniques, and we show, first, that more sophisticated initialisation methods reduce or alleviate the need for complex K-Means clustering, and second, that deterministic methods can achieve equivalent or better performance than stochastic methods. These conclusions are obtained through extensive benchmarking using different model data sets from various studies, as well as clustering data sets.
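The abstract's central contrast, stochastic versus deterministic choice of the starting centroids, can be illustrated with a small sketch. It assumes plain Lloyd-style K-Means, uses K-Means++ seeding as the stochastic initialiser, and uses a simple maximin (farthest-point) rule that starts from the data point nearest the mean as the deterministic one. Neither initialiser is claimed to be among the specific methods benchmarked in the article, and all function names (`kmeanspp_init`, `maximin_init`, `lloyd`) are hypothetical. The point is only that the deterministic rule returns identical centroids on every run, while the stochastic one depends on the random seed.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(data: List[Point], k: int, rng: random.Random) -> List[Point]:
    """Stochastic seeding (K-Means++): each new centroid is sampled with
    probability proportional to its squared distance to the nearest chosen one."""
    centroids = [rng.choice(data)]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in data]
        total = sum(d2)
        if total == 0:  # every point already coincides with a centroid
            centroids.append(rng.choice(data))
            continue
        r = rng.uniform(0, total)
        acc = 0.0
        for p, w in zip(data, d2):
            acc += w
            if acc >= r:
                centroids.append(p)
                break
        else:  # guard against floating-point round-off
            centroids.append(data[-1])
    return centroids


def maximin_init(data: List[Point], k: int) -> List[Point]:
    """Deterministic seeding (illustrative maximin rule): start at the point
    closest to the data mean, then repeatedly add the point farthest from its
    nearest already-chosen centroid. Ties resolve to the first point in order."""
    dim = len(data[0])
    mean = [sum(p[i] for p in data) / len(data) for i in range(dim)]
    centroids = [min(data, key=lambda p: sq_dist(p, mean))]
    while len(centroids) < k:
        centroids.append(
            max(data, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def lloyd(data: List[Point], init: List[Point], max_iter: int = 100):
    """Standard Lloyd iterations; returns final centroids and the
    sum of squared errors (SSE) of the resulting partition."""
    centroids = [tuple(c) for c in init]
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in centroids]
        for p in data:
            j = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[j].append(p)
        new = []
        for c, members in zip(centroids, clusters):
            if members:
                new.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new.append(c)  # keep an empty cluster's centroid unchanged
        if new == centroids:  # assignments stable: converged
            break
        centroids = new
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in data)
    return centroids, sse


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    _, det_sse = lloyd(data, maximin_init(data, 2))
    print("deterministic SSE:", det_sse)
    for seed in range(3):
        _, sse = lloyd(data, kmeanspp_init(data, 2, random.Random(seed)))
        print("k-means++ seed", seed, "SSE:", sse)
```

On this toy data the maximin rule always picks `(10.0, 10.0)` and then `(0.0, 0.0)`, so every run converges to the same two-cluster split. The K-Means++ result can vary with the seed: if both seeds land in the same group, Lloyd's iterations may settle in a worse local minimum.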
Metadata
Item Type: | Article |
---|---|
Copyright, Publisher and Additional Information: | © 2021 The Author(s). This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
Keywords: | K-Means clustering; Deterministic clustering; Benchmarking |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 08 Jan 2020 15:17 |
Last Modified: | 23 Aug 2021 09:57 |
Status: | Published |
Publisher: | Springer Nature |
Refereed: | Yes |
Identification Number: | 10.1007/s10994-021-06021-7 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:151015 |
Download
Filename: Vouros2021_Article_AnEmpiricalComparisonBetweenSt.pdf
Licence: CC-BY 4.0