Vouros, A. orcid.org/0000-0002-3383-6133, Langdell, S., Croucher, M. et al. (1 more author) (Submitted: 2019) An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations. arXiv. (Submitted)
Abstract
K-Means is one of the most widely used algorithms for data clustering and the usual clustering method used for benchmarking. Despite its wide application, it is well known to suffer from a series of disadvantages, such as its sensitivity to the positions of the initial cluster centres (centroids), which can greatly affect the clustering solution. Over the years, many K-Means variations and initialisation techniques have been proposed, with different degrees of complexity. In this study we focus on common K-Means variations and deterministic initialisation techniques, and we show, first, that more sophisticated initialisation methods reduce or alleviate the need for complex K-Means clustering and, second, that deterministic methods can achieve equivalent or better performance than stochastic methods. These conclusions are drawn from extensive benchmarking on different model data sets from various studies, as well as on clustering data sets.
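The central point of the abstract is that K-Means converges to a solution that depends on its starting centroids. The sketch below is a minimal illustration, assuming plain Lloyd iterations with squared Euclidean distance. It contrasts a stochastic initialisation (k data points sampled uniformly at random) with a deterministic one (maximin, or farthest-first traversal, seeded at the data point nearest the overall mean). Both initialisers, the function names (`random_init`, `maximin_init`, `lloyd`) and the toy data are illustrative assumptions. They are not the specific K-Means variations or initialisation methods benchmarked in the paper.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_init(data: List[Point], k: int, seed: int) -> List[Point]:
    """Stochastic initialisation (illustrative): k distinct data points drawn at random."""
    return random.Random(seed).sample(data, k)


def maximin_init(data: List[Point], k: int) -> List[Point]:
    """Deterministic initialisation (illustrative maximin / farthest-first traversal).

    The first centroid is the data point closest to the overall mean; each further
    centroid is the point farthest from its nearest already-chosen centroid.
    min/max return the first optimum on ties, so the result is identical on every run.
    """
    dim = len(data[0])
    mean = tuple(sum(p[d] for p in data) / len(data) for d in range(dim))
    centroids = [min(data, key=lambda p: sq_dist(p, mean))]
    while len(centroids) < k:
        centroids.append(max(data, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def assign(data: List[Point], centroids: List[Point]) -> List[int]:
    """Assignment step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j])) for p in data]


def lloyd(data: List[Point], init: List[Point], max_iter: int = 100):
    """Standard (Lloyd) K-Means; returns (centroids, labels, sum of squared errors)."""
    centroids = [tuple(c) for c in init]
    for _ in range(max_iter):
        labels = assign(data, centroids)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(c)  # an empty cluster keeps its previous centroid
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    labels = assign(data, centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(data, labels))
    return centroids, labels, sse


if __name__ == "__main__":
    # Three small, well-separated blobs (toy data, not taken from the paper).
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
            (0.0, 10.0), (1.0, 10.0), (0.0, 11.0)]
    k = 3
    _, _, det_sse = lloyd(data, maximin_init(data, k))
    print(f"deterministic maximin init: SSE = {det_sse:.3f}")
    for seed in range(5):
        _, _, sse = lloyd(data, random_init(data, k, seed))
        print(f"random init, seed {seed}: SSE = {sse:.3f}")
```

On this toy data the maximin initialisation places one centroid in each blob, and Lloyd's algorithm reaches an SSE of 4 on every run. Random seeds that happen to draw two starting points from the same blob can converge to a worse local optimum. This mirrors, in miniature, the sensitivity to initial centroids that the study examines.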
Metadata
Copyright, Publisher and Additional Information: © 2019 The Author(s). For reuse permissions, please contact the Author(s).
Keywords: cs.LG; stat.ML
Institution: The University of Sheffield
Academic Units: The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Depositing User: Symplectic Sheffield
Date Deposited: 08 Jan 2020 15:17
Last Modified: 08 Jan 2020 20:37
Published Version: https://arxiv.org/abs/1908.09946v4
Status: Submitted