An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations

Abstract

K-Means is one of the most widely used algorithms for data clustering and the usual clustering method for benchmarking. Despite its wide application, it is well known that it suffers from a series of disadvantages; for example, the positions of the initial clustering centres (centroids) can greatly affect the clustering solution. Over the years many K-Means variations and initialisation techniques have been proposed, with different degrees of complexity. In this study we focus on common K-Means variations and deterministic initialisation techniques, and we first show that more sophisticated initialisation methods reduce or alleviate the need for complex K-Means clustering and, secondly, that deterministic methods can achieve equivalent or better performance than stochastic methods. These conclusions are obtained through extensive benchmarking using different model data sets from various studies as well as clustering data sets.
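To make the stochastic/deterministic distinction concrete, here is a minimal sketch in plain Python. It assumes Euclidean data and the standard Lloyd iteration. It pairs K-Means++ seeding (stochastic) with a farthest-first "maximin" seeding that starts from the point farthest from the data mean (deterministic). The function names, the toy data, and the choice of maximin as the deterministic example are illustrative assumptions. They are not claimed to be the specific initialisation methods or K-Means variations benchmarked in the article.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(data: List[Point], k: int, rng: random.Random) -> List[Point]:
    """Stochastic K-Means++ seeding: the first centroid is drawn uniformly at
    random; each later one is drawn with probability proportional to its
    squared distance to the nearest centroid chosen so far."""
    centroids = [rng.choice(data)]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in data]
        if sum(d2) == 0:  # every point already coincides with a centroid
            centroids.append(rng.choice(data))
        else:
            centroids.append(rng.choices(data, weights=d2, k=1)[0])
    return centroids


def maximin_init(data: List[Point], k: int) -> List[Point]:
    """Deterministic farthest-first (maximin) seeding (illustrative choice):
    start from the point farthest from the data mean, then repeatedly add the
    point whose distance to its nearest centroid is largest. max() keeps the
    first point on ties, so the result depends only on the data and its order."""
    dim = len(data[0])
    mean = tuple(sum(p[i] for p in data) / len(data) for i in range(dim))
    centroids = [max(data, key=lambda p: sq_dist(p, mean))]
    while len(centroids) < k:
        centroids.append(
            max(data, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def assign(data: List[Point], centroids: List[Point]) -> List[int]:
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in data]


def lloyd(data: List[Point], centroids: List[Point], max_iter: int = 100):
    """Standard (Lloyd) K-Means from the given initial centroids.
    Returns (centroids, labels, SSE)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(data, centroids)
        new = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(data, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            new.append(tuple(sum(xs) / len(members) for xs in zip(*members))
                       if members else c)
        if new == centroids:  # assignments have stabilised
            break
        centroids = new
    labels = assign(data, centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(data, labels))
    return centroids, labels, sse


if __name__ == "__main__":
    # Three small, well-separated toy blobs (made-up illustrative data).
    data = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6),
            (5.0, 5.0), (5.4, 4.8), (4.9, 5.5),
            (10.0, 0.0), (9.6, 0.4), (10.3, 0.2)]
    _, _, sse_det = lloyd(data, maximin_init(data, 3))
    print("maximin SSE:", round(sse_det, 4))
    for seed in range(5):
        _, _, sse = lloyd(data, kmeanspp_init(data, 3, random.Random(seed)))
        print(f"k-means++ seed {seed} SSE:", round(sse, 4))
```

On a fixed data set, the deterministic seeding returns identical centroids on every run. K-Means++ depends on the random seed. Comparing the final SSE across several seeds is one simple way to observe the run-to-run variability that a deterministic initialisation avoids.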

Metadata

Item Type:	Article
Authors/Creators:	Vouros, A. (https://orcid.org/0000-0002-3383-6133); Langdell, S.; Croucher, M.; Vasilaki, E. (https://orcid.org/0000-0003-3705-7070)
Copyright, Publisher and Additional Information:	© 2021 The Author(s). This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Keywords:	K-Means clustering; Deterministic clustering; Benchmarking
Dates:	Submitted: 26 August 2019 Accepted: 15 June 2021 Published (online): 12 July 2021 Published: August 2021
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Depositing User:	Symplectic Sheffield
Date Deposited:	08 Jan 2020 15:17
Last Modified:	23 Aug 2021 09:57
Status:	Published
Publisher:	Springer Nature
Refereed:	Yes
Identification Number:	10.1007/s10994-021-06021-7
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:151015