3"e"nformatics: ethics, economics, & engineering: Efficiency and scalability of the k-Means clustering method

28/05/2010

Efficiency and scalability of the k-Means clustering method

Speaker: Dr. Giuseppe Di Fatta

Title: Efficiency and scalability of the k-Means clustering method

Abstract: One among the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Clustering is a classical unsupervised machine learning problem of the identification of groups of similar objects within a set, which is particularly useful in typical exploratory data mining applications. The scalability of the algorithms in terms of the input objects, the number of features and the number of clusters is crucial for very large sets of data. Techniques for improving the efficiency of k-Means have been largely explored in two main directions, sequential optimisation techniques and parallel processing. In the first case, the amount of computation in the sequential algorithm can be significantly reduced by adopting geometrical constraints, efficient data structures (e.g. multi-dimensional binary search trees) and partitioning strategies. This talk provides insights on the efficiency and scalability of the k-Means clustering method for both sequential and parallel implementations. In particular, it presents some recent work which improves the performance of the algorithm for very large data sets (millions of patterns) with high numbers of features and clusters.


Info-Aesthetics (will be published in 2012)	Semiotics in Information Systems Engineering

Information Rules	The Economic Value of Information

The Social Life of Information	Chariots of the Gods

28/05/2010

Efficiency and scalability of the k-Means clustering method

No comments: