This book gives a smooth, well-motivated and example-rich introduction to clustering that is innovative in many respects. It answers important questions that are rarely, if ever, addressed elsewhere. Examples: (a) what to do if the user has no idea of the number of clusters and/or their location: use what is called intelligent k-means; (b) what to do if the data contain both numeric and categorical features: use what is called the three-step standardization procedure; (c) how to catch anomalous patterns; (d) how to validate clusters; etc. Some of these answers may be open to criticism, but motivation is always supplied, and the results are always reproducible and thus testable. The book introduces a number of non-conventional cluster interpretation aids derived from the data geometry view adopted by the author and based on what are referred to as contribution weights, which essentially show those elements of cluster structures that distinguish clusters from the rest. These contribution weights, when applied to categorical data, appear to be highly compatible with what statisticians such as A. Quetelet and K. Pearson were developing over the past couple of centuries, which is a highly original and welcome development. The book reviews the rich set of approaches being accumulated in such hot areas as text mining and bioinformatics, and shows that clustering is not just a set of naive methods for data processing but an evolving area of data science. I have adopted the book as a text for my data mining courses at the bachelor's and master's levels.