72 pages. 2012. LAP Lambert Academic Publishing.
The goal of clustering is to partition a data set into groups such that both intra-group similarity and inter-group dissimilarity are maximized. In many applications, the data to be clustered is far larger than what can be processed at a single site. Further, the data may be inherently distributed. The growing need to scale to massive data sets that are spread across networks with limited bandwidth and computational resources has led to methods for parallel and distributed data clustering. In this book, we present a cohesive framework for cluster identification and outlier detection in distributed data. The data is either distributed from the start, because it is produced at different locations, or is deliberately distributed to gain a computational speedup. We use a parameter-free clustering algorithm to cluster the data at the local sites.