Smart-Sample: An Efficient Algorithm for Clustering Large High-Dimensional Datasets
| Content Provider | Semantic Scholar |
|---|---|
| Author | Lazarov, Dudu; David, Gil; Averbuch, Amir |
| Copyright Year | 2009 |
| Abstract | Finding useful related patterns in a dataset is an important task in many applications. In particular, many algorithms need to separate a given dataset into a small number of clusters. Each cluster represents a subset of data points from the dataset that are considered similar. In some cases, it is also necessary to distinguish data points that are not part of any pattern from the other data points. This paper introduces a new data clustering method named smart-sample and compares its performance to several clustering methodologies. We show that smart-sample successfully clusters large high-dimensional datasets. In addition, smart-sample outperforms the other methodologies in terms of running time. A variation of the smart-sample algorithm, which guarantees efficiency in terms of I/O, is also presented. We describe how to achieve an approximation of the in-memory smart-sample algorithm using a constant number of scans with a single sort operation on the disk. |
| File Format | PDF, HTM / HTML |
| Alternate Webpage(s) | http://www.cs.tau.ac.il/~amir1/PS/smart-sample.pdf |
| Alternate Webpage(s) | http://cs.tau.ac.il/~amir1/PS/smart-sample.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |