A matrix density based algorithm to hierarchically co-cluster documents and words
| Content Provider | CiteSeerX |
|---|---|
| Author | Mandhani, Bhushan |
| Abstract | This paper proposes an algorithm for hierarchically clustering documents. Each cluster consists of a cluster of documents and an associated cluster of words, and is therefore a document–word co-cluster. The vector model for documents yields a document–word matrix, of which every co-cluster is a submatrix. Intuitively, a submatrix made up of high values should be a good document cluster, with the corresponding word cluster containing its most distinctive features. Our algorithm exploits this intuition: we define matrix density, and the algorithm is driven by matrix density considerations. The algorithm is partitional–agglomerative. The partitioning step identifies dense submatrices whose row sets partition the row set of the complete matrix. The hierarchical agglomerative step merges the most “similar” submatrices until the required number of clusters remains (for a flat clustering) or until only the single complete matrix is left (for a hierarchical arrangement of documents). The algorithm also generates apt labels for each cluster or hierarchy node. The similarity measure used for merging relies on the fact that the clusters are co-clusters, which is a key point of difference from existing agglomerative algorithms. We refer to the proposed algorithm as RPSA (Rowset Partitioning and Submatrix Agglomeration). We compare it as a clustering algorithm with Spherical K-Means and Spectral Graph Partitioning, and we also evaluate some of the hierarchies it generates. |
| File Format | |
| Access Restriction | Open |
| Subject Keyword | Flat Clustering; Document Word Matrix; Agglomerative Algorithm; Corresponding Word Cluster; Matrix Density; Vector Model; Single Complete Matrix; Hierarchical Arrangement; Hierarchical Agglomerative Step; Submatrix Agglomeration; Hierarchy Node; Apt Label; Row Set; Good Document Cluster; Document Word Co-cluster; Spherical K-means; Hierarchically Co-cluster Document; Partitional Agglomerative Algorithm; Rowset Partitioning; Spectral Graph Partitioning; Required Number; Key Point; Associated Cluster; Cluster Document; Complete Matrix; High Value; Distinctive Feature; Respective Row Set; Dense Submatrices; Similar Submatrices; Matrix Density Consideration; Partitioning Step; Similarity Measure |
| Content Type | Text |
| Resource Type | Article |
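
The agglomerative step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's definition of matrix density or its similarity measure, so everything below is an assumption made for illustration: `density` is taken to be the mean entry of a submatrix, `merge_score` (a hypothetical name) scores a pair of co-clusters by the density of their merged submatrix, and the initial co-clusters are supplied by hand rather than produced by the partitioning step. This is not RPSA itself.

```python
from itertools import combinations


def density(matrix, rows, cols):
    """Mean entry of the submatrix rows x cols (assumed definition, not the paper's)."""
    if not rows or not cols:
        return 0.0
    total = sum(matrix[r][c] for r in rows for c in cols)
    return total / (len(rows) * len(cols))


def merge_score(matrix, a, b):
    """Hypothetical similarity: density of the co-cluster obtained by merging a and b."""
    return density(matrix, a[0] | b[0], a[1] | b[1])


def agglomerate(matrix, coclusters, k):
    """Greedily merge the best-scoring pair of co-clusters until k remain."""
    clusters = [(frozenset(r), frozenset(c)) for r, c in coclusters]
    while len(clusters) > k:
        # Pick the pair of cluster indices whose merged submatrix is densest.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: merge_score(matrix, clusters[p[0]], clusters[p[1]]),
        )
        merged = (clusters[i][0] | clusters[j][0], clusters[i][1] | clusters[j][1])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters


# Toy document-word matrix: documents 0-1 use words 0-1, documents 2-3 use words 2-3.
M = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 4],
]
# Hand-made starting co-clusters: one per document, with the words it uses.
initial = [({0}, {0, 1}), ({1}, {0, 1}), ({2}, {2, 3}), ({3}, {2, 3})]
result = agglomerate(M, initial, k=2)
```

On this toy matrix, merging documents that share words keeps the merged submatrix dense, while merging across the two word groups pulls in zeros and halves the density, so the greedy loop groups documents 0 and 1 together and documents 2 and 3 together. Running the loop down to `k=1` and recording each merge would give a hierarchy instead of a flat clustering.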