Evaluating distance functions for clustering tandem repeats
| Content Provider | CiteSeerX |
|---|---|
| Author | Rao, Suyog; Rodriguez, Alfredo; Benson, Gary |
| Abstract | Tandem repeats are an important class of DNA repeats, and much research has focused on their efficient identification [3,?,?], their use in DNA typing and fingerprinting [9,???], and their causative role in trinucleotide repeat diseases such as Huntington disease, myotonic dystrophy, and Fragile-X mental retardation. We are interested in clustering tandem repeats into groups or families based on sequence similarity so that their biological importance may be further explored. To cluster tandem repeats we need a notion of pairwise distance, which we obtain by alignment. In this paper we evaluate five distance functions used to produce those alignments: Euclidean, Entropy-weighted, Consensus, Entropy-Surface, and Shannon Divergence. It is important to analyze and compare these functions because the choice of distance metric forms the core of any clustering algorithm. We employ a novel method to compare alignments and thereby compare the distance functions themselves. We rank the distance functions based on the cluster validation techniques Average Cluster Density and Silhouette Index. Finally, we propose a multi-phase clustering method which produces good-quality clusters. In this study, we analyze clusters of tandem repeats from five sequences: human chromosomes 3, 5, 10, and X, and C. elegans chromosome III. |
| File Format | |
| Access Restriction | Open |
| Subject Keyword | Tandem Repeat, Distance Function, TR Clustering, Evaluating Distance Function, Sequence Similarity, DNA Repeat, Human Chromosome, DNA Typing, Trinucleotide Repeat Disease, Cluster Validation Technique, Elegans Chromosome III, Silhouette Index, Biological Importance, Myotonic Dystrophy, Good-quality Cluster, Shannon Divergence, Efficient Identification, Fragile-X Mental Retardation, Important Class, Much Research, Causative Role, Pairwise Distance, Huntington Disease, Multi-phase Clustering Method, Cluster Density |
| Content Type | Text |
| Resource Type | Article |
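
The abstract names the Silhouette Index as one of the validation techniques used to rank the distance functions. As a rough illustration only, the sketch below computes the standard textbook silhouette index from a precomputed pairwise distance matrix and a cluster assignment. The function name `silhouette_index`, the toy matrix, and the labels are hypothetical and are not taken from the paper; the paper's own alignment-derived distances and exact validation procedure are not given in this record.

```python
def silhouette_index(dist, labels):
    """Mean silhouette value over all items (standard definition).

    dist   -- symmetric n x n matrix (list of lists) of pairwise distances
    labels -- labels[i] is the cluster assigned to item i
    Both arguments are illustrative assumptions, not the paper's data format.
    """
    n = len(labels)
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette index needs at least two clusters")

    total = 0.0
    for i in range(n):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # by convention a singleton contributes s(i) = 0
        # a: mean distance from i to the other members of its own cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance from i to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / n


# Toy example: items 0,1 are close, items 2,3 are close, the two pairs are far apart.
toy_dist = [
    [0, 1, 9, 9],
    [1, 0, 9, 9],
    [9, 9, 0, 1],
    [9, 9, 1, 0],
]
good = silhouette_index(toy_dist, [0, 0, 1, 1])
bad = silhouette_index(toy_dist, [0, 1, 0, 1])
```

Values near 1 indicate compact, well-separated clusters, and negative values indicate items that sit closer to another cluster than to their own, so in this toy case `good` exceeds `bad`.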