Evaluating distance functions for clustering tandem repeats
| Content Provider | Semantic Scholar |
|---|---|
| Author | Rao, Suyog; Rodríguez, Alfredo; Benson, Gary |
| Copyright Year | 2005 |
| Abstract | Tandem repeats are an important class of DNA repeats, and much research has focused on their efficient identification [3, ?, ?], their use in DNA typing and fingerprinting [9,???], and their causative role in trinucleotide repeat diseases such as Huntington Disease, myotonic dystrophy, and Fragile-X mental retardation. We are interested in clustering tandem repeats into groups or families based on sequence similarity so that their biological importance may be further explored. To cluster tandem repeats we need a notion of pairwise distance, which we obtain by alignment. In this paper we evaluate five distance functions used to produce those alignments: Euclidean, Entropy-weighted, Consensus, Entropy-Surface, and Shannon Divergence. It is important to analyze and compare these functions because the choice of distance metric forms the core of any clustering algorithm. We employ a novel method to compare alignments and thereby compare the distance functions themselves. We rank the distance functions using two cluster validation techniques, Average Cluster Density and Silhouette Index. Finally, we propose a multi-phase clustering method which produces good-quality clusters. In this study, we analyze clusters of tandem repeats from five sequences: Human Chromosomes 3, 5, 10 and X, and C. elegans Chromosome III. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://tandem.bu.edu/papers/raobenson2.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
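
The abstract names the Silhouette Index as one of the cluster validation techniques used to rank the distance functions. As a rough illustration only, the sketch below computes the standard silhouette index from a precomputed pairwise distance matrix. It is not the authors' implementation: the function name `silhouette_index`, the toy one-dimensional points, and the absolute-difference distance are assumptions made for this example, standing in for the alignment-based distances described in the abstract.

```python
def silhouette_index(dist, labels):
    """Mean silhouette value over all points.

    dist   : square matrix (list of lists) of pairwise distances
    labels : cluster label for each point, same order as dist rows
    """
    n = len(labels)
    # Group point indices by cluster label.
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette index needs at least two clusters")

    total = 0.0
    for i in range(n):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # by convention a singleton contributes s(i) = 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / n


# Toy data (assumed for illustration): four points on a line.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]

good = silhouette_index(dist, [0, 0, 1, 1])  # tight, well-separated clusters
bad = silhouette_index(dist, [0, 1, 0, 1])   # clusters that mix near and far points
print(good, bad)
```

Values close to 1 indicate compact, well-separated clusters, while values near or below 0 indicate points that sit closer to another cluster than to their own. Under this reading, a distance function that yields a higher mean silhouette on the same data would rank better.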