Evaluating distance functions for clustering tandem repeats.
| Content Provider | Semantic Scholar |
|---|---|
| Author | Rao, Suyog; Rodriguez, Alfredo; Benson, Gary |
| Copyright Year | 2005 |
| Abstract | Tandem repeats are an important class of DNA repeats and much research has focused on their efficient identification, their use in DNA typing and fingerprinting, and their causative role in trinucleotide repeat diseases such as Huntington Disease, myotonic dystrophy, and Fragile-X mental retardation. We are interested in clustering tandem repeats into groups or families based on sequence similarity so that their biological importance may be further explored. To cluster tandem repeats we need a notion of pairwise distance, which we obtain by alignment. In this paper we evaluate five distance functions used to produce those alignments: Consensus, Euclidean, Jensen-Shannon Divergence, Entropy-Surface, and Entropy-weighted. It is important to analyze and compare these functions because the choice of distance metric forms the core of any clustering algorithm. We employ a novel method to compare alignments and thereby compare the distance functions themselves. We rank the distance functions based on two cluster validation techniques: Average Cluster Density and Average Silhouette Width. Finally, we propose a multi-phase clustering method which produces good-quality clusters. In this study, we analyze clusters of tandem repeats from five sequences: Human Chromosomes 3, 5, 10 and X and C. elegans Chromosome III. |
| Starting Page | 3 |
| Ending Page | 12 |
| Page Count | 10 |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://www.jsbi.org/pdfs/journal1/IBSB05/IBSB05F011.pdf |
| Alternate Webpage(s) | http://www.sec.gov/rules/other/2010/33-9103.pdf |
| Alternate Webpage(s) | http://www.jsbi.org/journal/IBSB05/IBSB05F011.pdf |
| Alternate Webpage(s) | http://www.jsbi.org/modules/journal1/index.php/IBSB05/IBSB05F011.pdf |
| PubMed reference number | 16362901v1 |
| Volume Number | 16 |
| Issue Number | 1 |
| Journal | Genome informatics. International Conference on Genome Informatics |
| Language | English |
| Access Restriction | Open |
| Subject Keyword | Alignment; DNA Fingerprinting; Huntington Disease; Mental Retardation; Myotonic Dystrophy; Numerous; Pierre Robin Syndrome; Short Tandem Repeat; Tandem Repeat Sequences; Trinucleotide Repeats; Variable Number of Tandem Repeats; algorithm; statistical cluster |
| Content Type | Text |
| Resource Type | Article |
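
The abstract names the Euclidean and Jensen-Shannon Divergence distance functions and the Average Silhouette Width validation measure without defining them. The sketch below gives the textbook definitions only. It is not the paper's implementation: it assumes each item being compared is a probability vector over the four nucleotides (A, C, G, T), and every function name here is hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def _kl(p, m):
    """Kullback-Leibler divergence in bits; zero-probability terms contribute 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, m) if a > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)

def average_silhouette_width(dist, labels):
    """Mean silhouette over all points, given a full distance matrix.

    Assumes at least two distinct cluster labels. Points in singleton
    clusters are assigned a silhouette of 0, a common convention.
    """
    n = len(labels)
    clusters = set(labels)
    total = 0.0
    for i in range(n):
        own = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            continue  # singleton cluster: contributes 0
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n

# Hypothetical nucleotide composition vectors in the order (A, C, G, T)
col_a = [1.0, 0.0, 0.0, 0.0]
col_c = [0.0, 1.0, 0.0, 0.0]
js_disjoint = jensen_shannon(col_a, col_c)
js_same = jensen_shannon(col_a, col_a)

# Hypothetical distance matrix: two tight clusters that are far apart
toy_dist = [
    [0, 1, 4, 4],
    [1, 0, 4, 4],
    [4, 4, 0, 1],
    [4, 4, 1, 0],
]
toy_asw = average_silhouette_width(toy_dist, [0, 0, 1, 1])
```

A silhouette width near 1 indicates compact, well-separated clusters, which is why it can be used to rank distance functions by the quality of the clusters they produce.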