Scalable string similarity search/join with approximate seeds and multiple backtracking.
| Content Provider | CiteSeerX |
|---|---|
| Author | Siragusa, Enrico; Weese, David; Reinert, Knut |
| Abstract | In this paper we present scalable algorithms for optimal string similarity search and join. Our methods are variations of those applied in Masai [15], our recently published tool for mapping high-throughput DNA sequencing data with unprecedented speed and accuracy. The key features of our approach are filtration with approximate seeds and methods for multiple backtracking. Approximate seeds, compared to exact seeds, increase filtration specificity while preserving sensitivity. Multiple backtracking amortizes the cost of searching a large set of seeds. Combined, these two methods significantly speed up string similarity search and join operations. Our tool is implemented in C++ and OpenMP using the SeqAn library. The source code is distributed under the BSD license and can be freely downloaded from |
| File Format | |
| Access Restriction | Open |
| Subject Keyword | Approximate Seed Multiple Backtracking Scalable String Similarity Search Join Paper Scalable Algorithm Source Code SeqAn Library BSD License Large Set Similarity Search High-throughput DNA Optimal String Similarity Search Increase Filtration Specificity Key Feature Join Operation Unprecedented Speed |
| Content Type | Text |
| Resource Type | Article |
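
The filtration idea named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes edit distance as the similarity measure, scans each database string directly instead of using an index, and leaves out multiple backtracking entirely. The function names and the `num_seeds` parameter are hypothetical. The sketch relies on the pigeonhole argument that if a query matches a string with at most `max_errors` errors and is cut into `num_seeds` pieces, at least one piece occurs in that string with at most `max_errors // num_seeds` errors.

```python
def edit_distance(a, b):
    """Global edit (Levenshtein) distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def best_substring_distance(seed, text):
    """Smallest edit distance between seed and any substring of text.

    Same recurrence as edit_distance, but row 0 is all zeros so that a
    match may start anywhere, and the minimum of the last row lets it
    end anywhere.
    """
    prev = [0] * (len(text) + 1)
    for i, cs in enumerate(seed, 1):
        cur = [i]
        for j, ct in enumerate(text, 1):
            cur.append(min(prev[j] + 1,
                           cur[j - 1] + 1,
                           prev[j - 1] + (cs != ct)))
        prev = cur
    return min(prev)


def split_into_seeds(query, num_seeds):
    """Cut query into num_seeds contiguous pieces of near-equal length."""
    base, extra = divmod(len(query), num_seeds)
    seeds, start = [], 0
    for i in range(num_seeds):
        length = base + (1 if i < extra else 0)
        seeds.append(query[start:start + length])
        start += length
    return seeds


def similarity_search(query, database, max_errors, num_seeds):
    """Return database strings within max_errors edits of query.

    Filtration: a string is a candidate only if some seed occurs in it
    with at most max_errors // num_seeds errors. Candidates are then
    verified with a full edit distance computation.
    """
    seed_errors = max_errors // num_seeds
    seeds = split_into_seeds(query, num_seeds)
    hits = []
    for text in database:
        if any(best_substring_distance(s, text) <= seed_errors for s in seeds):
            if edit_distance(query, text) <= max_errors:
                hits.append(text)
    return hits


database = ["ACGTACGT", "ACGTTCGT", "TTTTTTTT", "ACGAACGA"]
result = similarity_search("ACGTACGT", database, max_errors=2, num_seeds=2)
```

With `num_seeds = max_errors + 1` the seeds must match exactly, which corresponds to exact-seed filtration; choosing fewer, longer seeds that each tolerate some errors is the approximate-seed variant in this sketch.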