Novel algorithms for finding the closest l-mers in biological data
| Content Provider | IEEE Xplore Digital Library |
|---|---|
| Author | S. Rajasekaran; X. Cai; A. Mamun |
| Copyright Year | 2017 |
| Abstract | With advances in next-generation sequencing technology, huge amounts of data have been, and continue to be, generated in biology. A bottleneck in dealing with such datasets lies in developing effective algorithms for extracting useful information from them. Algorithms for finding patterns in biological data pave the way for extracting crucial information from voluminous datasets. In this paper we focus on a fundamental pattern, namely, the closest l-mers. Given a set of m biological strings S1, S2, ..., Sm and an integer l, the problem of interest is that of finding an l-mer from each string such that the distance among them is the least. That is, we want to find m l-mers X1, X2, ..., Xm such that Xi is an l-mer in Si (for 1 ≤ i ≤ m) and the Hamming distance among these m l-mers is the least (from among all such possible l-mers). This problem has many applications. An application of great importance is motif search. Algorithms for finding the closest l-mers have been used in solving the (l, d)-motif search problem (see, e.g., [1], [2]). In this paper, novel exact and approximate algorithms are proposed for this problem for the special case of m = 3. We consider the Euclidean distance metric if the sequences contain real numbers. |
| Starting Page | 525 |
| Ending Page | 528 |
| Page Count | 4 |
| File Format | HTM / HTML |
| ISBN | 9781509030507 |
| Journal | 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) |
| DOI | 10.1109/BIBM.2017.8217702 |
| Language | English |
| Access Restriction | Open |
| Subject Keyword | (l, d)-motifs, Genomics, Search problems, Biology, Possible l-mers, Motif search problem, L-mer, Bioinformatics, Efficient algorithms, Exact algorithms, Time series motifs, Hamming distance, L-mers X, Time series analysis, String matching, Biological data, Computational complexity, Fundamental pattern, Closest l-mers, Generation sequencing technology, Euclidean distance, M-biological strings, Biology computing, Randomized algorithms, Approximation algorithms, Approximate algorithms, Closest triplet |
| Content Type | Text |
| Resource Type | Preprint Article |
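
The problem statement in the abstract can be illustrated for the m = 3 case with a brute-force sketch. This is not the algorithm proposed in the paper, only a baseline that enumerates every triplet of l-mers. The abstract does not spell out how the distance "among" three l-mers is aggregated, so the sum of pairwise Hamming distances used here is an assumption, and the function names `hamming`, `lmers`, and `closest_triplet` are hypothetical.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def lmers(s: str, l: int) -> list[str]:
    """All contiguous substrings of length l in s, in left-to-right order."""
    return [s[i:i + l] for i in range(len(s) - l + 1)]


def closest_triplet(s1: str, s2: str, s3: str, l: int):
    """Exhaustively find one l-mer per string minimizing the total distance.

    Assumption: the distance among three l-mers is the sum of their
    pairwise Hamming distances. Returns (score, (x1, x2, x3)), or None
    if any input string is shorter than l.
    """
    best = None
    for x1, x2, x3 in product(lmers(s1, l), lmers(s2, l), lmers(s3, l)):
        score = hamming(x1, x2) + hamming(x1, x3) + hamming(x2, x3)
        if best is None or score < best[0]:
            best = (score, (x1, x2, x3))
    return best


# "ACG" occurs in all three strings, so the minimum score is 0.
print(closest_triplet("ACGTAC", "TTACGA", "GGACGT", 3))
# prints (0, ('ACG', 'ACG', 'ACG'))
```

For strings of length n this baseline examines on the order of n^3 triplets, each compared in O(l) time, which is the kind of cost that motivates faster exact and approximate methods.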