NDLI: Novel algorithms for finding the closest l-mers in biological data

Content Provider	IEEE Xplore Digital Library
Author	S. Rajasekaran, X. Cai, A. Mamun
Copyright Year	2017
Abstract	With the advances in next generation sequencing technology, huge amounts of data have been, and continue to be, generated in biology. A bottleneck in dealing with such datasets lies in developing effective algorithms for extracting useful information from them. Algorithms for finding patterns in biological data pave the way for extracting crucial information from voluminous datasets. In this paper we focus on a fundamental pattern, namely, the closest l-mers. Given a set of m biological strings S1, S2, ..., Sm and an integer l, the problem of interest is to find an l-mer from each string such that the distance among them is the least, i.e., to find m l-mers X1, X2, ..., Xm such that Xi is an l-mer in Si (for 1 ≤ i ≤ m) and the Hamming distance among these m l-mers is the minimum over all such possible choices of l-mers. This problem has many applications, one of great importance being motif search: algorithms for finding the closest l-mers have been used in solving the (l, d)-motif search problem (see e.g., [1], [2]). In this paper, novel exact and approximate algorithms are proposed for the special case of m = 3. When the sequences contain real numbers, we use the Euclidean distance metric.
Starting Page	525
Ending Page	528
Page Count	4
File Format	HTM / HTML
ISBN	9781509030507
Journal	2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
DOI	10.1109/BIBM.2017.8217702
Language	English
Access Restriction	Open
Subject Keyword	(l, d)-motifs; Genomics; Search problems; Biology; Possible l-mers; Motif search problem; l-mer; Bioinformatics; Efficient algorithms; Exact algorithms; Time series motifs; Hamming distance; l-mers; Time series analysis; String matching; Biological data; Computational complexity; Fundamental pattern; Closest l-mers; Next generation sequencing technology; Euclidean distance; m biological strings; Biology computing; Randomized algorithms; Approximation algorithms; Approximate algorithms; Closest triplet
Content Type	Text
Resource Type	Preprint Article
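To make the m = 3 problem in the abstract concrete, here is a minimal sketch of the naive exhaustive baseline. It is not one of the novel exact or approximate algorithms proposed in the paper. The abstract does not define "the distance among" three l-mers, so this sketch assumes it is the sum of the three pairwise distances. All function names (`lmers`, `hamming`, `euclidean`, `closest_triplet`) are hypothetical. Hamming distance is used for strings and Euclidean distance for real-valued sequences, as the abstract describes.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def lmers(seq, l):
    """All contiguous length-l windows of seq (a string or a list of reals)."""
    return [seq[i:i + l] for i in range(len(seq) - l + 1)]


def hamming(a, b):
    """Number of positions at which two equal-length l-mers differ."""
    return sum(x != y for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean distance between two equal-length real-valued l-mers."""
    return dist(a, b)


def closest_triplet(s1, s2, s3, l, d=hamming):
    """Brute-force closest l-mer triplet, one l-mer taken from each sequence.

    ASSUMPTION: the cost of a triplet (x, y, z) is d(x, y) + d(y, z) + d(x, z).
    Runs in O(n1 * n2 * n3 * l) time, where ni is the number of l-mers in si.
    Returns (cost, x, y, z), or None if some sequence is shorter than l.
    """
    best = None
    for x in lmers(s1, l):
        for y in lmers(s2, l):
            dxy = d(x, y)  # reuse this pair's distance across every z
            for z in lmers(s3, l):
                cost = dxy + d(y, z) + d(x, z)
                # Strict "<" keeps the first optimal triplet found.
                if best is None or cost < best[0]:
                    best = (cost, x, y, z)
    return best


if __name__ == "__main__":
    print(closest_triplet("AAACGTAA", "CCACGTCC", "GGACGTGG", 4))
    # prints (0, 'ACGT', 'ACGT', 'ACGT')
```

The cubic enumeration over all l-mer triples is what makes this approach impractical on large datasets, which is the bottleneck the abstract highlights. Passing `d=euclidean` applies the same search to real-valued sequences.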