A fast weak motif-finding algorithm based on community detection in graphs

Content Provider	Springer Nature : BioMed Central
Author	Jia, Caiyan; Carson, Matthew B; Yu, Jian
Abstract	Background: Identification of transcription factor binding sites (also called ‘motif discovery’) in DNA sequences is a basic step in understanding genetic regulation. Although many successful programs have been developed, the problem is far from solved because of the diversity in gene expression and regulation and the low specificity of binding sites. State-of-the-art algorithms have their own constraints (e.g., high time or space complexity when finding long motifs, low precision in identifying weak motifs, or the OOPS constraint: one occurrence of the motif instance per sequence), which limit their scope of application.
Results: In this paper, we present a novel and fast algorithm we call TFBSGroup. It is based on community detection in a graph and is used to discover long and weak (l, d) motifs under the ZOMOPS constraint (zero, one or multiple occurrence(s) of the motif instance(s) per sequence), where l is the length of a motif and d is the maximum number of mutations between a motif instance and the motif itself. First, TFBSGroup transforms the (l, d) motif search in sequences into the discovery of dense subgraphs within a graph. It identifies these subgraphs using a fast community detection method to obtain coarse-grained candidate motifs. Next, it greedily refines these candidate motifs towards the true motif within their own communities. Empirical studies on synthetic (l, d) samples have shown that TFBSGroup is very efficient (e.g., it can find the true (18, 6) and (24, 8) motifs within 30 seconds). More importantly, the algorithm has rapidly identified motifs in a large data set of prokaryotic promoters generated from the Escherichia coli database RegulonDB. It has also accurately identified motifs in ChIP-seq data sets for 12 mouse transcription factors involved in ES cell pluripotency and self-renewal.
Conclusions: Our novel heuristic algorithm, TFBSGroup, can quickly identify nearly exact matches for long and weak (l, d) motifs in DNA sequences under the ZOMOPS constraint. It is also capable of finding motifs in real applications. The source code for TFBSGroup can be obtained from http://bioinformatics.bioengr.uic.edu/TFBSGroup/ .
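The abstract describes a three-stage pipeline: recast the (l, d) search as finding dense subgraphs, detect communities to get coarse candidate motifs, then greedily refine each candidate within its own community. The following is a minimal, hypothetical sketch of that idea, not the authors' TFBSGroup code. Its assumptions: nodes are all l-mers in the input; two l-mers are linked when they differ in at most 2d positions (any two instances of the same (l, d) motif satisfy this, by the triangle inequality on Hamming distance); non-overlapping windows of the same sequence may link, which keeps the ZOMOPS setting; a simple deterministic label-propagation pass stands in for the paper's fast community detection method; and refinement greedily applies single-base substitutions to each community's column-wise consensus. All function names (`build_lmer_graph`, `label_propagation`, `refine`, `find_motif`) are illustrative, and the all-pairs graph construction is quadratic, unlike the fast method the paper reports.

```python
from collections import Counter, defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_lmer_graph(seqs, l, d):
    """Nodes are all l-mers (seq index, offset, text). Two l-mers are linked if
    they differ in at most 2*d positions (assumed threshold) and are not
    overlapping windows of the same sequence."""
    nodes = [(i, j, s[j:j + l]) for i, s in enumerate(seqs)
             for j in range(len(s) - l + 1)]
    adj = defaultdict(set)
    for u, v in combinations(range(len(nodes)), 2):
        (iu, ju, su), (iv, jv, sv) = nodes[u], nodes[v]
        if iu == iv and abs(ju - jv) < l:
            continue  # overlapping windows in one sequence are not linked
        if hamming(su, sv) <= 2 * d:
            adj[u].add(v)
            adj[v].add(u)
    return nodes, adj


def label_propagation(n, adj, max_iter=20):
    """Deterministic label propagation (a stand-in, not the paper's method):
    each node adopts its neighbours' most common label, ties go to the smallest."""
    labels = list(range(n))
    for _ in range(max_iter):
        changed = False
        for u in range(n):
            neighbours = adj.get(u, ())
            if not neighbours:
                continue  # isolated l-mers stay singleton communities
            counts = Counter(labels[v] for v in neighbours)
            top = max(counts.values())
            new = min(lab for lab, c in counts.items() if c == top)
            if new != labels[u]:
                labels[u], changed = new, True
        if not changed:
            break
    groups = defaultdict(list)
    for u, lab in enumerate(labels):
        groups[lab].append(u)
    return sorted(groups.values(), key=len, reverse=True)  # largest first


def consensus(lmers):
    """Column-wise majority base: the coarse-grained candidate motif."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*lmers))


def refine(motif, lmers, d, alphabet="ACGT"):
    """Greedy single-base substitutions that increase the number of
    community l-mers lying within d mismatches of the candidate."""
    def score(m):
        return sum(hamming(m, x) <= d for x in lmers)

    best, best_score = motif, score(motif)
    improved = True
    while improved:
        improved = False
        for pos in range(len(best)):
            for ch in alphabet:
                if ch == best[pos]:
                    continue
                cand = best[:pos] + ch + best[pos + 1:]
                s = score(cand)
                if s > best_score:
                    best, best_score, improved = cand, s, True
    return best, best_score


def find_motif(seqs, l, d, top_k=3):
    """Graph, then communities, then consensus, then greedy refinement."""
    nodes, adj = build_lmer_graph(seqs, l, d)
    results = []
    for group in label_propagation(len(nodes), adj)[:top_k]:
        lmers = [nodes[u][2] for u in group]
        results.append(refine(consensus(lmers), lmers, d))
    return max(results, key=lambda r: r[1])


if __name__ == "__main__":
    seqs = ["GATTCAAAA", "CCCCGATTC", "TGATTCGGG"]
    print(find_motif(seqs, l=5, d=0))  # ('GATTC', 3)
```

In the toy run, the three planted copies of `GATTC` form the only non-trivial community, so its consensus is returned with all three members matched. For weak motifs (large d relative to l), the 2d threshold also links many background l-mers. This is why the community step only yields coarse candidates and the per-community refinement step matters.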
Related Links	https://bmcbioinformatics.biomedcentral.com/counter/pdf/10.1186/1471-2105-14-227.pdf
Starting Page	1
Ending Page	14
Page Count	14
File Format	HTM / HTML
ISSN	1471-2105
DOI	10.1186/1471-2105-14-227
Journal	BMC Bioinformatics
Issue Number	1
Volume Number	14
Language	English
Publisher	BioMed Central
Publisher Date	2013-07-17
Access Restriction	Open
Subject Keyword	Bioinformatics; Microarrays; Computational Biology; Computer Appl. in Life Sciences; Algorithms; Transcription Factor Binding Site; Community Detection; Motif Consensus; Consensus Model; Dense Subgraph; Computational Biology/Bioinformatics
Content Type	Text
Resource Type	Article
Subject	Molecular Biology; Biochemistry; Computer Science Applications; Applied Mathematics; Structural Biology
Journal Impact Factor	2.9/2023
5-Year Journal Impact Factor	3.6/2023

Similar Documents

fREDUCE: Detection of degenerate regulatory elements using correlation with expression

A survey of DNA motif finding algorithms

A study on the application of topic models to motif finding algorithms

Extracting transcription factor binding sites from unaligned gene sequences with statistical models

Sequence motif finder using memetic algorithm

LASAGNA: A novel algorithm for transcription factor binding site alignment

Searching for transcription factor binding sites in vector spaces

Sequence information gain based motif analysis

MotifMap: integrative genome-wide maps of regulatory motif sites for model species