NDLI: Bacterial pan-genomes: data representation and analysis

PReach: Reachability in Probabilistic Signaling Networks

GapsMis: flexible sequence alignment with a bounded number of gaps

Predicting protein transport mechanism and immune response using spatial protein motifs and epitopes: a case study of Chlamydophila MOMP

Fast and Accurate Structure-Based Prediction of Resistance to the HIV-1 Integrase Inhibitor Raltegravir

Relating mammalian replication program to large-scale chromatin folding

iAtheroSim: atherosclerosis process simulator on smart devices

Identifying Pathway Proteins in Networks using Convergence

MEGADOCK-GPU: Acceleration of Protein-Protein Docking Calculation on GPUs

GaitTrack: Health Monitoring of Body Motion from Spatio-Temporal Parameters of Simple Smart Phones

MoTeX: A word-based HPC tool for MoTif eXtraction

An Image-Text Approach for Extracting Experimental Evidence of Protein-Protein Interactions in the Biomedical Literature

In silico analysis of autoimmune diseases and genetic relationships to vaccination against infectious diseases

Improving the Prediction of Kinase Binding Affinity Using Homology Models

Chromatin structure fully determines replication timing program in human cells

Characterizing Amino Acid Variations of Scavenger Receptors by Class Information Gain

A Neural-network Algorithm for All k Shortest Paths Problem

pXAlign: A parallel implementation of XAlign

Aggregating Personal Health Messages for Scalable Comparative Effectiveness Research

Global Network Alignment In The Context Of Aging

An Island-Based Approach for Differential Expression Analysis

Modularity and community detection in Semantic Similarity Networks trough Spectral Based Transformation and Markov Clustering

A Constrained K-shortest Path Algorithm to Rank the Topologies of the Protein Secondary Structure Elements Detected in CryoEM Volume Maps

Unsupervised pattern discovery in human chromatin structure through genomic segmentation

Biomarkers in Immunology: from Concepts to Applications

Revealing Protein Structures by Co-Occurrence Clustering of Aligned Pattern Clusters

Locating Discharge Medications in Natural Language Summaries

Haplotype-based prediction of gene alleles using pedigrees and SNP genotypes

Multi-Objective Stochastic Search for Sampling Local Minima in the Protein Energy Surface

Age-Specific Signatures of Glioblastoma at the Genomic, Genetic, and Epigenetic levels

Exploring the Structure Space of Wildtype Ras Guided by Experimental Data

Landscape of neutralizing assessment of monoclonal antibodies against dengue virus

Comparative analysis of network algorithms to address modularity with gene expression temporal data

Evidence of a Pathway of Reduction in Bacteria: Reduced Quantities of Restriction Sites Impact tRNA Activity in a Trial Set

A Semi-Supervised Learning Approach to Integrated Salient Risk Features for Bone Diseases

Identifying protein complexes in AP-MS data with negative evidence via soft Markov clustering

SNP2Structure: A public database for mapping and modeling nsSNPs on human protein structures

Beta-sheet Detection and Representation from Medium Resolution Cryo-EM Density Maps

HPVdb: a Data Mining System for Knowledge Discovery in Human Papillomavirus with Applications in T cell Immunology and Vaccinology

Mobility Patterns of Doctors Using Electronic Health Records on iPads

Color distribution can accelerate network alignment

Stable Feature Selection with Minimal Independent Dominating Sets

An integrated pharmacogenomic analysis of doxorubicin response using genotype information on DMET genes

Informatics-driven Protein-protein Docking

DNA Vaccine Design for Chikungunya Virus Based On the Conserved Epitopes Derived from Structural Protein

Modeling Incidental Findings in Radiology Records

TCGA Toolbox: an Open Web App Framework for Distributing Big Data Analysis Pipelines for Cancer Genomics

Designing Autocorrelated Genes

Classifying Proteins by Amino Acid Variations of Sequence Patterns

An Evolutionary Conservation & Rigidity Analysis Machine Learning Approach for Detecting Critical Protein Residues

Defining Functional Redundancy of Epitope Data as Potential Antigenic Cross-Reactivity

Enforcing Minimum Necessary Access in Healthcare Through Integrated Audit and Access Control

A Study of Temporal Action Sequencing During Consumption of a Meal

Cloud4SNP: Distributed Analysis of SNP Microarray Data on the Cloud

Using Global Network Alignment In The Context Of Aging

Multi-Resolution Rigidity-Based Sampling of Protein Conformational Paths

Binary Response Models for Recognition of Antimicrobial Peptides

Application of a MAX-CUT Heuristic to the Contig Orientation Problem in Genome Assembly

Dynamic networks reveal key players in aging

A Combined Molecular Dynamics, Rigidity Analysis Approach for Studying Protein Complexes

Quantitative Early Detection of Diabetic Foot

Visual Analytics to Optimize Patient-Population Evidence Delivery for Personalized Care

Computational methods for alternative splicing detection using RNA-seq

Reconstructing transcriptional regulatory networks by probabilistic network component analysis

A Confidence Measure for Model Fitting with X-Ray Crystallography Data

Computer Assisted Surgery-Planning for Microwave Ablation

Protein Structure Refinement by Iterative Fragment Exchange

Heuristics for the Sorting by Length-Weighted Inversion Problem

Improvement of Protein-Protein Interaction Prediction by Integrating Template-Based and Template-Free Protein Docking

MarkovBin: An Algorithm to Cluster Metagenomic Reads Using a Mixture Modeling of Hierarchical Distributions

glu-RNA: aliGn highLy strUctured ncRNAs using only sequence similarity

The MEGADOCK project: Ultra-high-speed protein-protein interaction prediction tools on supercomputing environments

Detecting various types of differential splicing events using RNA-Seq data

Predictive model of the treatment effect for patients with major depressive disorder

Semi-automated Constraint-based Metabolic Model Generation

MRFy: Remote Homology Detection for Beta-Structural Proteins Using Markov Random Fields and Stochastic Search

Towards Independent Particle Reconstruction from Cryogenic Transmission Electron Microscopy

Incorporating Gene Annotations as Node Metadata to Improve Network Centrality Measures for Better Node Ranking

Improving discrimination of essential genes by modeling local insertion frequencies in transposon mutagenesis data

ChainKnot: a comparative H-type pseudoknot prediction tool using multiple ab initio folding tools

Protein-protein Docking Using Information from Native Interaction Interfaces

GLProbs: Aligning multiple sequences adaptively

A Framework for Identifying Affinity Classes of Inorganic Materials Binding Peptide Sequences

Determining miRNA-disease associations using bipartite graph modelling

PERGA: A Paired-End Read Guided De Novo Assembler for Extending Contigs Using SVM Approach

Clustering Coefficients in Protein Interaction Hypernetworks

Statistical Methods for Ambiguous Sequence Mappings

Simultaneous determination of subunit and complex structures of symmetric homo-oligomers from ambiguous NMR data

A Privacy Preserving Markov Model for Sequence Classification

ngPhylo: N-Gram Modeled Proteins with Substitution Matrices for Phylogenetic Analysis

Greedy Randomized Search Procedure to Sort Genomes using Symmetric, Almost-Symmetric and Unitary Inversions

Classification of Alzheimer Diagnosis from ADNI Plasma Biomarker Data

Automated protein structure refinement using i3Drefine software and its assessment in CASP10

A generalized sparse regression model with adjustment of pedigree structure for variant detection from next generation sequencing data

The Forward Stem Matrix: An Efficient Data Structure for Finding Hairpins in RNA Secondary Structures

A PCA-guided Search Algorithm to Probe the Conformational Space of the Ras Protein

Text Mining of Protein Phosphorylation Information Using a Generalizable Rule-Based Approach

Fine-Scale Recombination Mapping of High-Throughput Sequence Data

A Novel Algorithm for Feature Detection and Hiding from Ultrasound Images

Decomposing Biochemical Networks Into Elementary Flux Modes Using Graph Traversal

Transforming Genomes Using MOD Files with Applications

Bacterial pan-genomes: data representation and analysis

PathCase-MAW: An Online Metabolic Network Analysis Workbench

Read Annotation Pipeline for High-Throughput Sequencing Data

Using Machine Learning to Predict the Health of HIV-Infected Patients

Flexible RNA design under structure and sequence constraints using formal languages

Simulating Anti-adhesive and Antibacterial Bifunctional Polymers for Surface Coating using BioScape

RNA-Seq analyses to reveal the human transcriptome landscape

The TREC Medical Records Track

Systematic Assessment of RNA-Seq Quantification Tools Using Simulated Sequence Data

Initial Results In Using de Novo Motif Inference to Detect Cis-Regulatory Modules

SpliceGrapherXT: From Splice Graphs to Transcripts Using RNA-Seq

GPU-Optimized Hybrid Neighbor/Cell List Algorithm for Coarse-Grained MD Simulations of Protein and RNA Folding and Assembly

Comparative network analysis of gene co-expression networks reveals the conserved and species-specific functions of cell-wall related genes between Arabidopsis and Poplar

Classifying Immunophenotypes With Templates From Flow Cytometry

Gene Set Cultural Algorithm: A Cultural Algorithm Approach to Reconstruct Networks from Gene Sets

Reachability analysis in large probabilistic biological networks

An Ensemble Model for Mobile Device based Arrhythmia Detection

Scheduling of virtual screening application on multi-user pilot-agent platform on grid/cloud to optimize the stretch

Classifying Proteins by Amino Acid Variations of Sequential Patterns

Exploring Local Features and the Bag-of-Visual-Words Approach for Bioimage Classification

PRASE: PageRank-based Active Subnetwork Extraction

Quality of Care and Electronic Health Record Systems

Evaluation of Label Dependency for the Prediction of HLA Genes

Prediction of Biological Protein-protein Interaction Types Using Short-Linear Motifs

Topological properties of chromosome conformation graphs reflect spatial proximities within chromatin

Conditional Random Field for Candidate Gene Prioritization

Improving phosphopeptide identification in shotgun proteomics by supervised filtering of peptide-spectrum matches

Quantum Sequence Analysis: A New Alignment-free Technique For Analyzing Sequences in Feature Space

Performance Model Selection for Learning-based Biological Image Analysis on a Cluster

Predicting Breast Cancer Patient Survival Using Machine Learning

An Ensemble Topic Model for Sharing Healthcare Data and Predicting Disease Risk

ngsShoRT: A Software for Pre-processing Illumina Short Read Sequences for De Novo Genome Assembly

Masher: Mapping Long(er) Reads with Hash-based Genome Indexing on GPUs

Role of Quality in Electronic Health Record Systems

Suffix-Tree Based Error Correction of NGS Reads Using Multiple Manifestations of an Error

Sparse and Stable Reconstruction of Genetic Regulatory Networks Using Time Series Gene Expression Data

Genomic Sequence Fragment Identification using Quasi-Alignment

Listing Sorting Sequences of Reversals and Translocations

Measuring Relatedness Between Scientific Entities in Annotation Datasets

An Overview on Semantic Analysis of Proteomics Data

Identification of gene clusters with phenotype-dependent expression with application to normal and premature ageing

decisivatoR: an R infrastructure package that addresses the problem of phylogenetic decisiveness

Meta-analysis of Genomic and Proteomic Features to Predict Synthetic Lethality of Yeast and Human Cancer

Estimating the Number of Manually Segmented Cellular Objects Required to Evaluate the Performance of a Segmentation Algorithm

Temporal Relation Identification and Classification in Clinical Notes

Abstraction of Kinetic Models For Biochemical Networks

Co-occurrence Clusters of Aligned Pattern Clusters

Three-Dimensional Spot Detection in Ratiometric Fluorescence Imaging For Measurement of Subcellular Organelles

Evaluating theoretical models of protein interaction network evolution without seed graphs

The Atomizer: Extracting Implicit Molecular Structure from Reaction Network Models

Studies of biological networks with statistical model checking: application to immune system cells

An Algorithm for Constructing Hypothetical Evolutionary Trees Using Common Mutation Similarity Matrices

Predicting Protein Families using Protein Shape Context

Bacterial pan-genomes: data representation and analysis

Content Provider	ACM Digital Library
Author	Fedorov, Boris; Tatusova, Tatiana; Zaslavsky, Leonid
Abstract	Bacterial genomes at NCBI form a large collection of strains that vary in sequence quality, assembly quality, and sampling density. Among these are densely sampled sets of related genomes, usually human pathogens, whose organization and protein content can be analyzed directly within the pan-genome concept. Even in groups of closely related genomes, protein families occur at very different frequencies, with "core proteins" at one end, "dispensable proteins" at the other, and "accessory proteins" in between. To organize the genomes available in the NCBI repositories into related groups (species-level clades), we use a distance method based on a robust distance between sets of ribosomal proteins. The threshold is selected so that most clades contain a single species, with some clades containing genomes from a few species. Within each clade, we then build trees based on similarity of protein content using hierarchical clustering with tight parameters. To identify protein families for genomes within a clade accurately and reliably, we use a combined approach that takes into account both sequence similarity and genome context. First, proteins are grouped into tentative clusters using inclusive parameters. Then, within each tentative cluster, local genome context and the protein phylogenetic tree are used to separate paralogs. The combined approach defines core and conservative clusters for the pan-genome more accurately than sequence-based clustering alone. For computational efficiency, protein redundancy and near-redundancy are eliminated, with one representative sequence used for each near-redundant group.
Starting Page	683
Ending Page	683
Page Count	1
File Format	PDF
ISBN	9781450324342
DOI	10.1145/2506583.2506684
Language	English
Publisher	Association for Computing Machinery (ACM)
Publisher Date	2013-09-22
Publisher Place	New York
Access Restriction	Subscribed
Subject Keyword	Bacterial Pathogens; Paralogs; Genomics Infrastructure; Orthologs; Clustering; Pangenome; Core clusters; Protein clusters; Computational Indexing
Content Type	Text
Resource Type	Article
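
The abstract describes protein families as ranging from "core" through "accessory" to "dispensable" according to how often they appear across the genomes of a clade. The Python sketch below illustrates that idea only; it is not the authors' implementation. The function name `classify_families`, the frequency cutoffs, and the toy genome and family names are all hypothetical, since the abstract gives no thresholds or data.

```python
from collections import defaultdict


def classify_families(genome_to_families, core_min=0.95, dispensable_max=0.15):
    """Label each protein family by the fraction of genomes that carry it.

    The cutoffs are illustrative assumptions, not values from the abstract.
    """
    n_genomes = len(genome_to_families)
    if n_genomes == 0:
        return {}

    # Count, for every family, the number of genomes in which it occurs.
    counts = defaultdict(int)
    for families in genome_to_families.values():
        for fam in set(families):  # count each family once per genome
            counts[fam] += 1

    labels = {}
    for fam, count in counts.items():
        freq = count / n_genomes
        if freq >= core_min:
            labels[fam] = "core"
        elif freq <= dispensable_max:
            labels[fam] = "dispensable"
        else:
            labels[fam] = "accessory"
    return labels


# Toy clade (made-up names): four genomes and the families each one encodes.
clade = {
    "genome_A": ["rpsA", "gyrB", "toxX"],
    "genome_B": ["rpsA", "gyrB"],
    "genome_C": ["rpsA", "gyrB", "phageY"],
    "genome_D": ["rpsA"],
}
labels = classify_families(clade, core_min=1.0, dispensable_max=0.25)
for family, label in sorted(labels.items()):
    print(family, label)
```

In practice the cutoffs would depend on sampling density and assembly quality within the clade, which the abstract notes vary widely across the NCBI collection.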
