Methods for Mining Data from Genome Wide High-Throughput Technologies
| Content Provider | Semantic Scholar |
|---|---|
| Author | Pehkonen, Petri |
| Copyright Year | 2007 |
| Abstract | Current high-throughput technologies, such as DNA microarrays, produce genome-wide measurement data on the structure and function of cellular molecules. Analyzing such data requires efficient and robust computational tools. The standard output of a microarray analysis is a set of genes that are co- or differentially expressed. Biological interpretation of this outcome aims to find the mechanisms that cause such expression. This step often involves searching biological databases and the literature for biological attributes that are over-represented in the gene set. Recent methods and software programs use statistical approaches to find such information, but many questions remain that they do not address. This work presents novel bioinformatic methods and software tools for the biological interpretation of data obtained from high-throughput technologies. The presented methods 1) discover expected relations between genes and experimental conditions by literature mining, 2) discover biological processes that can explain the co- or differential expression by cluster analysis of functional information on genes, 3) discover putative regulatory elements that can explain the genes' co-expression, and 4) find chromosomal locations enriched in co-expressed genes by using a segmentation procedure. The methods analyze categorical data representing the associations between genes and biological attributes. They include clustering and segmentation, and statistical evaluation of the results. For clustering high-dimensional binary data, we present a method based on non-negative matrix factorization (NMF), a recent matrix factorization method that has shown good performance in the analysis of binary data. In segmentation, we apply heuristics in order to obtain results in reasonable time. Because clustering and segmentation produce several solutions with different numbers of clusters, we present novel methods for evaluating the results. The developed methods outperform the alternatives in comparisons on real and simulated data. The methods are applied to the interpretation of several datasets. These include gene expression data obtained from salmon treated with environmental toxins, from baker's yeast during the cell cycle and under the influence of an antifungal drug, and from a nematode with a human Parkinson’s disease-related transgene. |
| Universal Decimal Classification | 575.111, 575.112 |
| National Library of Medicine Classification | QU 26.5, QU 58.5, QU 450, QU 470 |
| Medical Subject Headings | Computational Biology; Information Storage and Retrieval; Genes; Genome; Genomics; Gene Expression Profiling; Databases, Genetic; Microarray Analysis; Cluster Analysis; Transcription Factors; Bayes Theorem; Factor Analysis, Statistical; Models, Statistical |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://epublications.uef.fi/pub/urn_isbn_978-951-27-0436-1/urn_isbn_978-951-27-0436-1.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |