Algorithms for comparing and visualizing genome-scale datasets
| Content Provider | Semantic Scholar |
|---|---|
| Author | Popendorf, Kristoffer |
| Copyright Year | 2013 |
| Abstract | Recent years have seen a massive explosion in the number, complexity, and raw volume of new sequencing data thanks to advances in modern sequencing technology. In particular, the advent of massively parallel sequencing, colloquially known as “Next Generation Sequencing” or “NGS,” has opened up a new world of sequencing applications that were once impractical at best. Whole bacterial genomes can now be sequenced in a matter of days, new mammalian genomes can be sequenced for a fraction of what they once cost, and well-known species like Homo sapiens can be re-sequenced to discover novel genetic variants for less than $1000. In the first chapter of this dissertation, we review the current state of genomic sequencing technology, its applications, and current challenges. One product of this sequencing explosion has been a wealth of newly sequenced genomes, including those of 57 vertebrates. With such rich data concerning some of our closest evolutionary relatives, comparative genomics studies promise to provide great insight into our physiology and development through analysis of similarities between whole genomes across multiple species. However, existing comparative genomics tools can handle only a few chromosomes at a time and require excessive computational resources to keep pace with the vast number of genomes rapidly becoming available. To address this problem, we introduce a new approach to parallel sequence similarity search that makes efficient use of cluster computing resources to provide the scalability necessary to analyze current and future genome projects. We've named this algorithm Murasaki, and its details are described in Chapter 2. Another application of NGS technology has been in transcriptome, regulation, and variant analysis. Two relatively new applications unique to NGS use the massive number of reads available to assay RNA products by sequencing the RNA itself (RNA-Seq), or to capture and sequence the DNA bound to specific transcription factors or other DNA-binding proteins (ChIP-Seq). The data from these experiments can be hard to interpret because their sheer scale is overwhelming and requires some practical reduction to find features of interest before a more detailed investigation can be conducted. Existing techniques for visualizing NGS data have been limited to examining small regions and/or have offered limited support for RNA-Seq/ChIP-Seq features. To address this problem, we propose a new algorithm and data format, implemented in our program Samscope and described in Chapter 3. In Chapter 4, we summarize the impact of these new approaches and examine their potential future areas of development. |
| File Format | PDF, HTM/HTML |
| Alternate Webpage(s) | http://iroha.scitech.lib.keio.ac.jp:8080/sigma/bitstream/handle/10721/2627/document.pdf?isAllowed=y&sequence=4 |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |