Characterizing deep sequencing analytics using BFAST: towards a scalable distributed architecture for next-generation sequencing data
| Field | Value |
|---|---|
| Content Provider | ACM Digital Library |
| Author | Kim, Joohyun; Maddineni, Sharath; Jha, Shantenu |
| Abstract | Next-generation DNA sequencing (NGS) platforms produce significantly larger amounts of data than early Sanger sequencers. In addition to the data-management challenges that arise from these unprecedented volumes, there is the important requirement of analyzing the data effectively. In this paper, we use BFAST, a genome-wide mapping application, as a representative example of the typical analysis required on data from NGS machines. We investigate two model genomes, the human genome and a microbe (Burkholderia glumae), which represent a eukaryotic and a prokaryotic system respectively. The computational complexity of genome-wide mapping using BFAST depends, amongst other factors, upon the size of the reference genome and the data size of the short reads. We analyze the performance characteristics of BFAST and its dependence on different input parameters. This characterization suggests that genome-wide mapping benefits from both scaling-up (increased fine-grained parallelism) and scaling-out (task-level parallelism, both local and distributed). For certain problem instances, scaling-out can be more efficient than scaling-up. We then design, develop and demonstrate a runtime environment that supports both the scale-up and scale-out of BFAST on production grid and cloud environments. |
| Starting Page | 23 |
| Ending Page | 32 |
| Page Count | 10 |
| File Format | |
| ISBN | 9781450307024 |
| DOI | 10.1145/1996023.1996027 |
| Language | English |
| Publisher | Association for Computing Machinery (ACM) |
| Publisher Date | 2011-06-08 |
| Publisher Place | New York |
| Access Restriction | Subscribed |
| Subject Keyword | Pilot-job abstraction; Runtime environment; Burkholderia glumae; Human genome; Genome sequence alignment; Simple API for Grid Applications (SAGA); Distributed computing; BFAST |
| Content Type | Text |
| Resource Type | Article |