NDLI: ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers

Content Provider	Springer Nature : BioMed Central
Author	Coombe, Lauren; Zhang, Jessica; Vandervalk, Benjamin P.; Chu, Justin; Jackman, Shaun D.; Birol, Inanc; Warren, René L.
Abstract	Background: The long-range sequencing information captured by linked reads, such as those available from 10× Genomics (10xG), helps resolve genome sequence repeats, and yields accurate and contiguous draft genome assemblies. We introduce ARKS, an alignment-free linked read genome scaffolding methodology that uses linked reads to organize genome assemblies further into contiguous drafts. Our approach departs from other read alignment-dependent linked read scaffolders, including our own (ARCS), and uses a kmer-based mapping approach. The kmer mapping strategy has several advantages over read alignment methods, including better usability and faster processing, as it precludes the need for input sequence formatting and draft sequence assembly indexing. The reliance on kmers instead of read alignments for pairing sequences relaxes the workflow requirements, and drastically reduces the run time. Results: Here, we show how linked reads, when used in conjunction with Hi-C data for scaffolding, improve a draft human genome assembly of PacBio long-read data five-fold (baseline vs. ARKS NG50 = 4.6 vs. 23.1 Mbp, respectively). We also demonstrate how the method provides further improvements of a megabase-scale Supernova human genome assembly (NG50 = 14.74 Mbp vs. 25.94 Mbp before and after ARKS), which itself exclusively uses linked read data for assembly, with an execution speed six to nine times faster than competitive linked read scaffolders (~10.5 h compared to 75.7 h, on average). Following ARKS scaffolding of a human genome 10xG Supernova assembly (of cell line NA12878), fewer than 9 scaffolds cover each chromosome, except the largest (chromosome 1, n = 13). Conclusions: ARKS uses a kmer mapping strategy instead of linked read alignments to record and associate the barcode information needed to order and orient draft assembly sequences. The simplified workflow, when compared to that of our initial implementation, ARCS, markedly improves run time performance on experimental human genome datasets. Furthermore, the novel distance estimator in ARKS utilizes barcoding information from linked reads to estimate gap sizes. It accomplishes this by modeling the relationship between known distances of a region within contigs and calculating associated Jaccard indices. ARKS has the potential to provide correct, chromosome-scale genome assemblies, promptly. We expect ARKS to have broad utility in helping refine draft genomes.
Related Links	https://bmcbioinformatics.biomedcentral.com/counter/pdf/10.1186/s12859-018-2243-x.pdf
Ending Page	10
Page Count	10
Starting Page	1
File Format	HTM / HTML
ISSN	1471-2105
DOI	10.1186/s12859-018-2243-x
Journal	BMC Bioinformatics
Issue Number	1
Volume Number	19
Language	English
Publisher	BioMed Central
Publisher Date	2018-06-20
Access Restriction	Open
Subject Keyword	Bioinformatics; Microarrays; Computational Biology; Computer Appl. in Life Sciences; Algorithms; 10× Genomics Chromium; ARKS; ARCS; Next-generation sequencing; de novo assembly; Genome scaffolding; Linked reads; Supernova assembler; Read mapping; Kmers; Computational Biology/Bioinformatics
Content Type	Text
Resource Type	Article
Subject	Molecular Biology; Biochemistry; Computer Science Applications; Applied Mathematics; Structural Biology
Journal Impact Factor	2.9/2023
5-Year Journal Impact Factor	3.6/2023
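
The abstract describes mapping linked-read kmers to draft sequences and comparing barcode information through Jaccard indices. The sketch below illustrates that general idea on toy data. It is not the ARKS implementation: the function names, the parameters (k, end_len, min_frac), and all sequences and barcodes are illustrative assumptions, and the gap-size model mentioned in the abstract is not reproduced.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def index_contig_ends(contigs, k, end_len):
    """Map each kmer in a contig's head or tail region to (contig, end).

    Kmers seen in more than one region are dropped as ambiguous.
    """
    index = {}
    ambiguous = set()
    for name, seq in contigs.items():
        regions = {"head": seq[:end_len], "tail": seq[-end_len:]}
        for end, region in regions.items():
            for km in kmers(region, k):
                if km in index and index[km] != (name, end):
                    ambiguous.add(km)
                index[km] = (name, end)
    for km in ambiguous:
        del index[km]
    return index


def barcodes_per_end(reads, index, k, min_frac=0.5):
    """Collect, for each contig end, the barcodes of reads assigned to it.

    A read is assigned to the contig end hit by most of its kmers, provided
    that end accounts for at least min_frac of the read's kmers.
    """
    hits = defaultdict(set)
    for barcode, seq in reads:
        counts = defaultdict(int)
        total = 0
        for km in kmers(seq, k):
            total += 1
            target = index.get(km)
            if target is not None:
                counts[target] += 1
        if not counts:
            continue
        best, n = max(counts.items(), key=lambda kv: kv[1])
        if n / total >= min_frac:
            hits[best].add(barcode)
    return hits


def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy draft assembly with two contigs (made-up sequences).
contigs = {
    "c1": "ACGTTGCAAGGCTTAGCCATGGATCCGATA",
    "c2": "TTGACCTAGAGTCACGGTATCTGCAGTCAA",
}
# Toy linked reads as (barcode, sequence) pairs (made-up data).
reads = [
    ("BX1", "GGATCCGATA"), ("BX1", "TTGACCTAGA"),
    ("BX2", "GATCCGATA"), ("BX2", "TGACCTAGA"),
    ("BX3", "GGATCCGATA"),
    ("BX4", "CTGCAGTCAA"),
]

index = index_contig_ends(contigs, k=5, end_len=10)
hits = barcodes_per_end(reads, index, k=5)

# Barcode sharing between the tail of c1 and each end of c2.
tail_head = jaccard(hits[("c1", "tail")], hits[("c2", "head")])
tail_tail = jaccard(hits[("c1", "tail")], hits[("c2", "tail")])
print(tail_head, tail_tail)
```

In this toy case the tail of c1 shares more barcodes with the head of c2 than with its tail, which is the kind of evidence a scaffolder could use to order and orient the two sequences.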


Similar Documents

ARKS: chromosome-scale scaffolding of human genome drafts with linked read kmers

Tigmint: correcting assembly errors using linked reads from large molecules

GoldPolish-target: targeted long-read genome assembly polishing

SLR-superscaffolder: a de novo scaffolding tool for synthetic long reads using a top-to-bottom scheme

LongStitch: high-quality genome assembly correction and scaffolding using long reads

GPU acceleration of Darwin read overlapper for de novo assembly of long DNA reads

SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information

WiseScaffolder: an algorithm for the semi-automatic scaffolding of Next Generation Sequencing data

ARCS: scaffolding genome drafts with linked reads