Schneeberger, K., Ossowski, S., Ott, F. et al. (10 more authors) (2011) Reference-guided assembly of four diverse Arabidopsis thaliana genomes. Proceedings of the National Academy of Sciences of the United States of America, 108 (25). pp. 10249-10254. ISSN 0027-8424
Abstract
We present whole-genome assemblies of four divergent Arabidopsis thaliana strains that complement the 125-Mb reference genome sequence released a decade ago. Using a newly developed reference-guided approach, we assembled large contigs from 9 to 42 Gb of Illumina short-read data from the Landsberg erecta (Ler-1), C24, Bur-0, and Kro-0 strains, which have been sequenced as part of the 1,001 Genomes Project for this species. Using alignments against the reference sequence, we first reduced the complexity of the de novo assembly and later integrated reads without similarity to the reference sequence. As an example, half of the noncentromeric C24 genome was covered by scaffolds that are longer than 260 kb, with a maximum of 2.2 Mb. Moreover, over 96% of the reference genome was covered by the reference-guided assembly, compared with only 87% with a complete de novo assembly. Comparisons with 2 Mb of dideoxy sequence reveal that the per-base error rate of the reference-guided assemblies was below 1 in 10,000. Our assemblies provide a detailed, genomewide picture of large-scale differences between A. thaliana individuals, most of which are difficult to access with alignment-consensus methods only. We demonstrate their practical relevance in studying the expression differences of polymorphic genes and show how the analysis of sRNA sequencing data can lead to erroneous conclusions if aligned against the reference genome alone. Genome assemblies, raw reads, and further information are accessible through http://1001genomes.org/projects/assemblies.html.
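The contiguity figure quoted in the abstract (half of the noncentromeric C24 genome covered by scaffolds longer than 260 kb) is a statistic of the kind often called NG50. A minimal sketch of how such a value could be computed follows. The function name `ng50` and the toy scaffold lengths are illustrative assumptions, not the authors' code or data.

```python
def ng50(scaffold_lengths, genome_size):
    """Return the scaffold length L such that scaffolds of length >= L
    together cover at least half of genome_size.

    Returns None when the scaffolds cover less than half of the genome.
    Hypothetical helper for illustration only.
    """
    covered = 0
    # Walk scaffolds from longest to shortest, accumulating covered bases.
    for length in sorted(scaffold_lengths, reverse=True):
        covered += length
        # Integer comparison avoids floating-point rounding on genome_size / 2.
        if 2 * covered >= genome_size:
            return length
    return None


# Toy values (made up, not taken from the paper).
toy_lengths = [50, 40, 30, 20, 10]
toy_genome_size = 200
print(ng50(toy_lengths, toy_genome_size))
```

Unlike N50, which is measured against the total assembly length, this variant is measured against a fixed genome size, which matches the abstract's phrasing "half of the noncentromeric C24 genome".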
Metadata
Item Type: | Article |
---|---|
Copyright, Publisher and Additional Information: | © 2016 National Academy of Sciences. All rights reserved. Unless otherwise indicated, all materials on these pages are copyrighted by the National Academy of Sciences. All rights reserved. No part of these pages, either text or image may be used for any purpose other than personal use. Therefore, reproduction, modification, storage in a retrieval system or retransmission, in any form or by any means, electronic, mechanical or otherwise, for reasons other than personal use, is strictly prohibited without prior written permission. The identifier "the National Academies" refers collectively to the National Academy of Sciences, National Academy of Engineering, Institute of Medicine and National Research Council. |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Science (Sheffield) > School of Biosciences (Sheffield) > Department of Animal and Plant Sciences (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 12 May 2016 08:51 |
Last Modified: | 12 May 2016 08:51 |
Published Version: | http://dx.doi.org/10.1073/pnas.1107739108 |
Status: | Published |
Publisher: | National Academy of Sciences |
Refereed: | Yes |
Identification Number: | 10.1073/pnas.1107739108 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:99605 |