Chanda, P., Elhaik, E. (orcid.org/0000-0003-4795-1084) and Bader, J. S. (2012) HapZipper: sharing HapMap populations just got easier. Nucleic Acids Research, 40 (20). e159. ISSN 0305-1048
Abstract
The rapidly growing amount of genomic sequence data being generated and made publicly available necessitates the development of new data storage and archiving methods. The vast amount of data being shared and manipulated also creates new challenges for network resources. Thus, developing advanced data compression techniques is becoming an integral part of data production and analysis. The HapMap project is one of the largest public resources of human single-nucleotide polymorphisms (SNPs), characterizing over 3 million SNPs genotyped in over 1000 individuals. The standard format and biological properties of HapMap data suggest that a dedicated genetic compression method can outperform generic compression tools. We propose a compression methodology for genetic data by introducing HapZipper, a lossless compression tool tailored to compress HapMap data beyond benchmarks defined by generic tools such as gzip, bzip2 and lzma. We demonstrate the usefulness of HapZipper by compressing HapMap 3 populations to <5% of their original sizes. HapZipper is freely downloadable from https://bitbucket.org/pchanda/hapzipper/downloads/HapZipper.tar.bz2
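To make the notion of a generic-tool benchmark concrete, the following Python sketch measures the compressed-to-original size ratio achieved by gzip, bzip2 and lzma through the standard library. It is not HapZipper and does not reproduce the authors' experiments; the function names `compression_ratios` and `synthetic_haplotypes` are hypothetical, and the input is randomly generated 0/1 allele text rather than real HapMap data.

```python
import bz2
import gzip
import lzma
import random


def compression_ratios(data: bytes) -> dict:
    """Return compressed size / original size for each generic compressor."""
    codecs = {
        "gzip": (gzip.compress, gzip.decompress),
        "bzip2": (bz2.compress, bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    ratios = {}
    for name, (compress, decompress) in codecs.items():
        packed = compress(data)
        # Lossless check: decompression must reproduce the input exactly.
        assert decompress(packed) == data
        ratios[name] = len(packed) / len(data)
    return ratios


def synthetic_haplotypes(n_snps: int = 2000, n_haplotypes: int = 100,
                         seed: int = 0) -> bytes:
    """Build toy input: one text row of 0/1 alleles per SNP.

    Assumption for illustration only; this is not the HapMap file format.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_snps):
        # Each SNP gets its own minor-allele frequency between 0 and 0.5.
        maf = rng.uniform(0.0, 0.5)
        rows.append("".join("1" if rng.random() < maf else "0"
                            for _ in range(n_haplotypes)))
    return "\n".join(rows).encode("ascii")


if __name__ == "__main__":
    sample = synthetic_haplotypes()
    for tool, ratio in compression_ratios(sample).items():
        print(f"{tool}: {ratio:.1%} of original size")
```

To obtain a baseline on real data, the bytes of an actual HapMap file can be passed to `compression_ratios` in place of the synthetic sample.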
Metadata
Item Type: | Article |
---|---|
Copyright, Publisher and Additional Information: | © The Author(s) 2012. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
Keywords: | genome; single nucleotide polymorphism; genetics; compression |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Science (Sheffield) > School of Biosciences (Sheffield) > Department of Animal and Plant Sciences (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 05 Dec 2016 12:41 |
Last Modified: | 05 Dec 2016 12:41 |
Published Version: | http://dx.doi.org/10.1093/nar/gks709 |
Status: | Published |
Publisher: | Oxford University Press (OUP): Policy C - Option B |
Refereed: | Yes |
Identification Number: | 10.1093/nar/gks709 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:108692 |
Download
Filename: HapZipper: sharing HapMap populations just got easier.pdf
Licence: CC-BY-NC 3.0