Title: Hickle: a HDF5-based python pickle replacement

Type: Software

Citation: Danny Price, Sébastien Celles, Pieter T. Eendebak, Michael M. McKerns, Eben M. Olson, Colin Raffel, Bairen Yi (2018): Hickle: a HDF5-based python pickle replacement. Zenodo. Software. https://zenodo.org/record/2345649

Authors: Danny Price (Swinburne University of Technology); Sébastien Celles; Pieter T. Eendebak; Michael M. McKerns; Eben M. Olson; Colin Raffel; Bairen Yi

Summary

hickle is a Python 2/3 package for quickly dumping and loading Python data structures to Hierarchical Data Format 5 (HDF5) files. When dumping to HDF5, hickle automatically converts Python data structures (e.g. lists, dictionaries, numpy arrays) into HDF5 groups and datasets. When loading from file, hickle automatically converts the data back into its original data type. A key motivation for hickle is to provide high-performance loading and storage of scientific data in the widely supported HDF5 format.

hickle is designed as a drop-in replacement for the Python pickle package, which converts Python object hierarchies to and from Python-specific byte streams (processes known as 'pickling' and 'unpickling' respectively). Several different pickle protocols exist, and pickle files are not designed to be compatible between Python versions, nor to be interpretable in other languages. In contrast, hickle stores data in and loads data from HDF5 files, for which application programming interfaces (APIs) exist in most major languages, including C, Java, R, and MATLAB.
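For comparison, the pickling behaviour described above can be illustrated with the standard library alone. This is a small sketch, not taken from the record; the `pickle_roundtrip` helper and the `record` dictionary are hypothetical. It shows that the same object yields different byte streams under different pickle protocols.

```python
import pickle

# Hypothetical sample object built from plain Python types.
record = {"name": "scan-01", "values": [1, 2, 3], "shape": (2, 3)}


def pickle_roundtrip(obj, protocol):
    """Pickle obj with the given protocol, then unpickle the bytes."""
    payload = pickle.dumps(obj, protocol=protocol)
    return payload, pickle.loads(payload)


# Protocol 0 is the original text-based format; protocol 2 is a binary
# format whose byte stream starts with a protocol marker.
payload_v0, restored_v0 = pickle_roundtrip(record, 0)
payload_v2, restored_v2 = pickle_roundtrip(record, 2)
```

Both round trips reproduce the original object, but the two payloads differ byte for byte, and neither is a format that tools outside Python are expected to read.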

Python data structures are mapped into the HDF5 abstract data model in a logical fashion, using the h5py package. Metadata required to reconstruct the hierarchy of objects, and to allow conversion into Python objects, is stored in HDF5 attributes. Most commonly used Python iterables (dict, tuple, list, set), and data types (int, float, str) are supported, as are numpy N-dimensional arrays. Commonly-used astropy data structures and scipy sparse matrices are also supported.
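The general idea of mapping a Python dictionary onto HDF5 groups, datasets, and attributes can be sketched with h5py directly. This is an illustration only and does not reproduce hickle's actual file layout: the `write_mapping` helper and the attribute name `py_type` are hypothetical, and the file is kept in memory so that nothing is written to disk.

```python
import h5py
import numpy as np


def write_mapping(h5file, name, mapping):
    """Store a flat dict as an HDF5 group with one dataset per key.

    The "py_type" attribute is a hypothetical name used here to record
    the original Python type; it is not hickle's real metadata schema.
    """
    group = h5file.create_group(name)
    group.attrs["py_type"] = "dict"
    for key, value in mapping.items():
        dataset = group.create_dataset(key, data=np.asarray(value))
        dataset.attrs["py_type"] = type(value).__name__
    return group


# driver="core" with backing_store=False keeps the file purely in memory.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as h5file:
    group = write_mapping(h5file, "record", {"gain": 1.5, "channels": [0, 1, 2]})
    stored_types = {key: str(group[key].attrs["py_type"]) for key in group}
    group_type = str(group.attrs["py_type"])
```

A loader could read these attributes back to decide which Python type to rebuild from each dataset, which is the role the metadata plays in the description above.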

hickle has been used in many scientific research projects, including:

  • Visualization and machine learning on volumetric fluorescence microscopy datasets from histological tissue imaging.
  • Caching pre-computed features for MIDI and audio files for downstream machine learning tasks.
  • Storage and transmission of high volumes of shotgun proteomics data, such as mass spectra of proteins and peptide segments.
  • Storage of astronomical data and calibration data from radio telescopes.

hickle is released under the MIT license and is available from PyPI via pip; source code is available at https://github.com/telegraphic/hickle. Note: this text is modified from the hickle Journal of Open Source Software paper, https://github.com/telegraphic/hickle/blob/master/paper.md.

More information

  • DOI: 10.5281/zenodo.2345649

Subjects

  • Python, HDF5, pickle, data format

Dates

  • Publication date: 2018
  • Issued: December 17, 2018

Format

electronic resource

Related items

  • IsVersionOf: https://doi.org/10.5281/zenodo.2345648
  • IsPartOf: https://zenodo.org/communities/zenodo