Abstract
Darr is a Python science library for working with potentially large NumPy arrays and metadata that persist on disk, in a format that is simple, self-documented and tool-independent. The goal is to keep your data easily accessible in both the short and the long term, from a wide range of computing environments. Keeping data universally readable and documented is in line with good scientific practice. It not only makes it easy to share data with others, but also to look at your own data with different tools. More rationale for this approach is provided here.
Flat binary files and (JSON) text files are accompanied by a README text file that explains how the specific data and metadata are stored and how they can be read. This includes code for reading the array in a variety of current scientific data tools such as Python, R, Julia, IDL, Matlab, Maple, and Mathematica. Sharing your data with others, or with yourself when working in different computing environments, is trivially easy because the data is always accompanied by a clear and specific description of how to read it. There is no need to export anything or to provide elaborate explanation, and no dependence on complicated formats or specialized tools. The self-documentation and code examples are automatically updated as your array changes.
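The storage idea described above can be illustrated with a minimal sketch that uses only the Python standard library. It is not Darr's own code or its exact on-disk layout: the file names (`values.bin`, `description.json`, `README.txt`), the JSON keys, and both function names are assumptions made for illustration.

```python
import json
import sys
from array import array
from pathlib import Path


def write_self_describing(dirpath, values):
    """Store floats as raw bytes, with JSON and README descriptions alongside.

    Hypothetical layout, for illustration only.
    """
    d = Path(dirpath)
    d.mkdir(parents=True, exist_ok=True)
    data = array("d", values)  # 'd' = 8-byte float, native byte order
    (d / "values.bin").write_bytes(data.tobytes())
    description = {
        "numtype": "float64",
        "shape": [len(data)],
        "byteorder": sys.byteorder,
    }
    (d / "description.json").write_text(json.dumps(description, indent=2))
    # The README is regenerated on every write so that it always matches the data.
    (d / "README.txt").write_text(
        "values.bin holds {n} 8-byte floats ({bo}-endian), one after another.\n"
        "description.json repeats this information in machine-readable form.\n"
        .format(n=len(data), bo=sys.byteorder)
    )
    return description


def read_self_describing(dirpath):
    """Read the values back using only the information in description.json."""
    d = Path(dirpath)
    description = json.loads((d / "description.json").read_text())
    data = array("d")
    data.frombytes((d / "values.bin").read_bytes())
    if description["byteorder"] != sys.byteorder:
        data.byteswap()  # file was written on a machine with other endianness
    return list(data), description
```

Because the binary file has no header and the description is plain text, any tool that can read raw 8-byte floats can load the data without a dedicated library.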
Darr uses NumPy memory-mapped arrays under the hood, which you can access directly for full NumPy compatibility and efficient out-of-core read/write access to potentially very large arrays. In addition, Darr supports appending to and truncating arrays, as well as ragged arrays (still experimental).
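The mechanism behind these operations can be sketched with plain NumPy on a headerless binary file. This is an illustration of the general technique, not Darr's API: the helper names `append_values`, `truncate_values` and `open_values` are hypothetical, and a one-dimensional `float64` array is assumed.

```python
import os

import numpy as np


def append_values(path, values, dtype="float64"):
    """Append values by writing their raw bytes to the end of the file."""
    with open(path, "ab") as f:
        f.write(np.asarray(values, dtype=dtype).tobytes())


def truncate_values(path, n, dtype="float64"):
    """Keep only the first n values by shortening the file.

    Any memory map on the file should be closed (deleted) beforehand.
    """
    os.truncate(path, n * np.dtype(dtype).itemsize)


def open_values(path, dtype="float64", mode="r+"):
    """Memory-map the file; NumPy infers the 1-D length from the file size."""
    return np.memmap(path, dtype=dtype, mode=mode)
```

A memory map reads only the parts of the file that are actually indexed, which is why arrays larger than available RAM remain usable. Appending and truncating are cheap for the same reason that the format is simple: without a header, changing the array length amounts to changing the file length.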
Original language | English |
---|---|
Publication status | Published - 2021 |