Ndarray Data Language

Simple language with easy syntax for describing the content of multidimensional array (ndarray) data files.


What is Ndarray Data Language?

Files with multidimensional array data (also referred to as ndarrays or datacubes) are commonplace in science, engineering, and machine learning. They are typically stored in binary formats because that is the most efficient storage method for array data. One important aspect of the workflow is describing file content in a textual form that is easy for users to understand, create, and share. This requires special software tools, and many of these file formats have at least one. The text these tools produce is not standardized: it varies in how much detail of the file content it shows and is usually tied directly to the underlying storage format.

The Ndarray Data Language aims to provide a common text format for describing the content of multidimensional array data files. The language's main features are:

  • Modelled as a simple abstraction of the HDF5 and netCDF data models and file formats.
  • Simple and clean syntax based on YAML. Parseable in all major programming languages, yet easy for users to understand and create in their favorite text editor.
  • Easily extendable to accommodate additional information for a particular file format.


Don't want to read the syntax specification? Understandable. This tutorial provides an easy-to-follow guide to the Ndarray Data Language.


Examples of real file content descriptions written in the Ndarray Data Language.


What good is a data language without software to make it usable? Agreed! But it is still too early for this. Hopefully coming soon!