Importing HDF5 Files

Overview

Hierarchical Data Format, Version 5, (HDF5) is a general-purpose, machine-independent standard for storing scientific data in files, developed by the National Center for Supercomputing Applications (NCSA). HDF5 is used by a wide range of engineering and scientific fields that want a standard way to store data so that it can be shared. For more information about the HDF5 file format, read the HDF5 documentation available at the HDF Web site (https://www.hdfgroup.org).

MATLAB® provides two methods to import data from an HDF5 file:

  • High-level functions that make it easy to import data, when working with numeric datasets

  • Low-level functions that enable more complete control over the importing process, by providing access to the routines in the HDF5 C library

Note

For information about importing to HDF4 files, which have a separate, incompatible format, see Import HDF4 Files Programmatically.

Using the High-Level HDF5 Functions to Import Data

MATLAB includes several functions that you can use to examine the contents of an HDF5 file and import data from the file into the MATLAB workspace.

Note

You can only use the high-level functions to read numeric datasets or attributes. To read non-numeric datasets or attributes, you must use the low-level interface.

  • h5disp — View the contents of an HDF5 file

  • h5info — Create a structure that contains all the metadata defining an HDF5 file

  • h5read — Read data from a variable in an HDF5 file

  • h5readatt — Read data from an attribute associated with a variable in an HDF5 file or with the file itself (a global attribute).

For details about how to use these functions, see their reference pages, which include examples. The following sections illustrate some common usage scenarios.

Determining the Contents of an HDF5 File

HDF5 files can contain data and metadata, called attributes. HDF5 files organize the data and metadata in a hierarchical structure similar to the hierarchical structure of a UNIX® file system.

In an HDF5 file, the directories in the hierarchy are called groups. A group can contain other groups, data sets, attributes, links, and data types. A data set is a collection of data, such as a multidimensional numeric array or string. An attribute is any data that is associated with another entity, such as a data set. A link is similar to a UNIX file system symbolic link. Links are a way to reference objects without having to make a copy of the object.

Data types are a description of the data in the data set or attribute. Data types tell how to interpret the data in the data set.

To get a quick view into the contents of an HDF5 file, use the h5disp function.

h5disp('example.h5')

HDF5 example.h5 
Group '/' 
    Attributes:
        'attr1':  97 98 99 100 101 102 103 104 105 0 
        'attr2':  2x2 H5T_INTEGER
    Group '/g1' 
        Group '/g1/g1.1' 
            Dataset 'dset1.1.1' 
                Size:  10x10
                MaxSize:  10x10
                Datatype:   H5T_STD_I32BE (int32)
                ChunkSize:  []
                Filters:  none
                Attributes:
                    'attr1':  49 115 116 32 97 116 116 114 105 ... 
                    'attr2':  50 110 100 32 97 116 116 114 105 ... 
            Dataset 'dset1.1.2' 
                Size:  20
                MaxSize:  20
                Datatype:   H5T_STD_I32BE (int32)
                ChunkSize:  []
                Filters:  none
        Group '/g1/g1.2' 
            Group '/g1/g1.2/g1.2.1' 
                Link 'slink'
                    Type:  soft link
    Group '/g2' 
        Dataset 'dset2.1' 
            Size:  10
            MaxSize:  10
            Datatype:   H5T_IEEE_F32BE (single)
            ChunkSize:  []
            Filters:  none
        Dataset 'dset2.2' 
            Size:  5x3
            MaxSize:  5x3
            Datatype:   H5T_IEEE_F32BE (single)
            ChunkSize:  []
            Filters:  none
					.
					.
					.

To explore the hierarchical organization of an HDF5 file, use the h5info function. h5info returns a structure that contains various information about the HDF5 file, including the name of the file.

info = h5info('example.h5')
info = 

         Filename: 'matlabroot\matlab\toolbox\matlab\demos\example.h5'
          Name: '/'
        Groups: [4x1 struct]
      Datasets: []
     Datatypes: []
         Links: []
    Attributes: [2x1 struct]

By looking at the Groups and Attributes fields, you can see that the file contains four groups and two attributes. The Datasets, Datatypes, and Links fields are all empty, indicating that the root group does not contain any data sets, data types, or links. To explore the contents of the sample HDF5 file further, examine one of the structures in Groups. The following example shows the contents of the second structure in this field.

level2 = info.Groups(2)

level2 = 

          Name: '/g2'
        Groups: []
      Datasets: [2x1 struct]
     Datatypes: []
         Links: []
    Attributes: []

In the sample file, the group named /g2 contains two data sets. The following figure illustrates this part of the sample HDF5 file organization.

To get information about a data set, such as its name, dimensions, and data type, look at either of the structures returned in the Datasets field.

dataset1 = level2.Datasets(1)

dataset1 = 
      Filename: 'matlabroot\example.h5'
          Name: '/g2/dset2.1'
          Rank: 1
      Datatype: [1x1 struct]
          Dims: 10
       MaxDims: 10
        Layout: 'contiguous'
    Attributes: []
         Links: []
     Chunksize: []
     Fillvalue: []

Importing Data from an HDF5 File

To read data or metadata from an HDF5 file, use the h5read function. As arguments, specify the name of the HDF5 file and the name of the data set. (To read the value of an attribute, you must use h5readatt.)

To illustrate, this example reads the data set, /g2/dset2.1 from the HDF5 sample file example.h5.

data = h5read('example.h5','/g2/dset2.1')

data =

    1.0000
    1.1000
    1.2000
    1.3000
    1.4000
    1.5000
    1.6000
    1.7000
    1.8000
    1.9000

Mapping HDF5 Datatypes to MATLAB Datatypes

When the h5read function reads data from an HDF5 file into the MATLAB workspace, it maps HDF5 data types to MATLAB data types, as shown in the table below.

HDF5 Data Typeh5read Returns
Bit-fieldArray of packed 8-bit integers
FloatMATLAB single and double types, provided that they occupy 64 bits or fewer
Integer types, signed and unsignedEquivalent MATLAB integer types, signed and unsigned
OpaqueArray of uint8 values
ReferenceReturns the actual data pointed to by the reference, not the value of the reference.
Strings, fixed-length and variable lengthCell array of character vectors
EnumsCell array of character vectors, where each enumerated value is replaced by the corresponding member name
Compound1-by-1 struct array; the dimensions of the dataset are expressed in the fields of the structure.
ArraysArray of values using the same datatype as the HDF5 array. For example, if the array is of signed 32-bit integers, the MATLAB array will be of type int32.

The example HDF5 file included with MATLAB includes examples of all these datatypes.

For example, the data set /g3/string is a string.

h5disp('example.h5','/g3/string')
HDF5 example.h5 
Dataset 'string' 
    Size:  2
    MaxSize:  2
    Datatype:   H5T_STRING
        String Length: 3
        Padding: H5T_STR_NULLTERM
        Character Set: H5T_CSET_ASCII
        Character Type: H5T_C_S1
    ChunkSize:  []
    Filters:  none
    FillValue:  ''

Now read the data from the file, MATLAB returns it as a cell array of character vectors.

s = h5read('example.h5','/g3/string')

s = 

    'ab '
    'de '

>> whos s
  Name      Size            Bytes  Class    Attributes

  s         2x1               236  cell  

The compound data types are always returned as a 1-by-1 struct. The dimensions of the data set are expressed in the fields of the struct. For example, the data set /g3/compound2D is a compound datatype.

h5disp('example.h5','/g3/compound2D')
HDF5 example.h5 
Dataset 'compound2D' 
    Size:  2x3
    MaxSize:  2x3
    Datatype:   H5T_COMPOUND
        Member 'a':  H5T_STD_I8LE (int8)
        Member 'b':  H5T_IEEE_F64LE (double)
    ChunkSize:  []
    Filters:  none
    FillValue:  H5T_COMPOUND

Now read the data from the file, MATLAB returns it as a 1-by-1 struct.

data = h5read('example.h5','/g3/compound2D')

data = 

    a: [2x3 int8]
    b: [2x3 double]

Read an HDF5 Dataset with Dynamically Loaded Filters

In R2015a and later releases, MATLAB supports reading HDF5 datasets that are written using a third-party filter. To read the datasets using the dynamically loaded filter feature, you must:

  • Install the HDF5 filter plugin on your system as a shared library or a DLL.

  • Set the HDF5_PLUGIN_PATH environment variable to point to the installation.

For more information see, HDF5 Dynamically Loaded Filters.

Note

Writing HDF5 datasets using dynamically loaded filters is not supported.

Using the Low-Level HDF5 Functions to Import Data

MATLAB provides direct access to dozens of functions in the HDF5 library with low-level functions that correspond to the functions in the HDF5 library. In this way, you can access the features of the HDF5 library from MATLAB, such as reading and writing complex data types and using the HDF5 subsetting capabilities. For more information, see Using the MATLAB Low-Level HDF5 Functions to Export Data.