h5create

Create HDF5 dataset

Description

h5create(filename,ds,sz) creates a dataset named ds in the HDF5 file filename, with the size specified by sz. The name ds must include the full path of the dataset within the file.

h5create(filename,ds,sz,Name,Value) specifies one or more optional name-value pair arguments.

For example, 'ChunkSize',[5 5] specifies 5-by-5 chunks of the dataset that can be stored individually in the HDF5 file.

Examples

Create a fixed-size 100-by-200-by-300 dataset 'myDataset' whose full path is specified as '/g1/g2/myDataset'.

h5create('myfile.h5','/g1/g2/myDataset',[100 200 300])

Write data to 'myDataset'. Because the dimensions of 'myDataset' are fixed and this call writes the whole dataset, mydata must have the same size as the dataset.

mydata = ones(100,200,300);
h5write('myfile.h5','/g1/g2/myDataset',mydata)
h5disp('myfile.h5')
HDF5 myfile.h5 
Group '/' 
    Group '/g1' 
        Group '/g1/g2' 
            Dataset 'myDataset' 
                Size:  100x200x300
                MaxSize:  100x200x300
                Datatype:   H5T_IEEE_F64LE (double)
                ChunkSize:  []
                Filters:  none
                FillValue:  0.000000

Create a single-precision 1000-by-2000 dataset and apply the highest level of compression. Chunk storage must be used when applying HDF5 compression.

h5create('myfile.h5','/myDataset2',[1000 2000],'Datatype','single', ...
          'ChunkSize',[50 80],'Deflate',9)

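Display the contents of the HDF5 file. The output below assumes that myfile.h5 did not exist before this example. Otherwise, h5disp also lists the datasets created earlier.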

h5disp('myfile.h5')
HDF5 myfile.h5 
Group '/' 
    Dataset 'myDataset2' 
        Size:  1000x2000
        MaxSize:  1000x2000
        Datatype:   H5T_IEEE_F32LE (single)
        ChunkSize:  50x80
        Filters:  deflate(9)
        FillValue:  0.000000

Create a two-dimensional dataset '/myDataset3' that is unlimited along the second dimension. ChunkSize must be specified to set any dimension of the dataset to Inf.

h5create('myfile.h5','/myDataset3',[200 Inf],'ChunkSize',[20 20])

Write data to '/myDataset3'. You can write data of any size along the second dimension to '/myDataset3', since its second dimension is unlimited.

mydata = rand(200,500);
h5write('myfile.h5','/myDataset3',mydata,[1 1],[200 500])

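Display the contents of the HDF5 file. The output below assumes that myfile.h5 did not exist before this example. Otherwise, h5disp also lists the datasets created earlier.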

h5disp('myfile.h5')
HDF5 myfile.h5 
Group '/' 
    Dataset 'myDataset3' 
        Size:  200x500
        MaxSize:  200xInf
        Datatype:   H5T_IEEE_F64LE (double)
        ChunkSize:  20x20
        Filters:  none
        FillValue:  0.000000

Input Arguments

filename: File name, specified as a character vector or string scalar containing the name of an HDF5 file.

Depending on the location you are writing to, filename can take one of these forms.

  • Current folder: Specify the name of the file in filename.

    Example: 'myFile.h5'

  • Other folders: Specify the full or relative path name in filename.

    Example: 'C:\myFolder\myFile.h5'

    Example: 'myFolder\myFile.h5'

  • Remote location: filename must contain the full path of the file, specified as a uniform resource locator (URL) of the form:

    scheme_name://path_to_file/my_file.ext

Based on your remote location, scheme_name can be one of these values.

  • Amazon S3™: s3

  • Windows Azure® Blob Storage: wasb, wasbs

For more information, see Work with Remote Data.

Example: 's3://bucketname/path_to_file/myFile.h5'

  • If filename does not already exist, h5create creates it.

  • If you specify an existing HDF5 file name and a new dataset name, h5create adds the new dataset to the existing file.
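
The following sketch shows both cases in one session. It uses tempname to obtain a file name that does not exist yet. The dataset names '/first' and '/second' are hypothetical.

```matlab
% Sketch: tempname yields a file name that does not exist yet.
% The dataset names below are hypothetical.
fname = [tempname '.h5'];

% The file does not exist, so this call creates it.
h5create(fname, '/first', [10 20])

% The file now exists, so this call adds a second dataset to it.
h5create(fname, '/second', [5 5])

% List the (relative) names of the datasets in the root group.
info = h5info(fname);
names = {info.Datasets.Name};
disp(names)

delete(fname)   % remove the temporary file
```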

ds: Dataset name, specified as a character vector or string scalar containing the full path name of the dataset to be created. If the path includes intermediate groups that do not yet exist, h5create creates them.
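
The sketch below creates a dataset three groups deep in a single call. The path '/a/b/c/data' is hypothetical. h5info errors when a location does not exist, so the group query succeeds only because h5create created the intermediate groups.

```matlab
% Sketch: none of /a, /a/b, or /a/b/c exist before this call.
fname = [tempname '.h5'];
h5create(fname, '/a/b/c/data', [3 4])

% h5info errors for a missing location, so these queries confirm
% that the intermediate groups were created.
grpInfo = h5info(fname, '/a/b/c');
dsInfo  = h5info(fname, '/a/b/c/data');
disp(dsInfo.Dataspace.Size)

delete(fname)   % remove the temporary file
```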

sz: Dataset size, specified as a row vector. To specify an unlimited dimension, set the corresponding element of sz to Inf. In that case, you must also specify ChunkSize.
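
As a sketch of what an unlimited dimension allows, the code below appends columns in two separate writes. It assumes that h5write extends the unlimited dimension as needed when the start and count go past the current size. The dataset name '/growing' is hypothetical.

```matlab
% Sketch: the second dimension is unlimited, so ChunkSize is required.
fname = [tempname '.h5'];
h5create(fname, '/growing', [200 Inf], 'ChunkSize', [20 20])

% Write the first 500 columns.
h5write(fname, '/growing', rand(200,500), [1 1], [200 500])

% Append 100 more columns, starting at column 501.
h5write(fname, '/growing', rand(200,100), [1 501], [200 100])

info = h5info(fname, '/growing');
sz = info.Dataspace.Size;        % current size after both writes
maxSz = info.Dataspace.MaxSize;  % Inf along the unlimited dimension

delete(fname)   % remove the temporary file
```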

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Deflate',9

Datatype of the dataset, specified as the comma-separated pair consisting of 'Datatype' and one of the following MATLAB® data types. If you do not specify 'Datatype', the dataset is created as 'double'.

  • 'double'

  • 'single'

  • 'uint64'

  • 'int64'

  • 'uint32'

  • 'int32'

  • 'uint16'

  • 'int16'

  • 'uint8'

  • 'int8'

  • 'string'
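
The sketch below creates an integer dataset and round-trips MATLAB data of the matching class. h5read returns data of the corresponding MATLAB class. The dataset name '/counts' is hypothetical.

```matlab
% Sketch: create an int16 dataset and write int16 data to it.
fname = [tempname '.h5'];
h5create(fname, '/counts', [4 3], 'Datatype', 'int16')

A = int16(reshape(1:12, 4, 3));   % data of the matching MATLAB class
h5write(fname, '/counts', A)

data = h5read(fname, '/counts');
cls = class(data);                % class of the data read back

delete(fname)   % remove the temporary file
```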

Chunk size, specified as the comma-separated pair consisting of 'ChunkSize' and a row vector containing the dimensions of the chunk. The length of 'ChunkSize' must equal the length of the dataset size sz. ChunkSize must be specified to set any dimension in sz to Inf.
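
For a three-dimensional dataset, ChunkSize therefore has three elements. The sketch below reads the chunk size back with h5info. The dataset name '/volume' and the 10-by-20-by-30 chunk shape are illustrative choices, not recommendations.

```matlab
% Sketch: a 3-D dataset needs a 3-element ChunkSize.
fname = [tempname '.h5'];
h5create(fname, '/volume', [100 200 300], 'ChunkSize', [10 20 30])

info = h5info(fname, '/volume');
chunk = info.ChunkSize;   % chunk dimensions as stored in the file

delete(fname)   % remove the temporary file
```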

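gzip compression level, specified as the comma-separated pair consisting of 'Deflate' and a numeric value between 0 and 9, where 0 is the lowest compression level and 9 is the highest. Compression requires chunk storage, so you must also specify ChunkSize.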

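Fill value for missing data in numeric datasets, specified as the comma-separated pair consisting of 'FillValue' and a numeric value. Elements that have not been written read back as this value.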
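
The sketch below writes only part of a dataset and reads the whole dataset back. The unwritten elements come back as the fill value. The dataset name '/withFill' and the value -1 are hypothetical.

```matlab
% Sketch: unwritten elements read back as the fill value (-1 here).
fname = [tempname '.h5'];
h5create(fname, '/withFill', [4 4], 'FillValue', -1, 'ChunkSize', [2 2])

% Write only the top-left 2-by-2 block.
h5write(fname, '/withFill', ones(2,2), [1 1], [2 2])

data = h5read(fname, '/withFill');   % 4-by-4, mostly -1
disp(data)

delete(fname)   % remove the temporary file
```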

32-bit Fletcher checksum filter, specified as the comma-separated pair consisting of 'Fletcher32' and a numeric or logical 1 (true) or 0 (false). A Fletcher checksum filter stores a checksum with each chunk so that errors in the stored data can be detected when the data is read.
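
The sketch below enables the checksum filter and round-trips some data. ChunkSize is specified on the assumption that, as with compression, the filter needs chunk storage. The dataset name '/checked' is hypothetical.

```matlab
% Sketch: enable the Fletcher32 checksum filter on a chunked dataset.
fname = [tempname '.h5'];
h5create(fname, '/checked', [100 100], 'ChunkSize', [25 25], ...
         'Fletcher32', true)

A = rand(100);
h5write(fname, '/checked', A)

B = h5read(fname, '/checked');      % checksums are verified on read
info = h5info(fname, '/checked');   % info.Filters lists the filter

delete(fname)   % remove the temporary file
```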

Shuffle filter, specified as the comma-separated pair consisting of 'Shuffle' and a numeric or logical 1 (true) or 0 (false). A shuffle filter can improve the compression ratio by reordering the bytes of each chunk before compression, so that bytes of the same significance from all elements are stored together.
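
Shuffle is typically paired with compression. The sketch below combines 'Shuffle' with 'Deflate' on a chunked int32 dataset and checks that the data round-trips unchanged. The dataset name '/packed' and deflate level 6 are illustrative choices.

```matlab
% Sketch: shuffle plus deflate on a chunked integer dataset.
fname = [tempname '.h5'];
A = repmat(int32(1:100), 100, 1);   % regular data that compresses well

h5create(fname, '/packed', size(A), 'Datatype', 'int32', ...
         'ChunkSize', [50 50], 'Shuffle', true, 'Deflate', 6)
h5write(fname, '/packed', A)

B = h5read(fname, '/packed');       % filters are undone on read
info = h5info(fname, '/packed');    % info.Filters lists both filters

delete(fname)   % remove the temporary file
```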

Text encoding, specified as the comma-separated pair consisting of 'TextEncoding' and one of these values:

  • 'UTF-8' — Represent characters using UTF-8 encoding.

  • 'system' — Represent characters as bytes using the system encoding (not recommended).

Limitations

  • h5create does not support creating files stored remotely in HDFS™.

More About

Chunk Storage in HDF5

Chunk storage refers to a method of storing a dataset in the file by dividing it into smaller pieces of data known as chunks. Chunking a dataset can improve performance when operating on a subset of the dataset, because the chunks can be read from and written to the HDF5 file individually.
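
As a sketch of subset access, the code below reads one block that lines up exactly with a single chunk, using the start and count arguments of h5read. The dataset name '/tiles' and the 10-by-20 chunk shape are hypothetical.

```matlab
% Sketch: read one chunk-aligned block instead of the whole dataset.
fname = [tempname '.h5'];
A = reshape(1:100*200, 100, 200);

h5create(fname, '/tiles', [100 200], 'ChunkSize', [10 20])
h5write(fname, '/tiles', A)

% Rows 11-20 and columns 21-40 form exactly one 10-by-20 chunk.
block = h5read(fname, '/tiles', [11 21], [10 20]);

delete(fname)   % remove the temporary file
```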

Introduced in R2011a