signalDatastore

Datastore for collection of signals

Description

Use a signalDatastore object to manage a collection of in-memory data or signal files, where each individual file fits in memory, but the entire collection does not necessarily fit.

Creation

Description

sds = signalDatastore(data) creates a signal datastore with in-memory input signals contained in data.

sds = signalDatastore(location) creates a signal datastore based on a collection of MAT-files in location.

sds = signalDatastore(___,Name,Value) specifies additional properties using one or more name-value pair arguments.

Input Arguments

In-memory input data, specified as vectors, matrices, timetables, or cell arrays. Each element of data is a member that is output by the datastore on each call to read.

Example: {randn(100,1); randn(120,3); randn(135,2); randn(100,1)}

Files or folders included in the datastore, specified as a path or a DsFileSet object.

  • path — Specify the path as a character vector, cell array of character vectors, string scalar, or a string array, containing the location of files or folders that are local or remote.

    • Local files or folders — Specify location as a local path to files or folders. If the files are not in the current folder, then the local path must specify full or relative paths. Files within subfolders of the specified folder are not automatically included in the datastore. You can use the wildcard character (*) when specifying the local path. This character specifies that the datastore include all matching files or all files in the matching folders.

    • Remote files or folders — Specify location to be the full paths of the files or folders as a uniform resource locator (URL) of the form hdfs:///path_to_file. For more information, see Work with Remote Data.

  • DsFileSet object — You also can specify location as a DsFileSet object. For more information, see matlab.io.datastore.DsFileSet.

When location represents a folder, the datastore includes only supported file formats and ignores any other format. To specify a custom list of file extensions to include in your datastore, see the FileExtensions property.

Example: 'whale.mat'

Example: '../dir/data/signal.mat'

Data Types: char | string | cell

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: sds = signalDatastore('C:\dir\signaldata','FileExtensions','.csv')

Subfolder inclusion flag, specified as the comma-separated pair consisting of 'IncludeSubfolders' and true or false. Specify true to include all files and subfolders within each folder or false to include only the files within each folder.

Example: 'IncludeSubfolders',true

Data Types: logical | double
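
The effect of the flag can be checked on a small folder tree. The following is a minimal sketch, assuming Signal Processing Toolbox is installed; the temporary folder and the file names top.mat and nested.mat are hypothetical and are created only for illustration.

```matlab
% Build a hypothetical folder tree in a fresh temporary location.
root = tempname;
mkdir(root);
mkdir(root,'sub');
x = randn(100,1);
save(fullfile(root,'top.mat'),'x');
save(fullfile(root,'sub','nested.mat'),'x');

% Default behavior: only files directly inside root are included.
sdsTop = signalDatastore(root);

% With the flag set, files in subfolders are included as well.
sdsAll = signalDatastore(root,'IncludeSubfolders',true);

numTop = numel(sdsTop.Files);   % number of files found without subfolders
numAll = numel(sdsAll.Files);   % number of files found with subfolders
```

Comparing numTop and numAll shows how many files the subfolders contribute.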

Signal file extensions, specified as the comma-separated pair consisting of 'FileExtensions' and a string scalar, string array, character vector, or cell array of character vectors.

If no read function is specified, 'FileExtensions' can only be set to .mat to read MAT-files, or to .csv to read CSV files. If 'FileExtensions' is omitted, it defaults to .mat if there are MAT-files in the specified location. Otherwise, it defaults to .csv if there are CSV files in the specified location.

If neither MAT-files nor CSV files are present, signalDatastore errors out with the default read function. Specify a custom read function using the ReadFcn property to read files of any other type.

When you do not specify a file extension, the signalDatastore needs to parse the files to decide the default extension to read. Specify an extension to avoid the parsing time.

Example: 'FileExtensions','.csv'

Data Types: string | char | cell

In addition to these name-value pairs, you also can specify any of the properties on this page as name-value pairs, except for the Files property.

Properties

In-Memory Data

Signal member data, specified as a cell array. This property applies only when the datastore contains in-memory data.

Member names, specified as a string scalar or a string array. The length of the member names for the input data should equal the length of the data cell array. This property applies only when the datastore contains in-memory data.

File Data

Files included in the datastore, specified as a cell array of strings or character vectors. Each character vector in the cell array represents the full path to a file. The location argument of signalDatastore defines Files when the datastore is created. This property applies only when the datastore contains file data.

Data Types: string | char | cell

Function that reads data, specified as a function handle. The function must take a file name as input, and then it outputs the corresponding data. For example, if customreader is the specified function to read the data, then it must have a signature similar to this:

function [data,info] = customreader(filename)
...
end

The signal data is output in the data variable. The info variable is a structure containing time information and other relevant information from the file.

Example: @customreader

Data Types: function_handle

Alternate file system root paths, specified as the comma-separated pair consisting of 'AlternateFileSystemRoots' and a string vector or a cell array. Use 'AlternateFileSystemRoots' when you create a datastore on a local machine but need to access and process the data on another machine (possibly running a different operating system). Also, when you process data using Parallel Computing Toolbox™ and MATLAB® Parallel Server™, and the data is stored on your local machines with a copy available on cloud or cluster machines that run a different platform, you must use 'AlternateFileSystemRoots' to associate the root paths.

  • To associate a set of root paths that are equivalent to one another, specify 'AlternateFileSystemRoots' as a string vector. For example,

    ["Z:\datasets","/mynetwork/datasets"]

  • To associate multiple sets of root paths that are equivalent for the datastore, specify 'AlternateFileSystemRoots' as a cell array containing multiple rows where each row represents a set of equivalent root paths. Specify each row in the cell array as either a string vector or a cell array of character vectors. For example:

    • Specify 'AlternateFileSystemRoots' as a cell array of string vectors.

      {["Z:\datasets", "/mynetwork/datasets"];...
       ["Y:\datasets", "/mynetwork2/datasets","S:\datasets"]}

    • Alternatively, specify 'AlternateFileSystemRoots' as a cell array of cell array of character vectors.

      {{'Z:\datasets','/mynetwork/datasets'};...
       {'Y:\datasets', '/mynetwork2/datasets','S:\datasets'}}

The value of 'AlternateFileSystemRoots' must satisfy these conditions:

  • Contains one or more rows, where each row specifies a set of equivalent root paths.

  • Each row specifies multiple root paths and each root path must contain at least two characters.

  • Root paths are unique and are not subfolders of one another.

  • Contains at least one root path entry that points to the location of the files.

For more information, see Set Up Datastore for Processing on Different Machines or Clusters.

Example: ["Z:\datasets","/mynetwork/datasets"]

Data Types: string | cell

Names of variables in signal files, specified as a string scalar or vector of unique names. Use this property when your files contain more than one variable and you want to specify the names of the variables that hold the signal data you want to read.

  • When the property value is a string scalar, signalDatastore returns data contained in the specified variable.

  • When the property value is a string vector, signalDatastore returns a cell array with the data contained in the specified variables.

Note

To determine the name of the first variable in a file, signalDatastore follows these steps:

  • For MAT-files:

    s = load(fileName);
    varNames = fieldnames(s);
    firstVar = s.(varNames{1});

  • For CSV files:

    opts = detectImportOptions(fileName,'PreserveVariableNames',true);
    varNames = opts.VariableNames;
    firstVar = string(varNames{1});

This property applies only when the datastore contains file data and the default read function is used.

Name of the variable holding the sample rate, specified as a string scalar. This property applies only when the datastore contains file data.

Name of the variable holding the sample time value, specified as a string scalar. This property applies only when the datastore contains file data.

Name of the variable holding the time values vector, specified as a string scalar. This property applies only when the datastore contains file data.

Note

'SampleRateVariableName', 'SampleTimeVariableName', and 'TimeValuesVariableName' are mutually exclusive. Use these properties when your files contain a variable that holds the time information of the signal data. If none of them is specified, signalDatastore assumes that the signal data has no time information. These properties are not valid if a custom read function is specified.
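
The variable-name properties can be tried on a file written for the purpose. The following is a minimal sketch, assuming Signal Processing Toolbox is installed and the default read function is used; the temporary folder, the file name tones.mat, and the variable names x1, x2, and fs are hypothetical.

```matlab
% Write one hypothetical MAT-file holding two signals and a sample rate.
folder = tempname;
mkdir(folder);
fs = 1000;
t = (0:1/fs:1-1/fs)';
x1 = sin(2*pi*50*t);
x2 = cos(2*pi*120*t);
save(fullfile(folder,'tones.mat'),'x1','x2','fs');

% Name the signal variables and the variable that holds the sample rate.
sds = signalDatastore(folder, ...
    'SignalVariableNames',["x1","x2"], ...
    'SampleRateVariableName',"fs");

% Because SignalVariableNames is a string vector, data is a cell array
% with one cell per named variable.
[data,info] = read(sds);
rateFromFile = info.SampleRate;
```

Here data{1} and data{2} hold the two named signals, and the sample rate read from the file is available in the info structure.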

In-Memory and File Data

Sample rate values, specified as a positive real scalar or vector.

  • Set the value of SampleRate to a scalar to specify the same sample rate for all signals in the signalDatastore.

  • Set the value of SampleRate to a vector to specify a different sample rate for each signal in the signalDatastore.

The number of elements in the vector must equal the number of elements in the signalDatastore.
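
As an illustration of the vector form, the following minimal sketch assigns a different sample rate to each of three in-memory signals and collects the rate reported with each read. It assumes Signal Processing Toolbox is installed; the variable names rates and readRates are hypothetical.

```matlab
% Three in-memory signals, each with its own sample rate.
data = {randn(1000,1); randn(2000,1); randn(500,1)};
rates = [1000 2000 500];
sds = signalDatastore(data,'SampleRate',rates);

% Collect the sample rate reported in the info structure of each read.
readRates = zeros(size(rates));
k = 0;
while hasdata(sds)
    [~,info] = read(sds);
    k = k + 1;
    readRates(k) = info.SampleRate;
end
```

With a scalar SampleRate, every read would report the same value instead.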

Sample time values, specified as a positive scalar, a vector, a duration scalar, or a duration vector.

  • Set the value of SampleTime to a scalar to specify the same sample time for all signals in the signalDatastore.

  • Set the value of SampleTime to a vector to specify a different sample time for each signal in the signalDatastore.

The number of elements in the vector must equal the number of elements in the signalDatastore.

Time values, specified as a vector, a duration vector, a matrix, or a cell array.

  • Set TimeValues to a numeric or duration vector to specify the same time values for all signals in the signalDatastore. The vector must have the same length as all the signals in the set.

  • Set TimeValues to a numeric or duration matrix or cell array to specify that each member of the signalDatastore has signals with the same time values, but the time values differ from member to member.

    • If TimeValues is a matrix, then the number of columns must equal the number of members of the signalDatastore. All signals in the datastore must have a length equal to the number of rows of the matrix.

    • If TimeValues is a cell array, then the number of vectors must equal the number of members of the signalDatastore. All signals in a member must have a length equal to the number of elements of the corresponding vector in the cell array.
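
The cell array form can be sketched as follows, assuming Signal Processing Toolbox is installed. The two members have different lengths, so each gets its own time vector; the variable name tv and the time values themselves are hypothetical.

```matlab
% Member 1 holds one signal of length 4; member 2 holds two signals
% (columns) of length 6.
data = {randn(4,1); randn(6,2)};

% One time vector per member, each as long as the signals in that member.
tv = {[0; 0.1; 0.25; 0.4]; (0:5)'/10};
sds = signalDatastore(data,'TimeValues',tv);

% Read the first member; inspect info to see the time information
% that accompanies it.
[x,info] = read(sds);
```

The first vector is nonuniformly spaced, which is a case that a sample rate or sample time cannot describe.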

Maximum number of signal files returned by read, specified as a positive real scalar. If you set the ReadSize property to n, such that n > 1, each time you call the read function, the function reads:

  • The first variable of the first n files, if sds contains file data.

  • The first n members, if sds contains in-memory data.

The output of read is a cell array of signal data when ReadSize > 1.
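
A minimal sketch of this behavior with in-memory data, assuming Signal Processing Toolbox is installed. The datastore holds three members and ReadSize is 2, so the first read returns two members and the second read returns what remains.

```matlab
data = {randn(100,1); randn(120,3); randn(135,2)};
sds = signalDatastore(data,'ReadSize',2);

first = read(sds);      % cell array holding members 1 and 2
second = read(sds);     % the remaining member
done = ~hasdata(sds);   % true once every member has been read
```

Because ReadSize is a maximum, the last read can return fewer members than requested.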

Object Functions

read              Read next consecutive signal observation
readall           Read all signals from datastore
preview           Read first signal observation from datastore for preview
shuffle           Shuffle signals in signal datastore
subset            Create datastore with subset of signals
partition         Partition signal datastore and return partitioned portion
numpartitions     Return estimate for reasonable number of partitions for parallel processing
reset             Reset datastore to initial state
progress          Determine how much data has been read
hasdata           Determine if data is available to read
transform         Transform datastore
combine           Combine data from multiple datastores
isPartitionable   Determine whether datastore is partitionable
isShuffleable     Determine whether datastore is shuffleable

Note

isPartitionable and isShuffleable return true by default for signalDatastore. You can use these two functions to test whether the outputs of combine and transform are partitionable or shuffleable.
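
Several of the object functions can be tried together on in-memory data. The following is a minimal sketch, assuming Signal Processing Toolbox is installed; the normalization applied by transform is an arbitrary choice made for illustration.

```matlab
data = {randn(100,1); randn(100,1); randn(100,1); randn(100,1)};
sds = signalDatastore(data);

% Keep members 1 and 3 and read them all at once.
sub = subset(sds,[1 3]);
allSub = readall(sub);

% Split the datastore into two partitions and read the first one.
part1 = partition(sds,2,1);
nPart1 = numel(readall(part1));

% Normalize each signal to unit peak amplitude as it is read.
tds = transform(sds,@(x) x./max(abs(x)));
y = read(tds);
canPartition = isPartitionable(tds);

% Read the first member, rewind, and read it again.
x1 = read(sds);
reset(sds);
x1again = read(sds);
```

Calling reset returns the datastore to its initial state, so the two reads at the end return the same member.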

Examples

Create a signal datastore to iterate through the elements of an in-memory cell array of signal data. The data consists of a sinusoidally modulated linear chirp, a concave quadratic chirp, and a voltage controlled oscillator. The signals are sampled at 3000 Hz.

fs = 3000;
t = 0:1/fs:3-1/fs;
data = {chirp(t,300,t(end),800).*exp(2j*pi*10*cos(2*pi*2*t)); ...
        2*chirp(t,200,t(end),1000,'quadratic',[],'concave'); ...
        vco(sin(2*pi*t),[0.1 0.4]*fs,fs)};
sds = signalDatastore(data,'SampleRate',fs);

While the datastore has data, read each observation from the signal datastore and plot the short-time Fourier transform.

plotID = 1;
while hasdata(sds)
    [dataOut,info] = read(sds);
    subplot(3,1,plotID)
    stft(dataOut,info.SampleRate)
    plotID = plotID + 1;
end

Specify the path to the sample signals included with Signal Processing Toolbox™.

folder = fullfile(matlabroot,'examples','signal','data');

Create and display a signal datastore that points to the specified folder.

sds = signalDatastore(folder)
sds = 
  signalDatastore with properties:

                       Files:{
                             ' .../devel/bat/BR2020bd/build/matlab/examples/signal/data/GANModel.mat';
                             ' .../devel/bat/BR2020bd/build/matlab/examples/signal/data/HeartRates.mat';
                             ' .../devel/bat/BR2020bd/build/matlab/examples/signal/data/Hello.mat'
                              ... and 26 more
                             }
    AlternateFileSystemRoots: [0x0 string]
                    ReadSize: 1

Specify the file path to the signal samples included with Signal Processing Toolbox™.

folder = fullfile(matlabroot,'examples','signal','data');

Create a signal datastore that points to the .csv files in the specified folder.

sds = signalDatastore(folder,'FileExtensions','.csv')
sds = 
  signalDatastore with properties:

                       Files:{
                             ' .../devel/bat/BR2020bd/build/matlab/examples/signal/data/tremor.csv'
                             }
    AlternateFileSystemRoots: [0x0 string]
                    ReadSize: 1

Specify the path to four example files included with Signal Processing Toolbox™.

folder = fullfile(matlabroot,'examples','signal','data', ...
         ["INR.mat","relatedsig.mat","spots_num.mat","voice.mat"]);    

Set the ReadSize property to 2 to read data from two files at a time. Each read returns a cell array where the first cell contains the first variable of the first file read, and the second cell contains the first variable from the second file. While the datastore has data, display the names of the variables read in each read.

sds = signalDatastore(folder,'ReadSize',2);
while hasdata(sds)
    [data,info] = read(sds);
    fprintf('Variable Name:\t%s\n',info.SignalVariableNames)
end
Variable Name:	Date
Variable Name:	s1
Variable Name:	year
Variable Name:	fs

Specify the path to three signals included with Signal Processing Toolbox™.

  • The strong.mat file contains three variables: her, him and fs.

  • The slogan.mat file contains three variables: hotword, phrase and fs.

  • The Ring.mat file contains two variables: y and Fs.

fld = ["strong.mat","slogan.mat","Ring.mat"];
folder = fullfile(matlabroot,'examples','signal','data',fld);

Create a signal datastore that points to the specified files. Each file contains multiple variables with different names. The scalar in each file represents a sample rate. Define a custom read function that reads all the variables in the file as a structure and returns the signal variables in dataOut and information about them in infoOut. The SampleRate field of infoOut contains the scalar stored in each file, and dataOut contains the remaining variables read from each file.

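function [dataOut,infoOut] = MyCustomRead(filename)
    % Load every variable in the file as fields of a structure.
    fText = importdata(filename);
    value = struct2cell(fText);
    dataOut = {};
    infoOut = struct;   % defined even if the file holds no scalar
    for i = 1:length(value)
        if isscalar(value{i})
            % The scalar variable holds the sample rate.
            infoOut.SampleRate = value{i};
        else
            % Every nonscalar variable is a signal.
            dataOut{end+1} = value{i};
        end
    end
end
sds = signalDatastore(folder,'ReadFcn',@MyCustomRead);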

While the datastore has unread files, read from the datastore and compute the short-time Fourier transforms of the signals.

while hasdata(sds)
    [data,infoOut] = read(sds);
    fs = infoOut.SampleRate;
    figure
    for i = 1:length(data)
        if length(data)>1
            subplot(2,1,i)
        end
        stft(data{i},fs)
    end
end

Specify the path to example files included with Signal Processing Toolbox™. Each file contains two signals and a random sample rate fs ranging from 3000 to 4000 Hz.

  • The first signal x1 consists of a set of pulses of decreasing duration, separated by regions of oscillating amplitude and fluctuating frequency with an increasing trend.

  • The second signal x2 is a chirp with sinusoidally varying frequency content.

folder = fullfile(matlabroot,'examples','signal','data','dataset');

Create a signal datastore that points to the specified folder and set the names of the signal variables and sample rate. While the datastore has data, read each observation and visualize their spectrograms.

sds = signalDatastore(folder,'SignalVariableNames',["x1";"x2"],'SampleRateVariableName','fs');
plotID = 1;
while hasdata(sds) 
    [data,info] = read(sds);
    subplot(2,2,plotID)
    pspectrum(data{1},info.SampleRate,'OverlapPercent',50,'Leakage',1,'TwoSided',true,'spectrogram')
    subplot(2,2,plotID+1)
    pspectrum(data{2},info.SampleRate,'OverlapPercent',90,'Leakage',0.4,'TwoSided',true,'spectrogram')
    plotID = plotID + 2;
end

Introduced in R2020a