Dataset Arrays

Note

The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.

What Are Dataset Arrays?

Statistics and Machine Learning Toolbox™ provides dataset arrays for storing variables with heterogeneous data types. For example, you can combine numeric data, logical data, cell arrays of character vectors, and categorical arrays in one dataset array variable.

Within a dataset array, each variable (column) must have a single data type, but different variables can have different data types. A dataset array is usually interpreted as a set of variables measured on many units of observation: each row corresponds to an observation, and each column to a variable. In this sense, a dataset array organizes data like a typical spreadsheet.

Dataset arrays are a unique data type, with a corresponding set of valid operations. Even if a dataset array contains only numeric variables, you cannot operate on the dataset array like a numeric variable. The valid operations for dataset arrays are the methods of the dataset class.
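
As a minimal sketch of this point, the example below builds a small all-numeric dataset array and computes column means by first extracting the data. The variable names Height and Weight and their values are made up for illustration. The sketch assumes that numeric functions such as mean are not among the dataset methods, so the data is pulled out with dot indexing or with double first.

```matlab
% Hypothetical example data (not from the documentation).
Height = [170; 165; 180];
Weight = [68; 59; 82];

% Combine the workspace variables into a dataset array.
ds = dataset(Height, Weight);

% Applying a numeric function directly to ds is expected to fail,
% because ds is a dataset object rather than a numeric array.
% Instead, extract one variable with dot indexing ...
meanHeight = mean(ds.Height);

% ... or convert the all-numeric dataset array to a numeric matrix.
X = double(ds);
colMeans = mean(X);
```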

Dataset Array Conversion

You can create a dataset array by combining variables that exist in the MATLAB workspace, or directly importing data from a file, such as a text file or spreadsheet. This table summarizes the functions you can use to create dataset arrays.

Data Source                                        Conversion to Dataset Array
Data from a file                                   dataset
Heterogeneous collection of workspace variables    dataset
Numeric array                                      mat2dataset
Cell array                                         cell2dataset
Structure array                                    struct2dataset
Table                                              table2dataset
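
The following sketch illustrates a few of these functions on small, made-up inputs. All variable names and values are hypothetical, and the file name in the commented-out line is a placeholder for a file that is assumed to exist.

```matlab
% Heterogeneous workspace variables (hypothetical data).
Name   = {'Ann'; 'Bob'; 'Cy'};     % cell array of character vectors
Age    = [34; 41; 29];             % numeric
Smoker = [true; false; false];     % logical
ds1 = dataset(Name, Age, Smoker);  % variable names come from the workspace names

% Numeric array: each column becomes one dataset variable.
X = [1 2; 3 4; 5 6];
ds2 = mat2dataset(X, 'VarNames', {'A', 'B'});

% Cell array: by default the first row is read as variable names.
C = {'Name', 'Age'; 'Ann', 34; 'Bob', 41};
ds3 = cell2dataset(C);

% Importing from a comma-delimited text file (placeholder file name):
% ds4 = dataset('File', 'mydata.txt', 'Delimiter', ',');
```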

You can export dataset arrays to text or spreadsheet files using export. To convert a dataset array to a cell array or structure array, use dataset2cell or dataset2struct. To convert a dataset array to a table, use dataset2table.
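
A brief sketch of these conversions, using a made-up two-variable dataset array, might look like the following. The output file name is an arbitrary choice for illustration, and the file is written to the temporary folder.

```matlab
% Hypothetical dataset array with two numeric variables.
ds = mat2dataset([1 2; 3 4], 'VarNames', {'A', 'B'});

% Convert to a cell array; the variable names occupy the first row.
C = dataset2cell(ds);

% Convert to a structure array with one element per observation.
S = dataset2struct(ds);

% Convert to a table, the recommended replacement data type.
T = dataset2table(ds);

% Export to a text file (tab-delimited by default).
outFile = fullfile(tempdir, 'ds_example.txt');
export(ds, 'file', outFile);
```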

Dataset Array Properties

In addition to storing data in a dataset array, you can store metadata such as:

  • Variable and observation names

  • Data descriptions

  • Units of measurement

  • Variable descriptions

This information is stored as dataset array properties. For a dataset array named ds, you can view the dataset array metadata by entering ds.Properties at the command line. You can access a specific property, such as the variable names (property VarNames), using ds.Properties.VarNames. You can both retrieve and modify property values using this syntax.
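
As an illustration, the sketch below sets and reads several properties on a small, made-up dataset array. The data, names, and descriptions are hypothetical; the property names (VarNames, ObsNames, Description, Units, VarDescription) are those of the dataset class.

```matlab
% Hypothetical dataset array with two variables.
ds = mat2dataset([170 68; 165 59], 'VarNames', {'Height', 'Weight'});

% Store metadata as properties.
ds.Properties.ObsNames       = {'P1'; 'P2'};
ds.Properties.Description    = 'Example measurements';
ds.Properties.Units          = {'cm', 'kg'};
ds.Properties.VarDescription = {'Standing height', 'Body weight'};

% Retrieve a property value.
names = ds.Properties.VarNames;

% Modify a property value: rename the second variable.
ds.Properties.VarNames{2} = 'Mass';

% Display all metadata.
ds.Properties
```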

Variable and observation names are included in the display of a dataset array. Variable names display across the top row, and observation names, if present, appear in the first column. Note that variable and observation names do not affect the size of a dataset array.
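
A quick way to check the last point is to compare the size of a dataset array with the size of its underlying data. The names and values in this sketch are made up for illustration.

```matlab
% Hypothetical 3-by-2 data with variable and observation names.
ds = mat2dataset([1 2; 3 4; 5 6], ...
    'VarNames', {'A', 'B'}, ...
    'ObsNames', {'r1', 'r2', 'r3'});

% The names appear in the display but are not counted in the size:
% three observations (rows) and two variables (columns).
sz = size(ds);
```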
