Note
The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See the MATLAB table documentation for more information.
This example shows how to create a dataset array from a numeric array existing in the MATLAB® workspace.
Load sample data.
load fisheriris
Two variables load into the workspace: meas, a 150-by-4 numeric array, and species, a 150-by-1 cell array of species labels.
Create a dataset array.
Use mat2dataset to convert the numeric array, meas, into a dataset array.
ds = mat2dataset(meas);
ds(1:10,:)
ans = 
    meas1    meas2    meas3    meas4
    5.1      3.5      1.4      0.2  
    4.9        3      1.4      0.2  
    4.7      3.2      1.3      0.2  
    4.6      3.1      1.5      0.2  
      5      3.6      1.4      0.2  
    5.4      3.9      1.7      0.4  
    4.6      3.4      1.4      0.3  
      5      3.4      1.5      0.2  
    4.4      2.9      1.4      0.2  
    4.9      3.1      1.5      0.1  
The array, meas, has four columns, so the dataset array, ds, has four variables. The default variable names are the array name, meas, with column numbers appended.
You can specify your own variable or observation names using the name-value pair arguments VarNames and ObsNames, respectively.
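As a minimal sketch of these name-value pair arguments, the following converts meas with explicit names. The observation names Obs1 through Obs150 and the output name dsNamed are hypothetical and chosen only for illustration.

```matlab
load fisheriris                       % provides meas (150-by-4) and species

% Descriptive variable names, one per column of meas
varNames = {'SLength','SWidth','PLength','PWidth'};

% Hypothetical observation names 'Obs1' ... 'Obs150' (names must be unique)
obsNames = arrayfun(@(k) sprintf('Obs%d',k), (1:size(meas,1))', ...
    'UniformOutput', false);

% Convert the numeric array, supplying both sets of names
dsNamed = mat2dataset(meas,'VarNames',varNames,'ObsNames',obsNames);
dsNamed(1:3,:)                        % display the first three observations
```

Supplying names at creation time avoids having to edit ds.Properties afterward.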
If you use dataset instead of mat2dataset to convert a numeric array, the resulting dataset array has, by default, a single variable that holds the whole array rather than a separate variable for each column.
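The difference between the two functions can be checked directly. In this short sketch the names dsSingle and dsSplit are illustrative only.

```matlab
load fisheriris

dsSingle = dataset(meas);       % one variable, named meas, holding all four columns
dsSplit  = mat2dataset(meas);   % four variables: meas1, meas2, meas3, meas4

size(dsSingle)                  % number of observations and variables
size(dsSplit)
size(dsSingle.meas)             % the single variable is itself a 150-by-4 array
```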
Examine the dataset array.
Return the size of the dataset array, ds.
size(ds)
ans = 1×2
150 4
The dataset array, ds, is the same size as the numeric array, meas. Variable names and observation names do not factor into the size of a dataset array.
Explore dataset array metadata.
Return the metadata properties of the dataset array, ds.
ds.Properties
ans = struct with fields:
Description: ''
VarDescription: {}
Units: {}
DimNames: {'Observations' 'Variables'}
UserData: []
ObsNames: {}
VarNames: {'meas1' 'meas2' 'meas3' 'meas4'}
You can also access the properties individually. For example, you can retrieve the variable names using ds.Properties.VarNames.
Access data in a dataset array variable.
You can use variable names with dot indexing to access the data in a dataset array. For example, find the minimum value in the first variable, meas1.
min(ds.meas1)
ans = 4.3000
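Dot indexing returns an ordinary numeric vector, so it also works with other functions and with logical row selection. The following sketch assumes the default variable names; the threshold of 7 and the names avgFirst and longRows are arbitrary choices for illustration.

```matlab
load fisheriris
ds = mat2dataset(meas);

% A variable extracted with dot indexing behaves like any numeric column
avgFirst = mean(ds.meas1);

% Use a logical condition on one variable to select whole observations
longRows = ds(ds.meas1 > 7,:);
size(longRows)                  % rows that satisfy the condition, all four variables
```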
Change variable names.
The four variables in ds are measurements of sepal length, sepal width, petal length, and petal width. Modify the variable names to be more descriptive.
ds.Properties.VarNames = {'SLength','SWidth','PLength','PWidth'};
Add description.
You can add a description for the dataset array.
ds.Properties.Description = 'Fisher iris data';
ds.Properties
ans = struct with fields:
Description: 'Fisher iris data'
VarDescription: {}
Units: {}
DimNames: {'Observations' 'Variables'}
UserData: []
ObsNames: {}
VarNames: {'SLength' 'SWidth' 'PLength' 'PWidth'}
The dataset array properties are updated with the new variable names and description.
Add a variable to the dataset array.
The variable species is a cell array of species labels. Add species to the dataset array, ds, as a nominal array named Species. Display the first five observations in the dataset array.
ds.Species = nominal(species);
ds(1:5,:)
ans = 
    SLength    SWidth    PLength    PWidth    Species
    5.1        3.5       1.4        0.2       setosa 
    4.9          3       1.4        0.2       setosa 
    4.7        3.2       1.3        0.2       setosa 
    4.6        3.1       1.5        0.2       setosa 
      5        3.6       1.4        0.2       setosa 
The dataset array, ds, now has a fifth variable, Species.
This example shows how to create a dataset array from heterogeneous variables existing in the MATLAB® workspace.
Load sample data.
load carsmall
Create a dataset array.
Create a dataset array from a subset of the workspace variables.
ds = dataset(Origin,Acceleration,Cylinders,MPG);
ds.Properties.VarNames(:)
ans = 4x1 cell
{'Origin' }
{'Acceleration'}
{'Cylinders' }
{'MPG' }
When creating the dataset array, you do not need to enter variable names. dataset automatically uses the name of each workspace variable.
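If the workspace names are not the ones you want, dataset also accepts each variable as a cell pair of data and name. A brief sketch follows; the names Country and MilesPerGallon are hypothetical and used only to show the syntax.

```matlab
load carsmall

% Each {data,name} pair supplies a variable together with its name
dsRenamed = dataset({Origin,'Country'},{MPG,'MilesPerGallon'});
dsRenamed.Properties.VarNames(:)
```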
Notice that the dataset array, ds, contains a collection of variables with heterogeneous data types. Origin is a character array, and the other variables are numeric.
Examine a dataset array.
Display the first five observations in the dataset array.
ds(1:5,:)
ans = 
    Origin    Acceleration    Cylinders    MPG
    USA         12            8            18 
    USA       11.5            8            15 
    USA         11            8            18 
    USA         12            8            16 
    USA       10.5            8            17 
Apply a function to a dataset array.
Use datasetfun to return the data type of each variable in ds.
varclass = datasetfun(@class,ds,'UniformOutput',false);
varclass(:)
ans = 4x1 cell
{'char' }
{'double'}
{'double'}
{'double'}
You can get additional information about the variables using summary(ds).
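As a further sketch, datasetfun can be restricted to a subset of variables with the DataVars name-value pair argument, which is useful when a function applies only to the numeric variables. The NaN-skipping helper nanSafeMean is an assumption made for this illustration, not part of the original example.

```matlab
load carsmall
ds = dataset(Origin,Acceleration,Cylinders,MPG);

summary(ds)                     % per-variable overview of the dataset array

% Hypothetical helper: mean of a numeric variable, ignoring NaN entries
nanSafeMean = @(x) mean(x(~isnan(x)));

% Apply the helper to the numeric variables only (Origin is char, so skip it)
numericMeans = datasetfun(nanSafeMean,ds, ...
    'DataVars',{'Acceleration','Cylinders','MPG'});
```

With the default UniformOutput setting, numericMeans is a numeric vector with one element per selected variable.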
Modify a dataset array.
Cylinders is a numeric variable that has values 4, 6, and 8 for the number of cylinders. Convert Cylinders to a nominal array with levels four, six, and eight.
Display the country of origin and number of cylinders for the first 15 cars.
ds.Cylinders = nominal(ds.Cylinders,{'four','six','eight'});
ds(1:15,{'Origin','Cylinders'})
ans = 
    Origin    Cylinders
    USA       eight    
    USA       eight    
    USA       eight    
    USA       eight    
    USA       eight    
    USA       eight    
    USA       eight    
    USA       eight    
    USA       eight    
    USA       eight    
    France    four     
    USA       eight    
    USA       eight    
    USA       eight    
    USA       eight    
The variable Cylinders is now a nominal array instead of a numeric array.
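One way to confirm that the labels were assigned to the intended values is to compare the level counts of the nominal array with counts taken from the original numeric variable. This is a sketch that works on the workspace variable directly; the names cyl, labels, counts, and expected are illustrative.

```matlab
load carsmall

% Labels are matched to the sorted unique values of Cylinders (4, 6, 8)
cyl = nominal(Cylinders,{'four','six','eight'});

labels = getlabels(cyl);        % level labels, in level order
counts = levelcounts(cyl);      % number of observations in each level

% Counts computed from the numeric data, in the same 4, 6, 8 order
expected = [sum(Cylinders == 4); sum(Cylinders == 6); sum(Cylinders == 8)];
isequal(counts(:),expected)     % true when each label maps to the intended value
```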
dataset | datasetfun | mat2dataset | nominal