Create a Dataset Array from Workspace Variables

Note

The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.

Create a Dataset Array from a Numeric Array

This example shows how to create a dataset array from a numeric array existing in the MATLAB® workspace.

Load sample data.

load fisheriris

Two variables load into the workspace: meas, a 150-by-4 numeric array, and species, a 150-by-1 cell array of species labels.

Create a dataset array.

Use mat2dataset to convert the numeric array, meas, into a dataset array.

ds = mat2dataset(meas);
ds(1:10,:)
ans = 
    meas1    meas2    meas3    meas4
    5.1      3.5      1.4      0.2  
    4.9        3      1.4      0.2  
    4.7      3.2      1.3      0.2  
    4.6      3.1      1.5      0.2  
      5      3.6      1.4      0.2  
    5.4      3.9      1.7      0.4  
    4.6      3.4      1.4      0.3  
      5      3.4      1.5      0.2  
    4.4      2.9      1.4      0.2  
    4.9      3.1      1.5      0.1  

The array, meas, has four columns, so the dataset array, ds, has four variables. The default variable names are the array name, meas, with column numbers appended.

You can specify your own variable or observation names using the name-value pair arguments VarNames and ObsNames, respectively.
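As a minimal sketch of those name-value pair arguments, the following call supplies both sets of names at creation time. The variable names and the Obs1, Obs2, ... observation labels are illustrative choices and do not come from the sample data. The result is stored in dsNamed so that ds is left unchanged.

```matlab
load fisheriris

% Illustrative variable names (an assumption made for this sketch)
varNames = {'SLength','SWidth','PLength','PWidth'};

% Illustrative observation names: Obs1, Obs2, ..., one per row of meas
n = size(meas,1);
obsNames = arrayfun(@(k) sprintf('Obs%d',k),(1:n)','UniformOutput',false);

% Pass both sets of names when creating the dataset array
dsNamed = mat2dataset(meas,'VarNames',varNames,'ObsNames',obsNames);
dsNamed(1:3,:)
```

Observation names must be unique, which is why the sketch builds them from the row index.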

If you use dataset to convert a numeric array to a dataset array, by default, the resulting dataset array has one variable that is an array instead of separate variables for each column.
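The difference between the two functions can be checked with a short comparison. This is a sketch only. The names dsSingle and dsSplit are introduced here for illustration.

```matlab
load fisheriris

% dataset keeps meas together as a single variable
dsSingle = dataset(meas);

% mat2dataset creates one variable per column of meas
dsSplit = mat2dataset(meas);

size(dsSingle)        % number of observations by number of variables
size(dsSplit)
size(dsSingle.meas)   % the single variable still holds all four columns
```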

Examine the dataset array.

Return the size of the dataset array, ds.

size(ds)
ans = 1×2

   150     4

The dataset array, ds, is the same size as the numeric array, meas. Variable names and observation names do not factor into the size of a dataset array.

Explore dataset array metadata.

Return the metadata properties of the dataset array, ds.

ds.Properties
ans = struct with fields:
       Description: ''
    VarDescription: {}
             Units: {}
          DimNames: {'Observations'  'Variables'}
          UserData: []
          ObsNames: {}
          VarNames: {'meas1'  'meas2'  'meas3'  'meas4'}

You can also access the properties individually. For example, you can retrieve the variable names using ds.Properties.VarNames.
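For instance, the following sketch reads two of the properties into their own workspace variables. It rebuilds the dataset array under the illustrative name dsDemo so that it runs on its own.

```matlab
load fisheriris
dsDemo = mat2dataset(meas);

% Retrieve individual metadata properties with dot indexing
varNames = dsDemo.Properties.VarNames    % cell array of variable names
dimNames = dsDemo.Properties.DimNames    % names of the two dimensions
```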

Access data in a dataset array variable.

You can use variable names with dot indexing to access the data in a dataset array. For example, find the minimum value in the first variable, meas1.

min(ds.meas1)
ans = 4.3000

Change variable names.

The four variables in ds are actually measurements of sepal length, sepal width, petal length, and petal width. Modify the variable names to be more descriptive.

ds.Properties.VarNames = {'SLength','SWidth','PLength','PWidth'};

Add description.

You can add a description for the dataset array.

ds.Properties.Description = 'Fisher iris data';
ds.Properties
ans = struct with fields:
       Description: 'Fisher iris data'
    VarDescription: {}
             Units: {}
          DimNames: {'Observations'  'Variables'}
          UserData: []
          ObsNames: {}
          VarNames: {'SLength'  'SWidth'  'PLength'  'PWidth'}

The dataset array properties are updated with the new variable names and description.

Add a variable to the dataset array.

The variable species is a cell array of species labels. Add species to the dataset array, ds, as a nominal array named Species. Display the first five observations in the dataset array.

ds.Species = nominal(species);
ds(1:5,:)
ans = 
    SLength    SWidth    PLength    PWidth    Species
    5.1        3.5       1.4        0.2       setosa 
    4.9          3       1.4        0.2       setosa 
    4.7        3.2       1.3        0.2       setosa 
    4.6        3.1       1.5        0.2       setosa 
      5        3.6       1.4        0.2       setosa 

The dataset array, ds, now has a fifth variable, Species.

Create a Dataset Array from Heterogeneous Workspace Variables

This example shows how to create a dataset array from heterogeneous variables existing in the MATLAB® workspace.

Load sample data.

load carsmall

Create a dataset array.

Create a dataset array from a subset of the workspace variables.

ds = dataset(Origin,Acceleration,Cylinders,MPG);
ds.Properties.VarNames(:)
ans = 4x1 cell
    {'Origin'      }
    {'Acceleration'}
    {'Cylinders'   }
    {'MPG'         }

When creating the dataset array, you do not need to enter variable names. dataset automatically uses the name of each workspace variable.
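If you prefer names that differ from the workspace variable names, one option is to pass each variable together with its name in a cell array. The following is a minimal sketch. The names Country and MilesPerGallon, and the variable dsRenamed, are illustrative and are not part of the sample data.

```matlab
load carsmall

% Pair each workspace variable with the name to use in the dataset array
dsRenamed = dataset({Origin,'Country'},{MPG,'MilesPerGallon'});
dsRenamed.Properties.VarNames
```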

Notice that the dataset array, ds, contains a collection of variables with heterogeneous data types. Origin is a character array, and the other variables are numeric.

Examine a dataset array.

Display the first five observations in the dataset array.

ds(1:5,:)
ans = 
    Origin     Acceleration    Cylinders    MPG
    USA          12            8            18 
    USA        11.5            8            15 
    USA          11            8            18 
    USA          12            8            16 
    USA        10.5            8            17 

Apply a function to a dataset array.

Use datasetfun to return the data type of each variable in ds.

varclass = datasetfun(@class,ds,'UniformOutput',false);
varclass(:)
ans = 4x1 cell
    {'char'  }
    {'double'}
    {'double'}
    {'double'}

You can get additional information about the variables using summary(ds).
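A short sketch of that call follows. It rebuilds the dataset array under the illustrative name dsCars so that it runs on its own.

```matlab
load carsmall
dsCars = dataset(Origin,Acceleration,Cylinders,MPG);

% Print a per-variable summary of the dataset array
summary(dsCars)
```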

Modify a dataset array.

Cylinders is a numeric variable that has values 4, 6, and 8 for the number of cylinders. Convert Cylinders to a nominal array with levels four, six, and eight.

Display the country of origin and number of cylinders for the first 15 cars.

ds.Cylinders = nominal(ds.Cylinders,{'four','six','eight'});
ds(1:15,{'Origin','Cylinders'})
ans = 
    Origin     Cylinders
    USA        eight    
    USA        eight    
    USA        eight    
    USA        eight    
    USA        eight    
    USA        eight    
    USA        eight    
    USA        eight    
    USA        eight    
    USA        eight    
    France     four     
    USA        eight    
    USA        eight    
    USA        eight    
    USA        eight    

The variable Cylinders is now a nominal array instead of a numeric array.
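One way to confirm the conversion is to inspect the class and the levels of the variable. The following sketch repeats the conversion on a separate dataset array, dsCars, a name introduced here for illustration.

```matlab
load carsmall
dsCars = dataset(Origin,Acceleration,Cylinders,MPG);
dsCars.Cylinders = nominal(dsCars.Cylinders,{'four','six','eight'});

class(dsCars.Cylinders)         % data type of the converted variable
getlabels(dsCars.Cylinders)     % labels of the nominal levels
levelcounts(dsCars.Cylinders)   % number of cars in each level
```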
