Access Data in Dataset Array Variables

This example shows how to work with dataset array variables and their data.

Access variables by name.

You can access variable data, or select a subset of variables, by using variable (column) names and dot indexing. Load a sample dataset array. Display the names of the variables in hospital.

load hospital 
hospital.Properties.VarNames(:)
ans = 7x1 cell
    {'LastName'     }
    {'Sex'          }
    {'Age'          }
    {'Weight'       }
    {'Smoker'       }
    {'BloodPressure'}
    {'Trials'       }

The dataset array has 7 variables (columns) and 100 observations (rows). You can double-click hospital in the Workspace window to view the dataset array in the Variables editor.

Plot histogram.

Plot a histogram of the data in the variable Weight.

figure
histogram(hospital.Weight)

The histogram shows that the weight distribution is bimodal.

Plot data grouped by category.

Draw box plots of Weight grouped by the values in Sex (Male and Female). That is, use the variable Sex as a grouping variable.

figure
boxplot(hospital.Weight,hospital.Sex)

The box plot suggests that gender accounts for the bimodality in weight.

Select a subset of variables.

Create a new dataset array with only the variables LastName, Sex, and Weight. You can access the variables by name or column number.

ds1 = hospital(:,{'LastName','Sex','Weight'});
ds2 = hospital(:,[1,2,4]);

The dataset arrays ds1 and ds2 are equivalent. Use parentheses ( ) when indexing dataset arrays to preserve the data type; that is, to create a dataset array from a subset of a dataset array. You can also use the Variables editor to create a new dataset array from a subset of variables and observations.

Convert the variable data type.

Convert the data type of the variable Smoker from logical to nominal with labels No and Yes.

hospital.Smoker = nominal(hospital.Smoker,{'No','Yes'});
class(hospital.Smoker)
ans = 
'nominal'

Explore data.

Display the first 10 elements of Smoker.

hospital.Smoker(1:10)
ans = 10x1 nominal
     Yes 
     No 
     No 
     No 
     No 
     No 
     Yes 
     No 
     No 
     No 

If you want to change the level labels in a nominal array, use setlabels.

Add variables.

The variable BloodPressure is a 100-by-2 array. The first column corresponds to systolic blood pressure, and the second column to diastolic blood pressure. Separate this array into two new variables, SysPressure and DiaPressure.

hospital.SysPressure = hospital.BloodPressure(:,1);
hospital.DiaPressure = hospital.BloodPressure(:,2);
hospital.Properties.VarNames(:)
ans = 9x1 cell
    {'LastName'     }
    {'Sex'          }
    {'Age'          }
    {'Weight'       }
    {'Smoker'       }
    {'BloodPressure'}
    {'Trials'       }
    {'SysPressure'  }
    {'DiaPressure'  }

The dataset array, hospital, has two new variables.

Search for variables by name.

Use regexp to find variables in hospital with 'Pressure' in their name. Create a new dataset array containing only these variables.

bp = regexp(hospital.Properties.VarNames,'Pressure');
bpIdx = cellfun(@isempty,bp);
bpData = hospital(:,~bpIdx);
bpData.Properties.VarNames(:)
ans = 3x1 cell
    {'BloodPressure'}
    {'SysPressure'  }
    {'DiaPressure'  }

The new dataset array, bpData, contains only the blood pressure variables.

Delete variables.

Delete the variable BloodPressure from the dataset array, hospital.

hospital.BloodPressure = [];
hospital.Properties.VarNames(:)
ans = 8x1 cell
    {'LastName'   }
    {'Sex'        }
    {'Age'        }
    {'Weight'     }
    {'Smoker'     }
    {'Trials'     }
    {'SysPressure'}
    {'DiaPressure'}

The variable BloodPressure is no longer in the dataset array.

See Also

Related Examples

More About