This example shows how to work with dataset array variables and their data.
You can access variable data, or select a subset of variables, by using variable (column) names and dot indexing.
Load a sample dataset array. Display the names of the variables in hospital.
load hospital
hospital.Properties.VarNames(:)
ans = 7x1 cell
{'LastName' }
{'Sex' }
{'Age' }
{'Weight' }
{'Smoker' }
{'BloodPressure'}
{'Trials' }
The dataset array has 7 variables (columns) and 100 observations (rows). You can double-click hospital in the Workspace window to view the dataset array in the Variables editor.
Plot a histogram of the data in the variable Weight.
figure
histogram(hospital.Weight)
The histogram shows that the weight distribution is bimodal.
Draw box plots of Weight grouped by the values in Sex (Male and Female). That is, use the variable Sex as a grouping variable.
figure
boxplot(hospital.Weight,hospital.Sex)
The box plot suggests that gender accounts for the bimodality in weight.
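As a rough numeric check on what the box plot suggests, the sketch below computes the mean weight within each level of Sex using grpstats. It assumes the hospital sample data and Statistics and Machine Learning Toolbox are available; the variable name meanWeightBySex is illustrative and not part of the original example.

```matlab
load hospital

% Mean of Weight within each level of the grouping variable Sex.
% One row is returned per group (Female, Male).
meanWeightBySex = grpstats(hospital.Weight,hospital.Sex)
```

Clearly separated group means would be consistent with the two modes seen in the histogram, although this is only a summary and not a formal test.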
Create a new dataset array with only the variables LastName, Sex, and Weight. You can access the variables by name or column number.
ds1 = hospital(:,{'LastName','Sex','Weight'});
ds2 = hospital(:,[1,2,4]);
The dataset arrays ds1 and ds2 are equivalent. Use parentheses ( ) when indexing dataset arrays to preserve the data type; that is, to create a dataset array from a subset of a dataset array. You can also use the Variables editor to create a new dataset array from a subset of variables and observations.
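The difference between parentheses and dot indexing can be sketched as follows, assuming the hospital sample data is available. The names weightDs and weightVec are illustrative only.

```matlab
load hospital

% Parentheses keep the container type: the result is a dataset array
% with one variable and 100 observations.
weightDs = hospital(:,'Weight');
class(weightDs)

% Dot indexing extracts the underlying data: the result is a numeric vector.
weightVec = hospital.Weight;
class(weightVec)
```

Use the parentheses form when the result should remain a dataset array, and dot indexing when you need the raw values, for example as input to a plotting function.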
Convert the data type of the variable Smoker from logical to nominal with labels No and Yes.
hospital.Smoker = nominal(hospital.Smoker,{'No','Yes'});
class(hospital.Smoker)
ans = 'nominal'
Display the first 10 elements of Smoker.
hospital.Smoker(1:10)
ans = 10x1 nominal
Yes
No
No
No
No
No
Yes
No
No
No
If you want to change the level labels in a nominal array, use setlabels.
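A minimal sketch of relabeling with setlabels, assuming the hospital sample data is available. The new labels Nonsmoker and Smoker and the variable name smoker are illustrative choices, and the sketch works on a copy so that hospital.Smoker is left unchanged.

```matlab
load hospital

% Build a nominal copy of Smoker with the labels used in this example.
smoker = nominal(hospital.Smoker,{'No','Yes'});

% Replace the labels of the levels 'No' and 'Yes'.
% The third argument names the levels being relabeled.
smoker = setlabels(smoker,{'Nonsmoker','Smoker'},{'No','Yes'});

% List the labels now attached to the levels.
getlabels(smoker)
```

Only the labels change; the underlying level assignments of the observations stay the same.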
The variable BloodPressure is a 100-by-2 array. The first column corresponds to systolic blood pressure, and the second column to diastolic blood pressure. Separate this array into two new variables, SysPressure and DiaPressure.
hospital.SysPressure = hospital.BloodPressure(:,1);
hospital.DiaPressure = hospital.BloodPressure(:,2);
hospital.Properties.VarNames(:)
ans = 9x1 cell
{'LastName' }
{'Sex' }
{'Age' }
{'Weight' }
{'Smoker' }
{'BloodPressure'}
{'Trials' }
{'SysPressure' }
{'DiaPressure' }
The dataset array, hospital, has two new variables.
Use regexp to find variables in hospital with 'Pressure' in their name. Create a new dataset array containing only these variables.
bp = regexp(hospital.Properties.VarNames,'Pressure');
bpIdx = cellfun(@isempty,bp);
bpData = hospital(:,~bpIdx);
bpData.Properties.VarNames(:)
ans = 3x1 cell
{'BloodPressure'}
{'SysPressure' }
{'DiaPressure' }
The new dataset array, bpData, contains only the blood pressure variables.
Delete the variable BloodPressure from the dataset array, hospital.
hospital.BloodPressure = [];
hospital.Properties.VarNames(:)
ans = 8x1 cell
{'LastName' }
{'Sex' }
{'Age' }
{'Weight' }
{'Smoker' }
{'Trials' }
{'SysPressure'}
{'DiaPressure'}
The variable BloodPressure is no longer in the dataset array.