This example shows how to add and delete observations in a dataset array. You can also edit dataset arrays using the Variables editor.
Import the data from the first worksheet in hospitalSmall.xlsx into a dataset array.
ds = dataset('XLSFile',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.xlsx'));
size(ds)
ans = 14 6
The dataset array, ds, has 14 observations (rows) and 6 variables (columns).
The second worksheet in hospitalSmall.xlsx has additional patient data. Append the observations in this worksheet to the end of ds.
ds2 = dataset('XLSFile',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.xlsx'),'Sheet',2);
dsNew = [ds;ds2];
size(dsNew)
ans = 22 6
The dataset array dsNew has 22 observations.
In order to vertically concatenate two dataset arrays, both arrays
must have the same number of variables, with the same variable names.
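The requirement above can be checked before concatenating. The following is a minimal sketch, not part of the original example: the arrays dsA and dsB and their contents are hypothetical, and the check compares sorted variable names on the assumption that only the set of names needs to match.

```matlab
% Hypothetical data for illustration (not taken from hospitalSmall.xlsx)
id = {'AAA-001';'AAA-002'};
age = [36;28];
dsA = dataset(id,age);      % variable names come from the workspace names

id = {'BBB-001'};
age = 45;
dsB = dataset(id,age);

% Compare the variable names of the two arrays before concatenating
sameVars = isequal(sort(dsA.Properties.VarNames),sort(dsB.Properties.VarNames));
if sameVars
    dsAB = [dsA;dsB];       % append the observations of dsB to dsA
else
    error('Variable names differ. Rename the variables before concatenating.');
end
size(dsAB)
```

If the names differ, one option is to assign matching names through the VarNames property before concatenating.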
If you want to append new observations stored in a cell array, first convert the cell array to a dataset array, and then concatenate the dataset arrays.
cellObs = {'id','name','sex','age','wgt','smoke';
           'YQR-965','BAKER','M',36,160,0;
           'LFG-497','WALL' ,'F',28,125,1;
           'KSD-003','REED' ,'M',32,187,0};
dsNew = [dsNew;cell2dataset(cellObs)];
size(dsNew)
ans = 25 6
You can also append new observations stored in a structure. Convert the structure to a dataset array, and then concatenate the dataset arrays.
structObs(1,1).id = 'GHK-842';
structObs(1,1).name = 'GEORGE';
structObs(1,1).sex = 'M';
structObs(1,1).age = 45;
structObs(1,1).wgt = 182;
structObs(1,1).smoke = 1;
structObs(2,1).id = 'QRH-308';
structObs(2,1).name = 'BAILEY';
structObs(2,1).sex = 'F';
structObs(2,1).age = 29;
structObs(2,1).wgt = 120;
structObs(2,1).smoke = 0;
dsNew = [dsNew;struct2dataset(structObs)];
size(dsNew)
ans = 27 6
Use unique to delete any duplicated observations in a dataset array.
dsNew = unique(dsNew);
size(dsNew)
ans = 21 6
Six duplicated observations are deleted (27 observations before, 21 after).
Delete observations 18, 20, and 21 from the dataset array.
dsNew([18,20,21],:) = [];
size(dsNew)
ans = 18 6
The dataset array has only 18 observations now.
You can also use observation names to index observations. First, specify the variable of identifiers, id, as observation names. Then, delete the variable id from dsNew. Finally, delete the observation named KOQ-996.
dsNew.Properties.ObsNames = dsNew.id;
dsNew.id = [];
dsNew('KOQ-996',:) = [];
size(dsNew)
ans = 17 5
The dataset array now has one fewer observation and one fewer variable.
You can also search for observations in the dataset array. For example, delete observations for any patients with the last name WILLIAMS.
toDelete = strcmp(dsNew.name,'WILLIAMS');
dsNew(toDelete,:) = [];
size(dsNew)
ans = 16 5
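The same logical-indexing pattern extends to several search values at once. The sketch below is an illustration under stated assumptions: the array dsDemo and the names in it are hypothetical, and ismember replaces strcmp so that more than one last name can be matched.

```matlab
% Hypothetical data for illustration (not taken from hospitalSmall.xlsx)
name = {'SMITH';'WILLIAMS';'BROWN';'JONES'};
age = [38;43;49;40];
dsDemo = dataset(name,age);

% Logical index that is true for observations matching any listed name
toDelete = ismember(dsDemo.name,{'WILLIAMS','BROWN'});

% Delete the matching observations
dsDemo(toDelete,:) = [];
size(dsDemo)
```

Because toDelete is a logical vector with one element per observation, you can inspect it, for example with sum(toDelete), before deleting anything.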
See also: cell2dataset | dataset | struct2dataset