Note: The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See the MATLAB table documentation for more information.
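For readers following the note's recommendation, here is a minimal sketch of the same kind of import using table instead of dataset. It assumes a small tab-delimited file written on the fly to a temporary location; the file contents and the variable names f, fid, T, and smithRow are illustrative only and are not the shipped hospitalSmall.txt. The readtable option 'ReadRowNames' plays roughly the role that 'ReadObsNames' plays for dataset.

```matlab
% Write a tiny tab-delimited file (hypothetical sample data).
f = [tempname '.txt'];
fid = fopen(f,'w');
fprintf(fid,'name\tsex\tage\n');
fprintf(fid,'SMITH\tm\t38\n');
fprintf(fid,'JOHNSON\tm\t43\n');
fclose(fid);

% Import as a table, using the first column as row names.
T = readtable(f,'Delimiter','\t','ReadRowNames',true);
delete(f);

% Index by row name, as with observation names in a dataset array.
smithRow = T('SMITH',:);
```

If you already have a dataset array in the workspace, the function dataset2table converts it to a table.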
This example shows how to create a dataset array from the contents of a tab-delimited text file.
Create a dataset array using default settings.
Import the text file hospitalSmall.txt as a dataset array using the default settings.
ds = dataset('File',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.txt'))
ds =
    name          sex    age    wgt    smoke
    'SMITH'       'm'    38     176    1
    'JOHNSON'     'm'    43     163    0
    'WILLIAMS'    'f'    38     131    0
    'JONES'       'f'    40     133    0
    'BROWN'       'f'    49     119    0
    'DAVIS'       'f'    46     142    0
    'MILLER'      'f'    33     142    1
    'WILSON'      'm'    40     180    0
    'MOORE'       'm'    28     183    0
    'TAYLOR'      'f'    31     132    0
    'ANDERSON'    'f'    45     128    0
    'THOMAS'      'f'    42     137    0
    'JACKSON'     'm'    25     174    0
    'WHITE'       'm'    39     202    1
By default, dataset uses the first row of the text file for variable names. If the first row does not contain variable names, you can specify the optional name-value pair argument 'ReadVarNames',false to change the default behavior.
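As a minimal sketch of the 'ReadVarNames',false case, the following writes a small headerless tab-delimited file to a temporary location and imports it. The file contents and the names f and fid are assumptions made for illustration; they are not part of the shipped examples. Because the file has no header row, dataset assigns default variable names, which are then replaced through the VarNames property.

```matlab
% Write a tab-delimited file with no header row (hypothetical sample data).
f = [tempname '.txt'];
fid = fopen(f,'w');
fprintf(fid,'SMITH\tm\t38\n');
fprintf(fid,'JOHNSON\tm\t43\n');
fclose(fid);

% Treat the first row as data rather than as variable names.
ds = dataset('File',f,'ReadVarNames',false);
delete(f);

% Replace the default variable names with meaningful ones.
ds.Properties.VarNames = {'name','sex','age'};
```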
The dataset array contains heterogeneous variables. The variables name and sex are cell arrays of character vectors, and the other variables (age, wgt, and smoke) are numeric.
Summarize the dataset array.
You can see the data type and other descriptive statistics for each variable by using summary to summarize the dataset array.
summary(ds)
name: [14x1 cell array of character vectors]

sex: [14x1 cell array of character vectors]

age: [14x1 double]
    min    1st quartile    median    3rd quartile    max
    25     33              39.5      43              49

wgt: [14x1 double]
    min    1st quartile    median    3rd quartile    max
    119    132             142       176             202

smoke: [14x1 double]
    min    1st quartile    median    3rd quartile    max
    0      0               0         0               1
Import observation names.
Import the text file again, this time specifying that the first column contains observation names.
ds = dataset('File',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.txt'),'ReadObsNames',true)
ds =
                sex    age    wgt    smoke
    SMITH       'm'    38     176    1
    JOHNSON     'm'    43     163    0
    WILLIAMS    'f'    38     131    0
    JONES       'f'    40     133    0
    BROWN       'f'    49     119    0
    DAVIS       'f'    46     142    0
    MILLER      'f'    33     142    1
    WILSON      'm'    40     180    0
    MOORE       'm'    28     183    0
    TAYLOR      'f'    31     132    0
    ANDERSON    'f'    45     128    0
    THOMAS      'f'    42     137    0
    JACKSON     'm'    25     174    0
    WHITE       'm'    39     202    1
The elements of the first column in the text file, the last names, are now observation names. Observation names (that is, row names) are a property of the dataset array. You can always add or change the observation names of an existing dataset array by modifying the property ObsNames.
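A minimal sketch of setting and then changing observation names on an existing dataset array follows. The array is built in memory from two numeric vectors so that the snippet does not depend on any file; the values and the replacement name 'WILLIAMS-LEE' are hypothetical.

```matlab
% Build a small dataset array from workspace variables (sample values).
age = [38; 43; 38];
wgt = [176; 163; 131];
ds = dataset(age, wgt);

% Add observation names after the fact.
ds.Properties.ObsNames = {'SMITH'; 'JOHNSON'; 'WILLIAMS'};

% Change a single observation name (hypothetical new name).
ds.Properties.ObsNames{3} = 'WILLIAMS-LEE';
```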
Change dataset array properties.
By default, the DimNames property of the dataset array has name as the descriptor of the observation (row) dimension. dataset got this name from the first row of the first column in the text file. Change the first element of DimNames to LastName.
ds.Properties.DimNames{1} = 'LastName';
ds.Properties
ans =
       Description: ''
    VarDescription: {}
             Units: {}
          DimNames: {'LastName'  'Variables'}
          UserData: []
          ObsNames: {14x1 cell}
          VarNames: {'sex'  'age'  'wgt'  'smoke'}
Index into dataset array.
You can use observation names to index into a dataset array. For example, return the data for the patient with last name BROWN.
ds('BROWN',:)
ans =
             sex    age    wgt    smoke
    BROWN    'f'    49     119    0
Note that observation names must be unique.
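Indexing by name also accepts several observation names at once. The sketch below assumes a small in-memory dataset array with made-up values; the names sub and brownAge are illustrative. It selects two observations by name and then looks up one value by matching against the ObsNames property.

```matlab
% Small dataset array with observation names (sample values).
age = [38; 43; 49];
wgt = [176; 163; 119];
ds = dataset(age, wgt, 'ObsNames', {'SMITH'; 'JOHNSON'; 'BROWN'});

% Select several observations by name with a cell array of names.
sub = ds({'BROWN','SMITH'}, :);

% Look up a single value by matching the observation name.
brownAge = ds.age(strcmp(ds.Properties.ObsNames, 'BROWN'));
```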
This example shows how to create a dataset array from the contents of a comma-separated text file.
Create a dataset array.
Import the file hospitalSmall.csv as a dataset array, specifying the comma-delimited format.
ds = dataset('File',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.csv'),'Delimiter',',')
ds =
    id           name          sex    age    wgt    smoke
    'YPL-320'    'SMITH'       'm'    38     176    1
    'GLI-532'    'JOHNSON'     'm'    43     163    0
    'PNI-258'    'WILLIAMS'    'f'    38     131    0
    'MIJ-579'    'JONES'       'f'    40     133    0
    'XLK-030'    'BROWN'       'f'    49     119    0
    'TFP-518'    'DAVIS'       'f'    46     142    0
    'LPD-746'    'MILLER'      'f'    33     142    1
    'ATA-945'    'WILSON'      'm'    40     180    0
    'VNL-702'    'MOORE'       'm'    28     183    0
    'LQW-768'    'TAYLOR'      'f'    31     132    0
    'QFY-472'    'ANDERSON'    'f'    45     128    0
    'UJG-627'    'THOMAS'      'f'    42     137    0
    'XUE-826'    'JACKSON'     'm'    25     174    0
    'TRW-072'    'WHITE'       'm'    39     202    1
By default, dataset uses the first row in the text file as variable names.
Add observation names.
Use the unique identifiers in the variable id as observation names. Then, delete the variable id from the dataset array.
ds.Properties.ObsNames = ds.id;
ds.id = []
ds =
               name          sex    age    wgt    smoke
    YPL-320    'SMITH'       'm'    38     176    1
    GLI-532    'JOHNSON'     'm'    43     163    0
    PNI-258    'WILLIAMS'    'f'    38     131    0
    MIJ-579    'JONES'       'f'    40     133    0
    XLK-030    'BROWN'       'f'    49     119    0
    TFP-518    'DAVIS'       'f'    46     142    0
    LPD-746    'MILLER'      'f'    33     142    1
    ATA-945    'WILSON'      'm'    40     180    0
    VNL-702    'MOORE'       'm'    28     183    0
    LQW-768    'TAYLOR'      'f'    31     132    0
    QFY-472    'ANDERSON'    'f'    45     128    0
    UJG-627    'THOMAS'      'f'    42     137    0
    XUE-826    'JACKSON'     'm'    25     174    0
    TRW-072    'WHITE'       'm'    39     202    1
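Because observation names must be unique, it can help to check an identifier variable before promoting it. The following is a sketch under stated assumptions: it writes a two-row comma-separated file to a temporary location (hypothetical contents, not the shipped hospitalSmall.csv), imports it with 'Delimiter',',', and verifies uniqueness before assigning ObsNames. The names f, fid, and ids are illustrative.

```matlab
% Write a small comma-separated file (hypothetical sample data).
f = [tempname '.csv'];
fid = fopen(f,'w');
fprintf(fid,'id,name,age\n');
fprintf(fid,'YPL-320,SMITH,38\n');
fprintf(fid,'GLI-532,JOHNSON,43\n');
fclose(fid);

% Import it, specifying the comma delimiter.
ds = dataset('File',f,'Delimiter',',');
delete(f);

% Confirm the identifiers are unique before using them as observation names.
ids = ds.id;
assert(numel(unique(ids)) == numel(ids), 'Identifiers must be unique.');

% Promote id to observation names, then drop the now-redundant variable.
ds.Properties.ObsNames = ids;
ds.id = [];
```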
Delete observations.
Delete any patients with the last name BROWN. You can use strcmp to match 'BROWN' with the elements of the variable containing last names, name.
toDelete = strcmp(ds.name,'BROWN');
ds(toDelete,:) = []
ds =
               name          sex    age    wgt    smoke
    YPL-320    'SMITH'       'm'    38     176    1
    GLI-532    'JOHNSON'     'm'    43     163    0
    PNI-258    'WILLIAMS'    'f'    38     131    0
    MIJ-579    'JONES'       'f'    40     133    0
    TFP-518    'DAVIS'       'f'    46     142    0
    LPD-746    'MILLER'      'f'    33     142    1
    ATA-945    'WILSON'      'm'    40     180    0
    VNL-702    'MOORE'       'm'    28     183    0
    LQW-768    'TAYLOR'      'f'    31     132    0
    QFY-472    'ANDERSON'    'f'    45     128    0
    UJG-627    'THOMAS'      'f'    42     137    0
    XUE-826    'JACKSON'     'm'    25     174    0
    TRW-072    'WHITE'       'm'    39     202    1
The one patient with the last name BROWN is deleted from the dataset array.
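The same logical mask can be used non-destructively. The sketch below assumes a small in-memory dataset array with made-up values in which two patients share the last name BROWN; the names nMatches and kept are illustrative. It counts the matches first and then keeps the remaining rows in a new array instead of deleting in place.

```matlab
% Small dataset array built with the {variable, name} syntax (sample values).
name = {'SMITH'; 'BROWN'; 'JONES'; 'BROWN'};
age = [38; 49; 40; 51];
ds = dataset({name,'name'}, {age,'age'});

% Logical mask marking the observations to remove.
toDelete = strcmp(ds.name, 'BROWN');

% Count matches before removing anything.
nMatches = sum(toDelete);

% Keep the complement of the mask; the original array is left unchanged.
kept = ds(~toDelete, :);
```

Keeping the complement in a separate variable makes it easy to compare size(ds) and size(kept) before discarding the original.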
Return size of dataset array.
The array now has 13 observations.
size(ds)
ans =
    13     5
Note that the row and column corresponding to variable and observation names, respectively, are not included in the size of a dataset array.
This example shows how to create a dataset array from the contents of an Excel® spreadsheet file.
Create a dataset array.
Import the data from the first worksheet in the file hospitalSmall.xlsx, specifying that the data file is an Excel spreadsheet.
ds = dataset('XLSFile',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.xlsx'))
ds =
    id           name          sex    age    wgt    smoke
    'YPL-320'    'SMITH'       'm'    38     176    1
    'GLI-532'    'JOHNSON'     'm'    43     163    0
    'PNI-258'    'WILLIAMS'    'f'    38     131    0
    'MIJ-579'    'JONES'       'f'    40     133    0
    'XLK-030'    'BROWN'       'f'    49     119    0
    'TFP-518'    'DAVIS'       'f'    46     142    0
    'LPD-746'    'MILLER'      'f'    33     142    1
    'ATA-945'    'WILSON'      'm'    40     180    0
    'VNL-702'    'MOORE'       'm'    28     183    0
    'LQW-768'    'TAYLOR'      'f'    31     132    0
    'QFY-472'    'ANDERSON'    'f'    45     128    0
    'UJG-627'    'THOMAS'      'f'    42     137    0
    'XUE-826'    'JACKSON'     'm'    25     174    0
    'TRW-072'    'WHITE'       'm'    39     202    1
By default, dataset creates variable names using the contents of the first row in the spreadsheet.
Specify which worksheet to import.
Import the data from the second worksheet into a new dataset array.
ds2 = dataset('XLSFile',fullfile(matlabroot,'help/toolbox/stats/examples','hospitalSmall.xlsx'),'Sheet',2)
ds2 =
    id           name          sex    age    wgt    smoke
    'TRW-072'    'WHITE'       'm'    39     202    1
    'ELG-976'    'HARRIS'      'f'    36     129    0
    'KOQ-996'    'MARTIN'      'm'    48     181    1
    'YUZ-646'    'THOMPSON'    'm'    32     191    1
    'XBR-291'    'GARCIA'      'f'    27     131    1
    'KPW-846'    'MARTINEZ'    'm'    37     179    0
    'XBA-581'    'ROBINSON'    'm'    50     172    0
    'BKD-785'    'CLARK'       'f'    48     133    0