Class: dataset
(Not Recommended) Construct dataset array
The dataset data type is not recommended. To work with heterogeneous data, use the MATLAB® table data type instead. See MATLAB table documentation for more information.
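As a minimal sketch of the recommended alternative, the following builds a small table with row names, which play the role that observation names play in a dataset array. The variables Age, Smoker, and ids and their values are hypothetical and chosen only for illustration.

```matlab
% Hypothetical example data (not taken from any shipped data set)
Age    = [38; 43; 40];
Smoker = [true; false; false];
ids    = {'YPL-320'; 'GLI-532'; 'PNI-258'};

% Build a table; 'RowNames' labels each row, much like 'ObsNames' in dataset
T = table(Age, Smoker, 'RowNames', ids);

% Index by row name and variable name
ageOfFirst = T{'YPL-320','Age'};
```

Existing dataset arrays can be converted with dataset2table rather than rebuilt by hand.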
A = dataset(varspec,'ParamName',Value)
A = dataset('File',filename,'ParamName',Value)
A = dataset('XLSFile',filename,'ParamName',Value)
A = dataset('XPTFile',xptfilename,'ParamName',Value)
A = dataset(varspec,'ParamName',Value) creates dataset array A using the workspace variable input method varspec and one or more optional name/value pairs (see Parameter Name/Value Pairs).
The input method varspec can be one or more of the following:
VAR: a workspace variable. dataset uses the workspace name for the variable name in A. To include multiple variables, specify VAR_1,VAR_2,...,VAR_N. Variables can be arrays of any size, but all variables must have the same number of rows. VAR can also be an expression. In this case, dataset creates a default name automatically.
{VAR,name}: a workspace variable, VAR, and a variable name, name. dataset uses name as the variable name. To include multiple variables and names, specify {VAR_1,name_1}, {VAR_2,name_2},..., {VAR_N,name_N}.
{VAR,name_1,...,name_m}: an m-columned workspace variable, VAR. dataset uses the names name_1,...,name_m as variable names. You must include a name for every column in VAR. Each column becomes a separate variable in A.
You can combine these input methods to include as many variables and names as needed. Names must be valid, unique MATLAB identifiers. For example input combinations, see Examples. For optional name/value pairs see Inputs.
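The three varspec forms can be mixed in a single call. The sketch below assumes three hypothetical workspace variables (height, weight, scores) and shows one of each form; it is an illustration, not an example from the shipped documentation.

```matlab
% Hypothetical workspace variables, all with three rows
height = [170; 182; 165];
weight = [65; 80; 58];
scores = [1 2; 3 4; 5 6];      % two columns, so two names are required

% VAR form, {VAR,name} form, and {VAR,name_1,...,name_m} form together
A = dataset(height, {weight,'Mass'}, {scores,'Pre','Post'});

% Variable names are stored in the Properties of the dataset array
names = A.Properties.VarNames;
```

Here height keeps its workspace name, weight is renamed to Mass, and each column of scores becomes its own variable.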
To convert numeric arrays, cell arrays, structure arrays, or tables to dataset arrays, you can also use (respectively):
mat2dataset
cell2dataset
struct2dataset
table2dataset
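A brief sketch of these conversion functions, using small hypothetical inputs (X, C, and S are made up for illustration). It assumes the default behavior in which cell2dataset reads variable names from the first row of the cell array.

```matlab
% Numeric matrix: one dataset variable per column, named explicitly here
X = [1 2; 3 4; 5 6];
dsFromMatrix = mat2dataset(X, 'VarNames', {'a','b'});

% Cell array: first row is assumed to hold the variable names
C = {'name','age'; 'Ann',34; 'Bob',29};
dsFromCell = cell2dataset(C);

% Scalar structure: each field becomes a variable
S.height = [170; 182; 165];
S.weight = [65; 80; 58];
dsFromStruct = struct2dataset(S);
```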
Note
Dataset arrays may contain built-in types or array objects as variables. Array objects must implement each of the following:
Standard MATLAB parenthesis indexing of the form var(i,...), where i is a numeric or logical vector corresponding to rows of the variable
A size method with a dim argument
A vertcat method
A = dataset('File',filename,'ParamName',Value) creates dataset array A from column-oriented data in the text file specified by filename. Variables in A are of type double if data in the corresponding column of the file, following the column header, are entirely numeric; otherwise the variables in A are cell arrays of character vectors. dataset converts empty fields to either NaN (for a numeric variable) or the empty character vector (for a character-valued variable). dataset ignores insignificant white space in the file. You cannot specify both a file and workspace variables as input. See Name/Value Pairs for more information.
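The type rules above can be checked on a small file. This sketch writes a hypothetical three-column CSV file to a temporary location (the file contents and the names id, age, and name are invented for illustration) and reads it back with the 'File' syntax.

```matlab
% Write a small, hypothetical comma-separated file to a temporary path
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'id,age,name');
fprintf(fid, '%s\n', 'p1,34,Ann');
fprintf(fid, '%s\n', 'p2,,Bob');     % empty field in the numeric column
fclose(fid);

% Read it back; the first line supplies the variable names
A = dataset('File', fname, 'Delimiter', ',');

ageIsNumeric = isa(A.age, 'double');   % entirely numeric column
nameIsCell   = iscell(A.name);         % non-numeric column

delete(fname);                         % remove the temporary file
```

The empty age field for the second row is expected to be read as NaN, per the conversion rule described above.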
A = dataset('XLSFile',filename,'ParamName',Value) creates dataset array A from column-oriented data in the Excel® spreadsheet specified by filename. Variables in A are of type double if data in the corresponding column of the spreadsheet, following the column header, are entirely numeric; otherwise the variables in A are cell arrays of character vectors. See Name/Value Pairs for more information.
A = dataset('XPTFile',xptfilename,'ParamName',Value) creates a dataset array from a SAS® XPORT format file. Variable names from the XPORT format file are preserved. Numeric data types in the XPORT format file are preserved but all other data types are converted to cell arrays of character vectors. The XPORT format allows for 28 missing data types. dataset represents these in the file by an upper case letter, '.' or '_'. dataset converts all missing data to NaN values in A. See Name/Value Pairs for more information.
Specify one or more of the following name/value pairs when constructing a dataset:
|
A string array or cell array |
|
A string array or cell array |
Name/value pairs available when using text files as inputs:
|
A character vector or string scalar indicating the character separating columns in the file. Values are
|
|
A format character vector or string scalar, as accepted by |
|
Numeric value indicating the number of lines to skip at the beginning of a file. Default: |
|
Specifies characters to treat as the empty character vector in a numeric column. Values may be a character array, a string array, or a cell array of character vectors. The parameter applies only to numeric columns in the file;
|
Name/value pairs available when using text files or Excel spreadsheets as inputs:
|
A logical value indicating whether ( |
|
A logical value indicating whether ( When reading from an |
Name/value pairs available when using Excel spreadsheets as input:
|
A positive scalar value of type |
|
A character vector or string scalar of the form |
Create a dataset array from workspace variables, including observation names:
load cereal
cereal = dataset(Calories,Protein,Fat,Sodium,Fiber,Carbo,...
    Sugars,'ObsNames',Name)
cereal.Properties.VarDescription = Variables(4:10,2);
Create a dataset array from a single, multi-columned workspace variable, designating variable names for each column:
load cities
categories = cellstr(categories);
cities = dataset({ratings,categories{:}},...
    'ObsNames',cellstr(names))
Load data from a text or spreadsheet file:
patients = dataset('File','hospital.dat',...
    'Delimiter',',','ReadObsNames',true)
patients2 = dataset('XLSFile','hospital.xls',...
    'ReadObsNames',true)
Load patient data from the CSV file hospital.dat and store the information in a dataset array with observation names given by the first column in the data (patient identification):
patients = dataset('file','hospital.dat', ...
    'format','%s%s%s%f%f%f%f%f%f%f%f%f', ...
    'Delimiter',',','ReadObsNames',true);
You can also load the data without specifying a format. dataset automatically creates dataset variables that are either double arrays or cell arrays of character vectors, depending on the contents of the file:
patients = dataset('file','hospital.dat',...
    'delimiter',',',...
    'ReadObsNames',true);
Make the {0,1}-valued variable smoke nominal, and change the labels to 'No' and 'Yes':
patients.smoke = nominal(patients.smoke,{'No','Yes'});
Add new levels to smoke as placeholders for more detailed histories of smokers:
patients.smoke = addlevels(patients.smoke,...
    {'0-5 Years','5-10 Years','LongTerm'});
Assuming the nonsmokers have never smoked, relabel the 'No' level:
patients.smoke = setlabels(patients.smoke,'Never','No');
Drop the undifferentiated 'Yes' level from smoke:
patients.smoke = droplevels(patients.smoke,'Yes');
Warning: OLDLEVELS contains categorical levels that were present in A, caused some array elements to have undefined levels.
Note that smokers now have an undefined level.
Set each smoker to one of the new levels, by observation name:
patients.smoke('YPL-320') = '5-10 Years';
cell2dataset | mat2dataset | struct2dataset | tdfread | textscan | xlsread