This example shows how to create a categorical array. categorical is a data type for storing data with values from a finite set of discrete categories. These categories can have a natural order, but an order is not required. A categorical array provides efficient storage and convenient manipulation of data, while also maintaining meaningful names for the values. Categorical arrays are often used in a table to define groups of rows.
By default, categorical arrays contain categories that have no mathematical ordering. For example, the discrete set of pet categories {'dog' 'cat' 'bird'} has no meaningful mathematical ordering, so MATLAB® uses the alphabetical ordering {'bird' 'cat' 'dog'}. Ordinal categorical arrays contain categories that have a meaningful mathematical ordering. For example, the discrete set of size categories {'small', 'medium', 'large'} has the mathematical ordering small < medium < large.
When you create categorical arrays from cell arrays of character vectors or string arrays, leading and trailing spaces are removed. For example, if you specify the text {' cat' 'dog '} as categories, the resulting category names are {'cat' 'dog'}.
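The trimming behavior described above can be checked with a short sketch. The variable names pets and petCategories are chosen here for illustration and do not appear elsewhere in this example.

```matlab
% Create a categorical array from text with stray leading/trailing spaces.
pets = categorical({' cat','dog '});

% The category names are stored without the surrounding spaces.
petCategories = categories(pets);
disp(petCategories)
```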
You can use the categorical function to create a categorical array from a numeric array, logical array, string array, cell array of character vectors, or an existing categorical array.
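The steps below use text input, so here is a minimal sketch of the numeric and logical cases. The data values and the labels 'low', 'mid', and 'high' are assumptions made up for illustration.

```matlab
% Numeric input: each unique value becomes a category named after its text form.
grades = categorical([3 1 2 3 1]);
gradeCats = categories(grades);

% Optional valueset and catnames arguments map the values to readable names
% (the labels here are hypothetical).
levels = categorical([3 1 2 3 1],[1 2 3],{'low','mid','high'});

% Logical input: the two categories are named 'false' and 'true'.
flags = categorical([true false true]);
flagCats = categories(flags);
```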
Create a 1-by-11 cell array of character vectors containing state names from New England.
state = {'MA','ME','CT','VT','ME','NH','VT','MA','NH','CT','RI'};
Convert the cell array, state, to a categorical array that has no mathematical order.
state = categorical(state)
state = 1x11 categorical
Columns 1 through 9
MA ME CT VT ME NH VT MA NH
Columns 10 through 11
CT RI
class(state)
ans = 'categorical'
List the discrete categories in the variable state.
categories(state)
ans = 6x1 cell
{'CT'}
{'MA'}
{'ME'}
{'NH'}
{'RI'}
{'VT'}
The categories are listed in alphabetical order.
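Once the data is categorical, you can compare elements against a category name and count the members of each category. The following is a minimal sketch, not part of the original example; the names isMA, numMA, and counts are illustrative.

```matlab
% Rebuild the categorical array of state names so this block runs on its own.
state = categorical({'MA','ME','CT','VT','ME','NH','VT','MA','NH','CT','RI'});

% Compare each element with a category name to get a logical mask.
isMA = (state == 'MA');
numMA = nnz(isMA);

% Count the elements in each category, in the order returned by categories(state).
counts = countcats(state);
```

Even without a mathematical order, equality comparisons such as == and ~= are supported. Comparisons such as < and > require an ordinal categorical array.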
Create a 1-by-8 cell array of character vectors containing the sizes of eight objects.
AllSizes = {'medium','large','small','small','medium',...
    'large','medium','small'};
The cell array, AllSizes, has three distinct values: 'large', 'medium', and 'small'. With the cell array of character vectors, there is no convenient way to indicate that small < medium < large.
Convert the cell array, AllSizes, to an ordinal categorical array. Use valueset to specify the values small, medium, and large, which define the categories. For an ordinal categorical array, the first category specified is the smallest and the last category is the largest.
valueset = {'small','medium','large'};
sizeOrd = categorical(AllSizes,valueset,'Ordinal',true)
sizeOrd = 1x8 categorical
Columns 1 through 6
medium large small small medium large
Columns 7 through 8
medium small
class(sizeOrd)
ans = 'categorical'
The order of the values in the categorical array, sizeOrd, remains unchanged.
List the discrete categories in the categorical variable, sizeOrd.
categories(sizeOrd)
ans = 3x1 cell
{'small' }
{'medium'}
{'large' }
The categories are listed in the specified order to match the mathematical ordering small < medium < large.
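Because the array is ordinal, relational operators and functions such as min and max follow the category order rather than alphabetical order. The sketch below illustrates this under the same data as above; the names atLeastMedium, smallest, and largest are illustrative.

```matlab
% Rebuild the ordinal categorical array so this block runs on its own.
AllSizes = {'medium','large','small','small','medium','large','medium','small'};
sizeOrd = categorical(AllSizes,{'small','medium','large'},'Ordinal',true);

% Relational comparison against a category name uses the ordering
% small < medium < large.
atLeastMedium = (sizeOrd >= 'medium');

% min and max return the lowest and highest categories present in the data.
smallest = min(sizeOrd);
largest = max(sizeOrd);

% isordinal confirms that the array carries a mathematical ordering.
tf = isordinal(sizeOrd);
```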
Create a vector of 100 random numbers between zero and 50.
x = rand(100,1)*50;
Use the discretize function to create a categorical array by binning the values of x. Put all values between zero and 15 in the first bin, all the values between 15 and 35 in the second bin, and all the values between 35 and 50 in the third bin. Each bin includes the left endpoint, but does not include the right endpoint, except for the last bin, which includes both endpoints.
catnames = {'small','medium','large'};
binnedData = discretize(x,[0 15 35 50],'categorical',catnames);
binnedData is a 100-by-1 ordinal categorical array with three categories, such that small < medium < large.
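Since x is random, the bin boundaries are easier to see with fixed inputs. The sketch below bins a few hand-picked values (an assumption for illustration, not data from this example) that sit on or near the edges. With the default settings, each bin includes its left edge, and the last bin also includes its right edge.

```matlab
edges = [0 15 35 50];
catnames = {'small','medium','large'};

% Hypothetical sample values placed on or just below each bin edge.
sample = [0 14.9 15 34.9 35 50];

% 15 and 35 fall in the bin that starts at them; 50 falls in the last bin.
binned = discretize(sample,edges,'categorical',catnames);
disp(binned)
```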
Use the summary function to print the number of elements in each category. Because x is random, the counts vary from run to run.
summary(binnedData)
small     30
medium    35
large     35
Starting in R2016b, you can create string arrays with the string function and convert them to categorical arrays.
Create a string array that contains names of planets.
str = string({'Earth','Jupiter','Neptune','Jupiter','Mars','Earth'})
str = 1x6 string
"Earth" "Jupiter" "Neptune" "Jupiter" "Mars" "Earth"
Convert str to a categorical array.
planets = categorical(str)
planets = 1x6 categorical
Earth Jupiter Neptune Jupiter Mars Earth
Assign a value to str(8). Because str has only six elements, MATLAB fills element 7 with a missing string. Then convert str to a categorical array again. Where str has missing values, planets has undefined values.
str(8) = 'Mars'
str = 1x8 string
Columns 1 through 6
"Earth" "Jupiter" "Neptune" "Jupiter" "Mars" "Earth"
Columns 7 through 8
<missing> "Mars"
planets = categorical(str)
planets = 1x8 categorical
Columns 1 through 6
Earth Jupiter Neptune Jupiter Mars Earth
Columns 7 through 8
<undefined> Mars
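To locate the undefined elements programmatically, one option is the isundefined function. The following is a minimal sketch; the names undefinedMask, numUndefined, and planetCats are illustrative.

```matlab
% Rebuild the string array with a gap at element 7 so this block runs on its own.
str = string({'Earth','Jupiter','Neptune','Jupiter','Mars','Earth'});
str(8) = "Mars";                      % element 7 becomes <missing>
planets = categorical(str);

% isundefined returns true where an element belongs to no category.
undefinedMask = isundefined(planets);
numUndefined = nnz(undefinedMask);

% Undefined elements do not create a category of their own.
planetCats = categories(planets);
```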
categorical | categories | discretize | summary