Selecting data based on its values is often useful. This type of data selection can involve creating a logical vector based on values in one variable, and then using that logical vector to select a subset of values in other variables. You can create a logical vector for selecting data by finding values in a numeric array that fall within a certain range. Additionally, you can create the logical vector by finding specific discrete values. When using categorical arrays, you can easily:
Select elements from particular categories. For categorical arrays, use the logical operators == or ~= to select data that is in, or not in, a particular category. To select data in a particular group of categories, use the ismember function. For ordinal categorical arrays, use the inequalities >, >=, <, or <= to find data in categories above or below a particular category.
Delete data that is in a particular category. Use logical operators to include or exclude data from particular categories.
Find elements that are not in a defined category. Categorical arrays indicate which elements do not belong to a defined category by <undefined>. Use the isundefined function to find observations without a defined value.
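The walkthrough that follows uses only nominal categories, so here is a minimal sketch of the ordinal inequalities mentioned in the list above. The ratings data and the variable names (ratings, valueset, health) are hypothetical and chosen only for illustration.

```matlab
% Hypothetical data: self-assessed health ratings for six people.
ratings = {'Good';'Poor';'Excellent';'Fair';'Good';'Excellent'};

% The value set lists the categories from lowest to highest.
valueset = {'Poor','Fair','Good','Excellent'};
health = categorical(ratings,valueset,'Ordinal',true);

% For an ordinal array, inequalities compare positions in the category order.
atLeastGood = health >= 'Good';   % true for 'Good' and 'Excellent'
belowGood   = health <  'Good';   % true for 'Poor' and 'Fair'

disp(health(atLeastGood))
```

The same comparisons on a categorical array that is not ordinal produce an error, because its categories have no defined order.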
This example shows how to index and search using categorical arrays. You can access data using categorical arrays stored within a table in a similar manner.
Load Sample Data
Load sample data gathered from 100 patients.
load patients
whos
  Name                            Size            Bytes  Class      Attributes

  Age                           100x1               800  double
  Diastolic                     100x1               800  double
  Gender                        100x1             11412  cell
  Height                        100x1               800  double
  LastName                      100x1             11616  cell
  Location                      100x1             14208  cell
  SelfAssessedHealthStatus      100x1             11540  cell
  Smoker                        100x1               100  logical
  Systolic                      100x1               800  double
  Weight                        100x1               800  double
Create Categorical Arrays from Cell Arrays of Character Vectors
Gender and Location contain data that belong in categories. Each cell array contains character vectors taken from a small set of unique values (indicating two genders and three locations, respectively). Convert Gender and Location to categorical arrays.
Gender = categorical(Gender);
Location = categorical(Location);
Search for Members of a Single Category
For categorical arrays, you can use the logical operators == and ~= to find the data that is in, or not in, a particular category.
Determine if there are any patients observed at the location 'Rampart General Hospital'.
any(Location=='Rampart General Hospital')
ans = logical
0
There are no patients observed at Rampart General Hospital.
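The section above demonstrates == only. As a minimal sketch, the complementary operator ~= can be combined with nnz to count the elements outside a category. The short array loc is a hypothetical stand-in for Location so that the sketch runs on its own.

```matlab
% Hypothetical stand-in for the Location variable.
loc = categorical({'VA Hospital';'County General Hospital'; ...
                   'VA Hospital';'St. Mary''s Medical Center'});

notVA = loc ~= 'VA Hospital';   % true where the element is in another category
numNotVA = nnz(notVA);          % number of elements outside 'VA Hospital'

% A name that is not one of the categories matches no element.
anyRampart = any(loc == 'Rampart General Hospital');
```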
Search for Members of a Group of Categories
You can use ismember to find data in a particular group of categories. Create a logical vector for the patients observed at County General Hospital or VA Hospital.
VA_CountyGenIndex = ...
    ismember(Location,{'County General Hospital','VA Hospital'});
VA_CountyGenIndex is a 100-by-1 logical array containing logical true (1) for each element in the categorical array Location that is a member of the category County General Hospital or VA Hospital. The output, VA_CountyGenIndex, contains 76 nonzero elements.
Use the logical vector VA_CountyGenIndex to select the LastName of the patients observed at either County General Hospital or VA Hospital.
VA_CountyGenPatients = LastName(VA_CountyGenIndex);
VA_CountyGenPatients is a 76-by-1 cell array of character vectors.
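To make the meaning of the ismember call concrete, the sketch below checks that it gives the same logical vector as combining two == tests with the element-wise OR operator. The array loc and the index names are hypothetical; only ismember, ==, and | come from standard MATLAB.

```matlab
% Hypothetical stand-in for the Location variable.
loc = categorical({'VA Hospital';'County General Hospital'; ...
                   'St. Mary''s Medical Center';'VA Hospital'});

% Membership in a group of categories ...
idxMember = ismember(loc,{'County General Hospital','VA Hospital'});

% ... is equivalent to OR-ing one equality test per category.
idxOr = (loc == 'County General Hospital') | (loc == 'VA Hospital');

sameResult = isequal(idxMember,idxOr);
```

For more than two or three categories, the ismember form is usually easier to read and maintain.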
Select Elements in a Particular Category to Plot
Use the summary function to print a summary containing the category names and the number of elements in each category.
summary(Location)
     County General Hospital        39
     St. Mary's Medical Center      24
     VA Hospital                    37
Location is a 100-by-1 categorical array with three categories. County General Hospital occurs in 39 elements, St. Mary's Medical Center in 24 elements, and VA Hospital in 37 elements.
Use the summary function to print a summary of Gender.
summary(Gender)
     Female      53
     Male        47
Gender is a 100-by-1 categorical array with two categories. Female occurs in 53 elements and Male occurs in 47 elements.
Use the logical operator == to access the ages of only the female patients. Then plot a histogram of this data.
figure()
histogram(Age(Gender=='Female'))
title('Age of Female Patients')
histogram(Age(Gender=='Female')) plots the age data for the 53 female patients.
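The same selection carries over when the variables are stored in a table, as noted at the start of this example. The following is a minimal sketch with a hypothetical four-row table; the names g, a, and T and the values are invented for illustration and are not the patients data.

```matlab
% Hypothetical miniature of the patient data, stored in a table.
g = categorical({'Female';'Male';'Female';'Male'});
a = [38;43;40;49];
T = table(g,a,'VariableNames',{'Gender','Age'});

% Index one table variable with a logical vector built from another.
femaleAges = T.Age(T.Gender == 'Female');   % numeric column vector

% Or keep every variable for the selected rows.
femaleRows = T(T.Gender == 'Female',:);     % table with the matching rows
```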
Delete Data from a Particular Category
You can use logical operators to include or exclude data from particular categories. Delete all patients observed at VA Hospital from the workspace variables Age and Location.
% Index Age first, while Location still has all 100 elements.
Age = Age(Location~='VA Hospital');
Location = Location(Location~='VA Hospital');
Now, Age is a 63-by-1 numeric array, and Location is a 63-by-1 categorical array.
List the categories of Location, as well as the number of elements in each category.
summary(Location)
     County General Hospital        39
     St. Mary's Medical Center      24
     VA Hospital                     0
The patients observed at VA Hospital are deleted from Location, but VA Hospital is still a category.
Use the removecats function to remove VA Hospital from the categories of Location.
Location = removecats(Location,'VA Hospital');
Verify that the category VA Hospital was removed.
categories(Location)
ans = 2x1 cell
{'County General Hospital' }
{'St. Mary's Medical Center'}
Location is a 63-by-1 categorical array that has two categories.
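When the goal is simply to drop every category that no longer has any elements, removecats can also be called without naming a category. The sketch below uses a small hypothetical array, loc, to show this form.

```matlab
% Hypothetical stand-in for the Location variable.
loc = categorical({'County General Hospital';'VA Hospital'; ...
                   'St. Mary''s Medical Center';'VA Hospital'});

% Delete the elements, which leaves 'VA Hospital' as an empty category.
loc(loc == 'VA Hospital') = [];
catsBefore = categories(loc);   % still lists three categories

% With one input, removecats removes all unused categories.
loc = removecats(loc);
catsAfter = categories(loc);    % only the two categories still in use
```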
Delete Element
You can delete elements by indexing. For example, you can remove the first element of Location by selecting the rest of the elements with Location(2:end). However, an easier way to delete elements is to use [].
Location(1) = [];
summary(Location)
     County General Hospital        38
     St. Mary's Medical Center      24
Location is a 62-by-1 categorical array that has two categories. Deleting the first element has no effect on other elements from the same category and does not delete the category itself.
Check for Undefined Data
Remove the category County General Hospital from Location.
Location = removecats(Location,'County General Hospital');
Display the first eight elements of the categorical array Location.
Location(1:8)
ans = 8x1 categorical
St. Mary's Medical Center
<undefined>
St. Mary's Medical Center
St. Mary's Medical Center
<undefined>
<undefined>
St. Mary's Medical Center
St. Mary's Medical Center
After removing the category County General Hospital, elements that previously belonged to that category no longer belong to any category defined for Location. Categorical arrays denote these elements as undefined.
Use the isundefined function to find observations that do not belong to any category.
undefinedIndex = isundefined(Location);
undefinedIndex is a 62-by-1 logical array containing logical true (1) for all undefined elements in Location.
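As a minimal sketch of what can be done with such an index, the code below counts the undefined elements and finds their positions. The array loc and the result names are hypothetical; the functions isundefined, nnz, and find are standard MATLAB.

```matlab
% Hypothetical array in which removing a category leaves undefined elements.
loc = categorical({'St. Mary''s Medical Center';'County General Hospital'; ...
                   'St. Mary''s Medical Center';'County General Hospital'});
loc = removecats(loc,'County General Hospital');

undefinedIdx = isundefined(loc);   % logical vector, true where undefined
numUndefined = nnz(undefinedIdx);  % how many elements are undefined
positions = find(undefinedIdx);    % where they are
```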
Set Undefined Elements
Use the summary function to print the number of undefined elements in Location.
summary(Location)
     St. Mary's Medical Center      24
     <undefined>                    38
The first element of Location belongs to the category St. Mary's Medical Center. Set the first element to be undefined so that it no longer belongs to any category.
Location(1) = '<undefined>';
summary(Location)
     St. Mary's Medical Center      23
     <undefined>                    39
You can make selected elements undefined without removing a category or changing the categories of other elements. Set elements to be undefined to indicate elements with values that are unknown.
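An alternative to assigning the text '<undefined>' is to assign the generic missing value. This is a sketch under the assumption that your release provides the missing function (believed to be R2016b or later); the array loc is hypothetical.

```matlab
% Hypothetical two-category array.
loc = categorical({'St. Mary''s Medical Center';'VA Hospital'; ...
                   'St. Mary''s Medical Center'});

% Assumption: missing is available and converts to <undefined> on assignment.
loc(2) = missing;

isUndef = isundefined(loc);   % true only for the second element
cats = categories(loc);       % both categories are still defined
```

The category list is unchanged by the assignment, which matches the behavior described above for '<undefined>'.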
Preallocate Categorical Arrays with Undefined Elements
You can use undefined elements to preallocate the size of a categorical array for better performance. Create a categorical array that has elements with known locations only.
definedIndex = ~isundefined(Location);
newLocation = Location(definedIndex);
summary(newLocation)
St. Mary's Medical Center 23
Expand the size of newLocation so that it is a 200-by-1 categorical array. Set the last new element to be undefined. All of the other new elements are also set to be undefined. The 23 original elements keep the values they had.
newLocation(200) = '<undefined>';
summary(newLocation)
     St. Mary's Medical Center      23
     <undefined>                   177
newLocation has room for values you plan to store in the array later.
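A minimal sketch of filling the preallocated space later is shown below, using a hypothetical five-element array rather than the 200-element one. It assumes the array is not protected, in which case assigning a value that is not yet a category adds that category.

```matlab
% Hypothetical array with two known elements.
loc = categorical({'St. Mary''s Medical Center';'St. Mary''s Medical Center'});

% Preallocate to five elements; elements 3 through 5 become undefined.
loc(5) = '<undefined>';

% Store a value in one of the preallocated slots.
% Assumption: loc is not protected, so a new category is added as needed.
loc(3) = 'VA Hospital';

numStillUndefined = nnz(isundefined(loc));   % slots not yet filled
cats = categories(loc);
```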
See Also: any | categorical | categories | histogram | isundefined | removecats | summary