onehotencode

Encode data labels into one-hot vectors

    Description

    B = onehotencode(A,featureDim) encodes data labels in categorical array A into a one-hot encoded array B. Each element of A is replaced with a numeric vector of length equal to the number of unique classes in A along the dimension specified by featureDim. The vector contains a 1 in the position corresponding to the class of the label in A, and 0 in every other position. Any <undefined> values are encoded to NaN values.
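
    The handling of <undefined> values can be illustrated with a minimal sketch. The labels and variable names here are made up for illustration, and the example assumes that a missing string converts to an <undefined> categorical element.

```matlab
% Sketch: a categorical column vector in which one label is missing.
A = categorical(["cat"; "dog"; missing; "cat"]);   % third element is <undefined>

% Expand along dimension 2, which is singleton for a column vector.
B = onehotencode(A,2);

% Rows 1, 2, and 4 are one-hot vectors over {cat, dog};
% row 3 is all NaN because its label is <undefined>.
disp(B)
```

    Because NaN cannot be represented in integer or logical types, this sketch keeps the default floating-point output.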

    tblB = onehotencode(tblA) encodes categorical data labels in table tblA into a table of one-hot encoded numeric values. The single variable of tblA is replaced with as many variables as the number of unique classes in tblA. Each row in tblB contains a 1 in the variable corresponding to the class of the label in tblA and a 0 in all other variables.

    ___ = onehotencode(___,typename) encodes the labels into numeric values of data type typename.

    ___ = onehotencode(___,'ClassNames',classes) also specifies the names of the classes to use for encoding. Use this syntax when A or tblA does not contain categorical values, when you want to exclude any class labels from being encoded, or when you want to encode the vector elements in a specific order. Any label in A or tblA of a class that does not exist in classes is encoded to a vector of NaN values.
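
    As a small sketch of the 'ClassNames' syntax with non-categorical input, the following encodes numeric labels in a chosen order. The label values and the ordering of classes are hypothetical and were picked only for illustration.

```matlab
% Sketch: numeric labels are not categorical, so the classes must be listed.
A = [3; 1; 2; 3];
classes = [3 2 1];   % hypothetical ordering: column 1 is class 3, column 3 is class 1

% Expand along dimension 2 and encode in the order given by classes.
B = onehotencode(A,2,'ClassNames',classes);

% Each row has its 1 in the column whose entry in classes equals the label.
disp(B)
```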

    Examples

    Encode a categorical vector of class labels into one-hot vectors representing the labels.

    Create a column vector of labels, where each row of the vector represents a single observation. Convert the labels to a categorical array.

    labels = ["red"; "blue"; "red"; "green"; "yellow"; "blue"];
    labels = categorical(labels);

    View the order of the categories.

    categories(labels)
    ans = 4×1 cell    
    'blue'       
    'green'      
    'red'        
    'yellow'     
    

    Encode the labels into one-hot vectors. Expand the labels into vectors in the second dimension to encode the classes.

    labels = onehotencode(labels,2)
    labels = 6×4    
         0     0     1     0
         1     0     0     0
         0     0     1     0
         0     1     0     0
         0     0     0     1
         1     0     0     0
    

    Each observation in labels is now a row vector with a 1 in the position corresponding to the category of the class label and 0 in all other positions. The classes are encoded in the same order as the list of categories, such that a 1 in position 1 represents the first category in the list, in this case, 'blue'.
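
    One way to check the correspondence between positions and categories is to map the position of each 1 back to the category list. The following is a minimal sketch; the variable names encoded, idx, cats, and recovered are illustrative and do not appear elsewhere on this page.

```matlab
% Sketch: recover the original labels from the position of the 1 in each row.
labels = categorical(["red"; "blue"; "red"; "green"; "yellow"; "blue"]);
encoded = onehotencode(labels,2);

% Column index of the maximum (the 1) in each row.
[~,idx] = max(encoded,[],2);

% Look up each index in the category list and rebuild a categorical vector.
cats = categories(labels);
recovered = categorical(cats(idx));

% True if the recovered labels match the originals.
isequal(recovered,labels)
```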

    One-hot encode a table of categorical values.

    Create a table of categorical data labels. Each row in the table holds a single observation.

    color = ["blue"; "red"; "blue"; "green"; "yellow"; "red"];
    color = categorical(color);
    color = table(color)
    color = 
        color 
        ______
    
        blue  
        red   
        blue  
        green 
        yellow
        red   
    

    One-hot encode the table of class labels.

    color = onehotencode(color)
    color = 
        blue    green    red    yellow
        ____    _____    ___    ______
    
         1        0       0       0   
         0        0       1       0   
         1        0       0       0   
         0        1       0       0   
         0        0       0       1   
         0        0       1       0   
    

    Each column of the table represents a class. The data labels are encoded with a 1 in the column of the corresponding class, and 0 everywhere else.

    If not all classes in the data are relevant, encode the data labels using only a subset of the classes.

    Create a row vector of data labels, where each column of the vector represents a single observation.

    pets = ["dog" "fish" "cat" "dog" "cat" "bird"];

    Define the list of classes to encode. These classes are a subset of those present in the observations.

    animalClasses = ["bird"; "cat"; "dog"];

    One-hot encode the observations into the first dimension. Specify the classes to encode.

    encPets = onehotencode(pets,1,"ClassNames",animalClasses)
    encPets = 3×6    
         0   NaN     0     0     0     1
         0   NaN     1     0     1     0
         1   NaN     0     1     0     0
    

    Observations of a class not present in the list of classes to encode are encoded to a vector of NaN values.

    Use onehotencode to encode a matrix of class labels, such as a semantic segmentation of an image.

    Define a simple 15-by-15 pixel segmentation matrix of class labels.

    A = "blue";
    B = "green";
    C = "black";
    
    A = repmat(A,8,15);
    B = repmat(B,7,5);
    C = repmat(C,7,5);
    
    seg = [A;B C B];

    Convert the segmentation matrix into a categorical array.

    seg = categorical(seg);

    One-hot encode the segmentation matrix into an array of type single. Expand the encoded labels into the third dimension.

    encSeg = onehotencode(seg,3,"single");

    Check the size of the encoded segmentation.

    size(encSeg)
    ans = 1×3    
        15    15     3
    

    The three possible classes of the pixels in the segmentation matrix are encoded as vectors in the third dimension.
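
    A quick sanity check on an encoding like this is that every pixel has exactly one 1 along the encoded dimension. The sketch below rebuilds the same segmentation in compact form; the variable isOneHot is an illustrative name.

```matlab
% Sketch: rebuild the 15-by-15 segmentation and encode it along dimension 3.
seg = [repmat("blue",8,15); repmat("green",7,5) repmat("black",7,5) repmat("green",7,5)];
seg = categorical(seg);
encSeg = onehotencode(seg,3,"single");

% Summing over the class dimension should give 1 at every pixel.
isOneHot = all(sum(encSeg,3) == 1,'all')

% The encoded array uses the requested data type.
class(encSeg)
```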

    If your data is a table that contains several types of class variables, you can encode each variable separately.

    Create a table of observations of several types of categorical data.

    color = ["blue"; "red"; "blue"; "green"; "yellow"; "red"];
    color = categorical(color);
    
    pets = ["dog"; "fish"; "cat"; "dog"; "cat"; "bird"];
    pets = categorical(pets);
    
    location = ["USA"; "CAN"; "CAN"; "USA"; "USA"; "USA"];
    location = categorical(location);
    
    data = table(color,pets,location)
    data = 
        color     pets    location
        ______    ____    ________
    
        blue      dog       USA   
        red       fish      CAN   
        blue      cat       CAN   
        green     dog       USA   
        yellow    cat       USA   
        red       bird      USA   
    
    

    Use a for-loop to one-hot encode each table variable and append it to a new table containing the encoded data.

    encData = table();
    
    for i=1:width(data)
        encData = [encData onehotencode(data(:,i))];
    end
    
    encData
    encData = 
        blue    green    red    yellow    bird    cat    dog    fish    CAN    USA
        ____    _____    ___    ______    ____    ___    ___    ____    ___    ___
    
         1        0       0       0        0       0      1      0       0      1 
         0        0       1       0        0       0      0      1       1      0 
         1        0       0       0        0       1      0      0       1      0 
         0        1       0       0        0       0      1      0       0      1 
         0        0       0       1        0       1      0      0       0      1 
         0        0       1       0        1       0      0      0       0      1 
    
    

    Each row of encData encodes the classes of the three categorical variables for a single observation.

    Input Arguments

    Array of data labels to encode, specified as a categorical array, a numeric array, or a string array.

    If A is a categorical array, the elements of the one-hot encoded vectors follow the order given by categories(A).

    If A is not a categorical array, you must specify the classes to encode using the 'ClassNames' name-value pair. The vectors are encoded in the order that the classes appear in classes.

    If A contains undefined values or values not present in classes, those values are encoded as a vector of NaN values. In this case, typename must be 'double' or 'single'.

    Data Types: categorical | numeric | string

    Table of data labels to encode, specified as a table. The table must contain a single variable and one row for each observation. Each entry must contain a categorical scalar, a numeric scalar, or a string scalar.

    If tblA contains categorical values, the elements of the one-hot encoded vectors follow the order of the categories; for example, that given by categories(tbl(1,n)).

    If tblA does not contain categorical values, you must specify the classes to encode using the 'ClassNames' name-value pair. The vectors are encoded in the order that the classes appear in classes.

    If tblA contains undefined values or values not present in classes, those values are encoded as NaN values. In this case, typename must be 'double' or 'single'.

    Data Types: table

    Dimension to expand to encode the labels, specified as a positive integer.

    featureDim must specify a singleton dimension of A, or be larger than n where n is the number of dimensions of A.
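
    The effect of featureDim on the output size can be seen in a short sketch. The labels and the variable names szDim1 and szDim3 are illustrative assumptions.

```matlab
% Sketch: a 1-by-4 categorical row vector with three classes.
A = categorical(["a" "b" "c" "a"]);

% Dimension 1 is singleton, so the classes expand down the rows.
szDim1 = size(onehotencode(A,1))

% Dimension 3 is larger than ndims(A), so the classes expand into pages.
szDim3 = size(onehotencode(A,3))

% Dimension 2 has length 4 (not singleton), so it is not a valid featureDim for A.
```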

    Data type of the encoded labels, specified as a character vector or a string scalar.

    If the classification label input is a categorical array, a numeric array, or a string array, then the encoded labels are returned as an array of data type typename.

    If the classification label input is a table, then the encoded labels are returned as a table where each entry has data type typename.

    Valid values of typename are floating point, signed and unsigned integer, and logical types.
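
    As a brief sketch of a non-floating-point output type, the following encodes fully defined labels as logical values. The labels are hypothetical; the input contains no undefined values, so no NaN values are needed.

```matlab
% Sketch: encode labels with no undefined values as a logical array.
A = categorical(["yes"; "no"; "yes"]);

% Categories are ordered {no, yes}, so column 1 is "no" and column 2 is "yes".
B = onehotencode(A,2,'logical');

class(B)
disp(B)
```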

    Example: 'int64'

    Data Types: char | string

    Classes to encode, specified as a cell array of character vectors, a string vector, a numeric vector, or a two-dimensional char array.

    If the input A or tblA does not contain categorical values, then you must specify classes. You can also use the classes argument to exclude any class labels from being encoded, or to encode the vector elements in a specific order.

    If A or tblA contains undefined values or values not present in classes, those values are encoded to a vector of NaN values. In this case, typename must be 'double' or 'single'.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | string | cell

    Output Arguments

    Encoded labels, returned as a numeric array.

    Encoded labels, returned as a table.

    Each row of tblB contains the one-hot encoded label for a single observation, in the same order as that provided in tblA. Each row contains a 1 in the variable corresponding to the class of the label in tblA and a 0 in all other variables.

    Introduced in R2020b