generateLearnerDataTypeFcn

Generate function that defines data types for fixed-point code generation

Description

To generate fixed-point C/C++ code for the predict function of a machine learning model, use generateLearnerDataTypeFcn, saveLearnerForCoder, loadLearnerForCoder, and codegen (MATLAB Coder).

  • After training a machine learning model, save the model using saveLearnerForCoder.

  • Create a structure that defines fixed-point data types by using the function generated by generateLearnerDataTypeFcn.

  • Define an entry-point function that loads the model by using both loadLearnerForCoder and the structure, and then calls the predict function.

  • Generate code using codegen, and then verify the generated code.

The generateLearnerDataTypeFcn function requires Fixed-Point Designer™, and generating fixed-point C/C++ code requires MATLAB® Coder™ and Fixed-Point Designer.

This flow chart shows the fixed-point code generation workflow for the predict function of a machine learning model. Use generateLearnerDataTypeFcn for the highlighted step.


generateLearnerDataTypeFcn(filename,X) generates a data type function that defines fixed-point data types for the variables required to generate fixed-point C/C++ code for prediction of a machine learning model. filename is the name of the MAT-file that stores the machine learning model, and X contains the predictor data for the predict function of the model.

Use the generated function to create a structure that defines fixed-point data types. Then, use the structure as the input argument T of loadLearnerForCoder.

generateLearnerDataTypeFcn(filename,X,Name,Value) specifies additional options by using one or more name-value pair arguments. For example, you can specify 'WordLength',32 to use 32-bit word length for the fixed-point data types.

Examples


After training a machine learning model, save the model using saveLearnerForCoder. For fixed-point code generation, specify the fixed-point data types of the variables required for prediction by using the data type function generated by generateLearnerDataTypeFcn. Then, define an entry-point function that loads the model by using both loadLearnerForCoder and the specified fixed-point data types, and calls the predict function of the model. Use codegen (MATLAB Coder) to generate fixed-point C/C++ code for the entry-point function, and then verify the generated code.

Before generating code using codegen, you can use buildInstrumentedMex (Fixed-Point Designer) and showInstrumentationResults (Fixed-Point Designer) to optimize the fixed-point data types to improve the performance of the fixed-point code. Record minimum and maximum values of named and internal variables for prediction by using buildInstrumentedMex. View the instrumentation results using showInstrumentationResults; then, based on the results, tune the fixed-point data type properties of the variables. For details regarding this optional step, see Fixed-Point Code Generation for Prediction of SVM.

Train Model

Load the ionosphere data set and train a binary SVM classification model.

load ionosphere
Mdl = fitcsvm(X,Y,'KernelFunction','gaussian');

Mdl is a ClassificationSVM model.

Save Model

Save the SVM classification model to the file myMdl.mat by using saveLearnerForCoder.

saveLearnerForCoder(Mdl,'myMdl');

Define Fixed-Point Data Types

Use generateLearnerDataTypeFcn to generate a function that defines the fixed-point data types of the variables required for prediction of the SVM model.

generateLearnerDataTypeFcn('myMdl',X)

generateLearnerDataTypeFcn generates the myMdl_datatype function.

Create a structure T that defines the fixed-point data types by using myMdl_datatype.

T = myMdl_datatype('Fixed')
T = struct with fields:
               XDataType: [0x0 embedded.fi]
           ScoreDataType: [0x0 embedded.fi]
    InnerProductDataType: [0x0 embedded.fi]

The structure T includes the fields for the named and internal variables required to run the predict function. Each field contains a fixed-point object, returned by fi (Fixed-Point Designer). The fixed-point object specifies fixed-point data type properties, such as word length and fraction length. For example, display the fixed-point data type properties of the predictor data.

T.XDataType
ans = 

[]

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 16
        FractionLength: 14

        RoundingMethod: Floor
        OverflowAction: Wrap
           ProductMode: FullPrecision
  MaxProductWordLength: 128
               SumMode: FullPrecision
      MaxSumWordLength: 128
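To make the RoundingMethod and OverflowAction properties concrete, the following sketch emulates a signed 16-bit type with a fraction length of 14 (the settings of T.XDataType shown above) by using plain MATLAB arithmetic, with no toolbox required. It is an illustration under those assumptions, not a substitute for fi or cast, and the names wrapInt and toFixed are hypothetical.

```matlab
% Emulate a signed fixed-point type with word length W and fraction length F,
% using RoundingMethod 'Floor' and OverflowAction 'Wrap' (illustration only).
W = 16;   % word length in bits
F = 14;   % fraction length in bits

% Wrap a stored integer into the signed W-bit (two's complement) range
wrapInt = @(k) mod(k + 2^(W-1), 2^W) - 2^(W-1);

% Scale by 2^F, round toward -Inf (Floor), wrap on overflow, then rescale
toFixed = @(x) wrapInt(floor(x .* 2^F)) ./ 2^F;

lsb      = 2^-F;                             % value of one stored-integer step
repRange = [-2^(W-1-F), 2^(W-1-F) - lsb];    % representable range

a = toFixed(0.1);    % 1638/16384, rounded down to a multiple of 2^-14
b = toFixed(-0.1);   % -1639/16384, Floor moves negative values away from zero
c = toFixed(2.5);    % -1.5, out of range, so the stored integer wraps
```

With these settings, 2.5 lies outside the representable range [-2, 2) and wraps to 2.5 - 4 = -1.5. This is why the software proposes fraction lengths that avoid overflow over the expected range of each variable.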

Define Entry-Point Function

Define an entry-point function named myFixedPointPredict that does the following:

  • Accept the predictor data X and the fixed-point data type structure T.

  • Load a fixed-point version of a trained SVM classification model by using both loadLearnerForCoder and the structure T.

  • Predict labels and scores using the loaded model.

type myFixedPointPredict.m % Display contents of myFixedPointPredict.m file
function [label,score] = myFixedPointPredict(X,T) %#codegen
Mdl = loadLearnerForCoder('myMdl','DataType',T);
[label,score] = predict(Mdl,X);
end

Note: If you open this example in MATLAB®, the example folder includes the entry-point function file.

Generate Code

The XDataType field of the structure T specifies the fixed-point data type of the predictor data. Convert X to the type specified in T.XDataType by using the cast (Fixed-Point Designer) function.

X_fx = cast(X,'like',T.XDataType);

Generate code for the entry-point function by using codegen. Specify X_fx as the first input argument and T as the second input argument, declared as a compile-time constant by using coder.Constant.

codegen myFixedPointPredict -args {X_fx,coder.Constant(T)}

codegen generates the MEX function myFixedPointPredict_mex with a platform-dependent extension.

Verify Generated Code

Pass predictor data to predict and myFixedPointPredict_mex to compare the outputs.

[labels,scores] = predict(Mdl,X);
[labels_fx,scores_fx] = myFixedPointPredict_mex(X_fx,T);

Compare the outputs from predict and myFixedPointPredict_mex.

verify_labels = isequal(labels,labels_fx)
verify_labels = logical
   1

isequal returns logical 1 (true), which means labels and labels_fx are equal. If the labels are not equal, you can compute the percentage of labels that differ between the two outputs as follows.

sum(strcmp(labels_fx,labels)==0)/numel(labels_fx)*100
ans = 0

Find the maximum of the relative differences between the score outputs.

relDiff_scores = max(abs((double(scores_fx(:,1))-scores(:,1))./scores(:,1)))
relDiff_scores = 0.0055
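The two comparison metrics above reduce to simple arithmetic. The following sketch applies the same expressions to small, made-up label and score vectors (hypothetical values, not results from this example) so that the numbers can be checked by hand. The fixed-point scores are represented as doubles, as double(scores_fx) would return them.

```matlab
% Hypothetical outputs for four observations (not results from this example)
labels    = {'g'; 'b'; 'g'; 'g'};   % floating-point predictions
labels_fx = {'g'; 'b'; 'b'; 'g'};   % fixed-point predictions
scores    = [0.50; -1.00; 0.20; 1.25];
scores_fx = [0.50; -0.99; 0.21; 1.25];

% Percentage of labels that differ between the two outputs
pctMismatch = sum(strcmp(labels_fx, labels) == 0) / numel(labels_fx) * 100;   % 25

% Maximum relative difference between the scores
relDiff = max(abs((scores_fx - scores) ./ scores));   % about 0.05
```

A relative difference grows large when a reference score is close to zero, so inspect the individual scores as well as the maximum.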

If you are not satisfied with the comparison results and want to improve the precision of the generated code, you can tune the fixed-point data types and regenerate the code. For details, see Tips in generateLearnerDataTypeFcn, Data Type Function, and Fixed-Point Code Generation for Prediction of SVM.

Input Arguments


filename

Name of the MATLAB formatted binary file (MAT-file) that contains the structure array representing a model object, specified as a character vector or string scalar.

You must create the filename file by using saveLearnerForCoder. The model in filename can be a decision tree, an ensemble of decision trees, or an SVM model, for classification or regression.

The extension of the filename file must be .mat. If filename has no extension, then generateLearnerDataTypeFcn appends .mat.

If filename does not include a full path, then generateLearnerDataTypeFcn loads the file from the current folder.

Example: 'myMdl'

Data Types: char | string

X

Predictor data for the predict function of the model stored in filename, specified as an n-by-p numeric matrix, where n is the number of observations and p is the number of predictor variables.

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: generateLearnerDataTypeFcn(filename,X,'OutputFunctionName','myDataTypeFcn','WordLength',32) generates a data type function named myDataTypeFcn that uses 32 bits for the word length when defining the fixed-point data type for each variable.

OutputFunctionName

Name of the generated function, specified as the comma-separated pair consisting of 'OutputFunctionName' and a character vector or string scalar. The 'OutputFunctionName' value must be a valid MATLAB function name.

The default function name is the file name in filename followed by _datatype. For example, if filename is myMdl, then the default function name is myMdl_datatype.
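The following sketch shows one way to derive the default name and to check that a name is valid. It assumes that any folder path and .mat extension in filename are dropped before _datatype is appended; that assumption, and the variable names, are illustrative only.

```matlab
% Hypothetical derivation of the default data type function name,
% assuming the folder path and the .mat extension are dropped first
filename = 'models/myMdl.mat';
[~, baseName] = fileparts(filename);   % 'myMdl'
defaultName = [baseName '_datatype'];  % 'myMdl_datatype'

% A function name follows the same identifier rules as a variable name
isValid = isvarname(defaultName);      % true
isBad   = isvarname('1stModel_dt');    % false, names cannot start with a digit
```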

Example: 'OutputFunctionName','myDataTypeFcn'

Data Types: char | string

WordLength

Word length in bits, specified as the comma-separated pair consisting of 'WordLength' and a numeric scalar. The default value is 16.

The generated data type function defines a fixed-point object for each variable using the specified 'WordLength' value. If a variable requires a longer word length than the specified value, the software doubles the word length for the variable.

The optimal word length depends on your target hardware properties. When the specified word length is longer than the longest word size of your target hardware, the generated code contains multiword operations.

For details, see Fixed-Point Data Types (Fixed-Point Designer).

Example: 'WordLength',32

Data Types: single | double

OutputRange

Range of the output argument of the predict function, specified as the comma-separated pair consisting of 'OutputRange' and a numeric vector of two elements (minimum and maximum values of the output).

The 'OutputRange' value specifies the range of predicted class scores for a classification model and the range of predicted responses for a regression model. The following tables list the output arguments for which you can specify the range by using the 'OutputRange' name-value pair argument.

Classification Model

Model                        predict Function of Model    Output Argument
Decision tree                predict                      score
Ensemble of decision trees   predict                      score
SVM                          predict                      score

Regression Model

Model                        predict Function of Model    Output Argument
Decision tree                predict                      Yfit
Ensemble of decision trees   predict                      Yfit
SVM                          predict                      yfit

When X contains a large number of observations and the range for the output argument is known, specify the 'OutputRange' value to reduce the amount of computation.

If you do not specify the 'OutputRange' value, then the software simulates the output range using the predictor data X and the predict function.

The software determines the span of numbers that the fixed-point data can represent by using the 'OutputRange' value and the 'PercentSafetyMargin' value.

Example: 'OutputRange',[0,1]

Data Types: single | double

PercentSafetyMargin

Safety margin percentage, specified as the comma-separated pair consisting of 'PercentSafetyMargin' and a numeric scalar. The default value is 10.

For each variable, the software simulates the range of the variable and adds the specified safety margin to determine the span of numbers that the fixed-point data can represent. Then, the software proposes the maximum fraction length that does not cause overflows.

Use caution when you specify the 'PercentSafetyMargin' value. If a variable range is large, then increasing the safety margin can cause underflow, because the software decreases fraction length to represent a larger range using a given word length.
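The following sketch models the proposal rule described above under simplifying assumptions: a signed type, a range whose largest magnitude is widened by the margin, and no other constraints. The helper proposeFL is hypothetical and is not the algorithm that generateLearnerDataTypeFcn uses. It does, however, reproduce the 16-bit, 14-fraction-bit XDataType for predictors in [-1, 1], and it shows how a larger margin on a large range can cause underflow.

```matlab
% Simplified model (an assumption, not the toolbox's algorithm): the largest
% fraction length for which a signed W-bit type covers the simulated range
% [lo, hi] widened by a percentage safety margin.
proposeFL = @(lo, hi, W, margin) ...
    W - 1 - (floor(log2(max(abs([lo hi])) * (1 + margin/100))) + 1);

% Predictor data in [-1, 1] with the default word length (16) and margin (10%)
flX16 = proposeFL(-1, 1, 16, 10);           % 14, as in T.XDataType

% Same data with a 32-bit word length
flX32 = proposeFL(-1, 1, 32, 10);           % 30

% A variable with a large range, with a 10% and then a 150% margin
flBig10  = proposeFL(-1000, 1000, 16, 10);  % 4
flBig150 = proposeFL(-1000, 1000, 16, 150); % 3

% Floor quantization of a small value under each fraction length
v   = 0.1;
q10  = floor(v * 2^flBig10)  / 2^flBig10;   % 0.0625
q150 = floor(v * 2^flBig150) / 2^flBig150;  % 0, the value underflows
```

In this model, a longer word length leaves more bits for the fraction, which matches the effect of increasing 'WordLength' that Tips describes.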

Example: 'PercentSafetyMargin',15

Data Types: single | double

More About


Data Type Function

Use the data type function generated by generateLearnerDataTypeFcn to create a structure that defines fixed-point data types for the variables required to generate fixed-point C/C++ code for prediction of a machine learning model. Use the output structure of the data type function as the input argument T of loadLearnerForCoder.

If filename is 'myMdl', then generateLearnerDataTypeFcn generates a data type function named myMdl_datatype. The myMdl_datatype function supports this syntax:

T = myMdl_datatype(dt)

T = myMdl_datatype(dt) returns a data type structure that defines data types for the variables required to generate fixed-point C/C++ code for prediction of a machine learning model.

Each field of T contains a fixed-point object returned by fi (Fixed-Point Designer). The input argument dt specifies the DataType property of the fixed-point object.

  • Specify dt as 'Fixed' (default) for fixed-point code generation.

  • Specify dt as 'Double' to simulate floating-point behavior of the fixed-point code.

Use the output structure T as the second input argument of loadLearnerForCoder.

The structure T contains the fields in the following table. These fields define the data types for the variables that directly influence the precision of the model. These variables, along with other named and internal variables, are required to run the predict function of the model.

Common fields for classification:

  • XDataType (input)

  • ScoreDataType (output or internal variable) and TransformedScoreDataType (output)

    • If you train a model using the default 'ScoreTransform' value of 'none' or 'identity' (that is, you do not transform predicted scores), then the ScoreDataType field influences the precision of the output scores.

    • If you train a model using a value of 'ScoreTransform' other than 'none' or 'identity' (that is, you do transform predicted scores), then the ScoreDataType field influences the precision of the internal untransformed scores. The TransformedScoreDataType field influences the precision of the transformed output scores.

Common fields for regression:

  • XDataType (input)

  • YFitDataType (output)

Additional fields for an ensemble of decision trees:

  • WeakLearnerOutputDataType (internal variable): Data type for outputs from weak learners.

  • AggregatedLearnerWeightsDataType (internal variable): Data type for a weighted aggregate of the outputs from weak learners, applicable only if you train a model using bagging ('Method','bag'). The software computes predicted scores (ScoreDataType) by dividing the aggregate by the sum of learner weights.

Additional fields for SVM:

  • XnormDataType (internal variable), applicable only if you train a model using 'Standardize' or 'KernelScale'

  • InnerProductDataType (internal variable)
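For a bagged ensemble, WeakLearnerOutputDataType, AggregatedLearnerWeightsDataType, and ScoreDataType cover three stages of one computation. A minimal numeric sketch, with hypothetical weak-learner outputs and weights for a single observation and class:

```matlab
% Hypothetical weak learner outputs and learner weights for three trees
weakOutputs    = [0.2 0.6 1.0];   % held in WeakLearnerOutputDataType
learnerWeights = [1 1 2];

% Weighted aggregate of the weak learner outputs
aggregate = sum(learnerWeights .* weakOutputs);   % 2.8

% Predicted score: the aggregate divided by the sum of learner weights
score = aggregate / sum(learnerWeights);          % 0.7
```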

The software proposes the maximum fraction length that does not cause overflows, based on the default word length (16) and safety margin (10%) for each variable.

The following code shows the data type function myMdl_datatype, generated by generateLearnerDataTypeFcn when filename is 'myMdl' and the model in the filename file is an SVM classifier.

function T = myMdl_datatype(dt)

if nargin < 1
	dt = 'Fixed';
end

% Set fixed-point math settings
fm = fimath('RoundingMethod','Floor', ...
    'OverflowAction','Wrap', ...
    'ProductMode','FullPrecision', ...
    'MaxProductWordLength',128, ...
    'SumMode','FullPrecision', ...
    'MaxSumWordLength',128);

% Data type for predictor data
T.XDataType = fi([],true,16,14,fm,'DataType',dt);

% Data type for output score
T.ScoreDataType = fi([],true,16,14,fm,'DataType',dt);

% Internal variables
% Data type of the squared distance dist = (x-sv)^2 for the Gaussian kernel G(x,sv) = exp(-dist),
% where x is the predictor data for an observation and sv is a support vector
T.InnerProductDataType = fi([],true,16,6,fm,'DataType',dt);

end
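The following sketch computes, in plain MATLAB, the representable range of the types in myMdl_datatype and the squared distance that InnerProductDataType must hold for the Gaussian kernel described in the comment. The observation x and support vector sv are hypothetical, and the sketch ignores kernel scale.

```matlab
% Representable range of a signed fixed-point type with word length W and
% fraction length F (binary-point scaling)
fxRange = @(W, F) [-2^(W-1-F), 2^(W-1-F) - 2^-F];

xRange  = fxRange(16, 14);   % [-2, 2 - 2^-14], XDataType and ScoreDataType
ipRange = fxRange(16, 6);    % [-512, 512 - 2^-6], InnerProductDataType

% Squared distance and Gaussian kernel value for one observation x and one
% support vector sv (hypothetical values, no kernel scale)
x    = [0.5 -0.25  1.0];
sv   = [0.0  0.25 -1.0];
dist = sum((x - sv).^2);     % 0.25 + 0.25 + 4 = 4.5
G    = exp(-dist);

% dist must fit in InnerProductDataType, otherwise the value would wrap
fits = dist >= ipRange(1) && dist <= ipRange(2);   % true
```

The fraction length of 6 trades resolution (2^-6) for a much wider range than XDataType, because squared distances summed over all predictors can be far larger than the predictor values themselves.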

Tips

  • To improve the precision of the generated fixed-point code, you can tune the fixed-point data types. Modify the fixed-point data types by updating the data type function (myMdl_datatype) and creating a new structure, and then regenerate the code using the new structure. You can update the myMdl_datatype function in one of two ways:

    • Regenerate the myMdl_datatype function by using generateLearnerDataTypeFcn and its name-value pair arguments.

      • Increase the word length by using the 'WordLength' name-value pair argument.

      • Decrease the safety margin by using the 'PercentSafetyMargin' name-value pair argument.

      If you increase the word length or decrease the safety margin, the software can propose a longer fraction length and therefore improve the precision of the generated code for the given data set.

    • Manually modify the fixed-point data types in the function file (myMdl_datatype.m). For each variable, you can tune the word length and fraction length and specify fixed-point math settings using a fimath (Fixed-Point Designer) object.

  • In the generated fixed-point code, a large number of operations or a large variable range can result in loss of precision, compared to the precision of the corresponding floating-point code. When training an SVM model, keep the following tips in mind to avoid loss of precision in the generated fixed-point code:

    • Data standardization ('Standardize'): To avoid overflows in the support vector values stored in an SVM model, you can standardize the predictor data. Instead of using the 'Standardize' name-value pair argument when training the model, standardize the predictor data before passing the data to the fitting function and the predict function so that the fixed-point code does not include the operations for the standardization.

    • Kernel function ('KernelFunction'): Using the Gaussian kernel or linear kernel is preferable to using a polynomial kernel. A polynomial kernel has higher computational complexity than the other kernels, and the output of a polynomial kernel function is unbounded.

    • Kernel scale ('KernelScale'): Using a kernel scale requires additional operations if the value of 'KernelScale' is not 1.

    • The prediction of a one-class classification problem might have loss of precision if the predicted class score values have a large range.
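The standardization tip can be sketched as follows: compute the means and standard deviations once from the training data, standardize before calling the fitting function, and apply the same parameters to new data before calling predict. The data, and the Ytrain variable in the commented toolbox calls, are hypothetical.

```matlab
% Hypothetical training data and new data (rows are observations)
Xtrain = [1 10; 2 20; 3 30; 4 40];
Xnew   = [2.5 25];

% Compute the standardization parameters once, from the training data
mu    = mean(Xtrain);    % column means
sigma = std(Xtrain);     % column standard deviations

% Standardize outside the model, so the generated fixed-point code
% does not contain the standardization operations
XtrainStd = (Xtrain - mu) ./ sigma;
XnewStd   = (Xnew - mu) ./ sigma;    % reuse the training mu and sigma

% Then, for example (requires Statistics and Machine Learning Toolbox):
% Mdl = fitcsvm(XtrainStd,Ytrain,'KernelFunction','gaussian');
% [label,score] = predict(Mdl,XnewStd);
```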

Compatibility Considerations


Behavior changed in R2020a

See Also

loadLearnerForCoder | saveLearnerForCoder | buildInstrumentedMex (Fixed-Point Designer) | fi (Fixed-Point Designer) | showInstrumentationResults (Fixed-Point Designer) | codegen (MATLAB Coder)

Introduced in R2019b