Superclasses: CompactRegressionGP
Gaussian process regression model class
RegressionGP is a Gaussian process regression (GPR) model. You can train a GPR model by using fitrgp. Using the trained model, you can:
- Predict responses for training data using resubPredict or for new predictor data using predict. You can also compute the prediction intervals.
- Compute the regression loss for training data using resubLoss or for new data using loss.
Create a RegressionGP object by using fitrgp.
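The workflow described above can be sketched as follows. The data set is synthetic and made up for illustration, and the variable names are assumptions rather than part of the class.

```matlab
% Synthetic 1-D regression data (assumed for illustration only).
rng(0);                                  % reproducible noise
X = linspace(0, 10, 100)';               % 100-by-1 predictor
y = sin(X) + 0.1*randn(100, 1);          % noisy response

% Train a GPR model with default settings.
gprMdl = fitrgp(X, y);

% Predictions and regression loss on the training data.
yfit   = resubPredict(gprMdl);
trainL = resubLoss(gprMdl);

% Predictions, standard deviations, and prediction intervals on new data.
Xnew = linspace(0, 10, 25)';
[ypred, ysd, yint] = predict(gprMdl, Xnew);

% Regression loss on new data with known responses.
ynew = sin(Xnew);
newL = loss(gprMdl, Xnew, ynew);
```

Each row of yint holds the lower and upper limits of the prediction interval for the corresponding row of Xnew.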
FitMethod — Method used to estimate the parameters
'none' | 'exact' | 'sd' | 'sr' | 'fic'
Method used to estimate the basis function coefficients, β; noise standard deviation, σ; and kernel parameters, θ, of the GPR model, stored as a character vector. It can be one of the following.
Fit Method | Description |
---|---|
'none' | No estimation. fitrgp uses the initial parameter values as the parameter values. |
'exact' | Exact Gaussian process regression. |
'sd' | Subset of data points approximation. |
'sr' | Subset of regressors approximation. |
'fic' | Fully independent conditional approximation. |
BasisFunction — Explicit basis function
'none' | 'constant' | 'linear' | 'pureQuadratic' | function handle
Explicit basis function used in the GPR model, stored as a character vector or a function handle. It can be one of the following. If n is the number of observations, the basis function adds the term H*β to the model, where H is the n-by-p basis matrix and β is a p-by-1 vector of basis coefficients.
Explicit Basis | Basis Matrix |
---|---|
'none' | Empty matrix. |
'constant' | $H = 1$ (n-by-1 vector of 1s, where n is the number of observations) |
'linear' | $H = [1, X]$ |
'pureQuadratic' | $H = [1, X, X_2]$, where $X_2$ is the n-by-d matrix whose (i,j) element is $x_{ij}^2$ |
Function handle | Function handle, hfcn, that fitrgp calls as H = hfcn(X), where X is an n-by-d matrix of predictors and H is an n-by-p matrix of basis functions. |
Data Types: char | function_handle
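To make the basis matrices concrete, the following sketch builds H by hand for a small made-up predictor matrix. It uses only base MATLAB, and the custom handle hfcn is a hypothetical example of a user-defined basis, not one shipped with the toolbox.

```matlab
% Small made-up predictor matrix: n = 3 observations, d = 2 predictors.
X = [1 2; 3 4; 5 6];
n = size(X, 1);

% 'constant': n-by-1 vector of ones.
Hconst = ones(n, 1);

% 'linear': a column of ones followed by the predictors.
Hlin = [ones(n, 1), X];

% 'pureQuadratic': ones, predictors, and element-wise squared predictors.
Hpq = [ones(n, 1), X, X.^2];

% Hypothetical custom basis: intercept plus an interaction term.
% A handle passed as BasisFunction must map n-by-d X to n-by-p H.
hfcn = @(X) [ones(size(X, 1), 1), X(:, 1).*X(:, 2)];
Hcustom = hfcn(X);
```

In each case the number of columns of H equals p, the length of the coefficient vector β.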
Beta — Estimated coefficients
Estimated coefficients for the explicit basis functions, stored as a vector. You can define the explicit basis function by using the BasisFunction name-value pair argument in fitrgp.
Data Types: double
Sigma — Estimated noise standard deviation
Estimated noise standard deviation of the GPR model, stored as a scalar value.
Data Types: double
CategoricalPredictors — Indices of categorical predictors
Categorical predictor indices, specified as a vector of positive integers. CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]).
Data Types: single | double
HyperparameterOptimizationResults — Cross-validation optimization of hyperparameters
BayesianOptimization object | table
This property is read-only.
Cross-validation optimization of hyperparameters, specified as a BayesianOptimization object or a table of hyperparameters and associated values. This property is nonempty if the 'OptimizeHyperparameters' name-value pair argument is nonempty when you create the model. The value of HyperparameterOptimizationResults depends on the setting of the Optimizer field in the HyperparameterOptimizationOptions structure when you create the model.
Value of Optimizer Field | Value of HyperparameterOptimizationResults |
---|---|
'bayesopt' (default) | Object of class BayesianOptimization |
'gridsearch' or 'randomsearch' | Table of hyperparameters used, observed objective function values (cross-validation loss), and rank of observations from lowest (best) to highest (worst) |
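As a rough illustration of the table above, the sketch below requests a random search so that the property holds a table rather than a BayesianOptimization object. The data are synthetic, and the small evaluation budget is an arbitrary choice made to keep the run short.

```matlab
% Synthetic data (assumed for illustration only).
rng(0);
X = linspace(0, 10, 60)';
y = sin(X) + 0.1*randn(60, 1);

% Ask for random search with a small budget and no plots or printed output.
opts = struct('Optimizer', 'randomsearch', ...
              'MaxObjectiveEvaluations', 5, ...
              'ShowPlots', false, ...
              'Verbose', 0);

gprMdl = fitrgp(X, y, ...
    'OptimizeHyperparameters', 'auto', ...
    'HyperparameterOptimizationOptions', opts);

% With 'randomsearch', the results are stored as a table.
results = gprMdl.HyperparameterOptimizationResults;
```

Leaving Optimizer at its default of 'bayesopt' would instead store a BayesianOptimization object in the same property.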
LogLikelihood — Maximized marginal log likelihood
scalar value | []
Maximized marginal log likelihood of the GPR model, stored as a scalar value if FitMethod is different from 'none'. If FitMethod is 'none', then LogLikelihood is empty.
If FitMethod is 'sd', 'sr', or 'fic', then LogLikelihood is the maximized approximation of the marginal log likelihood of the GPR model.
Data Types: double
ModelParameters — Parameters used for training
GPParams object
Parameters used for training the GPR model, stored as a GPParams object.
KernelFunction — Form of the covariance function
'squaredexponential' | 'matern32' | 'matern52' | 'ardsquaredexponential' | 'ardmatern32' | 'ardmatern52' | function handle
Form of the covariance function used in the GPR model, stored as a character vector containing the name of the built-in covariance function or a function handle. It can be one of the following.
Function | Description |
---|---|
'squaredexponential' | Squared exponential kernel. |
'matern32' | Matern kernel with parameter 3/2. |
'matern52' | Matern kernel with parameter 5/2. |
'ardsquaredexponential' | Squared exponential kernel with a separate length scale per predictor. |
'ardmatern32' | Matern kernel with parameter 3/2 and a separate length scale per predictor. |
'ardmatern52' | Matern kernel with parameter 5/2 and a separate length scale per predictor. |
Function handle | A function handle that fitrgp can call like this: Kmn = kfcn(Xm,Xn,theta), where Xm is an m-by-d matrix, Xn is an n-by-d matrix, and Kmn is an m-by-n matrix of kernel products such that Kmn(i,j) is the kernel product between Xm(i,:) and Xn(j,:). theta is the r-by-1 unconstrained parameter vector for kfcn. |
Data Types: char | function_handle
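The calling convention for a kernel function handle can be sketched as below. The kernel shown is a hand-written squared exponential whose parameterization (theta holding the logs of a length scale and a signal standard deviation) is an assumption chosen so that theta is unconstrained. The data and initial values are made up.

```matlab
% Hypothetical custom kernel: theta = [log(sigmaL); log(sigmaF)].
% exp() maps the unconstrained entries back to positive values.
kfcn = @(Xm, Xn, theta) exp(2*theta(2)) * ...
    exp(-pdist2(Xm, Xn).^2 / (2*exp(2*theta(1))));

% Check the calling convention on small made-up inputs.
rng(0);
Xm = rand(4, 2);                     % m-by-d
Xn = rand(3, 2);                     % n-by-d
theta0 = [log(1.5); log(0.8)];       % r-by-1 with r = 2
Kmn = kfcn(Xm, Xn, theta0);          % m-by-n matrix of kernel products

% Train with the custom kernel; initial kernel parameters are
% supplied through the 'KernelParameters' name-value argument.
X = rand(60, 2);
y = sin(3*X(:, 1)) + X(:, 2) + 0.05*randn(60, 1);
gprMdl = fitrgp(X, y, 'KernelFunction', kfcn, ...
    'KernelParameters', theta0);
```

After training, gprMdl.KernelFunction stores the function handle instead of a kernel name.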
KernelInformation — Information about the parameters of the kernel function
Information about the parameters of the kernel function used in the GPR model, stored as a structure with the following fields.
Field Name | Description |
---|---|
Name | Name of the kernel function |
KernelParameters | Vector of the estimated kernel parameters |
KernelParameterNames | Names associated with the elements of KernelParameters |
Data Types: struct
PredictMethod — Method used to make predictions
'exact' | 'bcd' | 'sd' | 'sr' | 'fic'
Method that predict uses to make predictions from the GPR model, stored as a character vector. It can be one of the following.
PredictMethod | Description |
---|---|
'exact' | Exact Gaussian process regression |
'bcd' | Block coordinate descent |
'sd' | Subset of data points approximation |
'sr' | Subset of regressors approximation |
'fic' | Fully independent conditional approximation |
Alpha — Weights
Weights used to make predictions from the trained GPR model, stored as a numeric vector. predict computes the predictions for a new predictor matrix Xnew by using the product $K(X_{new}, A)\,\alpha$, where $K(X_{new}, A)$ is the matrix of kernel products between $X_{new}$ and the active set vectors A, and α is a vector of weights.
Data Types: double
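The role of Alpha can be illustrated by forming the kernel product by hand and comparing it with predict. This is a sketch under several assumptions: the model uses the built-in squared exponential kernel with a constant basis and no standardization, and KernelParameters is ordered as length scale followed by signal standard deviation (check KernelParameterNames to confirm). The data are synthetic.

```matlab
% Synthetic data (assumed for illustration only).
rng(0);
X = linspace(0, 10, 80)';
y = sin(X) + 0.1*randn(80, 1);

gprMdl = fitrgp(X, y, ...
    'BasisFunction', 'constant', ...
    'KernelFunction', 'squaredexponential');

Xnew = linspace(0, 10, 15)';
A    = gprMdl.ActiveSetVectors;                   % active set (all of X for exact GPR)
kp   = gprMdl.KernelInformation.KernelParameters; % assumed order: [sigmaL; sigmaF]
sigmaL = kp(1);
sigmaF = kp(2);

% Squared exponential kernel products between Xnew and the active set.
K = sigmaF^2 * exp(-pdist2(Xnew, A).^2 / (2*sigmaL^2));

% Constant basis term plus the kernel product with the weights.
ymanual = gprMdl.Beta + K*gprMdl.Alpha;

% Compare with the library prediction.
ypred   = predict(gprMdl, Xnew);
maxDiff = max(abs(ymanual - ypred));
```

If the assumptions above hold, maxDiff should be close to zero. A large value suggests that the kernel form or parameter ordering differs from what this sketch assumes.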
BCDInformation — Information on BCD-based computation of Alpha
structure | []
Information on block coordinate descent (BCD)-based computation of Alpha when PredictMethod is 'bcd', stored as a structure containing the following fields.
Field Name | Description |
---|---|
Gradient | n-by-1 vector containing the gradient of the BCD objective function at convergence. |
Objective | Scalar containing the BCD objective function at convergence. |
SelectionCounts | n-by-1 integer vector indicating the number of times each point was selected into a block during BCD. |
The Alpha property contains the Alpha vector computed from BCD.
If PredictMethod is not 'bcd', then BCDInformation is empty.
Data Types: struct
ResponseTransform — Transformation applied to predicted response
'none' (default)
Transformation applied to the predicted response, stored as a character vector describing how the response values predicted by the model are transformed. In RegressionGP, ResponseTransform is 'none' by default, and RegressionGP does not use ResponseTransform when making predictions.
ActiveSetVectors — Subset of training data
Subset of training data used to make predictions from the GPR model, stored as a matrix.
predict computes the predictions for a new predictor matrix Xnew by using the product $K(X_{new}, A)\,\alpha$, where $K(X_{new}, A)$ is the matrix of kernel products between $X_{new}$ and the active set vectors A, and α is a vector of weights.
ActiveSetVectors is equal to the training data X for exact GPR fitting and a subset of the training data X for sparse GPR methods. When there are categorical predictors in the model, ActiveSetVectors contains dummy variables for the corresponding predictors.
Data Types: double
ActiveSetHistory — History of active set selection and parameter estimation
History of interleaved active set selection and parameter estimation for FitMethod equal to 'sd', 'sr', or 'fic', stored as a structure with the following fields.
Field Name | Description |
---|---|
ParameterVector | Cell array containing the parameter vectors: basis function coefficients, β; kernel function parameters, θ; and noise standard deviation, σ. |
ActiveSetIndices | Cell array containing the active set indices. |
Loglikelihood | Vector containing the maximized log likelihoods. |
CriterionProfile | Cell array containing the active set selection criterion values as the active set grows from size 0 to its final size. |
Data Types: struct
ActiveSetMethod — Method used to select the active set
'sgma' | 'entropy' | 'likelihood' | 'random'
Method used to select the active set for sparse methods ('sd', 'sr', or 'fic'), stored as a character vector. It can be one of the following.
ActiveSetMethod | Description |
---|---|
'sgma' | Sparse greedy matrix approximation |
'entropy' | Differential entropy-based selection |
'likelihood' | Subset of regressors log likelihood-based selection |
'random' | Random selection |
The selected active set is used in parameter estimation or prediction, depending on the choice of FitMethod and PredictMethod in fitrgp.
ActiveSetSize — Size of the active set
Size of the active set for sparse methods ('sd', 'sr', or 'fic'), stored as an integer value.
Data Types: double
IsActiveSetVector — Indicators for selected active set
Indicators for selected active set for making predictions from the trained GPR model, stored as a logical vector. These indicators mark the subset of training data that fitrgp selects as the active set. For example, if X is the original training data, then ActiveSetVectors = X(IsActiveSetVector,:).
Data Types: logical
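The relationship between the active set properties can be inspected on a sparse model, as in this sketch. The data are synthetic, and the active set size of 40 is an arbitrary choice for illustration.

```matlab
% Synthetic data with two numeric predictors (assumed for illustration only).
rng(0);
X = [linspace(0, 10, 300)', rand(300, 1)];
y = sin(X(:, 1)) + X(:, 2) + 0.1*randn(300, 1);

% Subset-of-data fit and prediction with a randomly selected active set.
gprMdl = fitrgp(X, y, ...
    'FitMethod', 'sd', ...
    'PredictMethod', 'sd', ...
    'ActiveSetMethod', 'random', ...
    'ActiveSetSize', 40);

A   = gprMdl.ActiveSetVectors;     % rows of X chosen as the active set
idx = gprMdl.IsActiveSetVector;    % logical indicator per training row

% Check the documented relationship ActiveSetVectors = X(IsActiveSetVector,:).
sameRows = isequal(A, X(idx, :));
```

For an exact fit, the same check would compare ActiveSetVectors against all of X.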
NumObservations — Number of observations in training data
Number of observations in training data, stored as a scalar value.
Data Types: double
X — Training data
Training data, stored as an n-by-d table or matrix, where n is the number of observations and d is the number of predictor variables (columns) in the training data. If the GPR model is trained on a table, then X is a table. Otherwise, X is a matrix.
Data Types: double | table
Y — Observed response values
Observed response values used to train the GPR model, stored as an n-by-1 vector, where n is the number of observations.
Data Types: double
PredictorNames — Names of predictors
Names of predictors used in the GPR model, stored as a cell array of character vectors. Each name (cell) corresponds to a column in X.
Data Types: cell
ExpandedPredictorNames — Names of expanded predictors
Names of expanded predictors for the GPR model, stored as a cell array of character vectors. Each name (cell) corresponds to a column in ActiveSetVectors.
If the model uses dummy variables for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables. Otherwise, ExpandedPredictorNames is the same as PredictorNames.
Data Types: cell
ResponseName — Name of the response variable
Name of the response variable in the GPR model, stored as a character vector.
Data Types: char
PredictorLocation — Means of predictors
1-by-d vector | []
Means of predictors used for training the GPR model if the training data is standardized, stored as a 1-by-d vector. If the training data is not standardized, PredictorLocation is empty.
If PredictorLocation is not empty, then the predict method centers the predictor values by subtracting the respective element of PredictorLocation from every column of X.
If there are categorical predictors, then PredictorLocation includes a 0 for each dummy variable corresponding to those predictors. The dummy variables are not centered or scaled.
Data Types: double
PredictorScale — Standard deviations of predictors
1-by-d vector | []
Standard deviations of predictors used for training the GPR model if the training data is standardized, stored as a 1-by-d vector. If the training data is not standardized, PredictorScale is empty.
If PredictorScale is not empty, the predict method scales the predictors by dividing every column of X by the respective element of PredictorScale (after centering using PredictorLocation).
If there are categorical predictors, then PredictorScale includes a 1 for each dummy variable corresponding to those predictors. The dummy variables are not centered or scaled.
Data Types: double
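The centering and scaling that predict applies can be reproduced by hand, as in the following sketch. It assumes a model trained with the 'Standardize' name-value argument set to true on synthetic, deliberately mis-scaled predictors. The new points are made up.

```matlab
% Synthetic predictors on very different scales (assumed for illustration only).
rng(0);
X = [10 + 5*randn(100, 1), 0.01*randn(100, 1)];
y = X(:, 1) + 100*X(:, 2) + 0.1*randn(100, 1);

% Standardize the training data during fitting.
gprMdl = fitrgp(X, y, 'Standardize', true);

mu = gprMdl.PredictorLocation;   % 1-by-d means
sd = gprMdl.PredictorScale;      % 1-by-d standard deviations

% Center and then scale new predictor values column by column.
% The subtraction and division rely on implicit expansion.
Xnew = [12, 0.005; 8, -0.01];
Xstd = (Xnew - mu) ./ sd;
```

You do not need to apply this transformation before calling predict; the sketch only shows what the two properties represent.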
RowsUsed — Indicators for rows used in training
logical vector | []
Indicators for rows used in training the GPR model, stored as a logical vector. If all rows are used in training the model, then RowsUsed is empty.
Data Types: logical
compact | Create compact Gaussian process regression model |
crossval | Cross-validate Gaussian process regression model |
loss | Regression error for Gaussian process regression model |
partialDependence | Compute partial dependence |
plotPartialDependence | Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots |
postFitStatistics | Compute post-fit statistics for the exact Gaussian process regression model |
predict | Predict response of Gaussian process regression model |
resubLoss | Resubstitution loss for a trained Gaussian process regression model |
resubPredict | Resubstitution prediction from a trained Gaussian process regression model |
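A few of the functions listed above can be exercised as in this sketch. The data are synthetic, and the choice of five folds is arbitrary.

```matlab
% Synthetic data (assumed for illustration only).
rng(0);
X = linspace(0, 10, 100)';
y = sin(X) + 0.1*randn(100, 1);
gprMdl = fitrgp(X, y);

% Compact model: drops the stored training data but can still predict.
cMdl  = compact(gprMdl);
ypred = predict(cMdl, [2.5; 7.5]);

% 5-fold cross-validation and the resulting cross-validated loss.
cvMdl  = crossval(gprMdl, 'KFold', 5);
cvLoss = kfoldLoss(cvMdl);

% Resubstitution loss for comparison; it is typically more optimistic.
rsLoss = resubLoss(gprMdl);
```

Comparing cvLoss with rsLoss gives a rough sense of how much the model overfits the training data.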
For subset of data, subset of regressors, or fully independent conditional approximation fitting methods (FitMethod equal to 'sd', 'sr', or 'fic'), if you do not provide the active set, fitrgp selects the active set and computes the parameter estimates in a series of iterations.
In the first iteration, the software uses the initial parameter values in vector η0 = [β0,σ0,θ0] to select an active set A1. It maximizes the GPR marginal log likelihood or its approximation using η0 as the initial values and A1 to compute the new parameter estimates η1. Next, it computes the new log likelihood L1 using η1 and A1.
In the second iteration, the software selects the active set A2 using the parameter values in η1. Then, using η1 as the initial values and A2, it maximizes the GPR marginal log likelihood or its approximation and estimates the new parameter values η2. Then, using η2 and A2, it computes the new log likelihood value L2.
The following table summarizes the iterations and what is computed at each iteration.
Iteration Number | Active Set | Parameter Vector | Log Likelihood |
---|---|---|---|
1 | A1 | η1 | L1 |
2 | A2 | η2 | L2 |
3 | A3 | η3 | L3 |
… | … | … | … |
The software iterates similarly for a specified number of repetitions. You can specify the number of repetitions for active set selection by using the NumActiveSetRepeats name-value pair argument.
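The interleaved procedure described above leaves a record in the ActiveSetHistory property, which the following sketch inspects. The data are synthetic, and the choices of 'sr' fitting, entropy-based selection, an active set of 30 points, and 3 repetitions are arbitrary. A non-random selection method is used on the assumption that repetitions are only meaningful when the selection depends on the current parameter values.

```matlab
% Synthetic data (assumed for illustration only).
rng(0);
X = linspace(0, 10, 400)';
y = sin(X) + 0.1*randn(400, 1);

% Sparse fit with interleaved active set selection and parameter estimation.
gprMdl = fitrgp(X, y, ...
    'FitMethod', 'sr', ...
    'ActiveSetMethod', 'entropy', ...
    'ActiveSetSize', 30, ...
    'NumActiveSetRepeats', 3);

hist = gprMdl.ActiveSetHistory;

% One entry per recorded iteration: parameter vector, active set
% indices, and the maximized log likelihood.
logL     = hist.Loglikelihood;
etaFinal = hist.ParameterVector{end};
idxFinal = hist.ActiveSetIndices{end};
```

Plotting logL against the iteration number is one way to see whether additional repetitions still change the fit.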
You can access the properties of this class using dot notation. For example, KernelInformation is a structure holding the kernel parameters and their names. Hence, to access the kernel function parameters of the trained model gprMdl, use gprMdl.KernelInformation.KernelParameters.
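A short sketch of dot-notation access follows. The model is trained on synthetic data made up for illustration.

```matlab
% Synthetic data (assumed for illustration only).
rng(0);
X = linspace(0, 10, 100)';
y = sin(X) + 0.1*randn(100, 1);
gprMdl = fitrgp(X, y, 'KernelFunction', 'squaredexponential');

% Nested access into the KernelInformation structure.
kInfo  = gprMdl.KernelInformation;
params = kInfo.KernelParameters;        % estimated kernel parameters
names  = kInfo.KernelParameterNames;    % one name per parameter

% Other properties are read the same way.
noiseSd = gprMdl.Sigma;
beta    = gprMdl.Beta;
```

The elements of names line up with the elements of params, so the two can be displayed side by side to label the estimates.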
Usage notes and limitations:
- The predict function supports code generation.
- When you train a Gaussian process regression model by using fitrgp and you supply training data in a table, the predictors must be numeric (double or single). Code generation does not support categorical predictors (logical, categorical, char, string, or cell). Also, you cannot use the 'CategoricalPredictors' name-value pair argument. To include categorical predictors in a model, preprocess the categorical predictors by using dummyvar before fitting the model.
For more information, see Introduction to Code Generation.
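The dummyvar preprocessing mentioned above might look like the following sketch. The data, the category labels, and the file name in the commented-out line are all made up for illustration.

```matlab
% Synthetic data: one numeric and one categorical predictor (assumed).
rng(0);
n    = 90;
xnum = rand(n, 1);
grp  = categorical(randi(3, n, 1), 1:3, {'a', 'b', 'c'});
y    = 2*xnum + double(grp) + 0.05*randn(n, 1);

% Expand the categorical predictor into numeric 0/1 indicator columns.
D = dummyvar(grp);          % n-by-3, one column per category

% Train on a purely numeric matrix, without 'CategoricalPredictors'.
X = [xnum, D];
gprMdl = fitrgp(X, y);

% New data must be encoded the same way before calling predict.
Xnew  = [0.5, 0, 1, 0];     % numeric value 0.5, category 'b'
ypred = predict(gprMdl, Xnew);

% A typical next step in a code generation workflow (hypothetical file name):
% saveLearnerForCoder(gprMdl, 'gprModelForCodegen');
```

Keeping the category order fixed, as done here by listing the categories explicitly, ensures that the indicator columns mean the same thing at training and prediction time.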