In supervised learning, it is expected that points with similar predictor values $x_i$ naturally have close response (target) values $y_i$. In Gaussian processes, the covariance function expresses this similarity [1]. It specifies the covariance between the two latent variables $f(x_i)$ and $f(x_j)$, where both $x_i$ and $x_j$ are d-by-1 vectors. In other words, it determines how the response at one point $x_i$ is affected by responses at other points $x_j$, $i \neq j$, $i = 1, 2, ..., n$. The covariance function can be defined by various kernel functions. It can be parameterized in terms of the kernel parameters in a vector $\theta$. Hence, it is possible to express the covariance function as $k(x_i, x_j \mid \theta)$.
For many standard kernel functions, the kernel parameters are based on the signal standard deviation $\sigma_f$ and the characteristic length scale $\sigma_l$. The characteristic length scale defines, roughly, how far apart the input values $x_i$ can be for the response values to become uncorrelated. Both $\sigma_l$ and $\sigma_f$ need to be greater than 0, and this can be enforced by the unconstrained parametrization vector $\theta$, such that

$$\theta_1 = \log \sigma_l, \qquad \theta_2 = \log \sigma_f.$$
The built-in kernel (covariance) functions with the same length scale for each predictor are:
Squared Exponential Kernel
This is one of the most commonly used covariance functions and is the default option for fitrgp. The squared exponential kernel function is defined as

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \exp\!\left(-\frac{1}{2}\frac{(x_i - x_j)^T (x_i - x_j)}{\sigma_l^2}\right),$$

where $\sigma_l$ is the characteristic length scale and $\sigma_f$ is the signal standard deviation.
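The following MATLAB sketch shows a call to fitrgp that relies on this default kernel; the data x and y are placeholders invented for illustration.

rng(0);                          % for reproducibility
x = linspace(0,10,50)';          % 50-by-1 predictor vector (hypothetical data)
y = sin(x) + 0.2*randn(50,1);    % noisy responses
gprMdl = fitrgp(x,y);            % 'KernelFunction' defaults to 'squaredexponential'
ypred = predict(gprMdl,x);       % predictions at the training points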
Exponential Kernel
You can specify the exponential kernel function using the 'KernelFunction','exponential' name-value pair argument. This covariance function is defined by

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \exp\!\left(-\frac{r}{\sigma_l}\right),$$

where $\sigma_l$ is the characteristic length scale and

$$r = \sqrt{(x_i - x_j)^T (x_i - x_j)}$$

is the Euclidean distance between $x_i$ and $x_j$.
Matern 3/2
You can specify the Matern 3/2 kernel function using the 'KernelFunction','matern32' name-value pair argument. This covariance function is defined by

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \left(1 + \frac{\sqrt{3}\,r}{\sigma_l}\right) \exp\!\left(-\frac{\sqrt{3}\,r}{\sigma_l}\right),$$

where

$$r = \sqrt{(x_i - x_j)^T (x_i - x_j)}$$

is the Euclidean distance between $x_i$ and $x_j$.
Matern 5/2
You can specify the Matern 5/2 kernel function using the 'KernelFunction','matern52' name-value pair argument. The Matern 5/2 covariance function is defined as

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \left(1 + \frac{\sqrt{5}\,r}{\sigma_l} + \frac{5r^2}{3\sigma_l^2}\right) \exp\!\left(-\frac{\sqrt{5}\,r}{\sigma_l}\right),$$

where

$$r = \sqrt{(x_i - x_j)^T (x_i - x_j)}$$

is the Euclidean distance between $x_i$ and $x_j$.
Rational Quadratic Kernel
You can specify the rational quadratic kernel function using the 'KernelFunction','rationalquadratic' name-value pair argument. This covariance function is defined by

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \left(1 + \frac{r^2}{2\alpha\sigma_l^2}\right)^{-\alpha},$$

where $\sigma_l$ is the characteristic length scale, $\alpha$ is a positive-valued scale-mixture parameter, and

$$r = \sqrt{(x_i - x_j)^T (x_i - x_j)}$$

is the Euclidean distance between $x_i$ and $x_j$.
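As a sketch of selecting one of these built-in kernels, the call below (reusing the placeholder x and y from above) passes initial kernel parameter values through the 'KernelParameters' name-value pair argument; the initial values are illustrative, and the vector is assumed here to hold the characteristic length scale followed by the signal standard deviation.

sigmaL0 = 2;    % initial characteristic length scale (assumed value)
sigmaF0 = 1;    % initial signal standard deviation (assumed value)
gprMdl = fitrgp(x,y,'KernelFunction','matern52', ...
    'KernelParameters',[sigmaL0;sigmaF0]);
gprMdl.KernelInformation.KernelParameters   % estimated kernel parameters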
It is possible to use a separate length scale for each predictor m, m = 1, 2, ..., d. The built-in kernel (covariance) functions with a separate length scale for each predictor implement automatic relevance determination (ARD) [2]. The unconstrained parametrization in this case is

$$\theta_m = \log \sigma_m, \quad m = 1, 2, ..., d,$$
$$\theta_{d+1} = \log \sigma_f.$$

The built-in kernel (covariance) functions with a separate length scale for each predictor are:
ARD Squared Exponential Kernel
You can specify this kernel function using the 'KernelFunction','ardsquaredexponential' name-value pair argument. This covariance function is the squared exponential kernel function, with a separate length scale for each predictor. It is defined as

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \exp\!\left(-\frac{1}{2}\sum_{m=1}^{d}\frac{(x_{im} - x_{jm})^2}{\sigma_m^2}\right).$$
ARD Exponential Kernel
You can specify this kernel function using the 'KernelFunction','ardexponential' name-value pair argument. This covariance function is the exponential kernel function, with a separate length scale for each predictor. It is defined as

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \exp(-r),$$

where

$$r = \sqrt{\sum_{m=1}^{d}\frac{(x_{im} - x_{jm})^2}{\sigma_m^2}}.$$
ARD Matern 3/2
You can specify this kernel function using the 'KernelFunction','ardmatern32' name-value pair argument. This covariance function is the Matern 3/2 kernel function, with a different length scale for each predictor. It is defined as

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \left(1 + \sqrt{3}\,r\right) \exp\!\left(-\sqrt{3}\,r\right),$$

where

$$r = \sqrt{\sum_{m=1}^{d}\frac{(x_{im} - x_{jm})^2}{\sigma_m^2}}.$$
ARD Matern 5/2
You can specify this kernel function using the 'KernelFunction','ardmatern52' name-value pair argument. This covariance function is the Matern 5/2 kernel function, with a different length scale for each predictor. It is defined as

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \left(1 + \sqrt{5}\,r + \frac{5r^2}{3}\right) \exp\!\left(-\sqrt{5}\,r\right),$$

where

$$r = \sqrt{\sum_{m=1}^{d}\frac{(x_{im} - x_{jm})^2}{\sigma_m^2}}.$$
ARD Rational Quadratic Kernel
You can specify this kernel function using the 'KernelFunction','ardrationalquadratic' name-value pair argument. This covariance function is the rational quadratic kernel function, with a separate length scale for each predictor. It is defined as

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \left(1 + \frac{1}{2\alpha}\sum_{m=1}^{d}\frac{(x_{im} - x_{jm})^2}{\sigma_m^2}\right)^{-\alpha}.$$
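The following sketch fits an ARD kernel to hypothetical multi-predictor data; for d predictors, the initial kernel parameter vector is assumed to hold the d length scales followed by the signal standard deviation, and all initial values are illustrative.

rng(1);
X = randn(100,3);                   % 100 observations, d = 3 predictors (hypothetical data)
y = X(:,1) + 0.1*randn(100,1);      % response driven mainly by the first predictor
sigmaM0 = [1;1;1];                  % initial length scales, one per predictor (assumed values)
sigmaF0 = 1;                        % initial signal standard deviation (assumed value)
gprMdl = fitrgp(X,y,'KernelFunction','ardsquaredexponential', ...
    'KernelParameters',[sigmaM0;sigmaF0]);
gprMdl.KernelInformation.KernelParameters   % large length scales flag less relevant predictors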
You can specify the kernel function using the KernelFunction name-value pair argument in a call to fitrgp. You can either specify one of the built-in kernel options or specify a custom function. When providing the initial kernel parameter values for a built-in kernel function, input the initial values for the signal standard deviation and the characteristic length scale(s) as a numeric vector. When providing the initial kernel parameter values for a custom kernel function, input the initial values for the unconstrained parametrization vector $\theta$. fitrgp uses analytical derivatives to estimate parameters when using a built-in kernel function, whereas when using a custom kernel function it uses numerical derivatives.
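As a minimal sketch of a custom kernel, the function handle below writes a squared exponential kernel in terms of the unconstrained vector $\theta$, so the length scale and signal standard deviation are recovered as exp(theta(1)) and exp(theta(2)); x, y, and the initial values theta0 are placeholders reused from the earlier examples.

kfcn = @(XN,XM,theta) (exp(theta(2))^2) * ...
    exp(-(pdist2(XN,XM).^2)/(2*exp(theta(1))^2));   % pairwise kernel matrix between rows of XN and XM
theta0 = [0;0];                                     % initial theta = [log sigmaL; log sigmaF] (assumed values)
gprMdl = fitrgp(x,y,'KernelFunction',kfcn,'KernelParameters',theta0);

Because kfcn is a custom kernel function, fitrgp estimates theta with numerical rather than analytical derivatives, as noted above.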
[1] Rasmussen, C. E. and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press. Cambridge, Massachusetts, 2006.
[2] Neal, R. M. Bayesian Learning for Neural Networks. Springer, New York. Lecture Notes in Statistics, 118, 1996.