In supervised learning, it is expected that points with similar predictor values $x_i$ naturally have close response (target) values $y_i$. In Gaussian processes, the covariance function expresses this similarity [1]. It specifies the covariance between the two latent variables $f(x_i)$ and $f(x_j)$, where both $x_i$ and $x_j$ are d-by-1 vectors. In other words, it determines how the response at one point $x_i$ is affected by responses at other points $x_j$, $i \neq j$, $i = 1, 2, ..., n$. The covariance function can be defined by various kernel functions. It can be parameterized in terms of the kernel parameters in a vector $\theta$. Hence, it is possible to express the covariance function as $k(x_i, x_j \mid \theta)$.
For many standard kernel functions, the kernel parameters are based on the signal standard deviation $\sigma_f$ and the characteristic length scale $\sigma_l$. The characteristic length scale defines, roughly, how far apart the input values $x_i$ can be for the response values to become uncorrelated. Both $\sigma_l$ and $\sigma_f$ need to be greater than 0, and this can be enforced by the unconstrained parametrization vector $\theta$, such that

$$\theta_1 = \log \sigma_l, \qquad \theta_2 = \log \sigma_f.$$
The built-in kernel (covariance) functions with the same length scale for each predictor are:
Squared Exponential Kernel
This is one of the most commonly used covariance functions and is the default option for fitrgp. The squared exponential kernel function is defined as

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \exp\!\left(-\frac{1}{2}\frac{(x_i - x_j)^T (x_i - x_j)}{\sigma_l^2}\right),$$

where $\sigma_l$ is the characteristic length scale and $\sigma_f$ is the signal standard deviation.
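The following MATLAB sketch shows a call to fitrgp that relies on this default kernel; the data x and y are placeholders invented for illustration.

rng(0);                          % for reproducibility
x = linspace(0,10,50)';          % 50-by-1 predictor vector (hypothetical data)
y = sin(x) + 0.2*randn(50,1);    % noisy responses
gprMdl = fitrgp(x,y);            % 'KernelFunction' defaults to 'squaredexponential'
ypred = predict(gprMdl,x);       % predictions at the training points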
Exponential Kernel
You can specify the exponential kernel function using the 'KernelFunction','exponential' name-value pair argument. This covariance function is defined by

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \exp\!\left(-\frac{r}{\sigma_l}\right),$$

where $\sigma_l$ is the characteristic length scale and

$$r = \sqrt{(x_i - x_j)^T (x_i - x_j)}$$

is the Euclidean distance between $x_i$ and $x_j$.
Matern 3/2
You can specify the Matern 3/2 kernel function using the 'KernelFunction','matern32' name-value pair argument. This covariance function is defined by

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \left(1 + \frac{\sqrt{3}\,r}{\sigma_l}\right) \exp\!\left(-\frac{\sqrt{3}\,r}{\sigma_l}\right),$$

where

$$r = \sqrt{(x_i - x_j)^T (x_i - x_j)}$$

is the Euclidean distance between $x_i$ and $x_j$.
Matern 5/2
You can specify the Matern 5/2 kernel function using the 'KernelFunction','matern52' name-value pair argument. The Matern 5/2 covariance function is defined as

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \left(1 + \frac{\sqrt{5}\,r}{\sigma_l} + \frac{5r^2}{3\sigma_l^2}\right) \exp\!\left(-\frac{\sqrt{5}\,r}{\sigma_l}\right),$$

where

$$r = \sqrt{(x_i - x_j)^T (x_i - x_j)}$$

is the Euclidean distance between $x_i$ and $x_j$.
Rational Quadratic Kernel
You can specify the rational quadratic kernel function using the 'KernelFunction','rationalquadratic' name-value pair argument. This covariance function is defined by

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \left(1 + \frac{r^2}{2\alpha\sigma_l^2}\right)^{-\alpha},$$

where $\sigma_l$ is the characteristic length scale, $\alpha$ is a positive-valued scale-mixture parameter, and

$$r = \sqrt{(x_i - x_j)^T (x_i - x_j)}$$

is the Euclidean distance between $x_i$ and $x_j$.
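As a sketch of selecting one of these built-in kernels, the call below (reusing the placeholder x and y from above) passes initial kernel parameter values through the 'KernelParameters' name-value pair argument; the initial values are illustrative, and the vector is assumed here to hold the characteristic length scale followed by the signal standard deviation.

sigmaL0 = 2;    % initial characteristic length scale (assumed value)
sigmaF0 = 1;    % initial signal standard deviation (assumed value)
gprMdl = fitrgp(x,y,'KernelFunction','matern52', ...
    'KernelParameters',[sigmaL0;sigmaF0]);
gprMdl.KernelInformation.KernelParameters   % estimated kernel parameters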
It is possible to use a separate length scale for each predictor m, m = 1, 2, ..., d. The built-in kernel (covariance) functions with a separate length scale for each predictor implement automatic relevance determination (ARD) [2]. The unconstrained parametrization in this case is

$$\theta_m = \log \sigma_m, \quad m = 1, 2, ..., d,$$
$$\theta_{d+1} = \log \sigma_f.$$

The built-in kernel (covariance) functions with a separate length scale for each predictor are:
ARD Squared Exponential Kernel
You can specify this kernel function using the 'KernelFunction','ardsquaredexponential' name-value pair argument. This covariance function is the squared exponential kernel function, with a separate length scale for each predictor. It is defined as

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \exp\!\left(-\frac{1}{2}\sum_{m=1}^{d}\frac{(x_{im} - x_{jm})^2}{\sigma_m^2}\right).$$
ARD Exponential Kernel
You can specify this kernel function using the 'KernelFunction','ardexponential' name-value pair argument. This covariance function is the exponential kernel function, with a separate length scale for each predictor. It is defined as

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \exp(-r),$$

where

$$r = \sqrt{\sum_{m=1}^{d}\frac{(x_{im} - x_{jm})^2}{\sigma_m^2}}.$$
ARD Matern 3/2
You can specify this kernel function using the 'KernelFunction','ardmatern32' name-value pair argument. This covariance function is the Matern 3/2 kernel function, with a different length scale for each predictor. It is defined as

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \left(1 + \sqrt{3}\,r\right) \exp\!\left(-\sqrt{3}\,r\right),$$

where

$$r = \sqrt{\sum_{m=1}^{d}\frac{(x_{im} - x_{jm})^2}{\sigma_m^2}}.$$
ARD Matern 5/2
You can specify this kernel function using the 'KernelFunction','ardmatern52' name-value pair argument. This covariance function is the Matern 5/2 kernel function, with a different length scale for each predictor. It is defined as

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \left(1 + \sqrt{5}\,r + \frac{5r^2}{3}\right) \exp\!\left(-\sqrt{5}\,r\right),$$

where

$$r = \sqrt{\sum_{m=1}^{d}\frac{(x_{im} - x_{jm})^2}{\sigma_m^2}}.$$
ARD Rational Quadratic Kernel
You can specify this kernel function using the 'KernelFunction','ardrationalquadratic' name-value pair argument. This covariance function is the rational quadratic kernel function, with a separate length scale for each predictor. It is defined as

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \left(1 + \frac{1}{2\alpha}\sum_{m=1}^{d}\frac{(x_{im} - x_{jm})^2}{\sigma_m^2}\right)^{-\alpha}.$$
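The following sketch fits an ARD kernel to hypothetical multi-predictor data; for d predictors, the initial kernel parameter vector is assumed to hold the d length scales followed by the signal standard deviation, and all initial values are illustrative.

rng(1);
X = randn(100,3);                   % 100 observations, d = 3 predictors (hypothetical data)
y = X(:,1) + 0.1*randn(100,1);      % response driven mainly by the first predictor
sigmaM0 = [1;1;1];                  % initial length scales, one per predictor (assumed values)
sigmaF0 = 1;                        % initial signal standard deviation (assumed value)
gprMdl = fitrgp(X,y,'KernelFunction','ardsquaredexponential', ...
    'KernelParameters',[sigmaM0;sigmaF0]);
gprMdl.KernelInformation.KernelParameters   % large length scales flag less relevant predictors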
You can specify the kernel function using the KernelFunction name-value pair argument in a call to fitrgp. You can either specify one of the built-in kernel options or specify a custom function. When providing the initial kernel parameter values for a built-in kernel function, input the initial values for the signal standard deviation and the characteristic length scale(s) as a numeric vector. When providing the initial kernel parameter values for a custom kernel function, input the initial values for the unconstrained parametrization vector $\theta$. fitrgp uses analytical derivatives to estimate parameters when using a built-in kernel function, whereas when using a custom kernel function it uses numerical derivatives.
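As a minimal sketch of a custom kernel, the function handle below writes a squared exponential kernel in terms of the unconstrained vector $\theta$, so the length scale and signal standard deviation are recovered as exp(theta(1)) and exp(theta(2)); x, y, and the initial values theta0 are placeholders reused from the earlier examples.

kfcn = @(XN,XM,theta) (exp(theta(2))^2) * ...
    exp(-(pdist2(XN,XM).^2)/(2*exp(theta(1))^2));   % pairwise kernel matrix between rows of XN and XM
theta0 = [0;0];                                     % initial theta = [log sigmaL; log sigmaF] (assumed values)
gprMdl = fitrgp(x,y,'KernelFunction',kfcn,'KernelParameters',theta0);

Because kfcn is a custom kernel function, fitrgp estimates theta with numerical rather than analytical derivatives, as noted above.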
[1] Rasmussen, C. E. and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press. Cambridge, Massachusetts, 2006.
[2] Neal, R. M. Bayesian Learning for Neural Networks. Springer, New York. Lecture Notes in Statistics, 118, 1996.