Automatic differentiation makes it easier to create custom training loops, custom layers, and other deep learning customizations.
Generally, the simplest way to customize deep learning training is to create a dlnetwork. Include the layers you want in the network. Then perform training in a custom loop by using some sort of gradient descent, where the gradient is the gradient of the objective function. The objective function can be classification error, cross-entropy, or any other relevant scalar function of the network weights. See List of Functions with dlarray Support.
This example is a high-level version of a custom training loop. Here, f is the objective function, such as loss, and g is the gradient of the objective function with respect to the weights in the network net. The update function represents some type of gradient descent.
% High-level training loop
n = 1;
while (n < nmax)
    [f,g] = dlfeval(@model,net,dlX,t);
    net = update(net,g);
    n = n + 1;
end
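The update function here is not a built-in function. One possible implementation is a plain gradient descent step; this sketch assumes a fixed learning rate and applies the step to each learnable parameter with dlupdate:

% Sketch of a possible update function (plain gradient descent);
% learnRate is an illustrative hyperparameter, not part of the original example
function net = update(net,g)
    learnRate = 0.01;
    net.Learnables = dlupdate(@(w,grad) w - learnRate*grad, net.Learnables, g);
end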
You call dlfeval to compute the numeric value of the objective and gradient. To enable the automatic computation of the gradient, the data dlX must be a dlarray.
dlX = dlarray(X);
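If the dimensions of X have meaning for the network, you can also label them when creating the dlarray. The layout here is an assumption, for the case that X is a batch of images:

% Assuming X is height-by-width-by-channel-by-observation image data
dlX = dlarray(X,"SSCB"); % S = spatial, C = channel, B = batch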
The objective function has a dlgradient call to calculate the gradient. The dlgradient call must be inside the function that dlfeval evaluates.
function [f,g] = model(net,dlX,T)
    % Calculate objective using supported functions for dlarray
    y = forward(net,dlX);
    f = fcnvalue(y,T);                % crossentropy or similar
    g = dlgradient(f,net.Learnables); % Automatic gradient
end
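Here fcnvalue stands for any supported scalar loss. As one concrete possibility for classification, assuming T holds one-hot encoded targets, the objective could apply softmax followed by crossentropy, both of which support dlarray:

function [f,g] = model(net,dlX,T)
    % Classification objective built from dlarray-supported functions (sketch)
    y = forward(net,dlX);
    y = softmax(y);                   % convert network outputs to probabilities
    f = crossentropy(y,T);            % scalar cross-entropy loss
    g = dlgradient(f,net.Learnables); % automatic gradient
end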
For an example using a dlnetwork with a simple dlfeval-dlgradient-dlarray syntax, see Grad-CAM Reveals the Why Behind Deep Learning Decisions. For a more complex example using a custom training loop, see Train Generative Adversarial Network (GAN). For further details on custom training using automatic differentiation, see Define Custom Training Loops, Loss Functions, and Networks.
Use dlgradient and dlfeval Together for Automatic Differentiation

To use automatic differentiation, you must call dlgradient inside a function and evaluate the function using dlfeval. Represent the point where you take a derivative as a dlarray object, which manages the data structures and enables tracing of the evaluation. For example, the Rosenbrock function is a common test function for optimization.
function [f,grad] = rosenbrock(x)
    f = 100*(x(2) - x(1).^2).^2 + (1 - x(1)).^2;
    grad = dlgradient(f,x);
end
Calculate the value and gradient of the Rosenbrock function at the point x0 = [-1,2]. To enable automatic differentiation in the Rosenbrock function, pass x0 as a dlarray.
x0 = dlarray([-1,2]);
[fval,gradval] = dlfeval(@rosenbrock,x0)
fval =
  1x1 dlarray
   104

gradval =
  1x2 dlarray
   396   200
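As a quick check, the analytic gradient of the Rosenbrock function is [-400*x1*(x2 - x1^2) - 2*(1 - x1), 200*(x2 - x1^2)], which evaluates to [396 200] at the point [-1,2] and matches the result from dlgradient:

% Analytic gradient at x0 = [-1,2] for comparison
x1 = -1; x2 = 2;
gradAnalytic = [-400*x1*(x2 - x1^2) - 2*(1 - x1), 200*(x2 - x1^2)]
% gradAnalytic = 396   200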
For an example using automatic differentiation, see Grad-CAM Reveals the Why Behind Deep Learning Decisions.
To evaluate a gradient numerically, a dlarray constructs a data structure for reverse mode differentiation, as described in Automatic Differentiation Background. This data structure is the trace of the derivative computation. Keep in mind these guidelines when using automatic differentiation and the derivative trace:
Do not introduce a new dlarray inside an objective function calculation and attempt to differentiate with respect to that object. For example:
function [dy,dy1] = fun(x1)
    x2 = dlarray(0);
    y = x1 + x2;
    dy = dlgradient(y,x2);  % Error: x2 is untraced
    dy1 = dlgradient(y,x1); % No error even though y has an untraced portion
end
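To differentiate with respect to such a variable, pass it into the function that dlfeval evaluates so that it is traced. A possible corrected version of this example:

% Both x1 and x2 are traced because they are inputs to the dlfeval call
[dy1,dy2] = dlfeval(@fun2,dlarray(1),dlarray(0))

function [dy1,dy2] = fun2(x1,x2)
    y = x1 + x2;
    [dy1,dy2] = dlgradient(y,x1,x2); % no error: both variables are traced
end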
Do not use extractdata with a traced argument. Doing so breaks the tracing. For example:
fun = @(x)dlgradient(x + atan(extractdata(x)),x);
% Gradient for any point is 1 due to the leading 'x' term in fun.
dlfeval(fun,dlarray(2.5))
ans =
  1x1 dlarray
     1
However, you can use extractdata to introduce a new independent variable from a dependent one.
Use only supported functions. See List of Functions with dlarray Support. To use an unsupported function f, try to implement f using supported functions.
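For instance, if a loss you want is not in the list, you can often compose it from supported elementwise operations. A sketch of a Huber-style loss built from abs, min, sum, and elementwise arithmetic (the name huberLoss and the threshold delta are illustrative):

function f = huberLoss(y,t,delta)
    % Quadratic for small residuals, linear beyond delta
    r = abs(y - t);
    rq = min(r,delta);
    f = sum(0.5*rq.^2 + delta*(r - rq),"all");
end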
You can evaluate gradients using automatic differentiation only for scalar-valued functions. Intermediate calculations can have any number of variables, but the final function value must be scalar. If you need to take derivatives of a vector-valued function, take derivatives of one component at a time. In this case, consider setting the dlgradient 'RetainData' name-value pair argument to true.
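For example, this sketch builds the Jacobian of a two-component function one row at a time, setting 'RetainData' to true on the first call so that the trace survives for the second call (the function jacobianRows is illustrative):

x0 = dlarray([2 3]);
J = dlfeval(@jacobianRows,x0)

function J = jacobianRows(x)
    y = [x(1).^2; x(1).*x(2)];                   % vector-valued result
    row1 = dlgradient(y(1),x,'RetainData',true); % derivative of first component
    row2 = dlgradient(y(2),x);                   % derivative of second component
    J = [row1; row2];
end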
A call to dlgradient evaluates derivatives at a particular point. The software generally makes an arbitrary choice for the value of a derivative when there is no theoretical value. For example, the relu function, relu(x) = max(x,0), is not differentiable at x = 0. However, dlgradient returns a value for the derivative.
x = dlarray(0);
y = dlfeval(@(t)dlgradient(relu(t),t),x)
y =
  1x1 dlarray
     0
The value at the nearby point eps is different.
x = dlarray(eps);
y = dlfeval(@(t)dlgradient(relu(t),t),x)
y =
  1x1 dlarray
     1
Currently, dlarray does not allow higher-order derivatives. In other words, you cannot calculate a second derivative by calling dlgradient twice.
See Also: dlarray | dlfeval | dlgradient | dlnetwork