Define Custom Training Loops, Loss Functions, and Networks

For most deep learning tasks, you can use a pretrained network and adapt it to your own data. For an example showing how to use transfer learning to retrain a convolutional neural network to classify a new set of images, see Train Deep Learning Network to Classify New Images. Alternatively, you can create and train networks from scratch using layerGraph objects with the trainNetwork and trainingOptions functions.

If the trainingOptions function does not provide the training options that you need for your task, then you can create a custom training loop using automatic differentiation. To learn more, see Define Custom Training Loops.

If Deep Learning Toolbox™ does not provide the layers you need for your task (including output layers that specify loss functions), then you can create a custom layer. To learn more, see Define Custom Deep Learning Layers. For loss functions that cannot be specified using an output layer, you can specify the loss in a custom training loop. To learn more, see Specify Loss Functions. For networks that cannot be created using layer graphs, you can define custom networks as a function. To learn more, see Define Custom Networks.

Custom training loops, loss functions, and networks use automatic differentiation to automatically compute the model gradients. To learn more, see Automatic Differentiation Background.

Define Custom Training Loops

For most tasks, you can control the training algorithm details using the trainingOptions and trainNetwork functions. If the trainingOptions function does not provide the options you need for your task (for example, a custom learn rate schedule), then you can define your own custom training loop using automatic differentiation.

For an example showing how to train a network with a custom learn rate schedule, see Train Network Using Custom Training Loop.
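
For example, a time-based decay schedule can be computed directly inside the loop. The following is a minimal sketch; initialLearnRate, decay, and iteration are assumed to be variables that you define in your own training loop:

    % Time-based decay: reduce the learning rate as training progresses.
    learnRate = initialLearnRate/(1 + decay*iteration);

Pass the resulting learnRate value to the update function (for example, sgdmupdate or adamupdate) at each iteration.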

Update Learnable Parameters Using Automatic Differentiation

To update the learnable parameters, you must first calculate the gradients of the loss with respect to the learnable parameters.

Create a function of the form gradients = modelGradients(dlnet,dlX,dlT), where dlnet is the network, dlX contains the input predictors, dlT contains the targets, and gradients contains the returned gradients. Optionally, you can pass extra arguments to the model gradients function (for example, if the loss function requires extra information), or return extra arguments (for example, metrics for plotting the training progress). For models defined as a function, you do not need to pass a network as an input argument.

To use automatic differentiation, call dlgradient inside the model gradients function to compute the gradients of the loss, and evaluate the model gradients function using dlfeval. These functions operate on dlarray objects, which manage the underlying data and enable tracing of the operations that they evaluate.
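
For example, a model gradients function for a single-label classification network might look like the following sketch, which also returns the loss as an extra output for monitoring. The softmax call assumes that the dlnetwork object does not already end with a softmax operation:

    function [gradients,loss] = modelGradients(dlnet,dlX,dlT)
        % Forward pass through the network.
        dlY = forward(dlnet,dlX);
        dlY = softmax(dlY);

        % Compute the loss and the gradients of the loss with respect
        % to the learnable parameters.
        loss = crossentropy(dlY,dlT);
        gradients = dlgradient(loss,dlnet.Learnables);
    end

Because dlgradient requires traced dlarray inputs, evaluate this function using dlfeval rather than calling it directly.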

To update the network weights, you can use the following functions:

Function         Description
adamupdate       Update parameters using adaptive moment estimation (Adam)
rmspropupdate    Update parameters using root mean squared propagation (RMSProp)
sgdmupdate       Update parameters using stochastic gradient descent with momentum (SGDM)
dlupdate         Update parameters using custom function
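
For example, a single iteration of a custom training loop that uses Adam might look like the following sketch, where dlX and dlT are dlarray mini-batches, iteration is the current iteration number, and averageGrad and averageSqGrad are initialized to empty arrays before the loop:

    % Evaluate the model gradients and loss using dlfeval.
    [gradients,loss] = dlfeval(@modelGradients,dlnet,dlX,dlT);

    % Update the network parameters using the Adam optimizer.
    [dlnet,averageGrad,averageSqGrad] = adamupdate(dlnet,gradients, ...
        averageGrad,averageSqGrad,iteration);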

For an example showing how to create a model gradients function to train a generative adversarial network (GAN) that generates images, see Train Generative Adversarial Network (GAN).

Specify Loss Functions

When using dlnetwork objects, do not use output layers. Instead, calculate the loss manually in the model gradients function. You can use the following functions to compute the loss:

Function         Description
softmax          The softmax activation operation applies the softmax function to the channel dimension of the input data.
sigmoid          The sigmoid activation operation applies the sigmoid function to the input data.
crossentropy     The cross-entropy operation computes the cross-entropy loss between network predictions and target values for single-label and multi-label classification tasks.
mse              The half mean squared error operation computes the half mean squared error loss between network predictions and target values for regression tasks.
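
For example, for a regression task, you can compute the half mean squared error loss inside the model gradients function with a sketch like the following, assuming dlT contains the regression responses:

    % Forward pass, then half mean squared error loss for regression.
    dlY = forward(dlnet,dlX);
    loss = mse(dlY,dlT);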

Alternatively, you can use a custom loss function by creating a function of the form loss = myLoss(Y,T), where Y contains the network predictions, T contains the targets, and loss is the returned loss.
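
For example, the following sketch shows a hypothetical weighted cross-entropy loss. It assumes that Y contains softmax probabilities and T contains one-hot encoded targets, both of size numClasses-by-numObservations, and that classWeights is a numClasses-by-1 vector of class weights:

    function loss = myLoss(Y,T,classWeights)
        % Weighted cross-entropy loss (sketch), averaged over the observations.
        numObservations = size(T,2);
        loss = -sum(classWeights(:).*T.*log(Y),'all')/numObservations;
    end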

Use the loss value when computing gradients for updating the network weights.

For an example showing how to create a model gradients function to train a generative adversarial network (GAN) that generates images using a custom loss function, see Train Generative Adversarial Network (GAN).

Define Custom Networks

For most tasks, you can use a pretrained network or define your own network as a layer graph. To learn more about pretrained networks, see Pretrained Deep Neural Networks. For a list of layers supported by dlnetwork objects, see Supported Layers.

For architectures that cannot be created using layer graphs, you can define a custom model as a function of the form [dlY1,...,dlYM] = model(dlX1,...,dlXN,parameters), where dlX1,...,dlXN correspond to the input data for the N model inputs, parameters contains the network parameters, and dlY1,...,dlYM correspond to the M model outputs. To train a custom network, use a custom training loop.
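
For example, the following sketch defines a model function for a simple two-layer fully connected classifier. The parameters input is assumed to be a struct of dlarray objects, and the field names fc1 and fc2 are assumptions made for this sketch:

    function dlY = model(dlX,parameters)
        % First fully connected operation followed by a ReLU nonlinearity.
        dlY = fullyconnect(dlX,parameters.fc1.Weights,parameters.fc1.Bias);
        dlY = relu(dlY);

        % Second fully connected operation followed by softmax for classification.
        dlY = fullyconnect(dlY,parameters.fc2.Weights,parameters.fc2.Bias);
        dlY = softmax(dlY);
    end

This sketch assumes that dlX is a formatted dlarray (for example, with format "CB"), so that fullyconnect and softmax can identify the channel and batch dimensions.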

If you define a custom network as a function, then the model function must support automatic differentiation. You can use the following deep learning operations. The functions listed here are only a subset. For a complete list of functions that support dlarray input, see List of Functions with dlarray Support.

Function            Description
avgpool             The average pooling operation performs downsampling by dividing the input into pooling regions and computing the average value of each region.
batchnorm           The batch normalization operation normalizes each input channel across a mini-batch. To speed up training of convolutional neural networks and reduce the sensitivity to network initialization, use batch normalization between convolution and nonlinear operations such as relu.
crossentropy        The cross-entropy operation computes the cross-entropy loss between network predictions and target values for single-label and multi-label classification tasks.
crosschannelnorm    The cross-channel normalization operation uses local responses in different channels to normalize each activation. Cross-channel normalization typically follows a relu operation. Cross-channel normalization is also known as local response normalization.
dlconv              The convolution operation applies sliding filters to the input data. Use 1-D and 2-D filters with ungrouped or grouped convolutions and 3-D filters with ungrouped convolutions.
dltranspconv        The transposed convolution operation upsamples feature maps.
fullyconnect        The fully connect operation multiplies the input by a weight matrix and then adds a bias vector.
gru                 The gated recurrent unit (GRU) operation allows a network to learn dependencies between time steps in time series and sequence data.
leakyrelu           The leaky rectified linear unit (ReLU) activation operation performs a nonlinear threshold operation, where any input value less than zero is multiplied by a fixed scale factor.
lstm                The long short-term memory (LSTM) operation allows a network to learn long-term dependencies between time steps in time series and sequence data.
maxpool             The maximum pooling operation performs downsampling by dividing the input into pooling regions and computing the maximum value of each region.
maxunpool           The maximum unpooling operation unpools the output of a maximum pooling operation by upsampling and padding with zeros.
mse                 The half mean squared error operation computes the half mean squared error loss between network predictions and target values for regression tasks.
relu                The rectified linear unit (ReLU) activation operation performs a nonlinear threshold operation, where any input value less than zero is set to zero.
sigmoid             The sigmoid activation operation applies the sigmoid function to the input data.
softmax             The softmax activation operation applies the softmax function to the channel dimension of the input data.
