trainNetwork

Train neural network for deep learning

Description

For classification and regression tasks, you can use trainNetwork to train a convolutional neural network (ConvNet, CNN) for image data, a recurrent neural network (RNN) such as a long short-term memory (LSTM) or a gated recurrent unit (GRU) network for sequence data, or a multi-layer perceptron (MLP) network for numeric feature data. You can train on either a CPU or a GPU. For image classification and image regression, you can train using multiple GPUs or in parallel. Using GPU, multi-GPU, and parallel options requires Parallel Computing Toolbox™. To use a GPU for deep learning, you must also have a CUDA® enabled NVIDIA® GPU with compute capability 3.0 or higher. To specify training options, including options for the execution environment, use the trainingOptions function.

net = trainNetwork(imds,layers,options) trains a network specified by layers for image classification tasks using the images and labels in the image datastore imds and the training options defined by options.

net = trainNetwork(ds,layers,options) trains a network using the data returned by the datastore ds. For networks with multiple inputs, use this syntax with a datastore that returns multiple columns of data, such as a combined datastore.

net = trainNetwork(X,Y,layers,options) trains a network using the image or feature data specified by the numeric array X with categorical or numeric responses specified by Y.

net = trainNetwork(sequences,Y,layers,options) trains a recurrent network (for example, an LSTM or GRU network) for the sequence data specified by sequences and responses specified by Y.

net = trainNetwork(tbl,layers,options) trains a network using the data in the table tbl.

net = trainNetwork(tbl,responseNames,layers,options) trains a network using the data in the table tbl and specifies the table columns containing the responses.

[net,info] = trainNetwork(___) also returns information on the training using any of the previous syntaxes.
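
As a minimal sketch of the two-output syntax, the following code trains a small network for one epoch and plots the recorded training loss. The architecture and option values are chosen only for illustration. The field name TrainingLoss is assumed here for a classification network; list the available fields with fieldnames(info) in your release before relying on any of them.

```matlab
% Load the digit images that ship with Deep Learning Toolbox.
[XTrain,YTrain] = digitTrain4DArrayData;

% Small illustrative network; the layer sizes are arbitrary choices.
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(3,8,'Padding','same')
    reluLayer
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

options = trainingOptions('sgdm', ...
    'MaxEpochs',1, ...
    'Verbose',false);

% Request the second output to capture per-iteration training information.
[net,info] = trainNetwork(XTrain,YTrain,layers,options);

% List what was recorded, then plot the loss at each iteration.
disp(fieldnames(info))
figure
plot(info.TrainingLoss)
xlabel("Iteration")
ylabel("Training loss")
```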

Examples

Load the data as an ImageDatastore object.

digitDatasetPath = fullfile(matlabroot,'toolbox','nnet', ...
    'nndemos','nndatasets','DigitDataset');
imds = imageDatastore(digitDatasetPath, ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');

The datastore contains 10,000 synthetic images of digits from 0 to 9. The images are generated by applying random transformations to digit images created with different fonts. Each digit image is 28-by-28 pixels. The datastore contains an equal number of images per category.

Display some of the images in the datastore.

figure
numImages = 10000;
perm = randperm(numImages,20);
for i = 1:20
    subplot(4,5,i);
    imshow(imds.Files{perm(i)});
    drawnow;
end

Divide the datastore so that each category in the training set has 750 images and the testing set has the remaining images from each label.

numTrainingFiles = 750;
[imdsTrain,imdsTest] = splitEachLabel(imds,numTrainingFiles,'randomize');

splitEachLabel splits the image files in imds into two new datastores, imdsTrain and imdsTest.

Define the convolutional neural network architecture.

layers = [ ...
    imageInputLayer([28 28 1])
    convolution2dLayer(5,20)
    reluLayer
    maxPooling2dLayer(2,'Stride',2)
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

Specify the training options for stochastic gradient descent with momentum, leaving any option that is not listed at its default value. Set the maximum number of epochs to 20, and start the training with an initial learning rate of 0.0001.

options = trainingOptions('sgdm', ...
    'MaxEpochs',20,...
    'InitialLearnRate',1e-4, ...
    'Verbose',false, ...
    'Plots','training-progress');

Train the network.

net = trainNetwork(imdsTrain,layers,options);

Run the trained network on the test set, which was not used to train the network, and predict the image labels (digits).

YPred = classify(net,imdsTest);
YTest = imdsTest.Labels;

Calculate the accuracy. The accuracy is the fraction of images in the test data for which the label predicted by classify matches the true label.

accuracy = sum(YPred == YTest)/numel(YTest)
accuracy = 0.9420
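
A single accuracy value can hide classes that the network handles poorly. The following sketch breaks the accuracy down per class. The short label vectors are stand-ins so that the snippet runs on its own; in the example above, use the YTest and YPred computed from imdsTest instead.

```matlab
% Stand-in labels (placeholders). Replace with YTest and YPred from the
% example to evaluate the trained network.
YTest = categorical([0 0 1 1 2 2 2])';
YPred = categorical([0 1 1 1 2 2 0])';

% Accuracy for each class: the fraction of observations of that class
% whose predicted label matches the true label.
classNames = categories(YTest);
perClassAccuracy = zeros(numel(classNames),1);
for k = 1:numel(classNames)
    isClass = (YTest == classNames{k});
    perClassAccuracy(k) = mean(YPred(isClass) == YTest(isClass));
end

table(classNames,perClassAccuracy)
```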

Train a convolutional neural network using augmented image data. Data augmentation helps prevent the network from overfitting and memorizing the exact details of the training images.

Load the sample data, which consists of synthetic images of handwritten digits.

[XTrain,YTrain] = digitTrain4DArrayData;

digitTrain4DArrayData loads the digit training set as 4-D array data. XTrain is a 28-by-28-by-1-by-5000 array, where:

  • 28 is the height and width of the images.

  • 1 is the number of channels.

  • 5000 is the number of synthetic images of handwritten digits.

YTrain is a categorical vector containing the labels for each observation.

Set aside 1000 of the images for network validation.

idx = randperm(size(XTrain,4),1000);
XValidation = XTrain(:,:,:,idx);
XTrain(:,:,:,idx) = [];
YValidation = YTrain(idx);
YTrain(idx) = [];

Create an imageDataAugmenter object that specifies preprocessing options for image augmentation, such as resizing, rotation, translation, and reflection. Randomly translate the images up to three pixels horizontally and vertically, and rotate the images with an angle up to 20 degrees.

imageAugmenter = imageDataAugmenter( ...
    'RandRotation',[-20,20], ...
    'RandXTranslation',[-3 3], ...
    'RandYTranslation',[-3 3])
imageAugmenter = 
  imageDataAugmenter with properties:

           FillValue: 0
     RandXReflection: 0
     RandYReflection: 0
        RandRotation: [-20 20]
           RandScale: [1 1]
          RandXScale: [1 1]
          RandYScale: [1 1]
          RandXShear: [0 0]
          RandYShear: [0 0]
    RandXTranslation: [-3 3]
    RandYTranslation: [-3 3]

Create an augmentedImageDatastore object to use for network training and specify the image output size. During training, the datastore performs image augmentation and resizes the images. The datastore augments the images without saving any images to memory. trainNetwork updates the network parameters and then discards the augmented images.

imageSize = [28 28 1];
augimds = augmentedImageDatastore(imageSize,XTrain,YTrain,'DataAugmentation',imageAugmenter);

Specify the convolutional neural network architecture.

layers = [
    imageInputLayer(imageSize)
    
    convolution2dLayer(3,8,'Padding','same')
    batchNormalizationLayer
    reluLayer   
    
    maxPooling2dLayer(2,'Stride',2)
    
    convolution2dLayer(3,16,'Padding','same')
    batchNormalizationLayer
    reluLayer   
    
    maxPooling2dLayer(2,'Stride',2)
    
    convolution2dLayer(3,32,'Padding','same')
    batchNormalizationLayer
    reluLayer   
    
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

Specify training options for stochastic gradient descent with momentum.

opts = trainingOptions('sgdm', ...
    'MaxEpochs',15, ...
    'Shuffle','every-epoch', ...
    'Plots','training-progress', ...
    'Verbose',false, ...
    'ValidationData',{XValidation,YValidation});

Train the network. Because the validation images are not augmented, the validation accuracy is higher than the training accuracy.

net = trainNetwork(augimds,layers,opts);

Load the sample data, which consists of synthetic images of handwritten digits. The third output contains the corresponding angles in degrees by which each image has been rotated.

Load the training images as 4-D arrays using digitTrain4DArrayData. The output XTrain is a 28-by-28-by-1-by-5000 array, where:

  • 28 is the height and width of the images.

  • 1 is the number of channels.

  • 5000 is the number of synthetic images of handwritten digits.

YTrain contains the rotation angles in degrees.

[XTrain,~,YTrain] = digitTrain4DArrayData;

Display 20 random training images using imshow.

figure
numTrainImages = numel(YTrain);
idx = randperm(numTrainImages,20);
for i = 1:numel(idx)
    subplot(4,5,i)    
    imshow(XTrain(:,:,:,idx(i)))
    drawnow;
end

Specify the convolutional neural network architecture. For regression problems, include a regression layer at the end of the network.

layers = [ ...
    imageInputLayer([28 28 1])
    convolution2dLayer(12,25)
    reluLayer
    fullyConnectedLayer(1)
    regressionLayer];

Specify the network training options. Set the initial learn rate to 0.001.

options = trainingOptions('sgdm', ...
    'InitialLearnRate',0.001, ...
    'Verbose',false, ...
    'Plots','training-progress');

Train the network.

net = trainNetwork(XTrain,YTrain,layers,options);

Test the performance of the network by evaluating the prediction accuracy on the test data. Use predict to predict the angles of rotation of the test images.

[XTest,~,YTest] = digitTest4DArrayData;
YPred = predict(net,XTest);

Evaluate the performance of the model by calculating the root-mean-square error (RMSE) of the predicted and actual angles of rotation.

rmse = sqrt(mean((YTest - YPred).^2))
rmse = single
    6.0356
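
RMSE summarizes the error in one number. A complementary check is the fraction of predictions that fall within an acceptable error margin of the true angle. The following sketch assumes a margin of 10 degrees, which is an arbitrary choice, and uses short stand-in vectors so that it runs on its own; in the example above, use the YTest and YPred computed from the test images instead.

```matlab
% Stand-in angles in degrees (placeholders). Replace with YTest and YPred
% from the example to evaluate the trained network.
YTest = [10; -20; 35; 0];
YPred = [12; -5; 30; 1];

% Error between the true and predicted angles.
predictionError = YTest - YPred;

% Fraction of predictions within the chosen margin (assumed: 10 degrees).
thr = 10;
numWithinThr = sum(abs(predictionError) < thr);
fractionWithinThr = numWithinThr/numel(YTest)
```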

Train a deep learning LSTM network for sequence-to-label classification.

Load the Japanese Vowels data set as described in [1] and [2]. XTrain is a cell array containing 270 sequences of varying length with 12 features corresponding to LPC cepstrum coefficients. YTrain is a categorical vector of labels 1,2,...,9. The entries in XTrain are matrices with 12 rows (one row for each feature) and a varying number of columns (one column for each time step).

[XTrain,YTrain] = japaneseVowelsTrainData;

Visualize the first time series in a plot. Each line corresponds to a feature.

figure
plot(XTrain{1}')
title("Training Observation 1")
numFeatures = size(XTrain{1},1);
legend("Feature " + string(1:numFeatures),'Location','northeastoutside')

Define the LSTM network architecture. Specify the input size as 12 (the number of features of the input data). Specify an LSTM layer to have 100 hidden units and to output the last element of the sequence. Finally, specify nine classes by including a fully connected layer of size 9, followed by a softmax layer and a classification layer.

inputSize = 12;
numHiddenUnits = 100;
numClasses = 9;

layers = [ ...
    sequenceInputLayer(inputSize)
    lstmLayer(numHiddenUnits,'OutputMode','last')
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer]
layers = 
  5×1 Layer array with layers:

     1   ''   Sequence Input          Sequence input with 12 dimensions
     2   ''   LSTM                    LSTM with 100 hidden units
     3   ''   Fully Connected         9 fully connected layer
     4   ''   Softmax                 softmax
     5   ''   Classification Output   crossentropyex

Specify the training options. Specify the solver as 'adam' and 'GradientThreshold' as 1. Set the mini-batch size to 27 and set the maximum number of epochs to 70.

Because the mini-batches are small with short sequences, the CPU is better suited for training. Set 'ExecutionEnvironment' to 'cpu'. To train on a GPU, if available, set 'ExecutionEnvironment' to 'auto' (the default value).

maxEpochs = 70;
miniBatchSize = 27;

options = trainingOptions('adam', ...
    'ExecutionEnvironment','cpu', ...
    'MaxEpochs',maxEpochs, ...
    'MiniBatchSize',miniBatchSize, ...
    'GradientThreshold',1, ...
    'Verbose',false, ...
    'Plots','training-progress');

Train the LSTM network with the specified training options.

net = trainNetwork(XTrain,YTrain,layers,options);

Load the test set and classify the sequences into speakers.

[XTest,YTest] = japaneseVowelsTestData;

Classify the test data. Specify the same mini-batch size used for training.

YPred = classify(net,XTest,'MiniBatchSize',miniBatchSize);

Calculate the classification accuracy of the predictions.

acc = sum(YPred == YTest)./numel(YTest)
acc = 0.9514

If you have a data set of numeric features (for example, a collection of numeric data without spatial or time dimensions), then you can train a deep learning network using a feature input layer.

Read the transmission casing data from the CSV file "transmissionCasingData.csv".

filename = "transmissionCasingData.csv";
tbl = readtable(filename,'TextType','String');

Convert the labels for prediction to categorical using the convertvars function.

labelName = "GearToothCondition";
tbl = convertvars(tbl,labelName,'categorical');

To train a network using categorical features, you must first convert the categorical features to numeric. As a first step, convert the categorical predictors to the categorical data type using the convertvars function by specifying a string array containing the names of all the categorical input variables. In this data set, there are two categorical features with names "SensorCondition" and "ShaftCondition".

categoricalInputNames = ["SensorCondition" "ShaftCondition"];
tbl = convertvars(tbl,categoricalInputNames,'categorical');

Loop over the categorical input variables. For each variable:

  • Convert the categorical values to one-hot encoded vectors using the onehotencode function.

  • Add the one-hot vectors to the table using the addvars function. Specify to insert the vectors after the column containing the corresponding categorical data.

  • Remove the corresponding column containing the categorical data.

for i = 1:numel(categoricalInputNames)
    name = categoricalInputNames(i);
    oh = onehotencode(tbl(:,name));
    tbl = addvars(tbl,oh,'After',name);
    tbl(:,name) = [];
end

Split the vectors into separate columns using the splitvars function.

tbl = splitvars(tbl);

View the first few rows of the table. Notice that the categorical predictors have been split into multiple columns with the categorical values as the variable names.

head(tbl)
ans=8×23 table
    SigMean     SigMedian    SigRMS    SigVar     SigPeak    SigPeak2Peak    SigSkewness    SigKurtosis    SigCrestFactor    SigMAD     SigRangeCumSum    SigCorrDimension    SigApproxEntropy    SigLyapExponent    PeakFreq    HighFreqPower    EnvPower    PeakSpecKurtosis    No Sensor Drift    Sensor Drift    No Shaft Wear    Shaft Wear    GearToothCondition
    ________    _________    ______    _______    _______    ____________    ___________    ___________    ______________    _______    ______________    ________________    ________________    _______________    ________    _____________    ________    ________________    _______________    ____________    _____________    __________    __________________

    -0.94876     -0.9722     1.3726    0.98387    0.81571       3.6314        -0.041525       2.2666           2.0514         0.8081        28562              1.1429             0.031581            79.931            0          6.75e-06       3.23e-07         162.13                0                1                1              0           No Tooth Fault  
    -0.97537    -0.98958     1.3937    0.99105    0.81571       3.6314        -0.023777       2.2598           2.0203        0.81017        29418              1.1362             0.037835            70.325            0          5.08e-08       9.16e-08         226.12                0                1                1              0           No Tooth Fault  
      1.0502      1.0267     1.4449    0.98491     2.8157       3.6314         -0.04162       2.2658           1.9487        0.80853        31710              1.1479             0.031565            125.19            0          6.74e-06       2.85e-07         162.13                0                1                0              1           No Tooth Fault  
      1.0227      1.0045     1.4288    0.99553     2.8157       3.6314        -0.016356       2.2483           1.9707        0.81324        30984              1.1472             0.032088             112.5            0          4.99e-06        2.4e-07         162.13                0                1                0              1           No Tooth Fault  
      1.0123      1.0024     1.4202    0.99233     2.8157       3.6314        -0.014701       2.2542           1.9826        0.81156        30661              1.1469              0.03287            108.86            0          3.62e-06       2.28e-07         230.39                0                1                0              1           No Tooth Fault  
      1.0275      1.0102     1.4338     1.0001     2.8157       3.6314         -0.02659       2.2439           1.9638        0.81589        31102              1.0985             0.033427            64.576            0          2.55e-06       1.65e-07         230.39                0                1                0              1           No Tooth Fault  
      1.0464      1.0275     1.4477     1.0011     2.8157       3.6314        -0.042849       2.2455           1.9449        0.81595        31665              1.1417             0.034159            98.838            0          1.73e-06       1.55e-07         230.39                0                1                0              1           No Tooth Fault  
      1.0459      1.0257     1.4402    0.98047     2.8157       3.6314        -0.035405       2.2757            1.955        0.80583        31554              1.1345               0.0353            44.223            0          1.11e-06       1.39e-07         230.39                0                1                0              1           No Tooth Fault  

View the class names of the data set.

classNames = categories(tbl{:,labelName})
classNames = 2×1 cell
    {'No Tooth Fault'}
    {'Tooth Fault'   }

Next, partition the data set into training and test partitions. Set aside 15% of the data for testing.

Determine the number of observations for each partition.

numObservations = size(tbl,1);
numObservationsTrain = floor(0.85*numObservations);
numObservationsTest = numObservations - numObservationsTrain;

Create an array of random indices corresponding to the observations and partition it using the partition sizes.

idx = randperm(numObservations);
idxTrain = idx(1:numObservationsTrain);
idxTest = idx(numObservationsTrain+1:end);

Partition the table of data into training and testing partitions using the indices.

tblTrain = tbl(idxTrain,:);
tblTest = tbl(idxTest,:);

Define a network with a feature input layer and specify the number of features. Also, configure the input layer to normalize the data using Z-score normalization.

numFeatures = size(tbl,2) - 1;
numClasses = numel(classNames);
 
layers = [
    featureInputLayer(numFeatures,'Normalization', 'zscore')
    fullyConnectedLayer(50)
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

Specify the training options.

miniBatchSize = 16;

options = trainingOptions('adam', ...
    'MiniBatchSize',miniBatchSize, ...
    'Shuffle','every-epoch', ...
    'Plots','training-progress', ...
    'Verbose',false);

Train the network using the architecture defined by layers, the training data, and the training options.

net = trainNetwork(tblTrain,layers,options);

Predict the labels of the test data using the trained network and calculate the accuracy. The accuracy is the proportion of the labels that the network predicts correctly.

YPred = classify(net,tblTest,'MiniBatchSize',miniBatchSize);
YTest = tblTest{:,labelName};

accuracy = sum(YPred == YTest)/numel(YTest)
accuracy = 0.9688

Input Arguments

Image datastore containing images and labels, specified as an ImageDatastore object.

Create an image datastore using the imageDatastore function. To use the names of the folders containing the images as labels, set the 'LabelSource' option to 'foldernames'. Alternatively, specify the labels manually using the Labels property of the image datastore.

The trainNetwork function supports image datastores for image classification networks only. To use image datastores for regression networks, create a transformed or combined datastore using the transform and combine functions. For more information, see the ds input argument.
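
As a minimal sketch of the combined-datastore approach for image regression, the following code pairs each image file with a numeric response. The responses here are random placeholders, not real targets, and arrayDatastore is assumed to be available in your release. The result returns the two-column cell array format that the ds input argument expects.

```matlab
% Image files from the digit data set used in the examples.
digitDatasetPath = fullfile(matlabroot,'toolbox','nnet', ...
    'nndemos','nndatasets','DigitDataset');
imds = imageDatastore(digitDatasetPath,'IncludeSubfolders',true);

% PLACEHOLDER responses: one random value per image. Replace these with
% the real numeric targets for your images, in the same order as imds.Files.
numImages = numel(imds.Files);
responses = rand(numImages,1);

% Wrap the responses in a datastore and pair each image with its response.
dsResponses = arrayDatastore(responses);
dsTrain = combine(imds,dsResponses);

% Each read returns a 1-by-2 cell array: {image, response}.
data = read(dsTrain)

% Return to the start of the data before passing dsTrain to trainNetwork.
reset(dsTrain)
```

You can then pass dsTrain to trainNetwork together with a network that ends in a regression layer.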

ImageDatastore allows batch reading of JPG or PNG image files using prefetching. If you use a custom function for reading the images, then ImageDatastore does not prefetch.

Tip

Use augmentedImageDatastore for efficient preprocessing of images for deep learning including image resizing.

Do not use the readFcn option of imageDatastore for preprocessing or resizing as this option is usually significantly slower.

Datastore for out-of-memory data and preprocessing.

The table below lists the datastores that are directly compatible with trainNetwork. You can use other built-in datastores for training deep learning networks by using the transform and combine functions. These functions can convert the data read from datastores to the table or cell array format required by trainNetwork. For networks with multiple inputs, the datastore must be a combined or transformed datastore, or a custom mini-batch datastore. For more information, see Datastores for Deep Learning.

Type of Datastore | Description
CombinedDatastore | Horizontally concatenate the data read from two or more underlying datastores.
TransformedDatastore | Transform batches of read data from an underlying datastore according to your own preprocessing pipeline.
AugmentedImageDatastore | Apply random affine geometric transformations, including resizing, rotation, reflection, shear, and translation, for training deep neural networks.
PixelLabelImageDatastore (Computer Vision Toolbox) | Apply identical affine geometric transformations to images and corresponding ground truth labels for training semantic segmentation networks (requires Computer Vision Toolbox™).
RandomPatchExtractionDatastore (Image Processing Toolbox) | Extract pairs of random patches from images or pixel label images (requires Image Processing Toolbox™). You optionally can apply identical random affine geometric transformations to the pairs of patches.
DenoisingImageDatastore (Image Processing Toolbox) | Apply randomly generated Gaussian noise for training denoising networks (requires Image Processing Toolbox).
Custom mini-batch datastore | Create mini-batches of sequence, time series, text, or feature data. For details, see Develop Custom Mini-Batch Datastore.

The datastore must return data in a table or a cell array. The format of the datastore output depends on the network architecture.

Network Architecture | Datastore Output | Example Output
Single input layer

Table or cell array with two columns.

The first and second columns specify the predictors and responses, respectively.

Table elements must be scalars, row vectors, or 1-by-1 cell arrays containing a numeric array.

Custom mini-batch datastores must output tables.

data = read(ds)
data =

  4×2 table

        Predictors        Response
    __________________    ________

    {224×224×3 double}       2    
    {224×224×3 double}       7    
    {224×224×3 double}       9    
    {224×224×3 double}       9  
data = read(ds)
data =

  4×2 cell array

    {224×224×3 double}    {[2]}
    {224×224×3 double}    {[7]}
    {224×224×3 double}    {[9]}
    {224×224×3 double}    {[9]}
Multiple input layers

Cell array with (numInputs + 1) columns, where numInputs is the number of network inputs.

The first numInputs columns specify the predictors for each input and the last column specifies the responses.

The order of inputs is given by the InputNames property of the layer graph layers.

data = read(ds)
data =

  4×3 cell array

    {224×224×3 double}    {128×128×3 double}    {[2]}
    {224×224×3 double}    {128×128×3 double}    {[2]}
    {224×224×3 double}    {128×128×3 double}    {[9]}
    {224×224×3 double}    {128×128×3 double}    {[9]}

The format of the predictors depends on the type of data.

Data | Format of Predictors
2-D image

h-by-w-by-c numeric array, where h, w, and c are the height, width, and number of channels of the image, respectively.

3-D image

h-by-w-by-d-by-c numeric array, where h, w, d, and c are the height, width, depth, and number of channels of the image, respectively.

Vector sequence

c-by-s matrix, where c is the number of features of the sequence and s is the sequence length.

2-D image sequence

h-by-w-by-c-by-s array, where h, w, and c correspond to the height, width, and number of channels of the image, respectively, and s is the sequence length.

Each sequence in the mini-batch must have the same sequence length.

3-D image sequence

h-by-w-by-d-by-c-by-s array, where h, w, d, and c correspond to the height, width, depth, and number of channels of the image, respectively, and s is the sequence length.

Each sequence in the mini-batch must have the same sequence length.

Features

c-by-1 column vector, where c is the number of features.

For predictors returned in tables, the elements must contain a numeric scalar, a numeric row vector, or a 1-by-1 cell array containing a numeric array.

The trainNetwork function does not support networks with multiple sequence input layers.

The format of the responses depends on the type of task.

Task | Format of Responses
Classification | Categorical scalar
Regression

  • Scalar

  • Numeric vector

  • 3-D numeric array representing an image

Sequence-to-sequence classification

1-by-s sequence of categorical labels, where s is the sequence length of the corresponding predictor sequence.

Sequence-to-sequence regression

R-by-s matrix, where R is the number of responses and s is the sequence length of the corresponding predictor sequence.

For responses returned in tables, the elements must be a categorical scalar, a numeric scalar, a numeric row vector, or a 1-by-1 cell array containing a numeric array.

Image or feature data, specified as a numeric array. The size of the array depends on the type of input:

Input | Description
2-D images | h-by-w-by-c-by-N numeric array, where h, w, and c are the height, width, and number of channels of the images, respectively, and N is the number of images.
3-D images | h-by-w-by-d-by-c-by-N numeric array, where h, w, d, and c are the height, width, depth, and number of channels of the images, respectively, and N is the number of images.
Features | N-by-numFeatures numeric array, where N is the number of observations and numFeatures is the number of features of the input data.

If the array contains NaNs, then they are propagated through the network.
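
As a minimal sketch of the feature-data layout, the following code trains a small classifier on an N-by-numFeatures numeric array. The data is synthetic and the layer sizes and option values are arbitrary; they are assumptions made only so that the snippet runs on its own.

```matlab
rng(0)  % make the synthetic data reproducible

% Synthetic feature data: one row per observation, one column per feature.
numObservations = 200;
numFeatures = 4;
X = randn(numObservations,numFeatures);

% Synthetic two-class labels derived from the first two features.
Y = categorical(X(:,1) + X(:,2) > 0);

layers = [
    featureInputLayer(numFeatures)
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(2)
    softmaxLayer
    classificationLayer];

options = trainingOptions('adam', ...
    'MaxEpochs',5, ...
    'MiniBatchSize',32, ...
    'Verbose',false);

% X is N-by-numFeatures and Y is an N-by-1 categorical vector.
net = trainNetwork(X,Y,layers,options);

% Classify the training data to confirm that the shapes are accepted.
YPred = classify(net,X);
```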

Sequence or time series data, specified as an N-by-1 cell array of numeric arrays, where N is the number of observations, or a numeric array representing a single sequence.

For cell array or numeric array input, the dimensions of the numeric arrays containing the sequences depend on the type of data.

Input | Description
Vector sequences | c-by-s matrices, where c is the number of features of the sequences and s is the sequence length.
2-D image sequences | h-by-w-by-c-by-s arrays, where h, w, and c correspond to the height, width, and number of channels of the images, respectively, and s is the sequence length.
3-D image sequences | h-by-w-by-d-by-c-by-s arrays, where h, w, d, and c correspond to the height, width, depth, and number of channels of the 3-D images, respectively, and s is the sequence length.

To specify sequences using a datastore, use the ds input argument.

Responses, specified as a categorical vector of labels, a numeric array, a cell array of categorical sequences, or cell array of numeric sequences. The format of Y depends on the type of task. Responses must not contain NaNs.

Classification

Task | Format
Image or feature classification | N-by-1 categorical vector of labels, where N is the number of observations.
Sequence-to-label classification | N-by-1 categorical vector of labels, where N is the number of observations.
Sequence-to-sequence classification | N-by-1 cell array of categorical sequences of labels, where N is the number of observations. Each sequence must have the same number of time steps as the corresponding predictor sequence.

For sequence-to-sequence classification tasks with one observation, sequences can also be a vector. In this case, Y must be a categorical sequence of labels.

Regression

Task | Format
2-D image regression
  • N-by-R matrix, where N is the number of images and R is the number of responses.

  • h-by-w-by-c-by-N numeric array, where h, w, and c are the height, width, and number of channels of the images, respectively, and N is the number of images.

3-D image regression
  • N-by-R matrix, where N is the number of images and R is the number of responses.

  • h-by-w-by-d-by-c-by-N numeric array, where h, w, d, and c are the height, width, depth, and number of channels of the images, respectively, and N is the number of images.

Sequence-to-one regression

N-by-R matrix, where N is the number of sequences and R is the number of responses.

Sequence-to-sequence regression

N-by-1 cell array of numeric sequences, where N is the number of sequences. The sequences are matrices with R rows, where R is the number of responses. Each sequence must have the same number of time steps as the corresponding predictor sequence.

For sequence-to-sequence regression tasks with one observation, sequences can be a matrix. In this case, Y must be a matrix of responses.

Feature regression

N-by-R matrix, where N is the number of observations and R is the number of responses.

Normalizing the responses often helps to stabilize and speed up training of neural networks for regression. For more information, see Train Convolutional Neural Network for Regression.
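
One common way to normalize the responses is to standardize them with the mean and standard deviation of the training responses, and then map the predictions back to the original units. The following sketch shows the arithmetic only; the values are placeholders, and standardization is one possible choice rather than the method prescribed by trainNetwork.

```matlab
% Placeholder training responses (for example, rotation angles in degrees).
YTrain = [-40; -10; 0; 15; 35];

% Compute the statistics from the training responses only.
mu = mean(YTrain);
sigma = std(YTrain);

% Standardize the responses before passing them to trainNetwork.
YTrainNorm = (YTrain - mu)./sigma;

% A network trained on YTrainNorm predicts in normalized units.
% Placeholder predictions standing in for the output of predict.
YPredNorm = [-1.2; 0.1; 0.9];

% Map the predictions back to the original units.
YPred = YPredNorm.*sigma + mu;
```

Reuse the same mu and sigma for validation and test responses so that all partitions share one scale.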

Input data, specified as a table containing predictors and responses. Each row in the table corresponds to an observation.

The arrangement of predictors and responses in the table columns depends on the type of task.

Classification

Task | Predictors | Responses
Image classification
  • Absolute or relative file path to an image, specified as a character vector in a single column

  • Image specified as a 1-by-1 cell array containing a 3-D numeric array

Predictors must be in the first column of the table.

Categorical label

Sequence-to-label classification

Absolute or relative file path to a MAT file containing sequence or time series data.

The MAT file must contain a time series represented by a matrix with rows corresponding to data points and columns corresponding to time steps.

Predictors must be in the first column of the table.

Categorical label

Sequence-to-sequence classification

Absolute or relative file path to a MAT file. The MAT file must contain a time series represented by a categorical vector, with entries corresponding to labels for each time step.

Feature classification

Numeric scalar.

If you do not specify the responseNames argument, then the predictors must be in the first numFeatures columns of the table, where numFeatures is the number of features of the input data.

Categorical label

For classification networks with image or sequence input, if you do not specify responseNames, then the function, by default, uses the first column of tbl for the predictors and the second column as the labels. For classification networks with feature input, if you do not specify the responseNames argument, then the function, by default, uses the first (numColumns - 1) columns of tbl for the predictors and the last column for the labels, where numColumns is the number of columns in tbl.

Regression

Task | Predictors | Responses
Image regression

  • Absolute or relative file path to an image, specified as a character vector

  • Image specified as a 1-by-1 cell array containing a 3-D numeric array

Predictors must be in the first column of the table.

  • One or more columns of scalar values

  • Numeric row vector

  • 1-by-1 cell array containing a 3-D numeric array

Sequence-to-one regression

Absolute or relative file path to a MAT file containing sequence or time series data.

The MAT file must contain a time series represented by a matrix with rows corresponding to data points and columns corresponding to time steps.

Predictors must be in the first column of the table.

  • One or more columns of scalar values

  • Numeric row vector

Sequence-to-sequence regression

Absolute or relative file path to a MAT file. The MAT file must contain a time series represented by a matrix, where rows correspond to responses and columns correspond to time steps.

Feature regression

Features specified in one or more columns as scalars.

If you do not specify the responseNames argument, then the predictors must be in the first numFeatures columns of the table, where numFeatures is the number of features of the input data.

One or more columns of scalar values

For regression networks with image or sequence input, if you do not specify responseNames, then the function, by default, uses the first column of tbl for the predictors and the subsequent columns as responses. For regression networks with feature input, if you do not specify the responseNames argument, then the function, by default, uses the first numFeatures columns for the predictors and the subsequent columns for the responses, where numFeatures is the number of features in the input data.

Normalizing the responses often helps to stabilize and speed up training of neural networks for regression. For more information, see Train Convolutional Neural Network for Regression.
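One common way to normalize responses is to standardize them with the mean and standard deviation of the training set, then invert the transform on the predictions. The sketch below assumes that approach. The variable names and values are hypothetical, and the linked example may use a different scaling.

```matlab
% Hypothetical response vector for a regression task.
YTrain = [12; 15; 9; 20; 14];

% Standardize to zero mean and unit variance using training statistics only.
mu    = mean(YTrain);
sigma = std(YTrain);
YTrainNorm = (YTrain - mu) ./ sigma;

% Train on YTrainNorm. After prediction, map outputs back to original units.
% YPredNorm is a stand-in for the output of predict(net,XTest).
YPredNorm = YTrainNorm;
YPred = YPredNorm .* sigma + mu;
```

Reuse the same mu and sigma for validation and test responses so that all metrics are computed on one scale.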

Responses cannot contain NaNs. If the predictor data contains NaNs, then they are propagated through the training. However, in most cases, the training fails to converge.
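Because of this NaN behavior, it can help to screen a table before training. The following sketch uses a hypothetical table and column names. Dropping the affected rows is only one possible policy, and imputation may suit your data better.

```matlab
% Hypothetical table with one predictor value missing.
tbl = table([1; NaN; 3],[4; 5; 6],[0.1; 0.2; 0.3], ...
    'VariableNames',{'f1','f2','response'});

% Responses cannot contain NaNs, so fail early if any are present.
assert(~any(isnan(tbl.response)),'Response column contains NaN values.');

% Find rows whose predictors contain NaNs and drop them (one possible policy).
predictorNames = {'f1','f2'};
hasNaN = any(isnan(tbl{:,predictorNames}),2);
tblClean = tbl(~hasNaN,:);
```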

Data Types: table

Names of the response variables in the input table, specified as one of the following:

  • For classification or regression tasks with a single response, responseNames must be a character vector or string scalar containing the name of the response variable in the input table.

  • For regression tasks with multiple responses, responseNames must be a string array or cell array of character vectors containing the names of the response variables in the input table.

Data Types: char | cell | string
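The following sketch shows the responseNames argument for a feature regression task with two responses. The table contents, column names, and layer sizes are illustrative assumptions.

```matlab
% Synthetic table: 3 feature columns followed by 2 response columns.
% All names and values are illustrative assumptions.
rng(0);
numObservations = 100;
X = randn(numObservations,3);
tbl = array2table(X,'VariableNames',{'f1','f2','f3'});
tbl.y1 = X(:,1) + 0.1*randn(numObservations,1);
tbl.y2 = X(:,2) - X(:,3);

layers = [
    featureInputLayer(3)
    fullyConnectedLayer(8)
    reluLayer
    fullyConnectedLayer(2)        % one output per response variable
    regressionLayer];

options = trainingOptions('adam','MaxEpochs',5,'Verbose',false);

% Multiple responses: a string array, or equivalently {'y1','y2'}.
responseNames = ["y1" "y2"];
net = trainNetwork(tbl,responseNames,layers,options);
```

For a single response, a character vector such as 'y1' or a string scalar such as "y1" is enough.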

Network layers, specified as a Layer array or a LayerGraph object.

To create a network with all layers connected sequentially, you can use a Layer array as the input argument. In this case, the returned network is a SeriesNetwork object.

A directed acyclic graph (DAG) network has a complex structure in which layers can have multiple inputs and outputs. To create a DAG network, specify the network architecture as a LayerGraph object and then use that layer graph as the input argument to trainNetwork.

For a list of built-in layers, see List of Deep Learning Layers.
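To make the two forms concrete, the sketch below defines a sequential Layer array and a small LayerGraph with one skip connection. The input size, filter counts, and layer names are illustrative assumptions.

```matlab
% Layer array: layers connected sequentially (trains to a SeriesNetwork).
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(3,8,'Padding','same')
    reluLayer
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

% Layer graph: start from a named array, then add a skip connection
% (trains to a DAGNetwork). Layer names are hypothetical.
dagLayers = [
    imageInputLayer([28 28 1],'Name','input')
    convolution2dLayer(3,8,'Padding','same','Name','conv_1')
    reluLayer('Name','relu_1')
    convolution2dLayer(3,8,'Padding','same','Name','conv_2')
    additionLayer(2,'Name','add')
    fullyConnectedLayer(10,'Name','fc')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','output')];

lgraph = layerGraph(dagLayers);                      % sequential connections
lgraph = connectLayers(lgraph,'relu_1','add/in2');   % skip connection
```

Passing layers to trainNetwork would return a SeriesNetwork, and passing lgraph would return a DAGNetwork.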

Training options, specified as a TrainingOptionsSGDM, TrainingOptionsRMSProp, or TrainingOptionsADAM object returned by the trainingOptions function.
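As a brief sketch, the solver name passed to trainingOptions determines which of these option objects is returned. The option values below are arbitrary placeholders, not recommendations.

```matlab
% Stochastic gradient descent with momentum; values are placeholders.
options = trainingOptions('sgdm', ...
    'InitialLearnRate',0.01, ...
    'MaxEpochs',10, ...
    'MiniBatchSize',64, ...
    'ExecutionEnvironment','auto', ...   % use a GPU if one is available
    'Verbose',false);

% Display the class of the returned object.
disp(class(options))
```

Using 'rmsprop' or 'adam' as the first argument returns the corresponding RMSProp or Adam options object.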

Output Arguments

Trained network, returned as a SeriesNetwork object or a DAGNetwork object.

If you train the network using a Layer array, then net is a SeriesNetwork object. If you train the network using a LayerGraph object, then net is a DAGNetwork object.

Training information, returned as a structure, where each field is a scalar or a numeric vector with one element per training iteration.

For classification tasks, info contains the following fields:

  • TrainingLoss — Loss function values

  • TrainingAccuracy — Training accuracies

  • ValidationLoss — Loss function values

  • ValidationAccuracy — Validation accuracies

  • BaseLearnRate — Learning rates

  • FinalValidationLoss — Final validation loss

  • FinalValidationAccuracy — Final validation accuracy

For regression tasks, info contains the following fields:

  • TrainingLoss — Loss function values

  • TrainingRMSE — Training RMSE values

  • ValidationLoss — Loss function values

  • ValidationRMSE — Validation RMSE values

  • BaseLearnRate — Learning rates

  • FinalValidationLoss — Final validation loss

  • FinalValidationRMSE — Final validation RMSE

The structure contains the fields ValidationLoss, ValidationAccuracy, ValidationRMSE, FinalValidationLoss, FinalValidationAccuracy, and FinalValidationRMSE only when options specifies validation data. The 'ValidationFrequency' option of trainingOptions determines the iterations at which the software calculates validation metrics. The final validation metrics are scalars. The other fields of the structure are row vectors, where each element corresponds to a training iteration. For iterations at which the software does not calculate validation metrics, the corresponding values in the structure are NaN.
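The sketch below shows one way to work with these fields. It uses a hand-built structure with made-up values as a stand-in for the info output of trainNetwork, so it runs without training a network.

```matlab
% Stand-in for the info output of trainNetwork (all values are made up).
info.TrainingLoss       = [2.3 1.9 1.5 1.2 1.0 0.9];
info.ValidationLoss     = [NaN NaN 1.6 NaN NaN 1.1];
info.ValidationAccuracy = [NaN NaN 55 NaN NaN 70];

% Iterations at which validation metrics were calculated (non-NaN entries).
validationIterations = find(~isnan(info.ValidationLoss));

% Validation fields exist only when validation data was specified.
if isfield(info,'ValidationAccuracy')
    bestValidationAccuracy = max(info.ValidationAccuracy,[],'omitnan');
end
```

The same non-NaN mask can be used to plot validation loss against training loss on a shared iteration axis.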

If your network contains batch normalization layers, then the final validation metrics are often different from the validation metrics evaluated during training. This is because batch normalization layers in the final network perform different operations than during training. For more information, see batchNormalizationLayer.

More About

Save Checkpoint Networks and Resume Training

Deep Learning Toolbox™ enables you to save networks as .mat files after each epoch during training. This periodic saving is especially useful when you have a large network or a large data set, and training takes a long time. If the training is interrupted for some reason, you can resume training from the last saved checkpoint network. If you want trainNetwork to save checkpoint networks, then you must specify the name of the path by using the 'CheckpointPath' name-value pair argument of trainingOptions. If the path that you specify does not exist, then trainingOptions returns an error.
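Because trainingOptions returns an error for a nonexistent path, one approach is to create the folder first. The folder location and option values in this sketch are hypothetical.

```matlab
% Hypothetical checkpoint folder under the system temporary directory.
checkpointDir = fullfile(tempdir,'trainNetworkCheckpoints');

% Create the folder if needed so that trainingOptions does not error.
if ~isfolder(checkpointDir)
    mkdir(checkpointDir);
end

options = trainingOptions('sgdm', ...
    'MaxEpochs',20, ...
    'CheckpointPath',checkpointDir, ...
    'Verbose',false);
```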

trainNetwork automatically assigns unique names to checkpoint network files. In the example name, net_checkpoint__351__2018_04_12__18_09_52.mat, 351 is the iteration number, 2018_04_12 is the date, and 18_09_52 is the time at which trainNetwork saves the network. You can load a checkpoint network file by double-clicking it or using the load command at the command line. For example:

load net_checkpoint__351__2018_04_12__18_09_52.mat

You can then resume training by using the layers of the network as an input argument to trainNetwork. For example:

trainNetwork(XTrain,YTrain,net.Layers,options)

You must manually specify the training options and the input data, because the checkpoint network does not contain this information. For an example, see Resume Training from Checkpoint Network.
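When a folder holds several checkpoint files, you can pick the most recent one programmatically. This sketch assumes a hypothetical folder and relies on the net_checkpoint__ file name prefix shown in the example name above. The resume call is left as a comment because XTrain, YTrain, and options must come from your own session.

```matlab
% Hypothetical folder that was used as the 'CheckpointPath' value.
checkpointDir = fullfile(tempdir,'trainNetworkCheckpoints');
files = dir(fullfile(checkpointDir,'net_checkpoint__*.mat'));

if ~isempty(files)
    % Select the most recently modified checkpoint file.
    [~,newest] = max([files.datenum]);
    checkpoint = load(fullfile(checkpointDir,files(newest).name));

    % The saved network is in the variable net. Resume with your own
    % data and options, for example:
    % net = trainNetwork(XTrain,YTrain,checkpoint.net.Layers,options);
end
```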

Floating-Point Arithmetic

All functions for deep learning training, prediction, and validation in Deep Learning Toolbox perform computations using single-precision, floating-point arithmetic. Functions for deep learning include trainNetwork, predict, classify, and activations. The software uses single-precision arithmetic when you train networks using both CPUs and GPUs.
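To give a sense of what single precision implies for numeric data, the following sketch compares a double array with its single-precision copy. It only illustrates the rounding involved and makes no claim about how the toolbox converts data internally.

```matlab
% MATLAB creates double-precision arrays by default.
rng(0);
XDouble = rand(4,3);

% Single-precision copy, comparable to what training computations use.
XSingle = single(XDouble);

% Largest rounding error introduced by the conversion.
roundingError = max(abs(double(XSingle(:)) - XDouble(:)));

% Machine epsilon for each type, for comparison.
fprintf('eps(single) = %g, eps(double) = %g\n',eps('single'),eps('double'));
fprintf('max rounding error = %g\n',roundingError);
```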

References

[1] Kudo, M., J. Toyama, and M. Shimbo. "Multidimensional Curve Classification Using Passing-Through Regions." Pattern Recognition Letters. Vol. 20, No. 11–13, pp. 1103–1111.

[2] Kudo, M., J. Toyama, and M. Shimbo. Japanese Vowels Data Set. https://archive.ics.uci.edu/ml/datasets/Japanese+Vowels

Introduced in R2016a