Train shallow neural network
This function trains a shallow neural network. For deep learning with
convolutional or LSTM neural networks, see trainNetwork
instead.
[trainedNet,tr] = train(net,X,T,Xi,Ai,EW,Name,Value) trains a network with additional options specified by one or more name-value pair arguments.
Here input x and targets t define a simple function that you can plot:
x = [0 1 2 3 4 5 6 7 8];
t = [0 0.84 0.91 0.14 -0.77 -0.96 -0.28 0.66 0.99];
plot(x,t,'o')
Here feedforwardnet creates a two-layer feed-forward network. The network has one hidden layer with ten neurons.
net = feedforwardnet(10);
net = configure(net,x,t);
y1 = net(x)
plot(x,t,'o',x,y1,'x')
The network is trained and then resimulated.
net = train(net,x,t);
y2 = net(x)
plot(x,t,'o',x,y1,'x',x,y2,'*')
This example trains an open-loop nonlinear-autoregressive network with external input, to model a levitated magnet system defined by a control current x and the magnet’s vertical position response t, then simulates the network. The function preparets prepares the data before training and simulation. It creates the open-loop network’s combined inputs xo, which contains both the external input x and previous values of position t. It also prepares the delay states xi.
[x,t] = maglev_dataset;
net = narxnet(10);
[xo,xi,~,to] = preparets(net,x,{},t);
net = train(net,xo,to,xi);
y = net(xo,xi)
This same system can also be simulated in closed-loop form.
netc = closeloop(net);
view(netc)
[xc,xi,ai,tc] = preparets(netc,x,{},t);
yc = netc(xc,xi,ai);
Parallel Computing Toolbox™ allows Deep Learning Toolbox™ to simulate and train networks faster and on larger datasets than can fit on one PC. Parallel training is currently supported for backpropagation training only, not for self-organizing maps.
Here training and simulation happen across parallel MATLAB workers.
parpool
[X,T] = vinyl_dataset;
net = feedforwardnet(10);
net = train(net,X,T,'useParallel','yes','showResources','yes');
Y = net(X);
Use Composite values to distribute the data manually, and get back the results as a Composite value. If the data is loaded as it is distributed, then while each piece of the dataset must fit in RAM, the entire dataset is limited only by the total RAM of all the workers.
[X,T] = vinyl_dataset;
Q = size(X,2);
Xc = Composite;
Tc = Composite;
numWorkers = numel(Xc);
ind = [0 ceil((1:numWorkers)*(Q/numWorkers))];
for i=1:numWorkers
    indi = (ind(i)+1):ind(i+1);
    Xc{i} = X(:,indi);
    Tc{i} = T(:,indi);
end
net = feedforwardnet;
net = configure(net,X,T);
net = train(net,Xc,Tc);
Yc = net(Xc);
Note that in the example above, the function configure was used to set the dimensions and processing settings of the network's inputs. This normally happens automatically when train is called, but when providing Composite data this step must be done manually, using non-Composite data.
Networks can be trained using the current GPU device, if it is supported by Parallel Computing Toolbox. GPU training is currently supported for backpropagation training only, not for self-organizing maps.
[X,T] = vinyl_dataset;
net = feedforwardnet(10);
net = train(net,X,T,'useGPU','yes');
y = net(X);
To put the data on a GPU manually:
[X,T] = vinyl_dataset;
Xgpu = gpuArray(X);
Tgpu = gpuArray(T);
net = configure(net,X,T);
net = train(net,Xgpu,Tgpu);
Ygpu = net(Xgpu);
Y = gather(Ygpu);
Note that in the example above, the function configure was used to set the dimensions and processing settings of the network's inputs. This normally happens automatically when train is called, but when providing gpuArray data this step must be done manually, using non-gpuArray data.
To run in parallel, with workers each assigned to a different unique GPU, with extra workers running on CPU:
net = train(net,X,T,'useParallel','yes','useGPU','yes');
y = net(X);
Using only workers with unique GPUs might result in higher speed, as CPU workers might not keep up.
net = train(net,X,T,'useParallel','yes','useGPU','only');
Y = net(X);
Here a network is trained with checkpoints saved at a rate no greater than once every two minutes.
[x,t] = vinyl_dataset;
net = fitnet([60 30]);
net = train(net,x,t,'CheckpointFile','MyCheckpoint','CheckpointDelay',120);
After a computer failure, the latest network can be recovered and used to continue training from the point of failure. The checkpoint file includes a structure variable checkpoint, which includes the network, training record, filename, time, and number.
[x,t] = vinyl_dataset;
load MyCheckpoint
net = checkpoint.net;
net = train(net,x,t,'CheckpointFile','MyCheckpoint');
Another use for the checkpoint feature is to stop a parallel training session (started with the 'UseParallel' parameter), because the Neural Network Training Tool is not available during parallel training. In this case, set a 'CheckpointFile', use Ctrl+C to stop training at any time, then load your checkpoint file to get the network and training record.
net — Input network
network object
Input network, specified as a network object. To create a network object, use, for example, feedforwardnet or narxnet.
X — Network inputs
Network inputs, specified as an R-by-Q matrix or an Ni-by-TS cell array, where
R is the input size
Q is the batch size
Ni = net.numInputs
TS is the number of time steps
train arguments can have two formats: matrices, for static problems and networks with single inputs and outputs, and cell arrays for multiple time steps and networks with multiple inputs and outputs.
The matrix format can be used if only one time step is to be simulated (TS = 1). It is convenient for networks with only one input and output, but can be used with networks that have more. When the network has multiple inputs, the matrix size is (sum of Ri)-by-Q.
The cell array format is more general, and more convenient for networks with multiple inputs and outputs, allowing sequences of inputs to be presented. Each element X{i,ts} is an Ri-by-Q matrix, where Ri = net.inputs{i}.size.
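For example, here is a minimal sketch (with made-up values) contrasting the two formats for a network with a single input of size R = 2 and Q = 3 samples:
Xmat = [0 1 2; 3 4 5];                    % 2-by-3 matrix: one time step (TS = 1)
Xcell = {[0 1 2; 3 4 5] [1 2 3; 4 5 6]};  % 1-by-2 cell array: TS = 2 time steps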
If Composite data is used, then 'useParallel' is automatically set to 'yes'. The function takes Composite data and returns Composite results.
If gpuArray data is used, then 'useGPU' is automatically set to 'yes'. The function takes gpuArray data and returns gpuArray results.
Note
If a column of X contains at least one NaN, train does not use that column for training, testing, or validation. If a target value in T is NaN, then train ignores that row, and uses the other rows for training, testing, or validation.
T — Network targets
Network targets, specified as a U-by-Q matrix or an No-by-TS cell array, where
U is the output size
Q is the batch size
No = net.numOutputs
TS is the number of time steps
train arguments can have two formats: matrices, for static problems and networks with single inputs and outputs, and cell arrays for multiple time steps and networks with multiple inputs and outputs.
The matrix format can be used if only one time step is to be simulated (TS = 1). It is convenient for networks with only one input and output, but can be used with networks that have more. When the network has multiple outputs, the matrix size is (sum of Ui)-by-Q.
The cell array format is more general, and more convenient for networks with multiple inputs and outputs, allowing sequences of inputs to be presented. Each element T{i,ts} is a Ui-by-Q matrix, where Ui = net.outputs{i}.size.
If Composite data is used, then 'useParallel' is automatically set to 'yes'. The function takes Composite data and returns Composite results.
If gpuArray data is used, then 'useGPU' is automatically set to 'yes'. The function takes gpuArray data and returns gpuArray results.
Note that T is optional and need only be used for networks that require targets.
Note
Any NaN values in the inputs X or the targets T are treated as missing data. If a column of X or T contains at least one NaN, that column is not used for training, testing, or validation.
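As a brief illustration of this behavior, the following sketch (using the simplefit_dataset sample data) marks one sample as missing by setting its target to NaN; train then excludes that column from training, testing, and validation:
[x,t] = simplefit_dataset;
t(10) = NaN;                 % sample 10 is now treated as missing data
net = feedforwardnet(10);
net = train(net,x,t);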
Xi — Initial input delay conditions
Initial input delay conditions, specified as an Ni-by-ID cell array or an R-by-(ID*Q) matrix, where
ID = net.numInputDelays
Ni = net.numInputs
R is the input size
Q is the batch size
For cell array input, the columns of Xi are ordered from the oldest delay condition to the most recent: Xi{i,k} is the input i at time ts = k - ID.
Xi is also optional and need only be used for networks that have input or layer delays.
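As an illustration, here is a hedged sketch (using synthetic sine and cosine data) in which preparets constructs Xi for a time-delay network with input delays 1:2, so that ID = 2 and Xi has one column per delay step, ordered oldest to most recent:
net = timedelaynet(1:2,10);
x = num2cell(sin(1:20));               % 1-by-20 cell array of scalar inputs
t = num2cell(cos(1:20));               % 1-by-20 cell array of scalar targets
[xs,xi,ai,ts] = preparets(net,x,t);    % xi is the 1-by-2 cell array of delay states
net = train(net,xs,ts,xi,ai);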
Ai — Initial layer delay conditions
Initial layer delay conditions, specified as an Nl-by-LD cell array or a (sum of Si)-by-(LD*Q) matrix, where
Nl = net.numLayers
LD = net.numLayerDelays
Si = net.layers{i}.size
Q is the batch size
For cell array input, the columns of Ai are ordered from the oldest delay condition to the most recent: Ai{i,k} is the layer output i at time ts = k - LD.
EW — Error weights
Error weights, specified as a No-by-TS cell array or a (sum of Ui)-by-Q matrix, where
No = net.numOutputs
TS is the number of time steps
Ui = net.outputs{i}.size
Q is the batch size
For cell array input, each element EW{i,ts} is a Ui-by-Q matrix, where
Ui = net.outputs{i}.size
Q is the batch size
The error weights EW can also have a size of 1 in place of all or any of No, TS, Ui or Q. In that case, EW is automatically dimension extended to match the targets T. This allows for conveniently weighting the importance in any dimension (such as per sample) while having equal importance across another (such as time, with TS=1). If all dimensions are 1, for instance if EW = {1}, then all target values are treated with the same importance. That is the default value of EW.
As noted above, the error weights EW can be of the same dimensions as the targets T, or have some dimensions set to 1. For instance, if EW is 1-by-Q, then target samples will have different importances, but each element in a sample will have the same importance. If EW is (sum of Ui)-by-1, then each output element has a different importance, with all samples treated with the same importance.
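For instance, the following sketch (using the simplefit_dataset sample data, and assuming empty placeholders can stand in for the unused Xi and Ai arguments) weights later samples more heavily with a 1-by-Q error weight vector:
[x,t] = simplefit_dataset;
net = fitnet(10);
ew = linspace(0.1,1,size(t,2));   % 1-by-Q: later samples get larger weights
net = train(net,x,t,{},{},ew);    % {} placeholders for the unused Xi and Ai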
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'useParallel','yes'
'useParallel' — Option to specify parallel calculations
'no' (default) | 'yes'
Option to specify parallel calculations, specified as 'yes' or 'no'.
'no' – Calculations occur on the normal MATLAB thread. This is the default 'useParallel' setting.
'yes' – Calculations occur on parallel workers if a parallel pool is open. Otherwise calculations occur on the normal MATLAB® thread.
'useGPU' — Option to specify GPU calculations
'no' (default) | 'yes' | 'only'
Option to specify GPU calculations, specified as 'yes', 'no', or 'only'.
'no' – Calculations occur on the CPU. This is the default 'useGPU' setting.
'yes' – Calculations occur on the current gpuDevice if it is a supported GPU (see Parallel Computing Toolbox for GPU requirements). If the current gpuDevice is not supported, calculations remain on the CPU. If 'useParallel' is also 'yes' and a parallel pool is open, then each worker with a unique GPU uses that GPU, and other workers run calculations on their respective CPU cores.
'only' – If no parallel pool is open, then this setting is the same as 'yes'. If a parallel pool is open, then only workers with unique GPUs are used. However, if a parallel pool is open but no supported GPUs are available, then calculations revert to performing on all worker CPUs.
'showResources' — Option to show resources
'no' (default) | 'yes'
Option to show resources, specified as 'yes' or 'no'.
'no' – Do not display computing resources used at the command line. This is the default setting.
'yes' – Show at the command line a summary of the computing resources actually used. The actual resources may differ from the requested resources, if parallel or GPU computing is requested but a parallel pool is not open or a supported GPU is not available. When parallel workers are used, each worker’s computation mode is described, including workers in the pool that are not used.
'reduction' — Memory reduction
Memory reduction, specified as a positive integer.
For most neural networks, the default CPU training computation mode is a compiled MEX algorithm. However, for large networks the calculations might occur with a MATLAB calculation mode. This can be confirmed using 'showResources'. If MATLAB is being used and memory is an issue, setting the reduction option to a value N greater than 1 reduces much of the temporary storage required to train by a factor of N, in exchange for longer training times.
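For example, reusing net, X, and T from the earlier vinyl_dataset examples (and choosing an arbitrary reduction value of 10), a sketch might look like:
net = train(net,X,T,'reduction',10,'showResources','yes');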
'CheckpointFile' — Checkpoint file
'' (default) | character vector
Checkpoint file, specified as a character vector.
The value for 'CheckpointFile' can be set to a filename to save in the current working folder, to a file path in another folder, or to an empty string to disable checkpoint saves (the default value).
'CheckpointDelay' — Checkpoint delay
Checkpoint delay, specified as a nonnegative integer.
The optional parameter 'CheckpointDelay' limits how often saves happen. Limiting the frequency of checkpoints can improve efficiency by keeping the amount of time saving checkpoints low compared to the time spent in calculations. It has a default value of 60, which means that checkpoint saves do not happen more than once per minute. Set the value of 'CheckpointDelay' to 0 if you want checkpoint saves to occur only once every epoch.
trainedNet — Trained network
network object
Trained network, returned as a network object.
tr — Training record
Training record (epoch and perf), returned as a structure whose fields depend on the network training function (net.trainFcn). It can include fields such as:
Training, data division, and performance functions and parameters
Data division indices for training, validation and test sets
Data division masks for training, validation and test sets
Number of epochs (num_epochs) and the best epoch (best_epoch)
A list of training state names (states)
Fields for each state name recording its value throughout training
Performances of the best network (best_perf, best_vperf, best_tperf)
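For example, this sketch captures the training record for a small feed-forward fit and inspects a few of the fields listed above:
[x,t] = simplefit_dataset;
net = feedforwardnet(10);
[net,tr] = train(net,x,t);
tr.best_epoch      % epoch with the best validation performance
tr.best_perf       % best training performance
plotperf(tr)       % plot the recorded performance values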
train calls the function indicated by net.trainFcn, using the training parameter values indicated by net.trainParam.
Typically one epoch of training is defined as a single presentation of all input vectors to the network. The network is then updated according to the results of all those presentations.
Training occurs until the maximum number of epochs is reached, the performance goal is met, or any other stopping condition of the function net.trainFcn occurs.
Some training functions depart from this norm by presenting only one input vector (or sequence) each epoch. An input vector (or sequence) is chosen randomly for each epoch from concurrent input vectors (or sequences). competlayer returns networks that use trainru, a training function that does this.
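As a small sketch of this (with random, hypothetical input vectors), a competitive layer is created and trained without targets, since such networks are unsupervised:
inputs = rand(2,100);      % hypothetical 2-dimensional input vectors
net = competlayer(5);      % competitive layer with five neurons
net.trainFcn               % displays 'trainru'
net = train(net,inputs);   % no targets are needed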