A regression layer computes the half-mean-squared-error loss
for regression problems. For typical regression problems, a regression layer must follow the final
fully connected layer.
For a single observation, the mean-squared-error is given by:

MSE = \frac{\sum_{i=1}^{R} (t_i - y_i)^2}{R}

where R is the number of responses, t_i is the target output, and y_i is the network's prediction for response i.
For image and sequence-to-one regression networks, the loss function of the regression layer is the half-mean-squared-error of the predicted responses, not normalized by R:

loss = \frac{1}{2} \sum_{i=1}^{R} (t_i - y_i)^2
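The two formulas above differ only in the factor of 1/R and the factor of 1/2; a minimal NumPy sketch of both (function and variable names are illustrative, not part of any library API):

```python
import numpy as np

def mse(t, y):
    """Mean-squared-error over the R responses of one observation."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    return np.sum((t - y) ** 2) / t.size

def half_mse(t, y):
    """Half-mean-squared-error, not normalized by R
    (image and sequence-to-one regression)."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    return 0.5 * np.sum((t - y) ** 2)

t, y = [1.0, 2.0, 3.0], [1.0, 2.0, 5.0]
print(mse(t, y))       # 4/3
print(half_mse(t, y))  # 2.0
```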
For image-to-image regression networks, the loss function of the regression layer is the half-mean-squared-error of the predicted responses for each pixel, not normalized by R:

loss = \frac{1}{2} \sum_{p=1}^{HWC} (t_p - y_p)^2

where H, W, and C denote the height, width, and number of channels of the output respectively, and p indexes into each element (pixel) of t and y linearly.
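Because p indexes linearly over all H-by-W-by-C elements, the pixel-wise loss reduces to a sum over the flattened arrays. A sketch under that assumption (the function name is illustrative):

```python
import numpy as np

def half_mse_image(t, y):
    """Half-MSE summed over all H*W*C elements of one output image.

    Linear indexing over pixels is equivalent to summing over the
    flattened difference array.
    """
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    return 0.5 * np.sum((t - y) ** 2)

t = np.zeros((2, 2, 3))  # H=2, W=2, C=3
y = np.ones((2, 2, 3))
print(half_mse_image(t, y))  # 0.5 * 12 = 6.0
```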
For sequence-to-sequence regression networks, the loss function of the regression layer is the half-mean-squared-error of the predicted responses for each time step, not normalized by R:

loss = \frac{1}{2S} \sum_{i=1}^{S} \sum_{j=1}^{R} (t_{ij} - y_{ij})^2

where S is the sequence length.
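Here the squared errors are summed over all S time steps and R responses, then averaged over time steps only. A sketch assuming sequences stored as (S, R) arrays (names are illustrative):

```python
import numpy as np

def half_mse_sequence(t, y):
    """Half-MSE per time step: sum over S steps and R responses,
    divided by 2*S (averaged over time steps, not over responses)."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    S = t.shape[0]  # sequence length
    return np.sum((t - y) ** 2) / (2.0 * S)

t = np.array([[1.0, 2.0], [3.0, 4.0]])  # S=2 time steps, R=2 responses
y = np.array([[1.0, 0.0], [3.0, 2.0]])
print(half_mse_sequence(t, y))  # (4 + 4) / (2 * 2) = 2.0
```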
When training, the software calculates the mean loss over the observations in the
mini-batch.
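Averaging over a mini-batch can be sketched as follows, assuming N observations with R responses each stacked into (N, R) arrays (a sketch, not the library's implementation):

```python
import numpy as np

def batch_loss(T, Y):
    """Mean of per-observation half-MSE losses over a mini-batch.

    T, Y have shape (N, R): N observations, R responses each.
    """
    T, Y = np.asarray(T, dtype=float), np.asarray(Y, dtype=float)
    per_obs = 0.5 * np.sum((T - Y) ** 2, axis=1)  # one loss per observation
    return per_obs.mean()                         # average over the mini-batch

T = np.array([[1.0, 2.0], [3.0, 4.0]])
Y = np.array([[1.0, 2.0], [3.0, 2.0]])
print(batch_loss(T, Y))  # per-observation losses 0.0 and 2.0, mean 1.0
```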