Normalize each channel of mini-batch
The batch normalization operation normalizes each input channel across a mini-batch. To speed up training of convolutional neural networks and reduce the sensitivity to network initialization, use batch normalization between convolution and nonlinear operations such as relu.
Note
This function applies the batch normalization operation to dlarray data. If you want to apply batch normalization within a layerGraph object or Layer array, use the following layer: batchNormalizationLayer.
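For illustration only, the following is a minimal sketch (not part of the original page) of placing batchnorm between a convolution and a relu operation; the image sizes, filter counts, and random data are assumptions.

% Mini-batch of 16 28-by-28 RGB images in 'SSCB' format (assumed sizes).
dlX = dlarray(rand(28,28,3,16,'single'),'SSCB');

% Convolution with 8 assumed 5-by-5 filters and zero bias.
weights = rand(5,5,3,8,'single');
dlY = dlconv(dlX,weights,0,'Padding','same');

% Batch normalization with one offset (beta) and scale (gamma) per channel.
offset = zeros(8,1,'single');
scaleFactor = ones(8,1,'single');
dlY = batchnorm(dlY,offset,scaleFactor);

% Nonlinearity.
dlY = relu(dlY);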
[dlY,mu,sigmaSq] = batchnorm(dlX,offset,scaleFactor) normalizes each channel of the input mini-batch dlX using the mean and variance statistics computed from each channel and applies a scale factor and offset.
The normalized activation is calculated using the following formula:

x̂i = (xi − μc) / √(σc2 + ε)

where xi is the input activation, μc (mu) and σc2 (sigmaSq) are the per-channel mean and variance, respectively, and ε is a small constant. mu and sigmaSq are calculated over all 'S' (spatial), 'B' (batch), 'T' (time), and 'U' (unspecified) dimensions in dlX for each channel.
The normalized activation is offset and scaled according to the following formula:

yi = γ x̂i + β

The offset β and scale factor γ are specified with the offset and scaleFactor arguments.
The input dlX is a formatted dlarray with dimension labels. The output dlY is a formatted dlarray with the same dimension labels as dlX.
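As a sketch of the three-output syntax (the sizes, variable names, and the ε value below are assumptions, not taken from this page), the result can be checked against the normalization formula:

% Formatted input with 4 channels (assumed sizes).
numChannels = 4;
dlX = dlarray(randn(12,12,numChannels,8,'single'),'SSCB');

offset = zeros(numChannels,1,'single');        % beta
scaleFactor = ones(numChannels,1,'single');    % gamma

[dlY,mu,sigmaSq] = batchnorm(dlX,offset,scaleFactor);

% Manually normalize one channel with the returned statistics. With
% offset 0 and scale 1 the result should be close to dlY; epsilon here
% is an assumed small constant, not necessarily the internal value.
c = 1;
epsilon = 1e-5;
xc = extractdata(dlX(:,:,c,:));
xHat = (xc - mu(c))./sqrt(sigmaSq(c) + epsilon);
diffVal = xHat - extractdata(dlY(:,:,c,:));
max(abs(diffVal(:)))                           % should be close to 0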
[dlY,datasetMu,datasetSigmaSq] = batchnorm(dlX,offset,scaleFactor,datasetMu,datasetSigmaSq) normalizes each channel of the input mini-batch dlX using the mean and variance statistics computed from each channel and applies a scale factor and offset. The function also updates the data set statistics datasetMu and datasetSigmaSq using the following formula:

sn = ϕ sx + (1 − ϕ) sn−1

where sn is the statistic computed over several mini-batches, sn−1 is its value after the previous mini-batch, sx is the per-channel statistic of the current mini-batch, and ϕ is the decay value for the statistic.
Use this syntax to iteratively update the mean and variance statistics over several mini-batches of data during training. Use the final value of the mean and variance computed over all training mini-batches to normalize data for prediction and classification.
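A minimal sketch of this iterative update, assuming random stand-in mini-batches, four channels, and the default decay values (none of which come from this page):

numChannels = 4;
offset = zeros(numChannels,1,'single');
scaleFactor = ones(numChannels,1,'single');

% Initialize the data set statistics (assumed starting values).
datasetMu = zeros(numChannels,1,'single');
datasetSigmaSq = ones(numChannels,1,'single');

for i = 1:10
    % Random data stands in for a real training mini-batch.
    dlX = dlarray(randn(12,12,numChannels,8,'single'),'SSCB');
    [dlY,datasetMu,datasetSigmaSq] = batchnorm(dlX,offset,scaleFactor, ...
        datasetMu,datasetSigmaSq);
end

% datasetMu and datasetSigmaSq now hold the accumulated statistics to use
% when normalizing data for prediction and classification.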
[___] = batchnorm(___,'DataFormat',FMT) also specifies the dimension format FMT when dlX is not a formatted dlarray, in addition to the input arguments in previous syntaxes. The output dlY is an unformatted dlarray with the same dimension order as dlX.
[___] = batchnorm(___,Name,Value) specifies options using one or more name-value pair arguments, in addition to the input arguments in previous syntaxes. For example, 'MeanDecay',0.3 sets the decay rate of the moving average computation.
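A sketch of passing a name-value pair argument, assuming the statistics-updating syntax above and an illustrative decay value of 0.3:

numChannels = 4;
offset = zeros(numChannels,1,'single');
scaleFactor = ones(numChannels,1,'single');
datasetMu = zeros(numChannels,1,'single');
datasetSigmaSq = ones(numChannels,1,'single');

dlX = dlarray(randn(12,12,numChannels,8,'single'),'SSCB');

% 'MeanDecay' controls the decay rate of the moving mean computation.
[dlY,datasetMu,datasetSigmaSq] = batchnorm(dlX,offset,scaleFactor, ...
    datasetMu,datasetSigmaSq,'MeanDecay',0.3);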
dlarray | dlconv | dlfeval | dlgradient | fullyconnect | groupnorm | relu