Batch normalization layer
A batch normalization layer normalizes each input channel across a mini-batch. To speed up training of convolutional neural networks and reduce the sensitivity to network initialization, use batch normalization layers between convolutional layers and nonlinearities, such as ReLU layers.
The layer first normalizes the activations of each channel by subtracting the mini-batch mean and dividing by the mini-batch standard deviation. Then, the layer shifts the input by a learnable offset β and scales it by a learnable scale factor γ.
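For example, the following layer array (a minimal sketch; the input size and filter settings are illustrative, not prescribed by this page) places a batch normalization layer between a convolution layer and its ReLU nonlinearity:

layers = [
    imageInputLayer([28 28 1])                   % 28-by-28 grayscale images
    convolution2dLayer(3,16,'Padding','same')    % 16 filters of size 3-by-3
    batchNormalizationLayer                      % normalize before the nonlinearity
    reluLayer
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];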
layer = batchNormalizationLayer creates a batch normalization layer.

layer = batchNormalizationLayer('Name',Value) creates a batch normalization layer and sets the optional Batch Normalization, Parameters and Initialization, Learn Rate and Regularization, and Name properties using name-value pairs. For example, batchNormalizationLayer('Name','batchnorm') creates a batch normalization layer with the name 'batchnorm'. You can specify multiple name-value pairs. Enclose each property name in quotes.
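A short sketch of the name-value syntax (the Epsilon value shown here is illustrative only):

% Create a named layer and set the numerical stability constant Epsilon
layer = batchNormalizationLayer('Name','batchnorm','Epsilon',1e-4)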
A batch normalization layer normalizes its inputs x_i by first calculating the mean μ_B and variance σ_B^2 over a mini-batch and over each input channel [1]. Then, it calculates the normalized activations as

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}.$$

Here, ϵ (the property Epsilon) improves numerical stability when the mini-batch variance is very small. To allow for the possibility that inputs with zero mean and unit variance are not optimal for the layer that follows the batch normalization layer, the batch normalization layer further shifts and scales the activations as

$$y_i = \gamma \hat{x}_i + \beta.$$

Here, the offset β and scale factor γ (Offset and Scale properties) are learnable parameters that are updated during network training.
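The following sketch reproduces these formulas numerically on a random mini-batch. It illustrates the math only, not the layer's internal implementation; the array sizes, scale, and offset values are arbitrary:

% x: height-by-width-by-channels-by-observations mini-batch of activations
x = randn(4,4,3,8);
epsilon = 1e-5;
scale  = ones(1,1,3);    % learnable scale factor gamma, one per channel
offset = zeros(1,1,3);   % learnable offset beta, one per channel

% Mini-batch mean and variance over all dimensions except the channel dimension
mu     = mean(x,[1 2 4]);
sigma2 = var(x,1,[1 2 4]);

xhat = (x - mu)./sqrt(sigma2 + epsilon);   % normalize
y    = scale.*xhat + offset;               % shift and scale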
When network training finishes, the batch normalization layer calculates the mean and variance over the full training set and stores them in the TrainedMean and TrainedVariance properties. When you use a trained network to make predictions on new images, the layer uses the trained mean and variance instead of the mini-batch mean and variance to normalize the activations.
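For example (a sketch assuming the digit image data set that ships with Deep Learning Toolbox and a deliberately small network), you can inspect the stored statistics after training:

[XTrain,YTrain] = digitTrain4DArrayData;     % 28-by-28 grayscale digit images

layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(3,16,'Padding','same')
    batchNormalizationLayer('Name','batchnorm')
    reluLayer
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

options = trainingOptions('sgdm','MaxEpochs',2,'Verbose',false);
net = trainNetwork(XTrain,YTrain,layers,options);

% Statistics stored when training finishes; the layer uses these instead of
% mini-batch statistics at prediction time
bn = net.Layers(3);      % the batch normalization layer in the array above
bn.TrainedMean
bn.TrainedVariance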
[1] Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." preprint, arXiv:1502.03167 (2015).
convolution2dLayer | fullyConnectedLayer | groupNormalizationLayer | reluLayer | trainingOptions | trainNetwork