batchnorm

Normalize each channel of mini-batch

Description

The batch normalization operation normalizes each input channel across a mini-batch. To speed up training of convolutional neural networks and reduce the sensitivity to network initialization, use batch normalization between convolution and nonlinear operations such as relu.

Note

This function applies the batch normalization operation to dlarray data. If you want to apply batch normalization within a layerGraph object or Layer array, use the batchNormalizationLayer layer instead.

[dlY,mu,sigmaSq] = batchnorm(dlX,offset,scaleFactor) normalizes each channel of the input mini-batch dlX using the mean and variance statistics computed from each channel and applies a scale factor and offset.

The normalized activation is calculated using the following formula:

$$\hat{x}_i = \frac{x_i - \mu_c}{\sqrt{\sigma_c^2 + \epsilon}}$$

where $x_i$ is the input activation, $\mu_c$ (mu) and $\sigma_c^2$ (sigmaSq) are the per-channel mean and variance, respectively, and $\epsilon$ is a small constant. mu and sigmaSq are calculated over all 'S' (spatial), 'B' (batch), 'T' (time), and 'U' (unspecified) dimensions in dlX for each channel.

The normalized activation is offset and scaled according to the following formula:

$$y_i = \gamma \hat{x}_i + \beta.$$

The offset β and scale factor γ are specified with the offset and scaleFactor arguments.
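The two formulas can be checked by hand on a plain numeric array. The following is a minimal sketch, not the batchnorm implementation. It assumes 'SSCB' data stored as a height-by-width-by-channel-by-batch array, an ε of 1e-5, and a population variance (normalized by the number of elements); the last point is an assumption about how the statistics are computed. The names Xhat and Y are illustrative only.

```matlab
% Manual sketch of the normalization and scale/offset formulas (no dlarray).
X = rand(4,4,3,2);            % height-by-width-by-channel-by-batch ('SSCB')
epsilon = 1e-5;               % small constant added to the variance
offset = zeros(1,1,3);        % beta, shaped to expand along the channel dimension
scaleFactor = ones(1,1,3);    % gamma, shaped the same way

% Per-channel statistics over the spatial and batch dimensions (1, 2, and 4).
% Assumption: population variance, that is, normalized by N rather than N-1.
mu = mean(X,[1 2 4]);
sigmaSq = var(X,1,[1 2 4]);

Xhat = (X - mu) ./ sqrt(sigmaSq + epsilon);   % normalized activation
Y = scaleFactor .* Xhat + offset;             % scaled and offset result
```

Because offset is zero and scaleFactor is one in this sketch, each channel of Y has a mean close to zero and a variance close to one.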

The input dlX is a formatted dlarray with dimension labels. The output dlY is a formatted dlarray with the same dimension labels as dlX.

dlY = batchnorm(dlX,offset,scaleFactor,mu,sigmaSq) normalizes each channel of the input dlX using the specified mu and sigmaSq statistics and applies a scale factor and offset.
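As an illustration of this syntax, the sketch below normalizes new data with statistics that were computed earlier. The values of trainedMu and trainedSigmaSq are hypothetical stand-ins chosen for the example; in practice they would come from training.

```matlab
% Stand-in statistics; these values are hypothetical, not from a real model.
channels = 3;
trainedMu = 0.5*ones(channels,1);          % stored per-channel mean
trainedSigmaSq = (1/12)*ones(channels,1);  % stored per-channel variance

offset = zeros(channels,1);
scaleFactor = ones(channels,1);

% New data to normalize, for example at prediction time.
dlXNew = dlarray(rand(4,4,channels,1),'SSCB');

% Five inputs and one output: the supplied mu and sigmaSq are used.
dlYNew = batchnorm(dlXNew,offset,scaleFactor,trainedMu,trainedSigmaSq);
```

In this form the statistics are inputs only, so the result depends on the stored values and not on the contents of the current mini-batch.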

[dlY,datasetMu,datasetSigmaSq] = batchnorm(dlX,offset,scaleFactor,datasetMu,datasetSigmaSq) normalizes each channel of the input mini-batch dlX using the mean and variance statistics computed from each channel and applies a scale factor and offset. The function also updates the data set statistics datasetMu and datasetSigmaSq using the following formula:

$$s_n = \phi s_x + (1 - \phi) s_{n-1}$$

where $s_n$ is the updated statistic computed over several mini-batches, $s_{n-1}$ is its value before the update, $s_x$ is the per-channel statistic of the current mini-batch, and $\phi$ is the decay value for the statistic.

Use this syntax to iteratively update the mean and variance statistics over several mini-batches of data during training. Use the final value of the mean and variance computed over all training mini-batches to normalize data for prediction and classification.
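The update rule is a simple weighted average and can be worked through by hand. The sketch below applies it once to a running per-channel mean. The numbers and the name batchMu are made up for illustration, and the decay value 0.1 matches the documented default.

```matlab
phi = 0.1;                       % decay value (0.1 is the documented default)
datasetMu = [0.50; 0.50; 0.50];  % s_{n-1}: running per-channel mean so far
batchMu   = [0.60; 0.40; 0.50];  % s_x: per-channel mean of the current mini-batch

% s_n = phi*s_x + (1 - phi)*s_{n-1}
datasetMu = phi*batchMu + (1 - phi)*datasetMu;
```

With these values, datasetMu becomes [0.51; 0.49; 0.50]. A small decay value therefore moves the running statistic only slightly toward the current mini-batch.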

[___] = batchnorm(___,'DataFormat',FMT) also specifies the dimension format FMT, in addition to the input arguments in previous syntaxes. Use this syntax when dlX is not a formatted dlarray. The output dlY is an unformatted dlarray with the same dimension order as dlX.
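A short sketch of the 'DataFormat' syntax follows. It assumes an unformatted dlarray whose dimensions are ordered as height, width, channel, and batch; the sizes are arbitrary.

```matlab
channels = 3;
offset = zeros(channels,1);
scaleFactor = ones(channels,1);

% Unformatted dlarray: no dimension labels are attached to the data.
dlX = dlarray(rand(4,4,channels,2));

% Supply the dimension labels at call time instead.
dlY = batchnorm(dlX,offset,scaleFactor,'DataFormat','SSCB');

dims(dlY)   % expected to be empty for an unformatted dlarray
size(dlY)   % same dimension order as dlX
```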

[___] = batchnorm(___,Name,Value) specifies options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, 'MeanDecay',0.3 sets the decay rate of the moving average computation.

Examples

Use batchnorm to normalize each channel of a mini-batch and obtain the per-channel normalization statistics.

Create the input data as a single observation of random values with a height of four, a width of four, and three channels.

height = 4;
width = 4;
channels = 3;
observations = 1;

X = rand(height,width,channels,observations);
dlX = dlarray(X,'SSCB');

Create the learnable parameters.

offset = zeros(channels,1);
scaleFactor = ones(channels,1);

Compute the batch normalization and obtain the statistics of each channel of the batch.

[dlY,mu,sigmaSq] = batchnorm(dlX,offset,scaleFactor);
mu
sigmaSq
mu = 3×1    
    0.6095
    0.6063
    0.4619
sigmaSq = 3×1    
    0.1128
    0.0880
    0.0805

Use the batchnorm function to normalize several batches of data and update the statistics of the whole data set after each normalization.

Create three batches of data. The data consists of 10-by-10 random arrays with five channels. Each batch contains 20 observations. The second and third batches are scaled by a multiplicative factor of 1.5 and 2.5, respectively, so the mean of the data set increases with each batch.

height = 10;
width = 10;
channels = 5;
observations = 20;

X1 = rand(height,width,channels,observations);
dlX1 = dlarray(X1,'SSCB');

X2 = 1.5*rand(height,width,channels,observations);
dlX2 = dlarray(X2,'SSCB');

X3 = 2.5*rand(height,width,channels,observations);
dlX3 = dlarray(X3,'SSCB');

Create the learnable parameters.

offset = zeros(channels,1);
scale = ones(channels,1);

Normalize the first batch of data, dlX1, using batchnorm. Obtain the values of the mean and variance of this batch as outputs.

[dlY1,mu,sigmaSq] = batchnorm(dlX1,offset,scale);

Normalize the second batch of data, dlX2. Use mu and sigmaSq as inputs to obtain the values of the combined mean and variance of the data in batches dlX1 and dlX2.

[dlY2,datasetMu,datasetSigmaSq] = batchnorm(dlX2,offset,scale,mu,sigmaSq);

Normalize the final batch of data, dlX3. Update the data set statistics datasetMu and datasetSigmaSq to obtain the values of the combined mean and variance of all data in batches dlX1, dlX2, and dlX3.

[dlY3,datasetMuFull,datasetSigmaSqFull] = batchnorm(dlX3,offset,scale,datasetMu,datasetSigmaSq);

Observe the change in the mean of each channel as each batch is normalized.

plot([mu';datasetMu';datasetMuFull'])
legend({'Channel 1','Channel 2','Channel 3','Channel 4','Channel 5'},'Location','southeast')
xticks([1 2 3])
xlabel('Number of Batches')
xlim([0.9 3.1])
ylabel('Per-Channel Mean')
title('Data Set Mean')

Input Arguments

dlX

Input data, specified as a dlarray with or without dimension labels or a numeric array. When dlX is not a formatted dlarray, you must specify the dimension label format using 'DataFormat',FMT. If dlX is a numeric array, at least one of offset or scaleFactor must be a dlarray.

dlX must have a 'C' channel dimension.

Data Types: single | double
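The sketch below illustrates the numeric-input case described above: X is a plain numeric array, so offset is passed as a dlarray and the dimension labels are supplied with 'DataFormat'. The sizes are arbitrary and chosen only for the example.

```matlab
channels = 3;
X = rand(4,4,channels,2);              % plain numeric array, not a dlarray
offset = dlarray(zeros(channels,1));   % a dlarray, as required when X is numeric
scaleFactor = ones(channels,1);

% X carries no dimension labels, so 'DataFormat' is required.
dlY = batchnorm(X,offset,scaleFactor,'DataFormat','SSCB');
class(dlY)
```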

offset

Channel offset β, specified as a dlarray vector with or without dimension labels or a numeric vector.

If offset is a formatted dlarray, it must contain a 'C' dimension of the same size as the 'C' dimension of the input data.

Data Types: single | double

scaleFactor

Channel scale factor γ, specified as a dlarray vector with or without dimension labels or a numeric vector.

If scaleFactor is a formatted dlarray, it must contain a 'C' dimension of the same size as the 'C' dimension of the input data.

Data Types: single | double

mu

Mean statistic for normalization, specified as a numeric vector of the same length as the 'C' dimension of the input data.

mu is calculated over all 'S' (spatial), 'B' (batch), 'T' (time), and 'U' (unspecified) dimensions in dlX for each channel.

Data Types: single | double

sigmaSq

Variance statistic for normalization, specified as a numeric vector of the same length as the 'C' dimension of the input data.

sigmaSq is calculated over all 'S' (spatial), 'B' (batch), 'T' (time), and 'U' (unspecified) dimensions in dlX for each channel.

Data Types: single | double

datasetMu

Mean statistic of several batches of data, specified as a numeric vector of the same length as the 'C' dimension of the input data. To iteratively update the data set mean over several batches of input data, use the datasetMu output of a previous call to batchnorm as the datasetMu input argument.

Data Types: single | double

datasetSigmaSq

Variance statistic of several batches of data, specified as a numeric vector of the same length as the 'C' dimension of the input data. To iteratively update the data set variance over several batches of input data, use the datasetSigmaSq output of a previous call to batchnorm as the datasetSigmaSq input argument.

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'MeanDecay',0.3,'VarianceDecay',0.5 sets the decay rate for the moving average computations of the mean and variance of several batches of data to 0.3 and 0.5, respectively.

Dimension order of unformatted input data, specified as the comma-separated pair consisting of 'DataFormat' and a character array or string FMT that provides a label for each dimension of the data. Each character in FMT must be one of the following:

  • 'S' — Spatial

  • 'C' — Channel

  • 'B' — Batch (for example, samples and observations)

  • 'T' — Time (for example, sequences)

  • 'U' — Unspecified

You can specify multiple dimensions labeled 'S' or 'U'. You can use the labels 'C', 'B', and 'T' at most once.

You must specify 'DataFormat' when the input data dlX is not a formatted dlarray.

Example: 'DataFormat','SSCB'

Data Types: char | string

Variance offset for preventing divide-by-zero errors, specified as the comma-separated pair consisting of 'Epsilon' and a numeric scalar. The specified value must be greater than or equal to 1e-5. The default value is 1e-5.

Data Types: single | double
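To see why the variance offset matters, consider a channel whose values are all identical, so its variance is exactly zero. The following sketch uses plain numeric arithmetic rather than batchnorm itself. The names withoutEps and withEps are illustrative only.

```matlab
x = 5*ones(4,1);       % constant channel: the variance is exactly zero
mu = mean(x);
sigmaSq = var(x,1);    % population variance

withoutEps = (x - mu) ./ sqrt(sigmaSq);          % 0/0 produces NaN
withEps    = (x - mu) ./ sqrt(sigmaSq + 1e-5);   % finite result: all zeros
```

Without the offset the normalized values are NaN, whereas adding 1e-5 to the variance yields zeros.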

Decay value for the moving average computation of the datasetMu output, specified as the comma-separated pair consisting of 'MeanDecay' and a numeric scalar between 0 and 1. The default value is 0.1.

Data Types: single | double

Decay value for the moving average computation of the datasetSigmaSq output, specified as the comma-separated pair consisting of 'VarianceDecay' and a numeric scalar between 0 and 1. The default value is 0.1.

Data Types: single | double
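The decay options can be tried out as in the sketch below, which updates a pair of running statistics with custom decay values and compares the outputs against the documented moving average formula. The starting values of datasetMu and datasetSigmaSq are hypothetical, and the comparison assumes that the statistics returned by the three-input syntax are the same per-channel statistics used in the update.

```matlab
channels = 5;
offset = zeros(channels,1);
scaleFactor = ones(channels,1);
dlX = dlarray(rand(10,10,channels,20),'SSCB');

% Hypothetical running statistics carried over from earlier mini-batches.
datasetMu = zeros(channels,1);
datasetSigmaSq = ones(channels,1);

% Statistics of the current mini-batch alone, for comparison.
[~,currentMu,currentSigmaSq] = batchnorm(dlX,offset,scaleFactor);

% Update the running statistics with custom decay values.
[dlY,newMu,newSigmaSq] = batchnorm(dlX,offset,scaleFactor, ...
    datasetMu,datasetSigmaSq,'MeanDecay',0.3,'VarianceDecay',0.5);

% Differences from the documented update formula, expected to be near zero.
muError = max(abs(newMu - (0.3*currentMu + 0.7*datasetMu)))
varError = max(abs(newSigmaSq - (0.5*currentSigmaSq + 0.5*datasetSigmaSq)))
```

A larger decay value gives more weight to the current mini-batch, so the running statistics adapt faster but fluctuate more from batch to batch.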

Output Arguments

dlY

Normalized data, returned as a dlarray. The output dlY has the same underlying data type as the input dlX.

If the input data dlX is a formatted dlarray, dlY has the same dimension labels as dlX. If the input data is not a formatted dlarray, dlY is an unformatted dlarray with the same dimension order as the input data.

mu

Per-channel mean of the input data, returned as a numeric column vector with length equal to the size of the 'C' dimension of the input data.

sigmaSq

Per-channel variance of the input data, returned as a numeric column vector with length equal to the size of the 'C' dimension of the input data.

datasetMu

Updated mean statistic of several batches of data, returned as a numeric vector with length equal to the size of the 'C' dimension of the input data. datasetMu is returned with the same shape as the input datasetMu.

The datasetMu output is the moving average computation of the mean statistic for each channel over several batches of input data. datasetMu is computed from the channel mean of the input data and the input datasetMu using the following formula:

datasetMu = meanDecay × currentMu + (1 – meanDecay) × datasetMu,

where currentMu is the channel mean computed from the input data and the value of meanDecay is specified using the 'MeanDecay' name-value pair argument.

datasetSigmaSq

Updated variance statistic of several batches of data, returned as a numeric vector with length equal to the size of the 'C' dimension of the input data. datasetSigmaSq is returned with the same shape as the input datasetSigmaSq.

The datasetSigmaSq output is the moving average computation of the variance statistic for each channel over several batches of input data. datasetSigmaSq is computed from the channel variance of the input data and the input datasetSigmaSq using the following formula:

datasetSigmaSq = varianceDecay × currentSigmaSq + (1 – varianceDecay) × datasetSigmaSq,

where currentSigmaSq is the channel variance computed from the input data and the value of varianceDecay is specified using the 'VarianceDecay' name-value pair.

More About

Batch Normalization

The batchnorm function normalizes each input channel of a mini-batch of data. For more information, see the definition of Batch Normalization Layer on the batchNormalizationLayer reference page.

Introduced in R2019b