batchnorm

Normalize each channel of mini-batch

Syntax

[dlY,mu,sigmaSq] = batchnorm(dlX,offset,scaleFactor)

dlY = batchnorm(dlX,offset,scaleFactor,mu,sigmaSq)

[dlY,datasetMu,datasetSigmaSq] = batchnorm(dlX,offset,scaleFactor,datasetMu,datasetSigmaSq)

[___] = batchnorm(___,'DataFormat',FMT)

[___] = batchnorm(___,Name,Value)

Description

The batch normalization operation normalizes each input channel across a mini-batch. To speed up training of convolutional neural networks and reduce the sensitivity to network initialization, use batch normalization between convolution and nonlinear operations such as relu.

Note

This function applies the batch normalization operation to dlarray data. If you want to apply batch normalization within a layerGraph object or Layer array, use the following layer:

batchNormalizationLayer

[dlY,mu,sigmaSq] = batchnorm(dlX,offset,scaleFactor) normalizes each channel of the input mini-batch dlX using the mean and variance statistics computed from each channel and applies a scale factor and offset.

The normalized activation is calculated using the following formula:

$\hat{x}_{i} = \frac{x_{i} - \mu_{c}}{\sqrt{\sigma_{c}^{2} + \epsilon}}$

where x_i is the input activation, μ_c (mu) and σ_c² (sigmaSq) are the per-channel mean and variance, respectively, and ε is a small constant. mu and sigmaSq are calculated over all 'S' (spatial), 'B' (batch), 'T' (time), and 'U' (unspecified) dimensions in dlX for each channel.

The normalized activation is offset and scaled according to the following formula:

$y_{i} = \gamma \hat{x}_{i} + \beta.$

The offset β and scale factor γ are specified with the offset and scaleFactor arguments.
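
The two formulas above can be reproduced with basic array operations. The following is a minimal sketch, not the internal implementation of batchnorm. It assumes 'SSCB' data held in a plain numeric array, uses the default ε of 1e-5, and assumes the variance is normalized by the number of elements (the population variance). All variable names and parameter values are illustrative.

```matlab
% Illustrative data: 4-by-4 spatial, 3 channels, 2 observations ('SSCB' layout).
X = rand(4,4,3,2);
epsilon = 1e-5;                     % small constant, same as the documented default

% Per-channel statistics over the spatial (1,2) and batch (4) dimensions.
mu = mean(X,[1 2 4]);               % 1-by-1-by-3
sigmaSq = var(X,1,[1 2 4]);         % assumption: population variance (weight 1)

% Normalize, then apply the per-channel scale factor (gamma) and offset (beta).
xHat = (X - mu)./sqrt(sigmaSq + epsilon);
gamma = reshape([1 2 0.5],1,1,[]);  % illustrative scale factors
beta = reshape([0 1 -1],1,1,[]);    % illustrative offsets
Y = gamma.*xHat + beta;

% Each channel of Y has a mean close to its offset beta.
channelMean = squeeze(mean(Y,[1 2 4]))
```

Because the normalized activations of each channel average to zero, the per-channel mean of the result is determined by the offset alone.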

The input dlX is a formatted dlarray with dimension labels. The output dlY is a formatted dlarray with the same dimension labels as dlX.

dlY = batchnorm(dlX,offset,scaleFactor,mu,sigmaSq) normalizes each channel of the input dlX using the specified mu and sigmaSq statistics and applies a scale factor and offset.
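
As a hedged illustration of this syntax, the sketch below normalizes new data with fixed statistics and compares the result with the documented formula. The statistics are assumed values (the theoretical mean and variance of uniform random data on [0,1]), not values produced by training, and the comparison assumes the default 'Epsilon' of 1e-5.

```matlab
% Assumed, illustrative statistics: mean and variance of uniform data on [0,1].
channels = 3;
mu = 0.5*ones(channels,1);
sigmaSq = (1/12)*ones(channels,1);

offset = zeros(channels,1);
scaleFactor = ones(channels,1);

% New data to normalize with the fixed statistics.
X = rand(4,4,channels,2);
dlX = dlarray(X,'SSCB');
dlY = batchnorm(dlX,offset,scaleFactor,mu,sigmaSq);

% Compare against the documented formula with the default Epsilon of 1e-5.
expected = (X - reshape(mu,1,1,[]))./sqrt(reshape(sigmaSq,1,1,[]) + 1e-5);
maxDiff = max(abs(extractdata(dlY) - expected),[],'all')
```

Requesting a single output is what selects this behavior; with three outputs and the same inputs, the function instead computes statistics from the mini-batch and updates the supplied values.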

[dlY,datasetMu,datasetSigmaSq] = batchnorm(dlX,offset,scaleFactor,datasetMu,datasetSigmaSq) normalizes each channel of the input mini-batch dlX using the mean and variance statistics computed from each channel and applies a scale factor and offset. The function also updates the data set statistics datasetMu and datasetSigmaSq using the following formula:

$s_{n} = \phi s_{x} + (1 - \phi) s_{n-1}$

where s_n is the statistic computed over several mini-batches, s_x is the per-channel statistic of the current mini-batch, and ϕ is the decay value for the statistic.
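
A single update step can be worked through by hand. The sketch below uses made-up per-channel values and a decay of 0.1, so each running value moves one tenth of the way toward the statistic of the current mini-batch. The variable names are illustrative and are not arguments of batchnorm.

```matlab
phi = 0.1;                    % decay value
sPrev = [0.50; 0.40; 0.60];   % illustrative running statistic, one value per channel
sBatch = [0.70; 0.20; 0.60];  % illustrative statistic of the current mini-batch

% Moving average update: weight phi on the new batch, 1 - phi on the history.
sNew = phi*sBatch + (1 - phi)*sPrev
```

With these inputs the updated values are 0.52, 0.38, and 0.60.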

Use this syntax to iteratively update the mean and variance statistics over several mini-batches of data during training. Use the final value of the mean and variance computed over all training mini-batches to normalize data for prediction and classification.

[___] = batchnorm(___,'DataFormat',FMT) also specifies the dimension format FMT, in addition to the input arguments in previous syntaxes. Use this syntax when dlX is not a formatted dlarray. The output dlY is an unformatted dlarray with the same dimension order as dlX.
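
For instance, a call with an unformatted dlarray might look like the following sketch. The sizes and parameter values are arbitrary choices for illustration.

```matlab
channels = 3;
X = rand(4,4,channels,2);
dlX = dlarray(X);                 % unformatted: no dimension labels

offset = zeros(channels,1);
scaleFactor = ones(channels,1);

% Describe the layout with 'DataFormat' instead of labels on dlX.
[dlY,mu,sigmaSq] = batchnorm(dlX,offset,scaleFactor,'DataFormat','SSCB');

size(dlY)                         % same dimension order as dlX
```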

[___] = batchnorm(___,Name,Value) specifies options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, 'MeanDecay',0.3 sets the decay rate of the moving average computation of the mean.
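
A call that combines the statistics-updating syntax with name-value pairs could be sketched as follows. The starting statistics and decay values are arbitrary and chosen only for illustration.

```matlab
channels = 3;
X = rand(4,4,channels,8);
dlX = dlarray(X,'SSCB');
offset = zeros(channels,1);
scaleFactor = ones(channels,1);

% Illustrative starting statistics for the data set.
datasetMu = zeros(channels,1);
datasetSigmaSq = ones(channels,1);

% Weight the current mini-batch more heavily than the default decay of 0.1.
[dlY,datasetMu,datasetSigmaSq] = batchnorm(dlX,offset,scaleFactor, ...
    datasetMu,datasetSigmaSq,'MeanDecay',0.3,'VarianceDecay',0.5);
```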

Examples

Normalize Data and Obtain the Statistics

Use batchnorm to normalize each channel of a mini-batch and obtain the per-channel normalization statistics.

Create the input data as a single observation of random values with a height and width of four and three channels.

height = 4;
width = 4;
channels = 3;
observations = 1;

X = rand(height,width,channels,observations);
dlX = dlarray(X,'SSCB');

Create the learnable parameters.

offset = zeros(channels,1);
scaleFactor = ones(channels,1);

Compute the batch normalization and obtain the statistics of each channel of the batch.

[dlY,mu,sigmaSq] = batchnorm(dlX,offset,scaleFactor);
mu
sigmaSq

mu = 3×1    
    0.6095
    0.6063
    0.4619
sigmaSq = 3×1    
    0.1128
    0.0880
    0.0805
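
As a quick sanity check on this kind of result, the sketch below repeats the setup so that it runs on its own and confirms that, with a zero offset and unit scale factor, each channel of the normalized output has a mean close to zero. The check itself is an illustration and is not part of the original example.

```matlab
channels = 3;
dlX = dlarray(rand(4,4,channels,1),'SSCB');
offset = zeros(channels,1);
scaleFactor = ones(channels,1);

dlY = batchnorm(dlX,offset,scaleFactor);

% Convert to a numeric array and average over the two spatial dimensions.
% (There is a single observation, so no batch dimension needs averaging.)
Y = extractdata(dlY);
channelMean = squeeze(mean(Y,[1 2]))
```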

Update Mean and Variance Over Multiple Batches of Data

Use the batchnorm function to normalize several batches of data and update the statistics of the whole data set after each normalization.

Create three batches of data. The data consists of 10-by-10 random arrays with five channels. Each batch contains 20 observations. The second and third batches are scaled by a multiplicative factor of 1.5 and 2.5, respectively, so the mean of the data set increases with each batch.

height = 10;
width = 10;
channels = 5;
observations = 20;

X1 = rand(height,width,channels,observations);
dlX1 = dlarray(X1,'SSCB');

X2 = 1.5*rand(height,width,channels,observations);
dlX2 = dlarray(X2,'SSCB');

X3 = 2.5*rand(height,width,channels,observations);
dlX3 = dlarray(X3,'SSCB');

Create the learnable parameters.

offset = zeros(channels,1);
scale = ones(channels,1);

Normalize the first batch of data, dlX1, using batchnorm. Obtain the values of the mean and variance of this batch as outputs.

[dlY1,mu,sigmaSq] = batchnorm(dlX1,offset,scale);

Normalize the second batch of data, dlX2. Use mu and sigmaSq as inputs to obtain updated data set statistics, which are a moving average of the mean and variance over batches dlX1 and dlX2.

[dlY2,datasetMu,datasetSigmaSq] = batchnorm(dlX2,offset,scale,mu,sigmaSq);

Normalize the final batch of data, dlX3. Update the data set statistics datasetMu and datasetSigmaSq to obtain the moving average of the mean and variance over batches dlX1, dlX2, and dlX3.

[dlY3,datasetMuFull,datasetSigmaSqFull] = batchnorm(dlX3,offset,scale,datasetMu,datasetSigmaSq);

Observe the change in the mean of each channel as each batch is normalized.

plot([mu';datasetMu';datasetMuFull'])
legend({'Channel 1','Channel 2','Channel 3','Channel 4','Channel 5'},'Location','southeast')
xticks([1 2 3])
xlabel('Number of Batches')
xlim([0.9 3.1])
ylabel('Per-Channel Mean')
title('Data Set Mean')
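
With more than a few mini-batches, the same update is usually written as a loop. The following is a minimal sketch under the assumptions of the example above (random data, the same three multiplicative factors, default decay values). The variable scales is introduced here for illustration.

```matlab
channels = 5;
offset = zeros(channels,1);
scale = ones(channels,1);
scales = [1 1.5 2.5];              % same multiplicative factors as the example

% Initialize the data set statistics from the first mini-batch.
dlX = dlarray(scales(1)*rand(10,10,channels,20),'SSCB');
[~,datasetMu,datasetSigmaSq] = batchnorm(dlX,offset,scale);

% Fold each remaining mini-batch into the running statistics.
for k = 2:numel(scales)
    dlX = dlarray(scales(k)*rand(10,10,channels,20),'SSCB');
    [dlY,datasetMu,datasetSigmaSq] = batchnorm(dlX,offset,scale, ...
        datasetMu,datasetSigmaSq);
end
```

After the loop, datasetMu and datasetSigmaSq hold the statistics to pass to the five-input, one-output syntax at prediction time.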

Input Arguments

`dlX` — Input data
`dlarray` | numeric array

Input data, specified as a dlarray with or without dimension labels or a numeric array. When dlX is not a formatted dlarray, you must specify the dimension label format using 'DataFormat',FMT. If dlX is a numeric array, at least one of offset or scaleFactor must be a dlarray.

dlX must have a 'C' channel dimension.

Data Types: single | double
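
The numeric-array case described above might be used as in the following sketch, which makes the offset a dlarray so that the requirement is met. The sizes are arbitrary.

```matlab
channels = 3;
X = rand(4,4,channels,2);               % plain numeric array, not a dlarray

offset = dlarray(zeros(channels,1));    % at least one parameter is a dlarray
scaleFactor = ones(channels,1);

% A numeric input carries no dimension labels, so 'DataFormat' is required.
dlY = batchnorm(X,offset,scaleFactor,'DataFormat','SSCB');
class(dlY)
```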

`offset` — Channel offset
`dlarray` vector | numeric vector

Channel offset β, specified as a dlarray vector with or without dimension labels or a numeric vector.

If offset is a formatted dlarray, it must contain a 'C' dimension of the same size as the 'C' dimension of the input data.

Data Types: single | double

`scaleFactor` — Channel scale factor
`dlarray` vector | numeric vector

Channel scale factor γ, specified as a dlarray vector with or without dimension labels or a numeric vector.

If scaleFactor is a formatted dlarray, it must contain a 'C' dimension of the same size as the 'C' dimension of the input data.

Data Types: single | double

`mu` — Mean statistic for normalization
numeric vector

Mean statistic for normalization, specified as a numeric vector of the same length as the 'C' dimension of the input data.

mu is calculated over all 'S' (spatial), 'B' (batch), 'T' (time), and 'U' (unspecified) dimensions in dlX for each channel.

Data Types: single | double

`sigmaSq` — Variance statistic for normalization
numeric vector

Variance statistic for normalization, specified as a numeric vector of the same length as the 'C' dimension of the input data.

sigmaSq is calculated over all 'S' (spatial), 'B' (batch), 'T' (time), and 'U' (unspecified) dimensions in dlX for each channel.

Data Types: single | double

`datasetMu` — Mean statistic of several batches of data
numeric vector

Mean statistic of several batches of data, specified as a numeric vector of the same length as the 'C' dimension of the input data. To iteratively update the dataset mean over several batches of input data, use the datasetMu output of a previous call to batchnorm as the datasetMu input argument.

Data Types: single | double

`datasetSigmaSq` — Variance statistic of several batches of data
numeric vector

Variance statistic of several batches of data, specified as a numeric vector of the same length as the 'C' dimension of the input data. To iteratively update the dataset variance over several batches of input data, use the datasetSigmaSq output of a previous call to batchnorm as the datasetSigmaSq input argument.

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'MeanDecay',0.3,'VarianceDecay',0.5 sets the decay rate for the moving average computations of the mean and variance of several batches of data to 0.3 and 0.5, respectively.

`'DataFormat'` — Dimension order of unformatted data
char array | string

Dimension order of unformatted input data, specified as the comma-separated pair consisting of 'DataFormat' and a character array or string FMT that provides a label for each dimension of the data. Each character in FMT must be one of the following:

'S' — Spatial
'C' — Channel
'B' — Batch (for example, samples and observations)
'T' — Time (for example, sequences)
'U' — Unspecified

You can specify multiple dimensions labeled 'S' or 'U'. You can use the labels 'C', 'B', and 'T' at most once.

You must specify 'DataFormat' when the input data dlX is not a formatted dlarray.

Example: 'DataFormat','SSCB'

Data Types: char | string
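
The labels are not limited to image-style layouts. As an illustrative sketch, sequence data arranged as channel-by-batch-by-time can be described with 'CBT'. The sizes below are arbitrary.

```matlab
channels = 3;
X = rand(channels,8,10);           % 3 channels, 8 observations, 10 time steps
dlX = dlarray(X);                  % unformatted

offset = zeros(channels,1);
scaleFactor = ones(channels,1);

% 'CBT': channel, batch, time. Statistics are pooled over 'B' and 'T'.
[dlY,mu,sigmaSq] = batchnorm(dlX,offset,scaleFactor,'DataFormat','CBT');
numel(mu)                          % one mean per channel
```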

`'Epsilon'` — Variance offset
numeric scalar

Variance offset for preventing divide-by-zero errors, specified as the comma-separated pair consisting of 'Epsilon' and a numeric scalar. The specified value must be greater than or equal to 1e-5. The default value is 1e-5.

Data Types: single | double
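
The role of the variance offset is easiest to see for a channel whose activations are all identical, so that its variance is exactly zero. The sketch below uses plain arithmetic rather than batchnorm, and the variable names are illustrative.

```matlab
x = 2*ones(4,1);          % constant channel: every activation is identical
mu = mean(x);
sigmaSq = var(x,1);       % exactly zero for constant data

% Without the offset, the normalization divides zero by zero.
yNoEps = (x - mu)./sqrt(sigmaSq);            % all NaN

% With the offset, the result is finite.
epsilon = 1e-5;
yEps = (x - mu)./sqrt(sigmaSq + epsilon);    % all zeros
```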

`'MeanDecay'` — Mean decay value
numeric scalar between `0` and `1`

Decay value for the moving average computation of the datasetMu output, specified as the comma-separated pair consisting of 'MeanDecay' and a numeric scalar between 0 and 1. The default value is 0.1.

Data Types: single | double

`'VarianceDecay'` — Variance decay value
numeric scalar between `0` and `1`

Decay value for the moving average computation of the datasetSigmaSq output, specified as the comma-separated pair consisting of 'VarianceDecay' and a numeric scalar between 0 and 1. The default value is 0.1.

Data Types: single | double

Output Arguments

`dlY` — Normalized data
`dlarray`

Normalized data, returned as a dlarray. The output dlY has the same underlying data type as the input dlX.

If the input data dlX is a formatted dlarray, dlY has the same dimension labels as dlX. If the input data is not a formatted dlarray, dlY is an unformatted dlarray with the same dimension order as the input data.

`mu` — Per-channel mean
numeric column vector

Per-channel mean of the input data, returned as a numeric column vector with length equal to the size of the 'C' dimension of the input data.

`sigmaSq` — Per-channel variance
numeric column vector

Per-channel variance of the input data, returned as a numeric column vector with length equal to the size of the 'C' dimension of the input data.

`datasetMu` — Updated mean statistic of several batches of data
numeric vector

Updated mean statistic of several batches of data, returned as a numeric vector with length equal to the size of the 'C' dimension of the input data. datasetMu is returned with the same shape as the input datasetMu.

The datasetMu output is the moving average computation of the mean statistic for each channel over several batches of input data. datasetMu is computed from the channel mean of the input data and the input datasetMu using the following formula:

datasetMu = meanDecay × currentMu + (1 – meanDecay) × datasetMu,

where currentMu is the channel mean computed from the input data and the value of meanDecay is specified using the 'MeanDecay' name-value pair argument.

`datasetSigmaSq` — Updated variance statistic of several batches of data
numeric vector

Updated variance statistic of several batches of data, returned as a numeric vector with length equal to the size of the 'C' dimension of the input data. datasetSigmaSq is returned with the same shape as the input datasetSigmaSq.

The datasetSigmaSq output is the moving average computation of the variance statistic for each channel over several batches of input data. datasetSigmaSq is computed from the channel variance of the input data and the input datasetSigmaSq using the following formula:

datasetSigmaSq = varianceDecay × currentSigmaSq + (1 – varianceDecay) × datasetSigmaSq,

where currentSigmaSq is the channel variance computed from the input data and the value of varianceDecay is specified using the 'VarianceDecay' name-value pair.
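
To get a feel for how the decay value controls the speed of the moving average, the following sketch applies the update formula repeatedly in plain arithmetic. It assumes, purely for illustration, a running value that starts at 0 and a mini-batch statistic that is always 1.

```matlab
decay = 0.1;            % same as the default of 'MeanDecay' and 'VarianceDecay'
current = 1;            % illustrative: every mini-batch reports the same statistic
running = 0;            % illustrative starting value

history = zeros(1,3);
for k = 1:3
    running = decay*current + (1 - decay)*running;
    history(k) = running;
end
history
```

After three updates the running value is 0.1, 0.19, and then 0.271, which matches the closed form 1 - (1 - decay)^k. A decay value closer to 1 makes the running statistic follow recent mini-batches more closely, and a value closer to 0 gives more weight to the history.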

More About

Batch Normalization

The batchnorm function normalizes each input channel of a mini-batch of data. For more information, see the definition of Batch Normalization Layer on the batchNormalizationLayer reference page.

Extended Capabilities

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Usage notes and limitations:

When at least one of the following input arguments is a gpuArray or a dlarray with underlying data of type gpuArray, this function runs on the GPU:
- dlX
- offset
- scaleFactor

For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
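
A common pattern is to move the data to the GPU only when one is available. The sketch below assumes Parallel Computing Toolbox is installed when a GPU is used, and falls back to the CPU otherwise. The sizes and the use of single precision are illustrative choices.

```matlab
channels = 3;
X = rand(4,4,channels,2,'single');

% Move the data to the GPU if a supported device is available.
if canUseGPU
    X = gpuArray(X);
end

dlX = dlarray(X,'SSCB');
offset = zeros(channels,1,'single');
scaleFactor = ones(channels,1,'single');

% Runs on the GPU when the underlying data of dlX is a gpuArray.
dlY = batchnorm(dlX,offset,scaleFactor);
```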
