groupnorm

Normalize activations across groups of channels

    Description

    The group normalization operation divides the channels of the input data into groups and normalizes the activations across each group. To speed up training of convolutional neural networks and reduce the sensitivity to network initialization, use group normalization between convolution and nonlinear operations such as relu. You can perform instance normalization and layer normalization by setting the appropriate number of groups.

    Note

    This function applies the group normalization operation to dlarray data. If you want to apply group normalization within a layerGraph object or Layer array, use the groupNormalizationLayer layer.

    dlY = groupnorm(dlX,numGroups,offset,scaleFactor) normalizes each observation in dlX across groups of channels specified by numGroups, then applies a scale factor and offset to each channel.

    The normalized activation is calculated using the following formula:

    \hat{x}_i = \frac{x_i - \mu_g}{\sqrt{\sigma_g^2 + \epsilon}}

    where x_i is the input activation, μ_g and σ_g^2 are the per-group mean and variance, respectively, and ε is a small constant. The mean and variance are calculated per-observation over all 'S' (spatial), 'T' (time), and 'U' (unspecified) dimensions in dlX for each group of channels.

    The normalized activation is offset and scaled according to the following formula:

    y_i = \gamma \hat{x}_i + \beta

    The offset β and scale factor γ are specified with the offset and scaleFactor arguments.

    The input dlX is a formatted dlarray with dimension labels. The output dlY is a formatted dlarray with the same dimension labels as dlX.
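The two formulas above can be traced by hand for a small input. The following is a minimal sketch, not the function's internal implementation: it assumes that each group consists of consecutive channels and that the variance is normalized by the number of elements in the group. The variable names (Xhat, channelsPerGroup, maxDiff) are illustrative only.

```matlab
% Manual group normalization of a single observation.
X = rand(4,4,6,1);               % height x width x channels x observations
numGroups = 3;
epsilon = 1e-5;                  % documented default of 'Epsilon'
channelsPerGroup = size(X,3)/numGroups;

Xhat = zeros(size(X));
for g = 1:numGroups
    % Assumption: group g holds consecutive channels.
    idx = (g-1)*channelsPerGroup + (1:channelsPerGroup);
    xg = X(:,:,idx,1);
    mu = mean(xg,'all');         % per-group mean
    sigma2 = var(xg,1,'all');    % per-group variance, normalized by N (assumption)
    Xhat(:,:,idx,1) = (xg - mu)./sqrt(sigma2 + epsilon);
end

% With zero offset and unit scale factor, y_i equals \hat{x}_i.
dlY = groupnorm(dlarray(X,'SSCB'),numGroups,zeros(6,1),ones(6,1));
maxDiff = max(abs(extractdata(dlY) - Xhat),[],'all');
```

If the stated assumptions match the behavior of groupnorm, maxDiff is close to zero. Regardless of those assumptions, each group of Xhat has a mean of zero by construction.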

    dlY = groupnorm(___,'DataFormat',FMT) also specifies the dimension format FMT of dlX, in addition to the input arguments in previous syntaxes. Use this syntax when dlX is not a formatted dlarray. The output dlY is an unformatted dlarray with the same dimension order as dlX.
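As a brief sketch of this syntax, the following passes an unformatted dlarray and supplies the labels through 'DataFormat'. The sizes and the choice of two observations are arbitrary illustration values.

```matlab
% Unformatted input: the dimension labels are supplied with 'DataFormat'.
X = rand(4,4,6,2);                       % height x width x channels x observations
dlX = dlarray(X);                        % no dimension labels
offset = zeros(6,1);
scaleFactor = ones(6,1);

dlY = groupnorm(dlX,3,offset,scaleFactor,'DataFormat','SSCB');

outputLabels = dims(dlY);                % empty, because dlY is unformatted
outputSize = size(dlY);
```

The output keeps the dimension order of the input, so outputSize matches size(X).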

    dlY = groupnorm(___,Name,Value) specifies options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, 'Epsilon',3e-5 sets the variance offset.
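For instance, a call that sets the variance offset might look like the following sketch. The input sizes are arbitrary illustration values.

```matlab
% Formatted input with a non-default variance offset.
dlX = dlarray(rand(4,4,6,2),'SSCB');
offset = zeros(6,1);
scaleFactor = ones(6,1);

dlY = groupnorm(dlX,3,offset,scaleFactor,'Epsilon',3e-5);
outputLabels = dims(dlY);                % same labels as dlX
```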

    Examples

    Use groupnorm to normalize input data across channel groups.

    Create the input data as a single observation of random values with a height of four, a width of four, and six channels.

    height = 4;
    width = 4;
    channels = 6;
    observations = 1;
    
    X = rand(height,width,channels,observations);
    dlX = dlarray(X,'SSCB');

    Create the learnable parameters.

    offset = zeros(channels,1);
    scaleFactor = ones(channels,1);

    Compute the group normalization. Divide the input into three groups of two channels each.

    numGroups = 3;
    dlY = groupnorm(dlX,numGroups,offset,scaleFactor);
    

    Input Arguments

    Input data, specified as a dlarray with or without dimension labels or a numeric array. When dlX is not a formatted dlarray, you must specify the dimension label format using 'DataFormat',FMT. If dlX is a numeric array, at least one of offset or scaleFactor must be a dlarray.

    dlX must have a 'C' channel dimension.

    Data Types: single | double

    Channel groups to normalize across, specified as a positive integer, "all-channels", or "channel-wise".

    numGroups | Description
    positive integer | The function divides the channels in dlX into the specified number of groups. The specified number of groups must exactly divide the number of channels in dlX.
    "all-channels" | All channels in dlX are combined into a single group. The input data is normalized across all channels. This type of normalization is also known as layer normalization.
    "channel-wise" | Each channel in dlX is considered as a single group and is normalized separately. This type of normalization is also known as instance normalization.

    Data Types: single | double | char | string
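Based on the descriptions above, "all-channels" should behave like a single group and "channel-wise" like one group per channel. The following sketch compares the string options against the corresponding integer values. The variable names and input sizes are illustrative.

```matlab
dlX = dlarray(rand(4,4,6,2),'SSCB');     % six channels, two observations
offset = zeros(6,1);
scaleFactor = ones(6,1);

% Layer normalization: all channels form one group.
dlYLayer = groupnorm(dlX,"all-channels",offset,scaleFactor);
dlYOneGroup = groupnorm(dlX,1,offset,scaleFactor);

% Instance normalization: each channel is its own group.
dlYInstance = groupnorm(dlX,"channel-wise",offset,scaleFactor);
dlYSixGroups = groupnorm(dlX,6,offset,scaleFactor);

layerDiff = max(abs(extractdata(dlYLayer) - extractdata(dlYOneGroup)),[],'all');
instanceDiff = max(abs(extractdata(dlYInstance) - extractdata(dlYSixGroups)),[],'all');
```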

    Channel offset β, specified as a dlarray vector with or without dimension labels or a numeric vector.

    If offset is a formatted dlarray, it must contain a 'C' dimension of the same size as the 'C' dimension of the input data.

    Data Types: single | double

    Channel scale factor γ, specified as a dlarray vector with or without dimension labels or a numeric vector.

    If scaleFactor is a formatted dlarray, it must contain a 'C' dimension of the same size as the 'C' dimension of the input data.

    Data Types: single | double
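To see how the offset and scale factor act per channel, the following sketch uses "channel-wise" normalization, where each normalized channel has zero mean, so the mean of each output channel should equal its offset. The offset and scale values are arbitrary illustration values.

```matlab
dlX = dlarray(rand(4,4,6,2),'SSCB');
offset = (1:6)';                         % a different offset for each channel
scaleFactor = 2*ones(6,1);               % the same scale for every channel

dlY = groupnorm(dlX,"channel-wise",offset,scaleFactor);

% Mean over the spatial dimensions: one value per channel and observation.
Y = extractdata(dlY);
channelMeans = squeeze(mean(Y,[1 2]));   % 6-by-2 matrix
```

Each column of channelMeans approximately reproduces offset, because scaling a zero-mean channel leaves its mean at zero before the offset is added.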

    Name-Value Pair Arguments

    Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

    Example: 'Epsilon',3e-5 sets the variance offset to 3e-5.

    Dimension order of unformatted input data, specified as the comma-separated pair consisting of 'DataFormat' and a character array or string FMT that provides a label for each dimension of the data. Each character in FMT must be one of the following:

    • 'S' — Spatial

    • 'C' — Channel

    • 'B' — Batch (for example, samples and observations)

    • 'T' — Time (for example, sequences)

    • 'U' — Unspecified

    You can specify multiple dimensions labeled 'S' or 'U'. You can use the labels 'C', 'B', and 'T' at most once.

    You must specify 'DataFormat' when the input data dlX is not a formatted dlarray.

    Example: 'DataFormat','SSCB'

    Data Types: char | string

    Variance offset for preventing divide-by-zero errors, specified as the comma-separated pair consisting of 'Epsilon' and a numeric scalar. The specified value must be greater than or equal to 1e-5. The default value is 1e-5.

    Data Types: single | double

    Output Arguments

    Normalized data, returned as a dlarray. The output dlY has the same underlying data type as the input dlX.

    If the input data dlX is a formatted dlarray, dlY has the same dimension labels as dlX. If the input data is not a formatted dlarray, dlY is an unformatted dlarray with the same dimension order as the input data.
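The following sketch checks both properties for a single-precision formatted input. The learnable parameters are created as single here to keep all inputs in one data type, which is a choice made for this illustration.

```matlab
% Single-precision formatted input.
dlX = dlarray(rand(4,4,6,2,'single'),'SSCB');
offset = zeros(6,1,'single');
scaleFactor = ones(6,1,'single');

dlY = groupnorm(dlX,3,offset,scaleFactor);

outputType = class(extractdata(dlY));    % underlying data type of dlY
outputLabels = dims(dlY);                % dimension labels of dlY
```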

    More About

    Group Normalization

    The groupnorm function normalizes the activations of each observation across groups of channels. For more information, see the definition of Group Normalization Layer on the groupNormalizationLayer reference page.

    Introduced in R2020b