dltranspconv

Deep learning transposed convolution

Description

The transposed convolution operation upsamples feature maps.

Note

This function applies the deep learning transposed convolution operation to dlarray data. If you want to apply transposed convolution within a layerGraph object or Layer array, use one of the following layers:

  • transposedConv2dLayer

  • transposedConv3dLayer

dlY = dltranspconv(dlX,weights,bias) computes the deep learning transposed convolution of the input dlX using the filters defined by weights, and adds a constant bias. The input dlX is a formatted dlarray with dimension labels. Transposed convolution acts on dimensions that you specify as 'S' and 'C' dimensions. The output dlY is a formatted dlarray with the same dimension labels as dlX.

dlY = dltranspconv(dlX,weights,bias,'DataFormat',FMT) also specifies the dimension format FMT when dlX is not a formatted dlarray. The output dlY is an unformatted dlarray with the same dimension order as dlX.
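As a minimal sketch of the 'DataFormat' syntax, the following passes an unformatted dlarray and names its dimensions in the call. The array sizes and variable values are illustrative assumptions, and the expected output size assumes the default stride of 1 with no cropping.

```matlab
% Unformatted input: 16-by-16 spatial, 3 channels, 4 observations.
X = rand(16,16,3,4,'single');
dlX = dlarray(X);                       % no dimension labels

% 4-by-4 filters, 5 filters per group, 3 channels per group (one group).
weights = rand(4,4,5,3,'single');
bias = rand(5,1,'single');

% Describe the dimension order of dlX in the call itself.
dlY = dltranspconv(dlX,weights,bias,'DataFormat','SSCB');

size(dlY)    % expected 19 19 5 4, since (16-1)*1 + 4 = 19
dims(dlY)    % empty, because the output is unformatted
```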

dlY = dltranspconv(___,Name,Value) specifies options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, 'Stride',3 sets the stride of the convolution operation.

Examples

Convolve an image and then use transposed convolution to resize the convolved image to the same size as the original image.

Import the image data and convert it to a dlarray.

X = imread('sherlock.jpg');
dlX = dlarray(single(X),'SSC');

Display the image.

imshow(X)

Initialize the convolutional filters and bias term. Specify an ungrouped convolution that applies a single filter to all three channels of the input data.

filterHeight = 10;
filterWidth = 10;
numChannelsPerGroup = 3;
numFiltersPerGroup = 1;
numGroups = 1;

weights = rand(filterHeight,filterWidth,numChannelsPerGroup,numFiltersPerGroup,numGroups);
bias = rand(numFiltersPerGroup*numGroups,1);

Perform the convolution. Use a 'Stride' value of 2 and a 'DilationFactor' value of 3.

dlY = dlconv(dlX,weights,bias,'Stride',2,'DilationFactor',3);

Display the convolved image.

Y = extractdata(dlY);
imshow(rescale(Y))

Initialize the transposed convolutional filters and bias. Specify an ungrouped transposed convolution that applies three filters to the input. Use the same filter height and filter width as for the convolution operation.

numChannelsPerGroupTC = 1;
numFiltersPerGroupTC = 3;

weightsTC = rand(filterHeight,filterWidth,numFiltersPerGroupTC,numChannelsPerGroupTC,numGroups);
biasTC = rand(numFiltersPerGroupTC*numGroups,1);

Perform the transposed convolution. Use the same stride and dilation factor as for the convolution operation.

dlZ = dltranspconv(dlY,weightsTC,biasTC,'Stride',2,'DilationFactor',3);

Display the image after the transposed convolution.

Z = extractdata(dlZ);
imshow(rescale(Z))

Compare the size of the original image, the convolved image, and the image after the transposed convolution.

sizeX = size(X)
sizeX = 1×3

   640   960     3

sizeY = size(Y)
sizeY = 1×2

   307   467

sizeZ = size(Z)
sizeZ = 1×3

   640   960     3

The transposed convolution upsamples the convolved data to the size of the original input data.
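The sizes reported in this example can be checked by hand. The sketch below assumes the usual size rules for an unpadded convolution and an uncropped transposed convolution, which this example does not state explicitly. The effective filter size expression is the one given for 'DilationFactor'.

```matlab
inputSize  = [640 960];
filterSize = [10 10];
stride     = 2;
dilation   = 3;

% Effective filter size once dilation is taken into account.
effFilter = filterSize + (filterSize-1).*(dilation-1);      % [28 28]

% Assumed rule for an unpadded convolution.
convSize = floor((inputSize - effFilter)./stride) + 1;      % [307 467]

% Assumed rule for an uncropped transposed convolution.
transpSize = (convSize - 1).*stride + effFilter;            % [640 960]
```

With these settings the transposed convolution restores the original spatial size exactly, because 640 - 28 and 960 - 28 are both divisible by the stride.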

Apply transposed convolution to the input data in three groups of two channels each. Apply four filters per group.

Create the input data as ten observations of size 100-by-100 with six channels.

height = 100;
width = 100;
channels = 6;
numObservations = 10;

X = rand(height,width,channels,numObservations);
dlX = dlarray(X,'SSCB');

Initialize the filters for the transposed convolution operation. Specify three groups of transposed convolutions that each apply four filters to two channels of the input data.

filterHeight = 8;
filterWidth = 8;
numChannelsPerGroup = 2;
numFiltersPerGroup = 4;
numGroups = 3;

weights = rand(filterHeight,filterWidth,numFiltersPerGroup,numChannelsPerGroup,numGroups);

Initialize the bias term.

bias = rand(numFiltersPerGroup*numGroups,1);

Perform the transposed convolution.

dlY = dltranspconv(dlX,weights,bias);
size(dlY)
ans = 1×4

   107   107    12    10

dims(dlY)
ans = 
'SSCB'

The 12 channels of the transposed convolution output represent the three groups of transposed convolutions with four filters per group.

Input Arguments

Input data, specified as a dlarray with or without dimension labels or a numeric array. When dlX is not a formatted dlarray, you must specify the dimension label format using 'DataFormat',FMT. If dlX is a numeric array, at least one of weights or bias must be a dlarray.

Transposed convolution acts on dimensions that you specify as spatial dimensions using the 'S' dimension label. You can specify up to three dimensions in dlX as 'S' dimensions.

Data Types: single | double

Filters, specified as a dlarray with or without labels or a numeric array. The weights argument specifies the size and values of the filters, as well as the number of filters and the number of groups for grouped transposed convolutions.

Specify weights as a filterSize-by-numFiltersPerGroup-by-numChannelsPerGroup-by-numGroups array.

  • filterSize — Size of the convolutional filters. filterSize can have up to three dimensions, depending on the number of spatial dimensions in the input data.

    Input Data 'S' Dimensions    filterSize
    1-D                          h, where h corresponds to the height of the filter
    2-D                          h-by-w, where h and w correspond to the height and width of the filter, respectively
    3-D                          h-by-w-by-d, where h, w, and d correspond to the height, width, and depth of the filter, respectively

  • numFiltersPerGroup — Number of filters to apply within each group.

  • numChannelsPerGroup — Number of channels within each group for grouped transposed convolutions. numChannelsPerGroup must equal the number of channels in the input data divided by numGroups, the number of groups. For ungrouped convolutions, where numGroups = 1, numChannelsPerGroup must equal the number of channels in the input data.

  • numGroups — Number of groups (optional). When numGroups > 1, the function performs grouped transposed convolutions. When numGroups = 1, the function performs ungrouped transposed convolutions; in this case, this dimension is singleton and can be omitted.
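As a rough illustration of how these dimensions fit together, the following sketch builds a weights array for six input channels split into three groups. The variable names and sizes are chosen for illustration only.

```matlab
numInputChannels   = 6;
numGroups          = 3;
numFiltersPerGroup = 4;
filterSize         = [8 8];

% Each group receives an equal share of the input channels.
numChannelsPerGroup = numInputChannels/numGroups;            % 2

% filterSize-by-numFiltersPerGroup-by-numChannelsPerGroup-by-numGroups
weights = rand([filterSize numFiltersPerGroup numChannelsPerGroup numGroups]);

size(weights)                                                % 8 8 4 2 3

% Number of output channels produced by these weights.
numOutputChannels = size(weights,3)*size(weights,5)          % 12
```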

If weights is a formatted dlarray, it can have multiple spatial dimensions labeled 'S', one channel dimension labeled 'C', and up to two other dimensions labeled 'U'. The number of 'S' dimensions must match the number of 'S' dimensions of the input data. The labeled dimensions correspond to the filter specifications as follows.

Filter Specification     Dimension Labels
filterSize               Up to three 'S' dimensions
numFiltersPerGroup       'C' dimension
numChannelsPerGroup      First 'U' dimension
numGroups (optional)     Second 'U' dimension

Data Types: single | double

Bias constant, specified as a dlarray vector or dlarray scalar with or without labels, a numeric vector, or a numeric scalar.

  • If bias is a scalar or has only singleton dimensions, the same bias is applied to each entry of the output.

  • If bias has a nonsingleton dimension, each element of bias is the bias applied to the corresponding convolutional filter specified by weights. The number of elements of bias must match the number of filters specified by weights.

If bias is a formatted dlarray, the nonsingleton dimension must be a channel dimension labeled 'C'.
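The two bias forms can be compared with a small sketch. The sizes and bias values below are illustrative assumptions. Subtracting the zero-bias result from the per-filter result should leave only the bias of each filter, up to single-precision rounding.

```matlab
dlX = dlarray(rand(8,8,3,'single'),'SSC');

% 3-by-3 filters, 2 filters, 3 input channels.
weights = rand(3,3,2,3,'single');

% Scalar bias: the same value is added to every output entry.
dlY0 = dltranspconv(dlX,weights,single(0));

% Vector bias: one value per filter.
biasVec = single([10; 20]);
dlY1 = dltranspconv(dlX,weights,biasVec);

% The difference isolates the per-filter bias.
offset = extractdata(dlY1 - dlY0);
squeeze(offset(1,1,:))      % approximately [10; 20]
```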

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Stride',2 sets the stride of each filter to 2.

Dimension order of unformatted input data, specified as the comma-separated pair consisting of 'DataFormat' and a character array or string FMT that provides a label for each dimension of the data. Each character in FMT must be one of the following:

  • 'S' — Spatial

  • 'C' — Channel

  • 'B' — Batch (for example, samples and observations)

  • 'T' — Time (for example, sequences)

  • 'U' — Unspecified

You can specify multiple dimensions labeled 'S' or 'U'. You can use the labels 'C', 'B', and 'T' at most once.

You must specify 'DataFormat' when the input data dlX is not a formatted dlarray.

Example: 'DataFormat','SSCB'

Data Types: char | string

Step size for traversing the input data, specified as the comma-separated pair consisting of 'Stride' and a numeric scalar or numeric vector. If you specify 'Stride' as a scalar, the same value is used for all spatial dimensions. If you specify 'Stride' as a vector of the same size as the number of spatial dimensions of the input data, the vector values are used for the corresponding spatial dimensions.

The default value of 'Stride' is 1.

Example: 'Stride',3

Data Types: single | double
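A vector stride can be sketched as follows. The input and filter sizes are illustrative, and the expected output size assumes the usual uncropped rule (inputSize-1)*stride + filterSize, which this page does not state explicitly.

```matlab
dlX = dlarray(rand(10,10,3,2,'single'),'SSCB');
weights = rand(4,4,5,3,'single');      % 4-by-4 filters, 5 filters, 3 channels
bias = zeros(5,1,'single');

% Stride of 2 along the first spatial dimension, 3 along the second.
dlY = dltranspconv(dlX,weights,bias,'Stride',[2 3]);

size(dlY)    % expected 22 31 5 2: (10-1)*2 + 4 = 22 and (10-1)*3 + 4 = 31
```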

Filter dilation factor, specified as the comma-separated pair consisting of 'DilationFactor' and one of the following.

  • Numeric scalar — The same dilation factor value is applied for all spatial dimensions.

  • Numeric vector — A different dilation factor value is applied along each spatial dimension. Use a vector of size d, where d is the number of spatial dimensions of the input data. The ith element of the vector specifies the dilation factor applied to the ith spatial dimension.

Use the dilation factor to increase the receptive field of the filter (the area of the input that the filter can see) on the input data. Using a dilation factor corresponds to an effective filter size of filterSize + (filterSize-1)*(dilationFactor-1).

Example: 'DilationFactor',2

Data Types: single | double
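The effective filter size expression can be checked against the output size of a dilated transposed convolution. This is a sketch with illustrative sizes. The expected output size assumes a stride of 1 and no cropping.

```matlab
filterSize     = 3;
dilationFactor = 2;

% Effective filter size from the expression above.
effectiveFilterSize = filterSize + (filterSize-1)*(dilationFactor-1);   % 5

dlX = dlarray(rand(10,10,3,2,'single'),'SSCB');
weights = rand(filterSize,filterSize,4,3,'single');
bias = zeros(4,1,'single');

dlY = dltranspconv(dlX,weights,bias,'DilationFactor',dilationFactor);

% Assumed uncropped rule with stride 1: (10-1)*1 + 5 = 14.
[size(dlY,1) size(dlY,2)]
```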

Cropping applied to edges of data, specified as the comma-separated pair consisting of 'Cropping' and one of the following.

  • 'same' — Cropping is set so that the output size is the same as the input size when the stride is 1. More generally, the output size of each spatial dimension is inputSize*stride, where inputSize is the size of the input along a spatial dimension.

  • Numeric scalar — The same cropping value is applied to both ends of all spatial dimensions.

  • Numeric vector — A different cropping value is applied along each spatial dimension. Use a vector of size d, where d is the number of spatial dimensions of the input data. The ith element of the vector specifies the cropping applied to the start and the end along the ith spatial dimension.

  • Numeric matrix — A different cropping value is applied to the start and end of each spatial dimension. Use a matrix of size 2-by-d, where d is the number of spatial dimensions of the input data. The element (1,i) specifies the cropping applied to the start of the ith spatial dimension. The element (2,i) specifies the cropping applied to the end of the ith spatial dimension. For example, in 2-D the format is [top, left; bottom, right].

Example: 'Cropping','same'

Data Types: single | double
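The cropping options can be compared with a short sketch. The sizes are illustrative. The 'same' result follows the inputSize*stride rule above, while the matrix case assumes that cropping simply reduces the uncropped output size at each edge.

```matlab
dlX = dlarray(rand(10,10,3,2,'single'),'SSCB');
weights = rand(4,4,5,3,'single');
bias = zeros(5,1,'single');

% 'same': output spatial size is inputSize*stride = 20.
dlSame = dltranspconv(dlX,weights,bias,'Stride',2,'Cropping','same');
[size(dlSame,1) size(dlSame,2)]      % expected 20 20

% Matrix form [top, left; bottom, right].
% Uncropped size is (10-1)*2 + 4 = 22 in each spatial dimension.
crop = [1 0; 2 1];
dlCrop = dltranspconv(dlX,weights,bias,'Stride',2,'Cropping',crop);
[size(dlCrop,1) size(dlCrop,2)]      % expected 19 21 under the assumption above
```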

Output Arguments

Feature map, returned as a dlarray. The output dlY has the same underlying data type as the input dlX.

If the input data dlX is a formatted dlarray, dlY has the same dimension labels as dlX. If the input data is not a formatted dlarray, dlY is an unformatted dlarray or numeric array with the same dimension order as the input data.

The size of the 'C' channel dimension of dlY depends on the size of the weights input. It is the product of the sizes of the numFiltersPerGroup and numGroups dimensions of the weights argument. If weights is a formatted dlarray, this product equals the size of the 'C' dimension multiplied by the size of the second 'U' dimension.

Introduced in R2019b