lstm

Long short-term memory

Description

The long short-term memory (LSTM) operation allows a network to learn long-term dependencies between time steps in time series and sequence data.

Note

This function applies the deep learning LSTM operation to dlarray data. If you want to apply an LSTM operation within a layerGraph object or Layer array, use lstmLayer instead.

dlY = lstm(dlX,H0,C0,weights,recurrentWeights,bias) applies a long short-term memory (LSTM) calculation to input dlX using the initial hidden state H0, initial cell state C0, and parameters weights, recurrentWeights, and bias. The input dlX is a formatted dlarray with dimension labels. The output dlY is a formatted dlarray with the same dimension labels as dlX, except for any 'S' dimensions.

The lstm function updates the cell and hidden states using the hyperbolic tangent function (tanh) as the state activation function. The lstm function uses the sigmoid function given by $\sigma(x) = (1 + e^{-x})^{-1}$ as the gate activation function.
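As a small illustration of the two activation functions named above, the following sketch evaluates them on a few sample values in plain MATLAB. The names sigma, x, gate, and state are illustrative only and are not part of the lstm interface.

```matlab
% Gate activation: sigmoid, sigma(x) = 1/(1 + exp(-x)), with values in (0,1).
sigma = @(x) 1./(1 + exp(-x));

% A few sample pre-activation values (illustrative).
x = [-2 0 2];

gate  = sigma(x);   % gate activations
state = tanh(x);    % state activations, with values in (-1,1)
```

The sigmoid maps 0 to 0.5 and tanh maps 0 to 0, so a zero pre-activation leaves a gate half open and contributes nothing to the state.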

[dlY,hiddenState,cellState] = lstm(dlX,H0,C0,weights,recurrentWeights,bias) also returns the hidden state and cell state after the LSTM operation.

[___] = lstm(___,'DataFormat',FMT) also specifies the dimension format FMT when dlX is not a formatted dlarray. The output dlY is an unformatted dlarray with the same dimension order as dlX, except for any 'S' dimensions.
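The following sketch shows how this syntax might be used with unformatted data. The array sizes are arbitrary choices for illustration, and the parameters are left unformatted, which the argument descriptions on this page allow.

```matlab
numFeatures = 10;
numObservations = 32;
sequenceLength = 64;
numHiddenUnits = 3;

% Unformatted dlarray: no dimension labels are attached to the data.
dlX = dlarray(randn(numFeatures,numObservations,sequenceLength));

H0 = zeros(numHiddenUnits,1);
C0 = zeros(numHiddenUnits,1);
weights = dlarray(randn(4*numHiddenUnits,numFeatures));
recurrentWeights = dlarray(randn(4*numHiddenUnits,numHiddenUnits));
bias = dlarray(randn(4*numHiddenUnits,1));

% Describe the dimension order of dlX with 'DataFormat'.
dlY = lstm(dlX,H0,C0,weights,recurrentWeights,bias,'DataFormat','CBT');

% dlY keeps the dimension order of dlX; the first dimension now has
% numHiddenUnits elements.
sz = size(dlY);
```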

Examples

Perform an LSTM operation using three hidden units.

Create the input sequence data as 32 observations with 10 channels and a sequence length of 64.

numFeatures = 10;
numObservations = 32;
sequenceLength = 64;

X = randn(numFeatures,numObservations,sequenceLength);
dlX = dlarray(X,'CBT');

Create the initial hidden and cell states with three hidden units. Use the same initial hidden state and cell state for all observations.

numHiddenUnits = 3;
H0 = zeros(numHiddenUnits,1);
C0 = zeros(numHiddenUnits,1);

Create the learnable parameters for the LSTM operation.

weights = dlarray(randn(4*numHiddenUnits,numFeatures),'CU');
recurrentWeights = dlarray(randn(4*numHiddenUnits,numHiddenUnits),'CU');
bias = dlarray(randn(4*numHiddenUnits,1),'C');

Perform the LSTM calculation.

[dlY,hiddenState,cellState] = lstm(dlX,H0,C0,weights,recurrentWeights,bias);

View the size and dimensions of dlY.

size(dlY)
ans = 1×3

     3    32    64

dlY.dims
ans = 
'CBT'

View the size of hiddenState and cellState.

size(hiddenState)
ans = 1×2

     3    32

size(cellState)
ans = 1×2

     3    32

Check that the output hiddenState is the same as the last time step of output dlY.

if extractdata(dlY(:,:,end)) == hiddenState
   disp("The hidden state and the last time step are equal.");
else 
   disp("The hidden state and the last time step are not equal.")
end
The hidden state and the last time step are equal.

You can use the hidden state and cell state to keep track of the state of the LSTM operation and input further sequential data.
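As a sketch of that idea, the code below processes a sequence in two halves, passing the states returned for the first half as the initial states for the second half, and then compares the result with a single call over the whole sequence. The split point and the variable names (half, dlY1, dlY2, maxDiff) are illustrative assumptions; the two results are expected to agree up to floating-point round-off.

```matlab
numFeatures = 10;
numObservations = 32;
sequenceLength = 64;
numHiddenUnits = 3;

dlX = dlarray(randn(numFeatures,numObservations,sequenceLength),'CBT');
H0 = zeros(numHiddenUnits,1);
C0 = zeros(numHiddenUnits,1);
weights = dlarray(randn(4*numHiddenUnits,numFeatures),'CU');
recurrentWeights = dlarray(randn(4*numHiddenUnits,numHiddenUnits),'CU');
bias = dlarray(randn(4*numHiddenUnits,1),'C');

% Reference: process the whole sequence in one call.
dlYFull = lstm(dlX,H0,C0,weights,recurrentWeights,bias);

% Process the first half, then continue from the returned states.
half = sequenceLength/2;
[dlY1,h1,c1] = lstm(dlX(:,:,1:half),H0,C0,weights,recurrentWeights,bias);
[dlY2,h2,c2] = lstm(dlX(:,:,half+1:end),h1,c1,weights,recurrentWeights,bias);

% Compare the final time step of both approaches.
d = extractdata(dlYFull(:,:,end)) - extractdata(dlY2(:,:,end));
maxDiff = max(abs(d(:)));
```

Carrying the states forward in this way is what allows a long sequence to be fed to the operation in chunks.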

Input Arguments

dlX

Input data, specified as a dlarray with or without dimension labels or a numeric array. When dlX is not a formatted dlarray, you must specify the dimension label format using 'DataFormat',FMT. If dlX is a numeric array, at least one of H0, C0, weights, recurrentWeights, or bias must be a dlarray.

dlX must contain a sequence dimension labeled 'T'. If dlX has any spatial dimensions labeled 'S', they are flattened into the 'C' channel dimensions. If dlX has any unspecified dimensions labeled 'U', they must be singleton.

Data Types: single | double

H0

Initial hidden state vector, specified as a dlarray with or without dimension labels or a numeric array.

If H0 is a formatted dlarray, it must contain a channel dimension labeled 'C' and optionally a batch dimension labeled 'B' with the same size as the 'B' dimension of dlX. If H0 does not have a 'B' dimension, the function uses the same hidden state vector for each observation in dlX.

The size of the 'C' dimension determines the number of hidden units. The size of the 'C' dimension of H0 must be equal to the size of the 'C' dimension of C0.

If H0 is not a formatted dlarray, the size of the first dimension determines the number of hidden units and must be the same size as the first dimension or the 'C' dimension of C0.

Data Types: single | double
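The sketch below illustrates the optional 'B' dimension by giving every observation its own initial hidden and cell state. The sizes and the random initial states are arbitrary values chosen for illustration.

```matlab
numFeatures = 10;
numObservations = 32;
sequenceLength = 64;
numHiddenUnits = 3;

dlX = dlarray(randn(numFeatures,numObservations,sequenceLength),'CBT');

% One state vector per observation: the 'B' dimension matches that of dlX.
H0 = dlarray(randn(numHiddenUnits,numObservations),'CB');
C0 = dlarray(randn(numHiddenUnits,numObservations),'CB');

weights = dlarray(randn(4*numHiddenUnits,numFeatures),'CU');
recurrentWeights = dlarray(randn(4*numHiddenUnits,numHiddenUnits),'CU');
bias = dlarray(randn(4*numHiddenUnits,1),'C');

[dlY,hiddenState,cellState] = lstm(dlX,H0,C0,weights,recurrentWeights,bias);

% With formatted initial states, the returned states carry the 'CB' format.
stateFormat = dims(hiddenState);
```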

C0

Initial cell state vector, specified as a dlarray with or without dimension labels or a numeric array.

If C0 is a formatted dlarray, it must contain a channel dimension labeled 'C' and optionally a batch dimension labeled 'B' with the same size as the 'B' dimension of dlX. If C0 does not have a 'B' dimension, the function uses the same cell state vector for each observation in dlX.

The size of the 'C' dimension determines the number of hidden units. The size of the 'C' dimension of C0 must be equal to the size of the 'C' dimension of H0.

If C0 is not a formatted dlarray, the size of the first dimension determines the number of hidden units and must be the same size as the first dimension or the 'C' dimension of H0.

Data Types: single | double

weights

Weights, specified as a dlarray with or without dimension labels or a numeric array.

Specify weights as a matrix of size 4*NumHiddenUnits-by-InputSize, where NumHiddenUnits is the size of the 'C' dimension of both C0 and H0, and InputSize is the size of the 'C' dimension of dlX multiplied by the size of each 'S' dimension of dlX, where present.

If weights is a formatted dlarray, it must contain a 'C' dimension of size 4*NumHiddenUnits and a 'U' dimension of size InputSize.

Data Types: single | double
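To illustrate how InputSize is obtained when the input has a spatial dimension, the following sketch uses data in 'SCBT' format. The spatial size of 4 and the other sizes are assumptions chosen for illustration; InputSize is then the product of the 'S' and 'C' sizes.

```matlab
spatialSize = 4;
numChannels = 10;
numObservations = 32;
sequenceLength = 64;
numHiddenUnits = 3;

dlX = dlarray(randn(spatialSize,numChannels,numObservations,sequenceLength),'SCBT');

% The 'S' dimension is flattened into the channels, so the weights need
% spatialSize*numChannels columns.
inputSize = spatialSize*numChannels;

H0 = zeros(numHiddenUnits,1);
C0 = zeros(numHiddenUnits,1);
weights = dlarray(randn(4*numHiddenUnits,inputSize),'CU');
recurrentWeights = dlarray(randn(4*numHiddenUnits,numHiddenUnits),'CU');
bias = dlarray(randn(4*numHiddenUnits,1),'C');

dlY = lstm(dlX,H0,C0,weights,recurrentWeights,bias);

% The output has no 'S' dimension.
outputFormat = dims(dlY);
```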

recurrentWeights

Recurrent weights, specified as a dlarray with or without dimension labels or a numeric array.

Specify recurrentWeights as a matrix of size 4*NumHiddenUnits-by-NumHiddenUnits, where NumHiddenUnits is the size of the 'C' dimension of both C0 and H0.

If recurrentWeights is a formatted dlarray, it must contain a 'C' dimension of size 4*NumHiddenUnits and a 'U' dimension of size NumHiddenUnits.

Data Types: single | double

bias

Bias, specified as a dlarray vector with or without dimension labels or a numeric vector.

Specify bias as a vector of length 4*NumHiddenUnits, where NumHiddenUnits is the size of the 'C' dimension of both C0 and H0.

If bias is a formatted dlarray, the nonsingleton dimension must be labeled with 'C'.

Data Types: single | double
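The size requirements for weights, recurrentWeights, and bias can be checked before calling lstm. The following is a minimal sketch in plain MATLAB; the variable names and the use of assert are illustrative choices, not part of the lstm interface.

```matlab
numHiddenUnits = 3;
inputSize = 10;

% Parameters sized according to the descriptions above.
weights = randn(4*numHiddenUnits,inputSize);
recurrentWeights = randn(4*numHiddenUnits,numHiddenUnits);
bias = randn(4*numHiddenUnits,1);

% Each parameter has 4*numHiddenUnits rows.
assert(isequal(size(weights),[4*numHiddenUnits inputSize]), ...
    'weights must be 4*NumHiddenUnits-by-InputSize.');
assert(isequal(size(recurrentWeights),[4*numHiddenUnits numHiddenUnits]), ...
    'recurrentWeights must be 4*NumHiddenUnits-by-NumHiddenUnits.');
assert(isvector(bias) && numel(bias) == 4*numHiddenUnits, ...
    'bias must be a vector of length 4*NumHiddenUnits.');
```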

'DataFormat'

Dimension order of unformatted input data, specified as the comma-separated pair consisting of 'DataFormat' and a character array or string FMT that provides a label for each dimension of the data. Each character in FMT must be one of the following:

  • 'S' — Spatial

  • 'C' — Channel

  • 'B' — Batch (for example, samples and observations)

  • 'T' — Time (for example, sequences)

  • 'U' — Unspecified

You can specify multiple dimensions labeled 'S' or 'U'. You can use the labels 'C', 'B', and 'T' at most once.

You must specify 'DataFormat',FMT when the input data dlX is not a formatted dlarray.

Example: 'DataFormat','SSCB'

Data Types: char | string

Output Arguments

dlY

LSTM output, returned as a dlarray. The output dlY has the same underlying data type as the input dlX.

If the input data dlX is a formatted dlarray, dlY has the same dimension labels as dlX, except for any 'S' dimensions. If the input data is not a formatted dlarray, dlY is an unformatted dlarray with the same dimension order as the input data.

The size of the 'C' dimension of dlY is the same as the number of hidden units, specified by the size of the 'C' dimension of H0 or C0.

hiddenState

Hidden state vector for each observation, returned as a dlarray or a numeric array with the same data type as H0.

If the input H0 is a formatted dlarray, then the output hiddenState is a formatted dlarray with the format 'CB'.

cellState

Cell state vector for each observation, returned as a dlarray or a numeric array. cellState is returned with the same data type as C0.

If the input C0 is a formatted dlarray, the output cellState is returned as a formatted dlarray with the format 'CB'.

Limitations

  • functionToLayerGraph does not support the lstm function. If you use functionToLayerGraph with a function that contains the lstm operation, the resulting LayerGraph contains placeholder layers.

More About

Long Short-Term Memory

The LSTM operation allows a network to learn long-term dependencies between time steps in time series and sequence data. For more information, see the definition of Long Short-Term Memory Layer on the lstmLayer reference page.
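For orientation, a single LSTM time step for one observation can be sketched by hand in plain MATLAB. This is an illustrative sketch rather than the implementation used by lstm. It assumes that the four blocks of rows in the parameters are ordered as input gate, forget gate, cell candidate, and output gate, following the description on the lstmLayer reference page. All variable names are hypothetical.

```matlab
numHiddenUnits = 3;
numFeatures = 10;

W = randn(4*numHiddenUnits,numFeatures);     % input weights
R = randn(4*numHiddenUnits,numHiddenUnits);  % recurrent weights
b = randn(4*numHiddenUnits,1);               % bias

x = randn(numFeatures,1);                    % input at the current time step
h = zeros(numHiddenUnits,1);                 % previous hidden state
c = zeros(numHiddenUnits,1);                 % previous cell state

sigma = @(z) 1./(1 + exp(-z));               % gate activation

% Pre-activations for all four components at once.
z = W*x + R*h + b;

% Row indices of each component (assumed order: i, f, g, o).
idx = reshape(1:4*numHiddenUnits,numHiddenUnits,4);

i = sigma(z(idx(:,1)));   % input gate
f = sigma(z(idx(:,2)));   % forget gate
g = tanh(z(idx(:,3)));    % cell candidate
o = sigma(z(idx(:,4)));   % output gate

% Update the cell state, then the hidden state.
c = f.*c + i.*g;
h = o.*tanh(c);
```

The factor of 4 in the required parameter sizes corresponds to these four components, each of which has one row per hidden unit.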

Introduced in R2019b