fillmissing

Fill missing values

Syntax

F = fillmissing(A,'constant',v)

F = fillmissing(A,method)

F = fillmissing(A,movmethod,window)

F = fillmissing(___,dim)

F = fillmissing(___,Name,Value)

[F,TF] = fillmissing(___)

Description

F = fillmissing(A,'constant',v) fills missing entries of an array or table with the constant value v. If A is a matrix or multidimensional array, then v can be either a scalar or a vector. When v is a vector, each element specifies the fill value in the corresponding column of A. If A is a table or timetable, then v can also be a cell array.

Missing values are defined according to the data type of A:

NaN — double, single, duration, and calendarDuration
NaT — datetime
<missing> — string
<undefined> — categorical
' ' — char
{''} — cell array of character vectors

If A is a table, then the data type of each column defines the missing value for that column.
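The sketch below, using illustrative values, shows the vector form of v. With the default dimension, fillmissing works down the columns, so each element of v fills the missing entries of the matching column of A.

```matlab
% Illustrative 2-by-3 matrix with one missing entry per column
A = [1   NaN 3;
     NaN 5   NaN];
% One fill constant per column: column 1 gets 0, column 2 gets -1, column 3 gets 99
F = fillmissing(A,'constant',[0 -1 99])
```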

F = fillmissing(A,method) fills missing entries using the method specified by method. For example, fillmissing(A,'previous') fills missing entries with the previous non-missing entry of A.
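As a rough comparison of the nearest-neighbor style methods, the following sketch fills the same illustrative vector three ways. Note that 'previous' cannot fill a leading missing value, because no earlier non-missing value exists.

```matlab
A = [NaN 1 NaN NaN 4];
Fprev = fillmissing(A,'previous')  % leading NaN stays NaN: nothing precedes it
Fnext = fillmissing(A,'next')      % each gap takes the following value
Fnear = fillmissing(A,'nearest')   % each gap takes the closer neighbor
```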

F = fillmissing(A,movmethod,window) fills missing entries using a moving window mean or median with window length window. For example, fillmissing(A,'movmean',5) fills missing entries with a moving mean using a window length of 5.
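A minimal sketch of a moving-mean fill on illustrative data: each missing entry is replaced by the mean of the non-missing values inside a window of length 3 centered on it, and non-missing entries are left unchanged.

```matlab
A = [2 NaN 4 NaN 8];
% Position 2 uses mean(2,4); position 4 uses mean(4,8)
F = fillmissing(A,'movmean',3)
```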

F = fillmissing(___,dim) specifies the dimension of A to operate along. By default, fillmissing operates along the first dimension whose size does not equal 1. For example, if A is a matrix, then fillmissing(A,'linear',2) operates across the columns of A, filling missing data row by row.
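The sketch below, with an illustrative matrix, contrasts the default dimension with dim = 2 when using the 'previous' method.

```matlab
A = [1 NaN 3;
     4 5   NaN];
Fcols = fillmissing(A,'previous')     % default dim = 1: fill down each column
Frows = fillmissing(A,'previous',2)   % dim = 2: fill across each row
```

In Fcols, the NaN at the top of column 2 stays missing because nothing precedes it in that column; filling row by row, it takes the value 1 from its left neighbor.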

F = fillmissing(___,Name,Value) specifies additional parameters for filling missing values using one or more name-value pair arguments. For example, if t is a vector of time values, then fillmissing(A,'linear','SamplePoints',t) interpolates the data in A relative to the times in t.
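As an illustration of why 'SamplePoints' matters for interpolation, this sketch fills the same vector with and without non-uniform sample points. The times in t are illustrative values only.

```matlab
A = [0 NaN 10];
t = [0 1 5];                                      % illustrative non-uniform sample times
Ft       = fillmissing(A,'linear','SamplePoints',t)  % interpolate at t = 1 between (0,0) and (5,10)
Fdefault = fillmissing(A,'linear')                   % default sample points [1 2 3]: midpoint
```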

[F,TF] = fillmissing(___) also returns a logical array corresponding to the entries of A that were filled.
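A small sketch on illustrative vectors: TF marks only the entries that were actually filled, so a missing value that the method cannot fill (here a leading NaN with 'previous') is reported as false.

```matlab
[F,TF] = fillmissing([1 NaN 3 NaN],'previous')
% The leading NaN has no previous value, so it is not filled and TF(1) is false
[F2,TF2] = fillmissing([NaN 2 NaN],'previous')
```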

Examples

Vector with `NaN` Values

Create a vector that contains NaN values and replace each NaN with the previous non-missing value.

A = [1 3 NaN 4 NaN NaN 5];
F = fillmissing(A,'previous')

F = 1×7

     1     3     3     4     4     4     5

Interpolate Missing Data

Use interpolation to replace NaN values in non-uniformly sampled data.

Define a vector of non-uniform sample points and evaluate the sine function over the points.

x = [-4*pi:0.1:0, 0.1:0.2:4*pi];
A = sin(x);

Inject NaN values into A.

A(A < 0.75 & A > 0.5) = NaN;

Fill the missing data using linear interpolation, and return the filled vector F and the logical vector TF. Entries of TF that are 1 (true) mark the values of F that were filled.

[F,TF] = fillmissing(A,'linear','SamplePoints',x);

Plot the original data and filled data.

plot(x,A,'.', x(TF),F(TF),'o')
xlabel('x');
ylabel('sin(x)')
legend('Original Data','Filled Missing Data')

Replace `NaN` with Moving Median

Use a moving median to fill missing numeric data.

Create a vector of sample points x and a vector of data A that contains missing values.

x = linspace(0,10,200); 
A = sin(x) + 0.5*(rand(size(x))-0.5); 
A([1:10 randi([1 length(x)],1,50)]) = NaN;

Replace NaN values in A using a moving median with a window of length 10, and plot both the original data and the filled data.

F = fillmissing(A,'movmedian',10);  
plot(x,F,'r.-',x,A,'b.-') 
legend('Filled Missing Data','Original Data')

Matrix with Missing Endpoints

Create a matrix with missing entries and fill across the columns (second dimension) one row at a time using linear interpolation. For each row, fill leading and trailing missing values with the nearest non-missing value in that row.

A = [NaN NaN 5 3 NaN 5 7 NaN 9 NaN;
     8 9 NaN 1 4 5 NaN 5 NaN 5;
     NaN 4 9 8 7 2 4 1 1 NaN]

A = 3×10

   NaN   NaN     5     3   NaN     5     7   NaN     9   NaN
     8     9   NaN     1     4     5   NaN     5   NaN     5
   NaN     4     9     8     7     2     4     1     1   NaN

F = fillmissing(A,'linear',2,'EndValues','nearest')

F = 3×10

     5     5     5     3     4     5     7     8     9     9
     8     9     5     1     4     5     5     5     5     5
     4     4     9     8     7     2     4     1     1     1

Table with Multiple Data Types

Fill missing values for table variables with different data types.

Create a table whose variables include categorical, double, and cell array of character vectors data types.

A = table(categorical({'Sunny';'Cloudy';''}),[66;NaN;54],{'';'N';'Y'},[37;39;NaN],...
    'VariableNames',{'Description' 'Temperature' 'Rain' 'Humidity'})

A=3×4 table
    Description    Temperature       Rain       Humidity
    ___________    ___________    __________    ________

    Sunny               66        {0x0 char}       37   
    Cloudy             NaN        {'N'     }       39   
    <undefined>         54        {'Y'     }      NaN

Replace all missing entries with the value from the previous entry. The missing character vector in the Rain variable is in the first row, so it has no previous value and is not replaced.

F = fillmissing(A,'previous')

F=3×4 table
    Description    Temperature       Rain       Humidity
    ___________    ___________    __________    ________

      Sunny            66         {0x0 char}       37   
      Cloudy           66         {'N'     }       39   
      Cloudy           54         {'Y'     }       39

Replace the NaN values from the Temperature and Humidity variables in A with 0.

F = fillmissing(A,'constant',0,'DataVariables',{'Temperature','Humidity'})

F=3×4 table
    Description    Temperature       Rain       Humidity
    ___________    ___________    __________    ________

    Sunny              66         {0x0 char}       37   
    Cloudy              0         {'N'     }       39   
    <undefined>        54         {'Y'     }        0

Alternatively, use the isnumeric function to identify the numeric variables to operate on.

F = fillmissing(A,'constant',0,'DataVariables',@isnumeric)

F=3×4 table
    Description    Temperature       Rain       Humidity
    ___________    ___________    __________    ________

    Sunny              66         {0x0 char}       37   
    Cloudy              0         {'N'     }       39   
    <undefined>        54         {'Y'     }        0

Input Arguments

`A` — Input data
vector | matrix | multidimensional array | table | timetable

Input data, specified as a vector, matrix, multidimensional array, table, or timetable.

If A is a timetable, then fillmissing fills only the timetable variables, not the row times. If the row times contain a NaT or NaN value, then fillmissing produces an error. Row times must be unique and listed in ascending order.

`v` — Fill constant
scalar | vector | cell array

Fill constant, specified as a scalar, vector, or cell array. v can be a vector when A is a matrix or multidimensional array. v can be a cell array when A is a table or timetable.

`method` — Fill method
`'previous'` | `'next'` | `'nearest'` | `'linear'` | `'spline'` | `'pchip'` | `'makima'`

Fill method, specified as one of the following:

Method	Description
`'previous'`	previous non-missing value
`'next'`	next non-missing value
`'nearest'`	nearest non-missing value
`'linear'`	linear interpolation of neighboring, non-missing values (numeric, `duration`, and `datetime` data types only)
`'spline'`	piecewise cubic spline interpolation (numeric, `duration`, and `datetime` data types only)
`'pchip'`	shape-preserving piecewise cubic spline interpolation (numeric, `duration`, and `datetime` data types only)
`'makima'`	modified Akima cubic Hermite interpolation (numeric, `duration`, and `datetime` data types only)

`movmethod` — Moving method
`'movmean'` | `'movmedian'`

Moving method to fill missing data, specified as one of the following:

Method	Description
`'movmean'`	Moving average over a window of length `window` (numeric data types only)
`'movmedian'`	Moving median over a window of length `window` (numeric data types only)

`window` — Window length
positive integer scalar | two-element vector of positive integers | positive duration scalar | two-element vector of positive durations

Window length, specified as a positive integer scalar, a two-element vector of positive integers, a positive duration scalar, or a two-element vector of positive durations.

When window is a positive integer scalar, then the window is centered about the current element and contains window-1 neighboring elements. If window is even, then the window is centered about the current and previous elements. If window is a two-element vector of positive integers [b f], then the window contains the current element, b elements backward, and f elements forward.
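The sketch below, on an illustrative vector, compares a centered window, an asymmetric [b f] window, and an even-length window. Only position 2 is missing, so the outputs differ only in how many neighbors each window reaches.

```matlab
A = [2 NaN 4 12];
Fcentered = fillmissing(A,'movmean',3)       % 1 back, current, 1 forward: mean(2,4)
Fforward  = fillmissing(A,'movmean',[1 2])   % 1 back, current, 2 forward: mean(2,4,12)
Feven     = fillmissing(A,'movmean',2)       % even length: current and previous only: mean(2)
```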

When A is a timetable or 'SamplePoints' is specified as a datetime or duration vector, window must be of type duration, and the windows are computed relative to the sample points.
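A minimal sketch with illustrative hourly datetime sample points. Because the sample points are datetime values, the window is given as a duration; hours(3) spans 1.5 hours on each side of every sample point.

```matlab
t = datetime(2020,1,1) + hours(0:3);   % illustrative hourly sample points
A = [1 NaN 3 NaN];
% Position 2 (1 h) sees the values at 0 h and 2 h; position 4 (3 h) sees only the value at 2 h
F = fillmissing(A,'movmean',hours(3),'SamplePoints',t)
```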

`dim` — Dimension to operate along
positive integer scalar

Dimension to operate along, specified as a positive integer scalar. If no value is specified, then the default is the first array dimension whose size does not equal 1.

When A is a table or timetable, dim is not supported. fillmissing operates along each table or timetable variable separately.

Consider a two-dimensional input array, A.

If dim=1, then fillmissing fills A column by column.
If dim=2, then fillmissing fills A row by row.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: fillmissing(A,'linear','DataVariables',{'Temperature','Altitude'}) fills only the columns corresponding to the Temperature and Altitude variables of an input table.

`'EndValues'` — Method for handling endpoints
`'extrap'` (default) | `'previous'` | `'next'` | `'nearest'` | `'none'` | scalar

Method for handling endpoints, specified as the comma-separated pair consisting of 'EndValues' and one of 'extrap', 'previous', 'next', 'nearest', 'none', or a constant scalar value. The endpoint fill method handles leading and trailing missing values based on the following definitions:

Method	Description
`'extrap'`	same as `method`
`'previous'`	previous non-missing value
`'next'`	next non-missing value
`'nearest'`	nearest non-missing value
`'none'`	no fill value
scalar	constant value (numeric, `duration`, and `datetime` data types only)
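The following sketch applies linear interpolation to an illustrative vector with missing values at both ends and in the interior, changing only 'EndValues'. The interior gap is filled the same way in every case.

```matlab
A = [NaN 2 NaN 4 NaN];
Fextrap = fillmissing(A,'linear')                        % default 'extrap': extrapolate linearly
Fnone   = fillmissing(A,'linear','EndValues','none')     % leave leading/trailing NaN in place
Fconst  = fillmissing(A,'linear','EndValues',0)          % set leading/trailing entries to 0
Fnear   = fillmissing(A,'linear','EndValues','nearest')  % copy the nearest non-missing value
```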

`'SamplePoints'` — Sample points
vector

Sample points for fill method, specified as the comma-separated pair consisting of 'SamplePoints' and a vector. The sample points represent the location of the data in A, and must be sorted and contain unique elements. Sample points do not need to be uniformly sampled. If A is a timetable, then the default sample points vector is the vector of row times. Otherwise, the default vector is [1 2 3 ...].

Moving windows are defined relative to the sample points. For example, if t is a vector of times corresponding to the input data, then fillmissing(rand(1,10),'movmean',3,'SamplePoints',t) has a window that represents the time interval between t(i)-1.5 and t(i)+1.5.

When the sample points vector has data type datetime or duration, then the moving window length must have type duration.

This name-value pair is not supported when the input data is a timetable.

Data Types: double | single | datetime | duration
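As a sketch of how sample points reshape a moving window, the call below uses illustrative numeric sample points. With t, the window around t = 1 is [-0.5, 2.5], which excludes the value at t = 3; with the default sample points 1:3, the window of length 3 includes both neighbors.

```matlab
A = [10 NaN 30];
t = [0 1 3];                                       % illustrative non-uniform sample points
Ft = fillmissing(A,'movmean',3,'SamplePoints',t)   % only the value at t = 0 is in the window
Fi = fillmissing(A,'movmean',3)                    % default sample points: mean(10,30)
```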

`'DataVariables'` — Table variables to fill
variable name | cell array of variable names | numeric vector | logical vector | function handle | table `vartype` subscript

Table variables to fill, specified as the comma-separated pair consisting of 'DataVariables' and a variable name, a cell array of variable names, a numeric vector, a logical vector, a function handle, or a table vartype subscript. The 'DataVariables' value indicates which columns of the input table to fill, and can be one of the following:

A character vector specifying a single table variable name
A cell array of character vectors where each element is a table variable name
A vector of table variable indices
A logical vector whose elements each correspond to a table variable, where true includes the corresponding variable and false excludes it
A function handle that returns a logical scalar, such as @isnumeric
A table vartype subscript

Example: 'Age'

Example: {'Height','Weight'}

Example: @iscategorical

Example: vartype('numeric')
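A short sketch showing three equivalent ways to select the same variables of an illustrative table (default names Var1, Var2, Var3): by index, by logical vector, and by vartype subscript. The cell array variable Var2 is not selected, so its empty character vector is left in place.

```matlab
T = table([1;NaN],{'a';''},[NaN;4]);   % illustrative table with variables Var1, Var2, Var3
F  = fillmissing(T,'constant',0,'DataVariables',[1 3])
F2 = fillmissing(T,'constant',0,'DataVariables',[true false true])
F3 = fillmissing(T,'constant',0,'DataVariables',vartype('numeric'))
```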

`'MissingLocations'` — Known missing indicator
vector | matrix | multidimensional array

Known missing indicator, specified as the comma-separated pair consisting of 'MissingLocations' and a logical vector, matrix, or multidimensional array of the same size as A. The indicator elements can be true to indicate a missing value in the corresponding location of A or false otherwise.

Data Types: logical
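For data that marks gaps with a sentinel rather than NaN, 'MissingLocations' tells fillmissing which entries to treat as missing. In this sketch, -99 is an illustrative sentinel value, not a MATLAB convention.

```matlab
A = [1 -99 3 -99 5];   % -99 is an illustrative "no reading" sentinel
F = fillmissing(A,'linear','MissingLocations',A == -99)
```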

`'MaxGap'` — Maximum gap size to fill
numeric scalar | `duration` scalar | `calendarDuration` scalar

Maximum gap size to fill, specified as a numeric scalar, duration scalar, or calendarDuration scalar. Gaps are clusters of consecutive missing values whose size is the distance between the non-missing values surrounding the gap. The gap size is computed in units relative to the sample points. Gaps smaller than or equal to the maximum gap size are filled, and gaps larger than the maximum gap size are not.

For example, consider the vector y = [25 NaN NaN 100] using the default sample points [1 2 3 4]. The gap size in the vector is computed from the sample points as 4 - 1 = 3, so a MaxGap value of 2 leaves the missing values unaltered, while a MaxGap value of 3 fills in the missing values.
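The vector from the example above can be checked directly. This sketch uses linear interpolation as the fill method, which is an arbitrary choice for illustration.

```matlab
y = [25 NaN NaN 100];
F2 = fillmissing(y,'linear','MaxGap',2)   % gap size 4 - 1 = 3 > 2: left unfilled
F3 = fillmissing(y,'linear','MaxGap',3)   % gap size 3 <= 3: filled
```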

For missing values at the beginning or end of the data:

A single missing value at the end of the input data has a gap size of 0 and is always filled.
Clusters of missing values occurring at the beginning or end of the input data are not completely surrounded by non-missing values, so the gap size is computed using the nearest existing sample points. For the default sample points 1:N, this produces a gap size that is 1 smaller than if the same cluster occurred in the middle of the data.

Output Arguments

`F` — Filled data
vector | matrix | multidimensional array | table | timetable

Filled data, returned as a vector, matrix, multidimensional array, table, or timetable. F is the same size as A.

`TF` — Filled data indicator
vector | matrix | multidimensional array

Filled data indicator, returned as a vector, matrix, or multidimensional array. TF is a logical array where 1 (true) corresponds to entries in F that were filled and 0 (false) corresponds to unchanged entries. TF is the same size as A and F.

Data Types: logical

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

Usage notes and limitations:

The 'MaxGap' name-value pair is not supported.
The 'spline' and 'makima' methods are not supported.
The 'SamplePoints' and 'MissingLocations' name-value pairs are not supported.
The 'DataVariables' name-value pair cannot specify a function handle.
The 'EndValues' name-value pair can only specify 'extrap'.
The syntax fillmissing(A,movmethod,window) is not supported when A is a tall timetable.
The syntax fillmissing(A,'constant',v) must specify a scalar value for v.
The syntax fillmissing(A,___) does not support character vector variables when A is a tall table or tall timetable.

For more information, see Tall Arrays.

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

The 'MaxGap' name-value pair is not supported.
The 'makima' option is not supported.
When the 'SamplePoints' value has type datetime or the input data is a timetable with datetime row times, only the methods 'constant', 'movmean', and 'movmedian' are supported.
By default, table and timetable inputs are assumed to contain data that does not change size. Therefore, the specified fill method must not use a fill value that changes the size of the values it replaces, and the size of the fill value must be constant between replacements.
