Fill missing values
F = fillmissing(A,'constant',v) fills missing entries of an array or table with the constant value v.
If A is a matrix or multidimensional array, then v can be either a scalar or a vector. When v is a vector, each element specifies the fill value in the corresponding column of A. If A is a table or timetable, then v can also be a cell array.
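As a minimal sketch of the vector form of v, the following fills each column of a small matrix with its own constant. The matrix and fill values are made up for illustration.

```matlab
% Illustrative data: one NaN in each column.
A = [1 NaN; NaN 4; 5 6];

% v has one element per column: 0 for column 1, -1 for column 2.
F = fillmissing(A,'constant',[0 -1]);
% F is [1 -1; 0 4; 5 6]
```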
Missing values are defined according to the data type of A:
NaN: double, single, duration, and calendarDuration
NaT: datetime
<missing>: string
<undefined>: categorical
' ': char
{''}: cell of character arrays
If A is a table, then the data type of each column defines the missing value for that column.
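One way to see these definitions in action is the ismissing function, which flags the same entries that fillmissing treats as missing. The sketch below uses small made-up arrays, one per data type.

```matlab
% Illustrative arrays; the second element of each one is missing.
tfDouble = ismissing([1 NaN 3]);                   % NaN
tfTime   = ismissing([datetime(2020,1,1) NaT]);    % NaT
tfString = ismissing(["a" missing "c"]);           % <missing>
tfCateg  = ismissing(categorical({'x','','z'}));   % '' is stored as <undefined>
tfCell   = ismissing({'a','','c'});                % '' in a cell array of character vectors
```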
F = fillmissing(___,Name,Value) specifies additional parameters for filling missing values using one or more name-value pair arguments. For example, if t is a vector of time values, then fillmissing(A,'linear','SamplePoints',t) interpolates the data in A relative to the times in t.
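The effect of 'SamplePoints' on interpolation can be sketched with three data points. The values of A and the non-uniform times t are hypothetical and chosen so the arithmetic is easy to check by hand.

```matlab
A = [0 NaN 8];
t = [0 1 4];   % hypothetical, non-uniform sample times

Fdefault = fillmissing(A,'linear');                   % sample points default to [1 2 3]
Ftimes   = fillmissing(A,'linear','SamplePoints',t);  % interpolate relative to t
% Fdefault(2) is 4, halfway between 0 and 8
% Ftimes(2) is 2, one quarter of the way from t = 0 to t = 4
```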
NaN Values
Create a vector that contains NaN values and replace each NaN with the previous non-missing value.
A = [1 3 NaN 4 NaN NaN 5];
F = fillmissing(A,'previous')
F = 1×7
1 3 3 4 4 4 5
Use interpolation to replace NaN values in non-uniformly sampled data.
Define a vector of non-uniform sample points and evaluate the sine function over the points.
x = [-4*pi:0.1:0, 0.1:0.2:4*pi];
A = sin(x);
Inject NaN values into A.
A(A < 0.75 & A > 0.5) = NaN;
Fill the missing data using linear interpolation, and return the filled vector F and the logical vector TF. The value 1 (true) in entries of TF corresponds to the values of F that were filled.
[F,TF] = fillmissing(A,'linear','SamplePoints',x);
Plot the original data and filled data.
plot(x,A,'.', x(TF),F(TF),'o')
xlabel('x');
ylabel('sin(x)')
legend('Original Data','Filled Missing Data')
NaN with Moving Median
Use a moving median to fill missing numeric data.
Create a vector of sample points x and a vector of data A that contains missing values.
x = linspace(0,10,200);
A = sin(x) + 0.5*(rand(size(x))-0.5);
A([1:10 randi([1 length(x)],1,50)]) = NaN;
Replace NaN values in A using a moving median with a window of length 10, and plot both the original data and the filled data.
F = fillmissing(A,'movmedian',10);
plot(x,F,'r.-',x,A,'b.-')
legend('Filled Missing Data','Original Data')
Create a matrix with missing entries and fill across the columns (second dimension) one row at a time using linear interpolation. For each row, fill leading and trailing missing values with the nearest non-missing value in that row.
A = [NaN NaN 5 3 NaN 5 7 NaN 9 NaN; 8 9 NaN 1 4 5 NaN 5 NaN 5; NaN 4 9 8 7 2 4 1 1 NaN]
A = 3×10
NaN NaN 5 3 NaN 5 7 NaN 9 NaN
8 9 NaN 1 4 5 NaN 5 NaN 5
NaN 4 9 8 7 2 4 1 1 NaN
F = fillmissing(A,'linear',2,'EndValues','nearest')
F = 3×10
5 5 5 3 4 5 7 8 9 9
8 9 5 1 4 5 5 5 5 5
4 4 9 8 7 2 4 1 1 1
Fill missing values for table variables with different data types.
Create a table whose variables include categorical, double, and char data types.
A = table(categorical({'Sunny';'Cloudy';''}),[66;NaN;54],{'';'N';'Y'},[37;39;NaN],...
    'VariableNames',{'Description' 'Temperature' 'Rain' 'Humidity'})
A=3×4 table
Description Temperature Rain Humidity
___________ ___________ __________ ________
Sunny 66 {0x0 char} 37
Cloudy NaN {'N' } 39
<undefined> 54 {'Y' } NaN
Replace all missing entries with the value from the previous entry. Because the missing entry in the Rain variable is its first element and has no previous element, that missing character vector is not replaced.
F = fillmissing(A,'previous')
F=3×4 table
Description Temperature Rain Humidity
___________ ___________ __________ ________
Sunny 66 {0x0 char} 37
Cloudy 66 {'N' } 39
Cloudy 54 {'Y' } 39
Replace the NaN values from the Temperature and Humidity variables in A with 0.
F = fillmissing(A,'constant',0,'DataVariables',{'Temperature','Humidity'})
F=3×4 table
Description Temperature Rain Humidity
___________ ___________ __________ ________
Sunny 66 {0x0 char} 37
Cloudy 0 {'N' } 39
<undefined> 54 {'Y' } 0
Alternatively, use the isnumeric function to identify the numeric variables to operate on.
F = fillmissing(A,'constant',0,'DataVariables',@isnumeric)
F=3×4 table
Description Temperature Rain Humidity
___________ ___________ __________ ________
Sunny 66 {0x0 char} 37
Cloudy 0 {'N' } 39
<undefined> 54 {'Y' } 0
A - Input data
Input data, specified as a vector, matrix, multidimensional array, table, or timetable.
If A is a timetable, then only table values are filled. If the associated vector of row times contains a NaT or NaN value, then fillmissing produces an error. Row times must be unique and listed in ascending order.
Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | char | string | cell | table | timetable | categorical | datetime | duration | calendarDuration
v - Fill constant
Fill constant, specified as a scalar, vector, or cell array. v can be a vector when A is a matrix or multidimensional array. v can be a cell array when A is a table or timetable.
Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | char | cell | categorical | datetime | duration
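The cell array form of v can be sketched for a two-variable table as follows. The table contents and the fill text 'none' are hypothetical, and the sketch assumes that each cell of v supplies the fill value for the table variable in the same position.

```matlab
% Hypothetical table with one numeric and one text variable.
T = table([1;NaN;3],{'a';'';'c'},'VariableNames',{'Num','Txt'});

% Assumption: cell 1 fills Num and cell 2 fills Txt.
F = fillmissing(T,'constant',{0,'none'});
% F.Num is [1;0;3] and F.Txt is {'a';'none';'c'}
```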
method - Fill method
'previous' | 'next' | 'nearest' | 'linear' | 'spline' | 'pchip' | 'makima'
Fill method, specified as one of the following:
Method | Description |
---|---|
'previous' | previous non-missing value |
'next' | next non-missing value |
'nearest' | nearest non-missing value |
'linear' | linear interpolation of neighboring, non-missing values (numeric, duration, and datetime data types only) |
'spline' | piecewise cubic spline interpolation (numeric, duration, and datetime data types only) |
'pchip' | shape-preserving piecewise cubic spline interpolation (numeric, duration, and datetime data types only) |
'makima' | modified Akima cubic Hermite interpolation (numeric, duration, and datetime data types only) |
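To compare several of the methods in the table, the sketch below applies them to the same made-up vector, which has a two-element gap between known values.

```matlab
A = [1 NaN NaN 4];   % illustrative data

Fprev    = fillmissing(A,'previous');  % [1 1 1 4]
Fnext    = fillmissing(A,'next');      % [1 4 4 4]
Fnearest = fillmissing(A,'nearest');   % [1 1 4 4]
Flinear  = fillmissing(A,'linear');    % [1 2 3 4]
```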
movmethod - Moving method
'movmean' | 'movmedian'
Moving method to fill missing data, specified as one of the following:
Method | Description |
---|---|
'movmean' | Moving average over a window of length window (numeric data types only) |
'movmedian' | Moving median over a window of length window (numeric data types only) |
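The difference between the two moving methods shows up when the window contains an outlier. In this sketch the data are made up, and the window of length 5 covers the whole vector when centered on the missing element.

```matlab
A = [1 2 NaN 100 3];   % illustrative data with an outlier (100)

Fmean   = fillmissing(A,'movmean',5);    % fill value is mean([1 2 100 3])
Fmedian = fillmissing(A,'movmedian',5);  % fill value is median([1 2 100 3])
% Fmean(3) is 26.5, Fmedian(3) is 2.5
```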
window - Window length
Window length, specified as a positive integer scalar, a two-element vector of positive integers, a positive duration scalar, or a two-element vector of positive durations.
When window is a positive integer scalar, then the window is centered about the current element and contains window-1 neighboring elements. If window is even, then the window is centered about the current and previous elements. If window is a two-element vector of positive integers [b f], then the window contains the current element, b elements backward, and f elements forward.
When A is a timetable or 'SamplePoints' is specified as a datetime or duration vector, window must be of type duration, and the windows are computed relative to the sample points.
Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | duration
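The three integer forms of window described above can be sketched on one small vector. The data are made up, and the comments trace which elements fall inside the window around the missing third element.

```matlab
A = [0 6 NaN 9];   % illustrative data; element 3 is missing

Fodd  = fillmissing(A,'movmean',3);      % elements 2..4 -> mean([6 9])
Feven = fillmissing(A,'movmean',2);      % elements 2..3 -> mean(6)
Fbf   = fillmissing(A,'movmean',[2 1]);  % elements 1..4 -> mean([0 6 9])
% Fodd(3) is 7.5, Feven(3) is 6, Fbf(3) is 5
```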
dim - Dimension to operate along
Dimension to operate along, specified as a positive integer scalar. If no value is specified, then the default is the first array dimension whose size does not equal 1.
When A is a table or timetable, dim is not supported. fillmissing operates along each table or timetable variable separately.
Consider a two-dimensional input array, A.
If dim=1, then fillmissing fills A column by column.
If dim=2, then fillmissing fills A row by row.
Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
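A small sketch of the two choices of dim, using a made-up 2-by-2 matrix and the 'previous' method. A leading NaN has no previous value along the chosen dimension, so it stays missing.

```matlab
A = [1 NaN; NaN 4];   % illustrative data

F1 = fillmissing(A,'previous',1);  % column by column
F2 = fillmissing(A,'previous',2);  % row by row
% F1 is [1 NaN; 1 4]
% F2 is [1 1; NaN 4]
```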
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: fillmissing(A,'DataVariables',{'Temperature','Altitude'}) fills only the columns corresponding to the Temperature and Altitude variables of an input table.
'EndValues' - Method for handling endpoints
'extrap' (default) | 'previous' | 'next' | 'nearest' | 'none' | scalar
Method for handling endpoints, specified as the comma-separated pair consisting of 'EndValues' and one of 'extrap', 'previous', 'next', 'nearest', 'none', or a constant scalar value. The endpoint fill method handles leading and trailing missing values based on the following definitions:
Method | Description |
---|---|
'extrap' | same as method |
'previous' | previous non-missing value |
'next' | next non-missing value |
'nearest' | nearest non-missing value |
'none' | no fill value |
scalar | constant value (numeric, duration, and datetime data types only) |
Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | datetime | duration
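The endpoint options can be compared on a made-up vector that has a leading, an interior, and a trailing NaN. In every call the interior gap is filled by linear interpolation, and only the treatment of the two ends changes.

```matlab
A = [NaN 2 NaN 4 NaN];   % illustrative data

Fextrap  = fillmissing(A,'linear');                        % default 'extrap': [1 2 3 4 5]
Fnearest = fillmissing(A,'linear','EndValues','nearest');  % [2 2 3 4 4]
Fnone    = fillmissing(A,'linear','EndValues','none');     % [NaN 2 3 4 NaN]
Fzero    = fillmissing(A,'linear','EndValues',0);          % [0 2 3 4 0]
```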
'SamplePoints' - Sample points
Sample points for fill method, specified as the comma-separated pair consisting of 'SamplePoints' and a vector. The sample points represent the location of the data in A, and must be sorted and contain unique elements. Sample points do not need to be uniformly sampled. If A is a timetable, then the default sample points vector is the vector of row times. Otherwise, the default vector is [1 2 3 ...].
Moving windows are defined relative to the sample points. For example, if t is a vector of times corresponding to the input data, then fillmissing(rand(1,10),'movmean',3,'SamplePoints',t) has a window that represents the time interval between t(i)-1.5 and t(i)+1.5.
When the sample points vector has data type datetime or duration, then the moving window length must have type duration.
This name-value pair is not supported when the input data is a timetable.
Data Types: double | single | datetime | duration
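To illustrate how sample points change which neighbors fall inside a moving window, the sketch below fills the same made-up data with and without a hypothetical time vector t. With t, the window of length 3 around t = 1 spans -0.5 to 2.5 and so reaches only the first data point.

```matlab
A = [1 NaN 9 9];
t = [0 1 5 6];   % hypothetical, non-uniform sample times

Fdefault = fillmissing(A,'movmean',3);                   % window covers elements 1..3
Ftimes   = fillmissing(A,'movmean',3,'SamplePoints',t);  % window covers t = 0 and t = 1 only
% Fdefault(2) is 5, the mean of 1 and 9
% Ftimes(2) is 1, the only non-missing value inside the window
```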
'DataVariables' - Table variables to fill
variable name | cell array of variable names | numeric vector | logical vector | function handle | table vartype subscript
Table variables to fill, specified as the comma-separated pair consisting of 'DataVariables' and a variable name, a cell array of variable names, a numeric vector, a logical vector, a function handle, or a table vartype subscript. The 'DataVariables' value indicates which columns of the input table to fill, and can be one of the following:
A character vector specifying a single table variable name
A cell array of character vectors where each element is a table variable name
A vector of table variable indices
A logical vector whose elements each correspond to a table variable, where true includes the corresponding variable and false excludes it
A function handle that returns a logical scalar, such as @isnumeric
A table vartype subscript
Example: 'Age'
Example: {'Height','Weight'}
Example: @iscategorical
Example: vartype('numeric')
'MissingLocations' - Known missing indicator
Known missing indicator, specified as the comma-separated pair consisting of 'MissingLocations' and a logical vector, matrix, or multidimensional array of the same size as A. The indicator elements can be true to indicate a missing value in the corresponding location of A or false otherwise.
Data Types: logical
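A possible use of 'MissingLocations' is to fill entries that are marked by a sentinel value rather than by NaN. In this sketch the sentinel -99 and the data are hypothetical.

```matlab
A = [1 -99 3];        % -99 is a made-up sentinel for "no reading"
missingIdx = (A == -99);

% Treat the flagged entries as missing and interpolate over them.
F = fillmissing(A,'linear','MissingLocations',missingIdx);
% F is [1 2 3]
```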
'MaxGap' - Maximum gap size to fill
numeric scalar | duration scalar | calendarDuration scalar
Maximum gap size to fill, specified as a numeric scalar, duration scalar, or calendarDuration scalar. Gaps are clusters of consecutive missing values whose size is the distance between the non-missing values surrounding the gap. The gap size is computed in units relative to the sample points. Gaps smaller than or equal to the max gap size are filled, and gaps larger than the gap size are not.
For example, consider the vector y = [25 NaN NaN 100] using the default sample points [1 2 3 4]. The gap size in the vector is computed from the sample points as 4 - 1 = 3, so a MaxGap value of 2 leaves the missing values unaltered, while a MaxGap value of 3 fills in the missing values.
For missing values at the beginning or end of the data:
A single missing value at the end of the input data has a gap size of 0 and is always filled.
Clusters of missing values occurring at the beginning or end of the input data are not completely surrounded by non-missing values, so the gap size is computed using the nearest existing sample points. For the default sample points 1:N, this produces a gap size that is 1 smaller than if the same cluster occurred in the middle of the data.
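The worked example above can be run directly. The choice of the 'linear' method is an assumption made for this sketch; the gap-size rule itself does not depend on it.

```matlab
y = [25 NaN NaN 100];   % gap size is 4 - 1 = 3 with default sample points

Fsmall = fillmissing(y,'linear','MaxGap',2);  % gap of 3 exceeds 2: left unfilled
Flarge = fillmissing(y,'linear','MaxGap',3);  % gap of 3 is allowed: filled
% Fsmall is [25 NaN NaN 100]
% Flarge is [25 50 75 100]
```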
F - Filled data
Filled data, returned as a vector, matrix, multidimensional array, table, or timetable. F is the same size as A.
Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | char | string | cell | table | timetable | categorical | datetime | duration | calendarDuration
TF - Filled data indicator
Filled data indicator, returned as a vector, matrix, or multidimensional array. TF is a logical array where 1 (true) corresponds to entries in F that were filled and 0 (false) corresponds to unchanged entries. TF is the same size as A and F.
Data Types: logical
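As a brief sketch with made-up data, TF marks only the entries that were actually filled. A missing entry that the method cannot fill, such as a leading NaN under 'previous', stays missing in F and is false in TF.

```matlab
A = [NaN 1 NaN 3];   % illustrative data

[F,TF] = fillmissing(A,'previous');
% F  is [NaN 1 1 3]
% TF is [0 0 1 0]: only the third entry was filled
```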
Usage notes and limitations:
The 'MaxGap' name-value pair is not supported.
The 'spline' and 'makima' methods are not supported.
The 'SamplePoints' and 'MissingLocations' name-value pairs are not supported.
The 'DataVariables' name-value pair cannot specify a function handle.
The 'EndValues' name-value pair can only specify 'extrap'.
The syntax fillmissing(A,movmethod,window) is not supported when A is a tall timetable.
The syntax fillmissing(A,'constant',v) must specify a scalar value for v.
The syntax fillmissing(A,___) does not support character vector variables when A is a tall table or tall timetable.
For more information, see Tall Arrays.
Usage notes and limitations:
The 'MaxGap' name-value pair is not supported.
The 'makima' option is not supported.
When the 'SamplePoints' value has type datetime or the input data is a timetable with datetime row times, only the methods 'constant', 'movmean', and 'movmedian' are supported.
By default, table and timetable inputs are assumed to contain data that does not change size. Therefore, the specified fill method must not use a fill value that changes the size of the values it replaces, and the size of the fill value must be constant between replacements.
Clean Missing Data | filloutliers | ismissing | isnan | missing | rmmissing | standardizeMissing