This example shows how to find, clean, and delete table rows with missing data.
Load sample data from a comma-separated text file, messy.csv
. The file contains many different missing data indicators:
Empty character vector ('')
period (.)
NA
NaN
-99
To specify the character vectors to treat as empty values, use the 'TreatAsEmpty'
name-value pair argument with the readtable
function. (Use the disp
function to display all 21 rows, even when running this example as a live script.)
T = readtable('messy.csv','TreatAsEmpty',{'.','NA'}); disp(T)
A B C D E ________ ____ __________ ____ ____ {'afe1'} 3 {'yes' } 3 3 {'egh3'} NaN {'no' } 7 7 {'wth4'} 3 {'yes' } 3 3 {'atn2'} 23 {'no' } 23 23 {'arg1'} 5 {'yes' } 5 5 {'jre3'} 34.6 {'yes' } 34.6 34.6 {'wen9'} 234 {'yes' } 234 234 {'ple2'} 2 {'no' } 2 2 {'dbo8'} 5 {'no' } 5 5 {'oii4'} 5 {'yes' } 5 5 {'wnk3'} 245 {'yes' } 245 245 {'abk6'} 563 {0x0 char} 563 563 {'pnj5'} 463 {'no' } 463 463 {'wnn3'} 6 {'no' } 6 6 {'oks9'} 23 {'yes' } 23 23 {'wba3'} NaN {'yes' } NaN 14 {'pkn4'} 2 {'no' } 2 2 {'adw3'} 22 {'no' } 22 22 {'poj2'} -99 {'yes' } -99 -99 {'bas8'} 23 {'no' } 23 23 {'gry5'} NaN {'yes' } NaN 21
T
is a table with 21 rows and five variables. 'TreatAsEmpty'
only applies to numeric columns in the file and cannot handle numeric values specified as text, such as '-99'
.
View the data type, description, units, and other descriptive statistics for each variable by creating a table summary using the summary
function.
summary(T)
Variables: A: 21x1 cell array of character vectors B: 21x1 double Values: Min -99 Median 14 Max 563 NumMissing 3 C: 21x1 cell array of character vectors D: 21x1 double Values: Min -99 Median 7 Max 563 NumMissing 2 E: 21x1 double Values: Min -99 Median 14 Max 563
When you import data from a file, the default is for readtable
to read any variables with nonnumeric elements as a cell array of character vectors.
Display the subset of rows from the table, T
, that have at least one missing value.
TF = ismissing(T,{'' '.' 'NA' NaN -99}); rowsWithMissing = T(any(TF,2),:); disp(rowsWithMissing)
A B C D E ________ ___ __________ ___ ___ {'egh3'} NaN {'no' } 7 7 {'abk6'} 563 {0x0 char} 563 563 {'wba3'} NaN {'yes' } NaN 14 {'poj2'} -99 {'yes' } -99 -99 {'gry5'} NaN {'yes' } NaN 21
readtable
replaced '.'
and 'NA'
with NaN
in the numeric variables, B
, D
, and E
.
Clean the data so that the missing values indicated by code -99
have the standard MATLAB® numeric missing value indicator, NaN
.
T = standardizeMissing(T,-99); disp(T)
A B C D E ________ ____ __________ ____ ____ {'afe1'} 3 {'yes' } 3 3 {'egh3'} NaN {'no' } 7 7 {'wth4'} 3 {'yes' } 3 3 {'atn2'} 23 {'no' } 23 23 {'arg1'} 5 {'yes' } 5 5 {'jre3'} 34.6 {'yes' } 34.6 34.6 {'wen9'} 234 {'yes' } 234 234 {'ple2'} 2 {'no' } 2 2 {'dbo8'} 5 {'no' } 5 5 {'oii4'} 5 {'yes' } 5 5 {'wnk3'} 245 {'yes' } 245 245 {'abk6'} 563 {0x0 char} 563 563 {'pnj5'} 463 {'no' } 463 463 {'wnn3'} 6 {'no' } 6 6 {'oks9'} 23 {'yes' } 23 23 {'wba3'} NaN {'yes' } NaN 14 {'pkn4'} 2 {'no' } 2 2 {'adw3'} 22 {'no' } 22 22 {'poj2'} NaN {'yes' } NaN NaN {'bas8'} 23 {'no' } 23 23 {'gry5'} NaN {'yes' } NaN 21
standardizeMissing
replaces three instances of -99
with NaN
.
Create a new table, T2
, and replace missing values with values from previous rows of the table. fillmissing
provides a number of ways to fill in missing values.
T2 = fillmissing(T,'previous');
disp(T2)
A B C D E ________ ____ _______ ____ ____ {'afe1'} 3 {'yes'} 3 3 {'egh3'} 3 {'no' } 7 7 {'wth4'} 3 {'yes'} 3 3 {'atn2'} 23 {'no' } 23 23 {'arg1'} 5 {'yes'} 5 5 {'jre3'} 34.6 {'yes'} 34.6 34.6 {'wen9'} 234 {'yes'} 234 234 {'ple2'} 2 {'no' } 2 2 {'dbo8'} 5 {'no' } 5 5 {'oii4'} 5 {'yes'} 5 5 {'wnk3'} 245 {'yes'} 245 245 {'abk6'} 563 {'yes'} 563 563 {'pnj5'} 463 {'no' } 463 463 {'wnn3'} 6 {'no' } 6 6 {'oks9'} 23 {'yes'} 23 23 {'wba3'} 23 {'yes'} 23 14 {'pkn4'} 2 {'no' } 2 2 {'adw3'} 22 {'no' } 22 22 {'poj2'} 22 {'yes'} 22 22 {'bas8'} 23 {'no' } 23 23 {'gry5'} 23 {'yes'} 23 21
Create a new table, T3
, that contains only the rows from T
without missing values. T3
has only 16 rows.
T3 = rmmissing(T); disp(T3)
A B C D E ________ ____ _______ ____ ____ {'afe1'} 3 {'yes'} 3 3 {'wth4'} 3 {'yes'} 3 3 {'atn2'} 23 {'no' } 23 23 {'arg1'} 5 {'yes'} 5 5 {'jre3'} 34.6 {'yes'} 34.6 34.6 {'wen9'} 234 {'yes'} 234 234 {'ple2'} 2 {'no' } 2 2 {'dbo8'} 5 {'no' } 5 5 {'oii4'} 5 {'yes'} 5 5 {'wnk3'} 245 {'yes'} 245 245 {'pnj5'} 463 {'no' } 463 463 {'wnn3'} 6 {'no' } 6 6 {'oks9'} 23 {'yes'} 23 23 {'pkn4'} 2 {'no' } 2 2 {'adw3'} 22 {'no' } 22 22 {'bas8'} 23 {'no' } 23 23
T3
contains 16 rows and five variables.
Sort the rows of T3
in descending order by C
, and then sort in ascending order by A
.
T3 = sortrows(T2,{'C','A'},{'descend','ascend'}); disp(T3)
A B C D E ________ ____ _______ ____ ____ {'abk6'} 563 {'yes'} 563 563 {'afe1'} 3 {'yes'} 3 3 {'arg1'} 5 {'yes'} 5 5 {'gry5'} 23 {'yes'} 23 21 {'jre3'} 34.6 {'yes'} 34.6 34.6 {'oii4'} 5 {'yes'} 5 5 {'oks9'} 23 {'yes'} 23 23 {'poj2'} 22 {'yes'} 22 22 {'wba3'} 23 {'yes'} 23 14 {'wen9'} 234 {'yes'} 234 234 {'wnk3'} 245 {'yes'} 245 245 {'wth4'} 3 {'yes'} 3 3 {'adw3'} 22 {'no' } 22 22 {'atn2'} 23 {'no' } 23 23 {'bas8'} 23 {'no' } 23 23 {'dbo8'} 5 {'no' } 5 5 {'egh3'} 3 {'no' } 7 7 {'pkn4'} 2 {'no' } 2 2 {'ple2'} 2 {'no' } 2 2 {'pnj5'} 463 {'no' } 463 463 {'wnn3'} 6 {'no' } 6 6
In C
, the rows are grouped first by 'yes'
, followed by 'no'
. Then in A
, the rows are listed alphabetically.
Reorder the table so that A
and C
are next to each other.
T3 = T3(:,{'A','C','B','D','E'}); disp(T3)
A C B D E ________ _______ ____ ____ ____ {'abk6'} {'yes'} 563 563 563 {'afe1'} {'yes'} 3 3 3 {'arg1'} {'yes'} 5 5 5 {'gry5'} {'yes'} 23 23 21 {'jre3'} {'yes'} 34.6 34.6 34.6 {'oii4'} {'yes'} 5 5 5 {'oks9'} {'yes'} 23 23 23 {'poj2'} {'yes'} 22 22 22 {'wba3'} {'yes'} 23 23 14 {'wen9'} {'yes'} 234 234 234 {'wnk3'} {'yes'} 245 245 245 {'wth4'} {'yes'} 3 3 3 {'adw3'} {'no' } 22 22 22 {'atn2'} {'no' } 23 23 23 {'bas8'} {'no' } 23 23 23 {'dbo8'} {'no' } 5 5 5 {'egh3'} {'no' } 3 7 7 {'pkn4'} {'no' } 2 2 2 {'ple2'} {'no' } 2 2 2 {'pnj5'} {'no' } 463 463 463 {'wnn3'} {'no' } 6 6 6
fillmissing
| ismissing
| readtable
| rmmissing
| sortrows
| standardizeMissing
| summary