Clean Timetable with Missing, Duplicate, or Nonuniform Times

This example shows how to create a regular timetable from one that has missing, duplicate, or nonuniform times. A timetable is a type of table that associates a time-stamp, or row time, with each row of data. In a regular timetable, the row times are sorted and unique, and differ by the same regular time step. The example also shows how to export the data from a timetable for use with other functions.

Timetables can be irregular. They can contain rows that are not sorted by their row times. Timetables can contain multiple rows with the same row time, though the rows can have different data values. Even when row times are sorted and unique, they can differ by time steps of different sizes. Timetables can even contain NaT or NaN values to indicate missing row times.

Timetables provide a number of different ways to resolve missing, duplicate, or nonuniform times, and to resample or aggregate data to regular row times.

  • To find missing row times, use ismissing.

  • To remove missing times and data, use rmmissing.

  • To sort a timetable by its row times, use sortrows.

  • To make a timetable with unique and sorted row times, use unique and retime.

  • To remove duplicate times, specify a unique time vector and use retime.

  • To make a regular timetable, specify a regular time vector and use retime.

Load Timetable

Load a sample timetable from the MAT-file badTimes that contains weather measurements taken over several hours on June 9, 2016. The timetable includes temperature, rainfall, and wind speed measurements taken at irregular times during that day.

load badTimes
TT
TT=12×3 timetable
            Time            Temp    Rain    WindSpeed
    ____________________    ____    ____    _________

    09-Jun-2016 06:01:04      73    0.01       2.3   
    09-Jun-2016 07:59:23      59    0.08       0.9   
    09-Jun-2016 09:53:57      59    0.03       3.4   
    09-Jun-2016 09:53:57      67    0.03       3.4   
    NaT                       56       0         0   
    09-Jun-2016 09:53:57      67    0.03       3.4   
    09-Jun-2016 08:49:10      62    0.01       2.7   
    09-Jun-2016 08:49:10    75.8    0.01       2.7   
    09-Jun-2016 08:49:10      82    0.01       2.7   
    09-Jun-2016 05:03:11    66.2    0.05         3   
    09-Jun-2016 08:49:10    67.2    0.01       2.7   
    09-Jun-2016 04:12:00    58.8     NaN       NaN   

Remove Rows with Missing Times

Remove rows that have NaT, or a missing value, as the row time. To find missing values in the vector of row times, use the ismissing function. ismissing returns a logical vector that contains 1 wherever TT.Time has a missing value. Index back into the timetable to keep only those rows that do not have missing values as row times. Assign those rows to TT2.

TF = ismissing(TT.Time);
TT2 = TT(~TF,:);
TT2
TT2=11×3 timetable
            Time            Temp    Rain    WindSpeed
    ____________________    ____    ____    _________

    09-Jun-2016 06:01:04      73    0.01       2.3   
    09-Jun-2016 07:59:23      59    0.08       0.9   
    09-Jun-2016 09:53:57      59    0.03       3.4   
    09-Jun-2016 09:53:57      67    0.03       3.4   
    09-Jun-2016 09:53:57      67    0.03       3.4   
    09-Jun-2016 08:49:10      62    0.01       2.7   
    09-Jun-2016 08:49:10    75.8    0.01       2.7   
    09-Jun-2016 08:49:10      82    0.01       2.7   
    09-Jun-2016 05:03:11    66.2    0.05         3   
    09-Jun-2016 08:49:10    67.2    0.01       2.7   
    09-Jun-2016 04:12:00    58.8     NaN       NaN   

This method removes only the rows that have missing row times. The table variables might still have missing data values. For example, the last row of TT2 has NaN values for the Rain and Windspeed variables.

Remove Rows with Missing Times or Missing Data

You can remove missing row times and missing data values using the rmmissing function. rmmissing removes any timetable row that has a missing row time, missing data values, or both.

Display the missing row time and missing data values of TT. Then remove all missing values from TT.

TT
TT=12×3 timetable
            Time            Temp    Rain    WindSpeed
    ____________________    ____    ____    _________

    09-Jun-2016 06:01:04      73    0.01       2.3   
    09-Jun-2016 07:59:23      59    0.08       0.9   
    09-Jun-2016 09:53:57      59    0.03       3.4   
    09-Jun-2016 09:53:57      67    0.03       3.4   
    NaT                       56       0         0   
    09-Jun-2016 09:53:57      67    0.03       3.4   
    09-Jun-2016 08:49:10      62    0.01       2.7   
    09-Jun-2016 08:49:10    75.8    0.01       2.7   
    09-Jun-2016 08:49:10      82    0.01       2.7   
    09-Jun-2016 05:03:11    66.2    0.05         3   
    09-Jun-2016 08:49:10    67.2    0.01       2.7   
    09-Jun-2016 04:12:00    58.8     NaN       NaN   

TT = rmmissing(TT)
TT=10×3 timetable
            Time            Temp    Rain    WindSpeed
    ____________________    ____    ____    _________

    09-Jun-2016 06:01:04      73    0.01       2.3   
    09-Jun-2016 07:59:23      59    0.08       0.9   
    09-Jun-2016 09:53:57      59    0.03       3.4   
    09-Jun-2016 09:53:57      67    0.03       3.4   
    09-Jun-2016 09:53:57      67    0.03       3.4   
    09-Jun-2016 08:49:10      62    0.01       2.7   
    09-Jun-2016 08:49:10    75.8    0.01       2.7   
    09-Jun-2016 08:49:10      82    0.01       2.7   
    09-Jun-2016 05:03:11    66.2    0.05         3   
    09-Jun-2016 08:49:10    67.2    0.01       2.7   

Sort Timetable and Determine Whether It Is Regular

Determine whether TT is sorted. Then, sort the timetable on its row times using the sortrows function.

TF = issorted(TT)
TF = logical
   0

TT = sortrows(TT)
TT=10×3 timetable
            Time            Temp    Rain    WindSpeed
    ____________________    ____    ____    _________

    09-Jun-2016 05:03:11    66.2    0.05         3   
    09-Jun-2016 06:01:04      73    0.01       2.3   
    09-Jun-2016 07:59:23      59    0.08       0.9   
    09-Jun-2016 08:49:10      62    0.01       2.7   
    09-Jun-2016 08:49:10    75.8    0.01       2.7   
    09-Jun-2016 08:49:10      82    0.01       2.7   
    09-Jun-2016 08:49:10    67.2    0.01       2.7   
    09-Jun-2016 09:53:57      59    0.03       3.4   
    09-Jun-2016 09:53:57      67    0.03       3.4   
    09-Jun-2016 09:53:57      67    0.03       3.4   

Determine whether TT is regular. A regular timetable has the same time interval between consecutive row times. Even a sorted timetable can have time steps that are not uniform.

TF = isregular(TT)
TF = logical
   0

Display the differences between row times.

diff(TT.Time)
ans = 9x1 duration
   00:57:53
   01:58:19
   00:49:47
   00:00:00
   00:00:00
   00:00:00
   01:04:47
   00:00:00
   00:00:00

Remove Duplicate Rows

Timetables can have duplicate rows. Timetable rows are duplicates if they have the same row times and the same data values. In this example, the last two rows of TT are duplicates.

To remove the duplicate rows, use the unique function. unique returns the unique rows and sorts them by their row times.

TT = unique(TT)
TT=9×3 timetable
            Time            Temp    Rain    WindSpeed
    ____________________    ____    ____    _________

    09-Jun-2016 05:03:11    66.2    0.05         3   
    09-Jun-2016 06:01:04      73    0.01       2.3   
    09-Jun-2016 07:59:23      59    0.08       0.9   
    09-Jun-2016 08:49:10      62    0.01       2.7   
    09-Jun-2016 08:49:10    67.2    0.01       2.7   
    09-Jun-2016 08:49:10    75.8    0.01       2.7   
    09-Jun-2016 08:49:10      82    0.01       2.7   
    09-Jun-2016 09:53:57      59    0.03       3.4   
    09-Jun-2016 09:53:57      67    0.03       3.4   

Find Rows with Duplicate Times and Different Data

Timetables can have rows with duplicate row times but different data values. In this example, TT has several rows with the same row times but different values.

Find the rows that have duplicate row times. First, sort the row times and find consecutive times that have no difference between them. Times with no difference between them are the duplicates. Index back into the vector of row times and return a unique set of times that identify the duplicate row times in TT.

dupTimes = sort(TT.Time);
TF = (diff(dupTimes) == 0);
dupTimes = dupTimes(TF);
dupTimes = unique(dupTimes)
dupTimes = 2x1 datetime
   09-Jun-2016 08:49:10
   09-Jun-2016 09:53:57

Index into the timetable to display the rows with duplicate row times. When you index on times, the output timetable contains all rows with matching row times.

TT(dupTimes,:)
ans=6×3 timetable
            Time            Temp    Rain    WindSpeed
    ____________________    ____    ____    _________

    09-Jun-2016 08:49:10      62    0.01       2.7   
    09-Jun-2016 08:49:10    67.2    0.01       2.7   
    09-Jun-2016 08:49:10    75.8    0.01       2.7   
    09-Jun-2016 08:49:10      82    0.01       2.7   
    09-Jun-2016 09:53:57      59    0.03       3.4   
    09-Jun-2016 09:53:57      67    0.03       3.4   

Select First and Last Rows with Duplicate Times

Select either the first and the last of the rows with duplicate row times using the unique and retime functions.

First, create a vector of unique row times from TT using the unique function.

uniqueTimes = unique(TT.Time);

Select the first row from each set of rows that have duplicate times.

TT2 = retime(TT,uniqueTimes)
TT2=5×3 timetable
            Time            Temp    Rain    WindSpeed
    ____________________    ____    ____    _________

    09-Jun-2016 05:03:11    66.2    0.05         3   
    09-Jun-2016 06:01:04      73    0.01       2.3   
    09-Jun-2016 07:59:23      59    0.08       0.9   
    09-Jun-2016 08:49:10      62    0.01       2.7   
    09-Jun-2016 09:53:57      59    0.03       3.4   

Select the last rows from each set of rows that have duplicate times. Specify the 'previous' method of retime to copy data from the last row. When you specify 'previous', then retime starts at the end of the vector of row times and stops when it encounters a duplicate row time. Then it copies the data from that row.

TT2 = retime(TT,uniqueTimes,'previous')
TT2=5×3 timetable
            Time            Temp    Rain    WindSpeed
    ____________________    ____    ____    _________

    09-Jun-2016 05:03:11    66.2    0.05         3   
    09-Jun-2016 06:01:04      73    0.01       2.3   
    09-Jun-2016 07:59:23      59    0.08       0.9   
    09-Jun-2016 08:49:10      82    0.01       2.7   
    09-Jun-2016 09:53:57      67    0.03       3.4   

Aggregate Data from All Rows with Duplicate Times

Aggregate data from rows that have duplicate row times. For example, you can calculate the means of several measurements of the same quantity taken at the same time.

Calculate the mean temperature, rainfall, and wind speed for rows with duplicate row times using the retime function.

TT = retime(TT,uniqueTimes,'mean')
TT=5×3 timetable
            Time            Temp     Rain    WindSpeed
    ____________________    _____    ____    _________

    09-Jun-2016 05:03:11     66.2    0.05         3   
    09-Jun-2016 06:01:04       73    0.01       2.3   
    09-Jun-2016 07:59:23       59    0.08       0.9   
    09-Jun-2016 08:49:10    71.75    0.01       2.7   
    09-Jun-2016 09:53:57       63    0.03       3.4   

Make Timetable Regular

Create a regular timetable using retime. Interpolate the data onto a regular hourly time vector. To use linear interpolation, specify 'linear'. Each row time in TT begins on the hour, and there is a one-hour interval between consecutive row times.

TT = retime(TT,'hourly','linear')
TT=6×3 timetable
            Time             Temp       Rain      WindSpeed
    ____________________    ______    ________    _________

    09-Jun-2016 05:00:00    65.826      0.0522     3.0385  
    09-Jun-2016 06:00:00    72.875    0.010737     2.3129  
    09-Jun-2016 07:00:00    66.027    0.044867     1.6027  
    09-Jun-2016 08:00:00    59.158    0.079133     0.9223  
    09-Jun-2016 09:00:00    70.287    0.013344     2.8171  
    09-Jun-2016 10:00:00    62.183    0.031868     3.4654  

Instead of using a predefined time step such as 'hourly', you can specify a time step of your own. To specify a time step of 30 minutes, use the 'regular' input argument and the 'TimeStep' name-value pair argument. You can specify a time step of any size as a duration or calendar duration value.

TT = retime(TT,'regular','linear','TimeStep',minutes(30))
TT=11×3 timetable
            Time             Temp       Rain      WindSpeed
    ____________________    ______    ________    _________

    09-Jun-2016 05:00:00    65.826      0.0522     3.0385  
    09-Jun-2016 05:30:00     69.35    0.031468     2.6757  
    09-Jun-2016 06:00:00    72.875    0.010737     2.3129  
    09-Jun-2016 06:30:00    69.451    0.027802     1.9578  
    09-Jun-2016 07:00:00    66.027    0.044867     1.6027  
    09-Jun-2016 07:30:00    62.592       0.062     1.2625  
    09-Jun-2016 08:00:00    59.158    0.079133     0.9223  
    09-Jun-2016 08:30:00    64.722    0.046239     1.8697  
    09-Jun-2016 09:00:00    70.287    0.013344     2.8171  
    09-Jun-2016 09:30:00    66.235    0.022606     3.1412  
    09-Jun-2016 10:00:00    62.183    0.031868     3.4654  

Extract Regular Timetable Data

You can export the timetable data for use with functions to analyze data that is regularly spaced in time. For example, the Econometrics Toolbox™ and the Signal Processing Toolbox™ have functions you can use for further analysis on regularly spaced data.

Extract the timetable data as an array. You can use the Variables property to return the data as an array when the table variables can be concatenated.

A = TT.Variables
A = 11×3

   65.8260    0.0522    3.0385
   69.3504    0.0315    2.6757
   72.8747    0.0107    2.3129
   69.4507    0.0278    1.9578
   66.0266    0.0449    1.6027
   62.5923    0.0620    1.2625
   59.1579    0.0791    0.9223
   64.7224    0.0462    1.8697
   70.2868    0.0133    2.8171
   66.2348    0.0226    3.1412
      ⋮

TT.Variables is equivalent to using curly brace syntax, TT{:,:}, to access all variables.

A2 = TT{:,:}
A2 = 11×3

   65.8260    0.0522    3.0385
   69.3504    0.0315    2.6757
   72.8747    0.0107    2.3129
   69.4507    0.0278    1.9578
   66.0266    0.0449    1.6027
   62.5923    0.0620    1.2625
   59.1579    0.0791    0.9223
   64.7224    0.0462    1.8697
   70.2868    0.0133    2.8171
   66.2348    0.0226    3.1412
      ⋮

See Also

| | | | | | | | |

Related Topics