This example shows how to use the readtable
function to import mixed text and numeric data into a table, specify the data types for the variables, and then append a new variable to the table.
The sample file, outages.csv
, contains data representing electric utility outages in the US. The first few lines of the file are:
Region,OutageTime,Loss,Customers,RestorationTime,Cause
SouthWest,2002-01-20 11:49,672,2902379,2002-01-24 21:58,winter storm
SouthEast,2002-01-30 01:18,796,336436,2002-02-04 11:20,winter storm
SouthEast,2004-02-03 21:17,264.9,107083,2004-02-20 03:37,winter storm
West,2002-06-19 13:39,391.4,378990,2002-06-19 14:27,equipment fault
Import the data using readtable
and display the first five rows. The readtable
function automatically detects the delimiter and the variable types.
T = readtable('outages.csv'); head(T,5) % show first 5 rows of table
ans=5×6 table
Region OutageTime Loss Customers RestorationTime Cause
_____________ ________________ ______ __________ ________________ ___________________
{'SouthWest'} 2002-02-01 12:18 458.98 1.8202e+06 2002-02-07 16:50 {'winter storm' }
{'SouthEast'} 2003-01-23 00:49 530.14 2.1204e+05 NaT {'winter storm' }
{'SouthEast'} 2003-02-07 21:15 289.4 1.4294e+05 2003-02-17 08:14 {'winter storm' }
{'West' } 2004-04-06 05:44 434.81 3.4037e+05 2004-04-06 06:10 {'equipment fault'}
{'MidWest' } 2002-03-16 06:18 186.44 2.1275e+05 2002-03-18 23:23 {'severe storm' }
Specify Variable Data Types Before Import
Updating the variable data types to the appropriate MATLAB data types can benefit your data, based on the type of variables in your file. For example, the first and sixth columns in outages.csv
are categorical. By designating these two columns as categorical
arrays you can leverage MATLAB functions for processing categorical data.
Designate and specify the data types of the variables in one of these ways:
Specify the Format
name-value pair in readtable
Set the VariableTypes
property of the import options for the file
Use the Format
name-value pair to specify the variable data types, read the data, and display the first five rows. In the %{yyyy-MM-dd HH:mm}D
part of the formatSpec
specifier, the text between the curly braces describes the format of the date and time data. The values specified in Format
designate the:
First and last columns in the file as categorical data
Second and fifth columns as formatted date and time data
Third and fourth columns as floating-point values
formatSpec = '%C%{yyyy-MM-dd HH:mm}D%f%f%{yyyy-MM-dd HH:mm}D%C'; T = readtable('outages.csv','Format',formatSpec); head(T,5)
ans=5×6 table
Region OutageTime Loss Customers RestorationTime Cause
_________ ________________ ______ __________ ________________ _______________
SouthWest 2002-02-01 12:18 458.98 1.8202e+06 2002-02-07 16:50 winter storm
SouthEast 2003-01-23 00:49 530.14 2.1204e+05 NaT winter storm
SouthEast 2003-02-07 21:15 289.4 1.4294e+05 2003-02-17 08:14 winter storm
West 2004-04-06 05:44 434.81 3.4037e+05 2004-04-06 06:10 equipment fault
MidWest 2002-03-16 06:18 186.44 2.1275e+05 2002-03-18 23:23 severe storm
Alternatively, specify the data types for the variables by using the setvartype
function of the import options. First, create an import options object for the file. The data file contains different types of variables. Designate the first and last variables as categorical
arrays, the second and fifth variables as datetime
arrays, and the remaining variables as double
.
opts = detectImportOptions('outages.csv'); varNames = opts.VariableNames ; % variable names varTypes = {'categorical','datetime','double',... 'double','datetime','categorical'}; opts = setvartype(opts,varNames,varTypes);
Import the data using readtable
with opts
, and then display the first five rows.
T = readtable('outages.csv',opts);
head(T,5)
ans=5×6 table
Region OutageTime Loss Customers RestorationTime Cause
_________ ________________ ______ __________ ________________ _______________
SouthWest 2002-02-01 12:18 458.98 1.8202e+06 2002-02-07 16:50 winter storm
SouthEast 2003-01-23 00:49 530.14 2.1204e+05 NaT winter storm
SouthEast 2003-02-07 21:15 289.4 1.4294e+05 2003-02-17 08:14 winter storm
West 2004-04-06 05:44 434.81 3.4037e+05 2004-04-06 06:10 equipment fault
MidWest 2002-03-16 06:18 186.44 2.1275e+05 2002-03-18 23:23 severe storm
Append New Variable to Table
Table T
contains OutageTime
and RestorationTime
. Calculate the duration of each electrical outage and append this data to the table.
T.Duration = T.RestorationTime - T.OutageTime; head(T,5)
ans=5×7 table
Region OutageTime Loss Customers RestorationTime Cause Duration
_________ ________________ ______ __________ ________________ _______________ _________
SouthWest 2002-02-01 12:18 458.98 1.8202e+06 2002-02-07 16:50 winter storm 148:32:00
SouthEast 2003-01-23 00:49 530.14 2.1204e+05 NaT winter storm NaN
SouthEast 2003-02-07 21:15 289.4 1.4294e+05 2003-02-17 08:14 winter storm 226:59:00
West 2004-04-06 05:44 434.81 3.4037e+05 2004-04-06 06:10 equipment fault 00:26:00
MidWest 2002-03-16 06:18 186.44 2.1275e+05 2002-03-18 23:23 severe storm 65:05:00
detectImportOptions
| head
| preview
| readtable
| readtimetable
| setvaropts
| setvartype