This example shows how to split power outage data from a table into groups by region and cause of the power outages. Then it shows how to apply functions to calculate statistics for each group and collect the results in a table.
The sample file, outages.csv
, contains data representing electric utility outages in the United States. The file contains six columns: Region
, OutageTime
, Loss
, Customers
, RestorationTime
, and Cause
. Read outages.csv
into a table.
T = readtable('outages.csv');
Convert Region
and Cause
to categorical arrays, and OutageTime
and RestorationTime
to datetime
arrays. Display the first five rows.
T.Region = categorical(T.Region); T.Cause = categorical(T.Cause); T.OutageTime = datetime(T.OutageTime); T.RestorationTime = datetime(T.RestorationTime); T(1:5,:)
ans=5×6 table
Region OutageTime Loss Customers RestorationTime Cause
_________ ________________ ______ __________ ________________ _______________
SouthWest 2002-02-01 12:18 458.98 1.8202e+06 2002-02-07 16:50 winter storm
SouthEast 2003-01-23 00:49 530.14 2.1204e+05 NaT winter storm
SouthEast 2003-02-07 21:15 289.4 1.4294e+05 2003-02-17 08:14 winter storm
West 2004-04-06 05:44 434.81 3.4037e+05 2004-04-06 06:10 equipment fault
MidWest 2002-03-16 06:18 186.44 2.1275e+05 2002-03-18 23:23 severe storm
Determine the greatest power loss due to a power outage in each region. The findgroups
function returns G
, a vector of group numbers created from T.Region
. The splitapply
function uses G
to split T.Loss
into five groups, corresponding to the five regions. splitapply
applies the max
function to each group and concatenates the maximum power losses into a vector.
G = findgroups(T.Region); maxLoss = splitapply(@max,T.Loss,G)
maxLoss = 5×1
104 ×
2.3141
2.3418
0.8767
0.2796
1.6659
Calculate the maximum power loss due to a power outage by cause. To specify that Cause
is the grouping variable, use table indexing. Create a table that contains the maximum power losses and their causes.
T1 = T(:,'Cause');
[G,powerLosses] = findgroups(T1);
powerLosses.maxLoss = splitapply(@max,T.Loss,G)
powerLosses=10×2 table
Cause maxLoss
________________ _______
attack 582.63
earthquake 258.18
energy emergency 11638
equipment fault 16659
fire 872.96
severe storm 8767.3
thunder storm 23418
unknown 23141
wind 2796
winter storm 2883.7
powerLosses
is a table because T1
is a table. You can append the maximum losses as another table variable.
Calculate the maximum power loss by cause in each region. To specify that Region
and Cause
are the grouping variables, use table indexing. Create a table that contains the maximum power losses and display the first 15 rows.
T1 = T(:,{'Region','Cause'}); [G,powerLosses] = findgroups(T1); powerLosses.maxLoss = splitapply(@max,T.Loss,G); powerLosses(1:15,:)
ans=15×3 table
Region Cause maxLoss
_________ ________________ _______
MidWest attack 0
MidWest energy emergency 2378.7
MidWest equipment fault 903.28
MidWest severe storm 6808.7
MidWest thunder storm 15128
MidWest unknown 23141
MidWest wind 2053.8
MidWest winter storm 669.25
NorthEast attack 405.62
NorthEast earthquake 0
NorthEast energy emergency 11638
NorthEast equipment fault 794.36
NorthEast fire 872.96
NorthEast severe storm 6002.4
NorthEast thunder storm 23418
Determine power-outage impact on customers by cause and region. Because T.Loss
contains NaN
values, wrap sum
in an anonymous function to use the 'omitnan'
input argument.
osumFcn = @(x)(sum(x,'omitnan'));
powerLosses.totalCustomers = splitapply(osumFcn,T.Customers,G);
powerLosses(1:15,:)
ans=15×4 table
Region Cause maxLoss totalCustomers
_________ ________________ _______ ______________
MidWest attack 0 0
MidWest energy emergency 2378.7 6.3363e+05
MidWest equipment fault 903.28 1.7822e+05
MidWest severe storm 6808.7 1.3511e+07
MidWest thunder storm 15128 4.2563e+06
MidWest unknown 23141 3.9505e+06
MidWest wind 2053.8 1.8796e+06
MidWest winter storm 669.25 4.8887e+06
NorthEast attack 405.62 2181.8
NorthEast earthquake 0 0
NorthEast energy emergency 11638 1.4391e+05
NorthEast equipment fault 794.36 3.9961e+05
NorthEast fire 872.96 6.1292e+05
NorthEast severe storm 6002.4 2.7905e+07
NorthEast thunder storm 23418 2.1885e+07
Determine the mean durations of all U.S. power outages in hours. Add the mean durations of power outages to powerLosses
. Because T.RestorationTime
has NaT
values, omit the resulting NaN
values when calculating the mean durations.
D = T.RestorationTime - T.OutageTime;
H = hours(D);
omeanFcn = @(x)(mean(x,'omitnan'));
powerLosses.meanOutage = splitapply(omeanFcn,H,G);
powerLosses(1:15,:)
ans=15×5 table
Region Cause maxLoss totalCustomers meanOutage
_________ ________________ _______ ______________ __________
MidWest attack 0 0 335.02
MidWest energy emergency 2378.7 6.3363e+05 5339.3
MidWest equipment fault 903.28 1.7822e+05 17.863
MidWest severe storm 6808.7 1.3511e+07 78.906
MidWest thunder storm 15128 4.2563e+06 51.245
MidWest unknown 23141 3.9505e+06 30.892
MidWest wind 2053.8 1.8796e+06 73.761
MidWest winter storm 669.25 4.8887e+06 127.58
NorthEast attack 405.62 2181.8 5.5117
NorthEast earthquake 0 0 0
NorthEast energy emergency 11638 1.4391e+05 77.345
NorthEast equipment fault 794.36 3.9961e+05 87.204
NorthEast fire 872.96 6.1292e+05 4.0267
NorthEast severe storm 6002.4 2.7905e+07 2163.5
NorthEast thunder storm 23418 2.1885e+07 46.098
findgroups
| rowfun
| splitapply
| varfun